Ruby on Rails Thoughts

I have been interested in Ruby on Rails for a while, and have been fiddling with it on nights and weekends. I like to stay up on new technologies and frameworks, partly because it is fun and partly because that’s some of the value I bring to the folks I work for. The more I know about the possibilities, the better decisions I can help make. Despite what people think about me, I am never interested in these things for the sheer novelty of it, but for how one can do more work in less time with fewer resources. Like many, I was seduced by things like the video of using RoR to create a weblogging system in 15 minutes. I even downloaded and listened to the Ruby on Rails podcast.

By playing with it, I do agree that it is cool and remarkably efficient to create things that work, when it works. What needs to be included in the analysis is the fact that sometimes things just don’t work. When they don’t work, it’s not so easy to figure out why from the documentation. Frequently I end up googling and searching fruitlessly through docs and doing research on the part I least care about just to make weird exceptions go away. Worst of all, sometimes the exceptions are misleading red herrings that send you on wild goose chases. Because I’m not that familiar with Ruby, that part is always a struggle for me. I’m picking up the language as I use it, but I don’t have a deep background in it. I’d be highly surprised if there are many (or any) people who are already deeply familiar with Ruby and coming to Rails because of it. I’d guess 99% of the people are coming for Rails and learning Ruby out of necessity, just like me.

Precisely because I don’t know anything about Ruby, everything I ever do is straight cookbook out of the docs. I don’t know enough to do anything crazy, so all packages are added via gem and such. The other day, I had installed a package and tried to set up some functionality to use it, straight out of the example code. I kept getting an exception about how the package wasn’t installed, even though doing a gem list showed it. I eventually got frustrated enough to ask the question on the #rubyonrails IRC channel. The path that eventually got it working was to do a gem cleanup which changed the behavior from an exception to a crash in the dispatch.fcgi. After that, I had to uninstall and reinstall the Rails package. So far so good.

At this point, I asked in the IRC channel how I could have known to do that, essentially a pointer to where in the docs the secret knowledge lives. That’s when the trouble started. I have used the IRC channel to get answers a few times when my frustration with the docs grew too great. Every time, I have gotten the answer I needed and every time I have left with elevated blood pressure. I use the IRC channel as the absolute last resort, so it isn’t a matter of RTFM: I’d much rather RTFM than deal with the pricklier denizens there. The problem is that I’ve never been able to get the advice without some form of ad hominem nonsense. It might be my lousy interpersonal skills, or it might be a problem with some of those folks.

This time, as I tried to figure out what I could have done differently, I got a lot of weird defensiveness which included statements like “This is a 0.13 release, what do you expect” and “This is the frontier so you have to be prepared to do some work” and my favorite “You can’t expect it to be like using VB.” For the record, I hate VB and don’t expect anything to be like it if I’m going to be using it. The basic gist of all this nonsense was in stark contrast to what you’ll find on the web page and in the various articles about it, which is “Come aboard, you can make your life easier and get more work done with Rails.” My point, which went over like a fart in church, was that these are mutually exclusive views, so which is correct? Either you want people to come aboard and do real world work with it, or it is still a hobbyist hacker thing, but it isn’t both simultaneously.

My view at this point is that Ruby on Rails seems fantastic, looks like a highly efficient way to get things done, and will be something I continue to pursue. It won’t be as quick in reality as you might think, because at some point you will have to stop doing productive work and research crazy failures that may be code or may be problems in your installation camouflaged as code level errors. Even if you do everything by the book, your install may shit itself at any point. If you need help and decide to try the IRC channel, hope for the best but be prepared for the worst. Expect that someone will claim it as your moral failing that you cannot decipher cryptic error cases with sparse documentation.

Update: A friend IM’d me this link to a Rails related comment thread. Read the comments and decide if you think the Rails people are being weird and defensive. I read the original post as a guy looking for information and thinking out loud, yet the Railers seem to be pissed off that he is raising questions. To paraphrase my friend, I love the technology but am not so high on the community. It will be nice when we can rationally discuss the pros and cons of the technology without raising the hackles of the Rails fundamentalists for daring to suggest there are cons.

Enclosures plugin

For those of you who have been looking for the Blosxom enclosures plugin that I use to drive my audioblog, it has now been cleaned up and released by Keith Irwin. He’s submitted it to the Blosxom plugin registry, so it should be there soon. Thanks, Keith!

Bug in 0.3

Several people have pointed out to me that there is a bug in get_enclosures 0.3 where you can get a divide-by-zero error when a download finishes within the same second it started. People are hitting this when running get_enclosures against my bittorrent RSS feed. I’ll release an update this evening, but the workaround is to not use the bittorrent feed with get_enclosures. It wouldn’t do anything for you anyway: it only gets the torrent metadata file, not the actual thing behind the torrent.
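The zero-duration case is easy to guard against. Here’s a minimal Python sketch (the actual get_enclosures code is Perl and isn’t shown here), assuming the rate is computed from whole-second start and end timestamps; `transfer_rate` is a hypothetical name.

```python
def transfer_rate(bytes_received: int, started: float, finished: float) -> float:
    """Return bytes per second, treating a zero-length interval safely."""
    elapsed = finished - started
    # With whole-second timestamps, a fast download can start and finish in the
    # same second, so elapsed == 0. Clamp to one second instead of dividing by zero.
    if elapsed <= 0:
        elapsed = 1.0
    return bytes_received / elapsed
```

Clamping to one second slightly understates the rate of a very fast download, which is probably fine for a progress report; a finer-grained clock would sidestep the problem entirely.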

Release Management

I can see that I need to do something about the get_enclosures release management. Yesterday the 0.2 version was downloaded 50 times, while the 0.3 was downloaded 11 times. That’s not good, so I really need to do something ASAP. I guess in the short term, I’ll just remove the links to old versions and make them point to the new one, but I don’t want to have to update every old post every time I release one. Someone suggested that I just link to a release directory rather than to individual files, which is a good idea. My problem is that at the top level I have my Apache setup configured via .htaccess to serve the weblog page as the index. Does anyone know how to override that at a lower level? I want to have a subdirectory not serve that out, instead giving a directory listing. I looked at the Apache docs last night, seeing how I could turn that off or make the DirectoryIndex directive in my .htaccess file not recurse, and couldn’t figure it out.

get_enclosures 0.3

Here is an updated release of get_enclosures. I have freely stolen AppleScript snippets from Ray Slakinski’s pyPodder to improve my iTunes integration. Now, iTunes will stay in the back if it is being newly started, or will stay where it is if it is already running. Also, the feeds.txt file is no longer in the zip, so you don’t have to worry about overwriting your subscriptions when upgrading. The script writes a starter file if one doesn’t already exist, and otherwise leaves the existing one alone.
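Writing the starter file only when it’s missing can be done without a separate existence check by opening the file in exclusive-create mode. A minimal Python sketch, not the script’s Perl; the starter contents and the `ensure_feeds_file` name are hypothetical.

```python
from pathlib import Path

# Hypothetical starter contents; the real script's defaults aren't shown.
STARTER_FEEDS = "# One feed URL per line; lines starting with # are skipped.\n"

def ensure_feeds_file(path: Path) -> bool:
    """Create a starter feeds file if none exists. Returns True if one was written."""
    try:
        # Mode "x" fails if the file already exists, so an existing
        # subscription list is never overwritten on upgrade.
        with path.open("x", encoding="utf-8") as f:
            f.write(STARTER_FEEDS)
        return True
    except FileExistsError:
        return False
```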

Try it out and please give me whatever feedback you can. I’m looking at perhaps trying to incorporate automatic installation of the necessary Perl modules, or maybe even an installer to do the things Adam discusses, setting up the modules and the cron job interactively at install time. Does anyone have any input on good free installer programs for OS X, or even a multi-platform one?

I Hate Applescript, I Need Help

I love most things about Macs, but I completely freaking hate AppleScript. I’ve been trying to add some functionality to get_enclosures that someone suggested: the ability to add the URLs from an RSS enclosure feed to a playlist as something to stream rather than download. That sounds simple enough, but after 30 minutes of farting around and reading the barely existent documentation, I cannot figure out how to take the URL that I have and add it to a playlist as a streaming entry. There is some magic syntax that I haven’t yet stumbled upon. I’ve been a professional programmer for most of a decade now, and trying to use the “user friendly” AppleScript language consistently drives me completely bonkers.

If any of you out there know how to do this, please throw me a bone and either email me or leave me the snippet in a comment. I have the URL, I have the playlist name, I just want this thing to be a streaming URL in that playlist. Note too that I don’t want to start playing it immediately; I just want the entry added to the list.

Attention (and XML)

Dag, I was mentioned by name in this article in The Inquirer. Let me state one more time for the record how incredibly ironic I find it that after years and years of working for startups that I thought would change the world, of delivering systems that I thought were marvels, the most attention I have ever received as a developer is for a 150 line Perl script that I did one night while watching an episode of Denis Leary’s Rescue Me. Life is funny, and the net is even funnier.

get_enclosures 0.2

Here’s an updated 0.2 release of the get_enclosures script. Added is the more robust caching mechanism based on the dates in the RSS item tag. This release will write out two M3U playlists in every directory that it downloads new files for, one alphabetical and one in reverse chronological order. This allows for ease of use with WinAmp or XMMS. Note that the M3U playlists will be based on everything that is in the directory at the time, so new files will be added to the existing ones. If you have deleted a file, it will not be reflected in the playlists.
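A rough Python sketch of writing the two playlists, assuming media files are recognized by extension (the suffix list is an assumption) and that “chronological” means file modification time; the playlist file names and `write_playlists` are hypothetical.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".mp3", ".m4a", ".ogg"}  # assumption: which files count as media

def write_playlists(directory: Path) -> tuple[Path, Path]:
    """Write alphabetical and newest-first M3U playlists for the media in directory."""
    files = [p for p in directory.iterdir()
             if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES]
    alphabetical = sorted(files, key=lambda p: p.name.lower())
    # Reverse chronological: newest modification time first.
    newest_first = sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)

    written = []
    for name, ordering in (("alphabetical.m3u", alphabetical),
                           ("newest_first.m3u", newest_first)):
        target = directory / name
        # A plain M3U file is one path per line; relative names keep the
        # playlist valid if the whole directory is moved.
        target.write_text("".join(p.name + "\n" for p in ordering), encoding="utf-8")
        written.append(target)
    return written[0], written[1]
```

Because the listing is taken at the moment the function runs, a deleted file only drops out of the playlists the next time they are regenerated.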

You can also comment out a feed by preceding a line with #, which will keep it from being downloaded but without needing to be deleted from the file. This release should also fix issues with duplicates being added to the playlists.
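Skipping commented-out feeds is just a filter while reading the file. A minimal Python sketch, assuming feeds.txt holds one URL per line (the exact format isn’t spelled out here); `read_feeds` is a hypothetical name.

```python
def read_feeds(text: str) -> list[str]:
    """Return the active feed URLs from a feeds.txt-style listing."""
    feeds = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and feeds that have been commented out with '#'.
        if not line or line.startswith("#"):
            continue
        feeds.append(line)
    return feeds
```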

If you are upgrading from a previous release, be sure not to overwrite your feeds.txt file when you unzip this. Either unzip it elsewhere or make a backup copy of feeds.txt so you don’t clobber it. Y’all probably already know this, but I just thought I’d remind.

Update: Thanks to Gordon for pointing out the boned URL. I really need to learn to not push out these things after midnight.

Comment Spammers Walk the Earth

I was sitting here working on the laptop when I got a big wad of email. It was the writeback system here telling me I got a comment, 6 of them. By examining them, I could see they were all comment spam. Since I had installed the Blosxom port of the MT anti-spam plugin, I hadn’t had a problem, but then I hadn’t weathered many attempts either. My first thought was that my cron job that updates the blacklist had failed. I checked the blacklist and sure enough, the URL in question was in the list. Curious, I thought. However, even as I examined it, spam was coming in at the pace of one a minute (hilarious, a comment spammer that obeys netiquette about robot accesses of a web page). I temporarily blocked the IP address via .htaccess while I tried to sort this out.

I went to my comments page and tried to duplicate the spammer’s comments, ones that should have been caught. They went through! Damn. I put some print statements into the plugin to write to STDERR what was happening, and I kept loading the page. I could see that it did correctly load the lines from the blacklist, and that the spam should have been caught. However, I noticed that the printout was double spaced. That ain’t right. I ended my print with a newline, but the double spacing implied that there was also a newline in the actual variable. I looked at the routine that loaded the list, and sure enough he did not chomp the newline off the end. Since this value was then passed as the pattern of a regular expression, it was only going to catch the spam if the text it was matching against had a newline right after the URL in question. I added the chomp line in the routine so that the newline disappeared, and tried to spam it again. Voila, it was caught! Since the spammer was still running amok despite being forbidden, I unblocked him via .htaccess to let the spam fighter try again. Double voila! I began getting the “we rejected comment spam” mails immediately. I don’t know if this bug has been caught in the main line of this plugin, but it certainly needs to be fixed. I’ll be contacting Doug Alcorn shortly.
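The failure mode is easy to reproduce outside the plugin. Here’s a small Python sketch (not the plugin’s Perl) of loading a blacklist with the line ending stripped before each pattern is compiled; it assumes one regular expression per line with `#` comments, and `load_blacklist` and `is_spam` are hypothetical names.

```python
import re

def load_blacklist(lines: list[str]) -> list[re.Pattern]:
    """Compile one regular expression per blacklist line."""
    patterns = []
    for raw in lines:
        # Strip the line ending first (Perl's chomp, plus a stray \r). Left in
        # place, re.compile("spam-site\n") only matches text where the URL is
        # immediately followed by a newline, so most spam slips through.
        entry = raw.rstrip("\r\n")
        if entry and not entry.startswith("#"):
            patterns.append(re.compile(entry))
    return patterns

def is_spam(comment: str, patterns: list[re.Pattern]) -> bool:
    return any(p.search(comment) for p in patterns)
```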

Renko

Pete Prodoehl has done a script much like get_enclosures called renko. I downloaded it and it worked well. It’s easier to install than mine because he includes all the modules you need in his distribution, so if that’s an issue with you by all means go get renko.

He also points out in a post that the correct place for all this stuff to be happening is in the desktop aggregators like NetNewsWire and Shrook. It shouldn’t be too hard: have enclosure preferences that let you decide to download all enclosures automatically, only ones of specific MIME types, none, ask every time, etc. Set up a directory where you want them to go, click a check button if you want them automatically added to iTunes and away you go. For me, the only big issue here would be if NetNewsWire added this support. I’m already paid up on Shrook and I like it, but if this functionality gets added to a competing product then I might switch.
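The preference described above boils down to a small decision function. A hypothetical Python sketch; the mode names, `EnclosurePrefs`, and `decide` are invented for illustration and don’t reflect how NetNewsWire or Shrook actually work.

```python
from dataclasses import dataclass, field
from enum import Enum

class Mode(Enum):
    ALL = "all"          # download every enclosure
    BY_TYPE = "by_type"  # only the listed MIME types
    NONE = "none"        # never download
    ASK = "ask"          # prompt each time

@dataclass
class EnclosurePrefs:
    mode: Mode = Mode.ASK
    allowed_types: set[str] = field(default_factory=set)  # stored lowercase

def decide(prefs: EnclosurePrefs, mime_type: str) -> str:
    """Return 'download', 'skip', or 'ask' for one enclosure."""
    if prefs.mode is Mode.ALL:
        return "download"
    if prefs.mode is Mode.NONE:
        return "skip"
    if prefs.mode is Mode.BY_TYPE:
        # Compare on the bare media type, ignoring any "; param=..." suffix.
        base = mime_type.split(";", 1)[0].strip().lower()
        return "download" if base in prefs.allowed_types else "skip"
    return "ask"
```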

This kind of thinking has occurred to me as well, which is why there is an upper limit to how much effort I’ll ultimately put into get_enclosures. The best script in the world pales next to even mediocre support in the desktop aggregators. There are one or two additional things I’d like to add to it, and then I’m probably going to slow or stop work, only fixing bugs as they are brought to my attention. We’ve now done our work in validating the proof of concept further, and it is time for the aggregator developers to step up next.

Changing the Caching Mechanism

I’m going to change the way the caching for the files works in get_enclosures. The way it works now is that when a file is downloaded, the current timestamp is saved. Before a file is downloaded, it is checked to see if there exists a timestamp for it. If so, it is not downloaded. I realize that this approach is too simplistic: while I was thinking of something else that I thought would be cool, I came up with a use case that breaks it. But first, a digression.

Speaking of the “iPod platform”, for over two years now I’ve been saving off the MP3 files from the WREK streaming archives for specific shows. I would then burn them to CD and listen to them offline. I did this with custom scripts and Windows scheduled tasks. It occurred to me that this could easily reuse all this infrastructure. I realized that it would be quite simple to create a cron task that would write out an RSS feed with enclosures for the various programs on that station. Then, the get_enclosures script could just download them when it was doing its thing anyway.

Here’s where the mechanism described in paragraph 1 falls apart: every week, the URL to get the MP3 archive for that same half-hour of programming is the same. With the existing mechanism, that URL would be downloaded once and only once, the first time the script ran. All subsequent runs would find that URL as one that has already been downloaded. Damn, so close yet so far.

Here’s how that can be fixed, and how perhaps it makes things more robust in all cases. The RSS 2.0 spec defines an optional pubDate element for each item. I’ve altered the caching mechanism to use this value rather than the current timestamp. Now, when deciding whether to get the file, it compares the pubDate of the item in the current feed with the one in the cache. If the feed is newer than the cache, it gets the file again. This allows for getting a file down like the WREK situation, where the file name and URL are reused every week as the contents of the file are rewritten with the new week’s stream. When assembling the RSS 2.0 feed with the enclosure, the pubDate is set to the correct value for that week and everything works out. Conceivably, this could also allow for redownloading of a file that was edited and republished with everything else the same but the pubDate updated to the new publish time. Because these are textual times, I wrote a simple function that compares two RFC 822 dates and finds out which is earlier, so every download URL is checked against, and stored with, the pubDate from its item tag. There are better, more robust ways such as using Date::Manip, but I don’t want to require people to install any more modules than they already do. In fact, I might think about getting rid of the dependence on XML::Simple.
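For comparison, Python’s standard library can parse RFC 822 dates directly, timezone offsets included, so no extra module is needed there. A minimal sketch of the “is the feed newer than the cache?” check, assuming pubDate strings are stored verbatim in the cache; `is_newer` and `_parse` are hypothetical names.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def _parse(rfc822: str) -> datetime:
    dt = parsedate_to_datetime(rfc822)
    # A "-0000" zone means "unknown" and yields a naive datetime; treat it as
    # UTC so it can still be compared with zone-aware values.
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)

def is_newer(feed_pubdate: str, cached_pubdate: Optional[str]) -> bool:
    """True if the item's pubDate is later than the one recorded in the cache."""
    if cached_pubdate is None:
        return True  # never downloaded before
    return _parse(feed_pubdate) > _parse(cached_pubdate)
```

Because the offsets are honored, "Tue, 05 Oct 2004 22:00:00 GMT" and "Tue, 05 Oct 2004 18:00:00 -0400" compare as the same instant, which a plain string comparison would get wrong.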

This updated mechanism will be part of the 0.2 release. As well, I will pick a WREK show or three to prepare these experimental feeds for. If the folks at WREK like it and want to do it, I’ll hand it over and they can put it on their own site.
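Such a feed could be generated by something like the following Python sketch. The show title, links, archive URL, and enclosure length are all placeholders (the real WREK archive URLs aren’t given here), and `build_feed` is hypothetical. The point is that the enclosure URL stays fixed while pubDate moves forward each week.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import format_datetime

def build_feed(title: str, link: str, archive_url: str, length: int,
               aired: datetime) -> str:
    """Build a one-item RSS 2.0 feed whose enclosure URL is reused every week."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Weekly archive of {title}"
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = f"{title}, {aired:%Y-%m-%d}"
    # The URL never changes; the RFC 822 pubDate is what tells the
    # downloader that this week's file is new.
    ET.SubElement(item, "pubDate").text = format_datetime(aired)
    ET.SubElement(item, "enclosure", url=archive_url,
                  length=str(length), type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")
```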

iPodder for Windows

Via a comment, Pieter Overbeeke informs me that he has a script for downloading files and controlling iTunes for Windows XP available! This does the same stuff as get_enclosures or iPodder on the Mac, by getting the files and also adding them to the iTunes library.

In the shower this morning, I was wondering if there were COM libraries for Perl that I could use to control Windows iTunes from get_enclosures. Now that Pieter has invented this wheel, there is no need. If you are on Windows you should definitely give his script a try. More infrastructure for the “iPod platform!”

Multiplatform it is!

I let the updated get_enclosures script run overnight on a Windows box with ActivePerl installed, and it worked just fine. Right on! There were a few minor issues, but the files all came down, so that’s good.

I tried to test it against Cygwin’s version of Perl but couldn’t get the modules installed. I recently had to wipe and reinstall my Windows 2000 OS because Windows is such a fragile piece of shit that it eats itself over time, and when I did I had to start from scratch. This Cygwin is newly installed, and I get all kinds of make errors trying to install the LWP. It’s really weird because if I go into the build directories and manually run make it works. I’ve never seen this before on any Cygwin install. If some kind soul out there could test this script with Cygwin Perl and let me know how it goes, I’d highly highly appreciate it. I’m not planning on spending any time fighting with Cygwin.

Update: I did get it to work on Cygwin after all, and it seemed to work fine. If anyone has success with this on any other platforms, let me know.

get_enclosures version 0.1

Here is the updated version of get_enclosures, version 0.1. The zip now includes a changes.txt which covers the differences from the previous version. It is no longer dependent on Mac::AppleScript, which means it will run without alteration on Linux or Cygwin, etc. (it still depends on XML::Simple and LWP). It caches the RSS time so that feeds are not redownloaded unless something has changed since last time. Thanks to Brian Tol you can now get nicely formatted documentation via POD (run “perldoc get_enclosures.pl” to see it).

Thanks to all who have given suggestions and used this. I highly recommend everyone upgrade to this if you downloaded the previous one, particularly the person who had this on a cron job to download my RSS feed every minute. Thanks, anonymous friend, you reminded me that any reasonable RSS consumer should be using Last-Modified out of etiquette.

Update: That enhancement of not fetching the RSS every time introduced a bug, because I was clearing out URLs from the cache if they weren’t in the RSS feeds. Well, when you don’t fetch the RSS feed at all, there are no URLs in it, so it was clearing out the cache whenever nothing was new. For the time being, I have just turned off the cache cleanup altogether. This cache is not going to get large relative to an audio file on any reasonable timescale anyway.
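A gentler fix than disabling cleanup entirely would be to prune only after a feed was actually fetched and parsed. A minimal Python sketch, assuming a per-feed cache keyed by enclosure URL (a single cache shared by all feeds would need to be checked against the union of every feed’s URLs); `update_cache` is hypothetical.

```python
def update_cache(cache: dict[str, str], fetched: bool,
                 feed_urls: set[str]) -> dict[str, str]:
    """Drop entries for URLs no longer in the feed, but only after a real fetch."""
    if not fetched:
        # A skipped or not-modified fetch yields no URLs at all; pruning
        # against that empty set would wipe the whole cache.
        return cache
    return {url: stamp for url, stamp in cache.items() if url in feed_urls}
```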

get_enclosures Category

Since this is taking off, and a little faster than I expected, I am creating a category for this on the blog. From here on forward, I’m posting everything about it in this category. If you use this and don’t subscribe to the whole blog’s RSS here, I ask that you at least subscribe to this category. If there is some sort of bug fix or new release, I’ll post it here and then you’ll know about it. Through the miracle of blosxom, you can automatically subscribe to the RSS for any subcategory, and the RSS feed for just this category is here. There will be a release of a 0.1 version (what is out there now I am retroactively calling 0.0) before I go to bed tonight. It will have enough new stuff, including a serious performance tweak, that all current users should upgrade. In addition, it incorporates suggestions from people like Gordon Smith here so that it will work on non-Apple platforms. My original conception was that this would be specific to Mac and iTunes, but there is no reason to be that specific. Now, it will work as a downloader for anyone who can get the right Perl stuff installed, on Linux, on Windows (straight or Cygwin), etc. Cool stuff.

Top Ten Subversion Tips

Via Coop comes this cool article on top ten Subversion tips for CVS users. I’ve pretty much switched from CVS to Subversion for everything I do at the house. It’s still a bear to set up from scratch on a clean box that lacks the prerequisite libraries, but on something like Fedora Core where it comes preinstalled, there is no reason not to use it. It is highly spiffy.

get_enclosures

Oddly enough, people are actually using this damn thing! I’ve seen a number of people getting the audio from this weblog with a user agent of “LWP::Simple”, so I’m assuming most or all of them are users of the script. Now the bad part is that I see all these weaknesses of the script. One kind soul has already emailed it back to me with my explanatory comments from the top converted to perldoc format. I also realized, from seeing in my logs that someone had set this up to run every minute (!) from their cron job, that I really need to be checking the Last-Modified when I go to get the RSS. As it is, I give people the power to hammer the living shit out of web servers by fetching that RSS over and over. I found a great example of setting up the headers using LWP::UserAgent to not fetch it unless new. I’ll be doing that as I can. The day job is stepping up a little, so I might be doing less evening coding for the next little bit, until I get a handle on it. There should be another release of this in the next day or two, though.
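The same conditional GET can be sketched with Python’s standard library: send the last Last-Modified value back as If-Modified-Since, and treat a 304 response as “nothing new.” This is an illustration, not the script’s LWP code; the function names and the User-Agent string are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

def conditional_request(url: str, last_modified: Optional[str]) -> urllib.request.Request:
    """Build a GET that asks the server to skip the body if nothing changed."""
    headers = {"User-Agent": "get_enclosures-sketch/0.1"}  # hypothetical UA string
    if last_modified:
        # Echo back the server's own Last-Modified value verbatim.
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)

def fetch_if_modified(url: str, last_modified: Optional[str]) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (body, new_last_modified); body is None when the server answers 304."""
    try:
        with urllib.request.urlopen(conditional_request(url, last_modified), timeout=30) as resp:
            return resp.read(), resp.headers.get("Last-Modified", last_modified)
    except urllib.error.HTTPError as err:
        # urllib surfaces 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None, last_modified
        raise
```

Storing the returned Last-Modified alongside the feed URL between runs is what lets a once-a-minute cron job cost the server almost nothing.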

get_enclosures.pl

Normally I’m pretty spastic about releasing unpolished code. I tend to want to hang onto it until I’m proud enough of it to let it out in the wild. I’m making an exception for this thing, my Perl equivalent of Adam Curry’s iPodder AppleScript. You should only download this if you are comfortable running Perl from the command line. If you don’t know what that means, this is not for you I am afraid. For those who can handle that, and who don’t have a problem installing CPAN modules, I present my script get_enclosures.pl.

Read the beginning comments inside that script to get the basics of how to use this. I include a paste of the crontab entry I use to automate this thing firing off twice a day. Anyone can take this and do anything with this. I only ask that if you improve it, please send me a copy of your changes so that I too can benefit from your expertise. As I say, I’m not being a control freak perfectionist and waiting until it is perfect to let it ride. We’re all sharing first drafts here, so keep that perspective in mind as you look at it (ie, don’t judge me harshly by this – this is a unique situation and not indicative of my standard work product.)

Update: As Adam pointed out, the download was boned. I had to zip it up, otherwise my web server was trying to execute the file rather than just serve it out as text. Whoops. Since I’m zipping it up now anyway, I went ahead and included my feeds.txt in there as well.

Update #2: For all of you coming from Adam’s site or anyone that has downloaded this previously, see this post about how to subscribe to the get_enclosures-specific feed to keep up with developments (no pun intended). 0.1 is now available.

Mac::iTunes

Last night I got my own hand-rolled version working, the same kind of script that Adam Curry did with his iPodder AppleScript. Here’s how mine is different: it is in Perl, and uses LWP::Simple, XML::Simple, and Mac::AppleScript to get files, parse them, and then control iTunes. The biggest difference is that I keep a cache file that records which enclosure URLs have been downloaded and when. That way, if you delete a file after downloading and listening to it, it will not be downloaded again. It also uses the LWP::Simple “mirror” function, so even if you delete your cache and rerun everything such that the cache file doesn’t exist but the data file does, it will not be redownloaded unless the timestamp or file size on the server is different from your copy. That solves the thing that bugs me most about iPodder: the fact that iTunes keeps being reset to an exact mirror of the RSS feed. Delete it all, and the next time the script runs it all comes back. I set my script up on cron, so we’ll see if I catch my audioblog post from a few minutes ago. The next run is at 6 PM, so it should automatically show up in my playlist then.
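The cache idea can be sketched in a few lines of Python. The on-disk format here (JSON mapping each URL to its download time) is an assumption, since the script’s actual cache format isn’t described, and the function names are hypothetical.

```python
import json
import time
from pathlib import Path

def load_cache(path: Path) -> dict:
    """Map of enclosure URL -> time it was downloaded (seconds since the epoch)."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))

def save_cache(path: Path, cache: dict) -> None:
    path.write_text(json.dumps(cache, indent=1, sort_keys=True), encoding="utf-8")

def needs_download(url: str, cache: dict) -> bool:
    # The cache, not the presence of the file on disk, decides: a file the
    # user deleted after listening stays deleted.
    return url not in cache

def record_download(url: str, cache: dict) -> None:
    cache[url] = int(time.time())
```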

Last night as I was trying to hack this script together, I was installing a number of Perl modules with CPAN on this iBook. I installed a few myself, and maybe 30 or so that were prerequisites for other modules. One and only one had problems, and that was Mac::iTunes. This module is written by Brian D Foy, who is a fairly big name in the Perl community. He edits Perl writings and has a high profile. I’m surprised that the module he authored and is up on CPAN gaks so completely and totally on this system. A bunch of the tests failed, like 25% of them, and the module wouldn’t have installed except with the “ignore failed tests, force install anyway” option. You kind of expect better from the big names, especially considering that I had been installing modules all night and didn’t have failure one from any of them. As the kids say, “what’s up with that?”

Dream Job for Some Dork

LucasFilm is looking for a software engineer to build internal tools for their organization. They want C or Perl skills and DB integration, so I imagine they are doing some sort of productivity web interface stuff. This will be the one job interview where talking about how you built your own working lightsaber from discarded toasters won’t get you escorted out by security. Enjoy!