Robot Abuse

What the hell happened to netiquette in the creation of web crawling robots? I seem to remember 30 seconds being the absolute minimum interval between requests to the same domain for a well-behaved robot, and several minutes being even better. Jesus, the crawler I wrote for a project in grad school in 1996 had a mechanism to prevent hammering the same domain. You get this for free in Perl with the LWP::RobotUA class, a subclass of LWP::UserAgent that waits between requests to the same server and obeys robots.txt. At this stage of the game, no one has any excuse for writing a poorly behaved robot that requests pages from the same site every second, or even several times per second.
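For anyone who wants that per-domain delay without Perl, here is a minimal sketch in Python of the kind of mechanism described above. It remembers when each host was last hit and sleeps until a minimum interval (assumed to be 30 seconds here) has passed. `PoliteThrottle` and its parameters are hypothetical names made up for illustration, not any library's API. The injectable `clock` and `sleep` arguments exist only so the thing can be tested without actually waiting.

```python
import time
from urllib.parse import urlsplit


class PoliteThrottle:
    """Hypothetical per-host throttle: enforce a minimum gap between
    requests to the same host. The 30-second default is an assumption
    taken from the rule of thumb above; tune it to taste."""

    def __init__(self, min_interval=30.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time of the last request to it

    @staticmethod
    def _host(url):
        # urlsplit lowercases the hostname, so Example.com == example.com
        return urlsplit(url).hostname or ""

    def wait_time(self, url):
        """Seconds left before this URL's host may be requested again."""
        last = self._last_request.get(self._host(url))
        if last is None:
            return 0.0  # never contacted this host, no need to wait
        return max(0.0, self.min_interval - (self._clock() - last))

    def wait(self, url):
        """Block until it is polite to hit this URL's host, then record the hit."""
        delay = self.wait_time(url)
        if delay > 0:
            self._sleep(delay)
        self._last_request[self._host(url)] = self._clock()
```

In a real crawler you would call `throttle.wait(url)` right before each fetch. Python's standard `urllib.robotparser.RobotFileParser` can also tell you whether a URL is allowed and whether the site asks for a `Crawl-delay`, which you could feed in as `min_interval`.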

And yes, I'm talking to you dumbasses at the "Internet Categorizer". Fix your goddamn robot, please. I've already banned your IP address. This is not a good way to make friends, and I see you've already pissed off other people besides me. As a general statement, if you are releasing a product or project based on crawling the web, it is incumbent on you to use the resources of others wisely. Wouldn't you rather your public launch be about your thing than about how shitty your robot coding skillz are?