Yesterday afternoon, I started getting a lot of activity from one IP address. Although the user agent identified it as a web browser, it acted like a robot. It requested lots of files that were too disconnected from one another to be a user navigating through the site, and at a speed no person could manage. This really bugged me, so I looked up the address (part of what I do at work is run a server that contains information about IP addresses), and it belonged to a company called “Cyveillance”, based in Arlington, VA. That’s a spooky name, and a spooky town, so I looked into it a little more. They sell some sort of internet data monitoring tool. Now, they are running a robot that doesn’t identify itself as such (presumably to hide the fact that they are doing so). It violates the rules of robot etiquette in two ways: first, by never requesting the robots.txt file to find out what to stay away from (presumably to hide the fact that it is a robot, and because they don’t care what you tell them not to read), and second, by requesting a large number of documents in a very short period. There was a stretch of 5 minutes where it requested 35 documents. This ain’t cool.
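Both giveaways above (a burst of requests in a short window, and never asking for robots.txt) can be checked mechanically. Here is a minimal sketch of one way to do that, using a per-IP sliding window. It assumes the access log has already been parsed into (ip, timestamp, path) tuples sorted by time. The function name find_suspected_robots and the 35-requests-in-5-minutes threshold are illustrative choices, not any standard, and the sample IPs are reserved documentation addresses.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Illustrative thresholds, borrowed from the burst described above.
WINDOW = timedelta(minutes=5)
MAX_REQUESTS = 35


def find_suspected_robots(entries, window=WINDOW, max_requests=MAX_REQUESTS):
    """Return IPs that made at least `max_requests` requests within `window`
    and never fetched /robots.txt.

    `entries` is an iterable of (ip, timestamp, path) tuples sorted by timestamp.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    fetched_robots = set()       # IPs that asked for /robots.txt at least once
    bursty = set()               # IPs that exceeded the rate at some point
    for ip, ts, path in entries:
        if path == "/robots.txt":
            fetched_robots.add(ip)
        times = recent[ip]
        times.append(ts)
        # Drop timestamps older than `window` relative to this request.
        while ts - times[0] > window:
            times.popleft()
        if len(times) >= max_requests:
            bursty.add(ip)
    # Only flag clients that were both fast and never checked robots.txt.
    return bursty - fetched_robots


# Hypothetical log: one client fetches 35 pages in under 5 minutes,
# another fetches 10 pages at a leisurely pace.
start = datetime(2000, 1, 1, 12, 0, 0)
log = [("192.0.2.7", start + timedelta(seconds=8 * i), f"/page{i}.html") for i in range(35)]
log += [("198.51.100.3", start + timedelta(minutes=i), "/index.html") for i in range(10)]
log.sort(key=lambda e: e[1])

print(find_suspected_robots(log))  # {'192.0.2.7'}
```

A deque per IP keeps each check cheap, since old timestamps are discarded as soon as they fall out of the window rather than rescanning the whole log.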
My first inclination was to e-mail them and complain, but my second thought was “fuck them.” So, what I did instead was to add this line to my .htaccess file for both the main directory and my cgi-bin:
Deny from 18.104.22.168/27
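The /27 on that line means the rule covers a block of 32 addresses, not just one. To check which addresses in your own logs fall inside such a block, here is a small sketch using Python’s standard ipaddress module. The helper is_blocked is a hypothetical name. Note that .168 does not sit on a /27 boundary, so ipaddress (with strict=False) rounds the network down to .160. This assumes Apache interprets the range the same way.

```python
import ipaddress

# The network from the Deny line. strict=False lets ipaddress accept an
# address with host bits set and round it down to the network boundary.
BLOCKED = ipaddress.ip_network("18.104.22.168/27", strict=False)


def is_blocked(addr: str) -> bool:
    """Return True if `addr` falls inside the blocked /27."""
    return ipaddress.ip_address(addr) in BLOCKED


print(BLOCKED)                      # 18.104.22.160/27
print(BLOCKED.num_addresses)        # 32
print(is_blocked("18.104.22.170"))  # True
print(is_blocked("18.104.22.200"))  # False
```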
If you don’t want these guys’ poorly behaved robot hammering your servers, you might want to consider doing something similar. I suppose it’s possible that, rather than a robot, it was a pool of different people swarming my site in a coordinated fashion, but probably not, and either way it’s odd and creepy.