I just added msnbot to my robots.txt as a disallowed user agent. This damn thing has been pulling pages from this weblog like crazy all day today. There are thousands and thousands of possible pages once you count all the individual day views, writeback views, and category views, and it's been fetching one every 3-5 seconds. Enough is enough. Even more infuriating is this statement in the MSNBot FAQ:
# How often will MSNBot access a web page from my web server?
MSNBot should not try to access your site more often than once every few seconds. If MSNBot determines that your site has a slow connection, it accesses it less frequently. If you find that MSNBot places too high a load on your web site, please send e-mail to email@example.com.
So they think it's OK for the bot to grab your pages every few seconds. As I understand crawler etiquette, one minute is a reasonable time to wait between page loads from the same server, and any robot I write uses that as its minimum interval between requests to the same web server. This blithe "shouldn't be more than every few seconds" answer is bullshit. And from reading the FAQ, this bot isn't even related to indexing for MSN Search! It's just some research project thing. Way to go, MSN: load the living hell out of web pages for some bogus project that isn't even useful. I've been banning ill-behaved crawlers that load pages willy-nilly, and now MSNBot is on that list.
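For the curious, here is a rough sketch in Python of the two rules in play: honor robots.txt, and space requests to the same server at least a minute apart. The `PoliteGate` class, the `MyBot/1.0` user agent, and the exact robots.txt contents are hypothetical; the parsing uses the standard library's `urllib.robotparser`. It assumes a crawler talking to a single site (so one robots.txt), and keep in mind that robots.txt only stops bots that choose to honor it.

```python
import time
from urllib import robotparser
from urllib.parse import urlsplit

# Hypothetical robots.txt mirroring the change described above:
# msnbot is banned outright; every other crawler may fetch anything.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /

User-agent: *
Disallow:
"""


class PoliteGate:
    """Decide whether (and when) a crawler may fetch a URL.

    Combines two rules: honor robots.txt, and wait at least
    `min_interval` seconds between requests to the same host.
    The 60-second default is the one-minute etiquette argued for
    above, not a value taken from any standard or from MSNBot.
    Assumes a single site, so only one robots.txt is parsed.
    """

    def __init__(self, user_agent, robots_txt, min_interval=60.0, clock=time.monotonic):
        self.user_agent = user_agent
        self.min_interval = min_interval
        self.clock = clock  # injectable so the spacing can be tested without sleeping
        self.robots = robotparser.RobotFileParser()
        self.robots.parse(robots_txt.splitlines())
        self.last_fetch = {}  # host -> clock() value at the last request

    def allowed(self, url):
        """True if robots.txt permits this user agent to fetch the URL."""
        return self.robots.can_fetch(self.user_agent, url)

    def wait_time(self, url):
        """Seconds left before this URL's host may be contacted again (0 if none)."""
        last = self.last_fetch.get(urlsplit(url).netloc.lower())
        if last is None:
            return 0.0
        return max(0.0, self.min_interval - (self.clock() - last))

    def record_fetch(self, url):
        """Note that a request to this URL's host was just made."""
        self.last_fetch[urlsplit(url).netloc.lower()] = self.clock()


# A compliant msnbot is refused everything.
banned = PoliteGate("msnbot", ROBOTS_TXT)
print(banned.allowed("http://example.com/weblog/"))  # False

# Any other (hypothetical) crawler may fetch, but only once a minute per host.
polite = PoliteGate("MyBot/1.0", ROBOTS_TXT)
url = "http://example.com/weblog/"
if polite.allowed(url):
    time.sleep(polite.wait_time(url))
    # ... fetch the page here ...
    polite.record_fetch(url)
```

Injecting the clock keeps the spacing logic testable without real sleeps; an actual crawler would just keep the default `time.monotonic`.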