Comment Spammers

While we were sleeping after our monumental drive, a comment spammer was going to town on this weblog overnight. I started cleaning up by hand, which takes forever! Then I found a reference to a guy who ported the MovableType blacklist to blosxom. I haven’t hooked up his real-time blacklisting stuff yet, but I did use his cleanup tool to remove the spam comments.

The tool worked pretty well, but there is a downside. It divides each writeback file into individual comments and then searches each comment for a regular expression (i.e., the URL the spammers were using). If the regular expression matches, the comment is dumped; otherwise it is kept. If that leaves a post with 0 comments, the whole writeback file is removed. All good so far.

The problem comes with rewriting the files that still contain legitimate comments. All of them get a current timestamp, so the “recent writebacks” list now shows every comment that has ever been made on this weblog. At the beginning of the year, I switched over to “writebackplus”, which writes the timestamp into the text of the writeback, so for any comment from the last 8 months the date it was written can be determined. I hacked the blog-grep.pl tool so that, if this information exists, it does a touch to set the file’s timestamp to that time. It worked great on all the ones that have that information, but for the ones older than January 2004, there is nothing that can be done. Oh well. I might change it to just timestamp all of those to January 1, 2004. It won’t hurt anything else, and it will keep them from all showing up as recent.
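For anyone curious, here is a rough sketch of the cleanup-plus-touch idea in Python. It is not the actual blog-grep.pl code (that tool is Perl), and several details are my assumptions for illustration only: that comments in a writeback file are separated by a `-----` line, that the timestamp is stored as a `date: YYYY-MM-DD HH:MM:SS` field, and the function name `clean_writeback`. Files with no stored date fall back to January 1, 2004, as described above.

```python
import os
import re
from datetime import datetime

# Assumptions for this sketch (not taken from the real tool):
SEPARATOR = "-----\n"  # assumed delimiter between comments in a writeback file
DATE_FIELD = re.compile(  # hypothetical timestamp field inside a comment
    r"^date:\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*$", re.MULTILINE
)
FALLBACK = datetime(2004, 1, 1)  # used when no comment carries a date


def clean_writeback(path, spam_pattern):
    """Remove comments matching spam_pattern; return how many were kept."""
    spam = re.compile(spam_pattern)
    with open(path, encoding="utf-8") as fh:
        comments = [c for c in fh.read().split(SEPARATOR) if c.strip()]

    kept = [c for c in comments if not spam.search(c)]
    if not kept:
        # Nothing legitimate left: drop the whole writeback file.
        os.remove(path)
        return 0
    if len(kept) == len(comments):
        # No spam found: leave the file and its timestamp untouched.
        return len(kept)

    with open(path, "w", encoding="utf-8") as fh:
        fh.write("".join(c.rstrip("\n") + "\n" + SEPARATOR for c in kept))

    # Rewriting reset the mtime to "now"; set it back to the newest
    # surviving comment's date, or to the fallback if none is recorded.
    stamps = [
        datetime.strptime(m, "%Y-%m-%d %H:%M:%S")
        for c in kept
        for m in DATE_FIELD.findall(c)
    ]
    ts = (max(stamps) if stamps else FALLBACK).timestamp()
    os.utime(path, (ts, ts))
    return len(kept)
```

Calling something like `clean_writeback("some-post.wb", r"spam\.example")` on each writeback file would then strip the spam while keeping old comments out of the “recent writebacks” list. The file name and pattern here are placeholders.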