I continue to get hammered by referer spam for a handful of scumbag URLs. It was so bad that last month I aliased one page to 404 not found just to keep them from sucking up my bandwidth. I’ve tried a more direct solution now. I unaliased that page (if you were dying to read the one-day view of December 7, 2003, it is now back). At one point, I had a script running on my Linux box that would search through logs for IP addresses that either tried and failed to relay spam or tried and failed to ssh in. I would then assemble all that, check it against some exemptions (like my own boxes, from which I might legitimately mess up a password and fail to ssh in), and generate a nice timestamped list of hosts to add to hosts.deny. I even built in the ability for these entries to time out, so that after six months old ones would drop off the list. This way, I had an extra level of protection from boxes that were known (or at least strongly suspected) to be trying to do things to my machine.
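For the curious, here is a rough sketch of how that time-out bookkeeping could work. This is not my actual script, just a minimal illustration under some assumptions: the function names `refresh` and `render`, the dict of host to date-added, and the 183-day cutoff are all made up for the example, and the timestamp is kept on its own comment line above each hosts.deny entry.

```python
from datetime import date, timedelta


def refresh(banned, offenders, exempt, today, max_age_days=183):
    """Return a new {host: date_added} dict.

    banned    -- hypothetical state: host -> date it was first denied
    offenders -- hosts pulled out of the logs on this run
    exempt    -- hosts that must never be denied (e.g. my own boxes)
    """
    cutoff = today - timedelta(days=max_age_days)
    # Entries older than roughly six months drop off the list.
    fresh = {host: added for host, added in banned.items() if added >= cutoff}
    for host in offenders:
        if host not in exempt:
            # Keep the original date if the host is already listed.
            fresh.setdefault(host, today)
    return fresh


def render(banned):
    """Turn the dict into hosts.deny lines, timestamp as a comment line."""
    lines = []
    for host, added in sorted(banned.items()):
        lines.append(f"# added {added.isoformat()}")
        lines.append(f"ALL: {host}")
    return lines
```

One choice worth noting: a repeat offender keeps its original date here, so it still ages out on schedule. Resetting the date on every new offense would be just as easy and arguably stricter.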
I resurrected that script and added the ability to parse Apache HTTP logs. I have a list of banned referral URLs (mostly involving “t**n p*ssy” or “m*lf h*nt*ng” or bullshit like that). If a box tries to get a page from my webserver and has one of those URLs in the referer string, that IP address is added to hosts.deny to prevent it from making any sort of TCP connection to my box. My guess is that this crap is coming from compromised boxes around the world, and someone is taking money from pr0n sites to drive up traffic. They have a distributed set of machines (I don’t know, but if forced to hazard a guess I would say Windows boxes afflicted by a Trojan or virus) from which this traffic is coming. There are too many different addresses for it to be effective to just deny an address or two by hand. That’s why I have automated the task of denying them. A few months ago, when I got one of these, I would first add the offending URL to the “never show this address” list in my blosxom referer plugin and then add that IP address to the list of ones to deny in my .htaccess file. I can see that some of the ones I have previously forbidden still come back, which suggests that this pool remains fairly consistent, although it grows over time.
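The log-parsing half might look something like the sketch below. Again, this is an illustration rather than the real thing: it assumes Apache's combined log format (the one that records the referer), and `offending_hosts` and the list of banned fragments are names invented for the example.

```python
import re

# Combined log format: host ident user [time] "request" status bytes "referer" ...
# Quoted fields may contain backslash-escaped characters.
LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:[^"\\]|\\.)*" \d{3} \S+ "((?:[^"\\]|\\.)*)"'
)


def offending_hosts(log_lines, banned_fragments):
    """Return the set of client addresses whose referer contains a banned string."""
    banned = [fragment.lower() for fragment in banned_fragments]
    hosts = set()
    for line in log_lines:
        match = LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        host, referer = match.group(1), match.group(2).lower()
        if any(fragment in referer for fragment in banned):
            hosts.add(host)
    return hosts
```

The addresses that come back from this would then go through the same exemption check as everything else before landing in hosts.deny.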
Whatever the origin, this crap is highly annoying. Now, shortly before my access_log is rolled over for the day, a script on a cron job will parse out new attempts to spam forbidden URLs and add the offending addresses to hosts.deny. My theory is that in a few days, I should see very little or no traffic like this on my box. Only addresses that have recently been added to the pool will get through. At least, I hope so.