Ed Felten on Bayesian Spam Filtering

BoingBoing points to a post by Ed Felten on a possible poisoning attack that spammers could execute. In essence, he is saying that by choosing certain words to throw into the word salad that spam frequently contains, spammers can get people to train their filters to treat those words as “spam words.” Says he:

Now suppose a big spammer wanted to poison a particular word, so that messages containing that word would be (mis)classified as spam. The spammer could sprinkle the target word throughout the word salad in his outgoing spam messages. When users classified those messages as spam, the targeted word would develop a negative score in the users’ Bayesian spam filters. Later, messages with the targeted word would likely be mistaken for spam.

This attack could even be carried out against a particular targeted user. By feeding that user a steady diet of spam (or pseudo-spam) containing the target word, a malicious person could build up a highly negative score for that word in the targeted user’s filter.

I have to say that I don’t think this attack is as realistic as Felten does. In the first place, these Bayesian filters don’t work off of single words. You’d need a hella high score on that one word, or even on a combination of a few, for it to drag the whole message down when the rest of the message is legitimate (hint: my friends occasionally use the word “viagra” in email to me and they don’t get spam filtered). Second, even if this attack could be executed successfully, it would be short-lived. Every Bayesian spam filter I’ve seen that lets you train messages as spam also lets you train them as non-spam. After the first message gets erroneously flagged as spam and is retrained, the words in the attack become less effective, or ineffective, at getting an arbitrary message categorized as spam. Third, this does point to the fact that it’s a good idea not to have your spam filtering depend solely on a simple Bayesian word file.
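To make the first two points concrete, here is a minimal sketch of a toy Bayesian word filter. It is not how POPFile or any real filter is implemented; the `ToyBayes` class, the training messages, and the attack word “invoice” are all made up for illustration. It shows a poisoned word picking up a spammy score, a legitimate message containing it still scoring as non-spam, and a single retraining pulling the word's score back down.

```python
import math
from collections import Counter


class ToyBayes:
    """Hypothetical toy filter: per-message word presence, Laplace smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.msgs = {"spam": 0, "ham": 0}

    def train(self, label, text):
        # Count each distinct word once per message.
        self.counts[label].update(set(text.lower().split()))
        self.msgs[label] += 1

    def word_log_odds(self, word):
        # Positive means the word leans spam, negative means it leans ham.
        p_spam = (self.counts["spam"][word] + 1) / (self.msgs["spam"] + 2)
        p_ham = (self.counts["ham"][word] + 1) / (self.msgs["ham"] + 2)
        return math.log(p_spam / p_ham)

    def score(self, text):
        # Sum the evidence from every distinct word in the message.
        return sum(self.word_log_odds(w) for w in set(text.lower().split()))


filt = ToyBayes()
for i in range(20):
    # Only a quarter of the legitimate mail mentions the attack word...
    extra = " invoice" if i < 5 else ""
    filt.train("ham", "meeting notes for friday lunch project" + extra)
    # ...but every attack spam has it sprinkled into the word salad.
    filt.train("spam", "cheap pills offer click now invoice")

legit = "invoice for friday lunch meeting"
poisoned_odds = filt.word_log_odds("invoice")   # the word now leans spam
legit_score = filt.score(legit)                 # the whole message does not
filt.train("ham", legit)                        # one retraining as non-spam
retrained_odds = filt.word_log_odds("invoice")  # the word leans spam less
print(poisoned_odds, legit_score, retrained_odds)
```

In this toy setup the one poisoned word is outvoted by the ordinary words around it, which is the same reason a stray “viagra” from a friend doesn't sink a message.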

I use SpamAssassin and POPFile. The latter is a simple Bayesian classifier, but its verdict gets fed into the former, which bases its decision on many things, including checking online blacklists of hosts and running heuristic tests for porn and Nigerian-scam phrases. As tough and one-shot as the attack would be anyway, it would be that much tougher to get through this system. Even making POPFile classify a message as spam is far from sufficient to get it tagged as spam for me. If the message lacks any other spammy features (it isn’t listed in Razor or Pyzor, it isn’t from a spam host), it probably won’t get tagged.
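A rough sketch of why the layered setup helps: SpamAssassin adds up points from whichever tests fire and tags the message only if the total reaches a threshold (5.0 is its default `required_score`). The rule names and weights below are invented for illustration and are not my actual configuration or SpamAssassin's real rule set.

```python
THRESHOLD = 5.0  # SpamAssassin's default required_score

# Hypothetical rule names and weights, chosen only to illustrate the idea.
WEIGHTS = {
    "bayes_says_spam": 3.0,      # the Bayesian classifier's verdict
    "listed_in_razor": 2.5,
    "listed_in_pyzor": 2.5,
    "sender_on_blacklist": 3.0,
}


def total_score(hits):
    """Add up the weights of every test that fired for a message."""
    return sum(WEIGHTS[name] for name in hits)


def is_spam(hits):
    return total_score(hits) >= THRESHOLD


# A poisoned word fools only the Bayesian test: not enough on its own.
poisoned_legit_mail = {"bayes_says_spam"}
# Real spam usually trips other tests too.
real_spam = {"bayes_says_spam", "listed_in_razor", "sender_on_blacklist"}

print(is_spam(poisoned_legit_mail), is_spam(real_spam))
```

With weights like these, a successful poisoning of the Bayesian test still leaves the message short of the threshold unless something else about it looks spammy.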

This also suggests to me that Bill Yerazunis is on the right track with CRM114. Because it works on Bayesian analysis of combinations of adjacent words, it is less susceptible to things like this as well. It gets closer to filtering by meaning rather than by simple presence or absence of a word. I ran it for a while but found the performance was a little lacking – large messages would take so long that procmail would freak out. I thought the accuracy was great, though.
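As a simplified stand-in for the idea (this is plain adjacent-pair matching, not CRM114's actual feature scheme, and the `adjacent_pairs` helper and sample messages are made up), compare what a word-salad spam and a legitimate message have in common when the features are single words versus pairs of neighboring words:

```python
def adjacent_pairs(text):
    """Return the set of neighboring word pairs in a message."""
    words = text.lower().split()
    return {" ".join(pair) for pair in zip(words, words[1:])}


spam_salad = "cheap pills invoice click now"
legit = "please send the invoice by friday"

# Features a single-word filter would see in both messages.
shared_words = set(spam_salad.split()) & set(legit.split())
# Features a pair-based filter would see in both messages.
shared_pairs = adjacent_pairs(spam_salad) & adjacent_pairs(legit)

print(shared_words, shared_pairs)
```

The attack word shows up in both messages, but never next to the same neighbors, so a filter keyed on pairs has nothing poisoned to match against.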

I’d even be willing to experiment on this with Dr. Felten. I can provide a list of words that frequently occur in my legitimate email. He can set up a fake email address and send me spam of just the kind he describes. In fact, he can take actual spam and just add in the words he is attacking me on. Ideally, this would come from sources similar to those of real spam. He could find some open relays in ORBS and bounce it off those, directed at me. I’d suggest sending me about 50 of these a day, since there is a little lag between my getting spam and its getting trained as spam. I can keep track of the SpamAssassin and POPFile scores on all legitimate mails that contain the attack words, and measure those over time. We can see just how it goes. My prediction is that he could possibly affect the POPFile ratings, but that even trying like a mother he cannot make one of my legitimate mails get scored as spam. Dr. Felten, if you are interested, send me a mail. I’m confident that it won’t get filtered out.

Update: I’ve been thinking about this on and off all day. After cogitating a while, I think that pulling off this attack in the real world is even less likely than I originally thought. That’s because in most Bayesian systems like POPFile, you train on errors. For this attack to be successful against a wide variety of systems, you would have to send spam that contains the attack words, is unambiguously spam so that the user will run it back through the filter training, yet is sufficiently un-spammy that it slips through the filter first. In other words, to successfully attack the Bayesian filters, the attacker must be able to evade them at will. If spammers could do this, the filters wouldn’t be useful in the first place, because they wouldn’t be stopping anything. I think this is an interesting idea, but it just plain doesn’t translate into a real-world attack because of the train-on-errors method most commonly used with Bayesian filters.
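Here is a small sketch of the train-on-errors loop, under heavy simplification: the `classify` stub is a keyword check standing in for a real Bayesian filter, and the word lists and messages are invented. The point it illustrates is that the attack word only gets learned from a message the filter failed to catch.

```python
from collections import Counter

# Stand-in for what the filter already knows; a real filter holds probabilities.
known_spam_words = {"cheap", "pills"}
learned_spam_words = Counter()


def classify(message):
    """Stub classifier: spam if any already-known spam word is present."""
    return "spam" if known_spam_words & set(message.split()) else "ham"


def train(label, message):
    if label == "spam":
        learned_spam_words.update(message.split())


def handle(message, true_label):
    """Train only when the filter's verdict disagrees with the user's."""
    if classify(message) != true_label:
        train(true_label, message)
        return True   # an error occurred, so the message was trained
    return False      # filter was right, nothing is trained


# Ordinary attack spam: caught by the filter, so "invoice" is never trained.
trained_on_caught = handle("cheap pills invoice", "spam")
invoice_after_caught = learned_spam_words["invoice"]

# Only spam that already evades the filter gets to teach it the attack word.
trained_on_slipped = handle("invoice attached regards", "spam")
invoice_after_slipped = learned_spam_words["invoice"]

print(invoice_after_caught, invoice_after_slipped)
```

So the poison only lands when the spammer can already get past the filter, which is exactly the capability the attack was supposed to provide.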