Grey Spam

Posted on February 16, 2007
Filed Under weblogs | 7 Comments

The other day I linked to this Akismet criticism. If you follow the comment thread, it seems to me that the poster’s original case gets weaker and weaker as the argument continues. I know of three false positives among the 37,000+ comments Akismet has flagged for me. That’s better than I do by hand: when you sift through a few hundred or a few thousand of these, it becomes more and more likely that you miss some. Comment spam trying to get through moderation works the same way ack-ack did in World War I air battles. Throw enough of it up there and you are bound to hit something.

There’s another phenomenon I’m experiencing, and I had to deal with a few of these this morning: comments where I honestly cannot tell whether they are spam or not. They tend to fit a pattern. They come from Europe, their website link points to a commercial site, but they contain a few sentences or paragraphs that actually apply to the post they reply to. My guess is that the senders write these comments, search Google or Technorati for blogs that match, and submit to all of them.

In general, these never get approved. The judgment call that comes back to me is whether I simply delete them or mark them as spam so that the Akismet engine learns them as such. I’m not always sure what to do, because in some cases I’m not 100% sure they are spam. Ultimately my heuristic comes down to this: the benefit of the doubt is gone with comments. If I can’t tell it is legitimate, it doesn’t go on the site. If in doubt, it is not approved. If I can’t tell it is spam, I don’t report it back to Akismet. In those cases I simply delete it.
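The heuristic above boils down to a three-way decision. Here is a minimal Python sketch, assuming the moderator’s own judgment of each comment is recorded as one of three verdicts. `Verdict`, `Action` and `moderate` are hypothetical names for illustration, not part of Akismet or WordPress.

```python
from enum import Enum


class Verdict(Enum):
    # The moderator's own reading of a comment (hypothetical, not an Akismet result).
    LEGITIMATE = "legitimate"
    SPAM = "spam"
    UNSURE = "unsure"


class Action(Enum):
    APPROVE = "approve"          # publish the comment
    REPORT_SPAM = "report_spam"  # mark as spam so Akismet learns from it
    DELETE = "delete"            # discard quietly, without training Akismet


def moderate(verdict: Verdict) -> Action:
    """Map a moderator's verdict to an action under the 'no benefit of the doubt' rule."""
    if verdict is Verdict.LEGITIMATE:
        return Action.APPROVE
    if verdict is Verdict.SPAM:
        return Action.REPORT_SPAM
    # Grey spam: not clearly legitimate, so it stays off the site;
    # not clearly spam, so it is not reported back to Akismet either.
    return Action.DELETE
```

The point of the rule is that the grey zone maps to deletion rather than to a spam report, so uncertain calls never end up training the shared filter.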

Once again, let me stress that when you post comments on blogs, it is up to you to be distinguishable from spam. Be distinctive; have a human voice. If it looks a lot like the hundred spams around it in the moderation queue, it is going into the bit bucket.

Comments

There is a posted comment policy for this blog. Please respect the rules.

7 Responses to “Grey Spam”

  1. Andy Beard on February 16th, 2007 9:48 pm

    First up, I think you might need to read the post, because it was actually more complimentary to Akismet than you imply. It was mainly intended to make people aware of how they can help, in the same way as you might warn users to unsubscribe from an email list rather than flag it as spam.

    Secondly, I don’t think you read the comments very carefully. Do you really think Mark Jaquith would write a plugin to help people retrieve their comments if this wasn’t a problem? He is a very experienced WordPress developer and probably the most active of those not directly employed by Automattic.

    You might also appreciate this post on the Blog Herald, who I am sure get far more spam than you do, but who also see this as a problem.

    http://www.blogherald.com/2007/02/14/false-positives-and-better-akismet-spam-management/

  2. dave on February 16th, 2007 11:30 pm

    Andy, having now gotten two of your comments I think I understand why some of them don’t show up on the intended blogs. It might be your lovely personality. The temptation to send you to the bit bucket certainly was there for me.

    I repeat my main point. If I go through 37,000 moderated comments that are 99.97% spam, I will make mistakes and delete legitimate comments. I deleted at least two when I was doing it by hand. As much as I love the comments (which are my favorite part of this, even with you bringing down the average), I would never ask someone to do this shit by hand in order to make absolutely sure mine went through. If I get caught, I’ll consider it taking one for the team.

  3. PJ Cabrera on February 17th, 2007 3:29 pm

    Dave,

    You didn’t heed rule #1 of online fora: Don’t feed the trolls. Don’t whine if they come back for more feedings. LOL

    PS – I humbly accept the above statement may certainly be applied to my own comments on occasion. :-)

  4. dave on February 17th, 2007 4:16 pm

    PJ, you might be right. If that happens, you are off the hook and I’m on it.

    When I get really argumentative comments, it’s not uncommon for me to let them sit in moderation for a day or more. Given the right mood, I could easily have done that with Andy’s. It seemed reasonable to point that out, since his whole thing is about the moderation of comments.

  5. Ken Nelson on February 17th, 2007 4:38 pm

    Having personally been involved in false positives (I alone, I think, account for at least 50% of your false positives), I can still say Akismet rocks. And I’ve retrieved comments from James and PJ from the sludge filter at my place. I don’t know how they ended up there, but I check Akismet twice a day on my little site, and usually wind up with the “delete all” option.

    Looking for good stuff amongst the probably bad (with Akismet) is far easier than looking for good stuff amongst the unknown (without Akismet).

    I hope I’m at least maintaining the average comment value-add here at EGC.

    Rock on.

    -k-

  6. Brendan on February 18th, 2007 6:15 pm

    Akismet isn’t perfect.

    Now that we have that out of the way: pointing out that it’s not perfect isn’t exactly breaking news.

    Like any heuristics-based system, it needs to be taught, and it will occasionally get things wrong.

    It desperately needs a better ‘bulk’ management option, I think most people would agree on that – but it’s better than nothing and it’s certainly no worse than moderating pretty much ‘everything’ and having to manually approve most comments as a result.

    Yes, Akismet makes mistakes, even though the failure rate can be as low as 0.5%. The trick is, rather than throwing one’s hands in the air and declaring it “crap”, one could at least ‘teach’ Akismet by un-spamming valid comments.

    That is far more effective, and it improves the spam engine’s spam-seeking abilities for all users.

    It’s easy to become paranoid about how “others” can screw up a collective-based spam engine, but the reality is that the percentage of people abusing the system is going to be as low as the failure rate.

  7. Andy Beard on February 18th, 2007 7:05 pm

    Dave, by that comment I think you are proving my point that some people, as I suggested, may deliberately leave comments in the sin bin because they don’t agree with what was said, or flag them as spam.
    Such inaction or action adds to the collective intelligence.

    If you link through to my blog and join the conversation, you should expect me to respond. In the comments of the first post you mentioned criticism, and in this post you actually used the link text “akismet criticism”. I don’t think the post was overly critical; it pointed out how users could improve Akismet by their own actions.

    There are other interface features that would be useful, and making it less binary is often suggested. Spam Karma is less binary, and can even use Akismet with another plugin as an additional variable in its calculations should I choose.
