MANDATE MEDIA - digital strategy for people changing the world

Toward better spam filters for blog comments

One of the challenges of running a blog is the proliferation of blog comment spam. Huh? Yeah, believe it or not, the folks who brought you email spam are now dumping info about off-shore gambling and off-shore pharmaceuticals into the comments of your favorite blogs.

Right now, responses seem to be limited to banning particular IP addresses and banning specific words. IP blocks are tough - mostly because the bad guys are able to spoof their way through thousands of different IPs (and because sometimes you'll catch innocent folks on legitimate dial-up IPs). Banning words is worse - it's easy for comment spammers to switch from, say, cialis to cia|is - not to mention the collateral damage of banning good words with bad words hiding inside them - like socialism and specialist.
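To make the collateral-damage problem concrete, here is a minimal Python sketch comparing a plain substring ban with a whole-word ban. The `BANNED` list and both function names are made up for illustration; this is not how any particular blog package does it.

```python
import re

# Hypothetical ban list, for illustration only.
BANNED = ["cialis"]

def naive_blocked(comment: str) -> bool:
    """Substring match: flags any comment that contains a banned string."""
    text = comment.lower()
    return any(word in text for word in BANNED)

def word_blocked(comment: str) -> bool:
    """Whole-word match: flags a banned term only when it stands alone."""
    text = comment.lower()
    return any(
        re.search(r"\b" + re.escape(word) + r"\b", text) is not None
        for word in BANNED
    )

innocent = "Ask a specialist about socialism."
evasive = "Buy cia|is now"

print(naive_blocked(innocent))  # substring ban catches the innocent comment
print(word_blocked(innocent))   # whole-word ban lets it through
print(word_blocked(evasive))    # neither approach catches the respelling
```

Matching on word boundaries spares "socialism" and "specialist", but it does nothing about the cia|is trick, which is why word lists alone won't cut it.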

The cialis/socialism problem was first noticed a couple of weeks ago by Jeff Jarvis at the BuzzMachine, who wrote:

I had to put "cialis" in my comment-spam filter to stay ahead of the swine. But, of course, this is stopping people from putting up legitimate words. I should fix that. But I'm kind of enjoying the discovery. First, they couldn't say "socialism" and thought I was trying to turn that into a dirty word. Now it's "specialist." Can we ask the makers of performance-enhancing drugs to please come up with names whose order of letters does not appear elsewhere in the English language?

That led to some thinking on my part about better comment spam filters. Here's my comment over at the BuzzMachine:

What I can't figure out is why the blog software providers haven't figured out some pretty obvious tools for this stuff.

We're not limited to simple word filters, registration schemes - or leaving the door wide open.

For example, a 'click to confirm' mechanism could be easily built. When a comment is submitted, an email would be sent to the commenter. When the emailed link is clicked, the comment goes up. To make it less annoying, you might then allow future comments from the same IP/email pair to go up automatically without clicking. The blog owner could also ban particular email addresses. (This wouldn't stop comment spam altogether, but it would make things harder on automated systems - and more costly, in time, to spam your blog. They'd find another victim.)
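A rough sketch of that click-to-confirm flow, in Python. Everything here is an assumption for illustration: the `ConfirmQueue` class, its method names, and the in-memory storage are hypothetical, and actually sending the email is left out (the token returned by `submit` is what would go into the emailed link).

```python
import secrets

class ConfirmQueue:
    """Hypothetical in-memory click-to-confirm queue for comments."""

    def __init__(self):
        self.pending = {}     # token -> (ip, email, text) awaiting a click
        self.trusted = set()  # (ip, email) pairs that have confirmed before
        self.banned = set()   # email addresses banned by the blog owner
        self.published = []   # comments visible on the blog

    def submit(self, ip, email, text):
        """Return a token to email to the commenter, or None if none is needed."""
        if email in self.banned:
            return None  # drop the comment silently
        if (ip, email) in self.trusted:
            self.published.append(text)  # known pair: publish without a click
            return None
        token = secrets.token_urlsafe(16)
        self.pending[token] = (ip, email, text)
        return token

    def confirm(self, token):
        """Handle a click on the emailed link; return True if a comment went up."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return False  # unknown or already-used token
        ip, email, text = entry
        self.trusted.add((ip, email))
        self.published.append(text)
        return True

queue = ConfirmQueue()
token = queue.submit("203.0.113.7", "reader@example.com", "Nice post!")
queue.confirm(token)  # the reader clicks the emailed link
queue.submit("203.0.113.7", "reader@example.com", "One more thing...")
print(queue.published)
```

A real version would need persistent storage and expiring tokens, but the basic bookkeeping is about this small.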

Another option: Use the very power of blogs - their audience - against the comment spammers. Why not a "report this comment as spam" link? If it gets X clicks from audience members, the comment would get pulled and put in an approval queue for the moderator. You would, of course, run the risk of audience censorship of unpopular non-spam comments - but that's why you set that threshold at an appropriate point. (Different for every blog, depending on audience size.)
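The reporting idea is mostly bookkeeping, as this Python sketch suggests. The `ReportTracker` class and its names are hypothetical, and counting each reader only once per comment is my own assumption (the idea as stated just says "X clicks").

```python
class ReportTracker:
    """Hypothetical tracker for 'report this comment as spam' clicks."""

    def __init__(self, threshold):
        self.threshold = threshold   # the "X" that suits this blog's audience
        self.reporters = {}          # comment_id -> set of reader ids
        self.moderation_queue = []   # comments pulled for the moderator

    def report(self, comment_id, reader_id):
        """Record one report; return True if the comment is pulled for review."""
        voters = self.reporters.setdefault(comment_id, set())
        voters.add(reader_id)  # a set ignores repeat clicks by the same reader
        if len(voters) >= self.threshold and comment_id not in self.moderation_queue:
            self.moderation_queue.append(comment_id)
        return comment_id in self.moderation_queue

tracker = ReportTracker(threshold=3)
for reader in ["ann", "bob", "bob", "cat"]:
    pulled = tracker.report("comment-42", reader)
print(pulled, tracker.moderation_queue)
```

Counting distinct readers rather than raw clicks makes it a little harder for one annoyed visitor to bury a comment they simply dislike.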

Finally, we could set up Bayesian filters - just like the ones people are using for email these days - to screen for spam. The system would easily and automagically distinguish the words that appear in legitimate comments from those that appear in comment-spam. It'd require some training, but my Bayesian email spam filters work fabulously now.
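For a sense of what a Bayesian comment filter involves, here is a toy naive Bayes classifier in Python. It is a sketch under simplifying assumptions (a crude tokenizer, add-one smoothing, a four-comment training set invented for the example), not the algorithm any particular email filter uses; the class and method names are hypothetical.

```python
import math
import re
from collections import Counter

class NaiveBayesFilter:
    """Toy naive Bayes spam filter with add-one smoothing."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}  # word counts per label
        self.docs = {"spam": 0, "ham": 0}                   # comments seen per label

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        self.words[label].update(self.tokens(text))
        self.docs[label] += 1

    def _log_score(self, toks, label):
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total = sum(self.words[label].values())
        # Smoothed prior for the label, then one smoothed likelihood per word.
        # The extra +1 in the denominator reserves a slot for unseen words.
        score = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
        for tok in toks:
            score += math.log((self.words[label][tok] + 1) / (total + vocab + 1))
        return score

    def is_spam(self, text):
        toks = self.tokens(text)
        return self._log_score(toks, "spam") > self._log_score(toks, "ham")

f = NaiveBayesFilter()
f.train("cheap pills online casino", "spam")
f.train("online casino bonus pills", "spam")
f.train("great post thanks for sharing", "ham")
f.train("i disagree with this post about socialism", "ham")
print(f.is_spam("casino pills bonus"))
print(f.is_spam("thanks for the great post"))
```

The training step is the "some training" part: every comment the blog owner marks as spam or legitimate nudges the word counts, so the filter adapts to respellings like cia|is as they show up.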

What are the blogging software guys waiting for?

So... my friends at Movable Type, WordPress, HaloScan, and Blogger... what gives? Let's get on it - who wants to be the new blogging industry champ?

Posted on August 4, 2005 in blogs
