I’ve recently been playing with my email system in an attempt to reduce spam.
Unfortunately, I’m inundated with spam. I have had the same email address for over 8 years, and in the early days I posted it about the place without much care. In particular, I was a regular poster to Usenet, especially in the C++ and Mac newsgroups. As a result I get lots and lots of spam. It’s pretty annoying.
My work email address also appears in a lot of Apache Ant documentation and source code, so I get quite a few virus emails too. Since these viruses spoof the sender’s address, I sometimes even get virus email from myself.
I have gone so far as to set up a SpamCop account. It is easy to use, has a nice webmail interface and is reasonably effective. It is not, however, effective enough for me, even with every available blacklist selected.
While I am happy to see spammers prosecuted, a technical solution is ultimately going to serve me better than after-the-fact legal action (as if I could afford the time or money for that anyway). After reading Paul Graham’s article on spam, I decided to update my email system, and I chose Bogofilter as an easy-to-manage Bayesian filter.
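For anyone wondering what a Bayesian filter actually computes, here is a minimal Python sketch of the scoring scheme Paul Graham published in “A Plan for Spam”: each token gets a spam probability from how often it appeared in spam versus legitimate mail, and the most decisive tokens are combined into one score. It is illustrative only. Bogofilter itself uses a later refinement of the idea (Gary Robinson’s method), and the function names below are hypothetical.

```python
from math import prod

# Probability Graham assigns to tokens with too little history to trust.
UNKNOWN_TOKEN_PROB = 0.4


def token_spam_probability(spam_count, ham_count, n_spam, n_ham):
    """Per-token spam probability in the style of Graham's scheme.

    spam_count/ham_count: occurrences of the token in spam/ham.
    n_spam/n_ham: number of spam/ham messages trained on.
    """
    good = 2 * ham_count  # doubling ham counts biases against false positives
    bad = spam_count
    if good + bad < 5 or n_spam == 0 or n_ham == 0:
        return UNKNOWN_TOKEN_PROB  # rarely seen token, or no training data yet
    b = min(1.0, bad / n_spam)
    g = min(1.0, good / n_ham)
    # Clamp so that no single token can decide the verdict on its own.
    return max(0.01, min(0.99, b / (g + b)))


def combined_spam_probability(token_probs, most_interesting=15):
    """Combine the tokens farthest from 0.5 using
    P = prod(p) / (prod(p) + prod(1 - p))."""
    chosen = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)
    chosen = chosen[:most_interesting]
    if not chosen:
        return 0.5  # no evidence either way
    spamminess = prod(chosen)
    hamminess = prod(1 - p for p in chosen)
    return spamminess / (spamminess + hamminess)
```

Graham treats a message as spam when the combined probability exceeds 0.9; keeping only the 15 most “interesting” tokens stops long messages full of neutral words from diluting the score.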
Until now I have always popped my email directly from the POP server into Mozilla Mail. To get Bogofilter into the chain, that had to change. I installed fetchmail to pop the mail and pass it on to sendmail/procmail, and I configured procmail to invoke Bogofilter and separate out the spam.
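The filing step that procmail performs here can be sketched in Python. This illustrates the logic only and is not the procmail recipe used in this setup; the `inbox` and `spam` mbox files and the functions `bogofilter_verdict` and `deliver` are hypothetical. It relies on bogofilter’s documented exit codes (0 for spam, 1 for non-spam, 2 for unsure) and its `-u` flag, which registers each message with the word lists as it is classified.

```python
import mailbox
import os
import subprocess

# bogofilter exit codes: 0 = spam, 1 = non-spam, 2 = unsure (3 = error).
SPAM, HAM, UNSURE = 0, 1, 2


def bogofilter_verdict(raw_message: bytes) -> int:
    """Pipe a raw message into bogofilter and return its verdict.

    Assumes bogofilter is on PATH. With -u it also adds the message to
    its word lists according to the verdict it reached.
    """
    result = subprocess.run(["bogofilter", "-u"], input=raw_message)
    if result.returncode not in (SPAM, HAM, UNSURE):
        raise RuntimeError(f"bogofilter failed with exit code {result.returncode}")
    return result.returncode


def deliver(raw_message: bytes, classify, mail_dir: str) -> str:
    """Append the message to mail_dir/spam or mail_dir/inbox (mbox files).

    Unsure messages go to the inbox so that nothing legitimate is hidden.
    Returns the name of the folder the message was filed into.
    """
    folder = "spam" if classify(raw_message) == SPAM else "inbox"
    box = mailbox.mbox(os.path.join(mail_dir, folder))
    box.lock()  # guard against a mail client reading the file mid-write
    try:
        box.add(raw_message)
        box.flush()
    finally:
        box.unlock()
        box.close()
    return folder
```

In real use this would be called as `deliver(raw, bogofilter_verdict, some_mail_dir)`; passing the classifier in as a parameter keeps the filing logic testable without bogofilter installed.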
Mozilla Mail does not support local mail. Actually, it does support it in a half-arsed sort of fashion, but I didn’t want to risk that, so I have switched to KDE’s KMail. I have to say I am wrapped with KMail. It has the easiest filter configuration I have ever used, especially if you are on lots of mailing lists. It also handles multiple identities, and you can keep your mail fresh by setting expiry conditions on folders.
So far it is working well. I have turned off the SpamCop filters to give it a real stress test, and I think it is catching more spam than SpamCop did, so that’s good. I have set up two folders: one for false positives (none yet) and one for false negatives (a few). Bogofilter is set up to update its word lists as it processes each email, and I occasionally tell it to reverse its thinking on the false negatives.
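The update-then-correct workflow can be pictured with a toy model. With `-u`, each message is registered under whatever verdict Bogofilter reached, so a false negative ends up counted as ham. Correcting it means removing it from the ham counts and adding it to the spam counts; with bogofilter itself that is done by combining `-N` (unregister as non-spam) with `-s` (register as spam). The `WordLists` class and `tokenize` function below are hypothetical and only model that bookkeeping; they are not how Bogofilter stores its word lists.

```python
import re
from collections import Counter


def tokenize(text: str) -> set:
    """Lower-cased word tokens; each token counts once per message."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))


class WordLists:
    """Toy spam/ham token counts (not Bogofilter's database format)."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0  # messages registered as spam
        self.n_ham = 0   # messages registered as ham

    def register(self, text: str, is_spam: bool) -> None:
        """Add a message's tokens to the spam or ham counts."""
        counts = self.spam if is_spam else self.ham
        counts.update(tokenize(text))
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def unregister(self, text: str, is_spam: bool) -> None:
        """Undo an earlier register() of the same message."""
        counts = self.spam if is_spam else self.ham
        for token in tokenize(text):
            counts[token] -= 1
            if counts[token] <= 0:
                del counts[token]  # drop tokens that no longer have any count
        if is_spam:
            self.n_spam = max(0, self.n_spam - 1)
        else:
            self.n_ham = max(0, self.n_ham - 1)

    def reclassify(self, text: str, was_spam: bool) -> None:
        """Move a misfiled message to the opposite list."""
        self.unregister(text, was_spam)
        self.register(text, not was_spam)
```

A false negative that was auto-registered as ham is fixed with `reclassify(message_text, was_spam=False)`, which leaves the message counted exactly once, as spam.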
I have run into some problems with fetchmail because I pop email from both work and home accounts. I have developed a JavaMail-based alternative, which I will blog about a bit later (it’s getting late …)
Interesting. I have been using SpamAssassin for a while. Could you describe your setup a little more? How did you create your list of spam words in the beginning, and how do you keep it updated?
If all you wanted was a Bayesian filter, you could have upgraded to Mozilla 1.3; it is one of its major new features.
I’m using ifile myself (yet another Bayesian filter), as it fits into my MUA (XEmacs + Gnus) quite well. So far it catches almost all of the spam I receive, but I get quite a few false positives. HTML mail and messages with attachments have a good chance of being sorted out as spam (attachments because of the insane number of viruses I receive).