Spam blocking

I’ve recently been playing with my email system in an attempt to reduce spam.

Unfortunately I’m inundated with spam. I have had the same email address for over 8 years now, and in the early days I posted it about the place without much care. I was a regular poster to Usenet, particularly the C++ and Mac newsgroups. As a result I get lots and lots of spam. It’s pretty annoying.

My work email address also appears in a lot of Apache Ant documentation and source code, so I get quite a few virus emails too. In fact, since these viruses spoof the sending email address I sometimes get virus email from myself.

I have gone so far as to set up a Spamcop account. It has been easy to use, has a nice webmail interface and is reasonably effective. It is not, however, effective enough for me, even with every available blacklist selected.

While I am happy to see spammers prosecuted, ultimately a technical solution is going to be a better answer for me than after-the-fact legal action (as if I could afford the time or money for that anyway). After reading Paul Graham’s article on spam, I decided to update my email system. I selected Bogofilter as an easy-to-manage Bayesian filter.

Up to now I have always popped my email directly from the POP server into Mozilla Mail. To get Bogofilter into the chain, that needed to change. I installed fetchmail to pop the mail and hand it to sendmail/procmail. I configured procmail to invoke Bogofilter and separate out the spam.
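
The procmail step amounts to piping each message to Bogofilter and filing it according to the exit status. Here is a minimal sketch of that idea in Python, not the actual procmail recipe. It assumes Bogofilter's documented exit codes (0 for spam, 1 for non-spam, 2 for unsure); the `classify` helper and the folder names are made up for illustration.

```python
import subprocess

# Bogofilter's documented exit codes: 0 = spam, 1 = non-spam, 2 = unsure.
# The folder names are hypothetical.
FOLDERS = {0: "spam", 1: "inbox", 2: "unsure"}


def classify(message: bytes, command=("bogofilter", "-u")) -> str:
    """Pipe one message to the filter and return the folder to file it in.

    `-u` asks Bogofilter to update its word lists with its own verdict.
    Any other exit status (for example 3, an error) falls back to the
    inbox so that mail is never lost because the filter failed.
    """
    result = subprocess.run(command, input=message, check=False)
    return FOLDERS.get(result.returncode, "inbox")
```

Falling back to the inbox on an unexpected status is a design choice in this sketch: a broken filter should cost a little extra spam rather than real mail.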

Mozilla Mail does not support local mail. Actually it does support it in some sort of half-arsed fashion, but I didn’t want to risk that, so I have switched to KDE’s KMail. I have to say I am rapt with KMail. It has the easiest filter config setup I have ever used, especially when you use lots of mailing lists. You can keep your email system fresh by setting expiry conditions on folders, and it also supports multiple identities, etc.

So far it is working well to reduce spam. I have turned off the Spamcop filters to give it a proper stress test. I think it is catching more spam than Spamcop did, so that’s good. I have set up two folders: one for false positives (none yet) and the other for false negatives (a few). Bogofilter is set up to update its word lists when it processes each email. I then occasionally tell Bogofilter to reverse its thinking on the false negatives.
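
To show what "reversing its thinking" means for the word lists, here is a toy sketch in Python. It is not Bogofilter's actual algorithm or storage format; the `WordLists` class and its scoring are invented for illustration. With Bogofilter itself, as far as I can tell from the man page, the equivalent correction for a false negative is the `-Ns` option (unregister as non-spam, register as spam).

```python
from collections import Counter


class WordLists:
    """Toy stand-in for a Bayesian filter's spam and non-spam word lists."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}

    def register(self, kind, text):
        self.counts[kind].update(text.lower().split())

    def unregister(self, kind, text):
        self.counts[kind].subtract(text.lower().split())
        self.counts[kind] += Counter()  # drop zero and negative counts

    def reverse(self, text, wrong="ham", right="spam"):
        """Undo a wrong verdict: remove from one list, add to the other."""
        self.unregister(wrong, text)
        self.register(right, text)

    def spamminess(self, word):
        """Fraction of a word's sightings that were in spam (0.5 if unseen)."""
        spam, ham = self.counts["spam"][word], self.counts["ham"][word]
        return spam / (spam + ham) if spam + ham else 0.5


lists = WordLists()
lists.register("ham", "cheap pills")   # auto-update filed a spam as non-spam
print(lists.spamminess("pills"))       # 0.0
lists.reverse("cheap pills")           # correct the false negative
print(lists.spamminess("pills"))       # 1.0
```

The point is that a correction has to do two things: take the words out of the list they were wrongly added to, and add them to the right one. Only registering the message as spam would leave the bad non-spam counts in place.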

I have run into some problems with fetchmail and the fact that I pop email from both work and home accounts. I have developed a JavaMail-based alternative, about which I will blog a bit later (it’s getting late …).

Gauntlet Programming

What is Gauntlet programming? It comes about like this:

Brendan: We really need an alternative to this graphing package

Me: Sure – can’t be that hard, can it?

short silence

Brendan: Well I think that is a challenge

This is the gauntlet being thrown down. I accept the challenge, of course. In fact I like a challenge: to me it represents a highly motivating opportunity to learn more. For me, learning with contrived examples and play problems is much less satisfying.

I have learned heaps from this particular gauntlet and it was fun too. Now onto the next challenge. I believe you need to keep learning. It might even be time to learn a new language. Any suggestions?

Of course, I should mention that my family motto is “Vincere Aut Mori” 🙂

Is Ant2 Required?

Oliver asks why Ant2 is not required. I know Oliver well (we used to sit in adjacent cubicles) and he likes to ask these curly questions, especially while keeping a straight face. Undaunted, I will attempt an answer here.

To really answer this question we need to agree on what Ant2 means, or meant. If we are referring to the feature set described on the Ant site, then many of those things have already come to pass. Data types, multithreading, conditionals, filtersets, the javac facade, etc. are all in Ant now. An SSH task has just been added, and Ant 1.6 will support XML files with namespaces (there is some way to go on what interpretation Ant might give to a namespace). Over time more will also be coming along. In many ways, therefore, Ant2 is coming along, especially from a user’s point of view; it’s just still called Ant 1.x.

From an Ant developer’s point of view, however, the internals of Ant, what we refer to as the core, are not drastically different, although not static by any means. Many of the issues we have known about, such as poor core/task separation, poor encapsulation, the classloader hierarchy and package collisions are still there.

The question is how to solve these problems. It is easy to start from a clean sheet of paper and, with the benefit of hindsight, design a new Ant core which can be quite compatible with Ant 1.x. Well, maybe easy is a bit of an understatement, but it can be done and I have in fact done that once. Adoption of such a revolution is quite hard to achieve. It is just too much of a change for people to feel comfortable with. Pushed too hard, it could be a community breaker.

So, my conclusion is that evolutionary change is probably the way to go. Joel Spolsky’s article is pretty relevant here. Changes need to be achieved in incremental fashion, sometimes through the use of what I would call micro revolutions. These might even entail micro breakages in compatibility.

So, let me change things a little. Let me define Ant2 as the first Ant release which requires JDK 1.2+ for the core function. So, if we were to take that decision now, Ant 1.6 would be called Ant 2.0.

Is Ant2 required? No, not yet.

Ant coverage with Clover

For the last week or two, I’ve helped out on the Clover project at Cortex. We’ve added a few Ant tasks to make it easier to integrate Clover into build files. Clover 1.1 has now been released so you can check out the features yourself.

One of the things I did while testing Clover was to generate historical coverage reports for Ant. I’ve uploaded the results for the Ant core and also for the complete Ant build.

The total coverage is always a lot lower than the core, primarily because the optional tasks do not have many tests and, for those that do, I generally don’t have the supporting libraries to be able to run the tests. 53% in the core is not that bad, although I want to use Clover to push that higher for Ant 1.6.

I generated a coverage report for the first day of every month over the last two years. It’s quite interesting to look at the coverage improving over time and also the growth in the Ant codebase over time. I’m not sure if the dip at Jan 2002 is a problem in my setup or a real decline. It may be that some tests failed in this period. Failures used to halt the tests which would significantly affect the coverage.
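
The sampling dates themselves are easy to produce. Here is a small Python sketch of just that part, not necessarily how it was actually done; the `month_starts` helper and the start date in the usage example are placeholders.

```python
from datetime import date


def month_starts(first: date, months: int) -> list[date]:
    """Return the first day of `months` consecutive months, beginning with the month of `first`."""
    starts = []
    year, month = first.year, first.month
    for _ in range(months):
        starts.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return starts


# Two years of monthly samples from a placeholder start date.
for day in month_starts(date(2001, 1, 1), 24):
    print(day.isoformat())
```

Each date would then drive a checkout of the source as of that day, a build and a coverage run. Those steps depend on the local setup and are left out here.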

In a future entry I’ll describe how I went about generating the coverage information over time.

I have to say that I love the graphs, mostly because I had to write that code using a process known at Cortex as gauntlet programming.

Apache Maven Project

The Apache board has voted to make Maven into a top level Apache project. Congratulations to all the Maven dev team. It will be interesting to watch how Maven develops. I’m sure I’ll have more to say on this topic in future. For now, let me wish Jason, Dion and the other Maven developers lots of luck.