Subversion Property Naming

As part of developing Subversion support for FishEye, we were concerned about the ability of just anybody to hook up a FishEye instance to a public Subversion repository. If enough people attached FishEye instances to a repository, it might have an impact on the server load. In contrast to FishEye for CVS, FishEye for Subversion uses the standard network protocols supported by Subversion to access the server. This means that a public Subversion server is available to anybody wanting to run FishEye against that server.

To give repository owners some control over this access, we added a check in FishEye for an access control property. In the upcoming beta, this will be a fairly simple check to see if FishEye is granted access or not. In the future it will probably be a little more sophisticated.

Our initial implementation was to call this property fisheye:access. It seems natural to choose a scheme which is similar to the scheme used by Subversion itself, with such properties as svn:mime-type and svn:author, etc. When you do use that scheme, however, and your repository is accessed over http, the resulting XML documents used by DAV are not well formed. In particular, you may end up with something like this:

<C:blah:test>test2</C:blah:test>
<C:blah_test>test</C:blah_test>

The first element name is not well-formed in a namespace-aware XML document. The errata for XML names clarifies this as follows:

“It follows that in a namespace-well-formed document … All element types and attribute names contain either zero or one colon;”

We’ve found that when Xerces encounters these names, it maps the colon to an underscore. The above two properties, blah:test and blah_test, coalesce into a single property value blah_test.
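The colon rule quoted above can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up for this entry, the check looks at nothing but the colon count (a full check would also validate the prefix and local part), and the underscore mapping simply mimics the behaviour we observed rather than calling any Xerces API.

```java
final class PropertyNames {

    // A namespace-well-formed element name contains zero or one colon
    // (an optional prefix, then the local part). Only the colon count is checked here.
    static boolean hasAtMostOneColon(String elementName) {
        int colons = 0;
        for (char c : elementName.toCharArray()) {
            if (c == ':') {
                colons++;
            }
        }
        return colons <= 1;
    }

    // Assumption for illustration: the DAV layer puts custom properties under a "C" prefix,
    // as in the XML fragment above.
    static String davElementName(String propertyName) {
        return "C:" + propertyName;
    }

    // Mimics the observed colon-to-underscore mapping; hypothetical helper, not a parser API.
    static String lenientName(String propertyName) {
        return propertyName.replace(':', '_');
    }
}
```

With these helpers, davElementName("blah:test") gives a name with two colons and fails the check, while blah_test and a dotted name such as fisheye.access pass. lenientName maps both blah:test and blah_test to the same string, which is the collision described above.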

We have now decided to change to fisheye.access to ensure we don’t run into problems for people with strict XML parsers. The native XML libraries used by Subversion seem to handle this problem without concern. They are probably a little bit loose in their XML handling. It’s debatable whether such looseness is a good thing or not.

The Early History of Ant Development

Stefan recently noted that it is five years since the first public release of Ant as an independent project release. A while ago I was asked by someone, researching open source development, for a description of Ant’s development process. What I produced really described the early history of Ant development, which I’ve edited a bit for this entry.

I’ll start out by noting that this is based on my personal views, coloured by both time and my personal perceptions. I think it’s interesting, as it shows how an open source project can develop really cool products and also some of the stresses and strains that come about in the development.

The Ant project started as a component of the Tomcat software donation from Sun to Apache. It was authored by James Duncan Davidson. As far as I know, early incarnations of Ant were based on Java properties files and not on XML, but properties files are not powerful enough to represent the sort of constructs which were needed, so Duncan moved to XML. I guess that was a fateful decision. As Ant developed and builds became more complex, many people would argue that Ant was heading towards programming by XML. There are shades of truth in that, for sure, but it has endured. I’ve seen proposals to rewrite Ant using scripting languages, etc. but they never seem to get a lot of traction.

My involvement in Ant began as I was trying out the Tomcat project. This was about November 1999. I wanted to get into J2EE development and began by looking at Tomcat’s code. As with most people who become Apache committers, I began by sending patches to the tomcat-dev mailing list.

Some of the early Tomcat developers made significant early improvements to Ant and realized that Ant could be used for Java projects which were not related to Tomcat. I think some of the early Ant adopters were the Apache XML projects. At some point the Tomcat developers realized that Ant was not directly related to Tomcat, was being used by other projects and needed to be its own subproject. On 13-Jan-2000 Duncan announced the creation of the Ant sub-project (of the Jakarta project).

While many of the Tomcat developers were active in the early development of Ant as a standalone project, I think that Sam Ruby really worked hard to keep it going. Over the next few months the number of active committers began to decline. Most of these committers were busy with other projects, such as Tomcat of course. This occurred at the same time as there was a significant increase in interest in Ant outside Tomcat and even from Java projects outside Apache.

The result of these two trends was an increasing backlog of patches which were causing some frustration in the nascent Ant community. Even Sam eventually began to have limited time to devote to Ant. Sam began to add committers to Ant. Arnout Kuiper was added and laid the foundation of Ant’s documentation. He would also create the ** directory matching convention, which I wondered about at the time but which is second nature now. Thomas Haas and Stefan Bodewig created the JUnit tasks and some of the fundamental types in Ant, such as <path>. Sam then nominated Stefan Bodewig and myself as committers to the Ant project. This was part of a change in the Ant committer base from people primarily involved in Ant as a tool for their other open source activities (Tomcat, XML, etc) to committers whose prime open source interest was in Ant itself.

About this time I had introduced Ant into my work environment. I was lucky that I worked for a company willing to give Ant a go. That helped me to appreciate Ant as useful beyond building open-source Apache projects.

Stefan and I went through a frenzied period of activity as we committed long outstanding patches. Some of this was code we had developed but a lot was from other people who had made great suggestions for Ant’s improvement. One of these sticks in my mind and that was the addition of Build Events provided by Matt Foemmel. I also remember Stefan’s introduction of the IntrospectionHelper which set the rules relating build element structure to underlying Java class structure. It took me a while to fully appreciate the wonder of that code.

This was around the end of June 2000. In addition to continuing code contributions, Sam now took on the role of project leader, suggesting that a release would be a good idea. As Stefan noted in his blog, he acted as the release manager for this first release. Whilst I did take over for Ant 1.2, I was amazed at the time how Stefan knew how and where to put things for the release.

As there was a release of Ant already in the wild, as part of Tomcat 3.1, Stefan called it Ant 1.1 and it was released about the middle of July, 2000. Ant 1.1 was the first official release of the Ant project and a significant amount of code had been added to Ant since the original Tomcat release. We eventually called the Tomcat release 0.3.1 as a clue to its origins and the fact that it was a pre-Ant project release.

Having an official 1.1 release just increased the visibility of the Ant project and large numbers of people began to use Ant for building all sorts of things. New tasks began to flow in, from SQL tasks to tasks that supported different SCM systems and tasks for tools such as JavaCC. I think people really saw Ant as an improvement in the way they were building their systems and wanted it to cover all aspects of their build requirements. There were also huge improvements in the basic Ant infrastructure in areas such as <exec> and <java> to make sure these could be controlled and operated reliably and predictably. Nico Seessle contributed a set of Ant testcases which made it much less likely that a new Ant build would break existing features.

Around the end of October, Ant 1.2 was released. The momentum of patches and changes had continued and Stefan and I committed a lot of code in this period. From 2000 to 2001, Ant’s codebase went from about 15k lines of code to 100k.

Ant 1.2 had a few backward incompatibilities which caused a few heartaches for users who were still using 0.3.1. Whilst we had flagged these by using deprecation and warnings, users were still annoyed. They were at times quite vocal about the pain these changes were causing them. I think this was the genesis of a conservative culture within the Ant project where we try hard to maintain backward compatibility. We still have some backward incompatible changes in each release but these need to be quite strongly justified before the Ant developers will accept them. The result is that, today, Ant has a pretty good reputation for just working out of the “box”.

During this period, most of the committing of patches was being done by Stefan and myself, with Stefan easily the most active of us. Sam Ruby had moved on to work on other projects, most notably Gump. Gump was, and continues to be, incredibly useful for Ant’s development as it served as a huge test bench for Ant. Each night Gump would build a huge number of projects and any break in Ant would be instantly obvious. Gump became Ant’s early warning system. We continue to catch a number of problematic changes this way, although not as many as when Ant’s core was changing so much.

While Stefan and I were very busy there were some developers submitting lots of patches and it became clear that we needed to add more committers.

In October, Glenn McAllister joined and did work on the file manipulation tasks such as <copy>, <move>, etc. Simeon Fitch joined to work on an Ant GUI, Antidote. A change in his availability would see the Ant GUI project pretty much die. It has been revived on a few occasions but it has never flown. Support for Ant in mainstream IDEs pretty much means it never will. It has recently been discontinued officially.

Peter Donald joined around the end of November 2000. Diane Holt made contributions to improve Ant’s documentation and usability. For a year Ant had been travelling along nicely but its increased exposure had a few effects, I think. People began to think about what was wrong with Ant and how it could be done better. Duncan returned to the project with a proposal to rewrite Ant, called AntEater. Peter Donald proposed a rewrite of Ant, called Myrmidon, based on Avalon components. There were some other proposals as well. There was a lot of upheaval at this time with questions abounding about who could set the direction of the future of Ant. In the end Duncan left the project. I won’t go into the details but this upheaval would continue to affect Ant for another 18 months. Duncan would switch focus from Java to Mac OS X development and has now authored an impressive collection of books on that subject. I still drop by his site just to check out the great photos he puts up.

The result of the proposals to rewrite Ant was an attempt to gather requirements for what would become known as Ant2. It seems to me that most successful open source projects go through a “2” phase where people decide that the 1.x style development is flawed and needs to be rewritten. In many cases they are right. It’s often called the “second system” effect and is a natural, if somewhat disruptive process. Such proposals usually involve a compatibility break and a break with the past. In Ant’s case the effort was controversial and ultimately doomed to failure, of a sort.

While people began to think about Ant2, Ant 1.3 development continued apace. There were very few problems with Ant 1.x development. Mostly we added things as requested by users and developers. For Ant 1.3, I was again release manager and I began the release cycle by creating a CVS branch for Ant 1.3. This was to become the standard way of doing Ant releases although the actual usage nowadays is a little different from what I originally envisaged. Ant 1.3 went through a number of betas and was eventually released in March.

Ant 1.x was now useful for a number of projects and still growing. It would add about 60k lines of code in 2001. Ant 1.4 was released in Sept 2001, followed by Ant 1.4.1 in October. Ant 1.4.1 was very stable and there would not be another Ant release until Ant 1.5 in July 2002. Ant was maturing as a codebase and this can be seen in the longer timeframe between major releases and the introduction of point releases to correct minor bugs. The Ant 1.5 branch would reach Ant 1.5.4.

In response to the Ant2 requirements, Peter Donald refined his Avalon based proposal, Myrmidon. I was not convinced of the benefits of the additional complexity and dependence on Avalon and so I began my own implementation of Ant2, known as mutant. To cut this particular long story short, I pursued mutant as a rewriting of Ant1, using mostly the same principles but cleaning up a lot of the operation in the area of classloader structure and element configuration – separating configuration from execution. There was a lot of competition between these two proposals. In the end the Ant project did not have the processes to deal with this situation and the adoption of a new codebase. In July 2002, I decided to abandon Mutant and to withdraw a little from Ant development itself. I took that action to bring the issue of adopting an Ant2 proposal to a head. Without a competing proposal, the question would be a straight question of choosing between continuing Ant 1.x and adopting Myrmidon as Ant2.x. I’m not sure why but Myrmidon development seemed to fade at this time too.

Evolution is the way of open source projects and revolutions are difficult, sometimes really difficult. Often the major changes in direction come from outside a project, in the form of new, competing projects. While it can and does split a community, it gives users the ultimate choice. As other approaches to building projects come along, the users will eventually decide what will be the tool which is used, whether one fades and dies or both continue with their adherents. It’s the beginning of the religious wars, which you would do well to steer clear of.

Having looked at the process problems that the Ant2 concept threw up for the Ant project, it’s worth looking at how the Ant project does make decisions. Ant is an Apache project and follows what might be termed “The Apache Way”. For most people, the Apache Way refers to the concept of using +1/-1 votes to make decisions and the ability for committers to veto unacceptable code changes. There are, however, many nuances in what this really means, so I will describe how Ant works in practice.

All interaction between Ant committers and developers and all decision making occurs through the medium of the Ant-Dev mailing list. Committers are mostly loosely coupled, working alone, and choose what they want to work on, whether it is code they wish to contribute, patches they wish to apply or bugs they wish to investigate. Ideas about major features are often discussed beforehand on the dev list. Some other projects use IRC for decision making which is perhaps more dynamic. Ant has always had committers in widely dispersed time zones which makes IRC less practical. Personally I like the mailing list approach.

All code changes made to Ant (and other Apache projects) generate an email which is sent to the dev list. Thus, every committer (and anyone else on the list) is aware of all changes being made to the code. This results in an ongoing code review process. A committer seeing a change with which they do not agree can -1 that change, whereupon it must be rolled back. This is known as a veto. In many cases, a -1 is used to indicate the code change has problems and once these are resolved, the veto is lifted. In fact, vetoes are relatively rare.

A rising number of vetoes is a sign of potential trouble brewing in a project, IMHO. It often indicates a fundamental split in the committers’ views on how the project should go forward. Some people wonder why there are things like project by-laws and other red tape in Apache projects. When a project is humming along, none of these are really needed. They only come into play during difficult times. They are the framework used to resolve disputes.

I said above that Ant2 was a failure but the reality is that it laid out a roadmap of good ideas for Ant’s development even if they could not be achieved in a big bang re-development. Many of these ideas would eventually be implemented in Ant. In fact Ant 1.6 would contain many features that were originally considered Ant2 candidates and even some features that were rejected as being inappropriate for Ant2.

Approaching the end of 2002, after taking a break, I came back to Ant as we started to consider whether to make Ant a top level project. That is, whether to move Ant out of the Apache Jakarta project into its own project. This was really just an administrative change although it meant that the Ant project could define its own way of working without worrying about Jakarta conventions and Jakarta project management. This meant Ant had to manage its own website, etc. Part of being a top level project in Apache involves the creation of the PMC, a body which is responsible to the board for the management of the project and its assets. It all sounds terribly business like but it is pretty much still business as usual. I agreed to be the chair of that PMC.

Since the change to a Top Level Apache project, and the decline in Ant2 as a viable concept, Ant has continued down the evolutionary, incremental improvement path of Ant 1.x. Over that time, committers have come and gone, each contributing something to Ant’s whole. Of course there is a huge array of developers who are not committers who have contributed to Ant as well.

Today Ant has a whole range of new committers taking Ant into new directions. I won’t name them all for fear of leaving someone out. You can easily find out who they are by lurking on the Ant-dev list for a short while. These new committers have stepped up to the mark to really manage new Ant features. These probably go beyond what was first envisaged for Ant2. Under the covers the code creaks a little here and there but it has stood the test of time.

The main process issue facing Ant today is how to manage long-lived development branches. Currently there is too much duplication of effort to keep the current development branch and the trunk in sync. The conventional approach is to manage this by occasional merge operations between the branch and trunk. Open source projects, however, IMHO, want to have all bug fixes on all active branches all the time, making it difficult to defer merges.

I am not so active these days, pretty much restricting myself to dealing with the people who haven’t read the Jar Manifest specification, but I still keep in touch with Ant development. It has been a great ride so far.

Coming to America

I’ve been in San Francisco for a few days now for JavaOne. It’s always interesting to come to another country even when we speak the same language – well almost the same language. There are so many little differences in the way things work. You realise all those unwritten little conventions that guide the way you react to things.

I’ll be manning the Cenqua booth on the pavilion for the show and also putting in a few appearances at the ASF booth. If you are interested in FishEye support for Subversion, have a chat, as I’ve been working on that recently and we have started the alpha program. If you just want to drop in and say hi, you’re most welcome. Look forward to seeing you there.

Interface Evolution and Exceptions in Java

I’m firmly in the camp that favours the checked exception style used in Java as opposed to the unchecked approach used by C#. For me it’s as fundamental as explicit declaration of variables. Java is, in general, a statically typed language and, for me, checked exceptions are part of that style. It’s a different story in more dynamic languages such as Python. I do recognize, however, the fact that some people have strong opinions to the contrary and I’m not out to convert anyone here.

I have come across an issue where the explicit declaration of exceptions does cause a problem. Today, when you declare in Java that your code throws a particular checked exception, you can never take that away without breaking somebody’s code. You can’t evolve the interface in a graceful way. A recent case in point involved a seemingly innocuous change to BCEL. The signature of a constructor changed from

public ClassParser(String file_name) throws IOException

to

public ClassParser(String file_name)

Unfortunately that broke the following code in Ant, since the compiler says, quite rightly, that the code does not throw an IOException.


try {
    new ClassParser("force");
} catch (IOException e) {
    // ignore
}

Here is the crux of the problem. If we changed Ant’s code to work with the new BCEL code, then Ant would not be compilable against the old BCEL. Ant’s and BCEL’s releases would need to be synchronized which is not a good thing.

I added the exception specification back in to the BCEL code but it’s not a very satisfying approach. The BCEL constructor doesn’t throw the exception anymore and there should be a way of saying so, of evolving the interface gracefully. The best idea I could come up with was adding something like a @deprecated_throws indication which indicates that the method used to throw a particular exception but no longer does. The compiler would emit a deprecation warning on any code that caught this exception. As with other deprecated usages in Java, it would not cause an error. Such a mechanism would allow Ant to compile with both the old and the new versions of the BCEL code. The Ant and BCEL releases would be decoupled and sufficient time would be available for the BCEL change to propagate through its users.
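To make the problem concrete, here is a small self-contained sketch. OldParser and NewParser are hypothetical stand-ins for the two versions of the BCEL constructor, not real BCEL classes. It also shows one workaround that exists in Java today, which is not what I did in Ant: the compiler only rejects a catch clause for a checked exception the try block cannot throw, and it makes an exception for Exception itself, so catching Exception compiles against both versions. The cost is that the catch also swallows runtime exceptions, which is why it is not a substitute for something like @deprecated_throws.

```java
import java.io.IOException;

final class ThrowsEvolution {

    // Hypothetical stand-in for the old library version: the constructor declares IOException.
    static class OldParser {
        OldParser(String fileName) throws IOException {
            // never actually throws
        }
    }

    // Hypothetical stand-in for the new library version: the throws clause has been removed.
    static class NewParser {
        NewParser(String fileName) {
        }
    }

    // Compiles only while the constructor still declares IOException.
    static String callOld() {
        try {
            new OldParser("force");
            return "ok";
        } catch (IOException e) {
            return "ignored";
        }
    }

    // Writing catch (IOException e) here would be a compile error, because nothing
    // in the try block can throw it. Catching Exception is accepted for either version.
    static String callNew() {
        try {
            new NewParser("force");
            return "ok";
        } catch (Exception e) {
            return "ignored";
        }
    }
}
```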

JUnit Manifest – Why no MainClass?

Time for a minor rant.

Why doesn’t the JUnit jar have a manifest with a Main-Class attribute? Surely there is some useful default application that JUnit could run in this instance – I don’t know – maybe the Swing GUI test runner. It would sure save me looking up and typing the class name on the occasions I need to run a test outside Ant. Has further JUnit development stopped?
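For what it’s worth, this is roughly all it would take, sketched with the standard java.util.jar classes. The helper name is made up, and the runner class name in the usage note assumes the Swing runner is junit.swingui.TestRunner, as in JUnit 3.x.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

final class MainClassManifest {

    // Builds an in-memory manifest whose main section names the class to launch.
    static Manifest withMainClass(String className) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be present or the manifest writes out empty.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.MAIN_CLASS, className);
        return manifest;
    }
}
```

Calling withMainClass("junit.swingui.TestRunner") and writing the result into the jar would let java -jar start the runner without the class name being typed on the command line.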

Oh well, back to those tests …

Update: OK, I can think of a few reasons 🙂