One of the reasons I had time to update my Blog software is that we, Cenqua, have recently released Clover.NET 1.2. There’s lots of good stuff in it but I won’t go into most of it here. Have a look at the announcement in our forums for more info. What I thought might be interesting to talk about are the issues in parsing VB.Net.
When we started to add VB.Net support to Clover.NET, I thought parsing VB.Net would be relatively easy. After all, on first inspection, for most statements, there is a corresponding “End” statement. I thought that the regularity that implied would make everything quite simple. If only it were so …
We started from the Microsoft VB.Net language specification, and it’s a good start. Defining a grammar for a language parser, however, is a precise art, and it soon turned out that the real specification of VB.Net is the set of programs that vbc will compile, not the set described by Microsoft’s grammar. Since releasing VB support, we’ve had a few reports of programs that our parser rejects. In most cases, we’ve found that the program does not meet the language spec but is accepted by vbc anyway.
One example is the specification of EventMemberSpecifiers. In the specification, the event specifier takes an Identifier. In reality, this can be either a keyword or an identifier. So, even though “select” is a keyword, the following is OK:
Private Sub MenuItem1_Select(ByVal sender As Object, ByVal e As System.EventArgs) Handles MenuItem1.Select
Since the MS grammar is explicit in other areas about allowing keywords or identifiers, it makes you wonder why it does not allow that here.
The ability to use keywords as identifiers is pretty unusual in a language. I wonder if that is a result of .NET’s multi-language goals, where a name that is perfectly ordinary in one language may be a keyword in another. It certainly makes parsing a challenge.
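To make the keyword-or-identifier issue concrete, here is a minimal sketch in Python. It is not how Clover.NET does it; the tiny keyword list, the parse_handles function and its allow_keyword_member flag are all made up for illustration. It parses only the "Handles X.Y" clause, either following the spec (identifier only) or following what vbc appears to accept (identifier or keyword).

```python
import re

# A tiny, illustrative subset of VB.Net keywords (an assumption, not the full list).
KEYWORDS = {"select", "handles", "sub", "end", "try", "catch"}

# One token is either a name or a dot, optionally preceded by whitespace.
TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\.)")


def tokenize(text):
    """Split text into (kind, value) pairs: KEYWORD, IDENT or DOT."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        value = match.group(1)
        if value == ".":
            kind = "DOT"
        elif value.lower() in KEYWORDS:  # VB.Net keywords are case-insensitive
            kind = "KEYWORD"
        else:
            kind = "IDENT"
        tokens.append((kind, value))
        pos = match.end()
    return tokens


def parse_handles(text, allow_keyword_member):
    """Parse 'Handles <ident> . <member>' and return (target, member).

    allow_keyword_member is a hypothetical switch: False follows the
    written grammar, True also accepts a keyword as the member name.
    """
    tokens = tokenize(text)
    if len(tokens) != 4:
        raise SyntaxError("expected: Handles <ident> . <member>")
    handles, target, dot, member = tokens
    if handles[1].lower() != "handles":
        raise SyntaxError("expected 'Handles'")
    if target[0] != "IDENT":
        raise SyntaxError("expected an identifier before '.'")
    if dot[0] != "DOT":
        raise SyntaxError("expected '.'")
    allowed = {"IDENT", "KEYWORD"} if allow_keyword_member else {"IDENT"}
    if member[0] not in allowed:
        raise SyntaxError(f"'{member[1]}' is a keyword, not an identifier")
    return target[1], member[1]
```

With allow_keyword_member set to False, "Handles MenuItem1.Select" is rejected, which is what a parser built strictly from the grammar would do. Set it to True and the same clause parses, which matches what the compiler lets through.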
Another example is the block construction. A block is defined as a collection of one or more labeled lines, each of which must end in a LineTerminator, which is a CR, an LF, or one of the Unicode variants. So, when a TryStatement is specified to take a block, this should not be legal, but it is:
Try : stream.Close() : Catch : stream.Close() : End Try
So a block can be terminated by a colon as well, which is unfortunate because the colon is used for a few other things too.
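As a rough illustration of treating the colon as a statement terminator, here is a small Python sketch. The split_statements function is hypothetical and deliberately naive: it assumes string literals do not span lines, and it ignores comments, line continuations and labels. It does skip the ":=" used for named arguments, which is one of those other uses of the colon.

```python
# Characters treated as line terminators in this sketch:
# CR, LF and the Unicode line and paragraph separators.
LINE_TERMINATORS = {"\r", "\n", "\u2028", "\u2029"}


def split_statements(source):
    """Split source text into statements on line terminators or colons.

    A colon inside a string literal, or one that starts ':=', does not
    end a statement. Empty statements are dropped.
    """
    statements = []
    current = []
    in_string = False
    for index, char in enumerate(source):
        if char in LINE_TERMINATORS:
            ends_statement = True
            in_string = False  # assumption: string literals do not span lines
        elif char == ":" and not in_string:
            # ':=' is a named-argument marker, not a statement separator.
            ends_statement = source[index + 1:index + 2] != "="
        else:
            ends_statement = False
            if char == '"':
                # A doubled quote ("") toggles twice, so it stays inside the string.
                in_string = not in_string
        if ends_statement:
            text = "".join(current).strip()
            if text:
                statements.append(text)
            current = []
        else:
            current.append(char)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```

Fed the one-line Try example above, this yields the same five statements as the conventional multi-line layout, so whatever consumes the statements does not need to care which terminator was used. A real parser still has to tell a label's colon from a separator, which this sketch does not attempt.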
Anyway, you get the picture. If you are parsing VB.Net, and using the specification as your guide, you’re in for a few surprises. It’s what makes life interesting, isn’t it?