Porting Elkhound to C#

We have recently released an updated beta of Clover.NET, and the major change in this snapshot is the port of the Elkhound parser backend to C#. In this entry I describe some of the issues we encountered and some of the surprising results.

The parsing component of Clover.NET is based on Elkhound, a BSD-licensed GLR parser developed by Scott McPeak. Elkhound is implemented in C++, so Clover.NET originally used the parser as a COM component through .NET’s interop facilities. That worked well enough once we understood the necessary reference management between C# and COM, but it had a few downsides. In addition to the greater complexity of this arrangement, it made Mono support more difficult. We’re not yet supporting Clover.NET on Mono, but it is a development goal.

To make Clover.NET fully managed, in Microsoft parlance, we needed to port the runtime components of Elkhound to C#. While Elkhound comes in one big piece, it really has two roles or phases: first comes grammar analysis and table generation, and then, at runtime, the parsing component. In the C++ code these are not distinct, and some classes are involved in both aspects. For Clover.NET we only needed the parsing component in C#; we are happy to live with the original C++-based grammar analysis code.

Our first approach was simply to mirror the operation of the C++ code generation classes in C#. The generated code would work with a C#-based version of the Elkhound runtime classes. We had two problems with the port. The first was that the C++ code generator emits the parsing tables as static integer tables:

// storage size: 264264 bytes
// rows: 924 cols: 143
static ActionEntry const actionTable_static[132132] = {

These are pretty big tables: this one alone has 924 × 143 = 132,132 two-byte entries. We were surprised to find that Microsoft’s C# compiler failed completely when asked to compile the same kind of construct in C#, reporting a range of internal compiler errors. Ultimately, we had to adopt a different approach: the parsing tables are written out to a file, the file is included in the assembly as a resource, and the tables are loaded from that resource into arrays at runtime.
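To make the file-based approach concrete, here is a minimal C++ sketch of the generator side: it writes one table to a binary file and reads it back. Everything in it is an assumption made for illustration, not Elkhound's or Clover.NET's actual format: the ParseTable struct, the saveTable and loadTable functions, and the layout (two little-endian 32-bit dimensions followed by little-endian 16-bit entries) are all hypothetical. The 16-bit entry width is only inferred from the storage size quoted in the generated code above.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory form of one parse table; Elkhound's real
// classes are not reproduced here.
struct ParseTable {
    std::uint32_t rows = 0;
    std::uint32_t cols = 0;
    std::vector<std::int16_t> entries;  // rows * cols entries, row-major
};

// Write a 32-bit value as four little-endian bytes.
static void writeU32(std::ostream &out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.put(static_cast<char>((v >> (8 * i)) & 0xFF));
}

// Read four little-endian bytes back into a 32-bit value.
static std::uint32_t readU32(std::istream &in) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        int b = in.get();
        if (b == std::char_traits<char>::eof())
            throw std::runtime_error("truncated table file");
        v |= static_cast<std::uint32_t>(b & 0xFF) << (8 * i);
    }
    return v;
}

// Assumed layout: rows, cols, then each entry as two little-endian bytes.
void saveTable(const ParseTable &t, const std::string &path) {
    if (t.entries.size() != static_cast<std::size_t>(t.rows) * t.cols)
        throw std::invalid_argument("entry count does not match rows * cols");
    std::ofstream out(path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot open " + path);
    writeU32(out, t.rows);
    writeU32(out, t.cols);
    for (std::int16_t e : t.entries) {
        std::uint16_t u = static_cast<std::uint16_t>(e);
        out.put(static_cast<char>(u & 0xFF));
        out.put(static_cast<char>(u >> 8));
    }
    if (!out) throw std::runtime_error("write failed for " + path);
}

// Inverse of saveTable; throws if the file is shorter than its header claims.
ParseTable loadTable(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    ParseTable t;
    t.rows = readU32(in);
    t.cols = readU32(in);
    t.entries.resize(static_cast<std::size_t>(t.rows) * t.cols);
    for (std::int16_t &e : t.entries) {
        int lo = in.get();
        int hi = in.get();  // also EOF if the first read already failed
        if (hi == std::char_traits<char>::eof())
            throw std::runtime_error("truncated table file");
        std::uint16_t u = static_cast<std::uint16_t>((lo & 0xFF) | ((hi & 0xFF) << 8));
        e = static_cast<std::int16_t>(u);
    }
    return t;
}
```

Fixing the byte order in the file means the reader does not depend on the machine that generated the tables. On the C# side, the same bytes could presumably be read from the embedded resource (for example via Assembly.GetManifestResourceStream) and copied into a short[] before the parser starts.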

The second problem occurred in the port of the runtime support code. Ironically, given that we were moving to a garbage-collected environment, the problems were in the management of reference-counted objects held in a pool. The C++ code makes heavy use of smart pointers, which adjust reference counts as they are assigned and release them as they go out of scope. When the reference count reaches zero, the object is returned to the pool. Objects going out of scope is one of those implicit operations in C++ that you can miss when moving the code to C#. You can also miss the reference count increases that occur when smart pointers are assigned. The result was that objects would be returned to the pool while they were in fact still in use. Not surprisingly, these were very hard bugs to track down, since the fault and its ultimate manifestation are separated in both time and space (code). Luckily, we have a very large test suite for Clover.NET’s instrumentation operations, so we are confident the port is now working well.
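The following C++ sketch illustrates the kind of implicit bookkeeping involved. It is not Elkhound's code: Node, NodePool and NodeRef are hypothetical names, and the pool is reduced to a simple free list. The point is that the copy constructor, the assignment operator and the destructor all touch the reference count without a single explicit call appearing at the use site.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical pooled object with an intrusive reference count.
struct Node {
    int refCount = 0;
    int payload = 0;
};

// Simplified pool: owns every node it ever created and recycles free ones.
class NodePool {
public:
    Node *acquire() {
        if (free_.empty()) {
            owned_.push_back(std::make_unique<Node>());
            return owned_.back().get();
        }
        Node *n = free_.back();
        free_.pop_back();
        return n;
    }
    void release(Node *n) {
        n->payload = 0;  // reset state before the node is reused
        free_.push_back(n);
    }
    std::size_t freeCount() const { return free_.size(); }

private:
    std::vector<std::unique_ptr<Node>> owned_;
    std::vector<Node *> free_;
};

// Smart pointer: every copy, assignment and scope exit adjusts the count.
class NodeRef {
public:
    NodeRef(NodePool &pool, Node *n) : pool_(&pool), node_(n) { inc(node_); }
    NodeRef(const NodeRef &o) : pool_(o.pool_), node_(o.node_) { inc(node_); }
    NodeRef &operator=(const NodeRef &o) {
        inc(o.node_);  // take the new reference first; safe for self-assignment
        dec();         // then drop the old one
        pool_ = o.pool_;
        node_ = o.node_;
        return *this;
    }
    ~NodeRef() { dec(); }  // the implicit release that is easy to miss in a port

    Node *operator->() const { return node_; }
    Node *get() const { return node_; }

private:
    static void inc(Node *n) {
        if (n) ++n->refCount;
    }
    void dec() {
        if (!node_) return;
        assert(node_->refCount > 0);
        if (--node_->refCount == 0) pool_->release(node_);  // back to the pool
    }
    NodePool *pool_;
    Node *node_;
};
```

C# has no copy constructors or deterministic destructors for reference types, so a port has to turn each of these three implicit operations into an explicit increment or decrement call. Miss one increment, or add one decrement too many, and a node goes back to the pool while something still points at it.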

When you move C++ based code to a managed environment such as .NET you expect some performance loss. We were pleasantly surprised, therefore, to note significant speed increases and equally significant reductions in the memory profile of the Clover.NET instrumenter. We suspect that interop imposes a hefty performance cost.

We’re very happy with this change. I believe a port to Java would be straightforward from this point. I’m quite happy to have a GLR parser available.