Extending Ruby with C
Added 31 Jul 2008
I'm a big fan of the so-called agile programming languages. I think they have a huge advantage over more traditional languages like C and C++. They also have some drawbacks, among the largest being that there's an awful lot of existing code written in C and C++. It's hard to sell people on moving to something new if they have to leave all their old toys behind.
The standard response to these sort of arguments is that you can easily write an extension that bridges the gap between your old C code and your new Perl or Python, or whatever agile language is hot this week, code. Unfortunately, I've generally found that the APIs for bridging the gap between Perl and C are either cryptic (XS) or fragile (Inline::C). While Python is better in some ways, I still find its C API rather difficult to read. Tools such as SWIG can help alleviate this problem, but you still need to write a bunch of glue code to bridge the gap between the high-level agile languages and the low-level C code.
When I first looked at doing the same kind of thing for Ruby, a whole new
world opened up. The APIs are simple, to the point where I was up and running
in minutes rather than hours. All you need to know to start is in the
README.EXT file in the top level of the Ruby source tree. If you
need help with something that isn't documented there, you can't ask for a
clearer example than the Ruby source code itself. In short, I was just aching
for a test case, some C code I could wrap up in a Ruby extension to prove how
simple it is to make something that's easy to use. For my test case, I chose the
GenX library.
GenX
GenX is a simple C library for generating correct, canonical XML. It verifies that its data is valid UTF-8 and the structure of the XML document being generated is valid, and it forces you to use canonical XML--that may not mean much to you right now, but it can be significant if you need to compare two XML documents to determine their equivalence. Tim Bray wrote GenX and hosts it at GenxStatus. GenX originally attracted me because it provides a way to avoid problems related to invalid XML, which I've encountered in my own work. In addition to its usefulness, it's also perfect for an example of how to embed a C library in Ruby because it's very small, self contained, and has a well-defined API we can wrap up in Ruby without too much trouble.
Justification
At this point it's worth asking whether a Ruby extension is really the best way to make this kind of functionality available.
Using an extension means that users need to install a binary distribution precompiled for their versions of Ruby and their operating systems, or to build it themselves, which means they need access to a C compiler. Additionally, extending Ruby via C has its own set of dangers. If you screw something up in a standard Ruby module, pretty much the worst you can do is cause an exception to be thrown. It's possible for users to recover from this if they're paranoid enough about catching exceptions and structure their code correctly. In a C-based extension, an error can corrupt memory or cause a segmentation fault or any number of other problems from which recovery is difficult, and all of which have the chance to crash the underlying Ruby interpreter.
That said, in this particular case I think providing direct access to the underlying GenX library via a C extension is the way to go. The GenX library is available right now, it works, and it does its job in a very efficient manner. There's no reason to duplicate functionality unnecessarily. Even if I did rewrite this in pure Ruby, all I am likely to accomplish is slowing things down. Plus, GenX is exceptionally self contained; while using the library does require that users either use a precompiled extension or possess a C compiler, it at least doesn't bring in any other third-party requirements. Finally, the GenX API is quite straightforward. It's reasonable to assume that we'll be able to implement this extension without undue risk of crashing our Ruby interpreter due to bugs in our code.