Veering Off the Well-Worn Path

It's time to reconsider how code generators can save you from the doldrums of coding.

If a soiled shirt is placed in the opening of a vessel containing grains of wheat, the reaction of the leaven in the shirt with fumes from the wheat will, after approximately twenty-one days, transform the wheat into mice.
Jean-Baptiste Van Helmont

Oh, would that software were as easy to produce as mice! Well, sometimes it can be. At least, that's a rough summary of Jack Herrington's new book, Code Generation in Action (Manning, 2003). Herrington writes about code generators: custom programs that build source code, based on some sort of input file. Although some developers seem to have a "real men don't use code generators" (so they use lots of cut and paste?) attitude, it's very clear that code generators can be essential when you're trying to get a large project done on time and within budget.

Considering the usefulness of code generators, it's somewhat surprising that more developers don't depend on them. Do we actually like writing the same series of statements to initiate a database connection, read the results of a stored procedure, and return them as an object property five hundred times? Are we too busy writing repetitive code to learn new tricks? Are we all secretly afraid of making it clear how easy our jobs are? I don't know the answer, but I do know that this book can provoke thought and perhaps change your working habits. And that's a good thing.

Getting Active With It
Herrington starts with a case study and then identifies two basic kinds of code generators: active and passive. A passive code generator dumps code into your project and forgets about it. Lots of wizards and other IDE tools work this way. By contrast, an active code generator (and all of the code generators in the book fall into this class) takes responsibility for maintaining the code. When you want to change the classes from an active code generator, you tweak the input file or the code generator itself; you never directly edit the output. He goes on to identify six basic types of active code generation:

  1. The code munger takes an input file, parses it, and creates an output file from some built-in or external template.
  2. The inline-code expander takes source code with some special markup and creates production code from it. Embedded-SQL generators, which allow you to drop SQL statements into C or Java code (for example), work this way.
  3. The mixed-code generator is similar to the inline-code expander, except that the results are written right back to the input file. For example, special comments might specify delegate code that needs to be created and added to the file.
  4. The partial-class generator reads some sort of abstract definition file and builds a base class source code file to implement the definition. The user can then create a derived class to get the final desired functionality.
  5. The tier generator builds an entire tier (typically, a data access tier) from an abstract definition. UML products that integrate with your IDE can fall into this class.
  6. The full-domain language is a Turing-complete programming language created just for your problem. It gives you a general-purpose way to specify code that should be created.

The book contains examples of the first five of these, and some limited discussion of what a full-domain language can look like.
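To make the first category concrete, here is a minimal sketch of a code munger. It's written in Python rather than the Ruby used throughout the book, and everything in it (the "@doc" comment convention, the sample C prototypes, the munge function, and the layout of the output) is my own assumption for illustration, not something taken from Herrington's examples.

```python
import re
from string import Template

# Hypothetical input: C-style source whose prototypes carry "@doc" comments.
SOURCE = """\
// @doc Opens the connection to the database.
int db_open(const char *dsn);

// @doc Closes the connection.
void db_close(void);
"""

# Matches an "@doc" comment line followed by a one-line function prototype.
PATTERN = re.compile(
    r"//\s*@doc\s+(?P<doc>.+)\n"
    r"(?P<ret>[\w\s\*]+?)\s*(?P<name>\w+)\s*\((?P<args>[^)]*)\)\s*;"
)

# Built-in output template; a real munger might load this from a file.
ENTRY = Template("$name($args) -> $ret\n    $doc\n")


def munge(source: str) -> str:
    """Parse the input text and render one template entry per prototype."""
    entries = []
    for match in PATTERN.finditer(source):
        entries.append(ENTRY.substitute(
            name=match.group("name"),
            args=match.group("args").strip(),
            ret=match.group("ret").strip(),
            doc=match.group("doc").strip(),
        ))
    return "\n".join(entries)


if __name__ == "__main__":
    print(munge(SOURCE))
```

The shape is the whole point: read an input file, pick out the interesting parts with a parser (here just a regular expression), and pour them through a template. Change the input and rerun, and the output follows along.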

Ruby? What's a Ruby?
Nope, not the precious stone, the programming language. You'll find it at the heart of this book. All of the examples (and there are many!) are written in Ruby.

You may not have run across it in the past, but Ruby is a general-purpose, object-oriented scripting language with some similarities to Perl or Python. Ruby supports a number of features that are useful for code-generator writers, including built-in support for regular expressions and portable I/O coding constructs. If you've never used Ruby, fear not. The book includes a chapter on setting up Ruby and essential add-ons like an XML parser and a templating package. Herrington also builds a set of classes to parse C, C++, Java, SQL, and PostgreSQL code. And a helpful appendix provides an introduction to Ruby (starting, of course, with the venerable Hello World example).

There's More Than Code in This Stew
So far, I've been talking about code generation as a simple process that puts together some sort of source code that you can feed into a compiler. And indeed, that's one way to look at it. But if you're looking at it that way, you've got blinders on. Through examples and discussion, Herrington shows how to use the same sort of technology to come up with a variety of end products, only some of which fall into the traditional category of "code." These include:

  • Database access layers
  • User interface code, for production or test
  • Documentation (think about the XML comments in C#, which can be turned into HTML help by NDoc)
  • Unit tests (if you're going to generate code, why not generate tests for the code as well?)
  • Web services
  • Business logic layers
  • DLL wrappers for legacy code
  • Firewall configuration files

And that just covers some of the possibilities. What it boils down to is this: If you can describe an output that you'd like to get, you can likely build a code generator to take the description to the actual output. (This leaves aside the question of whether it's more work to write the output or build the generator; making that decision is one of the topics Herrington tackles.) For example, if you're building a product for Windows, it's quite feasible to think of a code generator that produces all or part of the MSI file that will feed the product to the Windows Installer service.
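As a rough illustration of one description feeding more than one end product, consider the sketch below. It's in Python, not the book's Ruby, and the Table and Column classes, the generate_ddl and generate_docs functions, and the sample customer table are all hypothetical names I've made up for the purpose.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str
    doc: str


@dataclass(frozen=True)
class Table:
    name: str
    doc: str
    columns: Tuple[Column, ...]


def generate_ddl(table: Table) -> str:
    """Emit a CREATE TABLE statement from the description."""
    cols = ",\n".join(f"    {c.name} {c.sql_type}" for c in table.columns)
    return f"CREATE TABLE {table.name} (\n{cols}\n);\n"


def generate_docs(table: Table) -> str:
    """Emit plain-text documentation from the same description."""
    lines = [table.name, "=" * len(table.name), table.doc, ""]
    lines += [f"{c.name} ({c.sql_type}): {c.doc}" for c in table.columns]
    return "\n".join(lines) + "\n"


# Hypothetical description; in practice this might live in an XML file.
CUSTOMER = Table(
    name="customer",
    doc="One row per customer account.",
    columns=(
        Column("id", "INTEGER PRIMARY KEY", "Surrogate key."),
        Column("email", "VARCHAR(255) NOT NULL", "Contact address."),
    ),
)

if __name__ == "__main__":
    print(generate_ddl(CUSTOMER))
    print(generate_docs(CUSTOMER))
```

Only one of the two outputs is "code" in the traditional sense, yet both stay in step because they come from the same description. Adding a third generator, say for unit-test stubs, wouldn't require touching the description at all.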

FUD Begone!
For a bunch of folks working on the cutting edge of technology, we developers tend to have a surprising fear of the unknown and the new. One of the most valuable features of this book is the straightforward discussion of common concerns that people new to the world of code generation often display. For example, in the chapter on generating data access layers (which is the heart of the book, though Herrington works up to this gradually through increasingly complex tasks) you'll find answers to all of these issues:

  • The code is going to be out of control.
  • I'm going to be the only one who knows what's going on.
  • Our application semantics aren't well defined yet.
  • This is going to take all the joy out of coding. ("If redundant code writing is what you consider fun, then you may find a generator is not for you.")
  • The database design is too complex to be generated.
  • The generated SQL statements will be rubbish.
  • The up-front development cost is too high.
  • I don't have all the prerequisite skills.
  • The information here is centered around web applications; what about client/server?
  • My application doesn't use a database.

If you work through the book, you'll not only end up with an appreciation of how to build a code generator (and an extensive list of existing ones that you can check out), but also with a road map to selling the concept to the rest of your team and your management. Herrington doesn't recommend misrepresenting code generation, but he does come up with good answers to the most common objections you're likely to run into.

Go Forth and Generate
Itching to get started with code generation and impatient for that copy of the book that you ordered to ship? Help is just a URL away. In addition to writing the book, Herrington also maintains the Code Generation Network Web site. Here you'll find an extensive database of products (both commercial and free), articles on code generation, and interviews with developers who are heavily involved on this programming front.

The history of software development has some clear trends. One of them is the development of ever more abstract ("higher level") languages in an effort to save time and enable more complex development. Unless you're writing everything in machine language, you're using code generation at some level. When you spot a repetitive task in what you're typing, don't reach for the Windows clipboard—think code generation instead. It's the logical next step.

Got a code generation success story? Or are you lost in a morass of poorly designed spaghetti code from a bad tool? I'd love to hear about your code generation experiences either way by e-mail. I'll use the most interesting comments in a future issue of Developer Central.

