ClojureCLR: Code gen redo preview

Monday, March 26, 2012

Code gen redo preview

Rewriting the code generation phase of a compiler is not for the faint of heart. Nor, perhaps, for the sound of mind.

I've nearly completed a rewrite of the code gen code of the ClojureCLR compiler. There are still a few things on my punch list (see below), but all the clojure.test-clojure.* tests run now. I hope an intrepid few will give it a spin before I push the changes to master. The new code can be found in the nodlr branch in the github repo.

When I wrote the ClojureCLR compiler, I was interested in seeing what kind of mileage I could get out of the Dynamic Language Runtime, specifically the DLR's expression tree mechanism. The DLR's ETs extended the ETs used in Linq by providing enhanced control flow capabilities. They are central to other dynamic language implementations on the CLR such as IronPython and IronRuby.

The first version of the ClojureCLR compiler mimicked the JVM compiler through its initial phases. The Lisp reader translates source code to Lisp data structures that are parsed to generate abstract syntax trees. The ClojureJVM compiler traverses the ASTs to produce JVM bytecodes. The ClojureCLR compiler instead generates DLR ETs from the ASTs. Those ETs are then asked to generate MSIL code.

I got a lot of mileage out of using the DLR for code generation. I got to avoid some of the hairier aspects of MSIL and the CLR-- things like value types, generic types, nullable types, for example, are handled nicely by ETs. I also found it easier to experiment. However, using ETs had at least two drawbacks. One was that going from ASTs to MSIL through ETs likely nearly doubles the work of MSIL generation. Another was that ETs were restricted to producing static methods only. Working around this restriction introduced several inefficiencies in the resulting code.

The Clojure model for functions maps each defined function to a class. For example, compiling

(defn f
([x] ... )
([x y] ... ))

yields a class named something like user$f__1295 that implements clojure.lang.IFn, with overrides for virtual methods invoke(Object) and invoke(Object,Object). (The actual value to which f would be bound would be an instance of this class.)

Note that the invoke overrides of necessity are instance methods. Recall from above that DLR ETs cannot produce instance methods. Toss in a another little problem referring to unfinished types. Shake and stir. You end up with the following abomination: Where Clojure/JVM generates one class and two methods for the example above, ClojureCLR would have to generate two classes and four methods. An invoke override method is just a passthrough to a DLR-generated static method taking the function object as a first paramenter.

For several years I hoped that the DLR team would get around to looking into class and instance method generation. This now seems unlikely. So I finally decided to rewrite the code generation phase to eliminate most uses of the DLR.

The new code gen code yields significant improvements in compilation time and code size. Compiling the bootstrap clojure core environment is roughly twice as fast. The generated assemblies are about 20% smaller. Startup time (non-NGEN'd) is 11% faster. A few benchmarks I've run show speedups ranging from 4% to 16%. This is in line with my best hopes.

One other benefit: with code generation more closely modeled after the JVM version, future maintainers will need less knowledge of the DLR.

There are drawbacks to this move. The DLR guys know a lot more about about generating MSIL code than I do. Some wonderful goodness with names like Expression.Convert and Expression.Call were my best friends They are (mostly) gone now. And, oh, the beauty of DebugView for ETs for debugging code gen--this will be missed. My new best friends are peverify and .Net Reflector, the caped duo for rooting out bad MSIL. Wonderful in their own way, but I have a sense of loss.

So, where are we? I have a little more work to do before putting this on the master branch. I plan to make one last traversal of the old code looking at all occurrences of my former best friends to make sure I've been consistent in handling the complexities they hid. I also plan to reimplement a 'light compile' variation to be used during evaluation. The current version has it. (What this is and why it matters I leave to another time.) Neither task will take long.

In the meantime, the nodlr branch is ready for a workout by those who care and dare. Have at it.

P.S. The DLR is still being used to provide CLR interop and polymorphic inline caching. Another topic-for-another-day.

19 comments:

wizzardcloudMarch 27, 2012 at 1:55 AM
Come on, Hero!
ReplyDelete
Replies
aaroncApril 13, 2012 at 5:32 PM
Mostly looks pretty good. I really like the fact that functions defined at the repl get a meaningful type name so that the stacktraces are more readable.

I'm getting the following error with one part of my code:
CompilerException System.InvalidOperationException: Can't embed object in code, maybe print-dup not defined: System.Xaml.Schema.XamlMemberInvoker
at clojure.lang.CljCompiler.Ast.ObjExpr.EmitValue(Object value, CljILGen ilg)
at clojure.lang.CljCompiler.Ast.ObjExpr.EmitConstantFieldInits(CljILGen ilg)
at clojure.lang.CljCompiler.Ast.ObjExpr.EmitStaticConstructorBody(CljILGen ilg)
at clojure.lang.CljCompiler.Ast.ObjExpr.DefineStaticConstructor(TypeBuilder fnTB)
at clojure.lang.CljCompiler.Ast.ObjExpr.Compile(Type superType, Type stubType, IPersistentVector interfaces, Boolean onetimeUse, GenContext context)
at clojure.lang.CljCompiler.Ast.FnExpr.Parse(ParserContext pcon, ISeq form, String name)
at clojure.lang.Compiler.AnalyzeSeq(ParserContext pcon, ISeq form, String name), compiling: (NO_SOURCE_PATH:50)

This is in a library I am writing for creating WPF apps in Clojure. With all the appropriate WPF assemblies (PresentationFramework, etc.) loaded and this namespace loaded https://github.com/aaronc/ClojureWpf/blob/master/ClojureWpf/core.clj, entering something like (caml (:StackPanel (:TextBlock))) at the REPL will produce the above stacktrace. The code works fine with the ClojureCLR master branch. Other than that, most things seem to load up just fine. If there is anything I can do to help with debugging this, let me know.
ReplyDelete
Replies
aaroncJune 19, 2012 at 12:32 PM
My vote is to preserve the history. Always better to be able to go back and see what was done.

I'll see if I can look at the latest commit soon - will be going on vacation in a few days so it might be tough. Btw, did you get a chance to look at any of my changes?
ReplyDelete
Replies
David MillerJune 19, 2012 at 2:00 PM
There's some pretty ugly history in there. :) It was about 30 commits before I got the new code gen to work on core.clj. Then there were the tests. But they are mostly marked by WIP on the comment, so the two people who will look at it (you and me, I'm guessing) will know to go whistling past those commits.

My plan after moving the new code gen to master is first to catch up on the last four months of commits on ClojureJVM. This will allow me to get an official 1.4 release out. Your changes are next on the list.
ReplyDelete
Replies
aaroncAugust 23, 2012 at 6:10 PM
Hey Dave, did you ever get a chance to look at my changes? Maybe there's some way we could chat about how to best approach this? I'm wondering if maybe I should create a patch for you rather than trying to merge the branches.
ReplyDelete
Replies
David MillerAugust 25, 2012 at 7:57 AM
Next in the queue.
ReplyDelete
Replies
for ict 99November 5, 2016 at 10:31 PM
This comment has been removed by a blog administrator.
ReplyDelete
Replies

Add comment