Monday, January 23, 2012

Compiling and loading in ClojureCLR

Wherein I document environment variables and other factors influencing compiling and loading files in ClojureCLR and how ClojureCLR differs from Clojure in this regard.

Compiler variables

During AOT-compilation, the following vars are consulted to control aspects of the compilation process:

Var / Doc says

*compile-path*: Specifies the directory where 'compile' will write out .class files. This directory must be in the classpath for 'compile' to work. Defaults to "classes"

*unchecked-math*: While bound to true, compilations of +, -, *, inc, dec and the coercions will be done without overflow checks. Default: false.

*warn-on-reflection*: When set to true, the compiler will emit warnings when reflection is needed to resolve Java method calls or field accesses. Defaults to false.

If you compile by invoking the compile function, such as from a REPL, you will have had a chance to set these vars to appropriate values. However, when compiling from the command line by running Clojure.Compile.exe, you do not have a chance to run Clojure code to initialize these vars. Instead, you can set environment variables to initialize these vars prior to compilation.

Clojure works the same way, although there the settings are read as Java system properties rather than environment variables. In fact, ClojureCLR and Clojure used the same names for these settings until just recently. Starting with the 1.4.0-alpha5 release (already in the master branch), ClojureCLR has changed the environment variable names to be strictly POSIX-compliant. This is due to problems with periods in environment variable names in Cygwin's bash -- see this thread for more information. Here are the names:

Clojure & older ClojureCLR            New in ClojureCLR
clojure.compile.path                  CLOJURE_COMPILE_PATH
clojure.compile.unchecked-math        CLOJURE_COMPILE_UNCHECKED_MATH
clojure.compile.warn-on-reflection    CLOJURE_COMPILE_WARN_ON_REFLECTION

BTW, ClojureCLR defaults *compile-path* to ".".  "classes" didn't seem to make sense given that ClojureCLR creates assemblies.
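
To make the relationship between the environment variables and the vars concrete, here is a minimal ClojureCLR sketch that binds the three vars from the environment before calling compile. It is not how Clojure.Compile.exe is implemented: env-or and compile-with-env are hypothetical helpers, and treating only the string "true" as true is an assumption.

```clojure
;; Sketch only (ClojureCLR): seed the compiler vars from the environment.
;; `env-or` and `compile-with-env` are hypothetical helpers, not part of ClojureCLR.

(defn env-or
  "Returns the value of environment variable `var-name`, or `default` if it is unset."
  [var-name default]
  (or (System.Environment/GetEnvironmentVariable var-name) default))

(defn compile-with-env
  "Compiles `lib` with the compiler vars bound from the environment.
  Assumption: a boolean setting is on only when its value is exactly \"true\"."
  [lib]
  (binding [*compile-path*       (env-or "CLOJURE_COMPILE_PATH" ".")
            *unchecked-math*     (= "true" (env-or "CLOJURE_COMPILE_UNCHECKED_MATH" "false"))
            *warn-on-reflection* (= "true" (env-or "CLOJURE_COMPILE_WARN_ON_REFLECTION" "false"))]
    (compile lib)))
```

From a REPL, (compile-with-env 'a.b.c) would then compile the lib a.b.c, provided its source can be found as described in the next section.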

Locating files

For identifying libraries for loading, Clojure relates the symbol naming the library to a Java package name and uses Java's mapping of package name to a classpath-relative path. For example, evaluating (compile 'a.b.c) causes Clojure to look for a file a/b/c.clj relative to some root listed on the classpath.  The result of the compilation will be a set of classfiles, written to classes/a/b/c.

ClojureCLR follows Clojure in mapping dotted symbol names to relative paths.  Not having classpaths, ClojureCLR instead uses the value of the environment variable CLOJURE_LOAD_PATH to supply roots for the file probes. In addition, it will look (first) in the current directory and the directory of the entry assembly.

The same holds for load, use, require and other lib-loading functions.
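
The symbol-to-path mapping can be sketched in a few lines of Clojure. This is an illustration, not the code ClojureCLR actually runs: lib->relative-path and candidate-paths are hypothetical names, the roots are passed in explicitly rather than read from CLOJURE_LOAD_PATH, and details such as character munging are ignored.

```clojure
(require '[clojure.string :as str])

(defn lib->relative-path
  "Maps a dotted lib symbol to a relative source path, e.g. a.b.c to a/b/c.clj."
  [lib]
  (str (str/replace (name lib) "." "/") ".clj"))

(defn candidate-paths
  "Returns the files to probe for `lib`, one per root, in root order."
  [roots lib]
  (let [rel (lib->relative-path lib)]
    (mapv #(str % "/" rel) roots)))

(lib->relative-path 'a.b.c)
;=> "a/b/c.clj"

;; "/opt/libs" stands in for a root taken from CLOJURE_LOAD_PATH.
(candidate-paths ["." "/opt/libs"] 'a.b.c)
;=> ["./a/b/c.clj" "/opt/libs/a/b/c.clj"]
```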

Assembly output

The Clojure compiler outputs (many) class files.  The ClojureCLR compiler outputs (not as many) assemblies.  All classes resulting from (compile 'a.b.c) will go into an assembly named a.b.c.clj.dll located in *compile-path*.

When evaluating (load "a/b/c"),  ClojureCLR will look for both <AppDomain.CurrentDomain.BaseDirectory>\a.b.c.clj.dll and <any_load_path_root>\a\b\c.clj, and load the assembly if it exists and has a timestamp newer than the .clj file (if it exists).  At the moment the same set of roots (as named above) is used for assemblies and source code.  

AppDomain.CurrentDomain.BaseDirectory is used as the root for ClojureCLR assembly probes as that is also the CLR's root for resolving assembly references.  
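
The naming rule and the timestamp comparison described above can be written down as two small functions. This is a sketch under assumptions: assembly-name and choose-load-target are hypothetical names, timestamps are plain numbers, and nil stands for a file that does not exist.

```clojure
(require '[clojure.string :as str])

(defn assembly-name
  "Maps a load path such as \"a/b/c\" to the assembly file name a.b.c.clj.dll."
  [load-path]
  (str (str/replace load-path "/" ".") ".clj.dll"))

(defn choose-load-target
  "Picks what to load, given the timestamps of the assembly and the source file.
  The assembly wins only if it exists and is strictly newer than any source."
  [assembly-time source-time]
  (cond
    (and assembly-time
         (or (nil? source-time) (> assembly-time source-time))) :assembly
    source-time :source
    :else       :not-found))

(assembly-name "a/b/c")       ;=> "a.b.c.clj.dll"
(choose-load-target 200 100)  ;=> :assembly   (assembly newer than source)
(choose-load-target 100 200)  ;=> :source     (source edited after compiling)
(choose-load-target 100 nil)  ;=> :assembly   (no source file present)
(choose-load-target nil nil)  ;=> :not-found
```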

Too many assemblies

Each file loaded during compilation will go into its own assembly.  I find this terribly inelegant.  The distribution for ClojureCLR itself needs Clojure.Main.exe, Clojure.Compile.exe, and the DLR support assemblies, of course, but also thirty-plus assemblies resulting from compiling the Clojure source that defines the initial environment. The pprint lib alone contributes eight assemblies.  They are not really independent.  Conceivably, all of that code could go into one assembly.

I've not been able to think of a way to make this work.  I know that the eight files making up pprint are related.  They get compiled because the main pprint file loads each of them, and loading a file while compiling causes that file to be compiled also.  I could very easily write the compiler to output the code into the same assembly as the parent.  However, pprint could load support code that should not be part of its assembly, that should have its own assembly.  In fact, it does;  pprint loads clojure.walk.  It happens to do this with a :use clause in its ns form, but it doesn't have to.  Without a mechanism in Clojure that allows us to distinguish these uses of load, I'm afraid we're stuck with some inelegance.

10 comments:

  1. So, I think resolving this "inelegance" is a key to making clojure-clr really attractive to the .NET community. I see one workable solution here: http://lispetc.posterous.com/consolidating-clojureclr-assemblies. I can think of a few other potential solutions.

    First of all, is there a reason why multiple Clojure namespaces could not be packaged to the same DLL or even in a DLL containing other .NET code? If there is a standard convention for naming the initializer classes for a namespace, then a global Type.GetType() could be called in RT.load() to look for the namespace in all loaded assemblies. (Possibly all assemblies listed in Assembly.GetReferencedAssemblies() could be eagerly loaded so that the standard .NET mechanism for referencing assemblies can be used to load namespaces).

    Also, RT.load() could look for embedded resources with the extensions .clj and .clj.dll in all loaded assemblies. In order to reduce load times, RT.load() could possibly look for namespaces using the quickest method first (possibly Type.GetType()).

    I think these options would allow for some elegant ways of distributing clojure-clr itself and also .NET assemblies containing clojure code.

    Anyway, let me know what you think. I'd be willing to help with some of this if there is a community consensus as to what to do.

    Replies
    1. Ralph Moritz (the author at lispetc.posterous.com that you cite) has already posted a pull request to the ClojureCLR github with changes along the lines you and he suggest. It's a small change, easy enough to just make yourself and play with. Feedback welcome.

      I plan to get to it 'soon'. I'm smack in the middle of rewriting the whole code gen phase of the ClojureCLR compiler and don't want to get distracted from completing it -- it's a slog.

    2. Ok, sounds good. I have been playing with his fork a little. I think I'll make some of the additional changes I'm mentioning and push them to github.

    3. I pushed code to github which does everything I've mentioned above except for using Type.GetType() to resolve namespaces (I'm not sure how I'd do this or if this is even possible at this point). My fork is here: https://github.com/aaronc/clojure-clr. There's also a little utility that uses Mono.Cecil to embed the files into Clojure.dll. Anyway, when it's a good time, maybe you can take a look at this code too.

    4. I look forward to playing with it.

    5. Hi David, so Ralph looked at my code and suggested that we use this version instead of the one he proposed. What is the best way to do that - by doing a github pull request or by submitting an issue on JIRA? I know that Clojure, in general, discourages pull requests.

      Also, I added a few other things to my branch that make it play really nice with C# projects. I changed the name of the __Init__ classes so that the .clj.dll's can be ILMerge'd and I actually have a build of Clojure.dll this way. Also, I added a little feature that allows namespaces to be remapped to a different root directory (i.e. the MyCompany.MyProject namespace could map to the MyProject folder where a C# project is living that has the default namespace MyCompany.MyProject). This way, I can reload the .clj files directly from disk when at the repl and then have my app load the embedded .clj's from the DLL in production. Does that make sense?

      Anyway, so far ClojureCLR has been working great for me and when it's a good time, maybe we could chat a little about how I could help get the toolchain up to par. I'd love to see Clojure.dll on nuget soon and am willing to help towards that effort.

    6. First question: do you have a CA on file with Rich?

    7. I've been meaning to get around to that. I just put it in the mail this morning.

    8. That's great. The time to process after receipt has been quite variable. Let me know when you're listed and I'll be able to pull in the code. (In the meantime, I will take a closer look.)

    9. So it appears that I am now on the contributor list - didn't receive any notification, but my name is there now. Did you get a chance to look at any of the changes? By now I have made several modifications and also merged them with your nodlr branch. Would you suggest possibly submitting a pull request or creating a JIRA ticket with patches?
