Tuesday, January 3, 2012

Referring to types

The basics for referring to types for CLR interop are the same for ClojureCLR as for Clojure on the JVM. I will assume you are familiar with interop as covered in http://clojure.org/java_interop or your favorite Clojure intro.

Standard Clojure allows use of the symbols int, double, float, etc. in type hints to refer to the corresponding primitive types. ClojureCLR allows this and extends it to the numeric types present in the CLR but not in the JVM: uint, ulong, etc. Similarly, the shorthand array references such as ints and doubles work, and are joined by uints, ulongs, etc.

The CLR is not C#.
Do not let the presence of int and company for type hinting put you in a C# frame of mind. When specifying generic types, you cannot use C# notation:

System.Collections.Generic.IList<int>

Instead, you must use the actual CLR type name:

System.Collections.Generic.IList`1[System.Int32]

Remember that float becomes System.Single; I've had to dope-slap myself on that one a few times.
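One way to check a CLR type name before relying on it is to pass it, as a string, to the standard .NET method System.Type.GetType. The following is a minimal sketch, assuming a ClojureCLR REPL; the var names are made up for illustration. Inside a string literal the backquote and brackets cause no reader trouble.

```clojure
;; Sketch only: resolve CLR type names given as strings.
;; Type.GetType returns nil when the name cannot be resolved.
(def int-list-type
  (System.Type/GetType "System.Collections.Generic.IList`1[System.Int32]"))

;; C# 'float' is System.Single in the CLR.
(def float-list-type
  (System.Type/GetType "System.Collections.Generic.IList`1[System.Single]"))

;; The C# spelling is not a CLR type name. Any exception raised while
;; parsing it is treated here as "not found".
(def csharp-style
  (try
    (System.Type/GetType "System.Collections.Generic.IList<int>")
    (catch Exception _ nil)))

(println (.IsGenericType int-list-type))
(println (.FullName (first (.GetGenericArguments float-list-type))))
(println (nil? csharp-style))
```

This only confirms that the runtime accepts the name; it says nothing yet about how to write that name as a symbol in source code.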

Clojure uses symbols to refer to types. This works on the JVM because package-qualified class names are lexically compatible with symbols. Not so here. The backquote and square brackets in the type name shown above cannot be part of a symbol name. If you type that string of characters into the REPL, you will get

user=> System.Collections.Generic.IList`1[System.Int32]
CompilerException System.InvalidOperationException:
  Unable to resolve symbol: System.Collections.Generic.IList in this context
   at ...
1
[System.Int32]

The input string is parsed as separate entities:

System.Collections.Generic.IList
`1
[System.Int32]

Not what was intended.

In addition to backquotes and square brackets, a fully-qualified type name can contain an assembly identifier--that involves spaces and commas. In fact, CLR typenames can contain arbitrary characters. Backslashes can escape characters that do have special meaning in the typename syntax (comma, plus, ampersand, asterisk, left and right square bracket, left and right angle bracket, backslash).

To allow symbols to contain arbitrary characters, ClojureCLR extends the reader syntax using "vertical bar quoting". Vertical bars are used in pairs to surround the name or a part of the name of a symbol. Any characters between the vertical bars are taken to be part of the symbol name. For example,

|A(B)|
A|(|B|)|
A|(B)|

all mean the symbol whose name consists of the four characters A, (, B, and ).  I consider only the first one to be readable; quoting the entire name is to be preferred.  To quote the IList example above, you would write

|System.Collections.Generic.IList`1[System.Int32]|

To include a vertical bar in a symbol name that is |-quoted, use a doubled vertical bar.

|This has a vertical bar in the name ... || ...<>@#$@#$#$|
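The quoting rules described above can be checked at a ClojureCLR REPL. This is a small sketch, not part of any library; the var name spellings is just for illustration, and the forms will not read on JVM Clojure.

```clojure
;; Sketch only; requires the ClojureCLR reader.
;; Three spellings of the symbol whose name is the four characters A ( B )
(def spellings ['|A(B)| 'A|(|B|)| 'A|(B)|])

(println (map name spellings))   ; the name of each spelling
(println (apply = spellings))    ; whether all three are the same symbol

;; A doubled vertical bar inside |-quotes stands for a single bar.
(println (name '|a || b|))
```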

With this mechanism we can make a symbol for a fully-qualified typename, such as:

(|com.myco.mytype+nested, MyAssembly, Version=1.3.0.0, Culture=neutral, PublicKeyToken=b14a123334343434|/DoSomething x y)

or

(reify 
 |AnInterface`2[System.Int32,System.String]| 
 (m1 [x] ...)
 I2
 (m2 [x] ...))
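For something that can be tried directly, here is a minimal sketch that instantiates two generic collection types from the base class library through |-quoted symbols. The var names are illustrative, and the arguments are coerced with int on the assumption that this helps interop pick the Int32 overload.

```clojure
;; Sketch only: construct closed generic types via |-quoted type symbols.
(def int-list (new |System.Collections.Generic.List`1[System.Int32]|))
(.Add int-list (int 1))
(.Add int-list (int 2))
(println (.Count int-list))

;; Two type parameters: note `2 and the comma-separated arguments.
(def table
  (new |System.Collections.Generic.Dictionary`2[System.String,System.Object]|))
(.Add table "answer" 42)
(println (.ContainsKey table "answer"))
```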


There are a number of things you should note about |-quoting and about generic type references.

First, what |-quoting does is prevent characters from stopping the token scan. Checks on symbol validity that follow token scanning are still in effect. These include rules against starting with a digit, against a misplaced non-initial colon, and a few others. When scanning A(B), the left parenthesis stops scanning the token that begins with A.  When scanning |A(B)|, the left and right parentheses do not stop the scan.  Scanning |ab:|, on the other hand, is the same as scanning ab:, a colon being a perfectly fine token constituent.  The colon at the end is a no-no, though, so the token is rejected and the reader throws an exception.

(I could have taken a more radical approach and allowed |ab:|.   One then gets into all kinds of edge cases that I didn't want to solve.  I feel that a more radical quoting approach requires consultation and agreement with the Clojure powers-that-be.)


Second, be careful with namespaces for symbols. Any / appearing |-quoted does not count as a namespace/name separator. If you have special characters in both the namespace name and the symbol name, you must |-quote each one separately. Thus,

(namespace 'ab|cd/ef|gh)    ;=> nil
(name 'ab|cd/ef|gh)         ;=> "abcd/efgh"
 
(namespace 'ab/cd|ef/gh|ij) ;=> "ab"
(name 'ab/cd|ef/gh|ij)      ;=> "cdef/ghij"

Rather than

ab/cd|ef/gh|ij

it would be more readable to write

ab/|cdef/ghij|
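A quick way to convince yourself that the two spellings denote the same symbol is to compare them. A sketch, with illustrative var names, assuming the ClojureCLR reader:

```clojure
;; Sketch only; requires the ClojureCLR reader.
(def scattered 'ab/cd|ef/gh|ij)   ; quoting scattered through the name
(def grouped   'ab/|cdef/ghij|)   ; the whole name part quoted once

(println (= scattered grouped))
(println (namespace grouped))
(println (name grouped))
```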

Third, you will usually need to fully namespace-qualify generic types and their parameters.  For example,

|System.Collections.Generic.IList`1[System.Int32]|

works as a type reference while

|System.Collections.Generic.IList`1[Int32]|

does not.

Also, aliasing via import is not of much help.  After eval'ing

(import '|System.Collections.Generic.IList`1|)

the symbol |IList`1| will refer in the current namespace to the generic type |System.Collections.Generic.IList`1|, but that is of no help in referring to instantiated IList types. You cannot then refer to

|IList`1[System.Int32]|

Perhaps someday we will introduce a more compositional approach to generic types and symbols that will accommodate this.
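In the meantime, one possible workaround is to compose the instantiated type at run time with the standard .NET reflection method Type.MakeGenericType. The sketch below assumes a ClojureCLR REPL; make-generic is a hypothetical helper, not something ClojureCLR provides. The result is a Type object rather than a symbol, so it is of use for reflection but not in places where the compiler expects a type name.

```clojure
;; Sketch only: build a closed generic type from an imported open one.
(import '|System.Collections.Generic.List`1|)

;; Hypothetical helper: close an open generic type over the given type arguments.
(defn make-generic
  [^System.Type open-type & type-args]
  (.MakeGenericType open-type (into-array System.Type type-args)))

(def int-list-type (make-generic |List`1| System.Int32))

;; Compare against the fully spelled-out, |-quoted type reference.
(println (= int-list-type
            |System.Collections.Generic.List`1[System.Int32]|))
```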

Fourth, if you are familiar with |-quoting in Common Lisp, the ClojureCLR mechanism is not as inclusive.   In CL you could include a literal vertical bar in a symbol name with backslash-escaping: abc\|123 has name “abc|123”. CL has  \-escaping for characters in symbol tokens; ClojureCLR does not.


Finally, note that when printing with *print-dup* true, symbols with 'bad' characters will be |-quoted.
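A sketch of what that looks like, assuming a ClojureCLR REPL (the var name sym is illustrative). The point of *print-dup* output is that it can be read back, so the round trip is checked as well.

```clojure
;; Sketch only; requires the ClojureCLR reader and printer.
(def sym '|A(B)|)

;; Printed form with *print-dup* bound to true.
(def dup-text (binding [*print-dup* true] (pr-str sym)))
(println dup-text)

;; Reading the printed form back should give the original symbol.
(println (= sym (read-string dup-text)))
```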

1 comment:

  1. David,

    You couldn't have picked a better day to publish this post. It was exactly what I needed to get around passing a Dictionary parameter to a RabbitMQ method. Thanks!

    -Rob