First of all, thanks so much to everyone who’s already become a patron of the project! (More info)
Now, for a few updates:
Dan Doel helped me out with implementing
let rec blocks. Dan and I worked together at Capital IQ on the Ermine language and I’ve been bouncing various insane ideas for Unison off him for a while. He’s a great person to work with, super easy-going, and has much more experience than me reading about and implementing type systems. We got together in the evening last week after work, added
let and sketched out how
let rec would work (the changes to the typechecker are somewhat subtle). I later finished up an initial (buggy) implementation, which Dan was also kind enough to help me debug. Everything looks to be working now, so I’m going to declare victory!
We also both took a look at the recent paper by Dunfield and Krishnaswami showing how to extend their earlier system (which the Unison typechecker is based on) to handle GADTs, pattern matching, and existential types. The new paper looks more complex, but still doable.
The current Unison typechecker, including the
let rec stuff just added, clocks in at 514 lines of code. Implementing the newer paper is probably going to increase that number, though the authors claim it is “only modestly more complex than our earlier paper” and it also would bring Unison’s type system and feature set up to relatively modern functional language standards. I’m hoping to coax Dan to starting tinkering on a GADTs branch while I work on migrating the Unison editor away from Elm. (If you have any words of encouragement for Dan, please do leave a comment!)
In Unison, terms, types, and type declarations are uniquely identified by a “nameless hash”. It’s “nameless” in the sense that what you call your variables doesn’t affect hashing. So given:
id1 x = x id2 y = y
id2 hash to the same thing and thus are considered the same term.
A bit more interesting is that hashing is normalized with respect to linking, so given:
id1 x = x id2 = id1
id2 will have the same hash. That is, we hash the term as if all references have already been linked into the expression.
Aside: We could also convert to eta normal form before hashing. I’m not positive that’s a good idea, but it would be simple to do.
Now, even more interesting is that hashing is normalized with respect to the order of bindings in a let rec block. In a
let rec block, every binding can reference every other binding so the order of these bindings is not significant (except perhaps as documentation). For example:
-- version 1 v1 = let rec ping x = pong (x + 1) pong y = ping (x - 1) in ping 0 -- version 2, notice order is flipped -- and variables are named differently v2 = let rec pong1 p = ping1 (p - 1) ping1 q = pong1 (q + 1) in ping1 0
In Unison, both these expressions have the same hash, even though the names of
pong are different, we use different parameter names, and the order of the bindings is different. We preserve the order of the bindings only for display purposes. The hash is what uniquely defines the term, and all references to terms are by hash, not by name.
Aside: Note that how
let rec are rendered in the editor is a separate issue from the language feature itself. We could render the bindings after the expression and the keyword
where, above, below, in a tooltip on hover, whatever! The language feature is the same.
More generally, the hashing scheme has a nice way of dealing with cycles of elements whose order is not significant (for instance, the order of constructors of a data type). I’ll write up some technical documentation on Unison’s hashing in a later post. There are some interesting implications for data types, because data types are sometimes identified by more than just their structure. (Consider a sorted
Map type, perhaps represented as a binary tree, but with additional invariants established and preserved by the functions which construct and operate on the tree.)
I hope this gives a little bit of a flavor for the Unison philosophy—anything which doesn’t affect the meaning of an entity (be it a type, term, or whatever) should generally not be part of its identity. Instead, extra information belongs as metadata, separate from identity. This greatly simplifies many many aspects of the Unison platform, since we never have to worry about two entities with different meanings colliding in the namespace. Arbitrary data types and functions can thus be transmitted between Unison nodes without worry about whether the remote recipient might have conflicting definitions for some of the dependencies referenced by the data or function sent. Much more on that in future posts.
I’ve decided I’m going to start by looking at GHCJS-based options. Why? Well, the other major contender was PureScript, which has a promising library for doing UI stuff called Halogen. PureScript’s type system is certainly powerful enough for my needs and the language seems really nice, but a non-Haskell language for the frontend isn’t really my first choice. Having an all-Haskell stack means no code duplication between the node and editor, and I can do computations wherever is most convenient or efficient rather than always preferring to run things on the node server just to avoid having to duplicate code in the editor.
So, I’ve done some cleanup and restructuring of the code to prep for experimenting with a GHCJS-based implementation of the Unison editor. I’ve split the Haskell code into two subprojects,
shared/ project has a minimal set of external dependencies. Anything in
shared/ can be referenced by either
node/ or (eventually)
editor/, and as a simple approximation, anything in
shared/ or it’s external dependencies is something that could get compiled to JS. This forces me to be very explicit about adding code or dependencies to the editor, and ensures there’s no unintentional mixing of stages. See the README for more details.
Splitting up the dependencies like this still doesn’t quite go as far as I’d like, though, because it’s pretty difficult to reason about what the dependencies are of external libraries, like stuff in base. For instance, I don’t know if this is still true, but at one point
Integer was/is used for the
Show Double instance (for some reason). It seems insane to compile all of
Integer to JS for whatever silly thing is being done in
Show Double. The user experience I’d prefer is to have to whitelist all the modules (and perhaps functions in these functions) getting compiled to JS, and get a compile error if I make use of something that transitively references anything not on the whitelist. This way I completely understand everything getting compiled to JS, and avoid accidentally doing something silly.
The issue isn’t just code size—I want to know what code is ending up in JS, especially since there’s something of an impedence mismatch between JS and the usual Haskell compilation targets and I may want to audit the result to make sure it’s what I generally expect. This isn’t really a worry when native compiling, since the Haskell runtime is (unlike JS) very much designed to efficiently execute Haskell code.
Anyway, I’d be curious how other people deal with this who have more experience with the GHCJS ecosystem.
One last update: the project also now has unit tests, made using tasty which generates nicely colored, hierarchical test output. Assuming you’ve gone through the instructions in the README the tests can be run from the
node/ directory via
cabal run tests.
For people who are interested in helping write some code for Unison, I’m sorry I’m not very organized on that front yet. But my intention is to try to organize the work into some well-defined sub-projects that interested folks besides me could work on. Please feel free to come say hello in the Gitter chat.
That’s all for now, more updates to follow!comments powered by Disqus