Designing a fluent interface to a semantic editor



I’m at the point where I have all the machinery needed to start adding the ability to edit as well as view Unison expressions in the editor. However, rather than diving right into implementation (which I did before with the Elm-based version), I decided to take a step back and try to figure out a better story for fluent semantic editing.

Textual editing gives a feeling of directness and control. There’s basically a one-to-one correspondence between the editing inputs (type the ‘a’ key), and the default rendering of the model in response to those actions (the letter ‘a’ appearing in the editor). Like drawing a picture, painting, or sculpting clay, the user feels like they are creating via direct manipulation, without needing any deep training to get there. This is a good feeling to strive for even if raw textual editing isn’t the medium used.

On the other end of the spectrum we have creation via external controls, where one manipulates various knobs, switches, sliders, etc., whose effects on the model are nonobvious and require extensive training. Think of various software synthesizers, which are rather uncreatively modeled after real-life synths, down to the detailed 3D-ish rendering of individual knobs and controls. There are even fake screws in the panels!!

But the situation is actually more nuanced. A better way to look at this is that all ways of specifying information necessarily deal in symbols and encodings. According to this view, there’s no such thing as direct manipulation; it’s just that some symbolic encodings are close enough to an encoding we already know that there’s less of a learning curve! For example:

So the question isn’t “should we use arbitrary encodings?”, it’s “which arbitrary encoding can efficiently encode the interactions we want, and as a bonus is similar enough to some other known encoding that it’s easier to learn?” Let’s look at a couple of examples:

Optimizing purely for speed of short term learning isn’t a great goal. All else being equal, make things easier to learn, but long-term productivity is really important, especially for a tool a user is likely to invest a lot of time in.

Aside: I dispute that boxes-and-arrows style interfaces are a good way to make programming more learnable. These sorts of interfaces solve a couple problems pretty well—they eliminate a large class of cryptic compile and/or runtime errors by having a more constrained UI. That’s a good thing. But they end up creating other much worse problems. It’s better to solve the problem of cryptic compile and/or runtime errors more directly, keeping a more-or-less textual entry mode and just constraining the UI to prevent the user from specifying ill-typed or ill-formed programs.

Fluent semantic editing

I’m going to start just by listing off a set of primitive editing actions. Then I’ll explore how to expose these actions to the user in a way that feels fluent and is easy to learn. You might want to watch this old video to have some idea of what these actions could look like in the editor.

This action set is sufficient. Now, how do we make it feel fluent? Here are some thoughts:

There might be a lot of refactoring actions. We could just assign some arbitrary keyboard combo to every action. But that’s a pretty steep learning curve. I’d give keyboard shortcuts to a couple important ones (like eval and introduce binding), and just use search for the rest. For instance, the ‘float let binding out one level’ refactoring action might not need a separate key-binding, just the ability to search for actions, or it can be triggered via a natural gesture like a click and drag while the binding is selected. Likewise for things like renaming—just editing the node where the variable is bound (like the x in the x -> ... lambda) should automatically rename.

This ability to search for actions seems to raise the need for a third mode. Or does it? Proliferation of modes is complicated and makes the UI harder to learn. Better to just co-opt the explorer for this. When the explorer comes up, applicable replacement expressions as well as actions will be shown, and filtering will search for both.
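To make the "co-opt the explorer" idea concrete, here is a rough sketch of an explorer whose result list mixes replacement expressions and actions, with a single filter that searches both. It is only an illustration under assumptions: the `Entry` type, the `filter_entries` function, the sample entries, and the case-insensitive substring matching are all hypothetical, not how the Unison editor actually works.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    """One row in the explorer (hypothetical model)."""
    kind: str   # "expression" for a replacement, "action" for a refactoring
    label: str  # the text the user sees and filters on


def filter_entries(entries: List[Entry], query: str) -> List[Entry]:
    """Return entries whose label contains the query, ignoring case.

    Expressions and actions go through the same filter, so no third
    mode is needed to search for actions.
    """
    q = query.strip().lower()
    return [e for e in entries if q in e.label.lower()]


# Sample entries, invented for illustration only.
ENTRIES = [
    Entry("expression", "f _ _"),
    Entry("expression", "toFloat"),
    Entry("action", "eval"),
    Entry("action", "introduce binding"),
    Entry("action", "float let binding out one level"),
]

if __name__ == "__main__":
    for entry in filter_entries(ENTRIES, "float"):
        print(f"[{entry.kind}] {entry.label}")
```

Typing `float` here surfaces both the `toFloat` expression and the 'float let binding out one level' action in one list, which is the point: one text box, one filter, two kinds of results.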

open, close, cancel and filter already feel pretty nice. Navigate around with the arrow keys or home keys, press enter to open the explorer, filter down results, enter again to accept. What isn’t so nice is all the jumping back and forth between modes. For instance, to write the expression f x (y + 1), we have to write f, select f _ _ in the explorer, then move right, then enter to open the explorer to fill in the first blank, fill in x, hit enter, move right, hit enter again… that’s a lot of mode-shifting and pressing enter. Another efficiency problem is that when the explorer is open, the user has a text box active, so hjkl-style navigation has to be disabled or accessed via a modifier of some sort. Jumping back and forth between the arrow keys and the home row for typing names of identifiers is a bit burdensome.

Switching modes is fine if you’re making isolated edits here and there, but when you’re entering a large composite expression it gets a bit tedious. This leads to an idea: when the explorer is open, there should be an action for accept and advance, which accepts the current selection, navigates to the ‘next’ location, and reopens the explorer, all in one motion. In the old editor, I had a version of this triggered by simply typing two spaces. When it worked, it was pretty nice: you’d type enough letters of the identifier to make it the only selection (or you could keep typing out the full identifier if you wanted), then hit space twice and continue typing. The problem was that the ‘next’ location might not be the location you wanted, and now you had an explorer popped up in the wrong location that you first needed to close before you could navigate anywhere. Not very fluent.
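As a rough model of accept and advance, the sketch below treats an application like `f _ _` as a flat list of slots and defines 'next' as the nearest unfilled blank to the right. That definition of 'next' is an assumption made for the example (it is exactly the part that can go wrong, as described above), and the `Editor` class and its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Editor:
    """Toy editor state: a row of slots, a focus, and an explorer flag."""
    slots: List[Optional[str]]   # None marks an unfilled blank
    focus: int = 0               # index of the selected slot
    explorer_open: bool = False

    def open_explorer(self) -> None:
        self.explorer_open = True

    def cancel(self) -> None:
        self.explorer_open = False

    def accept(self, choice: str) -> None:
        """Plain accept: fill the focused slot and return to navigation."""
        if not self.explorer_open:
            raise RuntimeError("explorer is not open")
        self.slots[self.focus] = choice
        self.explorer_open = False

    def accept_and_advance(self, choice: str) -> None:
        """Accept, move to the next blank, and reopen the explorer there."""
        self.accept(choice)
        nxt = self._next_blank()
        if nxt is not None:
            self.focus = nxt
            self.explorer_open = True

    def _next_blank(self) -> Optional[int]:
        # Assumption: 'next' means the first unfilled slot to the right.
        for i in range(self.focus + 1, len(self.slots)):
            if self.slots[i] is None:
                return i
        return None


if __name__ == "__main__":
    ed = Editor(slots=["f", None, None], focus=1)
    ed.open_explorer()
    ed.accept_and_advance("x")       # fills the first blank, jumps to the second
    ed.accept_and_advance("y + 1")   # fills the last blank, explorer stays closed
    print(ed.slots, ed.focus, ed.explorer_open)
```

With plain `accept`, each blank costs a separate move plus an explicit open. With `accept_and_advance`, filling both arguments of `f` takes two accepts and no manual navigation, as long as the guessed 'next' location is the one you wanted.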

Let’s go back to thinking about text for a minute. With text, you can both edit and navigate within the same mode. You type an identifier name, move a few characters over, enter a number, a comma, move back a few characters, etc. Is this important? Maybe not: in Vim, I almost always hit Ctrl+C (with caps lock mapped to control) to exit insert mode before navigating, even if I’m moving over just a couple of characters. The mode shift is so fast that it almost doesn’t feel like anything, and many of the movements use right-hand keys, so a move left or right is a quick right-then-left tap tap. It’s all home row or close to the home row.

In contrast, pressing enter feels like a huge interruption. It’s just one key-width further, but you have to hit it with your pinky and take your fingers off their normal typing position.

Something else that’s annoying about the mode switching is that the explorer keeps popping open and closing. Even if the mode switches are fast (perhaps ‘;’ is mapped to the ‘cancel’ action in the explorer—as long as ‘;’ can’t be part of an identifier), visually, it’s kind of distracting to have an explorer pop up in the wrong spot with a bunch of information you don’t care about, only to be instantly dismissed.

This gives me several ideas:

Teasing this apart a bit further:

Let’s recap the full set of editor actions, which are simple enough to explain and demonstrate to just about anyone in 5-10 minutes:

Crucially, if you have a fully-formed, well-typed program in your head and need no feedback from the compiler, there is never any point at which you are forced to parse any information in the explorer. Anything you type has a predictable-in-advance effect, and you can mentally pipeline the work you have to do, much like ordinary text editing.

Sounds promising, but there’s only one way to know how it will feel, and that’s to implement it!
