If you look through the search engine code, you might notice it’s organized differently than the typical microservice-based backend. We have functions we’ve written, and we have nodes, but we don’t have (micro)services per se. There’s actually a shift in perspective here that I think is important, and it’s somewhat subtle.
In the typical microservice-based backend, the decision of how to carve the overall software system up into microservices is an important architectural decision, and it’s often made early on in the development process. Why is that? Because microservices have a network boundary around them, and communicating across that boundary (say, calling another microservice, or the front-end calling into a microservice) frequently involves a lot of (rather boring) special-purpose code which is particular to the microservice. There’s also (rather boring) plumbing code if you’re doing things like sending JSON over HTTP. Of course, there are various RPC frameworks that try to ease the burden (I worked on one) but they don’t go far enough in making it totally seamless to transport arbitrary values (including functions with arbitrary dependencies) around between nodes. And going 80% of the way there is not sufficient because you still end up needing to write special-purpose code at service boundaries.
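To make the "boring plumbing" concrete, here is a small Python sketch (all names hypothetical) of the kind of hand-written serialization code a JSON-over-HTTP service boundary typically demands, code that exists only because a value is crossing the network:

```python
import json
from dataclasses import dataclass

# A value we'd like to send to another service. In-process, we'd just
# pass it to a function; across a network boundary, we must hand-write
# (or generate) encoding and decoding code for it.

@dataclass
class SearchRequest:
    query: str
    max_results: int

def encode_request(req: SearchRequest) -> str:
    # Special-purpose code, particular to this one service's API.
    return json.dumps({"query": req.query, "maxResults": req.max_results})

def decode_request(payload: str) -> SearchRequest:
    # The mirror image, maintained by hand on the other side of the wire.
    data = json.loads(payload)
    return SearchRequest(query=data["query"], max_results=data["maxResults"])

# Round trip: the two functions must agree exactly, forever.
original = SearchRequest(query="unison", max_results=10)
assert decode_request(encode_request(original)) == original
```

None of this code does anything interesting; it exists only because the boundary is physical. Change the shape of `SearchRequest` and both sides must be updated in lockstep.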
So in part due to technical constraints of our languages, we are somewhat forced into making these architectural decisions early on. If you get the decomposition ‘wrong’, changing it later is painful, and you lose a lot of the usual guarantees you’d have when doing other refactorings: within the same OS process, if you update a function’s signature, you get compile errors in a static language… but if you modify the signature of a microservice’s API and don’t update all use sites or keep both versions running, you get runtime errors! The internet is full of stories of people starting out with a system decomposition they eventually outgrow and later painfully migrate to a different decomposition. (For instance, think of all the stories of migrating from a monolithic Rails app to a more microservice-based architecture.)
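That failure mode can be sketched in a few lines of Python (service and field names hypothetical): the service evolves its payload shape, and an un-updated client still deploys fine but fails at runtime, whereas an in-process signature change would have been caught by the compiler or type checker:

```python
import json

# Version 1 of a service's response, and client code written against it.
def client_parse_v1(payload: str) -> str:
    return json.loads(payload)["user_name"]

# Later, the service team renames the field in version 2 of the API.
v2_payload = json.dumps({"userName": "alice"})

# The stale client only fails when a v2 payload actually arrives.
try:
    client_parse_v1(v2_payload)
except KeyError as e:
    print(f"runtime failure at the service boundary: missing {e}")
```

Had `client_parse_v1` been calling an in-process function whose signature changed, this mismatch would have been an error at build time, not in production.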
In a single OS process and a single programming language, you also decompose the overall system functionality, not into microservices, but into different parts of the code, and hopefully this code is loosely coupled, modular, well-encapsulated, etc. That is, we still have ‘boundaries’ between parts of the system, even in a single OS process; it’s just that these boundaries aren’t forced on us by a network call. They are boundaries that we as programmers create, based on what we think will be useful, what we think makes the system easy to understand, evolve, and so on. And because these boundaries are enforced just by the code, rather than by any sort of physical communication barrier, the cost of changing how the code is factored within a single OS process is much lower! There are no physical walls to move around, just some code to shuffle around, and a type system to check that you’ve shuffled the code around in a way that still makes sense!
Aside: Notice that even in a single OS process, there are other constraints on factoring of the code. Our choices have consequences—depending on how we factor the code, we might support more or less parallel processing, more or less copying of data, etc.
So here is the point: boundaries between bits of code are important and useful. Forcing people to write ad hoc communication code at these boundaries is not. Besides just the cost in terms of LOC and complexity, it imposes an undue cost on moving between factorings, and that makes it harder to stave off technical debt.
This is not a new idea. Let’s take a look. In this post on microkernels, we see the argument that microkernels (which enforce runtime separation) lose to ‘monolithic’ kernels that just use regular compile-time modularity:
Brilliant operating system designers have argued that microkernels can simplify software development because factoring an operating system into chunks that are isolated at runtime allows to make each component simpler. But the interesting constant when you choose between ways to factor your system and compare the resulting complexity is not the number of components, but the overall functionality that the system does or doesn’t provide. Given the desired functionality, run-time isolation vastly increases the programmer-time and run-time complexity of the overall system by introducing context switches and marshalling between chunks of equivalent functionality across the two factorings. Compile-time modularity solves the problem better; given an expressive enough static type system, it can provide much finer-grained robustness than run-time isolation, without any of the run-time or programmer-time cost. And even without such a type system, the simplicity of the design allows for much fewer bugs, whereas the absence of communication barriers allows for higher-performance strategies. Hence HURD being an undebuggable joke whereas Linux is a fast, robust system.
The highlighted sentences are fascinating, and the point goes beyond OS design—we should be wary of introducing “run-time isolation” (or perhaps more generally any sort of hard communication barrier) in an effort to achieve modularity and encapsulation. Programming languages, and especially languages with nice type systems, already give us all the tools we need to encapsulate and make systems modular.
More recently, in the cloud computing world, we’ve seen the rise of microservices. In one sense, it’s a positive development: people now understand that when you are building a multi-node system that scales, you must explicitly conceive of it as a distributed system and architect accordingly. Before that, people hoped that could be avoided. Here’s a brief, biased history of cloud computing:
But, something has been lost. With the rise of microservices, we’re now sticking physical barriers between parts of our code as a way of enforcing modularity. As this post argues, echoing the point made above regarding microkernels, “you don’t need to introduce a network boundary as an excuse to write better code”:
The simple fact of the matter is that microservices, nor any approach for modeling a technical stack, are a requirement for writing cleaner or more maintainable code. It is true that since there are less pieces involved, your ability to write lazy or poorly thought out code decreases, however this is like saying you can solve crime by removing desirable items from store fronts. You haven’t fixed the problem, you’ve simply removed many of your options.
The Unison approach is to eliminate as much as possible any needless friction in defining multi-node systems or in moving between factorings of those systems, but also to have a very explicit API. Remote effects are tracked, and the programmer has full control over where data is stored and where computations take place. So we don’t pretend we are writing a single-node system, but we also don’t deal with the particulars of communication. We specify that a computation must hop to another node, but don’t get down to the level of sending network requests or parsing/serializing JSON. Any boundaries between ‘services’ are handled using ordinary compile-time encapsulation, in the Unison language.
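Unison supports this natively. As a rough analogy in Python (the "node" here is simulated, not real networking, and all names are hypothetical), the ideal is that moving a computation to another node is one explicit step, with transport of arguments handled uniformly rather than by per-service plumbing:

```python
import pickle

# A computation we want to run "elsewhere".
def word_count(text: str) -> int:
    return len(text.split())

# In this toy, each "node" knows its code by name; pickle handles the
# arguments uniformly. (Unison goes further: it can ship the definition
# itself, including its dependencies, so no registry is needed.)
REGISTRY = {"word_count": word_count}

def run_at(node: dict, fn_name: str, *args):
    """Explicitly hop to `node`: one generic serialization step,
    with no per-API encode/decode plumbing written by hand."""
    wire_bytes = pickle.dumps(args)            # uniform transport
    received_args = pickle.loads(wire_bytes)   # "arriving on the node"
    node["jobs_run"] = node.get("jobs_run", 0) + 1
    return REGISTRY[fn_name](*received_args)

node_b = {"name": "node-b"}
print(run_at(node_b, "word_count", "distributed systems without plumbing"))  # 4
```

The hop is explicit (we name the node), but the mechanics of getting the data there are not our problem, which mirrors the split the paragraph above describes: explicit placement, no hand-written communication code.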
Though the API is explicit, we can write more declarative code by introducing abstractions. We build these abstractions using ordinary Unison code that can be inspected and tweaked, not via an autoscaling (often proprietary) black box with whatever knobs Amazon or Google decides to expose.
Aside: I think there’s nothing inherently wrong with trying to use some sort of autoscaling black box. These black boxes will always have some usages they fit quite well and for which they’ll be highly productive. The risk with using them is it’s often very difficult to predict, at the outset of a project, whether everything you’ll need to do can fit conveniently inside the box, and migrating out of the box later can be extremely painful. I also somewhat dislike these black boxes because they often end up being more complicated than advertised, and require lots of special purpose knowledge to use effectively. (And often, even if you have an example that does XYZ, it’s completely unclear how to modify your code so it does something slightly different.)
The tech we use today for building distributed systems forces a very rigid structure on us. Your language can’t talk well about provisioning and deployment, so that’s always handled via a totally separate phase, using separate tools. Your language can’t conveniently move data and computations between nodes with zero friction, so you have to work within some pre-established microservice skeleton you decide on in advance. These constraints have huge costs, both in terms of the amount of boilerplate and complexity they foist on us, and in terms of the engineering costs of making changes to our software systems. Without all these constraints, you have a lot of freedom to factor your distributed systems in Unison however you like. And I have lots of questions about how to organize distributed Unison programs.
But the questions I have are ‘the good kind’ of questions: what will I do with all this newfound freedom? And: What will I spend my time building, now that I’m not mired in writing boring plumbing code?