From ML to Rust to guji: a lineage of type systems and pattern matching
How algebraic data types, exhaustive pattern matching, and type-directed error handling travelled from OCaml's research roots through Rust's systems pragmatism into guji's text-first, one-obvious-way design.
Some ideas are so good that they keep getting reinvented until everyone agrees they were obvious all along. The algebraic data type — a value that is exactly one of several labelled shapes — together with exhaustive pattern matching is one of those ideas. It was born in the ML family, hardened in OCaml, smuggled into systems programming by Rust, and arrives, polished and opinionated, in guji. This is the story of that idea across three languages.
OCaml: the research bloodline
OCaml descends directly from ML, the Meta Language Robin Milner built in the early 1970s for the LCF theorem prover. Its lineage at INRIA runs Caml Light (1990) → Caml Special Light (1995) → Objective Caml 1.00, announced on 9 May 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, and Didier Rémy. From ML it inherited the two pillars that define this whole family: Hindley–Milner type inference, which lets the compiler reconstruct nearly every type without annotations, and algebraic data types read out via pattern matching.
type shape =
  | Circle of float
  | Rect of float * float

let area = function
  | Circle r -> 3.14159 *. r *. r
  | Rect (w, h) -> w *. h
Two things are quietly radical here. First, no type annotations appear, yet
area is fully, statically typed: inference does the bookkeeping. Second, the
compiler checks the match for exhaustiveness — drop the Rect arm and you
get a warning that a case is unhandled. The data and the code that takes it apart
are kept honest by the type system. OCaml also leaned hard on option instead of
null and a powerful module system from Standard ML, ideas that took the wider
industry another two decades to adopt.
Rust: the idea goes to work
Rust took the ML family's algebraic toolbox out of the research lab and into
systems programming. Graydon Hoare's project reached its 1.0 release on 15 May
2015, and its enum is a true sum type in the ML tradition — only the syntax
puts on a C-shaped coat:
enum Shape {
    Circle(f64),
    Rect(f64, f64),
}

fn area(s: &Shape) -> f64 {
    match s {
        Shape::Circle(r) => 3.14159 * r * r,
        Shape::Rect(w, h) => w * h,
    }
}
match is still exhaustive — forget a variant and the program will not
compile — and inference, though local rather than whole-program, still spares
you most annotations. Rust's headline contribution, ownership and borrowing,
is orthogonal to all this; what matters for our lineage is what Rust did with two
ordinary library enums. Option<T> retired the null pointer, and Result<T, E>
made fallibility a value you must acknowledge:
fn parse_age(s: &str) -> Result<i64, String> {
    let n: i64 = s.parse().map_err(|_| "not a number".to_string())?;
    if n >= 0 { Ok(n) } else { Err("age must be non-negative".into()) }
}
That trailing ? is the punchline. It unwraps an Ok or short-circuits the
function with the Err, turning the old chain of manual error checks into a flat,
readable line. Rust proved that ML's type-driven discipline was not a luxury for
proof assistants but a practical way to make fast software that fails far less often at runtime.
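To make the short-circuit concrete, here is roughly what the ? in parse_age stands for when written out by hand. This is a sketch under stated assumptions: parse_age_manual is a hypothetical name used only for illustration, and the real operator also passes the error through a From conversion, which is left out here because the error type is already String.

```rust
// Hand-written equivalent of the `?` in `parse_age` (illustrative sketch).
// `parse_age_manual` is a hypothetical name, not part of the original example.
fn parse_age_manual(s: &str) -> Result<i64, String> {
    // Without `?`, the early return on failure has to be spelled out.
    let n: i64 = match s.parse::<i64>() {
        Ok(v) => v,
        Err(_) => return Err("not a number".to_string()),
    };
    if n >= 0 {
        Ok(n)
    } else {
        Err("age must be non-negative".into())
    }
}
```

Called with "42" this returns Ok(42); with "nope" it returns the "not a number" error from the early return; with "-3" the parse succeeds and the range check produces the "age must be non-negative" error.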
guji: opinion as a feature
guji picks up the torch and adds a thesis: one obvious way. Where Perl revelled in plurality — Larry Wall, defending the slogan as far back as a 1990 Usenet post, admitted, "Although the Perl Slogan is There's More Than One Way to Do It, I hesitate to make 10 ways to do something" — guji inverts the motto. For any given task there is meant to be exactly one idiomatic construct. The algebraic-data-type machinery survives the cut intact, because it earns its keep.
guji's enum and match are the family resemblance, sigils and all (bindings
carry $, @, % to declare their shape):
enum Shape {
    Circle($radius: Float)
    Rect($width: Float, $height: Float)
}

sub area($s: Shape): Float {
    match $s {
        Circle($r) { 3.14159 * $r * $r }
        Rect($w, $h) { $w * $h }
    }
}
Run through the v0 evaluator, area(Circle(2.0)) yields 12.56636 and
area(Rect(3.0, 4.0)) yields 12. As in Rust, and more strictly than OCaml, which only
warns, the compiler rejects a non-exhaustive match, naming the case you missed. Guards
ride along on the same arms:
sub classify($n: Int): Str {
    match $n {
        0 { "zero" }
        $x if $x < 0 { "negative" }
        _ { "positive" }
    }
}
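For comparison across the lineage, the same three-way classification can be sketched in Rust. This is an analogue for readers who know Rust, not guji code; the choice of i64 and a &'static str return type is an assumption made for the sketch.

```rust
// Rust counterpart to the guji `classify` above (comparison sketch only).
fn classify(n: i64) -> &'static str {
    match n {
        0 => "zero",
        // A guard narrows an arm with an extra boolean condition.
        x if x < 0 => "negative",
        // Guarded arms do not count toward exhaustiveness in Rust, so a
        // catch-all arm is still needed to cover the remaining values.
        _ => "positive",
    }
}
```

The arms are tried top to bottom, so 0 is caught by the literal arm before the guard is ever evaluated, and any value that reaches the final arm is known to be greater than zero.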
Error handling reads almost like Rust's, because the good idea needs no
improving. guji has no exceptions; absence and failure are the standard sum types
Option[T] and Result[T, E], and the postfix ? propagates an early return:
sub parse_age($s: Str): Result[Int, Str] {
    $n = parse_int($s)?
    if $n >= 0 { Ok($n) } else { Err("age must be non-negative") }
}
Feed it "42" and you get ok: 42; feed it "nope" and the ? carries the
parser's Err straight out as err: invalid integer: nope. Same railway, same
type-checked guarantees, only the boilerplate is gone.
What makes guji more than a tidier Rust is where it points the family's tools.
In OCaml and Rust, text processing is a library afterthought; in guji it is the
signature primitive. Regular expressions are a built-in Regex type with a
match operator ~~ that yields — what else — an Option[Match], so the very same
pattern-matching reflex handles parsing:
match $line ~~ /(?<user>\w+)@(?<host>\w+)/ {
    Some($m) { print("user: { $m<user>.unwrap_or('?') }") }
    None { print("no match") }
}
Run against 'ada@example.com' that prints user: ada. Above regexes sit
first-class PEG grammars (grammar, rule, token), whose parse returns a
Bush parse tree you walk with — naturally — match. The ML family's discipline,
turned on the one problem those languages always left to libraries.
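For contrast, the same "text operation yields an Option, then match on it" shape can be approximated in Rust with the standard library alone. This is a loose sketch, not an equivalent of the guji regex: split_once splits on the first '@' rather than searching for \w+ runs, and user_of and describe are hypothetical helper names. Rust's standard library has no regex type; the third-party regex crate fills that gap, and its captures method likewise returns an Option.

```rust
// Standard-library stand-in for the regex match above: `split_once`
// returns an Option, so the same Some/None match applies.
// `user_of` and `describe` are hypothetical helper names.
fn user_of(line: &str) -> Option<&str> {
    match line.split_once('@') {
        // Require a non-empty user and host, loosely mirroring `\w+`.
        Some((user, host)) if !user.is_empty() && !host.is_empty() => Some(user),
        _ => None,
    }
}

fn describe(line: &str) -> String {
    match user_of(line) {
        Some(user) => format!("user: {user}"),
        None => "no match".to_string(),
    }
}
```

Here describe("ada@example.com") produces "user: ada", while a line with no '@', or with nothing before it, produces "no match".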
The through-line
Three languages, thirty years, one idea refined at each step. OCaml proved that
algebraic data types plus exhaustive matching plus inference make a coherent,
provably sound core. Rust shipped that core into systems programming and showed
the world that Option and Result could replace null pointers and exceptions
in production. guji made it opinionated (immutable by default, one obvious way, the
whole apparatus aimed squarely at text) and verified, in a v0 tree-walking
evaluator you can run today, that the old ML guarantees still hold when you bend
them toward parsing. The syntax keeps changing its clothes. The idea underneath —
make the compiler check that you have handled every case — has been right the
whole time.