From ML to Rust to Guji: a lineage of type systems and pattern matching
How algebraic data types, exhaustive pattern matching, and type-directed error handling travelled from OCaml's research roots through Haskell's pure type classes and Rust's systems pragmatism into Guji's text-first, one-obvious-way design.
Some ideas are so good that they keep getting reinvented until everyone agrees they were obvious all along. The algebraic data type - a value that is exactly one of several labelled shapes - together with exhaustive pattern matching is one of those ideas. It was born in the ML family, hardened in OCaml, kept honest by Haskell's purity, smuggled into systems programming by Rust, and arrives, polished and opinionated, in Guji. This is the story of that idea across four languages.
OCaml: the research bloodline
OCaml descends directly from ML, the Meta Language Robin Milner built in the early 1970s for the LCF theorem prover. Its lineage at INRIA runs Caml Light (1990) → Caml Special Light (1995) → Objective Caml 1.00, announced on 9 May 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, and Didier Rémy. From ML it inherited the two pillars that define this whole family: Hindley–Milner type inference, which lets the compiler reconstruct nearly every type without annotations, and algebraic data types read out via pattern matching.
type shape =
| Circle of float
| Rect of float * float
let area = function
| Circle r -> 3.14159 *. r *. r
| Rect (w, h) -> w *. h
Two things are quietly radical here. First, no type annotations appear, yet
area is fully, statically typed: inference does the bookkeeping. Second, the
compiler checks the match for exhaustiveness - drop the Rect arm and you
get a warning that a case is unhandled. The data and the code that takes it apart
are kept honest by the type system. OCaml also leaned hard on option instead of
null and a powerful module system from Standard ML, ideas that took the wider
industry another two decades to adopt.
Haskell: purity and type classes
If OCaml is the ML family's pragmatic engineer, Haskell is its uncompromising
theorist. It began as a 1987 committee effort to unify the scattered lazy
functional languages of the day; the first report appeared in 1990, the
long-lived Haskell 98 standard fixed a stable target, and Haskell 2010 is
the current revision. It shares OCaml's two pillars - Hindley–Milner inference and
algebraic data types - so the Shape example is almost a transliteration, with
each equation pattern-matching its argument directly:
data Shape
= Circle Double
| Rect Double Double
area :: Shape -> Double
area (Circle r) = 3.14159 * r * r
area (Rect w h) = w * h
Compile with -Wincomplete-patterns and dropping the Rect equation is a
warning, exactly as in OCaml. But Haskell adds two ideas the rest of this lineage
quietly inherited. The first is type classes - the disciplined answer to
"how do I write == or show for many types without overloading chaos?" A class
declares an interface; instances supply it per type; the compiler threads the
right implementation in for you. Rust's trait, the derive attribute, and the
whole bounded-generics style are type classes wearing a systems coat, and Guji's
own ad-hoc polymorphism descends from the same idea.
The second is purity. A Haskell function of type String -> Either String Int
cannot secretly read a file or throw - effects live in the type (IO), so a
signature is an honest contract. That makes Maybe and Either not just
convenient but mandatory for ordinary failure, and do-notation gives the same
flat, early-return feel that Rust spells ?:
import Text.Read (readMaybe)
parseAge :: String -> Either String Int
parseAge s = case readMaybe s of
Nothing -> Left "not a number"
Just n | n >= 0 -> Right n
| otherwise -> Left "age must be non-negative"
parseAge "42" is Right 42; parseAge "nope" is Left "not a number". Same
railway, decades before Rust gave it a one-character operator.
Rust: the idea goes to work
Rust took the ML family's algebraic toolbox out of the research lab and into
systems programming. Graydon Hoare's project reached its 1.0 release on 15 May
2015, and its enum is a true sum type in the ML tradition - only the syntax
puts on a C-shaped coat:
enum Shape {
Circle(f64),
Rect(f64, f64),
}
fn area(s: &Shape) -> f64 {
match s {
Shape::Circle(r) => 3.14159 * r * r,
Shape::Rect(w, h) => w * h,
}
}
match is still exhaustive - forget a variant and the program will not
compile - and inference, though local rather than whole-program, still spares
you most annotations. Rust's headline contribution, ownership and borrowing,
is orthogonal to all this; what matters for our lineage is what Rust did with two
ordinary library enums. Option<T> retired the null pointer, and Result<T, E>
made fallibility a value you must acknowledge:
fn parse_age(s: &str) -> Result<i64, String> {
let n: i64 = s.parse().map_err(|_| "not a number".to_string())?;
if n >= 0 { Ok(n) } else { Err("age must be non-negative".into()) }
}
That trailing ? is the punchline. It unwraps an Ok or short-circuits the
function with the Err, turning the old chain of manual error checks into a flat,
readable line. Rust proved that ML's type-driven discipline was not a luxury for
proof assistants but a practical way to make fast software that does not crash.
Guji: opinion as a feature
Guji picks up the torch and adds a thesis: one obvious way. Where Perl revelled in plurality - Larry Wall, defending the slogan as far back as a 1990 Usenet post, admitted, "Although the Perl Slogan is There's More Than One Way to Do It, I hesitate to make 10 ways to do something" - Guji inverts the motto. For any given task there is meant to be exactly one idiomatic construct. The algebraic-data-type machinery survives the cut intact, because it earns its keep.
Guji's enum and match are the family resemblance, sigils and all (bindings
carry $, @, % to declare their shape):
enum Shape {
Circle($radius: Float)
Rect($width: Float, $height: Float)
}
sub area($s: Shape): Float {
match $s {
Circle($r) { 3.14159 * $r * $r }
Rect($w, $h) { $w * $h }
}
}
Run through the v0 evaluator, area(Circle(2.0)) yields 12.56636 and
area(Rect(3.0, 4.0)) yields 12 - and, exactly as in OCaml and Rust, the
compiler rejects a non-exhaustive match, naming the case you missed. Guards
ride along on the same arms:
sub classify($n: Int): Str {
match $n {
0 { "zero" }
$x if $x < 0 { "negative" }
_ { "positive" }
}
}
Error handling reads almost like Rust's, because the good idea needs no
improving. Guji has no exceptions; absence and failure are the standard sum types
Option[T] and Result[T, E], and the postfix ? propagates an early return:
sub parse_age($s: Str): Result[Int, Str] {
$n = parse_int($s)?
if $n >= 0 { Ok($n) } else { Err("age must be non-negative") }
}
Feed it "42" and you get ok: 42; feed it "nope" and the ? carries the
parser's Err straight out as err: invalid integer: nope. Same railway, same
type-checked guarantees, only the boilerplate is gone.
What makes Guji more than a tidier Rust is where it points the family's tools.
In OCaml and Rust, text processing is a library afterthought; in Guji it is the
signature primitive. Regular expressions are a built-in Regex type with a
match operator ~~ that yields - what else - an Option[Match], so the very same
pattern-matching reflex handles parsing:
match $line ~~ /(?<user>\w+)@(?<host>\w+)/ {
Some($m) { print("user: { $m<user>.unwrap_or('?') }") }
None { print("no match") }
}
Run against 'ada@example.com' that prints user: ada. Above regexes sit
first-class PEG grammars (grammar, rule, token), whose parse returns a
Bush parse tree you walk with - naturally - match. The ML family's discipline,
turned on the one problem those languages always left to libraries.
The through-line
Four languages, thirty-odd years, one idea refined at each step. OCaml proved
that algebraic data types plus exhaustive matching plus inference make a coherent,
provably-sound core. Haskell purified it - pushing effects into the type and
adding type classes for principled polymorphism, the seed of Rust's traits. Rust
shipped that core into systems programming and showed the world that Option
and Result could replace null pointers and exceptions in production. Guji
opinionated it - immutable by default, one obvious way, the
whole apparatus aimed squarely at text - and verified, in a v0 tree-walking
evaluator you can run today, that the old ML guarantees still hold when you bend
them toward parsing. The syntax keeps changing its clothes. The idea underneath -
make the compiler check that you have handled every case - has been right the
whole time.