← History

From ML to Rust to Guji: a lineage of type systems and pattern matching

How algebraic data types, exhaustive pattern matching, and type-directed error handling travelled from OCaml's research roots through Haskell's pure type classes and Rust's systems pragmatism into Guji's text-first, one-obvious-way design.

OCamlHaskellRustGuji

Some ideas are so good that they keep getting reinvented until everyone agrees they were obvious all along. The algebraic data type - a value that is exactly one of several labelled shapes - together with exhaustive pattern matching is one of those ideas. It was born in the ML family, hardened in OCaml, kept honest by Haskell's purity, smuggled into systems programming by Rust, and arrives, polished and opinionated, in Guji. This is the story of that idea across four languages.

OCaml: the research bloodline

OCaml descends directly from ML, the Meta Language Robin Milner built in the early 1970s for the LCF theorem prover. Its lineage at INRIA runs Caml Light (1990) → Caml Special Light (1995) → Objective Caml 1.00, announced on 9 May 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, and Didier Rémy. From ML it inherited the two pillars that define this whole family: Hindley–Milner type inference, which lets the compiler reconstruct nearly every type without annotations, and algebraic data types read out via pattern matching.

type shape =
  | Circle of float
  | Rect   of float * float

let area = function
  | Circle r      -> 3.14159 *. r *. r
  | Rect (w, h)   -> w *. h

Two things are quietly radical here. First, no type annotations appear, yet area is fully, statically typed: inference does the bookkeeping. Second, the compiler checks the match for exhaustiveness - drop the Rect arm and you get a warning that a case is unhandled. The data and the code that takes it apart are kept honest by the type system. OCaml also leaned hard on option instead of null and a powerful module system from Standard ML, ideas that took the wider industry another two decades to adopt.

Haskell: purity and type classes

If OCaml is the ML family's pragmatic engineer, Haskell is its uncompromising theorist. It began as a 1987 committee effort to unify the scattered lazy functional languages of the day; the first report appeared in 1990, the long-lived Haskell 98 standard fixed a stable target, and Haskell 2010 is the current revision. It shares OCaml's two pillars - Hindley–Milner inference and algebraic data types - so the Shape example is almost a transliteration, with each equation pattern-matching its argument directly:

data Shape
  = Circle Double
  | Rect Double Double

area :: Shape -> Double
area (Circle r)   = 3.14159 * r * r
area (Rect w h)   = w * h

Compile with -Wincomplete-patterns and dropping the Rect equation is a warning, exactly as in OCaml. But Haskell adds two ideas the rest of this lineage quietly inherited. The first is type classes - the disciplined answer to "how do I write == or show for many types without overloading chaos?" A class declares an interface; instances supply it per type; the compiler threads the right implementation in for you. Rust's trait, the derive attribute, and the whole bounded-generics style are type classes wearing a systems coat, and Guji's own ad-hoc polymorphism descends from the same idea.

The second is purity. A Haskell function of type String -> Either String Int cannot secretly read a file or throw - effects live in the type (IO), so a signature is an honest contract. That makes Maybe and Either not just convenient but mandatory for ordinary failure, and do-notation gives the same flat, early-return feel that Rust spells ?:

import Text.Read (readMaybe)

parseAge :: String -> Either String Int
parseAge s = case readMaybe s of
  Nothing            -> Left "not a number"
  Just n | n >= 0    -> Right n
         | otherwise -> Left "age must be non-negative"

parseAge "42" is Right 42; parseAge "nope" is Left "not a number". Same railway, decades before Rust gave it a one-character operator.

Rust: the idea goes to work

Rust took the ML family's algebraic toolbox out of the research lab and into systems programming. Graydon Hoare's project reached its 1.0 release on 15 May 2015, and its enum is a true sum type in the ML tradition - only the syntax puts on a C-shaped coat:

enum Shape {
    Circle(f64),
    Rect(f64, f64),
}

fn area(s: &Shape) -> f64 {
    match s {
        Shape::Circle(r)    => 3.14159 * r * r,
        Shape::Rect(w, h)   => w * h,
    }
}

match is still exhaustive - forget a variant and the program will not compile - and inference, though local rather than whole-program, still spares you most annotations. Rust's headline contribution, ownership and borrowing, is orthogonal to all this; what matters for our lineage is what Rust did with two ordinary library enums. Option<T> retired the null pointer, and Result<T, E> made fallibility a value you must acknowledge:

fn parse_age(s: &str) -> Result<i64, String> {
    let n: i64 = s.parse().map_err(|_| "not a number".to_string())?;
    if n >= 0 { Ok(n) } else { Err("age must be non-negative".into()) }
}

That trailing ? is the punchline. It unwraps an Ok or short-circuits the function with the Err, turning the old chain of manual error checks into a flat, readable line. Rust proved that ML's type-driven discipline was not a luxury for proof assistants but a practical way to make fast software that does not crash.

Guji: opinion as a feature

Guji picks up the torch and adds a thesis: one obvious way. Where Perl revelled in plurality - Larry Wall, defending the slogan as far back as a 1990 Usenet post, admitted, "Although the Perl Slogan is There's More Than One Way to Do It, I hesitate to make 10 ways to do something" - Guji inverts the motto. For any given task there is meant to be exactly one idiomatic construct. The algebraic-data-type machinery survives the cut intact, because it earns its keep.

Guji's enum and match are the family resemblance, sigils and all (bindings carry $, @, % to declare their shape):

enum Shape {
    Circle($radius: Float)
    Rect($width: Float, $height: Float)
}

sub area($s: Shape): Float {
    match $s {
        Circle($r)   { 3.14159 * $r * $r }
        Rect($w, $h) { $w * $h }
    }
}

Run through the v0 evaluator, area(Circle(2.0)) yields 12.56636 and area(Rect(3.0, 4.0)) yields 12 - and, exactly as in OCaml and Rust, the compiler rejects a non-exhaustive match, naming the case you missed. Guards ride along on the same arms:

sub classify($n: Int): Str {
    match $n {
        0            { "zero" }
        $x if $x < 0 { "negative" }
        _            { "positive" }
    }
}

Error handling reads almost like Rust's, because the good idea needs no improving. Guji has no exceptions; absence and failure are the standard sum types Option[T] and Result[T, E], and the postfix ? propagates an early return:

sub parse_age($s: Str): Result[Int, Str] {
    $n = parse_int($s)?
    if $n >= 0 { Ok($n) } else { Err("age must be non-negative") }
}

Feed it "42" and you get ok: 42; feed it "nope" and the ? carries the parser's Err straight out as err: invalid integer: nope. Same railway, same type-checked guarantees, only the boilerplate is gone.

What makes Guji more than a tidier Rust is where it points the family's tools. In OCaml and Rust, text processing is a library afterthought; in Guji it is the signature primitive. Regular expressions are a built-in Regex type with a match operator ~~ that yields - what else - an Option[Match], so the very same pattern-matching reflex handles parsing:

match $line ~~ /(?<user>\w+)@(?<host>\w+)/ {
    Some($m) { print("user: { $m<user>.unwrap_or('?') }") }
    None     { print("no match") }
}

Run against 'ada@example.com' that prints user: ada. Above regexes sit first-class PEG grammars (grammar, rule, token), whose parse returns a Bush parse tree you walk with - naturally - match. The ML family's discipline, turned on the one problem those languages always left to libraries.

The through-line

Four languages, thirty-odd years, one idea refined at each step. OCaml proved that algebraic data types plus exhaustive matching plus inference make a coherent, provably-sound core. Haskell purified it - pushing effects into the type and adding type classes for principled polymorphism, the seed of Rust's traits. Rust shipped that core into systems programming and showed the world that Option and Result could replace null pointers and exceptions in production. Guji opinionated it - immutable by default, one obvious way, the whole apparatus aimed squarely at text - and verified, in a v0 tree-walking evaluator you can run today, that the old ML guarantees still hold when you bend them toward parsing. The syntax keeps changing its clothes. The idea underneath - make the compiler check that you have handled every case - has been right the whole time.