The ML Family
The story of ML: Robin Milner's meta-language for the LCF theorem prover, the birth of Hindley-Milner type inference, and the algebraic-data-type-plus-pattern-matching paradigm that travelled through Standard ML and OCaml into Rust, Guji, and the wider world.
Origins: Milner, LCF, and a Meta Language
How ML was born as the meta-language of a 1970s theorem prover and accidentally became a landmark in programming languages.
Origins: Milner, LCF, and a Meta Language
ML did not begin as a programming language in the usual sense. It began as a tool inside a proof assistant. In 1973 the logician and computer scientist Robin Milner arrived at the University of Edinburgh, where (with research assistants Lockwood Morris and Malcolm Newey) he set out to build LCF - Logic for Computable Functions - a system for doing machine-checked mathematical proofs. The result, Edinburgh LCF, was documented in the 1979 book Edinburgh LCF by Milner, Michael Gordon, and Christopher Wadsworth.
The problem Milner faced is the reason ML exists. In a theorem prover you want users to write proof tactics - programs that build proofs - but you absolutely cannot let those programs forge a theorem that is not actually true. The solution was an abstract type thm (theorem) whose values could only be produced by the system's sound inference rules. To express tactics, Milner needed a small, higher-order, statically typed language to act as the meta-language sitting above the object logic. He called it ML, for Meta Language.
So ML's defining features were not bolted on for elegance; each one solved a concrete need of the LCF project:
- Higher-order functions, because tactics compose other tactics.
- A static type system that guarantees soundness, because
thmmust be unforgeable. - Type inference, because nobody wants to annotate the type of every tactic by hand.
- Type-safe exceptions, because a failing proof search must recover cleanly.
- Polymorphism, because a
mapor a list-reversal should work for any element type, not be rewritten per type.
That combination - first appearing around 1973-1978 and crystallised in Edinburgh LCF (1979) - turned out to be one of the most influential designs in the history of programming languages. ML was, in effect, the first language with an automatically inferred, polymorphic, statically sound type system, and it married that to algebraic data types and pattern matching. Everything else in this track is a consequence of that starting point.
Here is the through-line in one tiny modern descendant. This is Guji, a present-day functional-first language whose enum and match are direct grandchildren of Milner's design:
enum Shape {
Circle($radius: Float)
Rect($width: Float, $height: Float)
}
sub area($s: Shape): Float {
match $s {
Circle($r) { 3.14159 * $r * $r }
Rect($w, $h) { $w * $h }
}
}
sub main(): Int {
print(area(Circle(2.0))) # 12.56636
print(area(Rect(3.0, 4.0))) # 12
0
}
The shape Shape is exactly one of two cases; the only way to read its contents is to match, which the compiler forces you to do completely. That is the ML idea in miniature, and we will trace it forward through Standard ML, OCaml, Rust, and Guji.
A historical note worth holding onto: Milner did not stop at ML. He went on to win the 1991 ACM Turing Award for three contributions - LCF, ML, and the theory of concurrency (CCS, later the pi-calculus). ML was, in a sense, the language he built to do something else, and it outlived its original purpose entirely.
Sources
Hindley-Milner: Types Without Annotations
The type-inference algorithm at the heart of the ML family, where it came from, and why it feels like dynamic typing with a static safety net.
Hindley-Milner: Types Without Annotations
The single most famous technical idea ML gave the world is Hindley-Milner type inference (HM). It lets a compiler reconstruct the type of almost every expression with no annotations at all, while still rejecting every type error at compile time. You write code that looks dynamically typed, and you get the guarantees of a static type system.
Where it came from
The ideas accreted over twenty years:
- In 1958, Haskell Curry and Robert Feys described type inference for the simply-typed lambda calculus.
- In 1969, J. Roger Hindley proved that there is always a most general ("principal") type, and gave an algorithm to find it.
- In 1978, Robin Milner - working independently, for ML - published the algorithm now universally called Algorithm W, and introduced let-polymorphism.
- In 1982, Luis Damas (with Milner) gave the full soundness-and-completeness proof. This is why you will sometimes see the system called Damas-Hindley-Milner or Damas-Milner.
The system is, precisely, the lambda calculus with parametric polymorphism, where polymorphic generalisation happens at let bindings. That last clause - let-polymorphism - is the clever restriction that keeps inference decidable while still letting a single definition be reused at many types.
How it feels in practice
Inference means the most general type is found automatically. Take the identity function. In an ML-family language you would write something like let id x = x, and the compiler infers the type 'a -> 'a: "for any type 'a, takes an 'a and returns an 'a." You never said the word "type."
Guji is statically typed with local-and-complete inference in exactly this spirit - annotations are required only on exported (pub) declarations, and are otherwise optional. Here is a fully type-checked program with no type annotations in the body at all, run on the Guji v0 evaluator:
sub main(): Int {
@nums = [1, 2, 3, 4, 5, 6]
$evens_doubled = @nums.filter({ $_ % 2 == 0 }).map({ $_ * 2 }).sum()
print($evens_doubled) # 24
@names = ["ada", "grace", "alan"]
@greetings = @names.map({ "hi $_" })
print(@greetings.join(", ")) # hi ada, hi grace, hi alan
0
}
The compiler works out that @nums is a List[Int], that the lambda { $_ % 2 == 0 } takes an Int and returns a Bool, that sum() yields an Int, and that @greetings is a List[Str]. Get any of it wrong - say, try to add a Str to an Int - and it is a compile error, not a runtime crash.
The principal-type guarantee, intuitively
The deep property HM gives you is that every well-typed expression has a single most general type, and inference always finds it. There is no guessing and no ambiguity to resolve by hand. A useful contrast: dynamic languages have no such checking. The same pipeline in Python looks similar but is only checked when it runs:
nums = [1, 2, 3, 4, 5, 6]
evens_doubled = sum(n * 2 for n in nums if n % 2 == 0)
print(evens_doubled) # 24, but a typo like n * "2" only blows up at runtime
Both print 24. Only the ML-family version proved, before running, that no type error was possible. That is the trade Milner's 1978 algorithm made permanent: the convenience of inference with the certainty of static types.
The catch HM accepts in return is that it is not fully general - let-polymorphism, not first-class (higher-rank) polymorphism, is what keeps inference decidable. Languages in this family that want more (rank-N types, GADTs) generally re-introduce a few annotations exactly where inference would otherwise become undecidable. That is a recurring theme: the ML family takes inference as far as it provably can go, and asks for help only at the edges.
Sources
The ML Paradigm I: Algebraic Data Types and Pattern Matching
Sum types, product types, and the exhaustive match that ties them together - the signature shape of ML-family programs.
The ML Paradigm I: Algebraic Data Types and Pattern Matching
If type inference is ML's most famous technical idea, then algebraic data types (ADTs) read out via exhaustive pattern matching are its most famous design idea - the one you feel in every line of ML-family code.
Two ways to combine data
An algebraic data type is built from two operations, named by analogy with algebra:
- A product type says a value holds all of these at once. A point has both an
xand ay. In the ML family this is a record or tuple; in Guji it is aclass:
class Account {
has $.owner: Str
has $.balance: Float
}
- A sum type says a value is exactly one of these. A shape is either a circle or a rectangle, never both, never neither. This is the
enum(in Standard ML and OCaml, a variant or datatype):
enum Shape {
Circle($radius: Float)
Rect($width: Float, $height: Float)
}
The power comes from nesting sums and products freely. A binary tree is a sum (Leaf or Node) whose Node is a product of two sub-trees - and it can be generic over the element type, which we will return to in the next lesson:
enum Tree[T] {
Leaf($value: T)
Node($left: Tree[T], $right: Tree[T])
}
Pattern matching: the only way in
Here is the crucial discipline. In the ML family, the only way to get the contents of a sum value back out is to match on it. You cannot ask a Shape "are you a circle?" and then blindly read a .radius that might not exist. You match, and in each arm the relevant fields are bound for you:
sub area($s: Shape): Float {
match $s {
Circle($r) { 3.14159 * $r * $r }
Rect($w, $h) { $w * $h }
}
}
Run on the Guji v0 evaluator, area(Circle(2.0)) gives 12.56636 and area(Rect(3.0, 4.0)) gives 12.
Exhaustiveness: the compiler keeps you honest
The feature that makes this safe rather than merely tidy is exhaustiveness checking. The compiler verifies that your arms cover every possible case. Delete the Rect arm above and an ML-family compiler refuses the program, telling you which case you forgot. This is the original LCF discipline - handle every case or the code does not compile - and it turns a whole category of "I forgot the new variant" bugs into compile errors.
Patterns also nest structurally and can carry guards - an extra boolean condition on an arm:
sub classify($n: Int): Str {
match $n {
0 { "zero" }
$x if $x < 0 { "negative" }
_ { "positive" }
}
}
classify(0) is "zero", classify(-3) is "negative", classify(7) is "positive". The _ wildcard catches "everything else," which is also how you satisfy exhaustiveness for open-ended types like Int.
Why mainstream languages copied this
For decades, the imperative world used null, type tags, and instanceof/switch ladders that the compiler could not verify. ADTs plus exhaustive match replace all of that with a single construct the compiler can verify. The idea was so plainly right that it kept getting re-imported: Rust's enum and match (2015), Scala's sealed trait plus pattern matching, Swift's enum, and even Python 3.10's structural pattern matching (the match/case statement, 2021) are all the same idea wearing different clothes:
# Python 3.10+ borrows the shape, but without compiler-enforced exhaustiveness
match shape:
case Circle(radius=r):
return 3.14159 * r * r
case Rect(width=w, height=h):
return w * h
The Python version runs the same way, but the compiler does not force you to cover every case - that guarantee is exactly what the ML family contributes. Algebraic data types plus exhaustive matching are the load-bearing wall of the entire paradigm.
Sources
The ML Paradigm II: Polymorphism, Functions, and Modules
Parametric polymorphism, first-class functions, and Standard ML's signatures-and-functors module system - write-once code, generic over types and over whole structures.
The ML Paradigm II: Polymorphism, Functions, and Modules
ADTs and pattern matching give the ML family its shape. Three further ingredients give it its reach: parametric polymorphism, first-class functions, and a module system powerful enough to be generic over entire structures.
Parametric polymorphism: one definition, every type
Parametric polymorphism means a single definition works uniformly for any type, because it never inspects the concrete type - it just shuffles values around. Reversing a list, taking its length, or swapping a pair is the same code whether the elements are integers or strings. The type variable ('a in ML and OCaml, T in Guji and Rust) stands for "any type."
sub swap[T](@pair: List[T]): List[T] {
[@pair[1], @pair[0]]
}
sub last[T](@xs: List[T]): Option[T] {
if @xs.is_empty() { None } else { Some(@xs[@xs.count() - 1]) }
}
sub main(): Int {
print(swap([1, 2])) # [2, 1]
print(swap(["a", "b"])) # [b, a]
print(last([10, 20, 30]).unwrap_or(0)) # 30
0
}
One swap, used at List[Int] and List[Str] with no overloading and no casting. This is the same polymorphism Milner formalised in 1978; it is what let-polymorphism (previous lesson) was invented to make inferable.
Functions are values
In the ML family, functions are first-class: you pass them, return them, and store them like any other value. This is what makes map, filter, and reduce possible - they take the operation as an argument. Guji follows the same rule, and its uniform call syntax lets you chain transformations left to right:
@nums.filter({ $_ % 2 == 0 }).map({ $_ * 2 }).sum()
That filter/map/sum pipeline is pure ML-family lineage: build the answer by transforming data through functions rather than mutating state in a loop. ML programs are typically immutable by default - you compute new values instead of changing old ones - and Guji makes that the rule rather than a style choice.
Modules: ML's quiet superpower
The ingredient outsiders most often miss is the module system, contributed chiefly by David MacQueen for Standard ML. Its three pieces:
- A structure is a module: a bundle of types and values that implement something.
- A signature is an interface: a specification a structure must satisfy. (This is the static counterpart of a structure.)
- A functor is a module parameterised by another module - a function from structures to structures, resolved entirely at compile time.
Functors let you write code generic not just over a type but over a whole structure. The classic example is building a Set generic over any ordering: Set.Make(IntOrder) produces an integer set, Set.Make(StringOrder) a string set, from one definition. That is a level of abstraction ordinary generics cannot express, and it is why ML modules are still studied as a high-water mark of language design.
Standard ML pinned all of this down formally. The Definition of Standard ML (Milner, Tofte, Harper - 1990; revised with MacQueen in 1997) is famous for giving a complete, mathematical operational and static semantics for a real language, modules and all. Very few languages have ever been specified that rigorously.
Guji is a small v0 language and does not yet have functors, but it shows the same instinct at the file level: a source file is a module, top-level declarations are private by default, and pub exports them with mandatory type annotations to form a stable interface:
import text::email
pub sub area($s: Shape): Float { ... } # exported, fully annotated
sub helper($x: Int): Int { ... } # module-private
The signatures-and-functors idea is the ambitious end of the same spectrum: control, precisely and statically, what a unit of code exposes and what it depends on.
Sources
OCaml: The ML Family Goes Industrial
How the Caml lineage at Inria turned ML from a proof-assistant tool into a fast, multi-paradigm, production language.
OCaml: The ML Family Goes Industrial
Standard ML gave the family its definition. OCaml gave it a career. Where SML stayed largely in academia, OCaml became the ML dialect that ships trading systems, proof assistants, and compilers - including, famously, the first Rust compiler.
The Caml lineage
OCaml is the current member of the Caml family, developed at Inria (France's national computer-science research institute):
- Caml (1987) - the original, named for the Categorical Abstract Machine it first targeted (the team leaned into the camel mascot anyway).
- Caml Light (1990, Damien Doligez and Xavier Leroy) - a small, portable bytecode implementation with a fast generational garbage collector, compact enough for the teaching machines of the era.
- Caml Special Light (1995) - added a native-code compiler and, crucially, a full module system with functors in the Standard ML tradition.
- Objective Caml 1.00 - announced 9 May 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, and Didier Rémy. It grafted a sound, type-inferred object-oriented layer onto the language - a genuine piece of type-theory research, because objects and Hindley-Milner inference are hard to reconcile.
- In 2011, Objective Caml was simply renamed OCaml.
From ML it inherited the two pillars of the whole family: Hindley-Milner inference and algebraic data types read out by pattern matching. The canonical example is the same area we have seen in Guji, now in OCaml's own syntax:
type shape =
| Circle of float
| Rect of float * float
let area = function
| Circle r -> 3.14159 *. r *. r
| Rect (w, h) -> w *. h
Two things are quietly radical. First, no type annotations appear, yet area is fully statically typed - inference does the bookkeeping. Second, the compiler checks the match for exhaustiveness: drop the Rect arm and you get a warning that a case is unhandled. (OCaml also uses option instead of null, an idea the mainstream took two more decades to adopt.)
Multi-paradigm without losing the core
OCaml's enduring character is that it is functional first but not functional only. On top of the pure ML core it adds, when you need them, mutable records and references, exceptions, arrays, and a sound object system - escape hatches that keep it practical. And it compiles to fast native code with a low-latency garbage collector, plus a bytecode compiler and a REPL (utop) for interactive work.
Milestones that mattered:
- OCaml 4.0 (2012) added GADTs (generalized algebraic data types) and first-class modules, letting modules be passed around as ordinary values.
- Jane Street, a trading firm, writes essentially its entire stack in OCaml and open-sourced the Core standard library and the Dune build system - the single biggest force behind the modern toolchain.
- OCaml 5.0 (December 2022) delivered the Multicore runtime rewrite: real shared-memory parallelism via domains, plus effect handlers - typed, resumable control flow that underpins lightweight concurrency without monad gymnastics. Adding parallelism and algebraic effects to a 25-year-old language while keeping old code working was a rare feat.
OCaml proved the thesis the whole track is building toward: ML's type-driven discipline is not a luxury for proof assistants; it is a practical way to write fast software that does not crash. OCaml powered the Coq/Rocq proof assistant, the Astrée analyser used to verify Airbus flight-control software, MirageOS unikernels, and - the detail that sets up the next lesson - Graydon Hoare's original Rust compiler.
Sources
The Diaspora: Rust and the ML Ideas Everywhere
How Option, Result, exhaustive enums, and inference escaped academia into Rust, Haskell, Scala, Swift, F#, and Elm.
The Diaspora: Rust and the ML Ideas Everywhere
By the 2010s the ML family's best ideas had stopped being "functional-programming features" and started being just good language design. The clearest carrier into the mainstream is Rust, whose enum, match, Option, and Result are ML genes in a systems-programming body.
Rust: ML's toolbox at work
Graydon Hoare's project reached 1.0 on 15 May 2015. Its first compiler was written in OCaml, and the inheritance shows. A Rust enum is a true ML-style sum type - only the syntax wears a C-shaped coat:
enum Shape {
Circle(f64),
Rect(f64, f64),
}
fn area(s: &Shape) -> f64 {
match s {
Shape::Circle(r) => 3.14159 * r * r,
Shape::Rect(w, h) => w * h,
}
}
match is still exhaustive - forget a variant and the program will not compile - and inference, though local rather than whole-program, still spares you most annotations.
Rust's headline contribution, ownership and borrowing, is orthogonal to all this. What matters for our lineage is what Rust did with two ordinary library enums. Option<T> retired the null pointer, and Result<T, E> made fallibility a value you must acknowledge:
fn parse_age(s: &str) -> Result<i64, String> {
let n: i64 = s.parse().map_err(|_| "not a number".to_string())?;
if n >= 0 { Ok(n) } else { Err("age must be non-negative".into()) }
}
That trailing ? is the punchline: it unwraps an Ok or short-circuits the function with the Err, collapsing a chain of manual error checks into one flat line. Both Option and Result are pure ML inheritance - they are the option and result/either sum types of Standard ML and OCaml, now load-bearing in millions of lines of production systems code.
The wider family
Rust is the loudest example, not the only one. ML's influence runs through a whole generation of languages, each taking the algebraic-data-type-plus-inference core and aiming it somewhere new:
- Haskell (1990) - took ML's types and pushed them to purity, type classes, and lazy evaluation. Same
datadeclarations, same exhaustive case analysis. - F# (2005, Microsoft) - essentially OCaml for .NET; discriminated unions, records, and inference, straight from the Caml line.
- Scala -
sealed traithierarchies plus pattern matching bring ADTs to the JVM. - Swift - Apple's
enumwith associated values and exhaustiveswitchis ML pattern matching for app developers. - Elm - a purely functional ML descendant for the browser, famous for "if it compiles, it works," powered by exhaustive
caseand inference. - TypeScript, Kotlin, even Python 3.10+ - discriminated unions and
match/caseare the same idea diffusing into languages that never set out to be functional.
The pattern across all of them is identical: make the compiler check that every case is handled, replace null with an explicit "maybe," and make failure a value rather than a hidden exception. Those three moves are Milner's 1970s discipline, finally mainstream.
Where this leaves us
Three things travelled out of the ML family and changed how everyone programs: algebraic data types, exhaustive pattern matching, and type inference, with Option/Result riding along as the practical payoff. OCaml proved the core was sound and fast; Rust shipped it into systems programming; the diaspora made it ordinary. The last lesson looks at a present-day descendant - Guji - that takes these inherited ideas and points them somewhere the family historically neglected.
Sources
Guji: An ML Descendant Pointed at Text
How Guji inherits the family's ADTs, exhaustive match, inference, and Option/Result, then adds one obvious way and a text-first thesis.
Guji: An ML Descendant Pointed at Text
To close the track, here is the lineage arriving in a language you can run today. Guji is a compiled, statically typed, functional-first language - and it is unmistakably ML family. It takes the inherited apparatus intact and then makes two deliberate choices the older members did not: one obvious way, and text as the signature concern.
What Guji inherits, unchanged
Everything in this track shows up in Guji:
- Algebraic data types -
enumfor sums,classfor products, freely nested and generic. - Exhaustive pattern matching -
matchthat the compiler rejects unless every case is covered. - Type inference - local and complete inside a function body; annotations required only on
pubexports. Option[T]andResult[T, E]as the standard sum types, with a postfix?operator for early-return error propagation - the same railway as Rust and OCaml.- Immutable by default - bindings cannot be reassigned unless marked
mut; methods that "modify" a value return a new one.
The error-handling story reads almost exactly like Rust's, because the good idea needed no improving. Run on the Guji v0 evaluator:
sub parse_age($s: Str): Result[Int, Str] {
$n = parse_int($s)?
if $n >= 0 { Ok($n) } else { Err("age must be non-negative") }
}
sub describe($s: Str): Str {
match parse_age($s) {
Ok($n) { "ok: $n" }
Err($e) { "err: $e" }
}
}
describe("42") yields ok: 42; describe("nope") yields err: invalid integer: nope (the ? carries the parser's Err straight out); describe("-5") yields err: age must be non-negative. Same type-checked guarantees as its ancestors, none of the boilerplate.
Option chaining is the family's flat-map in method-call clothes:
sub half($n: Int): Option[Int] {
if $n % 2 == 0 { Some($n / 2) } else { None }
}
# half(8)=Some(4), half(4)=Some(2), +100 -> 102
half(8).and_then(sub($x) { half($x) }).map(sub($x) { $x + 100 }).unwrap_or(-1)
What Guji sharpens: one obvious way
Where Perl's culture celebrated There's More Than One Way To Do It, Guji inverts the motto: for any given task there is meant to be exactly one idiomatic construct. The algebraic-data-type machinery survives the cut precisely because it earns its keep - there is no second, redundant way to take a sum type apart, only match. Immutability is the default rather than an opt-in. The language deliberately omits overlapping syntax. It is the ML family's discipline applied to the surface of the language, not just its type system.
What Guji adds: text as a first-class concern
Here is the genuinely new aim. In OCaml and Rust, text processing is a library afterthought. In Guji it is the signature primitive. Regular expressions are a built-in Regex type, and the match operator ~~ returns - what else - an Option[Match], so the family's pattern-matching reflex handles parsing too:
match $line ~~ /(?<user>\w+)@(?<host>\w+)/ {
Some($m) { print("user: { $m<user>.unwrap_or('?') }") }
None { print("no match") }
}
Against 'ada@example.com' that prints user: ada. Above regexes sit first-class PEG grammars (grammar, rule, token), whose parse returns a Bush parse tree you walk with - naturally - match. The ML family's whole discipline, turned on the one problem those languages always left to a library.
The through-line, in one sentence
Across roughly fifty years - Milner's LCF meta-language, the Definition of Standard ML, OCaml's industrial run, Rust's systems-programming conquest, and Guji's text-first v0 - the syntax kept changing its clothes, but the idea underneath stayed right the whole time: make the compiler check that you have handled every case, replace null with an explicit maybe, and let the types be inferred. Guji is what that idea looks like when you also insist on one obvious way and point it at text.