Threads, goroutines, async, and hatch: how these languages do concurrency

Four languages, four bargains with the same hard problem: how to do many things at once without the things tripping over each other.

Concurrency is the art of structuring a program as several things that could happen at once; parallelism is actually running them at the same instant. Every language has to strike a bargain between the two — and between programmer convenience and the lurking horror of the data race, where two flows of control touch the same memory and the result depends on luck. Go, Rust, Python, and the in-house language guji each strike that bargain differently. The story is partly technical and partly historical, and it is worth telling in order.

Go: goroutines and channels, descended from CSP

Go's model is the oldest idea here by lineage. It descends directly from Tony Hoare's 1978 paper Communicating Sequential Processes (CSP), filtered through a chain of languages Rob Pike worked on at Bell Labs — Newsqueak, Alef, and Limbo — before he, Robert Griesemer, and Ken Thompson started Go at Google in September 2007. Go was announced in November 2009 and reached its stable 1.0 release in March 2012.

The slogan Pike repeats is "Do not communicate by sharing memory; instead, share memory by communicating." A goroutine is a function launched with the go keyword; it is multiplexed by the runtime onto a small pool of OS threads (an M:N scheduler), so you can cheaply spawn hundreds of thousands of them. Goroutines coordinate over channels, typed conduits that synchronize as they pass values.

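package main

import "fmt"

func main() {
    jobs := make(chan int) // unbuffered: each send waits for a receiver
    go func() {
        for n := 1; n <= 3; n++ {
            jobs <- n // send
        }
        close(jobs) // tells the receiver no more values are coming
    }()
    for n := range jobs { // receive until closed
        fmt.Println("got", n)
    }
}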
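The claim that goroutines are cheap is easy to try for yourself. The sketch below starts one goroutine per number and adds up what they send back; the function name sumTo and the count of 10,000 are illustrative choices, not anything from the Go documentation.

```go
package main

import (
	"fmt"
	"sync"
)

// sumTo starts n goroutines, each sending one integer over a channel,
// and returns the total received. The name is hypothetical.
func sumTo(n int) int {
	results := make(chan int)
	var wg sync.WaitGroup
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			results <- v // blocks until the collector below receives it
		}(i)
	}
	// Close the channel once every worker has delivered its value,
	// so the range loop below can terminate.
	go func() {
		wg.Wait()
		close(results)
	}()
	total := 0
	for v := range results {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sumTo(10000)) // 50005000
}
```

All 10,000 goroutines exist at the same moment, parked on their sends, while the runtime schedules them onto a handful of OS threads. Raising the count mostly costs memory for the goroutine stacks.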

The catch: Go lets you share mutable memory. A channel can carry a pointer, and nothing stops two goroutines from scribbling on what it points to. Go ships a runtime race detector precisely because the language cannot rule races out at compile time. Channels are the idiom, not a guarantee.
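A small sketch makes the aliasing concrete. The counter type and the shareByPointer function are hypothetical names invented for this illustration; the point is only that the value crossing the channel is a pointer, so sender and receiver end up holding the same memory.

```go
package main

import "fmt"

// counter is a hypothetical mutable struct used only for this illustration.
type counter struct{ n int }

// shareByPointer sends a *counter over a channel, lets the receiver mutate
// it, and returns the value the sender observes afterwards.
func shareByPointer() int {
	c := &counter{n: 1}
	ch := make(chan *counter)
	done := make(chan struct{})

	go func() {
		p := <-ch // receives the pointer, not a copy of the struct
		p.n += 41 // mutates memory the sender still holds
		close(done)
	}()

	ch <- c
	<-done // without this wait, the read below would be a data race
	return c.n
}

func main() {
	fmt.Println(shareByPointer()) // 42
}
```

This version is race-free only because the sender waits on done before reading. Remove that wait and the program still compiles; building or running it with the -race flag is how you would expect to catch the problem.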

Rust: fearless concurrency, enforced by the compiler

Rust answers the same question with a different philosophy: make data races a compile error. Its ownership-and-borrowing system, the same machinery that gives memory safety without a garbage collector, is reused for threads. Two marker traits do the work. Send means a value may be moved to another thread; Sync means it may be shared by reference across threads. In safe Rust, the compiler refuses to build code that would let two threads mutate the same data without synchronization. The community calls this fearless concurrency.

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc shares ownership across threads; Mutex guards the integer inside.
    let counter = Arc::new(Mutex::new(0));
    let mut handles = vec![];
    for _ in 0..4 {
        let c = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            let mut n = c.lock().unwrap(); // blocks until the lock is free
            *n += 1;
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    println!("{}", *counter.lock().unwrap()); // 4
}

That is OS-thread parallelism. For high-volume I/O, Rust also has async/.await, stabilized in Rust 1.39 on November 7, 2019, after a multi-year effort begun by Aaron Turon and Alex Crichton around zero-cost futures. An async fn returns a lazy Future; you drive it with .await, and an external runtime such as Tokio polls it to completion. Rust deliberately ships no built-in runtime — you choose one — which is powerful but means more decisions up front.

Python: one lock, two answers, and a slowly opening door

Python's concurrency story is shaped by one fact: the Global Interpreter Lock, a mutex in CPython that lets only one thread execute Python bytecode at a time. Python has had real OS threads since the 1990s via the threading module, but the GIL means they take turns rather than truly running in parallel. That is fine for I/O-bound work (a thread waiting on the network releases the lock) and little help for CPU-bound Python code.

So Python grew a second answer: cooperative concurrency in a single thread. asyncio arrived as a provisional module in Python 3.4 (March 2014), and the dedicated async/await keywords followed in Python 3.5 (2015) via PEP 492, authored by Yury Selivanov. An await marks a point where a coroutine yields control back to the event loop.

import asyncio

async def fetch(n):
    await asyncio.sleep(0.01)
    return n * n

async def main():
    results = await asyncio.gather(*(fetch(i) for i in range(5)))
    print("squares:", results)  # [0, 1, 4, 9, 16]

asyncio.run(main())

The door is now creaking open on the GIL itself. PEP 703, by Sam Gross, was accepted in October 2023; a free-threaded build shipped experimentally in 3.13 and is officially supported in 3.14, at the cost of a single-digit-percent slowdown on single-threaded code. Python is, very deliberately and very slowly, learning to run threads in parallel.

guji: Go's shape, but immutable values cannot race

guji is a statically-typed, functional-first, compiled language whose signature strength is first-class text processing. Its concurrency model is, by the spec's own admission, "Go's goroutines-and-channels (CSP), with one guji change: every value crossing a channel is immutable, so tasks share data only by communicating — there is no shared mutable state, and therefore no data races."

That one change is the whole point. Go can send a pointer and share mutable memory; guji structurally cannot, because bindings and values are immutable by default. So guji gets Go's ergonomics and Rust's no-data-races guarantee — without Rust's borrow checker — by simply removing the thing that races. You start a task with hatch, which captures only immutable bindings:

$jobs: Chan[Int] = channel()
hatch {
    for $n in [1, 2, 3] { $jobs.send($n) }
    $jobs.close()
}
for $job in $jobs {     # receive until closed and drained
    print("got $job")
}

The folds into guji's existing types are tidy. recv returns Option[T]; None means "closed and empty," collapsing Go's value, ok pair into the standard sum type. Where Go panics (sending on a closed channel, double-close), guji returns Err(Closed), because guji has no exceptions. And select waits on several channel operations, choosing an arbitrary ready arm, exactly as Go does:

select {
    $v = $in.recv()  { handle($v) }      # $v : Option[T]
    $out.send($x)    { print("sent") }
    else             { print("idle") }   # optional non-blocking poll
}
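For comparison, here is a sketch of the Go behaviors that guji folds into Option and Err: the value, ok receive, the panic on sending to a closed channel, and a non-blocking select. The helper names recvOK, sendOnClosed, and poll are made up for this illustration.

```go
package main

import "fmt"

// recvOK shows what a receive on a closed, drained channel looks like in Go.
func recvOK() (int, bool) {
	ch := make(chan int, 1)
	ch <- 7
	close(ch)
	<-ch          // drains the buffered 7
	v, ok := <-ch // closed and empty: zero value, ok == false
	return v, ok
}

// sendOnClosed reports whether sending on a closed channel panicked.
func sendOnClosed() (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	ch := make(chan int, 1)
	close(ch)
	ch <- 1 // panics: send on closed channel
	return false
}

// poll does a non-blocking receive using select with a default arm.
func poll(ch <-chan int) string {
	select {
	case v := <-ch:
		return fmt.Sprintf("got %d", v)
	default:
		return "idle"
	}
}

func main() {
	v, ok := recvOK()
	fmt.Println(v, ok)                // 0 false
	fmt.Println(sendOnClosed())       // true
	fmt.Println(poll(make(chan int))) // idle
}
```

The default arm plays the role of the else arm in the guji select above, and the false in the value, ok pair corresponds to guji's None.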

There is one honest caveat. guji is at v0, a tree-walking evaluator with no task scheduler yet, so concurrency is specified but not implemented. Run a hatch block through the current binary and it tells you so, by design:

$ echo 'sub main(): Int { hatch { print("hi") } 0 }' | guji
guji: concurrency is reserved for post-v0; hatch, select, Chan[T], and channel
operations are not implemented in v0 (§17, §21)

The keywords are reserved now so the language will not change shape when the scheduler lands. Every non-concurrent snippet here was run through the v0 evaluator; the concurrency ones are from the spec.

The same problem, four bargains

Larry Wall's enduring design maxim, "Easy things should be easy and hard things should be possible," is a useful yardstick for all four. Each makes a different trade about where the difficulty lives.

There is no free lunch in concurrency. But there is real progress visible across these four: from "share memory, and good luck," to "share memory by communicating," to "you cannot share mutable memory at all." guji's bet is that the last of those, made the default, is the one obvious way.