Learn Guji

Guji is an in-house, compiled, statically-typed, functional-first language whose signature capability is first-class text processing: regular-expression literals and PEG grammars are part of the language, not a library. Every binding wears an invariant Perl/Raku-style sigil ($ scalar, @ list, % map), bindings are immutable by default, types are inferred, and a program compiles ahead-of-time to a single self-contained native binary. As of v0.1-alpha the toolchain ships both a reference tree-walking interpreter and a native AOT compiler, kept byte-identical by a differential test gate; platform IO (stdin/stdout/stderr, files, CLI args), CSP concurrency (hatch, channels), and generics are all implemented. This track teaches Guji from the ground up - toolchain, syntax and types, functions and uniform call syntax, collections, Option/Result error handling, platform IO, the regex-and-grammar text engine that is its reason to exist, and the real concurrency model. Every code sample here was run against the v0.1-alpha reference toolchain.

Setup and the Toolchain

Build the Guji toolchain, interpret a file, and ship a native binary.

Guji is an in-house language: there is no package registry to install from and no public download. You build the toolchain from the reference implementation (written in Go) that lives in the quest repository, then point it at a .guji source file. As of v0.1-alpha the toolchain is two engines in one binary: a tree-walking interpreter and a native ahead-of-time compiler, kept byte-identical by a differential test gate.

Building the toolchain

The repository contains a Go program; build it once to produce a single guji executable - the whole toolchain (lexer, parser, tree-walking interpreter, type checker, and a native code generator):

go build -o guji ./cmd/guji

Running a program (interpret)

A Guji source file uses the .guji extension and is UTF-8. One file is one module. Put this in hello.guji:

sub main() {
    $name = "world"
    print("hello, $name")
}

Run it by passing the file to the toolchain - this interprets it directly:

guji hello.guji
# hello, world

Execution begins at sub main(). The minimal main returns Unit and needs no trailing value. If you want to set the process exit code, write sub main(): Int { ...; N } where the final Int is the exit code; otherwise a clean run exits 0. print is a prelude builtin (always in scope) that writes a value's display form followed by a newline. Functions need no return-type annotation - the type is inferred from the body.
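As a small illustration of the exit-code form, the sketch below returns 3 from main. It is assembled from the rules in this lesson and has not been run against the toolchain; the file name exit3.guji is hypothetical.

```guji
# exit3.guji (hypothetical file name)
sub main(): Int {
    print("about to exit")
    3   # the final Int is the process exit code
}
```

If the rule above holds, running guji exit3.guji and then echo $? should report 3 rather than 0.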

Compiling to a native binary

Guji is also ahead-of-time compiled. The build subcommand lowers your program to a self-contained native executable:

guji build -o hello hello.guji
./hello
# hello, world
echo "exit: $?"   # exit: 0

The output is one native binary with no separate runtime to install - the same deployment story as Go. The interpreter and the native compiler are kept in lockstep: the interpreter is the reference oracle, and a fleet of fixtures asserts that each native binary reproduces the interpreter's exact stdout and exit code. So guji file.guji (interpret) and guji build -o out file.guji && ./out (native) are two ways to run the same program, and they agree.

A first taste of the language

The smallest interesting program already shows the three pillars - sigils, immutability, and method chaining:

sub main() {
    @nums = [1, 2, 3, 4, 5]
    @even = @nums.filter({ $_ % 2 == 0 }).map({ $_ * 2 })
    print("sum: { @even.sum() }")
}

Running it prints sum: 12. @nums is a list binding (note the @ sigil), { $_ % 2 == 0 } is a topic lambda whose implicit parameter is $_, and the chained .filter(...).map(...) reads left to right, as does the .sum() call inside the interpolation, because every function in Guji can be called method-style (covered in the functions lesson).

Where the docs live

Guji's authoritative reference is the in-repo specification, guji-spec.md. The spec header still reads "v0", but the implementation is ahead of it - when the two disagree, the compiler wins, so verify any snippet by writing a .guji file and running it. The repository README.md is a practical quick-start and language tour, and design rationale for the text engine lives in an RFC under docs/. Because Guji is not public, those files - not a website - are the documentation you cite and follow.

Reference: Guji v0.1-alpha specification, guji-spec.md §1 (Overview) and §18 (Compilation Model); repository README.md (Quick start).

Syntax, Types, and Sigils

Sigils, immutable bindings, literals, inferred static types, and interpolation.

Guji's surface syntax is brace-delimited and newline-separated, with one idea borrowed visibly from Perl and Raku: every binding wears a sigil that declares its broad shape and is part of the name forever.

Sigils

There are exactly three:

Sigil	Shape	Example
`$`	scalar (any single value)	`$count = 3`
`@`	list	`@items = [1, 2, 3]`
`%`	map	`%ages = {"ada": 30}`

A "scalar" is anything that is not a list or a map: an Int, Float, Str, Bool, a class instance, an enum value, an Option, a function value, and so on. The sigil is invariant - it never changes with how you access the value, and the sigil you bind with must match the value's shape. A List-producing call must be bound with @:

sub main() {
    @words = "one two three".split(" ")   # split returns a List, so @
    $n = @words.count()                    # count returns an Int, so $
    print("$n words")
}

Binding a list result to a $ name is an error. Accessing an element keeps the container's sigil: @items[0] and %ages{"ada"} still read with @/%.

Immutable by default

A binding is introduced by writing a sigil-name and =. It is immutable unless you opt in with mut:

sub main() {
    $x = 5            # immutable: reassigning $x is a compile-time error
    mut $total = 0
    $total = $total + 1   # allowed, $total is mut
    print("$x $total")
}

Mutability is a property of the binding, not the value. Re-introducing the same name in a nested scope (shadowing) is allowed; reassigning an immutable binding is not.

Literals

$i = 1_000_000   # Int (64-bit); also 0xFF, 0b1010, underscores as separators
$f = 6.022e23    # Float (64-bit, IEEE-754)
$s = "text"      # Str (interpolating)
$lit = 'text'    # Str (literal, no interpolation)
$b = true        # Bool
@xs = [1, 2, 3]  # List
%m = {"a": 1}    # Map
$u = ()          # Unit

Static types, mostly inferred

Every binding and expression has a type known at compile time, but you rarely write it - inference is local and complete within a function body, including function return types. When you do annotate, the form is name: Type:

$count: Int = 0
@names: List[Str] = []

Built-in types are Int, Float, Str, Bool, Unit, the compounds List[T] and Map[K, V], the standard sum types Option[T] and Result[T, E], the text types Regex/Match/Bush, the IO type Handle, the channel type Chan[T], and Never (the type of an expression that never returns, like panic or exit). The one place annotations are mandatory is on exported pub declarations, which must spell out their parameter and return types to form a stable module interface.

String interpolation

Double-quoted strings interpolate; single-quoted strings are literal. Inside "...", a sigil-name interpolates that binding and { expr } interpolates an arbitrary expression:

sub main() {
    $name = "ada"
    @scores = [10, 20]
    $line = "hi $name, scores: @scores, total: { @scores.sum() }"
    print($line)   # hi ada, scores: [10, 20], total: 30
}

Two footguns, both consequences of the sigils being live inside strings:

A literal @ in a double-quoted string starts list interpolation. Write an email as 'ada@example.com' (single-quoted) or escape it, or @example is read as a binding.
Do not nest a double-quoted string inside a { ... } interpolation: in "{ @xs.join("-") }" the inner " closes the outer string and the parse fails. Bind the inner string to a variable first: $sep = "-"; $j = @xs.join($sep).
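Both workarounds fit in one short program. This is a sketch built only from constructs shown in this lesson (single-quoted strings, join, sigil interpolation); it has not been run against the toolchain.

```guji
sub main() {
    $email = 'ada@example.com'   # single-quoted, so @example is not read as a binding
    @xs = ["a", "b", "c"]
    $sep = "-"                   # bind the separator outside any interpolation
    $joined = @xs.join($sep)
    print("$email $joined")
}
```

Assuming interpolation inserts a Str value verbatim, the line printed should be the email address followed by a-b-c.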

Statements are separated by newlines; a ; is an accepted but optional separator, and newlines inside brackets are just whitespace, so expressions can span lines.

Reference: Guji v0.1-alpha specification, guji-spec.md §2 (Lexical Structure), §3 (Types), §4 (Bindings and Mutability), §12 (Strings and Interpolation).

Functions and Uniform Call Syntax

Declare subs, chain with data-first method calls, write lambdas, and reach for generics.

Functions in Guji are introduced with sub, are first-class values, and can all be called two equivalent ways - the rule that makes Guji's left-to-right method chaining work.

Declaring a function

There are two forms. A block form for several statements, and an expression form (= expr) for a single expression:

sub add($a: Int, $b: Int): Int {
    $a + $b
}

sub triple($x: Int): Int = $x * 3

The value of a block is its final expression (no return needed; return exists for early exit). Parameter and return annotations are optional except on pub (exported) declarations, which must be fully annotated - everywhere else the types are inferred.

Uniform call syntax (the data-first rule)

Any function can be called in the ordinary form or method-style, and the two are exactly equivalent:

add($a, $b)     # ordinary call
$a.add($b)      # method-style - identical to add($a, $b)

The receiver becomes the first argument. This "first argument is the receiver" convention is what every standard-library function follows, which is why method-style calls and chaining work uniformly. There is no separate pipeline operator - . is the single composition mechanism:

sub add($a: Int, $b: Int): Int { $a + $b }

sub main() {
    @nums = [1, 2, 3, 4, 5, 6]
    $r = @nums.filter({ $_ % 2 == 0 }).map({ $_ * 10 }).sum()
    print("result = $r")   # result = 120
    $u = 7.add(10)         # add(7, 10) - the receiver is the first argument
    print("uniform = $u")  # uniform = 17
}

Both lines run and print as shown.
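The same equivalence holds when user-defined functions are chained. The sketch below reuses add and triple from this lesson to contrast inside-out nesting with left-to-right chaining; it follows the rule stated above but has not been run against the toolchain.

```guji
sub add($a: Int, $b: Int): Int { $a + $b }
sub triple($x: Int): Int = $x * 3

sub main() {
    $nested = triple(add(3, 4))     # ordinary calls nest inside-out
    $chained = 3.add(4).triple()    # method-style reads left to right
    print("$nested $chained")       # both compute (3 + 4) * 3 = 21
}
```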

Lambdas

Anonymous functions come in two shapes:

A topic block { ... } is a single-parameter lambda whose parameter is the implicit topic $_. It is only valid where $_ is in scope.
A parameterized lambda sub(params) { ... } is an anonymous sub naming one or more parameters.

sub main() {
    @nums = [1, 2, 3, 4, 5, 6]
    @doubled = @nums.map({ $_ * 2 })                        # implicit single parameter
    $sum = @nums.reduce(0, sub($acc, $x) { $acc + $x })     # explicit parameters
    print("sum = $sum")   # sum = 21
}

Use the parameterized form when you need to name parameters or take more than one (a topic block always takes exactly one).

Associated functions and the `sub` family

A PascalCase-named type (a class, enum, grammar, or a built-in like Regex) can expose associated functions, called on the type itself with no implicit receiver:

$a = Account.opened("ada")   # named alternative constructor (see the types lesson)
$re = Regex.compile($pattern)

The single keyword sub therefore spans every function shape, distinguished by whether it has a name, whether it is declared inside a type, and whether its first parameter is $self:

Form	What it is
`sub area($s) { ... }`	top-level function
`sub deposit($self, ...) { ... }`	method (data-first `.call`)
`sub opened($owner) { ... }`	associated function (`Type.call`)
`sub($a, $b) { ... }`	lambda (a function value)

Generics

Functions, classes, and enums can be generic over a type parameter written in brackets. A generic free function names its parameter after the sub name:

sub id[T]($x: T): T = $x

sub first[T](@xs: List[T]): Option[T] {
    if @xs.is_empty() { None } else { Some(@xs[0]) }
}

sub main() {
    $n = id(42)
    $s = id('hi')
    print("$n $s")            # 42 hi
    @xs = [10, 20, 30]
    $head = first(@xs).unwrap_or(0)
    print("first is $head")   # first is 10
}

The same id/first work at any element type, monomorphized per use. The same bracket syntax applies to generic class and enum definitions, and the classes-and-enums lesson uses it to build a generic tree. Because a block yields its final expression without an explicit return and functions are values, you tend to write small generic transformations and compose them with . rather than nesting calls inside-out.

Reference: Guji v0.1-alpha specification, guji-spec.md §7 (Functions), §3.3 (Generics), and §5 (Expressions and Operators) for precedence.

Collections and Control Flow

Lists, maps, ranges, loops, and the prelude's chainable collection functions.

Guji has two built-in compound types - List[T] and Map[K, V] - and a prelude of chainable functions for working with them. Idiomatic Guji transforms collections with these functions and uses explicit loops only for side effects.

Lists and maps

sub main() {
    @items = [10, 20, 30]
    $first = @items[0]          # 10 - indexing panics if out of bounds
    $maybe = @items.get(5)      # Option[T] - None instead of a panic
    %ages = {"ada": 30, "bob": 25}
    $a = %ages{"ada"}           # 30
    print("$first $a")
}

Indexing @xs[i] returns the element directly and panics when i is out of range; get is the checked alternative that yields Option[T].

The collection prelude

Every prelude function is data-first (the functions lesson), so all of them chain. The list staples:

sub main() {
    @nums = [1, 2, 3, 4, 5, 6]
    @doubled = @nums.map({ $_ * 2 })        # [2, 4, 6, 8, 10, 12]
    @evens = @nums.filter({ $_ % 2 == 0 })  # [2, 4, 6]
    $total = @nums.sum()                    # 21
    $found = @nums.find({ $_ > 4 })         # Some(5)
    @sorted = [3, 1, 2].sort()              # [1, 2, 3]
    print("@doubled")
    print("total = $total")
    print("@sorted")
}

Other staples include reduce(@xs, $init, $f), count, is_empty, min/max, reverse, take/drop, contains, and join(@xs, $sep). Membership ("is $x one of these?") is just contains/has_key - there is no special set operator: ["admin", "mod"].contains($role). Note that the length function is count (@xs.count()), not len/size, and it works on lists; for a string's word or line count, split first and call count on the resulting list.

Map functions are similarly data-first: get(%m, $k) -> Option[V], set, remove, keys, values, and has_key. Because values are immutable, set and remove return a new map rather than mutating in place.
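To make the "new map, not mutation" point concrete, here is a sketch. The three-argument shape set(%m, $k, $v) is an assumption inferred from the data-first convention (the text names set but does not give its signature), and the program has not been run against the toolchain.

```guji
sub main() {
    %ages = {"ada": 30}
    %more = %ages.set("bob", 25)      # assumed shape: set(%m, $k, $v) returns a new map
    $before = %ages.has_key("bob")    # the original map should be untouched
    $after = %more.has_key("bob")
    print("before: $before, after: $after")
}
```

If set behaves as described, $before is false and $after is true, because %ages itself never changes.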

Everything is an expression

if, match, and blocks all yield values. When if is used as a value, an else is required and both branches must yield the same type:

sub main() {
    $n = 7
    $label = if $n > 0 { "positive" } else { "non-positive" }
    print($label)   # positive
}

Loops and ranges

for iterates a list, map, range, or channel (channels appear in the IO and concurrency lessons); while repeats while a condition holds. Both are statements that yield (). Lists, ranges, and channels bind one loop variable; maps bind two (key, then value):

sub main() {
    for $x in [1, 2, 3] {
        print("$x")
    }
    for $name, $age in {"ada": 30, "bob": 25} {
        print("$name is $age")
    }
    mut $total = 0
    for $i in 1..5 {           # 1..5 is inclusive; 1..<5 excludes 5
        $total = $total + $i
    }
    print("total = $total")    # total = 15
}

a .. b is an inclusive range and a ..< b is half-open. Ranges are iterable in for. A map loop with only one binding is a compile-time error - the language forces you to name both key and value, in keeping with its "one obvious way" stance. Reach for map/filter/reduce to produce values, and for for/while only when you are iterating for a side effect like printing.
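The two range forms differ only in whether the upper bound is visited. A minimal side-by-side sketch, using the loop shape from the sample above (not run against the toolchain):

```guji
sub main() {
    mut $closed = 0
    for $i in 1..5 { $closed = $closed + $i }   # 1 + 2 + 3 + 4 + 5
    mut $half = 0
    for $i in 1..<5 { $half = $half + $i }      # 1 + 2 + 3 + 4
    print("$closed $half")                      # 15 10
}
```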

Reference: Guji v0.1-alpha specification, guji-spec.md §6 (Control Flow) and §15 (Collections and the Prelude).

Classes, Enums, and Pattern Matching

Product types with `class`, sum types with `enum` (including generic ones), and exhaustive `match`.

Guji models data with two complementary user types: a class holds all of several fields at once (a product type), and an enum is exactly one of several labelled variants (a sum type). You take both apart - but especially enums - with match.

Classes (product types)

Fields are declared with has and a twigil that sets visibility: $.name is public (readable from outside as $obj.name), $!name is private (reachable only inside the class body). A class is constructed by naming all fields as arguments (the bare name, no twigil):

class Account {
    has $.owner: Str
    has $.balance: Float
    has $!pin: Str

    sub opened($owner: Str): Account {              # associated function (no $self)
        Account(owner: $owner, balance: 0.0, pin: "0000")
    }

    sub deposit($self, $amount: Float): Account {   # method (first param $self)
        Account(owner: $self.owner, balance: $self.balance + $amount, pin: $!pin)
    }

    sub check_pin($self, $guess: Str): Bool {
        $!pin == $guess
    }
}

sub main() {
    $a = Account.opened("ada")
    $a2 = $a.deposit(50.0)        # == deposit($a, 50.0)
    print("balance = { $a2.balance }")   # balance = 50
    $ok = $a2.check_pin("0000")
    print("pin ok = $ok")                # pin ok = true
}

The whole thing runs and prints balance = 50 then pin ok = true. Note the key consequence of immutability: deposit does not mutate $a; it returns a new Account. Methods that "modify" an object always return a fresh instance.

Enums (sum types), including generic ones

An enum lists variants, each of which may carry its own typed fields. Variants are separated by newlines or commas; a fieldless variant is a bare name:

enum Shape {
    Circle($radius: Float)
    Rect($width: Float, $height: Float)
}

enum Direction { North, South, East, West }

Enums may be generic, which is how you build recursive containers:

enum Tree[T] {
    Leaf($value: T)
    Node($left: Tree[T], $right: Tree[T])
}

sub size[T]($t: Tree[T]): Int {
    match $t {
        Leaf($v)     { 1 }
        Node($l, $r) { size($l) + size($r) }
    }
}

sub main() {
    $t = Node(Leaf(1), Node(Leaf(2), Leaf(3)))
    print("leaves: { size($t) }")   # leaves: 3
}

You create a value by naming a variant (Circle(2.0), North, Leaf(1)), and the only way to read a variant's fields is to match on it.

Pattern matching

match is an expression that tests a value against patterns in order and evaluates the first matching arm, binding any captured fields. It must be exhaustive - the compiler rejects a match that does not cover every possible value and tells you which case is missing.

sub area($s: Shape): Float {
    match $s {
        Circle($r)   { 3.14159 * $r * $r }
        Rect($w, $h) { $w * $h }
    }
}

Patterns can be a literal (0, "x", true), a binding ($x, matches anything and binds it), a wildcard (_, matches anything and binds nothing), a variant with captures (Circle($r)), or a nested pattern (Node(Leaf($a), $b), matched structurally). An arm may add a boolean guard with if:

sub describe($n: Int): Str {
    match $n {
        0            { "zero" }
        $x if $x < 0 { "negative" }
        _            { "positive" }
    }
}

A wildcard _ arm covers all remaining cases. Because Option and Result are ordinary enums (next lesson), exhaustiveness also forces you to handle both Some/None and Ok/Err. This is Guji's single mechanism for taking labelled data apart - there is no field-by-field accessor on an enum, just match.
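A wildcard is most useful on fieldless enums, where it stands in for "every other variant". The sketch below reuses the Direction enum from this lesson; the helper is_vertical is a hypothetical name, and the program has not been run against the toolchain.

```guji
enum Direction { North, South, East, West }

# is_vertical is a hypothetical helper used only for illustration.
sub is_vertical($d: Direction): Bool {
    match $d {
        North { true }
        South { true }
        _     { false }   # covers East and West, so the match is exhaustive
    }
}

sub main() {
    $v = is_vertical(North)
    $h = is_vertical(East)
    print("$v $h")
}
```

Deleting the _ arm should turn this into a compile-time error naming East and West as uncovered, per the exhaustiveness rule above.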

Reference: Guji v0.1-alpha specification, guji-spec.md §8 (Product Types - class), §9 (Sum Types - enum), §10 (Pattern Matching).

Error Handling: Option, Result, and `?`

No exceptions - model absence and failure as values, propagate with `?`, panic only for bugs.

Guji has no exceptions. Anything that can be absent or can fail returns one of two standard sum types, and you either handle the cases or propagate them. Unrecoverable bugs use panic, which is a different thing entirely.

The two carrier types

enum Option[T] { Some($value: T), None }
enum Result[T, E] { Ok($value: T), Err($error: E) }

A function that may not produce a value returns Option[T]; one that may fail with an error returns Result[T, E]. Because they are ordinary enums, you can always match on them - and exhaustiveness (previous lesson) forces you to handle both arms:

sub main() {
    match "42".parse_int() {
        Ok($n)  { print("parsed $n") }
        Err($e) { print("error: $e") }
    }
}

The `?` propagation operator

Matching every result by hand is noise. The postfix ? operator unwraps the happy path and returns early on failure. Inside a function returning Result[_, E], applying ? to a Result[T, E] yields the T on Ok, and on Err makes the enclosing function return that Err immediately. The same works for Option inside an Option-returning function:

sub parse_age($s: Str): Result[Int, Str] {
    $n = $s.parse_int()?     # unwrap, or return the Err early
    if $n >= 0 { Ok($n) } else { Err("age must be non-negative") }
}

sub main() {
    match parse_age("42") {
        Ok($n)  { print("age is $n") }   # age is 42
        Err($e) { print("error: $e") }
    }
    match parse_age("oops") {
        Ok($n)  { print("age is $n") }
        Err($e) { print("error: $e") }   # error: invalid integer: oops
    }
}

Both branches run as commented. ? keeps every possible failure visible in the type signature while removing the per-call boilerplate. It requires a matching carrier and error type: a Result[_, E] propagates only inside a sub returning Result[_, E] with the same E. To bridge types you convert first - ok_or turns an Option into a Result, and map_err adapts the error type.
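Bridging an Option into a Result-returning function might look like the sketch below. It assumes ok_or takes the error value as its single extra argument and that parse_int's error type is Str (consistent with parse_age above); first_age is a hypothetical helper, and the program has not been run against the toolchain.

```guji
# first_age is a hypothetical helper used only for illustration.
sub first_age(@raw: List[Str]): Result[Int, Str] {
    $s = @raw.get(0).ok_or("no input")?   # Option[Str] -> Result[Str, Str], then propagate
    $n = $s.parse_int()?                  # same error type (Str), so ? applies directly
    Ok($n)
}

sub main() {
    @none: List[Str] = []
    match first_age(["42"]) {
        Ok($n)  { print("age is $n") }
        Err($e) { print("error: $e") }
    }
    match first_age(@none) {
        Ok($n)  { print("age is $n") }
        Err($e) { print("error: $e") }
    }
}
```

Under those assumptions the first call takes the Ok arm and the second reports the "no input" error, without any match inside first_age.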

The Option/Result toolkit

Both carriers come with chainable, data-first helpers so ? is the common path and match is the exception:

sub main() {
    $count = "100".parse_int().unwrap_or(0)   # the value, or 0 on failure
    $picked = ["a", "b"].get(5).unwrap_or("missing")
    print("$count $picked")                   # 100 missing
}

Handy ones: is_some/is_none/is_ok/is_err, unwrap_or, unwrap_or_else, map (transform the success value), and_then (flat-map: the function returns another carrier), or, filter, ok_or, and for Result also map_err, or_else, ok, and err. unwrap/expect are also available but they panic on None/Err, so reserve them for cases you have proven cannot fail.
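A short sketch of map combined with unwrap_or, assuming map accepts a topic block the same way the list functions do (the text lists map but does not show a call); not run against the toolchain.

```guji
sub main() {
    $doubled = "21".parse_int().map({ $_ * 2 }).unwrap_or(0)    # Ok path: transform, then unwrap
    $fallback = "x".parse_int().map({ $_ * 2 }).unwrap_or(0)    # Err path: map is skipped
    print("$doubled $fallback")
}
```

If map only touches the success value, the first binding is 42 and the second falls back to 0.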

panic - for bugs, not conditions

Some failures are bugs: a violated invariant, an out-of-bounds index, an unwrap of None. For these Guji has panic($message: Str): Never. It writes the message and aborts the process with a non-zero exit code - there is nothing to catch.

sub main() {
    $o = ["a"].get(9)
    print($o.unwrap())   # panic: unwrap on None  (process aborts, exit 1)
}

The runtime also panics on integer divide/modulo by zero, integer overflow, and out-of-bounds indexing. panic returns Never (the bottom type), so it can sit in any expression position, e.g. one branch of an if whose other branch yields a value. The rule of thumb: expected failure is Option/Result; a violated assumption is a panic.
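Because panic has type Never, it can fill one branch of a value-producing if. A minimal sketch (non_negative is a hypothetical helper; not run against the toolchain):

```guji
# non_negative is a hypothetical helper used only for illustration.
sub non_negative($n: Int): Int {
    # panic(...) has type Never, so the if as a whole still has type Int
    if $n < 0 { panic("invariant violated: negative count") } else { $n }
}

sub main() {
    $n = non_negative(3)
    print("$n")
}
```

The negative case here is treated as a bug in the caller; if negative input were an expected condition, the function would return Result[Int, Str] instead.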

Reference: Guji v0.1-alpha specification, guji-spec.md §11 (Error Handling) and §15.5 (Option and Result functions).

Platform IO: print, note, args, files, stdin, exit

Talk to the outside world - stdout/stderr, CLI args, files via Result, stdin and file Handles, and the process exit code.

Guji talks to the outside world through a small, explicit set of prelude IO builtins. There is no hidden global state - you read input, transform it, and write output. Every builtin below was verified on the v0.1-alpha toolchain, in both the interpreter and the native build.

Writing output

Two printers, one per stream:

sub main() {
    print("goes to stdout")   # value + newline on stdout
    note("goes to stderr")    # value + newline on stderr
}

print($v) writes a value's display form and a newline to stdout; note($v) does the same to stderr. Use note for diagnostics and progress so they do not pollute a program's real output - redirecting 2>/dev/null drops the notes and keeps the prints.

Command-line arguments

args() returns List[Str] - the arguments that follow the source path (or follow the binary, for a native build):

sub main() {
    @a = args()
    print("argc: { @a.count() }")
    for $x in @a {
        print("arg: $x")
    }
}

Run as guji prog.guji alpha beta (or ./prog alpha beta) and @a is ["alpha", "beta"].

Reading a whole file

read_file($p: Str) returns Result[Str, Str] - the file's contents on success, an error message on failure - so you must handle the failure path:

sub main() {
    match read_file("notes.txt") {
        Ok($text) {
            @lines = $text.lines()
            print("read { @lines.count() } lines")
        }
        Err($e) { note("could not read: $e") }
    }
}

Handles: streaming files and stdin

For streaming, open($p: Str): Result[Handle, Str] gives you a file Handle, and the global stdin is a Handle (note: no parentheses - stdin, not stdin()). A Handle offers two readers: .slurp(): Str reads the rest in one go, and .lines(): Chan[Str] yields the chomped lines (newlines stripped) as a channel you drain with for:

sub main() {
    match open("notes.txt") {
        Ok($h) {
            $lines = $h.lines()          # bind the channel to a local first (see the native-build note below)
            for $line in $lines {
                print("got: $line")
            }
        }
        Err($e) { note("open error: $e") }
    }
}

Reading from stdin is the same shape - here a line filter that drops lines containing ERROR to stderr and passes the rest through:

sub main() {
    mut $errors = 0
    $rows = stdin.lines()           # bind the channel to a local first
    for $line in $rows {
        match $line ~~ /ERROR/ {
            Some($m) { $errors = $errors + 1 ; note("dropped: $line") }
            None     { print($line) }
        }
    }
    print("errors seen: $errors")
}

Pipe input in with printf 'ok\nERROR x\nok2\n' | guji filter.guji. One native-build wrinkle to know: when iterating a channel (such as stdin.lines()), the native code generator requires the channel to be a plain local variable, so bind it - $rows = stdin.lines() then for $line in $rows { ... } - rather than looping over the call directly. The interpreter accepts both; binding first keeps the program native-clean.

Setting the exit code

exit($code: Int): Never stops the process immediately with the given code. Because it returns Never, it type-checks in any position, including one arm of an if:

sub fail($msg: Str): Never {
    note($msg)
    exit(2)
}

sub main() {
    @args = args()   # args() returns List[Str], so it binds with @
    if @args.is_empty() { fail("usage: prog <name>") } else { print("ok") }
}

With no arguments this writes usage: prog <name> to stderr and exits 2. As the toolchain lesson noted, you can also set the exit code structurally by giving main an Int return: sub main(): Int { ...; 0 }. Use that for the normal completion code and exit for early termination.

Reference: Guji v0.1-alpha specification, guji-spec.md §15.4 (Platform IO).

The Signature Feature: Regex and PEG Grammars

First-class regex literals and reusable PEG grammars - the reason Guji exists.

Text processing is why Guji exists. Regular expressions and grammars are built-in types, not a library - and they form a deliberate two-layer model: flat regex for non-recursive matching, and PEG grammars for recursive, structured input.

Regex literals and the `~~` operator

A regex is written between slashes and is a first-class Regex value. The match operator ~~ tests a Str against a regex and yields Option[Match], so you match on the result:

sub main() {
    $line = 'ada@example.com'   # single-quoted: '@' is literal, not interpolation
    match $line ~~ /(?<user>\w+)@(?<host>\w+)/ {
        Some($m) {
            $u = $m<user>.unwrap_or("?")
            $h = $m<host>.unwrap_or("?")
            print("user $u at host $h")   # user ada at host example
        }
        None { print("no match") }
    }
}

That program runs and prints user ada at host example. A Match exposes .text, positional captures $m[0] (whole match), $m[1], ..., and named captures $m<name>. Every capture is Option[Str] - a group that did not participate yields None, which is why unwrap_or is idiomatic above. For regex literals the compiler statically checks that a named or positional capture exists.
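Positional captures work the same way as named ones. The sketch below uses only the forms just described ($m[1], $m[2], each an Option[Str]); it has not been run against the toolchain.

```guji
sub main() {
    match "2024-06" ~~ /(\d+)-(\d+)/ {
        Some($m) {
            $year = $m[1].unwrap_or("?")    # first group
            $month = $m[2].unwrap_or("?")   # second group
            print("year $year month $month")   # year 2024 month 06
        }
        None { print("no match") }
    }
}
```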

You can also build a regex at runtime: Regex.compile($pattern) returns Result[Regex, Str] (a Regex on success, a compile error otherwise), and the resulting value matches with ~~ exactly like a literal:

sub main() {
    match Regex.compile('(?<d>\d+)') {
        Ok($re) {
            match "n42" ~~ $re {
                Some($m) {
                    $d = $m<d>.unwrap_or("?")
                    print("d = $d")   # d = 42
                }
                None { print("no") }
            }
        }
        Err($e) { note($e) }
    }
}

The regex dialect is rich and Unicode-aware by default (\w, \d, \s follow Unicode properties; (?a) restricts them to ASCII): character classes, \p{...} properties, anchors, greedy/lazy/possessive quantifiers, capturing/non-capturing/atomic groups, named captures and backreferences, lookaround, and inline modifiers (?imsxa). Replacement uses the prelude: .replace(re, template) substitutes via a template where $<name> and $1 reference captures:

sub main() {
    $out = "ada lovelace".replace(/(?<w>\w+)/, '[$<w>]')
    print($out)   # [ada] [lovelace]
}

Crucially, recursion and conditionals are intentionally not part of regex - the compiler rejects (?R), (?1), and (?(1)...) and points you at grammars. Flat patterns stay flat.

Grammars - the recursive layer

A grammar is a named, reusable, structured parser with ordered-choice (PEG) semantics. It contains named productions of three kinds: a token matches a pattern with no implicit whitespace, a rule is like a token but matches whitespace automatically between adjacent terms, and a regex production is a pure regex body. The entry production is named TOP:

grammar Email {
    rule  TOP    { <user> '@' <domain> }
    token user   { \w+ }
    token domain { \w+ '.' \w+ }
}

sub main() {
    match Email.parse('ada@example.com') {
        Some($b) {
            match $b<user> {
                Some($u) { print($u.text) }   # prints "ada"
                None     { print("no user") }
            }
        }
        None { print("invalid") }
    }
}

This runs and prints ada. Inside a production, <name> references another production and captures it under that name, quoted text matches a literal, and the full regex syntax is available. GrammarName.parse($input) returns Option[Bush] - a parse tree, not flat text.

A Bush node is where grammars differ from regex: $b<name> reaches the sub-node captured by that production as Option[Bush] (checked at compile time, since production names are known statically), and $b.text is the Str span the node matched. So a Match capture is flat text (Option[Str]) while a Bush capture is a sub-tree (Option[Bush]) whose text you read with .text. You descend with match or and_then.

A grammar is a pure recognizer: it builds the parse tree, and semantic processing - evaluating, validating, building a typed value - is a separate pass that walks the Bush with match. There are no embedded actions and no side effects during matching, which keeps a grammar a declarative description and makes context-sensitive concerns ("this name must already be declared") a job for the semantic pass.
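A semantic pass can be an ordinary function that parses and then walks the Bush. The sketch below reuses the Email grammar from above; domain_of is a hypothetical helper, and the program has not been run against the toolchain.

```guji
grammar Email {
    rule  TOP    { <user> '@' <domain> }
    token user   { \w+ }
    token domain { \w+ '.' \w+ }
}

# Semantic pass: a plain function that inspects the tree after parsing.
# domain_of is a hypothetical helper used only for illustration.
sub domain_of($input: Str): Result[Str, Str] {
    match Email.parse($input) {
        Some($b) {
            match $b<domain> {
                Some($d) { Ok($d.text) }
                None     { Err("no domain captured") }
            }
        }
        None { Err("not an email: $input") }
    }
}

sub main() {
    match domain_of('ada@example.com') {
        Ok($d)  { print("domain: $d") }
        Err($e) { note($e) }
    }
}
```

The grammar stays a declarative description; the decisions about what counts as an error and what value to build live entirely in domain_of.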

Reference: Guji v0.1-alpha specification, guji-spec.md §13 (Regular Expressions) and §14 (Grammars); design rationale in the text-processing RFC under docs/.

Concurrency, Modules, and the Ecosystem

Real CSP concurrency - hatch, channels, select - plus modules, visibility, and tooling.

This lesson covers Guji's concurrency model (real and runnable as of v0.1-alpha) and the module/visibility/tooling story around it.

Concurrency is real

Guji's concurrency follows Go's CSP - lightweight tasks and typed channels - with one twist that immutability makes free: every value crossing a channel is immutable, so tasks share data only by communicating. There is no shared mutable state, so a data race cannot even be expressed (where Go could send a pointer and share memory, Guji structurally cannot). This is implemented and runs in both the interpreter and the native compiler.

A task is started with hatch { ... }, which runs the block concurrently and returns immediately. Channels are built with channel() (unbuffered) or channel(n) (buffered) and carry one type. A producer task fills a channel and closes it; the consumer drains it with for, which ranges until the channel is closed and emptied:

sub main() {
    $jobs: Chan[Int] = channel()       # unbuffered; channel(16) for buffered
    hatch {
        for $n in [1, 2, 3] { $jobs.send($n) }
        $jobs.close()
    }
    for $job in $jobs {                # ranges until the channel is closed and drained
        print("got $job")
    }
}

That prints got 1, got 2, got 3 - and the native build (guji build -o jobs jobs.guji && ./jobs) produces the identical output, since the differential gate enforces it. The channel methods are data-first: $ch.send($v) puts a value, $ch.recv(): Option[T] takes one (None once the channel is closed and drained), and $ch.close() closes it. Reach for recv when you want explicit control instead of a for drain:

sub main() {
    $c: Chan[Str] = channel()
    hatch {
        $c.send("hello")
        $c.close()
    }
    match $c.recv() {
        Some($v) { print("recv: $v") }   # recv: hello
        None     { print("closed") }
    }
}

A select statement waits on several channel operations and runs the first ready arm, binding each arm's recv() result as an Option[T] you then match; with an else arm it becomes a non-blocking poll. A hatch block may capture only immutable bindings, so a task can never observe another mutating shared data - the compiler enforces it.
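The capture rule shapes how results are collected: the task sends values out, and only the consuming side owns any mut state. A sketch with a buffered channel, built from the constructs shown in this lesson (not run against the toolchain):

```guji
sub main() {
    $squares: Chan[Int] = channel(4)     # buffered; the task captures only this immutable binding
    hatch {
        for $n in 1..4 { $squares.send($n * $n) }
        $squares.close()
    }
    mut $total = 0                       # owned by main alone, never captured by the task
    for $sq in $squares {                # drains until the channel is closed and emptied
        $total = $total + $sq
    }
    print("total = $total")              # 1 + 4 + 9 + 16 = 30
}
```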

Modules and visibility

A source file is a module; its path is the file path from the project root with / written as ::. You bring a module into scope with import and reach its members with the :: separator:

import geometry::circle
sub main() {
    $a = circle::area(2.0)
    print("circle area: $a")
}

Top-level declarations (sub, class, enum, grammar) are private to their module by default. Prefix pub to export - and pub declarations must fully annotate their parameter and return types, forming the stable interface:

pub sub area($radius: Float): Float { 3.14159 * $radius * $radius }   # visible to importers
sub helper($x: Float): Float { $x * $x }                             # module-private

Visibility is governed by two independent mechanisms: pub controls what a module exposes, and field twigils ($. / $!) control what a class exposes. Identifier casing (snake_case for bindings/subs/modules, PascalCase for types/variants) is a convention, never a visibility rule.

Tooling and the implementation

The v0.1-alpha toolchain is a single Go binary you build yourself (the setup lesson): a lexer, recursive-descent parser, tree-walking interpreter + prelude, a type checker (inference, match exhaustiveness, ?-operator rules), and a native code generator. The interpreter is the reference oracle: every test fixture is a .guji program paired with a golden output, and the native build must reproduce the interpreter's exact stdout and exit code - a differential gate of hundreds of fixtures keeps the two engines byte-identical. The quality gate is plain Go tooling - gofmt, go vet, go build, go test.

The implementation grew through independently-testable stages - lexer, parser, interpreter, type checker, native codegen, first-class regex, grammars, platform IO, generics, and concurrency - and these are all in place at v0.1-alpha. Deferred-but-planned features include traits/interfaces, lazy sequences, refinement constraints, and selective imports - recorded so the implementation never accidentally precludes them.

Reference: Guji v0.1-alpha specification, guji-spec.md §16 (Visibility and Modules), §17 (Concurrency), §18 (Compilation Model); repository README.md (Project layout, Development).