guji 2026
Text is a first-class citizen: regex and PEG grammars built into a typed, functional core.
Influenced by: Go, OCaml, Raku, Rust, Perl
guji is a compiled, statically-typed, functional-first language whose signature feature is first-class text processing: regular expressions (/\w+/) and PEG grammars (grammar … { rule … }) are part of the language itself, not a library. Bindings are immutable by default, types are inferred, every value carries a Perl-style sigil ($ scalar, @ list, % map), and a program compiles ahead-of-time to a single self-contained native executable. Its guiding rule is "one obvious way" — the language deliberately omits redundant syntax so each task has exactly one idiomatic form.
What makes it distinctive
- Text processing is a first-class language primitive, not a library: regex literals (/\w+/) and PEG grammars (grammar … { rule … }) are built-in types alongside Int and List.
- A two-layer text model with a clear division of labour: flat regex for non-recursive matching (the compiler rejects regex recursion), and ordered-choice PEG grammars whose parse returns an Option[Bush] parse tree for recursive, structured input.
- Perl/Raku-style sigils are part of every name and invariant: $scalar, @list, %map; class fields use twigils for visibility ($.public, $!private).
- Immutable by default: bindings can't be reassigned without mut, methods that 'modify' return a new instance, and everything that produces a value is an expression.
- No exceptions: Option[T] / Result[T, E] plus the postfix ? propagation operator, with match exhaustively checked so the compiler names any missing case.
- Statically typed with pervasive local inference; only exported pub declarations must annotate their parameter and return types.
- Data-first uniform call syntax: $x.f($a) is exactly f($x, $a), so every function chains left-to-right with . and there is no separate pipeline operator.
- Go-style CSP concurrency (hatch, Chan[T], select) where every value crossing a channel is immutable, so sharing by communicating means no data races are even expressible (designed in §17; a post-v0 milestone).
- 'One obvious way': redundant syntax is omitted on purpose, and an ahead-of-time compile produces a single self-contained native binary with no runtime to install.
History
guji is an in-house language designed in 2026 around a single, opinionated thesis: that text — matching it, parsing it, transforming it — is a primary concern of programming and therefore belongs in the language rather than in a library. Where most languages bolt regular expressions on as string methods and push real parsing into external parser generators, guji makes regex literals and PEG grammars built-in types alongside Int, List, and Map. The v0 specification (guji-spec.md) is the single source of truth, and the reference implementation is written in Go.
The design rests on five principles. One obvious way: for any task there is exactly one idiomatic construct, and overlapping or redundant syntax is omitted on purpose. Functional-first: bindings are immutable by default, data is transformed rather than mutated, functions are first-class values, and control constructs (if, match, blocks) are expressions that yield values. Inferred static types: every binding has a type known at compile time, but annotations are rarely required — local inference fills them in, and only exported pub declarations must annotate their interface. Text as a first-class concern: regexes and grammars are the language's signature capability. Compiles to a single binary: the output is one native executable with no external runtime to install.
The surface syntax visibly draws on Raku (formerly Perl 6): every binding wears an invariant sigil that declares its shape ($count, @items, %ages), class fields use Raku-style twigils for visibility ($.public, $!private), topic lambdas use the implicit $_ topic variable, and — most tellingly — grammars are a named, reusable, structured form of pattern, exactly the role grammars play in Raku. guji even permits emoji as whole identifiers in the snake_case class, with the sigil keeping $🚀 unambiguous.
From the ML tradition (OCaml) guji takes its functional-first stance: Hindley–Milner-style local type inference, sum types via enum, exhaustively-checked match as the way to take values apart, and immutable bindings as the default rather than the exception. From Rust it borrows the no-exceptions error model — Option[T] and Result[T, E] as ordinary enums, the postfix ? operator that propagates None/Err to the caller, exhaustiveness checking that names the missing cases, and panic (returning the bottom type Never) reserved strictly for unrecoverable bugs.
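The propagation behaviour described above can be sketched in Go, the language of the reference implementation. This is an analogy and does not use guji syntax: Go's (value, error) return pair stands in for Result[T, E], and the explicit early return marks the spot where a postfix ? would sit. The names parsePort and nextPort are hypothetical, chosen only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parsePort converts text to a port number. The error return plays the
// role of the Err arm of a Result; a nil error plays the role of Ok.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("not a number: %q", s)
	}
	if n < 1 || n > 65535 {
		return 0, errors.New("port out of range")
	}
	return n, nil
}

// nextPort spells out the early return that a postfix ? abbreviates:
// on failure the error goes straight back to the caller, otherwise
// execution continues with the unwrapped value.
func nextPort(s string) (int, error) {
	n, err := parsePort(s)
	if err != nil {
		return 0, err // propagate unchanged, as ? would
	}
	return n + 1, nil
}

func main() {
	fmt.Println(nextPort("8080"))
	fmt.Println(nextPort("http"))
}
```

The sketch covers propagation only. Go has no exhaustively checked match, so the compiler-enforced half of the model has no counterpart here.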
The concurrency design follows Go's CSP model — lightweight tasks started with hatch { … }, typed channels (Chan[T]) created with channel(), and a select statement that waits on several channel operations — but with one guji twist that the immutability story makes free: every value crossing a channel is immutable, so tasks share data only by communicating and the language structurally cannot have a data race. The data-first uniform call convention ($x.f($y) is exactly f($x, $y)) and the single-self-contained-binary deployment model are also Go-flavoured.
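Since the design follows Go, the shape of the model can be illustrated in Go itself. The sketch below is an approximation and not guji code: go func() stands in for hatch { … }, chan Reading for Chan[T], and the Reading struct is sent by value so each receiver gets its own copy, which loosely mirrors the immutability rule. The names Reading, produce, and merge are hypothetical.

```go
package main

import "fmt"

// Reading is copied on every send, so no task can observe another
// task's later changes to it.
type Reading struct {
	Sensor string
	Value  int
}

// produce starts a lightweight task that sends one Reading per value
// and closes the channel when it is done.
func produce(sensor string, values []int) <-chan Reading {
	ch := make(chan Reading)
	go func() {
		defer close(ch)
		for _, v := range values {
			ch <- Reading{Sensor: sensor, Value: v}
		}
	}()
	return ch
}

// merge waits on both channels with select until each has been closed.
// Setting a closed channel to nil disables its case, because a receive
// from a nil channel blocks forever.
func merge(a, b <-chan Reading) []Reading {
	var out []Reading
	for a != nil || b != nil {
		select {
		case r, ok := <-a:
			if !ok {
				a = nil
				continue
			}
			out = append(out, r)
		case r, ok := <-b:
			if !ok {
				b = nil
				continue
			}
			out = append(out, r)
		}
	}
	return out
}

func main() {
	merged := merge(produce("a", []int{1, 2}), produce("b", []int{10}))
	sum := 0
	for _, r := range merged {
		sum += r.Value
	}
	// Arrival order varies between runs; the count and sum do not.
	fmt.Println(len(merged), sum)
}
```

Go only approximates the guarantee: a struct holding a pointer or slice could still share mutable state across the channel, which is the gap the text says guji closes structurally.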
The two text layers are deliberately complementary. Regular expressions (§13) handle flat, non-recursive matching: Unicode-aware shorthand classes, named captures returning Option[Str], the ~~ match operator yielding Option[Match], dynamic construction via Regex.compile, and <{ … }> splicing to compose Regex values. The compiler explicitly rejects regex recursion and conditionals, pointing the programmer at grammars instead. Grammars (§14) are the recursive, structured layer: ordered-choice PEG parsers built from token, rule, and regex productions, with a TOP entry point, whose parse returns an Option[Bush] parse tree rather than flat text — a grammar is a pure recognizer, and semantic processing is a separate match pass over the Bush.
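To make the division of labour concrete, here is a minimal hand-written PEG recognizer in Go for nested lists, the kind of recursive input the regex layer refuses. It is a sketch under stated assumptions, not the guji engine: Node is a hypothetical stand-in for Bush, the (tree, ok) return approximates Option[Bush], and the three-rule grammar in the comments is invented for the example.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a parse-tree value such as Bush.
type Node struct {
	Rule     string
	Text     string // matched text, set for atoms only
	Children []*Node
}

// skip advances past spaces.
func skip(in string, pos int) int {
	for pos < len(in) && in[pos] == ' ' {
		pos++
	}
	return pos
}

// atom <- [a-z]+
func atom(in string, pos int) (*Node, int, bool) {
	end := pos
	for end < len(in) && in[end] >= 'a' && in[end] <= 'z' {
		end++
	}
	if end == pos {
		return nil, pos, false
	}
	return &Node{Rule: "atom", Text: in[pos:end]}, end, true
}

// list <- '(' expr* ')'   (spaces between items are skipped)
func list(in string, pos int) (*Node, int, bool) {
	if pos >= len(in) || in[pos] != '(' {
		return nil, pos, false
	}
	n := &Node{Rule: "list"}
	cur := skip(in, pos+1)
	for {
		child, next, ok := expr(in, cur)
		if !ok {
			break
		}
		n.Children = append(n.Children, child)
		cur = skip(in, next)
	}
	if cur >= len(in) || in[cur] != ')' {
		return nil, pos, false
	}
	return n, cur + 1, true
}

// expr <- list / atom   (ordered choice: list is tried first, and atom
// is tried only if list fails)
func expr(in string, pos int) (*Node, int, bool) {
	if n, next, ok := list(in, pos); ok {
		return n, next, true
	}
	return atom(in, pos)
}

// Parse plays the role of a TOP entry point: the whole input must be
// exactly one expr. It only recognizes; it computes nothing.
func Parse(in string) (*Node, bool) {
	n, next, ok := expr(in, skip(in, 0))
	if !ok || skip(in, next) != len(in) {
		return nil, false
	}
	return n, true
}

func main() {
	for _, src := range []string{"(a (b c) d)", "(a (b c"} {
		tree, ok := Parse(src)
		if !ok {
			fmt.Printf("%q: no parse\n", src)
			continue
		}
		fmt.Printf("%q: %s with %d children\n", src, tree.Rule, len(tree.Children))
	}
}
```

Any semantic work, such as counting atoms or evaluating the list, would be a separate walk over the returned tree, in line with the recognizer/processing split described above.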
The v0 implementation roadmap has seven independently testable stages: lexer, parser, tree-walking evaluator, type checker and inference, native code generation, first-class regex, and finally grammars (the signature feature, built atop the regex engine). The tree-walking evaluator is the reference oracle against which the native compiler is validated fixture by fixture: sanitizer-clean native binaries must match the interpreter's stdout and exit code. Concurrency (§17) is fully specified but is a post-v0 milestone; the hatch and select keywords and the Chan[T] type are reserved now so that the language does not change shape when the scheduler lands.
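The oracle-based validation can be sketched as a small differential-testing loop in Go. This is an illustration and not the project's actual harness: Outcome, Runner, Mismatch, and Compare are hypothetical names, and a Runner is a plain function here, where a real harness would presumably launch the interpreter or a compiled binary and capture its output.

```go
package main

import "fmt"

// Outcome holds the two things that must match: stdout and exit code.
type Outcome struct {
	Stdout   string
	ExitCode int
}

// Runner executes one fixture and reports its outcome. Modelled as a
// function so the sketch needs no external binaries (an assumption).
type Runner func(fixture string) Outcome

// Mismatch records a fixture on which the candidate disagreed with
// the oracle.
type Mismatch struct {
	Fixture   string
	Want, Got Outcome
}

// Compare runs every fixture through both runners and returns the
// fixtures whose outcomes differ in stdout or exit code.
func Compare(fixtures []string, oracle, candidate Runner) []Mismatch {
	var out []Mismatch
	for _, f := range fixtures {
		want, got := oracle(f), candidate(f)
		if want != got {
			out = append(out, Mismatch{Fixture: f, Want: want, Got: got})
		}
	}
	return out
}

func main() {
	// Fake runners: the "native" one disagrees on a single fixture.
	interp := func(f string) Outcome { return Outcome{Stdout: "hello " + f + "\n"} }
	native := func(f string) Outcome {
		if f == "overflow" {
			return Outcome{Stdout: "hello overflow\n", ExitCode: 1}
		}
		return interp(f)
	}
	for _, m := range Compare([]string{"basic", "overflow"}, interp, native) {
		fmt.Printf("%s: want exit %d, got exit %d\n", m.Fixture, m.Want.ExitCode, m.Got.ExitCode)
	}
}
```

Treating the interpreter as the oracle means a mismatch is, by convention, a compiler bug, so each failing fixture points at the native backend first.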