Forth and Stack Machines
Forth is Black Square radical. An approach to programming that strips away almost everything other languages consider essential — no syntax, no redundancy, no typing, no parentheses, no indentation, no files, no operating system. Chuck Moore, its creator, wrote VLSI design tools in about 500 lines of source code. Cadence and Mentor Graphics do the same thing with millions of lines. Whether you find that inspiring or terrifying says a lot about your philosophy of programming.1
The Self-Building Language
What sets Forth apart from every other language family isn't the postfix notation — that's a surface detail, a consequence rather than a cause. What's radical is that Forth is a language that builds itself from almost nothing. The very first thing defined in pForth's bootstrap file is the comment word:1
: ( 41 word drop ; immediate
( Now we can add comments to what we are doing! )
"Now. We. Can. Add. Comments." A few lines later, the conditional primitives — IF, THEN, BEGIN, UNTIL — are defined the same way, as words that manipulate the compiler's code array at compile time. These aren't primitives in the usual sense. They're defined here, in the language itself, by writing branch instructions and resolving forward references into the code being compiled.
The entire execution model fits in a small interpreter loop: an instruction pointer, a data stack, a return stack, and a switch statement over a handful of built-in operations (arithmetic, stack manipulation, literals, branches, return). Compiling Forth is basically WYSIWYG — the word COMPILE SOMETHING appends the address of SOMETHING to the code array. A colon definition just starts recording addresses; a semicolon appends a RETURN. The mapping from source to executable is transparent in a way that makes C's "close to the metal" claim look like marketing.1
CREATE/DOES> is where it gets genuinely mind-bending. CREATE defines what happens when you're defining a new word; DOES> defines what happens when you use that word. Between them, they let you create defining words — abstractions that generate other abstractions. CONSTANT, VARIABLE, even CLASS can all be defined this way, each in a few lines. It's metaprogramming pushed to its logical extreme: the language not only extends itself, it extends the mechanisms by which it extends itself.1
The Curse of Radicalism
There's a catch, and it's fatal for mainstream adoption. Forth's implicit parameter passing via the stack means that looking at a word definition, you often can't tell what it does without knowing the stack state. 2DUP UP RIGHT DOWN LEFT — what are those two values that 2DUP duplicates? Stack comments help (: DRAW-RECTANGLE ( width height -- ) ...), but they're documentation, not enforced contracts. Yosefk, who designed a production Forth processor's instruction set, doesn't consider himself a competent Forth programmer. That's telling.1
This is the same tension that runs through language design philosophy more broadly. Forth is the purest expression of the "right thing" taken to an extreme that goes past elegance into asceticism. The Lisp Curse applies doubly: if Lisp's expressive power makes every programmer an island, Forth's makes every programmer a hermit. Chuck Moore's VLSI tools are masterworks, but they exist in a universe of one.
The Linear Logic Connection
Henry Baker's 1993 paper reveals something beautiful hiding inside Forth's stack discipline: it's linear logic made concrete. In a linear language, every bound variable is used exactly once — reads are destructive. You can't silently duplicate or discard values; copying requires explicit dup, destruction requires explicit drop. This is exactly how Forth's data stack works.2
Baker shows that a "Linear Lisp" — Lisp with the constraint that every variable occurs exactly once — maps naturally onto a permutation stack machine, which is an abstract version of Forth. The identity function compiles to no code at all (the argument is already on top of the stack). square compiles to [dup *] — duplicate the argument, multiply. The quadratic formula compiles to a sequence of roll, dup, and arithmetic operations where the stack permutations correspond exactly to the points where the mathematical formula reuses a variable.2
This is not just a cute analogy. Linear logic avoids the need for garbage collection (explicit construction and destruction), prevents dangling references (the only occurrence of a name is used to invoke its destructor), eliminates the need for most synchronization (only one path to an object at any time), and enables automatic parallelism (subexpressions can always execute in parallel because linearity guarantees no read/write conflicts). These are precisely the properties that substructural type systems like Rust's ownership model are designed to provide — and Baker identified them twenty years before Rust existed, hiding in Forth.2
The "Fortran fallacy," as Baker calls it, is the belief that mathematical expressions are the natural way to program computers. After 40 years of trying to make that metaphor efficient, computer science has essentially proved by contradiction that it doesn't work. The linear/conservative object-oriented metaphor — where objects have true identity and are conserved, like physical things — maps much more naturally onto both hardware and human intuition. A 12-year-old understands that you need to copy b before using it twice in the quadratic formula. Computer scientists spent 40 years trying to eliminate that one trivial task while greatly increasing the cost of everything else.2
Why It Matters
Forth won't come back as a mainstream language. The tradeoff — extreme power and minimalism for extreme opacity and isolation — doesn't work for teams, for large codebases, or for the kind of software that dominates our world. But Forth's ideas keep resurfacing in disguise. Stack-based virtual machines (JVM, CPython, WebAssembly) use the same execution model. Rust's ownership system is Baker's linear types with better ergonomics. PostScript is literally a Forth. The demoscene loves Forth for the same reason Moore loves it: when you need to do the impossible in impossibly little space, stripping away everything inessential is the only strategy that works.
The deeper lesson is about what "close to the metal" actually means. C programmers claim it, but their language requires 200,000 lines of compiler transforms to run fast. Forth programmers can trace every token to its compiled representation. The mapping is transparent, not because the hardware is simple, but because the abstraction is honest about what it is and what it costs.
Footnotes
Linked from
- Compiler Bootstrapping
But if every token is space-delimited (borrowing from Forth), and you run each token through `atoi()`, the result serves triple duty: integer literals produce their numeric value, keywords produce deterministic "enum" values (since `atoi()` on `"if"`…
- Curry Howard Correspondence
Linear logic, where every resource must be used exactly once, yields linear types — and maps naturally onto stack machines, where Forth's `dup` and `drop` are the explicit contraction and weakening operators of the logic.
- Programming Languages Overview
Learning a second family — Forth's stack discipline, Logic Programming's relational bidirectionality, ML's type-driven design — changes how you think even when you're writing Python.
- Programming Languages Overview
Linear logic gives Forth's stack discipline — Baker's 1993 paper showed that Forth's `dup` and `drop` are literally the contraction and weakening operators of substructural logic, anticipating Rust by two decades.
- Reflective Towers
The connection to Forth is worth noting.
- Transparency As Practice
Forth is the most transparent programming language ever designed.
- Transparency As Practice
The Forth programmer, the meditator, the demoscener, and the DES participant are all doing the same thing: temporarily stripping away the abstraction layers that normally make life tractable, in order to see the machinery that normally runs invisibly…