Goodnight Wiki / Forth and Stack Machines

Forth and Stack Machines

Forth is Black Square radical. An approach to programming that strips away almost everything other languages consider essential — no syntax, no redundancy, no typing, no parentheses, no indentation, no files, no operating system. Chuck Moore, its creator, wrote VLSI design tools in about 500 lines of source code. Cadence and Mentor Graphics do the same thing with millions of lines. Whether you find that inspiring or terrifying says a lot about your philosophy of programming.1

The Self-Building Language

What sets Forth apart from every other language family isn't the postfix notation — that's a surface detail, a consequence rather than a cause. What's radical is that Forth is a language that builds itself from almost nothing. The very first thing defined in pForth's bootstrap file is the comment word:1

: (   41 word drop ; immediate
( Now we can add comments to what we are doing! )

"Now. We. Can. Add. Comments." A few lines later, the conditional primitives — IF, THEN, BEGIN, UNTIL — are defined the same way, as words that manipulate the compiler's code array at compile time. These aren't primitives in the usual sense. They're defined here, in the language itself, by writing branch instructions and resolving forward references into the code being compiled.

The entire execution model fits in a small interpreter loop: an instruction pointer, a data stack, a return stack, and a switch statement over a handful of built-in operations (arithmetic, stack manipulation, literals, branches, return). Compiling Forth is basically WYSIWYG — the word COMPILE SOMETHING appends the address of SOMETHING to the code array. A colon definition just starts recording addresses; a semicolon appends a RETURN. The mapping from source to executable is transparent in a way that makes C's "close to the metal" claim look like marketing.1

CREATE/DOES> is where it gets genuinely mind-bending. CREATE defines what happens when you're defining a new word; DOES> defines what happens when you use that word. Between them, they let you create defining words — abstractions that generate other abstractions. CONSTANT, VARIABLE, even CLASS can all be defined this way, each in a few lines. It's metaprogramming pushed to its logical extreme: the language not only extends itself, it extends the mechanisms by which it extends itself.1

The Curse of Radicalism

There's a catch, and it's fatal for mainstream adoption. Forth's implicit parameter passing via the stack means that looking at a word definition, you often can't tell what it does without knowing the stack state. 2DUP UP RIGHT DOWN LEFT — what are those two values that 2DUP duplicates? Stack comments help (: DRAW-RECTANGLE ( width height -- ) ...), but they're documentation, not enforced contracts. Yosefk, who designed a production Forth processor's instruction set, doesn't consider himself a competent Forth programmer. That's telling.1

This is the same tension that runs through language design philosophy more broadly. Forth is the purest expression of the "right thing" taken to an extreme that goes past elegance into asceticism. The Lisp Curse applies doubly: if Lisp's expressive power makes every programmer an island, Forth's makes every programmer a hermit. Chuck Moore's VLSI tools are masterworks, but they exist in a universe of one.

The Linear Logic Connection

Henry Baker's 1993 paper reveals something beautiful hiding inside Forth's stack discipline: it's linear logic made concrete. In a linear language, every bound variable is used exactly once — reads are destructive. You can't silently duplicate or discard values; copying requires explicit dup, destruction requires explicit drop. This is exactly how Forth's data stack works.2

Baker shows that a "Linear Lisp" — Lisp with the constraint that every variable occurs exactly once — maps naturally onto a permutation stack machine, which is an abstract version of Forth. The identity function compiles to no code at all (the argument is already on top of the stack). square compiles to [dup *] — duplicate the argument, multiply. The quadratic formula compiles to a sequence of roll, dup, and arithmetic operations where the stack permutations correspond exactly to the points where the mathematical formula reuses a variable.2

This is not just a cute analogy. Linear logic avoids the need for garbage collection (explicit construction and destruction), prevents dangling references (the only occurrence of a name is used to invoke its destructor), eliminates the need for most synchronization (only one path to an object at any time), and enables automatic parallelism (subexpressions can always execute in parallel because linearity guarantees no read/write conflicts). These are precisely the properties that substructural type systems like Rust's ownership model are designed to provide — and Baker identified them twenty years before Rust existed, hiding in Forth.2

The "Fortran fallacy," as Baker calls it, is the belief that mathematical expressions are the natural way to program computers. After 40 years of trying to make that metaphor efficient, computer science has essentially proved by contradiction that it doesn't work. The linear/conservative object-oriented metaphor — where objects have true identity and are conserved, like physical things — maps much more naturally onto both hardware and human intuition. A 12-year-old understands that you need to copy b before using it twice in the quadratic formula. Computer scientists spent 40 years trying to eliminate that one trivial task while greatly increasing the cost of everything else.2

Why It Matters

Forth won't come back as a mainstream language. The tradeoff — extreme power and minimalism for extreme opacity and isolation — doesn't work for teams, for large codebases, or for the kind of software that dominates our world. But Forth's ideas keep resurfacing in disguise. Stack-based virtual machines (JVM, CPython, WebAssembly) use the same execution model. Rust's ownership system is Baker's linear types with better ergonomics. PostScript is literally a Forth. The demoscene loves Forth for the same reason Moore loves it: when you need to do the impossible in impossibly little space, stripping away everything inessential is the only strategy that works.

The deeper lesson is about what "close to the metal" actually means. C programmers claim it, but their language requires 200,000 lines of compiler transforms to run fast. Forth programmers can trace every token to its compiled representation. The mapping is transparent, not because the hardware is simple, but because the abstraction is honest about what it is and what it costs.

Footnotes

  1. My history with Forth & stack machines by Yossi Kreinin — source 2 3 4 5

  2. Linear Logic and Permutation Stacks — The Forth Shall Be First by Henry G. Baker — source 2 3 4

Open in stacked reader →