
Bootstrapping and Trusting Trust

How do you get a computer from nothing to something? Not philosophically — literally. You have a CPU that understands raw binary, and you want to run an operating system. Somewhere between those two endpoints, someone has to write a compiler in a language that doesn't exist yet, bootstrap a toolchain from source code that can't be compiled, and trust binary blobs that nobody alive fully understands. The bootstrapping problem is one of computing's most underappreciated rabbit holes. (For the trust angle specifically — Thompson's attack, Bootstrappable Builds, Zig's wasm solution — see Compiler Bootstrapping.)

The Bootstrap Chain

The Bootstrappable Builds project and Ekaitz Zarraga's RISC-V GCC bootstrap work reveal the full vertigo of the problem. To compile GCC, you need a C++ compiler. To get a C++ compiler, you need an older GCC (one that only needs C). To compile that older GCC, you need a C compiler. To get a C compiler, you need... well, something simpler.[1]

The chain goes all the way down. GNU Mes pairs a minimal Scheme interpreter with MesCC, a small C compiler written in Scheme (a garbage-collected Lisp). TinyCC (Tiny C Compiler) is a minimal C compiler that can be compiled by GNU Mes. An older GCC that only requires C89 can be compiled by TinyCC, and that older GCC can in turn compile a newer GCC whose build requires C++98. The earliest stages are simple enough that, in principle, a human could verify the binary by inspection. Somewhere along the chain that stops being practical, and from then on you are trusting the output of the previous stage.

Zarraga's specific challenge was bootstrapping GCC for RISC-V. This is especially painful because the first version of GCC that supports RISC-V already requires C++98 to build. The C-only stages of the chain cannot build that GCC directly. The workaround is to backport RISC-V support to an older GCC that only needs C, compile that with TinyCC, and then use it to build the newer version. He spent over a year on this.[1][2]

The reason this matters beyond the bootstrapping community is Ken Thompson's 1984 demonstration that a compiler can be rigged to insert backdoors into specific programs — including into future compilers — invisibly. If you can't bootstrap from inspectable source all the way to your running binaries, you're trusting a chain of binary blobs stretching back decades. The Bootstrappable Builds project's starting point is a 280-byte hex monitor. Everything built on top of it is, at least in theory, verifiable.
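The seed's job is narrow enough to describe in a few lines, and that is the point: a reviewer can check it byte by byte. Below is a Python model of what a hex0-style first stage does. It is not the seed itself, since the real seed is raw machine code. The comment rules (# or ; starts a comment that runs to the end of the line) are an assumption modelled on the stage0 project's hex0 format, and the function name hex0 is illustrative.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def hex0(source: str) -> bytes:
    """Turn hex0-style text into raw bytes.

    Assumed rules: '#' or ';' starts a comment that runs to the end of the
    line, other non-hex characters (whitespace, newlines) are skipped, and
    each pair of hex digits becomes one output byte.
    """
    out = bytearray()
    high_nibble = None  # first digit of a pair, waiting for its partner
    in_comment = False
    for ch in source:
        if in_comment:
            in_comment = ch != "\n"  # a newline ends the comment
            continue
        if ch in "#;":
            in_comment = True
        elif ch in HEX_DIGITS:
            nibble = int(ch, 16)
            if high_nibble is None:
                high_nibble = nibble
            else:
                out.append((high_nibble << 4) | nibble)
                high_nibble = None
    if high_nibble is not None:
        raise ValueError("odd number of hex digits")
    return bytes(out)


# The first bytes of an ELF header, written the way a bootstrapper would.
program = """
7F 45 4C 46   # ELF magic number
02 01         ; 64-bit, little-endian
"""
print(hex0(program).hex(" "))  # 7f 45 4c 46 02 01
```

Everything later in the chain is produced by programs that were themselves built this way. That is why the chain can, in principle, be audited from the seed up.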

Actually Portable Executables

Justine Tunney's Cosmopolitan Libc project takes a different angle on the bootstrap problem: what if your executable ran everywhere? Actually Portable Executables (APE) are polyglot binaries. A single file can be treated as a Windows PE file, an ELF file (Linux), a Mach-O file (macOS), and a shell script. You compile once, and the same file runs on Linux, macOS, Windows, and the BSDs.[3]

The GCC patch to support this is about 2,000 lines. The format itself relies on polyglot headers: the first bytes of the file are arranged so that each operating system's loader interprets them differently and reaches the right entry point for that platform. In the build, the program is first linked as an ordinary ELF file. An objcopy step then strips the ELF wrapper and leaves the APE image, and a zip archive is appended to carry assets.
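Here is a rough Python model of the two tricks described above. The "MZqFpD='" prefix is an assumption based on the header used by classic Cosmopolitan builds. "MZ" is the DOS/PE magic number, and the same line parses as a shell statement assigning a quoted string to a variable. The format checks are simplified stand-ins for real loaders, and the helper names are hypothetical.

The second half shows why appending a zip archive works. Zip readers locate the central directory by scanning from the end of the file, so executable bytes in front of the archive are ignored. Python's zipfile module accepts archives with such prepended data.

```python
import io
import re
import zipfile

# Assumption: classic Cosmopolitan APE files begin with these bytes.
APE_PREFIX = b"MZqFpD='\n"

# A POSIX shell reads NAME='... as the start of a variable assignment.
SHELL_ASSIGNMENT = re.compile(rb"^[A-Za-z_][A-Za-z0-9_]*='")


def readings(header: bytes) -> list[str]:
    """Which (simplified) loaders would accept these leading bytes."""
    found = []
    if header.startswith(b"MZ"):
        found.append("pe")
    if SHELL_ASSIGNMENT.match(header):
        found.append("shell")
    return found


def append_assets(binary: bytes, assets: dict[str, bytes]) -> bytes:
    """Append a zip archive of assets to the end of an executable image."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in assets.items():
            zf.writestr(name, data)
    return binary + buf.getvalue()


def read_asset(blob: bytes, name: str) -> bytes:
    # zipfile finds the end-of-central-directory record at the end of the
    # blob and compensates for the executable bytes in front of the archive.
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.read(name)


image = APE_PREFIX + b"...machine code would follow..."
print(readings(image))  # ['pe', 'shell']
blob = append_assets(image, {"assets/hello.txt": b"hello"})
print(read_asset(blob, "assets/hello.txt"))  # b'hello'
```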

This sounds like a party trick, but it has real implications for bootstrapping. A C compiler packed into a single self-contained file does not depend on what the host system provides. The pts-tcc project, for example, combines TCC and uClibc into a 350KB executable that needs no external files. Add APE-style portability and you have a bootstrapping tool that runs almost anywhere. You can email someone a compiler.[4]

Booting from Nothing

The minimalist systems projects in this cluster are all exploring the same question: how small can a useful computer be?

Phil Pearl's experiment of running Go as PID 1 in a scratch VM demonstrates how much of a modern OS is actually optional. There is no shell, no init system, and no libc, just a Go binary as the only userspace process on top of a minimal Linux kernel. The Go runtime provides its own memory allocator, scheduler, and garbage collector, and it makes syscalls directly without libc. The resulting system boots in under a second and runs a complete web server.[5]

On the other end of the historical spectrum, the vinyl record boot project encodes a FreeDOS boot disk image as a modulated analog audio signal, cuts it to a 10-inch vinyl record, and plays it back into the IBM PC 5150's cassette interface. A bootloader in ROM demodulates the audio, loads the 64K disk image into memory, and boots FreeDOS. It takes six minutes at 45 RPM. The whole pipeline (ROM bootloader, analog modulation, record cutting, demodulation) is a kind of extreme bootstrapping. It shows that a PC can boot from any medium that can carry a modulated signal.[6]
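The project's exact signal format is not described here. As an assumption, this sketch borrows the frequency-shift scheme commonly described for the IBM PC cassette interface: a 1 bit is one 1.0 ms cycle and a 0 bit is one 0.5 ms cycle. Decoding then reduces to measuring the time between zero crossings. The sample rate and function names are illustrative. A real decoder must also cope with noise, DC offset, and turntable speed drift, which this sketch ignores.

```python
SAMPLE_RATE = 48_000                # samples per second (assumed)
ONE_PERIOD = SAMPLE_RATE // 1000    # 1 bit: one 1.0 ms cycle = 48 samples
ZERO_PERIOD = SAMPLE_RATE // 2000   # 0 bit: one 0.5 ms cycle = 24 samples
THRESHOLD = (ONE_PERIOD + ZERO_PERIOD) // 2


def modulate(data: bytes) -> list[float]:
    """Encode bytes (MSB first) as a square wave, one full cycle per bit."""
    samples: list[float] = []
    for byte in data:
        for shift in range(7, -1, -1):
            period = ONE_PERIOD if (byte >> shift) & 1 else ZERO_PERIOD
            samples += [1.0] * (period // 2) + [-1.0] * (period // 2)
    return samples


def demodulate(samples: list[float]) -> bytes:
    """Recover bytes by measuring each cycle's length between zero crossings."""
    # Run lengths of same-sign samples alternate: high half, low half, ...
    runs: list[int] = []
    for i, s in enumerate(samples):
        if i > 0 and (s >= 0) == (samples[i - 1] >= 0):
            runs[-1] += 1
        else:
            runs.append(1)
    # Each (high, low) pair is one cycle; long cycles are 1s, short are 0s.
    bits = [int(runs[i] + runs[i + 1] > THRESHOLD) for i in range(0, len(runs) - 1, 2)]
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


signal = modulate(b"A")
print(len(signal), demodulate(signal))  # 240 b'A'
```

Because only the length of each cycle carries information, the decoder does not care about the signal's amplitude. Playback volume matters less than timing accuracy.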

And the 486-from-floppy project shows what it takes to run a modern Linux kernel on hardware from 1993. The make tinyconfig target produces a bare-minimum kernel small enough to fit on a 1.44MB floppy. That kernel is useless, though, until you enable the specific drivers the 486's hardware needs. The BIOS can only address about 8GB of a hard drive (and sometimes reports only 504MB). Linux drives the disk controller itself rather than going through the BIOS, so it sees the full capacity. The gap between what the BIOS understands and what the OS understands is itself a bootstrapping problem. The BIOS is the first-stage bootloader, and its limitations constrain everything that follows.[7]
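Both figures fall out of cylinder-head-sector (CHS) arithmetic. The geometry limits below are the commonly documented ones for the BIOS INT 13h interface and for early ATA drives, not numbers taken from the project itself. The "504MB" figure is in binary megabytes, which is about 528 million bytes.

```python
SECTOR_BYTES = 512


def chs_capacity(cylinders: int, heads: int, sectors_per_track: int) -> int:
    """Bytes addressable with a given cylinder/head/sector geometry."""
    return cylinders * heads * sectors_per_track * SECTOR_BYTES


# BIOS INT 13h: 10-bit cylinder field (1024 values), heads commonly capped
# at 255, 6-bit sector field (sectors 1..63).
BIOS_LIMIT = chs_capacity(1024, 255, 63)

# A BIOS that passes geometry straight through to an ATA drive is also bound
# by ATA's 16-head limit, which yields the much smaller ceiling.
ATA_BIOS_LIMIT = chs_capacity(1024, 16, 63)

print(f"{BIOS_LIMIT:,} bytes = {BIOS_LIMIT / 10**9:.2f} GB")        # 8,422,686,720 bytes = 8.42 GB
print(f"{ATA_BIOS_LIMIT:,} bytes = {ATA_BIOS_LIMIT / 2**20:.0f} MiB")  # 528,482,304 bytes = 504 MiB
```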

Emulation as Preservation

The Darkstar project, a full software emulator of the Xerox Star (the 1981 machine that introduced the desktop metaphor), illustrates how emulation becomes a form of digital archaeology. The Star's Central Processor is microcoded, and different operating systems (Mesa, Interlisp-D, Smalltalk) each loaded their own microcode to execute their own custom bytecodes. The hardware documentation had substantial gaps ("Still waiting for these sections to be written..." reads one note from the 1982 hardware manual). Filling in the missing details required cross-referencing schematics, ROMs, and the machine's own self-diagnostic codes.[8]
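This toy example shows why loading different microcode gives the same hardware a different instruction set. Everything in it is hypothetical: the two-byte instruction format, the micro-operations, and the table names. It is also vastly simpler than the Star's real microcode engine. The only point is that each opcode's meaning lives in a writable control store rather than in fixed hardware.

```python
from typing import Callable

Registers = dict  # {"a": int, "b": int, "stack": list[int]}
MicroOp = Callable[[Registers, int], None]


# Primitive micro-operations the toy datapath can perform.
def push_operand(r: Registers, operand: int) -> None:
    r["stack"].append(operand)


def pop_into_a(r: Registers, operand: int) -> None:
    r["a"] = r["stack"].pop()


def pop_into_b(r: Registers, operand: int) -> None:
    r["b"] = r["stack"].pop()


def push_a_plus_b(r: Registers, operand: int) -> None:
    r["stack"].append(r["a"] + r["b"])


def push_a_times_b(r: Registers, operand: int) -> None:
    r["stack"].append(r["a"] * r["b"])


# Two hypothetical microcode images: same opcode numbers, different meanings.
ADDER_MICROCODE = {0x01: [push_operand], 0x02: [pop_into_a, pop_into_b, push_a_plus_b]}
MULTIPLIER_MICROCODE = {0x01: [push_operand], 0x02: [pop_into_a, pop_into_b, push_a_times_b]}


class MicrocodedCPU:
    """Toy machine whose instruction set lives entirely in its control store."""

    def __init__(self) -> None:
        self.control_store: dict[int, list[MicroOp]] = {}

    def load_microcode(self, table: dict[int, list[MicroOp]]) -> None:
        self.control_store = dict(table)

    def run(self, program: bytes) -> list[int]:
        regs: Registers = {"a": 0, "b": 0, "stack": []}
        # Each instruction is two bytes: opcode, then operand.
        for pc in range(0, len(program) - 1, 2):
            opcode, operand = program[pc], program[pc + 1]
            for micro_op in self.control_store[opcode]:
                micro_op(regs, operand)
        return regs["stack"]


program = bytes([0x01, 3, 0x01, 4, 0x02, 0])  # push 3, push 4, opcode 0x02
cpu = MicrocodedCPU()
cpu.load_microcode(ADDER_MICROCODE)
print(cpu.run(program))  # [7]
cpu.load_microcode(MULTIPLIER_MICROCODE)
print(cpu.run(program))  # [12]
```

An emulator for such a machine has to model the microcode engine, not just one instruction set. Otherwise it can only run one of the environments.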

Bellard's JSLinux takes emulation further by compiling an entire x86 emulator to JavaScript/WebAssembly with Emscripten. The irony of the development process is wonderful. He wrote JSLinux as hand-coded asm.js, then wrote TinyEMU in C, then converted JSLinux from asm.js back to C, then compiled it to JavaScript with Emscripten, and the compiled version was faster than the hand-coded version. The emulator supports nested virtualization. You can run QEMU inside the emulated Linux, using the AMD SVM extensions that the emulator implements, to run Windows NT inside Linux inside JavaScript inside your browser.[9]

The CheerpX approach to preserving Flash content takes the most ambitious path. Gnash, Shumway, Ruffle, and Lightspark have all attempted to reimplement Flash, with varying success. CheerpX instead runs the original, unmodified Adobe Flash plugin inside a WebAssembly x86 virtual machine. The CheerpX team's reading of those earlier attempts is that Flash is too complex and too poorly documented to reimplement faithfully. In their view, the only reliable way to preserve Flash content is to preserve the exact binary that originally ran it, sandboxed inside an emulation layer.[10]

The Compiler Bootstraps Itself

The QEMU Tiny Code Generator reveals the inner loop of how a modern emulator actually translates code. QEMU doesn't interpret guest instructions one at a time. Instead, it translates blocks of guest code into an intermediate representation (TCG IR), then translates that IR into host machine code. Translated blocks are cached, so hot paths run as host machine code instead of being decoded again on every pass. This is much faster than interpretation, though still slower than running natively.[11]

The TCG is itself a small compiler — it does register allocation, instruction selection, and code generation. But unlike a normal compiler, it runs at runtime and must be fast enough that the compilation overhead is amortized over execution. This is the same JIT compilation strategy that JavaScript engines use, but applied to machine code rather than bytecode.
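QEMU's TCG emits real machine code for the host CPU. As a stand-in, this hypothetical sketch invents a five-instruction guest ISA and uses generated Python source, compiled with exec, as its "host code". The shape of the pipeline is what carries over:

1. Decode a guest basic block, up to a branch.
2. Lower it to a small IR.
3. Generate code for the block.
4. Cache the result by guest address, so a loop body is translated once and then reused.

The IR opcode names (movi, addi, add, brnz, exit) are invented and only loosely echo TCG's style.

```python
from typing import Callable

# Hypothetical guest ISA, invented for this sketch:
#   ("li", rd, imm)      rd = imm
#   ("addi", rd, imm)    rd = rd + imm
#   ("add", rd, rs)      rd = rd + rs
#   ("jnz", rs, target)  if rs != 0: jump to target
#   ("halt",)            stop
Program = list[tuple]
HostBlock = Callable[[list[int]], int]  # runs on registers, returns next guest pc


def guest_to_ir(program: Program, pc: int) -> list[tuple]:
    """Front end: decode one basic block (ending at a branch or halt) into IR."""
    ir = []
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "li":
            ir.append(("movi", args[0], args[1]))
        elif op == "addi":
            ir.append(("addi", args[0], args[0], args[1]))
        elif op == "add":
            ir.append(("add", args[0], args[0], args[1]))
        elif op == "jnz":
            ir.append(("brnz", args[0], args[1], pc))  # taken target, fall-through
            return ir
        elif op == "halt":
            ir.append(("exit", -1))
            return ir
        else:
            raise ValueError(f"unknown guest op {op!r}")


def ir_to_host(ir: list[tuple]) -> HostBlock:
    """Back end: emit Python source for the whole block and compile it."""
    lines = ["def block(r):"]
    for op, *a in ir:
        if op == "movi":
            lines.append(f"    r[{a[0]}] = {a[1]}")
        elif op == "addi":
            lines.append(f"    r[{a[0]}] = r[{a[1]}] + {a[2]}")
        elif op == "add":
            lines.append(f"    r[{a[0]}] = r[{a[1]}] + r[{a[2]}]")
        elif op == "brnz":
            lines.append(f"    return {a[1]} if r[{a[0]}] != 0 else {a[2]}")
        elif op == "exit":
            lines.append(f"    return {a[0]}")
    namespace: dict = {}
    exec("\n".join(lines), namespace)
    return namespace["block"]


class Translator:
    def __init__(self, program: Program) -> None:
        self.program = program
        self.cache: dict[int, HostBlock] = {}  # guest pc -> compiled block
        self.translations = 0

    def run(self, regs: list[int]) -> list[int]:
        pc = 0
        while pc != -1:
            block = self.cache.get(pc)
            if block is None:  # translate on first visit, reuse afterwards
                block = ir_to_host(guest_to_ir(self.program, pc))
                self.cache[pc] = block
                self.translations += 1
            pc = block(regs)
        return regs


PROGRAM = [
    ("li", 0, 0),     # 0: acc = 0
    ("li", 1, 5),     # 1: n = 5
    ("add", 0, 1),    # 2: acc += n  (loop head)
    ("addi", 1, -1),  # 3: n -= 1
    ("jnz", 1, 2),    # 4: loop while n != 0
    ("halt",),        # 5
]
t = Translator(PROGRAM)
print(t.run([0, 0]), t.translations)  # [15, 0] 3
```

Running the loop five times costs only three translations: the entry block, the loop block, and the halt block. In a real emulator, that one-time cost is what gets amortized over many executions.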

Porting as Bootstrapping

Drew DeVault's account of porting Alpine Linux to RISC-V shows that porting an operating system to a new architecture is itself a bootstrapping problem. You need a cross-compiler (GCC targeting RISC-V, built on x86). You need libc (musl, whose RISC-V port was untested at scale; DeVault found and fixed several bugs). Then you cross-compile the essential packages: the native toolchain, the package manager, tar, patch, and openssl. This is the bootstrapping phase. It ends when the system is self-hosting, meaning it can compile itself on the target hardware.[12]

Once self-hosting is achieved, you drop the cross-compiler and do native builds, even though the RISC-V hardware is slower. The tradeoff is worth it. Many packages require extensive patching to cross-compile, and the time saved by avoiding that work more than compensates for slower builds. Once the major language toolchains (C, Python, Perl, Ruby) are working, most open source software is portable across architectures, and the remaining porting work can be automated.
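The cross-compile-then-go-native plan can be modelled as a dependency problem. Everything the target needs in order to rebuild itself must be cross-compiled first, in dependency order. Everything else can wait for native builds. The package list and dependency edges below are hypothetical placeholders, not Alpine's real package metadata. The ordering uses Python's standard graphlib.TopologicalSorter.

```python
from graphlib import TopologicalSorter

# Hypothetical build dependencies: package -> packages needed to build it.
DEPS: dict[str, set[str]] = {
    "musl": set(),
    "binutils": {"musl"},
    "gcc": {"musl", "binutils"},
    "make": {"musl"},
    "tar": {"musl"},
    "patch": {"musl"},
    "openssl": {"musl"},
    "apk-tools": {"musl", "openssl"},
    "perl": {"musl"},
    "python3": {"musl", "openssl"},
}

# Assumed minimum set for the target to rebuild its own packages.
SELF_HOSTING = {"gcc", "binutils", "make", "apk-tools", "tar", "patch", "openssl"}


def closure(roots: set[str], deps: dict[str, set[str]]) -> set[str]:
    """Every package in roots plus everything they transitively depend on."""
    seen: set[str] = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            stack.extend(deps[pkg])
    return seen


def plan(deps: dict[str, set[str]], self_hosting: set[str]) -> tuple[list[str], list[str]]:
    """Split a dependency-ordered build list into cross and native phases."""
    must_cross = closure(self_hosting, deps)
    order = list(TopologicalSorter(deps).static_order())
    cross_phase = [p for p in order if p in must_cross]
    native_phase = [p for p in order if p not in must_cross]
    return cross_phase, native_phase


cross, native = plan(DEPS, SELF_HOSTING)
print("cross-compile on x86:", cross)
print("build natively afterwards:", native)
```

Keeping the cross-compiled set as small as possible is what limits the amount of cross-compilation patching.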

Pyodide takes a related approach in a completely different domain: running the scientific Python stack (NumPy, pandas, scikit-learn) in the browser by compiling CPython and its C extensions to WebAssembly via Emscripten. This is cross-compilation from x86 to WASM, and it faces all the same bootstrapping problems: header files, linker flags, and C extensions that assume they're running on a POSIX system. On top of that, the browser sandbox doesn't provide a filesystem, process creation, or shared memory in the ways that Python extensions expect.[13]
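Code that has to run both natively and under Pyodide often uses a simple pattern: detect the WebAssembly platform and fall back to a pure-Python path where the sandbox lacks a feature such as process creation. This sketch assumes that Pyodide reports sys.platform as "emscripten" and that WASI builds of CPython report "wasi". The count_lines helper and its use of the wc tool are hypothetical examples.

```python
import subprocess
import sys

# Assumption: Pyodide (a CPython build for wasm32-emscripten) reports
# sys.platform == "emscripten"; WASI builds of CPython report "wasi".
WASM_PLATFORMS = {"emscripten", "wasi"}


def running_in_wasm() -> bool:
    return sys.platform in WASM_PLATFORMS


def count_lines(text: str) -> int:
    """Count newline characters, delegating to `wc -l` only where processes exist."""
    if not running_in_wasm():
        try:
            result = subprocess.run(
                ["wc", "-l"], input=text, capture_output=True, text=True, check=True
            )
            return int(result.stdout.split()[0])
        except (OSError, subprocess.CalledProcessError, ValueError, IndexError):
            pass  # no usable wc on this host: fall back to pure Python
    # Pure-Python path: also works inside the browser sandbox.
    return text.count("\n")


print(count_lines("first\nsecond\n"))  # 2
```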

What Bootstrapping Reveals

What ties all of these together (the GCC bootstrap chain, the APE polyglot binaries, the vinyl bootloader, the emulators, the OS ports) is the question of what "running a program" actually requires. Each project strips away another assumption. You don't need a conventional userland (Go as PID 1). You don't need the right hardware (QEMU). You don't need the right operating system (Cosmopolitan). You don't need a hard drive (vinyl boot). You don't even need the platform a program was written for to still exist (Flash preservation via emulation). The minimal requirement is a CPU that can execute instructions, and everything else is a tower of bootstrapped conventions that we've collectively agreed to maintain.

Footnotes

  1. Bootstrapping GCC in RISC-V by Ekaitz Zarraga

  2. Milestone: Bootstrapping path discovered by Ekaitz Zarraga

  3. Patching GCC to build Actually Portable Executables by ahgamut

  4. Tiny, self-contained C compiler using TCC + uClibc by pts

  5. Go in a scratch VM by Phil Pearl

  6. Booting from a vinyl record by Bogin Jr.

  7. Booting a 486 From Floppy with the Most Up-to-Date Stable Linux Kernel by FozzTexx

  8. Introducing Darkstar: A Xerox Star Emulator by Josh D.

  9. Javascript PC Emulator - Technical Notes by Fabrice Bellard

  10. Preserving Flash content with WebAssembly done right by Alessandro Pignotti

  11. A deep dive into QEMU: The Tiny Code Generator (TCG), part 2 by Airbus SecLab

  12. Porting Alpine Linux to RISC-V by Drew DeVault

  13. Pyodide: Bringing the scientific Python stack to the browser
