The C Abstract Machine
C is not a low-level language. This is a statement that sounds wrong to anyone who's programmed in C, but David Chisnall's argument is precise and devastating: a low-level language is one whose abstract machine maps easily onto the target platform's actual hardware. C's abstract machine mapped onto the PDP-11 beautifully, with sequential execution, flat memory, and pre/post-increment operators lining up with addressing modes. That was 1972. It hasn't been true since. [1]
Your computer is not a fast PDP-11. A modern Intel processor has up to 180 instructions in flight at once, and keeping that pipeline full from a single thread means guessing the targets of roughly the next 25 branches. Register renaming, one of the largest consumers of die area and power, exists solely to extract instruction-level parallelism from code that was written to execute one instruction at a time. The cache hierarchy (typically three levels between registers and main memory) is completely invisible to C's flat-memory model, yet efficient cache usage is one of the most important determinants of performance. To write efficient code, C programmers must rely on implementation details, for instance that two adjacent values may land in the same 64-byte cache line, even though the abstract machine has no concept of caches at all.
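The cache-line point can be made concrete with a small sketch. It assumes a 64-byte line (the CACHE_LINE constant and all struct and function names here are illustrative, not from Chisnall's essay) and uses C11 alignas to show how a programmer reaches outside the abstract machine to control which fields share a line.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. The C standard never mentions caches. */
#define CACHE_LINE 64

/* Two counters packed side by side: they will normally share a cache line,
   so two threads updating them would contend for it (false sharing). */
struct packed_counters {
    long a;
    long b;
};

/* The same counters, each forced onto its own assumed cache line. */
struct padded_counters {
    alignas(CACHE_LINE) long a;
    alignas(CACHE_LINE) long b;
};

/* The padded layout costs two full assumed lines of memory. */
static_assert(sizeof(struct padded_counters) == 2 * CACHE_LINE,
              "expected one assumed cache line per counter");

/* Index of the assumed cache line holding a byte offset, counted from a
   line-aligned start of the struct. */
static size_t line_of(size_t offset) {
    return offset / CACHE_LINE;
}

/* 1 if both fields of the packed layout fall in the same assumed line. */
int packed_share_line(void) {
    return line_of(offsetof(struct packed_counters, a)) ==
           line_of(offsetof(struct packed_counters, b));
}

/* 1 if both fields of the padded layout fall in the same assumed line. */
int padded_share_line(void) {
    return line_of(offsetof(struct padded_counters, a)) ==
           line_of(offsetof(struct padded_counters, b));
}
```

Nothing in the language says the padded version is faster. That knowledge comes entirely from the hardware, which is the point of the paragraph above.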
The Compiler Conspiracy
The claim that C is fast because it's "close to the metal" requires an uncomfortable qualifier: it's fast because of incredibly complex compiler transforms, not because of simple translation. Clang/LLVM is around 2 million lines of code. The analysis and transform passes alone, the part that makes C run quickly, add up to nearly 200,000 lines. This is not what you'd expect for a language that supposedly maps directly to hardware. [1]
Consider vectorization. Processing a large array in C means writing a sequential loop. To run this optimally, the compiler must first prove that loop iterations are independent. C's restrict keyword helps, but it conveys far less than Fortran's language rules do, which is a big part of the reason C has never displaced Fortran in HPC. Then it must vectorize the loop, because modern processors get four to eight times the throughput in vector code that they get in scalar code. Finally, it must work around C's memory layout guarantees: the language fixes the order of structure fields, imposes specific padding rules, and guarantees no padding between array elements, all of which constrain the optimizer.
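As a minimal sketch of the restrict point, the loop below promises the compiler that its two arrays do not overlap. The function name scale_add is hypothetical, and whether a given compiler actually vectorizes the loop depends on its version and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y[i] += a * x[i] for i in [0, n).
   The restrict qualifiers are a promise by the caller that x and y do not
   overlap, so a store to y[i] cannot change any x[j]. Without that promise
   the compiler must either prove independence itself or emit a runtime
   overlap check before using vector instructions. */
void scale_add(size_t n, float a, const float *restrict x, float *restrict y) {
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

Calling scale_add with overlapping arrays breaks the promise and is undefined behavior. The language gives the compiler the information only by shifting the burden of proof onto the programmer.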
Loop unswitching transforms a loop containing a loop-invariant conditional into a conditional with a copy of the loop in each branch. This changes the program's control flow, contradicting the idea that a programmer knows what code executes when. Worse, it can turn dead code into undefined behavior: if the loop body never executes, the original program never evaluates the condition, but the unswitched version branches on it before entering either loop, and that condition may read an uninitialized variable. The compiler is sound according to the C specification, but the optimization is unsound according to any intuitive model of what the program "does."
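Loop unswitching is easier to see written out by hand. The two functions below are an illustrative sketch (the names and the doubling logic are made up for the example) of what a loop looks like before and after the transform a compiler might apply.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: the loop-invariant test on `doubled` sits inside the loop
   and is never evaluated when n == 0. */
long sum_original(const int *v, size_t n, int doubled) {
    assert(n == 0 || v != NULL);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (doubled) {
            total += 2L * v[i];
        } else {
            total += v[i];
        }
    }
    return total;
}

/* Hand-unswitched form: one test, then a specialized loop per branch.
   The test on `doubled` now executes even when n == 0, which is exactly
   where the original never looked at it. */
long sum_unswitched(const int *v, size_t n, int doubled) {
    assert(n == 0 || v != NULL);
    long total = 0;
    if (doubled) {
        for (size_t i = 0; i < n; i++) {
            total += 2L * v[i];
        }
    } else {
        for (size_t i = 0; i < n; i++) {
            total += v[i];
        }
    }
    return total;
}
```

For well-defined inputs the two forms return the same result. They differ only in when the condition is read.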
The result is that C programs frequently behave in ways their authors did not intend and could not have predicted. A 2015 survey of C programmers, compiler writers, and standards committee members found that 36% were sure that zeroing a struct and then setting some fields would leave the padding as zeros. Whether it does depends on the compiler and optimization level. Pointers and provenance make the situation worse: GCC and Clang disagree on whether a pointer that is cast to an integer and back retains its provenance. Such gaps between intuition and specification are not edge cases. They lead to real security vulnerabilities, including CVE-2009-1897, where the compiler inferred that a pointer couldn't be null because dereferencing it (which happened before the null check) would be undefined behavior, and so removed the check.
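The padding question from the survey can be sketched as follows. The struct and function names are hypothetical, and the exact amount of padding is an assumption about a typical platform where int is wider than char and aligned accordingly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* On typical platforms the compiler inserts padding after `tag` so that
   `value` is suitably aligned. */
struct record {
    char tag;
    int value;
};

/* Number of bytes in the struct that belong to no field. */
size_t record_padding(void) {
    return sizeof(struct record) - (sizeof(char) + sizeof(int));
}

/* Zero the whole object, then set the fields. The standard says that once
   a member is stored, the padding bytes take unspecified values, so the
   zeros written by memset are not guaranteed to survive. */
struct record make_record(char tag, int value) {
    struct record r;
    memset(&r, 0, sizeof r);
    r.tag = tag;
    r.value = value;
    return r;
}

/* Compare field by field. A memcmp over the whole struct would also
   compare the padding bytes, whose contents cannot be relied on. */
int record_equal(const struct record *a, const struct record *b) {
    assert(a != NULL && b != NULL);
    return a->tag == b->tag && a->value == b->value;
}
```

Code that hashes a struct, compares it bytewise, or copies it across a trust boundary is where the unspecified padding tends to matter.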
The Hardware Conspiracy
Processor architects have been building fast PDP-11 emulators for decades, and the cost has been enormous. Spectre and Meltdown were the most visible consequences, but the deeper issue is energy. The register rename engine cannot be turned off while any instructions are running. In a "dark silicon" era where transistors are cheap but powered transistors are expensive, this is a serious problem. GPUs sidestep much of the mess by relying on thread-level parallelism instead of instruction-level parallelism, which lets them reach high throughput without the register renaming and speculation machinery.
Chisnall sketches what a processor designed for speed rather than C compatibility might look like: large numbers of hardware threads (so you can suspend threads waiting for memory and fill execution units with others), wide vector units with programmer-specified parallelism (ARM's SVE approach, where hardware maps available parallelism to available execution units), and a much simpler cache coherency protocol (possible if objects are either thread-local or immutable, as in Erlang). Such a processor would be fast and energy-efficient — but running C code on it would be miserable.
bunnie's Rust Retrospective
Andrew "bunnie" Huang's experience writing 100K+ lines of Rust for Xous offers a practitioner's view of what it takes to build a real system in a post-C language. [2] Xous is a message-passing microkernel OS for the Precursor security device, exactly the kind of system that would traditionally be written in C.
bunnie's complaints about Rust are revealing: dense syntax ("line noise"), an unfinished language (const generics only recently arrived; the Default trait still can't handle arrays larger than 32 elements), the supply chain attack surface exposed by build.rs scripts, and irreproducible builds. These are real friction points. But his strongest praise is for Rust's refactoring story. In C, changing a data structure deep in a codebase is an exercise in terror: you fix the compilation errors, run the tests, and pray. In Rust, the type system and borrow checker flag every use of the changed structure that no longer fits. You can "pull on one end" and see where all the other ends are.
He tells a specific story of a trust-level variable that had been duplicated between two structures during early development (Canvas and Context in the graphics subsystem). For months, code was sometimes updating one, sometimes the other. In C, this would have required a full rewrite to untangle. In Rust, Clippy (the linter) flagged an unused variable, which led bunnie to discover the duplication, and in a couple of hours he had refactored the entire trust computation system cleanly.
This is the real argument against C, and it has nothing to do with memory safety per se. C's abstract machine is so loose — so many things are undefined, so few invariants are enforced — that large C programs become opaque to their own authors. The type system tells you almost nothing about the program's actual behavior. Rust's type system tells you a lot, sometimes more than you wanted to know. The tradeoff is steep learning curves and fight-with-the-compiler sessions, but the payoff is code you can actually maintain at scale.
The CHERI Alternative
There's a hardware-level approach to fixing C's problems: CHERI (Capability Hardware Enhanced RISC Instructions), which adds hardware-enforced fat pointers with bounds and provenance tracking. bunnie considered CHERI for Precursor, but the CHERI team was focused on making C better and didn't have bandwidth for Rust support. His assessment: "C needed CHERI much more than Rust needed CHERI." But he's also a fan of belt-and-suspenders security and hopes hardware-enforced capabilities will eventually complement Rust's static guarantees.
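To give a feel for what a fat pointer with bounds means, here is a software-only analogy in C. It is not CHERI's mechanism: CHERI enforces its checks in hardware on tagged capabilities, whereas the bounded_ptr type and its helpers below are hypothetical names for a sketch in which every check is ordinary code the caller could bypass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A pointer that travels with its bounds (software analogy only). */
struct bounded_ptr {
    int *base;
    size_t length; /* number of int elements reachable from base */
};

/* Load element `index` through the bounded pointer.
   Returns false instead of reading out of bounds. */
bool bounded_load(struct bounded_ptr p, size_t index, int *out) {
    assert(out != NULL);
    if (p.base == NULL || index >= p.length) {
        return false;
    }
    *out = p.base[index];
    return true;
}

/* Derive a narrower view of the same memory. Bounds can shrink but never
   grow, mirroring the rule that a capability cannot be widened. */
bool bounded_slice(struct bounded_ptr p, size_t offset, size_t count,
                   struct bounded_ptr *out) {
    assert(out != NULL);
    if (p.base == NULL || offset > p.length || count > p.length - offset) {
        return false;
    }
    out->base = p.base + offset;
    out->length = count;
    return true;
}
```

In plain C nothing stops code from reading p.base directly and indexing past the end, which is why the enforcement has to live in hardware (or in a language's type system) to be trusted.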
The deeper question is whether patching C with hardware features (CHERI), type systems (Substructural Type Systems), or formal verification (Compiler Correctness) is the right approach, or whether we should be designing languages and hardware together from scratch. Chisnall's essay points toward the latter — but as he acknowledges, the sheer volume of legacy C code makes a clean break commercially unrealistic. We're stuck maintaining the PDP-11 fiction for a while yet.
Footnotes