Undefined Behavior
Here's the deal with C: it's not a low-level language. It hasn't been one for decades. The C abstract machine describes a sequential processor with flat memory — basically a PDP-11 — and modern hardware has spent forty years diverging from that model in every conceivable way. Caches, out-of-order execution, branch prediction, vector units, speculative execution — none of these exist in C's world. The heroic effort to maintain the illusion that C is "close to the metal" is the direct cause of some of the worst security vulnerabilities in computing history.1
Undefined behavior is the crack in the foundation, and it runs deeper than most programmers realize.
The Optimization Fuel
When the C standard says that signed integer overflow is undefined behavior, it's not just saying "we don't know what will happen." It's giving the compiler permission to assume it never happens. And compilers have gotten aggressive about exploiting that permission.2
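A minimal sketch of what that permission means in practice. The function names are hypothetical, chosen only for illustration. The first check relies on wraparound that the standard never promises; the second asks the same question without performing any arithmetic that could overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Broken overflow test. When x == INT_MAX, evaluating x + 1 is undefined
 * behavior, so an optimizer is allowed to assume the comparison is never
 * true and reduce the whole function to "return false". */
bool increment_overflows_naive(int x)
{
    return x + 1 < x;
}

/* Well-defined test: compare against the limit before doing any
 * arithmetic, so no overflow can occur. */
bool increment_overflows(int x)
{
    return x == INT_MAX;
}
```

Whether a given compiler actually folds the naive version depends on the compiler and the optimization level; the point is only that it is permitted to.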
Take a simple example: a loop with a signed 32-bit integer index. Because signed overflow is undefined, the compiler never has to reproduce 32-bit wraparound, so on x86-64 it can treat that index as if it were a 64-bit value and avoid truncation instructions inside the loop. The signed version compiles to five instructions per iteration; the unsigned version (where overflow wraps and is defined) needs eight. That's a real performance difference, and it comes directly from the compiler exploiting the latitude that undefined behavior provides.
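The kind of loop in question might look like the sketch below. This is an illustrative pair written for this note, not the exact code behind the instruction counts above, and the function names are hypothetical. The two functions differ only in the type of the index.

```c
/* Signed index: overflow of i is undefined, so the compiler may keep i in
 * a 64-bit register and use it directly as an array offset.
 * Assumes stride > 0; otherwise the loop would not terminate. */
long sum_signed(const int *a, int n, int stride)
{
    long total = 0;
    for (int i = 0; i < n; i += stride)
        total += a[i];
    return total;
}

/* Unsigned index: i is required to wrap modulo 2^32 (on platforms where
 * unsigned is 32 bits), so the compiler must preserve that wraparound
 * when it computes the address of a[i].
 * Assumes stride > 0. */
long sum_unsigned(const int *a, unsigned n, unsigned stride)
{
    long total = 0;
    for (unsigned i = 0; i < n; i += stride)
        total += a[i];
    return total;
}
```

For in-range inputs the two functions return the same result; any difference shows up only in the generated machine code.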
Strict aliasing is another case where UB enables optimization. If two pointers have incompatible types, the compiler can assume they don't alias — they don't point to the same memory. This lets it avoid redundant loads and reorder stores freely. Turn off strict aliasing with -fno-strict-aliasing and the compiler has to assume everything might alias everything, generating precautionary loads everywhere. Roughly 100% of large C and C++ programs violate strict aliasing somewhere, which tells you something about how well the optimization-through-UB contract is actually understood by working programmers.3
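As a rough sketch of the aliasing assumption (function names are hypothetical): in the first function the compiler may assume the two pointers refer to different objects because int and float are incompatible types. The second function shows the usual well-defined way to reinterpret bytes, which is to copy them with memcpy. It assumes float is 32 bits wide, which the static assertion checks.

```c
#include <stdint.h>
#include <string.h>

/* Because int and float are incompatible types, the compiler may assume
 * the store through f cannot modify *i, and return 1 without reloading.
 * Calling this with both pointers aimed at the same memory would be a
 * strict aliasing violation. */
int store_both(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;
    return *i;
}

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Well-defined type punning: copy the object representation instead of
 * reading the float through a pointer of another type. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Optimizing compilers typically turn the fixed-size memcpy into a plain register move, so the well-defined version usually costs nothing extra.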
Chris Wellons puts it well: undefined behavior is like nitro. Dangerous, volatile, and it makes things go really, really fast. You could argue it's too dangerous to use in practice, but the aggressive exploitation of UB is not without merit.2
The Security Catastrophe
The other side of the UB coin is that mistakes become more punishing, not less. A program that "seems to work" when compiled without optimization can develop security vulnerabilities when the optimizer gets involved, because the optimizer is reasoning about undefined behavior that the programmer didn't know they'd invoked.
The classic example: code that dereferences a pointer and then checks it for null. From the programmer's perspective, this is just a minor ordering bug. From the compiler's perspective, dereferencing a null pointer is UB, so the program can't have reached that point with a null pointer, so the null check is dead code and can be eliminated. CVE-2009-1897, in the Linux kernel's TUN driver, is a real vulnerability that works exactly this way.1
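The pattern can be sketched as follows. This is an illustration with a hypothetical struct and hypothetical function names, not the actual kernel code.

```c
#include <stddef.h>

struct device {
    int flags;
};

/* Buggy ordering: dev is dereferenced before it is checked. Since the
 * dereference is undefined for a null pointer, the compiler may conclude
 * dev != NULL and delete the check that follows. */
int read_flags_buggy(struct device *dev)
{
    int flags = dev->flags;   /* undefined behavior if dev == NULL */
    if (dev == NULL)
        return -1;            /* may be removed as dead code */
    return flags;
}

/* Corrected ordering: check first, dereference second. */
int read_flags(struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->flags;
}
```

Both functions behave identically for valid pointers, which is why the buggy version can pass testing and still be exploitable.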
The NUL-terminated string is perhaps the most expensive single design decision in computing history. When Thompson, Ritchie, and Kernighan chose address-plus-NUL-marker over address-plus-length for C strings in the early 1970s, they saved one byte of overhead per string on a PDP-11 with limited core memory. The accumulated cost is staggering: buffer overflows, security vulnerabilities, hardware mitigations, compiler development, and CPU instructions added specifically to handle NUL-terminated strings. IBM added special instructions to the ES/9000 in 1992 just for this. Every language runtime that calls open(2) or getaddrinfo(3) still passes NUL-terminated strings to POSIX, regardless of what the language itself uses internally.4
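For contrast, here is a minimal sketch of the address-plus-length alternative. The struct and function names are hypothetical. Because the length travels with the pointer, the copy can be bounds-checked without scanning for a terminator; the NUL is added at the end only so the result can still be handed to POSIX-style interfaces.

```c
#include <stddef.h>
#include <string.h>

/* Address-plus-length string: the length is stored, not discovered. */
struct str_view {
    const char *data;
    size_t len;
};

/* Copy src into dst, which has room for cap bytes including the
 * terminator. Returns 0 on success, -1 if src does not fit. */
int str_copy(char *dst, size_t cap, struct str_view src)
{
    if (src.len >= cap)
        return -1;                /* refuse instead of overflowing */
    memcpy(dst, src.data, src.len);
    dst[src.len] = '\0';          /* terminate for C APIs */
    return 0;
}
```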
The Abstract Machine Gap
David Chisnall's key argument, in "C Is Not a Low-Level Language", is that the mismatch between C's abstract machine and modern hardware is not just an inconvenience: it's the root cause of Spectre and Meltdown. Modern Intel processors have up to 180 instructions in flight simultaneously. C describes a sequential machine. To bridge that gap, processors use speculative execution, branch prediction, and massive register renaming engines. The speculative execution that allows a CPU to keep its pipeline full from C's serial code is exactly what Spectre exploits.1
A modern high-end core's register rename engine is one of the largest consumers of die area and power, and it can't be turned off while any instructions are running. GPUs don't need it — they get parallelism from explicit threads rather than extracting instruction-level parallelism from scalar code. The cache coherency protocol is another source of massive complexity driven by C's assumption that memory is both shared and mutable by default. An Erlang-style machine where objects are either thread-local or immutable would need a vastly simpler coherency protocol.1
Making C code run fast requires around 200,000 lines of analysis and transform passes in Clang/LLVM, and even then some of those optimizations are technically unsound under the C standard. Loop unswitching, which hoists a loop-invariant branch out of a loop, can transform dead code (a loop body that never executes) into undefined behavior (a branch on an uninitialized variable). SROA (scalar replacement of aggregates) can expose or hide padding behavior depending on optimization level. The programmers who think they understand what their C code does are, in many cases, wrong: a 2015 survey found that 36% of C programmers gave incorrect answers about basic struct padding behavior.1
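To make the loop unswitching hazard concrete, here is a hand-written model of the transformation. The function names are hypothetical, and the second function is written by hand to mimic what the pass does; it is not actual compiler output. In the source form, flag is read only inside the loop body, so it is never read when n is 0. In the unswitched form, flag is tested once before the loop, even when the loop would not have run. If flag were an uninitialized value in the compiler's intermediate representation, the transformed code would branch on it where the original never touched it.

```c
/* Source form: the branch on flag sits inside the loop, so flag is
 * never read when n <= 0. */
int sum_before(const int *a, int n, int flag)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        if (flag)
            total += a[i];
        else
            total -= a[i];
    }
    return total;
}

/* Hand-written model of the unswitched form: the loop-invariant branch
 * is hoisted, so flag is read unconditionally, even when n <= 0. */
int sum_after(const int *a, int n, int flag)
{
    int total = 0;
    if (flag) {
        for (int i = 0; i < n; i++)
            total += a[i];
    } else {
        for (int i = 0; i < n; i++)
            total -= a[i];
    }
    return total;
}
```

With an initialized flag the two forms are equivalent, which is exactly why the transformation looks safe.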
The State of the Art
As of 2017, the situation is better than it was but far from solved. Tools like ASan, UBSan, MSan, and TSan (the address, undefined-behavior, memory, and thread sanitizers) have helped us progress from "almost every nontrivial C program executed a continuous stream of UB" to "quite a few important programs seem to be largely UB-free in their most common configurations." But dynamic debugging tools can't help with the worst UBs: the ones nobody knew how to trigger during testing but an attacker figured out in production.3
There are roughly 200 distinct kinds of undefined behavior in C and C++. Each one needs to either be defined, diagnosed at compile time, or have both a sanitizer for debugging and a mitigation mechanism for production. We're nowhere close to that goal. Spatial memory safety violations (buffer overflows) have excellent debugging tools in ASan and Valgrind but can only be fully stopped by actual memory/type safety. Data races are UB in modern C/C++, and the recommended mitigation is refreshingly honest: "Don't create threads."3
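As a small illustration of turning one kind of UB into defined behavior by hand, here is a sketch of a bounds-checked accessor. The struct and function names are hypothetical. An out-of-range index becomes an ordinary error return instead of an out-of-bounds read, at the cost of a comparison on every access.

```c
#include <stdbool.h>
#include <stddef.h>

/* A buffer that carries its own length. */
struct int_buf {
    int *data;
    size_t len;
};

/* Read element i into *out. Returns false, leaving *out untouched,
 * when i is out of range. */
bool int_buf_get(const struct int_buf *b, size_t i, int *out)
{
    if (i >= b->len)
        return false;
    *out = b->data[i];
    return true;
}
```

This only helps where the programmer remembers to use it, which is the gap that language-level memory safety closes.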
The real question lurking behind all of this is whether we should stop trying to make C's abstract machine fast and instead design hardware for programming models that better match modern architectures. Languages with immutable-by-default data, explicit parallelism, and message-passing concurrency would let hardware designers shed enormous complexity. The myth that "parallel programming is hard" would come as a surprise to Erlang programmers who routinely write systems with thousands of parallel components — it's more accurate to say that parallel programming in a language with C's abstract machine is hard, which is just another way of saying C doesn't map to modern hardware well.1
Footnotes