
Software as Infrastructure

When COVID-19 hit the United States and millions of people filed for unemployment simultaneously, several states blamed the resulting chaos on COBOL, the sixty-year-old programming language running their benefits systems. New Jersey's governor publicly pleaded for volunteer COBOL programmers. The narrative was irresistible: ancient technology crumbles under modern load. Except that's not what happened. When investigators actually looked, it turned out the COBOL backend was fine. The thing that crashed was the Java website that sat in front of it.[1]

This matters because it tells you something about what we value and what we don't. COBOL was designed in 1959 by a committee led by Jean Sammet (not Grace Hopper, despite popular mythology) with an explicit goal: make code readable enough that anyone could maintain it. "MULTIPLY EARNINGS BY TAXRATE GIVING SOCIAL-SECUR ROUNDED" is a COBOL statement. The language was designed to be infrastructure: to outlast its creators, to be maintained by strangers, to just keep working. And for six decades, it did exactly that. Credit card transactions worldwide still flow through COBOL. The VA disability system serving ten million veterans runs on it. IBM builds COBOL support into its latest mainframes.[1]
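For readers who have never seen COBOL, a rough Python rendering of that one statement may help show how little the English-like syntax hides. This is only a sketch: the two-decimal-place result field and the function name are assumptions, not anything taken from a real benefits system, and the rounding mode is assumed to match COBOL's default ROUNDED behaviour (ties away from zero).

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: SOCIAL-SECUR is declared with two decimal places.
CENTS = Decimal("0.01")


def social_secur(earnings: Decimal, taxrate: Decimal) -> Decimal:
    """Approximate MULTIPLY EARNINGS BY TAXRATE GIVING SOCIAL-SECUR ROUNDED."""
    # Multiply exactly, then round to the result field's precision.
    # ROUND_HALF_UP sends ties away from zero.
    return (earnings * taxrate).quantize(CENTS, rounding=ROUND_HALF_UP)
```

The COBOL version states the operands, the destination, and the rounding in a single sentence; the Python version needs an import, a constant, and a method call to say the same thing.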

So why was COBOL the scapegoat? Mar Hicks, who wrote the definitive account of this episode, argues it comes down to professional gatekeeping. A language designed to be easy to read is a threat to people whose status depends on code being hard to read. "If it was hard to write, it should be hard to read" is an old programmer joke that's barely a joke. When COBOL was created, programming was seen as feminized clerical work, and many of its early users and teachers were women. As the field professionalized and men flooded in during the 1960s, they needed programming to seem difficult, scientific, exclusive. A committee-designed language readable by managers was the opposite of what served their professional interests.[1]

The real failure wasn't COBOL. It was austerity — state governments had laid off the maintenance engineers years before the pandemic, because when infrastructure works, it's invisible, and invisible things are easy to defund. The blessing and the curse of good infrastructure is the same thing: when it works, nobody notices.

Products versus platforms

Steve Yegge's legendary internal Google rant (accidentally posted publicly in 2011) makes a complementary argument from the other direction. Amazon, he says, does almost everything wrong: terrible facilities, inconsistent hiring, a disaster of a codebase. But around 2002, Jeff Bezos issued a mandate that transformed the company: all teams must expose their data and functionality through service interfaces; no other form of interprocess communication is allowed; all interfaces must be designed to be externalizable; anyone who doesn't comply will be fired.[2]
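To make the mandate concrete, here is a minimal sketch of what "expose data only through a service interface" looks like in code. Every name in it (InventoryService, InMemoryInventory, can_fulfil) is hypothetical and chosen for illustration; it does not describe Amazon's actual services, and the in-process class merely stands in for something that would sit behind a network API.

```python
from typing import Protocol


class InventoryService(Protocol):
    """Hypothetical public interface owned by an 'inventory' team."""

    def stock_level(self, sku: str) -> int: ...


class InMemoryInventory:
    """Stand-in implementation; a real one would live behind a network API."""

    def __init__(self, stock: dict[str, int]) -> None:
        self._stock = dict(stock)  # private: callers never read this directly

    def stock_level(self, sku: str) -> int:
        return self._stock.get(sku, 0)


def can_fulfil(order: dict[str, int], inventory: InventoryService) -> bool:
    """Code owned by another team: it sees the interface, not the data store."""
    return all(inventory.stock_level(sku) >= qty for sku, qty in order.items())
```

Because can_fulfil depends only on the interface, the owning team can change its storage without breaking callers, and the same interface could in principle be offered to outside customers.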

This mandate was brutal to implement. Every peer team became a potential denial-of-service attacker. Pager escalation became nightmarish as tickets bounced through twenty services. Monitoring had to become indistinguishable from automated QA. But it forced Amazon to think about everything as a platform first, and that's what gave birth to AWS — which Yegge describes as an "incredible" ecosystem that makes Google's developer offerings look like "what your fifth-grade nephew might mock up."

Yegge's central insight is that "a product is useless without a platform, or more precisely, a platform-less product will always be replaced by an equivalent platform-ized product." Facebook wasn't successful because it built a great product. It was successful because it built a platform that let thousands of other people build products on top of it. Google, Yegge argued, didn't understand this because its own wild success with search had biased the entire culture toward thinking in terms of products.

Moxie Marlinspike's dissection of web3 a decade later arrives at the same conclusion from a completely different angle. The blockchain is supposed to be decentralized, but in practice, virtually all clients interact with it through two companies, Infura and Alchemy, because people don't want to run their own servers and never will. "A protocol moves much more slowly than a platform," Moxie writes. Email is still unencrypted after thirty years; WhatsApp went from unencrypted to full end-to-end encryption in a year. The things that actually win are the things that can iterate quickly, and iteration requires centralized control.[3]

This is the same tension Hicks identified with COBOL, viewed from the business side. Infrastructure that lasts needs to be stable and maintainable. Platforms that win need to iterate fast. These pull in opposite directions, and the history of software is largely the history of organizations navigating this tension badly.

Who captures the gains

Google's monorepo provides one answer to the maintenance question: invest massively in tooling. Their custom version control system Piper, distributed across ten data centers, serves billions of file reads per day. Their Clients in the Cloud (CitC) system lets developers see the entire codebase as a filesystem without syncing anything locally. Their Rosie tool splits repository-wide refactoring into thousands of individual patches, tests them independently, sends them for code review, and commits them automatically. When Google's compiler team improves garbage collection, every Java developer in the company benefits immediately, with no migration step, no version conflicts, and no diamond dependency problem.[4]
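The splitting step that Rosie performs can be pictured with a small sketch. This is not Google's tool or its algorithm: the function name, the choice to shard by leading directory components, and the default depth are all assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def shard_by_directory(changed_files: list[str], depth: int = 2) -> dict[str, list[str]]:
    """Group changed files into per-directory shards, one candidate patch each."""
    shards: dict[str, list[str]] = defaultdict(list)
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Key on the first `depth` directory components; files that sit
        # closer to the root than that are keyed on their parent directory.
        key = "/".join(parts[:-1][:depth]) or "."
        shards[key].append(path)
    return {key: sorted(files) for key, files in sorted(shards.items())}
```

Each shard could then be tested and reviewed on its own, so one failing or contested directory would not block the rest of a repository-wide change.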

But this is a solution available only to organizations with Google's resources. What about everyone else?

The Atlantic's profile of self-automating workers provides a grimly illuminating counterpoint. Programmers who fully automate their own jobs, writing scripts that do in ten minutes what used to take a month, face a peculiar dilemma. Some play League of Legends in their office for six years until someone notices. Others live in fear of discovery, not because the automation is bad, but because they know their employer will claim the IP, absorb the efficiency gains, and either fire them or load them with new work. One coder was fired for "insubordination" after turning a $30,000-a-year business process into a million-dollar-a-year program; the company replaced him with someone cheaper to push the button.[5]

"The gains from automation have generally been enjoyed not by those who operate the machines, but by those who own them," the article notes, quoting OECD data showing the share of income going to wages has been decreasing since the 1970s while the share going to capital has been increasing. This connects directly to the Luddite question — the original Luddites weren't anti-technology, they were anti-exploitation, and the question of who captures automation's gains remains as unresolved now as it was in 1812.

Bertrand Russell wrote in 1932 that "a great deal of harm is being done in the modern world by the belief in the virtuousness of work." Ninety years later, self-automating programmers feel guilty for not working even when their code performs flawlessly, because the cultural assumption that human labor is inherently virtuous runs deeper than any automation script can reach.

The Big Ball of Mud

There's an architecture that software engineers learn about but never put on their resumes: the Big Ball of Mud. Foote and Yoder's 1999 paper gave it a name, but the phenomenon is universal. A Big Ball of Mud is a haphazardly structured, sprawling, sloppy system whose organization, if one can call it that, is dictated more by expediency than design. They compare these systems to shantytowns: squalid and sprawling, built from common materials and unskilled labor, with little concern for infrastructure. Everyone agrees they're bad, but forces conspire to create them anyway.[6]

The paper's real contribution isn't the diagnosis but the etiology. Throwaway code takes on a life of its own because it works and nobody has time to rewrite it. Piecemeal growth erodes architectures that were once tidy. Shearing layers develop as different parts of a system evolve at different rates, creating fault lines between components. The result is a system where information is shared promiscuously among distant elements, nearly all important state is global or duplicated, and programmers with architectural sensibility refuse to work there.
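A toy illustration of the "global or duplicated state" symptom, and of the usual remedy, follows. The names are hypothetical and the example is deliberately tiny; real systems of this kind hide the same coupling across thousands of files.

```python
# Mud style: module-level mutable state that any distant function may change.
CONFIG = {"currency": "USD"}


def price_label_global(amount: float) -> str:
    # Reads shared state; the result depends on whoever last touched CONFIG.
    return f"{amount:.2f} {CONFIG['currency']}"


def unrelated_report_job() -> None:
    # A distant job changes the shared state and silently alters every
    # price label produced afterwards.
    CONFIG["currency"] = "EUR"


# Explicit style: the dependency is visible in the signature.
def price_label(amount: float, currency: str) -> str:
    return f"{amount:.2f} {currency}"
```

Nothing in the signature of price_label_global warns a maintainer that a report job elsewhere can change its output, which is the kind of hidden coupling the paper describes.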

What makes Foote and Yoder's argument subtle is their refusal to simply condemn the pattern. They note that premature architecture can be more dangerous than none at all, because unproved architectural hypotheses become straitjackets that discourage experimentation. A somewhat ramshackle system might be the state of the art for a poorly understood domain. "The class of systems that we can build at all may be larger than the class of systems we can build elegantly, at least at first." This connects directly to Software Correctness At Scale: the gap between what we can build and what we can build well keeps widening. The honest assessment is that most systems are Big Balls of Mud not because engineers are lazy, but because the economic forces (time pressure, shifting requirements, personnel turnover, cost of architecture versus cost of shipping) reliably favor expedience over elegance.[6]

Forty-year software

The most extreme case of software as infrastructure is Voyager 1. In 2017, NASA engineers fired up a set of backup thrusters on the spacecraft that had been dormant since November 1980, thirty-seven years earlier. The thrusters worked perfectly. To figure out whether this was safe to attempt, the Voyager team "dug up decades-old data and examined the software that was coded in an outdated assembler language." The spacecraft is now in interstellar space, 13 billion miles from Earth, running code that was written when disco was popular and debugged by engineers who are now retired or dead.[7]

This is what "built to last" actually looks like. Not a hip design philosophy, not a buzzword, but code that works for four decades in an environment where a maintenance visit is physically impossible. The Voyager software was written with the assumption that it would need to outlast its creators, and it has. It's the antithesis of move-fast-and-break-things, and the fact that it's running on 1970s hardware with kilobytes of memory makes it a useful corrective to the assumption that software quality requires modern tooling.

The care of systems

What ties these threads together is that software infrastructure is as much a social and political artifact as a technical one. COBOL endured because it was designed for readability and maintained by people who were paid to care for it — until they weren't. Amazon became a platform because a megalomaniac CEO issued a mandate with firing as the enforcement mechanism. Google's monorepo works because the company invests in tooling that most organizations can't afford. Self-automators hide their work because the structures of employment make honesty dangerous.

The pattern described under legibility and state power (states simplifying messy reality to make it controllable, destroying the local knowledge, or metis, that made things work) applies directly to software infrastructure. The impulse to replace "legacy" systems with shiny new ones is the software equivalent of High Modernist urban planning. Sometimes the old system, like COBOL, is ugly and unfashionable but functional. Sometimes the new system fails because it was designed by people who didn't understand what the old system actually did. The answer isn't to freeze everything in place; it's to respect the labor of maintenance as much as the labor of creation, and to notice that the most important infrastructure is always the infrastructure you've stopped thinking about.

Footnotes

  1. Built to Last by Mar Hicks

  2. Stevey's Google Platforms Rant by Steve Yegge

  3. My first impressions of web3 by Moxie Marlinspike

  4. Why Google Stores Billions of Lines of Code in a Single Repository by Rachel Potvin and Josh Levenberg

  5. The Coders Programming Themselves Out of a Job by Brian Merchant

  6. Big Ball of Mud by Brian Foote and Joseph Yoder

  7. Voyager 1 Fires Up Thrusters After 37 Years by NASA
