Data Format Design
The worst bugs in software aren't in the logic — they're in the data formats. A misunderstood schema, an ambiguous encoding, a format that looks simple but has a hundred edge cases: these cause more production incidents than algorithmic errors ever will. YAML, HTML, the clipboard API, and the "social graph" all illustrate the same fundamental tension: formats that try to be human-friendly become machine-hostile, and vice versa.
The YAML Document From Hell
Ruud van Asseldonk's catalogue of YAML footguns is the best argument I know for never using YAML for anything important. Every example is something that a reasonable person would get wrong:
Norway's country code is NO. In YAML 1.1, an unquoted NO is a boolean. So countries: [GB, IE, FR, DE, NO] parses as [GB, IE, FR, DE, false]. This is the "Norway problem," and it is a real one, not a hypothetical.1
Other implicit conversions are just as silent. YAML 1.1 reads colon-separated digits as base-60 (sexagesimal) numbers, so the string 22:22 becomes the integer 1342 (22 × 60 + 22). A leading zero means octal, so 0777 becomes 511. Version numbers like 1.10 become the float 1.1, losing the trailing zero that distinguishes version 1.10 from version 1.1. Unquoted dates such as 2001-12-14 are likewise detected and turned into timestamp values. None of these conversions produce errors; the data is silently reinterpreted.
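To make the numeric conversions concrete, here is a minimal sketch of how a YAML 1.1-style resolver might classify an unquoted scalar. It is a simplified subset written for illustration (the regular expressions follow the YAML 1.1 int and float type definitions, but binary, hex, exponents, .inf/.nan and timestamps are left out), not the code of any particular YAML library.

```python
import re
from typing import Union

# A simplified subset of the YAML 1.1 implicit int/float rules. It ignores
# binary, hex, exponents, .inf/.nan and timestamps, and it only treats
# decimals with a leading digit (such as 1.10) as floats.
OCTAL = re.compile(r"[-+]?0[0-7_]+")
SEXAGESIMAL = re.compile(r"[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+")
DECIMAL = re.compile(r"[-+]?(0|[1-9][0-9_]*)")
FLOAT = re.compile(r"[-+]?[0-9][0-9_]*\.[0-9_]*")


def resolve_number(text: str) -> Union[int, float, str]:
    """Return what a YAML 1.1-style resolver would make of an unquoted scalar."""
    plain = text.replace("_", "")
    if OCTAL.fullmatch(text):
        return int(plain, 8)  # leading zero: octal
    if SEXAGESIMAL.fullmatch(text):
        sign = -1 if plain.startswith("-") else 1
        value = 0
        for part in plain.lstrip("+-").split(":"):
            value = value * 60 + int(part)  # each colon group is a base-60 digit
        return sign * value
    if DECIMAL.fullmatch(text):
        return int(plain)
    if FLOAT.fullmatch(text):
        return float(plain)
    return text  # anything else stays a string


for scalar in ["22:22", "0777", "1.10", "8080", "GB"]:
    print(repr(scalar), "->", repr(resolve_number(scalar)))
```

Note that nothing in the function can fail: every input gets some type, which is exactly why the reinterpretation is silent.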
The boolean problem alone is astonishing. YAML 1.1 recognises y, Y, yes, Yes, YES, n, N, no, No, NO, true, True, TRUE, false, False, FALSE, on, On, ON, off, Off, and OFF as booleans. That's 22 strings that look like data but are secretly booleans. YAML 1.2 narrows this to true and false (in lowercase, capitalised, or all-caps form), but many widely used libraries, PyYAML among them, still implement 1.1, because backwards compatibility is a stronger force than correctness.1
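The difference between the two versions can be sketched as two lookup tables. The sets below spell out the YAML 1.1 bool type and the YAML 1.2 core schema; the resolve_bool helper is hypothetical, and real libraries may accept a slightly different subset.

```python
# Boolean spellings from the YAML 1.1 bool type (22 in total) versus the
# YAML 1.2 core schema. Individual libraries may accept a different subset.
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}
YAML12_TRUE = {"true", "True", "TRUE"}
YAML12_FALSE = {"false", "False", "FALSE"}


def resolve_bool(text: str, version: str = "1.1"):
    """Return True/False if `text` is a boolean under the given YAML version, else the text."""
    if version == "1.1":
        true_set, false_set = YAML11_TRUE, YAML11_FALSE
    else:
        true_set, false_set = YAML12_TRUE, YAML12_FALSE
    if text in true_set:
        return True
    if text in false_set:
        return False
    return text


countries = ["GB", "IE", "FR", "DE", "NO"]
print([resolve_bool(c) for c in countries])         # ['GB', 'IE', 'FR', 'DE', False]
print([resolve_bool(c, "1.2") for c in countries])  # ['GB', 'IE', 'FR', 'DE', 'NO']
```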
Van Asseldonk's deeper point is that these aren't bugs at all; they're features. YAML was designed to be "a human-friendly data serialization standard." The implicit typing (inferring numbers, booleans, dates from unquoted strings) is an intentional design choice to make YAML files read more naturally. But "human-friendly" and "unambiguous" are in direct tension. JSON resolved this tension by being strict and ugly. YAML resolved it by being pretty and treacherous.
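The "strict and ugly" side of that trade-off is easy to see with Python's standard json module. The parse_strict wrapper is a hypothetical helper; the behaviour it exposes is json.loads itself, which rejects a bare NO instead of guessing what it means.

```python
import json


def parse_strict(text: str):
    """Return the parsed JSON value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# A bare NO is a syntax error in JSON, not a boolean.
print(parse_strict('{"countries": ["GB", "IE", NO]}'))  # None
# Quoted strings stay strings; only true and false are booleans.
print(parse_strict('{"countries": ["NO"], "version": "1.10", "enabled": false}'))
```

JSON is not immune to numeric surprises: an unquoted 1.10 still parses as the float 1.1. The difference is that strings must always be quoted, so the author has to decide the type explicitly instead of having the parser infer it.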
The DOM Is Not What You Think
The "HTML is Dead, Long Live HTML" critique goes deeper than syntax into the layout model itself. The argument: HTML and CSS are built on a document model (the DOM) that was designed for academic papers in the 1990s, and we've been stretching that model ever since to do things it was never meant to do — application layouts, responsive design, interactive components.2
The specific technical complaints are about how CSS layout actually works. The display property conflates two different things: the element's outer display type (whether it takes part in its parent's layout as a block-level or inline-level box) and its inner display type (how it arranges its own children: flow, flex, grid). Single keywords bundle the two, so display: flex means "block outside, flex inside", and switching an inline element to flex in order to lay out its children also turns it into a block-level box in its parent. This conflation means that changing how an element is laid out internally can change how it behaves in its parent's context, producing non-local effects that are hard to reason about.
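CSS Display Level 3 makes the two halves explicit with a two-value syntax (for example display: inline flex), and each familiar single keyword is shorthand for one such pair. The sketch below models that mapping for a handful of keywords; the function names are hypothetical, and only flow, flex and grid parents are handled.

```python
# Single-keyword display values and the (outer, inner) pairs they abbreviate,
# as spelled out by the two-value syntax in CSS Display Level 3.
DISPLAY_KEYWORDS = {
    "block": ("block", "flow"),
    "inline": ("inline", "flow"),
    "flow-root": ("block", "flow-root"),
    "inline-block": ("inline", "flow-root"),
    "flex": ("block", "flex"),
    "inline-flex": ("inline", "flex"),
    "grid": ("block", "grid"),
    "inline-grid": ("inline", "grid"),
}


def decompose(display: str) -> tuple:
    """Split a display value into (outer, inner); accepts one- or two-keyword forms."""
    parts = display.split()
    if len(parts) == 2:
        return parts[0], parts[1]
    return DISPLAY_KEYWORDS[display]


def participates_as(child_display: str, parent_display: str) -> str:
    """Describe how a child takes part in its parent's layout (flow, flex, grid only)."""
    outer, _ = decompose(child_display)
    parent_inner = decompose(parent_display)[1]
    if parent_inner in ("flex", "grid"):
        return parent_inner + " item"  # the parent decides; flex and grid items are blockified
    if parent_inner in ("flow", "flow-root"):
        return outer + "-level"
    raise ValueError("this sketch does not model " + parent_inner + " containers")


# Changing an inline element to `flex` to arrange its children also changes
# its outer type, so it stops sitting inline in its parent's text.
print(decompose("inline"), "->", decompose("flex"))        # ('inline', 'flow') -> ('block', 'flex')
print(decompose("inline-flex"), decompose("inline flex"))  # both ('inline', 'flex')
```

Writing the pair out keeps the inner change from leaking into the outer one: inline-flex, or inline flex, changes only how the children are arranged.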
The position property is another example. position: absolute removes an element from the document flow and positions it relative to its nearest positioned ancestor (the nearest ancestor whose position is anything other than static). But "nearest positioned ancestor" is a surprising concept: it might be the parent, or it might be an element ten levels up the tree. And position: fixed positions relative to the viewport, except when an ancestor has a transform (or a filter or perspective), in which case the element is positioned relative to that ancestor instead. These aren't bugs; they're the accumulated complexity of a layout system that evolved by accretion rather than design.
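The lookup rule can be sketched as a walk up the ancestor chain. This is a simplified model under stated assumptions: the Element class is hypothetical, styles are plain dictionaries, and only transform, filter and perspective are checked among the properties that can capture positioned descendants.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    name: str
    style: dict = field(default_factory=dict)
    parent: Optional["Element"] = None


def traps_positioned_descendants(el: Element) -> bool:
    """transform, filter or perspective (other than none) create a containing block
    for both absolute and fixed descendants."""
    return any(el.style.get(p, "none") != "none" for p in ("transform", "filter", "perspective"))


def containing_block(el: Element) -> str:
    """Name the box that an absolute or fixed element is positioned against."""
    position = el.style.get("position", "static")
    if position not in ("absolute", "fixed"):
        raise ValueError("this sketch only models absolute and fixed positioning")
    ancestor = el.parent
    while ancestor is not None:
        if position == "absolute" and ancestor.style.get("position", "static") != "static":
            return ancestor.name  # nearest positioned ancestor, however far up
        if traps_positioned_descendants(ancestor):
            return ancestor.name  # a transformed ancestor captures even fixed elements
        ancestor = ancestor.parent
    return "viewport" if position == "fixed" else "initial containing block"


body = Element("body")
card = Element("div.card", {"transform": "translateX(0)"}, body)
toast = Element("div.toast", {"position": "fixed"}, card)
print(containing_block(toast))  # div.card, not the viewport
```

The answer depends on styles that may live far from the element being positioned, which is the non-locality the paragraph above describes.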
CSS: Separation of What Concerns?
Adam Wathan's "CSS Utility Classes and Separation of Concerns" challenges the orthodoxy that HTML should be semantic and CSS should handle presentation. His argument is that in practice the separation is illusory: "semantic" CSS classes end up tightly coupled to specific HTML structures, so the stylesheet depends on the markup, and changing the HTML requires changing the CSS and vice versa. The useful question is not whether concerns are separated but which direction the dependency runs. The supposed "separation of concerns" is actually "separation of technologies": the concerns are still entangled, just spread across two files instead of one.3
Wathan's alternative (which became Tailwind CSS) is utility classes: instead of .author-bio { display: flex; align-items: center; }, write class="flex items-center" directly in the HTML. This inverts the traditional relationship: HTML becomes the single source of truth for layout, and CSS is just a library of atomic utilities.
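The inversion can be sketched with two hypothetical lookup tables. The three utility names match Tailwind's, but the table and helper functions are illustrative only. Both approaches produce the same declarations; what changes is where the mapping lives and what a reader of the markup can see.

```python
# A hypothetical miniature utility library: each class name maps to exactly one
# declaration. The three names match Tailwind's, but the table is illustrative.
UTILITIES = {
    "flex": "display: flex",
    "items-center": "align-items: center",
    "hidden": "display: none",
}

# The "semantic" alternative: the same declarations, one lookup away.
SEMANTIC_RULES = {
    "author-bio": ["display: flex", "align-items: center"],
}


def utility_styles(class_attr: str) -> list:
    """Resolve utility classes; each name stands for one declaration, so the markup spells out the styling."""
    return [UTILITIES[name] for name in class_attr.split()]


def semantic_styles(class_attr: str) -> list:
    """Resolve semantic classes; the name says nothing about styling, so the rule must be looked up."""
    return [decl for name in class_attr.split() for decl in SEMANTIC_RULES.get(name, [])]


print(utility_styles("flex items-center"))
print(semantic_styles("author-bio"))
```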
The reaction from the CSS community was fierce, but Wathan's observation about how concerns are actually coupled is hard to argue with. The "semantic" class names that traditional CSS methodology recommends (.author-bio, .card, .hero-section) are no more semantic than utility classes — they just push the styling decisions one level of indirection away. And that indirection has a real cost: to understand what an element looks like, you have to read the HTML, find the class name, find the CSS file, find the rule, and check for specificity conflicts with other rules.
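The last step in that chain, checking for specificity conflicts, is a calculation in its own right: each selector gets an (ids, classes, types) triple, and the larger triple wins. The function below is a simplified sketch of that count (it does not give :is(), :not() or :where() their special treatment), with a hypothetical pair of rules as the example.

```python
import re

# Selector specificity as an (ids, classes, types) triple. Simplified: it does
# not give :is(), :not() or :where() their special treatment, and it counts
# legacy single-colon pseudo-elements such as :before as pseudo-classes.
TOKEN = re.compile(r"#[\w-]+|\.[\w-]+|\[[^\]]*\]|::?[\w-]+|\*|[a-zA-Z][\w-]*")


def specificity(selector: str) -> tuple:
    ids = classes = types = 0
    for tok in TOKEN.findall(selector):
        if tok.startswith("#"):
            ids += 1
        elif tok.startswith("::"):
            types += 1  # pseudo-elements count with type selectors
        elif tok[0] in ".[:":
            classes += 1  # classes, attribute selectors, pseudo-classes
        elif tok != "*":
            types += 1  # type selectors; the universal selector counts for nothing
    return (ids, classes, types)


# Two rules that both match <div class="author-bio"> inside #sidebar:
# the more specific one wins; source order only breaks ties.
rules = {".author-bio": "display: flex", "#sidebar div": "display: block"}
winner = max(rules, key=specificity)
print(winner, specificity(winner))  # #sidebar div (1, 0, 1)
```

Here the carefully named .author-bio rule loses to a rule written somewhere else for a different purpose, and nothing in the HTML hints at it.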
The Social Graph Is Neither
Maciej Ceglowski's essay on the social graph takes the format design problem into the social domain. The "social graph" — the idea of representing all human relationships as nodes and edges in a graph — sounds like clean engineering. It isn't.4
The first problem is that it's not a graph. A graph requires well-defined nodes and edges. Nodes are easy (people). But what are the edges? "Friend" means different things on Facebook, LinkedIn, and Twitter. The XFN standard tried to define universal relationship types, but the result is absurd — there's no "nemesis" or "rival" because the spec writers wanted to exclude negativity, no "fiancee" because the gendered spelling caused a spec argument, and a field for declaring yourself an alcoholic but not for mentioning that you smoke pot.4
The second problem is that it's not social. Real human relationships are contextual, asymmetric, and continuously renegotiated. I might be friends with someone at work but not outside of it. I might want to be closer friends with someone than I currently am. My relationship with my AA sponsor is not the same kind of relationship as my friendship with a college roommate, even if both edges have the label "friend." Ceglowski's point is that the entire project of representing human relationships as a static graph with labeled edges is a category error, not an engineering challenge waiting for better labels.4
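To see what the static model throws away, here is a deliberately naive sketch of it: undirected edges, one label per pair, no context. The class is hypothetical and is not offered as a fix; every attempt to patch it (direction, context, history) would just be another label, which is Ceglowski's point.

```python
class NaiveSocialGraph:
    """The model under critique: undirected edges carrying one label each, no context."""

    def __init__(self) -> None:
        self.edges: dict = {}

    def connect(self, a: str, b: str, label: str = "friend") -> None:
        # frozenset makes the edge symmetric: (a, b) and (b, a) are the same key,
        # and a second label for the same pair overwrites the first.
        self.edges[frozenset((a, b))] = label

    def relationship(self, a: str, b: str):
        return self.edges.get(frozenset((a, b)))


g = NaiveSocialGraph()
g.connect("me", "sponsor")
g.connect("me", "roommate")
print(g.relationship("me", "sponsor") == g.relationship("me", "roommate"))  # True

g.connect("ana", "ben", "mentor")         # who mentors whom? the model cannot say
print(g.relationship("ben", "ana"))       # mentor

g.connect("me", "colleague", "coworker")
g.connect("me", "colleague", "friend")    # friends outside work too? context is lost
print(g.relationship("me", "colleague"))  # friend
```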
This connects to the YAML problem and the DOM problem in a precise way: all three are cases where a formal representation (YAML's type system, HTML's layout model, the social graph's edge types) is too rigid to capture the thing it claims to represent. The result is either silent misrepresentation (YAML turning Norway into false) or a proliferation of workarounds that undermine the original design (CSS hacks for layout, "it's complicated" as a relationship status). The lesson is that format design is hard not because the formats are technically inadequate, but because the things they represent are more complex and ambiguous than any format can express.
The Web Clipboard
The web clipboard API is a microcosm of format design tension. When you copy text from a web page, the clipboard stores multiple representations: text/plain for the raw text, text/html for formatted text, and optionally image/png for images. Google Docs reads the HTML; VS Code reads the plain text. This is elegant in theory — each consumer reads the representation it understands.5
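The negotiation can be sketched as a dictionary from MIME type to payload, with each consumer walking its own preference list. The payloads and helper name below are hypothetical; in a browser the same choice is made through navigator.clipboard.read(), which yields ClipboardItem objects exposing a types list and a getType() method.

```python
from typing import Dict, List, Optional, Tuple

# One copy operation, several representations of the same selection (hypothetical payloads).
ClipboardEntry = Dict[str, str]

copied: ClipboardEntry = {
    "text/plain": "Quarterly report",
    "text/html": "<h1>Quarterly <em>report</em></h1>",
}


def read_preferred(entry: ClipboardEntry, accepted: List[str]) -> Optional[Tuple[str, str]]:
    """Return the first representation the consumer accepts, in its order of preference."""
    for mime_type in accepted:
        if mime_type in entry:
            return mime_type, entry[mime_type]
    return None


rich_editor = ["text/html", "text/plain"]  # like a document editor: prefers formatting
code_editor = ["text/plain"]               # like a code editor: wants the raw text
print(read_preferred(copied, rich_editor))
print(read_preferred(copied, code_editor))
```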
In practice, the async Clipboard API only guarantees support for those three MIME types, a restriction imposed for security reasons (arbitrary types could leak information between applications). Applications that need custom data formats, such as Figma, Google Docs, and Microsoft Office, work around this by encoding their proprietary data inside the HTML representation, either as hidden attributes or as base64-encoded blobs. This is format design failing and then being patched with hacks, which is format design's natural state.
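The hidden-attribute workaround can be sketched with the standard library alone: serialize the application's data, base64-encode it, and tuck it into an attribute of otherwise ordinary HTML. The attribute name and helper functions are hypothetical, and this is not any application's actual format. A consumer that knows the attribute recovers the data; everyone else just sees the visible text.

```python
import base64
import json
from html import escape
from html.parser import HTMLParser

PAYLOAD_ATTR = "data-app-payload"  # hypothetical attribute name, not any real app's


def to_clipboard_html(visible_text: str, app_data: dict) -> str:
    """Hide app-specific data in an attribute of otherwise ordinary HTML."""
    blob = base64.b64encode(json.dumps(app_data).encode("utf-8")).decode("ascii")
    return f'<span {PAYLOAD_ATTR}="{blob}">{escape(visible_text)}</span>'


class _PayloadFinder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.blob = None

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == PAYLOAD_ATTR and self.blob is None:
                self.blob = value


def from_clipboard_html(html_text: str):
    """Recover the hidden data, or None if the HTML came from somewhere else."""
    finder = _PayloadFinder()
    finder.feed(html_text)
    finder.close()
    if finder.blob is None:
        return None
    return json.loads(base64.b64decode(finder.blob))


shape = {"type": "rect", "width": 120, "height": 80}
html_text = to_clipboard_html("Rectangle", shape)
print(html_text)
print(from_clipboard_html(html_text) == shape)  # True
```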
Footnotes