Era of Experience

The age of training AI on human data is ending. What comes next — agents that learn primarily from their own experience interacting with environments — may be the most consequential transition in AI history. Silver and Sutton argue it will produce superhuman capabilities fundamentally unreachable through human imitation. Meanwhile, a separate group of authors argue in Nature that we've already achieved human-level general intelligence and haven't reckoned with it.

The Ceiling of Human Data

Silver and Sutton's thesis is clean: human data got us to competent, but experience will get us to superhuman.¹ LLMs trained on text can reproduce human capabilities at a high level — writing, reasoning, coding, diagnosis — but can't surpass human understanding in domains where the training data is the ceiling. The majority of high-quality data has either been consumed or soon will be.

The transition is already visible. AlphaProof started with ~100,000 human-written formal proofs, then generated 100 million more through reinforcement learning against a formal proving system, becoming the first program to achieve medal-level performance at the International Mathematical Olympiad (silver-medal standard).¹

Four dimensions where experiential agents will break through:

Streams: instead of short episodes (user asks, model responds), agents inhabit continuous streams of experience over long timescales, taking actions with long-horizon consequences.

Grounded actions and observations: beyond text-in/text-out. APIs, code execution, sensors, actuators — agents interact with digital and physical environments directly.

Grounded rewards: instead of a human expert deciding if an action was good, rewards come from the environment itself — health metrics, exam results, material strength measurements. This is the critical move: human prejudgment caps performance at human expert level.

Non-human reasoning: agents trained on human thought inherit human biases and fallacious reasoning. Experiential agents can discover better methods — possibly using symbolic, distributed, or differentiable computations that don't map to natural language at all.¹

The grounded rewards idea is genuinely radical. Instead of a human rating an AI doctor's advice, you measure the patient's actual health outcomes over months. The AI can discover treatments no human evaluator would have approved — which is both the promise and the danger. And Silver and Sutton's "reward function as neural network" proposal is elegant: a learnable function takes agent-environment interactions as input and outputs a scalar reward, allowing user goals like "improve my fitness" to be grounded in heart rate, sleep, and activity data.¹
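A minimal sketch of what a learnable, grounded reward function could look like, assuming a tiny linear model in place of the neural network the proposal describes. The field names, baselines (70 bpm, 7 hours, 30 minutes), and the single-step update rule are all hypothetical choices made for illustration; none of them come from Silver and Sutton.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DailySignals:
    """One day of grounded observations (hypothetical field names)."""
    resting_heart_rate: float  # beats per minute
    sleep_hours: float
    active_minutes: float


def features(day: DailySignals) -> List[float]:
    """Turn raw signals into rough, centered features (assumed baselines)."""
    return [
        (70.0 - day.resting_heart_rate) / 10.0,  # lower resting HR scores higher
        day.sleep_hours - 7.0,                   # hours relative to a 7h baseline
        (day.active_minutes - 30.0) / 30.0,      # activity relative to 30 minutes
    ]


def reward(weights: List[float], day: DailySignals) -> float:
    """Scalar reward: a weighted sum of the grounded features."""
    return sum(w * f for w, f in zip(weights, features(day)))


def update(weights: List[float], day: DailySignals,
           feedback: float, lr: float = 0.1) -> List[float]:
    """One gradient step on squared error between reward and user feedback.

    The occasional user rating steers the reward function; the day-to-day
    reward itself still comes from the measured signals.
    """
    error = reward(weights, day) - feedback
    return [w - lr * error * f for w, f in zip(weights, features(day))]


day = DailySignals(resting_heart_rate=60.0, sleep_hours=7.5, active_minutes=60.0)
weights = [0.0, 0.0, 0.0]
weights = update(weights, day, feedback=1.0)  # user says: "that was a good day"
print(reward(weights, day))
```

The design point is the split of responsibilities: the environment supplies the measurements, while sparse human feedback only adjusts how those measurements are weighted.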

But Goodhart's law applies: any grounded signal can be gamed. An agent optimizing health metrics might do things that look good on paper but aren't actually healthy. And if agents develop non-human reasoning, we lose the ability to monitor their thought processes by reading their chain-of-thought. The era of experience might make alignment fundamentally harder.
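A toy illustration of the Goodhart problem, assuming a step counter as the grounded signal. The actions and numbers are invented for the example; the only point is that the action that maximizes the measured proxy need not be the one that maximizes the thing the proxy was meant to track.

```python
# Each action maps to (measured_steps, true_health_gain).
# All values are made up for illustration.
ACTIONS = {
    "walk_30_minutes": (3000, 1.0),
    "shake_phone_to_fake_steps": (10000, 0.0),
    "skip_sleep_to_walk_more": (8000, -1.0),
}


def best_action(column: int) -> str:
    """Return the action that maximizes the chosen column of ACTIONS."""
    return max(ACTIONS, key=lambda name: ACTIONS[name][column])


proxy_choice = best_action(0)  # what an agent rewarded on step count picks
true_choice = best_action(1)   # what actually helps the person
print(proxy_choice, true_choice)
```

An agent trained only on the first column has no way to tell that its preferred action is worthless, which is why the choice and auditing of grounded signals matters as much as the learning algorithm.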

The AGI Question

Belkin et al.'s Nature paper argues that by reasonable standards — including Turing's own — we already have artificial general intelligence.² Their "cascade of evidence" framework proposes three levels: Turing-test level (basic school education, adequate conversation), expert level (gold medals, PhD exam performance, frontier research assistance), and superhuman level (revolutionary discoveries). Current LLMs cover the first two.

The paper systematically addresses ten objections. The most interesting move: "hypotheses that retreat before each new success, always predicting failure just beyond current achievements, are not compelling scientific theories, but a dogmatic commitment to perpetual scepticism."² They also frame the resistance to AGI recognition as partly conceptual (moving goalposts), partly emotional (fear of displacement), and partly commercial (companies use "AGI" as a milestone in business contracts).²

I think the paper is broadly right that current LLMs meet reasonable criteria for general intelligence, but it somewhat sidesteps the most interesting counterargument: in a March 2025 survey, 76% of leading AI researchers said that scaling current approaches would be unlikely to yield AGI. Those researchers may be right that scaling alone won't do it, which is exactly Silver and Sutton's point about needing the shift to experience.¹ The researchers and the paper's authors might both be correct, just talking past each other.

World Emulation: Experience Made Physical

A concrete preview of what experiential learning might look like comes from an unexpected direction: someone walking through a forest with a phone, recording 15 minutes of video, and training a neural network to turn those recordings into a playable world.³

Ollin Boer Bohan's "world emulation via neural network" is technically modest — a 5M parameter model generating 192x256 frames at 60fps, trained on 22,814 frames for about $100 of GPU time. The result is blurry, melty at the edges, and sometimes hallucinates geometry. But the conceptual move matters: this is a world model trained purely from sensory experience, generating new environments by extrapolating from what it observed. No level geometry, no code for lighting, no scripted animation. Just a neural net in a loop, turning previous frames and control inputs into next frames.³
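The "neural net in a loop" idea can be sketched as an autoregressive rollout. This is a schematic under stated assumptions, not Bohan's implementation: `rollout`, `toy_model`, and `context_len` are hypothetical names, and the stand-in model treats each "frame" as a single number so the loop runs without any ML library.

```python
from collections import deque
from typing import Callable, List, Sequence


def rollout(model: Callable[[List[float], float], float],
            initial_frames: Sequence[float],
            controls: Sequence[float],
            context_len: int) -> List[float]:
    """Generate one new frame per control input.

    The model only ever sees the last `context_len` frames plus the current
    control, and its own outputs are fed back in as future context.
    """
    history = deque(initial_frames, maxlen=context_len)
    generated = []
    for control in controls:
        next_frame = model(list(history), control)
        history.append(next_frame)  # the prediction becomes part of the context
        generated.append(next_frame)
    return generated


def toy_model(frames: List[float], control: float) -> float:
    """Stand-in for a trained network: shift the last frame by the control."""
    return frames[-1] + control


frames = rollout(toy_model, initial_frames=[0.0, 0.0],
                 controls=[1.0, 1.0, -1.0], context_len=2)
print(frames)
```

Because every generated frame re-enters the context, small errors can compound over a long rollout, which is one plausible reason the results look blurry and melty at the edges.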

The analogy Bohan draws is to photography. Traditional game worlds are like paintings — every lifelike detail is there because an artist put it there. Neural worlds are like photographs — information flows from sensor to screen without passing through human hands. Early cameras barely worked and the photos they took were not lifelike at all. "The exciting part was that cameras reduced realistic-image-creation from an artistic problem to a technological one." Neural worlds are at the daguerreotype stage. The Bitter Lesson predicts they'll improve relentlessly.³

This connects to Silver and Sutton's framework directly: a world model trained on sensory experience is an agent learning grounded observations from environmental interaction, just at the simplest possible level. Scale the model, extend the experience, add grounded rewards, and you're in the Era of Experience.

What Counts as Intelligence?

A counterpoint to the "superhuman AI" narrative comes from Sarah Constantin's 2019 observation about GPT-2 that still resonates: humans who are not concentrating are not general intelligences either.⁴

GPT-2's text flowed perfectly if you skimmed it. The unicorn article read like a real science press release. But if you actually read it — four-horned unicorns on the same page as one-horned unicorns, circular origin stories — it fell apart. Constantin's key insight: she knew people who, "even when asked to try to find flaws, could not detect anything weird or mistaken in the GPT-2-generated samples." The problem wasn't that AI was smart enough to fool attentive humans. It was that most human cognition, most of the time, is not attentive.⁴

Robin Hanson's "Better Babblers" provides the frame: most human speech is generated by low-order correlations — knowing which words go together, which combinations sound positive, which phrases follow which. A professor's median student learns "a set of low order correlations" rather than deep structure. TED talks, political punditry, polite conversation — all babbling, in Hanson's taxonomy, and all equally achievable by sufficiently large language models.⁴
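To make "low-order correlations" concrete, here is a minimal bigram babbler: it knows only which word has followed which, and nothing else. The corpus and function names are hypothetical, and this is the simplest possible instance of the idea, not a claim about how Hanson or Constantin would formalize it.

```python
import random
from collections import defaultdict
from typing import Dict, List


def train_bigrams(text: str) -> Dict[str, List[str]]:
    """Record, for every word, the words that followed it in the text."""
    words = text.split()
    table: Dict[str, List[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table


def babble(table: Dict[str, List[str]], start: str,
           length: int, seed: int = 0) -> str:
    """Emit up to `length` words, each chosen only from the previous word."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = table.get(out[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        out.append(rng.choice(followers))
    return " ".join(out)


corpus = "the model reads the text and the model writes the text"
table = train_bigrams(corpus)
print(babble(table, start="the", length=8))
```

Every adjacent pair in the output is locally plausible because it occurred in the corpus, yet nothing enforces coherence across the whole sentence. That is the skim-versus-read gap described above, in miniature.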

This reframes the AGI debate in a way that both sides should find uncomfortable. The people arguing that LLMs are approaching general intelligence are partly right: they can do everything that human babbling does, and babbling accounts for a terrifyingly large fraction of what we call "intelligence." The people arguing that LLMs are "just" pattern-matching are also partly right, but the same accusation applies to humans when they're not concentrating, which is most of the time. The real question isn't whether AI is intelligent, but whether the focused, effortful, error-detecting mode of human cognition (the mode in which you notice that day can't turn to dusk in the morning) can be replicated by more scale, or whether it requires something architecturally different.⁴

Human Indifference to the Transition

A sharp counterpoint to the breathless narratives about AI's transformative potential comes from Ponnekanti's "The Indifference Engine," which observes that humans have a remarkable capacity to absorb technological change without registering it as change.⁵ The essay argues that indifference — not excitement, not fear — is the dominant human response to transformative technology. We adapted to electricity, to the internet, to smartphones, each time with a brief period of wonder followed by rapid normalization. The AI transition may follow the same pattern: not the dramatic upheaval that both optimists and doomers predict, but a gradual absorption into the background of everyday life, noticed only in retrospect.

This is relevant to the Era of Experience thesis because it suggests a specific failure mode: the transition from human-data AI to experiential AI might happen not with a bang but with a long series of incremental improvements that nobody outside the field pays attention to until the capabilities are already embedded in critical systems. The indifference engine runs on habituation — each incremental improvement is small enough to normalize, even if the cumulative change is enormous.⁵

Claude's Cycles: A Concrete Example

Don Knuth — a man not given to hyperbole — provides a striking data point for the experience-over-data thesis. In testing Claude Opus 4.6, he presented it with an open mathematical problem he'd been working on, and the model found a solution that Knuth verified as correct.⁶ What makes this significant isn't just that an AI solved a math problem (that's been happening since AlphaProof); it's the kind of problem — one requiring sustained creative reasoning over multiple steps, with the model exploring blind alleys and backtracking, in a way that looked more like a mathematician thinking than a calculator computing. Knuth's paper documents the model's reasoning process in detail, and his assessment is characteristically precise: this is genuine mathematical problem-solving, not pattern-matching against training data.

This connects to Silver and Sutton's "non-human reasoning" dimension. If Claude is solving problems that its training data doesn't contain solutions for, it's doing something beyond reproducing human thought: it's generating novel reasoning, possibly through mechanisms that don't map to natural language at all. Whether this constitutes "experience" in Silver and Sutton's sense is debatable (it still happens within a single problem-solving session, not a long-running stream of grounded interaction), but it suggests that the boundary between "trained on human data" and "reasoning beyond human data" is already blurring.⁶

The deeper question: is there a meaningful difference between "agents that learn from experience" and "agents that are alive"? The continuity from bacterial cognition to human intelligence to experiential AI starts to feel less like analogy and more like a genuine spectrum. Streams of experience, grounded rewards, long-horizon goals, continuous adaptation — this starts to sound like a description of organisms, not tools.

1. The Era of Experience by David Silver & Richard Sutton
2. Does AI already have human-level intelligence? by Mikhail Belkin et al.
3. World Emulation via Neural Network by Ollin Boer Bohan
4. Humans Who Are Not Concentrating Are Not General Intelligences by Sarah Constantin
5. The Indifference Engine by Ponnekanti
6. Claude's Cycles by Donald Knuth
