Bayesian Epistemology

A mathematical framework for rational belief that was discovered, abandoned, rediscovered, classified, persecuted, and eventually vindicated — with real casualties along the way. The history alone would be worth studying. But the framework also turns out to be a surprisingly powerful lens for understanding broken cognition: delusions, schizophrenia, and the strange specificity of what goes wrong when Bayesian updating fails in the brain.

The Framework in Three Minutes

Two norms.¹ Probabilism: your credences (degrees of belief) should form a coherent probability distribution — non-negative, summing to one. Conditionalization: when you get new evidence, zero out possibilities incompatible with it and rescale the rest. That's it. Everything else follows.
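As a minimal sketch of the conditionalization rule just described, assume a finite set of possibilities (the faces of a fair die, a hypothetical example not taken from the source). The function name `conditionalize` and the use of exact fractions are illustrative choices.

```python
from fractions import Fraction


def conditionalize(credences, compatible):
    """Zero out possibilities incompatible with the evidence, then rescale the rest.

    credences: dict mapping each possibility to its probability (summing to one).
    compatible: function returning True for possibilities the evidence allows.
    """
    kept_total = sum(p for world, p in credences.items() if compatible(world))
    if kept_total == 0:
        raise ValueError("evidence has zero prior probability; cannot rescale")
    return {
        world: (p / kept_total if compatible(world) else Fraction(0))
        for world, p in credences.items()
    }


# Hypothetical example: a fair die, then the evidence "the roll is even".
prior = {face: Fraction(1, 6) for face in range(1, 7)}
updated = conditionalize(prior, lambda face: face % 2 == 0)
print(updated)
```

The odd faces drop to zero and the three even faces share the remaining probability equally, so the result is still a coherent distribution.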

Surprising evidence is powerful evidence — this falls out of the math automatically. Eddington's observation of light bending during a 1919 solar eclipse was deeply surprising, which meant conditionalization on it required a dramatic credence shift toward General Relativity. If you'd already expected the result, it wouldn't have moved the needle.¹
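The arithmetic behind "surprising evidence is powerful" can be sketched with Bayes' theorem in the form P(H|E) = P(H) × P(E|H) / P(E). The numbers below are made up for illustration and are not estimates of anyone's actual credences about General Relativity in 1919.

```python
def posterior(prior_h, likelihood_e_given_h, prob_e):
    """Bayes' theorem: P(H|E) = P(H) * P(E|H) / P(E)."""
    joint = prior_h * likelihood_e_given_h
    if not 0 < prob_e <= 1 or joint > prob_e:
        raise ValueError("P(E) must lie in (0, 1] and be at least P(H) * P(E|H)")
    return joint / prob_e


# Hypothetical numbers: the same prior and likelihood, two levels of surprise.
expected_evidence = posterior(prior_h=0.1, likelihood_e_given_h=0.9, prob_e=0.9)
surprising_evidence = posterior(prior_h=0.1, likelihood_e_given_h=0.9, prob_e=0.15)

print(round(expected_evidence, 3))    # evidence everyone expected barely moves the credence
print(round(surprising_evidence, 3))  # the same evidence, if surprising, moves it a lot
```

With everything else held fixed, the lower the overall probability P(E) of the evidence, the larger the jump from prior to posterior.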

The foundations are contested in interesting ways. Dutch Book arguments say your credences should be probabilistic because otherwise a bookie can construct bets that guarantee you lose money. Accuracy-dominance arguments say non-probabilistic credences are always dominated in accuracy by some probabilistic alternative. Both have problems — Dutch Books give pragmatic rather than epistemic reasons; accuracy arguments depend on particular scoring rules. And the problem of priors remains genuinely deep: the two core norms are too weak to tell you what prior to start with.¹
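A toy version of the Dutch Book argument, under the usual simplifying assumption that an agent treats a bet paying $1 if X as fairly priced at its credence in X. The incoherent credences (60% in A and 60% in not-A) are a hypothetical example.

```python
from fractions import Fraction


def agent_net_payoff(price_a, price_not_a, a_is_true):
    """Net result for an agent who buys a $1 bet on A and a $1 bet on not-A
    at the prices it regards as fair (its own credences)."""
    cost = price_a + price_not_a
    winnings = (1 if a_is_true else 0) + (0 if a_is_true else 1)  # exactly one bet pays
    return winnings - cost


# Hypothetical incoherent agent: credences in A and not-A sum to 6/5.
incoherent = [agent_net_payoff(Fraction(3, 5), Fraction(3, 5), world) for world in (True, False)]
# Coherent comparison: credences sum to one, so the bookie has no guaranteed edge.
coherent = [agent_net_payoff(Fraction(3, 5), Fraction(2, 5), world) for world in (True, False)]

print(incoherent)
print(coherent)
```

The incoherent agent loses the same amount whether A turns out true or false, which is the guaranteed loss the argument points to. The coherent agent breaks even on this pair of bets.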

The History as Thriller

This is where it gets wild.²

Thomas Bayes discovered the core idea in the 1740s and then abandoned it. It was found in his papers after his death and published by his friend Richard Price. Laplace independently rediscovered it, gave it modern form, applied it to everything from birth records to planetary masses, then moved on to frequentist methods when his central limit theorem showed the approaches converged with enough data.

For the next century, Bayesianism was declared dead. George Chrystal in 1891: Laplace's principle "being dead, should be decently buried out of sight." Fisher's frequentism became the standard, and Fisher — the arch-frequentist — wielded his influence with quasi-religious fervor.

And yet practitioners kept finding that Bayes worked when nothing else did. French artillery officers used it to aim their guns. Poincaré used it to defend Dreyfus. Insurance actuaries used it because Fisher's maximum likelihood gave zero probability to non-events, producing premiums too low to cover future costs.²

Then came the war. Alan Turing used Bayesian methods to crack Enigma, inventing the "ban" as a unit of evidence and using sequential updating to reduce the number of wheel orders to be tested from 336 to as few as 18. This was arguably the most consequential application of a mathematical idea in human history. Churchill ordered all evidence destroyed. For decades afterward, Bayesianism remained classified or closeted.²
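To make the "ban" concrete: it is the base-10 logarithm of a likelihood ratio, so independent pieces of evidence add up instead of multiplying. The sketch below uses made-up likelihood ratios, not figures from Bletchley Park, and the function names are illustrative.

```python
import math


def bans(likelihood_ratio):
    """Weight of evidence in bans: log base 10 of the likelihood ratio."""
    return math.log10(likelihood_ratio)


def total_decibans(likelihood_ratios):
    """Sequential updating in log space: weights of evidence simply add."""
    return 10 * sum(bans(ratio) for ratio in likelihood_ratios)


def posterior_odds(prior_odds, likelihood_ratios):
    """Convert the accumulated weight of evidence back into odds."""
    return prior_odds * 10 ** sum(bans(ratio) for ratio in likelihood_ratios)


# Hypothetical clues favouring one hypothesis by factors of 2, 5 and 10.
clues = [2, 5, 10]
print(round(total_decibans(clues), 6))
print(round(posterior_odds(1 / 1000, clues), 6))
```

Working in bans turns each update into an addition, which matters when the tallying is done by hand.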

John Tukey predicted elections for NBC using Bayesian methods for 18 years while publicly denying he was a Bayesian and forbidding his team from discussing their methods. Norman Rasmussen used Bayes to estimate nuclear plant accident risk — something that had never happened, so frequentist methods literally couldn't apply — but avoided the word "Bayes" in his report because it would have been dismissed.²

Fisher, meanwhile, couldn't accept that cigarettes caused lung cancer and hypothesized that lung cancer might cause smoking. Cornfield used Bayesian methods to establish the link. The frequentist-Bayesian war had literal public health consequences.²

The rehabilitation came through computation. Markov Chain Monte Carlo methods (which grew out of Monte Carlo simulation developed for nuclear weapons work in the 1940s and early 1950s, and were later rediscovered by statisticians) finally made Bayesian inference computationally tractable for complex problems. By the late 20th century, Bayes was everywhere, but the sociology of the suppression is worth remembering. A fundamental tool of rationality was repeatedly discovered, lost, and buried for reasons that were largely social rather than mathematical.

Broken Bayes: Delusions as Failed Updating

Scott Alexander's synthesis of the two-factor theory of delusions is where the framework gets genuinely surprising.³

Take Capgras syndrome: the patient believes their spouse has been replaced by an impostor. The first factor is an abnormal perception — Capgras patients lose the emotional response to familiar faces, so seeing their wife genuinely doesn't feel right. But this alone shouldn't produce delusion. Most people with the same emotional disconnection (it happens in some brain injuries) don't conclude their spouse is an impostor, because the prior probability of that hypothesis is absurdly low.

The second factor is damage to the right dorsolateral prefrontal cortex, which McKay argues handles priors. Without the ability to weigh base rates, patients fall into what amounts to a Super Base Rate Fallacy: they select hypotheses solely on explanatory adequacy (likelihood), ignoring how implausible the hypothesis is to begin with. The impostor hypothesis perfectly explains the emotional flatness, so they adopt it.³
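The difference between selecting on likelihood alone and selecting on prior times likelihood can be sketched as follows. The hypotheses, priors, and likelihoods are invented for illustration and are not clinical estimates.

```python
# Each hypothesis maps to (prior, likelihood of "my spouse's face feels wrong").
# All numbers are hypothetical, chosen only to show the structure of the error.
hypotheses = {
    "spouse replaced by impostor": (1e-9, 0.99),
    "my emotional response to faces is impaired": (1e-4, 0.90),
    "nothing unusual is going on": (1 - 1e-4 - 1e-9, 1e-6),
}


def best_by_likelihood(candidates):
    """Pick the hypothesis that best explains the evidence, ignoring priors."""
    return max(candidates, key=lambda name: candidates[name][1])


def best_by_posterior(candidates):
    """Pick the hypothesis with the largest prior * likelihood
    (proportional to the posterior)."""
    return max(candidates, key=lambda name: candidates[name][0] * candidates[name][1])


print(best_by_likelihood(hypotheses))
print(best_by_posterior(hypotheses))
```

The impostor story wins on fit alone, but once the prior is multiplied in, the mundane explanation dominates by several orders of magnitude.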

And new evidence can't rescue them, which is the horrifying part. If you can't penalize complexity, then for any counter-evidence, the most explanatorily adequate hypothesis is always a more elaborate version of the delusion. The impostor is also telepathic. The doctor is in on the conspiracy. You're optimizing for fit without any brake on complexity, and the result is a belief system that's perfectly fitted to its evidence and completely insane. (This is the Bayesian version of being stuck in a personality basin, and unlike personality basins, there's no gradient update that can get you out, because the basin absorbs all evidence.)³

The same brain region, the right dorsolateral prefrontal cortex (RDPC), is shut down during dreaming, which might explain why we don't notice our dreams are absurd. We accept dream logic because the prior-enforcement module is offline. Lucid dreaming involves switching it back on. The RDPC might literally be the brain's "prior enforcement" module.³

There's an n=1 case Alexander mentions of a rationalist with schizophrenia using Bayes to convince themselves a delusion was false — explicitly computing that the prior probability of being monitored by the government was lower than the prior probability of psychotic symptoms causing false beliefs about being monitored. If this is replicable, it would be one of the most practically important applications of rationality training ever demonstrated.³

Beyond Empirical Uncertainty: Logical Induction

Standard Bayesian reasoning handles empirical uncertainty well — you don't know whether it will rain, but you can update on weather data. But what about logical uncertainty? You don't know whether Goldbach's conjecture is true, and no amount of evidence will help — the answer is fixed by mathematics, you just can't compute it yet.

Solomonoff's theory of inductive inference offered an (uncomputable) ideal for empirical uncertainty, but had nothing to say about uncertainty over mathematical and logical claims. MIRI's logical induction framework, developed primarily by Scott Garrabrant, closes this gap.⁴

The setup is beautifully market-flavored: each logical sentence is treated as a stock worth $1 if true and $0 if false. A reasoner's beliefs become market prices — believing a claim at 50% means shares trade at 50 cents. The key criterion: no polynomial-time trading strategy with finite risk tolerance should be able to earn unbounded profits from the market over time. This is the Dutch Book argument extended from probability to logic.⁴

What falls out is remarkable. A system satisfying this criterion will, among other properties: assign high probabilities to provable conjectures before the proofs are found (it learns to trust patterns like "Ramanujan's conjectures keep turning out to be true"); respect logical relationships between sentences long before they're proven (it learns that "this program outputs 3" and "this program outputs 4" are mutually exclusive); become well-calibrated (on sequences it assigns ~30% probability to, about 30% turn out true); and — perhaps most strikingly — learn to trust its own future beliefs, sidestepping the paradoxes of self-reference that plague naive reflection.⁴
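The calibration property mentioned above can be checked for any forecaster with a simple table. This sketch applies to ordinary probability forecasts and is not an implementation of logical induction; the data and the name `calibration_table` are hypothetical.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into equal-width probability bins.

    Returns a list of (mean_forecast, observed_frequency, count) tuples,
    one per non-empty bin, ordered from low to high probability.
    """
    bins = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[index].append((p, happened))
    table = []
    for index in sorted(bins):
        items = bins[index]
        mean_forecast = sum(p for p, _ in items) / len(items)
        observed = sum(1 for _, happened in items if happened) / len(items)
        table.append((mean_forecast, observed, len(items)))
    return table


# Hypothetical record: ten 30% forecasts (3 came true), five 80% forecasts (4 came true).
forecasts = [0.3] * 10 + [0.8] * 5
outcomes = [True] * 3 + [False] * 7 + [True] * 4 + [False]
for row in calibration_table(forecasts, outcomes):
    print(row)
```

A well-calibrated forecaster is one whose observed frequency in each row stays close to the mean forecast for that row.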

The framework is theoretical rather than practical (the algorithm is computable but wildly inefficient). But it suggests something deep: the "no Dutch Book" principle that motivates Bayesian probability theory may be just the beginning of a much richer set of rationality constraints that emerge from the idea of not being exploitable by an efficient adversary. Where classical Bayesian epistemology gives you norms for ideal agents with unlimited computation, logical induction begins to characterize what good reasoning looks like for agents with real computational limits — which is to say, all of us.

Bayes in the Brain: Neurotransmitters as Priors

The framework stops being abstract when you realize the brain might literally implement Bayesian inference at the biochemical level. Scott Alexander's synthesis of Corlett, Frith & Fletcher (2009) maps specific neurotransmitters onto specific terms in Bayes' theorem.⁵

In their model, perception is a "handshake" between bottom-up sensory data (glutamate via AMPA receptors) and top-down predictions (glutamate via NMDA receptors). Dopamine encodes precision: the confidence interval on a given prediction or perception, which determines how large a mismatch counts as a prediction error. When the handshake succeeds, the prediction error is small and experience feels normal. When it fails, something registers as salient and surprising, demanding attention and model revision.

The pathological cases are revealing. Increase AMPA (more sensory noise) while decreasing NMDA (weaker priors), and you get the delusions of reference characteristic of schizophrenia — random stimuli get flagged as deeply significant because noisy bottom-up signals overwhelm timid top-down models. This predicts that ketamine (which has exactly this pharmacological profile) should produce paranoid psychosis, which it does. It also predicts that schizophrenics should be better at seeing through certain optical illusions — like the hollow-face illusion, where a concave mask looks convex because the brain's strong prior for right-side-out faces overrides the visual evidence. Schizophrenics and marijuana users see through it more reliably than neurotypicals, because their prior-enforcement module is running at lower gain.⁵
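One way to picture the hollow-face result is the textbook precision-weighted combination of a Gaussian prior with a Gaussian observation. This is a generic toy model, not the model from Corlett, Frith & Fletcher. The depth coding (+1 for convex, -1 for concave) and all precision values are arbitrary assumptions.

```python
def fuse(prior_mean, prior_precision, sensory_value, sensory_precision):
    """Combine a Gaussian prior with a Gaussian observation.

    Precision is inverse variance. The posterior mean is the
    precision-weighted average of the prior mean and the observation.
    """
    posterior_precision = prior_precision + sensory_precision
    posterior_mean = (
        prior_precision * prior_mean + sensory_precision * sensory_value
    ) / posterior_precision
    return posterior_mean, posterior_precision


# Hypothetical coding: +1 means "face bulges outward", -1 means "face is hollow".
# The mask really is hollow, so the sensory signal says -1.
strong_prior_percept, _ = fuse(prior_mean=1.0, prior_precision=9.0,
                               sensory_value=-1.0, sensory_precision=1.0)
weak_prior_percept, _ = fuse(prior_mean=1.0, prior_precision=0.25,
                             sensory_value=-1.0, sensory_precision=1.0)

print(round(strong_prior_percept, 3))  # positive: the illusion holds
print(round(weak_prior_percept, 3))    # negative: the hollow mask is seen as hollow
```

Turning down the prior's precision is enough to flip the percept from the expected convex face to the concave one that is actually there.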

Increase both AMPA and NMDA — more noise but also stronger pattern-matching — and you get the vivid hallucinations of LSD: abundant data demanding explanation, coupled with an overenthusiastic narrative engine that fits everything into a grand cosmic pattern. Turn the NMDA dial far enough and the entire world collapses into a single unified meaning — the mystical experience, which is Bayesian overfitting taken to its logical extreme.⁵

For autism, Lawson, Rees and Friston propose the opposite problem: confidence intervals that are too narrow. The autistic brain demands near-exact matches between prediction and perception. A neurotypical brain would shrug off the slightly-different feeling of a shirt shifting on skin — close enough, handshake succeeds. An autistic brain flags it as a prediction error demanding attention. This explains sensory sensitivity, routine-seeking, and stimming (which generates maximally predictable sensory input that drives prediction error to zero). It even explains social difficulty: other people are the least predictable things in the environment, and narrow confidence intervals make unpredictable stimuli overwhelming.⁵
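The narrow-interval idea can be sketched as a tolerance band around each prediction. This is a toy illustration only: the deviations and band widths are invented, and the proposal by Lawson, Rees and Friston is far richer than a fixed threshold.

```python
def count_flagged(deviations, half_width):
    """Count observations that fall outside prediction +/- half_width.

    deviations: differences between what was predicted and what was sensed.
    half_width: how much mismatch is tolerated before an error is flagged.
    """
    return sum(1 for deviation in deviations if abs(deviation) > half_width)


# Hypothetical stream of small everyday mismatches (a shirt shifting, a hum changing pitch).
deviations = [0.05, 0.3, -0.2, 0.02, 0.6]

wide_interval_errors = count_flagged(deviations, half_width=1.0)
narrow_interval_errors = count_flagged(deviations, half_width=0.1)

print(wide_interval_errors, narrow_interval_errors)
```

The same sensory stream produces no flagged errors under a wide band and several under a narrow one, which is the sense in which narrow confidence intervals make an ordinary environment feel noisy.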

This is the same framework described in Predictive Processing, but the biochemical specificity adds something. If dopamine is literally encoding precision, then amphetamines (which boost dopamine signaling) should narrow confidence intervals and produce increased self-confidence, which they do, right up to grandiose delusions. If NMDA-mediated top-down prediction is suppressed during dreaming, that would explain why we accept absurd dream logic without question: prior enforcement is offline, much as it is in the Capgras syndrome patients discussed above.⁵

I should note: this is speculative neuroscience, and Alexander himself flags his uncertainty. The model has clear gaps — why do psychotics develop stable delusions (strong priors) if the problem is supposed to be weak priors? Why don't autistic sensory-prediction-errors produce delusions of reference? But as a framework for thinking about how Bayesian inference might be physically implemented, and how it breaks in different ways depending on which parameters you perturb, it's remarkably generative.

Different Worlds as Different Priors

If priors are implemented in brain chemistry, and brain chemistry varies across individuals, then people inhabiting the same physical environment may be parsing it with genuinely different Bayesian machinery. Alexander's "Different Worlds" essay pushes this to its logical conclusion.⁶

A paranoid schizophrenic interprets every ambiguous social signal negatively. A Williams Syndrome patient — biologically incapable of distrust — gets into a stranger's car without hesitation. These are extreme cases, but they sit on a continuum. Between clinical paranoia and pathological trust, there's a vast range of healthy variation in how people weight ambiguous social data. A boss calling your work "okay" could be a compliment or an insult; a friend cancelling plans might mean something came up or the friendship is dying. The correct interpretation depends on your priors, and priors vary.⁶

Alexander noticed this in his own psychiatric practice: his patients never had the dramatic emotional meltdowns the textbooks predicted, while his colleague's patients had them constantly. Eventually he realized he was unconsciously projecting a "niceness field" — a set of social signals that made people calm and rational around him, the same way his colleague's warmth invited emotional expression. Neither was wrong. They were generating different data by existing differently in the world.⁶

This has implications for every debate about subjective experience. Some people describe a world of backstabbing Machiavellians; others describe basically-good people hampered by communication difficulties. Both are accurately reporting their experience. The variation isn't just perceptual — it's also generative. The same person who perceives others as hostile may unconsciously elicit hostility, confirming their priors in a self-reinforcing loop. This is the Bayesian version of a self-fulfilling prophecy, and breaking out of it requires the kind of prior-overriding that calibration-and-measurement training tries to cultivate.

Bayes in Practice: Calibration and Common Errors

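The practical side of Bayesian reasoning shows up not in abstract philosophy but in the everyday failure mode of confusing P(H|E) with P(E|H). In the classic version of the problem, 1% of women screened have breast cancer, a mammogram detects 80% of cancers, and it gives a false positive for 9.6% of healthy women. A woman tests positive. Most doctors estimate her cancer probability at 70-80%. The correct answer is 7.8%, because the base rate (the prior) is only 1%, so false positives among the many healthy women far outnumber true positives among the few with cancer. This error kills people, and studies show that only about 15% of doctors get it right, a finding that has replicated for decades.⁷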
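The 7.8% figure can be reproduced with a few lines of arithmetic. The sketch assumes the numbers from the standard version of this problem: 1% prevalence, 80% sensitivity, and a 9.6% false-positive rate. The false-positive rate is an assumption if the passage above does not state it, since the answer cannot be computed without one.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(cancer | positive test) via Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Assumed inputs: 1% base rate, 80% sensitivity, 9.6% false-positive rate.
p_cancer = posterior_given_positive(prevalence=0.01, sensitivity=0.80,
                                    false_positive_rate=0.096)
print(f"{p_cancer:.1%}")  # 7.8%
```

In natural frequencies: of 10,000 women screened, 100 have cancer and 80 of them test positive, while about 950 of the 9,900 healthy women also test positive. A positive result therefore picks out someone with cancer only about 80 times in 1,030.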

The same structure underlies the delusion analysis above. In Capgras syndrome, the patient selects the hypothesis with highest likelihood (the impostor explanation perfectly fits the emotional flatness) while ignoring how catastrophically implausible the hypothesis is a priori. Doctors making the mammography error are doing the same thing in miniature — treating P(positive|cancer) = 80% as if it were P(cancer|positive). The base rate fallacy may be the single most consequential failure of human reasoning, showing up everywhere from medical diagnosis to courtroom evidence to intelligence analysis.

For the practical art of calibrating your own Bayesian priors and reducing this kind of error, see Calibration And Measurement.

¹ Bayesian Epistemology, Stanford Encyclopedia of Philosophy
² A History of Bayes' Theorem, by lukeprog
³ Bayes for Schizophrenics, by Scott Alexander
⁴ Logical Induction, by Nate Soares (MIRI)
⁵ It's Bayes All The Way Up, by Scott Alexander
⁶ Different Worlds, by Scott Alexander
⁷ Warning Signs in Experimental Design and Interpretation, by Peter Norvig

