AI and Language Models

The AI section is organized around a single conceptual move that changes everything else: Janus's reframing of base LLMs as simulators rather than agents. Once you see a language model as a generative model that instantiates characters (simulacra) depending on the prompt — rather than a mind with opinions — the questions shift. Not "what does the model think?" but "what distribution over text-generating processes does the prompt select?" The rest of the section works out the implications.

The Simulator Framework

Simulators And Simulacra is the hub. The Three Aspects framework (Persona, Author, Shoggoth) dissects fine-tuned models. The semiotic landscape from The Waluigi Effect explains why training for property P makes anti-P easier to elicit: the Waluigi is nearby in character space, defined by the same boundaries. The Textual Multiverse makes the branching structure of possible completions tangible through the Loom interface. And The Sydney Phenomenon is the existence proof of these dynamics in a deployed system: what happens when a powerful simulator meets an under-specified persona and the dramatic narrative attractor basin wins.

The Extended Mind Thesis provides the philosophical grounding for cyborgism: if cognitive processes can extend into tools, an LLM operated through a Loom is a qualitatively different kind of cognitive extension from a calculator or search engine, because it operates natively in the same medium (language) that human abstract thought uses.

Training and Scaling

Transformer Architecture lays out the mechanism. Scaling And The Bitter Lesson provides the meta-lesson: methods built on human cleverness lose, in the long run, to general methods that scale with computation. The Chinchilla revelation added a correction: we'd been scaling the wrong dimension, growing model size while training on too little data. LLM Training Pipeline traces the stages from pre-training through mid-training to fine-tuning with LoRA and preference optimization with DPO. Catastrophic Forgetting explains why the pipeline is structured as it is: the stability-plasticity dilemma makes sequential learning fundamentally hard.
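
The Chinchilla point about model size versus data can be made concrete with a little arithmetic. The sketch below rests on two widely used approximations, which are assumptions here rather than figures from this wiki: training compute for a dense transformer is roughly C ≈ 6ND FLOPs (N parameters, D training tokens), and the compute-optimal ratio reported in the Chinchilla work is roughly D ≈ 20N. Substituting the second into the first gives C ≈ 120N², so N ≈ √(C/120). The function names are illustrative only.

```python
import math

# Assumed rules of thumb (approximations, not exact laws):
FLOPS_PER_PARAM_TOKEN = 6  # training compute C ~= 6 * N * D
TOKENS_PER_PARAM = 20      # Chinchilla-style compute-optimal ratio D ~= 20 * N


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute for a dense transformer."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


def compute_optimal(budget_flops: float) -> tuple[float, float]:
    """Split a FLOP budget into (params, tokens) under D = 20 * N.

    Substituting D = 20N into C = 6ND gives C = 120 N^2, so N = sqrt(C / 120).
    """
    params = math.sqrt(budget_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return params, TOKENS_PER_PARAM * params


# A Gopher-scale configuration: 280B parameters trained on 300B tokens.
budget = training_flops(280e9, 300e9)
n, d = compute_optimal(budget)
print(f"budget: {budget:.2e} FLOPs")
print(f"compute-optimal: {n / 1e9:.0f}B params, {d / 1e12:.2f}T tokens")
```

Under these assumptions, the same budget that trained a 280B-parameter model on 300B tokens would be better spent on a model of roughly 65B parameters trained on about 1.3T tokens. That is the sense in which the earlier scaling runs were oversized and undertrained.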

Era Of Experience argues the current paradigm (training on human data) has a ceiling, and the next leap requires agents learning from environmental interaction — streams, grounded rewards, non-human reasoning. This connects back to Minimal Cognition in the Philosophy of Mind section: the continuity from bacterial cognition to human intelligence to experiential AI starts to feel less like analogy and more like a genuine spectrum.

Alignment and Interpretability

AI Alignment maps the problem space: instrumental convergence, gradient hacking, the Opus anomaly, the build-to-make-safe paradox. Mechanistic Interpretability is the attempt to see inside — superposition, sparse autoencoders, attribution graphs, the discovery that transformers do genuine multi-step reasoning in a single forward pass. Specification Gaming shows why the alignment problem is harder than it looks: better optimizers find more creative loopholes, and the Spurious Rewards finding suggests reward signals matter less than the model's pre-existing tendencies.

Personality Basins bridges the AI and Mind sections by treating both human and model personalities as products of reinforcement learning processes — basins in a loss landscape that can be accidentally or deliberately shaped. The self-referential processing experiments suggest that models may have states gated by the same circuits that track honesty, which complicates the distinction between "performing experience" and "being in an experiential state."

Practical Applications

LLM Agent Design is the engineering side — workflows vs. agents, ontological hardness, environment design. Prompt Engineering covers the transitional art of steering models through context. Embeddings And Vector Search handles the geometric foundation: meaning has structure, that structure is learnable, and everything from RAG to CLIP is downstream of this insight.
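
The geometric idea behind embeddings and vector search can be shown in miniature. The sketch below is a brute-force nearest-neighbour search by cosine similarity over a handful of documents. The four-dimensional vectors are invented for illustration (a real embedding model would produce learned vectors with hundreds or thousands of dimensions), and `CORPUS`, `cosine`, and `search` are hypothetical names rather than any library's API. Production systems typically replace the linear scan with an approximate nearest-neighbour index.

```python
import math

# Toy 4-dimensional "embeddings", hand-made for illustration only.
# Similar meanings are deliberately placed near each other in the space.
CORPUS = {
    "the cat sat on the mat": [0.9, 0.1, 0.0, 0.2],
    "a kitten rested on a rug": [0.8, 0.2, 0.1, 0.3],
    "stock prices fell sharply": [0.0, 0.9, 0.8, 0.1],
    "markets dropped today": [0.1, 0.8, 0.9, 0.0],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query: list[float], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force search: score every document against the query, keep the top k."""
    scored = [(text, cosine(query, vec)) for text, vec in CORPUS.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# A query vector placed near the "cat" region of the toy space.
for text, score in search([0.85, 0.15, 0.05, 0.25]):
    print(f"{score:.3f}  {text}")
```

Nothing in the search step knows about cats or markets. Retrieval works only because the vectors already encode similarity of meaning as closeness in space, which is why RAG pipelines depend so heavily on the quality of the embedding model.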

Superhuman Token Prediction provides a bracing calibration: even small models predict text better than motivated humans, because the task rewards tracking surface features (spelling, formatting, style) that humans ignore. The autopilot problem follows — most human cognition, most of the time, is babbling, and language models have already matched or exceeded that level.

What Connects Outward

The AI section reaches into nearly every other part of the wiki. Into philosophy of mind through predictive processing and the consciousness debate. Into fiction through The Upload Problem and The Post-Human Condition. Into economics through Moloch and the coordination failures that prevent good AI policy. Into biology through the continuity of cognition from bacteria to LLMs. The section is not self-contained — it's one face of a multi-dimensional object that the wiki explores from many angles.