Pointer Provenance

Here's something that sounds simple but turns out to be one of the hardest problems in systems programming: where did this pointer come from? The answer matters far more than you'd think, because compilers -- and increasingly hardware -- use a pointer's origin story to decide what optimizations are legal and what memory accesses are valid.

The Aliasing Problem

All of this starts with aliasing. Two pointers "alias" when they point to overlapping memory, which means a write through one might affect a read through the other. Compilers need to know about aliasing to do even basic optimizations -- caching values in registers, reordering loads and stores, eliminating redundant reads.¹

The important distinction: aliasing analysis is actually about accesses, not pointers. Two pointers that technically point to the same memory don't alias in any meaningful sense if one is never used, or if both only read. This is why Rust's fundamental split between &mut (unique mutable reference) and & (shared immutable reference) is so powerful for optimization: it gives the compiler static aliasing information that C programmers can only provide through restrict hints and prayer.
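To make the difference concrete, here is a minimal sketch. The function names compute and compute_raw are made up for illustration; the point is only that the reference version promises its two arguments don't overlap, while the raw-pointer version makes no such promise.

```rust
// With `&u32` and `&mut u32` the two parameters cannot overlap, so the
// compiler is allowed to load `*input` once and reuse the value.
fn compute(input: &u32, output: &mut u32) {
    if *input > 10 {
        *output = 1;
    }
    if *input > 5 {
        *output *= 2;
    }
}

// Raw pointers carry no aliasing promise: `input` and `output` may be the
// same address, so `*input` must be re-read after the first store.
//
// SAFETY contract (hypothetical): both pointers must be valid and aligned.
unsafe fn compute_raw(input: *const u32, output: *mut u32) {
    unsafe {
        if *input > 10 {
            *output = 1;
        }
        if *input > 5 {
            *output *= 2;
        }
    }
}

fn main() {
    let x = 20;
    let mut y = 7;
    compute(&x, &mut y);
    println!("distinct references: {}", y);

    // Same logic, but input and output are deliberately the same location.
    let mut z = 20u32;
    let p = &mut z as *mut u32;
    unsafe { compute_raw(p, p) };
    println!("aliased raw pointers: {}", z);
}
```

With distinct references, both branches see the original input. With the aliased raw pointers, the first store changes what the second check reads, which is exactly the case an optimizer has to rule out before it can cache *input in a register.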

Allocations and Sandboxes

Memory models define the concept of an allocation -- a variable declaration, a heap allocation, a stack frame -- as a sandbox with one True Name. A freshly created allocation is unaliased, and the only legal way to access its memory is through pointers derived from its original name. This chain of custody from allocation to derived pointer is pointer provenance.¹

This is why LLVM's getelementptr inbounds instruction exists: it's a promise that "this pointer offset won't escape the allocation sandbox." Almost every field access in Rust ((*ptr).my_field) compiles to a GEP inbounds. Break this promise, and you get undefined behavior -- not as a theoretical concern, but as real miscompilations that are extremely hard to debug.
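A rough sketch of what "staying inside the sandbox" looks like from the Rust side. Header, set_flags and sum_via_offsets are hypothetical names, and the exact IR the compiler emits is not shown or guaranteed here; the sketch only illustrates which offsets keep the in-bounds promise and which one opts out of it.

```rust
use std::ptr::addr_of_mut;

#[repr(C)]
struct Header {
    len: u32,
    flags: u32,
}

// Field projection through a raw pointer: the offset of `flags` is a
// compile-time constant that stays inside the `Header` allocation.
//
// SAFETY contract (hypothetical): `h` must point to a live, aligned `Header`.
unsafe fn set_flags(h: *mut Header, value: u32) {
    unsafe {
        let field: *mut u32 = addr_of_mut!((*h).flags);
        field.write(value);
    }
}

fn sum_via_offsets(data: &[u32]) -> u32 {
    let base = data.as_ptr();
    let mut total = 0;
    for i in 0..data.len() {
        // SAFETY: i < data.len(), so `base.add(i)` stays inside the slice.
        // `add` requires the result to be in bounds of the same allocation.
        total += unsafe { *base.add(i) };
    }
    total
}

fn main() {
    let mut h = Header { len: 3, flags: 0 };
    unsafe { set_flags(&mut h, 0b101) };
    println!("len = {}, flags = {:#b}", h.len, h.flags);

    let data = [1u32, 2, 3, 4];
    println!("sum = {}", sum_via_offsets(&data));

    // `wrapping_add` makes no in-bounds promise: computing this pointer is
    // fine, but dereferencing it would be undefined behavior.
    let far = data.as_ptr().wrapping_add(1000);
    println!("computed but never dereferenced: {:p}", far);
}
```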

The Integer Cast Problem

Here's where things get weird. Rust currently allows this:

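let mut addr = my_ptr as usize;   // pointer -> integer: the allocation link is dropped
addr = addr & !0x1;               // clear the low tag bit
let new_ptr = addr as *mut T;     // integer -> pointer: derived from which allocation?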

This is bog-standard tagged-pointer manipulation, and it's conceptually broken. When you cast a pointer to an integer, you lose its provenance -- the integer has no memory of which allocation the pointer came from. When you cast back, nothing in the program says which allocation the resulting pointer is allowed to access. The compiler has to either pessimistically assume the new pointer could alias anything (destroying optimizations) or optimistically assume it doesn't (potentially miscompiling your code).¹

This is an active area of language design. Rust's answer is a pair of pointer methods, with_addr and map_addr (stable since Rust 1.84), that let you manipulate the numeric address of a pointer while preserving its provenance. You get the tagged-pointer arithmetic you want, and the compiler gets the provenance information it needs. It's a small API change with profound implications for what the optimizer is allowed to do.
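Here is the same tag-bit trick written with map_addr instead of integer casts. This is a sketch, assuming a toolchain where map_addr and addr are available on raw pointers (Rust 1.84 or later); TAG_MASK, set_tag, clear_tag and is_tagged are hypothetical helpers, and the low bit is only free because u64 is 8-byte aligned.

```rust
// Lowest address bit, unused for any pointer to an 8-byte-aligned u64.
const TAG_MASK: usize = 0b1;

// Each helper changes only the address; the provenance of `p` is carried
// over to the returned pointer by `map_addr`.
fn set_tag(p: *mut u64) -> *mut u64 {
    p.map_addr(|a| a | TAG_MASK)
}

fn clear_tag(p: *mut u64) -> *mut u64 {
    p.map_addr(|a| a & !TAG_MASK)
}

// `addr` yields the numeric address without creating a round trip back
// to a pointer.
fn is_tagged(p: *mut u64) -> bool {
    (p.addr() & TAG_MASK) != 0
}

fn main() {
    let mut value: u64 = 42;
    let ptr: *mut u64 = &mut value;

    let tagged = set_tag(ptr);
    println!("tagged: {}", is_tagged(tagged));

    // The tagged pointer is misaligned and must not be dereferenced;
    // strip the tag first.
    let untagged = clear_tag(tagged);

    // SAFETY: `untagged` has the same address and provenance as `ptr`.
    unsafe { *untagged += 1 };
    println!("value after write: {}", value);
}
```

At no point does a pointer become a bare integer and then a pointer again, so there is never a moment where the compiler has to guess which allocation is involved.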

CHERI: Provenance in Hardware

CHERI (Capability Hardware Enhanced RISC Instructions) makes provenance physical. On a 64-bit CHERI system, every pointer is 128 bits wide: 64 bits of actual address plus metadata encoding the valid memory range. There's also a hidden 129th bit -- a "metadata is valid" tag that the hardware keeps out of band, much as ECC RAM keeps its check bits. Overwrite a pointer through non-pointer instructions (say, by copying arbitrary data bytes over it), and the hardware silently clears the tag. Try to dereference the corrupted pointer, and the CPU faults.¹

This is elegant: the allocation sandbox model that compilers reason about abstractly, CHERI enforces concretely. You literally cannot forge a pointer to memory you shouldn't access, because the hardware tracks which pointers are legitimately derived from which allocations.
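A toy software model can make the tag-and-bounds idea easier to picture. To be clear, this is not how CHERI encodes capabilities, and every name here (Capability, Fault, overwrite_with_data) is invented for the sketch. It only mimics the two checks described above: the tag must still be set, and the address must lie inside the recorded bounds.

```rust
#[derive(Clone, Copy, Debug)]
struct Capability {
    addr: usize, // current address
    base: usize, // start of the region this capability may touch
    len: usize,  // size of that region
    tag: bool,   // stands in for the out-of-band validity bit
}

#[derive(Debug, PartialEq)]
enum Fault {
    TagCleared,
    OutOfBounds,
}

impl Capability {
    fn new(base: usize, len: usize) -> Self {
        Capability { addr: base, base, len, tag: true }
    }

    // Capability-aware arithmetic: the address moves, bounds and tag stay.
    fn offset(self, delta: usize) -> Self {
        Capability { addr: self.addr.wrapping_add(delta), ..self }
    }

    // Models writing plain data over the pointer: the tag is cleared,
    // whatever address the bytes happen to spell out.
    fn overwrite_with_data(self, raw: usize) -> Self {
        Capability { addr: raw, tag: false, ..self }
    }

    // Every access checks the tag first, then the bounds.
    fn load(self, memory: &[u8]) -> Result<u8, Fault> {
        if !self.tag {
            return Err(Fault::TagCleared);
        }
        if self.addr < self.base || self.addr >= self.base + self.len {
            return Err(Fault::OutOfBounds);
        }
        memory.get(self.addr).copied().ok_or(Fault::OutOfBounds)
    }
}

fn main() {
    let memory: Vec<u8> = (0..16).collect();
    let cap = Capability::new(4, 4); // may access addresses 4..8

    println!("{:?}", cap.offset(1).load(&memory));
    println!("{:?}", cap.offset(4).load(&memory));
    println!("{:?}", cap.overwrite_with_data(5).load(&memory));
}
```

The last case is the interesting one: address 5 is inside the original bounds, but the access still fails, because a pointer rebuilt from plain data no longer counts as derived from the allocation.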

The cost is real. Every pointer doubles in size, which means more cache pressure and different data structure layouts. But for security-critical systems, the guarantee is powerful. Arm actually shipped Morello -- a prototype processor and development board implementing CHERI -- which is more than many academic architectures can claim.

Rust's Ownership and the Deeper Story

Rust's borrow checker is, in an important sense, a compile-time provenance tracker. When you take a &mut reference, you're creating a derived pointer with exclusive access to an allocation (or part of one). The borrow checker ensures that no other live reference has overlapping provenance, which is exactly the aliasing guarantee that lets the compiler optimize aggressively.²

Armin Ronacher makes the complementary point from the practitioner's side: the biggest practical difference between Rust and C++ is that objects in Rust move. A C++ constructor operates on memory that's already been placed somewhere; a Rust constructor returns a value by move, and the caller decides where it lives. This means you can't have self-referential structs (because the self-reference would be invalidated by a move), which in turn means Rust code tends to use handles -- stored offsets rather than raw pointers -- recomputing addresses on demand.³
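The handle pattern can be sketched in a few lines. Packet, payload_start and build are hypothetical names; the buffer is stored inline so that moving the struct really does move the bytes, which is the case where a stored raw pointer into the struct would go stale.

```rust
struct Packet {
    bytes: [u8; 8],
    // Handle: an offset into `bytes`, not a pointer into it. It stays
    // meaningful no matter where the Packet ends up in memory.
    payload_start: usize,
}

impl Packet {
    // The address is recomputed from the handle on every call.
    fn payload(&self) -> &[u8] {
        &self.bytes[self.payload_start..]
    }
}

// A Rust "constructor" returns the value by move; the caller picks its home.
fn build() -> Packet {
    Packet {
        bytes: [0xAA, 0xBB, 1, 2, 3, 4, 5, 6],
        payload_start: 2,
    }
}

fn main() {
    let on_stack = build();
    println!("{:?}", on_stack.payload());

    // Move the value to the heap. The inline bytes now live at a new
    // address, but the offset is still correct.
    let on_heap = Box::new(on_stack);
    println!("{:?}", on_heap.payload());
}
```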

The Pin system (RFC 2349) partially addresses this: a Pin wrapped around a pointer guarantees that the value behind it will not be moved again, and types that depend on a stable address opt out of Unpin. It's a complex addition to the type system, though. The underlying tension between move semantics and self-reference is, I think, one of Rust's most fundamental design constraints, and it flows directly from the decision to make provenance and ownership the foundation of memory safety rather than relying on garbage collection or runtime checks.

Monotonicity Types: Provenance for Distributed State

There's an interesting parallel in distributed systems. Monotonicity types track how function inputs relate to outputs across a partial order -- if you increase the input, does the output increase too? This matters for CRDTs and eventual consistency: a monotone function applied to a replicated counter gives you an answer that, even if stale, will never flip back once it becomes true.⁴
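A small sketch of the idea, using a grow-only counter as the replicated state. GCounter, reached and is_exactly are illustrative names, not anything from the cited work; the contrast to look at is between a monotone query and a non-monotone one after a merge.

```rust
use std::collections::HashMap;

// Grow-only counter: one slot per replica; merging takes the per-slot max,
// so the state only ever moves up the partial order.
#[derive(Clone, Default)]
struct GCounter {
    counts: HashMap<String, u64>,
}

impl GCounter {
    fn increment(&mut self, replica: &str) {
        *self.counts.entry(replica.to_string()).or_insert(0) += 1;
    }

    fn merge(&mut self, other: &GCounter) {
        for (replica, &count) in &other.counts {
            let slot = self.counts.entry(replica.clone()).or_insert(0);
            *slot = (*slot).max(count);
        }
    }

    fn value(&self) -> u64 {
        self.counts.values().sum()
    }
}

// Monotone: a bigger counter can turn false into true, never true into false.
fn reached(counter: &GCounter, threshold: u64) -> bool {
    counter.value() >= threshold
}

// Not monotone: a bigger counter can turn true back into false.
fn is_exactly(counter: &GCounter, n: u64) -> bool {
    counter.value() == n
}

fn main() {
    let mut a = GCounter::default();
    a.increment("a");
    a.increment("a");

    let mut b = GCounter::default();
    b.increment("b");

    println!("before merge: reached={} exactly={}", reached(&a, 2), is_exactly(&a, 2));
    a.merge(&b);
    println!("after merge:  reached={} exactly={}", reached(&a, 2), is_exactly(&a, 2));
}
```

A stale replica that answers reached with true is safe to act on, because later merges cannot retract it. The same is not true of is_exactly, which is the kind of query a monotonicity type system would flag.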

The connection to provenance is structural: both are about tracking "where did this value come from and what transformations were applied to it" in order to guarantee properties about the result. Pointer provenance guarantees memory safety; monotonicity types guarantee convergence. Both use the type system to enforce invariants that would otherwise require runtime checking or manual reasoning.

¹ Rust's Unsafe Pointer Types Need An Overhaul by Aria Beingessner
² The Pain Of Real Linear Types in Rust by Aria Beingessner
³ You can't Rust that by Armin Ronacher
⁴ Monotonicity Types by Kevin Clancy
