Recursion Schemes
There's a moment in learning Haskell where you realize that fold and unfold are the same pattern in mirror image, and that both are instances of something more general. Recursion schemes are the formalization of that insight: they factor recursion out of your data structures, leaving you to specify only the semantics at each step. The recursion itself is handled by the scheme.
The names are celebrations of the ridiculous -- catamorphism, anamorphism, hylomorphism, paramorphism -- and they have been accused, probably correctly, of being off-putting. But the technical precision matters. The moment someone distinguishes a paramorphism from a catamorphism, you know exactly what additional information is being threaded through the fold.
Pattern Functors and Fixed Points
The key trick is defining your recursive data type in two steps. Instead of the standard recursive definition:
data Natural = Zero | Succ Natural
You first define a non-recursive "pattern functor" where the recursive position is replaced by a type parameter:
data NatF r = ZeroF | SuccF r
Then you recover the recursive type by wrapping it in Fix, the standard fixed-point type: type Nat = Fix NatF. This seems like pointless indirection until you realize what it buys you: the ability to reason about the recursive structure separately from the data at each level.1
Fix has a beautifully simple definition: newtype Fix f = Fix (f (Fix f)). Two functions, fix and unfix, add and remove one layer of recursion. Every recursion scheme is built on top of this layering and unlayering.
The Four Workhorses
Catamorphisms (via cata) are generalized folds. You provide an "algebra" -- a function from one layer of your pattern functor to the result type -- and the recursion scheme handles walking the structure bottom-up. The crucial point is that the algebra itself is not recursive. You define what to do at each step, and cata handles the recursion:
natsum :: Nat -> Int
natsum = cata alg where
alg ZeroF = 0
alg (SuccF n) = n + 1
Anamorphisms (via ana) are generalized unfolds -- the mirror image. You provide a "coalgebra" that produces one layer of structure from a seed value, and the scheme handles corecursive production.
Hylomorphisms (via hylo) compose an anamorphism with a catamorphism: produce a structure then immediately consume it. This is surprisingly common in practice -- many algorithms naturally decompose into "build an intermediate structure, then reduce it." The hylomorphism lets you express this without materializing the intermediate structure.
Paramorphisms (via para) are generalized folds with access to the original subtree at each step, not just the already-folded result. This is what you need when your reduction at a particular node depends on the structure below it, not just the already-computed summary.1
Practical Value
The practical selling point is separation of concerns. Once you've factored out recursion, your interpreters become lists of cases with no recursive calls. Each case says "given the children's results, here's this node's result." This is easier to read, harder to get wrong, and trivially composable -- you can swap out one algebra for another without touching the traversal machinery.
Jared Tobin makes the case that recursion schemes also "suss out similarities in structure between disparate problems." A merge operation on sorted lists and a Fibonacci computation have the same recursive shape; expressing both as anamorphisms makes this visible in a way that hand-written recursion obscures.1
Edward Kmett's recursion-schemes library implements all of this with a sophisticated type-level machinery centered on the Base type family and Foldable/Unfoldable type classes. The library is well-engineered but sparsely documented -- which is, I think, the most Kmettian possible outcome. He solved the most general problem first, and left the specific explanations as an exercise.
Connection to Category Theory
The names come from category theory. A catamorphism is the unique morphism from an initial algebra; an anamorphism is the unique morphism to a final coalgebra. If that means nothing to you, don't worry -- the computational intuition (fold and unfold) is all you need. But for those who care about the Curry-Howard correspondence, there's a deep connection: recursion schemes are to recursive data types what structural induction is to proofs over inductive types. The pattern functor is the induction principle.
The connection to neural networks is also suggestive. Chris Olah observed that certain neural network architectures have the same shape as recursion schemes -- tree-recursive neural nets are paramorphisms, encoder-decoders are hylomorphisms. Whether this is a deep insight or a surface analogy remains to be seen, but it's the kind of cross-domain resonance that makes recursion schemes intellectually compelling beyond their immediate practical value.
Footnotes
Linked from
- Programming Languages Overview
The section's intellectual backbone runs from Curry-Howard through Substructural Type Systems to Pointer Provenance and Recursion Schemes.