SAT Solvers
There's a class of tools that most programmers have never used that can solve problems they currently write hundreds of lines of ad-hoc search code for. Modern SAT solvers — programs that determine whether a Boolean formula can be satisfied — are fast enough to handle instances with millions of variables, and the range of problems that reduce naturally to SAT is staggeringly broad.
What SAT Actually Is
A SAT problem is a Boolean formula in conjunctive normal form: a set of clauses, each clause a disjunction of literals (variables or their negations). Can you assign true/false to the variables such that every clause has at least one true literal? It's the canonical NP-complete problem — every problem in NP reduces to it — which means it's theoretically intractable but practically solvable for an enormous range of real instances.1
The gap between worst-case theory (exponential) and average-case practice (often fast) is the whole story of SAT solving. Random 3-SAT at the phase transition is genuinely hard. But structured instances from real applications — hardware verification, scheduling, planning, cryptanalysis — have exploitable regularity that modern solvers eat for breakfast.
CDCL: The Engine
The core algorithm in modern SAT solvers is Conflict-Driven Clause Learning (CDCL), and understanding it is worth the effort because the ideas generalize far beyond Boolean satisfiability.1
Start with the naive approach: pick a variable, guess a value, propagate forced consequences (unit propagation), repeat until you either satisfy everything or hit a contradiction. When you hit a contradiction, backtrack and try the other value. This is DPLL (Davis-Putnam-Logemann-Loveland, 1962), and it works but doesn't scale.
CDCL's insight is that when you hit a contradiction, you shouldn't just backtrack — you should learn why the contradiction occurred. By analyzing the chain of implications that led to the conflict (the "implication graph"), you can derive a new clause that prevents the solver from ever making the same mistake again. This learned clause gets added to the formula permanently.
The practical effect is dramatic. Without clause learning, the solver might explore the same dead end thousands of times from different branches of the search tree. With it, each failure teaches the solver something about the structure of the problem, and search space gets pruned exponentially. The solver is literally getting smarter as it works.
Two other techniques matter enormously. Restarts — periodically abandoning the current search and starting over with all learned clauses intact — sound wasteful but prevent the solver from getting stuck in unproductive regions of the search space. The learned clauses carry all the useful information, so restarting loses nothing of value. And variable branching heuristics (especially VSIDS — Variable State Independent Decaying Sum) focus the search on variables that appear frequently in recent conflicts, which empirically correlates with being important for the hard part of the problem.
The Practical Part
The point of the "fast, neat and underused" framing is that SAT solvers are a dramatically underutilized technology. Any problem that can be expressed as "find an assignment to Boolean variables that satisfies a set of constraints" is a SAT problem. Many problems that don't obviously involve Booleans can be encoded:1
- Scheduling: time slots as variables, conflicts as clauses
- Hardware verification: circuit correctness as Boolean satisfiability
- Planning: action sequences as variable assignments over time steps
- Puzzle solving: Sudoku, n-queens, graph coloring — all trivial SAT encodings
- Configuration: "does there exist a valid configuration of this system with these requirements?"
The encoding matters. A naive encoding might produce a formula with redundant clauses that obscures structure the solver needs. A good encoding preserves implications — when setting one variable forces another, the encoding should make this visible to unit propagation without requiring search. The art of using SAT solvers is in the reduction: translating your problem into a formula that preserves the structure the solver can exploit.
SMT (Satisfiability Modulo Theories) extends SAT to richer domains — integers, arrays, bitvectors, real arithmetic — by coupling a SAT solver with domain-specific theory solvers. An SMT solver like Z3 can determine satisfiability of formulas involving "if x + y > 5 and y < 3, is there a valid x, y, z such that..." This makes the approach accessible to problems that would require awkward bit-blasting in raw SAT.
Why This Matters
I think SAT solvers represent a genuinely important lesson about the structure of computation. Information And Computation discusses P vs NP — the question of whether every problem whose solutions are easy to verify also has solutions that are easy to find. SAT is the canonical NP-complete problem, so it should be intractable. But CDCL solvers routinely handle industrial instances with millions of variables in seconds.
The resolution isn't that P = NP. It's that real-world instances aren't worst-case instances. The structure that makes them arise from real problems — hardware circuits, scheduling constraints, physical systems — is precisely the structure that CDCL can exploit. Random instances near the phase transition really are hard. But the problems people actually need to solve almost never look random.
This is a deep point about the relationship between theoretical complexity and practical computation: the worst case is real but rare. The practically relevant question isn't "how hard is the hardest instance?" but "how hard are the instances that actually arise?" And for SAT, the answer is: far easier than the worst case, often by many orders of magnitude.
Footnotes
Linked from
- Simulation And Emergence Overview
SAT Solvers and Hash Function Design are the practical side — what happens when you apply computation to search, and what you discover when you let search discover your algorithms for you.