Goodnight Wiki / Program Synthesis

Program Synthesis

The promise of program synthesis is seductively simple: tell the computer what you want, not how to do it. Let the machine figure out the implementation. It's the ultimate abstraction — a programmer absolved from all implementation details. We are not going to be synthesizing entire applications from scratch any time soon, but for small, performance-critical code fragments, synthesis is already practical and surprisingly effective.1

The CEGIS Loop

The dominant paradigm in modern program synthesis is counterexample-guided inductive synthesis, or CEGIS. The idea is elegant: a synthesizer produces a candidate program that might satisfy some specification, a verifier checks whether it actually does, and if not, the verifier produces a counterexample — a specific input where the candidate fails — which feeds back to the synthesizer to guide its next attempt. This loop repeats until the verifier can't find any more counterexamples.1

What makes CEGIS surprising is how few iterations it typically needs. The space of programs is enormous, but the counterexamples are information-dense — each one eliminates huge swaths of the search space. In practice, synthesis often converges in a handful of iterations even for tricky bit-manipulation programs.

The specification can take several forms, each with different tradeoffs. A complete formal specification in first-order logic is the gold standard but rare in practice. Input-output examples are easier to provide but risk under-specification — there may be many programs consistent with a small set of examples. A reference implementation (an "oracle") sounds paradoxical — why synthesize something you already have? — but makes sense when the goal is optimization: give the synthesizer a naive implementation and let it produce an equivalent but faster one.1

Three Flavors

Oracle-guided synthesis works by assembling a program from a library of components — add, subtract, square root, whatever — and using an SMT solver to decide how to wire them together so that the result agrees with the oracle on all test cases. The verification step is clever: instead of checking correctness directly, it asks whether there exists another program that also satisfies the test cases but disagrees with the candidate on some input. If such a program exists, the distinguishing input becomes a new test case, narrowing the search. If no such program exists, the candidate is unambiguous and (assuming the components are sufficient) correct.1

Stochastic superoptimization takes a completely different approach, using Markov-chain Monte Carlo sampling to wander through the space of programs. Start with a random program. Randomly mutate an instruction — change an opcode, swap two instructions, insert or delete one. Use a cost function that measures both correctness (how closely the mutant matches the target) and performance (how fast it runs). Accept or reject the mutation probabilistically, biased toward better programs. This is essentially simulated annealing applied to code. It shouldn't work — the space is astronomical — but empirically it finds correct, near-optimal programs surprisingly quickly.1

Enumerative search is the brute-force approach, which turns out to be more powerful than it sounds. Enumerate all programs at depth 0 (just variables), then depth 1 (one operation applied to depth-0 programs), then depth 2, and so on. The key optimization: don't consider two programs that produce the same outputs on all test cases. This "observational equivalence" pruning can reduce the search space by a factor of 100x at depth 10, which is the difference between intractable and practical. Udupa et al. used this approach to synthesize cache coherence protocols, which is not a toy problem.1

What Synthesis Is Actually Good For

The immediate practical applications are exactly the kind of small, unintuitive programs that humans write badly: bit-twiddling hacks from Hacker's Delight, peephole optimizations in compilers, deobfuscation of deliberately obscured code. The appeal of synthesis for these problems is that humans are terrible at them (how many programmers can write a correct branchless population count from scratch?) while synthesis excels — the programs are small enough to verify exhaustively and the specifications are clear.

The deeper promise is sketching: a programmer writes the high-level structure and leaves holes that synthesis fills in. This is an appeal to the 80-20 rule — programs spend most of their time in a few small hot spots, and synthesizing just those fragments can deliver significant performance improvements. The programmer handles architecture and control flow; the machine handles the finicky micro-optimization that's so easy to get wrong.1

I think synthesis is underrated by working programmers and overrated by researchers. Working programmers dismiss it because the tools are still immature and the scope is limited to small fragments. Researchers overrate it because they focus on the elegance of the algorithms without confronting the practical reality that most code is boring glue logic where synthesis adds no value. The sweet spot is exactly those cases where the code is small, the specification is clear, and the correct implementation is non-obvious — which describes a surprisingly large fraction of the code that actually matters for performance and security.

The connection to verified compilation is direct: a superoptimizer is just a synthesizer whose specification is "produce a program equivalent to this one, but faster." Regehr's prediction that SAT-based superoptimizers could replace half of LLVM's optimization passes is really a prediction about the scalability of synthesis, and the jury is still out.

Footnotes

  1. Program Synthesis Explained by James Bornholt — source 2 3 4 5 6 7

Open in stacked reader →