Compression Is All You Need

Inside a new Freedman paper: a Googol hidden in 100 tokens, and why mathematics is a three-thousand-year AlphaZero run. In March this year, Michael Freedman, who won the Fields Medal back in 1986, published a paper with a few collaborators. The title is brash: Compression Is All You Need: Modeling Mathematics. I am borrowing it for this essay, because once you see what they measured, no other title does the job. They did something that sounds dull at first. They took MathLib, the Lean 4 library with roughly half a million theorems, definitions, and lemmas, turned the whole thing into a dependency graph, and measured two numbers for every element. One they call wrapped length: how many tokens you write in the Lean source to state this thing. The other is unwrapped length: if you recursively expand every reference down to the base axioms, how many raw symbols do you end up with. Then they went looking for the deepest element in MathLib. They found a theorem in algebraic geometry called AlgebraicGeometry.Scheme.exists_hom_hom_comp_eq_comp_of_locallyOfFiniteType. Wrapped, it takes about 100 tokens. Fully unwrapped, it contains around $10^{104}$ raw symbols. ...

 · 9 min · hohoda