Compression

Inside a new Freedman paper: a Googol hidden in 100 tokens, and why mathematics is a three-thousand-year AlphaZero run. In March this year, Michael Freedman, who won the Fields Medal back in 1986, published a paper with a few collaborators. The title is brash: Compression Is All You Need: Modeling Mathematics. I am borrowing it for this essay, because once you see what they measured, no other title does the job. They did something that sounds dull at first. They took MathLib, the Lean 4 library with roughly half a million theorems, definitions, and lemmas, turned the whole thing into a dependency graph, and measured two numbers for every element. One they call wrapped length: how many tokens you write in the Lean source to state this thing. The other is unwrapped length: if you recursively expand every reference down to the base axioms, how many raw symbols do you end up with. Then they went looking for the deepest element in MathLib. They found a theorem in algebraic geometry called AlgebraicGeometry.Scheme.exists_hom_hom_comp_eq_comp_of_locallyOfFiniteType. Wrapped, it takes about 100 tokens. Fully unwrapped, it contains around $10^{104}$ raw symbols. ...

Why a concept called epiplexity may explain where intelligence really comes from. Intelligence, at its core, is a compression problem. Humans cannot track every falling apple, so we invent gravity. We cannot memorize every chess position, so we develop strategy. We cannot remember every sentence we’ve ever read, so we acquire grammar. In each case, intelligence emerges from the same constraint: we cannot brute-force the world. When computation is limited, discovering structure becomes essential. A recent paper from researchers at Carnegie Mellon and NYU introduces a concept that captures this idea precisely. They call it epiplexity — the portion of information that a computationally bounded learner can actually extract. The idea helps explain several puzzles about modern AI, from AlphaZero’s superhuman chess ability to the surprising effectiveness of reasoning-based models. More importantly, it reframes a deeper question: Where does intelligence actually come from? The Static vs. Euclid Problem Consider a simple thought experiment. You have two things in front of you. One is a terabyte of television static — pure noise, every pixel random. The other is a copy of Euclid’s Elements, the geometry text that shaped two thousand years of mathematics. ...