On June 3rd, a new research paper (1) was released by Google. It states that difficult mathematical proofs were solved by combining the LLM Gemini 3.1-pro with a mathematical proof language called LEAN (2). This time, I would like to delve into this paper and consider what kind of developments we can expect from this new AI agent in the future, beyond the framework of mathematics.
1. The Synergistic Effect of the LLM's Flexibility and LEAN's Strictness
Here is the paper, featuring an active AI agent called LEAP. It only uses Gemini 3.1-pro as the LLM, and no specific fine-tuning has been performed. It is being used straight out of the box. Even so, it is reported to demonstrate outstanding exploration capabilities in mathematical proofs. Since it doesn't require any particular additional training, it can be used immediately without doing anything, which is very convenient for practical use. Furthermore, by using LEAN in conjunction, if a proof with contradictory logic due to hallucinations is produced, an error occurs during compilation, creating a mechanism where it is automatically rejected.
LEAP (LLM-in-Lean Environment Agentic Prover)
Since LLM responses can fluctuate probabilistically, humans need to verify them in detail when conducting rigorous arguments. However, by introducing LEAN, this process has been automated. This is very reassuring. It seems that this fantastic result was achieved by combining the flexibility of the LLM and the strictness of LEAN in this way. Let's look closer.
2. The Structure and Accuracy of LEAP
Here is the structure of LEAP. The figure on the left is the roadmap for the theorem to be proved using LEAN. Technically, it forms a structure called a DAG (Directed Acyclic Graph). Complex mathematical proofs are not completed in a single attempt; the proof progresses by going back and forth between the LLM and LEAN several times. The key here is the section in the red frame, where the LLM describes an INFORMAL BLUEPRINT in natural language and converts it into a FORMAL SKETCH in LEAN. Furthermore, a two-tier review by LEAN and the LLM awaits. LEAN verifies whether the new proof method has any contradictions, and the LLM's review verifies whether that method is genuinely effective. In other words, the LLM acts as a pilot in the search for proof methods. Even though it's just using Gemini 3.1-pro as is, its potential is truly surprising.
LEAP workflow
Now, let's look at the results of applying LEAP to an actual task. It tackled the notoriously difficult Putnam 2025. Putnam 2025 contains twelve undergraduate-level problems from the 86th William Lowell Putnam Mathematical Competition, a highly challenging North American mathematics competition.
Looking at the DAG, you can see how the proof actually progresses. In this example of Putnam 2025 Problem A6, you can see layers upon layers of connected proofs. It's certainly a difficult problem. The green indicates the parts that have already been proven.
DAG example for Putnam 2025 Problem A6
The results, as shown below, were that LEAP answered all questions correctly. An overwhelming accuracy.
Results on Putnam 2025
You can see that while the original Gemini 3.1-pro couldn't score at all, it was able to demonstrate tremendous capabilities by combining it with LEAN. I think it is truly a breakthrough.
3. Beyond Mathematical Proofs into Various Fields
What we have seen so far were tasks related to mathematical proofs. With LEAP being able to construct such perfect logic, I felt it would be a waste to keep it confined solely to mathematics. In particular, its application to economics, which is directly linked to business practices, has immense scope and depth, and I believe it can contribute to expanding the areas where LLMs can be active. Economics is also generally described using mathematical formulas, so I think it has a high affinity with LEAP. A paper (3) on its application to economics has already been published, so if you are interested, please do give it a try.
What did you think? I believe the combination of LLMs and LEAN will be expanded and improved in various ways in the future. It might be stepping closer to AGI. It's very exciting.
At Toshi Stats, we plan to take on tasks in the field of economics moving forward. Stay tuned!
You can enjoy our video news “ToshiStats AI Weekly Review” from this link, too!
1) LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks, 3 Jun 2026, Google
2) LEAN
3) We Can't Agree to Disagree, Formally: Aumann's Theorem and Assumption Accounting in Lean, May 27, 2026, Ruize Chen, Ben Eltschig, Ken Ono, Jujian Zhang Axiom Math, Scott Duke Kominers Harvard University; a16z crypto
Copyright © 2026 ToshiStats Co., Ltd. All right reserved.
Notice: This is for educational purpose only. ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the report, the codes and the software.
