ChatGPT provides mathematical proof autonomously


VUB’s Data Analytics Lab has published a study showing that it is possible to develop original mathematical proofs using a ‘standard’ commercial language model. The researchers used ChatGPT-5.2 (Thinking), OpenAI’s widely deployed Large Language Model, for this purpose. However, as the researchers emphasise, this certainly does not mean that the role of humans in this process is over.

ChatGPT succeeded in producing a proof of a conjecture from 2024 put forward by the mathematicians Ran and Teng. A conjecture is a statement that is believed to be true because there are many examples or indications supporting it, but for which no formal proof yet exists. Mathematicians often formulate such a conjecture after discovering a pattern or following numerous calculations that consistently yield the same result. Until someone provides conclusive proof, it remains a conjecture; once it is proven, it becomes a proposition or theorem.

Solved in seven chat sessions

The study describes how seven chat sessions with ChatGPT and four versions of the proof collectively produced the final proof. ChatGPT proved particularly useful in the search for the proof, whilst human experts were essential for verifying its accuracy and ensuring the argument was watertight.

The authors demonstrate that ChatGPT-5.2 (Thinking) largely developed the structure of the proof itself, with minimal human intervention. As the paper’s brief description summarises: “With the Data Analytics Lab, we are among the first to demonstrate that a commercially available LLM can independently develop original mathematical proofs.”

“I had long suspected that ChatGPT could help me with proofs of unsolved mathematical problems,” says Brecht Verbeken, a postdoctoral researcher in the Data Analytics Lab research group at the VUB. “And yet I was surprised at how efficiently it went.”

More creative than you might think

The researchers situate their work within the broader context of what they call ‘vibe-proving’, an approach in which language models are used to explore and structure high-level theoretical reasoning. The key question in the publication is whether this vibe-proving technique will undergo the same rapid evolution over the coming year as previously seen with AI-assisted programming (vibe-coding), where systems developed from tools into virtually autonomous code generators. “We often hear people say that the creativity of these systems is fundamentally limited to reformulating training data,” says VUB professor Vincent Ginis (Data Analytics Lab). “I’m glad we can dispel that misconception with our work too…”

The authors emphasise that, although the model generated a substantial part of the proof scheme itself, humans remain crucial for the final check and for closing formal gaps. Above all, the process clearly demonstrates where LLM assistance really makes a difference and where verification bottlenecks remain. This development marks an important moment in the use of AI within theoretical research: not only as an aid to programming and text production, but as a tool that can contribute to original mathematical discoveries, provided it remains coupled with human supervision and critical reasoning. “Formulating candidate proofs can now be done much faster, but the bottleneck then becomes human verification. That takes time. But the language models will surely help us there too,” concludes VUB professor Andres Algaba (Data Analytics Lab VUB).

Vincent Ginis (Data Analytics Lab) is a professor of mathematics, physics and artificial intelligence at VUB and a visiting professor at Harvard University. He teaches a range of courses in the sciences and engineering and has published pioneering research in photonics and data analysis, with over 20 international articles and 40 conference presentations.

Portrait of Vincent Ginis

Andres Algaba is an FWO postdoctoral researcher at the Data Analytics Lab at the VUB. His main research interests include automated science and innovation using large language models, the reliability and transparency of large language models, and the science of science. He is also a member of Jonge Academie België.

Andres Algaba

More information

The publication Early Evidence of Vibe-Proving with Consumer LLMs: A Case Study on Spectral Region Characterization with ChatGPT-5.2 (Thinking) is freely accessible via arXiv.

In this article:

  • How did a commercial language model manage to turn a mathematical conjecture into a genuine proof?
  • What does this mean for the role of AI in scientific research: a tool or a fully-fledged ‘collaborator’?
  • Can we simply trust such AI results, or does human oversight remain indispensable?