Get involved: We welcome researchers and students who are interested in AI agents to join us! To receive relevant seminar information in time, please click the following link to register.
AI agents are increasingly being developed to autonomously perform complex tasks across diverse domains, from web navigation to multi-agent collaboration. As these agents grow more capable, new challenges emerge in ensuring robustness, alignment, and adaptability in open-ended environments. The frontier of AI agent research lies in understanding how to design, evaluate, and safely deploy agents in dynamic, interactive settings. This AI Agent Frontier Seminar aims to bring together researchers from academia and industry to discuss recent advances and open questions surrounding the development, safety, and scalability of next-generation AI agents.
Talk Title: STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving
Host: Prof. Ming Jin (Virginia Tech)
Talk Time: 27 June 2025, 18:00h CEST Time (Amsterdam, Berlin, Rome, Stockholm, Vienna) / 09:00h California time / 12:00h Eastern Time / +1 00:00h Beijing time.
Abstract: I will discuss recent works on using RL for theorem proving, especially in the possible future regime where we ran out of high-quality training data. To keep improving the models with limited data, we draw inspiration from mathematicians, who continuously develop new results, partly by proposing novel conjectures or exercises (which are often variants of known results) and attempting to solve them. We design the Self-play Theorem Prover (STP) that simultaneously takes on two roles, conjecturer and prover, each providing training signals to the other. The model achieves state-of-the-art performance among whole-proof generation methods on miniF2F-test (65.0%, pass@3200), Proofnet-test (23.9%, pass@3200) and PutnamBench (8/644, pass@3200).
Bio: Prof. Tengyu Ma is an assistant professor of computer science and statistics at Stanford. His research interests broadly include topics in machine learning, algorithms and their theory, such as deep learning, (deep) reinforcement learning, pre-training / foundation models, robustness, non-convex optimization, distributed optimization, and high-dimensional statistics.
Steering Committee
TU Darmstadt
UCL
TUM
UC Berkeley
If we receive the speaker's permission, we will release the recording videos on the AI Agent YouTube Channel (we can only make the videos public after receiving permission).