My previous discussion of AI developments was written roughly three months ago, shortly after OpenAI released its new model, GPT-5. Since then, GPT-5 has become widely regarded as the most capable general-purpose large model, and no newer model has yet surpassed it (according to Google’s plans, Gemini 3.0 will be released soon, and it may well become the next leading model). Although the foundation-model layer has seen no dramatic leap during these three months, applications of the latest models and explorations of AI agents have kept producing striking advances. This article briefly introduces two results that are highly relevant to academic research.
The first is the strength that frontier large models have begun to show in mathematical research. Since the release of GPT-5, reports from several fields have indicated that it can already solve some research-level problems on its own. For example, Paata Ivanisvili, a mathematics professor at the University of California, Irvine, stated that GPT-5 resolved an open problem in real analysis by constructing an intricate counterexample to a conjecture (https://x.com/PI010101/status/1974909578983907490). Scott Aaronson, a computer science professor at the University of Texas at Austin who works on quantum computing, reported that GPT-5 helped him construct a mathematical expression needed for a quantum computing problem; he remarked that if a PhD student had given him the same answer, he would have considered that student very clever (https://officechai.com/ai/gpt-5-thinking-wrote-the-key-technical-step-in-our-new-paper-quantum-computing-researcher-scott-aaronson/). A more systematic case is a recent arXiv preprint co-authored by the renowned mathematician Terence Tao and several Google DeepMind researchers (https://arxiv.org/pdf/2511.02864). In that paper, the authors tested AlphaEvolve, an agent built on large language models, on more than sixty mathematical problems. On several of them, AlphaEvolve obtained new results better than the best previously known; on others, it essentially recovered the best known results; and on still others, even though the agent did not reach a correct proof, the ideas it proposed inspired Tao himself and helped him prove a new theorem. Taken together, these cases suggest that in mathematical research, frontier large models have moved beyond the role of an assistant and are approaching the level of a “powerful collaborator.”
The second is a new agent that is relevant to most fields of academic research. Recently, the Denario Project, an automated research agent developed jointly by multiple research institutions, was released publicly (https://arxiv.org/html/2510.26887v1). Through a carefully designed multi-agent workflow, Denario can carry out automated, end-to-end scholarly exploration across frontier domains, including literature review, idea generation, modeling and computation, case testing, conclusions and outlook, and paper writing. The project’s GitHub page already hosts dozens of papers generated by the agent, spanning astronomy, biology, materials science, physics, chemistry, and many other disciplines. The authors state that these papers are still relatively shallow in research depth, so the agent cannot yet carry out scientifically significant research on its own; as a tool that lets top scholars explore ideas efficiently, however, it is already excellent.
To a large extent, these developments confirm the judgment I made three months ago: as foundation models improve and agents mature, AI systems will soon reach a fairly high level of autonomy, and the so-called “AI stagnation” thesis is pure ostrich-style self-deception. Based on my own experience building agents, I believe this wave of breakthroughs still stems primarily from gains in the foundation models themselves. The reason is that, for complex tasks requiring long-horizon planning and reasoning, each step up in base-model capability raises the success rate of every critical step, and in a serial task those per-step gains compound. As a result, problems that previously resisted even the most elaborate agent pipelines can now be solved reliably with comparatively simple workflows. Progress in foundation models will clearly not stop here. Whether it is the GPT, Gemini, Claude, Grok, or Qwen family, their capabilities will almost certainly rise by at least one or two more notches over the next 1–2 years. Combined with further advances in agent-building techniques, this supports a prediction that may sound radical but is in fact quite conservative: autonomous AI systems that perform at the level of an excellent, or even top, PhD across multiple disciplines are likely to appear within 1–2 years. Note that this does not mean that some field of academia merely gains one more excellent PhD; it means that everyone in academia can have a top-tier, near-omniscient PhD as a collaborator, or even as the lead researcher.
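The compounding argument for serial tasks can be made concrete with a little arithmetic. The sketch below is only an illustration under simplifying assumptions that are mine, not taken from any of the cited work: every critical step succeeds independently with the same probability, one failed step fails the whole task, and the step count and probabilities are made-up numbers. The function name `chain_success` is hypothetical.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that a serial task succeeds end to end.

    Assumes (for illustration only) that all n_steps critical steps are
    independent and each succeeds with the same probability p_step.
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be a probability in [0, 1]")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return p_step ** n_steps


# Illustrative per-step success rates for three hypothetical model tiers.
for p in (0.80, 0.90, 0.99):
    print(f"per-step {p:.2f} -> 20-step task {chain_success(p, 20):.3f}")
```

Under these assumptions, moving the per-step success rate from 0.80 to 0.99 takes a 20-step task from failing almost every time to succeeding most of the time, which is the sense in which a modest per-step gain can substitute for a much more elaborate workflow.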
If this prediction comes true, its effect on today’s academic ecosystem will undoubtedly be disruptive, even destructive. The arms race in paper production, the arms race in grant applications, and the various schemes for certifying and rewarding “genius” would all steadily lose their practical meaning. In the Denario Project paper, the authors already discuss the ethical dilemmas that such systems may bring to the academic community as a whole. In my view, these dilemmas all come from a single source: the current academic system is essentially built to treat scholars as instruments of knowledge discovery. The fundamental way out is de-instrumentalization, that is, respecting human subjectivity and rebuilding the narrative of the whole system around it. I have made this argument in many earlier essays and will not repeat it here. That said, I have no doubt about the conservatism and stubbornness that academia will display in the face of AI’s impact. To expect academia to transform overnight into a field that encourages free exploration rather than involutionary competition is wishful thinking.
In the coming years, I expect more and more “big names” in academia to speak out on public platforms, using all sorts of superficially plausible theories to argue that scholars are “irreplaceable” relative to AI and thereby defend human primacy, while in practice handing most, if not nearly all, of their own research over to AI. Meanwhile, more and more people inside and outside academia will denounce the stupidity of the publication arms race in the AI era, yet academia will unconsciously keep operating along that same track for a long time. This absurd mismatch will probably last from several years to more than a decade, until society at large broadly accepts the underlying judgment that, in the vast majority of instrumental capacities, humans are no match for AI.