We have invited two keynote speakers!
Title
Foresight, Mental Models, and Reframing—Why Accurate AI Fails to Predict Accurate Futures
Abstract
This talk addresses a critical yet often overlooked issue in AI evaluation, redefined through the lens of strategic foresight. Although current AI prediction models generate highly accurate forecasts based on historical data, they frequently fail when facing sudden and discontinuous societal changes. This phenomenon represents a "new frame problem," emerging precisely because AI evaluation methods are confined within stable frames extrapolated from historical experiences. Consequently, even highly accurate AI models can paradoxically predict inaccurate or irrelevant futures.
This talk introduces an alternative approach grounded in human-driven strategic foresight. Using a simple yet powerful tool—the basic 2×2 scenario matrix—humans explicitly define critical uncertainties (key drivers), thereby establishing clear frames that AI can linguistically understand and utilize. Examples illustrate how cooperative approaches between humans and AI significantly enhance the depth, relevance, and quality of the generated future scenarios.
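As a purely illustrative sketch (not material from the talk), the snippet below shows one way a human-defined 2×2 scenario matrix could be encoded so that the frame is handed to a language model explicitly as text; the driver names, poles, and prompt wording are hypothetical assumptions.

```python
# Illustrative sketch only (not from the talk): encoding a human-defined
# 2x2 scenario matrix so that the frame is passed to an LLM explicitly as text.
from itertools import product

# Two critical uncertainties (key drivers), each with two poles.
# The driver names and poles below are hypothetical examples.
DRIVERS = {
    "AI governance": ("fragmented regulation", "global coordination"),
    "energy transition": ("slow and contested", "rapid and subsidised"),
}

def scenario_prompts(horizon: str = "2040") -> list[str]:
    """Build one scenario-writing prompt per quadrant of the 2x2 matrix."""
    (name_x, poles_x), (name_y, poles_y) = DRIVERS.items()
    prompts = []
    for pole_x, pole_y in product(poles_x, poles_y):
        prompts.append(
            f"Write a scenario for {horizon} in which {name_x} is "
            f"'{pole_x}' and {name_y} is '{pole_y}'. Stay inside this frame; "
            "do not fall back on extrapolating historical trends that contradict it."
        )
    return prompts

for prompt in scenario_prompts():
    print(prompt, end="\n\n")
```

The point of the sketch is simply that the frame, i.e. the chosen axes and their poles, is stated by humans and given to the model verbatim rather than left implicit in historical data.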
Furthermore, the approach proposes positively reframing the apparent "stress" experienced by AI prediction models when faced with sudden context shifts. In the long term, qualitative foresight methods developed by humans can overcome current limitations in AI performance evaluation, enabling genuine human-AI co-creation. In the short term, this reframing offers valuable insights for AI researchers working under intense competitive pressures, helping them critically reconsider their model configurations and performance strategies.
Ultimately, this talk seeks to uncover implicit assumptions embedded within traditional AI evaluation metrics, demonstrating that integrating strategic foresight can meaningfully enhance the practical performance of AI agents in real-world contexts.
Biography
Dr. Nobuyuki Shirakawa is an Associate Professor at Niigata University, currently serving as Vice Director of the Research Centre on Emerging Technology and Governance. His research focuses on strategic foresight, innovation governance, and the socio-technical integration of emerging technologies. With a multidisciplinary background spanning economics, information science, policy studies, and science and technology studies (STS), Dr. Shirakawa examines how technological innovations dynamically shape societal transformations.
He is a co-founder of Code for Japan, a prominent civic-tech community embodying Tim O'Reilly's concept of "Government 2.0," actively promoting open innovation, community empowerment, and digital transformation in the public sector. He has significantly contributed to major foresight initiatives, notably the 9th and 11th National Foresight Surveys at Japan’s National Institute of Science and Technology Policy (NISTEP). Additionally, Dr. Shirakawa played a pivotal role in establishing the Technology Strategy Center (TSC) at Japan's New Energy and Industrial Technology Development Organization (NEDO), fostering interdisciplinary research and policy dialogue.
Dr. Shirakawa has extensive experience in international research collaborations and policy advisory roles, including engagements with the OECD and the Asian Productivity Organization (APO). His recent research addresses fundamental challenges in AI evaluation frameworks, advocating for the systematic integration of scenario planning and strategic foresight into AI research and policymaking.
He actively contributes to international and interdisciplinary communities through numerous academic publications, policy papers, and books. His overarching aim is to bridge human cognitive limitations and AI predictive capabilities, thereby supporting anticipatory governance and sustainable societal development.
> For more details
Title
From Cooperation to Discovery for Science: Principled Human–AI Collaboration and Textual Novelty Identification
Abstract
AI is increasingly embedded in our daily lives, and its application to scientific discovery—such as accelerating materials discovery—represents a natural and promising frontier. However, science poses unique challenges. Typical Human–AI collaboration frameworks rely on humans to supervise uncertain AI outputs; in scientific domains, by contrast, only experiments reveal ground truth, and even expert suggestions may be unreliable. Human input should thus be treated as advice: potentially helpful, but not infallible.
Another major challenge is the identification of novelty in texts. While large language models (LLMs) are trained to produce the most plausible continuations based on learned distributions, novelty is, by nature, implausible—it lies in the surprising and unexpected. This makes LLMs unlikely to generate or prioritize novel content. These challenges raise two key questions:
(1) How can uncertain or potentially incorrect suggestions from humans or AIs still accelerate discovery?
(2) How can we identify novel content in texts using LLMs with provable guarantees?
In the first part of the talk, I introduce a principled Human–AI collaboration algorithm with two theoretical guarantees: (1) No-harm—even completely incorrect suggestions from humans or LLMs do not degrade the convergence rate to the true optimum. (2) Handover—the algorithm eventually stops querying human or LLM inputs once sufficient information has been elicited, freeing them from supervising the optimizer. We validate our method through real-world experiments in lithium-ion battery electrolyte discovery in collaboration with Oxford battery researchers.
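For readers who want a feel for the two guarantees, here is a deliberately simplified sketch. It is not the algorithm of reference [1] below (which builds on Bayesian optimisation with formal guarantees); it only illustrates the spirit of treating human or LLM input as advice that is always checked against the experiment, and of ceasing to query the advisor once suggestions stop adding information. The function names and the toy objective are assumptions.

```python
# Deliberately simplified sketch -- NOT the algorithm of reference [1] --
# illustrating "advice as candidates, experiments as ground truth".
import random

def run_experiment(x: float) -> float:
    """Ground-truth evaluation (e.g., a lab measurement). Toy stand-in here."""
    return -(x - 0.3) ** 2  # hypothetical objective, maximised at x = 0.3

def advisor_suggestion() -> float:
    """A human or LLM suggestion; it may be arbitrarily wrong."""
    return random.uniform(0.0, 1.0)

def optimise(budget: int = 30, patience: int = 5):
    best_x, best_y = None, float("-inf")
    idle_advice_rounds, use_advice = 0, True
    for _ in range(budget):
        # Own proposal (random search here; a surrogate model in practice).
        candidates = [random.uniform(0.0, 1.0)]
        if use_advice:
            candidates.append(advisor_suggestion())
        # Every candidate is verified by the experiment, so a wrong suggestion
        # costs at most one extra evaluation per round ("no-harm" in spirit).
        advice_helped = False
        for i, x in enumerate(candidates):
            y = run_experiment(x)
            if y > best_y:
                best_x, best_y = x, y
                advice_helped = (i == 1)
        # "Handover" in spirit: stop querying the advisor once their
        # suggestions have repeatedly failed to add information.
        idle_advice_rounds = 0 if advice_helped else idle_advice_rounds + 1
        if idle_advice_rounds >= patience:
            use_advice = False
    return best_x, best_y

print(optimise())
```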
The second part of the talk presents a provable method for novelty identification using classical tools from robust statistics. We reframe novelty detection as a normality estimation problem: if we can reliably learn what is "normal" from a mixed distribution of normal and novel texts, subtracting the normal reveals the novel. Robust statistics provides precisely this ability—learning a representative distribution despite the presence of anomalies. When the learnt parameters remain unchanged regardless of the proportion of anomalies, this subtraction-based approach is provably sound; this property, known as redescending, is considered a holy grail in robust statistics.
We propose Hölder-DPO, the first objective for LLMs with a provable redescending property, and apply it to dataset cleansing tasks for sanity checking. Our method achieves over 95% unsupervised accuracy in identifying anomalies and reveals that more than 25% of entries in commonly used Anthropic datasets are anomalous.
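As a toy numeric illustration of the "learn the normal, then subtract it" intuition (using a median/MAD location estimate rather than Hölder-DPO, and not redescending in the strict M-estimator sense), consider the following; the scores stand in for whatever per-text statistic one might compute.

```python
# Toy illustration only (not Hölder-DPO): a robust location estimate is barely
# moved by anomalies, so "subtracting the normal" exposes the anomalous entries.
import statistics

normal_scores = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]  # "normal" items
novel_scores = [25.0, 31.5, 28.2]                               # anomalous items
mixed = normal_scores + novel_scores                            # what we observe

centre_mean = statistics.mean(mixed)      # pulled towards the anomalies
centre_robust = statistics.median(mixed)  # essentially unchanged by them
spread = statistics.median([abs(x - centre_robust) for x in mixed])  # MAD

flagged = [x for x in mixed if abs(x - centre_robust) > 5 * spread]
print(f"mean={centre_mean:.2f}  robust centre={centre_robust:.2f}  flagged={flagged}")
```

Because the robust centre is estimated from the mixed data yet stays at the "normal" level, comparing each entry against it flags exactly the anomalous scores, while the plain mean is dragged towards them.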
References
[1] W. Xu*, M. Adachi*, C.N. Jones, M.A. Osborne, NeurIPS (Spotlight) 37, 104091, 2024, arXiv:2410.10452
[2] M. Fujisawa*, M. Adachi*, M.A. Osborne, arXiv:2505.17859, 2025
(*Equal contributions)
Biography
Dr. Masaki Adachi is a Senior Researcher at Lattice Lab, Toyota Motor Corporation. He received his Master’s degree from the University of Tokyo in 2015, where he was awarded the President’s Prize, and completed his DPhil at the University of Oxford in 2025 as a Clarendon Scholar, co-advised by Prof. Michael A. Osborne and Prof. David A. Howey. His research interests include AI for science, Human–AI collaboration, uncertainty quantification, and probabilistic numerics.