Singapore is a remarkable place: a melting pot of the cultures of the surrounding countries. There's Little India, China Square and my personal favourite, Raffles, an old stomping ground of writers and creatives like Ernest Hemingway and Rudyard Kipling and the birthplace of the Singapore Sling cocktail. But for someone who grew up in Northern Ireland, the combination of heat and humidity was a test of endurance. I was in town for the 2025 edition of the International Conference on Learning Representations (ICLR), one of the primary machine learning and artificial intelligence conferences (the other two being ICML and NeurIPS). If the climate challenged my physical comfort, the conference definitely challenged my academic assumptions about AI models. Below, I discuss some of the ideas and themes that were most prevalent during the conference and how they align with broader trends I've been observing across academia and industry.
One major takeaway is a growing (and even somewhat agreed-upon) recognition that bigger is not always better when it comes to artificial intelligence models. The field has long been riding a wave of ever-larger models, with training procedures that now require megawatts of GPU power. And yet, the biological system from which neural networks originally drew inspiration, the human brain, operates on only about 10 watts. This contrast is becoming harder to ignore. Moreover, there are clear signs of diminishing returns: simply adding compute no longer yields the spectacular gains it did around 10 years ago. This observation is prompting a broader rethink: Is the path forward to push through this computational scaling barrier, or to build smarter, more efficient models?
This concern connects to a broader discussion about digital twins (a primary topic of conversation at the Winter Simulation Conference 2024 in Florida, USA). Digital twins can be viewed as massive surrogate models meant to replicate large physical or virtual systems. While promising in theory, these models are extremely data-hungry and computationally intensive to train well, and one increasingly common opinion is that their complexity often obscures, rather than clarifies, the underlying systems and dynamics they aim to model. It prompts the question: Are simpler models 'better'? Do simpler models give more explainable, and therefore more trustworthy, outputs?
For me, one of the most memorable moments of the ICLR conference was the keynote address by Prof. Yi Ma of the University of Hong Kong. His talk had some great taglines, one of which was:
"Intelligence is not a game for billionaires."
The pursuit of bigger models increasingly concentrates AI capabilities in a small number of private companies with the resources to afford massive compute budgets. Meanwhile, researchers and practitioners are starting to look for smarter, leaner approaches to machine learning: ones that rely more on statistical insights and efficient algorithms than on brute-force scaling. Prof. Ma also offered another memorable line:
"We developed mathematics to do science because natural language is ambiguous."
This comment is a useful reminder when thinking about large language models (LLMs), which dominated much of the conversation at ICLR this year. LLMs excel at language-related tasks, but whether they are the right path forward for domains that demand precision and rigour, such as mathematics and scientific exploration, remains a topic of debate and something I am personally sceptical about. In fact, painting in broad strokes, ICLR often felt to me more like an LLM conference than a general deep learning/AI conference.
As for my own research interests (efficient & optimized sampling), diffusion samplers are currently very popular. These methods transform noisy, random data into samples from a target distribution by gradually removing the noise. My collaborators and I have recently approached the sampling problem from a different angle. We use graph neural networks to learn a discrete transformation: a mapping from an input node set to a low-discrepancy, representative node set with respect to a target distribution. This framework, part of our Message-Passing Monte Carlo (MPMC) method, emphasizes structure and efficiency. At the Frontiers in Probabilistic Inference workshop at ICLR, we presented an extension of this work: Stein-MPMC, which incorporates Stein discrepancies into the message-passing framework to sample from nonuniform target densities that are known only up to a normalizing constant.
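To give a flavour of why Stein discrepancies are the right tool when a density is only known up to a normalizing constant, here is a minimal toy sketch. It is not the MPMC or Stein-MPMC implementation; the Gaussian target, RBF kernel, and all hyperparameters are illustrative choices of my own. It simply optimizes a small point set by gradient descent on a kernelized Stein discrepancy, using only the score function, which is unaffected by the normalizing constant.

```python
# Toy sketch, NOT the actual MPMC or Stein-MPMC code: optimize a small point set
# by minimizing a kernelized Stein discrepancy (KSD) against an unnormalized target.
# The score only needs log p up to an additive constant, which is exactly why
# Stein discrepancies suit densities known up to a normalizing constant.
# All hyperparameters (bandwidth, learning rate, iteration count) are illustrative.
import torch

torch.manual_seed(0)
d, n, h = 2, 64, 1.0            # dimension, number of points, RBF kernel bandwidth

def score(x):
    # Score of an unnormalized standard Gaussian target: grad_x log p(x) = -x.
    return -x

def ksd2(x):
    # V-statistic estimate of the squared KSD with the RBF kernel
    # k(x, y) = exp(-||x - y||^2 / (2 h^2)) and the Langevin Stein operator.
    s = score(x)                                       # (n, d) scores
    diff = x[:, None, :] - x[None, :, :]               # (n, n, d) pairwise x_i - x_j
    sq = (diff ** 2).sum(-1)                           # (n, n) squared distances
    k = torch.exp(-sq / (2 * h ** 2))                  # (n, n) kernel matrix
    t1 = (s @ s.T) * k                                 # s(x)^T s(y) k(x, y)
    t2 = (s[:, None, :] * diff).sum(-1) * k / h ** 2   # s(x)^T grad_y k(x, y)
    t3 = -(s[None, :, :] * diff).sum(-1) * k / h ** 2  # s(y)^T grad_x k(x, y)
    t4 = (d / h ** 2 - sq / h ** 4) * k                # trace(grad_x grad_y k(x, y))
    return (t1 + t2 + t3 + t4).mean()

x = torch.randn(n, d, requires_grad=True)              # start from i.i.d. Gaussian noise
opt = torch.optim.Adam([x], lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = ksd2(x)
    loss.backward()
    opt.step()
print(f"final KSD^2 estimate: {ksd2(x).item():.4f}")
```

In our actual framework the representative point set is produced by a graph neural network rather than optimized directly, but the score-only property shown above is the ingredient that makes unnormalized targets tractable.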
While it's still an open question which sampling approaches—diffusion models, flows, our graph-based methods, or something else entirely—will ultimately prove most effective, it is clear that sampling and machine learning are converging in exciting ways.
And perhaps as a last takeaway, there is a renewed openness in the community to rethinking scale, and to recognizing that true intelligence, whether natural or artificial, might emerge not from sheer size, but from smart design, robust mathematics, and creativity!