Machine Learning for Fundamental Physics

The Physics Division Machine Learning group is a cross-cutting effort that connects researchers developing, adapting, and deploying artificial intelligence (AI) and machine learning (ML) solutions to fundamental physics challenges across the high-energy physics (HEP) frontiers. While all ML group members have a primary affiliation with other areas of the division, there are unique efforts within the group to develop methods with significant interdisciplinary potential. We have strong connections and collaborations with researchers in the Scientific Data Division (and in particular with the Computational Cosmology Center), the National Energy Research Scientific Computing Center (NERSC), the Berkeley Institute for Data Science (BIDS), and the Bakar Institute of Digital Materials for the Planet (BIDMaP).

Specific areas of research are outlined in detail below.

AI/ML Seminar Series

Weekly seminars are held on cutting-edge AI/ML topics. The seminars alternate between LBL and UC Berkeley; details are below.

LBL Seminars: 11am Thursday in the Sessler Conference Room (50A-5132). Details and schedule.

UC Berkeley Seminars: 12pm Thursday in Cory Hall 373. Details and schedule.

Research Areas

Simulation-based inference

LBNL researchers employ simulation-based inference (SBI) to bridge detailed physical simulations with observational and experimental data. In cosmology, this includes neural likelihoods for BAO reconstruction, inference tools for supernova cosmology, galaxy field-level inference with graph neural networks, and fast emulators of large-scale structure. Collider, theory, and neutrino efforts similarly use SBI to unfold neutrino cross sections, study whether dark-matter signals can be distinguished from unresolved point sources, and accelerate detector-level modeling. Projects such as LHC track reconstruction with graph neural networks and transformer-based jet-flavor tagging illustrate how modern architectures can infer latent physical quantities directly from low-level detector data.

Looking ahead, we are building unified SBI pipelines that support rapid analysis as new datasets arrive from DESI, Roman, LSST, ATLAS, and neutrino experiments. These pipelines will enable high-fidelity inference, uncertainty quantification, and cross-domain integration of astrophysical, collider, and neutrino data.
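As a rough illustration of the idea behind SBI, the sketch below infers a parameter using only a forward simulator, without ever writing down a likelihood. It uses rejection sampling (approximate Bayesian computation), a far simpler stand-in for the neural methods described above, and is not the group's own pipeline. The toy Gaussian simulator, the uniform prior range, the tolerance eps, and all function names are assumptions made for this example.

```python
import random
import statistics


def simulate(theta, rng, n_events=50):
    """Hypothetical forward model: mean of n_events Gaussian draws centred on theta."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n_events))


def rejection_abc(observed, rng, n_draws=20000, eps=0.05, prior=(-5.0, 5.0)):
    """Keep prior draws whose simulated summary lands within eps of the observation."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(*prior)  # sample a candidate parameter from the prior
        if abs(simulate(theta, rng) - observed) < eps:
            accepted.append(theta)  # accepted draws approximate the posterior
    return accepted


rng = random.Random(0)  # fixed seed so the sketch is reproducible
observed = simulate(1.5, rng)  # stand-in for real data, generated at theta = 1.5
posterior = rejection_abc(observed, rng)
posterior_mean = statistics.fmean(posterior)
posterior_std = statistics.stdev(posterior)
print(f"accepted {len(posterior)} draws; theta = {posterior_mean:.2f} +/- {posterior_std:.2f}")
```

The accepted draws form an approximate posterior whose spread serves as the uncertainty estimate. Rejection sampling wastes most simulations, which is one reason to replace it with learned likelihoods or posteriors when the simulator is expensive.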

Contacts: Callum Wilkinson, Xiangyang Ju

Anomaly detection

We develop anomaly-detection systems to identify unexpected or rare events in both astrophysical surveys and collider/neutrino experiments. In cosmology, these methods surface unusual supernovae, atypical strong-lensing systems, and anomalies in DESI spectroscopy. In the neutrino program, new work focuses on identifying failure modes in detector simulations — allowing researchers to pinpoint mismodeling and improve the fidelity of reconstruction algorithms. Collider studies likewise apply anomaly detection to spot unusual event topologies or detector behavior that may indicate new physics or systematic issues.

Future efforts include real-time anomaly monitoring in survey and experiment pipelines, enabling rapid follow-up, improved detector operations, and early discovery opportunities across all domains.

Contacts: Zach Marshall

AI Agents and LLMs

Large language models and AI agents are becoming valuable collaborators across our research programs. In cosmology, LLM-based assistants help users navigate data repositories like the American Science Cloud and orchestrate end-to-end simulation workflows. Collider projects are developing agents for ATLAS’s analysis pipeline, enabling automated task orchestration, dataset lookup, and production of analysis code. The CelloAI project builds an LLM with retrieval-augmented generation (RAG) tailored specifically to LHC documentation and software frameworks, making analysis more accessible and consistent for new collaborators.

Over the next few years, we will deploy secure, domain-specific AI assistants within DOE computing environments, automating routine tasks and enabling more reproducible science across cosmology and HEP.

Contacts: Haichen Wang, Paolo Calafiura, Zarija Lukic

Generative models

Generative AI models are transforming simulation and data augmentation across our programs. Cosmology projects use diffusion models and normalizing flows to synthesize realistic strong-lensing images, galaxy catalogs, and CMB detector responses, while collider efforts apply generative models to emulate detector behavior and explore new tracking approaches. These tools dramatically reduce computational cost and allow researchers to probe complex parameter spaces that traditional simulation methods struggle to sample efficiently.

The next generation of generative tools will integrate physical constraints—such as symmetries, conservation laws, and detector geometry—to produce more realistic datasets and support reliable uncertainty quantification for both cosmic and collider analyses.

Contacts: Simone Pagan Griso, Uros Seljak

Foundation models

We develop large multimodal foundation models trained on astronomical and high-energy physics datasets. Recent work includes AION-1, an omnimodal foundation model for astronomical sciences, and AstroCLIP, a cross-modal representation model for galaxies. These models complement broader foundation-model efforts in collider physics aimed at unifying tracking, calorimetry, particle ID, and high-level event representations within a single architecture.

In the coming years, we expect reusable, open foundation models to span cosmology, astrophysics, and HEP, dramatically reducing duplication of effort and enabling fine-tuning for a wide range of scientific tasks.

Contacts: Uros Seljak, Xiangyang Ju

Data curation

Data curation is essential for AI-ready science. For cosmology, we develop standardized, cloud-accessible data formats for DESI, LSST, CMB, supernova, and strong-lensing data. In collider physics, the preparation of labeled tracking datasets, metadata-rich event samples, and curated simulation outputs is foundational for training high-quality ML models. Across both communities, the American Science Cloud is building automated pipelines for ingestion, validation, metadata generation, and provenance tracking.

As datasets expand, we aim to fully automate the curation process to reduce human effort and ensure that AI tools can operate seamlessly from raw data to final analyses.

Contacts: Stephen Bailey

American Science Cloud

The "AI Universe" project, led by LBNL as part of the DOE American Science Cloud (AmSC) pilots, unites multiple national labs to build an integrated, AI-ready Cosmic Frontier data repository. The project will use foundation models for multimodal astrophysical data analysis to advance discoveries about dark energy, dark matter, and cosmic structure.

Contacts: Stephen Bailey, Simone Ferraro, Uros Seljak

AI for Hardware

AI methods support detector innovation across the wide range of experiments pursued at LBNL. In CMB instrumentation, researchers use machine learning and Gaussian-process–driven antenna optimization to design improved mm-wave detectors. We lead efforts on AI-aware chips capable of adapting to faults in real time, enabling more resilient and power-efficient onboard processing. Machine learning also accelerates the evaluation and calibration of hardware components, allowing researchers to optimize designs with fewer fabrication cycles.

Future work will integrate AI into closed-loop design processes for next-generation CMB and collider detectors, reducing iteration time and improving performance.

Contacts: Maurice Garcia-Sciveres, Aritoki Suzuki

Physics for AI

We contribute fundamental research at the intersection of physics and machine learning, ensuring that AI models remain robust and physically grounded. This includes projects such as SEAL, which encourages networks to learn physical symmetries rather than enforcing them rigidly, and Analytics & AI, which uses machine learning to discover optimal collider observables that can be computed exactly using theoretical tools. Our teams also use simulations to probe the robustness of AI models, studying domain shifts and known physical constraints.

This physics-first perspective leads to AI systems that generalize better, offer meaningful uncertainties, and remain interpretable and trustworthy in high-stakes scientific contexts.

Contacts: Nicholas Rodd

A U.S. Department of Energy National Laboratory Operated by the University of California
