Abstract:
In silico materials design requires navigating vast and diverse chemical spaces. While ab initio methods have been transformative for materials simulations, their high computational cost and poor scaling limit their reach. In this talk, I will introduce foundational potentials (FPs), i.e., machine learning interatomic potentials with near-universal coverage of the periodic table. Acting as efficient surrogates for expensive ab initio calculations, FPs enable exploration of the materials universe at unprecedented scales and accuracy. I will outline the physics-informed AI architectures that underpin these models, highlight recent advancements, and demonstrate their impact on large-scale materials design. I will also discuss the remaining challenges and opportunities for FPs, and share perspectives on how to maximize the “return on data” in materials R&D.
Bio:
Shyue Ping Ong is a professor in the Aiiso Yufeng Li Family Department of Chemical and Nano Engineering at the University of California, San Diego. He earned his PhD from the Massachusetts Institute of Technology in 2011 and leads the Materials Virtual Lab, an interdisciplinary research group applying materials science, computer science, and data science to accelerate materials discovery and design. Ong is a leading researcher in the application of AI and machine learning to materials design. He pioneered the concept of foundational potentials with universal coverage of the periodic table, which have transformed the scale and speed of materials exploration. He is also well-known for his efforts in democratizing access to materials data and software. He is one of the founding developers of the Materials Project and the founder of Python Materials Genomics (pymatgen), an open-source materials analysis library used by hundreds of thousands of researchers worldwide. Ong has published more than 170 articles in materials informatics and has been named a Clarivate Highly Cited Researcher since 2021. He is a recipient of the prestigious U.S. Department of Energy Early Career Research Program and the Office of Naval Research Young Investigator Program awards.
Summary:
Focus: Materials Science
Science: physical simulations
Data: collect data from many simulations
ML: create predictive models, data mining and screening based on simulations
Applications: solid state lighting, aerospace alloys, etc.
Community interactions: open-source software pymatgen (https://pymatgen.org/), Materials Project (https://next-gen.materialsproject.org/)
Direct physical modeling
Schrödinger equation: the exact description, but far too complex to solve directly for realistic systems
Density functional theory (DFT): approximates the many-electron problem via the electron density
Makes it possible to predict material properties of larger systems (still at a high computational cost)
Materials Project: a large collection of DFT calculations for many materials
Challenge: quantum-mechanical methods (Schrodinger, DFT) are expensive for large systems
They scale cubically or worse with the number of atoms and are limited in practice to ~1,000 atoms (see the scaling sketch below)
The number of new calculations added to the Materials Project is slowing down because we are running out of small, simple systems that are within reach of DFT
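A back-of-the-envelope sketch of why cubic scaling is limiting; the 100-atom baseline and the exact exponent are illustrative assumptions, since prefactors and scaling vary by code and method:

    # Rough relative cost of a quantum-mechanical calculation, assuming O(N^3)
    # scaling with the number of atoms (illustrative only).
    def relative_cost(n_atoms, n_ref=100):
        return (n_atoms / n_ref) ** 3

    for n in (100, 1_000, 10_000):
        print(f"{n:>6} atoms -> ~{relative_cost(n):,.0f}x the cost of a 100-atom run")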
Trade-off among model cost, transferability, and accuracy
QM methods: transferable and accurate, but expensive
Empirical potentials: cheaper but not transferable and less accurate
Continuum models: much cheaper but transferability and accuracy are poor
New work:
ML interatomic potentials: accurate, cheap, and moderately transferable
Foundation Potentials: cheap, transferable and accurate
Mapping the Potential Energy Surface (PES) with Machine-Learned Interatomic Potentials (MLIP)
The PES is sampled at many atomic configurations using DFT calculations
ML models trained on this data can interpolate the PES between the sampled DFT points (toy sketch below)
Accurate, fast and can be generated using automated workflows
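A toy sketch of this interpolation idea, using a 1D Lennard-Jones dimer as a stand-in for DFT and a Gaussian-process regressor as the ML surrogate; both are illustrative choices, not what production MLIPs actually use:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def lj_energy(r, eps=1.0, sigma=1.0):
        # stand-in for an expensive DFT energy evaluation
        return 4 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)

    r_train = np.linspace(0.95, 2.5, 12).reshape(-1, 1)  # sampled configurations
    e_train = lj_energy(r_train).ravel()

    model = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-6)
    model.fit(r_train, e_train)

    r_test = np.array([[1.12], [1.6], [2.1]])  # configurations not in the training set
    print(np.c_[r_test, model.predict(r_test), lj_energy(r_test)])  # r, predicted E, true E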
Application: solid electrolytes for batteries
Cathode - electrolyte - anode
Need to predict the ionic conductivity of the electrolyte
Conventional simulation-based approach: simulate short time scales at high temperatures and extrapolate to low temperatures (Arrhenius fit; sketch after this block). The extrapolation yields poor accuracy.
Using an MLIP allows direct simulation of room-temperature behavior, with much higher accuracy
Challenge: a custom-fitted MLIP cannot generalize to regions of the potential energy surface far from the available DFT training data
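A sketch of the conventional Arrhenius extrapolation that direct MLIP simulation avoids; the diffusivity values below are made up for illustration only:

    import numpy as np

    kB = 8.617e-5  # eV/K

    # hypothetical high-temperature MD results: temperature (K) and diffusivity (cm^2/s)
    T = np.array([800.0, 900.0, 1000.0, 1100.0, 1200.0])
    D = np.array([2.1e-6, 5.5e-6, 1.2e-5, 2.3e-5, 4.0e-5])

    # Arrhenius law D = D0 * exp(-Ea / (kB * T)): fit ln(D) vs 1/T
    slope, intercept = np.polyfit(1.0 / T, np.log(D), 1)
    Ea, D0 = -slope * kB, np.exp(intercept)

    # extrapolate to room temperature; small errors in the fitted Ea are amplified here
    D_300K = D0 * np.exp(-Ea / (kB * 300.0))
    print(f"fitted Ea ~ {Ea:.2f} eV, extrapolated D(300 K) ~ {D_300K:.1e} cm^2/s")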
Goal: create a Foundation Potential model for the entire periodic table
Approach: a message-passing graph neural network (see the graph-construction sketch after this block)
Materials 3-body Graph Networks: https://github.com/materialsvirtuallab/m3gnet
Insight: the DFT calculations already in the Materials Project provide enough data to train a model with 3-body interactions covering 89 elements
Accuracy is reasonably close to DFT
Since M3GNet, many other graph-based foundation potential models have been developed
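A minimal sketch of the graph construction underlying such models: atoms become nodes and pairs within a cutoff become edges (3-body terms are then built from pairs of edges sharing an atom). It uses pymatgen's Structure.get_all_neighbors; the toy CsCl cell and 5 A cutoff are arbitrary choices:

    from pymatgen.core import Lattice, Structure

    # toy CsCl-type cell (lattice parameter is approximate)
    structure = Structure(Lattice.cubic(4.1), ["Cs", "Cl"],
                          [[0, 0, 0], [0.5, 0.5, 0.5]])

    cutoff = 5.0  # Angstrom
    edges = []
    for i, neighbors in enumerate(structure.get_all_neighbors(cutoff)):
        for nb in neighbors:
            edges.append((i, nb.index, nb.nn_distance))  # (source atom, neighbor atom, bond length)

    print(f"{len(structure)} nodes, {len(edges)} directed edges within {cutoff} A")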
Foundational Data
Lots of training data is becoming available
Model accuracy improves with larger models and larger training sets
However, larger models are more expensive to train and apply, which reduces their practical value
MatPES: https://matpes.ai
Smaller dataset of materials calculations
Data chosen to be more representative of different structures and atomic environments
High-quality PBE and r2SCAN calculations
Can enable much cheaper model training
Training on MatPES costs on the order of $1k, while training on OMat24 costs on the order of $1M
Training on MatPES can provide similar results to training on OMat24
Software
MatCalc (http://matcalc.ai): easy access to materials properties from ML models
MatGL (https://matgl.ai): Materials Graph Library
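A hedged usage sketch of loading a pretrained foundation potential through MatGL's ASE interface; the model name and calculator class below are recalled from the matgl documentation and may differ between versions:

    import matgl
    from matgl.ext.ase import PESCalculator  # older releases may call this M3GNetCalculator
    from ase.build import bulk

    potential = matgl.load_model("M3GNet-MP-2021.2.8-PES")  # pretrained universal potential
    atoms = bulk("Si", "diamond", a=5.43)
    atoms.calc = PESCalculator(potential)

    print("energy (eV):", atoms.get_potential_energy())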
Applications: Solid electrolytes for solid-state batteries
Predicting Real World Synthesis Conditions
Need to predict both which materials are suitable for a given purpose and the conditions under which they will exhibit those properties
Focus: crystal structures at various temperatures, etc.
Can enable the design of amorphous solid electrolytes with higher conductivity (melt-quench sketch after this block)
Designed a new electrolyte material in about a week instead of years
Many more candidate materials look interesting and will need to be screened in future work
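A rough sketch of the kind of melt-quench MD workflow used to generate amorphous models with a foundation potential; the material, temperatures, run lengths, and the MatGL class/model names are placeholder assumptions:

    import matgl
    from matgl.ext.ase import PESCalculator  # assumed name, as in the sketch above
    from ase import units
    from ase.build import bulk
    from ase.md.langevin import Langevin
    from ase.md.velocitydistribution import MaxwellBoltzmannDistribution

    atoms = bulk("Si", "diamond", a=5.43).repeat((3, 3, 3))  # placeholder cell, not an electrolyte
    atoms.calc = PESCalculator(matgl.load_model("M3GNet-MP-2021.2.8-PES"))

    # melt at high temperature, then quench toward room temperature
    MaxwellBoltzmannDistribution(atoms, temperature_K=2000)
    Langevin(atoms, 1.0 * units.fs, temperature_K=2000, friction=0.01 / units.fs).run(2000)
    Langevin(atoms, 1.0 * units.fs, temperature_K=300, friction=0.01 / units.fs).run(2000)
    atoms.write("amorphous_model.xyz")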
Bridging scales to continuum simulations: DFT -> molecular dynamics -> continuum dislocation dynamics