Samuel Blau, Berkeley Lab
Meet: https://meet.google.com/niy-gtpk-sro
YouTube Stream: https://youtube.com/live/ROajuR5p3oA
Join group to receive calendar invite: https://groups.google.com/a/modelingtalks.org/g/talks
Abstract:
Machine learning (ML) models hold the promise of transforming atomic simulations by delivering quantum chemical accuracy at a fraction of the computational cost. Realization of this potential would enable high-throughput, high-accuracy molecular screening campaigns to explore vast regions of chemical space and facilitate ab initio-level simulations at sizes and time scales that were previously inaccessible. However, a fundamental challenge to creating ML models that perform well across molecular chemistry is the lack of comprehensive data for training. Despite substantial efforts in data generation, no large-scale molecular dataset exists that combines broad chemical diversity with a high level of accuracy. To address this gap, we introduce Open Molecules 2025 (OMol25), a large-scale dataset composed of more than 100 million density functional theory (DFT) calculations at the ωB97M-V/def2-TZVPD level of theory, representing billions of CPU core-hours of compute. OMol25 uniquely blends elemental, chemical, and structural diversity, including 83 elements, a wide range of intra- and intermolecular interactions, explicit solvation, variable charge/spin, conformers, and reactive structures. There are ~83M unique molecular systems in OMol25 covering small molecules, biomolecules, metal complexes, and electrolytes, including structures obtained from existing datasets. OMol25 also greatly expands on the size of systems typically included in DFT datasets, with systems of up to 350 atoms. In addition to the public release of the data, we provide baseline models and a comprehensive set of model evaluations to encourage community engagement in developing the next-generation ML models for molecular chemistry.
Bio:
Dr. Samuel M. Blau is a Research Scientist at Berkeley Lab working at the intersection of computational chemistry, materials science, high-performance computing, and machine learning. He received his B.S. in 2012 from Haverford College and his Ph.D. in Chemical Physics from Harvard University in 2017. Sam has pioneered the use of self-correcting molecular simulation workflows to enable the construction of chemical reaction networks describing complex reaction cascades, e.g., those responsible for battery interphase formation and photoresist patterning. Sam's research group also develops novel datasets, representations, and models for machine learning of chemistry and materials, as well as methods that leverage ML model speed and differentiability for accelerated scientific discovery.