Modeling Complex Systems of Chemical Reactions
Abstract:
Many societally important systems involve multiple chemical reactions. These include smog, the ozone hole, the environmental fate of materials, and processes to make fuels, materials, and pharmaceuticals. Accurate predictive models for any of these systems are or would be valuable, either to allow design and optimization of desired processes, or to guide actions to ameliorate undesired processes. But developing reactive chemistry models with sufficient predictive power is difficult. One of many challenges in this field is data scarcity, and the “clumpiness” of the data that do exist. In this presentation I will briefly review how the need for chemical reaction models has been addressed historically, and then discuss some ways we have recently combined physical models, experimental data, high-performance computers, and machine-learning techniques to make further progress.
Bio:
William H. Green is a world leader in chemical kinetics, reaction engineering, prediction of chemical reactions and properties, and in development of related algorithms and software. He has led many combined experimental/modeling research projects related to fuels, combustion, pyrolysis, and oxidative stability, and he invented an instrument to directly measure rate coefficients for multi-channel reactions. He developed computer methods to predict the behavior of complicated reacting mixtures, many of which are included in the Reaction Mechanism Generator software package. His group has also developed machine-learning methods and software (ASKCOS and Chemprop) for accurately predicting the products of organic reactions and reaction sequences leading to desired products, and for predicting many chemical properties. Prof. Green also invents and analyzes technologies to reduce greenhouse gas emissions, particularly in the transportation/fuel sector. Two of his inventions are now being commercialized, one by Thiozen, a company he co-founded. Prof. Green earned his B.A. from Swarthmore College, and his Ph.D. in Physical Chemistry from the University of California at Berkeley. After postdocs at Cambridge University and the University of Pennsylvania, he worked for Exxon for six years before joining the Chemical Engineering faculty at MIT in 1997. He has co-authored 350 journal articles, which have been cited about 23,000 times. He is a Fellow of the AAAS and of the Combustion Institute, and has received the American Chemical Society’s Glenn Award in Energy & Fuel Chemistry and the AIChE’s Wilhelm Award in Reaction Engineering. He previously served as the Editor of the International Journal of Chemical Kinetics, as the faculty chair of MIT’s Mobility of the Future project, and as the Executive Officer of the MIT Department of Chemical Engineering.
Summary
Focus: modeling complex systems of chemical reactions
Goals:
Understand and design chemical systems
This requires accurate predictive modeling that captures
The key performance properties of chemicals
The steps needed to synthesize them
The degradation of chemicals over time (e.g. in storage)
Key piece: prediction of reactions and reaction rates
RMG: expert system for chemical modeling
Given a mixture of chemicals and conditions: predict how it will evolve over time
Attached to chemical solvers
Challenge: limited accuracy of chemical property predictions
Chemprop: Machine-learned model of chemical properties
Prediction tasks
Chemical synthesis (ASKCOS)
Rates
Products
Construct kinetic models and solve them for products and rates
Reaction Mechanism Generator (RMG)
Validated against experimental datasets
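To make "construct kinetic models and solve them" concrete, here is a minimal sketch of the solve step for a toy two-reaction chain A -> B -> C. It is not RMG code: the species, the rate constants K1 and K2, and the hand-written Runge-Kutta integrator are all assumptions chosen for illustration. A generated mechanism would have hundreds of species and would be passed to a dedicated stiff ODE solver.

```python
import math

# Toy mass-action model A -> B -> C; rate constants are illustrative assumptions.
K1 = 2.0  # 1/s, A -> B
K2 = 0.5  # 1/s, B -> C


def rates(conc):
    """Return the time derivatives of the concentrations (A, B, C)."""
    a, b, _c = conc
    r1 = K1 * a  # rate of A -> B
    r2 = K2 * b  # rate of B -> C
    return (-r1, r1 - r2, r2)


def rk4_step(conc, dt):
    """Advance the concentrations by one classical Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(x + h * s for x, s in zip(base, slope))

    k1 = rates(conc)
    k2 = rates(shifted(conc, k1, dt / 2))
    k3 = rates(shifted(conc, k2, dt / 2))
    k4 = rates(shifted(conc, k3, dt))
    return tuple(
        x + dt / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
        for x, s1, s2, s3, s4 in zip(conc, k1, k2, k3, k4)
    )


def simulate(conc0, t_end, dt=0.001):
    """Integrate from t = 0 to t_end and return the final concentrations."""
    conc = tuple(conc0)
    for _ in range(round(t_end / dt)):
        conc = rk4_step(conc, dt)
    return conc


final = simulate((1.0, 0.0, 0.0), t_end=1.0)
analytic_a = math.exp(-K1 * 1.0)  # closed-form [A] for first-order decay
print("A, B, C at t = 1 s:", final)
print("analytic A at t = 1 s:", analytic_a)
```

Comparing the numerical [A] with the closed-form value is a small stand-in for the validation step: the model is only trusted once it reproduces an independent reference.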
Chemical engineering
Lab experiments (small synthesis unit, “cheap”)
Pilot plant (close in size to full plant, expensive)
Fit simple extrapolation model to predict how a full factory will work
Predict behavior of factory, then build it
Problem: often these designs don’t actually work, so this flow is risky and expensive
Goal: skip the pilot plant and predict from lab experiment to factory
Challenge: model accuracy (numerical error, accounting for all reactions)
Rate predictions most vulnerable
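One way to see why rate predictions are the most vulnerable part of the model is the Arrhenius form k = A exp(-Ea / (R T)), where a modest error in the barrier Ea is amplified exponentially in k. The sketch below assumes simple Arrhenius behavior and a 1 kcal/mol barrier error. Both are illustrative assumptions, not figures from the talk.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)
KCAL = 4184.0    # joules per kilocalorie


def arrhenius(a_factor, ea_j_per_mol, temp_k):
    """Rate coefficient k = A * exp(-Ea / (R T))."""
    return a_factor * math.exp(-ea_j_per_mol / (R * temp_k))


def rate_error_factor(ea_error_j_per_mol, temp_k):
    """Multiplicative error in k caused by an error in the barrier Ea."""
    return math.exp(abs(ea_error_j_per_mol) / (R * temp_k))


# Hypothetical barrier error of 1 kcal/mol at two temperatures.
for temp in (300.0, 1000.0):
    factor = rate_error_factor(1.0 * KCAL, temp)
    print(f"T = {temp:.0f} K: 1 kcal/mol barrier error -> factor {factor:.2f} in k")
```

The amplification is larger at low temperature, because the same energy error is divided by a smaller R T.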
Chemistry is a big problem: many molecules and reactions
Total number of possible molecules is ~10^20, number of interactions is >10^40
Only a tiny fraction of these have been evaluated
PubChem database contains ~10^8 molecules
~10^7 molecules have actually been studied
~10^7 reactions have been studied
Want to predict molecular properties before committing to expensive experiments
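A quick worked estimate shows how sparse the coverage is. It takes the order-of-magnitude counts above at face value and assumes that the interaction count is the number of molecule pairs, which is one reading of the >10^40 figure:

```latex
\[
\frac{N_{\text{studied}}}{N_{\text{possible}}} \approx \frac{10^{7}}{10^{20}} = 10^{-13},
\qquad
N_{\text{pairs}} \sim \left(10^{20}\right)^{2} = 10^{40}.
\]
```

On these numbers, roughly one possible molecule in ten trillion has been studied, which is why predictive models have to extrapolate far beyond the available data.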
Challenge:
Data on molecule properties and reactions is sparse
Hard to access data from papers (publisher restrictions and unstructured data extraction challenge)
~10^4 molecules have detailed datasets
Traditional approach:
Infer based on related molecules
Only works for molecule types with a lot of data
ML-based alternative: learned fingerprint
Chemprop: Graph neural network
Learned to extend this to pairs of molecules (solvent and solute)
Challenging to get enough data to train a Chemprop model
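The idea of a learned fingerprint from a graph neural network can be sketched with a toy message-passing loop. This is not Chemprop's API or architecture (Chemprop uses a directed, bond-based variant with trained weights). The atom features, the fixed weight matrices W_SELF and W_NEIGH, and the function names below are all hypothetical.

```python
import math

# Toy molecular graph for the heavy atoms of ethanol: C-C-O.
# Atom features are a hypothetical one-hot encoding [is_C, is_O].
ATOMS = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
BONDS = [(0, 1), (1, 2)]

# Fixed illustrative weights; a trained model would learn these from data.
W_SELF = [[0.5, -0.2], [0.1, 0.4]]
W_NEIGH = [[0.3, 0.2], [-0.1, 0.6]]


def matvec(matrix, vec):
    """Multiply a small dense matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def neighbors(n_atoms, bonds):
    """Build an adjacency list from an undirected bond list."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def message_passing(atoms, bonds, n_rounds=2):
    """Return a fixed-length molecular fingerprint by summing atom states."""
    adj = neighbors(len(atoms), bonds)
    hidden = [list(a) for a in atoms]
    for _ in range(n_rounds):
        updated = []
        for i, h in enumerate(hidden):
            # Sum neighbor states so the result ignores neighbor ordering.
            msg = [sum(hidden[j][k] for j in adj[i]) for k in range(len(h))]
            pre = [s + m for s, m in zip(matvec(W_SELF, h), matvec(W_NEIGH, msg))]
            updated.append([math.tanh(x) for x in pre])
        hidden = updated
    # Readout: summing over atoms makes the vector independent of atom numbering.
    return [sum(h[k] for h in hidden) for k in range(len(hidden[0]))]


fingerprint = message_passing(ATOMS, BONDS)
print("fingerprint:", fingerprint)
```

In a real model the fingerprint feeds a small regression network, and the weights are fitted so that the fingerprint is informative for the target property. That fitting is the step that needs the training data noted above.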
Physical fundamentals-driven ML:
Use underlying physical constraints to guide modeling
E.g. Schrödinger's equation, statistical mechanics and rate theory
Challenge: very computationally intensive, so calculations either exhaust the available compute or rely on numerical approximations that reduce accuracy
Synthesis:
Use quantum chemistry calculations (e.g. COSMO-RS) to create a training dataset for an ML model
Use model to tweak parameters for higher-level simulations (e.g. Solvation)
There’s a point where the error from using the model reaches the error level of experimental data: this corresponds to the level of noise in the data
Applied approach to multiple types of properties: thermochemistry, solvation energies, reaction barriers, spectra
Can use ML to accelerate conventional first-principles (Quantum Chemistry) calculations
Guess 3D geometries of different molecules to estimate their potential space
Feedback training: used successes in model to create larger dataset, which improved model predictions
Large HPC-heavy workflow
Developed heuristic estimators to detect a model that is not converging and stop it early
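The notes do not say which heuristics were used to stop non-converging calculations. As a sketch of the general idea only, the hypothetical rule below aborts a run when its residual has not dropped by a minimum fraction over a sliding window. The function names, window size, and thresholds are assumptions, and the residual traces are synthetic.

```python
def should_stop_early(residuals, window=5, min_improvement=0.1):
    """Flag a run whose residual has stalled or grown over the last `window` steps.

    residuals: history of a convergence measure (smaller is better).
    min_improvement: required fractional drop across the window.
    """
    if len(residuals) <= window:
        return False  # not enough history to judge
    old, new = residuals[-window - 1], residuals[-1]
    if old <= 0:
        return False  # residual already at zero
    return (old - new) / old < min_improvement


def run_with_monitor(step_fn, max_steps=100, tol=1e-6):
    """Iterate step_fn until converged, stalled, or out of steps."""
    history = []
    for step in range(max_steps):
        history.append(step_fn(step))
        if history[-1] < tol:
            return "converged", step + 1
        if should_stop_early(history):
            return "stopped_early", step + 1
    return "max_steps", max_steps


def converging_trace(n):
    """Synthetic residual that halves every step."""
    return 0.5 ** n


def stalled_trace(n):
    """Synthetic residual that oscillates and never improves."""
    return 1.0 + 0.01 * (n % 2)


print(run_with_monitor(converging_trace))
print(run_with_monitor(stalled_trace))
```

In a large HPC workflow the payoff is that stalled jobs release their nodes after a few iterations instead of running to the step limit.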
ML model is more accurate than Density Functional Theory models but not as accurate as models that account for spatial distribution of electrons (very computationally expensive)
Can use ML trained on quantum chemistry calculations to predict solvent effects on reaction rates; good accuracy using only simulations, with no experimental data
Workflow
RMG - Reaction Mechanism Generator (https://rmg.mit.edu/): Database of molecular properties
Use parameters from RMG as inputs for dynamic reaction simulations
Get predictions of chemical parameters, feed back into database
Validate against experiments
Example application: high-temperature pyrolysis
Convert natural gas + waste into ethylene and acetylene
RMG built a kinetic model to predict 71 pilot plant experiments on 12 different feed compositions; 664 chemical species, 8121 reactions
Pure prediction, except for the free parameter: heat loss through reactor walls
Predictions were very close to experiment but not perfect; much cheaper than building the pilot plant
Sponsor used simulations to design changes to the process
Working towards
Heavier molecules: more dynamical complexity and more chemical species, so error is higher
Multiple phases of reactants: film on surfaces of materials, liquid flowing and vapor
Challenge: quantum chemistry for large molecules with complex 3D structures
Computationally expensive
Less experimental data
3D effects (e.g. folding) are highly variable and dynamic
Huge numbers of reactions
Some progress in this area
RMG works in many cases but many failure cases too
For >10k species direct ODE solves are too complex