Modeling Complex Systems of Chemical Reactions

Prof. William H. Green, MIT Dept. of ChemEng

Abstract:
Many societally important systems involve multiple chemical reactions. These include smog, the ozone hole, the environmental fate of materials, and processes to make fuels, materials, and pharmaceuticals. Accurate predictive models for any of these systems are or would be valuable, either to allow design and optimization of desired processes, or to guide actions to ameliorate undesired processes. But developing reactive chemistry models with sufficient predictive power is difficult. One of many challenges in this field is data scarcity, and the “clumpiness” of the data that do exist. In this presentation I will briefly review how the need for chemical reaction models has been addressed historically, and then discuss some ways we have recently combined physical models, experimental data, high-performance computers, and machine-learning techniques to make further progress.

Bio:

William H. Green is a world leader in chemical kinetics, reaction engineering, prediction of chemical reactions and properties, and in development of related algorithms and software. He has led many combined experimental/modeling research projects related to fuels, combustion, pyrolysis, and oxidative stability, and he invented an instrument to directly measure rate coefficients for multi-channel reactions. He developed computer methods to predict the behavior of complicated reacting mixtures, many of which are included in the Reaction Mechanism Generator software package. His group has also developed machine-learning methods and software (ASKCOS and Chemprop) for accurately predicting the products of organic reactions and reaction sequences leading to desired products, and for predicting many chemical properties. Prof. Green also invents and analyzes technologies to reduce greenhouse gas emissions, particularly in the transportation/fuel sector. Two of his inventions are now being commercialized, one by Thiozen, a company he co-founded. Prof. Green earned his B.A. from Swarthmore College, and his Ph.D. in Physical Chemistry from the University of California at Berkeley. After postdocs at Cambridge University and the University of Pennsylvania, he worked for Exxon for six years before joining the Chemical Engineering faculty at MIT in 1997. He has co-authored 350 journal articles, which have been cited about 23,000 times. He is a Fellow of the AAAS and of the Combustion Institute, and has received the American Chemical Society’s Glenn Award in Energy & Fuel Chemistry and the AIChE’s Wilhelm Award in Reaction Engineering. He previously served as the Editor of the International Journal of Chemical Kinetics, as the faculty chair of MIT’s Mobility of the Future project, and as the Executive Officer of the MIT Department of Chemical Engineering.

Summary

Focus: modeling complex systems of chemical reactions
Goals:
- Understand and design chemical systems
- This requires accurate predictive modeling that captures
  - The key performance properties of chemicals
  - The steps needed to synthesize them
  - The degradation of chemicals over time (e.g. in storage)
- Key piece: prediction of reactions and reaction rates
RMS: expert system for chemical modeling
- Given a mixture of chemicals and conditions: predict how it will evolve over time
- Attached to chemical solvers
- Challenge: limited accuracy for predictions for chemical properties
- Chemprop: Machine-learned model of chemical properties
Prediction tasks
- Chemical synthesis (ASKCOS)
  - Rates
  - Products
- Construct Kinetic Models and solve them for products, rates
  - Reaction Mechanism Generator (RMG)
  - Validated against experimental datasets
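
As a minimal sketch of what "construct a kinetic model and solve it" means, the Python below integrates mass-action rate equations for a hypothetical two-step mechanism A -> B -> C with a fixed-step fourth-order Runge-Kutta method. The species, rate constants, and step size are illustrative assumptions, not values from the talk, and this is not how RMG is implemented. Real mechanisms are usually stiff and need implicit solvers.

```python
# Toy mechanism (hypothetical): A -> B with k1, B -> C with k2.
# Each reaction: (reactant indices, product indices, rate constant).
SPECIES = ["A", "B", "C"]
REACTIONS = [
    ((0,), (1,), 2.0),  # A -> B, k1 = 2.0 1/s (assumed)
    ((1,), (2,), 1.0),  # B -> C, k2 = 1.0 1/s (assumed)
]


def rates_of_change(conc):
    """Mass-action kinetics: time derivative of every concentration."""
    dcdt = [0.0] * len(conc)
    for reactants, products, k in REACTIONS:
        rate = k
        for i in reactants:
            rate *= conc[i]
        for i in reactants:
            dcdt[i] -= rate
        for i in products:
            dcdt[i] += rate
    return dcdt


def rk4_step(conc, dt):
    """Advance concentrations by one classical Runge-Kutta step."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]

    k1 = rates_of_change(conc)
    k2 = rates_of_change(shifted(conc, k1, dt / 2))
    k3 = rates_of_change(shifted(conc, k2, dt / 2))
    k4 = rates_of_change(shifted(conc, k3, dt))
    return [
        c + dt / 6 * (a + 2 * b + 2 * d + e)
        for c, a, b, d, e in zip(conc, k1, k2, k3, k4)
    ]


def simulate(conc0, t_end, dt=0.001):
    """Integrate from t = 0 to t_end and return final concentrations."""
    conc = list(conc0)
    n_steps = round(t_end / dt)
    for _ in range(n_steps):
        conc = rk4_step(conc, dt)
    return conc


# Start from pure A and report the mixture after one second.
final = simulate([1.0, 0.0, 0.0], t_end=1.0)
for name, value in zip(SPECIES, final):
    print(f"[{name}] = {value:.4f}")
```

For this linear mechanism the result can be checked against the analytic solution, for example [A](t) = exp(-k1 t). A generated mechanism with hundreds of species has no such closed form, which is why validation against experimental datasets matters.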
Chemical engineering
- Lab experiments (small synthesis unit, “cheap”)
- Pilot plant (close in size to full plant, expensive)
- Fit simple extrapolation model to predict how a full factory will work
- Predict behavior of factory, then build it
- Problem: often these designs don’t actually work, so this flow is risky and expensive
- Goal: skip the pilot plant and predict from lab experiment to factory
- Challenge: model accuracy (numerical error, accounting for all reactions)
  - Rate predictions most vulnerable
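
One way to see why rate predictions are so vulnerable: in transition-state theory the rate coefficient depends exponentially on the activation free energy, so a small barrier error becomes a large rate error. The sketch below uses the Eyring equation with a transmission coefficient of 1. The 100 kJ/mol barrier, the 1 kcal/mol (4.184 kJ/mol) error, and the two temperatures are hypothetical numbers chosen for illustration.

```python
import math

R = 8.314462618      # gas constant, J/(mol K)
K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s


def eyring_rate(barrier_kj_per_mol, temperature_k):
    """Eyring rate coefficient in 1/s, transmission coefficient = 1."""
    prefactor = K_B * temperature_k / H
    exponent = -barrier_kj_per_mol * 1000.0 / (R * temperature_k)
    return prefactor * math.exp(exponent)


def rate_error_factor(barrier_error_kj_per_mol, temperature_k):
    """Multiplicative rate error caused by a given barrier error."""
    return math.exp(barrier_error_kj_per_mol * 1000.0 / (R * temperature_k))


# Hypothetical case: 100 kJ/mol barrier, underestimated by 1 kcal/mol.
k_true = eyring_rate(100.0, 298.15)
k_low = eyring_rate(100.0 - 4.184, 298.15)
factor_room = rate_error_factor(4.184, 298.15)
factor_hot = rate_error_factor(4.184, 1500.0)

print(f"rate error at 298 K:  x{factor_room:.2f}")
print(f"rate error at 1500 K: x{factor_hot:.2f}")
```

Under these assumptions the same 1 kcal/mol error changes the rate by roughly a factor of five at room temperature but only by about 40% at 1500 K, so the accuracy needed from the barrier prediction depends strongly on the operating temperature.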
Chemistry is a Big problem: many molecules and reactions
- Total number of possible molecules is ~10^20, number of interactions is > 10^40
- Only a tiny fraction of these have been evaluated
- PubChem database contains 10^8 molecules
  - 10^7 molecules have actually been studied
  - 10^7 reactions have been studied
- Want to predict molecule properties before committing to expensive experiments or calculations
- Challenge:
  - Data on molecule properties and reactions is sparse
  - Hard to access data from papers (publisher restrictions and unstructured data extraction challenge)
  - ~10^4 molecules have detailed datasets
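
To put the sparsity in perspective, the short calculation below reads the counts in these notes as powers of ten (10^20 possible molecules, 10^8 in PubChem, 10^7 studied, 10^4 with detailed datasets) and converts each to a fraction of chemical space. These are order-of-magnitude figures only.

```python
import math

# Order-of-magnitude counts as read from the notes.
possible_molecules = 1e20
pubchem_molecules = 1e8
studied_molecules = 1e7
detailed_molecules = 1e4


def coverage(count, total=possible_molecules):
    """Fraction of chemical space covered by `count` molecules."""
    return count / total


for label, count in [
    ("listed in PubChem", pubchem_molecules),
    ("actually studied", studied_molecules),
    ("with detailed datasets", detailed_molecules),
]:
    exponent = round(math.log10(coverage(count)))
    print(f"{label}: about 1 in 10^{-exponent} possible molecules")
```

Even the full PubChem listing covers about one molecule in 10^12, so any property model has to extrapolate far beyond the compounds it was trained on.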
Traditional approach:
- Infer based on related molecules
- Only works for molecule types with a lot of data
ML-based alternative: learned fingerprint
- Chemprop: Graph neural network
- Learned to extend this to pairs of molecules (solvent and solute)
- Challenging to get enough data to train a Chemprop model
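
The idea of a learned fingerprint can be sketched with a toy message-passing scheme on a molecular graph: each atom repeatedly mixes in information from its bonded neighbors, and the atom vectors are then pooled into one vector per molecule. The code below is a generic, weight-free illustration under simplifying assumptions (one-hot element features, plain sums, no learned parameters). It is not Chemprop's implementation, which passes messages along directed bonds and learns its weights from data.

```python
# Hypothetical toy molecule: heavy atoms of ethanol, C-C-O.
# Atom features are one-hot element vectors [is_carbon, is_oxygen].
ATOM_FEATURES = [
    [1.0, 0.0],  # atom 0: C
    [1.0, 0.0],  # atom 1: C
    [0.0, 1.0],  # atom 2: O
]
BONDS = [(0, 1), (1, 2)]


def neighbors(n_atoms, bonds):
    """Build an adjacency list from an undirected bond list."""
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency


def message_passing_step(hidden, adjacency):
    """Each atom adds the sum of its neighbors' vectors to its own.

    A trained network would apply learned weights and a nonlinearity
    here; they are left out to keep the sketch short.
    """
    updated = []
    for atom, own in enumerate(hidden):
        message = [0.0] * len(own)
        for nbr in adjacency[atom]:
            message = [m + h for m, h in zip(message, hidden[nbr])]
        updated.append([o + m for o, m in zip(own, message)])
    return updated


def fingerprint(atom_features, bonds, n_steps=2):
    """Sum-pool atom vectors after n_steps rounds of message passing."""
    adjacency = neighbors(len(atom_features), bonds)
    hidden = [list(f) for f in atom_features]
    for _ in range(n_steps):
        hidden = message_passing_step(hidden, adjacency)
    size = len(atom_features[0])
    return [sum(h[i] for h in hidden) for i in range(size)]


ethanol_fp = fingerprint(ATOM_FEATURES, BONDS)
# Dimethyl ether (C-O-C) has the same atoms but different bonding.
ether_fp = fingerprint([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], [(0, 1), (1, 2)])
print(ethanol_fp, ether_fp)
```

Without message passing both molecules would pool to the same vector, since they contain the same atoms. After two rounds the vectors differ because they now encode which atoms are bonded to which. In a trained model a downstream network maps this vector to the property of interest.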
Physical fundamentals-driven ML:
- Use underlying physical constraints to guide modeling
- E.g. Schrödinger’s equation, statistical mechanics and rate theory
- Challenge: very computationally-intensive, so either run out of compute or numerical approximations reduce accuracy
- Synthesis:
  - Use quantum chemistry calculations (e.g. COSMO-RS) to create a training dataset for an ML model
  - Use model to tweak parameters for higher-level simulations (e.g. Solvation)
  - There’s a point where the error from using the model reaches the error level of experimental data: this corresponds to the level of noise in the data
- Applied approach to multiple types of properties: thermochemistry, solvation energies, reaction barriers, spectra
Can use ML to accelerate conventional first-principles (Quantum Chemistry) calculations
- Guess 3D geometries of different molecules to estimate their potential space
- Feedback training: used successes in model to create larger dataset, which improved model predictions
- Large HPC-heavy workflow
- Developed heuristic estimators to detect a model that is not converging and stop it early
- ML model is more accurate than Density Functional Theory models but not as accurate as models that account for spatial distribution of electrons (very computationally expensive)
- Can use ML trained on quantum to predict solvent effects on reaction rates; good accuracy with only simulations, no experimental data
Workflow
- RMG - Reaction Mechanism Generator (https://rmg.mit.edu/): Database of molecular properties
- Use parameters from RMG as inputs for dynamic reaction simulations
- Get predictions of chemical parameters, feed back into database
- Validate against experiments
Example application: high-temperature pyrolysis
- Convert natural gas + waste into ethylene and acetylene
- RMG built a kinetic model to predict 71 pilot plant experiments on 12 different feed compositions; 664 chemical species, 8121 reactions
- Pure prediction, except for the free parameter: heat loss through reactor walls
- Predictions were very close to experiment but not perfect; much cheaper than building the pilot plant
  - Sponsor used simulations to design changes to the process
- Working towards
  - Heavier molecules: more dynamical complexity and more chemical species, so error is higher
  - Multiple phases of reactants: film on surfaces of materials, liquid flowing and vapor
Challenge: quantum chemistry for large molecules with complex 3D structures
- Computationally expensive
- Less experimental data
- 3D effects (e.g. folding) are highly variable and dynamic
- Huge numbers of reactions
- Some progress in this area
  - RMG works in many cases but many failure cases too
  - For >10k species direct ODE solves are too complex
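
The notes do not say why direct ODE solves break down at this size. One plausible contributor, stated here as an assumption, is that implicit solvers for stiff systems repeatedly factor a Jacobian matrix whose size grows with the square of the number of species. The back-of-envelope estimate below assumes a dense matrix of 8-byte floats and the standard (2/3) n^3 operation count for LU factorization. Production codes typically exploit sparsity, so these are upper bounds, not measurements.

```python
def dense_jacobian_cost(n_species, bytes_per_entry=8):
    """Rough memory and arithmetic cost of one dense Jacobian factorization.

    Assumes an n x n matrix of 8-byte floats and the standard
    (2/3) n^3 operation count for LU factorization.
    """
    entries = n_species ** 2
    memory_mb = entries * bytes_per_entry / 1e6
    lu_flops = (2.0 / 3.0) * n_species ** 3
    return entries, memory_mb, lu_flops


# 664 is the species count of the pyrolysis model in these notes;
# the larger sizes are hypothetical.
for n in (664, 10_000, 100_000):
    entries, memory_mb, lu_flops = dense_jacobian_cost(n)
    print(f"{n:>7} species: {entries:.2e} entries, "
          f"{memory_mb:.1f} MB, {lu_flops:.2e} flops per factorization")
```

Going from 664 to 10,000 species multiplies the dense storage by more than 200 and the factorization work by more than 3,000, and a stiff solve repeats that factorization many times.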