Data Science Meets Cheminformatics
"Which chemical entities in a dataset of millions upon billions of possible entities show the greatest promise as potential therapeutics?"
Machine Learning Towards a Unified Synthetic Complexity Metric
"How hard would it be to make such a chemical entity in the most efficient process sequence?"
RoboChem: Automating Process Chemistry with Robots
"Can an AI-driven robot determine how to best automate the chemical production process?"
Macromolecular Structural Prediction by Machine Learning
"Can we use machine learning to predict what a DNA, RNA, or protein target looks like SOLELY on the basis of its nucleotide or amino acid sequences?"
Keith, J. A., Vassilev-Galindo, V., Cheng, B., Chmiela, S., Gastegger, M., Müller, K. R., & Tkatchenko, A. Combining machine learning and computational chemistry for predictive insights into chemical systems. Chemical reviews 2021, 121(16), 9816-9872.
Miljković, F., Rodríguez-Pérez, R., & Bajorath, J. Impact of Artificial Intelligence on Compound Discovery, Design, and Synthesis. ACS Omega 2021, 6(49), 33293-33299.
Tynes, M., Gao, W., Burrill, D. J., Batista, E. R., Perez, D., Yang, P., & Lubbers, N. Pairwise Difference Regression: A Machine Learning Meta-algorithm for Improved Prediction and Uncertainty Quantification in Chemical Search. Journal of Chemical Information and Modeling 2021, 61(8), 3846-3857.
Park, J., Beck, B. R., Kim, H. H., Lee, S., & Kang, K. A Brief Review of Machine Learning-Based Bioactive Compound Research. Applied Sciences 2022, 12(6), 2906.
Chen, H., Engkvist, O., Wang, Y., Olivecrona, M., & Blaschke, T. The rise of deep learning in drug discovery. Drug discovery today 2018, 23(6), 1241-1250.
Huang, B., & von Lilienfeld, O. A. Ab initio machine learning in chemical compound space. Chemical reviews 2021, 121(16), 10001-10036.
Jones, C. W., Lawal, W., & Xu, X. Emerging Chemistry & Machine Learning. JACS Au 2022, 2(3), 541-542.
von Lilienfeld, O. A., & Burke, K. Retrospective on a decade of machine learning for chemical discovery. Nature communications 2020, 11(1), 1-4.
Reymond, J. L. The chemical space project. Accounts of Chemical Research 2015, 48(3), 722-730.
Schneider, P., Walters, W. P., Plowright, A. T., Sieroka, N., Listgarten, J., Goodnow, R. A., ... & Schneider, G. Rethinking drug design in the artificial intelligence era. Nature Reviews Drug Discovery 2020, 19(5), 353-364.
Jiménez-Luna, J., Grisoni, F., & Schneider, G. Drug discovery with explainable artificial intelligence. Nature Machine Intelligence 2020, 2(10), 573-584.
"Which chemical entities in a dataset of millions upon billions of possible entities show the greatest promise as potential therapeutics?"
Reading List:
Ruddigkeit, L., Van Deursen, R., Blum, L. C., & Reymond, J. L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. Journal of chemical information and modeling 2012, 52(11), 2864-2875.
Shoichet, B. K., McGovern, S. L., Wei, B., & Irwin, J. J. Lead discovery using molecular docking. Current opinion in chemical biology 2002, 6(4), 439-446.
Rodríguez-Pérez, R., Miyao, T., Jasial, S., Vogt, M., & Bajorath, J. Prediction of compound profiling matrices using machine learning. ACS omega 2018, 3(4), 4713-4723.
Mysinger, M. M., Carchia, M., Irwin, J. J., & Shoichet, B. K. Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. Journal of medicinal chemistry 2012, 55(14), 6582-6594.
Lo, Y. C., Rensi, S. E., Torng, W., & Altman, R. B. Machine learning in chemoinformatics and drug discovery. Drug discovery today 2018, 23(8), 1538-1546.
Patel, L., Shukla, T., Huang, X., Ussery, D. W., & Wang, S. Machine learning methods in drug discovery. Molecules 2020, 25(22), 5277.
Smith, J. S., Roitberg, A. E., & Isayev, O. Transforming computational drug discovery with machine learning and AI. ACS Medicinal Chemistry Letters 2018, 9(11), 1065-1069.
Dara, S., Dhamercherla, S., Jadav, S. S., Babu, C. H., & Ahsan, M. J. Machine Learning in Drug Discovery: A Review. Artificial Intelligence Review 2021, 1-53.
Kim, H., Kim, E., Lee, I., Bae, B., Park, M., & Nam, H. Artificial intelligence in drug discovery: a comprehensive review of data-driven and machine learning approaches. Biotechnology and Bioprocess Engineering 2020, 25(6), 895-930.
Huang, B., & von Lilienfeld, O. A. Ab initio machine learning in chemical compound space. Chemical reviews 2021, 121(16), 10001-10036.
Bento, A. P., Hersey, A., Félix, E., Landrum, G., Gaulton, A., Atkinson, F., ... & Leach, A. R. An open source chemical structure curation pipeline using RDKit. Journal of Cheminformatics 2020, 12(1), 1-16.
Sicho, M., Liu, X., Svozil, D., & van Westen, G. J. GenUI: interactive and extensible open source software platform for de novo molecular generation and cheminformatics. Journal of Cheminformatics 2021, 13(1), 1-17.
Satsangi, S., Mishra, A., & Singh, A. K. Feature Blending: An Approach toward Generalized Machine Learning Models for Property Prediction. ACS Physical Chemistry Au 2021, ASAP.
Janssen, A. P., Grimm, S. H., Wijdeven, R. H., Lenselink, E. B., Neefjes, J., van Boeckel, C. A., ... & van der Stelt, M. Drug discovery maps, a machine learning model that visualizes and predicts kinome–inhibitor interaction landscapes. Journal of Chemical Information and Modeling 2019, 59(3), 1221-1229.
Batra, R., Chan, H., Kamath, G., Ramprasad, R., Cherukara, M. J., & Sankaranarayanan, S. K. Screening of therapeutic agents for COVID-19 using machine learning and ensemble docking studies. The journal of physical chemistry letters 2020, 11(17), 7058-7065.
Gerdes, H., Casado, P., Dokal, A., Hijazi, M., Akhtar, N., Osuntola, R., ... & Cutillas, P. R. Drug ranking using machine learning systematically predicts the efficacy of anti-cancer drugs. Nature communications 2021, 12(1), 1-15.
Rodriguez, S., Hug, C., Todorov, P., Moret, N., Boswell, S. A., Evans, K., ... & Sokolov, A. Machine learning identifies candidates for drug repurposing in Alzheimer’s disease. Nature communications 2021, 12(1), 1-13.
Specific applications in Drug Discovery:
Vignaux, P. A., Minerali, E., Foil, D. H., Puhl, A. C., & Ekins, S. Machine learning for discovery of GSK3β inhibitors. ACS omega 2020, 5(41), 26551-26561.
Anantpadma, M., Lane, T., Zorn, K. M., Lingerfelt, M. A., Clark, A. M., Freundlich, J. S., ... & Ekins, S. Ebola virus Bayesian machine learning models enable new in vitro leads. ACS omega 2019, 4(1), 2353-2361.
Zorn, K. M., Sun, S., McConnon, C. L., Ma, K., Chen, E. K., Foil, D. H., ... & Caffrey, C. R. A machine learning strategy for drug discovery identifies anti-schistosomal small molecules. ACS infectious diseases 2021, 7(2), 406-420.
Walker, A. S., & Clardy, J. A Machine Learning Bioinformatics Method to Predict Biological Activity from Biosynthetic Gene Clusters. Journal of Chemical Information and Modeling 2021, 61(6), 2560-2571.
Resources & Data Sourcing:
"How hard would it be to make such a chemical entity in the most efficient process sequence?"
Reading List:
Coley, C. W., Barzilay, R., Jaakkola, T. S., Green, W. H., & Jensen, K. F. Prediction of organic reaction outcomes using machine learning. ACS central science 2017, 3(5), 434-443.
Gao, H., Struble, T. J., Coley, C. W., Wang, Y., Green, W. H., & Jensen, K. F. Using machine learning to predict suitable conditions for organic reactions. ACS central science 2018, 4(11), 1465-1476.
Badowski, T., Gajewska, E. P., Molga, K., & Grzybowski, B. A. Synergy between expert and machine‐learning approaches allows for improved retrosynthetic planning. Angewandte Chemie International Edition 2020, 59(2), 725-730.
Fortunato, M. E., Coley, C. W., Barnes, B. C., & Jensen, K. F. Data augmentation and pretraining for template-based retrosynthetic prediction in computer-aided synthesis planning. Journal of chemical information and modeling 2020, 60(7), 3398-3407.
Schreck, J. S., Coley, C. W., & Bishop, K. J. Learning retrosynthetic planning through simulated experience. ACS central science 2019, 5(6), 970-981.
Papadakis, E., Anantpinijwatna, A., Woodley, J. M., & Gani, R. A reaction database for small molecule pharmaceutical processes integrated with process information. Processes 2017, 5(4), 58.
Resources & Data Sourcing:
"Can an AI-driven robot determine how to best automate the chemical production process?"
Reading List:
Duros, V., Grizou, J., Sharma, A., Mehr, S. H. M., Bubliauskas, A., Frei, P., ... & Cronin, L. Intuition-enabled machine learning beats the competition when joint human-robot teams perform inorganic chemical experiments. Journal of chemical information and modeling 2019, 59(6), 2664-2671.
Steiner, S., Wolf, J., Glatzel, S., Andreou, A., Granda, J. M., Keenan, G., ... & Cronin, L. Organic synthesis in a modular robotic system driven by a chemical programming language. Science 2019, 363(6423), eaav2211.
Kitson, P. J., Glatzel, S., & Cronin, L. The digital code driven autonomous synthesis of ibuprofen automated in a 3D-printer-based robot. Beilstein journal of organic chemistry 2016, 12(1), 2776-2783.
Asche, S., Cooper, G. J., Keenan, G., Mathis, C., & Cronin, L. A robotic prebiotic chemist probes long term reactions of complexifying mixtures. Nature communications 2021, 12(1), 1-9.
Angelone, D., Hammer, A. J., Rohrbach, S., Krambeck, S., Granda, J. M., Wolf, J., ... & Cronin, L. Convergence of multiple synthetic paradigms in a universally programmable chemical synthesis machine. Nature Chemistry 2021, 13(1), 63-69.
Sans, V., Porwol, L., Dragone, V., & Cronin, L. A self optimizing synthetic organic reactor system using real-time in-line NMR spectroscopy. Chemical science 2015, 6(2), 1258-1264.
Caramelli, D., Granda, J. M., Mehr, S. H. M., Cambié, D., Henson, A. B., & Cronin, L. Discovering New Chemistry with an Autonomous Robotic Platform Driven by a Reactivity-Seeking Neural Network. ACS central science 2021, 7(11), 1821-1830.
Hammer, A. J., Leonov, A. I., Bell, N. L., & Cronin, L. Chemputation and the standardization of chemical informatics. JACS Au 2021, 1(10), 1572-1587.
Ley, S. V., Fitzpatrick, D. E., Ingham, R. J., & Myers, R. M. Organic synthesis: march of the machines. Angewandte Chemie International Edition 2015, 54(11), 3449-3464.
Liu, C., Xie, J., Wu, W., Wang, M., Chen, W., Idres, S. B., ... & Wu, J. Automated synthesis of prexasertib and derivatives enabled by continuous-flow solid-phase synthesis. Nature chemistry 2021, 13(5), 451-457.
Bornemann‐Pfeiffer, M., Wolf, J., Meyer, K., Kern, S., Angelone, D., Leonov, A., ... & Emmerling, F. Standardization and Control of Grignard Reactions in a Universal Chemical Synthesis Machine using online NMR. Angewandte Chemie International Edition 2021, 60(43), 23202-23206.
Nambiar, A. M., Breen, C. P., Hart, T., Kulesza, T., Jamison, T. F., & Jensen, K. F. Bayesian Optimization of Computer-Proposed Multistep Synthetic Routes on an Automated Robotic Flow Platform. ACS Central Science 2022.
Liu, J., Sato, Y., Yang, F., Kukor, A. J., & Hein, J. E. An Adaptive Auto‐Synthesizer using Online PAT Feedback to Flexibly Perform a Multistep Reaction. Chemistry‐Methods 2022, e202200009.
Resources & Data Sourcing:
"Can we use machine learning to predict what a DNA, RNA, or protein target looks like SOLELY on the basis of its nucleotide or amino acid sequences?"
Reading List:
Xu, Y., Verma, D., Sheridan, R. P., Liaw, A., Ma, J., Marshall, N. M., ... & Johnston, J. M. Deep dive into machine learning models for protein engineering. Journal of chemical information and modeling 2020, 60(6), 2773-2790.
Mazurenko, S., Prokop, Z., & Damborsky, J. Machine learning in enzyme engineering. ACS Catalysis 2020, 10(2), 1210-1223.
Castillo-Hair, S. M., & Seelig, G. Machine Learning for Designing Next-Generation mRNA Therapeutics. Accounts of chemical research 2021, 803-809.
Moore, P. B., Hendrickson, W. A., Henderson, R., & Brunger, A. T. The protein-folding problem: Not yet solved. Science 2022, 375(6580), 507-507.
Dubchak, I., Muchnik, I., Holbrook, S. R., & Kim, S. H. Prediction of protein folding class using global description of amino acid sequence. Proceedings of the National Academy of Sciences 1995, 92(19), 8700-8704.
Senior, A. W., Evans, R., Jumper, J., Kirkpatrick, J., Sifre, L., Green, T., ... & Hassabis, D. Improved protein structure prediction using potentials from deep learning. Nature 2020, 577(7792), 706-710.
McGuffin, L. J., Bryson, K., & Jones, D. T. The PSIPRED protein structure prediction server. Bioinformatics 2000, 16(4), 404-405.
Baker, D., & Sali, A. Protein structure prediction and structural genomics. Science 2001, 294(5540), 93-96.
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596(7873), 583-589.
Singh, A. Deep learning 3D structures. Nature Methods 2020, 17(3), 249-249.
Wei, L., & Zou, Q. Recent progress in machine learning-based methods for protein fold recognition. International journal of molecular sciences 2016, 17(12), 2118.
Jo, T., Hou, J., Eickholt, J., & Cheng, J. Improving protein fold recognition by deep learning networks. Scientific reports 2015, 5(1), 1-11.
Khatib, F., Cooper, S., Tyka, M. D., Xu, K., Makedon, I., Popović, Z., & Baker, D. Algorithm discovery by protein folding game players. Proceedings of the National Academy of Sciences 2011, 108(47), 18949-18953.
Bashir, A., Yang, Q., Wang, J., Hoyer, S., Chou, W., McLean, C., ... & Ferguson, B. S. Machine learning guided aptamer refinement and discovery. Nature communications 2021, 12(1), 1-11.
Berliner, N., Teyra, J., Colak, R., Garcia Lopez, S., & Kim, P. M. Combining structural modeling with ensemble machine learning to accurately predict protein fold stability and binding affinity effects upon mutation. PloS one 2014, 9(9), e107353.
Detlefsen, N. S., Hauberg, S., & Boomsma, W. Learning meaningful representations of protein sequences. Nature communications 2022, 13(1), 1-12.
Resources & Data Sourcing: