This piece, by Onno Berkan, was published on 11/05/24. The original text, by Durant et al., was published by Nature Computational Science on 10/15/24.
This Oxford study discusses machine learning (ML) in drug discovery, focusing on developing small-molecule therapeutics. It highlights ML's promise of speeding up the drug development process but also outlines significant challenges related to data quality and availability. While ML has excelled in fields like computer vision and natural language processing, its application in drug discovery has been disappointing due to a lack of high-quality data.
The research emphasizes that most available data is limited and often biased, which degrades the performance of ML algorithms. A particular problem is "negative data": records of compounds that fail to bind to their targets. Such results are seldom reported, which leaves the datasets used for training models imbalanced. The study suggests that researchers must focus more on collecting and leveraging comprehensive datasets to improve model accuracy.
Method validation, meaning testing models in real-world scenarios rather than controlled conditions, is crucial for understanding their effectiveness. Current ML benchmarks are often outdated or do not reflect practical applications. The authors argue for a shift towards more rigorous validation methods and better data-sharing practices, including the reporting of both successful and unsuccessful experiments.
Furthermore, the study explores additional data sources, such as crowd-sourced data and datasets already available in the scientific literature. By tapping into these underused resources, researchers can enhance the variety and richness of the data needed to train robust ML models. Overall, the research calls for balanced attention to data quality and quantity, and for thorough evaluation, to truly advance drug discovery using ML technologies.