My topic: Predicting bioactivities of PFAS using unsupervised/semi-supervised learning

Motivation

We want to know the bioactivities of PFAS, but experimental data are limited. We therefore assume that molecules with similar chemical structures will show similar bioactivities. Published research has already used supervised learning to predict bioactivity from chemical structure:

("Using Machine Learning to Classify Bioactivity for 3486 Per- and Polyfluoroalkyl Substances (PFASs) from the OECD List", Cheng, Weixiao; Ng, Carla A., Environmental Science and Technology, 2019, 53 (23), 13970-13980, DOI: 10.1021/acs.est.9b04833)

However, a supervised model does not by itself explain its predictions. We would like to learn which substructures most strongly affect bioactivity. Therefore, we decided to use unsupervised learning.
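
To make the similarity assumption concrete, here is a minimal sketch of how structural similarity between molecules could be scored. It assumes each molecule is described by a set of substructure labels and uses the Tanimoto (Jaccard) coefficient, which is a common choice but not necessarily the one this project will use. The molecule names, the substructure labels, and the helper functions `tanimoto` and `most_similar` are all hypothetical and for illustration only.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of substructure labels."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty descriptions share nothing we can compare.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical substructure annotations (illustrative, not real data).
molecules = {
    "mol_A": {"perfluoroalkyl_chain", "carboxylic_acid"},
    "mol_B": {"perfluoroalkyl_chain", "sulfonic_acid"},
    "mol_C": {"perfluoroalkyl_chain", "carboxylic_acid", "ether_linkage"},
}


def most_similar(query, library):
    """Return the name of the library molecule most similar to `query`."""
    others = (name for name in library if name != query)
    return max(others, key=lambda name: tanimoto(library[query], library[name]))


if __name__ == "__main__":
    for name in ("mol_B", "mol_C"):
        score = tanimoto(molecules["mol_A"], molecules[name])
        print(f"mol_A vs {name}: {score:.2f}")
    print("Nearest neighbour of mol_A:", most_similar("mol_A", molecules))
```

Under the assumption above, a molecule without experimental data would be expected to behave like its nearest structural neighbours, and the labels shared within a group of similar molecules point to the substructures worth examining.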