KNIME Workflows for Data Curation
[Uploaded on 22 July 2016]
Chemical Curation Workflow: Click Here To Download
Biological Curation Workflow: Click Here To Download
If KNIME is not yet installed on your computer, install it before opening the workflows.
Massive screening of large chemical libraries against panels of biological targets has led to the rapid expansion of publicly available databases such as ChEMBL, PubChem, and BindingDB. Any cheminformatics study assumes that the input data drawn from such databases are accurate. However, both the chemical and the biological records in these databases can suffer from poor quality and irreproducibility. Curating both chemical and biological data, i.e., verifying the accuracy, consistency, and reproducibility of the reported experimental data, is critical for the success of any cheminformatics study, including Quantitative Structure-Activity Relationship (QSAR) modeling.
The KNIME workflows linked above can be used to perform chemical and biological curation of any raw dataset. Both the KNIME software and the nodes used to develop these workflows are free. For an introduction to the KNIME workflow system and how to use it, the accompanying YouTube video is a helpful starting point.
INPUT FILE: Structure-Data File (SDF)
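To get a feel for what an SDF input contains before loading it into KNIME, the sketch below reads SDF text with the Python standard library and flags records that share an identifier but report very different activity values. It is only an illustration of the kind of consistency check that curation involves, not a description of what the KNIME workflows do internally. The field names ID and pIC50, the tolerance value, and the helper names parse_sdf and find_discordant_duplicates are all hypothetical; the only assumptions about the format are that each mol block ends with "M  END", data items start with a "> <NAME>" header line, and records are separated by "$$$$".

```python
from collections import defaultdict

def _parse_record(lines):
    """Split one SDF record into its mol block and its data fields."""
    molblock, fields = [], {}
    i = 0
    # The mol block runs up to and including the "M  END" line.
    while i < len(lines):
        molblock.append(lines[i])
        i += 1
        if molblock[-1].startswith("M  END"):
            break
    # Data items: a "> <NAME>" header, value lines, then a blank line.
    while i < len(lines):
        line = lines[i]
        i += 1
        if not line.startswith(">"):
            continue
        start, end = line.find("<"), line.rfind(">")
        name = line[start + 1:end] if 0 <= start < end else line[1:].strip()
        value = []
        while i < len(lines) and lines[i].strip():
            value.append(lines[i])
            i += 1
        fields[name] = "\n".join(value)
    return {"molblock": "\n".join(molblock), "fields": fields}

def parse_sdf(text):
    """Return a list of records parsed from SDF text ("$$$$"-delimited)."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):  # tolerate a missing final "$$$$"
        records.append(_parse_record(current))
    return records

def find_discordant_duplicates(records, id_field, activity_field, tolerance):
    """Map each repeated identifier to its activity values when they
    differ by more than `tolerance` (hypothetical threshold)."""
    groups = defaultdict(list)
    for rec in records:
        f = rec["fields"]
        if id_field in f and activity_field in f:
            groups[f[id_field]].append(float(f[activity_field]))
    return {k: v for k, v in groups.items()
            if len(v) > 1 and max(v) - min(v) > tolerance}

# Tiny made-up example: empty mol blocks, hypothetical ID and pIC50 fields.
SAMPLE = """cmpd-A
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <ID>
A

> <pIC50>
6.1

$$$$
cmpd-A-repeat
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <ID>
A

> <pIC50>
7.9

$$$$
cmpd-B
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <ID>
B

> <pIC50>
5.0

$$$$
"""

records = parse_sdf(SAMPLE)
flagged = find_discordant_duplicates(records, "ID", "pIC50", tolerance=1.0)
```

Here the two records labeled A disagree by more than the chosen tolerance, so they are flagged for manual review, while B appears only once and is left alone. A real curation run would compare standardized structures rather than raw identifiers, which is the part the chemical curation workflow is meant to handle.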
1. Fourches, Denis, Eugene Muratov, and Alexander Tropsha. "Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research." Journal of Chemical Information and Modeling 50, no. 7 (2010): 1189-1204.
2. Fourches, Denis, Eugene N. Muratov, and Alexander Tropsha. "Trust, But Verify II: A Practical Guide to Chemogenomics Data Curation." Journal of Chemical Information and Modeling (2016).