Small Dataset QSAR Modelling


Small Dataset Modeler

(using Exhaustive Double Cross-validation approach)

As the name suggests, this tool is dedicated to QSAR modeling of small data sets. It employs an exhaustive double cross-validation approach together with a set of optimal model selection techniques, including consensus predictions. It performs four basic steps: i) data pre-treatment, ii) model development using the exhaustive double cross-validation approach, iii) selection of the optimal model, and iv) model validation (both internal and external).
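The model development and selection steps described above can be illustrated with a short Python sketch. It is not the tool's implementation. It assumes that "exhaustive" means enumerating every possible calibration/validation split of the modelling set for a fixed validation size, that a single-descriptor least-squares line stands in for the MLR models, and that a consensus prediction is the mean over the best-ranked models. The function names (fit_line, exhaustive_splits, rank_models, consensus_predict) and the parameters k and top are hypothetical.

```python
from itertools import combinations


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one descriptor only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def exhaustive_splits(n, k):
    """Yield every (calibration, validation) index split with k held out."""
    everything = set(range(n))
    for validation in combinations(range(n), k):
        calibration = sorted(everything - set(validation))
        yield calibration, list(validation)


def rank_models(xs, ys, k):
    """Fit one model per split and rank by validation mean absolute error."""
    ranked = []
    for calibration, validation in exhaustive_splits(len(xs), k):
        a, b = fit_line([xs[i] for i in calibration],
                        [ys[i] for i in calibration])
        mae = sum(abs(ys[i] - (a + b * xs[i])) for i in validation) / k
        ranked.append((mae, a, b))
    ranked.sort()  # lowest validation error first
    return ranked


def consensus_predict(ranked, x, top=3):
    """Average the predictions of the `top` best-ranked models (assumed rule)."""
    best = ranked[:top]
    return sum(a + b * x for _, a, b in best) / len(best)


# Hypothetical data: one descriptor and one response for eight compounds.
descriptor = [0.5, 1.1, 1.9, 2.4, 3.2, 3.9, 4.6, 5.3]
response = [1.9, 3.4, 4.7, 6.1, 7.2, 8.8, 10.3, 11.4]
models = rank_models(descriptor, response, k=2)
print(len(models), "models; consensus at x=3.0:",
      round(consensus_predict(models, 3.0), 2))
```

With n compounds and k held out, the number of candidate models is the binomial coefficient C(n, k), which stays manageable only for small data sets; this is one reason an exhaustive scheme suits them.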

More details will be added shortly.

To download the Small Dataset Modeler tool (updated version uploaded on 07 May 2022): Click here



Small Dataset Curator

This tool performs descriptor-based duplicate analysis and identifies structural outliers, response outliers and activity cliffs that may be present in the small data sets used for QSAR modeling.
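Two of the checks named above, descriptor-based duplicate analysis and activity-cliff detection, can be sketched in Python as follows. This is an illustration under assumptions, not the Curator's algorithm: duplicates are taken to be compounds whose descriptor vectors coincide within a tolerance, and an activity cliff is taken to be a pair of compounds that are close in descriptor space but far apart in response. The function name find_duplicates_and_cliffs and the thresholds dup_tol, near and cliff are hypothetical.

```python
from itertools import combinations
from math import dist


def find_duplicates_and_cliffs(descriptors, responses,
                               dup_tol=1e-6, near=0.5, cliff=2.0):
    """Return (duplicates, cliffs) as lists of compound index pairs.

    dup_tol: maximum Euclidean distance for two rows to count as duplicates.
    near:    maximum distance for two compounds to count as similar.
    cliff:   minimum response difference for a similar pair to be a cliff.
    All three thresholds are illustrative assumptions.
    """
    duplicates, cliffs = [], []
    for i, j in combinations(range(len(descriptors)), 2):
        distance = dist(descriptors[i], descriptors[j])
        gap = abs(responses[i] - responses[j])
        if distance <= dup_tol:
            duplicates.append((i, j))
        elif distance <= near and gap >= cliff:
            cliffs.append((i, j))
    return duplicates, cliffs


# Hypothetical data: two descriptors per compound and one response value.
descriptor_rows = [(1.0, 2.0), (1.0, 2.0), (1.2, 2.1), (5.0, 5.0)]
response_values = [5.0, 5.1, 8.0, 6.0]
dups, cliff_pairs = find_duplicates_and_cliffs(descriptor_rows, response_values)
print("duplicates:", dups)
print("activity cliffs:", cliff_pairs)
```

In practice the descriptors would usually be scaled before distances are compared, so that no single descriptor dominates the similarity measure.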

To download the Small Dataset Curator tool (beta version, uploaded on 6 June 2019): Click here


Simple QSAR Modelling Tool (using Hold-out approach)

This is a basic tool for QSAR modeling of moderate to large data sets (i.e., at least 50 compounds). Although it is not suitable for modelling small data sets, it can be used to compare the quality of QSAR models developed for small data sets using the conventional approach (i.e., this tool) with those developed using the dedicated technique (i.e., the Small Dataset Modeler tool mentioned above). The tool assumes that the user will provide the training set and test set data (i.e., CompdID, descriptor matrix and response) as input, and it performs three basic steps: i) "Data pre-treatment" (i.e., removal of constant and inter-correlated descriptors), ii) "Variable selection and model development" using stepwise MLR and genetic algorithm MLR, and iii) "Model validation" (both internal and external validation metrics are computed).
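The data pre-treatment step (removing constant and inter-correlated descriptors) can be sketched in Python as follows. This is a minimal illustration under assumptions, not the tool's code: a descriptor is treated as constant when its variance falls at or below var_cutoff, and of any two descriptors whose absolute Pearson correlation reaches corr_cutoff, the one listed later is dropped. Both cutoff values and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pretreat(columns, var_cutoff=1e-4, corr_cutoff=0.95):
    """Return the names of the descriptors that survive pre-treatment.

    columns maps a descriptor name to its list of values (one per compound).
    """
    # Step 1: discard constant (or nearly constant) descriptors.
    varying = {name: values for name, values in columns.items()
               if pvariance(values) > var_cutoff}
    # Step 2: keep a descriptor only if it is not strongly correlated
    # with any descriptor that has already been kept.
    selected = []
    for name, values in varying.items():
        if all(abs(pearson(values, varying[kept])) < corr_cutoff
               for kept in selected):
            selected.append(name)
    return selected


# Hypothetical descriptor table for five compounds.
table = {
    "logP": [1.0, 2.0, 3.0, 4.0, 5.0],
    "const": [7.0, 7.0, 7.0, 7.0, 7.0],      # constant, removed in step 1
    "logP_x2": [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly correlated with logP
    "MW": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(pretreat(table))
```

The greedy rule in step 2 depends on the column order; a different tie-breaking rule (for example, keeping the descriptor better correlated with the response) would be an equally reasonable choice.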

To download the Simple QSAR Modelling tool: Click here


Reference [Cite this Article]

Pravin Ambure, Agnieszka Gajewicz, M. Natália D.S. Cordeiro, Kunal Roy. A New Workflow for QSAR Model Development from Small Data Sets: Integration of Data Curation, Exhaustive Double Cross-Validation and A Set of Optimal Model Selection Techniques, J. Chem. Inf. Model. 2019, 59, 10, 4070–4076. https://doi.org/10.1021/acs.jcim.9b00476