FAQ

DTC Lab Software Tools


Some QSAR modeling tips, or: what are the different ways to improve a QSAR model?

Collect as many chemicals as possible with known activity/toxicity/property of interest.
Always collect the data from reliable sources such as published articles and well-known online databases.
The collection should ideally cover the full range of chemicals with high, moderate, and low or no activity (in terms of pIC50, the range, i.e., maxActivity - minActivity, should ideally be > 3 log units).
Remove structural and response outliers, if any.
Make sure there are no duplicates or activity cliffs in the modeling set.
Try to calculate a wide range of descriptors. You may start with simple, meaningful descriptors (constitutional, electrotopological state atom indices, etc.); if you are not satisfied with the model, calculate all kinds of descriptors.
Try different data-division methods: random as well as rational methods (especially for small to moderate data sets).
For small data sets (fewer than about 30-50 data points), you may skip the data-division step: use the whole data set as the training set and validate the models with cross-validation techniques (leave-one-out, leave-many-out, etc.).
You can employ the double cross-validation technique to find more diverse QSAR models, especially for small data sets.
Perform best subset selection on the top 20-30 descriptors identified by genetic algorithms, stepwise MLR, etc., to find the best possible models from those descriptors.
If you are not getting a good regression model, try developing a classification-based model.
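A few of the tips above can be checked with short scripts. The following is a minimal sketch in Python 3 (standard library only; the function names are hypothetical and not part of the DTC Lab tools) that computes the pIC50 range, flags exact duplicate structures, and computes the leave-one-out Q^2 of a one-descriptor linear model. Duplicate detection here is plain string matching, so differently written SMILES of the same molecule are missed unless they are canonicalized first, and activity cliffs are not checked at all.

```python
from collections import Counter


def activity_range(pic50):
    """Spread of the response, max - min (the tips suggest > 3 pIC50 units)."""
    return max(pic50) - min(pic50)


def exact_duplicates(structures):
    """Return structure strings (e.g. SMILES) that occur more than once.

    Plain string comparison: two different SMILES of the same molecule are NOT
    detected unless both were canonicalized beforehand.
    """
    return sorted(s for s, count in Counter(structures).items() if count > 1)


def loo_q2(x, y):
    """Leave-one-out Q^2 for a one-descriptor linear model y = a + b * x.

    Q^2 = 1 - PRESS / sum((y_i - mean(y))^2), where PRESS sums the squared
    errors of predicting each compound from a model fitted without it.
    """
    n = len(y)
    press = 0.0
    for i in range(n):
        # Refit on all compounds except i ...
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        mx, my = sum(xs) / (n - 1), sum(ys) / (n - 1)
        slope = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / sum((u - mx) ** 2 for u in xs)
        intercept = my - slope * mx
        # ... then predict the left-out compound.
        press += (y[i] - (intercept + slope * x[i])) ** 2
    mean_y = sum(y) / n
    return 1.0 - press / sum((v - mean_y) ** 2 for v in y)
```

For example, `exact_duplicates(["CCO", "c1ccccc1", "CCO", "OCC"])` returns `['CCO']`; "OCC" (also ethanol) is not caught without canonicalization.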

I am unable to run the .jar file: nothing happens when I click on the executable jar file. What should I do?

Make sure that Java is installed on your computer. You can install Java from here (https://www.java.com/en/download/). If Java is already installed and clicking on the .jar file still does nothing (sometimes a black window appears for a moment), the cause may be a broken .jar file association on Windows. This can be fixed with the Jarfix tool (http://johann.loefflmann.net/en/software/jarfix/index.html): download and run it once, then try running the .jar file again.
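Before trying Jarfix, it can help to confirm that Java is reachable at all. In a command prompt, `java -version` prints the installed version; the minimal sketch below does the same check from Python 3 (standard library only; the function name is hypothetical, and Python is not needed by the tools themselves). A `None` result only means that `java` is not on the PATH, which does not always mean Java is absent; if Java is found but double-clicking still does nothing, the file association is the likely problem.

```python
import shutil
import subprocess


def java_version():
    """Return the banner printed by `java -version`, or None if java is not on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its version information to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return result.stderr.strip()


if __name__ == "__main__":
    print(java_version() or "Java was not found on PATH; install it from java.com first.")
```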

Which file format is recommended for input files, .csv or .xlsx, and why?

Although some of the tools accept both file formats, it is advisable to use the .csv format whenever possible, especially for large input files. The software reads .csv files faster than .xlsx files, since a .csv file is plain text, whereas an .xlsx file is a compressed archive of XML documents that must be unpacked and parsed first.

The software shows an error while reading an input file. What are the possible reasons?

An error in reading an input file usually occurs when there is an issue with the input file, such as missing descriptor values, non-numerical values in descriptor columns (see the examples below), or an input file that does not follow the required format. Always check the sample input files in the 'Data' folder and/or on the relevant software webpage (Snapshots section) to confirm that your input file is prepared properly.

Some examples of non-numerical values that often appear in descriptor columns and prevent the software from reading the input file:

1) 0.12 E-5 (*Possible Solution: Replace it with 0.0000012)

2) NaN (*Possible Solution: Remove this descriptor)

3) NA (*Possible Solution: Remove this descriptor)

4) - (*Possible Solution: Remove this descriptor)

5) SMILES notation (*Possible Solution: Remove this column from the input file),

etc.
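The following minimal sketch (Python 3, standard library only; the function names and the column layout are assumptions, not part of the software) scans the descriptor columns of a .csv file for the kinds of values listed above and suggests a fix for each: numbers written like 0.12 E-5 are rewritten in plain decimal notation, while non-numeric entries (NaN, NA, -, SMILES, etc.) are flagged for removal of that descriptor and empty cells are reported as missing values.

```python
import csv
import re
from decimal import Decimal, InvalidOperation

# A number whose exponent part is separated by whitespace, e.g. "0.12 E-5".
SPACED_EXPONENT = re.compile(r"^\s*([-+]?\d*\.?\d+)\s+([eE][-+]?\d+)\s*$")


def to_plain_decimal(cell):
    """Return the cell as plain decimal text ("0.12 E-5" -> "0.0000012").

    Returns None for anything that is not a finite number: "NaN", "NA", "-",
    SMILES strings, empty cells, etc.
    """
    text = SPACED_EXPONENT.sub(r"\1\2", cell).strip()
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None
    return format(value, "f") if value.is_finite() else None


def scan_descriptors(lines, first_descriptor_col=2):
    """List (row, column, value, suggestion) for every problematic descriptor cell.

    `lines` is any iterable of CSV text lines (e.g. an open file). Row and column
    numbers are 1-based and the header is row 1. ASSUMPTION: descriptor columns
    start at `first_descriptor_col`; compare with the sample files in 'Data'.
    """
    problems = []
    for r, row in enumerate(csv.reader(lines), start=1):
        if r == 1:
            continue  # skip the header row
        for c, cell in enumerate(row[first_descriptor_col - 1:], start=first_descriptor_col):
            plain = to_plain_decimal(cell)
            if cell.strip() == "":
                problems.append((r, c, cell, "missing value: remove this descriptor or impute"))
            elif plain is None:
                problems.append((r, c, cell, "remove this descriptor"))
            elif "e" in cell.lower():
                problems.append((r, c, cell, f"replace with {plain}"))
    return problems
```

A typical call would be `with open("input.csv", newline="") as fh: print(scan_descriptors(fh, first_descriptor_col=3))`, where "input.csv" and the column number are placeholders for your own file.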

How do I run the software from the command prompt (Windows)?

Follow these steps to run the jar file from the command prompt (see also the attached snapshots):

1) Open 'Command prompt' in Windows, for example by typing 'command prompt' in the Windows search box and clicking on the app.

2) Use the 'cd' command to go to the software folder that contains the jar file, for example, cd D:\path\...\SoftwareFolder\, and press Enter. If the folder is on a different drive from the current one, use cd /d D:\path\...\SoftwareFolder\ so that the drive changes as well.

3) Type the command 'java -jar <jar-file-name>.jar' and press Enter (put the file name in double quotes if it contains spaces).

Snapshot: How to run 'Command prompt' in Windows.

Snapshot: How to run the JAR file using command prompt.

The output file generated after running 'Best subset selection' is blank/empty. Why?

The 'best subset selection' approach builds a model for every possible combination (subset) of the descriptors in your input file, which can be a highly computationally intensive task. To keep the run time manageable, the user can set an r^2 (internal validation) cut-off, so that only models with r^2 above the cut-off are stored, and an inter-correlation cut-off, so that models containing inter-correlated descriptors are discarded. However, these cut-offs can sometimes be too stringent, so that no model passes the criteria and the output is blank. In such cases, repeat the best subset selection with less stringent criteria, if feasible.
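To see why the run time grows so quickly: with k descriptors and models of m descriptors each, there are C(k, m) candidate subsets; for example, 30 descriptors taken 4 at a time already give 27,405 models. The minimal sketch below (Python 3, standard library only) is not the tool's actual implementation; the function names and the default cut-offs 0.6 and 0.7 are hypothetical, and it uses the training-set r^2 of an ordinary least-squares MLR model as the filter. It shows how the two cut-offs prune the subsets and how stringent values can leave nothing to report.

```python
from itertools import combinations
from math import comb, sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / sqrt(va * vb)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting; None if singular."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def mlr_r2(columns, y):
    """Least-squares fit of y on the descriptor columns (with intercept); returns (coefs, r2)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    coefs = solve(xtx, xty)
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    if coefs is None or ss_tot == 0:
        return coefs, 0.0
    pred = [sum(c * v for c, v in zip(coefs, row)) for row in X]
    ss_res = sum((v - q) ** 2 for v, q in zip(y, pred))
    return coefs, 1.0 - ss_res / ss_tot


def best_subsets(descriptors, y, size, r2_cutoff=0.6, intercorr_cutoff=0.7):
    """Return (names, r2) for every `size`-descriptor model passing both cut-offs.

    descriptors: dict mapping descriptor name -> list of values (one per compound).
    """
    kept = []
    for names in combinations(sorted(descriptors), size):
        cols = [descriptors[name] for name in names]
        # Skip subsets containing any pair of strongly inter-correlated descriptors.
        if any(abs(pearson(a, b)) > intercorr_cutoff for a, b in combinations(cols, 2)):
            continue
        _, r2 = mlr_r2(cols, y)
        if r2 > r2_cutoff:  # store only models above the r^2 cut-off
            kept.append((names, r2))
    return kept


# Number of candidate subsets: 30 descriptors, 4 per model.
print(comb(30, 4))  # 27405
```

If every pair of descriptors in your data is more strongly correlated than `intercorr_cutoff`, or no model reaches `r2_cutoff`, `best_subsets` returns an empty list, which corresponds to the blank output file; relaxing either cut-off lets models through again.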

The software is showing a missing value error (shown in the snapshot below). What should I do?

Along with the missing value error, the tool also shows the exact location of the missing value in the input file, i.e., its row number and column number. If there is indeed a missing value at that location, fix it either by removing that column (descriptor) or by imputing the missing value.
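The minimal sketch below (Python 3, standard library only; the function names are hypothetical) locates missing values in a .csv input file and fills one column by mean imputation. Mean imputation is only one simple choice; the FAQ does not prescribe an imputation method. The sketch numbers rows and columns from 1 with the header as row 1; the tool's own numbering convention is not stated here, so compare positions with care.

```python
import csv


def load_rows(path):
    """Read a CSV input file into a list of rows (lists of strings)."""
    with open(path, newline="") as fh:
        return list(csv.reader(fh))


def find_missing(rows):
    """Return 1-based (row, column) positions of missing values; the header is row 1.

    Empty or blank cells count as missing, and so do cells absent from rows that
    are shorter than the header.
    """
    width = len(rows[0])
    return [(r, c) for r, row in enumerate(rows, start=1) for c in range(1, width + 1)
            if c > len(row) or row[c - 1].strip() == ""]


def mean_impute(rows, col):
    """Return a copy of `rows` with blanks in 1-based column `col` set to the column mean."""
    header, data = rows[0], rows[1:]
    present = [float(row[col - 1]) for row in data if row[col - 1].strip() != ""]
    mean = sum(present) / len(present)
    out = [list(header)]
    for row in data:
        row = list(row)
        if row[col - 1].strip() == "":
            row[col - 1] = str(mean)
        out.append(row)
    return out
```

For example, `find_missing(load_rows("input.csv"))` (with "input.csv" standing in for your file) lists every empty cell, which you can compare with the location reported by the tool.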

However, if there is no missing value at that location, or if the reported column number does not exist in the input file, the issue is most probably caused by a hidden end-of-line character in the input file. Fortunately, this can easily be solved by freshly preparing the input file:

How to freshly prepare the input file

Select and copy all the data (only the data) in your current input file (.csv or .xlsx), paste it into a new .csv or .xlsx file, and save the new file. Use the newly saved file as your input file; the issue should no longer occur.
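If you prefer a scripted alternative to copy-and-paste, the minimal sketch below (Python 3, standard library only; the function names are hypothetical) writes a freshly prepared copy of a .csv file. It assumes the hidden character is one of the usual suspects: a line break inside a quoted cell, a byte-order mark at the start of the file, or empty trailing rows or columns. The FAQ does not say exactly which character is involved, so the copy-and-paste method above remains the documented fix. The sketch also assumes the file is UTF-8 (or plain ASCII) encoded.

```python
import csv


def clean_rows(rows):
    """Strip hidden line-break characters and drop empty rows / empty trailing columns."""
    cleaned = [[cell.replace("\r", "").replace("\n", "").strip() for cell in row] for row in rows]
    cleaned = [row for row in cleaned if any(row)]  # drop rows with no data at all
    # Keep columns up to the last one that holds data in at least one row.
    width = max((max(i for i, cell in enumerate(row) if cell) + 1 for row in cleaned), default=0)
    # Pad short rows with empty cells so genuine missing values stay visible.
    return [row[:width] + [""] * (width - len(row)) for row in cleaned]


def rewrite_csv(src, dst):
    """Write a freshly prepared copy of CSV file `src` to `dst`."""
    # "utf-8-sig" silently drops a leading byte-order mark, if present.
    with open(src, newline="", encoding="utf-8-sig") as fh:
        rows = clean_rows(csv.reader(fh))
    with open(dst, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(rows)
```

For example, `rewrite_csv("input.csv", "input_fresh.csv")` (placeholder file names) produces a new file to use as the input. Short rows are padded rather than trimmed, so genuinely missing values still show up as empty cells instead of being hidden.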
