ROC curves were created as previously described, and P values were calculated following Hanley’s method [4]. Scatter plots and bar plots were created in Excel. For the parallel experiment, identity genes from only 5 randomly chosen cell types were used for training, and identity genes from the remaining cell types were used for testing; all genes that overlapped between the training and testing datasets were removed from both. Similarly, for the experiment with different numbers of cell types, the identity genes used for training and testing came from different cell types.
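The cell-type split described above can be illustrated with a minimal sketch. It assumes the identity genes are available as a mapping from cell type to a set of gene names; the function name `split_by_cell_type`, the seed, and the toy gene names are hypothetical and not taken from the original analysis.

```python
import random


def split_by_cell_type(identity_genes, n_train_types=5, seed=0):
    """Split identity genes into train/test sets by cell type.

    identity_genes: dict mapping cell type -> set of gene names (assumed layout).
    Genes that appear on both sides of the split are removed from both.
    """
    rng = random.Random(seed)
    cell_types = sorted(identity_genes)
    # Randomly pick the cell types whose identity genes are used for training.
    train_types = set(rng.sample(cell_types, n_train_types))
    train = set().union(*(identity_genes[c] for c in train_types))
    test = set().union(
        *(identity_genes[c] for c in cell_types if c not in train_types)
    )
    # Remove every gene shared between the training and testing sets.
    overlap = train & test
    return train - overlap, test - overlap, sorted(train_types)


# Toy data (hypothetical): 7 cell types, 2 unique genes each plus 1 shared gene.
toy = {f"type{i}": {f"gene{i}a", f"gene{i}b", "shared"} for i in range(7)}
train_genes, test_genes, train_types = split_by_cell_type(toy)
```

Removing the overlap from both sets, rather than from the test set only, follows the wording of the text; the number of training cell types is left as a parameter so the same sketch could cover the experiment with different numbers of cell types.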
Plots were created in Excel. Random control gene sets of different sizes were created as previously described. AUCs were calculated as previously described. To assess the robustness of the model, gene labels were swapped in four different ways: swapping only positive labels to negative; swapping only negative labels to positive; swapping equal numbers of positive genes to negative and negative genes to positive; and randomly changing the labels of genes. After swapping, all experiments were performed as previously described.
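The four label-swapping schemes can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: labels are taken to be 1 (positive) and 0 (negative), the number of swapped genes `n` is a hypothetical parameter because the text does not state how many labels were changed, and "randomly change" is interpreted here as flipping `n` genes chosen without regard to their class.

```python
import random


def swap_labels(labels, mode, n, seed=0):
    """Return a copy of `labels` (1 = positive, 0 = negative) with some flipped.

    mode (hypothetical names for the four schemes in the text):
      "pos_to_neg": flip n positive labels to negative
      "neg_to_pos": flip n negative labels to positive
      "balanced":   flip n positives and n negatives
      "random":     flip n labels chosen regardless of class (assumed meaning)
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if mode == "pos_to_neg":
        chosen = rng.sample(pos, n)
    elif mode == "neg_to_pos":
        chosen = rng.sample(neg, n)
    elif mode == "balanced":
        chosen = rng.sample(pos, n) + rng.sample(neg, n)
    elif mode == "random":
        chosen = rng.sample(range(len(labels)), n)
    else:
        raise ValueError(f"unknown mode: {mode}")
    flipped = list(labels)
    for i in chosen:
        flipped[i] = 1 - flipped[i]  # 1 -> 0 or 0 -> 1
    return flipped


# Toy labels (hypothetical): 10 positive genes followed by 10 negative genes.
labels = [1] * 10 + [0] * 10
noisy = swap_labels(labels, "balanced", n=3)
```

After swapping, the perturbed labels would replace the original ones in training, and the downstream steps would be rerun unchanged.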