All edge, closeness, and node information was downloaded from GRN CellNet7. For each cell-type-specific cell identity subnetwork, only the cell identity nodes were kept. Known identity drivers were defined as described in Figure 1. Control transcription factors were randomly selected from all known transcription factors, excluding those among the known identity drivers and identity-required genes. Because the training datasets contain few identity drivers, SMOTE was used to expand the set of positively labeled genes by generating synthetic examples that follow the distribution of feature values of the known identity drivers. All performance tests were conducted as described in Figure 3. Cell type specificity was calculated using the Tau index (ref) and scaled between -1 and 1.
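The two computational steps above, SMOTE oversampling and the Tau index, can be illustrated with a minimal sketch. It is not the code used in this study. The function names (`smote`, `tau_index`, `rescale_tau`), the neighbor count `k=5`, and the toy feature values are hypothetical. The text does not state how Tau was mapped to the range -1 to 1, so the linear map 2τ − 1 used here is an assumption. The Tau formula itself is the standard one, τ = Σ(1 − x_i / max x) / (n − 1).

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) for the positive class.
    Each synthetic point lies on the segment between a randomly chosen sample
    and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first (Euclidean).
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


def tau_index(expression):
    """Tau specificity of one gene across n cell types (0 = broad, 1 = specific)."""
    n = len(expression)
    if n < 2:
        raise ValueError("Tau needs at least two cell types")
    peak = max(expression)
    if peak <= 0:
        return 0.0  # gene not expressed anywhere: treat as non-specific
    return sum(1 - x / peak for x in expression) / (n - 1)


def rescale_tau(tau):
    """Map Tau from [0, 1] to [-1, 1]. ASSUMPTION: linear map, not from the text."""
    return 2 * tau - 1


# Toy feature vectors for known identity drivers (hypothetical values).
drivers = [[0.9, 0.2], [0.8, 0.3], [0.7, 0.1], [1.0, 0.4]]
augmented = drivers + smote(drivers, n_new=6, k=3, seed=1)

# Toy expression profile of one gene across three cell types.
tau = tau_index([10.0, 5.0, 0.0])
scaled = rescale_tau(tau)
```

Because every synthetic sample is a convex combination of two real identity drivers, the augmented positives stay within the feature range of the known drivers, which is the sense in which they follow the drivers' distribution. In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred over a hand-written version.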
Design of CRISPR gRNA and mutation primers