Problem Statement: Despite advances in single-cell technology, classifying cells into specific, well-defined categories from sc/snRNA-seq data remains challenging, because related cell types with similar transcriptional patterns are difficult to tell apart.
NSForest is an algorithm from JCVI that combines random forest feature selection with a binary expression scoring approach to discover the minimal combinations of marker genes needed for cell type identification.
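To make the binary expression scoring idea concrete, here is a minimal sketch in Python. It is not NSForest's code: the function name binary_score, the exact formula (one minus the ratio of each other cluster's median expression to the target cluster's median, floored at zero and averaged), the cell type labels, and the toy values are all assumptions made for illustration. The random forest ranking step, which would shortlist candidate genes before a score like this is applied, is not shown.

```python
from statistics import median


def binary_score(expr_by_cluster, target):
    """Score how close a gene is to 'on in target, off elsewhere' (0 to 1).

    expr_by_cluster maps a cluster name to that gene's expression values
    in the cluster's cells. The formula is an illustrative assumption.
    """
    target_median = median(expr_by_cluster[target])
    if target_median <= 0:
        return 0.0  # a gene that is off in the target cannot mark it

    others = [name for name in expr_by_cluster if name != target]
    if not others:
        return 0.0

    total = 0.0
    for name in others:
        ratio = median(expr_by_cluster[name]) / target_median
        # Clusters expressing the gene as much as the target contribute 0.
        total += max(0.0, 1.0 - ratio)
    return total / len(others)


# Hypothetical toy data: expression of two genes across three clusters.
gene_a = {"T cell": [5.0, 6.0, 7.0], "B cell": [0.0, 0.0, 1.0], "NK cell": [0.0, 0.0, 0.0]}
gene_b = {"T cell": [5.0, 6.0, 7.0], "B cell": [0.0, 0.0, 1.0], "NK cell": [4.0, 6.0, 8.0]}

score_a = binary_score(gene_a, "T cell")  # expressed only in the target
score_b = binary_score(gene_b, "T cell")  # shared with a closely related cluster
print(score_a, score_b)
```

In this toy case gene_a scores higher than gene_b, because gene_b is expressed just as strongly in a second, closely related cluster. That is the situation in which separating similar cell types becomes hard.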
Single-cell RNA-Seq (scRNA-Seq) methods are increasingly important for understanding tissue and disease heterogeneity.
A bottleneck in the scRNA-Seq pipeline is cell matching, that is, assigning cell types from scRNA-Seq data.
If successful, our project will improve the scRNA-Seq pipeline and, in turn, the quality of scRNA-Seq analysis.
NSForest performs relatively poorly on close but functionally distinct cell types/clusters.
Improve performance on close clusters without significantly sacrificing overall performance
Examine the effect of cluster size bias and correct for it
Package NSForest to make it readily available
Credit: Seunghyun Lee