Machine learning methods for single-cell data analysis

We created Portal, a unified framework for learning harmonized representations of datasets. With innovations in model and algorithm design, Portal achieves superior performance in preserving biological variation during integration while significantly reducing running time and memory usage compared with existing approaches: it integrates millions of cells in minutes with low memory consumption. We demonstrate the efficiency and accuracy of Portal on diverse datasets, including mouse brain atlas projects, the Tabula Muris project, and the Tabula Microcebus project. Portal is broadly applicable: in addition to integrating multiple single-cell RNA-sequencing (scRNA-seq) datasets, it can integrate scRNA-seq with single-nucleus RNA-sequencing (snRNA-seq) data. Finally, we demonstrate the utility of Portal by applying it to cross-species datasets with limited shared information between them, which yields biological insights into the similarities and divergences in the spermatogenesis process among mouse, macaque, and human.

Benchmarking study

Reference