Machine learning methods for single cell data analysis
We created Portal, a unified framework for learning harmonized representations of datasets. Through innovations in model and algorithm design, Portal preserves biological variation during integration better than existing approaches while requiring substantially less running time and memory: it integrates millions of cells in minutes with low memory consumption. We demonstrate the efficiency and accuracy of Portal on diverse datasets, including mouse brain atlas projects, the Tabula Muris project, and the Tabula Microcebus project. Portal is broadly applicable: in addition to integrating multiple single-cell RNA-sequencing (scRNA-seq) datasets, it can integrate scRNA-seq with single-nucleus RNA-sequencing (snRNA-seq) data. Finally, we demonstrate the utility of Portal by applying it to cross-species datasets that share only limited information, gaining biological insights into the similarities and divergences of the spermatogenesis process in mouse, macaque, and human.
Benchmarking study
Reference
Jia Zhao, Gefei Wang, Jingsi Ming, Zhixiang Lin, Yang Wang, Tabula Microcebus Consortium, Angela Ruohao Wu, Can Yang. Adversarial domain translation networks enable fast and accurate large-scale atlas-level single-cell data integration. [Nature Computational Science] [bioRxiv version 1, 2021] [bioRxiv version 2, 2022] [Software]