Shrikant Pawar, Anthony Melillo, Hailong Meng, Darwin D'Souza, Bjoern Peters, Randi Vita, Steven H. Kleinstein, Kei-Hoi Cheung
Single-cell RNA sequencing (scRNA-Seq) has become an important technique for studying cellular heterogeneity. An increasing number of studies have generated scRNA-Seq data and deposited them in repositories such as GEO, SRA and dbGaP. This data sharing is critical for research reproducibility and reuse. However, the rapid pace of experimental innovation has made it difficult to fully standardize these data submissions. Techniques such as CITE-Seq now allow single-cell transcriptomics to be combined with the detection of protein expression, and nucleotide ‘hash’ markers allow for sample multiplexing. While current data deposition guidelines allow for the sharing of such data, the multiple options available for organizing them for deposition can be a barrier to reuse across datasets. Here, we present our experience depositing scRNA-Seq and CITE-Seq data (both non-hashed and hashed) to shared data portals including GEO, SRA and dbGaP. Our approach for organizing and describing the raw and processed datasets associated with these studies can inform the development of more detailed data standards. Future data mining efforts would benefit immensely from such consistent data submissions.
David Osumi-Sutherland
Yongqun He, Edison Ong, Stefanie Seltmann, Xiaolin Yang, Daniel J. Cooper, Stephan Schurer, William D. Duncan, Alexander D. Diehl, Sirarat Sarntivijai
The Cell Line Ontology (CLO) is a community-based biomedical ontology in the domain of cell line cells. Many changes have been made since its initial publication in 2014. CLO's scope now covers stem cell line cells and stem cell line investigations. CLO also includes cell line cells from China, which are labeled in both English and Chinese. We systematically represent the cell line cells studied in the LINCS project. The basic design pattern has also been expanded to cover more areas. CLO has been developed to be interoperable with other ontologies in the OBO library.
Alexander D. Diehl, James A. Overton, Lewis L. Lanier, HIPC Cell Ontology Expert Group, Cell Ontology Editorial Working Group, Steven H. Kleinstein, Randi Vita, Nico Matentzoglu, Bjoern Peters
The Cell Ontology (CL) is an OBO Foundry ontology for representation of cell types across all of biology, while delegating cell type curation in particular domains to specialist ontologies, such as the Plant Ontology. As a result, although there are general cell types representing a range of species, CL is dominated by human, mammalian, and vertebrate cell types, per the interests of funders and biases of users. Major challenges remain in using the CL for data annotation, integration, and analysis. Chief among these is the lack of distinction between general and species-specific cell types. CL relies upon a variety of differentia for logically defining cell types, including protein expression patterns and morphological and functional criteria. Earlier curation for immune cell types did not distinguish species-specific cell types from more general cell types in terms of marker expression, and some classes have combinations of markers taken from human and mouse cell types. Because mammalian markers were used to define immune cell types, other communities, such as the zebrafish community, find that these definitions are not applicable to their work. We have begun a complex process of disentangling species-specific cell types and applying explicit taxon restrictions across the ontology. The general approach is to define high-level immune cell types via morphological or functional criteria, while using markers to represent species-specific cell types. We are currently prototyping this approach using the innate lymphoid cell hierarchy as an exemplar. This ongoing revision of CL will greatly improve its utility, particularly for categorizing high-throughput scRNA-Seq and proteomics data tied to highly granular cell types.
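The split between general and species-specific cell types described above can be illustrated with a minimal Python sketch. This is not CL's actual OWL representation or tooling: the `CellType` class, the helper functions, the marker names (`MARKER_A`, `MARKER_B`) and the criterion text are all hypothetical placeholders, and taxon restrictions are simplified to an exact match on a single NCBITaxon identifier rather than the hierarchical restrictions an ontology would use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CellType:
    """Toy stand-in for an ontology class (hypothetical, not CL's model)."""
    label: str
    parent: Optional["CellType"] = None
    # General classes are defined by morphological/functional criteria.
    functional_criteria: frozenset = frozenset()
    # Species-specific classes are defined by markers.
    markers: frozenset = frozenset()
    # None means "no taxon restriction" (usable for any species).
    taxon: Optional[str] = None

    def applicable_to(self, taxon: str) -> bool:
        # Simplification: exact taxon match instead of a taxonomy lookup.
        return self.taxon is None or self.taxon == taxon


def marker_problems(cell_type: CellType) -> List[str]:
    """Flag marker-based definitions that lack a taxon restriction."""
    if cell_type.markers and cell_type.taxon is None:
        return [f"{cell_type.label}: markers used without a taxon restriction"]
    return []


def candidates_for(taxon: str, cell_types: List[CellType]) -> List[str]:
    """Labels of the classes that may be used to annotate data from a taxon."""
    return [ct.label for ct in cell_types if ct.applicable_to(taxon)]


# General class: functional criteria only, no markers, no taxon restriction.
ilc = CellType(
    label="innate lymphoid cell",
    functional_criteria=frozenset({"placeholder functional criterion"}),
)
# Species-specific subclasses: placeholder markers plus a taxon restriction.
ilc_human = CellType(
    label="innate lymphoid cell, human",
    parent=ilc,
    markers=frozenset({"MARKER_A"}),
    taxon="NCBITaxon:9606",   # Homo sapiens
)
ilc_mouse = CellType(
    label="innate lymphoid cell, mouse",
    parent=ilc,
    markers=frozenset({"MARKER_B"}),
    taxon="NCBITaxon:10090",  # Mus musculus
)
# The older pattern: markers pooled from two species, no taxon restriction.
legacy_mixed = CellType(
    label="legacy mixed-marker class",
    markers=frozenset({"MARKER_A", "MARKER_B"}),
)

all_types = [ilc, ilc_human, ilc_mouse]
zebrafish_labels = candidates_for("NCBITaxon:7955", all_types)  # Danio rerio
legacy_issues = marker_problems(legacy_mixed)
```

In this sketch a zebrafish dataset can still be annotated with the general class, because that class carries no mammalian markers, while the check on `legacy_mixed` flags the kind of mixed human and mouse definition that the revision aims to remove.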