
Team 7: Iterative Clustering Pipeline for scRNA-seq Data


Leiden Clustering Algorithm
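Leiden, like Louvain, is a graph-based method. It searches for a partition of a cell-cell neighbor graph (typically a k-nearest-neighbor graph built from dimensionality-reduced expression data) that maximizes a quality function. The quality function and settings our pipeline used are not stated here, so treat the following as an assumption: one common choice is modularity with a resolution parameter $\gamma$:

$$
Q(\gamma) = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \gamma \, \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)
$$

Here $A_{ij}$ is the edge weight between cells $i$ and $j$, $k_i = \sum_j A_{ij}$ is the degree of cell $i$, $m = \tfrac{1}{2} \sum_{i,j} A_{ij}$ is the total edge weight, $c_i$ is the cluster assigned to cell $i$, and $\delta(c_i, c_j)$ is 1 when $c_i = c_j$ and 0 otherwise. Raising $\gamma$ favors more, smaller clusters and lowering it favors fewer, larger ones, so $\gamma$ acts as a direct lever between over- and under-clustering.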

Project Overview

Hello! We are Team 7, and our project addresses the problems of unsupervised clustering in single-cell RNA sequencing (scRNA-seq) data. To tackle them, we designed an iterative clustering pipeline.

To briefly review single-cell clustering: scRNA-seq datasets contain thousands of cells, each profiled for the expression of many genes. The goal of clustering is to group similar cells together so that the groups reflect meaningful cell types based on their expression patterns. Clustering is unsupervised, which means there is no ground truth for the cell labels. As a result, researchers often end up with results that are either over-clustered, where biologically meaningful clusters are split apart, or under-clustered, where distinct populations are merged and rare cell types risk going unidentified.
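To make over- and under-clustering concrete, here is a minimal pure-Python sketch that scores three hand-picked partitions of a toy graph with resolution-weighted modularity, one quality function that graph-based methods such as Leiden commonly maximize: for each community it adds the fraction of edges inside the community and subtracts the resolution times the squared fraction of total degree the community holds. This is an illustration under stated assumptions, not our pipeline: the graph (two triangles joined by one edge) stands in for a cell-cell neighbor graph, the function name `modularity` and the partition names are hypothetical, and no search over partitions is performed (Leiden would do that search).

```python
from collections import defaultdict


def modularity(edges, labels, resolution=1.0):
    """Resolution-weighted modularity of a partition of an undirected,
    unweighted graph without self-loops (hypothetical helper).

    Per community c: L_c / m - resolution * (d_c / 2m)^2, where L_c is the
    number of edges inside c and d_c is the total degree of its nodes.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1  # edge lies inside one community
    community_degree = defaultdict(int)
    for node, k in degree.items():
        community_degree[labels[node]] += k
    return sum(
        internal[c] / m - resolution * (community_degree[c] / (2 * m)) ** 2
        for c in community_degree
    )


# Toy graph: triangle {0, 1, 2} and triangle {3, 4, 5} joined by edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partitions = {
    "merged": {n: 0 for n in range(6)},                  # under-clustered
    "two_groups": {n: 0 if n < 3 else 1 for n in range(6)},
    "singletons": {n: n for n in range(6)},              # over-clustered
}
for gamma in (0.1, 1.0, 3.0):
    best = max(partitions, key=lambda name: modularity(edges, partitions[name], gamma))
    print(gamma, best)
# 0.1 merged
# 1.0 two_groups
# 3.0 singletons
```

With a low resolution the single merged cluster scores best (under-clustering), at 1.0 the two triangles are recovered, and at a high resolution every node becomes its own cluster (over-clustering). Where these crossovers fall depends on the graph, which is one reason the resolution is often tuned per dataset.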

Our project focused on automating this clustering step and the manual refinement that accompanies it, with the aim of finding metrics and implementations that can contribute to a robust iterative clustering pipeline. Over the course of the project, our main goals were to understand some of the challenges of unsupervised clustering and conduct an initial analysis of a neuronal dataset, to identify metrics that can guide cluster refinement and test their relevance on our dataset, and finally to build and run an iterative clustering pipeline and examine its results.
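As a rough illustration of what an iterative refinement loop can look like, the following sketch recursively splits a cluster in two and keeps a split only if a quality metric says the halves are well separated. It is a minimal pure-Python sketch (Python 3.8+ for `math.dist`) under explicit assumptions, not our actual pipeline: it swaps in 2-means splitting in place of Leiden, uses the mean silhouette coefficient as the refinement metric, and the names `two_means`, `silhouette`, and `refine`, the `min_size=3` and `threshold=0.5` defaults, and the toy 2-D points are all hypothetical.

```python
import math


def two_means(points, iters=20):
    """Split points into two groups with Lloyd's k-means (k=2).

    Deterministic initialisation: the point farthest from points[0], then
    the point farthest from that one.
    """
    c0 = max(points, key=lambda p: math.dist(p, points[0]))
    c1 = max(points, key=lambda p: math.dist(p, c0))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [0 if math.dist(p, c0) <= math.dist(p, c1) else 1 for p in points]
        groups = [[p for p, lab in zip(points, labels) if lab == k] for k in (0, 1)]
        if not groups[0] or not groups[1]:
            break  # degenerate split; refine() will reject it
        # Move each centroid to the mean of its group.
        c0, c1 = (tuple(sum(xs) / len(g) for xs in zip(*g)) for g in groups)
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two non-empty clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def refine(points, min_size=3, threshold=0.5, name="0"):
    """Recursively split clusters; return a list of (name, points) leaves.

    Hypothetical rule: keep a split only if both halves have at least
    min_size points and the split's mean silhouette is >= threshold.
    """
    if len(points) < 2 * min_size:
        return [(name, points)]
    labels = two_means(points)
    if min(labels.count(0), labels.count(1)) < min_size:
        return [(name, points)]
    if silhouette(points, labels) < threshold:
        return [(name, points)]
    leaves = []
    for k in (0, 1):
        child = [p for p, lab in zip(points, labels) if lab == k]
        leaves += refine(child, min_size, threshold, f"{name}.{k}")
    return leaves


# Toy data: three well-separated groups of four 2-D points each.
blobs = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(10, 0), (10, 1), (11, 0), (11, 1)],
    [(0, 10), (0, 11), (1, 10), (1, 11)],
]
points = [p for blob in blobs for p in blob]
for name, members in refine(points):
    print(name, members)
```

On this toy data the three groups end up as three leaves: the first accepted split separates one group from the other two, the second separates the remaining pair, and the four-point leaves are too small to split again. With real scRNA-seq data the points would be cells in a reduced-dimensional space, and the metric and threshold are the parts that would need to be chosen and validated against the data.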


Team Presentation Slides