DSC 180A (Capstone)

Section B03: Single-Cell Data Analysis

Autumn 2021, Wed: 11:00am -- 12:00pm, SDSC 212E

Office hour: Fri: 10am -- 10:50am via Zoom

Course Description for Section B03:

In this project, we aim to analyze single-cell data from multiple modalities.

Motivation:

Cells are the fundamental units of life, and each cell carries a wide range of information. Cells can be observed, measured, and analyzed through different modalities, such as gene expression and protein abundance. However, despite tremendous advances in the technologies used to obtain such measurements, we cannot yet capture a complete picture of a single cell at once. Often we are given only partial measurements (e.g., two modalities measured simultaneously). An important question is therefore how to learn the relations and mappings among these modalities, so that a complete profile of a cell can be built from a subset of modalities. This in turn helps researchers understand how cells develop and how they differ, and sheds light on the mechanisms of information flow in cells.

Goal of this project:

We will explore the multimodal single-cell datasets available in the NeurIPS 2021 Competition on "Multimodal Single Cell Data Integration", and aim to use machine learning approaches to tackle tasks such as cross-modal prediction or joint embedding of multiple modalities.

Specific methodology:

Dimensionality reduction and (coupled) autoencoder-based frameworks.

Goal of Quarter 1:

  • understand the data and the high-level goals

  • perform basic dimensionality reduction analysis to gain insight into the data

  • implement a vanilla version of a (coupled) autoencoder to explore the data

  • formulate and propose a project for Quarter 2

Instructor:

Yusu Wang, Email: yusuwang@ucsd.edu

URL: http://yusu.belkin-wang.org

Office hour: Fri 10am--11am

Week 0:

  • Reading assignment:

    • Gala, R., Budzillo, A., Baftizadeh, F. et al. Consistent cross-modal identification of cortical neurons with coupled autoencoders. Nat Comput Sci 1, 120–127 (2021). https://doi.org/10.1038/s43588-021-00030-1

Week 1 (Sept 29):

  • In class:

    • Welcome. Introduction of the quarter 1 plan. Description of the project

  • After class assignment:

    • Exploration of NeurIPS 2021 Single Cell Data Analysis Competition (read overview and task descriptions here)

    • Explore the three types of data modalities available in the competition

    • Prepare and bring your questions on data to the next meeting!

Week 2 (Oct 6):

  • In class:

    • A short presentation of the data by Dr. Rohan Gala from the Allen Institute for Brain Science.

    • Metric to be used for these datasets

    • Discussion of dimensionality reduction

  • After class assignment:

    • Apply simple dimensionality reduction approaches (PCA, Isomap, t-SNE, UMAP) to some datasets from the competition (at least some datasets from two modalities: gene expression and protein abundance). A simple set of instructions for using these methods in Python is here.

    • Read about autoencoders (here and here); you only need to understand the vanilla version of an autoencoder
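
As a starting point for the dimensionality reduction assignment above, here is a minimal sketch of PCA written with NumPy only. The function name `pca` and the synthetic matrix `X` are illustrative assumptions; the random data merely stands in for a cells-by-features matrix and is not competition data.

```python
import numpy as np

def pca(X, n_components=2):
    """Project the rows of X onto the top principal components."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    embedding = X_centered @ components.T
    # Fraction of the total variance captured by each component.
    explained = (S ** 2) / np.sum(S ** 2)
    return embedding, explained[:n_components]

# Synthetic stand-in for a cells-by-features matrix (NOT competition data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
X[:, 0] *= 10.0  # give one feature a dominant variance

Z, ratio = pca(X, n_components=2)
print(Z.shape, ratio)
```

For the other methods, scikit-learn provides `PCA`, `Isomap`, and `TSNE` classes, and the separate umap-learn package provides `UMAP`; each exposes a `fit_transform` method, so swapping methods should mostly be a one-line change.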

Week 3 (Oct 12):

  • In class:

    • Discussion of dimensionality reduction: questions, obstacles, and results

    • A brief lecture on autoencoders by Chen Cai (see slides here; in particular, the slides give good references on where to find more information)

  • After class assignment:

    • Implement ways to evaluate how well dimensionality reduction approaches preserve metric, nearest-neighbor, or label information

    • Test your dimensionality reduction approaches on at least two datasets from the NeurIPS challenge

    • Prepare a short presentation on your observations about the different dimensionality reduction approaches and how they work on your datasets

    • Further explore the vanilla autoencoder and prepare to implement it for single-cell data next time
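
One possible way to approach the nearest-neighbor part of the evaluation above is to measure how many of each point's k nearest neighbors survive the embedding. The sketch below is only one such criterion, not a prescribed one; the names `knn_indices` and `knn_preservation` are hypothetical, Euclidean distance is an assumption, and the data is synthetic.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors (Euclidean) of each row, excluding itself."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Squared pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def knn_preservation(X_high, X_low, k=10):
    """Mean fraction of each point's k nearest neighbors shared by both spaces."""
    nn_high = knn_indices(X_high, k)
    nn_low = knn_indices(X_low, k)
    overlap = [len(set(a) & set(b)) / k for a, b in zip(nn_high, nn_low)]
    return float(np.mean(overlap))

# Synthetic example: keep only the first two coordinates as a crude "embedding".
rng = np.random.default_rng(0)
X_high = rng.normal(size=(100, 20))
X_low = X_high[:, :2]

score = knn_preservation(X_high, X_low, k=10)
print(f"kNN preservation: {score:.3f}")
```

A score of 1 means every neighborhood is kept exactly, and values near k/n are what a random embedding would give. The brute-force distance matrix is quadratic in the number of cells, so for large datasets it may be necessary to subsample first.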

Week 4 (Oct 19):

  • In class:

    • Discussion of dimensionality reduction: questions, obstacles, and results

    • More metrics to evaluate what is a good low dimensional representation

    • Bring your questions regarding dimensionality reduction and autoencoder

  • After class assignment:

    • Implement a vanilla autoencoder with a few MLP layers as the encoder and decoder

    • Test your autoencoder on at least two datasets from the NeurIPS challenge

    • Prepare a short presentation on your observations of how the low-dimensional representation computed by the autoencoder compares with those from other dimensionality reduction algorithms
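
To make the autoencoder assignment above concrete, here is a minimal sketch of a vanilla MLP autoencoder trained by full-batch gradient descent, written in NumPy with hand-derived gradients. The layer sizes, `tanh` activations, learning rate, and the names `train_autoencoder` and `encode` are all illustrative assumptions; in practice a deep learning framework with automatic differentiation would likely be more convenient. The data is synthetic.

```python
import numpy as np

def train_autoencoder(X, latent_dim=2, hidden=16, lr=0.05, epochs=500, seed=0):
    """Train encoder (d -> hidden -> latent) and decoder (latent -> hidden -> d)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    sizes = [d, hidden, latent_dim, hidden, d]
    params = [(rng.normal(scale=1.0 / np.sqrt(a), size=(a, b)), np.zeros(b))
              for a, b in zip(sizes[:-1], sizes[1:])]
    tanh_layers = (0, 2)  # hidden layers use tanh; latent and output are linear
    losses = []
    for _ in range(epochs):
        # Forward pass, keeping every layer's output for backpropagation.
        acts = [X]
        for i, (W, b) in enumerate(params):
            out = acts[-1] @ W + b
            acts.append(np.tanh(out) if i in tanh_layers else out)
        recon = acts[-1]
        losses.append(float(np.mean((recon - X) ** 2)))
        # Backward pass for the mean squared reconstruction error.
        grad = 2.0 * (recon - X) / recon.size
        for i in reversed(range(len(params))):
            W, b = params[i]
            if i in tanh_layers:
                grad = grad * (1.0 - acts[i + 1] ** 2)  # derivative of tanh
            grad_W = acts[i].T @ grad
            grad_b = grad.sum(axis=0)
            grad = grad @ W.T  # gradient w.r.t. this layer's input
            params[i] = (W - lr * grad_W, b - lr * grad_b)
    return params, losses

def encode(params, X):
    """Map data to the latent space using the two encoder layers."""
    (W1, b1), (W2, b2) = params[0], params[1]
    return np.tanh(X @ W1 + b1) @ W2 + b2

# Synthetic data near a 2-dimensional subspace (NOT competition data).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 20))
X = X + 0.1 * rng.normal(size=X.shape)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature

params, losses = train_autoencoder(X)
Z = encode(params, X)
print(Z.shape, losses[0], losses[-1])
```

The two-dimensional output of `encode` can be plotted next to the PCA, Isomap, t-SNE, and UMAP embeddings for a side-by-side comparison.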

Week 5 (Oct 26):

  • In class:

    • Discussion of autoencoder results and how they compare with other dimensionality reduction approaches

    • How can we further improve autoencoder results, say by adjusting our loss functions?

  • After class assignment:

    • Come up with more criteria for evaluating the quality of low-dimensional representations

    • Test different loss functions to improve these metrics/criteria

    • Test your new autoencoder(s) on at least two datasets from the NeurIPS challenge

    • Prepare a short presentation on your observations
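
As one hypothetical example of adjusting the loss function, the sketch below adds a distance-preservation penalty to the usual reconstruction error. This particular penalty, the `weight` parameter, and all function names are assumptions chosen for illustration, not a method prescribed by the course. The code only evaluates the loss; training with it would typically rely on a framework with automatic differentiation.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from round-off

def reconstruction_loss(X, X_recon):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((X - X_recon) ** 2))

def distance_preservation_loss(X, Z):
    """Mean squared gap between pairwise distances in input and latent space.

    Each distance matrix is rescaled to unit mean so that a uniformly scaled
    latent space is not penalized. Assumes the points are not all identical.
    """
    d_x = pairwise_distances(X)
    d_z = pairwise_distances(Z)
    d_x = d_x / d_x.mean()
    d_z = d_z / d_z.mean()
    return float(np.mean((d_x - d_z) ** 2))

def combined_loss(X, X_recon, Z, weight=0.1):
    """Reconstruction error plus a weighted distance-preservation penalty."""
    return reconstruction_loss(X, X_recon) + weight * distance_preservation_loss(X, Z)

# Synthetic check: a latent code that keeps only two coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
Z = X[:, :2]
X_recon = np.zeros_like(X)  # placeholder reconstruction for illustration
print(combined_loss(X, X_recon, Z))
```

Varying `weight` trades reconstruction accuracy against how faithfully the latent space reflects distances in the original data, which can then be checked against whichever evaluation criteria were chosen.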