Project Timeline and Deadlines
October 11th (Fri): 11:59pm: Project Groups Due. You may work on your own or in a group of two people. Please submit your groups either to the TAs via email or via Canvas (only one person per group needs to submit).
October 17th (Thurs): 11:59pm: Project Proposals Due. Each group must submit a 1-2 page project proposal. This proposal should include: a definition of the problem your project is trying to solve, the motivation for your project, your planned approach and evaluation metrics, as well as a list of project milestones and dates.
October 24th (Thurs): Project Proposal Workshopping Feedback Due. Each group must peer review another group's project proposal materials. Well in advance of this deadline, we will match groups together for the peer review process. Your feedback should consist of a 1/2 to 1 page report that discusses the strong points of the other group's project proposal and provides suggestions on how to improve it.
November 7th (Thurs): 11:59pm: Project Checkpoint Writeup Due. Each group must submit a 2-3 page report detailing their progress so far. In your checkpoint report, you should explicitly discuss your progress towards the milestones in your original proposal. If your originally proposed milestones are no longer appropriate, you should propose a new set of milestones for your future work. In your checkpoint report you should also discuss any preliminary results/conclusions from your work so far.
November 14th (Thurs): 8:00am: Project Checkpoint Workshopping Feedback Due. Each group must peer review another group's project checkpoint materials. Well in advance of this deadline, we will match groups together for the peer review process. Your feedback should consist of a 1-2 page report that discusses the strong points of the other group's project checkpoint and provides suggestions on how to improve it.
November 26th (Tues): 11:59pm: Final Report Drafts Due. Each group must submit a draft of their final project report (see below for formatting instructions). Your draft may include placeholders for specific results, but should be as complete as possible.
December 5th (Thurs): Final Project Poster Session. Each group must make a poster and present their poster during the poster session.
December 10th (Tues): Final Project Reports Due. Each group must submit a final written report (max 12 pages) discussing the results of their project. As part of your final report, you should also discuss any related work from the computational biology literature. Please use LaTeX/Overleaf with the NeurIPS style format and submit your report as a PDF file.
Overview
General project ideas:
Apply a computational biology method to a dataset you’re personally interested in and analyze the results.
Adapt a general machine learning method specifically for a biological problem (e.g., how does data augmentation translate to something like gene expression data?).
Past project ideas:
Using Graphical Lasso to infer and analyze the network structure of nitrogen metabolism in wastewater processes from 227 annotated wastewater metagenomes in the Joint Genome Institute (JGI) Integrated Microbial Genomes and Microbiomes (IMG/M) Database.
Using synthetic datasets to evaluate the performance of Network Perturbation Identification methods when datasets are non-normal.
Using Convolutional Additive Models to predict binding profiles of three transcription factors (CTCF, FOXA1, and JUND) from sequence data.
Using an unsupervised learning approach to infer co-accessibility and identify genome-wide enhancer-gene interactions in a single-cell ATAC-seq dataset profiling chromatin accessibility in 18,936 human retinal cells.
Identifying exonic regions in the human genome using search trees.
Predicting leukemia subtypes from array-normalized data from the Microarray Innovations in Leukemia study (MILE), and interpreting the learned classifiers.
Improving unsupervised deep learning approaches for dimensionality reduction of single cell RNA-seq using batch-invariant optimization to learn transferable representations.
Predicting P. falciparum gene essentiality across the lifecycle with Random Forests, using gene expression and evolutionary conservation features.
Using a panel of recombinant inbred mice to benchmark deep learning based structural variant callers on non-human genomes.
Example Final Reports (2021)
Here are some example projects from previous years:
Improving the Generalizability of Chest X-ray Image Classifier
Isolating salient variations of interest in single cell transcriptomic data with contrastiveVI
A Deep Bayesian Bandits Approach for Anticancer Drug Screening: Exploration via Functional Prior
Datasets
See here for a collection of potentially helpful datasets for project brainstorming. All of them are publicly available, but some (e.g., MIMIC) require obtaining access. If you're interested in a specific kind of data but can't find it in the repository, please reach out to the TAs! We're more than happy to help you track down new datasets or talk through project ideas.
Format
Please use the NeurIPS LaTeX format.