Final Project

Outline of Deadlines

All deadlines are 11:59pm on the date listed, unless another time is given.

1/20/2016 - Project Teams formed. Fill out this Google form with your information:
1. your team name
2. your team member names (project teams are 3-5 people)
3. skills each person brings to the team (make sure you cover coding, biology, data sets, etc.)
4. several sentences on what you are interested in working on. 

2/16/2016 - Project Pitch slides due. Upload your presentation and summary to this folder:
1. Presentation. 4 slides: problem statement, approach, anticipated results, significance
2. One paragraph summary of your project pitch

2/17/2016 - Project Pitches in class.

2/22/2016 - Project Milestone. One paragraph emailed to the instructors covering:
1. What you have accomplished 
2. What you plan to complete
3. Any issues you need to bring to our attention

3/7/2016 2:59pm - Final Presentation Due (before class, be prepared to present in either class period)
3/9/2016 11:59pm - Final Write Up Due
3/7/2016, 3/9/2016 - Project Presentations

Upload your final report and presentation to the Google Drive folder (Groupname_Presentation.ppt, Groupname_FinalReport.pdf)
8-minute presentation + 2 minutes for questions (may be adjusted based on team sizes)

Written Report

4-credit students (in project teams)

The final project report should be in the format of a scientific research paper: title, abstract, introduction, methods, results, discussion, references.

You will also submit your code, which does not count towards the page limit. We will base approximately 20% of your written report grade on your code implementation (primarily clarity of methods through code commenting/documentation).
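As an illustration only of what clearly commented code might look like, here is a minimal sketch in Python. The course does not prescribe a language or a documentation style; the function name enrichment_p_value and its parameters are hypothetical, and the hypergeometric test is just a stand-in for whatever method your project uses.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, drawn: int, hits: int) -> float:
    """Return the one-sided hypergeometric p-value P(X >= hits).

    Hypothetical example function, not a course requirement.

    population: total number of genes considered
    annotated:  genes in the population that carry the annotation
    drawn:      size of the gene list being tested
    hits:       genes in the list that carry the annotation
    """
    if not (0 <= annotated <= population and 0 <= drawn <= population):
        raise ValueError("annotated and drawn must lie between 0 and population")

    # The overlap cannot exceed the list size or the number of annotated genes.
    max_hits = min(annotated, drawn)

    # Number of equally likely ways to draw the gene list from the population.
    total = comb(population, drawn)

    # Count every outcome at least as extreme as the observed overlap.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(max(hits, 0), max_hits + 1)
    )
    return tail / total
```

The docstring states what the function computes and what each argument means, and the inline comments explain why each step exists rather than restating the code.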

There is a 10-page limit, including figures and references.

Formatting guidelines are here:

Look under Manuscript File Requirements.

2-credit students (done individually)

The report should be along the lines of a research grant proposal. Write as if you are going to actually implement the project. It should include the following sections.

  • Specific Aims: start with those that you submitted, and expand if necessary.
  • Background and Significance: why is this project important? What has been done that is similar?
  • Research Plan: how will you implement this project?

There is a 4-page limit, including figures and references.

Examples of Past Final Projects

Regulators of Senescence Linked Transcriptional Profiles

We used several publicly available microarray datasets to develop a model of a shared mammalian transcriptional profile for aging. The model effectively recapitulates genes previously associated with human aging, with an area under the ROC curve of 71%. Using this transcriptional profile and predicted human transcription factor binding sites, we suggest several transcription factors (XBP-1, E4BP4, Sox-5, AML1, and AP-1) as being associated with age-related regulation of expression. These are potential targets for interfering with the process of biological senescence.

Integration of Copy Number Variation, Putative Regulatory Modules, and Gene Expression Data to Link Genome Rearrangements to Breast Cancer Subtypes

Breast cancer is a highly heterogeneous disease, with individual tumors composed of cells with highly variable genomic structure and gene expression patterns. New technologies have emerged which allow researchers to examine the genomes of tumor cells without incurring the cost of sequencing. Array comparative genomic hybridization (aCGH) can be used to assess segmental duplications and deletions, and can correlate copy number variability to changes in expression profiles and disease severity. End sequence profiling (ESP) has been used to examine genome rearrangements by identifying genomic breakpoints. However, neither of these methods has been fully effective at giving a complete picture of the state of a tumor genome, and as a result only a few studies have succeeded in elucidating the mechanisms responsible for disease progression. By using aCGH data we provide a tool to give additional insight into genome rearrangements detected by ESP, and examine regulatory changes which may be responsible for disease progression and prognosis.

Analyzing Time Course Expression Data Using GO Term Enrichment

Time course microarray experiments are a powerful tool for measuring the dynamic expression of genes over time. Recently, there has been progress in developing tools and techniques that are appropriate for analyzing the temporal microarray data. We present a method that builds on these techniques by describing the expression of genes at each time point using the Gene Ontology. We apply the method to data obtained from human brain samples and show that groups of differentially expressed genes can be clustered by time point and function to create a timeline of process activities.

Prediction of mRNA Expression in Cancer Using microRNA Expression Levels

MicroRNAs play a key role in gene regulation by lowering the abundance of mRNA transcripts. Recent studies have indicated that microRNAs are involved in a large number of biological processes and play a role in cancer. In this work, we search for clusters in which the variation in mRNA expression level is mainly controlled by microRNAs, and then learn models of mRNA expression within these clusters. We start by collecting a large set of microarray experiments, and use them to find clusters of co-expressed genes. In each cluster, we search for microRNA binding sites whose frequency is enriched in the cluster. We then try to predict gene expression in different types of normal and cancer cells that were not part of the data used for the clustering, and for which we have measurements of both mRNA and microRNA expression levels. In each cluster in which a pattern of several microRNAs is significantly enriched, we learn a model using only the microRNAs in the pattern. We use cross-validation to evaluate the performance of the learned models. Unfortunately, this approach is unable to find very significant microRNA binding site enrichment in clusters of co-expressed genes, and the learned models fail to predict gene expression. We discuss how this approach could be extended to incorporate transcriptional regulation, as well as issues related to the available data.

Integrative Biology Approach to Identifying Novel Tumor Antigens and Fusion Proteins Through Antibody and Gene Expression Profiling

Arrays of immobilized proteins (ProtoArray) have been developed for the discovery and characterization of novel protein biomarkers specific for infectious diseases, cancers, and autoimmune diseases. In this study, plasma was collected from several leukemia patients one year post-transplant and pre-transplant, and from their donors. The ProtoArray allows us to screen thousands of antibodies of interest at once for targets of allogeneic antibodies. We have identified a set of potential tumor antigens through subtraction of antibody levels present in AML patients in comparison to healthy individuals. We examined the presence of coding non-synonymous single nucleotide polymorphisms in each of the proposed tumor antigens. The list of antigens targeted in leukemia patients was intersected with gene expression data relevant to AML to determine whether there was significant over-representation of genes with differential antibody expression in one or more leukemia experiments. We tested whether the set of potential tumor antigens is enriched in genes near known leukemia breakpoints. In this paper we present a set of new tumor antigens as potential drug targets for leukemia. To complement our computational analysis, the novel potential tumor antigens presented in this paper require validation in large, clinically characterized patient samples.