Research

Interdisciplinary nature of proteogenomics

Proteogenomics: addressing the knowledge gap in tumor systems biology

Cancer is a genetic disease. Genetic abnormalities such as somatic mutations, copy number alterations and gene fusions together contribute to aberrant gene regulation and expression, and eventually lead to malignancy-associated dysregulation of cell growth. During the past decade, the landscape of tumor genetic aberrations obtained from large-cohort studies (e.g., the TCGA) has provided a deep understanding of tumor biology. However, systematic understanding of tumor biology at the protein level has progressed relatively slowly. Proteins and post-translational modifications (PTMs) represent not only essential components of tumor biology, but also key diagnostic biomarkers and drug targets. To this end, proteogenomics aims to profile global protein expression and PTMs using mass spectrometry (MS), and to integrate proteomic data with genomic knowledge in order to understand the functional outcomes of genetic abnormalities at the protein and PTM levels and to guide the targeting of tumor vulnerabilities.

Tumor pathway interpretation and clinical usages

Most, if not all, pathway cascades involve protein-protein interactions and PTMs. Proteogenomics directly measures pathway components at the protein and PTM levels and thus improves pathway activity inference under specific biological contexts. In our head and neck cancer study, ligand-dependent EGFR pathway activity could be inferred only from the phosphorylation of pathway components, not from pathway gene expression (which is commonly used in computational biology). We have shown that this pathway readout, together with genomic abnormalities, can indicate key biomarkers to guide EGFR monoclonal antibody (mAb) treatment.

Proteogenomic profiling of the ligand-dependent EGFR pathway in head and neck cancer.

Currently in the lab, we develop and apply computational methods to systematically infer the pathway activities using proteogenomic data and explore the clinical usages (e.g., drug response prediction).
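As a rough illustration of inferring pathway activity from phosphorylation rather than gene expression, the sketch below scores each sample by the mean z-score of a pathway's phosphosites. It is a minimal example under stated assumptions, not the lab's actual method: the function names, the site labels (`EGFR_siteA` and so on) and the toy values are all hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Standardize each feature (row) across samples; constant rows become 0."""
    out = {}
    for feature, values in matrix.items():
        mu, sd = mean(values), pstdev(values)
        out[feature] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def pathway_activity(matrix, members):
    """Return one score per sample: the mean z-score over pathway members."""
    z = zscore_rows(matrix)
    present = [f for f in members if f in z]
    if not present:
        raise ValueError("no pathway members found in matrix")
    n_samples = len(z[present[0]])
    return [mean(z[f][i] for f in present) for i in range(n_samples)]


# Hypothetical phosphosite abundances (rows) across four samples (columns).
phospho = {
    "EGFR_siteA": [0.1, 0.2, 2.0, 2.2],
    "ADAPTOR1_siteB": [0.0, 0.3, 1.8, 2.1],
    "ADAPTOR2_siteC": [0.2, 0.1, 1.9, 2.4],
    "UNRELATED_siteD": [1.0, 1.1, 0.9, 1.0],
}
egfr_members = ["EGFR_siteA", "ADAPTOR1_siteB", "ADAPTOR2_siteC"]

scores = pathway_activity(phospho, egfr_members)
for sample_index, score in enumerate(scores):
    print(f"sample {sample_index}: activity score {score:.2f}")
```

The same function could be run on an mRNA matrix with gene-level members, which would make the comparison between phosphorylation-based and expression-based readouts explicit. A real analysis would also need to handle missing values and site-level normalization.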

Functional interpretation of genomic abnormalities

Genomic studies usually identify a long list of tumor genetic aberrations. Associating genetic abnormalities with protein and PTM alterations helps discriminate tumor genetic drivers from passengers and helps prioritize drug targets. In the colon cancer study, although we observed copy-number-amplified genes across multiple genomic loci, only a fraction of them showed increased abundance of the corresponding protein. Furthermore, these genes converged on several oncogenic pathways, including the previously underappreciated endocytosis.

Genome-wide copy number change converges to abnormal endocytosis in colon cancer

Currently in the lab, we develop and apply computational methods to interpret the functional outcomes of genomic abnormalities using protein and PTM characterizations.
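One simple way to ask which copy number changes propagate to the protein level is a per-gene rank correlation between copy number and protein abundance across samples. The sketch below is a minimal version of that idea and is not necessarily the analysis used in the colon cancer study: the gene names, the values and the `min_rho` cutoff are hypothetical, and no significance testing or multiple-testing correction is included.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cis_effect_genes(cna, protein, min_rho=0.5):
    """Keep genes whose copy number tracks their own protein abundance."""
    hits = {}
    for gene in cna.keys() & protein.keys():
        rho = spearman(cna[gene], protein[gene])
        if rho >= min_rho:
            hits[gene] = rho
    return hits


# Hypothetical log-ratio values for two genes across five tumors.
cna = {
    "GENE_A": [0.0, 0.5, 1.0, 1.5, 2.0],
    "GENE_B": [0.0, 0.5, 1.0, 1.5, 2.0],
}
protein = {
    "GENE_A": [0.1, 0.4, 0.9, 1.2, 2.1],   # follows copy number
    "GENE_B": [0.3, -0.2, 0.1, 0.0, 0.2],  # buffered at the protein level
}

hits = cis_effect_genes(cna, protein)
print(hits)
```

Genes passing the filter could then be passed to a pathway enrichment step to look for convergence on shared pathways.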

The biological and clinical roles of "noncanonical" peptides/proteins

Most of our current proteogenomic analyses are based on known genome structure and gene annotations. In tumor cells, however, multiple types of genetic abnormalities can encode novel peptides and proteins that cannot be discovered using canonical protein databases. Sources include, but are not limited to, somatic mutations (SMs), structural variations, human endogenous retroviruses (hERVs), untranslated regions (UTRs) and alternative splicing. The "noncanonical" peptides/proteins encoded by these genomic elements are key players in tumor progression and attractive drug targets. For example, novel peptides presented on the tumor cell surface can be exploited as immunotherapeutic targets. Identifying these peptides/proteins from MS data requires advances in genomic knowledge and improvements in MS data processing. As an example, we have previously built a customized proteogenomic pipeline to identify tumor neoantigens by incorporating MS proteomics, whole-exome sequencing (WES), RNA-seq and MHC-I affinity prediction. However, the majority of "noncanonical" peptides/proteins remain unexplored in our current MS data, necessitating further computational innovations in proteogenomics.

A proteogenomic pipeline to identify tumor neoantigens

Currently in the lab, we develop and apply computational methods to improve MS data analysis, with a focus on identifying tumor antigens and proteoforms and understanding their biological and clinical roles.
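To make the idea of a customized search database concrete, the sketch below applies a missense mutation to a protein sequence, digests both versions in silico with a simple trypsin rule (cleave after K or R unless followed by P), and keeps the peptides unique to the mutant. It is an illustrative sketch only and does not reproduce the neoantigen pipeline mentioned above: the sequence, the mutation and the function names are hypothetical, and missed cleavages, other variant types and false discovery rate control are omitted.

```python
import re


def apply_missense(seq, pos, ref, alt):
    """Substitute one residue; pos is 1-based and must match the ref residue."""
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


def tryptic_digest(seq, min_len=6):
    """Cleave after K or R (not before P), with no missed cleavages."""
    peptides = re.split(r"(?<=[KR])(?!P)", seq)
    return [p for p in peptides if len(p) >= min_len]


def variant_peptides(ref_seq, pos, ref, alt, min_len=6):
    """Return tryptic peptides present in the mutant but not in the reference."""
    reference = set(tryptic_digest(ref_seq, min_len))
    mutant = tryptic_digest(apply_missense(ref_seq, pos, ref, alt), min_len)
    return [p for p in mutant if p not in reference]


# Hypothetical toy protein and a hypothetical G8D somatic mutation.
toy_protein = "MSLDKAVGELTNRGGWFPEKPTQLIR"
novel = variant_peptides(toy_protein, pos=8, ref="G", alt="D")

# Entries like these would be appended to the canonical FASTA before searching.
for index, peptide in enumerate(novel, start=1):
    print(f">variant_peptide_{index}\n{peptide}")
```

Only peptides that differ from the reference are kept, so a spectrum matching one of them would be evidence for the variant rather than the canonical protein.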

Precision oncology

The molecular mechanisms underlying tumor progression are highly heterogeneous across patients. By integrating genomic and proteomic features, we have shown that patient stratification based on molecular subtyping both reflects tumor driver genes and pathways and potentially guides treatment selection. Compared with genomic and transcriptomic subtyping, proteogenomic subtyping is more clinically relevant because it generates biomarkers and druggable targets for each patient group.

Patient stratification based on proteogenomics-determined biomarkers and potential treatments.

Currently in the lab, we perform integrated proteogenomic analysis to understand the inter-tumor heterogeneity and how this can be connected with precision oncology.