This project compares dimension reduction methods and evaluates which are most effective. The diagnosis, treatment, and outcomes of patients with low-grade gliomas (LGG), a type of brain tumor, can be improved through a better understanding of genomic data. The dataset comes from The Cancer Genome Atlas (TCGA) and contains pathway scores for different gene pathways. Each gene pathway is a group of genes, and each pathway score is a measure of gene expression. The genomics dataset gives pathway scores for 1283 gene pathways across 61 patients, so the number of patients (n = 61) is much smaller than the number of features (p = 1283). Despite this high number of features, dimension reduction methods make it possible to find relationships between genes and individuals. To address the n-much-smaller-than-p problem, our group applied several dimension reduction methods to the pathway scores. We compared Principal Component Analysis (PCA), sparse PCA, robust sparse PCA, and kernel PCA with several different kernels. Each method was evaluated both with and without scaling the pathway score data. We then used clustering to identify the genes most correlated with one another and, ultimately, to determine whether related genes have related biological functions.
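As a rough illustration of the comparison described above, the following minimal sketch runs linear PCA and an RBF kernel PCA on a matrix with the same shape as the dataset (61 patients by 1283 pathways), with and without scaling. It is not the project's implementation: the data are synthetic random numbers standing in for the TCGA pathway scores, the helper names (`standardize`, `pca`, `rbf_kernel_pca`) and the default kernel width `gamma = 1/p` are assumptions made for this example, and the sparse and robust sparse PCA variants are not covered.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_pathways = 61, 1283

# Synthetic stand-in for the pathway-score matrix (NOT the TCGA data).
# Columns get different spreads so that scaling has a visible effect.
X = rng.normal(size=(n_patients, n_pathways)) * rng.uniform(0.5, 5.0, size=n_pathways)


def standardize(X):
    """Center each column and scale it to unit sample variance."""
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant columns unscaled
    return Xc / sd


def pca(X, k):
    """Linear PCA via SVD; returns scores (n x k) and explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / (X.shape[0] - 1)
    return U[:, :k] * s[:k], var[:k] / var.sum()


def rbf_kernel_pca(X, k, gamma=None):
    """Kernel PCA with an RBF kernel; returns scores (n x k)."""
    n, p = X.shape
    gamma = 1.0 / p if gamma is None else gamma  # assumed default width
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kc = J @ K @ J                       # kernel centered in feature space
    vals, vecs = np.linalg.eigh(Kc)      # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]     # keep the k largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


results = {}
for label, data in [("raw", X), ("scaled", standardize(X))]:
    scores, ratio = pca(data, k=2)
    kscores = rbf_kernel_pca(data, k=2)
    results[label] = (scores, ratio, kscores)
    print(label, "PCA explained-variance ratio:", np.round(ratio, 3))
```

Because the data are centered and n = 61, at most 60 principal components can carry nonzero variance regardless of the 1283 features. The kernel width also interacts with scaling: with unscaled columns the pairwise distances are much larger, so a fixed `gamma` behaves very differently on raw and standardized data.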
Thejasvi Dhanireddy (she/her) is from Hoffman Estates, IL. She is a senior, graduating with a B.S. in Biostatistics with minors in Health Care Ethics and Public Health. This fall, she will pursue her master's degree in Biostatistics at the University of Michigan. She hopes to work in public health, using her biostatistics skills to better understand health disparities among immigrant communities. During her time at SLU, she has enjoyed being part of the Honors Program and the Student Activities Board.