AI in Biomedicine

Matt Mytych

Authors: Matt Mytych, Krishna Patel, Sam Buckley, and Dr. Liangjiang Wang

Faculty Mentor: Dr. Liangjiang Wang

College: College of Science

ABSTRACT

Autism spectrum disorders (ASD) are a group of neurodevelopmental disorders characterized by cognitive and behavioral delays. Many of the underlying causes of ASD lie at the molecular level and involve both protein-coding and non-coding genes. Long non-coding RNAs (lncRNAs) are non-coding RNAs with no protein-coding capacity, yet they have been linked to ASD. Traditional methods for identifying and validating ASD risk genes are time-consuming and costly, which motivates a machine learning approach. In this study, we built machine learning models to predict and prioritize candidate lncRNAs associated with ASD. A Support Vector Machine (SVM) model was trained on brain gene expression data from BrainSpan. The performance of the SVM model was compared with that of Logistic Regression (LR) and Random Forest (RF) classifiers. Across all three models, 564 lncRNAs were predicted to be high-confidence ASD risk candidate genes. Developing a model to predict and prioritize autism-associated lncRNAs brings us one step closer to understanding the pathogenesis of ASD and, potentially, to finding avenues for treatment.
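
One way to picture the prioritization step is as a consensus vote across the three classifiers. The sketch below is illustrative only and is not the study's actual pipeline: the abstract does not state how the models' outputs were combined, so the all-models-agree rule, the 0.5 threshold, the function name consensus_candidates, the scores, and the lncRNA identifiers are all assumptions made up for this example.

```python
from statistics import mean


def consensus_candidates(scores_by_model, threshold=0.5):
    """Return genes scored at or above `threshold` by every model.

    `scores_by_model` maps a model name to a {gene: score} dict.
    The result is a list of (gene, mean score) pairs, best first.
    Combination rule and threshold are assumptions, not the study's method.
    """
    models = list(scores_by_model.values())
    # Only genes scored by every model can be a consensus call.
    shared = set(models[0]).intersection(*models[1:])
    ranked = []
    for gene in shared:
        scores = [model[gene] for model in models]
        if all(score >= threshold for score in scores):
            ranked.append((gene, mean(scores)))
    # Highest mean score first; ties broken by gene name for stable output.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Hypothetical scores for made-up lncRNA identifiers.
scores = {
    "svm": {"lnc_A": 0.91, "lnc_B": 0.62, "lnc_C": 0.40},
    "lr": {"lnc_A": 0.85, "lnc_B": 0.48, "lnc_C": 0.35},
    "rf": {"lnc_A": 0.88, "lnc_B": 0.70, "lnc_C": 0.55},
}
candidates = consensus_candidates(scores)
print([gene for gene, _ in candidates])
```

In this toy input only lnc_A clears the threshold under all three models, so it is the sole consensus candidate. Requiring agreement trades recall for confidence; a looser rule, such as a majority vote, would yield a longer candidate list.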

Video Introduction

Matt Mytych 2020 Undergraduate Research Symposium