Prediction of Novel Autism Risk Genes by Genomic Data Mining

Landon Ethredge

Authors: Landon Ethredge, Abbegail King, Anvita Pudipeddi, Snehal Shah, Anqi Wei, and Dr. Liangjiang Wang

Faculty Mentor: Dr. Liangjiang Wang

College: College of Engineering, Computing, and Applied Sciences

ABSTRACT

Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that affects social development and general behavior. ASD varies in its symptoms and severity but is often characterized by rigid, repetitive behavior patterns, sensory issues, intense interests, and communication delays. Although most ASD risk genes discovered to date are protein-coding, non-coding RNA genes are a promising source of novel risk genes because non-coding regions make up the bulk of the human genome. Specifically, this study focused on long non-coding RNAs (lncRNAs), which are defined as non-coding transcripts longer than 200 nucleotides. The study aimed to identify potential ASD risk genes by means of machine learning, a subset of artificial intelligence. We trained three machine learning models (Random Forest, Support Vector Machine, and Artificial Neural Network) to learn relevant features from genomic data on previously discovered ASD risk genes and to use those features to predict novel risk genes. The Artificial Neural Network model performed best of the three. In total, 1,174 candidate lncRNAs were identified, of which 624 were identified by all three models as high-confidence candidates. Further analysis showed that many candidate lncRNAs are located close to known ASD risk genes on their respective chromosomes.
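The distinction between high-confidence candidates (identified by all three models) and the overall total can be illustrated with a minimal Python sketch. It assumes that "in total" means identified by at least one model; the function name `summarize_candidates` and the identifiers `lnc_A` through `lnc_E` are hypothetical placeholders, not the study's code or results.

```python
from typing import Dict, Set


def summarize_candidates(predictions: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return the consensus (every model) and total (any model) candidate sets."""
    sets = list(predictions.values())
    if not sets:
        return {"consensus": set(), "total": set()}
    # Consensus: candidates predicted by every model (set intersection).
    consensus = set.intersection(*sets)
    # Total: candidates predicted by at least one model (set union).
    # Assumption: this is what "in total" refers to in the abstract.
    total = set.union(*sets)
    return {"consensus": consensus, "total": total}


# Hypothetical placeholder identifiers, not real predictions from the study.
predictions = {
    "random_forest": {"lnc_A", "lnc_B", "lnc_C"},
    "svm": {"lnc_B", "lnc_C", "lnc_D"},
    "neural_network": {"lnc_B", "lnc_C", "lnc_E"},
}

summary = summarize_candidates(predictions)
print(len(summary["consensus"]), len(summary["total"]))
```

Requiring agreement among all models trades recall for precision: the consensus set is smaller, but each member is supported by three independently trained classifiers.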

Video Introduction

Landon Ethredge 2021 Undergraduate Research Symposium