Bioinformatics and Biomedical Informatics Lab

Current Members

Spring 2023, from left: Jett, Luke, Naz, Prajwol, Molika, Indika.

Alumni

Spring 2022, from left, top row: Victoria, Naz, Shravan, bottom row: Gini, Rusiru, Indika.
Fall 2020, from left, top row: Morteza, Buwani, Nate, middle row: Naz, Aaron, Chamika, bottom row: Indika.

Fall 2019, from left, top row: Naz, Morteza, Daniel, Indika, Shyaman, bottom row: Mohammad, Nate, Gill, Yalan, Buwani.

Summer 2017, from left: Shriyansh, Morteza, Indika, Mohammad, Daniel, Adam.

Projects

Machine Learning for Pangenomics

A pangenome represents the aggregate genomic information of multiple individuals or organisms from a related group or species. This NSF-funded project is a collaboration with Dr. Brendan Mumey (Gianforte School of Computing, Montana State University), Dr. Joann Mudge and Dr. Alan Cleary (National Center for Genome Resources), and Dr. Thiruvarangan Ramaraj (DePaul University). It studies Frequented Regions (FRs), a recently introduced reference-free method for finding genomic features, as a way to discover phenotype associations in pangenomic data. To explore the utility of FRs for deciphering genotype-to-phenotype relationships, we use them as features in a supervised machine-learning setting to predict yeast phenotypes. We demonstrated that FRs have greater classification power for predicting phenotypes than more traditional sequence-based features such as SNPs (single-nucleotide polymorphisms). This project was recently featured in MSU News. [News item]

Metacognition and Misconceptions: Using Web-Based Writing Exercises in Gateway STEM Courses

This NSF-funded project, a collaboration with Dr. Jim Becker (Montana State University), aims to serve the national interest by enhancing the metacognitive skill and conceptual understanding of undergraduate students who typically struggle in foundational Science, Technology, Engineering, and Mathematics (STEM) courses. Student difficulties in STEM courses such as electric circuit analysis and engineering statics often arise from an inability to accurately identify their own knowledge gaps and to develop and follow strategies to close them. Identifying and closing such gaps is a hallmark of advanced metacognitive skill. The project will implement a novel writing-centric approach in an introductory undergraduate circuit analysis course to help students build mental models, overcome misconceptions, and strengthen their metacognitive skill. To provide instantaneous, personalized feedback at scale, the writing exercises will be implemented as web-based applications that leverage recent advances in Natural Language Processing. [News items]

Quality Assurance of Machine Learning Applications in Healthcare

Recent advances in machine learning (ML) have led to its use in safety-critical healthcare applications. For example, ML is used to predict drug responses in cancer treatment, to guide precision-medicine treatments for cancer patients, and to detect biological threats in data sets with millions of data points. As ML is deployed in such safety-critical systems, assuring its quality becomes extremely important. To this end, in collaboration with Dr. Upulee Kanewala, we propose to develop resources for the effective use of metamorphic testing for the systematic testing of ML applications. This project is funded by the UNF Foundation.

Explainable Automated Inconsistency Detection in Biomedical and Health Literature

With the rapid growth of biomedical research publications and clinical documentation, inconsistent and contradictory findings can be pervasive throughout the biomedical literature. Detecting and explaining such inconsistencies in clinical interventions is therefore of the utmost importance. However, manual curation is practically infeasible given the sheer number of research articles and medical documents and their exponential growth over the years. Further, merely detecting inconsistencies may be inadequate, because clinicians and researchers value the interpretability of findings that directly affect public health and human life. To address this problem, we are using artificial intelligence methods, such as machine learning (ML), natural language processing (NLP), and logic programming (LP), to automatically detect inconsistent biomedical and clinical findings in the literature and to extract human-readable explanations of how each inconsistency is derived from the input document(s). This project, funded by the UNF School of Computing, is undertaken by the SoC's DARE (Data Analytics Research Group) in collaboration with Dr. Catherine Christie (College of Health).

Previous Projects

Automated Protein Phenotype Prediction and Protein-Phenotype Relation Extraction

The recently developed Human Phenotype Ontology (HPO) is a standardized vocabulary for describing the phenotypic abnormalities associated with human diseases. At present, only a small fraction of human protein-coding genes have HPO annotations. However, researchers believe that a large portion of the currently unannotated genes are related to disease phenotypes, so it is important to predict gene-HPO term associations using accurate computational methods. In this project, we developed computational methods that directly predict the set of HPO terms for a given gene, including models that extract protein-phenotype relations from the biomedical literature.

Automated Protein Function Prediction and Protein-Function Relation Extraction

The function of a protein can be loosely defined as everything it does or that happens to it. The Gene Ontology (GO) is a structured vocabulary that captures protein function hierarchically and contains thousands of terms. Through various wet-lab experiments over the years, scientists have annotated a large number of proteins with GO categories that reflect their functions. However, experimentally determining protein function is highly resource-intensive, and a large fraction of proteins remain unannotated. In this project, we developed computational methods that directly predict the set of GO terms for a given gene, including models that extract protein-function relations from the biomedical literature.

Automatically Generating Psychiatric Case Notes (2018-2020)

Electronic health records (EHRs) are notorious for reducing face-to-face time with patients and increasing clinicians' screen time, contributing to burnout. This is especially problematic in psychiatric care, where maintaining consistent eye contact and nonverbal cues is just as important as the spoken word. In this project, we envisioned a pipeline that automatically records a doctor-patient conversation, generates a digital transcript of the conversation using a speech-to-text API, and uses natural language processing and machine learning techniques to predict and/or extract important pieces of information from the transcript. The extracted text is then converted into a more formal written version and used to auto-populate the different sections of the EHR form. This project was funded by the U.S. Economic Development Administration. [News item]

Data-Driven Improvement to Institutional Repository Access and Visibility (2018-2020)

This project developed a sustainability plan for the Montana State University-hosted Repositories Analytics & Metrics Portal that will keep its dataset open and available to all researchers. The project also included developing a preliminary institutional repository (IR) reporting model, a search engine optimization (SEO) audit and remediation plan for IRs, and an exploration of whether machine learning can improve the quality of IR content metadata. This project was funded by the Institute of Museum and Library Services (IMLS).