Bioinformatics and Biomedical Informatics Lab
Current Members
Prajwol Lamichhane (M.Sc. Advisee)
Ryan Nugent (DIS)
Buwani Manuweera (Volunteer post-graduate researcher)
Guna Jaganathan (co-advised with Dr. Kanewala)
Spring 2019, from left, Buwani, Indika, Lucy and Brendan
Alumni
Nishani Kasineshan (Undergraduate researcher, Fall 24 - Sri Lanka)
Nazmul Kazi (M.Sc. Advisee, 2021-2023, thesis)
Luke Grubbs (DIS Undergraduate Researcher, Spring 23)
Molika So (DIS Undergraduate Researcher, Spring 23)
Jett Baxtor (DIS Undergraduate Researcher, Spring 23)
Daniel Tatum (DIS Undergraduate Researcher, Fall 22)
Andreas Ink (DIS Undergraduate Researcher, Spring 22)
Shravan Kandalakunta (Undergraduate Researcher, Spring 22)
Rusiru Thushara (External Research Intern, Spring 22, University of Peradeniya, Sri Lanka)
Victoria Leventman (DIS Undergraduate Researcher, Spring 22)
Gini Duong (DIS Undergraduate Researcher, Spring 22)
Kimberly Laynes, Data Analytics Research Group Graduate Research Assistant (now at Kriedle lab)
Buwani Manuweera (external M.Sc. Advisee, Montana State University, 2018-2021)
Francis Anokye (external M.Sc. Advisee, African Institute for Mathematical Sciences, Rwanda, Fall 2020 - Spring 2021) - currently pursuing a Ph.D. at Newfoundland and Labrador's University, Canada
Aaron Zellner (M.Sc. Advisee, Fall 2020 - Spring 2021) - currently at Dr. Dutta's lab
Matthew Campbell (DIS Advisee, Spring 2021)
Andres Pierto (Biotechnology Intern, Spring 2021)
Morteza Pourreza-Shahri (Ph.D. Advisee, 2017-2020, thesis) - currently at CodaMetrix
Daniel Laden (M.Sc. Advisee, 2019-2020)
Gill Reynolds (Ph.D. Committee, 2019-2020)
Yalan Yin (M.Sc. advisee, 2019)
Shyaman Jayasundara (International Undergraduate Intern) - now pursuing a Ph.D. at Purdue University
Mohammed Anani (M.Sc. advisee, 2017-2019, thesis) - now pursuing a Ph.D. at the University of New Mexico
Adam Morrone (REU Mentee, 2017) - now pursuing an M.S. at Colorado State University
Daniel Dopp (REU Mentee, 2017) - currently a senior at the University of Kentucky
Shriyansh Kothari (M.Sc. advisee, 2018)
Fall 2019, from left (top row): Naz, Morteza, Daniel, Indika, Shyaman; from left (bottom row): Mohammad, Nate, Gill, Yalan, Buwani
Summer 2017, from left: Shriyansh, Morteza, Indika, Mohammad, Daniel, Adam
Projects
Machine learning for Pangenomics
A pangenome represents the aggregate genomic information of multiple individuals or organisms from a related group or species. This NSF-funded project, a collaboration with Dr. Brendan Mumey (Gianforte School of Computing, Montana State University), Dr. Joann Mudge and Dr. Alan Cleary (National Center for Genome Resources), and Dr. Thiruvarangan Ramaraj (DePaul University), studies a recently introduced reference-free method for finding genomic features called Frequented Regions (FRs), with the aim of discovering phenotype associations in pangenomic data. To explore the utility of FRs for deciphering genotype-to-phenotype relationships, we use them as features in a supervised machine-learning setting for predicting yeast phenotypes. We demonstrated that FRs have stronger classification power than more traditional sequence-based features such as SNPs (single-nucleotide polymorphisms) for predicting phenotypes. This project was recently featured in MSU news. [News item]
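As a rough illustration of the feature setup described above, each strain can be encoded as a binary vector recording which FRs it contains, and a classifier can then be trained on those vectors. The sketch below is a toy under stated assumptions: the FR identifiers, strain names, and phenotype labels are made up, and the nearest-centroid classifier is a stand-in chosen for brevity, not the model used in the project.

```python
# Toy sketch: FR presence/absence as features for phenotype prediction.
# All identifiers and labels below are hypothetical, for illustration only.

FR_VOCAB = ["FR1", "FR2", "FR3", "FR4"]  # hypothetical FR identifiers

# Hypothetical strains, each described by the set of FRs it contains.
STRAIN_FRS = {
    "s1": {"FR1", "FR2"},
    "s2": {"FR1"},
    "s3": {"FR3", "FR4"},
    "s4": {"FR4"},
}
# Hypothetical phenotype labels for the training strains.
PHENOTYPES = {"s1": "tolerant", "s2": "tolerant", "s3": "sensitive", "s4": "sensitive"}


def encode(frs, vocabulary=FR_VOCAB):
    """Turn a set of FRs into a 0/1 feature vector over the vocabulary."""
    return [1 if fr in frs else 0 for fr in vocabulary]


def fit_centroids(vectors, labels):
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((c - v) ** 2 for c, v in zip(centroids[label], vec))
    # Iterate labels in sorted order so ties are broken deterministically.
    return min(sorted(centroids), key=dist)


names = sorted(STRAIN_FRS)
CENTROIDS = fit_centroids(
    [encode(STRAIN_FRS[n]) for n in names],
    [PHENOTYPES[n] for n in names],
)

if __name__ == "__main__":
    # Classify an unseen (hypothetical) strain that carries only FR2.
    print(predict(CENTROIDS, encode({"FR2"})))
```

In practice the feature matrix would be far larger and a standard learning library would replace the hand-written classifier; the point here is only the encoding of FRs as features.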
Metacognition and Misconceptions: Using Web-Based Writing Exercises in Gateway STEM Courses
This NSF-funded project, a collaboration with Dr. Jim Becker (Montana State University), aims to serve the national interest by enhancing the metacognitive skill and conceptual understanding of undergraduate students who typically struggle in foundational Science, Technology, Engineering, and Mathematics (STEM) courses. It is well known that student difficulties in STEM courses such as electric circuit analysis and engineering statics often arise from students' inability to accurately identify their knowledge gaps and to develop and follow strategies to close these gaps. Doing so is an exercise of advanced metacognitive skill. The project will implement a novel writing-centric approach in an introductory undergraduate circuit analysis course to help students build mental models, overcome misconceptions, and enhance metacognitive skill. To provide instantaneous, personalized feedback that works at scale, the writing exercises will be implemented as web-based applications that leverage recent advances in Natural Language Processing. [News items]
Quality Assurance of Machine Learning Applications in Healthcare
Recent advances in machine learning (ML) have led to its use in safety-critical applications in healthcare. For example, ML is used for predicting drug responses in cancer treatment, for driving precision medicine-based treatments for cancer patients, and for biological threat detection using large data sets on the order of millions of data points. With the use of ML in such safety-critical systems, assuring their quality becomes extremely important. To this end, in collaboration with Dr. Upulee Kanewala, we propose to develop resources for the effective use of metamorphic testing in the systematic testing of ML applications. This project is funded by the UNF Foundation.
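Metamorphic testing checks relations that should hold between the outputs of related inputs, rather than comparing each output against a known correct answer. The sketch below illustrates the idea on a small k-nearest-neighbour classifier written for this example. The classifier, the toy data, and the two relations chosen (permuting the training set, and uniformly scaling all features) are assumptions made for illustration, not the project's actual test suite.

```python
import random

# Toy data, made up for illustration: two well-separated classes in 2-D.
TRAIN = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.5), (7.0, 6.0), (6.5, 7.0)]
LABELS = ["A", "A", "A", "B", "B", "B"]
QUERIES = [(2.0, 2.0), (6.0, 6.0), (4.0, 4.0)]


def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank by (squared distance, label) so ties do not depend on input order.
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(point, query)), label)
        for point, label in zip(train, labels)
    )
    votes = [label for _, label in ranked[:k]]
    # Break vote ties deterministically by scanning labels in sorted order.
    return max(sorted(set(votes)), key=votes.count)


def check_permutation_relation(train, labels, queries, seed=0):
    """Relation 1: shuffling the training examples must not change predictions."""
    order = list(range(len(train)))
    random.Random(seed).shuffle(order)
    shuffled_train = [train[i] for i in order]
    shuffled_labels = [labels[i] for i in order]
    return all(
        knn_predict(train, labels, q) == knn_predict(shuffled_train, shuffled_labels, q)
        for q in queries
    )


def check_scaling_relation(train, labels, queries, factor=2.0):
    """Relation 2: scaling every feature by one positive constant must not change predictions."""
    # A power-of-two factor keeps the floating-point scaling exact.
    def scale(point):
        return tuple(factor * x for x in point)

    scaled_train = [scale(p) for p in train]
    return all(
        knn_predict(train, labels, q) == knn_predict(scaled_train, labels, scale(q))
        for q in queries
    )


if __name__ == "__main__":
    print("permutation relation holds:", check_permutation_relation(TRAIN, LABELS, QUERIES))
    print("scaling relation holds:", check_scaling_relation(TRAIN, LABELS, QUERIES))
```

A violated relation signals a defect even when no expected output is available for any single input, which is what makes the approach attractive for ML applications.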
Explainable Automated Inconsistency Detection in Biomedical and Health Literature
With the rapid increase in biomedical research publications and clinical documentation, inconsistent and contradictory findings can be pervasive throughout the biomedical sciences literature. Hence, detecting and explaining such inconsistencies in clinical interventions is of the utmost importance. However, manual human curation is almost infeasible given the sheer number of research articles and medical documents and their exponential growth over the years. Further, merely detecting inconsistencies may be inadequate because clinicians and researchers value the interpretability of findings that directly impact public health and human life. To address this problem, we are using artificial intelligence methodologies and techniques, such as machine learning (ML), natural language processing (NLP), and logic programming (LP), to automatically detect and identify inconsistent biomedical and clinical discoveries in the literature and to extract human-readable explanations of how each inconsistency is derived from the input document(s). This project, funded by the UNF School of Computing, is undertaken by SoC's DARE (Data Analytics Research Group) in collaboration with Dr. Catherine Chritie (College of Health).
Previous Projects
Automated Protein Phenotype Prediction and Protein-Phenotype Relation Extraction
The recently developed Human Phenotype Ontology (HPO) is a standardized vocabulary for describing the phenotype abnormalities associated with human diseases. At present, only a small fraction of human protein-coding genes have HPO annotations. However, researchers believe that a large portion of currently unannotated genes are related to disease phenotypes. Therefore, it is important to predict gene-HPO term associations using accurate computational methods. In this project, we are developing computational methods that can directly predict the set of HPO terms for a given gene, including models that can directly extract protein-phenotype relations from biomedical literature.
Automated Protein Function Prediction and Protein-Function Relation Extraction
The function of a protein can be loosely defined as everything that it performs or that happens to it. The Gene Ontology (GO) is a structured vocabulary that captures protein function in a hierarchical manner and contains thousands of terms. Through various wet-lab experiments over the years, scientists have been able to annotate a large number of proteins with GO categories that reflect their functionality. However, experimentally determining protein functions is a highly resource-intensive task, and a large fraction of proteins remain unannotated. In this project, we are developing computational methods that can directly predict the set of GO terms for a given gene, including models that can directly extract protein-function relations from biomedical literature.
Automatically Generating Psychiatric Case Notes (2018-2020)
Electronic health records (EHRs) are notorious for reducing face-to-face time with patients while increasing screen time for clinicians, leading to burnout. This is especially problematic for psychiatric care, in which maintaining consistent eye contact and nonverbal cues is just as important as the spoken words. In this project, we envision a pipeline that automatically records a doctor-patient conversation, generates the corresponding digital transcript using a speech-to-text API, and uses natural language processing and machine learning techniques to predict and/or extract important pieces of information from the text. The relevant text is then converted to a more formal written version and used to auto-populate the different sections of the EHR form. This project was funded by the U.S. Economic Development Administration. [News item]
Data-Driven Improvement to Institutional Repository Access and Visibility (2018-2020)
This project develops a sustainability plan for the Montana State University-hosted Repositories Analytics & Metrics Portal that will keep its dataset open and available to all researchers. The proposal also includes developing a preliminary institutional repository (IR) reporting model, preparing a search engine optimization (SEO) audit and remediation plan for IRs, and exploring whether machine learning can improve the quality of IR content metadata. This was funded by the Institute of Museum and Library Services (IMLS).