My research in biomedical and health informatics bridges crucial gaps in precision medicine by leveraging computational and human-centered approaches to enhance healthcare quality and patient outcomes. Within this interdisciplinary domain, my expertise spans biomedical ontologies, machine learning, natural language processing, and big data analytics. As the founder of the eHealth Lab, my guiding question is: “How can informatics improve population health and advance biomedical research?” Since becoming an Associate Professor, my team and I have developed innovative solutions that address the needs of medical professionals, clinical researchers, and patients, drawing on diverse data sources such as ClinicalTrials.gov, electronic health records (EHRs), national surveys, and social media. Our work has led to advances in clinical trial design for greater population representativeness, explainable AI models for predicting health outcomes, and eHealth tools that support patient engagement. Among our collaborative efforts is a machine-learning-based just-in-time intervention that promotes adherence to technology-based cognitive training. I will detail my research across three main themes: optimizing clinical trial design through informatics, creating trustworthy AI models for predicting patient outcomes, and enhancing patient access to health information. These pillars underscore my lab’s commitment to harnessing informatics to improve health outcomes.
Clinical studies are essential to evidence-based medicine. However, participant recruitment has long been a major concern. Although ~60% of all new cancer cases occur among older adults, they comprise merely 25% of participants in cancer clinical studies. Unjustified or overly restrictive eligibility criteria are among the most important modifiable barriers, causing low accrual, early termination, and low generalizability. These problems in turn can leave studies underpowered and increase the likelihood of adverse drug reactions and toxicity when treatments move into clinical practice. We are developing data-driven methods and tools to assess the generalizability of clinical studies using the electronic data in clinical trial registries, public patient databases, and clinical data warehouses. This project aims to improve the representation of underserved population subgroups, such as older adults with multiple chronic conditions, in clinical studies.
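To make the representation gap above concrete, one simple and commonly used summary is the participation-to-prevalence ratio: a subgroup's share of trial participants divided by its share of the patient population. The sketch below applies it to the two figures quoted above. It is an illustration only, not necessarily the metric used in this project, and the function name is hypothetical.

```python
def participation_to_prevalence_ratio(trial_share: float, disease_share: float) -> float:
    """Subgroup share among trial participants divided by its share among patients.

    Values below 1 indicate the subgroup is underrepresented in trials
    relative to the patient population.
    """
    if not 0 <= trial_share <= 1:
        raise ValueError("trial_share must be a proportion in [0, 1]")
    if not 0 < disease_share <= 1:
        raise ValueError("disease_share must be a proportion in (0, 1]")
    return trial_share / disease_share


# Figures from the text: older adults account for ~60% of new cancer cases
# but only 25% of participants in cancer clinical studies.
older_adults_ppr = participation_to_prevalence_ratio(trial_share=0.25, disease_share=0.60)
print(round(older_adults_ppr, 2))  # 0.42
```

A ratio of roughly 0.42 means older adults appear in cancer studies at well under half the rate their share of cases would suggest.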
Assisted by the availability of data and high-performance computing, artificial intelligence techniques, especially deep learning, have achieved breakthroughs and empirically surpassed human performance in difficult tasks, including computer vision, speech recognition, and natural language processing. However, why and how these models work remains poorly understood. While their successes in commercial applications are revolutionizing some industrial segments, AI-based clinical decision support systems have not been widely adopted in healthcare settings. It is critical to improve the transparency of the underlying mechanisms for applications in biomedicine. In this project, we propose to develop and evaluate an interpretable deep learning framework for critical biomedical applications, including patient outcome prediction and biomedical natural language understanding. Our goal is to make artificial intelligence more transparent and useful for both medical practitioners and patients.
The widely known vocabulary gap between health consumers and healthcare professionals hinders effective communication between the two groups and reduces the effectiveness of consumers’ health information seeking. Among efforts to build consumer-oriented controlled vocabularies, the Open Access Collaborative Consumer Health Vocabulary (OAC CHV) is the only consumer vocabulary that has been integrated into the UMLS. It is a controlled vocabulary designed to complement the existing UMLS framework and to serve the needs of consumer health applications. Through term mapping among its source vocabularies, the UMLS enables consumer-facing applications to translate texts containing technical and professional terms into consumer-friendly language. In this project, we will first assess the semantic coverage of OAC CHV to understand its deficiencies. Then, we will use three similarity-based approaches to automatically identify new consumer terms in consumer-generated text corpora that are similar to existing CHV terms. The overarching goal of the project is to fill the vocabulary gap between health professionals and consumers in consumer-oriented health applications. The proposed infrastructure can enhance the open-access and collaborative development of CHV towards optimal conceptual content and utility.
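The text does not specify which three similarity-based approaches are used, so the following is only a minimal sketch of one generic option: ranking unknown terms by the cosine similarity of their co-occurrence contexts to an existing CHV term. The toy corpus, the window size, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Count the words appearing within `window` tokens of each term."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(token, Counter()).update(context)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(seed_term, vectors, known_terms):
    """Rank terms not yet in the vocabulary by contextual similarity to a seed term."""
    seed = vectors[seed_term]
    scores = [(term, cosine(seed, vec)) for term, vec in vectors.items() if term not in known_terms]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy consumer-generated corpus (invented for illustration only).
corpus = [
    "my doctor said i have hypertension again",
    "my doctor said i have hbp again",
    "i walked the dog this morning",
]
known_terms = {"hypertension"}  # stands in for existing CHV terms
ranked = rank_candidates("hypertension", context_vectors(corpus), known_terms)
print(ranked[0][0])  # hbp
```

In this toy corpus "hbp" ranks first because it occurs in exactly the same contexts as "hypertension". On real consumer text, candidates surfaced this way would still need review before being added to the vocabulary.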
Various healthcare information systems, such as EHRs, have integrated well-curated biomedical controlled vocabularies, e.g., the International Classification of Diseases (ICD) and RxNorm, as their vocabulary foundation. With rich medical concepts linked by hierarchical and associative relationships, these vocabularies and ontologies can also be utilized in health data analytics tasks such as natural language processing, data integration, and decision support. Opportunities exist for leveraging semantic methods to enhance these data science efforts. Our research and development work uses biomedical ontologies and semantic methods to address important problems in biomedicine and fundamental problems in natural language processing, such as word sense disambiguation, relation extraction, and temporal information extraction. We also seek to build effective machine learning models to predict patient health outcomes such as mortality and readmission.
The goal of this project is to develop structural and semantic methodologies for improving the quality of biomedical terminologies: 1) identifying problematic semantic type assignments to the concepts of the Unified Medical Language System; 2) identifying modeling errors and inconsistencies in SNOMED CT; 3) developing quality assurance methods that are applicable to families of structurally similar ontologies in BioPortal. We are also developing algorithmic methods to identify concepts in existing ontologies in the Unified Medical Language System that can enrich other ontologies, such as SNOMED CT and the National Cancer Institute Thesaurus.