Research

Overarching Goal

The overarching goal of my research is to improve population health and advance biomedical research through the collection, analysis, and application of electronic health data from heterogeneous sources.

Clinical Research Informatics

Clinical studies are essential in evidence-based medicine. However, participant recruitment has long been a major concern. Although ~60% of all new cancer cases occur among older adults, they comprise merely 25% of participants in cancer clinical studies. Unjustified or overly restrictive eligibility criteria are the most important modifiable barriers causing low accrual, early termination, and low generalizability. These problems can in turn leave studies underpowered and increase the likelihood of adverse drug reactions and toxicity when treatments move into clinical practice. We are developing data-driven methods and tools to assess the generalizability of clinical studies using electronic data from clinical trial registries, public patient databases, and clinical data warehouses. This project aims to improve the representation in clinical studies of underserved population subgroups, such as older adults with multiple chronic conditions.

Relevant Publications

Consumer Health Informatics

The widely known vocabulary gap between health consumers and healthcare professionals hinders effective communication between the two groups and reduces the effectiveness of consumers’ health information seeking. Among the efforts to build consumer-oriented controlled vocabularies, the Open Access Collaborative Consumer Health Vocabulary (OAC CHV) is the only consumer vocabulary that has been integrated into the UMLS. It is a controlled vocabulary designed to complement the existing UMLS framework and to meet the needs of consumer health applications. Through term mapping among its source vocabularies, the UMLS enables consumer-facing applications to translate texts with technical and professional terms into consumer-friendly language. In this project, we will first assess the semantic coverage of OAC CHV to understand its deficiencies. Then, we will use three similarity-based approaches to automatically identify new consumer terms in consumer-generated text corpora that are similar to existing CHV terms. The overarching goal of this project is to fill the vocabulary gap between health professionals and consumers in consumer-oriented health applications. The proposed infrastructure can enhance the open-access and collaborative development of CHV towards optimal conceptual content and utility.
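As a rough illustration of what a similarity-based approach to term identification can look like, the sketch below scores candidate terms against known vocabulary terms using cosine similarity over character trigrams. This is only one possible technique and is not necessarily any of the three approaches used in the project; the function names, the trigram representation, and the 0.5 threshold are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def char_ngrams(term: str, n: int = 3) -> Counter:
    """Count character n-grams of a lowercased term padded with spaces."""
    padded = f" {term.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_terms(candidates, known_terms, threshold=0.5):
    """Return (candidate, closest known term, score) for candidates that
    resemble a known term; the threshold is an illustrative assumption."""
    known = {t.lower(): char_ngrams(t) for t in known_terms}
    if not known:
        return []
    suggestions = []
    for cand in candidates:
        if cand.lower() in known:
            continue  # already in the vocabulary
        vec = char_ngrams(cand)
        best_term, best_score = max(
            ((t, cosine(vec, kv)) for t, kv in known.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            suggestions.append((cand, best_term, best_score))
    # Highest-scoring suggestions first
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical inputs, not drawn from OAC CHV or any real corpus
    for row in suggest_terms(["heart attacks", "xyz"], ["heart attack"]):
        print(row)
```

In practice, candidates would be extracted from consumer-generated text, and any suggested term would still need human review before being added to a vocabulary.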

Relevant Publications

Semantics-Powered Data Analytics and Data Mining

Various healthcare information systems, such as EHRs, have integrated well-curated biomedical controlled vocabularies, e.g., the International Classification of Diseases (ICD) and RxNorm, as their vocabulary foundation [6]. With rich medical concepts linked by hierarchical and associative relationships, these vocabularies and ontologies can also be utilized in health data analytics tasks such as natural language processing, data integration, and decision support. Opportunities exist for leveraging semantic methods to enhance these data science efforts. Our research and development use biomedical ontologies and/or semantic methods to address important problems in biomedicine and fundamental problems in natural language processing, such as word sense disambiguation, relation extraction, and temporal information extraction.

Relevant Publications

Biomedical Ontologies and Terminologies

The goal of this project is to develop structural and semantic methodologies for improving the quality of biomedical terminologies: 1) identifying problematic semantic type assignments to the concepts of the Unified Medical Language System; 2) identifying modeling errors and inconsistencies in SNOMED CT; 3) developing quality assurance methods that are applicable to families of structurally similar ontologies in BioPortal. We are also developing algorithmic methods to identify concepts in existing ontologies in the Unified Medical Language System that can enrich another ontology, such as SNOMED CT or the National Cancer Institute Thesaurus.

Relevant Publications