Overarching Goal

My research area is biomedical and health informatics, an interdisciplinary field that requires close collaboration with other researchers in information science, computer science, medical science, statistics, and social science to address challenging problems in healthcare and biomedicine. The overarching goal of my research is to improve population health and advance biomedical research through the collection, analysis, and application of electronic health data from heterogeneous sources.

My research encompasses both the foundation and application areas of biomedical informatics. In the foundation area, I have developed computational methods to enhance biomedical ontologies and controlled vocabularies and improve their utility in knowledge management, knowledge representation, data analytics, and natural language processing. In the application area, I have developed novel data-driven informatics methods and tools to assess the generalizability of clinical studies and predict patient outcomes with electronic data in clinical trial registries, public patient databases, and clinical data warehouses. While enhancing my integrative knowledge and skills in text mining and ontologies, I have developed a new theme to bridge the vocabulary gap between health professionals and health consumers, which is directly related to health literacy – an important problem in information science. In the past two years, I founded the eHealth Lab in the iSchool, joined the Center for Translational Behavioral Science as Lead of the Methods core, and participated in the FSU-UF Joint Clinical and Translational Science Institute as the Informatics Lead. With these resources, I have been creating synergy within and beyond the iSchool to strengthen FSU’s research capacity in biomedical informatics as well as health data science.

Clinical Research Informatics

Clinical studies are essential in evidence-based medicine. However, participant recruitment has long been a major concern. Although ~60% of all new cancer cases occur among older adults, they comprise merely 25% of participants in cancer clinical studies. Unjustified or overly restrictive eligibility criteria are the most important modifiable barriers causing low accrual, early termination, and low generalizability. These problems in turn can leave studies underpowered and increase the likelihood of adverse drug reactions and toxicity when treatments move into clinical practice. We are developing data-driven methods and tools to assess the generalizability of clinical studies using the electronic data in clinical trial registries, public patient databases, and clinical data warehouses. This project aims to improve the representation of underserved population subgroups, such as older adults with multiple chronic conditions, in clinical studies.

Relevant Publications

Consumer Health Informatics

The widely known vocabulary gap between health consumers and healthcare professionals hinders effective communication between the two groups and reduces the effectiveness of consumers’ health information seeking. Among the efforts to build consumer-oriented controlled vocabularies, the Open Access Collaborative Consumer Health Vocabulary (OAC CHV) is the only consumer vocabulary that has been integrated into the UMLS. It is a controlled vocabulary designed to complement the existing UMLS framework and to serve the needs of consumer health applications. Through term mapping among its source vocabularies, the UMLS enables consumer-facing applications to translate texts with technical and professional terms into consumer-friendly language. In this project, we will first assess the semantic coverage of OAC CHV to understand its deficiencies. Then, we will use three similarity-based approaches to automatically identify new consumer terms that are similar to existing CHV terms in consumer-generated text corpora. The overarching goal of this project is to fill the vocabulary gap between health professionals and consumers in consumer-oriented health applications. The proposed infrastructure can enhance the open-access and collaborative development of CHV towards optimal conceptual content and utility.

Relevant Publications

Semantics-Powered Data Analytics and Machine Learning

Various healthcare information systems such as EHRs have integrated well-curated biomedical controlled vocabularies, e.g., the International Classification of Diseases (ICD) and RxNorm, as their vocabulary foundation. With rich medical concepts linked by hierarchical and associative relationships, these vocabularies and ontologies can also be utilized in health data analytics tasks such as natural language processing, data integration, and decision support. Opportunities exist for leveraging semantic methods to enhance these data science efforts. Our research and development use biomedical ontologies and/or semantic methods to address important problems in biomedicine and fundamental problems in natural language processing such as word sense disambiguation, relation extraction, and temporal information extraction. We also seek to build effective machine learning models to predict patient health outcomes such as mortality and readmission.

Relevant Publications

Biomedical Ontologies and Terminologies

The goal of this project is to develop structural and semantic methodologies for improving the quality of biomedical terminologies: 1) identifying problematic semantic type assignments to the concepts of the Unified Medical Language System; 2) identifying modeling errors and inconsistencies in SNOMED CT; and 3) developing quality assurance methods that are applicable to families of structurally similar ontologies in BioPortal. We are also developing algorithmic methods to identify concepts in existing ontologies in the Unified Medical Language System that can enrich other ontologies such as SNOMED CT and the National Cancer Institute Thesaurus.

Relevant Publications