Software & Tools

simiTerm: A Similarity-Based Method for Identifying New Terms for Consumer Health Vocabulary

The widely known vocabulary gap between health consumers and healthcare professionals hinders consumers’ information seeking and health dialogue in end-user health applications. The Open Access and Collaborative Consumer Health Vocabulary (OAC CHV), a collection of medical terms used by lay consumers, was created to bridge this gap. Specifically, the OAC CHV facilitates consumers’ health information retrieval by enabling consumer-facing health applications to translate between professional language and consumer-friendly language. To keep up with constantly evolving medical knowledge and language use, new terms need to be identified and added to the OAC CHV. User-generated content on social media, including social question and answer (social Q&A) sites, affords an enormous opportunity for mining consumer health terms. Existing methods of identifying new consumer terms from text typically rely on ad hoc lexico-syntactic patterns and human review. Our study extends an existing method by extracting n-grams from a social Q&A textual corpus and representing them with a rich set of contextual and syntactic features. Using K-means clustering, our method, simiTerm, identified terms that are both contextually and syntactically similar to existing OAC CHV terms. We tested our method on two social Q&A disease domains: diabetes and cancer. Our method outperformed three baseline ranking methods. A post-hoc qualitative evaluation by human experts further validated that our method can effectively identify meaningful new consumer terms on social Q&A sites. The paper "Enriching consumer health vocabulary through mining a social Q&A site: a similarity-based approach" (Journal of Biomedical Informatics 2017) introduced the method behind this tool. [Source code]
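As a rough illustration of the first step only (n-gram extraction), the following minimal Python sketch counts word n-grams in a text and keeps frequent ones that are not already in a known vocabulary. It is not the simiTerm source code: the function names, the tokenization rule, the `min_count` threshold, and the sample text are all assumptions made for this example, and the contextual and syntactic features and the K-means clustering step are not shown.

```python
import re
from collections import Counter


def extract_ngrams(text, max_n=3):
    """Count all word n-grams of length 1..max_n in a piece of text."""
    # Assumption: lowercase alphabetic tokens only; n-grams may cross
    # sentence boundaries in this simplified version.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def candidate_terms(counts, known_terms, min_count=2):
    """Keep n-grams seen at least min_count times that are not already known."""
    return sorted(
        gram for gram, count in counts.items()
        if count >= min_count and gram not in known_terms
    )


# Made-up sample text standing in for a social Q&A corpus.
posts = "My blood sugar is high. How do I lower blood sugar fast?"
counts = extract_ngrams(posts)
print(candidate_terms(counts, known_terms={"blood"}))
```

In a fuller pipeline, each surviving n-gram would then be turned into a feature vector and clustered together with the existing vocabulary terms.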

This work was supported by the start-up fund of Florida State University and an Amazon Web Services Education and Research Grant Award (PI: He). The work was partially supported by the National Center for Advancing Translational Sciences under the Clinical and Translational Science Award UL1TR001427 (PI: Nelson & Shenkman).


Visual Analysis Tool of Study Populations of Clinical Trials (VITTA)

Using the previously published COMPACT database as the backend, we designed a tool for visual aggregate analysis of clinical trial eligibility features. The tool consists of four modules: eligibility feature frequency analysis, query builder, distribution analysis, and visualization. It can analyze:

(1) frequently used qualitative and quantitative features for recruiting subjects for a selected medical condition;
(2) distribution of study enrollment on consecutive value points or value intervals of each quantitative feature;
(3) distribution of studies on the boundary values, permissible value ranges, and value range widths of each feature. 
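The distribution analyses in items (2) and (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not VITTA's implementation: the record layout, the function names, and the sample values are hypothetical, and each study is assumed to have one closed permissible range for the feature.

```python
from collections import Counter

# Hypothetical records for one quantitative feature (e.g. age in years):
# (study id, lower bound, upper bound, enrollment). Values are made up.
studies = [
    ("S1", 18, 65, 100),
    ("S2", 40, 75, 250),
    ("S3", 18, 75, 50),
]


def enrollment_by_value(studies, points):
    """Sum enrollment over the studies whose range covers each value point."""
    return {
        p: sum(enroll for _, lo, hi, enroll in studies if lo <= p <= hi)
        for p in points
    }


def boundary_counts(studies):
    """Count how many studies use each lower and each upper boundary value."""
    lower = Counter(lo for _, lo, _, _ in studies)
    upper = Counter(hi for _, _, hi, _ in studies)
    return lower, upper


def width_counts(studies):
    """Count studies by the width of their permissible value range."""
    return Counter(hi - lo for _, lo, hi, _ in studies)


print(enrollment_by_value(studies, [20, 50, 70]))
print(boundary_counts(studies))
print(width_counts(studies))
```

Aggregates of this kind are what a charting layer would then plot as histograms or line charts.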

All analysis results are visualized using the Google Charts API. Five recruited potential users assessed the usefulness of this tool for identifying common patterns in any selected eligibility feature for clinical trial participant selection. They rated the perceived usefulness of VITTA with an average score of 86.4/100. The paper "Visual aggregate analysis of eligibility features of clinical trials" (Journal of Biomedical Informatics 2015 Vol.54) introduced the theory behind this tool.

This project is supported by National Library of Medicine Grant R01LM009886 (PI: Weng) and an Amazon Web Services Education Research Grant Award (PI: He).

AdviseEditor Tool

This tool helps a UMLS editor determine whether a combination of semantic types is permitted, prohibited, or requires more investigation.
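The three-way decision can be pictured as a lookup over unordered combinations of semantic types. The Python sketch below is a hypothetical illustration, not the tool's actual rule base: the type names are placeholders, the two rules are invented, and treating any combination without a rule as requiring more investigation is an assumption of this example.

```python
PERMITTED = "permitted"
PROHIBITED = "prohibited"
INVESTIGATE = "requires more investigation"

# Hypothetical rule table keyed by unordered combinations of semantic types.
# "Type A", "Type B", and "Type C" are placeholders, not real UMLS types.
RULES = {
    frozenset({"Type A", "Type B"}): PERMITTED,
    frozenset({"Type A", "Type C"}): PROHIBITED,
}


def advise(semantic_types):
    """Return the advice for a combination of semantic types.

    Assumption: a combination with no matching rule is flagged for review.
    """
    return RULES.get(frozenset(semantic_types), INVESTIGATE)


print(advise(["Type B", "Type A"]))  # order of the types does not matter
print(advise(["Type A", "Type C"]))
print(advise(["Type B", "Type C"]))
```

Using a `frozenset` as the key makes the lookup independent of the order in which the editor lists the types.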

The paper "Rule-based support system for multiple UMLS semantic type assignments" (Journal of Biomedical Informatics 2013 Vol.46 Issue:1) introduced the theory behind this tool.

This project is supported by National Library of Medicine Grant R01LM008912.

Neighborhood Auditing Tool (NAT)

The UMLS’s integration of more than 100 source vocabularies, not necessarily consistent with one another, causes some inconsistencies. The purpose of auditing the UMLS is to detect such inconsistencies and to suggest how to resolve them while observing the requirement of fully representing the content of each source in the UMLS. The Neighborhood Auditing Tool (NAT) is a software tool that facilitates UMLS auditing.

The NAT supports “neighborhood-based” auditing, where, at any given time, an auditor concentrates on a single focus concept and one of a variety of neighborhoods of its closely related concepts. Typical diagrammatic displays of concept networks have a number of shortcomings, so the NAT utilizes a hybrid diagram/text interface that features stylized neighborhood views which retain some of the best features of both the diagrammatic layouts and text windows while avoiding the shortcomings.

The NAT allows an auditor to display knowledge from both the Metathesaurus (concept) level and the Semantic Network (semantic type) level. It also provides various additional features that support the auditing process.

This project is supported by National Library of Medicine Grant R01LM008912.