`scispacy` and other medical NLP models are used to detect the UMLS terms present in each sentence. The UMLS Metathesaurus is then consulted for the ancestor concepts of each term that was found. A feature matrix is built, where each sentence is an individual and each term is a feature. This matrix indicates which features are present in which sentences. A clustering algorithm, guided by silhouette analysis, is applied to the matrix, and the resulting clusters are passed through several visualization methods. To compare the weight of CUIs that represent temporal concepts, the feature matrix is used to build a network graph and to calculate its edge weights. Pairwise distances between sentences are calculated by subtracting their respective feature vectors.
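As a rough illustration of the feature matrix and the pairwise distances described above, here is a minimal sketch. The input `sentence_cuis`, the helper names, and the CUI values are placeholders chosen for the example. Reducing the subtracted vectors to a single number by summing absolute differences is also an assumption; the project may use a different norm.

```python
import numpy as np

# Hypothetical input: the set of CUIs detected in each sentence.
# In the real pipeline these would come from the NLP models,
# expanded with ancestor concepts from the Metathesaurus.
sentence_cuis = [
    {"C0011849", "C0020538"},
    {"C0011849"},
    {"C0020538", "C0027051"},
]


def build_feature_matrix(sentence_cuis):
    """Return (matrix, cuis): one row per sentence, one column per CUI, 1 = present."""
    cuis = sorted(set().union(*sentence_cuis))
    column = {cui: j for j, cui in enumerate(cuis)}
    matrix = np.zeros((len(sentence_cuis), len(cuis)), dtype=int)
    for i, found in enumerate(sentence_cuis):
        for cui in found:
            matrix[i, column[cui]] = 1
    return matrix, cuis


def pairwise_distances(matrix):
    """Subtract every pair of rows and sum the absolute differences.

    For a binary matrix this counts the features on which two sentences differ.
    """
    diff = matrix[:, None, :] - matrix[None, :, :]
    return np.abs(diff).sum(axis=2)


features, cuis = build_feature_matrix(sentence_cuis)
distances = pairwise_distances(features)
print(features)
print(distances)
```

The resulting distance matrix is symmetric with a zero diagonal, so it can be passed directly to a heatmap or reordered once cluster labels are available.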
Building the feature matrix
Validation: silhouette analysis determines the best number of clusters after evaluating multiple values, where `2 <= N < 12`
Knowledge Graph Clustering.
K-means Clustering
Gaussian Mixture Clustering
NetworkX graphs depicting medical CUI terms that usually appear in the same sentences.
Distance matrices: reordered to match clusters.
Feature matrices: reordered to match clusters.
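The validation and reordering steps listed above could look roughly like the sketch below. It assumes scikit-learn's `KMeans` and `silhouette_score`; the function names, the toy matrix, and the fixed random seed are illustrative and not taken from the project. The same loop would work with a Gaussian mixture by swapping the model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(features, k_values=range(2, 12), seed=0):
    """Fit K-means for each candidate k and keep the k with the highest silhouette score."""
    scores, labels_by_k = {}, {}
    for k in k_values:
        if k >= len(features):  # the silhouette score needs k <= n_samples - 1
            continue
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels_by_k[k] = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels_by_k[k])
    if not scores:
        raise ValueError("not enough sentences to evaluate any candidate k")
    best_k = max(scores, key=scores.get)
    return best_k, labels_by_k[best_k], scores


def reorder_by_cluster(matrix, labels, square=False):
    """Sort rows by cluster label; for a square distance matrix, sort columns too."""
    order = np.argsort(labels, kind="stable")
    return matrix[np.ix_(order, order)] if square else matrix[order]


# Toy binary feature matrix: two groups of sentences with disjoint terms.
features = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
])

best_k, labels, scores = best_k_by_silhouette(features)
ordered_features = reorder_by_cluster(features, labels)
print(best_k, scores[best_k])
```

Reordering with a stable sort keeps sentences of the same cluster in their original relative order, which makes block structure in the heatmaps easier to compare against the input.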
1. The UMLS API specifies a limit of 20 calls per second.
Solution: Download the UMLS Metathesaurus file locally and load it using pymedtermino2.
2. The local Metathesaurus database fails to build in 'load deprecated terms' mode.
It also won't recognize every CUI located by scispacy, so it fails to provide some entity names.
Solution: Fall back to fetching the entity names from the UMLS API.
3. Doing all the required UMLS API calls may take a long time.
Solution: Create a thread pool to send requests in parallel.
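One way to combine the parallel requests from item 3 with the rate limit from item 1 is sketched below, using only the standard library. `RateLimiter`, `fetch_names`, and the `fetch_name` callable are hypothetical names for this example. The actual UMLS request is deliberately left out and replaced by a stand-in function, since its endpoint and authentication are not described here.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Hand out start slots spaced 1/rate seconds apart, shared by all threads."""

    def __init__(self, rate):
        self._interval = 1.0 / rate
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self):
        # Reserve the next free slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        if slot > now:
            time.sleep(slot - now)


def fetch_names(cuis, fetch_name, rate=20, workers=8):
    """Call fetch_name(cui) for every CUI in a thread pool, throttled to about `rate` per second."""
    cuis = list(cuis)
    limiter = RateLimiter(rate)

    def task(cui):
        limiter.wait()
        return fetch_name(cui)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(cuis, pool.map(task, cuis)))


def fake_fetch_name(cui):
    """Stand-in for the real API call (assumption: it returns the entity name)."""
    time.sleep(0.01)  # simulate network latency
    return f"name-of-{cui}"


names = fetch_names(["C0000001", "C0000002", "C0000003"], fake_fetch_name)
print(names)
```

Because threads may oversleep slightly, the spacing between requests is approximate; setting `rate` a little below the documented limit leaves some margin.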
Semantic Knowledge Graph | Ontology | Clinical Decision Support