The rapid growth of the scientific literature in biomedical research provides an opportunity to use semantics to analyze it at different levels of granularity and to obtain both a general and a specific picture of how research has evolved across decades. Our project captured this evolution using data analysis tools and a machine learning model (MPNet), and highlighted a progressive shift in the "centroid" of research topics.
First, we show that a semantics-based embedding space captures the granularity of the research landscape as we zoom into cancer research. When we project these embeddings with PCA, tracking the centroid of the keyword clusters reveals a time-dependent shift. The focus of cancer research therefore moves in a specific direction, toward topics such as miRNA and cyclooxygenase-2 (COX-2). Our analysis thus captures the present focus on DNA and RNA vaccines and COX-2 inhibitors against cancer.
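The centroid tracking described above can be illustrated with a minimal sketch. It assumes the keyword embeddings have already been projected to two PCA coordinates and labeled with a time period. The function names (`centroids_by_period`, `centroid_shifts`) and the toy coordinates are hypothetical and are not taken from our pipeline.

```python
from collections import defaultdict
from math import hypot


def centroids_by_period(points):
    """Average the 2-D coordinates of all points that share a period.

    points: iterable of (period, x, y), where (x, y) are assumed to be
    the first two PCA coordinates of a keyword embedding.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # period -> [sum_x, sum_y, count]
    for period, x, y in points:
        acc = sums[period]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {p: (sx / n, sy / n) for p, (sx, sy, n) in sorted(sums.items())}


def centroid_shifts(centroids):
    """Return (from_period, to_period, dx, dy, distance) for consecutive periods."""
    periods = sorted(centroids)
    shifts = []
    for a, b in zip(periods, periods[1:]):
        dx = centroids[b][0] - centroids[a][0]
        dy = centroids[b][1] - centroids[a][1]
        shifts.append((a, b, dx, dy, hypot(dx, dy)))
    return shifts


# Hypothetical toy data: two keywords per decade, already in PCA space.
toy_points = [
    (1990, 0.0, 0.0), (1990, 2.0, 0.0),
    (2000, 3.0, 4.0), (2000, 5.0, 4.0),
    (2010, 7.0, 8.0), (2010, 9.0, 8.0),
]
toy_centroids = centroids_by_period(toy_points)
toy_shifts = centroid_shifts(toy_centroids)
```

A consistent sign of `dx` and `dy` across consecutive periods is one simple way to read off a directional drift of the centroid. The magnitude of each shift depends on the scale of the projection and is only comparable within the same PCA fit.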
Predicting the direction of research in the coming years is of prime importance to scientists, who need to prepare grant applications and set the direction of research in their own labs. Our model predicts an increasing focus on anti-cancer drugs and nanoparticle delivery. This agrees with very recent research, such as work from the University of Chicago (Figure 2).
Figure 1: Emerging approaches to cancer therapy. Our analysis captures the growth of DNA/RNA and viral vector (nanoparticle) vaccines as potential therapeutics.
Figure 2: Nanoparticles used to deliver cyclodextrin to combat cancer [8].
Our analysis relied mostly on PCA, with the first two principal components explaining just above 10% of the variance; the two-dimensional projection therefore reflects only a small part of the structure in the embedding space. We also noticed that other new methods of cancer therapy, such as CAR-T, were not enriched.
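To make the explained-variance limitation concrete, the sketch below shows how the share of variance per component, and the number of components needed to reach a target share, follow from the PCA eigenvalues. The helper names and the eigenvalue spectrum are hypothetical. The spectrum is chosen only so that the first two components explain slightly more than 10% of the variance, and it does not reproduce our actual values.

```python
def explained_variance_ratio(eigenvalues):
    """Fraction of total variance carried by each component, largest first."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    return [v / total for v in sorted(eigenvalues, reverse=True)]


def components_needed(ratios, threshold):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    cumulative = 0.0
    for k, r in enumerate(ratios, start=1):
        cumulative += r
        if cumulative >= threshold:
            return k
    return len(ratios)


# Hypothetical spectrum: two dominant components followed by a long flat tail.
toy_eigenvalues = [8.0, 6.0] + [1.0] * 114  # sums to 128
toy_ratios = explained_variance_ratio(toy_eigenvalues)
top_two_share = toy_ratios[0] + toy_ratios[1]  # 14 / 128, about 10.9%
k_half = components_needed(toy_ratios, 0.5)    # components needed for 50%
```

With a flat tail like this one, many more than two components are needed to cover half of the variance. This is why trends read from a two-dimensional plot should be checked against the full embedding space where possible.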