ABSTRACT: This research addresses the limitations of keyword search in biomedical literature by developing a Semantic Textual Similarity (STS) model. Using a teacher-student framework, we trained a model on the large-scale BioASQ dataset to capture complex medical language. We compared a baseline dual-loss training model against an "Adjusted Dual-Loss" model, which adds further regularization to improve training stability.
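The abstract does not publish the exact objective, so the following is a minimal sketch of a generic dual-loss setup under stated assumptions: a supervised STS loss plus a distillation loss against teacher similarity scores, with an L2 penalty standing in for the "Adjusted" model's added regularization. All function names, weights, and values are illustrative, not taken from the paper.

```python
def mse(pred, target):
    """Mean squared error between two equal-length score lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def dual_loss(student_scores, gold_scores, teacher_scores,
              weights, alpha=0.5, l2=0.0):
    """Weighted sum of a supervised loss (vs. gold similarity labels) and a
    distillation loss (vs. teacher scores), plus optional L2 regularization.
    Setting l2 > 0 mimics the 'adjusted' variant's extra regularization."""
    supervised = mse(student_scores, gold_scores)
    distill = mse(student_scores, teacher_scores)
    reg = l2 * sum(w * w for w in weights)  # hypothetical regularization term
    return alpha * supervised + (1 - alpha) * distill + reg

# Toy example: three sentence pairs scored by student, gold labels, and teacher.
student = [0.8, 0.2, 0.6]
gold = [1.0, 0.0, 0.5]
teacher = [0.9, 0.1, 0.55]
baseline = dual_loss(student, gold, teacher, weights=[0.3, -0.2], l2=0.0)
adjusted = dual_loss(student, gold, teacher, weights=[0.3, -0.2], l2=0.01)
```

The regularized loss is strictly larger for nonzero weights, which is the mechanism by which it discourages large parameter values and, in training, reduces overfitting.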
The results demonstrated that the adjusted model trained significantly more stably and resisted overfitting more effectively. However, a critical finding was severe data leakage: thousands of text snippets were repeated across the training and test sets. This data-integrity issue compromises the validity of the reported performance metrics. Therefore, while the adjusted training method shows promise, the model's reliability must be re-evaluated on a clean, deduplicated dataset.
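The overlap described above can be checked with a simple set intersection over normalized snippets. This is a sketch of the general technique, not the project's actual audit code; the normalization rule and example snippets are assumptions.

```python
def normalize(snippet: str) -> str:
    """Lowercase and collapse whitespace so near-identical copies match."""
    return " ".join(snippet.lower().split())

def find_leakage(train_snippets, test_snippets):
    """Return normalized snippets appearing in both splits (the leaked set)."""
    train_set = {normalize(s) for s in train_snippets}
    test_set = {normalize(s) for s in test_snippets}
    return train_set & test_set

# Toy example: one snippet leaks across splits despite casing/spacing changes.
train = ["BRCA1 mutations increase breast cancer risk.",
         "Aspirin inhibits COX enzymes."]
test = ["brca1 mutations  increase breast cancer risk.",
        "Statins lower LDL cholesterol."]
leaked = find_leakage(train, test)
```

Removing `leaked` items from the test split (or regenerating the split at the document level) is the usual remedy before re-reporting metrics.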
For more details, please visit the AI in Natural Language Processing - Hugging Face Project website: