LLM-based Subject Tagging for
the TIB Technical Library's Open-Access Catalog
Theme: The Development of Energy- and Compute-Efficient LLM Systems
The 2nd LLMs4Subjects Shared Task | GermEval'25 @ Konvens 2025, Hildesheim, Germany
Participants' systems were evaluated both quantitatively and qualitatively to give a comprehensive picture of their performance.
The quantitative evaluation measured precision, R-precision, recall, F1, and nDCG at various cutoffs (k = 5 to 20). Systems were ranked by their nDCG@20 scores, emphasizing the importance of retrieving relevant subjects.
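The metrics named above can be sketched for a single record as follows. This is a minimal illustration, not the organizers' evaluation script: it assumes binary relevance (a predicted GND code is relevant if it appears in the record's gold annotations), a ranked prediction list without duplicates, and cutoffs in steps of 5, none of which is stated in the text. All function names and the example codes are hypothetical.

```python
import math


def hits(ranked, gold, k):
    """Count gold subjects among the top-k predictions."""
    return sum(1 for code in ranked[:k] if code in gold)


def precision_at_k(ranked, gold, k):
    # Assumption: the denominator is k, even if fewer than k codes were predicted.
    return hits(ranked, gold, k) / k if k > 0 else 0.0


def recall_at_k(ranked, gold, k):
    return hits(ranked, gold, k) / len(gold) if gold else 0.0


def f1_at_k(ranked, gold, k):
    p = precision_at_k(ranked, gold, k)
    r = recall_at_k(ranked, gold, k)
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def r_precision(ranked, gold):
    # Precision at cutoff R, where R is the number of gold subjects.
    r = len(gold)
    return hits(ranked, gold, r) / r if r else 0.0


def ndcg_at_k(ranked, gold, k):
    # Binary-relevance DCG with a log2 position discount (positions start at 1).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, code in enumerate(ranked[:k])
        if code in gold
    )
    # Ideal DCG: all gold subjects placed at the top of the list.
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(gold))))
    return dcg / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical placeholder codes, not real GND identifiers.
    ranked = ["code-a", "code-b", "code-c", "code-d"]
    gold = {"code-a", "code-c", "code-x"}
    for k in (5, 10, 15, 20):  # assumed cutoffs
        print(k, precision_at_k(ranked, gold, k), ndcg_at_k(ranked, gold, k))
```

Under these assumptions, a system-level score would be the average of the per-record values; how the shared task aggregates across records is not specified here.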
For the qualitative assessment, five subject classifications were used: Linguistics (lin), Literature Studies (lit), Mathematics (mat), Economics (oek), and Traffic Engineering (ver). Within each classification, 10 record files were selected, and the top 20 GND codes from each participant's submission were extracted. Subject librarians then evaluated these codes for relevance and accuracy.
During the qualitative evaluation, subject librarians marked each prediction with one of the following codes: Y (yes, correct keyword), I (irrelevant keyword, but technically correct), or N or blank (incorrect). From these codes, two qualitative results were computed. In the first case, both Y and I were counted as correct; in the second case, only Y was counted as correct. The two cases were computed in separate files to provide a comprehensive view of system performance.
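The two scoring cases can be illustrated by converting the librarians' marks into binary labels. The sketch below is an assumption-laden example rather than the organizers' procedure: the function names are hypothetical, and summarizing the labels as a simple fraction of correct predictions is chosen only for illustration, since the text does not say which measures were derived from the marks.

```python
def to_binary_labels(marks, lenient):
    """Map librarian marks to 1 (correct) or 0 (incorrect).

    lenient=True  -> case 1: both Y and I count as correct.
    lenient=False -> case 2: only Y counts as correct.
    N, blank, or missing marks are always incorrect.
    """
    accepted = {"Y", "I"} if lenient else {"Y"}
    return [1 if (m or "").strip().upper() in accepted else 0 for m in marks]


def fraction_correct(marks, lenient):
    # Illustrative summary only (assumption): share of predictions marked correct.
    labels = to_binary_labels(marks, lenient)
    return sum(labels) / len(labels) if labels else 0.0


if __name__ == "__main__":
    # Hypothetical marks for five predicted codes of one record.
    marks = ["Y", "I", "N", "", "Y"]
    print("case 1 (Y and I correct):", fraction_correct(marks, lenient=True))
    print("case 2 (only Y correct):", fraction_correct(marks, lenient=False))
```

The binary labels produced for either case could also be fed into rank-based measures, if that is how the qualitative results were scored.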