Citation Extraction and Parsing
28./29. May 2026 | Frankfurt a. M. (Germany)
Accurate and open citation data are essential foundations for transparent, reproducible, and networked scholarship. The Workshop on Citation Extraction and Parsing (CiteX 2026) provides an interdisciplinary forum for researchers, developers, and practitioners to explore current advances in the automated identification, structuring, and dissemination of bibliographic references.
Building upon the growing momentum around open citations and initiatives such as WikiCite, WOOC, and the Frankfurt Workshop series, CiteX 2026 aims to foster dialogue across communities working on both the methodological and infrastructural aspects of citation data. The workshop welcomes contributions addressing technical innovations—ranging from traditional rule-based and machine-learning methods to state-of-the-art approaches using large language models (LLMs)—as well as practical and conceptual discussions about the creation, quality, and integration of open citation datasets.
CiteX 2026 invites participation from all disciplines interested in the extraction, parsing, and reuse of citation information. By bringing together perspectives from research, infrastructure, and applied contexts, the workshop seeks to strengthen collaboration, identify shared challenges, and promote the development of interoperable and openly accessible citation ecosystems.
Registration is now open! Register via https://t1p.de/p6e0s.
Title: “Reference extraction in the bibliometric hinterlands - recovering from domain switch with GROBID”
Authors: Paul Donner; Yi Wen
Title: “References Tractor: Citation Extraction and Linking with API-Retrieval from Scholarly Knowledge Graphs”
Authors: Nicolau Duran-Silva; Pablo Accuosto; Nicandro Bovenzi
Title: “Detecting and classifying publications based on their abstracts with LLM embeddings and multi-label classifiers”
Authors: Annika Buchholz; Imene Khebouri; Thorsten Koch; Wolfgang Peters-Kottig; Tim Kunt; Tomasz Stompor; Thi Huong Vu; Janina Zittel
Title: “An Iterative OCR and LLM-Based Workflow for High-Accuracy Citation Extraction in Large-Scale Bibliographic Corpora”
Authors: Abdallah Mohamed Abdallah Abdelnaby; Zeki Mustafa Doğan; Jörg-Holger Panzer; Lilja Mareike Sautter
Title: “Open Social Science Citation Index (OpenSSCI). A Dataset of Metadata and Citation Links from SSOAR produced by the OUTCITE Project”
Authors: Muhammad Ahsan Shahid; Philipp Mayr
Title: “RenoBench: A citation parsing benchmark”
Authors: Parth Sarin; Juan Pablo Alperin; Dione Mentis; Adam Buttrick
Title: “Why citation processing remains challenging in SSH publications: toward a robust and scalable GRAPHIA Citation Index API”
Authors: Matteo Romanello; Yurui Zhu; Noushin Najafiragheb; Patryk Hubar-Kołodziejczyk; Marta Soricetti; Angelo Di Iorio; Julien Homo
Title: “Identifying Citation Elements from Footnotes in Monographs Using Computer Vision and Natural Language Processing Methods”
Authors: Michal Ulaniuk; Przemysław Korytkowski
Title: “Bibliographical Parsing of Descriptive Linguistic Literature”
Authors: Harald Hammarström
Title: “CEC: A Tool for Context-Aware Citation Extraction and Citation Intent Classification from Scholarly PDFs”
Authors: Angelo Di Iorio; Ivan Heibi; Silvio Peroni; Lorenzo Paolini; Marta Soricetti
Title: “Extraction of citation metadata in law and the humanities using Grobid: a new workflow, dataset, and model”
Authors: Luca Foppiano; Christian Boulanger
Title: “Stackable Citation Knowledge: Building on Nanopublications for Climate and Biodiversity Research”
Authors: Anne Fouilloux; Jean Iaquinta
Title: “LLM supported annotation and reference style recognition – pilot study with educational research publications”
Authors: Muhammad Ahsan Shahid; Ezgi Tugyan; Anele Schmidt; Verena Weimer; Tamara Heck; Philipp Mayr; Christoph Schindler; Thomas Oerder
28.-29. May 2026
14 contributions, each with 20 minutes of speaking time.
DAY 1
12:00-1:00 pm Arrival
1:00-1:30 pm Ignition Talk (30 Min)
1:30-2:30 pm 2 Presentations
2:30-3:00 pm Break
3:00-4:00 pm 2 Presentations
4:00-4:15 pm Break
4:15-5:45 pm 3 Presentations
DAY 2
9:00-10:00 am Wrap-Up Workshop
10:00-11:00 am 2 Presentations
11:00-11:15 am Break
11:15 am-12:45 pm 3 Presentations
12:45-1:30 pm Lunch
1:30-2:30 pm 2 Presentations
2:30-3:00 pm Ignition Talk
3:00-3:30 pm Closing discussion and goodbye
Automated extraction and parsing of references
Creation and sharing of gold standards and test datasets
Standardization and interoperability of citation data
Quality assessment and validation of extracted references
Provision and integration of open citation data into repositories and search systems
Citation practices across disciplines
Data linking between scholarly works, datasets, and other research outputs
Annotation and enrichment of citation data
Prompt engineering and fine-tuning of LLMs (e.g., GPT-4, LLaMA) for citation tasks
Comparison of LLM-based and tool-based (e.g., GROBID, AnyStyle, CERMINE) extraction pipelines
In-text citation extraction and context analysis using LLMs
Use of open web search APIs or LLMs for source retrieval
We look forward to receiving submissions for presentations, posters, and hands-on sessions.
We prefer presenters to participate on-site; however, we will try to make online oral presentations possible. Please indicate in the submission form whether you will attend on-site or online.
Participation is also possible without a contribution. We will charge 36 euros for on-site participation to cover organisational costs. Online participation is free of charge.
Please submit an extended abstract (1250-1500 words excl. references) for any contribution format.
Only submissions in English will be considered.
Accepted extended abstracts will be published in a Zenodo community.
Please indicate in the submission form if you are interested in a Special Issue.
Submission deadline: 01. February 2026
Camera-ready version: 31. March 2026
Workshop dates: 28./29. May 2026
The workshop will take place at DIPF | Leibniz Institute for Research and Information in Education, Rostocker Straße 6, 60323 Frankfurt a. M.
Please submit your contribution via Google Forms: Submission – Workshop on Citation Extraction and Parsing (CiteX 2026)
For any questions, please contact us via e-mail: workshop.cep.2026@gmail.com
Tamara Heck (Leibniz Institute for Research and Information in Education)
Angelo Di Iorio (University of Bologna)
Philipp Mayr-Schlegel (Leibniz Institute for the Social Sciences)
Marta Soricetti (University of Bologna)
Matteo Romanello (Odoma)
Silvio Peroni (OpenCitations)
Christoph Schindler (Leibniz Institute for Research and Information in Education)
Stephan Stahlschmidt (German Centre for Higher Education Research and Science Studies)
Christian Boulanger (Max Planck Society for the Advancement of Science)
Andreas Wagner (Max Planck Society for the Advancement of Science)
Daniel Mietchen (FU Berlin)
Muhammad Ahsan Shahid (GESIS)
Verena Weimer (Leibniz Institute for Research and Information in Education)