This year’s shared task will focus on Citation Prediction, Discovery, and Placement within scientific documents. Participants will be challenged to develop models that can accurately predict relevant citations, discover masked citations, and identify the specific sentences in which citations should be inserted. The shared task is designed to assess models’ abilities to understand the intricate citation networks in scientific discourse while also exploring how well they handle domain-specific knowledge.
The SCIDOCA 2025 Shared Task is designed to address the growing need for automated citation systems that assist researchers in managing the ever-expanding corpus of scientific literature. By improving citation discovery and placement, this task could lead to advancements in:
Efficient Literature Review: Helping researchers quickly find relevant work.
Improved Scientific Writing Tools: Automating citation insertion to enhance the drafting process.
Citation Network Analysis: Enabling better understanding of citation behaviors across scientific domains.
By focusing on these tasks, the shared task aims to advance the state of research in scientific document analysis and citation management.
November 8, 2024: Call for participation (data format finalized before distributing training and test data).
February 3, 2025: Training data distributed.
February 19, 2025: Test input data distributed (registration closes).
March 5, 2025: System submission deadline (outputs + method summary).
March 8, 2025: Results and team rankings announced.
March 13, 2025: Technical paper submission deadline.
TBA: Notification of paper acceptance.
March 25, 2025: Camera-ready submission deadline.
Updated on March 10, 2025.
No External Data Transmission: Systems must operate offline and cannot send any provided data (training or test) to external services or APIs.
No Human Intervention: Systems must function autonomously during test-time inference, with no manual adjustments or parameter tuning.
Restricted Use of Non-Organizer Data:
External citation-related datasets or services (e.g., CrossRef, PubMed) are prohibited.
General-purpose pretrained models (e.g., BERT) are allowed, provided they are not citation-specific.
Citation-related pretrained models (e.g., SPECTER, Galactica) are prohibited.
Subtask 1: Citation Discovery
Predict relevant citations for a given paragraph without specifying the exact sentence.
Subtask 2: Masked Citation Prediction
Predict the correct citation for each masked citation slot in a paragraph.
Subtask 3: Citation Sentence Prediction
Identify the correct citation for each sentence in a paragraph that requires one.
Objective:
Predict relevant citations for a paragraph without specifying the exact sentence where the citation belongs.
Input:
Paragraph: A text passage from a scientific document that contains no citation markers.
Candidate References: A list of potential references, which includes both the correct citations and distractors (irrelevant but plausible citations).
Example Input:
{
"paragraph": "Recent advances in natural language processing have significantly improved the performance of models on various tasks such as machine translation and question answering.",
"candidate_references": [
"[Vaswani et al. 2017]",
"[Devlin et al. 2019]",
"[Brown et al. 2020]",
"[Radford et al. 2018]"
]
}
Output:
Predicted Citations: A list of citations that are contextually relevant to the paragraph.
Example Output:
{
"predicted_citations": [
"[Vaswani et al. 2017]",
"[Devlin et al. 2019]"
]
}
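For concreteness, a minimal Python sketch of the expected input and output handling for Subtask 1 (illustrative only; score is a placeholder for whatever relevance model a participant builds, and subtask1_input.json is a hypothetical file name, not part of the official data release):

import json

def score(paragraph, reference):
    # Placeholder relevance model. A real system would compare the paragraph
    # against the reference, within the task rules (offline, no
    # citation-specific pretrained models).
    return 0.0

def discover_citations(example, threshold=0.5):
    """Build the Subtask 1 output object for one input example."""
    predicted = [ref for ref in example["candidate_references"]
                 if score(example["paragraph"], ref) >= threshold]
    return {"predicted_citations": predicted}

# Hypothetical file name; the input format is the JSON shown above.
with open("subtask1_input.json") as f:
    example = json.load(f)
print(json.dumps(discover_citations(example), indent=2))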
Evaluation:
For a given paragraph i, let TP_i, FP_i, and FN_i denote the true-positive, false-positive, and false-negative citation predictions. Precision, Recall, and F1-Score are then calculated as:
Precision_i = TP_i / (TP_i + FP_i): the proportion of correctly predicted citations among all predicted citations, reflecting the relevance of the predictions.
Recall_i = TP_i / (TP_i + FN_i): the proportion of correctly predicted citations among all ground-truth citations, reflecting the completeness of the predictions.
F1_i = 2 · Precision_i · Recall_i / (Precision_i + Recall_i): the harmonic mean of Precision and Recall, balancing both measures.
Evaluation Across the Dataset: Per-paragraph metrics are averaged with each paragraph weighted by its number of ground-truth citations (GT_i), e.g. Precision = Σ_i (GT_i · Precision_i) / Σ_i GT_i, and analogously for Recall and F1-Score.
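To make the weighting concrete, here is a minimal Python sketch of this scoring scheme (illustrative only; the function names are ours, not an official evaluation script):

def paragraph_scores(predicted, gold):
    """Precision, Recall, F1 for one paragraph (citations given as sets)."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                       # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def dataset_scores(predictions, golds):
    """Dataset-level metrics, weighting each paragraph by GT_i."""
    weights = [len(gold) for gold in golds]
    total = sum(weights)
    sums = [0.0, 0.0, 0.0]
    for pred, gold, w in zip(predictions, golds, weights):
        for k, value in enumerate(paragraph_scores(pred, gold)):
            sums[k] += w * value
    return tuple(s / total for s in sums)   # (precision, recall, f1)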
Objective:
Predict the correct citation for each masked citation slot in a paragraph.
Input:
Paragraph: A paragraph in which one or more citation slots have been masked (replaced by placeholders such as [MASK1], [MASK2], etc.).
Candidate References: A list of potential references, including both correct citations and distractors.
Example Input:
{
"paragraph": "Transformer models like BERT [MASK1] and GPT-3 [MASK2] have revolutionized natural language processing tasks. These models [MASK3] continue to set benchmarks across various domains.",
"candidate_references": [
"[Vaswani et al. 2017]",
"[Devlin et al. 2019]",
"[Brown et al. 2020]",
"[Radford et al. 2018]"
]
}
Output:
Predicted Citations: A dictionary mapping each labeled mask to its corresponding citation.
Example Output:
{
"predicted_citations": {
"[MASK1]": "[Devlin et al. 2019]",
"[MASK2]": "[Brown et al. 2020]",
"[MASK3]": "[Radford et al. 2018]"
}
}
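As with Subtask 1, a minimal sketch of the expected output construction (illustrative only; score_slot is a placeholder for a participant's model and is not part of the task definition):

import re

def score_slot(paragraph, mask, candidate):
    # Placeholder: rate how well `candidate` fits the slot `mask`
    # given the surrounding paragraph.
    return 0.0

def predict_masked(example):
    """Map each [MASKn] slot to its highest-scoring candidate citation."""
    masks = re.findall(r"\[MASK\d+\]", example["paragraph"])
    predicted = {
        mask: max(example["candidate_references"],
                  key=lambda ref: score_slot(example["paragraph"], mask, ref))
        for mask in masks
    }
    return {"predicted_citations": predicted}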
Objective:
Given a paragraph, predict the correct citation(s) for each sentence that requires a citation.
Input:
Paragraph: A multi-sentence paragraph, provided as a list of sentences, without any explicit citation markers.
Candidate References: A list of potential citations, including both correct citations and distractors.
Example Input:
{
"paragraph": ["Transformer models have transformed the field of NLP.","One of the most influential models is BERT.", "We will investigate the results of BERT models." ,"GPT-3 has further pushed the boundaries of language modeling."],
"candidate_references": [
"[Vaswani et al. 2017]",
"[Devlin et al. 2019]",
"[Brown et al. 2020]",
"[Radford et al. 2018]"
]
}
Output:
Sentence Citations: A list mapping each sentence to its predicted citation(s); a sentence that requires no citation maps to an empty list.
Example Output:
{
"sentence_citations": [
{
"sentence": "Transformer models have transformed the field of NLP.",
"predicted_citation": ["[Vaswani et al. 2017]"]
},
{
"sentence": "One of the most influential models is BERT.",
"predicted_citation": ["[Devlin et al. 2019]"]
},
{
"sentence": "We will investigate the results of BERT models.",
"predicted_citation": [[empty], [Devlin et al. 2019]]
},
{
"sentence": "GPT-3 has further pushed the boundaries of language modeling.",
"predicted_citation": ["[Brown et al. 2020]"]
}
]
}
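A minimal sketch of the Subtask 3 output construction, under the same assumptions as the earlier sketches (a placeholder score function supplied by the participant; an empty list marks a sentence that needs no citation):

def predict_sentence_citations(example, score, threshold=0.5):
    """Attach zero or more predicted citations to each sentence."""
    results = []
    for sentence in example["paragraph"]:
        cited = [ref for ref in example["candidate_references"]
                 if score(sentence, ref) >= threshold]
        # An empty list means the sentence does not require a citation.
        results.append({"sentence": sentence, "predicted_citation": cited})
    return {"sentence_citations": results}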
Vu Tran, The Institute of Statistical Mathematics, Tokyo
An Dao, RIKEN Center for Advanced Intelligence Project
Email: vutran[at]ism.ac.jp, scidocaworkshop[at]gmail.com
Subject: [SCIDOCA 2025 Shared Task] <your inquiry title>