@ FIRE-2025
17th - 20th December, 2025
Results of the SciHigh-2025 Shared Task are now available here
Task Description
This shared task focuses on automatically generating research highlights from scientific paper abstracts using the MixSub dataset proposed by us in an earlier work [1]. Both research highlights and abstracts serve as summaries of a research paper, but highlights offer a more structured and concise version of the key contributions. The goal of this task is to develop machine learning models that can generate high-quality highlights similar to those written by authors.
Participants will explore different summarization techniques, including transformer-based models, retrieval-augmented approaches, and fine-tuned neural networks.
The task aims to improve the efficiency and accuracy of highlight generation, which can benefit researchers and academic indexing platforms.
Use Cases
The generated highlights can be useful in multiple scenarios, such as:
✔ Helping researchers quickly understand the key contributions of scientific papers. Highlights are often easier to read and grasp than a longer paragraph, especially on hand-held devices.
✔ Reducing the time needed to extract relevant information from research articles.
✔ Enhancing metadata for academic search engines and digital libraries.
✔ Evaluating the effectiveness of different summarization techniques for scientific papers.
Important Dates
25th May, 2025 - Training and Validation data release (Training_dataset_download, Validation_dataset_download)
15th June, 2025 - Test data release (Test_dataset_download (Masked), Test_dataset_download)
30th June, 2025 - Run submission deadline
10th July, 2025 - Run submission deadline (extended)
15th July, 2025 - Results declared
30th July, 2025 - Results declared (extended)
30th August, 2025 - Working notes due
30th September, 2025 - Camera-ready copies of working notes and overview paper due
17th December, 2025 - FIRE conference
NOTE: All dates are in the AoE (Anywhere on Earth) time zone.
Dataset
For this shared task, we utilize a subset of the MixSub dataset [1], referred to as MixSub-SciHigh. The MixSub corpus was created by collecting research articles from ScienceDirect, encompassing a diverse range of scientific domains. It comprises 19,785 research papers published in the year 2020. Each data instance is structured as a pair consisting of the abstract and the corresponding author-written research highlights.
Each entry in the dataset includes:
Abstract: A concise summary of the research paper.
Research Highlights: Key contributions manually written by the authors.
An example is given below.
An (abstract, highlights) pair from the MixSub dataset. Taken from https://www.sciencedirect.com/science/article/pii/S0001457519307213
The MixSub-SciHigh dataset is split into three sets:
Training Set: 10,000 data instances,
Validation Set: 1,985 data instances,
Test Set: 1,840 data instances (ground-truth highlights masked).
Format: CSV files containing three columns: Filename, Abstract, Highlights.
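As a rough illustration of this format, the sketch below reads such a file with Python's standard csv module. The function name load_pairs and the inline sample row are hypothetical; only the three column names come from the description above. A CSV parser is used instead of splitting on commas because abstracts and highlights usually contain commas and line breaks themselves.

```python
import csv
import io

# Column names as listed in the dataset description
EXPECTED_COLUMNS = ["Filename", "Abstract", "Highlights"]


def load_pairs(handle):
    """Read rows from an open CSV handle and keep the three expected columns."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{c: row[c] for c in EXPECTED_COLUMNS} for row in reader]


# Hypothetical in-memory sample standing in for a real dataset file
SAMPLE = (
    "Filename,Abstract,Highlights\n"
    'paper_001,"We study X, Y and Z.","X is studied.\nY and Z are compared."\n'
)

rows = load_pairs(io.StringIO(SAMPLE))
print(len(rows), rows[0]["Filename"])
```

For a file on disk, the same function could be called on a handle opened with open(path, newline="", encoding="utf-8"); the UTF-8 encoding is an assumption, not something the task description states.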
Evaluation Plan
Submissions will be evaluated using the following automatic metrics:
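ROUGE-1, ROUGE-2, and ROUGE-L: Measure lexical overlap with the reference highlights (unigrams, bigrams, and longest common subsequence, respectively).
METEOR: Evaluates similarity to the reference highlights using exact, stem, and synonym matching.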
✔ Participants are encouraged to analyse common challenges such as hallucinations (incorrect information) and factual inconsistencies in the generated highlights.
✔ The submitted entries will be ranked using the F1-score for the ROUGE-L metric. Innovative ideas in the proposed solution will be appreciated.
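To make the ranking metric concrete, here is a minimal sketch of a sentence-level ROUGE-L F1-score based on the longest common subsequence (LCS). It is not the organizers' scoring script: lowercasing, whitespace tokenization, and the absence of stemming are simplifying assumptions, and the function names are hypothetical. Scores from an official ROUGE implementation may differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Simplified ROUGE-L F1 between a reference and a generated highlight."""
    # Assumption: lowercase + whitespace tokenization, no stemming
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of generated tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1(
    "the model generates concise highlights",
    "the model writes highlights",
)
print(round(score, 4))
```

In this example the LCS is "the model highlights" (3 tokens), so precision is 3/4, recall is 3/5, and the F1-score is 2/3.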
Submission Format
You should submit a single .zip file containing all required files to this email ID only: tohidarehman.it@jadavpuruniversity.in
Submission Requirements:
Participants must upload their trained model checkpoint to Hugging Face and share the link to the uploaded model.
Along with the Hugging Face model link, you must include a .csv file containing your results.
CSV File Format:
Each line in the .csv file should contain three comma-separated columns in the following order:
Filename
Abstract
Predicted_Highlights
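A minimal sketch of writing a results file with these three columns, using Python's standard csv module so that commas and line breaks inside abstracts or highlights are quoted correctly. The function name write_predictions and the sample record are hypothetical, and emitting a header row is an assumption (it mirrors the dataset files); the include_header flag allows it to be switched off.

```python
import csv
import io

# Column order as required by the submission format
RESULT_COLUMNS = ["Filename", "Abstract", "Predicted_Highlights"]


def write_predictions(handle, records, include_header=True):
    """Write (filename, abstract, predicted_highlights) tuples as CSV rows."""
    writer = csv.writer(handle, quoting=csv.QUOTE_MINIMAL)
    if include_header:  # assumption: a header row is expected
        writer.writerow(RESULT_COLUMNS)
    for filename, abstract, predicted in records:
        writer.writerow([filename, abstract, predicted])


# Hypothetical record; real abstracts come from the test set
buffer = io.StringIO()
write_predictions(buffer, [("paper_001", "We study X, Y and Z.", "X is studied.")])
print(buffer.getvalue())
```

When writing to disk, opening the file with newline="" lets the csv module control line endings.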
File Naming Conventions:
The .csv file must be named using the following format: <team_name>_<run_identifier>.csv
Example: TeamA_run1.csv, where:
TeamA is your team name
run1 identifies the specific run
✅ Use underscores (_) only as shown above.
❌ Do not use blank spaces, tabs, or additional underscores in the file name.
The .zip file should be named after the email ID used for registration (excluding the domain).
For example, if your registered email ID is hello@gmail.com, then your zip file should be named:
hello.zip
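The naming rules above can be checked before packaging. The sketch below is one possible way to do so; the helper names are hypothetical, and restricting team names and run identifiers to letters and digits is an assumption made to rule out spaces, tabs, and extra underscores.

```python
import io
import re
import zipfile

# Assumption: team name and run identifier contain only letters and digits
CSV_NAME = re.compile(r"[A-Za-z0-9]+_[A-Za-z0-9]+\.csv")


def csv_name(team, run):
    """Build <team_name>_<run_identifier>.csv and reject invalid names."""
    name = f"{team}_{run}.csv"
    if not CSV_NAME.fullmatch(name):
        raise ValueError(f"invalid results file name: {name!r}")
    return name


def zip_name(registered_email):
    """Name the archive after the part of the email ID before the '@'."""
    return registered_email.split("@", 1)[0] + ".zip"


def build_zip(target, files):
    """Write a mapping of archive name -> text content into a zip archive."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)


# In-memory archive for illustration; a path such as zip_name(...) also works
archive_bytes = io.BytesIO()
build_zip(archive_bytes, {csv_name("TeamA", "run1"): "Filename,Abstract,Predicted_Highlights\n"})
print(zip_name("hello@gmail.com"))  # hello.zip
```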
Additional Notes:
You may submit up to two solutions for this task to demonstrate different approaches or refinements of your work.
Contact
Email: tohidarehman.it@jadavpuruniversity.in
References
[1] Tohida Rehman, Debarshi Kumar Sanyal, Samiran Chattopadhyay, Plaban Kumar Bhowmick, and Partha Pratim Das. Generation of Highlights From Research Papers Using Pointer-Generator Networks and SciBERT Embeddings. IEEE Access, Volume 11, pages 91358–91374, 2023. DOI: https://doi.org/10.1109/ACCESS.2023.3292300