In TAQEEM2025, we propose two sub-tasks: Task A for Holistic Scoring and Task B for Trait-specific Scoring.
Task A is formally defined as follows:
Given a set of source prompts, the aim is to train a holistic scoring model using those prompts to score essays written for an unseen target prompt. The model should produce a single holistic score that reflects the overall quality of each essay.
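The cross-prompt setup above can be sketched as a simple split by prompt. This is only an illustration: the record layout and the field names `prompt_id`, `text`, and `score` are hypothetical and not taken from the official dataset format.

```python
def split_by_prompt(essays, target_prompt):
    """Separate essays into source (training) and target (unseen) sets.

    `essays` is a list of dicts; the keys used here are hypothetical.
    """
    # Essays from every prompt except the target are available for training.
    source = [e for e in essays if e["prompt_id"] != target_prompt]
    # Essays from the target prompt are held out and only scored at test time.
    target = [e for e in essays if e["prompt_id"] == target_prompt]
    return source, target


# Toy records for illustration only; not real data.
toy_essays = [
    {"prompt_id": 1, "text": "essay a", "score": 3},
    {"prompt_id": 1, "text": "essay b", "score": 4},
    {"prompt_id": 2, "text": "essay c", "score": 2},
    {"prompt_id": 3, "text": "essay d", "score": 5},
]
source_essays, target_essays = split_by_prompt(toy_essays, target_prompt=3)
```

The point of the split is that no essay written for the target prompt is seen during training, so the model has to generalize across prompts.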
The primary evaluation metric for this task is the Quadratic Weighted Kappa (QWK), a standard metric in automated essay scoring (AES) that quantifies the agreement between human-assigned scores and system predictions. The Root Mean Squared Error (RMSE) will also be reported for a more comprehensive analysis of model performance.
Task A will be assessed based on the average QWK of the holistic score across the test prompts.
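As a rough illustration of the metrics described above, the following sketch computes QWK and RMSE for one prompt and averages QWK across prompts. It is not the official scorer: it assumes integer scores within a known range, and the function names, the `min_score`/`max_score` parameters, and the dictionary layout are all assumptions made for this example.

```python
from math import sqrt


def quadratic_weighted_kappa(human, system, min_score, max_score):
    """QWK between two equal-length lists of integer scores."""
    n = max_score - min_score + 1
    if n < 2:
        raise ValueError("the score range must contain at least two values")
    # Observed confusion matrix: rows are human scores, columns are system scores.
    observed = [[0] * n for _ in range(n)]
    for h, s in zip(human, system):
        observed[h - min_score][s - min_score] += 1
    total = len(human)
    hist_h = [sum(row) for row in observed]
    hist_s = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic disagreement weight
            expected = hist_h[i] * hist_s[j] / total  # chance-level count
            num += weight * observed[i][j]
            den += weight * expected
    # Convention assumed here: if both sides use one identical score, return 1.0.
    return 1.0 if den == 0 else 1.0 - num / den


def rmse(human, system):
    """Root mean squared error between two equal-length score lists."""
    return sqrt(sum((h - s) ** 2 for h, s in zip(human, system)) / len(human))


def average_qwk(per_prompt, min_score, max_score):
    """Mean QWK over prompts; `per_prompt` maps prompt id -> (human, system)."""
    values = [
        quadratic_weighted_kappa(h, s, min_score, max_score)
        for h, s in per_prompt.values()
    ]
    return sum(values) / len(values)


# Toy scores for illustration only; the 1-3 range is an assumption.
toy = {"p1": ([1, 2, 3], [1, 2, 3]), "p2": ([1, 3], [3, 1])}
mean_qwk = average_qwk(toy, min_score=1, max_score=3)
```

In the toy example, the first prompt has perfect agreement and the second has maximal disagreement, so the two QWK values cancel out in the average. If the score ranges differ between prompts, QWK would need to be computed with each prompt's own range before averaging.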
You can find detailed information about Task A registration here.
Detailed information about the dataset is here.