In TAQEEM2025, we propose two sub-tasks: Task A for Holistic Scoring and Task B for Trait-specific Scoring. We describe and formally define Task B below.
Task B is defined as follows:
Given a set of source prompts, the aim of Task B is to train a trait-specific scoring model using those prompts and then use it to evaluate essays written for an unseen target prompt. The model should assign an individual score for each of the seven traits: Relevance, Organization, Vocabulary, Style, Development, Mechanics, and Grammar, reflecting the essay's quality in each specific area.
The primary evaluation metric for this task is the Quadratic Weighted Kappa (QWK), a standard performance metric in automated essay scoring (AES) that quantifies agreement between human-assigned scores and system predictions. The Root Mean Squared Error (RMSE) will also be reported for a more comprehensive analysis of model performance.
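To make the two metrics concrete, here is a minimal sketch of how QWK and RMSE are commonly computed for one trait. It is not the official scoring script: the function names and the `min_score`/`max_score` parameters are assumptions for illustration, and the actual score range of each trait should be taken from the dataset description.

```python
import math


def quadratic_weighted_kappa(human, system, min_score, max_score):
    """QWK between two equal-length lists of integer scores.

    min_score and max_score are assumed bounds of the scoring scale.
    """
    n = max_score - min_score + 1
    if n < 2 or not human or len(human) != len(system):
        raise ValueError("need matching, non-empty score lists and at least 2 score levels")

    # Observed confusion matrix: rows are human scores, columns are system scores.
    observed = [[0] * n for _ in range(n)]
    for h, s in zip(human, system):
        observed[h - min_score][s - min_score] += 1

    total = len(human)
    hist_human = [sum(row) for row in observed]
    hist_system = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic disagreement penalty
            expected = hist_human[i] * hist_system[j] / total  # chance agreement
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    # Assumption: if both raters use a single identical score, treat it as perfect agreement.
    if weighted_expected == 0:
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


def rmse(human, system):
    """Root mean squared error between human and system scores."""
    squared = [(h - s) ** 2 for h, s in zip(human, system)]
    return math.sqrt(sum(squared) / len(squared))
```

QWK is 1 for perfect agreement, about 0 for chance-level agreement, and negative when the system disagrees with the human scores more than chance would. RMSE complements it by measuring the typical size of the scoring error on the original scale.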
For Task B, QWK will be averaged across the test prompts separately for each trait, and teams will be ranked according to the average of these per-trait QWK values over all seven traits.
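The ranking rule above can be sketched as a two-step average. This is an illustrative reading of the rule, not the organizers' code: the function name, the nested-dictionary layout, and the prompt identifiers are hypothetical, and the QWK values in the usage lines are placeholders rather than real results.

```python
from statistics import fmean

TRAITS = [
    "Relevance", "Organization", "Vocabulary", "Style",
    "Development", "Mechanics", "Grammar",
]


def ranking_score(qwk_by_prompt):
    """Return (per-trait average QWK, overall average used for ranking).

    qwk_by_prompt maps a test prompt id to a {trait: QWK} dictionary
    (a hypothetical layout chosen for this sketch).
    """
    # Step 1: average each trait's QWK across the test prompts.
    per_trait = {
        trait: fmean(scores[trait] for scores in qwk_by_prompt.values())
        for trait in TRAITS
    }
    # Step 2: average the per-trait values over all seven traits.
    overall = fmean(per_trait.values())
    return per_trait, overall


# Placeholder values for illustration only, not real system results.
example = {
    "prompt_1": {trait: 0.5 for trait in TRAITS},
    "prompt_2": {trait: 0.7 for trait in TRAITS},
}
per_trait_avg, overall_avg = ranking_score(example)
```

Averaging per trait first keeps the seven trait-level results available for separate reporting, while the single overall value gives one number to rank teams by.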
You can find detailed information about Task B registration here.
Detailed information about the dataset is here.