Task B: Trait-specific Scoring

In TAQEEM2025, we propose two sub-tasks: Task A is Holistic Scoring, and Task B is Trait-specific Scoring. We describe and formally define Task B below.

Task Definition

Task B is defined as follows:

Given a set of source prompts, Task B aims to train a trait-specific scoring model on essays written for those prompts and to use it to score essays written for an unseen target prompt. The model should assign an individual score for each of the seven traits (Relevance, Organization, Vocabulary, Style, Development, Mechanics, and Grammar), reflecting the essay's quality in each specific area.
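The cross-prompt setting can be pictured as a simple split: every essay from the source prompts is available for training, and all essays from the target prompt are held out. The sketch below is illustrative only; the `Essay` record, its field names, and `cross_prompt_split` are hypothetical and do not describe the official dataset format, which is documented on the dataset page.

```python
from dataclasses import dataclass, field

# The seven traits scored in Task B.
TRAITS = ("Relevance", "Organization", "Vocabulary", "Style",
          "Development", "Mechanics", "Grammar")


@dataclass
class Essay:
    """Hypothetical essay record; not the official dataset schema."""
    essay_id: str
    prompt_id: str
    text: str
    # Gold score per trait name; may be empty when scores are unavailable.
    trait_scores: dict = field(default_factory=dict)


def cross_prompt_split(essays, target_prompt):
    """Train on essays from all source prompts; test on the unseen target prompt."""
    train = [e for e in essays if e.prompt_id != target_prompt]
    test = [e for e in essays if e.prompt_id == target_prompt]
    return train, test
```

A model fitted on `train` would then predict one score per name in `TRAITS` for every essay in `test`, without ever having seen an essay written for the target prompt.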

Evaluation Measures

The primary evaluation metric for this task is the Quadratic Weighted Kappa (QWK), a standard AES performance metric that quantifies agreement between human-assigned scores and system predictions. The Root Mean Squared Error (RMSE) will also be reported for a more comprehensive analysis of model performance.
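For reference, a minimal pure-Python sketch of both measures follows. It assumes integer scores on a known range `[min_rating, max_rating]` for the trait being evaluated (the range itself is an assumption, not stated on this page), and the function names are hypothetical; the organizers' official scorer may treat score ranges or edge cases differently. QWK is computed as 1 minus the ratio of weighted observed disagreement to the weighted disagreement expected by chance, using quadratic weights (i - j)^2.

```python
import math


def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """QWK between two lists of integer scores on [min_rating, max_rating]."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty score lists of equal length")
    n = max_rating - min_rating + 1
    # Observed matrix O[i][j]: count of essays with human score i and predicted score j.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        if not (min_rating <= t <= max_rating and min_rating <= p <= max_rating):
            raise ValueError(f"score outside [{min_rating}, {max_rating}]")
        observed[t - min_rating][p - min_rating] += 1
    total = len(y_true)
    hist_true = [sum(row) for row in observed]                       # human-score marginal
    hist_pred = [sum(row[j] for row in observed) for j in range(n)]  # prediction marginal
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic weight; the usual 1/(n-1)^2 normalization cancels in the ratio below.
            w = (i - j) ** 2
            weighted_observed += w * observed[i][j]
            # Expected count under chance agreement (outer product of the marginals).
            weighted_expected += w * hist_true[i] * hist_pred[j] / total
    if weighted_expected == 0:
        # Degenerate case: both raters used one identical score; 1.0 by convention here.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


def rmse(y_true, y_pred):
    """Root mean squared error between human scores and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

Perfect agreement gives a QWK of 1.0, and fully reversed predictions (for example, human scores 1, 2, 3 against predictions 3, 2, 1) give -1.0. scikit-learn's `cohen_kappa_score(y_true, y_pred, labels=..., weights="quadratic")` should yield the same statistic when `labels` lists every possible score on the scale.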

For Task B, the QWK for each trait will be averaged separately across the test prompts, and teams will be ranked by the average of these per-trait QWK scores over all traits.
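The ranking rule amounts to a mean of means: average each trait's QWK over the test prompts, then average those per-trait values. The sketch below assumes a hypothetical nested-dictionary layout `{trait: {prompt_id: qwk}}` and that every trait is scored on every test prompt; how the official leaderboard handles ties or rounding is not specified here.

```python
from statistics import mean


def task_b_score(qwk_by_trait_and_prompt):
    """Return (per-trait average QWK, overall average) for one team (hypothetical layout)."""
    # Step 1: for each trait, average its QWK across the test prompts.
    per_trait = {
        trait: mean(list(by_prompt.values()))
        for trait, by_prompt in qwk_by_trait_and_prompt.items()
    }
    # Step 2: average the per-trait values; this is the ranking score.
    overall = mean(list(per_trait.values()))
    return per_trait, overall


def rank_teams(team_results):
    """team_results: {team: {trait: {prompt_id: qwk}}}; returns team names, best first."""
    overall = {team: task_b_score(res)[1] for team, res in team_results.items()}
    return sorted(overall, key=overall.get, reverse=True)
```

Because each trait's prompt-level scores are averaged before combining, every trait carries equal weight in the final ranking regardless of how many test prompts there are.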

Registration

You can find detailed information about Task B registration here.

Dataset

Detailed information about the dataset is here.