The dataset provided for MSLG-SPA 2026 is designed to support controlled, reproducible, and comparable research on bidirectional translation between Mexican Sign Language Glosses (MSLG) and Spanish (SPA). The corpus consists of aligned sentence-level pairs of:
Mexican Sign Language gloss sequences
Spanish text sentences
Each pair represents a semantically equivalent expression in both modalities, enabling participation in the two complementary subtasks of the shared task: Gloss-to-Spanish (MSLG2SPA) and Spanish-to-Gloss (SPA2MSLG).
The grammatical structure of MSL differs significantly from that of Spanish. MSL glosses do not follow Spanish word order, verbal inflection, or agreement patterns, and they often omit grammatical markers that must be reconstructed in spoken-language translation. As a result, translation between MSL glosses and Spanish involves substantial reordering, inference, and abstraction, rather than direct word-by-word mapping.
For example:
MSLG: AMÉRICA-YO VIVIR
SPA: Vivo en América.
MSLG: JUAN ASUNTO APARTE
SPA: Con Juan, es un asunto aparte.
MSLG: pro-TÚ LLEGAR TARDE POR QUÉ
SPA: ¿Por qué llegaste tarde?
These examples illustrate characteristic features of MSL glossing, including topic prominence, compact lexical expressions, and syntactic constructions that differ markedly from Spanish.
Note: Some linguistic resources for MSLG include auxiliary annotations (e.g., pro- to mark personal pronouns or other grammatical information). For the purposes of the MSLG-SPA 2026 shared task, all auxiliary annotations have been removed, and only simplified gloss sequences are provided in the dataset (e.g., TÚ LLEGAR TARDE POR QUÉ). This design choice ensures consistency across systems and focuses the evaluation on translation from incomplete and lossy gloss representations.
The dataset is divided into three disjoint subsets:
Training set (50%, 489 pairs): Used for model training in both subtasks. Participants are expected to perform their own internal validation (e.g., cross-validation or hold-out splits) on this set for model selection and hyperparameter tuning.
Test set MSLG2SPA (25%, 244 pairs): Used exclusively for the final evaluation of the Gloss-to-Spanish subtask.
Test set SPA2MSLG (25%, 244 pairs): Used exclusively for the final evaluation of the Spanish-to-Gloss subtask.
This partitioning ensures that both subtasks share the same training data while being evaluated on non-overlapping test sets, preventing data leakage and allowing an independent and fair assessment of each translation direction.
All data will be distributed exclusively for academic and research purposes. Any other use of the data is the sole responsibility of the user.
Submitted systems will be evaluated automatically using Machine Translation (MT) metrics selected according to the characteristics of each translation direction. Evaluation and ranking are performed separately for each subtask, and systems are compared only within the same subtask.
MSLG2SPA – Gloss-to-Spanish
Systems in the MSLG2SPA subtask are evaluated using the following metrics:
BLEU
METEOR
chrF
COMET
Since this subtask produces well-formed Spanish sentences, COMET is applied to capture adequacy and fluency using pretrained language models, complementing traditional n-gram–based metrics.
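Purely as an illustration of how such metrics are typically computed (this is not the official evaluation script; the use of the sacrebleu library and the toy sentences are our own assumptions), a corpus-level BLEU/chrF computation might look as follows. METEOR and COMET would be computed with their own toolkits.

    # Illustration only (not the official scorer): corpus-level BLEU and chrF
    # with the sacrebleu library; METEOR and COMET come from other toolkits.
    import sacrebleu

    hypotheses = ["Vivo en América.", "¿Por qué llegaste tarde?"]   # system outputs (example)
    references = ["Vivo en América.", "¿Por qué llegaste tarde?"]   # gold Spanish sentences (example)

    bleu = sacrebleu.corpus_bleu(hypotheses, [references])          # expects a list of reference streams
    chrf = sacrebleu.corpus_chrf(hypotheses, [references])
    print(f"BLEU = {bleu.score:.2f}, chrF = {chrf.score:.2f}")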
For this subtask, a subtask-specific Global Score is computed for each system. Scores for each metric are first standardized using z-score normalization across all submitted systems. The Global Score is then obtained as the arithmetic mean of the standardized metric values. Systems are ranked in descending order of this Global Score.
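The sketch below illustrates this Global Score computation; the dictionary layout, function name, and use of the population standard deviation are our own assumptions, not part of the official tooling.

    # Sketch of the subtask Global Score: z-score normalization of each metric
    # across all submitted systems, then the arithmetic mean of the standardized values.
    from statistics import mean, pstdev

    def global_scores(metric_table):
        # metric_table: {system_name: {metric_name: raw_score}} (illustrative layout)
        metrics = list(next(iter(metric_table.values())).keys())
        z = {s: [] for s in metric_table}
        for m in metrics:
            values = [metric_table[s][m] for s in metric_table]
            mu, sigma = mean(values), pstdev(values)
            for s in metric_table:
                z[s].append((metric_table[s][m] - mu) / sigma if sigma else 0.0)
        # Global Score = mean of the standardized metric values for each system.
        return {s: mean(z[s]) for s in metric_table}

Ranking a subtask then amounts to sorting systems by this value in descending order; the same procedure applies to SPA2MSLG with the COMET column omitted.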
SPA2MSLG – Spanish-to-Gloss
Systems in the SPA2MSLG subtask are evaluated using the following metrics:
BLEU
METEOR
chrF
COMET is not applied in this subtask, as MSL gloss sequences do not constitute a natural language and do not support fluency-based evaluation.
As in the MSLG2SPA subtask, metric scores are standardized using z-score normalization across all systems submitted to SPA2MSLG. A subtask-specific Global Score is computed as the arithmetic mean of the standardized BLEU, METEOR, and chrF scores. Systems are ranked in descending order of this Global Score.
This approach avoids distortions caused by naïve rank aggregation and preserves meaningful score differences between systems.
Overall Ranking
An additional overall leaderboard is produced for systems that participate in both subtasks. The overall score is computed as the average of the Global Scores obtained in MSLG2SPA and SPA2MSLG.
The overall ranking highlights systems that achieve balanced and robust performance across both translation directions, while preserving the independence and methodological integrity of the subtask-specific evaluations.
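Continuing the sketch above (the global_scores helper is our own illustration), the overall leaderboard entry for a team that submitted to both subtasks reduces to a simple average:

    # Sketch: overall score = mean of the two subtask Global Scores.
    def overall_score(gs_mslg2spa, gs_spa2mslg, team):
        # gs_* are the per-system Global Score dictionaries from each subtask.
        return (gs_mslg2spa[team] + gs_spa2mslg[team]) / 2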
Participants must submit their system outputs according to the official format described below. Submissions must be sent by email to the official contact address: ansel@cicese.edu.mx
We recommend using the following email subject: MSLG-SPA 2026 Submission – TeamName
Each team may submit more than one solution (run) for each subtask.
For each subtask, systems must generate one plain text (.txt) file containing the translations for the corresponding test set.
One file per subtask must be submitted:
TeamName_SolutionName_MSLG2SPA.txt for the Gloss-to-Spanish subtask, and/or
TeamName_SolutionName_SPA2MSLG.txt for the Spanish-to-Gloss subtask.
Each file must contain one line per test instance.
Lines must appear in the same order as the instances in the official test file.
No additional headers or comments are allowed.
Each line in the file must follow this format:
"SystemOutput"\n
Optionally, teams may include the instance identifier as a verification mechanism to help detect ordering or alignment errors during local development:
"InstanceIdentifier"\t"SystemOutput"\n
When included, the instance identifier will be ignored during evaluation and will be used only to assist participants in validating their submissions.
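As a reference for producing such a file, here is a minimal sketch, assuming the system outputs are collected in a Python list; the variable names, example sentences, and the optional identifier list are illustrative:

    # Sketch: one quoted output per line, Linux newlines; optionally prepend
    # the quoted instance identifier followed by a tab.
    outputs = ["Vivo en América.", "¿Por qué llegaste tarde?"]   # system outputs (example)
    identifiers = None                                           # or e.g. ["id_001", "id_002"]

    with open("TeamName_SolutionName_MSLG2SPA.txt", "w", encoding="utf-8", newline="\n") as f:
        for i, out in enumerate(outputs):
            if identifiers is not None:
                f.write(f'"{identifiers[i]}"\t"{out}"\n')
            else:
                f.write(f'"{out}"\n')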
It is important to strictly respect the format, including quotation marks ("), tab separators (\t, if used), and the newline character (\n, Linux format). Submissions that do not comply with the required format may not be evaluated.
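A quick local check along the following lines (the regular expression and the expected line count are our own sketch, not an official validator) can help catch formatting or ordering problems before submission:

    # Sketch: verify line count and per-line format of a submission file.
    import re

    # Either "SystemOutput" or "InstanceIdentifier"<TAB>"SystemOutput" per line.
    LINE_RE = re.compile(r'^"[^\t]*"(\t"[^\t]*")?$')
    EXPECTED_LINES = 244   # size of the corresponding official test set

    with open("TeamName_SolutionName_MSLG2SPA.txt", encoding="utf-8") as f:
        lines = f.read().splitlines()

    assert len(lines) == EXPECTED_LINES, f"expected {EXPECTED_LINES} lines, got {len(lines)}"
    for n, line in enumerate(lines, start=1):
        assert LINE_RE.match(line), f"line {n} does not match the required format"
    print("Submission file appears well formed.")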
File names must follow the convention described above. Submissions received after the official deadline may not be considered.