Overview

Cruciverb-IT, proposed at EVALITA 2026, is the first shared task on crossword puzzle solving.

We propose two tasks: i) answering clues extracted from Italian crosswords; ii) autonomously solving Italian crossword grids.

News

02/10/2025 - Minor updates on the baseline approach.

01/10/2025 - Check the dates for the evaluation window here.

22/09/2025 - Training data available here.

01/09/2025 - We are online!

Motivation

Language games have emerged as valuable testbeds for evaluating and enhancing the reasoning abilities of Language Models (LMs). Among these, crossword puzzles represent a particularly challenging and multifaceted task that requires not only linguistic competence but also cultural knowledge, lateral thinking, and the ability to interpret ambiguous or polysemous clues (Wallace et al., 2022; Rozner et al., 2021; Saha et al., 2025; Sadallah et al., 2025). Solving crosswords involves complex semantic and pragmatic reasoning, making this setting ideal for testing models’ deeper language understanding capabilities beyond surface-level similarity.

Before the advent of modern LMs, most approaches to crossword solving relied on retrieval-based methods and shallow lexical and semantic features (Ernandes et al., 2005; Angelini et al., 2005). For example, Barlacchi et al. (2014) proposed a system that exploited lexical resources and similarity metrics to match clues to candidate answers in Italian, while SACRY (Moschitti et al., 2015) incorporated syntactic information and ranking strategies to improve clue-answer matching. However, these systems typically struggle with clues that require deeper interpretative reasoning, such as wordplay, anagrams, or polysemous expressions. Consider, for instance, the clue “Producono con procedimenti lenti” (literally, “They produce with slow procedures”), where “lenti” can mean both “slow” and “lenses” in Italian; a viable answer could be “ottici” (opticians), illustrating the type of ambiguity that traditional systems often fail to resolve.

Despite the impressive advancements in Large Language Models (LLMs), their performance on language games such as crosswords remains limited, especially in morphologically rich and less-resourced languages like Italian (Sarti et al., 2024a; Sarti et al., 2024b). Existing LMs and retrieval-based systems still fall short when faced with clues requiring subtle reasoning or cultural grounding.

This shared task aims to encourage research in this direction by providing a challenging testbed for developing and evaluating systems focused on crossword puzzle solving.

Contact the organizers:

cruciverbit.evalita2026@gmail.com