The training/development data for both subtasks will be made available on September 22.
A sample of the training data will be made available earlier; please check back on this page in the next few days.
The dataset for both subtasks was collected using a Sparse Autoencoder (SAE) trained on Minerva-1B-base-v1.0.
Below we detail the main steps included in the data collection and annotation process.
The SAE model was trained on the residual stream of Minerva-1B, using the Sparsify library. It is a k-Sparse Autoencoder with a top-k activation function and an expansion factor of 32.
To train the model, the "tiny" portion of Clean Italian MC4 was used, amounting to roughly 6B tokens.
Please check our paper for more details on the training of the SAE. The paper will be presented at CLiC-it 2025.
The SAE model is available on HuggingFace.
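As a rough illustration of this architecture (a minimal sketch, not the Sparsify implementation; the value of k, the dimensions, and all names below are assumptions for demonstration only), a top-k sparse autoencoder over the residual stream can be written as:

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    """Minimal sketch of a k-sparse autoencoder (illustrative, not the Sparsify code)."""

    def __init__(self, d_model: int, expansion: int = 32, k: int = 32):
        super().__init__()
        # Latent dimension = expansion factor * residual-stream width (32x in the task setup).
        d_latent = expansion * d_model
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)
        self.k = k  # number of latents kept active per token (placeholder value)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # Encode, then keep only the k largest pre-activations per token (top-k activation).
        pre = self.encoder(x)
        topk = torch.topk(pre, self.k, dim=-1)
        latents = torch.zeros_like(pre)
        latents.scatter_(-1, topk.indices, torch.relu(topk.values))
        # Reconstruct the residual-stream vector from the sparse latent code.
        recon = self.decoder(latents)
        return latents, recon
```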
We collect latent (feature) activations from Minerva-1B using the SAE, considering only Layer 14 of the model. We chose a layer near the end of the model stack because initial evaluations showed more "semantic" features in later layers.
We collect activations by passing data from the Italian split of Wikipedia through the model, using the Delphi library. For each latent, we collect all tokens that activate it, their surrounding contexts, and the strength of the activation.
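Conceptually, this step amounts to encoding the Layer 14 residual stream with the SAE and recording which latents fire on which tokens. The sketch below only illustrates the idea; it is not the Delphi pipeline, and the helper names (`get_residual`, `tokenizer`, `sae`) are assumptions:

```python
from collections import defaultdict

def collect_activations(texts, tokenizer, get_residual, sae):
    """Toy illustration of latent-activation collection (not the Delphi pipeline).

    get_residual(text) is assumed to return the Layer 14 residual-stream vectors
    for each token of `text`; `sae` is the trained sparse autoencoder.
    """
    records = defaultdict(list)  # latent index -> list of (token, context, strength)
    for text in texts:
        tokens = tokenizer.tokenize(text)
        latents, _ = sae(get_residual(text))  # shape: [num_tokens, num_latents]
        for pos, token in enumerate(tokens):
            for latent_idx in latents[pos].nonzero().flatten().tolist():
                strength = latents[pos, latent_idx].item()
                records[latent_idx].append((token, text, strength))
    return records
```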
Explanations for latents were obtained semi-automatically. This is also reflected in the setup of the training set for both subtasks.
The core of the explanations was obtained using GPT-5.
Specifically, we provided GPT-5 with examples of activations in context, where activating words were highlighted between "<<" and ">>"; in addition, a list of (word, activation strength) pairs was also provided to the model. We prompted GPT-5 to "analyze text and provide an explanation that thoroughly encapsulates possible patterns found in it", by looking at the activating words; the model was also given a few examples in the prompt.
Part of the explanations provided by GPT-5 was then manually revised and corrected by us.
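For reference, the "<<" / ">>" highlighting used in the prompts (and in the released examples, see below) can be reproduced roughly as follows; this is a simplified sketch, not the script actually used for data preparation:

```python
def highlight(tokens, activations):
    """Wrap activating tokens in << >>, merging contiguous activating spans.

    `activations` has one value per token; for this sketch any non-zero value
    is treated as an activation.
    """
    out, i = [], 0
    while i < len(tokens):
        if activations[i] > 0:
            span = []
            while i < len(tokens) and activations[i] > 0:
                span.append(tokens[i])
                i += 1
            out.append("<<" + "".join(span) + ">>")
        else:
            out.append(tokens[i])
            i += 1
    return "".join(out)

# Purely illustrative tokens and activation values:
# highlight(["Il", " gatto", " dorme", " sul", " divano"], [0, 0, 0, 4, 9])
# -> "Il gatto dorme<< sul divano>>"
```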
For both subtasks, we provide three different splits:
📚🥇TRAIN-GOLD: Smaller set of training examples, with explanations manually annotated by organisers. It will include a few hundred training examples - Release: September 22.
📚🥈TRAIN-SILVER: Larger set of training examples, with explanations provided by GPT-5. It will include a few thousand training examples - Release: September 22.
📝 TEST: Set of test examples with explanations manually annotated by organisers. It will include a few hundred test examples - Release: Evaluation window (see Important Dates).
The data for Subtask 1 will be provided as a single JSON file for each split (TRAIN-GOLD, TRAIN-SILVER, and TEST). Each item in the split has the following fields:
Latent ID [str]: the ID of the latent. For example, "layers.14_latent8" for the eighth latent of layer 14.
examples [list]: a list of examples of activations for the latent. The number of examples per latent varies, but on average each latent will have around 40 examples. Each example is a dictionary with the following fields:
text [str]: the text of the example, with activating tokens highlighted between "<<" and ">>". Note that if two or more contiguous tokens activate the latent, they are kept together, e.g., << like this>>.
tokens [list]: list of tokens (strings) in the example, as tokenized by the original Minerva-1B-base-v1.0 model
activations [list]: list of activating tokens found in the example. Each is a dictionary with the following keys:
token [str]: the activating token
strength [int]: strength of activation for the token, normalized to the range [0, 10]
explanation [str]: the plain text explanation for the latent. For TRAIN-GOLD, the explanation is manually annotated; for TRAIN-SILVER, the explanation is generated by GPT-5; for TEST, the explanation is left blank.
Here is an example:
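(The item below is illustrative only: the latent ID format follows the description above, but the texts, values, and exact key spellings are invented to show the structure and are not taken from the released data.)

```json
{
  "latent_id": "layers.14_latent8",
  "examples": [
    {
      "text": "Il gatto dorme<< sul divano>> tutto il giorno.",
      "tokens": ["Il", " gatto", " dorme", " sul", " divano", " tutto", " il", " giorno", "."],
      "activations": [
        {"token": " sul", "strength": 4},
        {"token": " divano", "strength": 9}
      ]
    }
  ],
  "explanation": "References to household furniture and places where one rests."
}
```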
For Subtask 1, participants must provide a single explanation for each latent.
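As a minimal sketch of how such a split could be read and inspected (the file name and the exact key spellings below are assumptions until the data is released):

```python
import json

# Placeholder file name: the actual file names will be announced with the release.
with open("subtask1_train_gold.json", encoding="utf-8") as f:
    latents = json.load(f)

for item in latents:
    # One explanation is expected per latent; the TRAIN splits already contain it.
    print(item["latent_id"], len(item["examples"]), "examples ->", item["explanation"])
```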
The data for Subtask 2 will be provided as a single JSON file for each split (TRAIN-GOLD, TRAIN-SILVER, and TEST). Each item in the split has the following fields:
Latent ID [str]: the ID of the latent. For example, "layers.14_latent8" for the eighth latent of layer 14.
explanation [str]: the plain text explanation for the latent. For TRAIN-GOLD, the explanation is manually annotated; for TRAIN-SILVER, the explanation is generated by GPT-5; for TEST, the explanation is left blank.
examples [list]: a list of examples, both positive and negative, of sentences (and tokens) that do or do not activate the latent. The number of examples per latent is around 100, equally divided between activating and non-activating. Each example is a dictionary with the following fields:
text [str]: the text of the example.
tokens [list]: list of tokens (strings) in the example, as tokenized by Minerva-1B-base-v1.0.
activations [list]: list of activations, with one value for each token. A zero corresponds to no activation; a value greater than zero corresponds to an activation. For the test set, activations will be an empty list.
activating [bool]: True if the example contains tokens that activate the latent, False otherwise. For the test set, the label will remain hidden.
Here is an example:
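(Again, the item below is illustrative only: the texts, values, and exact key spellings are invented to show the structure and are not taken from the released data.)

```json
{
  "latent_id": "layers.14_latent8",
  "explanation": "References to household furniture and places where one rests.",
  "examples": [
    {
      "text": "Il gatto dorme sul divano tutto il giorno.",
      "tokens": ["Il", " gatto", " dorme", " sul", " divano", " tutto", " il", " giorno", "."],
      "activations": [0, 0, 0, 4, 9, 0, 0, 0, 0],
      "activating": true
    },
    {
      "text": "La riunione è stata rimandata a lunedì.",
      "tokens": ["La", " riunione", " è", " stata", " rimandata", " a", " lunedì", "."],
      "activations": [0, 0, 0, 0, 0, 0, 0, 0],
      "activating": false
    }
  ]
}
```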
For Subtask 2, participants must provide a prediction for each of the examples, for each latent.
Note that the test set for Subtask 2 will be formatted slightly differently. Specifically, we will provide participants with <explanation, example> pairs; the system will have to classify whether the example activates the latent described by the explanation. This means that the test set will include multiple data points for each explanation/latent.
More information on the test set will be available upon release of the data.