Baselines

These are the baselines for the evaluation phase of SemEval-2022 Task 2.

Please note that the baselines we provide are particularly strong benchmarks built on a powerful pre-trained language model (multilingual BERT, mBERT), unlike typical baselines, which tend to be weak. The methods used to generate these baselines are based on results from the paper AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models (Findings of EMNLP 2021).

Subtask A (Zero-shot)

F1 (Macro) Score: 0.6540

Language Breakdown: EN: 0.7070, PT: 0.6803, GL: 0.5065

Method

In the zero-shot setting, we train Multilingual BERT using the context (the sentences preceding and succeeding the one containing the idiom). We do not add the idiom as an additional feature (in the “second input sentence”).
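Below is a minimal sketch of how such a classifier could be assembled, assuming the Hugging Face transformers library. The helper name `encode_zero_shot`, the label count, and the sequence length are illustrative assumptions, not the organisers' exact configuration.

```python
# Sketch only: mBERT as a binary sequence classifier over the context window.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=2,  # idiomatic vs. non-idiomatic usage
)

def encode_zero_shot(previous, target, following):
    # The full context window is concatenated into a single sequence; the
    # idiom itself is deliberately NOT passed as a second input sentence.
    text = " ".join([previous, target, following])
    return tokenizer(text, truncation=True, max_length=256, return_tensors="pt")
```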

Subtask A (Few-shot)

F1 (Macro) Score: 0.8646

Language Breakdown: EN: 0.8862, PT: 0.8637, GL: 0.8162

Method

In the one-shot setting, Multilingual BERT is trained on both the zero-shot and one-shot data. In this setting, we exclude the context (the sentences preceding and succeeding the one containing the idiom) and instead add the idiom as an additional feature in the “second sentence”.
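For contrast with the zero-shot setting, here is a sketch of the one-shot input construction under the same assumptions; `encode_one_shot` and the example sentence are illustrative:

```python
# Sketch only: context is dropped; the idiom becomes the "second sentence",
# so the model sees [CLS] target sentence [SEP] idiom [SEP].
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

def encode_one_shot(target, idiom):
    return tokenizer(target, idiom, truncation=True, max_length=128,
                     return_tensors="pt")

batch = encode_one_shot("He was a big fish in a small pond.", "big fish")
```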

Subtask B (Pre-Train)

Spearman Rank: 0.4810 (Idiom Only: 0.2263, STS: 0.8311)

Language Breakdown:

  • EN: 0.5958 (Idiom Only: 0.2488, STS: 0.8300)

  • PT: 0.5584 (Idiom Only: 0.2761, STS: 0.7745)

  • GL: 0.1976 (Idiom Only: 0.1976, STS: N/A)
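For reference, a Spearman rank score like the ones above is the rank correlation between gold and predicted similarity scores; this sketch assumes scipy and uses made-up values:

```python
from scipy.stats import spearmanr

gold = [0.90, 0.10, 0.55, 0.70]  # illustrative gold similarity scores
pred = [0.80, 0.25, 0.40, 0.95]  # illustrative model predictions
score, _ = spearmanr(gold, pred)
print(f"Spearman rank correlation: {score:.4f}")
```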

Method

The methodology used for the pre-train-only setting involves introducing a new token for each MWE into multilingual BERT (mBERT). Note that we do not actually continue pre-training our models, so these embeddings remain random. We then create a sentence-transformer model from this mBERT model (with the MWE tokens added) and train it on English and Portuguese STS data. Despite its simplicity, this method has been shown (in the paper above) to be an effective way of ensuring that compositionality is “broken”.
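The token-addition and STS-training steps might look as follows, assuming the sentence-transformers library; the MWE token format (e.g. `ID_big_fish`) and the training pair are illustrative placeholders, not the organisers' exact data or conventions:

```python
# Sketch only: add one token per MWE to mBERT, then train a sentence
# transformer on STS pairs. The new embeddings are never pre-trained,
# so they stay at their random initialisation.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, InputExample, losses

word_model = models.Transformer("bert-base-multilingual-cased", max_seq_length=256)
word_model.tokenizer.add_tokens(["ID_big_fish", "ID_elbow_grease"])  # one per MWE
word_model.auto_model.resize_token_embeddings(len(word_model.tokenizer))

pooling = models.Pooling(word_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_model, pooling])

# English and Portuguese STS pairs, with similarity assumed scaled to [0, 1].
train_examples = [InputExample(texts=["a sentence", "another sentence"], label=0.8)]
loader = DataLoader(train_examples, shuffle=True, batch_size=16)
model.fit(train_objectives=[(loader, losses.CosineSimilarityLoss(model))], epochs=1)
```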

Subtask B (Fine-Tune)

Spearman Rank: 0.5951 (Idiom Only: 0.3990, STS: 0.5961)

Language Breakdown:

  • EN: 0.6684 (Idiom Only: 0.4109, STS: 0.6210)

  • PT: 0.6026 (Idiom Only: 0.4090, STS: 0.5523)

  • GL: 0.3842 (Idiom Only: 0.3842, STS: N/A)

Method

The methodology used for the fine-tune setting is very similar to the pre-train setting: we start by introducing a new token for each MWE into multilingual BERT (mBERT). We then create a sentence-transformer model from this mBERT model (with the MWE tokens added) and train it on the fine-tune data. Note that, unlike in the pre-train setting, we do not train this sentence-transformer model on the STS data.
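Under the same assumptions as the pre-train sketch above, the only change is the training data: the sentence transformer is fitted on the task's fine-tune pairs rather than on STS. A brief illustrative sketch:

```python
# Sketch only: same construction as the pre-train setting, but trained on
# the fine-tune data instead of STS.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, InputExample, losses

word_model = models.Transformer("bert-base-multilingual-cased")
word_model.tokenizer.add_tokens(["ID_big_fish"])  # one token per MWE (illustrative)
word_model.auto_model.resize_token_embeddings(len(word_model.tokenizer))
pooling = models.Pooling(word_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_model, pooling])

# fine_tune_pairs stands in for the task's fine-tune data (dummy example here).
fine_tune_pairs = [InputExample(texts=["a sentence", "its paraphrase"], label=0.9)]
loader = DataLoader(fine_tune_pairs, shuffle=True, batch_size=16)
model.fit(train_objectives=[(loader, losses.CosineSimilarityLoss(model))], epochs=1)
```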