March - October 2022
PAR-MEX:
Paraphrase Identification in Mexican Spanish
A new evaluation period using updated datasets is open until May 6th, 2022, 00:00 UTC.
Introduction
Text similarity has attracted great interest in Natural Language Processing (NLP) because of its many applications, such as plagiarism detection and machine translation. Paraphrase is helpful for this line of research because it consists of expressing an idea in a different way, which allows researchers to study the possibilities of language. These variations can be found at any linguistic level, from morphology to pragmatics.
Paraphrasing effectively, whether by changing a single word or by complete reformulation, requires more than changing the structure of a sentence, since structural change alone does not cover the full extent of paraphrase. An idea can be expressed differently by changing some words (e.g., substituting synonyms) or by reordering its elements. Moreover, a paraphrase need not use any of the words in the original sentence, as long as it keeps the main concept. Conversely, two texts can use the same words and yet be unrelated in meaning. Therefore, paraphrasing is not just a matter of replacing elements or changing their structure. Unfortunately, current paraphrase tasks address only this last aspect.
For the first edition of the Paraphrase Identification in Mexican Spanish (PAR-MEX) task, we propose a sentence-level paraphrase identification track. It follows the traditional paraphrase identification scenario, where the aim is to decide whether one sentence is a paraphrase of another. The challenge lies in distinguishing both high-level and low-level paraphrases from sentence pairs that merely share lexical overlap.
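To illustrate why lexical overlap alone is a weak signal, the following is a minimal sketch of a word-overlap score (Jaccard similarity over lowercased tokens). It is not part of the task's official baseline or evaluation; the function names `tokens` and `jaccard` and both sentence pairs are hypothetical examples written for this illustration, not drawn from the PAR-MEX datasets.

```python
import re


def tokens(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    # \w matches Unicode letters in Python 3, so accented Spanish
    # characters such as "ñ" and "ó" stay inside their words.
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity between the token sets of two sentences."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0  # convention chosen here for two empty inputs
    return len(ta & tb) / len(ta | tb)


# Hypothetical pair 1: same words, different meaning (not a paraphrase).
same_words = ("El perro mordió al hombre", "El hombre mordió al perro")

# Hypothetical pair 2: a paraphrase that shares almost no words.
paraphrase = ("El niño rompió la ventana",
              "El cristal fue destrozado por el chico")

print(jaccard(*same_words))   # 1.0
print(jaccard(*paraphrase))   # 0.1
```

Under this score the unrelated pair looks identical while the genuine paraphrase looks dissimilar, which is the kind of case the track asks systems to handle.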