CogALex-V Shared Task on the Corpus-Based Identification of Semantic Relations
Discovering whether words are semantically related – and, if so, which kind of relation holds between them – is an important task in Natural Language Processing (NLP) with a wide range of applications, such as Automatic Thesaurus Creation, Ontology Learning, Paraphrase Generation, etc. Semantic relations also play a central role in human lexical retrieval and may thus shed light on the organization of the mental lexicon. Corpus-based approaches to semantic relation identification promise an efficient and scalable solution to the NLP task. At the same time, they may provide a cognitively plausible model for human acquisition of semantic relations.
As part of the 5th CogALex workshop at COLING 2016, we propose a shared task on the corpus-based identification of semantic relations in the form of a “friendly competition”. Its aim is not to find the team with the best system, but to test different distributional models and other corpus-based approaches on a hard semantic task, and thus gain a better understanding of their respective strengths and weaknesses. For this reason, both training and test data will be made available. Participants are expected to submit a short paper (4 pages) describing their approach and evaluation results (using the official scoring scripts), together with the output produced by their system on the test data.
The task is split into two subtasks, which should both be tackled by participating systems.
Subtask 1: For each word pair (e.g. dog–fruit), decide whether the terms are semantically related (TRUE) or not (FALSE). Given a TAB-separated input file with word pairs, participating systems must add a third column specifying whether the two words are semantically related or not. The output should be a TAB-separated file like the example below, but without a header row (a gold standard file with correct answers is provided in the same format). Note that the word pairs must appear in exactly the same order as in the input file. This subtask is evaluated in terms of precision, recall and F1-score for the identification of related word pairs.
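W1      W2      RELATED
dog     fruit   FALSE
cat     animal  TRUE

(The header row is shown for readability only; its labels are illustrative and must be omitted from submitted files. Columns are separated by TAB characters.)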
Subtask 2: For each word pair (e.g. cat–animal), decide which of the following semantic relations (if any) holds between the two words:
Synonymy (SYN): W2 can be used with the same meaning as W1
Antonymy (ANT): W2 can be used as the opposite of W1
Hypernymy (HYPER): W1 is a kind of W2
Meronymy (PART_OF): W1 is a part of W2
Random (RANDOM): there is no semantic relation between W1 and W2
The input file is the same as for subtask 1. Participating systems are expected to return a TAB-separated file like the example below (but without a header row), where each word pair is annotated with one relation label from the list above (a gold standard file with correct answers is provided in the same format). Word pairs must appear in exactly the same order as in the input file. This subtask is evaluated in terms of precision, recall and F1-score for each of the four semantic relations. The overall score is the weighted average of the four F1-scores.
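W1      W2      RELATION
cat     animal  HYPER
dog     fruit   RANDOM

(As in subtask 1, the header row is illustrative and must be omitted from submitted files; columns are TAB-separated.)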
Rules for the Shared Task
Participating systems must be corpus-based and should not make use of existing knowledge bases or semantic networks. In particular, systems must not make use of WordNet (Fellbaum 1998) or ConceptNet (Liu & Singh 2004), because the training and test data are derived from EVALution 1.0 (Santus et al. 2015), which is in turn based on WordNet and ConceptNet.
There are no restrictions on the corpus data used.
Participants may use the training data for supervised machine learning or may submit unsupervised systems that have been developed or tuned on the training data. No additional training data may be used. For example, it is okay to use purely handwritten knowledge patterns for relation mining (see the illustrative sketch at the end of this section) or to learn knowledge patterns from the CogALex-V training data; it is not okay to bootstrap knowledge patterns from a different set of seed terms.
Systems must be developed and tuned purely on the training data provided. Only a single final system for each participant may be evaluated on the test data and reported as an official score in the submitted paper. Participants are encouraged to carry out additional post-hoc experiments on the test data, which may also be included in the paper.
Participants must submit a complete paper with evaluation scores computed by the official scoring scripts, as well as the output of their final system on the test data. We will check that the system output is consistent with the scores given in the paper.
We intend to make the submitted outputs of all participating systems available on the shared task homepage in order to encourage meta-analysis and a discussion on the difficulty of individual word pairs.
We encourage participants to make use of the metadata provided with EVALution for the performance analysis of their systems. However, these data must not be used for training the systems. The metadata contain useful information such as frequency, possible and most frequent POS, possible and most common capitalization types and inflections, semantic domain, etc. More details can be found in the README.txt file in the data package.
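To illustrate the rule on knowledge patterns above, here is a minimal Python sketch of what a purely handwritten pattern for hypernymy mining might look like. The pattern wording, the function names and the one-sentence-per-line corpus file are illustrative assumptions, not part of the task definition:

    import re

    def hypernymy_pattern(w1, w2):
        # Handwritten Hearst-style pattern "W1 is a kind of W2" -- allowed
        # by the rules above, unlike patterns bootstrapped from a different
        # set of seed terms.
        return re.compile(
            r"\b%ss?\s+(?:is|are)\s+a\s+(?:kind|type)\s+of\s+%ss?\b"
            % (re.escape(w1), re.escape(w2)),
            re.IGNORECASE,
        )

    def count_matches(pattern, corpus_path):
        # Count lines of a plain-text corpus (hypothetical file, one
        # sentence per line) in which the pattern occurs.
        with open(corpus_path, encoding="utf-8") as corpus:
            return sum(1 for line in corpus if pattern.search(line))

    # Example: corpus evidence for the pair cat-animal (HYPER).
    # count_matches(hypernymy_pattern("cat", "animal"), "corpus.txt")

Such a pattern may be written and refined by hand on the training data; only bootstrapping from external seed terms is ruled out.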
Submission procedure & formatting instructions
System descriptions may have a maximum length of 4 pages + 1 page for references and must comply with the COLING 2016 style. In contrast to the official style, we do not require anonymous submission.
The paper title must have the form “CogALex-V Shared Task: <System ID>[ – <Subtitle>]”, where <System ID> is a short name chosen by each team for their submitted system and <Subtitle> is optional.
The submitted paper must be accompanied by the system output for the test data (in the format required by the official evaluation script).
The paper must report evaluation scores calculated with the official evaluation script (which is included in the test set download) after training and development purely on the training data (see rules above).
Teams should also join the Google group for important updates and discussion concerning the shared task: https://groups.google.com/d/forum/cogalexv. Please also submit an expression of interest by e-mail (to esantus@gmail.com) if you haven't done so yet.
Participants may be asked to review the papers submitted by other teams.
Please submit your papers + system output through the main workshop submission page, following the instructions above and setting the paper type to Shared Task (4 pages): https://www.softconf.com/coling2016/CogALex-V/
Submission deadline: Sunday, 16 Oct 2016, 23:59 GMT
Dataset and Evaluation
We provide a dataset extracted from EVALution 1.0 (Santus et al. 2015), which was developed from WordNet (Fellbaum 1998) and ConceptNet (Liu and Singh 2004), and which was further filtered by native speakers in a CrowdFlower task. Our dataset is split into a training set (released on 8 September 2016) and a test set (released on 27 September 2016). Official evaluation scripts (computing precision, recall and F1-score) will be released together with the test data.
EVALution is a heterogeneous and unbalanced dataset that replicates real-world difficulties. Words in the dataset are not POS-tagged and fall into different frequency ranges. Moreover, words may be used in different senses (even with different POS) within the dataset, so that they may hold different relations depending on the respective sense. In the CrowdFlower task, relations were judged according to the paraphrases (e.g. “X is a kind of Y”) shown in the subtask descriptions above.
Training data: CogALexV_train_v1.zip (3054 word pairs for 318 target words)
Test data and evaluation script: CogALexV_test_v1.zip (4260 word pairs for 429 target words)
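Scores reported in papers must come from the official evaluation script (see the rules above). For development purposes only, the following Python sketch approximates the subtask 2 scoring described earlier: per-relation precision, recall and F1-score, with RANDOM excluded, and an overall score computed as the average of the four F1-scores weighted by their frequencies in the gold standard. The weighting scheme and the file handling are our assumptions and are not guaranteed to match the official script exactly:

    from collections import Counter

    RELATIONS = ["SYN", "ANT", "HYPER", "PART_OF"]  # RANDOM is not scored

    def read_pairs(path):
        # Read a TAB-separated file with lines: W1 <TAB> W2 <TAB> LABEL.
        with open(path, encoding="utf-8") as f:
            return [line.rstrip("\n").split("\t") for line in f if line.strip()]

    def score_subtask2(gold_path, system_path):
        gold, system = read_pairs(gold_path), read_pairs(system_path)
        scores, counts = {}, Counter(g[2] for g in gold)
        for rel in RELATIONS:
            tp = sum(g[2] == rel and s[2] == rel for g, s in zip(gold, system))
            fp = sum(g[2] != rel and s[2] == rel for g, s in zip(gold, system))
            fn = sum(g[2] == rel and s[2] != rel for g, s in zip(gold, system))
            p = tp / (tp + fp) if tp + fp else 0.0
            r = tp / (tp + fn) if tp + fn else 0.0
            scores[rel] = 2 * p * r / (p + r) if p + r else 0.0
        # Overall score: per-relation F1, weighted by gold frequency
        # (our assumption about the official weighting).
        total = sum(counts[rel] for rel in RELATIONS)
        overall = sum(scores[rel] * counts[rel] for rel in RELATIONS) / total
        return scores, overall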
Please cite the task description paper if you evaluate a system on this dataset:
Enrico Santus, Anna Gladkova, Stefan Evert and Alessandro Lenci. 2016. The CogALex-V Shared Task on the Corpus-Based Identification of Semantic Relations. In Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex-V), pages 69–79, Osaka, Japan.
Important Dates
8 Sep 2016: Task description and training data released
until 26 Sep 2016: Expression of interest (e-mail to esantus@gmail.com)
27 Sep 2016: Test data and evaluation scripts released
16 Oct 2016: Submission of system description papers (4+1 pages) and system output
25 Oct 2016: Reviews returned
2 Nov 2016: Camera-ready deadline (strict!)
12 Dec 2016: Workshop (at COLING 2016, Osaka)
Organizers
Stefan Evert – stefan.evert@fau.de
Alessandro Lenci – alessandro.lenci@unipi.it
Enrico Santus – esantus@gmail.com
References
Christiane Fellbaum (ed.). 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA.
Hugo Liu and Push Singh. 2004. ConceptNet — A Practical Commonsense Reasoning Tool-Kit. BT Technology Journal, 22(4):211–226.
Enrico Santus, Frances Yung, Alessandro Lenci and Chu-Ren Huang. 2015. EVALution 1.0: An Evolving Semantic Dataset for Training and Evaluation of Distributional Semantic Models. In Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications, Beijing, China. Association for Computational Linguistics.