Motivation
Many lexical simplification systems have been proposed to date (Glavaš and Štajner, 2015; Paetzold and Specia, 2016a). As shown by Paetzold and Specia (2016b), systems that distinguish complex from simple words before simplification tend to be more reliable in practice. The automatic identification of words that are difficult for a given target population is therefore an important step in building better-performing lexical simplification systems. This process is known as complex word identification (CWI) (Shardlow, 2013).
The first shared task on CWI was organized at SemEval 2016 (Paetzold and Specia, 2016c). It featured 21 teams that submitted 42 systems trained to predict whether words in a given context were complex or non-complex for a non-native English speaker. Following the success of that first edition, we are organizing a second edition of the challenge at the BEA workshop 2018.
The goal of this year’s CWI shared task is to predict which words can be difficult for a non-native speaker, based on annotations collected from a mixture of native and non-native speakers.
Tracks
Tasks
In the binary classification task, participants are asked to label a given target word in a particular context as complex or simple.
In the probabilistic classification task, participants are asked to give the probability that a given target word in a particular context is complex.
Participants can submit up to two systems for each track and for each task.
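To illustrate how the outputs of the two tasks differ, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Instance` type, the word-length heuristic, and the 0.5 threshold are hypothetical and are not part of the task definition, the official data format, or any baseline. The toy scorer also ignores the sentence context, which a real system would use.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """A target word together with the sentence it appears in (hypothetical type)."""
    sentence: str
    target: str


def complexity_probability(instance: Instance) -> float:
    """Probabilistic task output: a score in [0, 1].

    Toy heuristic (an assumption, not a task baseline): longer target
    words are treated as more likely to be complex.
    """
    # Cap the length at 12 characters so the score never exceeds 1.0.
    return min(len(instance.target), 12) / 12


def binary_label(instance: Instance, threshold: float = 0.5) -> int:
    """Binary task output: 1 for complex, 0 for simple.

    The 0.5 threshold is an arbitrary illustrative choice.
    """
    return 1 if complexity_probability(instance) >= threshold else 0


if __name__ == "__main__":
    example = Instance("The committee reached an unequivocal verdict.", "unequivocal")
    print(complexity_probability(example), binary_label(example))
```

The binary task expects one discrete label per target word, while the probabilistic task expects a real-valued score. A system built for one task can be adapted to the other, for example by thresholding a score as above, but the two are evaluated as separate outputs.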
Registration
To participate, please complete the registration form.
References
Glavaš, G. and Štajner, S. 2015. Simplifying Lexical Simplification: Do We Need Simplified Corpora? Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP), pp. 63-68.
Paetzold, G. and Specia, L. 2016a. Unsupervised Lexical Simplification for Non-Native Speakers. Proceedings of the 30th AAAI.
Paetzold, G. and Specia, L. 2016b. PLUMBErr: An Automatic Error Identification Framework for Lexical Simplification. Proceedings of the First International Workshop on Quality Assessment for Text Simplification (QATS), pp. 1-9.
Paetzold, G. and Specia, L. 2016c. SemEval 2016 Task 11: Complex Word Identification. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp. 560-569.
Shardlow, M. 2013. A Comparison of Techniques to Automatically Identify Complex Words. Proceedings of the Student Research Workshop at the 51st Annual Meeting of the Association for Computational Linguistics (ACL), pp. 103-109.