Lexical complexity plays a crucial role in reading comprehension. Predicting lexical complexity accurately can enable a system to better guide a user to an appropriate text, or tailor a text to their needs. NLP systems have been developed to simplify texts for second language learners, native speakers with low literacy levels, and people with reading disabilities.

The Lexical Complexity Prediction (LCP) shared task hosted at SemEval 2021 (Task 1) provides participants with a new English dataset annotated on a Likert scale, described below.

The competition is finished. The report describing the results and findings is available here. Thank you for participating!

Task and Data

We provide participants with an augmented version of CompLex, a multi-domain English dataset with sentences annotated using a 5-point Likert scale (1-5) described in Shardlow et al. (2020). The task is to predict the complexity value of words in context.

LCP 2021 is divided into two sub-tasks:

  • Sub-task 1: predicting the complexity score of single words;

  • Sub-task 2: predicting the complexity score of multi-word expressions.

Teams that participate in both sub-tasks will also be evaluated on their overall performance across sub-task 1 and sub-task 2.


The complexity scores (1-5) in LCP 2021 correspond to the following:

  1. Very Easy: Very familiar words.

  2. Easy: Words for which an annotator was aware of the meaning.

  3. Neutral: Neither difficult nor easy.

  4. Difficult: Words for which an annotator was unclear of the meaning, but may have been able to infer the meaning from the sentence.

  5. Very Difficult: Words that an annotator had never seen before, or that were very unclear.
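
To make the scale concrete, the following is a minimal Python sketch of how several annotators' ratings for one word in context might be combined into a single complexity value. It is an illustration only: the function names (aggregate_complexity, normalise, nearest_label) are hypothetical, and both the averaging of ratings and the linear rescaling to the 0-1 range are assumptions that this page does not specify.

```python
# Labels of the 5-point Likert scale listed above.
LIKERT_LABELS = {
    1: "Very Easy",
    2: "Easy",
    3: "Neutral",
    4: "Difficult",
    5: "Very Difficult",
}


def aggregate_complexity(ratings):
    """Average several annotators' 1-5 ratings for one word in context.

    Averaging is an assumption made for this sketch.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in LIKERT_LABELS:
            raise ValueError(f"rating outside the 1-5 scale: {rating!r}")
    return sum(ratings) / len(ratings)


def normalise(score):
    """Rescale a 1-5 score linearly to the 0-1 range (assumed convention)."""
    return (score - 1) / 4


def nearest_label(score):
    """Return the label of the scale point closest to an averaged score."""
    closest = min(LIKERT_LABELS, key=lambda point: abs(point - score))
    return LIKERT_LABELS[closest]


if __name__ == "__main__":
    ratings = [1, 2, 2, 3]  # made-up ratings from four annotators
    score = aggregate_complexity(ratings)
    print(score, normalise(score), nearest_label(score))
```

With the made-up ratings shown, the averaged score falls on the "Easy" point of the scale. A continuous value of this kind, rather than a single discrete label, is what a regression-style prediction of word complexity would target.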

Task Organizers

  • Matthew Shardlow (Manchester Metropolitan University - UK)

  • Richard Evans (University of Wolverhampton - UK)

  • Gustavo Henrique Paetzold (Universidade Tecnológica Federal do Paraná - Brazil)

  • Marcos Zampieri (Rochester Institute of Technology - USA)


