Download
You can download the datasets including the gold standard here.
Description
The English dataset consists of mixture of professionally written news, non-professionally written news (WikiNews), and Wikipedia articles. Spanish and German datasets contain data taken from Spanish and German Wikipedia pages. The French dataset contains data taken from French Wikipedia pages.
Each sentence in the English dataset was annotated by 10 native and 10 non-native speakers. Annotators were provided with the surrounding context of each sentence, i.e. a paragraph, then asked to mark words they think would be difficult to understand for children, non-native speakers, and people with language disabilities.
Each sentence in the German, Spanish and French dataset was annotated by 10 people (a mixture of native and non-native speakers).
Data Format
Training Data
The training data will be provided in the following format:
<ID> Both China and the Philippines flexed their muscles on Wednesday. 31 51 flexed their muscles 10 10 3 2 1 0.25
<ID> Both China and the Philippines flexed their muscles on Wednesday. 31 37 flexed 10 10 2 6 1 0.4
<ID> Both China and the Philippines flexed their muscles on Wednesday. 44 51 muscles 10 10 0 0 0 0.0
Each line represents a sentence with one complex word annotation and relevant information, each separated by a TAB character.
The labels in the binary classification task were assigned in the following manner:
The labels in the probabilistic classification task were assigned as <the number of annotators who marked the word as difficult>/<the total number of annotators>.
Test Data
The test data will be in the following format:
<ID> Both China and the Philippines flexed their muscles on Wednesday. 31 37 flexed 10 10
In the test input format, only the first seven columns of the train format are given.
In the test prediction format, the order of the input file should be kept and each line of the file must be a either a binary label or a probability, depending on whether the team participates in the binary/probabilistic classification task.
In the binary classification task, the participating systems will have a task of predicting the right label (0 or 1) for the test data.
In the probabilistic classification task, the participating systems will have a task of giving a probability of the target word being complex.
Data Instances
The number of instances for each training, development and test set is: