Shared Task for Chinese Spelling Check

Registration

Contact person: Dingmin Wang (dmwang@se.cuhk.edu.hk)

Please go to http://www1.se.cuhk.edu.hk/~kf_prj/nlptea2017/ and provide the necessary information for registration.

Important Dates

    • Registration open: May 19, 2017
    • Release of training data: May 19, 2017
    • Registration close: August 20, 2017
    • Release of testing data: August 21, 2017
    • Testing results submission due: August 23, 2017
    • Release of evaluation results: August 25, 2017
    • Technical report submission due: September 12, 2017
    • Report reviews returned: September 30, 2017
    • Camera-ready due: October 10, 2017

Task Description

The goal of this task is to develop a computer-assisted system that automatically diagnoses typing errors in Traditional Chinese sentences written by native Hong Kong primary students. There are two kinds of errors: (1) typos and (2) Cantonese usages. Given a sentence, the system should: (1) identify where the errors are, (2) indicate the kind of each identified error, and (3) offer a correction suggestion for each identified error. Note that a sentence may have no errors, multiple errors, or multiple types of errors.

    • Background

A given sentence may contain no errors or one or more errors. Two types of errors are defined: (1) typo and (2) Cantonese usage. Here are some examples:

  1. No error: 我很喜歡吃媽媽做的凉瓜炒蛋飯。
  2. Typo only: 我很喜歡吃媽媽做的梁瓜炒蛋飯。
  3. Cantonese usage only: 我很鍾意吃媽媽做的凉瓜炒蛋飯。
  4. Typo and Cantonese usage: 我很鍾意吃媽媽做的梁瓜炒蛋飯。
  5. Multiple typos and multiple Cantonese usages: 我很鍾意食媽媽做的梁瓜炒旦飯。

For Cantonese usage, in general there are three cases:

  1. Some characters are used in Cantonese only and should not appear in a formal written Chinese sentence. For example, in “佢比你高”, the character “佢” is a Cantonese character that means “他”.
  2. All characters are properly written (i.e. all are formal written Chinese characters) but some phrases are Cantonese phrases. For example, in “昨天撞返一個很久沒有見面的小學同學”, the phrase “撞返” is an oral Cantonese phrase that means “碰見”.
  3. All characters are properly written and there is no Cantonese phrase, but some characters are in the wrong order, which makes the sentence Cantonese. For example, “我走先然後去打球” should be “我先走然後去打球”.

Please note that any mixture of the above cases is possible. For example, consider the sentence “大家討論緊這件事”. In this context the character “緊” means “正在” (i.e. it triggers Case 2). Yet simply replacing “緊” with “正在” does not make sense (i.e. it also triggers Case 3). The correction should be: “大家正在討論這件事”.

    • Dataset Description

The dataset is sponsored by the Hong Kong Applied Science and Technology Research Institute (ASTRI). ASTRI was founded by the Government of the Hong Kong Special Administrative Region in 2000.

There are two datasets: a training dataset and a testing dataset. All sentences show common usage by Hong Kong primary school students and contain Traditional Chinese characters and punctuation. They are encoded in UTF-8.

The training dataset contains 1000 sentences and their corresponding golden standard corrections. Note that shared task participants are allowed to use other publicly available data for system development. Any use of other datasets should be specified in the final system report.

The testing dataset contains 1000 sentences without any golden standard corrections. It will be released on 2017-08-21 09:00 GMT+0800. Participants should download the dataset and upload their results on or before 2017-08-23 09:00 GMT+0800. Submissions after this deadline will not be accepted. Participant results and the golden standard corrections for the testing dataset will be announced in early October.

    • Dataset Annotation

For the training data, there are three files: “training-sentences.json”, “training-corrections.json”, and “cantonese-mapping.json”. “training-sentences.json” contains all the sentences for training and “training-corrections.json” contains the golden standard corrections. All files are in JSON format. We explain the file contents using the examples below:

training-sentences.json:

[
  {
    "id":"ASTRI-01",
    "sentence":"我很喜歡吃媽媽做的凉瓜炒蛋飯。"
  },
  {
    "id":"ASTRI-02",
    "sentence":"我很喜歡吃媽媽做的梁瓜炒蛋飯。"
  },
  {
    "id":"ASTRI-03",
    "sentence":"我很鍾意吃媽媽做的凉瓜炒蛋飯。"
  },
  {
    "id":"ASTRI-04",
    "sentence":"我很鍾意食媽媽做的梁瓜炒旦飯。"
  }
]

training-corrections.json:

[
  {
    "id":"ASTRI-01",
    "typo":null,
    "cantonese":null
  },
  {
    "id":"ASTRI-02",
    "typo":[
      {"position":10, "correction":"凉"}
    ],
    "cantonese":null
  },
  {
    "id":"ASTRI-03",
    "typo":null,
    "cantonese":[
      {"position":3, "length":2, "correction":"喜歡", "reorder":null}
    ]
  },
  {
    "id":"ASTRI-04",
    "typo":[
      {"position":10, "correction":"凉"},
      {"position":13, "correction":"蛋"}
    ],
    "cantonese":[
      {"position":3, "length":2, "correction":"喜歡", "reorder":null},
      {"position":5, "length":1, "correction":"吃", "reorder":null}
    ]
  }
]

The structure of the two files above should be self-explanatory. As described in the Background section, there are several types of Cantonese usage, which is why the “reorder” field is needed for Cantonese errors. For example, given the following sentences:

[
  {
    "id":"ASTRI-05",
    "sentence":"我走先然後去打球。"
  },
  {
    "id":"ASTRI-06",
    "sentence":"大家討論緊這件事。"
  }
]

Then, the corresponding golden standard corrections:

[
  {
    "id":"ASTRI-05",
    "typo":null,
    "cantonese":[
      {"position":3, "length":1, "correction":null, "reorder":2}
    ]
  },
  {
    "id":"ASTRI-06",
    "typo":null,
    "cantonese":[
      {"position":5, "length":1, "correction":"正在", "reorder":3}
    ]
  }
]
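
The annotation fields can be read as edit instructions. The following is a minimal sketch, not the organizers' tooling, of how golden standard entries like the ones above could be applied to a sentence. It assumes that positions are 1-indexed, as the examples suggest, and that “reorder” is the 1-indexed position at which the (possibly corrected) span is re-inserted after being removed. That reading is inferred from ASTRI-05 and ASTRI-06 only, and the function name apply_corrections is hypothetical.

```python
def apply_corrections(sentence, typos=None, cantonese=None):
    """Return the sentence after applying gold-standard style annotations."""
    chars = list(sentence)

    # A typo replaces exactly one character, so positions never shift.
    for typo in typos or []:
        chars[typo["position"] - 1] = typo["correction"]

    # Handle Cantonese spans from right to left so that a replacement of a
    # different length does not invalidate the positions of earlier spans.
    ordered = sorted(cantonese or [], key=lambda c: c["position"], reverse=True)
    for item in ordered:
        start = item["position"] - 1
        end = start + item["length"]
        span = chars[start:end]
        # A null correction keeps the original characters (pure reordering).
        if item["correction"] is not None:
            new_text = list(item["correction"])
        else:
            new_text = span
        del chars[start:end]
        # Assumption: "reorder" is the 1-indexed position where the span is
        # re-inserted; null means the span stays where it was.
        if item["reorder"] is not None:
            target = item["reorder"] - 1
        else:
            target = start
        chars[target:target] = new_text

    return "".join(chars)
```

Under these assumptions, the ASTRI-05 entry turns “我走先然後去打球。” into “我先走然後去打球。”, and the ASTRI-06 entry turns “大家討論緊這件事。” into “大家正在討論這件事。”. A sentence with several reordered spans may need more careful position bookkeeping than this sketch provides.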

“cantonese-mapping.json” lists all the Cantonese to formal written Chinese mappings. Every Cantonese usage in the training and testing datasets appears in this file. Please note that a Cantonese phrase may have more than one possible mapping (depending on the sentence context), and a different combination of the words in a phrase may yield a completely different result. Example:

[
  {"cantonese":"唔", "chinese":["不"]},
  {"cantonese":"唔使", "chinese":["不用"]},
  {"cantonese":"唔該", "chinese":["請","謝謝"]},
  {"cantonese":"邊度", "chinese":["哪裏"]},
  {"cantonese":"邊處", "chinese":["哪裏"]}
]
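
Because “唔”, “唔使” and “唔該” map to different results, a lookup has to decide which entry applies at each position. One possible baseline, shown here only as an illustrative sketch and not as a recommended or official method, is a greedy longest-match scan over the sentence. The function name find_cantonese and the test sentence are hypothetical; the mapping entries are the ones from the example above. The scan ignores sentence context, so it returns every listed mapping as a candidate.

```python
SAMPLE_MAPPING = [
    {"cantonese": "唔", "chinese": ["不"]},
    {"cantonese": "唔使", "chinese": ["不用"]},
    {"cantonese": "唔該", "chinese": ["請", "謝謝"]},
    {"cantonese": "邊度", "chinese": ["哪裏"]},
    {"cantonese": "邊處", "chinese": ["哪裏"]},
]


def find_cantonese(sentence, mapping):
    """Greedy longest-match scan for mapped Cantonese phrases."""
    table = {entry["cantonese"]: entry["chinese"] for entry in mapping}
    longest = max((len(key) for key in table), default=0)
    found = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first so that "唔使" wins over "唔".
        for size in range(min(longest, len(sentence) - i), 0, -1):
            phrase = sentence[i:i + size]
            if phrase in table:
                found.append({
                    "position": i + 1,               # 1-indexed, as in the examples
                    "length": size,
                    "correction": table[phrase][:5],  # at most five suggestions
                    "reorder": None,                  # reordering is not attempted
                })
                i += size
                break
        else:
            i += 1  # no mapped phrase starts at this character
    return found


detections = find_cantonese("唔該你唔使等我", SAMPLE_MAPPING)
```

For the hypothetical sentence “唔該你唔使等我”, the scan reports “唔該” at position 1 and “唔使” at position 4, each with length 2, rather than two separate matches of “唔”.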

Evaluation

We observed that the spell checkers in modern word processing software give the user multiple correction suggestions. This is reasonable and gives the user maximum flexibility, so we include this element in our evaluation as well. Furthermore, a Cantonese phrase sometimes has more than one possible mapping to formal written Chinese. Hence, we allow participants to give at most five suggestions for each detected error. Specifically, participants have to submit the correction of a sentence in the following format:

{
  "id":"xxxxxx",
  "typo":[
    {"position":N, "correction":["A","B","C"]}
  ],
  "cantonese":[
    {"position":N, "length":M, "correction":["A","B"], "reorder":K}
  ]
}

For example, given the following sentence:

{
  "id":"ASTRI-07",
  "sentence":"尋日老師按排我做新年聯歡會的主持人"
}

And the following golden standard correction:

{
  "id":"ASTRI-07",
  "typo":[
    {"position":5, "correction":"安"}
  ],
  "cantonese":[
    {"position":1, "length":2, "correction":"昨天", "reorder":null}
  ]
}

The participant may submit the following result (note that the corrections are arrays):

{
  "id":"ASTRI-07",
  "typo":[
    {"position":5, "correction":["編","安"]}
  ],
  "cantonese":[
    {"position":1, "length":2, "correction":["昨日","昨天"], "reorder":null}
  ]
}
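
Before uploading, it may help to check each record against the format described above. The following is a minimal sketch of such a check; it is not an official validator, and the function name check_record and the messages it produces are hypothetical. It assumes that positions and lengths are positive integers (the examples appear to count characters from 1) and that a null correction is acceptable for a pure reordering.

```python
MAX_SUGGESTIONS = 5  # limit stated in the task description


def check_record(record):
    """Return a list of problems found in one submitted record."""
    problems = []
    for kind in ("typo", "cantonese"):
        # A null or missing list means no error of this kind was reported.
        for item in record.get(kind) or []:
            position = item.get("position")
            if not isinstance(position, int) or position < 1:
                problems.append(f"{kind}: position must be a positive integer")
            if kind == "cantonese":
                length = item.get("length")
                if not isinstance(length, int) or length < 1:
                    problems.append("cantonese: length must be a positive integer")
            suggestions = item.get("correction")
            if suggestions is None:
                continue  # assumed to be allowed for pure reordering
            if not isinstance(suggestions, list):
                problems.append(f"{kind}: correction must be an array")
            elif len(suggestions) > MAX_SUGGESTIONS:
                problems.append(f"{kind}: more than {MAX_SUGGESTIONS} suggestions")
    return problems
```

An empty list means the record passed these checks; it does not mean the record would be accepted by the organizers' own evaluation scripts.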

Based on the above discussion, our criteria for judging correctness are:

    • Detection Performance:

Given a sentence, the system should correctly detect both the positions of errors and the types of errors. Mathematically,

      • Precision = TP / (TP + FP)
      • Recall = TP / (TP + FN)
      • F1-Detection = (2 x Precision x Recall) / (Precision + Recall)

We will rank all participants' results based on the F1-Detection score.
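
The description does not spell out how TP, FP and FN are counted, so the following sketch makes its own assumption: every error is identified by its sentence id, its type and its position, a true positive is a submitted error that also appears in the golden standard, a false positive is a submitted error that does not, and a false negative is a golden standard error that was not submitted. The function names are hypothetical, and the official scorer may count differently (for example, by also comparing the length of a Cantonese span).

```python
def error_keys(records):
    """Collect (sentence id, error type, position) for every listed error."""
    keys = set()
    for record in records:
        for kind in ("typo", "cantonese"):
            for item in record.get(kind) or []:
                keys.add((record["id"], kind, item["position"]))
    return keys


def detection_scores(gold_records, submitted_records):
    """Return (precision, recall, F1-Detection) under the stated assumption."""
    gold = error_keys(gold_records)
    predicted = error_keys(submitted_records)
    tp = len(gold & predicted)   # detected with the right type and position
    fp = len(predicted - gold)   # reported but not in the golden standard
    fn = len(gold - predicted)   # in the golden standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

As a worked case, if the golden standard lists four errors and a system reports three, two of which match, then Precision = 2/3, Recall = 2/4 and F1-Detection = 4/7, or about 0.571.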

    • Correction Performance:

For each detected error, the system should suggest the right correction. Since we allow multiple correction suggestions for a given error, a correction is counted as correct if and only if the golden standard correction is among the top five suggestions from the participant.

      • Precision = TP / (TP + FP)
      • Recall = TP / (TP + FN)
      • F1-Correction = (2 x Precision x Recall) / (Precision + Recall)

We will rank all participants' results based on the F1-Correction score.
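
The top-five rule can be sketched in the same way. The counting below is again an assumption rather than the official definition: a true positive is a submitted error that matches a golden standard error in sentence id, type and position and lists the golden standard correction among its first five suggestions; every other submitted error is a false positive, and every golden standard error without such a match is a false negative. The function names are hypothetical, and the sketch neither compares the “reorder” field nor handles entries whose golden standard correction is null.

```python
def index_errors(records):
    """Map (sentence id, error type, position) to the correction field."""
    index = {}
    for record in records:
        for kind in ("typo", "cantonese"):
            for item in record.get(kind) or []:
                index[(record["id"], kind, item["position"])] = item["correction"]
    return index


def correction_scores(gold_records, submitted_records, limit=5):
    """Return (precision, recall, F1-Correction) under the stated assumption."""
    gold = index_errors(gold_records)
    submitted = index_errors(submitted_records)
    # Only the first `limit` suggestions of each submitted error are considered.
    tp = sum(
        1
        for key, suggestions in submitted.items()
        if key in gold and gold[key] in (suggestions or [])[:limit]
    )
    fp = len(submitted) - tp
    fn = len(gold) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

A suggestion list that contains the golden standard correction only in sixth place or later therefore does not count as correct.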

    • Overall System Performance:

We will rank the overall performance of the system based on F1-Detection and F1-Correction as follows:

      • F1-Overall = (2 x F1-Detection x F1-Correction) / (F1-Detection + F1-Correction)
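
F1-Overall is the harmonic mean of the two scores, so a system that is strong in only one of them is ranked lower than its arithmetic average would suggest. A small sketch of the computation follows; the function name f1_overall is hypothetical, and returning 0.0 when both inputs are zero is an assumption, since the formula itself is undefined in that case.

```python
def f1_overall(f1_detection, f1_correction):
    """Harmonic mean of F1-Detection and F1-Correction."""
    total = f1_detection + f1_correction
    if total == 0:
        return 0.0  # assumption: treat the undefined 0/0 case as zero
    return 2 * f1_detection * f1_correction / total
```

For example, F1-Detection = 0.8 and F1-Correction = 0.6 give F1-Overall = 0.96 / 1.4, or about 0.686, slightly below the arithmetic mean of 0.7.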