Grammatical Error Diagnosis for Learning Chinese as a Foreign Language
China’s growing global influence has prompted a surge of interest in learning Mandarin Chinese as a foreign language (CFL), and this trend is expected to continue. However, whereas many computer-assisted learning tools have been developed for learning English, support for CFL learners remains relatively sparse, especially tools designed to automatically evaluate learners’ responses. The goal of this shared task is to develop computer-assisted tools that detect four kinds of grammatical errors: redundant word, missing word, word disorder, and word selection. An input sentence that is not correct contains one of the defined error types, and the developed tool should indicate which error type is present in the given sentence. If the input contains no grammatical errors, the tool should return: sid, correct. If the input contains a grammatical error, the output format should be: sid, error_type. Examples are shown as follows:
Input: (sid=B2-1447-6) 希望沒有人再被食物中毒
Output: B2-1447-6, Redundant
Input: (sid=C1-1876-2) 對社會國家不同的影響
Output: C1-1876-2, Missing
Input: (sid=C1-1876-2) 對社會國家有不同的影響
Output: C1-1876-2, correct
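The output convention described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names format_result and parse_result and the ERROR_TYPES set are hypothetical helpers, not part of any official task toolkit, and the exact whitespace the organizers accept after the comma is assumed from the examples.

```python
# Hypothetical helpers illustrating the "sid, label" output convention.
# The label spellings are taken from the task description; everything
# else (names, validation behaviour) is an assumption.
ERROR_TYPES = {"Redundant", "Missing", "Disorder", "Selection"}


def format_result(sid, error_type=None):
    """Return one output line: 'sid, correct' or 'sid, error_type'."""
    if error_type is None:
        return f"{sid}, correct"
    if error_type not in ERROR_TYPES:
        raise ValueError(f"unknown error type: {error_type}")
    return f"{sid}, {error_type}"


def parse_result(line):
    """Split an output line back into a (sid, label) pair."""
    sid, label = (part.strip() for part in line.split(",", 1))
    return sid, label


if __name__ == "__main__":
    print(format_result("B2-1447-6", "Redundant"))
    print(format_result("C1-1876-2"))
```

Rejecting unknown labels early is a design choice made here so that a misspelled error type fails at the point of writing the results.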
The criteria for judging correctness are: (1) Detection level: this is a binary classification of a given sentence, i.e., the judgment of correct or incorrect should be completely identical with the gold standard. All error types are regarded as incorrect. (2) Identification level: this level can be considered a multi-class categorization problem. In addition to correct instances, all error types should be clearly identified, i.e., Redundant, Missing, Disorder, and Selection. The following metrics are measured at both levels with the help of the confusion matrix.
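The two evaluation levels can be illustrated with a short Python sketch. The specific metrics used by the organizers are not listed in this text, so the accuracy, precision, recall, and F1 computed below are an assumption based on common practice, and the identification-level score shown is plain exact-match accuracy rather than the official measure. The function names are hypothetical.

```python
# A minimal sketch of both evaluation levels. Labels are "correct" or
# one of the four error types. The metric choice is assumed, not official.

def detection_counts(gold, predicted):
    """Confusion-matrix counts, treating any error type as the positive class."""
    tp = fp = tn = fn = 0
    for g, p in zip(gold, predicted):
        gold_err = g != "correct"
        pred_err = p != "correct"
        if gold_err and pred_err:
            tp += 1
        elif pred_err:
            fp += 1          # gold is correct, system flagged an error
        elif gold_err:
            fn += 1          # gold has an error, system said correct
        else:
            tn += 1
    return tp, fp, tn, fn


def scores(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def identification_accuracy(gold, predicted):
    """Fraction of sentences whose predicted label exactly matches the gold label."""
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold) if gold else 0.0


if __name__ == "__main__":
    gold = ["Redundant", "Missing", "correct", "correct"]
    pred = ["Missing", "correct", "correct", "Selection"]
    print(scores(*detection_counts(gold, pred)))
    print(identification_accuracy(gold, pred))
```

Note that a sentence labelled Redundant in the gold standard but predicted as Missing counts as a hit at the detection level and a miss at the identification level, which is why the two levels are reported separately.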
The policy of the shared task is an open test: participants can adopt any linguistic and computational resources for error diagnosis. We provide passages of CFL learners’ essays selected from the NTNU learner corpus for training purposes. The data will be released in SGML format. In addition, at least 1000 passages selected to cover the different error types will be used for testing.
Each participant must submit an evaluation report describing the developed method and its testing results. Please follow the ICCE-2014 template to prepare the report. Non-conforming submissions will not be considered for review. Accepted reports that conform to the specified length and formatting requirements will be included in the ICCE-2014 workshop proceedings. At least one author of each accepted report is required to register and present the developed system. This is the most valuable part of participation, as authors will be able to engage attendees in extended conversations about their work. In addition, high-quality research papers will be selected for publication in The Scientific World Journal (TSWJ) Special Issue on "Human Language Technologies for Educational Applications" (SCI-indexed, 2012 impact factor 1.730).