Grammatical Error Diagnosis for Learning Chinese as a Foreign Language

Important Dates
  • Registration open for the shared task: May 5, 2014
  • Release of training data: June 12, 2014 
  • Registration close for the shared task: July 15, 2014
  • Release of dry run data for participants to familiarize themselves with the testing process: July 21, 2014
  • Release of test data for formal evaluation: August 13, 2014 (revised from August 4, 2014)
  • Return of the output for the test data for evaluation: August 15, 2014 (revised from August 6, 2014)
  • Announcement of the evaluation results for all participants: August 16, 2014 (revised from August 8, 2014)
  • Report of the developed techniques and their results from the participants: August 28, 2014 (revised from August 18, 2014)
  • Feedback to the report from the organizers: September 2, 2014 
  • Final camera-ready report: September 10, 2014 

Task Description

China’s growing global influence has prompted a surge of interest in learning Mandarin Chinese as a foreign language (CFL), and this trend is expected to continue. However, whereas many computer-assisted learning tools have been developed for English, support for CFL learners is relatively sparse, especially tools that automatically evaluate learners’ responses. The goal of this shared task is to develop computer-assisted tools that detect four kinds of grammatical errors: redundant word, missing word, word disorder, and word selection. Each input sentence contains at most one of the defined error types, and the developed tool should indicate which error type, if any, is embedded in the given sentence. If the input contains a grammatical error, the output format is sid, error_type; if the input contains no grammatical error, the tool should return sid, correct. Examples are shown as follows:

Example 1
    Input: (sid=B2-1447-6) 希望沒有人再食物中毒
    Output: B2-1447-6, Redundant

Example 2
    Input: (sid=C1-1876-2) 對社會國家不同的影響
    Output: C1-1876-2, Missing

Example 3
    Input: (sid=C1-1876-2) 對社會國家不同的影響
    Output: C1-1876-2, correct

Example 4
    Input: (sid=A2-0775-2) 我起床很早
    Output: A2-0775-2, Disorder

Example 5
    Input: (sid=B1-0110-2) 我會穿著一黃色的襯衫
    Output: B1-0110-2, Selection
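The I/O convention above can be sketched as a small stub. This is only an illustration of the expected output format; the `diagnose` function and the sid used in the usage line are hypothetical placeholders, not part of the task definition.

```python
# The five labels a system may emit, per the task description.
ERROR_TYPES = {"Redundant", "Missing", "Disorder", "Selection", "correct"}

def diagnose(sentence: str) -> str:
    """Hypothetical placeholder classifier; a real system would
    analyze the sentence and pick one of the five labels."""
    return "correct"

def format_result(sid: str, sentence: str) -> str:
    """Render a system decision in the required 'sid, label' format."""
    label = diagnose(sentence)
    assert label in ERROR_TYPES
    return f"{sid}, {label}"

print(format_result("sid-001", "我起床很早"))  # placeholder always answers "correct"
```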

Evaluation Metrics

The criteria for judging correctness are: (1) Detection level: binary classification of a given sentence, i.e., the correct/incorrect decision should be completely identical with the gold standard; all error types are regarded as incorrect. (2) Identification level: this level can be considered a multi-class categorization problem; in addition to correct instances, all error types should be clearly identified, i.e., Redundant, Missing, Disorder, and Selection. The following metrics are measured at both levels with the help of the confusion matrix:

                                         System Results
                            Positive                  Negative
    Gold      Positive      TP (True Positive)        FN (False Negative)
    Standard  Negative      FP (False Positive)       TN (True Negative)

  • False Positive Rate = FP / (FP+TN)
  • Accuracy = (TP+TN) / (TP+TN+FP+FN)
  • Precision = TP / (TP+FP)
  • Recall = TP / (TP+FN)
  • F1-Score = 2*Precision*Recall / (Precision+Recall)
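The five metrics follow directly from the four confusion-matrix counts. A minimal sketch, using made-up illustrative counts rather than real evaluation data:

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the shared-task metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "false_positive_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Illustrative counts only (not from any actual system run).
m = metrics(tp=40, fp=10, fn=10, tn=40)
print(m["accuracy"], m["f1"])  # → 0.8 0.8
```

At the detection level the positive class is "incorrect sentence"; at the identification level the same formulas apply per error type.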

Data Sets

The policy of the shared task is an open test: participants may adopt any linguistic and computational resources for error diagnosis. We provide passages from CFL learners’ essays, selected from the NTNU learner corpus, for training purposes. The data will be released in the SGML format shown below. In addition, at least 1,000 test passages, selected to cover the different error types, will be used for testing.

    <ESSAY title="寫給即將初次見面的筆友的一封信">
    <SENTENCE id="B1-0112-1">我的計畫是十點早上在古亭捷運站</SENTENCE>
    <SENTENCE id="B1-0112-2">頭會戴著藍色的帽子</SENTENCE>
    <MISTAKE id="B1-0112-1">
    ...
    </MISTAKE>
    <MISTAKE id="B1-0112-2">
    ...
    </MISTAKE>
    </ESSAY>
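A minimal sketch of reading the release format, using regex extraction rather than a strict XML parser, since the data is SGML. The `sample` string reuses the two sentences shown above; only the parsing approach is an assumption.

```python
import re

# Two sentences from the sample essay in the announcement.
sample = '''<ESSAY title="寫給即將初次見面的筆友的一封信">
<SENTENCE id="B1-0112-1">我的計畫是十點早上在古亭捷運站</SENTENCE>
<SENTENCE id="B1-0112-2">頭會戴著藍色的帽子</SENTENCE>
</ESSAY>'''

# Map each sentence id to its text.
sentences = dict(re.findall(r'<SENTENCE id="([^"]+)">([^<]+)</SENTENCE>', sample))
print(sentences["B1-0112-1"])  # → 我的計畫是十點早上在古亭捷運站
```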

Evaluation Report

Each participant must submit an evaluation report describing the developed method and its test results. Please follow the ICCE-2014 template to prepare the report. Non-conforming submissions will not be considered for review. Accepted reports that conform to the specified length and formatting requirements will be included in the ICCE-2014 workshop proceedings. At least one author of each accepted report is required to register and present the developed system. This is the most valuable part of participation, as authors will be able to engage attendees in extended conversations about their work. In addition, high-quality research papers will be selected for publication in The Scientific World Journal (TSWJ) Special Issue on "Human Language Technologies for Educational Applications" (SCI-indexed, 2012 impact factor 1.730).

  • Authors of main conference/workshop/WIPP/DSC papers are required to register for ICCE-2014.
  • Each registrant may register up to two papers (regardless of the program component to which they were submitted).
