Chinese Grammatical Error Diagnosis

Task Description 

The goal of this shared task is to develop computer-assisted tools that diagnose several kinds of grammatical errors, i.e., redundant word, missing word, word disorder, and word selection. Each input sentence contains at most one of the defined error types. The developed tool should indicate which error type is embedded in the given sentence and the positions at which it occurs. If the input contains no grammatical errors, the tool should return: sid, correct. If the input contains a grammatical error, the output should be a quadruple of sid, start_off, end_off, error_type. In this output format, sid is the unique sentence identifier, start_off and end_off are the positions of the starting and ending characters of the grammatical error (each character or punctuation mark counts as one position), and error_type is one of the defined errors. Examples are shown as follows: 

    Example 1
        Input: (sid=B2-0880)    他是我的以前的室友
        Output:  B2-0880, 4, 4, Redundant
        (Notes: "的" is a redundant character in this input sentence) 
    Example 2
        Input: (sid=A2-0017)    那電影是機器人的故事
        Output:  A2-0017, 2, 2, Missing
        (Notes: There is a missing word between "那" and "電影") 
    Example 3
        Input: (sid=A2-0017)    電影是機器人的故事
        Output:  A2-0017, correct
    Example 4  
        Input: (sid=B1-1193)    吳先生是修理腳踏車的拿手
        Output:  B1-1193, 11, 12, Selection
        (Notes: "拿手" is a wrong word. One of the correct alternatives is "好手")  
    Example 5
        Input: (sid=B2-2292)    所以我不會讓失望她
        Output:  B2-2292, 7, 9, Disorder
       (Notes: "失望她" is a word ordering error. The correct order should be "她失望") 
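The output format above can be sketched as a small helper. This is an illustrative example, not part of the task specification; the function name is hypothetical.

```python
def format_result(sid, error=None):
    """Render a diagnosis result in the shared task's output format.

    `error` is either None (the sentence has no grammatical error) or a
    (start_off, end_off, error_type) tuple, where positions are 1-based
    character offsets and error_type is one of Redundant, Missing,
    Disorder, and Selection.
    """
    if error is None:
        return f"{sid}, correct"
    start_off, end_off, error_type = error
    return f"{sid}, {start_off}, {end_off}, {error_type}"

print(format_result("B2-0880", (4, 4, "Redundant")))  # B2-0880, 4, 4, Redundant
print(format_result("A2-0017"))                       # A2-0017, correct
```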

Evaluation Metrics

The criteria for judging correctness are:

(1) Detection level: binary classification of a given sentence, i.e., the correct/incorrect judgment should be completely identical with the gold standard. All error types are regarded as incorrect.

(2) Identification level: this level can be considered a multi-class categorization problem. In addition to correct instances, all error types should be clearly identified, i.e., Redundant, Missing, Disorder, and Selection.

(3) Position level: besides identifying the error type, this level also judges the positions of the erroneous range. That is, the system results should be perfectly identical with the quadruples of the gold standard.

The following metrics are measured at all three levels with the help of the confusion matrix.
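The three levels form a strict hierarchy: a correct position-level judgment implies correct identification, which implies correct detection. A minimal sketch of scoring one sentence at each level follows; the function name and tuple convention are illustrative assumptions, not part of the official scorer.

```python
def judge(system, gold):
    """Compare one system result against the gold standard at the three
    evaluation levels. Each argument is a (sid, start_off, end_off,
    error_type) tuple, with (sid, None, None, "correct") used for
    error-free sentences.
    """
    _, _, _, sys_type = system
    _, _, _, gold_type = gold
    # Detection: only the correct/incorrect decision matters.
    detection = (sys_type == "correct") == (gold_type == "correct")
    # Identification: the error type itself must match.
    identification = sys_type == gold_type
    # Position: the full quadruple must be identical to the gold standard.
    position = system == gold
    return detection, identification, position

# Gold: B2-2292, 7, 9, Disorder; the system found the right type but the
# wrong span, so it scores at the detection and identification levels only.
print(judge(("B2-2292", 7, 8, "Disorder"), ("B2-2292", 7, 9, "Disorder")))
# (True, True, False)
```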

                                               System Results
                                   Positive                Negative
    Gold        Positive     TP (True Positive)      FN (False Negative)
    Standard    Negative     FP (False Positive)     TN (True Negative)

  • False Positive Rate = FP / (FP+TN)
  • Accuracy = (TP+TN) / (TP+TN+FP+FN)
  • Precision = TP / (TP+FP)
  • Recall = TP / (TP+FN)
  • F1-Score = 2*Precision*Recall / (Precision+Recall)
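The formulas above can be computed directly from the four confusion-matrix counts. A minimal sketch, with an illustrative function name and zero-division guards added as an assumption:

```python
def metrics(tp, fp, fn, tn):
    """Compute the shared task's evaluation metrics from confusion-matrix
    counts. Guards against division by zero by returning 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# E.g., 40 true positives, 10 false positives, 10 false negatives,
# 40 true negatives gives precision = recall = F1 = 0.8.
print(metrics(tp=40, fp=10, fn=10, tn=40))
```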

Data Sets

We provide passages selected from the NTNU learner corpus for training purposes. The data will be released in the SGML format shown below. In addition, at least 1,000 testing passages selected to cover the different error types will be used for testing. The policy of this shared task is an open test: participants can adopt any linguistic and computational resources to do error diagnosis. For example, the NLP-TEA-1 CFL Datasets are publicly available at

    <SENTENCE id="B2-0880">他是我的以前的室友</SENTENCE>
    <MISTAKE start_off="4" end_off="4">
        <TYPE>Redundant</TYPE>
    </MISTAKE>

    <SENTENCE id="A2-0017">那電影是機器人的故事</SENTENCE>
    <MISTAKE start_off="2" end_off="2">
        <TYPE>Missing</TYPE>
    </MISTAKE>

    <SENTENCE id="B1-1193">吳先生是修理腳踏車的拿手</SENTENCE>
    <MISTAKE start_off="11" end_off="12">
        <TYPE>Selection</TYPE>
    </MISTAKE>

    <SENTENCE id="B2-2292">所以我不會讓失望她</SENTENCE>
    <MISTAKE start_off="7" end_off="9">
        <TYPE>Disorder</TYPE>
    </MISTAKE>
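A minimal sketch of reading the SGML-style training data, assuming each <SENTENCE> element is immediately followed by its <MISTAKE> tag as in the excerpt above. Since SGML is not strict XML, a regular expression over the opening tags is used here; the function name is illustrative.

```python
import re

# Matches a <SENTENCE> element followed by the opening <MISTAKE> tag.
RECORD_RE = re.compile(
    r'<SENTENCE id="([^"]+)">([^<]*)</SENTENCE>\s*'
    r'<MISTAKE start_off="(\d+)" end_off="(\d+)"'
)

def parse_training_data(sgml):
    """Extract (sid, text, start_off, end_off) records from the
    SGML-style training data."""
    return [
        {"sid": sid, "text": text, "start_off": int(s), "end_off": int(e)}
        for sid, text, s, e in RECORD_RE.findall(sgml)
    ]

sample = '''<SENTENCE id="B2-0880">他是我的以前的室友</SENTENCE>
<MISTAKE start_off="4" end_off="4">'''
print(parse_training_data(sample))
```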

Technical Report

Each participant must submit a technical report describing the developed method and its testing results. Please follow the ACL-2015 template to prepare the report. Non-conforming submissions will not be considered for review. Accepted reports that conform to the specified length and formatting requirements will be included in the NLP-TEA 2015 workshop proceedings. At least one author of each accepted report will be required to register and present the developed system. This is one of the most valuable parts of participation, as authors can engage attendees in extended conversations about their work.

Important Dates

  • Registration Open: January 5, 2015
  • Release of Training Data: March 2, 2015
  • Registration Close: April 1, 2015
  • Dry Run (Format Validation) Data Released: May 1, 2015
  • Release of Test Data: May 11, 2015 (11:59pm Pacific Time)
  • Testing Results Submission Due: May 13, 2015 (11:59pm Pacific Time)
  • Release of Evaluation Results: May 15, 2015
  • Technical Report Submission Due: June 7, 2015
  • Report Reviews Returned: June 14, 2015
  • Camera-Ready Due: June 21, 2015
  • Workshop Date: July 31, 2015
