The 2017 shared task on Native Language Identification (NLI) will take place at the BEA-12 workshop.
 
NLI is the task of identifying the native language (L1) of a writer based solely on a sample of their writing. The task is typically framed as a classification problem where the set of L1s is known a priori. Most work has focused on identifying the native language of writers learning English as a second language. Two previous shared tasks on NLI have been organized, in which the task was to identify the native language of non-native speakers of English based on essays and spoken responses they provided during a standardized assessment of academic English proficiency. The first shared task was based on the essays only and was also held with the BEA workshop, in 2013. It was a success: with 29 teams competing, it was one of the largest shared tasks that year. Three years later, the Computational Paralinguistics Challenge at Interspeech 2016 hosted a sub-challenge on identifying the native language based solely on the spoken responses.
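To make the classification framing concrete, here is a minimal sketch of a closed-set classifier. It is not the shared task baseline or any participant's system: it is a toy multinomial Naive Bayes over character trigrams, written with the Python standard library only. The class name `NaiveBayesNLI`, the helper `char_ngrams`, the labels `L1_A` and `L1_B`, and the training sentences are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayesNLI:
    """Toy multinomial Naive Bayes over character n-grams (illustrative only)."""

    def fit(self, texts, labels):
        # The set of L1s is closed: it is fixed by the training labels.
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.ngram_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.ngram_counts[label].update(char_ngrams(text))
        self.vocab = set()
        for counts in self.ngram_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.labels:
            counts = self.ngram_counts[label]
            # Add-one smoothing so unseen n-grams do not zero out a class.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: two invented L1 labels and made-up sentences.
train_texts = [
    "the students is learning many new thing every day",
    "many thing is changing for the students in these day",
    "I have possibility to make my studies at university",
    "it gives possibility to make good decision at work",
]
train_labels = ["L1_A", "L1_A", "L1_B", "L1_B"]

model = NaiveBayesNLI().fit(train_texts, train_labels)
print(model.predict("the students has possibility to learn many thing"))
```

Whatever the input, the prediction is always one of the labels seen during training, which is what "the set of L1s is known a priori" means in practice. Real systems use far more data and richer features than this sketch.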

This year's shared task combines the inputs from the two previous tasks. There will be three tracks based on the input used: NLI on the essay only, NLI on the spoken response only (based on a transcription of the response, not the audio), and NLI using both responses from a test taker. We feel this will make for a more challenging shared task while building on the methods and results from the previous two shared tasks. The training and development data for the shared task will be available in March 2017. Separately, with respect to training data, there will be two tracks, one open and one closed. In the closed track, you can only use the labeled data we provide to train your system. In the open track, you can use any data you want. We allow and encourage submissions to both the open and the closed track.

Shared Task Report and System Papers

Links to the shared task report and all of the system description papers will be posted here once they are online.

Cite the Shared Task Report:
Shervin Malmasi, Keelan Evanini, Aoife Cahill, Joel Tetreault, Robert Pugh, Christopher Hamill, Diane Napolitano, and Yao Qian. 2017. "A Report on the 2017 Native Language Identification Shared Task". In Proceedings of the 12th Workshop on Building Educational Applications Using NLP. Association for Computational Linguistics, Copenhagen, Denmark.
@InProceedings{nli2017,
  author    = {Malmasi, Shervin and Evanini, Keelan and Cahill, Aoife and Tetreault, Joel and Pugh, Robert and Hamill, Christopher and Napolitano, Diane and Qian, Yao},
  title     = {{A Report on the 2017 Native Language Identification Shared Task}},
  booktitle = {Proceedings of the 12th Workshop on Building Educational Applications Using NLP},
  month     = {September},
  year      = {2017},
  address   = {Copenhagen, Denmark},
  publisher = {Association for Computational Linguistics}
}

Key Dates

Training data release (phase 1): March 2017
Training data release (phase 2): May 2017
Test set release: June 19, 2017
Result notification: June 26, 2017
Draft System Description papers due: July 05, 2017
Camera-ready papers due: July 14, 2017
 

Contact

You can reach the organizers via nlisharedtask@gmail.com 
 
The organizing committee for the shared task is: