Mexican Learner Corpus 

A longitudinal corpus of Mexican English learners 

The Mexican Learner Corpus (MexLeC) is a longitudinal collection of oral texts produced by learners of English at Mexican universities. It is a three-year postdoctoral project (2020-2025), currently funded by the Consejo Nacional de Humanidades, Ciencia y Tecnología with institutional support from the Facultad de Lenguas of the Universidad Autónoma del Estado de México. The corpus aims to provide a basis for research on second language acquisition among Mexican learners and for future applications in the design of strategies and materials for English language teaching.


MexLeC participants are distributed into three sections, which correspond to the three institutions currently participating in the corpus. The first section includes two cohorts from the Universidad Autónoma del Estado de México (labeled 001A and 001B); the second, two cohorts from the Universidad Autónoma del Estado de Hidalgo (labeled 002A and 002B); and the third, one cohort from the Universidad Autónoma del Estado de Querétaro (labeled 003A). A cohort is a group of students in the same English class. Since many participants leave the project before the required four-year period ends, a different cohort is interviewed to reach the number of interviews required per university (200).

All participants are university students pursuing a bachelor's degree in modern languages (translation and/or teaching). Data on their gender, age, mother tongue, second language learning background, and L2 proficiency level are available in the data section.

To elicit data, a 15-20-minute oral interview divided into four tasks was designed. The interaction in this interview is one-on-one (interviewer and participant). The entire interaction is video-recorded via a video call app (recorded videos are available for research on request). The tasks in the interview are:

The transcription guidelines for the interviews have been adapted from the Trinity Lancaster Corpus (Gablasova, Brezina and McEnery, 2019) and the LINDSEI corpus (Center for English Corpus Linguistics, 2021). Information on the participants' profiles, such as learning experiences, mother tongue, and contact with other foreign languages, has been collected to help interpret the corpus data. The interviewers are native speakers of Mexican Spanish with B2 and C1 proficiency levels.

Transcription guidelines

Transcription Guidelines MexLeC.docx

Official video presentation at the XVII Foro de Estudios en Lenguas, Universidad Autónoma de Quintana Roo.

II Simposium Lingüística y docencia: Lingüística de Corpus y Aprendizaje de Lenguas. Facultad de Lenguas, Universidad Autónoma del Estado de México (event organised by the MexLeC team).

XI Coloquio de Lingüística Computacional. Grupo de Ingeniería Lingüística, Universidad Nacional Autónoma de México.