Development of a corpus of authentic written and spoken language production by learners of English at different proficiency levels
Using audio and video recording technologies, this project will collect authentic spoken and written language, together with physical gestures, produced by English language learners at a variety of proficiency levels. The result will be a learner corpus, the Applied Linguistics & TESOL Corpus (ALTEC), that can be used to analyze the language of English language learners in an ESL instructional environment. The Community Language Program (CLP) (https://www.tc.columbia.edu/communitylanguage) is operated by the Applied Linguistics program at Teachers College. English language courses are taught at seven different proficiency levels. Video cameras and microphones will record regular adult English language instruction at all proficiency levels. In addition, written essays completed during the placement exam administration will be included in the corpus together with additional placement test scores. The essays and placement tests are part of the intake process for the English language courses and are completed by all students.
Phase 1: Data Collection
March & April 2023 - ongoing
Video and audio recordings of classroom interaction at Elementary, Intermediate, and Advanced levels. See progress in the chart below (updated 4-26-2026).
Phase 2: Transcription
December 2023 - ongoing
To transcribe the educational videos, the transcription work combines annotation tools such as ELAN for linguistic annotation with artificial intelligence tools such as Whisper for speech-to-text conversion, among others. To date, four classroom recordings have been fully transcribed, and we are now consolidating a standardized transcription style guide based on lessons learned. See progress in the chart below for our data collection (updated 4-26-2026).
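As a rough illustration of how automatic transcripts might be handed to an annotation tool, the sketch below converts time-stamped segments into tab-delimited rows. It is not the project's actual pipeline, which is not described here. It assumes segments shaped like those returned by the open-source Whisper Python package (a `start` and `end` in seconds plus a `text` string) and relies on ELAN's ability to import tab-delimited text. The function names, the tier label `Speaker`, the column headers, and the example utterances are all hypothetical.

```python
import csv
import io


def segments_to_elan_rows(segments, tier="Speaker"):
    """Turn Whisper-style segments into (tier, begin_ms, end_ms, text) rows.

    `segments` is assumed to be a list of dicts with "start" and "end"
    in seconds and a "text" string. The tier name is a placeholder.
    """
    rows = []
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            # Skip segments that contain only whitespace.
            continue
        begin_ms = int(round(seg["start"] * 1000))
        end_ms = int(round(seg["end"] * 1000))
        rows.append((tier, begin_ms, end_ms, text))
    return rows


def rows_to_tsv(rows):
    """Serialize rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["tier", "begin_ms", "end_ms", "annotation"])
    writer.writerows(rows)
    return buf.getvalue()


# Invented example data, not taken from the corpus.
example_segments = [
    {"start": 0.0, "end": 2.4, "text": " Good morning, everyone."},
    {"start": 2.4, "end": 5.1, "text": " Please open your books."},
    {"start": 5.1, "end": 5.6, "text": "   "},
]

if __name__ == "__main__":
    print(rows_to_tsv(segments_to_elan_rows(example_segments)), end="")
```

Keeping times in integer milliseconds avoids rounding drift when a human annotator later adjusts boundaries. Automatic output of this kind would still need manual correction against the transcription style guide.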
Phase 3: Analysis
February 2026 - ongoing
We have launched a pilot classroom interaction analysis on one of the transcribed sessions to inform Phase 3 (updated 4-26-2026).
Current Research Assistants
Betul Demirezen
Multimodal Conversation Analysis
Graduate Student
Soo Joo
Second Language Assessment
Graduate Student
Davynn Xu
Agentic AI in Language Education
Graduate Student
Former Research Assistants