ALTEC Learner Corpus

Development of a corpus of authentic written and spoken language produced by learners of English at different proficiency levels

Purpose 

Using audio and video recording technologies, this project will collect authentic spoken and written language, along with physical gestures, produced by English language learners at a variety of proficiency levels. The result will be a learner corpus, the Applied Linguistics & TESOL Corpus (ALTEC), that can be used to analyze language produced by English language learners in an ESL instructional environment.

The Community Language Program (CLP) (https://www.tc.columbia.edu/communitylanguage) is operated by the Applied Linguistics program at Teachers College. It offers English language courses at seven different proficiency levels. Video cameras and microphones will record regular adult English language instruction at all proficiency levels. In addition, essays written during the placement exam will be included in the corpus together with additional placement test scores. These data are collected as part of the intake process for the English language courses, which all students complete.

Phase 1: Data Collection

March & April 2023 - ongoing

Video and audio recordings of classroom interaction at Elementary, Intermediate, and Advanced levels. See progress in the chart below (updated 3-15-2024).

Phase 2: Transcription

December 2023 - ongoing

The classroom recordings are transcribed using tools such as Whisper for automatic speech-to-text conversion and ELAN for linguistic annotation, among others. See progress in the chart below (updated 3-15-2024).
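
As an illustration only, the hand-off from speech-to-text output to annotation might look like the sketch below. It is not the project's actual pipeline. It assumes transcript segments shaped like the ones the open-source Whisper Python package returns from transcribe() (a list of dictionaries with "start", "end", and "text" keys) and writes them as tab-delimited rows, a format ELAN can read through its CSV / tab-delimited text import. The function name, the tier label, and the example segments are hypothetical.

```python
import csv
import io


def segments_to_elan_tsv(segments, tier="Learner"):
    """Convert Whisper-style segments to tab-delimited rows.

    Each row holds: tier name, start time (s), end time (s), transcript text.
    The column order is an assumption; ELAN's import dialog lets the user
    map columns to tier, begin time, end time, and annotation.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            # Skip empty segments (for example, stretches of silence).
            continue
        writer.writerow([tier, f"{seg['start']:.3f}", f"{seg['end']:.3f}", text])
    return buffer.getvalue()


# Hypothetical segments standing in for real Whisper output, e.g.
# whisper.load_model("base").transcribe("lesson.wav")["segments"].
example_segments = [
    {"start": 0.0, "end": 2.4, "text": " Good morning, everyone."},
    {"start": 2.4, "end": 5.1, "text": " Today we will practice introductions."},
]

tsv_text = segments_to_elan_tsv(example_segments, tier="Teacher")
print(tsv_text)
```

Keeping the timestamps alongside the text means each automatically generated line can be checked and corrected against the recording once it is loaded into the annotation tool.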

Phase 3: Analysis

Research Assistants

Xin Li

Language Assessment & Gamification 

Graduate Student

Shamini Shetye

Applied Linguistics (Second Language Acquisition and NLP)

Graduate Student

Hao Yu

Second Language Acquisition & Technology

Graduate Student

Xinhui Xu

Assessment Technology & Data-driven Learning

Certificate Program Student

Yilin Zhang

Conversational AI & Virtual Reality

Graduate Student