ALTEC Learner Corpus
Development of a corpus of authentic written and spoken language produced by learners of English at different proficiency levels
Purpose
Using audio and video recording technologies, this project will collect authentic spoken and written language, along with physical gestures, produced by English language learners at a variety of proficiency levels. The result will be a learner corpus, the Applied Linguistics & TESOL Corpus (ALTEC), that can be used to analyze the language of English language learners in an ESL instructional environment. The Community Language Program (CLP) (https://www.tc.columbia.edu/communitylanguage), operated by the Applied Linguistics program at Teachers College, offers English language courses at seven different proficiency levels. Video cameras and microphones will record regular adult English language instruction at all proficiency levels. In addition, essays written during the placement exam will be included in the corpus together with additional placement test scores. The essays and placement tests are part of the intake process for the English language courses and are completed by all students.
Phase 1: Data Collection
March & April 2023 - ongoing
Video and audio recordings of classroom interaction at the Elementary, Intermediate, and Advanced levels. See progress in the chart below (updated 3-15-2024).
Phase 2: Transcription
December 2023 - ongoing
The educational videos are transcribed with the help of several tools, among them Whisper, an automatic speech recognition model, for voice-to-text conversion, and ELAN, an annotation tool, for linguistic annotation. See progress in the chart below (updated 3-15-2024).
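As an illustration only, one way the Whisper-to-ELAN handoff could work is sketched below. It is not the project's actual pipeline. It assumes the open-source openai-whisper Python package, whose transcription result contains a list of segments with "start" and "end" times in seconds and a "text" field, and it writes those segments as tab-delimited rows, which ELAN can read through its CSV / tab-delimited text import. The function names, the tier name, the column order, and the model size are hypothetical choices made for this sketch.

```python
def segments_to_tsv(segments, tier="Speaker"):
    """Render Whisper-style segments as tab-delimited rows.

    Columns (a hypothetical layout): tier name, start in ms, end in ms, text.
    The column meanings are assigned by hand in ELAN's import dialog.
    """
    rows = []
    for seg in segments:
        # Collapse tabs, newlines, and repeated spaces so each row stays on one line.
        text = " ".join(seg["text"].split())
        if not text:
            continue  # skip segments that contain no words
        start_ms = round(seg["start"] * 1000)
        end_ms = round(seg["end"] * 1000)
        rows.append(f"{tier}\t{start_ms}\t{end_ms}\t{text}\n")
    return "".join(rows)


def transcribe_to_tsv(media_path, tsv_path, model_name="base", tier="Speaker"):
    """Transcribe one recording with Whisper and save the rows for ELAN import.

    Assumes the openai-whisper package and ffmpeg are installed; the model
    size "base" is an arbitrary default for this sketch.
    """
    import whisper  # imported here so the formatter above works without it

    model = whisper.load_model(model_name)
    result = model.transcribe(media_path)
    with open(tsv_path, "w", encoding="utf-8") as handle:
        handle.write(segments_to_tsv(result["segments"], tier=tier))
```

A call such as transcribe_to_tsv("lesson.mp4", "lesson.tsv", tier="Teacher"), with hypothetical file names, would produce a file whose automatic transcript can then be corrected and annotated by hand in ELAN. Keeping the formatting step separate from the Whisper call makes the output layout easy to check without running the speech model.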
Phase 3: Analysis
Research Assistants