Resources

Multilingual Writing Resources

AI & Multilingual Writing

Learner Corpus Research

COWS-L2H

The Corpus of Written Spanish of L2 and Heritage Speakers (COWS-L2H) is a fully open-access collection of essays written by learners of Spanish as a second or heritage language at the University of California, Davis. It currently contains over 4,000 essays from more than 2,500 students, offering both longitudinal and cross-sectional data. Students have responded to 8 different prompts, and we are continuing to add more. Many of the essays have been manually annotated for specific patterns of learner language use (e.g., grammatical gender agreement). The data are available as plain-text files and in CSV format. We have also part-of-speech tagged the corpus using the FreeLing tagger.
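As a possible first step with the CSV release, the minimal Python sketch below tallies essays per prompt, counts distinct students, and computes mean essay length. It uses only the standard library. The column names `prompt`, `student_id`, and `essay` and the function name `summarize_corpus` are hypothetical. Check the header of the actual file and adjust the constants before use. Splitting on whitespace is only a rough proxy for word count, not a real tokenizer.

```python
import csv
from collections import Counter
from pathlib import Path

# HYPOTHETICAL column names: check the header of the actual COWS-L2H CSV
# file and change these constants to match it.
PROMPT_COL = "prompt"
STUDENT_COL = "student_id"
ESSAY_COL = "essay"


def summarize_corpus(csv_path):
    """Return (essays per prompt, number of distinct students, mean essay length).

    Essay length is measured in whitespace-delimited tokens, a rough
    stand-in for a proper word count.
    """
    prompts = Counter()
    students = set()
    total_words = 0
    n_essays = 0
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            prompts[row[PROMPT_COL]] += 1      # essays written for each prompt
            students.add(row[STUDENT_COL])      # unique writers in the file
            total_words += len(row[ESSAY_COL].split())
            n_essays += 1
    mean_len = total_words / n_essays if n_essays else 0.0
    return prompts, len(students), mean_len
```

For example, `summarize_corpus("essays.csv")` (hypothetical filename) returns a `Counter` of prompts, the number of distinct students, and the mean essay length. Counting essays per student, rather than per prompt, would be a natural next step for separating the longitudinal part of the data from the cross-sectional part.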

Website

Steps for using the corpus

Data Analysis & Coding

I have several scripts in R and Python on GitHub that can serve as a resource for other researchers.

R for Data Science (Wickham & Grolemund, 2017): an incredible reference for working with data in R

Statistics for Linguists: An Introduction Using R (Winter, 2020): a great handbook for statistical analysis in R

NLTK book (Bird, Klein, & Loper, 2009): a foundation for corpus linguistics in Python

Corpora in the Classroom

Data-Driven Learning Resources

On this webpage, you can find slides and a handout for a workshop on using corpora to teach Spanish as a second language (ELE). The following are some of the corpora referenced:

Talk Series

Cluster on Language Research (CLR)

As a co-chair of the CLR in 2021-22, I am very excited to share our events with you. We host a bi-weekly talk series and an annual symposium in May. The talk series is streamed on Zoom for anyone who would like to attend virtually. Don't forget to submit an abstract to the symposium!

CLR Website | CLR Twitter