Research Objective
The ¡Vale! Corpus seeks to deepen our understanding of Guarani-Spanish bilingualism by exploring the effect of Paraguayans' migration to Spain on language use, language change, and linguistic identity.
A linguistic corpus is a comprehensive collection of texts, video, and audio recordings compiled for analysis and study. Corpora have myriad uses across many disciplines, including:
phonetic analysis
cultural and sociolinguistic analysis (language attitudes, etc.)
historical documentation & language preservation
Some examples of existing corpora are:
Dr. Bittar's CEGPA (housed within the California Language Archive)
The Vale Corpus comprises 48 interviews conducted with Paraguayan immigrants in Spain between September and November of 2024. Participants ranged in age from 22 to 56 years old. Twenty-five of the participants are located in Barcelona, 22 in Madrid, and 1 in Bilbao. Participants were asked about their experiences as migrants, including their language use, the maintenance of Guarani, and the differences between Paraguayan Spanish and Peninsular Spanish.
There are four main instruments involved in the creation of the Vale Corpus:
Bilingual Language Profile (BLP) questionnaire
Adobe Premiere Pro video editing software
Sonix automatic transcription software
ELAN linguistic annotation software
In the past twenty years, the population of Paraguayan-born people living in Spain has skyrocketed from 1,000 to 127,000. Because the majority of these immigrants are bilingual in Guarani (an indigenous language of Paraguay) and Spanish, exploring this ongoing migration contributes to understanding Guarani-Spanish bilingualism while expanding broader linguistic debates about how immigration affects language change and identity.
The first phase of our project involved compiling and organizing our data. This meant entering all of the Bilingual Language Profile (BLP) data and scores into a spreadsheet, as well as editing and transcribing interviews. With the materials and quantitative data organized, we are now focusing on specific data points. We have two focus groups so far, each with four participants. Within each focus group, participants vary by age, total time living in Spain, and Spanish-Guarani language dominance (quantified by the BLP score). These variations will allow us to analyze how such factors shape linguistic identity, language attitudes, and related outcomes.
We will continue to work on the Corpus through this summer, focusing on data processing and web development. The goal is to edit and transcribe all of the interviews and to build a website to house the corpus. Our website will make the corpus a publicly available resource for linguists, applied linguists, sociologists, historians, and other researchers studying modern Spanish-Guarani bilingualism in the immigrant context. Along the way, we will continue to sample the data in order to elaborate on our research objective.