GENERATIVE ARTIFICIAL INTELLIGENCE TOOLS
Co-authored with Charlene Polio
This chapter discusses how generative artificial intelligence (GenAI) tools, particularly large language models (LLMs) such as ChatGPT, are emerging as powerful aids for applied linguistics research, including data analysis. While much attention has focused on pedagogical applications, we review how GenAI can be leveraged to support various stages of the research process in empirical studies, such as instrument design, automated coding, text annotation, and qualitative data analysis. We address key concerns around validity and reliability as well as ethical considerations related to transparency, data privacy, and potential bias in AI-generated output. Given that research applications of GenAI are still at an early stage, we describe its current capacities and limitations based on emerging empirical research and propose promising directions for future studies.
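As a rough illustration of the automated coding the chapter surveys, the sketch below asks an LLM to assign a single qualitative code to each excerpt, assuming an OpenAI-style chat API; the model name, code set, and excerpt are invented for illustration, not taken from the chapter.

```python
# Minimal sketch of LLM-assisted qualitative coding; the code set, excerpt,
# and model name are illustrative assumptions, not the chapter's materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
CODES = ["motivation", "anxiety", "strategy use", "other"]

def code_excerpt(excerpt: str) -> str:
    """Ask the model to label one excerpt with exactly one code."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # low temperature aids reliability checks across runs
        messages=[{
            "role": "user",
            "content": (
                f"Assign exactly one of these codes to the excerpt: {CODES}. "
                f"Reply with the code only.\n\nExcerpt: {excerpt}"
            ),
        }],
    )
    return response.choices[0].message.content.strip()

print(code_excerpt("I get nervous whenever I have to speak in class."))
```

In practice, such model-assigned codes would be checked against human coding (e.g., via inter-coder agreement) before use, in line with the validity and reliability concerns raised above.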
OPTIMIZING AI FOR ASSESSING L2 WRITING ACCURACY: AN EXPLORATION OF TEMPERATURES AND PROMPTS
Co-authored with Charlene Polio and Adam Pfau
This study investigates the impact of temperature and prompt settings on ChatGPT-4’s performance in assessing second language (L2) writing accuracy. Building on Pfau et al. (2023), we used a corpus of 100 essays by L2 writers of English and examined how three temperature settings (0, 0.7, 1) and two prompt types (defined, undefined) influenced ChatGPT-4’s error detection relative to human coding. Results indicated that ChatGPT-4, while generally underestimating error counts compared to human coders, showed a strong positive correlation with human coding across settings. Notably, prompts with a detailed definition of errors yielded higher correlation coefficients (ρ = 0.826 to 0.859) than those without (ρ = 0.692 to 0.702), suggesting that more detailed prompts enhance ChatGPT-4’s performance. Descriptive statistics showed that with the less detailed prompt, ChatGPT-4’s error detection was nearly identical across temperature settings, whereas with the more detailed prompt, its performance was slightly better at higher temperatures. We discuss the importance of temperature in relation to prompt specificity for reliable L2 writing accuracy assessment and provide suggestions for optimizing AI tools such as ChatGPT-4 for this purpose.
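As a minimal sketch of how the temperature-by-prompt design could be operationalized, the code below crosses two prompt types with the study's three temperature settings via the OpenAI Python SDK; the prompt wording, model name, and essay text are illustrative assumptions, not the study's materials.

```python
# Sketch: crossing prompt specificity with temperature for a GPT-4-class model.
# Prompt wording, model name, and essay are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

UNDEFINED_PROMPT = "Count the grammatical errors in the following essay."
DEFINED_PROMPT = (
    "Count the grammatical errors in the following essay. An error is any "
    "deviation in morphology, syntax, or word choice from standard written "
    "English; do not count spelling or punctuation."
)

def count_errors(essay: str, prompt: str, temperature: float) -> str:
    """Query the model once for an error count under one setting."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=temperature,
        messages=[{"role": "user", "content": f"{prompt}\n\nEssay:\n{essay}"}],
    )
    return response.choices[0].message.content

# Cross the two prompt types with the three temperature settings (0, 0.7, 1).
for prompt in (UNDEFINED_PROMPT, DEFINED_PROMPT):
    for temp in (0.0, 0.7, 1.0):
        print(temp, count_errors("Yesterday I go to the school...", prompt, temp))
```

Counts collected under each setting can then be correlated with human counts (e.g., Spearman's ρ) to reproduce the kind of comparison reported above.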
Co-authored with Jingyuan Zhuang, Ryan Blair, Amy I. Kim, Fei Li, Rachel Thorson Hernández, Luke Plonsky
Studies in Second Language Learning and Teaching, 2023
The importance of academic journals in second language (L2) research is evident on at least two levels. Journals are, first of all, central to the process of disseminating scientific findings. Journals are also critical on a professional level, as most L2 researchers must publish articles to advance their careers. However, not all journals are perceived as equal; some may be considered more prestigious or of higher quality and may, therefore, achieve a greater impact on the field. It is therefore necessary that we understand the identity and quality of L2 research journals, yet very little research (e.g., Egbert, 2007; VanPatten & Williams, 2002) has considered these issues to date. The current study sought to explore L2 journal identity and quality, and the relationship between these constructs. To do so, a database was compiled from three different types of sources: (1) a questionnaire eliciting L2 researchers’ perceptions of the quality and prestige of 27 journals that publish L2 research (N = 327); (2) manual coding of different types of articles (e.g., empirical studies, review papers), data (quantitative, qualitative, mixed), research settings, and authorship patterns (K = 2,024) in the same 27 journals; and (3) bibliometric and submission data such as impact factors, citation counts, and acceptance rates. Descriptive statistics were applied to explore overall quality and prestige ratings as well as publication trends found in each journal. The relationships between those patterns and subjective ratings were also examined. In addition, regression models were built to determine the extent to which perceptions of journal quality and prestige could be explained as a function of journal and article features. We discuss the findings of the study in terms of ongoing debates concerning publication practices, study quality, impact factors, journal selection, and the “journal culture” in applied linguistics.
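For readers unfamiliar with this kind of model, a hypothetical sketch of regressing perceived journal quality on journal and article features follows; the column names and values are invented for illustration, not drawn from the study's database.

```python
# Hypothetical sketch: perceived quality as a function of journal features.
# All variable names and values are invented for illustration.
import pandas as pd
import statsmodels.formula.api as smf

journals = pd.DataFrame({
    "quality_rating":  [4.2, 3.8, 3.1, 4.5, 2.9, 3.6],      # mean survey rating
    "impact_factor":   [3.1, 2.4, 1.2, 3.8, 0.9, 1.8],
    "acceptance_rate": [0.11, 0.18, 0.35, 0.08, 0.42, 0.25],
    "prop_empirical":  [0.80, 0.65, 0.55, 0.90, 0.40, 0.60],  # share of empirical articles
})

# Ordinary least squares: does each feature predict perceived quality?
model = smf.ols(
    "quality_rating ~ impact_factor + acceptance_rate + prop_empirical",
    data=journals,
).fit()
print(model.params)  # fitted coefficients for each journal feature
```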
EXPLORING THE POTENTIAL OF CHATGPT IN ASSESSING L2 WRITING ACCURACY FOR RESEARCH PURPOSES
Co-authored with Adam Pfau and Charlene Polio
Research Methods in Applied Linguistics, 2023
This study investigates ChatGPT's potential for measuring linguistic accuracy in second language writing for research purposes. We processed 100 L2 essays across five proficiency levels with ChatGPT-4 and compared its error identification against manual coding to compute precision and recall. Our findings indicate a strong correlation (ρ = 0.97 using one method and 0.94 using another) between ChatGPT's error detection and human coding, although this correlation diminishes at lower proficiency levels. While ChatGPT infrequently misidentifies errors, it often underestimates the total error count. The study also highlights ChatGPT's limitations, such as issues with output consistency, and provides guidelines for future research applications.
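A minimal sketch of the precision/recall and rank-correlation computations the abstract describes follows, assuming error spans have already been extracted from ChatGPT's output and from the human coding; the example data are invented for illustration.

```python
# Sketch: precision/recall of model-flagged errors against human coding,
# plus Spearman's rho over per-essay counts. All data are invented.
from scipy.stats import spearmanr

# Per-essay error sets (e.g., normalized error strings or character spans).
human_errors = [
    {"go -> went", "a informations"},
    {"she have", "more better"},
    {"peoples", "depend of", "in the other hand"},
    {"is exist"},
]
model_errors = [
    {"go -> went"},
    {"she have", "more better"},
    {"peoples", "depend of"},
    set(),
]

tp = sum(len(h & m) for h, m in zip(human_errors, model_errors))  # agreed errors
fp = sum(len(m - h) for h, m in zip(human_errors, model_errors))  # model-only flags
fn = sum(len(h - m) for h, m in zip(human_errors, model_errors))  # missed errors

precision = tp / (tp + fp)  # proportion of model flags that are real errors
recall = tp / (tp + fn)     # proportion of real errors the model catches

# Spearman's rho over per-essay error counts, as in the reported correlations.
rho, p = spearmanr([len(h) for h in human_errors],
                   [len(m) for m in model_errors])
print(f"precision={precision:.2f}, recall={recall:.2f}, rho={rho:.2f}")
```

In this toy example, precision is perfect while recall falls short, mirroring the abstract's finding that ChatGPT rarely misidentifies errors but underestimates the total error count.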
Despite the growing interest in interlanguage complexity development in study abroad (SA) research, no clear-cut conclusions can be drawn as to whether, and to what extent, learners' interlanguage complexity increases following a sojourn abroad. The current study meta-analyzed the overall effects of study abroad on measured oral and written complexity, as well as the moderating effects of learner demographics, SA contextual features, and outcome measures on the variability of interlanguage complexity effect sizes (Cohen's d). A comprehensive search was conducted to obtain studies that quantitatively documented lexical and syntactic complexity changes during SA through a pre- and post-SA design. A total of 30 independent samples from 28 primary studies involving 602 participants were retrieved and coded for gains and for moderator variables. Results show an overall small effect of study abroad on language complexity development (d = 0.37). In addition, moderator analyses suggest that larger effects are associated with (a) learners at an intermediate proficiency level, (b) learners enrolled in a language study program while abroad, (c) programs that implemented a language pledge, or (d) programs with Mandarin Chinese as the target language. More fine-grained and systematic reporting practices are proposed for future research.
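A worked sketch of the effect-size arithmetic behind such a meta-analysis follows: Cohen's d for a pre/post design and a simple inverse-variance weighted mean. The sample values and the variance formula (a common approximation for within-group designs) are illustrative, not the study's actual data or procedure.

```python
# Sketch: Cohen's d for pre/post gains and a weighted mean effect size.
# Numbers and the variance approximation are illustrative assumptions.
import math

def cohens_d(m_pre, m_post, sd_pre, sd_post):
    """Standardized pre-to-post gain, pooling the two SDs."""
    sd_pooled = math.sqrt((sd_pre**2 + sd_post**2) / 2)
    return (m_post - m_pre) / sd_pooled

# (pre mean, post mean, pre SD, post SD, n) for three invented samples.
samples = [(3.10, 3.45, 0.80, 0.85, 25),
           (2.60, 2.75, 0.70, 0.72, 18),
           (4.00, 4.60, 0.90, 0.95, 30)]

ds, ws = [], []
for m1, m2, s1, s2, n in samples:
    d = cohens_d(m1, m2, s1, s2)
    v = 1 / n + d**2 / (2 * n)  # approximate sampling variance of d
    ds.append(d)
    ws.append(1 / v)            # inverse-variance weight

# Weighted mean effect across samples (fixed-effect style aggregation).
d_bar = sum(w * d for w, d in zip(ws, ds)) / sum(ws)
print(f"weighted mean effect: d = {d_bar:.2f}")
```

Moderator analyses then compare such weighted means across subgroups (e.g., proficiency level or target language), as reported above.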
RECASTS IN SCMC: REPLICATING AND EXTENDING GURZYNSKI-WEISS ET AL. (2016)
The Routledge handbook of second language research in classroom learning, Routledge, 2019
The present chapter reports on an empirical study that replicates and extends an investigation by Gurzynski-Weiss, Al Khalil, Baralt, and Leow (2016), which addressed the variables of textual enhancement (enhanced vs. unenhanced), learner awareness (low vs. high), and type of linguistic item (lexis vs. morphology vs. syntax) in synchronous computer-mediated communication (SCMC) via the use of recasts and think-alouds. Low-intermediate adult L2 learners of Spanish completed three story retell tasks via iChat with an interlocutor while thinking aloud. Interlocutors immediately provided either an enhanced or unenhanced recast for each target error. The present study employs the same research design and participant population, with 40 adult L2 learners of Spanish, while additionally addressing immediate and delayed learning outcomes. The findings indicate that learners tend to be more aware of recasts targeting lexis than those targeting grammar. Furthermore, recasts in SCMC, whether enhanced or unenhanced, aid L2 development, particularly of lexis.