Main ongoing projects (see full list here):
(a) The Lancaster-Northern Arizona Corpus of American Spoken English (LANA-CASE)
Collaborators at NAU: Lizzy Hanks, Jesse Egbert, Doug Biber, Randi Reppen
Collaborators at Lancaster University: Tony McEnery, Paul Baker, Vaclav Brezina, Gavin Brookes, Isobelle Clarke, Raffaella Bottini
The goal of this project is to compile a comparable American English counterpart to the widely known Spoken BNC2014 (Love et al., 2017). While there are several spoken corpora that represent specific subsets of the United States population, this corpus will be the first publicly available, large-scale corpus that represents general conversational American English. More details are available on our website and on X and other social media (@LANA_corpus).
Hanks, E., McEnery, T., Egbert, J., Larsson, T., Biber, D., Reppen, R., Baker, P., Brezina, V., Brookes, G., Clarke, I., & Bottini, R. (2024). Building a spoken corpus of American English conversation: Challenges and innovations in corpus compilation. Research in Corpus Linguistics, 12(2), 24-44.
(b) On cumulative knowledge building and the importance of linguistics in corpus linguistics
Due to a variety of factors, doing linguistics with a corpus is arguably becoming increasingly challenging. We might think that the fact that we have access to more corpora, more corpus tools, and more statistical methods than ever before would make it easier, but instead, it seems as if we are witnessing a trend away from language and linguistic description toward increased focus on statistical reporting. As linguists, we find this trend troubling. In a series of publications, we explore some ways for us to improve how we approach linguistic research questions with quantitative corpus data. We also talk about how to explicitly build on linguistic findings of prior studies in a cumulative manner.
We have a blog where we write about methodological issues related to corpus linguistics: https://linguisticswithacorpus.wordpress.com/.
Larsson, T., & Biber, D. (2025). Encouraging cumulative knowledge building as normal practice in (learner) corpus research. International Journal of Learner Corpus Research, 11(1), 1-16.
Larsson, T., & Biber, D. (2024). On the perils of linguistically opaque measures and methods: Toward increased transparency and linguistic interpretability. In P. Crosthwaite (Ed.), Corpora for Language Learning: Bridging the Research-Practice Divide (pp. 131-139). Taylor & Francis.
Larsson, T., Egbert, J., & Biber, D. (2022). On the status of statistical reporting versus linguistic description in corpus linguistics: A ten-year perspective. Corpora, 17(1).
Larsson, T., Egbert, J., & Biber, D. (2021). Do corpus linguists focus on statistics at the expense of linguistic description? A ten-year perspective. In B. Busse, N. Dumrukcic, & R. Möhlig-Falke (Eds.), Language and linguistics in a complex world: Data, interdisciplinarity, transfer, and the next generation (Extended book of abstracts from ICAME 41). Cologne: USB Cologne.
Egbert, J., Larsson, T., & Biber, D. (2020). Doing linguistics with a corpus: Methodological considerations for the everyday user. Cambridge Elements in Corpus Linguistics. Cambridge: Cambridge University Press.
(c) Grammatical complexity from a register-functional perspective
Grammatical complexity is defined as the addition of optional structural elements to ‘simple’ phrases and clauses (Biber et al., 2022). We are hoping to learn more about its status as a multidimensional construct by using a confirmatory framework.
Biber, D., & Larsson, T. (in press). Accounting for the entire system of complexity features: Evidence for general oral versus literate grammatical complexity dimensions. Corpus Linguistics and Linguistic Theory.
Biber, D., Gray, B., Larsson, T., & Staples, S. (2025). Grammatical analysis is required to describe grammatical (and “syntactic”) complexity: A commentary on “Complexity and difficulty in Second Language Acquisition: A theoretical and methodological overview”. Language Learning, 75(2), 575-581. https://doi.org/10.1111/lang.12683
Biber, D., Larsson, T., Hancock, G. R., Reppen, R., Staples, S., & Gray, B. (2025). Comparing theory-based models of grammatical complexity in student writing. International Journal of Learner Corpus Research, 11(1), 145-177.
Larsson, T., Biber, D., & Hancock, G. R. (2024). On the role of cumulative knowledge building and specific hypotheses: The case of grammatical complexity. Corpora, 19(3), 263–284.
Biber, D., Larsson, T., & Hancock, G. R. (2024). The linguistic organization of grammatical text complexity: Comparing the empirical adequacy of theory-based models. Corpus Linguistics and Linguistic Theory, 20(2), 347-373.
Biber, D., Larsson, T., & Hancock, G. R. (2024). Dimensions of text complexity in the spoken and written modes: A comparison of theory-based models. Journal of English Linguistics, 52(1), 65-94.
Larsson, T., Berber Sardinha, T., Gray, B., & Biber, D. (2023). Exploring early L2 writing development through the lens of grammatical complexity. Applied Corpus Linguistics, 3(3), 100077.
(d) Investigations of L2 spoken and written production
Two key findings of corpus linguistic research over the years are (a) that language is highly patterned and (b) that register plays an important part in language production. In a series of projects, my collaborators and I explore different aspects of L2 spoken and written production using corpus linguistic methods to better understand these phenomena.
(i) The impact of self-initiated, Extramural English on L2 writing development
Collaborators: Henrik Kaatari (University of Gävle), Ying Wang (Karlstad University), Pia Sundqvist (University of Oslo), Terry Kim (Northern Arizona University)
While classroom instruction remains an important source of L2 development strategies for students, extramural activities such as gaming and social media have been found to play an important role as well. Using a new corpus of high school student writing, the Swedish Learner English Corpus (SLEC; Kaatari, Wang, & Larsson, 2024), we explore the role of extramural activities for students’ English language development. More information about the project can be found on the project website.
Kim, T., Larsson, T., Kaatari, H., Wang, Y., & Sundqvist, P. (in press). The Effects of Extramural English Reading on Phraseology in L2 Writing: A Key Phrase Frames Approach. System.
Wang, Y., Kaatari, H., Larsson, T., Xiong, H., & Liu, F. (in press). Introducing the Chinese Learner English Corpus (CLEC): A resource for exploring the role of extramural activities and beyond in L2 writing. Studies in Second Language Acquisition.
Kaatari, H., Larsson, T., Wang, Y., Acikara-Eickhoff, S., & Sundqvist, P. (2023). Exploring the effects of target-language extramural activities on students’ written production. Journal of Second Language Writing, 62, 101062.
Kaatari, H., Wang, Y., & Larsson, T. (2024). Introducing the Swedish Learner English Corpus: A corpus that enables investigations of the impact of extramural activities on L2 writing. Corpora, 19(1), 17-30.
(ii) Register in L1 and L2 writing
While the importance of taking register variation into consideration has been stressed in multiple studies on native-speaker production, it is not yet the norm in learner corpus research to take this variable into account. In a series of papers, my collaborators and I investigate (a) the relative importance of register in learner writing vis-à-vis learner-internal factors such as first-language background, and (b) students’ register awareness. Because register is an important moderating variable, we stress the need to take it into consideration in studies of learner corpus data.
Demir, N. Y., Bartholomew, R., & Larsson, T. (2024). “I’m on retreat and will respond to messages after 7/6”: A register analysis of out-of-office emails. Register Studies, 6(2), 175-199.
Larsson, T., Paquot, M., & Biber, D. (2021). On the importance of register in learner writing: A multi-dimensional approach. In E. Seoane & D. Biber (Eds.), Corpus-based approaches to register variation. Amsterdam: Benjamins.
Larsson, T., & Kaatari, H. (2020). Syntactic complexity across registers: Investigating (in)formality in second-language writing. Journal of English for Academic Purposes, 45.
Larsson, T. (2019). Grammatical stance marking in student and expert production: Revisiting the informal-formal dichotomy. Register Studies, 1(2), 243–268.
Larsson, T., & Kaatari, H. (2019). Extraposition in learner and expert writing: Exploring (in)formality and the impact of register. International Journal of Learner Corpus Research, 5(1), 33–62.
(iii) Adverb placement in spoken and written L2 production
These projects examine which linguistic and extralinguistic factors affect adverb placement in L1 and L2 spoken and written production.
Larsson, T., Callies, M., Dixon, T., Hasselgård, H., Hober, N., Laso, N. J., Van Vuuren, S., Verdaguer, I., & Paquot, M. (2025). Adverb placement in L1 and L2 spoken production: The effect of linguistic and extralinguistic factors. International Journal of Corpus Linguistics, 30(1), 79-105.
Özkan Miller, V., & Larsson, T. (2024). The effect of linguistic and extralinguistic features on EFL adverb placement: A partial replication study of Larsson et al. (2020). International Journal of Learner Corpus Research, 10(2), 338-364.
Hober, N., Dixon, T., & Larsson, T. (2023). Toward increased reliability and transparency in projects with manual linguistic coding. Corpora, 18(2), 245-258.
Larsson, T., Paquot, M., & Plonsky, L. (2020). Inter-rater reliability in learner corpus research: Insights from a collaborative study on adverb placement. International Journal of Learner Corpus Research, 6(2), 237-251.
Larsson, T., Callies, M., Hasselgård, H., Laso, N. J., Van Vuuren, S., Verdaguer, I., & Paquot, M. (2020). Adverb placement in EFL academic writing: Going beyond syntactic transfer. International Journal of Corpus Linguistics, 25(2), 155-184.
(e) Research methods and research ethics
(i) Benefits of structural equation modeling for corpus linguistics research
Despite recent advancements in statistical techniques used in corpus linguistics, there are still questions pertaining to the multivariate nature of language that our current methods cannot accommodate. In an effort to expand our analytic repertoire, this project seeks to introduce Structural Equation Modeling (SEM) and discuss its great potential for corpus linguistic analysis. SEM is a powerful analytical framework that encompasses a large set of statistical techniques (e.g., path analysis, confirmatory factor analysis). Compared to traditional approaches, structural equation models are highly flexible in that they not only allow for investigation of a variety of different variables and relations, but also enable examination of relations among unobserved variables (latent variables). Despite these and many other strengths, however, SEM remains largely unknown in corpus linguistics.
Larsson, T., & Hancock, G. R. (in press). What if we want to test for similarities between groups, rather than differences? Equivalence testing techniques for corpus linguistics. Corpora, 21(1).
Larsson, T., & Hancock, G. R. (2024). Exploring potential unknown subgroups in your data: An introduction to finite mixture models for applied linguistics. Research Methods in Applied Linguistics, 3(2), 100117.
Larsson, T., Plonsky, L., & Hancock, G. R. (2022). On learner characteristics and why we should model them as latent variables. International Journal of Learner Corpus Research, 8(2), 237-260.
Larsson, T., Plonsky, L., & Hancock, G. R. (2021). On the benefits of structural equation modeling for corpus linguists. Corpus Linguistics and Linguistic Theory, 17(3), 683-714.
In addition, techniques from this framework are applied in the following studies:
Biber, D., Larsson, T., Hancock, G. R., Reppen, R., Staples, S., & Gray, B. (2025). Comparing theory-based models of grammatical complexity in student writing. International Journal of Learner Corpus Research, 11(1), 145-177.
Biber, D., Larsson, T., & Hancock, G. R. (2024). The linguistic organization of grammatical text complexity: Comparing the empirical adequacy of theory-based models. Corpus Linguistics and Linguistic Theory, 20(2), 347-373.
Biber, D., Larsson, T., & Hancock, G. R. (2024). Dimensions of text complexity in the spoken and written modes: A comparison of theory-based models. Journal of English Linguistics, 52(1), 65-94.
Larsson, T., Biber, D., & Hancock, G. R. (2024). On the role of cumulative knowledge building and specific hypotheses: The case of grammatical complexity. Corpora, 19(3), 263–284.
Kaatari, H., Larsson, T., Wang, Y., Acikara-Eickhoff, S., & Sundqvist, P. (2023). Exploring the effects of target-language extramural activities on students’ written production. Journal of Second Language Writing, 62, 101062.
(ii) Questionable Research Practices: The (un)ethical handling of data in quantitative humanities research
Collaborators: Luke Plonsky (NAU), Scott Sterling (Indiana State University), Merja Kytö (Uppsala University), Kate Yaw (University of South Florida), Margaret Wood (NAU)
Questionable Research Practices (QRPs) are often viewed as the “murky waters” of research ethics. Steneck (2006: 54) describes QRPs as practices that fall between “ideal behavior” and absolute misconduct such as Fabrication, Falsification and Plagiarism (FFP). Applying a mixed-methods approach, the proposed project investigates QRPs in the context of quantitative humanities research (e.g. linguistics, languages, digital humanities) to explore activities that researchers engage in, whether knowingly or unknowingly, that run counter to standards of rigor and transparency. More specifically, we seek to (a) uncover, define, and develop a taxonomy for the range and severity of QRPs faced by quantitative researchers in the humanities; and use this information to (b) survey researchers’ experiences with different QRPs, and (c) assess the extent to which methodological training for PhD students in the field addresses these QRPs. Based on the results of (a–c), one of the outcomes of the project will be a set of humanities-specific materials for researcher training related to QRPs. More information here: https://sites.google.com/view/qrp-humanities/home
Sterling, S., Kytö, M., Plonsky, L., Yaw, K., & Larsson, T. (Forthcoming, 2025). Sampling through the lens of QRPs. Research Methods in Applied Linguistics, 4(3), 100280.
Plonsky, L., Sterling, S., Yaw, K., Larsson, T., & Kytö, M. (Forthcoming, 2025). Expanding the scope of questionable research practices in applied linguistics. Journal of Second Language Studies.
Sterling, S., Yaw, K., Plonsky, L., Larsson, T., & Kytö, M. (Forthcoming, 2025). Investigating researcher perceptions of Questionable Research Practices. Journal of Second Language Studies.
Wood, M., Sterling, S., Larsson, T., Plonsky, L., Kytö, M., & Yaw, K. (2025). Researchers training researchers: Ethics training in quantitative Applied Linguistics. TESOL Quarterly, 59(3), 1077-1833.
Plonsky, L., Larsson, T., Sterling, S., Kytö, M., Yaw, K., & Wood, M. (2024). Developing a taxonomy of ethical decisions in applied linguistics research. In P. I. De Costa, A. Rabie-Ahmed, & C. Cinaglia (Eds.), Ethical issues in applied linguistics scholarship (pp. 10-27). John Benjamins.
Wood, M., Larsson, T., Plonsky, L., Sterling, S., Kytö, M., & Yaw, K. (2024). Addressing Questionable Research Practices in Applied Linguistics: A practical guide. Applied Linguistics Press.
Larsson, T., Plonsky, L., Sterling, S., Kytö, M., Yaw, K., & Wood, M. (2023). On the frequency, prevalence, and perceived severity of questionable research practices. Research Methods in Applied Linguistics, 2(3), 100064.
Yaw, K., Andringa, S., Gass, S., Hancock, G., Isbell, D., Kim, J., Kytö, M., Larsson, T., Plonsky, L., Sterling, S., & Wood, M. (2023). Research in progress: Discussions on the past, present, and future of quantitative research ethics in applied linguistics. Language Teaching, 56, 557–561.
Sterling, S., Plonsky, L., Larsson, T., Kytö, M., & Yaw, K. (2023). Introducing the Delphi method for applied linguistics research. Research Methods in Applied Linguistics, 2(1), 100040.
Yaw, K., Plonsky, L., Larsson, T., Sterling, S., & Kytö, M. (2023). Research ethics in applied linguistics. Language Teaching, 56, 478–494.