Douglas Biber

The Lancaster-Northern Arizona Corpus of American Spoken English (LANA-CASE)

Collaborators at NAU: Doug Biber, Jesse Egbert, Tove Larsson, Randi Reppen, Lizzy Hanks

Collaborators at Lancaster University: Tony McEnery, Paul Baker, Vaclav Brezina, Gavin Brookes, Isobelle Clarke, Raffaella Bottini

The goal of this project is to compile a comparable American English counterpart to the widely known Spoken BNC2014 (Love et al., 2017). While there are several spoken corpora that represent specific subsets of the United States population, this corpus will be the first publicly available, large-scale corpus that represents general conversational American English. More details are available on our website and Twitter, @LANA_corpus.



Writing trajectories of grammatical complexity at the university


Collaborators: Shelley Staples, University of Arizona; Bethany Gray, Iowa State University; Jesse Egbert, NAU


Grammatical complexity has been established as a key indicator of language and writing development (Ortega 2003; Bulté and Housen 2014; Crossley and McNamara 2014). The present study uses the Register-Functional approach to complexity (Biber et al. 2022) to compare the development of L1 English and L2 English writers across years of study and disciplines in the British Academic Written English (BAWE) corpus. It also introduces a novel method of examining developmental trajectories that combines inferential statistics and descriptive measures. This method accounts not only for relationships between year of study and the use of linguistic features, but also for the shape of the trajectories and the frequencies of occurrence over time. Statistical analyses reveal significant relationships for most of the complexity features. Means and confidence intervals indicate overall similarities in trends from early undergraduate to graduate-level writing across L1 and L2 English writers. They also reveal key differences in the developmental trajectories, including greater use of phrasal complexity features and less use of clausal features by L2 English writers at early levels, as well as distinct uses of complexity features in certain disciplines.


Staples, S., B. Gray, D. Biber, and J. Egbert. (2022). Writing trajectories of grammatical complexity at the university: Comparing L1 and L2 English writers in BAWE. Applied Linguistics.



Investigating grammatical complexity in L2 English writing research


Collaborators: Bethany Gray, Iowa State University; Shelley Staples, University of Arizona; Jesse Egbert, NAU


We argue that the (socio)linguistic description of grammatical complexity provides a necessary complement to predictive omnibus measures as an analytical approach for the study of student writing proficiency and development. That is, while omnibus measures can be effective for predicting student performance, we argue that a comprehensive grammatical description is required to fully understand and interpret the linguistic characteristics of written texts produced by students. The logic of our argument is simple: descriptions of grammatical complexity in student writing must be based on linguistically interpretable analyses of grammar (including syntactic differences).


We develop this argument from several perspectives, including a survey of the structural/syntactic features relating to the construct of grammatical complexity in English, an overview of corpus-based research showing that these distinctions truly matter for the description of academic writing, and a critical evaluation of the descriptive adequacy of omnibus measures when considered from this linguistic perspective. In summary, although we recognize the utility of omnibus complexity measures for purely predictive purposes (e.g., to assess L2 writing proficiency), we argue that a comprehensive linguistic description of grammatical structures and uses is required to fully understand the characteristics of student texts and the nature of student writing development.


Biber, D., B. Gray, S. Staples, and J. Egbert. (2021). Investigating grammatical complexity in L2 English writing research: Linguistic description versus predictive measurement. Journal of English for Academic Purposes, 46.


Designing and evaluating language corpora

Collaborators: Jesse Egbert, NAU; Bethany Gray, Iowa State University

We define a corpus as a sample of natural texts drawn from a larger population, or target discourse domain. Whereas many other fields have established methods for sampling, most corpus compilers have ignored such methods and focused instead on collecting very large convenience samples. Drawing on theory and methods from other disciplines, as well as on extensive empirical research, we introduce methods and best practices for designing, collecting, and evaluating corpora in terms of the extent to which they are situationally and linguistically representative of the target discourse domain.

Egbert, J., D. Biber, and B. Gray. (2022). Designing and evaluating language corpora: A practical framework for corpus representativeness. Cambridge: Cambridge University Press.


Reconceptualizing register in a continuous situational space

Collaborators: Jesse Egbert, NAU; Daniel Keller, NAU

Corpus-based methods for the quantitative linguistic description of registers are well established. In contrast, situational analyses of registers have been based on qualitative descriptions of categorical situational characteristics. We address this inconsistency by describing the variation among texts and registers in a continuous (quantitative) situational space. We define ‘registers’ as categorical constructs (culturally recognized categories of texts) but propose that they should be described in continuous terms. Such descriptions allow quantitative comparisons of registers, as well as analysis of the extent to which a register is well delimited in terms of its situational characteristics. These ideas were first introduced in Biber and Egbert (2018). In Biber, Egbert, and Keller (2020), we describe how the situational characteristics of texts and registers can be analyzed in a continuous multi-dimensional space. Finally, we propose the analysis of situational text types (categories that are statistically well defined in their situational characteristics) as an approach to describing all texts, including texts that do not belong to a culturally recognized register category. We have now turned our attention to exploring the quantitative relationships between continuous linguistic variables and continuous situational variables using correlation, multiple regression, and canonical correlation analysis.

Biber, D., and J. Egbert. (2018). Register variation online. Cambridge: Cambridge University Press.


Biber, D., J. Egbert, and D. Keller. (2020). Reconceptualizing register in a continuous situational space. Corpus Linguistics and Linguistic Theory, 16(3), 581-616.


Exploring variation among web registers at the intersection of continuous linguistic and situational spaces

Collaborators: Jesse Egbert, NAU; Daniel Keller, NAU

In previous research, we have explored the multi-dimensional patterns of variation among web registers in a continuous space of linguistic variation (see Biber and Egbert 2018) and in a continuous space of situational variation (see Biber, Egbert, and Keller 2020). In this ongoing project, we are bringing those analyses together, exploring how web registers can be described simultaneously with respect to both continuous linguistic and continuous situational parameters. The theoretical contribution of this project is to illustrate how overall descriptions of register variation are more informative when they integrate situational and linguistic analyses, rather than treating the two as sequential steps in the analysis.

Biber, D., and J. Egbert. (2018). Register variation online. Cambridge: Cambridge University Press.

Biber, D., J. Egbert, and D. Keller. (2020). Reconceptualizing register in a continuous situational space. Corpus Linguistics and Linguistic Theory, 16(3), 581-616.


Functional units of conversational discourse

Collaborators: Jesse Egbert, NAU; Stacey Wizner, NAU; Daniel Keller, NAU; Tony McEnery, Lancaster University; Paul Baker, Lancaster University; Frazer Heritage, Lancaster University; Gill Phillips, Lancaster University; Ed Finegan, University of Southern California

Conversations can typically be segmented into multiple parts or units, each characterized by an overarching communicative purpose (e.g., telling a story, expressing an opinion, figuring things out). To investigate linguistic variation across these functional units, as well as possible interactions with demographic speaker variables, we developed a new method to manually segment transcribed conversations into conversation units and code those units for one or more communicative functions. We have completed the development, piloting, and validation phases and are now coding a large sub-sample of the conversational files in the British National Corpus Spoken 2014 (see Egbert et al. 2021).

Egbert, J., S. Wizner, D. Keller, D. Biber, T. McEnery, and P. Baker. (2021). Identifying and describing functional discourse units in the BNC Spoken 2014. Text & Talk, 41(5-6), 715-737.



Recent Books


Biber, D., B. Gray, S. Staples, and J. Egbert. (2022). The Register-Functional approach to grammatical complexity: Theoretical foundation, descriptive research findings, applications. Routledge.


Seoane, E., and D. Biber (Eds.). (2021). Corpus-based approaches to register variation. Amsterdam: John Benjamins.


Biber, D., S. Johansson, G. Leech, S. Conrad, and E. Finegan. (2021). Grammar of spoken and written English. Amsterdam: John Benjamins.


Egbert, J., T. Larsson, and D. Biber. (2020). Doing linguistics with a corpus: Methodological considerations for the everyday user. Cambridge: Cambridge University Press.


Recent Articles


Goulart, L., D. Biber, and R. Reppen. To appear (2022). In this essay, I will…: Examining variation of communicative purpose in student written genres. Journal of English for Academic Purposes.

Egbert, J., and D. Biber. To appear (2023). Key feature analysis: A simple, yet powerful method for comparing text varieties. Corpora, 18(1).

Larsson, T., J. Egbert, and D. Biber. (2022). On the status of statistical reporting versus linguistic description in corpus linguistics: A ten-year perspective. Corpora, 17, 137-157.

Fahy, M., J. Egbert, B. Szmrecsanyi, and D. Biber. (2022). Comparing logistic regression, multinomial regression, classification trees and random forests applied to ternary variables: Three-way genitive variation in English. In O. Schützler and J. Schlüter (eds.), Data and Methods in Corpus Linguistics: Comparative Approaches, pp. 194-223. Cambridge: CUP.

Omidian, T., A. Siyanova-Chanturia, and D. Biber. (2021). A new multidimensional model of writing for research publication: An analysis of disciplinarity, intra-textual variation, and L1 versus LX expert writing. Journal of English for Academic Purposes, 53.

Pan, F., R. Reppen, and D. Biber. (2020). Methodological issues in contrastive lexical bundle research: The influence of corpus design on bundle identification. International Journal of Corpus Linguistics, 25, 214–228.

Egbert, J., and D. Biber. (2020). It’s just words folks. It’s just words: Donald Trump’s distinctive linguistic style. In M. Eitelmann and U. Schneider (eds.), Linguistic inquiries into Donald Trump’s language: From ‘fake news’ to ‘tremendous success’, pp. 17-40. London: Bloomsbury.

Biber, D., and J. Egbert. (2020). Orality on the searchable web: A comparison of involved web registers and face-to-face conversation. In E. Jonsson and T. Larsson (eds.), Voices of the past and present – studies of involved, speech-related and spoken texts, pp. 315-334. Amsterdam: John Benjamins.

Gray, B., and D. Biber. (2020). Corpus-based discourse analysis. In K. Hyland and B. Paltridge (eds.), The Continuum Companion to Discourse Analysis. London: Continuum.

Biber, D. (2020). Corpus analysis of spoken discourse. In O. Kang, S. Staples, K. Yaw, & K. Hirschi (eds.), Proceedings of the 11th Pronunciation in Second Language Learning and Teaching conference, pp. 5–7. Ames, IA: Iowa State University.

Goulart, L., B. Gray, S. Staples, A. Black, A. Shelton, D. Biber, J. Egbert, and S. Wizner. (2020). Linguistic perspectives on register. Annual Review of Linguistics, 6, 435–455.

Egbert, J., B. Burch, and D. Biber. (2020). Lexical Dispersion and Corpus Design. International Journal of Corpus Linguistics, 25(1), 89-115.

Biber, D., R. Reppen, S. Staples, and J. Egbert. (2020). Exploring the longitudinal development of grammatical complexity in the disciplinary writing of L2-English university students. International Journal of Learner Corpus Research, 6, 38-71. https://doi.org/10.1075/ijlcr.18007.bib