Word Counting Units

Word counting units: What are lemmas, flemmas, and word families?

When creating a vocabulary word list from a corpus of texts, the first step is to decide the word counting unit on which the list is based. Tokens are the running words that make up a corpus; they describe its total size and are not used as the basis for corpus-based word lists. To establish the number of tokens in a text or corpus, count every word, and if the same word form occurs more than once, count each occurrence separately. Thus, the question "What is the most common word in the English language?" contains ten tokens, even though two of them are the same word form, "the". The term running words is sometimes used instead of tokens.


Types are the different word forms within a text. Thus, the question "What is the most common word in the English language?" contains nine types, because the two occurrences of "the" are counted only once.
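The difference between tokens and types can be illustrated with a minimal Python sketch. The tokenize helper and its lowercasing and regular-expression rules are simplifying assumptions made for this illustration, not a standard tokenization scheme.

```python
import re

def tokenize(text):
    """Lowercase the text and extract runs of letters/apostrophes as tokens.

    This is a simplified, hypothetical tokenizer for illustration only.
    """
    return re.findall(r"[a-z']+", text.lower())

sentence = "What is the most common word in the English language?"

tokens = tokenize(sentence)   # every running word, repeats included
types = set(tokens)           # each distinct word form counted once

print(len(tokens))  # 10
print(len(types))   # 9
```

Because both occurrences of "the" collapse into a single element of the set, the type count is one lower than the token count.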


The lemma is the next largest commonly used word counting unit. The lemma consists of a headword and its inflected, irregular, and reduced forms (n’t) that are of the same part of speech (Francis & Kučera, 1982). The English inflected forms are the plural, third-person singular present tense, past tense, past participle, -ing, comparative, superlative, and possessive forms. For example, the verb lemma use consists of the verb forms use, uses, used, and using. Swenson and West’s (1934) concept of learning burden underpins the lemma: the learning burden, or difficulty, of inferring the meaning of inflected forms from the base form is assumed to be minor. The flemma, unlike the lemma, groups together not only inflectional forms but also identical forms of different parts of speech. Thus, the flemma use includes the verb forms use, uses, used, and using, and also the noun use (Pinchbeck, 2014).
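The contrast between the lemma and the flemma can be sketched in Python as a difference in grouping keys. The HEADWORD lookup table and the part-of-speech-tagged tokens below are hypothetical data invented for this illustration; a real analysis would rely on a lemmatizer and a tagger.

```python
from collections import Counter

# Hypothetical, hand-made lookup from inflected form to headword.
HEADWORD = {"use": "use", "uses": "use", "used": "use", "using": "use"}

# Hypothetical tagged tokens: (word form, part of speech).
tagged = [
    ("use", "VERB"), ("uses", "VERB"), ("used", "VERB"),
    ("using", "VERB"), ("use", "NOUN"), ("uses", "NOUN"),
]

# Lemma: headword AND part of speech must match.
lemma_counts = Counter((HEADWORD[form], pos) for form, pos in tagged)

# Flemma: only the headword must match; part of speech is ignored.
flemma_counts = Counter(HEADWORD[form] for form, _ in tagged)

print(lemma_counts[("use", "VERB")])  # 4
print(lemma_counts[("use", "NOUN")])  # 2
print(flemma_counts["use"])           # 6
```

Under the lemma, the verb and the noun are counted as two separate words; under the flemma, they are merged into one.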


The word family is the most encompassing of the commonly used word counting units. The word family consists of the base word and the inflectional and derivational forms from Levels 2 to 6 of Bauer and Nation’s (1993) affix criteria. Bauer and Nation’s criteria and first language (L1) studies (Tyler & Nagy, 1989; Nagy et al., 1993) have been used to justify the use of the word family as a general word counting unit in both L1 and L2 settings for creating word lists (Nation, 2006), vocabulary size tests (e.g., Nation & Beglar, 2007), vocabulary levels tests (e.g., McLean & Kramer, 2015), and lexical research (Nation, 2006, 2014). The use word family includes the verb use, the noun use, and the forms misuse, misused, misuser, misusers, misuses, misusing, reusable, reuse, reused, reuses, reusing, unusable, unused, usability, usable, useable, used, useful, usefully, usefulness, useless, uselessly, uselessness, user, users, uses, and using. Bauer and Nation’s criteria, by which English inflectional and derivational affixes were arranged into a graded set of seven levels, were created to guide teaching and learning and to standardize research on vocabulary load and size. Affixes are divided according to their phonological and morphological behavior to determine the level at which a particular affix should be placed, using the following criteria:

· Frequency: The frequency with which an affix occurs.

· Productivity: The probability that the affix will be used to form new words.

· Predictability: The degree of predictability of the meaning of the affix.

· Written base form regularity: The predictability of changes in the written form of the base when the affix is added.

· Spoken base form regularity: The amount of change in the spoken form of the base when the affix is added.

· Affix spelling regularity: The predictability of written forms of the affix.

· Affix form regularity: The predictability of spoken forms of the affix.

· Regularity of function: The degree to which the affix attaches to a base of known form-class and produces a word of known form-class.

There are hundreds of word lists in existence, based on word counting units ranging from the smallest possible unit (the type) to the largest (the word family) (Nation, 2013, 2016). Once a word counting unit is selected, word forms are grouped according to that unit, and the word frequencies are established. The size of each grouping of words depends on the word counting unit used. For example, if the word list is based on the flemma, then instances of useable and other derivational forms are not counted as instances of the word use. Instead, each derivational form is treated as a separate word. In contrast, if the word counting unit is the word family, then every instance of each derivational form of the word use is counted, and the counts are added together to determine the frequency of the word use.
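The effect of the word counting unit on frequency can be sketched in Python as follows. The token list and the FLEMMA and FAMILY lookup tables are hypothetical and hand-made for this illustration; they are not taken from any published word list.

```python
from collections import Counter

# Hypothetical running words from a small corpus.
tokens = ["use", "used", "useful", "user", "users", "usable", "using", "reuse"]

# Hypothetical flemma lookup: inflected forms map to their headword;
# derivational forms are absent, so they remain separate words.
FLEMMA = {"used": "use", "using": "use", "uses": "use", "users": "user"}

# Hypothetical word family lookup: inflected and derived forms all map to "use".
FAMILY = {form: "use" for form in tokens}

def count_by_unit(words, mapping):
    """Group word forms by the given mapping and count each group.

    Forms not found in the mapping are counted under their own form.
    """
    return Counter(mapping.get(word, word) for word in words)

flemma_freq = count_by_unit(tokens, FLEMMA)
family_freq = count_by_unit(tokens, FAMILY)

print(flemma_freq["use"])     # 3  (use, used, using)
print(flemma_freq["useful"])  # 1  (counted as a separate word)
print(family_freq["use"])     # 8  (all forms added together)
```

The same eight tokens yield a frequency of 3 for use under the flemma but 8 under the word family, which is why lists built on different units are not directly comparable.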


References

Bauer, L., & Nation, P. (1993). Word families. International Journal of Lexicography, 6(4), 253–279.

Francis, W. N., & Kučera, H. (1982). Frequency analysis of English usage: Lexicon and grammar. Boston: Houghton Mifflin.

McLean, S., & Kramer, B. (2015). The Creation of a New Vocabulary Levels Test. Shiken, 19, 1–11.

Nation, P. (2006). How large a vocabulary is needed for reading and listening? Canadian Modern Language Review, 63, 59–82.

Nation, P. (2013). Learning vocabulary in another language (2nd ed.). Cambridge, England: Cambridge University Press.

Nation, P. (2014). How much input do you need to learn the most frequent 9,000 words? Reading in a Foreign Language, 26, 1–16.

Nation, P. (2016). Making and using word lists for language learning and testing. John Benjamins Publishing Company.

Nation, P., & Beglar, D. (2007). A vocabulary size test. The Language Teacher, 31, 9–13.

Nagy, W., Diakidoy, I.-A., & Anderson, R. C. (1993). The acquisition of morphology: Learning the contribution of suffixes to the meanings of derivatives. Journal of Reading Behavior, 25(2), 155–170.

Pinchbeck, G.G. (2014). Lexical frequencies profiling of Canadian high school diploma exam expository writing: L1 and L2 academic English. Roundtable presentation at American Association of Applied Linguistics, Toronto, Canada.

Swenson, E., & West, M. (1934). On the counting of new words in textbooks for teaching foreign languages (No. 1). University of Toronto.

Tyler, A., & Nagy, W. (1989). The acquisition of English derivational morphology. Journal of Memory and Language, 28(6), 649–667.


Papers investigating the most appropriate word counting unit amongst different learners

Bauer, L., & Nation, I. S. P. (1993). Word families. International Journal of Lexicography, 6(4), 253-279.

Bertram, R., Laine, M., & Virkkala, M. M. (2000). The role of derivational morphology in vocabulary acquisition: Get by with a little help from my morpheme friends. Scandinavian Journal of Psychology, 41(4), 287-296.

Brezina, V., & Gablasova, D. (2015). Is there a core general vocabulary? Introducing the new general service list. Applied Linguistics, 36(1), 1-22.

Brown, D. (2018). Examining the word family through word lists. Vocabulary Learning and Instruction, 7(1), 51-65.

Brown, D., Stoeckel, T., McLean, S., & Stewart, J. (2020). The most appropriate lexical unit for L2 vocabulary research and pedagogy: A brief review of the evidence. Applied Linguistics. Advance access: https://doi.org/10.1093/applin/amaa061

Kremmel, B. (2016). Word families and frequency bands in vocabulary tests: Challenging conventions. TESOL Quarterly, 50, 976–987.

Laufer, B., & Cobb, T. (2020). How much knowledge of derived words is needed for reading? Applied Linguistics, 41(6), 971-998.

McLean, S. (2018). Evidence for the adoption of the flemma as an appropriate word counting unit. Applied Linguistics, 39(6), 823-845.

Mochizuki, M., & Aizawa, K. (2000). An affix acquisition order for EFL learners: An exploratory study. System, 28(2), 291-304.

Nagy, W., Anderson, R. C., Schommer, M., Scott, J. A., & Stallman, A. C. (1989). Morphological families in the internal lexicon. Reading Research Quarterly, 262-282.

Reynolds, B. L. (2013). Comments on Stuart Webb and John Macalister's "Is text written for children useful for L2 extensive reading?" TESOL Quarterly, 47(4), 849-852.

Reynolds, B. L. (2015). The effects of word form variation and frequency on second language incidental vocabulary acquisition through reading. Applied Linguistics Review, 6(4), 467-497.

Reynolds, B. L., & Wible, D. (2014). Frequency in incidental vocabulary acquisition research: An undefined concept and some consequences. TESOL Quarterly, 48(4), 843-861.

Schmitt, N., & Meara, P. (1997). Researching vocabulary through a word knowledge framework: Word associations and verbal suffixes. Studies in Second Language Acquisition, 17-36.

Ward, J., & Chuenjundaeng, J. (2009). Suffix knowledge: Acquisition and applications. System, 37(3), 461-469.