What band size is best for what purposes?

As with so much in testing, the answer to "What band size is best?" depends on the purpose of the testing and on the setting. In general, however ...

  • If you are testing learners’ lexical level in order to match them with lexically appropriate materials, make sure your testing band size matches the band size used for text profiling. If the two band sizes differ, one needs to be a multiple of the other.

  • The narrower the bands, the more detail you get from your data.

  • The lower the learners’ proficiency, the smaller the bands we want to use.

  • The higher the frequency of the vocabulary, the smaller the band you want to use. See the extracts from Kremmel (2016) below. The complete paper is here.

  • If the goal of testing is to establish a lexical mastery level so that (a) learners can be matched with lexically appropriate materials or (b) learners’ progress can be charted, the smaller the bands the better.

  • If the goal is simply to rank learners by their lexical proficiency, larger bands are fine.

  • Convention has led us to use 1,000-word bands. For most learners in a Japanese setting this is too coarse, as Japanese learners often have mastery of only the first 1,000 words.
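The first point above, that mismatched band sizes only work when one is a multiple of the other, can be sketched in a few lines. The function below is purely illustrative (the band sizes and scores are hypothetical, not taken from any particular test): it converts a mastery level expressed as the highest mastered band of a test into the corresponding band of a profiling tool, and refuses incompatible band sizes.

```python
def to_profiling_band(mastery_band: int, test_band_size: int,
                      profile_band_size: int) -> int:
    """Convert a mastery level (highest mastered band of the test, 1-based)
    into the corresponding text-profiling band."""
    # Band sizes are only reconcilable if one divides the other.
    if profile_band_size % test_band_size and test_band_size % profile_band_size:
        raise ValueError("one band size must be a multiple of the other")
    words_known = mastery_band * test_band_size  # e.g. band 3 of 500 = 1,500 words
    # Round down: a partially covered profiling band is not yet mastered.
    return words_known // profile_band_size

# A learner who has mastered the first three 500-word bands (1,500 words)
# has fully mastered only the first 1,000-word profiling band.
print(to_profiling_band(3, 500, 1000))  # -> 1
```

With 500-word test bands and 1,000-word profiling bands the mapping is clean; with, say, 500 and 300 there is no whole-band correspondence, which is the practical reason for the multiple-of-each-other rule.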

Kremmel, B. (2016). Word families and frequency bands in vocabulary tests: Challenging conventions. TESOL Quarterly, 50(4), 976–987.

Extracts from Kremmel (2016):

ASSUMPTION 2: “THE FREQUENCY CONTINUUM SHOULD BE DIVIDED INTO 1,000-WORD BANDS”


Since frequency is a continuum, the question arises how to divide this continuum into bands, at least for operationalisations in vocabulary test development. Arguably, any system of banding or fixed categories will be somewhat arbitrary and its boundaries will unavoidably create anomalies. However, although vocabulary learning does not follow a strict frequency order for each individual word, it certainly does in larger clusters (bands), at least at the higher frequency levels. Therefore, a system of fixed categories makes sense both from an assessment as well as a theoretical and pedagogical perspective.


Dividing the frequency continuum into bands seems particularly important because “all vocabulary is not created equal” (Gardner, 2013, p. 12). However, this insight has not been taken up by vocabulary test developers. Schmitt and Schmitt (2014) argue that we need to reassess the notion of word frequency in relation to teaching and testing value. However, little research has looked into the usefulness of frequency as a sampling criterion across different frequency levels. Frequency clearly has utility for clustering words together into manageable and useful groups for pedagogic and assessment purposes, but this utility (i.e., the clustering power of frequency) might be variable along the frequency continuum. However, no research to date has investigated this empirically. Instead vocabulary test developers have mostly relied on the pragmatic decision to group items together holistically into bands of 1,000-word families and sample in equal amounts from these arbitrary bands. Through the use in tests such as the VLT, the VST, and the CATSS, this convention has become a tradition, arguably only for the sake of being able to work with round numbers.


However, looking at the coverage provided by items in lemmatised frequency lists, it seems clear that bands differ in their importance in terms of the coverage they provide. This awareness should be incorporated in vocabulary test item sampling. The purchasable frequency list from the Corpus of Contemporary American English (COCA) (Davies, 2008), the largest, most up-to-date, systematic collection of texts in English from a variety of genres, including spoken language in the globally most prominent variation of English, can illustrate this. Figure 1 shows the coverage in percent (of the COCA) provided by the lemmas of the COCA frequency list in increments of 500. The top line shows the coverage based on the original list (which includes function and content lemmas), while the bottom line shows the coverage contributed solely by the content lemmas (nouns, verbs, adjectives, adverbs).

The figure above is taken from page 981 of Kremmel (2016).

Figure 1 highlights the need to break up the traditional 1,000- lemma levels into finer grained bands at the high-frequency end of the continuum. These high-frequency lemmas are, based on the coverage they provide, simply more useful and important for language learners. It would thus make sense to sample more, and in more detail, at this end of the frequency continuum. Conversely, it might also make sense to cluster lemmas together in bigger bands toward the lower frequency end, because they are of limited use in the additional coverage they provide.


To consider coverage percentages, we also need to consider function words. Function words are typically not included in vocabulary tests, and the unspoken assumption is that they are simply known. But Figure 1 illustrates that the 127 function words that are found among the first 500 lemmas account for about 40% of the coverage (see also Schmitt & Schmitt, 2014). This means that vocabulary tests are essentially “giving” learners credit for 40% coverage of an average text. Compared with this, we would want the coverage of the content lemmas to also be quite substantial. The content lemmas in the first 500 lemmas account for a considerable 26% coverage. The levels until around 3K add smaller, but meaningful, amounts of coverage, but after this taper off. Given that the function words account for 40% of coverage, it does not make sense to retain 1,000-lemma bands at the low-frequency end of the continuum if they add only a fraction of a percentage to the overall total coverage.
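The coverage analysis behind Kremmel's Figure 1 can be reproduced in outline with any lemmatised, frequency-ranked word list. Here is a minimal sketch; the list below is a made-up Zipf-like toy list, not the actual COCA data, so the numbers it prints illustrate the shape of the curve rather than the real coverage figures.

```python
def coverage_by_band(freq_list, band_size=500):
    """Cumulative % of corpus tokens covered by each successive band
    of a frequency-ranked list of (lemma, count) pairs."""
    total = sum(count for _, count in freq_list)
    coverage, running = [], 0
    for i in range(0, len(freq_list), band_size):
        running += sum(count for _, count in freq_list[i:i + band_size])
        coverage.append(round(100 * running / total, 1))
    return coverage

# Toy Zipf-like list: lemma i occurs with count proportional to 1/i.
toy_list = [(f"lemma{i}", 1_000_000 // i) for i in range(1, 3001)]
print(coverage_by_band(toy_list))
# The first 500-lemma band contributes far more coverage than any later
# band, and the increments taper off - the pattern that motivates finer
# bands at the high-frequency end and coarser ones lower down.
```

Run on a real list (with function words flagged separately), the same loop would reproduce the two curves in Figure 1.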