
Teaching with Concordancers: just imagine!

Imagine a group of intermediate ESL students being able to present the following to their class: “based on the evidence we have before us, the way how to do something is not used in English, whereas the way to do something or a way of doing something are both very frequent”. Native speakers do not produce way how, because they have not encountered it, whereas way how is sometimes heard in interlanguage because of mother tongue interference.

Imagine writing a letter in English if it is not your first language and you suddenly find that you don’t know which preposition to use after dream: is it dream of or dream about? You think you have heard both and now you have to make a choice, but based on what criteria? Can you recall enough occurrences of dream to resolve this issue?

Imagine you are a teacher of English as a foreign language and you are marking your students’ written work. By the time you have encountered I and my brother and we with my sister a dozen times, you start to wonder if it isn’t right. How do you check this?

Imagine not being sure whether whether can be used without or not or not.

Imagine having a corpus of millions of words of English text, and a special computer program that searches the corpus and presents you with lines of text that contain your search item so that you can see way how, dream, we with my brother/sister and whether in context. If the concordancer does not find any way how to, or we with my brother in its millions of words, you can be as confident as the above-mentioned group of intermediate students that it cannot be regarded as probable English. If it finds a significant number of the way + to-infinitive and my brother/sister and I, you can be equally confident that the pattern is probable English.
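
To make the idea concrete, here is a minimal sketch of what such a program does, written in Python. It is not the Cobuild software: the function name `concordance`, the `width` parameter and the three-sentence sample text are all invented for illustration, and a real corpus would hold millions of words rather than a few sentences.

```python
import re

def concordance(text, query, width=30):
    """Return keyword-in-context lines for every occurrence of `query`.

    Assumption: `query` is a plain word or phrase; matching is
    case-insensitive and limited to whole words.
    """
    # Collapse line breaks and repeated spaces so contexts read as running text.
    text = " ".join(text.split())
    words = [re.escape(word) for word in query.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the search item forms a column.
        lines.append(f"{left:>{width}}  {match.group()}  {right}")
    return lines

# Invented sample text standing in for a corpus.
sample_corpus = """
She showed me the way to do it properly. There is always a way of doing
things better. He asked whether we knew the best way to get there.
"""

for line in concordance(sample_corpus, "way to"):
    print(line)

# An empty result is the kind of negative evidence described above.
print(len(concordance(sample_corpus, "way how to")))
```

Because the search item is lined up in the middle of each line, the words to its left and right can be scanned vertically, which is what makes patterns visible at a glance.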

Here are a few lines extracted from the Cobuild Corpus Sampler (as discussed below).


get cocky or you'll find out the hard  way how   long you can hold your breath.
     in the host country. Put another  way, how  much of your salary will you
          but did not learn along the  way how   to talk about sex.  Behavioral
understanding in some deep, emotional  way how   they could work. It made him a

None of these examples illustrate linking way to a verb group, i.e. * the way how to do something.

normal for you to have this recurring  dream about  her former therapist the
      She reported that she had had a  dream about  some slasher stalking you
  Wales tomorrow, he will realise his  dream of     tea with a member of the Royal
        Most architects can only ever  dream of     pulling off such a stunt as

The first two extracts exemplify having a dream about something and the last pair exemplify dream of in the sense of an ambition.

All of these imaginings can be realized in a variety of ways. Fundamental to all of them is a computer, since all corpus linguistics work, whether it be research in phonology, semantics, syntax, pragmatics or language acquisition, for example, starts with looking into text corpora with a concordancer, the program which searches the corpora according to your criteria.

The use of concordancers for pedagogical purposes was pioneered most notably by Tim Johns (Birmingham University), who coined the term Data Driven Learning (DDL) for it. DDL derives from both the linguistic and pedagogical streams emanating from Birmingham University in the early 1980s. The high international profile of this work was achieved largely through the Cobuild dictionary (John Sinclair, Patrick Hanks, et al.) and related course books, resource books and lexicographic products. The introduction to this dictionary makes for fascinating reading.

Stemming from the English lexico-grammar tradition (Firth, Halliday, Sinclair), DDL allows the student to focus on words, be they lexical or functional, and explore their use in the language. The grammar patterns that particular words operate within, e.g. dream, or a word’s typical collocations, can be discovered through searching a corpus. Furthermore, many grammatical structures in English are realised through words: for example, have + been + -ing forms the present perfect continuous; if + noun phrase + have + past participle forms a conditional clause; and have + possessive pronoun + noun phrase + past participle forms the causative. In many languages, morphology does this work.

Here are two concordance extracts for each of the above three structures:

For the past month and more I  have been  enjoying a magnificent display of
 so far. European governments  have been  fighting back since the early 1980s

I wouldn't have minded if the leaflet  [had]   said that. But to ask
  last night wondering if her husband  [had]   got away with murder.

       perhaps seventeen years. I can  [have]  it checked for you." That won't
        that her repeated requests to  [have]  him removed from her class had been
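
Structures that are realised through words lend themselves to pattern searches. The following is a rough sketch only, covering two of the three structures; the names `PATTERNS` and `find_structures` and the sample sentences are invented for illustration. It relies on spelling alone, so it will miss irregular participles and will wrongly accept words such as something after have been; a serious search would need a corpus tagged for part of speech, which is also why the conditional pattern is not attempted here.

```python
import re

# Surface patterns only: no part-of-speech information is used.
PATTERNS = {
    # have/has + been + word ending in -ing
    "present perfect continuous": re.compile(
        r"\b(?:have|has)\s+been\s+\w+ing\b", re.IGNORECASE
    ),
    # have + object pronoun + regular (-ed) past participle
    "causative": re.compile(
        r"\b(?:have|has|had)\s+(?:it|him|her|them|me|us|you)\s+\w+ed\b",
        re.IGNORECASE,
    ),
}

def find_structures(text):
    """Return, for each named pattern, the list of matching strings in `text`."""
    return {
        name: [match.group() for match in pattern.finditer(text)]
        for name, pattern in PATTERNS.items()
    }

# Invented sample sentences.
sample = (
    "We have been waiting for an hour. "
    "She has been working here since May. "
    "You can have it checked tomorrow."
)

for name, matches in find_structures(sample).items():
    print(name, matches)
```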

Discovery learning, to take a pedagogical point of view, has a long and honourable tradition. DDL is in fact discovery learning par excellence: it does not in itself contain the answers but provides the evidence, allowing students and teachers to draw their own conclusions about such things as collocation, complementation, connotation, frequency and morphology, some of the things we speak of when defining ‘what it is to know a word’. At the syntactic and discourse levels of language, systematic observations of word order, cohesion and punctuation can be made. In DDL it is usual for students then to check their findings against grammars, dictionaries and experts. As a by-product of these investigations, students expose themselves to a great deal of authentic language, that is, input. Students do not undertake an investigation for its own sake; DDL is a method of problem solving which starts with a research question, such as those we saw at the opening of this article.

Michael Lewis’ The Lexical Approach (1993) divides grammar for pedagogical purposes into Patterns, Facts and Choices. Patterns are the regularities within a language, such as morphological forms (plural -s, past -ed), word order (SVOMPT) and the gerund following prepositions (which occurs in approximately 3% of all uses of prepositions). Facts, on the other hand, pertain to individual items: the use of the subjunctive with certain words, the irregular forms of some high frequency verbs, the comparative and superlative forms of some high frequency adjectives, and the irregular plurals of some mostly low frequency nouns all have to be learned as they are.

Choice presents the greatest challenge for non-native speakers because selecting the most probable form from the possible forms involves weighing up the constraints on each one: Caudery (1998) points out that one needs “the ability to select the appropriate content and language to suit the communicative task on hand”. In contrast to facts and patterns, this is not a black and white issue. Jean Aitchison (1989) refers to the psychological trigger that performs this choice in native speakers. Non-native speakers, however, draw on whatever criteria they have available to them, and thus the choice is made more consciously.

Furthermore, the constraints on the possible verbal realisations of a proposition mean that no two renderings can have exactly the same meaning: they may well convey the same propositional content, but differences remain. Transformation exercises which begin with an instruction to rewrite a sentence so that it has exactly the same meaning are therefore disingenuous, e.g.

I’m sorry that I didn’t manage to call you last night.

I regret …

Choice also pertains to choosing le mot juste from among synonyms. For example, discover, ascertain, determine, unearth and learn all mean ‘to find out what one previously did not know’ (Merriam-Webster 1995), although each word has further meaning components that distinguish it from the others.

Thus whenever language is produced, patterns and facts are constantly being invoked, and choices are being made on the basis of whatever criteria the speakers have at their disposal. The data on which all this real-time processing is based derive from the individual’s processing of whatever language input they have been exposed to: for native speakers the input is random and massive, whereas for most non-native speakers selected facts and patterns have constituted a structured course in which the input is typically small and often stripped of reality. A corpus gives almost instant access to language phenomena that are by and large otherwise inaccessible; in addition, they appear in authentic contexts, albeit disconnected ones. It is not always possible to recall enough examples of dream of/about or other language features on which to base the right choice for a particular context, and this is where a screen of concordance lines can provide the necessary data.
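
Where the question is which of two forms is more frequent, the concordance lines can also be counted rather than read. The sketch below tallies the word that immediately follows a search item such as dream; the function name `right_collocates` and the sample sentences are invented for illustration, and the resulting counts say nothing about English beyond this toy text.

```python
import re
from collections import Counter

def right_collocates(text, forms):
    """Count the word immediately following any of the given word forms.

    Assumption: `forms` lists the spellings to search for explicitly,
    e.g. ["dream", "dreams"]; no lemmatisation is attempted.
    """
    alternatives = "|".join(re.escape(form) for form in forms)
    pattern = re.compile(rf"\b(?:{alternatives})\s+(\w+)", re.IGNORECASE)
    return Counter(match.group(1).lower() for match in pattern.finditer(text))

# Invented sample sentences standing in for a corpus.
sample = (
    "I had a dream about my old school. "
    "She dreams of becoming a pilot. "
    "He had a strange dream about flying. "
    "They dream of a better life. "
    "It was a dream come true."
)

counts = right_collocates(sample, ["dream", "dreams"])
for word, frequency in counts.most_common():
    print(word, frequency)
```

Counts of this kind show only which form is more frequent; deciding which one suits a particular context still means reading the lines themselves, as with the dream about and dream of extracts above.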

From a practical point of view, the question of access to corpora and concordancers has to be considered. In relation to my own DDL work with students and teachers, I have created a website called A Ten-step Introduction to Concordancing through the Collins Cobuild Corpus Concordance Sampler, which is intended to train people in using this online sampler programme and to guide them towards asking valuable and answerable questions. The ten steps involve undertaking dozens of tasks that demonstrate search strategies and at the same time address meaningful questions for learners of English. The Cobuild Sampler is just that, a sampler, which means that there are important corpus linguistics tasks which cannot be undertaken with it. But it does serve as a very adequate introduction to concordancing, and its limitations are not necessarily disadvantageous to students. If more is needed, full access can be negotiated with the publisher. There are other online corpus-concordancers, and CDs with corpora and concordancers are available for purchase. Further information about these can be found through links from the Ten Steps website.

In the classroom, it is not essential that students sit at computers performing searches themselves. Teachers can use concordancers in their preparation, finding real examples and creating exercises and activities. Indeed, it was by being given concordance printouts and tasks that the students quoted in the opening paragraph were able to say, “based on the evidence we have before us …”.

When students are directly interpreting concordance data, an important consideration is the extent of their vocabulary. Since corpora are created from naturally occurring texts, often including transcribed speech, they contain ungraded language, idiomatic language, cultural references and errors! In observing many aspects of language at work, understanding the context can be decisive. Student success, as usual in everything we do, can depend on the task. Links to DDL tasks can be found at the Ten Steps site.

The use of corpora in language teaching is spreading across the world. TALC (Teaching and Language Corpora) held its fifth biennial conference in Bertinoro, Italy in summer 2002, bringing together applied linguists and language education professionals from four continents to share their experiences.

For the many of us who now have ready access to computers connected to the internet, studying language through language is a very real possibility.     




Author: James Thomas

The Proceedings of the ATECR 3rd International and 7th National Conference, pp. 134-138, 2003.


Aitchison, Jean (1989) The Articulate Mammal. 3rd Edition. Unwin Hyman

Caudery, Tim (1998) Increasing students' awareness of genre through text transformation exercises: An old classroom activity revisited. In TESL-EJ Vol. 3, No. 3

Collins Cobuild Corpus Concordance Sampler

Harley, T. A. (1995) The Psychology of Language. Psychology Press, UK

Lewis, Michael (1993) The Lexical Approach. Hove: Language Teaching Publications.

Merriam-Webster (1995) Merriam-Webster’s Pocket Guide to Synonyms. Merriam-Webster

Thomas, James (ed.) (2000) A Guide to the Entrance Procedures of the English Department. Masarykova univerzita v Brně

Thomas, James (2002) A Ten-step Introduction to Concordancing through the Collins Cobuild Corpus Concordance Sampler