Imagine a group of intermediate ESL students being able to present the following to their class: “based on the evidence we have before us, the way how to do something is not used in English, whereas the way to do something or a way of doing something are both very frequent”. Native speakers do not produce way how because they have not encountered it, whereas way how is sometimes heard in learners’ interlanguage as a result of mother-tongue interference.
Imagine writing a letter in English when it is not your first language and suddenly finding that you don’t know which preposition to use after dream: is it dream of or dream about? You think you have heard both and now you have to make a choice, but on what criteria? Can you recall enough occurrences of dream to resolve this issue?
Imagine you are a teacher of English as a foreign language marking your students’ written work. By the time you have encountered I and my brother and we with my sister a dozen times, you start to wonder whether these forms might be right after all. How do you check this?
Imagine not being sure
whether whether can be used without or not or not.
Now imagine having a corpus of millions of words of English text and a special computer program, a concordancer, that searches the corpus and presents you with lines of text containing your search item, so that you can see way how, dream, we with my brother/sister and whether in context. If the concordancer does not find a single instance of way how to or we with my brother in its millions of words, you can be as confident as the above-mentioned group of intermediate students that the form cannot be regarded as probable English. If it finds a significant number of instances of the way + to-infinitive and my brother/sister and I, you can be equally confident that these patterns are probable English.
Here are a few lines extracted from the Cobuild Corpus Sampler (as discussed below).
The first two extracts exemplify having a dream about something and the last pair exemplify dream of in the sense of an ambition.
All of these imaginings can be realised in a variety of ways. Fundamental to all of them is a computer, since all corpus linguistics work, whether it be research in phonology, semantics, syntax, pragmatics or language acquisition, starts with examining text corpora using a concordancer, the program that searches the corpora according to your criteria.
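To make the idea of a concordancer concrete, here is a minimal key-word-in-context (KWIC) sketch in Python. It assumes the corpus is a plain-text string and treats the search item as a literal, case-insensitive phrase; the function name kwic and the sample sentences are invented for illustration and are not taken from the Cobuild Sampler, whose corpus and query options are far richer.

```python
import re


def kwic(text, node, width=30):
    """Return key-word-in-context (KWIC) lines for each match of `node` in `text`.

    Minimal sketch: `node` is matched as a literal, case-insensitive phrase on
    word boundaries; there is no lemmatisation, wildcard search or sorting.
    """
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()].rstrip()
        right = flat[match.end():match.end() + width].lstrip()
        # Right-align the left context so the node words line up in one column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Invented example sentences, not corpus data.
sample = (
    "I had a dream about my old school last night. "
    "She has always had a dream of becoming a doctor. "
    "This is the way to do it."
)

for line in kwic(sample, "dream"):
    print(line)
print(len(kwic(sample, "way how")))  # prints 0: no hits in this toy text
```

A zero count for way how in three invented sentences proves nothing, of course; the argument above rests on the absence of a pattern from millions of words of real text.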
The use of concordancers for pedagogical purposes has been pioneered most notably by Tim Johns (Birmingham University), who coined the term Data Driven Learning (DDL) for it. DDL derives from both the linguistic and the pedagogical streams emanating from Birmingham University in the early 1980s. The high international profile of this work was achieved largely through the Cobuild dictionary (John Sinclair, Patrick Hanks et al.) and related coursebooks, resource books and lexicographic products. The introduction to the dictionary makes fascinating reading.
Here are two concordance extracts for each of the above three structures:
Discovery learning, to take another pedagogical point of view, has a long and honourable tradition. DDL is in fact discovery learning par excellence, since it does not in itself contain the answers; rather, it provides the evidence and allows students and teachers to draw their own conclusions about such things as collocation, complementation, connotation, frequency and morphology: some of the things we speak of when defining ‘what it is to know a word’. At the syntactic and discourse levels, systematic observations of word order, cohesion and punctuation can be made. In DDL it is usual for students then to check their findings against grammars, dictionaries and experts. As a by-product of these investigations, students expose themselves to a great deal of authentic language: input. In DDL, students do not undertake an investigation for its own sake; it is a method of problem solving that starts with a research question, such as those we saw at the opening of this article.
Michael Lewis’ The Lexical Approach (1993) divides grammar for pedagogical purposes into Patterns, Facts and Choices. Patterns are the regularities within a language, such as morphological forms (plural -s, past -ed), word order (SVOMPT) and the gerund following prepositions (as occurs in approx. 3% of all preposition deployment). Facts, on the other hand, pertain to individual items. For example, the use of the subjunctive with certain words, the irregular forms of some high-frequency verbs, the comparative and superlative forms of some high-frequency adjectives, and the irregular plurals of some mostly low-frequency nouns all have to be learned as they are.
Choice presents the greatest challenge for non-native speakers, because selecting the most probable form from the possible forms involves weighing up the constraints on each one: Caudery (1998) points out that one needs “the ability to select the appropriate content and language to suit the communicative task on hand”. In contrast to facts and patterns, choice is not a black-and-white issue. Jean Aitchison (1989) refers to the psychological trigger that performs this choice in native speakers. Non-native speakers, however, draw on whatever criteria they have available to them, and so the choice is made more consciously.
Furthermore, the constraints that exist on the possible verbal realisations of a proposition mean that no two renderings can have exactly the same meaning: they may well convey the same propositional content, but they are not identical. Transformation exercises that instruct students to rewrite a sentence so that it has exactly the same meaning are therefore disingenuous, for example:
I’m sorry that I didn’t manage to call you last night.
I regret …
Choice also pertains to selecting le mot juste from among synonyms. For example, discover, ascertain, determine, unearth and learn all mean ‘to find out what one previously did not know’ (Merriam-Webster 1995), although each word has additional meaning components that distinguish it from the others.
Thus whenever language is produced, patterns and facts are constantly being invoked, and choices are being made on the basis of whatever criteria the speakers have at their disposal. The data on which all this real-time processing is based derives from the individual’s processing of whatever language input they have been exposed to. For native speakers the input is random and massive, whereas for most non-native speakers it has consisted of a structured course of selected facts and patterns, in which the input is typically small and often stripped of reality. A corpus gives almost instant access to language phenomena that are by and large otherwise inaccessible, and presents them in authentic, albeit disconnected, contexts. It is not always possible to recall enough examples of dream of/about, or of other language features, on which to base the right choice for a particular context, and this is where a screen of concordance lines can provide the necessary data.
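The dream of/about question can also be approached by counting what follows the node word. The Python sketch below assumes a plain-text corpus and a simple regular expression for the forms of dream, and tallies the word immediately to the right of each hit. The toy corpus, the pattern DREAM_FORMS and the function right_collocates are invented purely to show the mechanics, so the counts say nothing about real usage. A frequency list also never replaces reading the lines themselves: as noted earlier, dream about something typically describes having a dream, whereas dream of can express an ambition.

```python
import re
from collections import Counter

# Invented toy corpus for illustration only: it is NOT Cobuild data, so the
# counts it produces say nothing about how English is really used.
corpus = """
He had a strange dream about falling. As a child she would dream of flying.
They dreamt of a better life. I dreamed about you again.
It was his lifelong dream to sail round the world. We all dream of peace.
"""

# Forms of DREAM: the base form plus -s, -ed, -t and -ing.
DREAM_FORMS = r"dream(?:s|ed|t|ing)?"


def right_collocates(text, node_pattern):
    """Count the word immediately to the right of each match of node_pattern."""
    pattern = re.compile(r"\b(?:" + node_pattern + r")\s+(\w+)", re.IGNORECASE)
    return Counter(m.group(1).lower() for m in pattern.finditer(text))


counts = right_collocates(corpus, DREAM_FORMS)
print(counts.most_common())  # [('of', 3), ('about', 2), ('to', 1)]
```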
From a practical point of view, the question of access to corpora and concordancers has to be considered. In connection with my own DDL work with students and teachers, I have created a website called A Ten-step Introduction to Concordancing through the Collins Cobuild Corpus Concordance Sampler, which is intended to train people in using this online sampler program and to guide them towards asking valuable and answerable questions. The ten steps involve dozens of tasks that demonstrate search strategies and at the same time address meaningful questions for learners of English. The Cobuild site itself is at http://www.collins.co.uk/Corpus/CorpusSearch.aspx.
The Cobuild Sampler is just that: a sampler, which means that there are important corpus linguistics tasks that cannot be undertaken with it. But it does serve as a very adequate introduction to concordancing, and its limitations are not necessarily disadvantageous to students. If more is needed, full access can be negotiated with the publisher. There are other online corpus concordancers, and CDs containing corpora and concordancers are available for purchase. Further information about these can be found through links from the Ten Steps site.
In the classroom, it is not essential that students sit at computers performing searches themselves. Teachers can use concordancers in their preparation, finding real examples and creating exercises and activities. Indeed, it was by working in groups with concordance printouts and tasks that the students quoted in the opening paragraph became able to say, “based on the evidence we have before us, the way how to do something is not used in English”.
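Some of the teacher preparation mentioned above can be partly automated. As one hypothetical example, a teacher might turn a set of concordance lines into a gap-fill by blanking out the word that follows a node such as dream and keeping the removed words as an answer key. The helper gap_after and the example lines below are illustrative assumptions in Python, not a feature of any particular concordancer.

```python
import re


def gap_after(lines, node, gap="_____"):
    """Blank out the word after the first occurrence of `node` in each line.

    Hypothetical helper: returns (exercise_lines, answer_key). Lines in which
    `node` is not followed by a word are skipped.
    """
    pattern = re.compile(r"\b" + re.escape(node) + r"\s+(\w+)", re.IGNORECASE)
    exercise, key = [], []
    for line in lines:
        match = pattern.search(line)
        if match is None:
            continue
        # Replace only the captured word (group 1), keeping the node itself.
        exercise.append(line[:match.start(1)] + gap + line[match.end(1):])
        key.append(match.group(1))
    return exercise, key


# Invented lines standing in for a concordance printout.
lines = [
    "last night I had a dream about my old school",
    "her dream of becoming a doctor finally came true",
    "this line does not contain the node word",
]
exercise, key = gap_after(lines, "dream")
for number, line in enumerate(exercise, start=1):
    print(f"{number}. {line}")
print("Key:", ", ".join(key))
```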
When students interpret concordance data directly, an important consideration is the extent of their vocabulary. Since corpora are created from naturally occurring texts, often including transcribed speech, they contain ungraded language, idiomatic language, cultural references and errors! When learners observe many aspects of language at work, understanding the context can be decisive. As with everything we do, student success can depend on the task. Links to DDL tasks can be found at the Ten Steps site.
The use of corpora in language teaching is spreading across the world. TALC (Teaching and Language Corpora) held its fifth biennial conference in Bertinoro, Italy, in summer 2002, bringing together applied linguists and language education professionals from four continents to share their experiences.
For the many of us who now have ready access to computers connected to the internet, studying language through language is a very real possibility.
Author: James Thomas
The Proceedings of the ATECR 3rd International and 7th National Conference, pp. 134-138, 2003.
Aitchison, Jean (1989) The Articulate Mammal. 3rd Edition. Unwin Hyman
Caudery, Tim (1998) Increasing students' awareness of genre through text transformation exercises: An old classroom activity revisited. TESL-EJ, Vol. 3, No. 3 [http://www-writing.berkeley.edu/TESL-EJ/ej11/a2.html]
Collins Cobuild Corpus Concordance Sampler [http://titania.cobuild.collins.co.uk/form.html]
Harley, T. A. (1995) The Psychology of Language. Psychology Press, UK
Lewis, Michael (1993) The Lexical Approach. Hove: Language Teaching Publications.
Merriam-Webster (1995) Merriam-Webster’s Pocket Guide to Synonyms. Merriam-Webster
Thomas, James (ed.) (2000) A Guide to the Entrance Procedures of the English Department. Masarykova univerzita v Brně
Thomas, James (2002) A Ten-step Introduction to Concordancing through the Collins Cobuild Corpus Concordance Sampler