
DATA DRIVEN LEARNING

This term was coined by Tim Johns, who worked at Birmingham University during the COBUILD era, the 1980s in particular. Tim's death in 2009 inspired a wealth of tributes, some of which can be found on BU's page here. His DDL webpage is here. And the tribute by Mike Scott, of WordSmith fame, is here.

Tim developed a teaching procedure referred to as the kibbitzer (click for an explanation of this lovely-sounding word). Tim's page is here.


In order to kibbitz, two things are needed: a corpus and a concordancer. The former is an electronic database of text and the latter is a program that searches it, to put it simply.
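To put it even more simply, here is a minimal Python sketch of what a concordancer does. The tiny "corpus" is invented for illustration, and the function names tokenize and concordance are my own, not those of any real tool. Real concordancers work on millions of words and handle punctuation, case and annotation far more carefully.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, phrase, width=4):
    """Return (left, node, right) triples for every occurrence of phrase."""
    target = phrase.lower().split()
    n = len(target)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = tokens[max(0, i - width):i]      # up to `width` words before
            right = tokens[i + n:i + n + width]     # up to `width` words after
            lines.append((" ".join(left), " ".join(target), " ".join(right)))
    return lines

# An invented mini-corpus, standing in for a real database of text.
corpus = (
    "I did not mind at all. It was not at all what we expected. "
    "Thank you. Not at all, it was a pleasure. She is not at all sure."
)

kwic_lines = concordance(tokenize(corpus), "not at all")
for left, node, right in kwic_lines:
    # Right-align the left context so the search phrase lines up in a column.
    print(f"{left:>30}  {node}  {right}")
```

Each printed line shows the search phrase in the middle with a few words of context on either side, which is the keyword-in-context (KWIC) layout that students look at when they kibbitz.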

Relatively few teachers, including university teachers and teacher trainers, use this approach or anything similar. Even so, thousands of corpora have been created in the last few decades and dozens of concordancers, at least, have been programmed. And the work on both is ongoing. One of the main reasons for this is that they facilitate a wide range of linguistic analyses. Without data, we only have intuition about how language works, what the patterns are, what is typical and what is exceptional, how pre-programmed our minds seem to be, etc.

Most, if not all, modern dictionaries are written by linguists working with corpora. They observe patterns of collocation and colligation, contexts, frequencies of words, multi-word units, grammatical patterns, pragmatic uses of language, cultural factors, etc. And then they decide which of their findings they should include in their dictionary. Similar things are being done now as descriptive grammars are coming out.

In contemporary pedagogy, with our belief in discovery learning, task-based learning, learner autonomy, group work, etc., we now like to let the students attempt to answer their own language questions by looking at concordances themselves, through printouts, class demonstrations, and working alone or in small groups at a computer. It is not always easy and it is not suited to all learning styles, but those who get into it report great satisfaction. On being shown corpus data, one member of a class of vets some years ago said, "Thanks for making yourself redundant". This was a dream response.

Problems and Solutions

Much has been written about the problems that computers cause learners, how corpus software is not always user-friendly or intuitive, and how data can be easily misinterpreted. One solution is the use of prepared printouts with selected corpus lines, through which the students can be pointed more directly in the right direction. Alex Boulton is a keen proponent of this hands-off method.

I would like to propose an intermediate solution. Clicking on this URL takes you to the phrase not at all in the BNC via the Sketch Engine. A sample of 100 lines has been taken and they are right-sorted. The URL contains all this information. It is not very pretty and it is certainly not something to type in. This is where a URL shortening tool such as bit.ly comes in handy. This URL will take you to the same page as the full one: http://bit.ly/notatall_100.

Much of the groundwork is therefore done for the students, and once they are in the right place, they can then use some of the other aspects of the concordancer, such as Frequency, Collocation and Sorting.
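For anyone curious about what "a sample of 100 lines, right-sorted" involves, here is a minimal Python sketch. The five concordance lines are invented for illustration, and the function names are hypothetical. This is not how the Sketch Engine does it internally, only the general idea: take a random sample, sort by the words to the right of the search phrase, and then count what comes first on the right.

```python
import random
from collections import Counter

# Hypothetical concordance lines as (left context, node, right context).
lines = [
    ("it was", "not at all", "what we expected"),
    ("she is", "not at all", "sure about it"),
    ("thank you", "not at all", "it was a pleasure"),
    ("he was", "not at all", "sure of himself"),
    ("this is", "not at all", "clear to me"),
]

def sample_lines(lines, k, seed=0):
    """Take a random sample of at most k lines (seeded so it is repeatable)."""
    rng = random.Random(seed)
    return rng.sample(lines, min(k, len(lines)))

def right_sort(lines):
    """Sort lines alphabetically by the context to the right of the node."""
    return sorted(lines, key=lambda line: line[2].lower())

def first_right_word_counts(lines):
    """Count the word that immediately follows the node in each line."""
    return Counter(
        line[2].split()[0].lower() for line in lines if line[2].split()
    )

sampled = sample_lines(lines, 4)
sorted_lines = right_sort(sampled)
for left, node, right in sorted_lines:
    print(f"{left:>12}  {node}  {right}")

# Frequency of the first word to the right, over all five lines.
print(first_right_word_counts(lines).most_common(3))
```

Right sorting brings together lines that continue in the same way, so repeated patterns such as "not at all sure" sit next to each other on the page and are easier for students to notice.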


http://ske.fi.muni.cz/bonito/run.cgi/sortx?q=alc%2C[lc%3D%22not%22|lemma%3D%22%28%3Fi%29not%22][lc%3D%22at%22|lemma%3D%22%28%3Fi%29at%22][lc%3D%22all%22|lemma%3D%22%28%3Fi%29all%22]&q=e50+&q=r100&q=r50;corpname=preloaded%2Fbnc2&viewmode=kwic&attrs=word&ctxattrs=word&structs=g&refs=%3Dbncdoc.genre&pagesize=50&copy_icon=1&gdex_enabled=1&gdexcnt=50&gdexconf=&iquery=not+at+all;skey=rc;sicase=i
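The long URL above can be unpacked with a few lines of Python, using only the standard library. This is a sketch for the curious: it simply decodes the percent-encoded parameters so they can be read. My reading that q=r100 and skey=rc correspond to the 100-line sample and the right sort is an assumption based on the description of this search, not on the Sketch Engine documentation.

```python
from urllib.parse import parse_qs

# The full URL from this page, split across lines only for readability.
url = (
    "http://ske.fi.muni.cz/bonito/run.cgi/sortx?"
    "q=alc%2C[lc%3D%22not%22|lemma%3D%22%28%3Fi%29not%22]"
    "[lc%3D%22at%22|lemma%3D%22%28%3Fi%29at%22]"
    "[lc%3D%22all%22|lemma%3D%22%28%3Fi%29all%22]"
    "&q=e50+&q=r100&q=r50;corpname=preloaded%2Fbnc2"
    "&viewmode=kwic&attrs=word&ctxattrs=word&structs=g"
    "&refs=%3Dbncdoc.genre&pagesize=50&copy_icon=1"
    "&gdex_enabled=1&gdexcnt=50&gdexconf=&iquery=not+at+all;skey=rc;sicase=i"
)

# Everything after the "?" is the query string.
query = url.partition("?")[2]

# This URL separates parameters with both "&" and ";", so normalise to "&"
# before parsing. parse_qs also decodes %xx escapes and "+" as a space.
params = parse_qs(query.replace(";", "&"))

for name, values in params.items():
    print(name, "=", values)
```

Among the decoded values are iquery (the phrase typed in, not at all), corpname (preloaded/bnc2) and the repeated q parameter, whose first value is the search expressed in the corpus query syntax.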

Threshold Level

Students who have achieved the threshold level in English can discover a great deal about how the language works for themselves, especially using such resources as those linked below. These resources also provide a lot more information than dictionaries can: in fact, modern dictionaries are created using such resources and are a distillation of the data. Corpus-based Language Study is a portal that includes links to software, various corpus-related tools and articles. There are also some other language tools of interest to teachers and students of English.


Terminology

Category   Term                    Definition
Language   attested                language that has been created for genuine communication, not as a teaching example
Software   concordance             the lines of text that are produced when a search is done
Software   concordancer            a program that searches corpora and presents the results in various ways
General    corpora                 plural of corpus
General    corpus                  a database of attested language (pl. corpora) assembled according to specific criteria
Software   Corpus Query Language
Language   Lemma
Software   Token
Language   Type
Language   Wordform
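Token and type are listed above without definitions. As a hedged illustration, using the usual corpus-linguistic sense of the terms rather than any wording from this page, tokens are the running words of a text and types are the distinct word forms among them. The sentence and variable names in this short Python sketch are invented for the example.

```python
import re
from collections import Counter

# An invented two-sentence text.
text = "The cat sat on the mat. The dog sat too."

# Tokens: every running word, here lowercased and stripped of punctuation.
tokens = re.findall(r"[a-z]+", text.lower())

# Types: the distinct word forms, with a count of how often each occurs.
type_counts = Counter(tokens)

n_tokens = len(tokens)
n_types = len(type_counts)

print(n_tokens, "tokens,", n_types, "types")
print(type_counts.most_common(2))
```

In this text "the" is one type that accounts for three tokens, which is why the number of types is smaller than the number of tokens.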