5

Specialist area: using corpora in the language teaching classroom



A word list showing the top ten most frequent words in the English language
This is an example of a word list, which shows the most frequently occurring words in a corpus. This list focuses on nouns, which can be a useful starting point for anyone beginning to learn a language. Curious fact: 'time' is the most commonly used noun in many languages around the world.
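A frequency list like this one comes from counting every word token in a corpus and sorting the words by count. The minimal Python sketch below assumes a plain-text corpus and a simple regular-expression tokenizer. The names `tokenize` and `frequency_list` are illustrative and are not taken from any corpus tool. Restricting the list to nouns, as in the example above, would also need a part-of-speech tagger, which is left out here.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens.

    Assumption: a 'word' is a run of letters, optionally with internal
    apostrophes (e.g. "don't"); real corpus tools use more careful tokenizers.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def frequency_list(text: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Return the top_n most frequent word tokens with their raw counts."""
    return Counter(tokenize(text)).most_common(top_n)


if __name__ == "__main__":
    sample = "Time flies. We never have enough time, and time is money."
    for rank, (word, count) in enumerate(frequency_list(sample, top_n=3), start=1):
        print(rank, word, count)
```

On a real English corpus, function words such as 'the' and 'of' dominate the top of an unfiltered list. This is why learner-oriented lists are often filtered by part of speech.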

Context

The specialist area I have chosen to focus on is corpus linguistics, a quantitative, computer-aided approach to textual analysis. I was first introduced to using corpora in language teaching as part of my Master's degree in linguistics, just before I started my career in higher education. Since then I have used them to augment my classroom practice and course design.

In language teaching, a corpus refers to a large body of machine-readable text, usually made up of many individual texts, that provides a representative sample of authentic language. One of the best-known corpora is the British National Corpus. It consists of approximately 100 million words and contains texts from a wide range of genres, including newspapers, literature and essays as well as transcribed conversations, thus providing a broadly representative sample of spoken and written British English. The corpus can then be loaded into software that picks out statistically significant patterns in the language. Users can enter words or phrases to see how they're actually used: where they appear in sentences, how frequently they occur, and which words they commonly appear next to. For language learners this represents a valuable opportunity to explore a target language and do things like:

  • extract a series of examples of a particular word or phrase they're interested in learning;

  • find all the most frequent nouns, verbs, adjectives etc. and focus their learning time on these;

  • explore common phrases and natural sounding collocations.
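The first and third activities above rely on two basic operations in corpus software:

  • a key-word-in-context (KWIC) concordance, which lines up every occurrence of a search word with its surrounding context;

  • a collocate count, which tallies the words that appear near it.

The Python sketch below is a simplified illustration under stated assumptions: a regex tokenizer, a fixed context window, and raw co-occurrence counts. The function names are hypothetical. Tools such as the Sketch Engine rank collocates with statistical association measures (for example logDice) rather than with raw counts.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (simplified)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def concordance(tokens: list[str], node: str, width: int = 4) -> list[str]:
    """Return KWIC lines: each occurrence of `node` with up to `width`
    tokens of context on either side, the node word bracketed and aligned."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


def collocates(tokens: list[str], node: str, span: int = 2,
               top_n: int = 5) -> list[tuple[str, int]]:
    """Count the tokens found within `span` positions of each occurrence of `node`.

    Raw co-occurrence counts only: no stop-word filtering or significance test.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts.most_common(top_n)


if __name__ == "__main__":
    text = ("We made a decision. The committee made a final decision. "
            "She made the decision quickly.")
    toks = tokenize(text)
    for line in concordance(toks, "decision"):
        print(line)
    print(collocates(toks, "decision"))
```

Even on this tiny sample, 'made' comes out as the most frequent neighbour of 'decision'. This is the kind of verb-noun collocation ('make a decision') that learners can then check against a full corpus.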

In language courses for higher education students, an important development was the creation of the Academic Word List (AWL) by Averil Coxhead (2000). The AWL was built from a corpus of 3.5 million words of academic writing across a range of disciplines. It offered a list of 570 word families that occurred frequently and across a wide range of academic subject areas but fell outside the most common words of general English. The idea was that by focusing their energies on acquiring these particular words, international students would be better prepared linguistically for higher education in English-speaking institutions. The AWL and the concept of 'academic English' in general have been critiqued and refined over the years, but such lists have had a significant impact on course and materials design. Beyond corpus-informed lists like these, however, during my time in higher education I have developed lessons, study materials and workshops designed to get students interested in using corpora for themselves. This section details:

  • some of the ways in which I have attempted to do this;

  • how my approach to teaching corpora has changed and developed;

  • how I have encouraged and guided other teachers to incorporate these tools into their own teaching.
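Coxhead's selection worked on word families and combined three criteria:

  • specialised occurrence: the word falls outside the most frequent 2,000 words of general English (the General Service List);

  • frequency: at least 100 occurrences in the corpus;

  • range: occurrence in 15 or more of the 28 subject areas.

The toy Python sketch below illustrates only the frequency-and-range filtering idea, and it works on individual word forms rather than word families. The sample data, the thresholds and the `select_academic_words` helper are hypothetical, not Coxhead's.

```python
from collections import Counter


def select_academic_words(
    texts_by_area: dict[str, list[str]],
    general_words: set[str],
    min_freq: int,
    min_areas: int,
) -> list[str]:
    """Return words that are not in `general_words`, occur at least `min_freq`
    times across the whole corpus, and appear in at least `min_areas` subject areas."""
    total = Counter()            # overall frequency of each word
    areas_with_word = Counter()  # number of subject areas each word appears in
    for tokens in texts_by_area.values():
        counts = Counter(tokens)
        total.update(counts)
        areas_with_word.update(counts.keys())
    return sorted(
        word
        for word, freq in total.items()
        if word not in general_words
        and freq >= min_freq
        and areas_with_word[word] >= min_areas
    )


if __name__ == "__main__":
    # Hypothetical mini-corpus: three "subject areas", already tokenized.
    corpus = {
        "biology": "the data suggest a significant analysis of the data".split(),
        "law": "the analysis of the contract was significant".split(),
        "history": "the data and the analysis of the period".split(),
    }
    general = {"the", "of", "a", "and", "was"}  # stand-in for a general word list
    print(select_academic_words(corpus, general, min_freq=3, min_areas=2))
```

In this sample, 'significant' drops out at a frequency threshold of 3 but is kept at 2. This shows how sensitive such lists are to the cut-off values chosen.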

Using corpora in language teaching and course design

I have always found corpora to be an extremely interesting and useful way of learning about a language, and have made a great deal of effort throughout my career as a language teacher to share this interest with students. Researchers and scholars in language education have also long advocated getting learners to engage directly with corpora, rather than just using corpus-informed materials. Direct engagement can enable students to explore language on their own, and can accelerate language acquisition by providing repeated and targeted exposure to specific words and phrases. In reality, though, I have always found it challenging to get students to start using corpora independently as a study aid, for two main reasons:

  • there is a learning curve that students may be unwilling to invest the time to climb;

  • free smartphone apps can emulate many of the surface-level features of corpus software, such as providing examples of key words in context and flagging collocations.

Students whose interest in learning English is purely instrumental may therefore not see time spent exploring language patterns in such depth as particularly useful.

Nevertheless, as a teacher I have had some success in getting students interested in using corpora. Below are some early examples of materials I produced for lessons with students and for workshops with other teachers. Many language teachers are not confident about actively using corpora in the classroom. Most will know of the Academic Word List and will have used materials informed by it, but their working knowledge of corpus linguistics is often limited. As a course designer, I therefore periodically supplement my materials with training workshops. These give teachers enough familiarity to help their students get started.



Dashboard of the Sketch Engine showing tools such as concordancer, word sketch and sketch diff
This is the dashboard of the Sketch Engine, which gives an overview of the various tools available for exploring corpora. The interface has become much more user-friendly in recent years. This has helped to encourage students and teachers to make more use of corpora in teaching and learning.

Developing specialisation

The slides and activity sheet on the left are an example of my initial attempt to bring my specialist knowledge of and interest in corpora into the language classroom. They provide an overview of some of the uses of corpora and introduce simple practice activities and example queries. I used the same materials in teacher training sessions in which teachers played the role of students. These sessions were received positively, and teachers reported feeling more confident about using corpora to inform their own language teaching. Some attendees went on to adapt the materials with their own ideas, which they then shared with the group.

These were early prototype materials from before I arrived at York, and as such have some accessibility issues by today's standards!

Reflection and revisions

The approach to introducing corpora to students outlined above met with modest success. A small number of students found it interesting and relevant, and some of them continued to use tools such as the Sketch Engine on a regular basis. On reflection, however, I think one of the problems was that these earlier materials were pitched at too high a level. As a result, the only students who benefited from them were those whose language proficiency was already high enough to grasp some of the concepts being discussed.

Since coming to York, I have continued to develop the use of corpora in the classroom. However, I've made several changes to streamline the materials and the way they're presented, particularly now that I'm working more with undergraduates. The example on the right shows how I've used Xerte to introduce the Sketch Engine to students via a flipped learning object. The introduction used to be a standalone lesson; it is now embedded in a set of more general language tasks. Presenting the material this way has a further advantage: it removes the need for teachers to 'teach' corpora when they themselves may still be getting to grips with them. I still run training sessions on the Sketch Engine for the teachers on the Language and Study Skills module. Even so, packaging the materials as independent study takes some of the pressure off the teaching team.


Screenshot of Xerte materials introducing corpora
This learning object also contains an embedded screencast that I recorded on Panopto to talk new users through the setup stages of the Sketch Engine and how to process simple queries.
Screenshot of a word cloud showing the most frequently occurring words, sized by frequency. 'materials', 'teachers', 'language', 'students' and 'corpora' feature heavily.

Informing course design

Despite the refinements I have made to my teaching practice and the way I use corpora, experience strongly suggests that students need some intrinsic interest in language in order to see the value of using corpora. I have therefore developed my lessons and training sessions to include activities designed purely to generate engagement. These activities also explain the value of learning through observation rather than simply taking teachers' word for how language works.

One example of this is the use of 'word clouds'. A word cloud generator turns a stretch of running text (i.e. a corpus) into a graphic in which each word's size represents its frequency. The example on the left was made with Wordle from the text of this page. It's introduced as a bit of fun, but it can also be used to highlight key words to learn in more complex texts such as journal articles. I started including word cloud generators in module orientation sessions, where students are introduced to independent study tools that can help with their language learning. This learning object from week one shows how materials have been developed to gently introduce students to these tools and concepts (see the second tab, labelled Lexical Frequency Tools).
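The mapping from frequency to size in a word cloud can be sketched in a few lines of Python. This is an illustration, not how Wordle itself works. It makes three assumptions:

  • a small, hypothetical stop-word list drops very common function words;

  • only the most frequent remaining words are kept;

  • each word's font size is scaled linearly between a minimum and a maximum point size according to its frequency.

Layout and drawing are left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; real tools use longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "that", "for", "on", "as", "with"}


def word_cloud_sizes(text: str, top_n: int = 20,
                     min_size: int = 10, max_size: int = 48) -> dict[str, int]:
    """Map the top_n most frequent non-stop-words to font sizes.

    The most frequent word gets max_size, the least frequent of the
    top_n gets min_size, and the others are interpolated linearly.
    """
    tokens = [t for t in re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
              if t not in STOPWORDS]
    top = Counter(tokens).most_common(top_n)
    if not top:
        return {}
    hi, lo = top[0][1], top[-1][1]
    sizes = {}
    for word, freq in top:
        if hi == lo:  # all kept words equally frequent: give them the same size
            size = max_size
        else:
            size = min_size + (freq - lo) * (max_size - min_size) / (hi - lo)
        sizes[word] = round(size)
    return sizes


if __name__ == "__main__":
    text = "Corpora help students. Students and teachers explore corpora. Corpora!"
    print(word_cloud_sizes(text, top_n=5))
```

Linear scaling is only one option. Scaling by the square root or logarithm of frequency is another, and it keeps a handful of very common words from dwarfing the rest.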

Another example of how I have adapted the module to incrementally introduce simple corpus tools early on is one of the writing lessons, where I included some simple exercises using the Sketch Engine for Language Learning (SkELL) which, as the name suggests, is a simplified version of the Sketch Engine more specifically designed for language learners.

Feedback from my teaching team has also suggested that corpus tools such as the Sketch Engine do have potential value and are worth introducing to students. However, they need to be introduced more gradually, in bite-sized form, revisited regularly and made a more integral part of the course, rather than embedded in already complex lessons (see particularly the feedback from teachers two and three).

Professional development and maintaining specialist knowledge

I keep track of developments in corpus linguistics and language teaching via a range of academic journals and professional bodies. Within my own field of English for Academic Purposes (EAP), one of the most relevant and useful outlets has been BALEAP (British Association of Lecturers in English for Academic Purposes). Its conferences and workshops showcase technological tools and pedagogical practices designed to help language learners make use of corpora. The most recent in-person (pre-pandemic) BALEAP conference, for example, included:

  • workshops on using corpora to help teach subject-specific lexis;

  • workshops on guiding students through building their own corpora;

  • the launch of a course run by the University of Sheffield for language teachers looking to develop their own proficiency in using corpora in the classroom.

BALEAP also has a Technology Enhanced Learning Special Interest Group, which has run webinars on corpus use.

Summary reflection

There is plenty of scope to further develop the use of corpus linguistics in my own teaching and in the courses that I design and run. I also supervise a teaching team that delivers the course material I write, so I'm well placed to use this specialist knowledge to help other EAP practitioners incorporate the ideas discussed in this section into their own teaching. The training sessions I have run over the years have been received positively. Teachers have reported that tools such as the Sketch Engine have informed both their own understanding of language and their classroom practice.

As discussed, it can be difficult to get some students to run with the idea themselves and start exploring corpora, but those who have seen the value have had their view of language learning shifted. This is because corpus linguistics allows us to observe and describe language as a natural phenomenon. Advances in technology and open-access resources also mean that the same tools lexicographers use to write dictionaries are now available to language learners. Students of English often become particularly interested when they realise they can run the same queries on corpora in their own language. Institutions subscribed to the Sketch Engine have access to corpora in an array of languages, including Russian, Arabic and Mandarin. It is in these ways that I have observed the most impact on student engagement and interest in language learning: learners realise that language can be studied objectively and independently, without reliance on teacher-led instruction.

Formal feedback from students on this topic via exit surveys has always been mixed. The majority of students see the concept of corpora as intrinsically interesting, but few of them reported having incorporated tools such as the Sketch Engine into their language learning routine. If nothing else, though, the feedback indicates that students are more aware of language as an observable phenomenon, rather than a set of arbitrary rules laid down by native speakers.

In terms of future directions, I have not so far developed lessons around tools such as AntConc, a freely available piece of software used by language students and researchers alike to build and analyse their own corpora. Many universities have also made their own resources available. The University of Michigan's corpora, for example, allow users to explore the language patterns specific to certain academic disciplines and genres, such as humanities essays and chemistry lab reports. These (and many others) represent opportunities for future development. However, they require a time investment that students and other teachers may feel detracts from more immediate learning outcomes. As such, it is difficult to envisage corpora playing more than a supplementary role in the language classroom, aimed at those students who show sufficient interest. Nevertheless, corpus tools have helped to make my language courses distinctive and of developmental use to both students and colleagues.
