Corpus linguistics

click here to return to Michael Pearce's home page


The British National Corpus (BNC)

The Scottish Corpus of Texts and Speech (SCOTS project)

A number of corpora created by Mark Davies at Brigham Young University are available here. These include the Corpus of Contemporary American English (COCA), the Corpus of Historical American English (COHA) and the TIME Magazine Corpus. The site also provides an interface to the BNC. You can learn how to use this resource by watching this video.

The American National Corpus

British academic spoken English (BASE) corpus

Micase: Michigan corpus of academic spoken English

The Linguistic Data Consortium – supports language-related education, research and technology development by creating and sharing linguistic resources: data, tools and standards.

OTA: the Oxford Text Archive – collects, catalogues, and preserves high-quality electronic texts for research and teaching.

The Survey of English Usage – carries out research in English language corpus linguistics.

The International Corpus of English – ICE began in 1990 with the primary aim of providing material for comparative studies of varieties of English throughout the world.

Try the Google Ngram Viewer.

Corpus tools

WordSmith Tools

AntConc – a useful free program. Its creator, Laurence Anthony, has posted several videos on YouTube that show you how to use it.

Text Analysis Portal for Research (Tapor)

Use Word Sketch Engine to explore the BNC. You can also try it on other corpora.

Web concordances and workbooks from Dundee University.

Phrases in English

The Compleat Lexical Tutor – for data-driven language learning.

Concordance – software for text analysis.
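Concordancers such as AntConc, WordSmith Tools and Concordance are built around the key-word-in-context (KWIC) display, which lists every occurrence of a search word with a few words of context on either side. The Python below is a minimal sketch of that idea. The helper names `kwic` and `show`, the crude letters-only tokenizer and the default three-word context window are assumptions made for illustration. It does not describe how any of the programs listed above actually work.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, node word, right context) tuples.

    Hypothetical helper: tokenizes on runs of letters and apostrophes
    (a deliberately crude assumption) and matches case-insensitively.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])   # up to `width` words before
            right = " ".join(tokens[i + 1:i + 1 + width])  # up to `width` words after
            lines.append((left, tok, right))
    return lines


def show(lines):
    """Print KWIC lines with the node words aligned in one column."""
    pad = max((len(left) for left, _, _ in lines), default=0)
    for left, node, right in lines:
        print(f"{left:>{pad}}  [{node}]  {right}")


if __name__ == "__main__":
    sample = ("The corpus shows that corpus data can reveal patterns. "
              "A small corpus is still a corpus.")
    show(kwic(sample, "corpus"))
```

Right-aligning the left context is what makes the node word line up in a single column. That vertical alignment is what lets you scan a concordance for recurring patterns. Real concordancers add sorting by left or right context, lemmatisation and much better tokenization.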


Online journals

ICAME Journal – published once a year (in the spring) with articles, conference reports, reviews and notices related to corpus linguistics.

ELR (Empirical Language Research) Journal has downloadable papers.


David Lee's bookmarks for corpus-based linguistics are here.

Corpus Linguistics – a practical web-based course from Lancaster University.

Developing Linguistic Corpora: A Guide to Good Practice – edited by Martin Wynne.

UCREL (University Centre for Computer Corpus Research on Language) at Lancaster University.

Michael Stubbs has some downloadable papers here.

