Intro to Corpus Linguistics

Stockholm University

***Program: Autumn 2007***

Corpus Linguistics:

Creating, Annotating, Evaluating and Using Corpora

Marina Santini

MarinaSantini.MS <<at>>

November-December 2007

Last Updated: 5 November 2007

Corpus linguistics is the study of language as expressed in samples of real-world texts. These samples are called corpora. In principle, any collection of more than one text can be called a corpus (corpus means "body" in Latin, hence a corpus is any body of text). In the context of modern linguistics, however, the term corpus indicates a form of empirical linguistics. This course gives theoretical and practical training on how to build, annotate, evaluate and use different types of corpora. The aim of the course is to enable participants to create shareable and fully documented corpora that will become reusable, long-lasting resources and enlarge the university corpus repository. The course will be held in English.

Recommended Readings:

  1. McEnery, T., Xiao, R. & Tono, Y. (2006). Corpus-Based Language Studies: An Advanced Resource Book. Routledge Applied Linguistics (subject to availability).
  2. Wynne, M. (ed.) (2005). Developing Linguistic Corpora: a Guide to Good Practice. Oxford: Oxbow Books. Available online from <> or <>.
  3. Journal articles and proceedings papers will be recommended in due course and made available on the course website.
  4. Backman, J. (1998). Rapporter och uppsatser. Studentlitteratur.
  5. Bjork and Wikborg E. (1981). A Guide to Essay Writing. Stockholm Papers in English Language and Literature.

Other Suggested Readings (Optional)

  • McEnery, Tony and Andrew Wilson (1996). Corpus Linguistics. Edinburgh University Press.
  • Garside R., Leech G., and McEnery T. (eds.) (1997). Corpus Annotation: Linguistic Information from Computer Text Corpora. Addison Wesley Longman, London.
  • Biber D., Conrad S., and Reppen R. (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge University Press.
  • Meyer C. (2002). English Corpus Linguistics. An Introduction. University of Massachusetts, Boston.
  • Halliday, M. A. K. (1992). “Language as system and language as instance: The corpus as a theoretical construct”. In Jan Svartvik, editor, Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82. Mouton de Gruyter, Berlin, pages 61-77.


Slides in PDF format will be downloadable from the course website.


This course involves practical exercises without assessment (i.e. lab classes) to be done in groups or individually, two assessed exercises to be done individually or in groups, and a final assessed essay, written in English, to be done individually.