Introduction to Statistics
for Corpus Linguistics and Discourse Analysis

Braga Summer School in Linguistics 2023
Centre for Philosophical and Humanistic Studies
Faculty of Philosophy and Social Sciences, Portuguese Catholic University
Université Paris 8

Location: TBA
08 July, 2024

Contact: Dylan Glynn, dsg.up8@gmail.com

Description
Statistics is arguably the cornerstone of all empirical science. From the visualisation of complex and often subtle interactions in our data to the probability that our observations and results represent the reality of the world we seek to explain, contemporary science could not exist without statistics. Especially important is the ability to model our data in ways that allow us to make predictions in order to test our hypotheses and calculate the accuracy of our descriptions.

The program R is the standard tool for performing statistics in corpus linguistics. Open source and cross-platform, it is ideal for the kind of work that we, as social and cognitive scientists, need to do. However, the afternoon session will not be a course in R, but rather in what you can use R for. No knowledge of mathematics or programming is required.

Materials
Computer with Internet access

Programme
1. Introduction - Categorical data and R
a. Fundamentals
Population and sample - why we can't use percentages
Significance and confidence intervals - how to test and generalise
Patterns and predictions - why use statistics
b. R - open-source, cross-platform and cool :)
Data - clean and ordered
Visualising your counts - beyond pie charts
Chi-squared - a first step in significance
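
As a rough illustration of the first session's topics, the sketch below runs a chi-squared test on a small table of counts in base R. The counts and the labels (text types, constructions) are invented for illustration and are not course data; `chisq.test`, `prop.table` and `binom.test` are standard functions from base R.

```r
# Hypothetical counts, invented for illustration only (not course data):
# rows are two text types, columns are two competing constructions.
counts <- matrix(c(30, 10,
                   20, 40),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(text_type = c("news", "fiction"),
                                 construction = c("will", "going_to")))

# Row percentages describe this sample only; they say nothing yet
# about whether the difference generalises to the population.
round(prop.table(counts, margin = 1) * 100, 1)

# Pearson's chi-squared test of independence (no continuity correction).
result <- chisq.test(counts, correct = FALSE)
result$statistic   # the chi-squared value
result$p.value     # probability of such a difference under independence
result$expected    # counts expected if rows and columns were unrelated

# Exact confidence interval for one proportion:
# 30 uses of "will" out of 40 observations in the "news" row.
binom.test(x = 30, n = 40)$conf.int
```

The percentages alone only summarise the sample; the test and the confidence interval are what allow a cautious step from sample to population.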

2. Correspondence Analysis
a. Associations and identifying patterns in complex data
b. coming
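
To give an idea of what correspondence analysis does with a table of counts, here is a minimal sketch that computes it by hand in base R through a singular value decomposition of the standardised residuals. The counts and labels are invented for illustration. In practice a dedicated package would normally be used; the hand computation is shown only to make the steps visible, and is not necessarily how the course materials do it.

```r
# Hypothetical counts, invented for illustration only (not course data):
# rows are three lexemes, columns are three usage contexts.
counts <- matrix(c(20,  5,  5,
                    5, 20,  5,
                    5,  5, 20),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(lexeme = c("lexeme_a", "lexeme_b", "lexeme_c"),
                                 context = c("context_1", "context_2", "context_3")))

# Step 1: proportions and row/column masses.
P <- counts / sum(counts)
row_mass <- rowSums(P)
col_mass <- colSums(P)

# Step 2: standardised residuals (observed minus expected under independence).
S <- diag(1 / sqrt(row_mass)) %*% (P - outer(row_mass, col_mass)) %*%
  diag(1 / sqrt(col_mass))

# Step 3: singular value decomposition; squared singular values are the
# inertia carried by each dimension.
decomp <- svd(S)
inertia <- decomp$d^2

# Step 4: principal coordinates for rows and columns.
row_coords <- diag(1 / sqrt(row_mass)) %*% decomp$u %*% diag(decomp$d)
col_coords <- diag(1 / sqrt(col_mass)) %*% decomp$v %*% diag(decomp$d)
rownames(row_coords) <- rownames(counts)
rownames(col_coords) <- colnames(counts)

round(inertia / sum(inertia), 3)  # share of inertia per dimension
round(row_coords[, 1:2], 3)       # row positions on the first two dimensions
```

The total inertia equals the chi-squared statistic of the table divided by the number of observations, which is why correspondence analysis is often described as a way of visualising the associations that a chi-squared test detects.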

3. Cluster Analysis
a. Sorting and identifying structures in complex data
b. coming
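
A minimal sketch of hierarchical cluster analysis in base R follows. The usage profiles are invented for illustration; the choice of Euclidean distance and Ward's method is an assumption made for the example, not a recommendation taken from the course.

```r
# Hypothetical relative frequencies, invented for illustration only:
# one row per lexeme, one column per usage feature.
profiles <- matrix(c(0.8, 0.1, 0.1,
                     0.7, 0.2, 0.1,
                     0.1, 0.1, 0.8,
                     0.2, 0.1, 0.7),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(c("lexeme_a", "lexeme_b", "lexeme_c", "lexeme_d"),
                                   c("feature_1", "feature_2", "feature_3")))

# Pairwise distances between the usage profiles.
distances <- dist(profiles, method = "euclidean")

# Agglomerative clustering: the most similar items are merged first.
tree <- hclust(distances, method = "ward.D2")

# Cut the tree into two groups and inspect the membership.
groups <- cutree(tree, k = 2)
groups

# plot(tree) would draw the dendrogram in an interactive session.
```

With these invented profiles, the two lexemes dominated by the first feature end up in one group and the two dominated by the third feature in the other.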

4. Binary logistic regression
a. Fixed Effects
b. Mixed Effects
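
The fixed-effects case can be sketched with `glm` from base R. The data frame below is invented for illustration (variable names and counts are hypothetical). A mixed-effects model, with random effects for speakers or texts for instance, needs an add-on package and is not shown here.

```r
# Hypothetical observations, invented for illustration only (not course data):
# each row is one occurrence of a future construction in a given register.
usage <- data.frame(
  construction = factor(c(rep("going_to", 40), rep("will", 60)),
                        levels = c("will", "going_to")),
  register = factor(c(rep("spoken", 30), rep("written", 10),
                      rep("spoken", 20), rep("written", 40)),
                    levels = c("written", "spoken"))
)

# Binary logistic regression with one fixed effect.
# With a factor response, the first level ("will") is the reference outcome,
# so the model predicts the probability of "going_to".
model <- glm(construction ~ register, data = usage, family = binomial)
summary(model)

# Exponentiated coefficients are odds (intercept) and odds ratios (predictors).
odds <- exp(coef(model))
odds

# Predicted probability of "going_to" in each register.
predict(model, newdata = data.frame(register = c("written", "spoken")),
        type = "response")
```

Unlike a chi-squared test, the model yields predictions, so its accuracy can be checked against the observed data.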

Slides

Slides 1 - Theoretical Assumptions

Slides 2 - Examples of Techniques

Commands for R - coming

R-Commands - Basic

R-Commands - Correspondence Analysis

R-Commands - Cluster Analysis

R-Commands - Logistic Regression

Data for Play

Semantics - Fate in English and Russian

Semantics - Happiness in Czech, English and Polish

Semantics - Women in Vogue, Cosmo and Closer

Grammar - Future constructions in English

Grammar - Future constructions in French

Grammar - Epistemic constructions in English

Apps

R: https://www.r-project.org/

Mac only - BBEdit (free version): https://www.barebones.com/products/bbedit/

References

Baayen 2008 - Analyzing Linguistic Data. A practical introduction to statistics using R. Cambridge University Press.

Glynn & Robinson 2014 - Corpus Methods for Semantics. Quantitative studies in polysemy and synonymy. John Benjamins.

Gries 2009 - Quantitative Corpus Linguistics with R. A practical introduction. Routledge.

Gries 2013 - Statistics for Linguistics with R. A practical introduction. Mouton.
