Description
Statistics is arguably the cornerstone of all empirical science. From the visualisation of complex and often subtle interactions in our data to the probability that our observations and results represent the reality of the world we seek to explain, contemporary science could not exist without statistics. Especially important is the ability to model our data in ways that allow us to make predictions in order to test our hypotheses and calculate the accuracy of our descriptions.
R is the de facto standard tool for statistics in corpus linguistics. Open source and cross-platform, it is well suited to the kind of work we do as social and cognitive scientists. However, the afternoon session will not be a course in R, but rather a course in what you can use R for. No knowledge of mathematics or programming is required.
Materials
Computer with Internet access
Programme
1. Introduction - Categorical data and R
a. Fundamentals
Population and sample - why we can't use percentages
Significance and confidence intervals - how to test and generalise
Patterns and predictions - why use statistics
b. R - open source, cross-platform and cool :)
Data - clean and ordered
Visualising your counts - beyond pie charts
Chi-square - a first step in significance testing
2. Correspondence Analysis
a. Associations and identifying patterns in complex data
b. Coming soon
3. Cluster Analysis
a. Sorting and identifying structures in complex data
b. Coming soon
4. Binary logistic regression
a. Fixed Effects
b. Mixed Effects
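As a rough illustration of three ideas from part 1 (sample proportions, confidence intervals and the chi-square test), here is a minimal sketch. It is written in Python using only the standard library, not in R, and is not the course's own material. The counts are hypothetical, invented for the example and not taken from the course datasets. The function names `chi_square_2x2` and `wilson_interval` are likewise illustrative. The chi-square function computes Pearson's statistic without a continuity correction. R's `chisq.test()` applies Yates' correction to 2x2 tables by default (`correct = TRUE`), so pass `correct = FALSE` when comparing results.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # A 2x2 table has (2-1)*(2-1) = 1 degree of freedom; for 1 df the upper-tail
    # probability of the chi-square distribution equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion (about 95% with z = 1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


# Hypothetical counts, invented for illustration only:
# rows = register (spoken, written), columns = construction ("will", "going to")
table = [[30, 70],
         [50, 50]]

stat, p = chi_square_2x2(table)
print(f"chi-square = {stat:.3f}, df = 1, p = {p:.4f}")

# 30 of 100 spoken tokens use "will": 30% in this sample, but the interval
# shows the range of population proportions that are compatible with it.
low, high = wilson_interval(30, 100)
print(f"95% CI for the spoken 'will' proportion: {low:.3f} to {high:.3f}")
```

For these invented counts the statistic is 25/3 (about 8.33) with 1 degree of freedom. The interval around the 30% sample figure is roughly 22% to 40%, which is one way of seeing why a bare percentage says little about the population on its own.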
Slides
Slides 1 - Theoretical Assumptions
Slides 2 - Examples of Techniques
Commands for R - coming soon
R-Commands - Correspondence Analysis
R-Commands - Logistic Regression
Data for Play
Semantics - Fate in English and Russian
Semantics - Happiness in Czech, English and Polish
Semantics - Women in Vogue, Cosmo and Closer
Grammar - Future constructions in English
Grammar - Future constructions in French
Grammar - Epistemic constructions in English
Apps
Mac only - BBEdit (free version): https://www.barebones.com/products/bbedit/
References
Baayen 2008 - Analyzing Linguistic Data. A practical introduction to statistics using R. Cambridge University Press.
Glynn & Robinson (eds.) 2014 - Corpus Methods for Semantics. Quantitative studies in polysemy and synonymy. John Benjamins.
Gries 2009 - Quantitative Corpus Linguistics with R. A practical introduction. Routledge.
Gries 2013 - Statistics for Linguistics with R. A practical introduction. De Gruyter Mouton.