Friday - 15h00, Chemicum 212
This course will build on what you have learnt in your NLP (Python) and Statistics (R) courses.
You will learn how to use the analytical corpus tools:
concordances, collocations, colligations
You will learn the technical NLP tools:
GREPS (regular expressions), corpus compilation (BootCaT), automatic annotation (TreeTagger)
You will learn the technical statistical tools:
correspondence analysis (pattern identification), cluster analysis (classification), regression analysis (machine learning)
##########
Assessment
Assessment will be based on two reports: one on the use of the corpus tools, the other on the statistical tools.
Here is a link to the assessment description
Due Date: to be determined
Important: send reports to studentwork.glynnp8@gmail.com
#############
Class 1 - Introduction to Categorical Stats
This class will look at what categorical statistics is, what assumptions it makes, why it is essential, and how it can help us.
The class will be entirely discussion-based and may (hopefully) overlap a little with what you have done in your 1st semester R course.
and if we have time
#############
Class 2 - Basics
a. Chi-Squared, statistical independence and statistical significance
This class will make sure that everyone is confident with the basics in R: loading data, examining the data, and then running the most basic categorical test - the Chi2 test. We will also make sure that everyone understands how it works, since the principles involved form the basis of most categorical statistics.
b. Correspondence analysis and identifying structure in complex data
This class will introduce an exploratory method for complex categorical data - multiple correspondence analysis. The method produces complex plots, and the aim of the class is to build confidence in interpreting them.
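To make the logic of the Chi2 test in (a) concrete, here is a minimal sketch in Python. It is not the course's R commands: the function name `chi_squared` and the counts in `table` are invented for illustration. It computes the expected counts under independence, the Chi2 statistic, and the degrees of freedom, and it applies no continuity correction, so for a 2x2 table the result can differ from what R's `chisq.test` prints by default.

```python
def chi_squared(observed):
    """Return (statistic, degrees_of_freedom, expected) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / N
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Invented counts, for illustration only (rows and columns are two categorical variables)
table = [[10, 20], [30, 40]]
stat, dof, expected = chi_squared(table)
print(round(stat, 3), dof)  # prints: 0.794 1
```

With 1 degree of freedom the critical value at the 0.05 level is 3.841, so a statistic of 0.794 gives no grounds for rejecting independence in this toy table.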
#############
Class 3 - Correspondence Analysis Part 1
Today we work on the basics: getting a data frame into R and running a Chi2 test.
R-commands-Chi2
R Commands - Correspondence Analysis
#############
Class 4 - Correspondence Analysis Part 2
Today we will summarize bivariate analysis, and introduce multivariate analysis.
Please pre-install the packages:
ca
FactoMineR
explor
We will use these data from your first semester - the representation of women in magazines
We will use the R Commands for correspondence analysis
#############
Class 5 - Cluster Analysis
We will look at classification of data
1. Looking again at results of MCA - trying to sort the output of correspondence analysis
2. Looking at Hierarchical Cluster Analysis - exploring how best to sort data
Please pre-install the packages:
pvclust
We will use the R Commands for cluster analysis
We will use these data from your first semester - the representation of women in magazines
Other data for learning Cluster Analysis: Data for play in cluster analysis
#############
Class 6 - K-Means Cluster and Loglinear Analysis (4 April)
Please pre-install the packages:
cluster
vcd
Looking at K-medoid / K-Means Cluster Analysis - testing how best to sort data
We will use the R Commands for cluster analysis
We will use these data from your first semester - the representation of women in magazines
Other data for learning Cluster Analysis: Data for play in cluster analysis
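As a rough picture of what K-Means does, here is a minimal sketch in Python rather than the course's R commands. The function name `k_means`, the toy points, and the starting centroids are all invented for illustration, and the points are numeric, whereas the class data are categorical. The sketch alternates two steps until nothing changes: assign every point to its nearest centroid, then move each centroid to the mean of its points.

```python
def k_means(points, centroids, max_iter=100):
    """Plain K-Means on numeric tuples, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
            )
            for point in points
        ]

        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place

        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids


# Invented toy data: two visibly separate groups
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels)  # prints: [0, 0, 0, 1, 1, 1]
```

The number of clusters is fixed in advance by the number of starting centroids, which is why the method suits testing a proposed grouping rather than discovering one.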
LogLinear analysis - basically a Chi2 test extended to tables with more than two variables
We will use these R commands for loglinear analysis
We will use both sets of data
data - your lexical semantics results from semester 1
data - your discourse analysis results from semester 1
#############
Class 7 - Logistic Regression Part 1 (16 April)
We will start Logistic Regression today
Please pre-install the package:
rms
These are the R commands for LogReg
We will use both sets of data
data - your lexical semantics results from semester 1
data - your discourse analysis results from semester 1
#############
Class 8 - Logistic Regression Part 2
Today we will work on Logistic Regression and wrap up the statistics part of the course.
We will discuss and agree upon the assessment for this part of the course.
#############
Class 9 (23 May)
Aims:
Wrap up the stats part of the course with revision for assessment task.
Discuss and agree upon the assessment task; introduce LaTeX
Start collocations and colligations in SketchEngine
https://www.sketchengine.eu/
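To show what a collocation score measures, here is a minimal sketch in Python. It is a textbook measure (pointwise mutual information over adjacent word pairs), not Sketch Engine's own scoring, and the function name `bigram_pmi` and the sample sentence are invented for illustration. A pair scores high when the two words occur together more often than their individual frequencies would predict.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI in bits} for every adjacent word pair in tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = len(tokens) - 1

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bigrams      # observed probability of the pair
        p_w1 = unigrams[w1] / n_tokens  # probability of each word on its own
        p_w2 = unigrams[w2] / n_tokens
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Invented sample text, for illustration only
text = "strong tea and strong coffee but powerful computers and powerful engines"
scores = bigram_pmi(text.split())

# The three highest-scoring pairs
for pair, score in sorted(scores.items(), key=lambda item: -item[1])[:3]:
    print(pair, round(score, 2))
```

On a text this small the scores mean little; PMI is known to favour rare words, which is one reason corpus tools usually offer several association measures.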
#############
Class 10 (30 May)
Aims: look at how collocations and colligations can help answer research questions
LaTeX - https://www.overleaf.com/project
Create account and use the "Association for Computational Linguistics (ACL) conference" template
#############
Class 11 (06 June)
Revision and Preparation for Reports 1
Chi2
MCA
HCA
#############
Class 12 (11 June)
Finish HCA
Modelling Grammar through Meaning - Bresnan et al. 2005
Syntax - Future constructions in English - semantic features
Syntax - Future constructions in English - formal features
#############
Class 13 (13 June)
Revision and Preparation for Reports 2
LaTeX (Association for Computational Linguistics (ACL) conference)
Regression - commands
Collocation / Colligation - sketchengine
REPORTS
Stats
Corpora
Data for Stats Report
d. HAPPY in Czech, English and Polish
Commands for Stats Report
R-commands-Chi2
R Commands - Correspondence Analysis
R-Commands - Logistic Regression
Data for Collocation Report
https://www.sketchengine.eu/
LaTeX - for Reports
Create account:
https://www.overleaf.com/project
Use the "Association for Computational Linguistics (ACL) conference" template
##########
Data for learning
data - your lexical semantics results from semester 1
data - your discourse analysis results from semester 1
data - last year's lexical semantics results
data - discourse analysis from UAM previous year
data - discourse analysis from P8 previous year
######################
OLD
STRUCTURE OF CLASSES BEFORE ROOM MIX UP
##########
Class 3 -
a. Correspondence analysis and exploring complex data B.
This class will examine in more detail how to interpret and judge the reliability of the correspondence plots. It will also introduce clustering tools for aiding in the interpretation of the plots.
b. Agglomerative Cluster analysis and sorting complex data
This class will examine agglomerative methods for "sorting" data discretely, relative to a range of variables.
Class 4 -
a. K-Means Cluster analysis and confirming categories in complex data
This class will examine "top-down" methods for "sorting" data discretely, relative to a range of variables.
data - Some more lexical semantic data to play with (happy in Czech, English and Polish)
b. Loglinear analysis and confirming complex associations
R commands - Loglinear Analysis
Class 5 - Logistic Regression and machine learning
R commands - Logistic regression 1 - NEW VERSION
To arrange
Logistic Regression - Diagnostics and Mixed Effects
Corpora - BYU English Corpora
Corpora - Sketch Engine
Corpus compilation (BootCaT)
Collocations
Colligations
GREPS (regular expressions)
Automatic annotation (TreeTagger)
OLD link and resources
Discourse Analysis / Pragmatics
a. Representation of women in fashion magazines
b. Boasting strategies on Instagram
Lexical Semantics / Lexicology
a. Lexical Field of fate in Russian, English and Polish
b. Lexical Semantics - synonymy grateful vs thankful
Grammatical Constructions / Syntax
a. Epistemic constructions (I mean vs. I think vs. you know)
b. Future Constructions in Norwegian