E 396L: Syllabus

posted Dec 15, 2017, 6:44 AM by Lars Hinrichs   [ updated Jan 3, 2018, 5:06 PM ]

Lars Hinrichs, Spring 2018

W 12-3, CAL 419

E 396L "Quantitative Methods in the Language Sciences"

The course teaches computational methods in text mining, natural language processing, and applied statistics. To that end, it provides training in the programming languages R and Python. Discussions and readings will foster critique of how data, research questions, and methods are matched to one another. The course will benefit anyone who works with or studies text, including linguists and literary scholars interested in digital approaches.

The textbook for this course is How to Do Linguistics with R by Natalia Levshina. In addition, there will be selected readings and online tutorials. The course will focus on the following topics:

  • Building a social media corpus*

  • Turning electronic text into a corpus**

  • Mining text corpora**

  • Visualization: ggplot2* ***

  • Automatic and semi-automatic markup of linguistic data*

  • Descriptive statistics* ***

  • Association measures: collocation and collostructions* ***

  • Predictive statistics: logistic and linear regression, mixed effects, multinomial Bayesian modeling, random forests and conditional inference trees* ***

  • Latent Dirichlet topic models**

  • Distance metrics and cluster analysis*

  • Semantic vector spaces*

  • Phonetic analysis of audio data**

* Topic requires R and RStudio.

** Topic may require software other than R, e.g. Python.

*** Includes visualization.

Students will be required to conduct a complete linguistic or language-based study that employs a range of the methods encountered in the course. One in-class progress report and one longer oral presentation are required during the semester, as well as a written report on the project, submitted with a script file containing any code written for it. Most weeks there will also be smaller assignments such as coding exercises and reading reports.