What to expect:

This is a hands-on course for linguists or CS majors who are new to NLP. It will provide a general overview of the field and cover the basic text-processing pipeline: corpus collection, tokenization, stemming, part-of-speech (POS) tagging, chunking, and syntactic and semantic parsing. It will also introduce the basics of machine learning, including hands-on sentiment classification experiments with scikit-learn, followed by statistical analysis and visualization of the results.
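To give a flavor of the first pipeline stages, here is a minimal sketch in plain Python using only the standard library. It is an illustration under simplifying assumptions, not course material. The regex tokenizer and the suffix list in the hypothetical `naive_stem` helper are crude stand-ins for the real tokenizers and stemmers in NLTK or spaCy (for example, NLTK's `PorterStemmer`). POS tagging, chunking and parsing are left out because they need trained models.

```python
import re
from collections import Counter

# Hypothetical toy suffix list; real stemmers (e.g. Porter) use far more elaborate rules.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into word tokens with a simple regex."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def naive_stem(token):
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def pipeline(text):
    """Tokenize, stem, and count stem frequencies."""
    tokens = tokenize(text)
    stems = [naive_stem(t) for t in tokens]
    return tokens, stems, Counter(stems)


tokens, stems, counts = pipeline("The cats were chasing mice; the cat chased one mouse.")
print(tokens)
print(stems)
print(counts.most_common(3))  # [('the', 2), ('cat', 2), ('chas', 2)]
```

Note that `chasing` and `chased` both reduce to the non-word `chas`, while `mice` and `mouse` stay distinct. Stems need not be real words, and irregular forms call for lemmatization with a lexical resource instead.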

Prerequisites:

  • Basic programming concepts and some familiarity with Python (variables, loops, functions), at the level taught in this free introductory MITx course.
  • Prior knowledge of calculus and linear algebra is encouraged but not required. The lectures aim to give a high-level understanding of basic machine learning concepts, although the formulas will be available to those of strong will.

Introduction to NLP with Python

The first week of the course will provide a hands-on introduction to the basics of NLP. We will cover corpus pre-processing pipelines, basic machine learning experiments on text classification, and the analysis of their results. The course is aimed at linguists with only basic programming skills in Python. In particular, it will highlight problems with current evaluation paradigms, dataset design, and general methodology: interdisciplinary areas in which linguists' expertise is much needed.

List of topics

Monday: Introduction to NLP, a brief overview of the field [tutorial] [slides]

Tuesday: Basic NLP pipeline with Python (NLTK, spaCy) [tutorial] [slides]

Wednesday: Lexical resources in Python [tutorial] [slides]

Thursday: Count-based vector space models with Python [tutorial] [slides]

Friday: Introduction to machine learning with Python [tutorial] [slides]

Extra: Python for data analysis