
Unsupervised Methods for Syntactic Structure Induction

Monday, 12:00 (c.t.) to 14:00, C7 1, Besprechungsraum U15
Starts 31.10.2016. Held in English.

Goal and Content of the Seminar:

Successful applications of unsupervised learning in natural language processing go back at least to the work of Brown and colleagues, who used word clusters to reduce the size of language models*, and there are a number of even earlier investigations. Since 2000, however, the pace of progress in this area has increased, and unsupervised approaches have become more and more useful for exploiting large amounts of raw data, e.g. to supplement smaller annotated training sets. In this seminar we will focus on unsupervised algorithms for discovering syntactic structure. The structures we will discuss are part-of-speech annotations, dependency trees and constituency trees, all of which have been generated in an unsupervised way with varying degrees of success. The goal is to understand the connections between linguistic concepts and corpus statistics that the different algorithms exploit, as well as how the resulting information could be used in other tasks. In order to gain a better understanding of the algorithms presented and their limitations, you will try some of them out yourself and then analyze their behavior. The result will be a thorough comparison of different unsupervised systems.

* Brown, P.; deSouza, P.; Mercer, R.; Della Pietra, V. & Lai, J. (1992) Class-based n-gram models of natural language
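To make the class-based idea concrete, here is a minimal sketch of the factorization P(w_i | w_{i-1}) ≈ P(c(w_i) | c(w_{i-1})) · P(w_i | c(w_i)). The toy corpus and the hand-made word-to-class mapping are invented for illustration; in Brown et al. the classes are induced automatically.

```python
from collections import defaultdict

# Toy corpus and a hand-made word -> class assignment (illustration only;
# Brown et al. induce the classes automatically from data).
corpus = "the cat sat on the mat the dog sat on the rug".split()
cls = {"the": "DET", "cat": "N", "dog": "N", "mat": "N", "rug": "N",
       "sat": "V", "on": "P"}

# Count class bigrams and word emissions per class.
bigram = defaultdict(int)
emit = defaultdict(int)
cls_count = defaultdict(int)
for prev, cur in zip(corpus, corpus[1:]):
    bigram[(cls[prev], cls[cur])] += 1
for w in corpus:
    emit[(cls[w], w)] += 1
    cls_count[cls[w]] += 1

def p_class_bigram(prev_w, w):
    """P(w | prev_w) ~ P(c(w) | c(prev_w)) * P(w | c(w))."""
    cp, c = cls[prev_w], cls[w]
    p_trans = bigram[(cp, c)] / sum(v for (a, _), v in bigram.items() if a == cp)
    p_emit = emit[(c, w)] / cls_count[c]
    return p_trans * p_emit
```

The payoff is that the model only needs class-to-class transition counts plus per-class emission counts, which is far smaller than a full word-bigram table.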

Format and Grading:

We will start with a few sessions in which we discuss overview papers and systems that provide a good introduction to unsupervised syntactic structure induction. During this phase you are expected to give an introductory presentation for one of the sessions (no longer than 10 minutes) and to help guide the discussion of the paper. During this time you can also learn how these systems usually work and develop a plan for evaluating them.

After those introductory sessions, you will pick 3 existing systems (maybe 2 if there is not enough choice, but then your evaluation should be more thorough) which solve the same task and evaluate/compare them yourself. During this phase the weekly meetings will be your chance to discuss any questions about how these systems work and how best to evaluate them. To get your grade, you will write a short summary (no more than 8 pages of single-column text) comparing the systems you tested and the results of your evaluation. Grading will be based on the summary and the presentation.

You are allowed to work on tasks as a group, but then you will have to do slightly more work (i.e. evaluate some additional systems). Talk to me about what you have planned. Many systems for which results have been published do not have a publicly available implementation, so as an alternative to evaluating existing systems you can also re-implement a single system from a recent publication; I would recommend doing this as a group project. In this case you will write a report on your implementation (and the evaluations you did for it). No matter what you choose to do, please create a repository with all the code and documentation you produce during your work, which I can use to check your progress.

Creating your own implementation is of course a lot more work, so if you want to do this, I would suggest making the decision early in the semester and telling me about your plan so you can get started.

You have to hand in your paper 2 weeks before the end of the semester if you want to pass. If I do not have your paper by midnight on 17.03.2017 (sent to me electronically as a PDF(!), or handed on paper to me or the secretary of Prof. Koller's group), then you fail. There will be no exceptions.


Prerequisites:

You need to be able to follow papers at the research level, and you will often need to look up additional information in order to understand them. The discussion in the sessions will help with this, but it is still quite different from learning from a textbook or tutorial.
Some background in machine learning is helpful but not necessary. We will try to work out everything else during the sessions.

You will also need a general programming background in order to figure out how to run the tools and to write some scripts for their evaluation.

Your Questions (!!):

If you intend to attend one of the sessions in which a paper is discussed, please send me at least one question you have concerning the paper 2 days (!!) before the start of the session. The question can be about anything in the paper, e.g. a technique used that you are not familiar with. This will hopefully allow us to concentrate on the things that are most important, so everyone gets the most out of each discussion.
At the end of each session we will reserve at least 15 minutes for questions that do not relate to the paper we discussed, so you can also send me questions that relate more generally to the content of the seminar.


31.10.2016: Introduction and Planning for the Seminar

07.11.2016: Paper discussion: Christodoulopoulos, C.; Goldwater, S. & Steedman, M. (2010) Two Decades of Unsupervised POS induction: How far have we come?
                           Hypothes.is Link
Presenter: Fraser Bowen - Slides

This paper provides a nice overview of some central ideas in unsupervised part-of-speech tagging and also introduces many of the modern evaluation techniques. Because it reviews so much work, it can only describe ideas at a very general level; don't worry if some things are hard to follow. We will hopefully get to discuss the most important details during the session.
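One of the evaluation techniques reviewed there is many-to-one accuracy: map each induced cluster to the gold tag it co-occurs with most often, then score the resulting tagging. A minimal sketch (an illustration, not the paper's evaluation code):

```python
from collections import Counter

def many_to_one(induced, gold):
    """Many-to-one accuracy: map each induced cluster to its most
    frequent gold tag, then score the resulting tagging."""
    # Count (cluster, gold tag) co-occurrences.
    pairs = Counter(zip(induced, gold))
    # Pick the best gold tag for each cluster.
    best = {}
    for (c, g), n in pairs.items():
        if c not in best or n > pairs[(c, best[c])]:
            best[c] = g
    correct = sum(1 for c, g in zip(induced, gold) if best[c] == g)
    return correct / len(gold)
```

Note that this measure can be gamed by inducing very many clusters, which is one reason the paper also discusses information-theoretic measures such as V-measure.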

Further Reading

14.11.2016: Paper discussion: Christodoulopoulos, C.; Goldwater, S. & Steedman, M. (2011) A Bayesian Mixture Model for Part-of-Speech Induction Using Multiple Features
                           Hypothes.is Link
                           Slides - Answers to Questions
Presenter: Guido Linders - Slides

The approach presented here builds very nicely on the ideas that proved useful in the previous paper. You can find an implementation of the algorithm here. At the end of this session you should have a general grasp of the ideas that make unsupervised part-of-speech induction work and an idea of how Bayesian inference works, so I encourage all questions in that direction.
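If you want a feel for the inference machinery before the session, here is a toy collapsed Gibbs sampler for a Dirichlet mixture of categoricals (one observation per item). This is much simpler than the paper's model, which combines multiple feature types per word type, but the per-item resampling step is the same basic mechanic.

```python
import random

def gibbs_mixture(xs, K, V, alpha=1.0, beta=0.5, iters=50, seed=0):
    """Collapsed Gibbs sampler for a Dirichlet mixture of categoricals.
    xs: one categorical observation (0..V-1) per item.
    Returns a cluster assignment for each item."""
    rng = random.Random(seed)
    z = [rng.randrange(K) for _ in xs]
    n_k = [0] * K                       # items per cluster
    n_kv = [[0] * V for _ in range(K)]  # observation counts per cluster
    for i, x in enumerate(xs):
        n_k[z[i]] += 1
        n_kv[z[i]][x] += 1
    for _ in range(iters):
        for i, x in enumerate(xs):
            # Remove item i from the counts, then resample its cluster
            # from the collapsed conditional posterior.
            k = z[i]
            n_k[k] -= 1
            n_kv[k][x] -= 1
            weights = [(n_k[j] + alpha) * (n_kv[j][x] + beta) / (n_k[j] + V * beta)
                       for j in range(K)]
            k = rng.choices(range(K), weights=weights)[0]
            z[i] = k
            n_k[k] += 1
            n_kv[k][x] += 1
    return z
```

All names and hyperparameter values here are my own choices for illustration; they do not come from the paper.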

Further Reading
Lee, Y.K.; Haghighi, A. & Barzilay, R. (2010) Simple Type-Level Unsupervised POS Tagging

21.11.2016: Paper discussion: Hänig, C.; Bordag, S. & Quasthoff, U. (2008) UnsuParse: Unsupervised Parsing with Unsupervised Part-of-Speech Tagging

There aren't really any good overview papers for unsupervised constituency parsing, so we'll dive right in with this paper. The core approach is based on the intuition that tags which form simple constituents should occur in more regular patterns than those that do not. It assumes almost no complex theory, so I hope it is an easy read.
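As a very crude stand-in for that regularity intuition, one can score adjacent tag pairs by pointwise mutual information: pairs that co-occur far more often than chance (like determiner-noun) are candidate low-level constituents. This only illustrates the intuition; it is not the actual UnsuParse scoring function.

```python
import math
from collections import Counter

def pmi_bigrams(tag_seqs):
    """Score adjacent tag pairs by pointwise mutual information.
    High-PMI pairs co-occur far more often than their individual
    frequencies predict, a crude proxy for 'regular patterns'."""
    uni, bi, total_bi = Counter(), Counter(), 0
    for seq in tag_seqs:
        uni.update(seq)
        for a, b in zip(seq, seq[1:]):
            bi[(a, b)] += 1
            total_bi += 1
    total_uni = sum(uni.values())
    return {
        (a, b): math.log((n / total_bi) /
                         ((uni[a] / total_uni) * (uni[b] / total_uni)))
        for (a, b), n in bi.items()
    }
```

On hand-built toy sequences, a pair like ("DET", "N") that always occurs together comes out on top, while incidental pairs score near or below zero.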

Further Reading

28.11.2016: Paper discussion: Ponvert, E.; Baldridge, J. & Erk, K. (2011) Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models

This paper exploits the information contained in punctuation very explicitly. The intuition is that punctuation tends to coincide with constituent boundaries, so a model is trained to predict where such a boundary should occur. The code can be found here.
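The weakest possible baseline in this spirit simply treats every punctuation token as a hard chunk boundary. Ponvert et al.'s cascaded models are much more sophisticated, but this sketch shows the intuition:

```python
import re

def punct_chunks(tokens):
    """Crude baseline chunker: split a token sequence into chunks at
    punctuation tokens (tokens consisting only of non-word characters)."""
    chunks, cur = [], []
    for t in tokens:
        if re.fullmatch(r"\W+", t):
            if cur:
                chunks.append(cur)
            cur = []
        else:
            cur.append(t)
    if cur:
        chunks.append(cur)
    return chunks
```

Running this over tokenized text already recovers some low-level constituent boundaries for free, which is exactly the signal the paper's learned boundary model builds on.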

Further Reading

05.12.2016: Paper discussion: Klein & Manning's models for unsupervised dependency and constituency parsing
Presenter: Anastasija Amann and Ofek Lewinsohn - Slides

The work that Klein and Manning did on unsupervised dependency and constituency parsing in the early 2000s was the first time that unsupervised parsing broke a number of performance barriers. A lot of later work builds on their models, so their papers are key to understanding many of the other tools that deal with unsupervised dependency parsing in particular.

Further Reading
Golland, D.; DeNero, J. & Uszkoreit, J. (2012) A Feature-Rich Constituent Context Model for Grammar Induction
Spitkovsky, V.; Alshawi, H.; Chang, A. & Jurafsky, D. (2011) Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
Gelling, D.; Cohn, T.; Blunsom, P. & Graça, J. (2012) The PASCAL Challenge on Grammar Induction
Headden, W.; Johnson, M. & McClosky, D. (2009) Improving Unsupervised Dependency Parsing with Richer Contexts and Smoothing

12.12.2016: Naseem, T.;  Chen, H.; Barzilay, R & Johnson, M. (2010)  Using Universal Linguistic Knowledge to Guide Grammar Induction
Hypothes.is Link
Slides - Answers to Questions
Presenter: Sophie Henning and Jonáš Vidra - Slides

This paper tries to exploit some simple linguistic knowledge directly. When reading it, I think you should concentrate more on the knowledge they are using than on the exact implementation. It is a little disappointing that they still need gold part-of-speech tags to make this work.

02.01.2017: Picking additional systems / something to implement:
At this point you should already have picked at least 3 systems (that solve roughly the same task) which you will discuss in detail in your final paper. You can start your work earlier, but at this point I will record your choice. You will evaluate these systems on a number of corpora and compare their strengths. From this point on, the sessions are intended to give you the chance to ask any questions you may have about the systems you are investigating. In this session you get to announce your pick and discuss questions.

If you are implementing a system I will also record your choice at this point and you should also ask any questions you may have.

09.01.2017: Question Session

Please send me your main questions 2 days before the session (!) so I have time to prepare.

16.01.2017: Question Session

Please send me your main questions 2 days before the session (!) so I have time to prepare.

23.01.2017: Question Session

Please send me your main questions 2 days before the session (!) so I have time to prepare.

30.01.2017: Question Session

Please send me your main questions 2 days before the session (!) so I have time to prepare.

06.02.2017: Question Session

Please send me your main questions 2 days before the session (!) so I have time to prepare.

13.02.2017: Question Session

Please send me your main questions 2 days before the session (!) so I have time to prepare.

Other Papers on Unsupervised Methods:

General Reading:
  • Kevin Knight's tutorial on Bayesian inference can be found here.
  • There are lots of tutorials on Gibbs Sampling. The first hit I got on Google for "Gibbs Sampling Tutorial" seems pretty good.
  • Neal and Hinton once wrote a paper that contains one of the clearest, yet most detailed explanations of the Expectation Maximization algorithm that I have seen. You can get a much shorter explanation based on MM algorithms, but then EM comes out of left field, which might confuse readers who are unfamiliar with it.
  • There is a guide by David Blei which summarizes the main ideas behind Variational Inference.
  • Merialdo, B. (1994) Tagging English Text with a Probabilistic Model
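To illustrate the E-step and M-step that the EM reading explains, here is EM for a classic toy problem: a mixture of two biased coins, where each sequence of flips was produced by one of two coins with unknown biases. The problem setup and all parameter names are my own choices for illustration, not taken from any of the papers above.

```python
def em_two_coins(flips, iters=50):
    """EM for a mixture of two biased coins.  Each element of `flips` is a
    list of 0/1 outcomes generated by one (unknown) coin.  Returns the
    estimated biases pa, pb and the mixing weight w of coin A."""
    pa, pb, w = 0.6, 0.4, 0.5   # arbitrary initial parameters
    for _ in range(iters):
        # E-step: posterior responsibility of coin A for each sequence.
        resp = []
        for seq in flips:
            h, n = sum(seq), len(seq)
            la = w * pa**h * (1 - pa)**(n - h)
            lb = (1 - w) * pb**h * (1 - pb)**(n - h)
            resp.append(la / (la + lb))
        # M-step: re-estimate parameters from expected counts.
        heads_a = sum(r * sum(seq) for r, seq in zip(resp, flips))
        tot_a = sum(r * len(seq) for r, seq in zip(resp, flips))
        heads_b = sum((1 - r) * sum(seq) for r, seq in zip(resp, flips))
        tot_b = sum((1 - r) * len(seq) for r, seq in zip(resp, flips))
        pa, pb = heads_a / tot_a, heads_b / tot_b
        w = sum(resp) / len(resp)
    return pa, pb, w
```

The same two-step structure (compute expected hidden assignments, then re-estimate parameters from the expected counts) is what the HMM-based taggers in Merialdo (1994) use, just with forward-backward in place of the trivial E-step here.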

Existing Tools (I have not [recently] tried compiling/running any of these tools):

Unsupervised Part-Of-Speech Taggers

Tools for Constituency Parsing

Tools for Dependency Parsing