Unsupervised Methods for Syntactic Structure Induction
Monday 12:00 (c.t.) to 14:00, C7 1, Besprechungsraum U15
Starts 31.10.2016 - held in English
Goal and Content of the Seminar:
Successful applications of unsupervised learning to natural language processing go back at least to the work of Brown and colleagues, who used word clusters to reduce the size of language models*, and there are a number of earlier investigations. Since 2000, however, progress in this area has accelerated, and unsupervised approaches are becoming more and more useful for exploiting large amounts of data, e.g. to supplement smaller annotated training sets. In this seminar we will focus on unsupervised algorithms for discovering syntactic structure. The structures we will discuss are part-of-speech annotations, dependency trees and constituency trees, all of which have been generated in an unsupervised way with varying degrees of success. The goal is to understand the connections between linguistic concepts and corpus statistics that the different algorithms exploit, as well as how the resulting information could be used in other tasks. In order to gain a better understanding of the algorithms presented and their limitations, you will try some of them out yourself and then analyze their behavior. The result will be a thorough comparison of different unsupervised systems.
* Brown, P.; deSouza, P.; Mercer, R.; Della Pietra, V. & Lai, J. (1992) Class-based n-gram models of natural language
Format and Grading:
We will start with a few sessions in which we discuss overview papers and systems that provide a good introduction to unsupervised syntactic structure induction. During this phase you are expected to give an introductory presentation for one of the sessions (no longer than 10 minutes) and to help guide the discussion of the paper. During this time you can also learn how these systems usually work and develop a plan for evaluating them.
After those introductory sessions, you will pick 3 existing systems (maybe 2 if there is not enough choice, but then your evaluation should be more thorough) which solve the same task and evaluate/compare them yourself. During this phase the weekly meetings will be your chance to discuss any questions about how these systems work and how best to evaluate them. For your grade you will write a short summary (no more than 8 pages of single-column text) comparing the systems you tested and the results of your evaluation. Grading will be based on the summary and the presentation.
You are allowed to work on tasks as a group, but then you will have to do slightly more work (i.e. evaluate some additional systems). Talk to me about what you have planned. Many systems for which results have been published do not have a publicly available implementation, so as an alternative to evaluating existing systems you can also re-implement a single system from a recent publication; I would recommend doing this as a group project. In this case you will write a report on your implementation (and the evaluations you ran for it). No matter what you choose to do, please create a repository with all the code and documentation you produce during your work, which I can use to check your progress.
Creating your own implementation is of course a lot more work, so if you want to do this, I would suggest making the decision early in the semester and telling me about your plan so you can get started.
You have to hand in your paper 2 weeks before the end of the semester if you want to pass. If I do not have your paper by midnight on 17.03.2017 (either electronically as a PDF(!), or on paper handed to me or to the secretary of Prof. Koller's group), then you fail. There will be no exceptions.
Requirements:
You need to be able to follow along with papers at the research level and you will often need to look up additional information in order to understand the papers. The discussion in the sessions will help with this, but it is still quite different from learning based on a textbook or tutorial.
Some background in machine learning is helpful but not necessary. We will try to work out everything else during the sessions.
You will also need a general programming background to figure out how to run the tools / write some scripts for their evaluation.
Your Questions (!!):
If you intend to attend one of the sessions in which a paper is discussed, please also send me at least one question you have concerning the paper 2 days (!!) before the start of the session. The question can be about anything in the paper, e.g. a technique you are not familiar with. This will hopefully allow us to concentrate on the most important points, so that everyone gets the most out of each discussion.
At the end of each session we will take at least 15 minutes for you to ask any questions you have that don't relate to the paper we discussed. So you can send me questions that relate more generally to the content of the seminar as well.
Sessions:
31.10.2016: Introduction and Planning for the Seminar
07.11.2016: Paper discussion: Christodoulopoulos, C.; Goldwater, S. & Steedman, M. (2010) Two Decades of Unsupervised POS induction: How far have we come?
Presenter: Fraser Bowen - Slides
This paper provides a nice overview of some central ideas in unsupervised part-of-speech tagging and also introduces many of the modern evaluation techniques. Because it reviews so much work, it can only describe ideas at a very general level, so don't worry if some things are hard to follow. We will hopefully get to discuss the most important details during the session.
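One of the evaluation techniques surveyed in the paper, many-to-one accuracy, is simple enough to sketch: each induced cluster is mapped to the gold tag it co-occurs with most often (several clusters may map to the same tag), and the induced tagging is scored under that mapping. Here is a minimal sketch; the toy corpus and tag names are made up for illustration:

```python
from collections import Counter

def many_to_one_accuracy(induced, gold):
    """Map each induced cluster to its most frequent gold tag,
    then score the induced tagging under that mapping."""
    assert len(induced) == len(gold)
    # Count how often each (cluster, gold tag) pair occurs.
    pair_counts = Counter(zip(induced, gold))
    # Many-to-one mapping: each cluster gets its most frequent gold tag;
    # several clusters may map to the same tag.
    mapping = {}
    for (cluster, tag), _ in pair_counts.most_common():
        if cluster not in mapping:
            mapping[cluster] = tag
    correct = sum(1 for c, t in zip(induced, gold) if mapping[c] == t)
    return correct / len(gold)

# Toy example: clusters 0 and 2 both map to NOUN, cluster 1 to VERB.
induced = [0, 1, 0, 2, 1, 0]
gold    = ["NOUN", "VERB", "NOUN", "NOUN", "VERB", "DET"]
print(many_to_one_accuracy(induced, gold))  # 5 of 6 tokens correct
```

Note why the paper treats this measure with caution: the more clusters you induce, the easier it is to score well, which is part of the motivation for information-theoretic measures such as V-measure.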
Further Reading
Johnson, M. (2007) Why doesn’t EM find good HMM POS-taggers?
Gao, J.; Johnson, M. (2008) A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers
Gelling, D.; Cohn, T.; Blunsom, P. & Graça, J. (2012) The PASCAL Challenge on Grammar Induction
14.11.2016: Paper discussion: Christodoulopoulos, C.; Goldwater, S. & Steedman, M. (2011) A Bayesian Mixture Model for Part-of-Speech Induction Using Multiple Features
Presenter: Guido Linders - Slides
The approach presented here builds very nicely on the ideas that proved useful in the previous paper. You can find an implementation of the algorithm here. At the end of this session you should have a general grasp of the ideas that make unsupervised part-of-speech induction work and an idea of how Bayesian inference works, so I encourage all questions in that direction.
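To get a feel for the kind of Bayesian inference involved, here is a deliberately stripped-down sketch: a collapsed Gibbs sampler for a Dirichlet-multinomial mixture where each item has a single categorical feature. This is not the paper's model (which uses many features per word type); it only illustrates the resampling loop that such samplers share. All names and the toy data are made up for illustration:

```python
import random
from collections import defaultdict

def gibbs_cluster(features, n_clusters, alpha=1.0, beta=0.5, iters=50, seed=0):
    """Collapsed Gibbs sampling for a Dirichlet-multinomial mixture:
    repeatedly resample each item's cluster from its posterior given
    all other current assignments."""
    rng = random.Random(seed)
    V = len(set(features))          # size of the feature vocabulary
    z = [rng.randrange(n_clusters) for _ in features]
    cluster_size = defaultdict(int)
    cluster_feat = defaultdict(int)  # (cluster, feature) co-occurrence counts
    for i, f in enumerate(features):
        cluster_size[z[i]] += 1
        cluster_feat[(z[i], f)] += 1
    for _ in range(iters):
        for i, f in enumerate(features):
            # Remove item i from the counts before resampling it.
            cluster_size[z[i]] -= 1
            cluster_feat[(z[i], f)] -= 1
            # Posterior weight of each cluster: a size-based prior times
            # the predictive probability of the feature under the cluster.
            weights = []
            for k in range(n_clusters):
                prior = cluster_size[k] + alpha
                like = (cluster_feat[(k, f)] + beta) / (cluster_size[k] + beta * V)
                weights.append(prior * like)
            z[i] = rng.choices(range(n_clusters), weights=weights)[0]
            cluster_size[z[i]] += 1
            cluster_feat[(z[i], f)] += 1
    return z

# Toy "word types", each described by a typical right neighbour.
feats = ["the", "the", "the", "runs", "runs", "runs"]
print(gibbs_cluster(feats, n_clusters=2))
```

Items sharing a feature will usually (not deterministically, since this is a sampler) end up in the same cluster; the real model scores each word type with several such feature distributions at once.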
Further Reading
Lee, Y.K.; Haghighi, A. & Barzilay, R. (2010) Simple Type-Level Unsupervised POS Tagging
Biemann, C. (2006) Unsupervised Part-of-Speech Tagging Employing Efficient Graph Clustering
21.11.2016: Hänig, C.; Bordag, S. & Quasthoff, U. (2008) UnsuParse: Unsupervised Parsing with Unsupervised Part-of-Speech Tagging
Presenter: Julia Dembowski - Slides
There aren't really any good overviews of unsupervised constituency parsing, so we'll dive right in with this paper. The core approach is based on the intuition that tags that form simple constituents should occur in a more regular pattern than tags that do not. The approach assumes almost no complex theory, so I hope it is an easy read.
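To make the "regular pattern" intuition concrete, here is a rough stand-in for the kind of co-occurrence statistic involved: score adjacent tag pairs by pointwise mutual information, so that pairs which co-occur far more often than chance surface as candidates for simple constituents. This is not the paper's exact significance measure, just an illustration on a made-up corpus:

```python
import math
from collections import Counter

def bigram_pmi(tag_sequences):
    """Score adjacent tag pairs by pointwise mutual information (PMI):
    log of how much more often a pair occurs than independence predicts."""
    unigrams, bigrams = Counter(), Counter()
    for seq in tag_sequences:
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (a, b), c in bigrams.items():
        p_ab = c / n_bi
        p_a, p_b = unigrams[a] / n_uni, unigrams[b] / n_uni
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores

# Toy corpus: DET NOUN is a very regular pair, NOUN VERB is looser.
corpus = [["DET", "NOUN", "VERB", "DET", "NOUN"],
          ["DET", "NOUN", "VERB", "PRON"],
          ["PRON", "VERB", "DET", "NOUN"]]
scores = bigram_pmi(corpus)
print(scores[("DET", "NOUN")] > scores[("NOUN", "VERB")])  # True
```

A parser in this spirit would greedily merge the highest-scoring pairs into constituents and repeat on the reduced sequences.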
Further Reading
Hänig, C. (2010) Improvements in unsupervised co-occurrence based parsing
Klein, D. & Manning, C. (2002) A Generative Constituent-Context Model for Improved Grammar Induction
28.11.2016: Ponvert, E.; Baldridge, J. & Erk, K. (2011) Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
Presenter: Ben Peters - Slides
This paper exploits the information contained in punctuation very explicitly. The intuition is that punctuation coincides with constituent borders, so a model is trained to predict when such a boundary should occur. The code can be found here.
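The underlying intuition is easy to demonstrate: from raw text alone you can estimate how strongly each word attracts a following boundary, using punctuation as a noisy stand-in for constituent borders. The sketch below only illustrates that statistic; it is not the paper's cascaded finite-state model, and the sentences are made up:

```python
from collections import Counter

PUNCT = {",", ".", ";", ":", "!", "?"}

def boundary_stats(sentences):
    """For each word, estimate P(boundary follows word), treating
    punctuation marks and sentence ends as observed boundaries."""
    follows_boundary = Counter()
    total = Counter()
    for toks in sentences:
        for i, w in enumerate(toks):
            if w in PUNCT:
                continue
            total[w] += 1
            # Punctuation (or the sentence end) right after w is a boundary.
            if i + 1 == len(toks) or toks[i + 1] in PUNCT:
                follows_boundary[w] += 1
    return {w: follows_boundary[w] / total[w] for w in total}

sents = [["the", "dog", "barked", ",", "and", "the", "cat", "ran", "."],
         ["the", "dog", ",", "however", ",", "slept", "."]]
probs = boundary_stats(sents)
print(probs["dog"], probs["the"])  # 0.5 0.0
```

Words like "the" almost never precede a boundary, which is exactly the signal a chunker can use to group them with what follows.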
Further Reading
Seginer, Y. (2007) Fast Unsupervised Incremental Parsing
Reichart, R. & Rappoport, A. (2009) Automatic Selection of High Quality Parses Created By a Fully Unsupervised Parser
05.12.2016: Klein, D. & Manning, C. (2004) Corpus-based induction of syntactic structure: models of dependency and constituency
Presenter: Anastasija Amann and Ofek Lewinsohn - Slides - Answers to Questions
The work that Klein and Manning did on unsupervised dependency and constituency parsing in the early 2000s was the first time that unsupervised parsing broke a number of performance barriers. A lot of later work is based on their models, and so this paper is the key to understanding many of the other tools, particularly those that deal with unsupervised dependency parsing.
Further Reading
Golland, D.; DeNero, J. & Uszkoreit, J. (2012) A Feature-Rich Constituent Context Model for Grammar Induction
Spitkovsky, V.; Alshawi, H.; Chang, A. & Jurafsky, D. (2011) Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
Gelling, D.; Cohn, T.; Blunsom, P. & Graça, J. (2012) The PASCAL Challenge on Grammar Induction
Headden, W.; Johnson, M. & McClosky, D. (2009) Improving Unsupervised Dependency Parsing with Richer Contexts and Smoothing
12.12.2016: Naseem, T.; Chen, H.; Barzilay, R & Johnson, M. (2010) Using Universal Linguistic Knowledge to Guide Grammar Induction
Presenter: Sophie Henning and Jonáš Vidra - Slides
This paper tries to exploit some simple linguistic knowledge directly. When reading it, I think you should concentrate more on the knowledge they use than on the exact implementation. A little disappointing is the fact that they still need gold part-of-speech tags to make this work.
02.01.2017: Picking additional systems / something to implement:
At this point you should already have picked at least 3 systems (that solve roughly the same task) which you will discuss in detail in your final paper. You can start your work earlier, but at this point I will record your choice. You will evaluate these systems on a number of corpora and compare their strengths. From this point on, the sessions are intended to give you the chance to ask any questions you may have about the systems you are investigating. In this session you get to announce your pick and discuss questions.
If you are implementing a system I will also record your choice at this point and you should also ask any questions you may have.
09.01.2017: Question Session
Please send me your main questions 2 days before the session (!) so I have time to prepare.
16.01.2017: Question Session
Please send me your main questions 2 days before the session (!) so I have time to prepare.
23.01.2017: Question Session
Please send me your main questions 2 days before the session (!) so I have time to prepare.
30.01.2017: Question Session
Please send me your main questions 2 days before the session (!) so I have time to prepare.
06.02.2017: Question Session
Please send me your main questions 2 days before the session (!) so I have time to prepare.
13.02.2017: Question Session
Please send me your main questions 2 days before the session (!) so I have time to prepare.
Other Papers on Unsupervised Methods:
Tran, K.; Bisk, Y.; Vaswani, A.; Marcu, D. & Knight, K. (2016) Unsupervised Neural Hidden Markov Models
Yatbaz, M.; Sert, E. & Yuret, D. (2012) Learning Syntactic Categories Using Paradigmatic Representations of Word Context
Cohen, S. & Smith, N. (2010) Covariance in Unsupervised Learning of Probabilistic Grammars
Grave, E. & Elhadad, N. (2015) A convex and feature-rich discriminative approach to dependency grammar induction
Solan, Z.; Horn, D.; Ruppin, E. & Edelman, S. (2005) Unsupervised Learning of Natural Languages
Huang, Y.; Zhang, M. & Lim Tan, C. (2014) Improved Constituent Context Model with Features
Spitkovsky, V.; Alshawi, H. & Jurafsky, D. (2009) Baby Steps: How “Less is More” in Unsupervised Dependency Parsing
Spitkovsky, V.; Alshawi, H. & Jurafsky, D. (2010) From Baby Steps to Leapfrog: How “Less is More” in Unsupervised Dependency Parsing
Spitkovsky, V.; Alshawi, H.; Jurafsky, D. & Manning, C. (2010) Viterbi Training Improves Unsupervised Dependency Parsing
Spitkovsky, V.; Alshawi, H. & Jurafsky, D. (2011) Punctuation: Making a Point in Unsupervised Dependency Parsing
Spitkovsky, V.; Alshawi, H. & Jurafsky, D. (2011) Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction
Spitkovsky, V.; Alshawi, H. & Jurafsky, D. (2012) Three Dependency-and-Boundary Models for Grammar Induction
Spitkovsky, V.; Alshawi, H. & Jurafsky, D. (2012) Capitalization Cues Improve Dependency Grammar Induction
Le, P. & Zuidema, W. (2015) Unsupervised Dependency Parsing: Let’s Use Supervised Parsers
Christodoulopoulos, C.; Goldwater, S. & Steedman, M. (2012) Turning the pipeline into a loop: Iterated unsupervised dependency parsing and PoS induction
Mareček, D. & Straka, M. (2013) Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
Mareček, D. & Žabokrtský, Z. (2012) Exploiting Reducibility in Unsupervised Dependency Parsing
Mareček, D. & Žabokrtský, Z. (2012) Unsupervised Dependency Parsing using Reducibility and Fertility features
Blunsom, P. & Cohn, T. (2010) Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing
Rasooli, M. & Faili, H. (2012) Fast Unsupervised Dependency Parsing with Arc-Standard Transitions
Berg-Kirkpatrick, T. & Klein, D. (2010) Phylogenetic Grammar Induction
Lin, C.; Ammar, W.; Dyer, C. & Levin, L. (2015) Unsupervised POS Induction with Word Embeddings
Yatbaz, M.; Sert, E. & Yuret, D. (2014) Unsupervised Instance-Based Part of Speech Induction Using Probable Substitutes
Gimpel, K. & Smith, N. (2012) Concavity and Initialization for Unsupervised Dependency Parsing
Cohen, S.; Das, D. & Smith, N. (2011) Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
Gelling, D.; Cohn, T.; Blunsom, P. & Graça, J. (2012) The PASCAL Challenge on Grammar Induction
Søgaard, A. (2012) Two baselines for unsupervised dependency parsing
Pate, J. & Goldwater, S. (2013) Unsupervised Dependency Parsing with Acoustic Cues
Schwartz, R.; Abend, O.; Reichart, R. & Rappoport, A. (2011) Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation
Bisk, Y. & Hockenmaier, J. (2013) An HDP Model for Inducing Combinatory Categorial Grammars
Bisk, Y. & Hockenmaier, J. (2012) Induction of Linguistic Structure with Combinatory Categorial Grammars
Bod, R. (2006) An All-Subtrees Approach to Unsupervised Parsing
Bod, R. (2006) Unsupervised Parsing with U-DOP
Brody, S. (2010) It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment
Dennis, S. (2005) An Exemplar-based Approach to Unsupervised Parsing
Reichart, R. & Rappoport, A. (2009) Automatic Selection of High Quality Parses Created By a Fully Unsupervised Parser
Headden, W.; Johnson, M. & McClosky, D. (2009) Improving Unsupervised Dependency Parsing with Richer Contexts and Smoothing
Liang, P. & Klein, D. (2008) Analyzing the Errors of Unsupervised Learning
Reichart, R. & Rappoport, A. (2010) Improved Fully Unsupervised Parsing with Zoomed Learning
Reichart, R. & Rappoport, A. (2008) Unsupervised Induction of Labeled Parse Trees by Clustering with Syntactic Features
Abend, O.; Reichart, R. & Rappoport, A. (2010) Improved Unsupervised POS Induction through Prototype Discovery
Santamaría, J. & Araujo, L. (2010) Identifying Patterns for Unsupervised Grammar Induction
Parikh, A.; Cohen, S. & Xing, E. (2014) Spectral Unsupervised Parsing with Additive Tree Metrics
Smith, N. & Eisner, J. (2005) Guiding Unsupervised Grammar Induction Using Contrastive Estimation
Snyder, B.; Naseem, T.; Eisenstein, J. & Barzilay, R. (2008) Unsupervised Multilingual Learning for POS Tagging
Vlachos, A. (2011) Evaluating unsupervised learning for natural language processing tasks
Jiang, Y.; Han, W. & Tu, K. (2016) Unsupervised Neural Dependency Parsing
Tu, K. & Honavar, V. (2012) Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
Stratos, K.; Collins, M. & Hsu, D. (2016) Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models
General Reading:
Kevin Knight's tutorial on Bayesian inference can be found here.
There are lots of tutorials on Gibbs Sampling. The first hit I got on Google for "Gibbs Sampling Tutorial" seems pretty good.
Neal and Hinton once wrote a paper that contains one of the clearest, yet most detailed, explanations of the Expectation Maximization algorithm that I have seen. There are much shorter explanations based on MM algorithms, but in those EM comes out of left field, which might confuse readers who are unfamiliar with them.
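As a concrete anchor before reading the theory, here is the classic two-coin mixture example of EM: the E-step computes each coin's responsibility for each observed sequence of flips, and the M-step re-estimates each coin's bias from the resulting fractional counts. A minimal sketch with made-up data (the mixing prior is held fixed for simplicity, though EM can re-estimate it too):

```python
def em_two_coins(trials, iters=50, theta=(0.6, 0.4), prior=0.5):
    """EM for a mixture of two biased coins.
    trials: list of (heads, tails) counts, one pair per flip sequence."""
    tA, tB = theta
    for _ in range(iters):
        # E-step: posterior responsibility of coin A for each trial,
        # accumulated as expected (fractional) head/tail counts.
        expA = [0.0, 0.0]
        expB = [0.0, 0.0]
        for h, t in trials:
            lA = prior * (tA ** h) * ((1 - tA) ** t)
            lB = (1 - prior) * (tB ** h) * ((1 - tB) ** t)
            rA = lA / (lA + lB)
            expA[0] += rA * h; expA[1] += rA * t
            expB[0] += (1 - rA) * h; expB[1] += (1 - rA) * t
        # M-step: re-estimate each coin's bias from the expected counts.
        tA = expA[0] / (expA[0] + expA[1])
        tB = expB[0] / (expB[0] + expB[1])
    return tA, tB

trials = [(9, 1), (8, 2), (2, 8), (1, 9), (9, 1)]
tA, tB = em_two_coins(trials)
print(tA, tB)  # one coin converges to a heads-heavy bias, the other tails-heavy
```

Unsupervised HMM taggers run exactly this loop, except that the E-step needs the forward-backward algorithm to compute expected counts over latent tag sequences.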
There is a guide by David Blei which summarizes the main ideas behind Variational Inference.
Merialdo, B. (1994) Tagging English Text with a Probabilistic Model
Existing Tools (I have not [recently] tried compiling/running any of these tools):
Unsupervised Part-Of-Speech Taggers
Percy Liang's re-implementation of the clustering model from Class-based n-gram models of natural language can be found here.
A very well-known unsupervised part-of-speech tagger by Alexander Clark implements the algorithm from Combining Distributional and Morphological Information for Part-of-Speech Induction and can be found here.
An implementation of the algorithm presented in A Bayesian Mixture Model for Part-of-Speech Induction Using Multiple Features is available.
You can find C++ code for the technique from Simple Type-Level Unsupervised POS Tagging.
As long as Google Code still hosts old repositories, you can find a (hopefully working) implementation of Posterior Regularization Tools for unsupervised POS tagging and dependency parsing here.
The implementation of Unsupervised Neural Hidden Markov Models is available online.
Here you can find the code for Learning Syntactic Categories Using Paradigmatic Representations of Word Context.
The code for A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers is hosted by Microsoft.
There is a pretty good implementation for Unsupervised Part-of-Speech Tagging Employing Efficient Graph Clustering on Chris Biemann's old homepage.
There is code that implements Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models
Tools for Constituency Parsing
An implementation for Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models can be found here.
The constituency parser from Fast Unsupervised Incremental Parsing can be obtained at Yoav Seginer's old site.
There is an implementation of the constituency and dependency code from Corpus-based induction of syntactic structure: models of dependency and constituency.
On this site there is an implementation for the ideas from Unsupervised Learning of Natural Languages.
Tools for Dependency Parsing
As long as Google Code still hosts old repositories, you can find a (hopefully working) implementation of Posterior Regularization tools for unsupervised POS tagging and dependency parsing here.
There is an implementation of the constituency and dependency code from Corpus-based induction of syntactic structure: models of dependency and constituency.
There is an implementation of the algorithms from Covariance in Unsupervised Learning of Probabilistic Grammars.
Allegedly there is an implementation for A convex and feature-rich discriminative approach to dependency grammar induction, which can be obtained by writing an e-mail to the first author, but I could not find an e-mail address anywhere. If you think you have better web search skills, it might be worth a shot.
You can find an implementation for Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing on this site.
Yonatan Bisk has published the code used for An HDP Model for Inducing Combinatory Categorial Grammars.
On Samuel Brody's publication list there is an implementation for the ideas from It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment.
You can find the code for Using Universal Linguistic Knowledge to Guide Grammar Induction under this link.