Algorithms for Predicting Structured Data

(a tutorial at ECML PKDD 2010)

Thomas Gärtner (Fraunhofer IAIS, Sankt Augustin, Germany)
Shankar Vembu (University of Illinois at Urbana-Champaign, USA)

Tutorial Description
Structured prediction is the problem of predicting multiple output variables that have complex internal structure and dependencies among them. Algorithms and models for predicting structured data have been in use for a long time: recurrent neural networks and hidden Markov models, for example, have found successful applications in temporal pattern recognition problems such as speech recognition. With the introduction of support vector machines in the 1990s, there has been considerable interest in the machine learning community in discriminative learning models. In this tutorial, we plan to cover recent developments in discriminative learning algorithms for predicting structured data.

We believe this tutorial will be of interest to machine learning researchers, including graduate students, who would like to gain an understanding of structured prediction and of state-of-the-art approaches to this problem. Structured prediction has many applications in natural language processing, computer vision, and computational biology, to name just a few areas. We believe the material presented in this tutorial will also be of interest to researchers working in these application areas.

The tutorial is divided into three parts.

Part 1 (Binary --> Structured)
Focuses on how to extend and generalize binary classification algorithms to structured prediction. Introduces various loss functions and algorithms from the structured prediction literature. Specific topics include perceptrons (constraint classification, the structured perceptron), regression (kernel dependency estimation), SVMs (structural SVMs, max-margin Markov networks), logistic regression (exponential families, conditional random fields), and learning reductions.
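As a concrete illustration of the first of these, the structured perceptron of Collins (2002) fits in a few lines. The toy tagging task, feature map, and brute-force argmax below are our own illustrative choices, not material from the tutorial slides:

```python
import itertools

# A minimal structured perceptron in the spirit of Collins (2002).
# The argmax is done by brute-force enumeration here, which is exponential
# in the sequence length; efficient inference is the subject of Part 2.

LABELS = ["NOUN", "VERB"]

def phi(x, y):
    """Joint feature map: (word, label) emission counts plus label bigrams."""
    feats = {}
    for i, (word, tag) in enumerate(zip(x, y)):
        feats[("emit", word, tag)] = feats.get(("emit", word, tag), 0) + 1
        prev = y[i - 1] if i > 0 else "<s>"
        feats[("trans", prev, tag)] = feats.get(("trans", prev, tag), 0) + 1
    return feats

def predict(w, x):
    """argmax over all |LABELS|^len(x) candidate label sequences."""
    return max(itertools.product(LABELS, repeat=len(x)),
               key=lambda y: sum(w.get(f, 0.0) * c for f, c in phi(x, y).items()))

def train(data, epochs=5):
    w = {}
    for _ in range(epochs):
        for x, y in data:
            y_hat = predict(w, x)
            if tuple(y_hat) != tuple(y):
                # Additive update: toward the gold structure, away from the guess.
                for f, c in phi(x, y).items():
                    w[f] = w.get(f, 0.0) + c
                for f, c in phi(x, y_hat).items():
                    w[f] = w.get(f, 0.0) - c
    return w
```

Replacing the exhaustive argmax with dynamic programming over the chain structure is exactly the inference question taken up in Part 2.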

Part 2 (Exact and Approximate Inference)
Describes the assumptions made by structured prediction algorithms to ensure efficient learning, positive and negative results for approximate inference, techniques that cope with approximate inference, and also algorithms that do *not* use inference during training.
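For chain-structured outputs, exact MAP inference by dynamic programming (the Viterbi algorithm) is the classic example of an efficient inference routine; a minimal sketch might look as follows, with hypothetical additive score tables that are not taken from the tutorial slides:

```python
# Exact MAP inference on a chain via the Viterbi algorithm.
# emit[(state, symbol)] and trans[(prev, state)] are additive scores,
# defaulting to 0 when a key is absent.

def viterbi(obs, states, emit, trans):
    """Return the highest-scoring state sequence for the observations."""
    # delta[t][s]: best score of any length-(t+1) prefix ending in state s.
    delta = [{s: emit.get((s, obs[0]), 0.0) for s in states}]
    back = []
    for t in range(1, len(obs)):
        row, ptr = {}, {}
        for s in states:
            best_prev = max(states,
                            key=lambda p: delta[-1][p] + trans.get((p, s), 0.0))
            row[s] = (delta[-1][best_prev] + trans.get((best_prev, s), 0.0)
                      + emit.get((s, obs[t]), 0.0))
            ptr[s] = best_prev
        delta.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

For outputs with loops rather than chains, exact inference of this kind is generally intractable, which motivates the approximate-inference results and inference-free training algorithms surveyed in this part.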

Part 3 (Some Recent Topics)
Focuses on some recent developments in weakly-supervised learning.

Part 1 [pdf]
Part 2 [draft]
Part 3 [pdf]

References (TBU) 
  • Gökhan H. Bakir, Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, Ben Taskar, and S.V.N. Vishwanathan. Predicting structured data. MIT Press, Cambridge, Massachusetts, USA, 2007.
  • James Clarke, Dan Goldwasser, Ming-Wei Chang, and Dan Roth. Driving Semantic Parsing from the World's Response. In Proceedings of the 14th Conference on Computational Natural Language Learning, 2010.
  • Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. In Proceedings of the 14th Annual Conference on Computational Learning Theory, and 5th European Conference on Computational Learning Theory, 2001.
  • Ming-Wei Chang, Lev-Arie Ratinov, and Dan Roth. Guiding Semi-Supervision with Constraint-Driven Learning. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2007.
  • Ming-Wei Chang, Vivek Srikumar, Dan Goldwasser, and Dan Roth. Structured output learning with indirect supervision. In Proceedings of the International Conference on Machine Learning, 2010.
  • Michael Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2002.
  • Corinna Cortes, Mehryar Mohri, and Jason Weston. A general regression technique for learning transductions. In Proceedings of the International Conference on Machine Learning, 2005.
  • Hal Daumé III, John Langford, and Daniel Marcu. Search-based structured prediction. Machine Learning 75(3):297--325, 2009.
  • Thomas Finley and Thorsten Joachims. Training structural SVMs when exact inference is intractable. In Proceedings of the 25th International Conference on Machine Learning, 2008.
  • Kuzman Ganchev, João Graça, Jennifer Gillenwater, and Ben Taskar. Posterior regularization for structured latent variable models. Journal of Machine Learning Research, 11:2001--2049, 2010.
  • Thomas Gärtner and Shankar Vembu. On structured output training: Hard cases and an efficient alternative. Machine Learning Journal (Special Issue of ECML PKDD), 76(2):227--242, 2009.
  • Sariel Har-Peled, Dan Roth, and Dav Zimak. Constraint Classification: A New Approach to Multiclass Classification. In Proceedings of the 13th International Conference on Algorithmic Learning Theory, 2002.
  • Alex Kulesza and Fernando Pereira. Structured learning with approximate inference. In Advances in Neural Information Processing Systems 20, 2007.
  • John Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning, 2001.
  • André F. T. Martins, Noah A. Smith, and Eric P. Xing. Polyhedral outer approximations with application to natural language parsing. In Proceedings of the 26th Annual International Conference on Machine Learning, 2009.
  • Ofer Meshi, David Sontag, Tommi Jaakkola, and Amir Globerson. Learning Efficiently with Approximate Inference via Dual Losses. In Proceedings of the 27th International Conference on Machine Learning, 2010.
  • Vasin Punyakanok, Dan Roth, Wen-tau Yih, and Dav Zimak. Learning and inference over constrained output. In Proceedings of the International Joint Conference on Artificial Intelligence, 2005.
  • Ben Taskar, Carlos Guestrin, and Daphne Koller. Max-margin Markov networks. In Advances in Neural Information Processing Systems 16, 2003.
  • Ben Taskar, Vassil Chatalbashev, Daphne Koller, and Carlos Guestrin. Learning structured prediction models: A large margin approach. In Proceedings of the 22nd International Conference on Machine Learning, 2005.
  • Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann, and Yasemin Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453--1484, 2005.
  • Jason Weston, Olivier Chapelle, André Elisseeff, Bernhard Schölkopf, and Vladimir Vapnik. Kernel dependency estimation. In Advances in Neural Information Processing Systems 15, 2002.