Algorithms for Predicting Structured Data
(a tutorial at ECML PKDD 2010)
Thomas Gärtner (Fraunhofer IAIS, Sankt Augustin, Germany)
Shankar Vembu (University of Illinois at Urbana-Champaign, USA)
Tutorial Description
Structured prediction is the problem of predicting multiple outputs with complex internal structure and dependencies among them. Algorithms and models for predicting structured data have been in use for a long time: for example, recurrent neural networks and hidden Markov models have found interesting applications in temporal pattern recognition problems such as speech recognition. With the introduction of support vector machines in the 1990s, there has been considerable interest in the machine learning community in discriminative models of learning. In this tutorial, we cover recent developments in discriminative learning algorithms for predicting structured data.
We believe this tutorial will be of interest to machine learning researchers, including graduate students, who would like to gain an understanding of structured prediction and state-of-the-art approaches to this problem. Structured prediction has applications in natural language processing, computer vision, and computational biology, to name just a few areas, so the material presented in this tutorial should also be of interest to researchers working in those application areas.
Outline
The tutorial is divided into three parts.
Part 1 (Binary → Structured)
Focuses on how to extend and generalize binary classification algorithms to structured prediction, and introduces various loss functions and algorithms from the structured prediction literature. Specific topics include the perceptron (constraint classification, structured perceptron), regression (kernel dependency estimation), SVMs (struct-SVM, max-margin Markov networks), logistic regression (exponential families, conditional random fields), learning reductions, etc.
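To make the first topic concrete, here is a minimal sketch of a Collins-style structured perceptron. The function names and the toy instantiation (multiclass classification viewed as a degenerate structured problem, with a block joint feature map) are our own illustration, not code from the tutorial; in a real structured task the decoder would be a combinatorial procedure such as Viterbi.

```python
import numpy as np

def structured_perceptron(examples, feature_fn, argmax_fn, n_features, epochs=10):
    """Sketch of the structured perceptron (Collins, 2002).

    examples   : list of (x, y) pairs with structured outputs y
    feature_fn : joint feature map phi(x, y) -> vector of length n_features
    argmax_fn  : decoder returning argmax_y  w . phi(x, y)
    """
    w = np.zeros(n_features)
    for _ in range(epochs):
        for x, y in examples:
            y_hat = argmax_fn(w, x)
            if y_hat != y:
                # additive update toward the correct structure,
                # away from the current best-scoring wrong one
                w += feature_fn(x, y) - feature_fn(x, y_hat)
    return w

# Toy instantiation: multiclass as a degenerate structured problem.
LABELS = [0, 1, 2]
DIM = 2

def phi(x, y):
    # block feature map: x is copied into the block belonging to label y
    v = np.zeros(DIM * len(LABELS))
    v[y * DIM:(y + 1) * DIM] = x
    return v

def decode(w, x):
    # exhaustive "inference" over the (tiny) output space
    return max(LABELS, key=lambda y: w @ phi(x, y))

examples = [(np.array([1.0, 0.0]), 0),
            (np.array([0.0, 1.0]), 1),
            (np.array([-1.0, -1.0]), 2)]
w = structured_perceptron(examples, phi, decode, DIM * len(LABELS))
```

The key point the sketch illustrates is that the only structure-specific components are the joint feature map and the argmax decoder; the update rule itself is unchanged from the binary perceptron.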
Part 2 (Exact and Approximate Inference)
Describes the assumptions made by structured prediction algorithms to ensure efficient learning, positive and negative results for approximate inference, techniques that cope with approximate inference, and also algorithms that do *not* use inference during training.
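As a minimal example of the exact inference these assumptions buy, here is Viterbi decoding for a chain-structured model, i.e. MAP inference by dynamic programming. The array layout and names are our own sketch, assuming a sequence scored by per-position unary scores plus pairwise transition scores.

```python
import numpy as np

def viterbi(unary, trans):
    """Exact MAP inference in a chain model via dynamic programming.

    unary : (T, K) array, unary[t, k] = score of label k at position t
    trans : (K, K) array, trans[i, j] = score of transition i -> j
    Returns the highest-scoring label sequence as a list of length T.
    """
    T, K = unary.shape
    score = unary[0].copy()            # best score of any prefix ending in each label
    back = np.zeros((T, K), dtype=int) # backpointers to recover the argmax
    for t in range(1, T):
        # cand[i, j] = best prefix ending in i, extended by transition i -> j
        cand = score[:, None] + trans + unary[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # follow backpointers from the best final label
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

unary = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 0.0]])
trans = np.array([[0.0, -10.0],
                  [-10.0, 0.0]])  # strong penalty for switching labels
best = viterbi(unary, trans)      # the penalty overrides position 1's unary score
```

The decoder runs in O(T K^2) time; the hardness results discussed in this part concern output structures (e.g. loopy graphs) where no such polynomial-time argmax exists and one must resort to approximate inference.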
Part 3 (some recent topics)
Focuses on some recent developments in weakly-supervised learning.
Slides: Part 1 [pdf], Part 2 [draft], Part 3 [pdf]
References (TBU)
 Gökhan H. Bakir, Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, Ben Taskar, and S.V.N. Vishwanathan. Predicting structured data. MIT Press, Cambridge, Massachusetts, USA, 2007.
 James Clarke, Dan Goldwasser, Ming-Wei Chang, and Dan Roth. Driving Semantic Parsing from the World's Response. In Proceedings of the 14th Conference on Computational Natural Language Learning, 2010.
 Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. In Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory, 2001.
 Ming-Wei Chang, Lev-Arie Ratinov, and Dan Roth. Guiding Semi-Supervision with Constraint-Driven Learning. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2007.
 Ming-Wei Chang, Vivek Srikumar, Dan Goldwasser, and Dan Roth. Structured output learning with indirect supervision. In Proceedings of the International Conference on Machine Learning, 2010.
 Michael Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2002.
 Corinna Cortes, Mehryar Mohri, and Jason Weston. A general regression technique for learning transductions. In Proceedings of the International Conference on Machine Learning, 2005.
 Hal Daumé III, John Langford, and Daniel Marcu. Search-based structured prediction. Machine Learning, 75(3):297–325, 2009.
 Thomas Finley and Thorsten Joachims. Training structural SVMs when exact inference is intractable. In Proceedings of the 25th International Conference on Machine Learning, 2008.
 Kuzman Ganchev, João Graça, Jennifer Gillenwater, and Ben Taskar. Posterior regularization for structured latent variable models. Journal of Machine Learning Research, 11:2001–2049, 2010.
 Thomas Gärtner and Shankar Vembu. On structured output training: Hard cases and an efficient alternative. Machine Learning Journal (Special Issue of ECML PKDD), 76(2):227–242, 2009.
 Sariel Har-Peled, Dan Roth, and Dav Zimak. Constraint Classification: A New Approach to Multiclass Classification. In Proceedings of the 13th International Conference on Algorithmic Learning Theory, 2002.
 Alex Kulesza and Fernando Pereira. Structured learning with approximate inference. In Advances in Neural Information Processing Systems 20, 2007.
 John Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning, 2001.
 André F. T. Martins, Noah A. Smith, and Eric P. Xing. Polyhedral outer approximations with application to natural language parsing. In Proceedings of the 26th Annual International Conference on Machine Learning, 2009.
 Ofer Meshi, David Sontag, Tommi Jaakkola, and Amir Globerson. Learning Efficiently with Approximate Inference via Dual Losses. In Proceedings of the 27th International Conference on Machine Learning, 2010.
 Vasin Punyakanok, Dan Roth, Wen-tau Yih, and Dav Zimak. Learning and inference over constrained output. In Proceedings of the International Joint Conference on Artificial Intelligence, 2005.
 Ben Taskar, Carlos Guestrin, and Daphne Koller. Max-margin Markov networks. In Advances in Neural Information Processing Systems 16, 2003.
 Ben Taskar, Vassil Chatalbashev, Daphne Koller, and Carlos Guestrin. Learning structured prediction models: A large margin approach. In Proceedings of the 22nd International Conference on Machine Learning, 2005.
 Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann, and Yasemin Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453–1484, 2005.
 Jason Weston, Olivier Chapelle, André Elisseeff, Bernhard Schölkopf, and Vladimir Vapnik. Kernel dependency estimation. In Advances in Neural Information Processing Systems 15, 2002.