Deep RL Meets Structured Prediction

Invited Speakers

Jessica B. Hamrick

Structured Computation and Representation in Deep Reinforcement Learning

Abstract: The world around us has rich structure, corresponding to objects and entities, the relationships between them, and rules for composing them into new objects and entities. As such, incorporating structure into deep learning architectures often affords stronger performance and greater generalization; the proliferation of convolutional and recurrent architectures (which both assume a particular type of structure) is a testament to this claim. In this talk, I will argue for thinking about structure more explicitly, and will distinguish between two types of structure: structured computation (how individual computations or functions are composed into more complex structures), and structured representation (the format of the data that computations are performed over, such as sequences, sets, graphs, and programs). I will present results from two projects which make interesting choices about both of these types of structure. In particular, I will show that agents which use structured representations (e.g., objects and scene graphs) and structured policies (e.g., object-centric actions) outperform those which use less structured representations, and generalize better beyond their training.
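
To make the distinction concrete, the following minimal sketch (an illustration of mine, not material from the talk) shows what a graph-structured, object-centric observation might look like, as opposed to a single flat feature vector; all class and field names here are hypothetical.

# Minimal illustrative sketch of a "structured representation": a scene as a
# graph of objects and relations rather than one flat feature vector.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One object/entity in the scene, with its own feature vector."""
    features: List[float]


@dataclass
class SceneGraph:
    """Objects plus pairwise relations: a graph-structured observation."""
    nodes: List[Node]
    edges: List[Tuple[int, int]]          # (sender index, receiver index)
    edge_features: List[List[float]] = field(default_factory=list)


# An object-centric (structured) policy could then score actions per node,
# e.g. "pick up object i", instead of emitting one monolithic action vector.
scene = SceneGraph(
    nodes=[Node([0.1, 0.2]), Node([0.9, 0.4])],
    edges=[(0, 1), (1, 0)],
    edge_features=[[1.0], [1.0]],
)
print(len(scene.nodes), "objects,", len(scene.edges), "relations")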

Bio: http://www.jesshamrick.com/

Anima Anandkumar

Abstract: Standard deep learning algorithms are based on a function-fitting approach that does not exploit any domain knowledge or constraints. This has several shortcomings: high sample complexity and a lack of robustness and generalization, especially under domain or task shifts. I will show several ways to infuse structure and domain knowledge to overcome these limitations, namely tensors, graphs, symbolic rules, physical laws, and simulations.

Bio: http://tensorlab.cms.caltech.edu/users/anima/

Graham Neubig

Abstract: In 2016, I co-authored a sweeping survey on training techniques for statistical machine translation (http://www.phontron.com/paper/neubig16cl.pdf). This survey was promptly, and perhaps appropriately, forgotten in the tsunami of enthusiasm for new advances in neural machine translation (NMT). Now that the dust has settled after five intense years of research into NMT and training methods for it, perhaps it is time to revisit our old knowledge and see what it can teach us about training techniques for NMT. In this talk, I will give a broad overview of several years of research into sequence-level training objectives for NMT, then point out several areas where our understanding of training techniques for NMT still lags significantly behind what we knew for more traditional approaches to SMT.
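
As a rough point of reference for what a sequence-level training objective can look like, here is a minimal sketch of a minimum-risk-style expected-loss computation over sampled hypotheses. The function names, the toy cost, and the numbers are illustrative assumptions of mine, not material from the talk or the survey.

import math


def expected_risk(hypotheses, reference, log_probs, cost_fn, alpha=1.0):
    """Expected task cost of sampled hypotheses under the renormalized model.

    hypotheses: candidate output sequences sampled from the model
    log_probs:  model log-probabilities of those candidates
    cost_fn:    task loss, e.g. 1 - sentence-level BLEU
    alpha:      sharpness of the renormalized distribution
    """
    # Renormalize probabilities over the sample (numerically stable softmax).
    scaled = [alpha * lp for lp in log_probs]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    z = sum(weights)
    return sum((w / z) * cost_fn(h, reference)
               for w, h in zip(weights, hypotheses))


# Toy usage with a trivial token-overlap cost (purely illustrative).
def toy_cost(hyp, ref):
    return 1.0 - len(set(hyp) & set(ref)) / max(len(set(ref)), 1)


hyps = [["the", "cat", "sat"], ["a", "dog", "ran"]]
ref = ["the", "cat", "sat"]
print(expected_risk(hyps, ref, log_probs=[-1.0, -3.0], cost_fn=toy_cost))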

Bio: Graham Neubig is an assistant professor at the Language Technologies Institute of Carnegie Mellon University. His work focuses on natural language processing, specifically multi-lingual models that work in many different languages, and natural language interfaces that allow humans to communicate with computers in their own language. Much of this work relies on machine learning to create these systems from data, and he is also active in developing methods and algorithms for machine learning over natural language data. He publishes regularly in the top venues in natural language processing, machine learning, and speech, and his work occasionally wins awards such as best papers at EMNLP, EACL, and WNMT. He is also active in developing open-source software, and is the main developer of the DyNet neural network toolkit.

Mohammad Norouzi

Abstract: Many recent papers cast structured prediction as reinforcement learning (RL) and use off-the-shelf sample-based RL algorithms to address machine translation, speech recognition, semantic parsing, etc. I will argue that off-the-shelf RL algorithms are not well suited to structured prediction, mainly because they do not exploit the determinism of the environment and the availability of the search space in structured prediction tasks. Then, I will introduce two general approaches to structured prediction, inspired by value-based and policy-based RL, with applications to speech recognition and semantic parsing. For structured losses such as word error rate in speech recognition, I will present a dynamic programming algorithm that identifies the optimal extension of each partial output sequence, and a way to distill the knowledge of these optimal extensions into a neural network. This results in a significant reduction in word error rate compared to standard maximum likelihood (MLE) training. For unstructured losses (e.g., sparse success-failure feedback), I will present a novel way to leverage a memory buffer of promising trajectories to reduce the variance of policy gradient estimates. This yields state-of-the-art performance in weakly supervised semantic parsing.
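
To make the memory-buffer idea concrete, here is a minimal sketch, under assumptions of my own, of how buffered high-reward trajectories can contribute to a policy-gradient estimate with their exact probability mass while fresh on-policy samples cover the remainder. It illustrates the general variance-reduction idea rather than the specific algorithm presented in the talk.

import math


def buffer_augmented_weights(buffer_log_probs, num_onpolicy_samples):
    """Weights for mixing buffered trajectories with fresh samples.

    Buffered (promising) trajectories are enumerated and weighted by their
    exact model probability; the remaining probability mass is spread evenly
    over fresh on-policy samples (assumed drawn from outside the buffer; the
    rejection step is omitted here for brevity).
    """
    buffer_probs = [math.exp(lp) for lp in buffer_log_probs]
    buffer_mass = min(sum(buffer_probs), 1.0)
    sample_weight = (1.0 - buffer_mass) / max(num_onpolicy_samples, 1)
    return buffer_probs, sample_weight


# Toy usage: two buffered trajectories and four fresh samples. Each weight
# would multiply the corresponding reward-times-grad-log-prob term.
buffer_weights, sample_weight = buffer_augmented_weights([-2.0, -3.0], 4)
print(buffer_weights, sample_weight)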

Bio: Mohammad Norouzi is a Senior Research Scientist at Google Brain in Toronto, working in Geoff Hinton’s group. He received his PhD in computer science at the University of Toronto in 2016, under the supervision of David Fleet, working on scalable similarity search algorithms. His PhD was supported by a Google US/Canada PhD fellowship in machine learning. His research interests span a broad range of topics in deep learning, natural language processing, and computer vision, with a focus on statistical generative models and reinforcement learning algorithms and applications.