Deep Learning and Formal Languages: Building Bridges

Invited Speakers

We are delighted to announce the following invited speakers:

Rémi Eyraud (Aix-Marseille University)

Distilling computational models from Recurrent Neural Networks

Recent practical successes of machine learning, for instance in signal processing, natural language processing, and image retrieval, rely heavily on the training of powerful models such as deep neural networks. However, the decisions taken by these models are hard to interpret (they are usually seen as black boxes), and using them can require a substantial amount of computing power.

This talk will focus on the link between computational models and Recurrent Neural Networks (RNNs). These models are used for sequential data: text, speech, time series, etc. I'll give an overview of the different results that target the distillation of well-known computational models from an already trained RNN. While most of these works consider the distillation of Deterministic Finite-state Automata (DFAs), I will detail the extraction of Weighted Automata using a spectral approach, following joint work with Stephane Ayache and Noé Goudian.
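The spectral idea behind weighted-automaton extraction can be sketched in a few lines. The sketch below is illustrative only, not the authors' pipeline: it learns a WFA for a toy target function from its Hankel matrix via a truncated SVD (the basis, rank, and target function are all assumptions for the example).

```python
import numpy as np

# Toy target over alphabet {a, b}: f(x) = 0.5 ** len(x).
# It is computed by a 1-state WFA, so a rank-1 Hankel block suffices.
def f(x):
    return 0.5 ** len(x)

alphabet = "ab"
prefixes = ["", "a", "b"]   # basis must include the empty string
suffixes = ["", "a", "b"]

# Hankel blocks: H[u, v] = f(uv), H_sig[s][u, v] = f(u + s + v).
H = np.array([[f(u + v) for v in suffixes] for u in prefixes])
H_sig = {s: np.array([[f(u + s + v) for v in suffixes] for u in prefixes])
         for s in alphabet}

# Rank-n truncated SVD gives the factorization H ~= (U D) V = F B.
n = 1
U, D, Vt = np.linalg.svd(H)
F, B = U[:, :n] @ np.diag(D[:n]), Vt[:n, :]

# Spectral recovery: A_s = F^+ H_s B^+, alpha = forward row of the
# empty prefix, omega = backward column of the empty suffix.
A = {s: np.linalg.pinv(F) @ H_sig[s] @ np.linalg.pinv(B) for s in alphabet}
alpha = F[prefixes.index("")]
omega = B[:, suffixes.index("")]

def wfa(x):
    v = alpha
    for s in x:
        v = v @ A[s]
    return float(v @ omega)

print(round(wfa("aab"), 6))  # recovers f("aab") = 0.125
```

In the talk's setting, the Hankel entries would come from querying a trained RNN rather than from a known function; the factorization step is the same.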


Rémi Eyraud is a junior professor at Aix-Marseille University, France. After defending his Ph.D. (2006) at the University of Saint-Etienne (France), he took a post-doc position at the University of Amsterdam (The Netherlands). After being hired into his current position, he spent a couple of years as an invited researcher at two US East Coast universities: the University of Maryland, Baltimore County and the University of Delaware.

His research interests lie mainly within the field of grammatical inference, with a secondary interest in computational linguistics.

Robert Frank (Yale University)

Beyond testing and acceptance: On the study of formal and natural languages in neural networks

The abilities of neural network language models to represent formal and natural languages have been probed largely through tests of string completion or acceptance decisions at the end of a string. While certainly revealing, it is hard to take such explorations as truly indicative of the properties of a neural architecture. Results can vary because of differences in the effectiveness of the associated training regimens, and it is not always clear how regularities are being encoded (e.g., the use of a counter or a stack). More fundamentally, these kinds of studies ignore the basic question that has for some time been the focus of researchers working at the nexus of formal language theory and natural language: how good is the fit between abstract properties of natural language and those of the languages that are representable by a formal system like a neural network? In this talk, I will lay out some of the key properties of natural language that I believe need to be explored in the neural network context, discuss some potential obstacles for progress, and present some preliminary results.


Robert Frank is Professor of Linguistics at Yale University. He received his PhD from the University of Pennsylvania (Computer and Information Science) and has taught at Johns Hopkins University (Cognitive Science) and the University of Delaware (Linguistics). His research explores models of language learning and processing and the role of computationally constrained grammar formalisms, especially Tree Adjoining Grammar, in linguistic explanation.

John Kelleher (Technological University Dublin)

Using formal grammars to test the ability of recurrent neural networks to model long-distance dependencies in sequential data

From the early 1990s on, it has been known that long-distance dependencies (LDDs) pose particular difficulties for recurrent neural networks trained using gradient descent. Over the past few decades, various recurrent neural architectures have been proposed to overcome this problem. However, most of the research on developing computational models capable of processing sequential data fails to explicitly analyze, in terms of presence and/or degree, the LDDs within the datasets used to train and evaluate these models. This lack of understanding of the LDDs within benchmark datasets necessarily limits the analysis of model performance in relation to the specific challenge posed by LDDs. One way to address this is to use formal languages to generate benchmark datasets with specific and well-understood properties. For example, when using Strictly k-Piecewise languages to generate datasets, the degree of LDDs within the generated data can be controlled through the k parameter, the length of the generated strings, and the choice of forbidden strings. This talk will present research we have carried out using formal languages to explore the capacity of different RNN extensions to model LDDs, by evaluating these models on a sequence of SPk synthesized datasets, where each subsequent dataset exhibits a longer degree of LDD. Even though SPk languages are simple, the presence of LDDs does have a significant impact on the performance of recurrent neural architectures, thus making them prime candidates for benchmarking tasks.
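The SPk membership test behind such benchmarks is simple to state: a string belongs to the language iff none of the grammar's forbidden length-k strings occurs in it as a (not necessarily contiguous) subsequence. A minimal sketch, with a made-up SP-2 grammar for illustration:

```python
from itertools import combinations

# SP-k membership: reject a string iff it contains any forbidden
# length-k subsequence (symbols in order, gaps allowed).
def subsequences(s, k):
    return {"".join(c) for c in combinations(s, k)}

def in_spk(string, forbidden, k):
    return not (subsequences(string, k) & set(forbidden))

# Example SP-2 grammar over {a, b, c} forbidding the subsequence "ab":
# once an 'a' appears, no 'b' may follow, however far away. Longer
# strings stretch this dependency, which is how the benchmarks scale LDDs.
forbidden = {"ab"}
print(in_spk("acccb", forbidden, 2))  # False: 'a' ... 'b' occurs
print(in_spk("bccca", forbidden, 2))  # True
```

Raising k, lengthening the strings, or choosing different forbidden strings changes the degree of long-distance dependency in the generated data, as the abstract describes.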


Prof. John D. Kelleher is the Academic Leader of the ICE Research Institute at the Technological University Dublin. He is also the head of the ADAPT Research Centre at TU Dublin, funded via Science Foundation Ireland grant No. 13/RC/2106, and leads the machine learning research focused on developing clinical decision support systems for the treatment of stroke as part of the PRECISE4Q project, funded under the EU's H2020 programme, grant No. 777107. He has published extensively in the areas of Artificial Intelligence, Natural Language Processing, and Machine Learning; recent highlights include the following books: Fundamentals of Machine Learning for Predictive Data Analytics, MIT Press, 2015; Data Science, MIT Press, 2018; and Deep Learning, MIT Press, 2019 (forthcoming). In recent years, a point of convergence for much of John's research has been the challenge of modelling long-distance dependencies in sequential/time-series data.

Kevin Knight (Didi)

Do Simpler Automata Learn Better?

Linguists have long searched for the simplest framework that can explain human language, eager to discover strong constraints that make human language learning possible. Computational linguists have also sought out simple automata, because they admit efficient algorithms. But in the current era of neural networks, is this just old-think?


Kevin Knight is Chief Scientist for Natural Language Processing at DiDi. He received a PhD in computer science from Carnegie Mellon University and a bachelor's degree from Harvard University. Dr. Knight's research interests include human-machine communication, machine translation, language generation, automata theory, and decipherment. He has co-authored over 150 research papers on natural language processing, as well as the widely adopted textbook "Artificial Intelligence" (McGraw-Hill). In 2001, he co-founded Language Weaver, Inc., a machine translation company acquired by SDL plc in 2010. Dr. Knight served as President of the Association for Computational Linguistics (ACL) in 2011, as General Chair for ACL in 2005, and as General Chair for the North American ACL in 2016. He is a Fellow of the ACL, USC/ISI, and AAAI.

Ariadna Quattoni (dMetrics)

A story about weighted automata (WFAs), RNNs, and low-rank Hankel matrices

It has been shown that WFAs are in essence a sub-class of RNNs with linear activation functions, and there is an implicit belief in the NLP community that the non-linear expressivity of general RNNs yields improvements over WFAs for sequence modeling. However, there is a lack of side-by-side comparisons that consider all the relevant variables when training WFAs, such as the choice of loss function and the length of the longest statistics used to train the model. In this talk I will present results in this direction. I will also discuss how the fundamental concepts of the spectral learning theory of WFAs, in particular low-rank Hankel matrices, can be used to better understand WFAs and, by extension, RNNs.
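The "WFAs are linear RNNs" claim can be made concrete in a few lines. In the sketch below (all weights are illustrative numbers, not from the talk), the WFA value alpha' A_x1 ... A_xT omega is computed as a recurrent net whose hidden state is updated by an input-dependent matrix with an identity (linear) activation:

```python
import numpy as np

# A 2-state WFA, written as a linear RNN: the hidden state h_t is
# updated by the transition matrix of the current input symbol, with
# no nonlinearity; the output layer is the final-weight vector omega.
alpha = np.array([1.0, 0.0])            # initial hidden state
omega = np.array([0.0, 1.0])            # output weights
A = {"a": np.array([[0.5, 0.5], [0.0, 1.0]]),
     "b": np.array([[1.0, 0.0], [0.5, 0.5]])}

def linear_rnn(x):
    h = alpha                           # h_0
    for s in x:                         # h_t = h_{t-1} @ A[s], linear activation
        h = h @ A[s]
    return float(h @ omega)             # scalar WFA value

print(linear_rnn("ab"))  # -> 0.25
```

Replacing the identity activation with a tanh or gating mechanism gives a general RNN; the talk's question is how much that extra non-linearity actually buys for sequence modeling.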


Ariadna is the cofounder and CTO of dMetrics, a machine learning and natural language processing company that has received multiple awards from the NSF and serves major clients in the financial and pharmaceutical sectors. She received her PhD in Computer Science from MIT in 2009 and has worked as a Research Scientist at Xerox Research Centre Europe and at the Technical University of Catalonia, focusing on machine learning with applications to computer vision and natural language processing. Her main research interests include latent variable models for structured prediction, spectral learning techniques for weighted non-deterministic automata and grammars, and, most recently, interactive and collaborative machine learning. She has co-authored over 40 research articles on machine learning, natural language processing, and computer vision.

Noah Smith (University of Washington | Allen Institute for Artificial Intelligence)

Rational Recurrences for Empirical Natural Language Processing

Despite their often-discussed advantages, deep learning methods largely disregard theories of both learning and language. This makes their prediction behavior hard to understand and explain. In this talk, I will present a path toward more understandable (but still "deep") natural language processing models, without sacrificing accuracy. Rational recurrences comprise a family of recurrent neural networks that obey a particular set of rules about how to calculate hidden states, and hence correspond to parallelized weighted finite-state pattern matching. Many recently introduced models turn out to be members of this family, and the weighted finite-state view lets us derive some new ones. I'll introduce rational RNNs and present some of the ways we have used them in NLP. My collaborators on this work include Jesse Dodge, Hao Peng, Roy Schwartz, and Sam Thomson.
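The correspondence with weighted finite-state pattern matching can be illustrated with the simplest member of the family. The recurrence below is a hedged sketch with made-up weights, not a model from the talk: a scalar state that is the forward score of a one-state weighted pattern matcher with a discounted self-loop.

```python
# Minimal rational recurrence: c_t = lam * c_{t-1} + u(x_t) is the
# forward score of a one-state WFSA whose self-loop has weight lam
# and whose emission score for symbol s is u(s). Weights are
# illustrative only.
lam = 0.9
u = {"a": 1.0, "b": 0.0}

def forward(x):
    c = 0.0
    for s in x:
        c = lam * c + u[s]   # discounted count of 'a' occurrences so far
    return c

print(round(forward("aba"), 4))  # -> 1.81
```

Richer members of the family use more WFSA states and gated, input-dependent weights, but the hidden-state update keeps this same affine form, which is what makes the parallelized finite-state view possible.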


Noah Smith is a Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, as well as a Senior Research Manager at the Allen Institute for Artificial Intelligence. Previously, he was an Associate Professor of Language Technologies and Machine Learning in the School of Computer Science at Carnegie Mellon University. He received his Ph.D. in Computer Science from Johns Hopkins University in 2006 and his B.S. in Computer Science and B.A. in Linguistics from the University of Maryland in 2001. His research interests include statistical natural language processing, machine learning, and applications of natural language processing, especially to the social sciences. His book, Linguistic Structure Prediction, covers many of these topics. He has served on the editorial boards of the journals Computational Linguistics (2009–2011), Journal of Artificial Intelligence Research (2011–present), and Transactions of the Association for Computational Linguistics (2012–present), as the secretary-treasurer of SIGDAT (2012–2015 and 2018–present), and as program co-chair of ACL 2016. Alumni of his research group, Noah's ARK, are international leaders in NLP in academia and industry; in 2017 UW's Sounding Board team won the inaugural Amazon Alexa Prize. Smith's work has been recognized with a UW Innovation award (2016–2018), a Finmeccanica career development chair at CMU (2011–2014), an NSF CAREER award (2011–2016), a Hertz Foundation graduate fellowship (2001–2006), numerous best paper nominations and awards, and coverage by NPR, BBC, CBC, New York Times, Washington Post, and Time.