Memory Augmented Neural Networks for Natural Language Processing

Caglar Gulcehre and Sarath Chandar

When? 8 Sep 2017, Afternoon

Where? EMNLP 2017


Designing general-purpose learning algorithms is a long-standing goal of artificial intelligence. A general-purpose AI agent should have a memory from which it can store and retrieve information. Despite the success of deep learning, in particular since the introduction of LSTMs and GRUs, there is still a set of complex tasks that remain challenging for conventional neural networks. These tasks often require a neural network to be equipped with an explicit, external memory in which a larger, potentially unbounded, set of facts needs to be stored. They include, but are not limited to, reasoning, planning, episodic question answering, and learning compact algorithms. Recently, two promising neural-network-based approaches to this class of tasks have been proposed: Memory Networks and Neural Turing Machines.

In this tutorial, we will give an overview of this new paradigm of "neural networks with memory". We will present a unified architecture for Memory Augmented Neural Networks (MANNs) and discuss the ways in which one can address the external memory and hence read from and write to it. Then we will introduce Neural Turing Machines and Memory Networks as specific instantiations of this general architecture. In the second half of the tutorial, we will focus on recent advances in MANNs that address the following questions: How can we read/write from an extremely large memory in a scalable way? How can we design efficient non-linear addressing schemes? How can we do efficient reasoning using large-scale memory and an episodic memory? Each of these questions leads to a different variant of MANN. We will conclude the tutorial with several open challenges in MANNs and their applications to NLP.
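To make the addressing discussion concrete, the following is a minimal sketch of continuous (soft) content-based addressing, the read mechanism shared by NTM-style models and end-to-end memory networks: the controller compares a query key against every memory slot, turns the similarities into attention weights, and reads a weighted sum of the slots. The function names, the sharpening parameter `beta`, and the toy memory are illustrative, not part of any specific published model.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

def content_read(memory, key, beta=1.0):
    """Soft content-based read.

    memory : (num_slots, dim) matrix of memory cells
    key    : (dim,) query vector emitted by the controller
    beta   : sharpening parameter; larger beta -> more peaked weights
    Returns the read vector and the attention weights.
    """
    # cosine similarity between the key and each memory slot
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    sim = memory @ key / norms
    # normalize sharpened similarities into attention weights
    w = softmax(beta * sim)
    # the read vector is a convex combination of the slots
    return w @ memory, w

# toy memory with 4 slots of dimension 3
M = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.],
              [1., 1., 0.]])
read_vec, weights = content_read(M, np.array([1., 0., 0.]), beta=5.0)
```

Because every slot receives a nonzero weight, this addressing is fully differentiable and trainable by backpropagation; discrete (hard) addressing instead picks a single slot, which is one of the trade-offs covered in the tutorial.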

We will introduce several applications of MANNs in NLP throughout the tutorial. Examples include language modeling, question answering, visual question answering, and dialogue systems.

Outline of the tutorial:

1. Introduction and Basics [30 minutes]

a. Neural Networks and Backpropagation

b. Recurrent Neural Networks

c. Long Short Term Memory (LSTM) Networks

d. Encoder-Decoder Paradigm

2. Memory Augmented Neural Networks (MANNs) [60 minutes]

a. Why Memory Augmented Neural Networks?

b. General Paradigm: Neural Networks with Memory

c. Neural Turing Machines

d. Memory Networks

e. Neural Networks with Stack/Queue

3. MANNs and Long-Term Dependencies [30 minutes]

a. Discrete vs. Continuous Addressing

b. Sparse Access Memory

c. MANNs with Wormhole Connections

4. MANNs and External Knowledge [30 minutes]

a. External Knowledge Base as Memory

b. Scalable Memory Access for MANNs with Extremely Large Memory

5. MANNs and Reasoning [20 minutes]

a. End-to-End Memory Networks

b. Dynamic Memory Networks

6. Challenges and Open Questions [10 minutes]


Caglar Gulcehre is currently a research scientist at DeepMind. He finished his PhD at the University of Montreal under the supervision of Yoshua Bengio. His work mainly focuses on applications of neural networks, in particular recurrent architectures such as GRUs and LSTMs, to NLP and sequence-to-sequence learning tasks. His research also investigates different optimization approaches and architectures that make neural networks easier to optimize. His recent research focuses on building neural network models that have external memory structures. He has done research internships at IBM Watson Research Center and Google DeepMind. He was a PC member for ECML and the IJCAI 2016 Deep Reinforcement Learning Workshop. Prior to joining MILA as a PhD student, he completed his master's degree in the Cognitive Science department at Middle East Technical University. The complete list of his publications can be found here.

Sarath Chandar is currently a PhD student at the University of Montreal under the supervision of Yoshua Bengio and Hugo Larochelle. His work mainly focuses on deep learning for complex NLP tasks like question answering and dialogue systems. He also investigates scalable training procedures and memory access mechanisms for memory network architectures. In the past, he has worked on multilingual representation learning and transfer learning across multiple languages. His research interests include machine learning, natural language processing, deep learning, and reinforcement learning. Before joining the University of Montreal, he was a Research Scholar at IBM Research India for a year. He has previously given a tutorial on "Multilingual Multimodal Language Processing using Neural Networks" at NAACL 2016. To view the complete publication list and presenter profile, please visit here.