Interactive Language Acquisition

Interactive Language Acquisition with One-shot Visual Concept Learning through a Conversational Game

Haichao Zhang, Haonan Yu, Wei Xu

The 56th Annual Meeting of the Association for Computational Linguistics (ACL) 2018

Abstract

Building intelligent agents that can communicate with and learn from humans in natural language is of great value. Supervised language learning is limited in that it mainly captures the statistics of its training data; it adapts poorly to new scenarios and cannot flexibly acquire new knowledge without inefficient retraining or catastrophic forgetting. We highlight the perspective that conversational interaction serves as a natural interface both for language learning and for acquiring novel knowledge, and we propose a joint imitation and reinforcement approach for grounded language learning through an interactive conversational game. An agent trained with this approach can actively acquire information by asking questions about novel objects and can use the just-learned knowledge in subsequent conversations in a one-shot fashion. Comparisons with other methods verify the effectiveness of the proposed approach.
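To make the "joint imitation and reinforcement" idea concrete, the sketch below combines a cross-entropy imitation term with a REINFORCE-style policy-gradient term. It is a minimal illustration only: the function names, the mixing weight `lam`, and the exact loss forms are assumptions for exposition, not the paper's actual formulation.

```python
import numpy as np

def imitation_loss(probs, target_ids):
    # Imitation term: cross-entropy of the model's word distributions
    # against the teacher's tokens.
    picked = probs[np.arange(len(target_ids)), target_ids]
    return -np.mean(np.log(picked))

def reinforce_loss(sampled_log_probs, reward, baseline=0.0):
    # Reinforcement term (REINFORCE): negative reward-weighted
    # log-likelihood of the agent's own sampled tokens.
    return -(reward - baseline) * np.sum(sampled_log_probs)

def joint_loss(probs, target_ids, sampled_log_probs, reward, lam=0.5):
    # `lam` is a hypothetical mixing weight between the two terms.
    return imitation_loss(probs, target_ids) + lam * reinforce_loss(sampled_log_probs, reward)
```

The imitation term grounds the agent in teacher language, while the reward-driven term shapes when and what to ask; how the two are actually balanced and optimized is detailed in the paper.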

Task Setup

Formulation: Joint Imitation and Reinforcement Learning

Results

Evolution of reward during training for the word-level task without image variations.

Test performance for the word-level task without image variations.

Models are trained on the Animal dataset and tested on the Fruit dataset.

Visualization of the CNN features with t-SNE. Ten classes randomly sampled from (a-b) the Animal dataset and (c-d) the Fruit dataset, with features extracted using the visual encoder trained without (a, c) and with (b, d) image variations on the Animal dataset.

Example results of the proposed approach on novel classes. The learner can ask about the new class and use the interpreter to extract useful information from the teacher's sentence via word-level attention η and content importance g_mem jointly. The speaker uses the fusion gate g to adaptively switch between signals from the RNN (small g) and the external memory (large g) to generate sentence responses.
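The fusion gate described above can be pictured as a gated mixture over the speaker's word distributions. The sketch below uses a simple convex combination of an RNN-derived distribution and a memory-derived distribution; this is an assumed illustrative form, not necessarily the exact fusion used in the paper.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a logit vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def fused_word_distribution(rnn_logits, mem_logits, g):
    # g near 0 -> rely on the RNN state; g near 1 -> rely on the
    # external memory (e.g. for just-learned novel concepts).
    # A convex combination is one plausible fusion; the paper's
    # exact gating may differ.
    return (1.0 - g) * softmax(rnn_logits) + g * softmax(mem_logits)
```

With g close to 1, the speaker's next-word distribution is dominated by the memory signal, which is how one-shot use of a just-taught word can be realized.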

Related Publications and Resources

Interactive Language Acquisition with One-shot Visual Concept Learning through a Conversational Game

Haichao Zhang, Haonan Yu and Wei Xu

The 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018

[PDF] [Baidu Research Blog] [Poster] [Code]

Listen, Interact and Talk: Learning to Speak via Interaction

Haichao Zhang, Haonan Yu and Wei Xu

NIPS Workshop on Visually-Grounded Interaction and Language, 2017

[PDF] [Baidu Research Blog] [Poster]

XWorld Simulation Environment

https://github.com/PaddlePaddle/XWorld