Intrinsically Motivated Compositional Language Emergence
Paper Link - https://arxiv.org/abs/2012.05011
Environment Link - https://github.com/SonuDixit/gComm
gComm Environment
The environment consists of a 2D grid, a communication channel, and two agents: (1) a stationary Speaker and (2) a mobile Listener. The Speaker's input is a natural-language instruction, and the Listener's input is the grid view. The agents must develop a form of communication to complete the task.
Intrinsic Reward for Compositional Language Emergence
Below we show some examples of our model trained with the intrinsic reward.
In each example, there is a target object and several distractor objects. Each distractor object shares either its shape or its colour with the target object.
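The distractor rule above can be illustrated with a small sketch. This is not the gComm implementation: the attribute lists and the helper name `sample_distractors` are assumptions made for illustration only.

```python
import random

# Hypothetical attribute vocabularies (assumed for this sketch).
SHAPES = ["square", "circle", "cylinder", "diamond"]
COLOURS = ["red", "blue", "green", "yellow"]

def sample_distractors(target, n, rng):
    """Return n (shape, colour) pairs that share exactly one attribute with target."""
    shape, colour = target
    same_shape = [(shape, c) for c in COLOURS if c != colour]
    same_colour = [(s, colour) for s in SHAPES if s != shape]
    pool = same_shape + same_colour  # the target itself is never in the pool
    return [rng.choice(pool) for _ in range(n)]

distractors = sample_distractors(("square", "red"), n=4, rng=random.Random(0))
```

Because every distractor matches the target on one attribute, the Listener cannot find the target from the shape or the colour alone; the message has to convey both.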
Instruction = Walk to a red square
Distractor objects have either the same shape or the same colour as the target object (red square). The red square was not shown to the models (both the Speaker and the Listener) during training.
Instruction = Push a red square
Distractor objects have either the same shape or the same colour as the target object (red square). The red square was not shown to the models (both the Speaker and the Listener) during training.
Instruction = Pull a red square
Distractor objects have either the same shape or the same colour as the target object (red square). The red square was not shown to the models (both the Speaker and the Listener) during training.
Instruction = Pull (a dax) twice
Here we show generalisation to an unseen task. The models have seen push, push_twice and pull during training, but never pull_twice. They can complete the task only if they can compose the meaning of pull_twice from [push, pull, push_twice].
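As a rough sketch of this held-out split (the helper names `split_tasks` and `is_composable` are hypothetical and are not part of gComm), the task set can be built from verbs and modifiers, and the held-out task checked to contain only parts seen in training:

```python
from itertools import product

VERBS = ["push", "pull"]
MODIFIERS = ["", "twice"]

def split_tasks(held_out):
    """Build every verb/modifier task and split it into train and test lists."""
    all_tasks = ["_".join(filter(None, (v, m))) for v, m in product(VERBS, MODIFIERS)]
    train = [t for t in all_tasks if t not in held_out]
    test = [t for t in all_tasks if t in held_out]
    return train, test

def is_composable(task, train_tasks):
    """True if every part of `task` appears in at least one training task."""
    seen = {tok for t in train_tasks for tok in t.split("_")}
    return all(tok in seen for tok in task.split("_"))

train_tasks, test_tasks = split_tasks({"pull_twice"})
print(train_tasks)  # ['push', 'push_twice', 'pull']
print(test_tasks)   # ['pull_twice']
print(is_composable("pull_twice", train_tasks))  # True
```

Both "pull" and "twice" appear in training, just never together, which is what makes pull_twice a test of composition rather than of memorisation.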
Attention analysis
The Target Encoder (see Figure 3) takes the speaker messages and the grid as input and outputs attention weights over the likely positions of the target object. In the images below, darker colours represent higher weights. We can see that the Target Encoder is able to identify the target objects.
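To make the idea of attention weights over grid positions concrete, here is a minimal sketch. It assumes dot-product scoring followed by a softmax over all cells, which may differ from the Target Encoder described in the paper; the function name `attention_over_grid` and the toy features are hypothetical.

```python
import math

def attention_over_grid(message, grid):
    """Score each cell against the message, then softmax over the whole grid.

    message: list of d floats (an assumed message embedding).
    grid: H x W nested lists, each cell a list of d floats (assumed cell features).
    Returns an H x W nested list of weights that sum to 1.
    """
    scores = [[sum(m * f for m, f in zip(message, cell)) for cell in row] for row in grid]
    peak = max(max(row) for row in scores)  # subtract the max for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

# Toy features: [is_red, is_square]. The red square sits at row 1, column 0.
grid = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.0, 0.0]]]
weights = attention_over_grid([2.0, 2.0], grid)
```

In this toy case the largest weight falls on the cell that matches both attributes, while the two distractor cells receive equal, smaller weights.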