Paper Link - https://arxiv.org/abs/2012.05011
Environment Link - https://github.com/SonuDixit/gComm
The environment consists of a 2-D grid, a communication channel, and two agents: (1) a stationary Speaker and (2) a mobile Listener. The Speaker's input is a natural-language instruction, and the Listener's input is the grid view. The agents must develop a form of communication to complete the task.
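To make the setup concrete, here is a minimal sketch of the Speaker-Listener interaction loop. All names below (`speaker`, `listener`, the action set) are illustrative assumptions, not gComm's actual API; see the repository linked above for the real interface.

```python
# Hypothetical sketch of the Speaker-Listener loop (names are NOT gComm's API).
import random

GRID_SIZE = 4
ACTIONS = ["left", "right", "up", "down", "push", "pull"]

def speaker(instruction):
    """Encode a natural-language instruction into a discrete message.

    Here the 'message' is just the token list; the real Speaker emits
    symbols over a learned communication channel."""
    return instruction.lower().split()

def listener(message, grid_view):
    """Map the Speaker's message and the Listener's grid view to an action.

    A trained Listener would use the message to locate the target in its
    grid view; this stub picks a random action purely for illustration."""
    return random.choice(ACTIONS)

instruction = "push the red square"
grid_view = [[None] * GRID_SIZE for _ in range(GRID_SIZE)]  # empty 4x4 grid
message = speaker(instruction)
action = listener(message, grid_view)
print(message, action)
```

The key point is the information asymmetry: only the Speaker sees the instruction, and only the Listener sees (and acts in) the grid, so the message is the sole bridge between them.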
Below we show some examples of our model trained with intrinsic reward.
In each example, there is a target object (a red square) and some distractor objects. Each distractor shares either its shape or its colour with the target. The red square was never shown to the model (neither the Speaker nor the Listener) during training.
Here we show generalisation to a novel task. The models have seen push, pull, and push_twice, but never pull_twice. They can complete the task only if they can compose the meaning of pull_twice from [push, pull, push_twice].
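The compositional split above can be sketched as follows. The task names come from the text; the decomposition check itself is illustrative, showing that every component of the held-out task appears somewhere in training.

```python
# Sketch of the compositional train/test split described above: the verb
# "pull" and the modifier "twice" are each seen during training, but never
# together, so solving pull_twice requires composition, not memorisation.
train_tasks = ["push", "pull", "push_twice"]
test_tasks = ["pull_twice"]

def components(task):
    """Decompose a task name into its verb and optional modifier."""
    verb, _, modifier = task.partition("_")
    return {verb} | ({modifier} if modifier else set())

# Union of all components seen at training time.
seen = set().union(*(components(t) for t in train_tasks))
for task in test_tasks:
    assert components(task) <= seen  # "pull" and "twice" are both seen
print(sorted(seen))
```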
The Target Encoder (see Figure 3) takes the Speaker's messages and the grid as input and outputs attention weights over the likely positions of the target object. In the images below, darker colours represent higher weights. We can see that the Target Encoder is able to identify the target objects.
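A minimal sketch of this kind of grid attention is below: each grid cell's feature vector is scored against a message embedding, and a softmax over all cells yields the attention map. The shapes and dot-product scoring are assumptions for illustration, not the exact architecture from the paper.

```python
# Illustrative grid attention: score each cell against the message
# embedding, then softmax over all cells into attention weights.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                # embedding dimension (assumed)
grid = rng.normal(size=(4, 4, d))    # one feature vector per grid cell
message = rng.normal(size=(d,))      # pooled Speaker-message embedding

scores = grid.reshape(-1, d) @ message   # dot-product score per cell
weights = np.exp(scores - scores.max())  # numerically stable softmax
weights /= weights.sum()
attention = weights.reshape(4, 4)        # darker cell = higher weight
print(attention.sum())
```

The resulting 4x4 map is non-negative and sums to 1, which is what lets it be visualised as the shaded grids shown here.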