The paper I read, titled “Visual Behavior Modelling for Robotic Theory of Mind,” was published in 2021 by Columbia researchers Boyuan Chen, Carl Vondrick, and Hod Lipson. It details the use of an AI observer to predict the actions of simple robots confined to a flat plane. The researchers wanted to see how accurate the behavior predictions would be, and whether the AI observer could develop an extremely basic version of Theory of Mind.

The AI observer is a convolutional neural network (CNN): a model that takes an image of the scene as input and transforms it, layer by layer, into a predicted output image. The model is “trained,” and its predictions become more accurate as the amount of training data it is given increases. One of the most interesting aspects of the paper is that the AI observer predicts behavior entirely visually; it is fed images of a scene and outputs its prediction as an image of the scene after the action(s) have occurred. The model is able to predict the behavior of a simple observed robot with a success rate of 98.5% across four different types of actions. The success rate drops for more complex scenarios, but the model still demonstrates very impressive accuracy.

There are a few caveats to this model that future research will need to address. For one, the images fed into the CNN are taken from a top-down view of the scene, whereas humans see from a first-person view. Additionally, the scene is limited to a single observed robot, as opposed to a group of robots interacting with one another. Even so, this work has far-reaching applications and could be seen as a precursor to Theory of Mind for AI. In addition, it is conjectured that this work may pave the way for more socially capable machines.
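To make the image-to-image prediction idea more concrete, below is a minimal sketch of an encoder-decoder CNN that maps an image of the current scene to a predicted image of the future scene. The class name `ScenePredictor`, the layer sizes, and the 64x64 resolution are my own illustrative assumptions, not the architecture actually used in the paper.

```python
# A minimal sketch (not the authors' architecture) of the general idea:
# an encoder-decoder CNN that maps an image of the current scene to a
# predicted image of the scene after the observed robot acts.
import torch
import torch.nn as nn

class ScenePredictor(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: compress the input scene image into a spatial feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
        )
        # Decoder: expand the features back into a full predicted scene image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),   # 32 -> 64
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, scene):
        return self.decoder(self.encoder(scene))

# Training would compare the predicted image against the observed future frame,
# e.g. with a pixel-wise loss, over many recorded episodes.
model = ScenePredictor()
current_frame = torch.randn(1, 3, 64, 64)   # batch of one RGB top-down image
predicted_frame = model(current_frame)      # same shape: predicted future scene
print(predicted_frame.shape)                # torch.Size([1, 3, 64, 64])
```

The key design point this sketch illustrates is that both the input and the output live in image space, so the observer never needs an explicit symbolic model of the other robot's goals; it only learns to render what the scene will look like next.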