Neural Logic Machines
The website includes the demos of agents sorting integers, finding shortest path in graphs and moving objects in the blocks world. All agents are trained by reinforcement learning. Agents iteratively choose the action based on the current state until the end of the episode (the task is finished or timed-out).
See the description below for further details.
Problem. Sort an array (permutation) a of n integers in ascending order by swapping elements. The numbers are labeled on the blocks.
State. The pairwise comparison matrix of the input numbers in array a. We compare each pair of elements by their indices (i, j) and values (a[i], a[j]) The comparison applies to the index and the value separately, each having one of the three outcomes: smaller, equal, and larger. All comparison outcomes are therefore represented by a boolean tensor of shape [n, n - 1, 2 * 3]. The comparison matrix of values are illustrated below the array in the demo (black means smaller or equal, white means larger).
Action. A pair of indices (x, y). Swap a[x] and a[y].
Result. The NLM agent finds a solution very similar to the selection sorting algorithm, yielding an optimal number (O(n)) of swaps to complete the task.
Problem. The environment is based on a randomly generated undirected graph. The agent needs to reach a targeting node starting from the stating node, step by step.
An instance of the task is defined by the starting node s (colored blue) and the destination t (colored purple). We choose s to t such that their distance is exactly d. d is uniformly sampled in the range [1, 5] during training and fixed as 4 during testing.
State. The adjacency matrix which encodes edges in the graph. Each node contains two properties indicating whether it is the current node (of the agent) or the destination.
Action. The next node to go. If there is no edge connecting the current node with the next node, then the agent will not move.
Result. The NLM agent finds the shortest path.
Problem. The environment contains two worlds, the operating world (upper one in the demo) and the target world (lower one in the demo). Each world contains n blocks and the ground. The task is to take a sequence of actions to transform the initial operating world to the target world.
State. Each object (block or ground) is represented by 4 properties: world_id, object_id (labeled on the blocks in the demo), coordinate_x, coordinate_y. The ground has coordinates (0, 0). We compare each pair of 2n+2 objects (n blocks and 1 ground per world) in two worlds by their properties. Each comparison result can be smaller, equal, or larger, resulting in a boolean tensor of shape [2n+2, 2n+1, 4 * 3].
Action. MOVE(x, y). Move object x onto object y in the operating world, if object x is moveable and object y is placeable. Here, "moveable" means the object is not the ground and there are no other objects on it; "placeable" means either the object is ground or there are no other objects on it.
Result. The agent finds a successful way to move blocks, transforming the initial state to the target state.