Domains with Information Gathering and Memorization
A finger is velocity-controlled on a 1D track to perform a dimension check of two boxes. Because the finger is always compliant, it is deflected from the vertical axis whenever it glides over a box. The agent observes the finger's position and angle but not the positions of the two boxes. Therefore, an optimal agent must localize both boxes and determine their sizes from the history of angles and positions. To receive a non-zero reward, the agent must go to the right end when the two boxes have the same size and to the left end otherwise.
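As a rough illustration of how box sizes could be recovered from the observation history, the sketch below scans a single left-to-right sweep of (position, angle) pairs and treats every contiguous run of non-zero deflection as one box. The function names, the thresholds `angle_eps` and `size_tol`, and the single-sweep assumption are all hypothetical and are not taken from the environment itself.

```python
def box_extents(history, angle_eps=1e-3):
    """Return (start, end) positions of each run of non-zero finger deflection.

    Assumes `history` is a list of (position, angle) pairs recorded during one
    sweep with increasing positions (an assumption made for this sketch).
    """
    extents = []
    start = prev = None
    for pos, angle in history:
        if abs(angle) > angle_eps:
            if start is None:
                start = pos          # finger has just climbed onto a box
            prev = pos
        elif start is not None:
            extents.append((start, prev))  # finger has just left the box
            start = None
    if start is not None:            # sweep ended while still on a box
        extents.append((start, prev))
    return extents


def goal_end(history, size_tol=0.05):
    """Pick the rewarded end: "right" if both boxes match in size, else "left".

    Returns None while fewer than two boxes have been detected.
    """
    extents = box_extents(history)
    if len(extents) < 2:
        return None
    (a0, a1), (b0, b1) = extents[:2]
    same_size = abs((a1 - a0) - (b1 - b0)) <= size_tol
    return "right" if same_size else "left"
```

A learned agent would have to perform this kind of inference implicitly through its memory; the explicit version only shows why the full history, and not the current observation alone, is needed.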
A four-legged ant moving in a 2D T-shaped world receives a non-zero reward for reaching a green area (heaven), which can be at either the left or the right corner of a junction. The ant receives a penalty when it enters a red area (hell). While it stays in the blue ball, it can observe heaven's side (left/right/null). The ant starts at a random position around the bottom corner. An optimal agent must visit the blue ball to observe heaven's side, memorize that side, and then go to heaven.
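The memorization requirement can be sketched with a minimal memory that keeps the last informative side observation. The class name `SideMemory`, the subgoal labels, and the encoding of the null observation as `None` are assumptions made for this illustration only.

```python
class SideMemory:
    """Remember heaven's side once it has been observed (hypothetical helper)."""

    def __init__(self):
        self.side = None  # nothing observed yet

    def update(self, observed_side):
        # Only "left" / "right" carry information; a null observation
        # (assumed to be encoded as None) leaves the memory unchanged.
        if observed_side in ("left", "right"):
            self.side = observed_side
        return self.side

    def subgoal(self):
        # Gather information first, then act on the remembered side.
        if self.side is None:
            return "visit_blue_region"
        return f"go_{self.side}"
```

Without such a memory, the ant would lose the side information as soon as the observation reverts to null.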
The ant now has to search for and ``tag'' a moving opponent by bringing the opponent inside the green area centered at the ant. Both start at random positions, but not too close to each other. The opponent follows a fixed stochastic policy: 75% of the time it moves a constant distance away from the ant, and otherwise it stays in place. The observation includes the joint angles and velocities of the four legs and a 2D coordinate for the opponent, which contains the opponent's position only when the opponent is inside the visibility (blue) area centered at the ant.
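The opponent's policy and the masked observation described above can be sketched as follows. The step length, the visibility radius, the tie-breaking direction when both agents coincide, and the use of `None` for the hidden coordinate are assumptions; the source specifies only the 75% move probability.

```python
import math
import random


def opponent_step(opp, ant, step=0.5, move_prob=0.75, rng=random):
    """Return the opponent's next 2D position.

    With probability `move_prob` the opponent moves `step` units directly away
    from the ant; otherwise it stays. `step` is a hypothetical value.
    """
    if rng.random() >= move_prob:
        return opp  # stay in place
    dx, dy = opp[0] - ant[0], opp[1] - ant[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        # Direction is undefined when both coincide; pick +x (assumption).
        dx, dy, dist = 1.0, 0.0, 1.0
    return (opp[0] + step * dx / dist, opp[1] + step * dy / dist)


def observe_opponent(opp, ant, visibility_radius=3.0):
    """Reveal the opponent's position only inside the visibility area.

    Returns None when the opponent is hidden; the actual placeholder value
    used by the environment is not given in the text.
    """
    if math.hypot(opp[0] - ant[0], opp[1] - ant[1]) <= visibility_radius:
        return opp
    return None
```

Because the opponent is usually hidden and keeps moving away, the agent has to combine past sightings with the known evasion behavior to search effectively.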
A 3-DoF gripper in 3D must successfully push a door to receive a non-zero reward. The door, however, can only be pushed in one direction (front-to-back or vice versa), and the correct push direction is unknown. The agent can observe the joint angles and velocities and the door's angle. At the start of each episode, the door is presented to the gripper, initialized with a random pose. An optimal agent must infer the correct push direction from the history of observations.
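One way to picture the inference step is to compare the door's angle before and after each attempted push. The sketch below is a simplification under strong assumptions: the history is summarized as (attempted direction, door angle before, door angle after) triples, every attempt is assumed to make contact with the door, and the direction labels and `angle_eps` threshold are hypothetical.

```python
DIRECTIONS = ("front_to_back", "back_to_front")


def infer_push_direction(history, angle_eps=1e-3):
    """Infer the door's admissible push direction from past attempts.

    `history` is a list of (direction, angle_before, angle_after) triples.
    Returns a direction, or None if the history is uninformative.
    """
    for direction, before, after in history:
        if abs(after - before) > angle_eps:
            return direction  # the door moved, so this direction works
    tried = {direction for direction, _, _ in history}
    if len(tried) == 1:
        # One direction failed; assuming contact was made, the other is correct.
        (failed,) = tried
        return DIRECTIONS[1] if failed == DIRECTIONS[0] else DIRECTIONS[0]
    return None
```

The point of the sketch is that a single observation of the door's angle is not enough; the agent needs the change in angle across its own actions.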