Breakout: a paddle agent has to bounce a ball to break all the bricks in the environment in a specified order: all the bricks in column i must be removed before all the bricks in any column j > i are removed. (A possible LTLf encoding of this ordering constraint is sketched after the list of variants below.)
Variants of the game considered in the experiments:
size of the environment
fire action available
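For illustration only (this is not necessarily the exact formula used in the experiments), letting c_i denote "all bricks in column i have been removed", the completion-order constraint over three columns can be written in LTLf as

$$\Diamond c_3 \;\wedge\; (\neg c_2 \,\mathcal{U}\, c_1) \;\wedge\; (\neg c_3 \,\mathcal{U}\, c_2)$$

i.e., the last column is eventually completed and no column is completed before the previous one.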
Sapientino: a robot has to visit (i.e., reach and beep on, thus marking) N consecutive cells of the same color in a predefined order of colors (red, green, blue, pink, brown, gray, purple). In the videos below, the robot is shown as an orange circle and the action of beeping on a cell is shown with a black dot left on the cell.
Variants of the game considered in the experiments:
2 or 3 cells per color to be visited
omni-directional or differential drive robot
Minecraft: the agent has to fulfil multiple tasks in a grid environment where different colors represent resources to be collected and tools to be used. In the videos, the blue asterisks at the top indicate the tasks accomplished so far.
The proposed approach (being based on algorithms such as Sarsa, which are designed for non-deterministic environments) can be applied to settings where the transition function of the world is non-deterministic, while the transition function of the DFA remains deterministic.
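To make this concrete, the following is a minimal sketch of a Sarsa update performed on the product of the world state and the DFA state; all the names (world, dfa, fluents, ...) are hypothetical and do not refer to the repository's actual classes.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1
Q = defaultdict(float)  # Q[(state, action)], with state = (world_state, dfa_state)

def epsilon_greedy(state, actions):
    # Explore with probability EPSILON, otherwise act greedily w.r.t. Q.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa_episode(world, dfa, actions):
    s = (world.reset(), dfa.reset())
    a = epsilon_greedy(s, actions)
    done = False
    while not done:
        # The world transition may be stochastic; Sarsa only needs sampled transitions.
        world_state, reward, done = world.step(a)
        # The DFA transition over the current fluents is deterministic.
        dfa_state = dfa.step(world.fluents())
        reward += dfa.reward(dfa_state)  # extra reward when the specification progresses
        s2 = (world_state, dfa_state)
        a2 = epsilon_greedy(s2, actions)
        target = reward if done else reward + GAMMA * Q[(s2, a2)]
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s, a = s2, a2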
The proposed approach can be easily extended to other games: by exploiting the modularity of the code (i.e., the separation between the world and the RL algorithm) and the separation of the state variables (those needed to drive the agent and those needed to evaluate the LTLf formula), a new scenario is easy to design and implement (see the sketch below).
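As an illustration of this separation, here is a hypothetical skeleton of a new scenario; the class and method names are invented for the example and are not the repository's actual API.

class MyScenario(object):

    def __init__(self):
        # Variables needed to drive the agent.
        self.agent_x, self.agent_y = 0, 0
        # Variables needed only to evaluate the temporal formula.
        self.visited = {"red": False, "green": False, "blue": False}

    def getstate(self):
        # State exposed to the RL algorithm (independent of the temporal goal).
        return (self.agent_x, self.agent_y)

    def fluents(self):
        # Propositional interpretation consumed by the DFA of the LTLf/LDLf goal.
        return dict(self.visited)

    def update(self, action):
        # World dynamics only; the DFA of the formula is advanced separately
        # (e.g., by a learning loop like the one sketched above) using fluents().
        pass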
Example: Sapientino-Minecraft. The same agents used in the Sapientino scenarios (with the same state representation) have been used to learn 10 tasks in a Minecraft environment. Both configurations of the Sapientino agent (omni-directional and differential drive), as described in the submitted paper, have been tested.
Clone the repository:
git clone https://github.com/iocchi/RLgames.git
cd RLgames
Install the requirements:
python2 -m pip install -r requirements.txt
Execute the experiments:
Sapientino:
python2 experiment1.py
Breakout:
python2 experiment2.py
Minecraft:
python2 experiment3.py
For specific configurations, please follow the instructions in the README.
The results presented in the paper can be reproduced by running the code linked above.
The script run.py contains the commands to run the experiments reported in the paper. All the commands are commented out: uncomment the experiments you want to reproduce, then run the script.
The value p of goals, reported every 100 episodes, shows the percentage of those episodes that achieved the LTLf/LDLf specification. Notice that a policy correctly respecting the specification can be computed even when p of goals is below 100%, since the RL agent keeps exploring the state-action space during learning.
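For clarity, the metric amounts to the success rate over the reporting window; a trivial hypothetical sketch (not the repository's code):

def p_of_goals(last_outcomes):
    # last_outcomes: one boolean per episode in the reporting window (e.g., 100 episodes),
    # True if the episode trace satisfied the LTLf/LDLf specification.
    return 100.0 * sum(last_outcomes) / len(last_outcomes)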
Feel free to change configurations to test other scenarios.
For each setting described below, we show the policy found after a predefined amount of time. Notice that the policies may not be optimal (because of the time limit), but they all satisfy the LTLf goal.