In recent years, research into autonomous control has expanded greatly. Large autonomy stacks, integrated Reinforcement Learning (RL), and POMDPs are just some of the approaches that have been used. Most of these approaches require careful hand-crafting of at least part of their algorithmic frameworks. In search of a solution that works out of the box, we explore the feasibility of a pixel-to-control approach for search-and-rescue applications. Specifically, we first validate the novel RL algorithm DreamerV3 on the OpenAI 2D Pendulum environment, then apply it to two custom 3D search-and-rescue environments built in Webots. The first environment contains static obstacles; the second adds moving pedestrians that the agent must avoid. On the 2D Pendulum environment, we found that DreamerV3 is just as sample efficient when learning from pixels as when learning from vector inputs, although wall-clock training takes longer due to the additional compute required. In the search-and-rescue environments, the agent was able to learn decent strategies from the pixel data of a single 64-by-64 camera. Future improvements to the search-and-rescue objective include refining the reward function, adding more sensors for depth information, and giving the algorithm more training time.
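To illustrate the pixel-input setup used in the Pendulum comparison, here is a minimal sketch of how a pixel observation wrapper can be built. It assumes Gymnasium's `Pendulum-v1` and Pillow for resizing; the class name `PixelPendulum` is illustrative and not taken from our codebase.

```python
# Minimal sketch: wrap Pendulum-v1 so the agent sees 64x64 RGB renders
# instead of the 3-dimensional (cos, sin, angular velocity) state vector.
import gymnasium as gym
import numpy as np
from PIL import Image


class PixelPendulum(gym.ObservationWrapper):
    """Replace Pendulum's vector observation with a 64x64 RGB render."""

    def __init__(self, size=(64, 64)):
        env = gym.make("Pendulum-v1", render_mode="rgb_array")
        super().__init__(env)
        self.size = size
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=(*size, 3), dtype=np.uint8
        )

    def observation(self, _obs):
        # Discard the state vector and render the scene as pixels instead.
        frame = self.env.render()
        frame = Image.fromarray(frame).resize(self.size, Image.BILINEAR)
        return np.asarray(frame, dtype=np.uint8)


env = PixelPendulum()
obs, info = env.reset(seed=0)
print(obs.shape)  # (64, 64, 3), ready for an image encoder like DreamerV3's
```

The vector-input baseline simply uses the unwrapped environment, which makes the pixels-vs-vectors sample-efficiency comparison a one-line change in the setup.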
This presentation was made for an audience with a strong technical background in Reinforcement Learning who have read the DreamerV3 paper. It is still worth a look if you don't have this background, because the technical knowledge is only needed in some sections.
The code is all available on GitHub. We hopefully went back and cleaned it up, but if not, keep in mind it's a development codebase. Please hold the harsh thoughts; we found it to be a mess too.