"Great things are done by a series of small things brought together." – Vincent van Gogh
We, David Ivorra-Piqueres and Martin-Philipp Irsch, are both students enrolled in the Artificial Intelligence MSc at the University of Edinburgh. As part of the Machine Learning Practical course, students are free to choose a self-proposed research project in a machine learning context. The following videos and results present our work. We thank Antreas Antoniou, Pavlos Andreadis, and Steve Renals for their ongoing support and helpful discussions throughout this project.
On this page, we present visualisations of our scaled-down World Models research for the popular CarRacing environment. We will shortly release our report, our code, and installation instructions for virtual machines.
Schematic showing the World Model architecture, adapted from Ha & Schmidhuber (2018).
World Models are a recent model-based reinforcement learning framework that has been shown to achieve state-of-the-art results in a variety of popular reinforcement learning environments. However, training them requires vast amounts of compute, typically available only to well-resourced private-sector labs.
In this work, we attempt to make the framework accessible to a larger fraction of the reinforcement learning research community by investigating how scaled-down variants of World Models perform.
We discover low-budget World Models that are trainable on limited computational resources and remain individually competitive. When compared to the state of the art on standardised evaluation metrics, however, our scaled-down World Models fall short in performance. In this regard, we point towards modifications to our training algorithm that improve the evaluation score while maintaining reduced complexity.
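For reference, the World Model of Ha & Schmidhuber (2018) is composed of three parts: a vision model V (a convolutional VAE that compresses each 64x64 frame into a latent vector z), a memory model M (an MDN-RNN that predicts a distribution over the next latent), and a small controller C (a single linear layer acting on the concatenated latent and recurrent state). The sketch below is a minimal PyTorch rendition of these components, using the scaled-down sizes from our videos; the number of mixture components and other details are illustrative assumptions, not our exact configuration.

```python
# Minimal sketch of the three World Model components (Ha & Schmidhuber, 2018).
# The layer layout follows the original paper; the sizes shown are the
# scaled-down values used in the videos below, and N_MIX is an assumption.
import torch
import torch.nn as nn

Z_SIZE, RNN_SIZE, ACTION_SIZE, N_MIX = 4, 4, 3, 5

class VisionVAE(nn.Module):
    """V: encodes a 64x64 RGB frame into a small latent vector z."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),
            nn.Conv2d(128, 256, 4, stride=2), nn.ReLU(),
            nn.Flatten(),                        # -> (B, 1024)
        )
        self.mu = nn.Linear(1024, Z_SIZE)
        self.logvar = nn.Linear(1024, Z_SIZE)

    def forward(self, frame):                    # frame: (B, 3, 64, 64)
        h = self.enc(frame)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterise
        return z, mu, logvar

class MemoryRNN(nn.Module):
    """M: an MDN-RNN predicting a mixture density over the next latent z."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(Z_SIZE + ACTION_SIZE, RNN_SIZE, batch_first=True)
        # Mixture weights, means and log-stddevs per latent dimension.
        self.mdn = nn.Linear(RNN_SIZE, N_MIX * 3 * Z_SIZE)

    def forward(self, z, action, hidden=None):   # z: (B, T, Z), action: (B, T, A)
        h, hidden = self.lstm(torch.cat([z, action], dim=-1), hidden)
        return self.mdn(h), hidden

class Controller(nn.Module):
    """C: a single linear layer from [z, h] to the 3 CarRacing actions
    (squashing/scaling of gas and brake to [0, 1] is omitted here)."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(Z_SIZE + RNN_SIZE, ACTION_SIZE)

    def forward(self, z, h):
        return torch.tanh(self.fc(torch.cat([z, h], dim=-1)))
```

With a latent size of 4 and a recurrent size of 4, the controller has only (4 + 4) * 3 + 3 = 27 trainable parameters, which is what makes black-box optimisation with CMA-ES feasible on modest hardware.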
Scaled-down World Model with a latent embedding size of 4 and a recurrent size of 4. The video shows controller progress during training with CMA-ES; the fitness function is computed over 3 rollouts during training.
Scaled-down World Model with a latent embedding size of 4 and a recurrent size of 4. The video shows controller progress during training with CMA-ES; the fitness function is computed over 16 rollouts during training.
A further analysis of the results will be released in two weeks.
World Model with a latent embedding size of 32 and a recurrent size of 32. The video shows controller progress during training with CMA-ES; the fitness function is computed over 3 rollouts during training.
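The controllers in all of the videos above are optimised with CMA-ES, with the fitness of each candidate parameter vector averaged over several rollouts. Below is a minimal sketch of such a training loop using the pycma package; rollout() is a hypothetical helper that loads a flat parameter vector into the controller, runs one CarRacing episode through V and M, and returns the cumulative reward, and the population size and initial step size are illustrative assumptions rather than our exact settings.

```python
# Minimal sketch of controller training with CMA-ES via the `cma` package
# (pycma). `rollout(params)` is a hypothetical helper, see lead-in above.
import cma
import numpy as np

N_PARAMS = (4 + 4 + 1) * 3   # 27: controller weights + biases, scaled-down sizes
N_ROLLOUTS = 3               # 3 or 16 in the videos above

def fitness(params):
    """Average episode reward over N_ROLLOUTS rollouts (higher is better)."""
    return np.mean([rollout(params) for _ in range(N_ROLLOUTS)])

es = cma.CMAEvolutionStrategy(N_PARAMS * [0.0], 0.5, {"popsize": 16})
while not es.stop():
    solutions = es.ask()     # sample a population of candidate parameter vectors
    # CMA-ES minimises, so we feed it the negated average reward.
    es.tell(solutions, [-fitness(s) for s in solutions])
    es.disp()
```

Averaging over more rollouts (16 instead of 3) gives a less noisy fitness estimate at a proportionally higher simulation cost, which is exactly the trade-off illustrated by the two scaled-down videos above.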