Few-shot evolutionary reinforcement learning in uncertain and dynamic environments

Project FERLUDE

"Few shot evolutionary reinforcement learning in uncertain and dynamic environments" (FERLUDE) is a project funded within the SMASH postdoctoral program.

SMASH is an innovative, intersectoral, career-development training program co-funded by the Marie Skłodowska-Curie Actions COFUND scheme (MSCA COFUND) for the 2023–2028 period.

Full title: Few-shot evolutionary reinforcement learning in uncertain and dynamic environments

Acronym: FERLUDE

Investigator: Bruno Gašperov, PhD

Host institution: University of Ljubljana, Faculty of Computer and Information Science, Laboratory for Adaptive Systems and Parallel Processing

Research area: Data Science - Machine Learning for Scientific Applications (1)

SMASH supervisor: Prof. Branko Šter

Duration of fellowship: 19 June 2024 – 18 June 2026

Extended summary: Over the last several years, reinforcement learning (RL) has achieved a number of exciting breakthroughs in a wide variety of sequential decision-making tasks under uncertainty, especially in combination with deep learning (deep RL). However, the vast majority of these successes have been confined to highly controlled and static environments characterized by several favorable properties, including the stationarity and determinism of the underlying state transition dynamics and reward function. Yet RL agents trained under such circumstances often fail to generalize to real-world environments. To a large extent, this problem stems from the fact that real-world problems often involve dynamic (non-stationary) and uncertain (noisy) environments, which undergo distributional shifts and exhibit relatively low signal-to-noise ratios. RL agents deployed for real-world tasks must therefore be capable of quickly adapting their behavior to environmental changes (i.e., from only a few samples or interactions with the environment) in order to minimize regret. The ability to update existing knowledge and adapt to changes in the environment is widely argued to be a pivotal property of intelligent systems. Consequently, there is a need for novel (deep) RL approaches for training and deploying RL agents under such circumstances in a few-shot manner.
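As a purely illustrative example of the setting described above (not part of the project's methods), the following sketch shows a stochastic multi-armed bandit whose arm means drift over time, with an agent accumulating regret against the current best arm; all parameter values here are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms, n_steps = 5, 2000
means = rng.normal(0.0, 1.0, n_arms)   # latent arm means (unknown to the agent)
estimates = np.zeros(n_arms)           # agent's running value estimates
alpha, eps = 0.1, 0.1                  # recency weight and exploration rate
regret = 0.0

for t in range(n_steps):
    means += rng.normal(0.0, 0.03, n_arms)  # random-walk drift: the environment is non-stationary
    arm = rng.integers(n_arms) if rng.random() < eps else int(np.argmax(estimates))
    reward = rng.normal(means[arm], 1.0)    # noisy (stochastic) reward signal
    estimates[arm] += alpha * (reward - estimates[arm])  # constant step size forgets stale data
    regret += means.max() - means[arm]      # per-step regret vs. the current best arm

print(f"cumulative regret after {n_steps} steps: {regret:.1f}")
```

The constant step size (rather than a decaying sample average) is what allows the estimates to track the drifting means; fast adaptation of this kind, from few interactions, is exactly what the project targets in far richer settings.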

The primary goal of this research project is to develop and implement a novel class of methods based on evolutionary computation (EC) that are suitable for training reinforcement learning (RL) agents in uncertain and dynamic environments in a few-shot and model-agnostic manner (i.e., without strong distributional assumptions). To achieve this goal, special attention will be paid to a novel class of EC algorithms and techniques based on the ideas of divergent search, evolvability, and open-endedness, including quality-diversity approaches capable of generating a wide range of mutually diverse yet high-performing solutions. Such population diversity is hypothesized to facilitate tracking moving optima in non-stationary environments and, with suitable modifications, to improve sample efficiency as well (see the sketch below). The proposed research will advance the state of the art in the burgeoning field of evolutionary RL (ERL), shed new light on meta-learning, and touch upon multiple related topics such as stochastic multi-armed bandits, explainability in RL, and neuroevolution. A significant focus will be placed on investigating and designing exploration strategies and policy regularization techniques suitable for this context. Furthermore, techniques such as fitness landscape analysis will be explored, and novel ERL operators will be studied. Finally, the project has a clearly multidisciplinary nature, with two secondments aimed at applying the developed methods to two domains in climate research: flash floods over the mainland, and remote sensing in the context of the study of aerosols.
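To make the quality-diversity idea concrete, here is a minimal, self-contained toy sketch of a MAP-Elites-style loop that maintains an archive of diverse elites while the fitness function's optimum drifts. This is a generic illustration of the technique under simplifying assumptions, not the project's method; the problem, descriptor, and all parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem: maximize f(x) = -||x - target||, where the target drifts over time.
dim, bins, lo, hi = 2, 10, -1.0, 1.0
target = np.zeros(dim)

def fitness(x, tgt):
    return -np.linalg.norm(x - tgt)

def descriptor(x):
    # Behavior descriptor: the grid cell of the solution itself (a common toy choice).
    idx = np.clip(((x - lo) / (hi - lo) * bins).astype(int), 0, bins - 1)
    return tuple(idx)

archive = {}  # cell -> (solution, fitness)

for gen in range(300):
    target += rng.normal(0.0, 0.01, dim)  # moving optimum: non-stationary fitness
    if gen % 20 == 0:
        # Periodically re-evaluate elites so stored fitnesses track the moving target.
        archive = {c: (x, fitness(x, target)) for c, (x, f) in archive.items()}
    # Draw a parent from the archive (or sample at random if it is empty) and mutate it.
    if archive:
        parent = archive[list(archive)[rng.integers(len(archive))]][0]
        child = np.clip(parent + rng.normal(0.0, 0.1, dim), lo, hi)
    else:
        child = rng.uniform(lo, hi, dim)
    f = fitness(child, target)
    cell = descriptor(child)
    # Insert if the cell is empty or the child improves on the resident elite.
    if cell not in archive or f > archive[cell][1]:
        archive[cell] = (child, f)

best = max(archive.values(), key=lambda e: e[1])
print(f"archive cells filled: {len(archive)}, best fitness: {best[1]:.3f}")
```

Because the archive keeps one elite per behavioral niche rather than a single incumbent, solutions near the optimum's new location are often already present when it moves, which is the intuition behind using diversity to track moving optima.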

Keywords: reinforcement learning, evolutionary computation, meta-learning, few-shot learning, non-stationarity

Publications:

Additional: