IEEE WCCI 2020 Tutorial

Advances in Deep Reinforcement Learning

This tutorial has been accepted. You are welcome to attend by registering via the conference website. It is scheduled for 11:30-13:30 UK time on 19th July 2020: https://wcci2020.org/tutorials/

Aim and Scope

Deep reinforcement learning (deep RL) has attracted enormous attention from the research community because it can solve complex problems in many areas. Extensions that handle multi-objective and multi-agent problems have also been introduced. Deep RL is transforming and stimulating growth in many industries, such as autonomous vehicles, (in-house) robotics, defence and security, retail, supply chains, smart manufacturing, medical diagnosis systems, (remote) aged care and health care systems, cancer treatment planning, autonomous surgery, smart grid control, telecommunications, software engineering, and the Internet of Things. The goal of this tutorial is to present in detail the current state of the art in deep RL theory and applications. We highlight the differences between the various types of deep RL methods and their appropriate applications. We present our recent algorithms in the multi-objective and multi-agent domains. Real-world examples are given for each type of deep RL method, along with demonstrations via our friendly deep RL framework Fruit API: http://fruitlab.org/

This tutorial provides background on and insights into recent developments in deep RL theory and applications. It is helpful for researchers who struggle to find appropriate deep RL methods for their problems. We present the advantages and disadvantages of the different types of deep RL methods and show how to apply them effectively. We give instructions on how to start solving a problem with deep RL and how to implement the algorithms in Python. The tutorial also discusses current trends and future research directions in deep RL, which should facilitate the development of more robust and broadly useful deep RL methods for real-world problems.
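As a taste of what the hands-on parts cover, the sketch below shows how a basic value-based RL agent can be written in a few lines of Python. It is a minimal illustration only, using tabular Q-learning on OpenAI Gym's FrozenLake rather than a deep network or Fruit API; the tutorial demonstrations themselves are built on Fruit API.

```python
# Minimal tabular Q-learning on OpenAI Gym's FrozenLake.
# Illustrative sketch only; the tutorial's demonstrations use Fruit API.
import gym
import numpy as np

env = gym.make('FrozenLake-v0')
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount, exploration

for episode in range(5000):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection (value-based control).
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done, _ = env.step(action)
        # One-step temporal-difference (Q-learning) update.
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state])
                                     - Q[state, action])
        state = next_state
```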

This tutorial is particularly helpful to a broad audience, including professionals, academic researchers, students, practitioners, and engineers who wish to deepen their knowledge of deep RL methods and their applications. It offers a unique opportunity to disseminate in-depth knowledge of deep RL and of how to use these algorithms to solve real-world problems such as autonomous vehicles (cars and drones), autonomous surgical robotics, and applications in finance, cybersecurity, and the Internet of Things.


Contents

Part 1: State-of-the-art deep RL methods and applications

In this part, we briefly present the background of deep RL and discuss the advantages and disadvantages of the different types of deep RL methods. The tutorial systematically separates deep RL methods along several axes: we compare the pros and cons of model-free vs model-based methods, value-based vs policy-based methods, temporal-difference learning vs Monte-Carlo methods, continuous vs discrete action spaces, deterministic vs stochastic policies, on-policy vs off-policy methods, and fully observable vs partially observable settings. We present and discuss real-world examples for each type of deep RL method. We then give an overview of recent developments, current trends, and future research directions in deep RL.
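To make one of these distinctions concrete, the sketch below contrasts the temporal-difference and Monte-Carlo value-update rules on a single episode. The state names and rewards are hypothetical, chosen only for illustration.

```python
# Contrasting TD(0) and Monte-Carlo value updates on one episode.
# States and rewards below are hypothetical.
from collections import defaultdict

def td0_update(V, transitions, alpha=0.1, gamma=0.99):
    """TD(0): update after every step, bootstrapping from the
    current estimate of the next state's value."""
    for s, r, s_next in transitions:
        V[s] += alpha * (r + gamma * V[s_next] - V[s])

def monte_carlo_update(V, transitions, alpha=0.1, gamma=0.99):
    """Monte-Carlo: wait for the episode to finish, then move each
    state's value toward the actual discounted return that followed."""
    G = 0.0
    for s, r, _ in reversed(transitions):
        G = r + gamma * G
        V[s] += alpha * (G - V[s])

V = defaultdict(float)
episode = [('s0', 0.0, 's1'), ('s1', 0.0, 's2'), ('s2', 1.0, 'end')]
td0_update(V, episode)          # low variance, biased by bootstrapping
monte_carlo_update(V, episode)  # unbiased, but higher variance
```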


Part 2: Deep RL methods in the multi-objective and multi-agent domains

Many real-world decision-making problems require the consideration of more than one objective. Multi-objective deep RL (MODRL) extends conventional single-objective RL methods to handle two or more objectives simultaneously. In this part, we survey MODRL methods and the metrics used to evaluate their performance, such as the hypervolume indicator, accumulated reward, regret metrics, and user-based or simulated-user testing. We differentiate between linear (e.g. weighted sum) and non-linear (e.g. thresholded lexicographic ordering) methods and discuss their merits and demerits in expressing the desired trade-off between objectives, as sketched in the example below. We discuss our recent MODRL framework (one of the first works in this domain), which implements both single-policy and multi-policy strategies. Several video demonstrations on the Deep Sea Treasure and MO-Mountain-Car problems compare the pros and cons of single-policy vs multi-policy methods.
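The difference between the two scalarization styles can be shown in a few lines. The sketch below uses hypothetical two-objective Q-vectors (in the spirit of Deep Sea Treasure, where treasure value trades off against time); the weights and threshold are illustrative only.

```python
# Linear vs non-linear scalarization of a multi-objective Q-vector.
# Q-vectors, weights, and thresholds below are hypothetical.
import numpy as np

def linear_scalarize(q_vec, weights):
    """Weighted sum: collapses the Q-vector to a scalar, but can only
    express trade-offs on convex regions of the Pareto front."""
    return float(np.dot(q_vec, weights))

def tlo_key(q_vec, thresholds):
    """Thresholded lexicographic ordering: clip every objective except
    the last at its threshold, then compare lexicographically."""
    clipped = tuple(min(q, t) for q, t in zip(q_vec, thresholds))
    return clipped + (q_vec[-1],)

q_a = (5.0, -3.0)   # modest treasure, short path
q_b = (8.0, -10.0)  # large treasure, long path
print(linear_scalarize(q_a, [0.5, 0.5]))  # 1.0 -> q_a preferred
print(linear_scalarize(q_b, [0.5, 0.5]))  # -1.0
best = max([q_a, q_b], key=lambda q: tlo_key(q, thresholds=[6.0]))
# TLO prefers q_b here: its treasure value reaches the 6.0 threshold
# while q_a's does not; among actions that reach the threshold, the
# time objective would decide.
```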

As real-world problems become increasingly complex, there are many situations that a single deep RL agent cannot cope with. In such situations, a multi-agent system is indispensable. We present an overview of the technical challenges in multi-agent deep RL (MADRL) as well as solutions to these challenges. We cover numerous MADRL perspectives, including non-stationarity, partial observability, multi-agent training schemes, multi-agent transfer learning, and continuous state and action spaces. The merits and demerits of MADRL methods are discussed in detail. Applications of MADRL in various fields are also reviewed and analysed. We conclude with extensive discussion of promising future research directions for MADRL, along with several video demonstrations of our recent MADRL algorithms.
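The non-stationarity challenge is easy to see in code: when independent learners share an environment, each agent's learning target shifts as the other agents adapt. The sketch below is a hypothetical two-agent matrix game with independent Q-learning, not an example from the tutorial.

```python
# Independent Q-learners in a two-player matrix game (hypothetical).
# Each agent treats the other as part of the environment, which makes
# its learning target non-stationary: the MADRL challenge discussed above.
import numpy as np

payoff = {  # joint action -> (reward for agent 0, reward for agent 1)
    (0, 0): (1.0, 1.0), (0, 1): (0.0, 0.0),
    (1, 0): (0.0, 0.0), (1, 1): (0.5, 0.5),
}
Q = [np.zeros(2), np.zeros(2)]  # one Q-table per agent (stateless game)
alpha, epsilon = 0.1, 0.2

for step in range(2000):
    actions = tuple(
        np.random.randint(2) if np.random.rand() < epsilon
        else int(np.argmax(Q[i]))
        for i in range(2)
    )
    rewards = payoff[actions]
    for i in range(2):
        # Agent i's reward depends on the other agent's changing policy.
        Q[i][actions[i]] += alpha * (rewards[i] - Q[i][actions[i]])
```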


Part 3: Demonstrations based on our deep RL framework Fruit API

In this part, we introduce our friendly deep RL framework Fruit API (http://fruitlab.org/started.html) and its features. The framework facilitates the development of deep RL across many environments: we have incorporated into Fruit API the Arcade Learning Environment (Atari 2600), OpenAI Gym, DeepMind Lab, Carla (self-driving car), TensorForce's environments (via the TensorForce plugin), OpenAI Retro, DeepMind Pycolab, Unreal Engine, Maze Explorer, Robotics – OpenSim, Pygame Learning Environment, and ViZDoom.

We also introduce extra environments that serve as testbeds for different deep RL methods, together with their video demonstrations:

- Grid World (graphical support)
- Puddle World (graphical support)
- Mountain Car (multi-objective environment/graphical support)
- Deep Sea Treasure (multi-objective environment/graphical support)
- Tank Battle (multi-agent/multi-objective/human-agent cooperation environment)
- Food Collector (multi-objective environment)
- Milk Factory game (multi-agent/heterogeneous environment)

We then show how to integrate any new environment into the Fruit API framework; a hypothetical sketch of what such an integration involves is given below.
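The sketch below defines a tiny custom task behind a Gym-style reset/step interface. The class and method names are hypothetical; Fruit API's actual environment interface may differ, so please consult http://fruitlab.org/started.html for the real instructions.

```python
# A hypothetical custom environment with a Gym-style interface.
# Fruit API's actual base class and method names may differ; see
# http://fruitlab.org/started.html for the framework's own instructions.
import numpy as np

class CorridorEnv:
    """Tiny 1-D corridor: start at cell 0, move left/right,
    receive reward 1 on reaching the last cell."""

    def __init__(self, size=5):
        self.size = size
        self.pos = 0

    def reset(self):
        self.pos = 0
        return np.array([self.pos], dtype=np.float32)

    def step(self, action):
        # action 1 moves right, anything else moves left.
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.size - 1)
        done = self.pos == self.size - 1
        reward = 1.0 if done else 0.0
        return np.array([self.pos], dtype=np.float32), reward, done, {}
```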


Organizers