Muhammad Burhan Hafez

Postdoctoral Associate at the University of Hamburg

UHH page - Google Scholar - ResearchGate - GitHub

I am currently a Postdoctoral Associate at the Knowledge Technology Group, University of Hamburg. My research focuses on developing data-efficient deep reinforcement learning algorithms for robot motor control by applying biological principles of self-organization and intrinsic motivation. I also work on meta-decision making, strategy selection and algorithms that integrate model-based and model-free control for robot skill learning. My research interests include:

  • Neural Networks

  • Reinforcement Learning

  • Cognitive Robotics


Behavior Self-Organization Supports Task Inference for Continual Robot Learning

Hafez and Wermter (2021). IROS.

In this paper, we propose an unsupervised task inference approach for continual, multi-task robot learning, inspired by goal-directed imitation learning, a cognitive process by which humans can infer a task by observing a demonstration of the desired behavior.

Our approach learns a behavior embedding space by self-organizing visual demonstrations of behaviors. Task inference is performed by finding the behavior embedding nearest to a given demonstration. The embedding is used together with the environment state as input to a multi-task policy trained with reinforcement learning to optimize performance over tasks.
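The inference step described above can be illustrated with a minimal sketch. It is not the code from the paper: the behavior embeddings are hypothetical hand-written 2-D vectors rather than the output of a self-organizing model, Euclidean distance is an assumed similarity measure, and the names `nearest_embedding` and `policy_input` are introduced here only for illustration.

```python
import math

def nearest_embedding(demo_embedding, behavior_embeddings):
    """Return (index, embedding) of the stored embedding closest to the demo."""
    best_index = min(
        range(len(behavior_embeddings)),
        key=lambda i: math.dist(demo_embedding, behavior_embeddings[i]),
    )
    return best_index, behavior_embeddings[best_index]

def policy_input(state, task_embedding):
    """Concatenate the environment state with the inferred task embedding."""
    return list(state) + list(task_embedding)

# Hypothetical embeddings, one per previously observed behavior (assumption).
behavior_embeddings = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]

# Hypothetical embedding of a new visual demonstration (assumption).
demo = (0.9, 1.2)

task_index, task_embedding = nearest_embedding(demo, behavior_embeddings)
x = policy_input([0.5, -0.3, 0.1], task_embedding)
```

In this toy setup the demonstration is assigned to the second behavior, and `x` is the vector a multi-task policy would receive. No interaction with the environment is needed to infer the task, which matches the claim that no task exploration is required at test time.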

Unlike previous approaches, ours makes no assumptions about the task distribution or policy architecture and requires no task exploration at test time to infer tasks. We show that it achieves better generalization performance and faster convergence than the state of the art in experiments with concurrently and sequentially presented tasks.

Improving robot dual-system motor learning with intrinsically motivated meta-control and latent-space experience imagination

Hafez et al. (2020). Robotics and Autonomous Systems.

We found that the learning progress of a world model, computed locally in self-organized regions of a learned latent space, provides a spatially and temporally local estimate of the reliability of the model's predictions. This estimate is used to arbitrate between model-based and model-free decisions and to compute an adaptive prediction horizon for model predictive control and experience imagination.
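A rough sketch of this idea follows. It assumes that learning progress in a region is the drop in mean prediction error between an older and a more recent window, that arbitration is a simple threshold, and that the horizon scales linearly with progress. These rules, the class `LocalLearningProgress`, and all parameter values are illustrative assumptions, not the formulation used in the paper.

```python
from collections import defaultdict, deque

class LocalLearningProgress:
    """Tracks world-model prediction errors separately for each latent region."""

    def __init__(self, window=4):
        self.window = window
        # Keep the last 2 * window errors per region: an older and a recent half.
        self.errors = defaultdict(lambda: deque(maxlen=2 * window))

    def update(self, region, prediction_error):
        self.errors[region].append(prediction_error)

    def progress(self, region):
        """Mean error of the older window minus mean error of the recent one."""
        history = list(self.errors[region])
        if len(history) < 2 * self.window:
            return 0.0  # not enough data in this region yet
        older = sum(history[:self.window]) / self.window
        recent = sum(history[self.window:]) / self.window
        return older - recent

def choose_controller(progress, threshold=0.0):
    """Assumed rule: trust the model only where it is currently improving."""
    return "model-based" if progress > threshold else "model-free"

def prediction_horizon(progress, max_horizon=5, scale=1.0):
    """Assumed rule: horizon grows linearly with clipped learning progress."""
    fraction = max(0.0, min(1.0, progress / scale))
    return round(fraction * max_horizon)

tracker = LocalLearningProgress(window=4)
for error in [0.8, 0.8, 0.8, 0.8, 0.2, 0.2, 0.2, 0.2]:
    tracker.update(region=3, prediction_error=error)

lp_visited = tracker.progress(3)    # region with falling prediction error
lp_unvisited = tracker.progress(7)  # region without any data
```

Because progress is stored per region, the same agent can plan several steps ahead in well-modeled parts of the latent space while falling back to model-free decisions elsewhere.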

Our approach improves the efficiency of learning visuomotor control in simulation and in the real world. Policy networks trained in simulation with our approach are shown to perform well on the physical robot using a simple simulation-to-real transfer, without fine-tuning the policy parameters.

Check out our 2-min video summary here.

Link to the code on GitHub:

Efficient Intrinsically Motivated Robotic Grasping with Learning-Adaptive Imagination in Latent Space

Hafez et al. (2019). ICDL-EpiRob.

Inspired by human mental simulation of motor behavior and its role in skill acquisition, we show that:

(1) The sample efficiency of learning vision-based robotic grasping can be greatly improved by performing experience imagination in a learned latent space and using the imagined data for training grasping policies.

(2) The proposed adaptive imagination, where imagined rollouts are generated with probability proportional to the prediction reliability of the local world model in the traversed latent-space regions, outperforms fixed-depth imagination.

(3) Using intrinsic reward based on model learning progress leads to data that improves future predictions necessary for imagination.
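Point (2) can be illustrated with a small sketch. It assumes one particular reading of adaptive imagination: at each imagined step, the rollout continues with probability equal to the reliability of the latent region it is currently in. The function `imagine_rollout`, the 1-D latent space, the toy world model, and the reliability table are all hypothetical and stand in for learned components.

```python
import random

def imagine_rollout(start_latent, policy, world_model, region_of, reliability,
                    max_depth=10, rng=random):
    """Generate imagined transitions, stopping early in unreliable regions."""
    rollout = []
    latent = start_latent
    for _ in range(max_depth):
        # Probability of imagining one more step from the current region.
        p = max(0.0, min(1.0, reliability.get(region_of(latent), 0.0)))
        if rng.random() >= p:
            break
        action = policy(latent)
        next_latent, reward = world_model(latent, action)
        rollout.append((latent, action, reward, next_latent))
        latent = next_latent
    return rollout

# Toy stand-ins (assumptions): 1-D latent state, additive dynamics.
def toy_policy(latent):
    return 0.5

def toy_world_model(latent, action):
    next_latent = latent + action
    return next_latent, -abs(next_latent)

def toy_region_of(latent):
    return int(latent)  # unit-width regions along the latent axis

# Regions 0 and 1 are fully trusted, region 2 is not trusted at all.
toy_reliability = {0: 1.0, 1: 1.0, 2: 0.0}

rollout = imagine_rollout(0.0, toy_policy, toy_world_model,
                          toy_region_of, toy_reliability)
```

With these extreme reliability values the rollout is deterministic: it proceeds through the trusted regions and stops on entering region 2, so its depth adapts to where the model can be trusted instead of being fixed in advance.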

Curious Meta-Controller: Adaptive Alternation between Model-Based and Model-Free Control in Deep Reinforcement Learning

Hafez et al. (2019). IJCNN.

In this paper, we show that using curiosity feedback based on prediction learning progress to arbitrate between model-based and model-free decisions accelerates the learning of pixel-level control policies.


Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning

Hafez et al. (2019). J. Behav. Robot.

This work demonstrates that spatially and temporally local learning progress in a growing ensemble of local world models provides an effective intrinsic reward, enabling directed exploration for vision-based grasp learning on a developmental humanoid robot. The work also suggests that training a small actor network on low-dimensional feature representations learned for self-reconstruction and reward prediction leads to fast and stable learning.
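As a simplified illustration of an intrinsic reward derived from local learning progress, the sketch below grows a set of regions on demand and rewards the agent by how much the prediction error in the visited region has dropped since the last visit. The growth rule (a fixed distance threshold), the one-step error difference, and the class `GrowingLocalModels` are assumptions made for this example. The local world models themselves are omitted, and prediction errors are passed in directly.

```python
import math

class GrowingLocalModels:
    """Keeps one error record per region; adds regions for novel features."""

    def __init__(self, distance_threshold=1.0):
        self.distance_threshold = distance_threshold
        self.centers = []     # one center per local region
        self.last_error = []  # most recent prediction error per region

    def region(self, features):
        """Return the index of the matching region, creating one if needed."""
        if self.centers:
            nearest = min(range(len(self.centers)),
                          key=lambda i: math.dist(features, self.centers[i]))
            if math.dist(features, self.centers[nearest]) <= self.distance_threshold:
                return nearest
        self.centers.append(tuple(features))
        self.last_error.append(None)
        return len(self.centers) - 1

    def intrinsic_reward(self, features, prediction_error):
        """Reward the decrease in prediction error within the visited region."""
        k = self.region(features)
        previous = self.last_error[k]
        self.last_error[k] = prediction_error
        return 0.0 if previous is None else previous - prediction_error

models = GrowingLocalModels(distance_threshold=1.0)
r_first = models.intrinsic_reward((0.0, 0.0), prediction_error=0.9)   # new region
r_second = models.intrinsic_reward((0.2, 0.1), prediction_error=0.5)  # same region
r_far = models.intrinsic_reward((3.0, 3.0), prediction_error=0.7)     # new region
```

A positive reward marks regions where the model is still improving, which is what steers exploration toward experience that is useful for learning.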