(2021.03 – 2023.09, AIARA Project)
This research investigates how learning-based methods can enable adaptive agents, defined as agents capable of adapting to new tasks and constraints with minimal retraining effort [6].
The central research question is:
How can an agent adapt its existing policy to potentially new tasks and safety constraints with minimal additional learning effort?
While many transfer learning approaches, such as meta-learning, curriculum learning, and task-conditioned policies, assume that the target task is known or explicitly specified, adaptive systems often operate under uncertain or partially unknown task conditions.
This research therefore explores how structured priors, randomization, and task invariants can improve the efficiency, robustness, and adaptability of reinforcement learning agents facing unknown or evolving tasks and constraints. Three complementary strategies are investigated:
Embedding structured priors (e.g., abstractions, existing policies, motion primitives) to accelerate policy adaptation
Domain randomization and stochastic simulation to improve robustness and sim-to-real transfer
Extracting task invariants through reward shaping and inverse reinforcement learning to capture underlying behavioral structure
Together, these approaches aim to enhance reinforcement learning systems so they can adapt to new operational conditions and evolving safety boundaries without extensive retraining or explicit task specification.
Learning-based planning for adaptive agents faces several fundamental challenges:
Policy adaptation under task uncertainty: Reinforcement learning typically assumes a fixed task specification, while real-world agents must adapt to previously unseen tasks and evolving operational constraints.
Sample inefficiency of policy learning: Standard reinforcement learning methods require large amounts of interaction data, making policy adaptation costly when tasks or environments change.
Lack of structural abstraction in learned policies: Purely data-driven policies often fail to capture reusable structure, limiting transfer across tasks, environments, and platforms.
Robustness and sim-to-real generalization: Policies trained in simulation may degrade when deployed in real environments due to modeling errors and unmodeled stochasticity.
These challenges highlight the need for learning mechanisms that exploit structure, invariants, and stochastic diversity to enable efficient and robust policy adaptation.
This research develops learning frameworks for adaptive agents by integrating structural priors, stochastic training environments, and task-invariant representations.
The methodological contributions can be organized into three complementary directions.
1. Structured Priors for Efficient Policy Adaptation
To improve learning efficiency, this work introduces structured priors into reinforcement learning, allowing agents to exploit existing knowledge during policy optimization.
Several forms of priors are explored:
Geometric abstractions, such as symmetry-guided learning, to reduce policy search space [1]
Demonstration policies, integrating implicit behavior cloning with reinforcement learning [2]
Motion primitives, enabling policy learning within structured motion representations [2][4]
These mechanisms accelerate learning and improve policy stability by constraining exploration to effective regions of the policy space.
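As a concrete illustration of the demonstration-based priors above, the sketch below mirrors demonstrated transitions across a task symmetry axis before seeding an off-policy replay buffer, so each demonstration also teaches its reflected counterpart. This is a minimal, hypothetical example in the spirit of the symmetry-guided approach of [1], not the published implementation; the planar state layout and the choice of reflection axis are assumptions.

```python
import numpy as np

def mirror_transition(s, a, r, s_next, axis=0):
    """Reflect one demonstrated transition across a spatial axis.

    Assumes planar states/actions whose `axis`-th entry is the coordinate
    to negate; the mapping must match the task's true symmetry.
    """
    def reflect(v):
        v = np.array(v, dtype=float)
        v[axis] = -v[axis]
        return v
    return reflect(s), reflect(a), r, reflect(s_next)

def augment_demos(demos, axis=0):
    """Double a demonstration set by appending symmetry-mapped copies."""
    return list(demos) + [mirror_transition(*d, axis=axis) for d in demos]

# Usage: seed an off-policy replay buffer with original + mirrored demos.
demos = [(np.array([0.3, 0.1]), np.array([0.05, 0.0]), 1.0,
          np.array([0.35, 0.1]))]
replay_seed = augment_demos(demos)
```

The same pattern extends to richer symmetry groups (e.g., rotations), provided state and action reflections are applied consistently.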
2. Stochastic Simulation for Robust Learning and Sim-to-Real Transfer
To address robustness and sim-to-real generalization, the research investigates stochastic simulation as an intrinsic component of learning environments [5].
Instead of relying on deterministic simulation, the approach introduces structured stochasticity and domain randomization into the training process. This enables agents to learn policies that remain robust under modeling errors, environmental variability, and sensor noise.
Such stochastic training environments help bridge the simulation–reality gap, improving the reliability of learned policies when deployed in physical systems.
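A minimal sketch of the domain-randomization side of this strategy is shown below. It assumes a simulator exposing a `set_physics(...)` setter alongside the usual `reset`/`step` interface; both the interface and the parameter ranges are illustrative, not those of the simulation platforms in [5] or [7].

```python
import random

class DomainRandomizationWrapper:
    """Resample simulator parameters at every episode reset (a sketch).

    `env` is assumed to provide `set_physics(mass, friction, sensor_noise)`
    and the usual `reset`/`step` methods; ranges below are illustrative.
    """

    def __init__(self, env, mass_range=(0.8, 1.2),
                 friction_range=(0.5, 1.5), noise_range=(0.0, 0.02)):
        self.env = env
        self.mass_range = mass_range
        self.friction_range = friction_range
        self.noise_range = noise_range

    def reset(self):
        # Draw a fresh environment instance so the policy never overfits
        # to a single deterministic simulation.
        self.env.set_physics(
            mass=random.uniform(*self.mass_range),
            friction=random.uniform(*self.friction_range),
            sensor_noise=random.uniform(*self.noise_range),
        )
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)
```

Because a new parameter draw happens at every reset, the learner effectively trains on a distribution of environments rather than a single deterministic model, which is what drives the robustness and sim-to-real benefits described above.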
3. Extraction of Task Invariants for Adaptive Behavior
Beyond accelerating learning, the research explores how task-invariant structures can be extracted and embedded into learning objectives.
Two complementary techniques are investigated:
Risk-aware reward shaping, embedding safety-related behavioral preferences into reinforcement learning [3]
Inverse reinforcement learning, identifying latent behavioral structures from observed trajectories [4]
By capturing invariant aspects of tasks and behaviors, these methods improve policy interpretability, transferability, and adaptability across varying operational conditions.
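The exact formulations are developed in [3] and [4]; the two sketches below are generic illustrations only. The first embeds a risk preference through potential-based reward shaping, r + gamma*Phi(s') - Phi(s), a form known to preserve the optimal policy; the obstacle-distance potential and its exponential falloff are hypothetical choices, not the risk model of [3].

```python
import numpy as np

GAMMA = 0.99  # discount factor of the underlying MDP

def risk_potential(state, obstacles, scale=1.0):
    """Risk potential Phi(s): near zero far from obstacles, strongly
    negative close to them (exponential falloff is an illustrative choice)."""
    dists = [np.linalg.norm(np.asarray(state) - np.asarray(o))
             for o in obstacles]
    return -scale * np.exp(-min(dists))

def shaped_reward(r, state, next_state, obstacles):
    """Potential-based shaping r + gamma*Phi(s') - Phi(s), which preserves
    the optimal policy while steering exploration away from risky regions."""
    return (r + GAMMA * risk_potential(next_state, obstacles)
              - risk_potential(state, obstacles))

# Usage: wrap the environment reward before it reaches the learner.
r_shaped = shaped_reward(0.0, state=(0.0, 0.0), next_state=(0.5, 0.0),
                         obstacles=[(1.0, 0.0)])
```

The second sketches one gradient step of linear-reward inverse reinforcement learning in the maximum-entropy style: reward weights move toward the expert's feature expectations and away from those of the current learner, so the recovered reward explains the observed trajectories. The feature map and trajectories are toy placeholders.

```python
import numpy as np

def feature_expectations(trajectories, featurize, gamma=0.99):
    """Discounted feature expectations averaged over state trajectories."""
    mu = sum(sum((gamma ** t) * featurize(s) for t, s in enumerate(traj))
             for traj in trajectories)
    return mu / len(trajectories)

def irl_weight_update(w, expert_trajs, learner_trajs, featurize, lr=0.1):
    """One maximum-entropy-style gradient step for a linear reward
    R(s) = w . phi(s): move w toward the expert's feature expectations
    and away from those of the current learner policy."""
    grad = (feature_expectations(expert_trajs, featurize)
            - feature_expectations(learner_trajs, featurize))
    return w + lr * grad

# Usage with a toy 1D state and quadratic feature map (placeholders).
featurize = lambda s: np.array([s, s ** 2])
w = irl_weight_update(np.zeros(2), expert_trajs=[[0.0, 0.5, 1.0]],
                      learner_trajs=[[0.0, 0.2, 0.4]], featurize=featurize)
```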
This research establishes a structured perspective on learning for adaptive agents, demonstrating that reinforcement learning can be significantly improved by incorporating prior knowledge, stochastic diversity, and task invariants.
The results show that:
Learning efficiency can be substantially increased through structured priors that constrain policy search and reuse existing knowledge.
Robustness to environmental uncertainty can be improved through stochastic simulation and domain randomization.
Safety-related behavioral preferences can be embedded into learning objectives via principled reward shaping and inverse reinforcement learning.
Together, these contributions illustrate how reinforcement learning systems can be designed to adapt to new tasks, environments, and safety constraints without extensive retraining or explicit task redefinition.
More broadly, this work highlights the importance of structure-aware learning mechanisms for building adaptive autonomous agents capable of operating in complex and evolving environments.
[1] A. M. S. Enayati, Z. Zhang, K. Gupta, and H. Najjaran, "Sample-efficient reinforcement learning with symmetry-guided demonstrations for robotic manipulation," Neural Computing and Applications, vol. 38, no. 24, Jan 2026, DOI: 10.1007/s00521-025-11733-1. [Springer][arXiv]
[2] Z. Zhang, J. Hong, A. M. S. Enayati, and H. Najjaran, "Using Implicit Behavior Cloning and Dynamic Movement Primitive to Facilitate Reinforcement Learning for Robot Motion Planning," IEEE Transactions on Robotics, vol. 40, pp. 4733-4749, Sept 2024, DOI: 10.1109/TRO.2024.3468770. [IEEE Xplore][arXiv]
[3] L. C. Wu, Z. Zhang*, S. Haesaert, Z. Ma, and Z. Sun, "Risk-Aware Reward Shaping of Reinforcement Learning Agents for Autonomous Driving," 49th Annual Conference of the IEEE Industrial Electronics Society (IECON 2023), Singapore, 16-19 Oct 2023.
[4] N. Dang, T. Shi, Z. Zhang, W. Jin, M. Leibold, and M. Buss, "Identifying Reaction-Aware Driving Styles of Stochastic Model Predictive Controlled Vehicles by Inverse Reinforcement Learning," 26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023), Bilbao, Spain, 24-28 Sept 2023.
[5] A. M. S. Enayati, R. Dershan, Z. Zhang, D. Richert, and H. Najjaran, "Facilitating Sim-to-Real by Intrinsic Stochasticity of Real-Time Simulation in Reinforcement Learning for Robot Manipulation," IEEE Transactions on Artificial Intelligence, vol. 5, no. 4, pp. 1791-1804, Jul 2023, DOI: 10.1109/TAI.2023.3299252. [IEEE Xplore][arXiv]
[6] A. M. Soufi Enayati, Z. Zhang, and H. Najjaran*, "A methodical interpretation of adaptive robotics: Study and reformulation," Neurocomputing, vol. 512, pp. 381-397, 2022, DOI: 10.1016/j.neucom.2022.09.114. [ScienceDirect][ResearchGate]
[7] Z. Zhang, R. Dershan, A. M. S. Enayati, M. Yaghoubi, D. Richert, and H. Najjaran, "A High-Fidelity Simulation Platform for Industrial Manufacturing by Incorporating Robotic Dynamics Into an Industrial Simulation Tool," IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 9123-9128, Jul 2022, DOI: 10.1109/LRA.2022.3190096. [IEEE Xplore][ResearchGate]