Comparative Study of World Models, NVAE-Based Hierarchical Models, and NoisyNet-Augmented Models in CarRacing-V2
In continuous-control settings like CarRacing-V2, RL must solve both world modeling and exploration. This project compares (i) standard World Models, (ii) NVAE-based hierarchical world models, and (iii) NoisyNet-augmented exploration, highlighting trade-offs in reward performance, training stability, and compute. The results clarify when to prioritize stronger representations versus exploration mechanisms.
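To make the exploration mechanism concrete, here is a minimal NumPy sketch of a NoisyNet linear layer with factorized Gaussian noise, the kind of layer a NoisyNet-augmented agent substitutes for its ordinary linear layers. The initialization scale `sigma0` and the layer sizes are illustrative, not the project's actual configuration:

```python
import numpy as np

def f(x):
    # Factorised-noise transform from the NoisyNet paper: sign(x) * sqrt(|x|)
    return np.sign(x) * np.sqrt(np.abs(x))

class NoisyLinear:
    """Linear layer with learnable factorised Gaussian noise (NoisyNet sketch)."""
    def __init__(self, in_dim, out_dim, sigma0=0.5, rng=None):
        self.rng = rng or np.random.default_rng(0)
        bound = 1.0 / np.sqrt(in_dim)
        self.w_mu = self.rng.uniform(-bound, bound, (out_dim, in_dim))
        self.b_mu = self.rng.uniform(-bound, bound, out_dim)
        self.w_sigma = np.full((out_dim, in_dim), sigma0 / np.sqrt(in_dim))
        self.b_sigma = np.full(out_dim, sigma0 / np.sqrt(in_dim))
        self.in_dim, self.out_dim = in_dim, out_dim

    def forward(self, x, noisy=True):
        if not noisy:
            # Evaluation mode: use mean weights only (no exploration noise)
            return x @ self.w_mu.T + self.b_mu
        eps_in = f(self.rng.standard_normal(self.in_dim))
        eps_out = f(self.rng.standard_normal(self.out_dim))
        w = self.w_mu + self.w_sigma * np.outer(eps_out, eps_in)
        b = self.b_mu + self.b_sigma * eps_out
        return x @ w.T + b
```

Because the noise scales are learned alongside the means, the agent can anneal its own exploration per weight rather than relying on a hand-tuned epsilon schedule.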
Deep Hierarchical Variational Autoencoders for World Models in Reinforcement Learning
This project explores NVAE-style hierarchical VAEs as the world model component in model-based RL, improving representation quality and latent dynamics so agents can learn more efficiently with fewer real environment interactions.
Tags: Model-Based RL, World Models, VAE, Hierarchical VAE (NVAE), Exploration, OpenAI Gym
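An NVAE-style hierarchical VAE trains with an ELBO that sums one KL term per latent level, each between a bottom-up posterior and a top-down prior. A minimal sketch of those per-level KL terms, assuming diagonal Gaussians (function names are illustrative):

```python
import numpy as np

def kl_diag_gauss(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians, summed over dims."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def hierarchical_kl_terms(levels):
    """levels: list of (mu_q, logvar_q, mu_p, logvar_p), top to bottom.
    An NVAE-style ELBO subtracts the sum of these KL terms from the
    reconstruction log-likelihood; tracking them per level is useful
    for diagnosing posterior collapse in individual latent groups."""
    return [kl_diag_gauss(*lvl) for lvl in levels]
```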
Robust Multimodal Reinforcement Learning
Multimodal agents can solve harder problems by fusing inputs like vision and state features, but they introduce new security risks. This project builds an open-source testbed to generate datasets and evaluate adversarial attacks and defenses on multimodal RL agents, uncovering cross-modal effects and showing that attack success varies strongly by modality and defense choice.
Tags: Robust RL, Multimodal RL, PPO, Diffusion Models
Shayan Jalalipour, Danielle Justo, and Banafsheh Rekabdar. Understanding adversarial vulnerabilities and emergent patterns in multimodal RL. Accepted to the IEEE International Conference on Semantic Computing (ICSC), 2025.
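To make the single-modality attack setting concrete, here is a hedged sketch of an FGSM-style perturbation applied only to the vision input of a toy linear fusion policy, leaving the state modality untouched. The linear policy and all names here are illustrative; the actual testbed evaluates trained PPO agents:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fgsm_vision_only(Wv, Ws, v, s, action, eps):
    """FGSM-style attack on the vision modality of a linear fusion policy.
    logits = Wv @ v + Ws @ s; the state-feature modality s is left clean,
    so any change in behavior is attributable to the vision channel."""
    p = softmax(Wv @ v + Ws @ s)
    onehot = np.zeros_like(p)
    onehot[action] = 1.0
    grad_v = Wv.T @ (p - onehot)        # d(cross-entropy)/d(vision input)
    return v + eps * np.sign(grad_v)    # step that increases loss on `action`
```

Attacking one modality at a time is what exposes cross-modal effects: the fused policy can degrade even when the other input stream is unperturbed.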
Online Decision Mamba
We developed Online Decision Mamba (ODM), an online in-context RL architecture that replaces attention in Online Decision Transformers with the Mamba module for improved long-context modeling. ODM fine-tunes offline-trained policies online and was evaluated on MuJoCo and Atari, where it matched or exceeded strong baselines—especially when datasets lacked expert demonstrations. We also analyzed context-length sensitivity and showed how delta-parameter initialization can mitigate degradation.
Tags: In-Context RL, Online Adaptation, Offline RL, MuJoCo, Atari, Sequence Models
Trenton Ruf and Banafsheh Rekabdar. Online decision mamba. In Proceedings of the IEEE International Conference on Cognitive Machine Intelligence (CogMI), 2025.
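Like Online Decision Transformers, ODM consumes trajectories as interleaved (return-to-go, state, action) tokens truncated to a context window of K steps. A sketch of that preprocessing step (function names are illustrative; the Mamba sequence model that consumes the context is omitted):

```python
import numpy as np

def returns_to_go(rewards):
    """Suffix sums R_t = sum_{t' >= t} r_{t'}, used to condition the policy
    on the return it should achieve from each timestep onward."""
    return np.cumsum(rewards[::-1])[::-1]

def make_context(states, actions, rewards, K):
    """Trim a trajectory to its last K steps and interleave
    (return-to-go, state, action) triples the way Decision-Transformer-style
    models consume them; context length K is the sensitivity knob analyzed
    in the project."""
    R = returns_to_go(np.asarray(rewards, dtype=float))
    start = max(0, len(rewards) - K)
    return [(R[t], states[t], actions[t]) for t in range(start, len(rewards))]
```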
RAM-Based Deep Reinforcement Learning for Atari (Deep RAM Network)
Most Atari deep RL relies on stacked pixel frames, which are high-dimensional and model-heavy. This project revisits Atari RAM (128 bytes) and develops a RAM-only agent (Deep RAM Network, DRN) using DQN-style training. DRN achieved competitive performance and outperformed pixel-based DQN in 9/14 games in our experiments, using ~50× fewer parameters and a 220× smaller input size. We also explored a hybrid RAM+pixel agent that exceeded DQN in 11/14 games with minimal overhead.
Tags: Deep RL, DQN, Atari, Efficient Representations, RAM
Andrew J. Wagner. Digging deeper with deep RAM networks. Master's thesis, Portland State University, 2025.
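A DRN-style agent replaces the convolutional pixel encoder with a small MLP over the 128 RAM bytes, which is where the parameter savings come from. A minimal NumPy forward-pass sketch with epsilon-greedy action selection (the hidden size and initialization are illustrative; training and replay are omitted):

```python
import numpy as np

RAM_BYTES = 128  # size of an Atari RAM observation

class RamQNet:
    """Tiny MLP Q-network over normalized Atari RAM bytes (a DRN-style sketch)."""
    def __init__(self, n_actions, hidden=256, rng=None):
        self.rng = rng or np.random.default_rng(0)
        # He-style initialization for the ReLU hidden layer
        self.W1 = self.rng.standard_normal((hidden, RAM_BYTES)) * np.sqrt(2.0 / RAM_BYTES)
        self.b1 = np.zeros(hidden)
        self.W2 = self.rng.standard_normal((n_actions, hidden)) * np.sqrt(2.0 / hidden)
        self.b2 = np.zeros(n_actions)

    def q_values(self, ram_bytes):
        x = np.asarray(ram_bytes, dtype=float) / 255.0   # scale bytes to [0, 1]
        h = np.maximum(0.0, self.W1 @ x + self.b1)       # ReLU hidden layer
        return self.W2 @ h + self.b2

    def act(self, ram_bytes, epsilon=0.05):
        if self.rng.random() < epsilon:
            return int(self.rng.integers(len(self.b2)))  # explore
        return int(np.argmax(self.q_values(ram_bytes)))  # greedy
```

A 128-dimensional input also removes the need for frame stacking and image preprocessing, which is where the 220x input-size reduction comes from.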
Dynamic Reward Scaling for Multivariate Time Series Anomaly Detection: A VAE-Enhanced Reinforcement Learning Approach
This work combines RL with a VAE reconstruction signal, which provides an unsupervised anomaly cue, and applies dynamic reward scaling to improve learning when labeled anomalies are scarce.
DRTA: Dynamic Reward Scaling for Reinforcement Learning in Time Series Anomaly Detection
A dynamic reward-scaling framework designed to stabilize RL training and improve sample efficiency for time-series anomaly detection under limited labels.
LLM-Enhanced Reinforcement Learning for Time Series Anomaly Detection
A unified framework where LLM-derived semantic reward potentials guide exploration, while VAE reconstruction and active learning (uncertainty sampling + label propagation) improve detection performance under small labeling budgets.
Anomaly Detection in Time Series Data Using Reinforcement Learning, Variational Autoencoder, and Active Learning
An earlier framework integrating RL, VAE signals, and active learning to detect anomalies with minimal labeled data, leveraging sequential modeling and uncertainty-driven sample selection.
Tags: Model-Free RL, Anomaly Detection, VAE, LLMs, Reward Shaping, Active Learning, DQN
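The dynamic reward-scaling idea running through these works can be sketched as normalizing the VAE reconstruction error by running statistics before using it as a reward term, so the signal stays well-conditioned as the VAE improves. The coefficient `alpha` and the EMA update below are illustrative, not the papers' exact formulation:

```python
import numpy as np

class DynamicRewardScaler:
    """Standardizes a VAE reconstruction-error signal with exponential
    running statistics before adding it to the RL reward (DRTA-style sketch)."""
    def __init__(self, alpha=1.0, momentum=0.99):
        self.alpha, self.momentum = alpha, momentum
        self.mean, self.var = 0.0, 1.0

    def __call__(self, recon_error, base_reward=0.0):
        m = self.momentum
        # Update running mean/variance of the reconstruction error
        self.mean = m * self.mean + (1 - m) * recon_error
        self.var = m * self.var + (1 - m) * (recon_error - self.mean) ** 2
        z = (recon_error - self.mean) / np.sqrt(self.var + 1e-8)
        # High standardized error acts as an anomaly bonus
        return base_reward + self.alpha * z
```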
Group Recommendation via Deep Reinforcement Learning
We introduced a deep RL-based group recommendation system that adapts its aggregation strategy to group size. For smaller groups it uses weighted preference averaging, while for larger groups it uses multi-head attention to capture diverse member preferences and dynamic member–item interactions. On MovieLens-style data, the approach improves ranking and retrieval metrics over strong baselines.
Tags: Model-Free RL, Deep RL, Recommender Systems, Multi-Head Attention, Group Recommendation
Saba Izadkhah and Banafsheh Rekabdar. Multi-modal group recommendation with visual and textual fusion via deep reinforcement learning. In Proceedings of the AIxSET Conference, September 2025.
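The large-group aggregation step can be sketched as multi-head attention that pools member embeddings into a single group vector, with the candidate item acting as the query. All shapes and weight names here are illustrative:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def group_attention(members, item, Wq, Wk, Wv, n_heads):
    """Pool member embeddings (n, d) into one group vector (d,) with
    multi-head attention, using the candidate item embedding as the query.
    Each head can weight a different subset of members, capturing diverse
    member preferences."""
    n, d = members.shape
    dh = d // n_heads
    q = (Wq @ item).reshape(n_heads, dh)            # one query per head
    k = (members @ Wk.T).reshape(n, n_heads, dh)
    v = (members @ Wv.T).reshape(n, n_heads, dh)
    scores = np.einsum("hd,nhd->hn", q, k) / np.sqrt(dh)
    w = softmax(scores, axis=-1)                    # attention over members
    pooled = np.einsum("hn,nhd->hd", w, v)          # weighted member mix
    return pooled.reshape(d), w
```

For small groups the same interface can fall back to plain weighted averaging, which is the size-adaptive switch the summary describes.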
Uncertainty Measured Markov Decision Process in Dynamic Environments (ICRA 2020)
Robot path planning becomes challenging in dynamic environments with visual occlusions and moving targets. This work proposes a predictive planning approach that explicitly measures uncertainty during motion planning using a variant of subjective logic combined with an MDP formulation. The model outputs belief/disbelief/uncertainty over candidate trajectories and selects the best planning strategy for target tracking/pursuit-evasion scenarios.
Tags: MDP, POMDP, Uncertainty Quantification, Robotics, Motion Planning, Decision-Making
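In binomial subjective logic, positive and negative evidence counts map to a belief/disbelief/uncertainty triple that sums to one, which is the kind of opinion the planner forms over candidate trajectories. A minimal sketch, using the standard defaults of prior weight W = 2 and base rate a = 0.5 rather than the paper's specific variant:

```python
def subjective_opinion(r, s, W=2.0, a=0.5):
    """Binomial subjective-logic opinion from positive evidence r and
    negative evidence s. W is the non-informative prior weight; with no
    evidence the opinion is maximally uncertain. Returns
    (belief, disbelief, uncertainty, expected probability)."""
    total = r + s + W
    belief = r / total
    disbelief = s / total
    uncertainty = W / total
    expected = belief + a * uncertainty  # projected probability for decisions
    return belief, disbelief, uncertainty, expected
```

Because uncertainty shrinks only as evidence accumulates, a planner can prefer trajectories it has actually observed over occluded ones with the same raw success ratio.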