Comparative Study of World Models, NVAE-Based Hierarchical Models, and NoisyNet-Augmented Models in CarRacing-V2
In continuous-control settings like CarRacing-V2, RL must solve both world modeling and exploration. This project compares (i) standard World Models, (ii) NVAE-based hierarchical world models, and (iii) NoisyNet-augmented exploration, highlighting trade-offs in reward performance, training stability, and compute. The results clarify when to prioritize stronger representations versus exploration mechanisms.
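To make the exploration mechanism concrete, here is a minimal NumPy sketch of a NoisyNet linear layer with factorized Gaussian noise, the kind of layer a NoisyNet-augmented agent substitutes for its ordinary linear layers. The initialization scale `sigma0` and the layer sizes are illustrative, not the project's actual configuration:

```python
import numpy as np

def f(x):
    # Factorised-noise transform from the NoisyNet paper: sign(x) * sqrt(|x|)
    return np.sign(x) * np.sqrt(np.abs(x))

class NoisyLinear:
    """Linear layer with learnable factorised Gaussian noise (NoisyNet sketch)."""
    def __init__(self, in_dim, out_dim, sigma0=0.5, rng=None):
        self.rng = rng or np.random.default_rng(0)
        bound = 1.0 / np.sqrt(in_dim)
        self.w_mu = self.rng.uniform(-bound, bound, (out_dim, in_dim))
        self.b_mu = self.rng.uniform(-bound, bound, out_dim)
        self.w_sigma = np.full((out_dim, in_dim), sigma0 / np.sqrt(in_dim))
        self.b_sigma = np.full(out_dim, sigma0 / np.sqrt(in_dim))
        self.in_dim, self.out_dim = in_dim, out_dim

    def forward(self, x, noisy=True):
        if not noisy:
            # Evaluation mode: use mean weights only (no exploration noise)
            return x @ self.w_mu.T + self.b_mu
        eps_in = f(self.rng.standard_normal(self.in_dim))
        eps_out = f(self.rng.standard_normal(self.out_dim))
        w = self.w_mu + self.w_sigma * np.outer(eps_out, eps_in)
        b = self.b_mu + self.b_sigma * eps_out
        return x @ w.T + b
```

Because the noise scales are learned alongside the means, the agent can anneal its own exploration per weight rather than relying on a hand-tuned epsilon schedule.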
Deep Hierarchical Variational Autoencoders for World Models in Reinforcement Learning
This project explores NVAE-style hierarchical VAEs as the world model component in model-based RL, improving representation quality and latent dynamics so agents can learn more efficiently with fewer real environment interactions.
Tags: Model-Based RL, World Models, VAE, Hierarchical VAE (NVAE), Exploration, OpenAI Gym
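An NVAE-style hierarchical VAE trains with an ELBO that sums one KL term per latent level, each between a bottom-up posterior and a top-down prior. A minimal sketch of those per-level KL terms, assuming diagonal Gaussians (function names are illustrative):

```python
import numpy as np

def kl_diag_gauss(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians, summed over dims."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def hierarchical_kl_terms(levels):
    """levels: list of (mu_q, logvar_q, mu_p, logvar_p), top to bottom.
    An NVAE-style ELBO subtracts the sum of these KL terms from the
    reconstruction log-likelihood; tracking them per level is useful
    for diagnosing posterior collapse in individual latent groups."""
    return [kl_diag_gauss(*lvl) for lvl in levels]
```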
Robust Multimodal Reinforcement Learning
Multimodal agents can solve harder problems by fusing inputs like vision and state features, but they introduce new security risks. This project builds an open-source testbed to generate datasets and evaluate adversarial attacks and defenses on multimodal RL agents, uncovering cross-modal effects and showing that attack success varies strongly by modality and defense choice.
Tags: Robust RL, Multimodal RL, PPO, Diffusion Models
Shayan Jalalipour, Danielle Justo, and Banafsheh Rekabdar. Understanding adversarial vulnerabilities and emergent patterns in multimodal RL. Accepted to the IEEE International Conference on Semantic Computing (ICSC), 2025.
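To make the single-modality attack setting concrete, here is a hedged sketch of an FGSM-style perturbation applied only to the vision input of a toy linear fusion policy, leaving the state modality untouched. The linear policy and all names here are illustrative; the actual testbed evaluates trained PPO agents:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fgsm_vision_only(Wv, Ws, v, s, action, eps):
    """FGSM-style attack on the vision modality of a linear fusion policy.
    logits = Wv @ v + Ws @ s; the state-feature modality s is left clean,
    so any change in behavior is attributable to the vision channel."""
    p = softmax(Wv @ v + Ws @ s)
    onehot = np.zeros_like(p)
    onehot[action] = 1.0
    grad_v = Wv.T @ (p - onehot)        # d(cross-entropy)/d(vision input)
    return v + eps * np.sign(grad_v)    # step that increases loss on `action`
```

Attacking one modality at a time is what exposes cross-modal effects: the fused policy can degrade even when the other input stream is unperturbed.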
Online Decision Mamba
We developed Online Decision Mamba (ODM), an online in-context RL architecture that replaces attention in Online Decision Transformers with the Mamba module for improved long-context modeling. ODM fine-tunes offline-trained policies online and was evaluated on MuJoCo and Atari, where it matched or exceeded strong baselines—especially when datasets lacked expert demonstrations. We also analyzed context-length sensitivity and showed how delta-parameter initialization can mitigate degradation.
Tags: In-Context RL, Online Adaptation, Offline RL, MuJoCo, Atari, Sequence Models
Trenton Ruf and Banafsheh Rekabdar. Online decision mamba. In Proceedings of the IEEE International Conference on Cognitive Machine Intelligence (CogMI), 2025.
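Like Online Decision Transformers, ODM consumes trajectories as interleaved (return-to-go, state, action) tokens truncated to a context window of K steps. A sketch of that preprocessing step (function names are illustrative; the Mamba sequence model that consumes the context is omitted):

```python
import numpy as np

def returns_to_go(rewards):
    """Suffix sums R_t = sum_{t' >= t} r_{t'}, used to condition the policy
    on the return it should achieve from each timestep onward."""
    return np.cumsum(rewards[::-1])[::-1]

def make_context(states, actions, rewards, K):
    """Trim a trajectory to its last K steps and interleave
    (return-to-go, state, action) triples the way Decision-Transformer-style
    models consume them; context length K is the sensitivity knob analyzed
    in the project."""
    R = returns_to_go(np.asarray(rewards, dtype=float))
    start = max(0, len(rewards) - K)
    return [(R[t], states[t], actions[t]) for t in range(start, len(rewards))]
```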
RAM-Based Deep Reinforcement Learning for Atari (Deep RAM Network)
Most Atari deep RL relies on stacked pixel frames, which are high-dimensional and model-heavy. This project revisits Atari RAM (128 bytes) and develops a RAM-only agent (Deep RAM Network, DRN) using DQN-style training. DRN achieved competitive performance and outperformed pixel-based DQN in 9/14 games in our experiments, using ~50× fewer parameters and a 220× smaller input size. We also explored a hybrid RAM+pixel agent that exceeded DQN in 11/14 games with minimal overhead.
Tags: Deep RL, DQN, Atari, Efficient Representations, RAM
Andrew J. Wagner. Digging deeper with deep RAM networks. Master's thesis, Portland State University, 2025.
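A DRN-style agent replaces the convolutional pixel encoder with a small MLP over the 128 RAM bytes, which is where the parameter savings come from. A minimal NumPy forward-pass sketch with epsilon-greedy action selection (the hidden size and initialization are illustrative; training and replay are omitted):

```python
import numpy as np

RAM_BYTES = 128  # size of an Atari RAM observation

class RamQNet:
    """Tiny MLP Q-network over normalized Atari RAM bytes (a DRN-style sketch)."""
    def __init__(self, n_actions, hidden=256, rng=None):
        self.rng = rng or np.random.default_rng(0)
        # He-style initialization for the ReLU hidden layer
        self.W1 = self.rng.standard_normal((hidden, RAM_BYTES)) * np.sqrt(2.0 / RAM_BYTES)
        self.b1 = np.zeros(hidden)
        self.W2 = self.rng.standard_normal((n_actions, hidden)) * np.sqrt(2.0 / hidden)
        self.b2 = np.zeros(n_actions)

    def q_values(self, ram_bytes):
        x = np.asarray(ram_bytes, dtype=float) / 255.0   # scale bytes to [0, 1]
        h = np.maximum(0.0, self.W1 @ x + self.b1)       # ReLU hidden layer
        return self.W2 @ h + self.b2

    def act(self, ram_bytes, epsilon=0.05):
        if self.rng.random() < epsilon:
            return int(self.rng.integers(len(self.b2)))  # explore
        return int(np.argmax(self.q_values(ram_bytes)))  # greedy
```

A 128-dimensional input also removes the need for frame stacking and image preprocessing, which is where the 220x input-size reduction comes from.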
Dynamic Reward Scaling for Multivariate Time Series Anomaly Detection: A VAE-Enhanced Reinforcement Learning Approach
This work combines RL with a VAE reconstruction signal, which provides an unsupervised anomaly cue, and applies dynamic reward scaling to improve learning when labeled anomalies are scarce.
DRTA: Dynamic Reward Scaling for Reinforcement Learning in Time Series Anomaly Detection
A dynamic reward-scaling framework designed to stabilize RL training and improve sample efficiency for time-series anomaly detection under limited labels.
LLM-Enhanced Reinforcement Learning for Time Series Anomaly Detection
A unified framework where LLM-derived semantic reward potentials guide exploration, while VAE reconstruction and active learning (uncertainty sampling + label propagation) improve detection performance under small labeling budgets.
Anomaly Detection in Time Series Data Using Reinforcement Learning, Variational Autoencoder, and Active Learning
An earlier framework integrating RL, VAE signals, and active learning to detect anomalies with minimal labeled data, leveraging sequential modeling and uncertainty-driven sample selection.
Tags: Model-Free RL, Anomaly Detection, VAE, LLMs, Reward Shaping, Active Learning, DQN
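The dynamic reward-scaling idea running through these works can be sketched as normalizing the VAE reconstruction error by running statistics before using it as a reward term, so the signal stays well-conditioned as the VAE improves. The coefficient `alpha` and the EMA update below are illustrative, not the papers' exact formulation:

```python
import numpy as np

class DynamicRewardScaler:
    """Standardizes a VAE reconstruction-error signal with exponential
    running statistics before adding it to the RL reward (DRTA-style sketch)."""
    def __init__(self, alpha=1.0, momentum=0.99):
        self.alpha, self.momentum = alpha, momentum
        self.mean, self.var = 0.0, 1.0

    def __call__(self, recon_error, base_reward=0.0):
        m = self.momentum
        # Update running mean/variance of the reconstruction error
        self.mean = m * self.mean + (1 - m) * recon_error
        self.var = m * self.var + (1 - m) * (recon_error - self.mean) ** 2
        z = (recon_error - self.mean) / np.sqrt(self.var + 1e-8)
        # High standardized error acts as an anomaly bonus
        return base_reward + self.alpha * z
```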
Group Recommendation via Deep Reinforcement Learning
We introduced a deep RL-based group recommendation system that adapts its aggregation strategy to group size. For smaller groups it uses weighted preference averaging, while for larger groups it uses multi-head attention to capture diverse member preferences and dynamic member–item interactions. On MovieLens-style data, the approach improves ranking and retrieval metrics over strong baselines.
Tags: Model-Free RL, Deep RL, Recommender Systems, Multi-Head Attention, Group Recommendation
Saba Izadkhah and Banafsheh Rekabdar. Multi-modal group recommendation with visual and textual fusion via deep reinforcement learning. In Proceedings of the AIxSET Conference, September 2025.
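The large-group aggregation step can be sketched as multi-head attention that pools member embeddings into a single group vector, with the candidate item acting as the query. All shapes and weight names here are illustrative:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def group_attention(members, item, Wq, Wk, Wv, n_heads):
    """Pool member embeddings (n, d) into one group vector (d,) with
    multi-head attention, using the candidate item embedding as the query.
    Each head can weight a different subset of members, capturing diverse
    member preferences."""
    n, d = members.shape
    dh = d // n_heads
    q = (Wq @ item).reshape(n_heads, dh)            # one query per head
    k = (members @ Wk.T).reshape(n, n_heads, dh)
    v = (members @ Wv.T).reshape(n, n_heads, dh)
    scores = np.einsum("hd,nhd->hn", q, k) / np.sqrt(dh)
    w = softmax(scores, axis=-1)                    # attention over members
    pooled = np.einsum("hn,nhd->hd", w, v)          # weighted member mix
    return pooled.reshape(d), w
```

For small groups the same interface can fall back to plain weighted averaging, which is the size-adaptive switch the summary describes.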
Uncertainty Measured Markov Decision Process in Dynamic Environments (ICRA 2020)
Robot path planning becomes challenging in dynamic environments with visual occlusions and moving targets. This work proposes a predictive planning approach that explicitly measures uncertainty during motion planning using a variant of subjective logic combined with an MDP formulation. The model outputs belief/disbelief/uncertainty over candidate trajectories and selects the best planning strategy for target tracking/pursuit-evasion scenarios.
Tags: MDP, POMDP, Uncertainty Quantification, Robotics, Motion Planning, Decision-Making
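In binomial subjective logic, positive and negative evidence counts map to a belief/disbelief/uncertainty triple that sums to one, which is the kind of opinion the planner forms over candidate trajectories. A minimal sketch, using the standard defaults of prior weight W = 2 and base rate a = 0.5 rather than the paper's specific variant:

```python
def subjective_opinion(r, s, W=2.0, a=0.5):
    """Binomial subjective-logic opinion from positive evidence r and
    negative evidence s. W is the non-informative prior weight; with no
    evidence the opinion is maximally uncertain. Returns
    (belief, disbelief, uncertainty, expected probability)."""
    total = r + s + W
    belief = r / total
    disbelief = s / total
    uncertainty = W / total
    expected = belief + a * uncertainty  # projected probability for decisions
    return belief, disbelief, uncertainty, expected
```

Because uncertainty shrinks only as evidence accumulates, a planner can prefer trajectories it has actually observed over occluded ones with the same raw success ratio.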