Nair, Ashvin, et al. "AWAC: Accelerating online reinforcement learning with offline datasets." arXiv preprint arXiv:2006.09359 (2020).
Junseok Lee, 2026-02-23
Takeaway
- AWAC enables stable and data-efficient online fine-tuning by combining off-policy critic learning with advantage-weighted policy updates.
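A minimal sketch of the advantage-weighted policy update mentioned above, assuming the critic already supplies Q(s, a) and a baseline V(s) for a batch of dataset actions. The function name `awac_policy_loss` and the temperature `lam` are illustrative names for this note, not identifiers from the paper's code.

```python
import numpy as np

def awac_policy_loss(log_probs, q_values, state_values, lam=1.0):
    """Advantage-weighted negative log-likelihood over a batch.

    log_probs:    log pi(a|s) of the dataset actions under the current policy
    q_values:     critic estimates Q(s, a) for those actions
    state_values: baseline V(s), e.g. Q averaged over policy actions
    lam:          temperature (assumed hyperparameter); smaller values
                  concentrate the weight on high-advantage actions
    """
    log_probs = np.asarray(log_probs, dtype=float)
    advantages = np.asarray(q_values, dtype=float) - np.asarray(state_values, dtype=float)
    # Weights are treated as constants: in an autodiff framework they
    # would be detached so gradients flow only through log_probs.
    weights = np.exp(advantages / lam)
    return -np.mean(weights * log_probs)
```

Actions with higher advantage receive exponentially larger weight, so the policy is pulled toward good actions that already appear in the data rather than toward arbitrary actions the critic has never seen.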
Fujimoto, Scott, David Meger, and Doina Precup. "Off-policy deep reinforcement learning without exploration." International Conference on Machine Learning. PMLR, 2019.
Junseok Lee, 2026-01-28
Takeaway
- BCQ stabilizes offline Q-learning by restricting policy actions to the support of the behavior dataset.
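A rough illustration of the support restriction, simplified to a discrete action space. This is an assumption made for brevity: the cited paper works with continuous actions and a learned generative model of the behavior policy, whereas the sketch below simply masks actions whose estimated behavior probability is low relative to the most likely action. The names `constrained_greedy_action` and `tau` are hypothetical.

```python
import numpy as np

def constrained_greedy_action(q_values, behavior_probs, tau=0.3):
    """Pick the highest-Q action among those the behavior data supports.

    q_values:       Q(s, a) for every discrete action in one state
    behavior_probs: estimated probability of each action under the
                    behavior policy (assumed to come from a separate model)
    tau:            relative-probability threshold (assumed hyperparameter)
    """
    q_values = np.asarray(q_values, dtype=float)
    behavior_probs = np.asarray(behavior_probs, dtype=float)
    # Compare each action's probability with the most likely action.
    relative = behavior_probs / behavior_probs.max()
    allowed = relative > tau
    # Unsupported actions can never win the argmax.
    masked_q = np.where(allowed, q_values, -np.inf)
    return int(np.argmax(masked_q))
```

For example, with Q-values `[1.0, 5.0, 2.0]` and behavior probabilities `[0.6, 0.01, 0.39]`, the second action has the highest Q-value but almost no support in the data, so the constrained choice is the third action.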
Lectured by Prof. Emma Brunskill
Stanford (US), Winter 2019.
Composed by Prof. Haim Permuter
Ben-Gurion University (Israel)