Our recent work
Slides for our recent work: [link]
Switching System Analysis of Q-learning
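This work gives a new proof of convergence for Q-learning, a representative reinforcement learning algorithm.
It models Q-learning as a switching system from control theory, then uses control-theoretic tools to analyze it and prove convergence, offering a perspective that differs entirely from existing approaches. The results were presented at NeurIPS2020: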
Han-Dong Lim, Do Wan Kim, Donghwan Lee, "Regularized Q-learning," submitted [link]
Han-Dong Lim, Donghwan Lee, "Finite-time analysis of asynchronous Q-learning under diminishing step-size from control-theoretic view," submitted [link]
Donghwan Lee and Niao He, "A unified switching system perspective and convergence analysis of Q-learning algorithms," NeurIPS2020 [link] [Online extension].
Donghwan Lee, Jianghai Hu, and Niao He, "A discrete-time switching system analysis of Q-learning," submitted [link]
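The switching-system view described above can be illustrated with a minimal sketch. It is not the algorithm or the proof from the papers listed here: it uses a deterministic, expected-value form of the Q-learning update on a made-up 2-state, 2-action MDP, and all names and numbers (`P`, `R`, `GAMMA`, `ALPHA`, `greedy_policy`, `switched_update`) are assumptions chosen for illustration. The point is only that the max in the update can be replaced by a greedy policy, which acts as the switching signal; once that policy is fixed, the update is affine in Q.

```python
# Hypothetical 2-state, 2-action MDP; all values are made up for illustration.
GAMMA = 0.9   # discount factor
ALPHA = 0.5   # constant step size
N_STATES, N_ACTIONS = 2, 2

# P[s][a][s2]: transition probability, R[s][a]: expected reward.
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.3, 0.7]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]


def greedy_policy(Q):
    """Switching signal: the greedy action in each state under Q."""
    return [max(range(N_ACTIONS), key=lambda a: Q[s][a])
            for s in range(N_STATES)]


def bellman_update(Q):
    """Expected Q-learning step written with an explicit max."""
    return [[Q[s][a] + ALPHA * (
                R[s][a]
                + GAMMA * sum(P[s][a][s2] * max(Q[s2]) for s2 in range(N_STATES))
                - Q[s][a])
             for a in range(N_ACTIONS)] for s in range(N_STATES)]


def switched_update(Q, policy):
    """The same step with the max replaced by a fixed policy (the mode).

    For a fixed policy this map is affine in Q, so the iteration is an
    affine system whose mode switches whenever the greedy policy changes.
    """
    return [[Q[s][a] + ALPHA * (
                R[s][a]
                + GAMMA * sum(P[s][a][s2] * Q[s2][policy[s2]]
                              for s2 in range(N_STATES))
                - Q[s][a])
             for a in range(N_ACTIONS)] for s in range(N_STATES)]


def run(steps=500):
    """Iterate both forms and track the largest gap between them."""
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    max_gap = 0.0
    for _ in range(steps):
        q_max = bellman_update(Q)
        q_switch = switched_update(Q, greedy_policy(Q))
        gap = max(abs(q_max[s][a] - q_switch[s][a])
                  for s in range(N_STATES) for a in range(N_ACTIONS))
        max_gap = max(max_gap, gap)
        Q = q_switch
    return Q, max_gap


if __name__ == "__main__":
    Q, max_gap = run()
    print("Q:", Q)
    print("largest gap between the two forms:", max_gap)
```

In this sketch the two forms agree at every step, and the iterate settles at a fixed point of the update. The stochastic, asynchronous setting treated in the papers needs considerably more care than this toy example suggests.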
Control System Analysis of Reinforcement Learning
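We analyze the convergence of various reinforcement learning algorithms, TD-learning in particular, using linear or nonlinear system models, and use the analysis to develop new reinforcement learning algorithms. This work offers new perspectives and interpretations.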
Han-Dong Lim, Donghwan Lee, "Backstepping temporal-difference learning," submitted
Han-Dong Lim, Do Wan Kim, Donghwan Lee, "Regularized Q-learning," submitted [link]
Han-Dong Lim, Donghwan Lee, "Finite-time analysis of asynchronous Q-learning under diminishing step-size from control-theoretic view," submitted [link]
Donghwan Lee and Niao He, "A unified switching system perspective and convergence analysis of Q-learning algorithms," NeurIPS2020 [link] [Online extension].
Donghwan Lee, Jianghai Hu, and Niao He, "A discrete-time switching system analysis of Q-learning," submitted [link]
Donghwan Lee and Niao He, "Target-based temporal difference learning," ICML2019, Long Beach, CA, June 11-15, 2019 [link] [online extension].
Donghwan Lee and Do Wan Kim, "Analysis of temporal-difference learning: linear system approach," submitted [link]
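As a rough illustration of the linear-system view of TD-learning mentioned above, the sketch below writes an expected, synchronous tabular TD(0) update as a linear recursion V_{k+1} = A V_k + b with A = I + ALPHA (GAMMA P - I) and b = ALPHA r. The 2-state Markov chain, the rewards, and the names `td_step`, `solve_2x2`, and `true_value` are assumptions made for this example and do not come from the papers listed here.

```python
# Hypothetical 2-state Markov reward process; all values are made up.
GAMMA = 0.9   # discount factor
ALPHA = 0.5   # constant step size
P = [[0.7, 0.3],
     [0.4, 0.6]]   # P[s][s2]: transition probability under a fixed policy
r = [1.0, 0.0]     # expected reward in each state


def td_step(V):
    """One expected TD(0) step: V + ALPHA * (r + GAMMA * P V - V).

    This is a linear system in V, so standard linear-system tools
    (for example, checking the eigenvalues of A) apply to it.
    """
    n = len(V)
    return [V[s] + ALPHA * (r[s]
                            + GAMMA * sum(P[s][s2] * V[s2] for s2 in range(n))
                            - V[s])
            for s in range(n)]


def solve_2x2(M, b):
    """Solve M x = b for a 2x2 matrix using Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    x0 = (b[0] * M[1][1] - M[0][1] * b[1]) / det
    x1 = (M[0][0] * b[1] - M[1][0] * b[0]) / det
    return [x0, x1]


def true_value():
    """Fixed point of the recursion: the solution of (I - GAMMA * P) V = r."""
    M = [[1.0 - GAMMA * P[0][0], -GAMMA * P[0][1]],
         [-GAMMA * P[1][0], 1.0 - GAMMA * P[1][1]]]
    return solve_2x2(M, r)


def run(steps=500):
    """Iterate the linear system from V = 0."""
    V = [0.0, 0.0]
    for _ in range(steps):
        V = td_step(V)
    return V


if __name__ == "__main__":
    print("TD iterate:  ", run())
    print("closed form: ", true_value())
```

With these made-up numbers the iterate approaches the closed-form value function, which is the behavior a stable linear system would be expected to show. Sampled, asynchronous TD-learning with function approximation is a harder setting than this sketch covers.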
Saddle Point Perspective of Reinforcement Learning
Various reinforcement learning problems can be reformulated as optimization and saddle point problems, and then solved with a range of optimization techniques.
Donghwan Lee, Han-Dong Lim, Jihoon Park, and Okyong Choi, "New versions of gradient temporal-difference learning," IEEE Transactions on Automatic Control (accepted) [link]
Donghwan Lee, Do-Wan Kim, and Jianghai Hu, "Distributed off-policy temporal difference learning using primal-dual method," IEEE Access (accepted).
Donghwan Lee and Niao He, "Stochastic primal-dual Q-learning," [link]
Donghwan Lee and Niao He, "Periodic Q-learning," L4DC2020, Berkeley, CA, June, 2020 [link].
New Versions of Temporal-Difference Learning
Donghwan Lee, Han-Dong Lim, Jihoon Park, and Okyong Choi, "New versions of gradient temporal-difference learning," IEEE Transactions on Automatic Control (accepted 2022) [link]
Target-Based TD-Learning
Donghwan Lee and Niao He, "Target-based temporal difference learning," ICML2019, Long Beach, CA, June 11-15, 2019 [link] [online extension].
Multi-Agent Reinforcement Learning
Multi-agent reinforcement learning is reinforcement learning in which multiple agents carry out various tasks through competition or cooperation.
Donghwan Lee, Niao He, Kamal Parameswaran, and Volkan Cevher, "Optimization for reinforcement learning: from single agent to cooperative agents," IEEE Signal Processing Magazine, vol. 37, no. 3, pp. 123-135, 2020 [link].
Donghwan Lee, Do-Wan Kim, and Jianghai Hu, "Distributed off-policy temporal difference learning using primal-dual method," IEEE Access (accepted).