Costa Huang, HuggingFace

Talk Date and Time: April 23, 2024, 4:00 pm - 4:45 pm EST, followed by 15 minutes of Q&A, in IRB-5105 and on Google Meet

Topic: CleanRL / Cleanba: The most readable and hackable RL codebase

Abstract:

The talk will cover CleanRL and Cleanba. CleanRL is a highly transparent and hackable Deep Reinforcement Learning (DRL) library. By using single-file implementations, CleanRL needs significantly fewer lines of code for its DRL implementations than many other libraries. CleanRL also takes an interesting RLops approach to ensure that no regression is introduced during refactoring. The CleanRL-style approach can scale, too: Cleanba is a sister codebase that focuses on distributed DRL. Cleanba outperforms torchbeast and moolib while addressing reproducibility issues that arise in distributed DRL.

Bio:

Costa Huang is a Machine Learning Engineer at Hugging Face working on RLHF and RL. He holds a Ph.D. from Drexel University, with a focus on reproducible and efficient deep reinforcement learning. Notably, he is the creator of CleanRL, a researcher-friendly RL library. He also specializes in demystifying the implementation details of modern DRL algorithms; for example, he is the lead author of the blog post "The 37 Implementation Details of Proximal Policy Optimization".
