Mirror Descent Policy Optimization

Manan Tomar, Lior Shani, Yonathan Efroni, Mohammad Ghavamzadeh
