Policy Gradient With Serial Markov Chain Reasoning

Edoardo Cetin, Oya Celiktutan

NeurIPS'22