[Fig C] (a) True Q at a fixed state; (b) Policy learned using a highly expressive policy class
[Table E] Performance improvement after applying PA and RS-LN to IQL. Scores are averages of the final evaluations across five random seeds (same as Table 5 in our manuscript).