PG-DPO: Pontryagin-Guided Direct Policy Optimization
Forward simulation. BPTT costates. Hamiltonian recovery.
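
The three stages named above can be sketched on a toy problem. What follows is an illustration under stated assumptions, not the PG-DPO implementation. It assumes a deterministic scalar linear-quadratic problem (dynamics dx/dt = a*x + b*u, running cost 0.5*(q*x^2 + r*u^2), terminal cost 0.5*q_T*x^2), an open-loop control sequence instead of a neural policy, and an Euler discretization. The backward pass is written by hand where an autodiff framework would normally do the backpropagation. All function names and parameter values are hypothetical.

```python
# Toy sketch only: scalar LQ problem, open-loop controls, Euler steps.
# Every name and number below is an illustrative assumption.

def simulate(x0, u, a, b, dt):
    """Forward simulation: Euler rollout of dx/dt = a*x + b*u.
    Returns the states x_0 .. x_N for a control sequence u_0 .. u_{N-1}."""
    xs = [x0]
    for uk in u:
        xs.append(xs[-1] + dt * (a * xs[-1] + b * uk))
    return xs

def cost(xs, u, q, r, q_T, dt):
    """Discretized objective J: running cost plus terminal cost."""
    running = sum(0.5 * dt * (q * x * x + r * uk * uk) for x, uk in zip(xs, u))
    return running + 0.5 * q_T * xs[-1] ** 2

def costates(xs, a, q, q_T, dt):
    """Backward pass (BPTT written by hand): lam[k] = dJ/dx_k
    with the control sequence held fixed."""
    lam = [0.0] * len(xs)
    lam[-1] = q_T * xs[-1]                      # terminal condition
    for k in range(len(xs) - 2, -1, -1):
        lam[k] = dt * q * xs[k] + (1.0 + a * dt) * lam[k + 1]
    return lam

def recover_controls(lam, b, r):
    """Hamiltonian recovery: set dH/du = 0 for
    H = 0.5*(q*x^2 + r*u^2) + lam*(a*x + b*u), which gives u = -b*lam/r.
    In the Euler scheme the costate acting on u_k is lam[k+1]."""
    return [-b * lam[k + 1] / r for k in range(len(lam) - 1)]

# One pass through the three stages, starting from zero controls.
a, b, q, r, q_T, dt, N = -0.5, 1.0, 1.0, 0.1, 2.0, 0.01, 100
u = [0.0] * N
xs = simulate(1.0, u, a, b, dt)
lam = costates(xs, a, q, q_T, dt)
u_new = recover_controls(lam, b, r)
```

The recovered controls are stationary for the Hamiltonian evaluated at the current trajectory's costates. They are not the optimal controls unless those costates already belong to the optimal trajectory, so in practice the recovery step would be combined with some form of policy improvement.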