Interestingly, our experiments also show that agents trained on our physics-informed MDP learn faster and more effectively how to minimize the dissimilarity from the transform-limited (TL) pulse. Indeed, even though they are entirely unaware of the very concept of transform-limited pulses, agents pursuing peak-energy maximization learn to reproduce almost-TL shapes, reducing the FWHM to ∼1.60 ps (against a TL FWHM of ≃ 1.58 ps) and achieving loss values almost 50% lower than those obtained with direct loss minimization.
This finding suggests that directly using loss information might not be beneficial for learning control strategies and that using physical quantities characteristic of the actual pulse might instead yield better results. Our experiments indicate that fully trained agents can reconstruct almost-TL shapes in fewer than ten interactions. Moreover, our best-performing agents not only reach the target shape but also learn to maintain it over time by performing the appropriate actions.
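To make the two reward designs concrete, the sketch below contrasts a loss-based reward with a physics-informed, peak-based proxy. It is only an illustration of the idea: the function names and the assumption of discretized temporal intensity profiles are ours, not the exact formulation used in our environment.

```python
import numpy as np

def reward_loss_based(intensity, tl_intensity):
    # Direct loss minimization: negative dissimilarity (here, an L2
    # distance) between the current temporal intensity profile and the
    # transform-limited (TL) target profile.
    return -float(np.linalg.norm(intensity - tl_intensity))

def reward_peak_energy(intensity):
    # Physics-informed proxy: reward the pulse peak. At fixed total
    # energy, a higher peak implies a shorter pulse, so maximizing it
    # drives the agent toward TL-like shapes without ever exposing the
    # TL target to the agent.
    return float(np.max(intensity))
```

Under this formulation, the agent optimizing `reward_peak_energy` never observes the TL reference, which is what makes its lower final dissimilarity noteworthy.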
We believe our demonstration of the feasibility and performance of DRL-based automated laser pulse shape optimization paves the way for further investigation into applying this technique in high-power laser facilities.
We plan to extend the work presented in this paper with an extensive study of how to transfer the control strategies agents learn in simulation to the real world. Building on recent work in Sim2Real for Reinforcement Learning, we aim to bridge the gap between the simulated environment and real-world laser systems.
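One common Sim2Real technique such a study could evaluate is domain randomization. The sketch below illustrates the idea under the assumption of an environment constructor that exposes the simulation's physical parameters; `make_env` and all parameter names are hypothetical placeholders, not part of the present work.

```python
import numpy as np

def sample_randomized_env(make_env, rng):
    # Domain randomization: resample the simulator's physical parameters
    # for each training episode so the policy cannot overfit to a single
    # exact simulation and is more likely to transfer to the real system.
    return make_env(
        gdd_error=rng.normal(0.0, 0.05),           # dispersion miscalibration
        measurement_noise=rng.uniform(0.0, 0.02),  # detector noise level
        actuator_delay=int(rng.integers(0, 3)),    # control latency (steps)
    )

# Example: draw a freshly perturbed environment per episode.
# env = sample_randomized_env(make_env, np.random.default_rng(0))
```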