Interestingly, our experiments also show that agents trained on our physics-informed MDP learn faster and more effectively how to minimize the dissimilarity from the transform-limited (TL) pulse. Indeed, even though they are entirely unaware of the very concept of transform-limited pulses, agents pursuing peak-energy maximization learn to reproduce almost-TL shapes, reducing the FWHM to ∼1.60 ps (against a TL FWHM of ≃ 1.58 ps) and achieving loss values almost 50% lower than those obtained with direct loss minimization.
This finding suggests that directly using loss information might not be beneficial for learning control strategies and that using physical quantities characteristic of the actual pulse might instead yield better results. Our experiments indicate that fully trained agents can reconstruct almost-TL shapes in fewer than ten interactions. Moreover, our best-performing agents not only reach the target shape but also learn to maintain it over time by performing the appropriate actions.
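To make the two reward designs concrete, the sketch below contrasts a loss-based reward with a physics-informed, peak-based proxy. It is only an illustration of the idea: the function names and the assumption of discretized temporal intensity profiles are ours, not the exact formulation used in our environment.

```python
import numpy as np

def reward_loss_based(intensity, tl_intensity):
    # Direct loss minimization: negative dissimilarity (here, an L2
    # distance) between the current temporal intensity profile and the
    # transform-limited (TL) target profile.
    return -float(np.linalg.norm(intensity - tl_intensity))

def reward_peak_energy(intensity):
    # Physics-informed proxy: reward the pulse peak. At fixed total
    # energy, a higher peak implies a shorter pulse, so maximizing it
    # drives the agent toward TL-like shapes without ever exposing the
    # TL target to the agent.
    return float(np.max(intensity))
```

Under this formulation, the agent optimizing `reward_peak_energy` never observes the TL reference, which is what makes its lower final dissimilarity noteworthy.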
We believe our demonstration of the feasibility and performance of DRL-based automated laser pulse shape optimization paves the way for further investigation into applying this technique in high-power laser facilities.
We plan to extend the work presented in this paper with an extensive study of how to transfer the control strategies agents learn in simulation to the real world. Building on recent work in Sim2Real for Reinforcement Learning, we aim to bridge the gap between the simulated environment and real-world laser systems.
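One common Sim2Real technique such a study could evaluate is domain randomization. The sketch below illustrates the idea under the assumption of an environment constructor that exposes the simulation's physical parameters; `make_env` and all parameter names are hypothetical placeholders, not part of the present work.

```python
import numpy as np

def sample_randomized_env(make_env, rng):
    # Domain randomization: resample the simulator's physical parameters
    # for each training episode so the policy cannot overfit to a single
    # exact simulation and is more likely to transfer to the real system.
    return make_env(
        gdd_error=rng.normal(0.0, 0.05),           # dispersion miscalibration
        measurement_noise=rng.uniform(0.0, 0.02),  # detector noise level
        actuator_delay=int(rng.integers(0, 3)),    # control latency (steps)
    )

# Example: draw a freshly perturbed environment per episode.
# env = sample_randomized_env(make_env, np.random.default_rng(0))
```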