We present LEMURS [1], an algorithm for learning scalable multi-robot control policies from cooperative task demonstrations. We propose a port-Hamiltonian description of the multi-robot system to exploit universal physical constraints in interconnected systems and achieve closed-loop stability. We represent a multi-robot control policy using an architecture that combines self-attention mechanisms and neural ordinary differential equations. The former handles time-varying communication in the robot team, while the latter respects the continuous-time robot dynamics. Our representation is distributed by construction, enabling the learned control policies to be deployed in robot teams of different sizes.
MORE INFORMATION AT: https://eduardosebastianrodriguez.github.io/LEMURS/
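To make the port-Hamiltonian idea concrete, here is a minimal hand-written sketch of the structure that yields closed-loop stability: a skew-symmetric interconnection matrix J conserves energy, while a positive semi-definite dissipation matrix R can only remove it. In LEMURS the Hamiltonian and the interconnection are learned networks; the toy system, gains and step size below are illustrative assumptions.

```python
import numpy as np

# Toy port-Hamiltonian system: a damped mass-spring joint.
# State x = [q, p] (position, momentum); dynamics xdot = (J - R) @ gradH(x).
# J skew-symmetric (lossless interconnection), R PSD (dissipation),
# so dH/dt = -gradH' R gradH <= 0: stability comes from the structure.
# This is a hand-written sketch, NOT the learned LEMURS policy.

J = np.array([[0.0, 1.0],
              [-1.0, 0.0]])     # skew-symmetric: J + J.T == 0
R = np.diag([0.0, 0.5])         # PSD damping on the momentum coordinate

def grad_H(x, k=1.0, m=1.0):
    """Gradient of H(q, p) = k*q**2/2 + p**2/(2*m)."""
    q, p = x
    return np.array([k * q, p / m])

def hamiltonian(x, k=1.0, m=1.0):
    q, p = x
    return 0.5 * k * q**2 + 0.5 * p**2 / m

# Forward-Euler rollout: total energy decays because of R.
x = np.array([1.0, 0.0])
H0 = hamiltonian(x)
dt = 0.01
for _ in range(1000):
    x = x + dt * (J - R) @ grad_H(x)
H_final = hamiltonian(x)
print(H0, H_final)  # energy shrinks along the trajectory
```

Any parameterization that keeps J skew-symmetric and R positive semi-definite inherits this energy-decay property, which is why imposing the structure on a learned policy helps with stability.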
Further, we propose a physics-informed reinforcement learning approach that learns distributed multi-robot control policies which are both scalable and able to exploit all the information available to each robot [3]. Our approach has three key characteristics. First, it imposes a port-Hamiltonian structure on the policy representation, respecting the energy conservation properties of physical robot systems and the networked nature of robot team interactions. Second, it uses self-attention to obtain a sparse policy representation that can handle time-varying information arriving at each robot from the interaction graph. Third, we present a soft actor-critic reinforcement learning algorithm parameterized by our self-attention port-Hamiltonian control policy, which accounts for the correlation among robots during training while avoiding the need for value function factorization.
MORE INFORMATION AT: https://eduardosebastianrodriguez.github.io/phMARL/
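The sparsity property mentioned above can be sketched in a few lines: mask the attention scores with the communication graph so each robot's feature depends only on its current neighbors, which makes the representation distributed and independent of the team size. The random weight matrices below are placeholders standing in for learned parameters; the graph and dimensions are illustrative assumptions.

```python
import numpy as np

# Masked self-attention sketch: each robot attends only to robots it can
# currently communicate with, so the same weights work for any team size.
# Random matrices stand in for learned parameters.

rng = np.random.default_rng(0)
d = 4                                    # per-robot state dimension
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attention_features(X, adj):
    """X: (n, d) robot states; adj: (n, n) boolean communication graph
    (self-loops included). Returns one feature per robot that depends
    only on that robot's neighbors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    scores = np.where(adj, scores, -np.inf)      # mask non-neighbors
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

# Line graph 0-1-2-3: robot 0 only hears from robot 1 (and itself).
adj = np.eye(4, dtype=bool)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = True

X = rng.standard_normal((4, d))
out = attention_features(X, adj)

# Perturb robot 3, which is NOT a neighbor of robot 0 ...
X2 = X.copy()
X2[3] += 10.0
out2 = attention_features(X2, adj)
# ... robot 0's feature is unchanged: the representation is truly local.
print(np.allclose(out[0], out2[0]))  # True
```

Because the mask is applied to the score matrix rather than baked into the weights, the same parameters transfer to teams of different sizes and to time-varying graphs.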
Fixed swapping
Time-varying swapping
Flocking
The graph identification problem consists of discovering the interactions among nodes in a network given their state/feature trajectories. This problem is challenging because the behavior of a node is coupled to that of all the other nodes through the unknown interaction model. Moreover, high-dimensional and nonlinear state trajectories make it difficult to determine whether two nodes are connected. Current solutions rely on prior knowledge of the graph topology and of the dynamic behavior of the nodes, and hence generalize poorly to other network configurations. To address these issues, we propose a novel learning-based approach [2] that combines (i) a strongly convex program that efficiently uncovers graph topologies with global convergence guarantees and (ii) a self-attention encoder that learns to embed the original state trajectories into a feature space and predicts appropriate regularizers for the optimization program. In contrast to other works, our approach can identify the graph topology of unseen networks whose configuration differs in the number of nodes, the connectivity or the state trajectories. We demonstrate the effectiveness of our approach in identifying graphs in multi-robot formation and flocking tasks.
MORE INFORMATION AT: https://eduardosebastianrodriguez.github.io/LIGMRS/
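To make the problem statement concrete, here is a toy illustration in which the node dynamics are known and linear, so the graph can be recovered by plain least squares plus thresholding. The approach in [2] replaces this with a convex program whose regularizers are predicted by a learned self-attention encoder and needs no knowledge of the dynamics; the ring graph, step size and threshold below are illustrative assumptions.

```python
import numpy as np

# Toy graph identification: nodes follow known linear consensus dynamics
# x+ = (I - eps*L) x, and we recover the graph from state snapshots by
# least squares. This sketch only makes the problem concrete; it is NOT
# the learning-based method of [2].

rng = np.random.default_rng(1)
n, eps = 5, 0.1

# Ground-truth ring graph and its Laplacian L = D - A.
A_true = np.zeros((n, n))
for i in range(n):
    A_true[i, (i + 1) % n] = A_true[(i + 1) % n, i] = 1.0
L = np.diag(A_true.sum(axis=1)) - A_true
W = np.eye(n) - eps * L

# Observe many (state, next-state) pairs.
X_prev = rng.standard_normal((n, 50))
X_next = W @ X_prev

# Least-squares system identification, then read the graph off the Laplacian.
W_est = X_next @ np.linalg.pinv(X_prev)
L_est = (np.eye(n) - W_est) / eps
A_est = (np.abs(L_est - np.diag(np.diag(L_est))) > 0.5).astype(float)
print(np.array_equal(A_est, A_true))  # True
```

The exercise also shows why the general problem is hard: this recovery works only because the dynamics are linear, known, and noiselessly observed, none of which holds for the nonlinear multi-robot trajectories targeted in [2].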
Multi-robot formation problem
Multi-robot flocking problem
References
[1] E. Sebastián, T. Duong, N. Atanasov, E. Montijano and C. Sagüés, "LEMURS: Learning Distributed Multi-robot Interactions", IEEE International Conference on Robotics and Automation, 2023. More info at: https://eduardosebastianrodriguez.github.io/LEMURS/
[2] E. Sebastián, T. Duong, N. Atanasov, E. Montijano and C. Sagüés, "Learning to Identify Graphs from Node Trajectories in Multi-robot Networks", IEEE International Symposium on Multi-robot & Multi-agent Systems, 2023. More info at: https://eduardosebastianrodriguez.github.io/LIGMRS/
[3] E. Sebastián, T. Duong, N. Atanasov, E. Montijano and C. Sagüés, "Physics-Informed Multi-agent Reinforcement Learning for Distributed Multi-robot Problems", under review at IEEE Transactions on Robotics, 2024. More info at: https://eduardosebastianrodriguez.github.io/phMARL/