We present BricksRL, a platform designed to democratize access to robotics for reinforcement learning research and education. BricksRL facilitates the creation, design, and training of custom LEGO robots in the real world by interfacing them with the TorchRL library for reinforcement learning agents. The integration of TorchRL with the LEGO hubs, via bidirectional Bluetooth communication, enables state-of-the-art reinforcement learning training on GPUs for a wide variety of LEGO builds. This offers a flexible and cost-efficient approach to scaling, as well as a robust infrastructure for robot-environment-algorithm communication. We present experiments across a range of tasks and robot configurations, providing build plans and training results. Furthermore, we demonstrate that inexpensive LEGO robots can be trained end-to-end in the real world to achieve simple tasks, with training times typically under 120 minutes on a standard laptop. We also show how users can extend the platform's capabilities, exemplified by the successful integration of non-LEGO sensors. By enhancing accessibility to both robotics and reinforcement learning, BricksRL establishes a strong foundation for democratized robotic learning in research and educational settings.
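To make the robot-environment-algorithm loop concrete, the sketch below shows how a real-world LEGO robot could be exposed as a gym-style environment: actions are sent to the hub over Bluetooth, and sensor readings are returned as observations. The HubTransport class, the observation and action shapes, and the reward are hypothetical placeholders, not BricksRL's actual API.

```python
import numpy as np
import gymnasium as gym


class HubTransport:
    """Hypothetical bidirectional Bluetooth link to a LEGO hub (mocked here)."""

    def send_action(self, action: np.ndarray) -> None:
        # In BricksRL this would serialize the action and write it to the hub.
        self._last_action = action

    def read_state(self) -> np.ndarray:
        # In the real system this blocks until the hub reports its sensors.
        return np.zeros(4, dtype=np.float32)


class LegoRobotEnv(gym.Env):
    """Sketch of a real-world LEGO environment: act, wait for sensors, reward."""

    def __init__(self, transport: HubTransport):
        super().__init__()
        self.transport = transport
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self.transport.read_state(), {}

    def step(self, action):
        self.transport.send_action(np.asarray(action, dtype=np.float32))
        obs = self.transport.read_state()
        reward = float(obs[0])  # task-specific, e.g. distance gained per step
        return obs, reward, False, False, {}
```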
Evaluation of a trained SAC agent in the RunAway-v0 environment and a trained TD3 agent in the Spinning-v0 environment, each performing a simple yet distinct task. In RunAway-v0, the SAC agent's objective is to maximize the distance measured by an ultrasonic sensor. In Spinning-v0, the TD3 agent's task is to turn the 2Wheeler left or right according to an indicator provided to the agent.
Training results for the 2Wheeler robot in the RunAway-v0 and Spinning-v0 environments.
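An evaluation of this kind can be scripted as a simple rollout loop, as in the hedged sketch below; the environment handle and the deterministic actor are assumed to follow the gym-style interface sketched above, and evaluate is a hypothetical helper, not a BricksRL function.

```python
import torch


def evaluate(env, actor: torch.nn.Module, episodes: int = 5) -> float:
    """Roll out a deterministic policy and return the mean episode return."""
    returns = []
    for _ in range(episodes):
        obs, _ = env.reset()
        done, ep_return = False, 0.0
        while not done:
            with torch.no_grad():
                action = actor(torch.as_tensor(obs, dtype=torch.float32))
            obs, reward, terminated, truncated, _ = env.step(action.numpy())
            ep_return += reward
            done = terminated or truncated
        returns.append(ep_return)
    return sum(returns) / len(returns)
```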
This example features a DroQ agent trained directly in the real world in the Walker-v0 environment (left) and a DroQ agent trained in the WalkerSim-v0 simulation (right). The comparison demonstrates a successful simulation-to-reality (sim2real) transfer of the policy, where both agents are tasked with learning a forward-moving walking gait.
Training performance of the Walker robot in the Walker-v0 and WalkerSim-v0 environments.
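DroQ is essentially SAC with regularized critics: dropout and layer normalization are added to each Q-network, which permits a high update-to-data ratio, a useful property when real-world samples are expensive. The sketch below shows such a critic in PyTorch; the layer sizes and dropout rate are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn as nn


class DroQCritic(nn.Module):
    """Q-network with dropout + layer norm, as in DroQ-style critics."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256, p: float = 0.01):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.Dropout(p), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.Dropout(p), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.q(torch.cat([obs, act], dim=-1))
```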
The videos show the evaluation of two SAC agents: one trained entirely in the real-world environment RoboArm-v0 (left) and one trained in the simulation environment RoboArmSim-v0 (right). Four goal positions, displayed at the top of each video, were selected to evaluate the effectiveness of the trained policies; the agent must reach them to demonstrate task completion.
Training outcomes for the RoboArm robot in the RoboArm-v0 and RoboArmSim-v0 environments. The plots also include the final error at the last step of each epoch and the total number of episode steps required to reach the goal position.
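A goal-reaching task of this kind is commonly framed as minimizing the distance between the arm's joint angles and a sampled goal configuration, with the episode ending once the goal is reached within a tolerance. The sketch below illustrates this framing; the reward shaping, tolerance, and goal values are assumptions for illustration, not BricksRL's exact definitions.

```python
import numpy as np


def reach_reward(angles: np.ndarray, goal: np.ndarray, tol: float = 0.05):
    """Negative-distance reward; the episode ends once within tolerance."""
    dist = float(np.linalg.norm(angles - goal))
    return -dist, dist < tol


# Four fixed evaluation goals, analogous to those shown in the videos
# (the values here are made up for illustration).
eval_goals = np.array([[0.2, -0.4, 0.1], [0.5, 0.0, -0.3],
                       [-0.1, 0.3, 0.2], [0.0, -0.2, 0.4]])
reward, done = reach_reward(np.array([0.18, -0.41, 0.12]), eval_goals[0])
```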
This video presents sequences of successive evaluation trials of the SAC agent in the RoboArm-mixed-v0 environment, which combines direct sensor readings of the robot arm's joint angles with image inputs.
Training performance of the RoboArm robot in the RoboArm-mixed-v0 environment, showing both the reward and the number of episode steps required to reach the target location.
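Handling such mixed observations typically means encoding each modality separately and fusing the features before the policy head. Below is a minimal PyTorch sketch of this pattern; the 64x64 RGB image size, the 4-dimensional angle vector, and all layer sizes are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn


class MixedObsEncoder(nn.Module):
    """Encode image and joint-angle observations, then concatenate features."""

    def __init__(self, angle_dim: int = 4, feat_dim: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened CNN output size from a dummy 64x64 RGB input.
        with torch.no_grad():
            cnn_out = self.cnn(torch.zeros(1, 3, 64, 64)).shape[-1]
        self.img_head = nn.Linear(cnn_out, feat_dim)
        self.angle_head = nn.Linear(angle_dim, feat_dim)

    def forward(self, image: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.img_head(self.cnn(image)),
                          self.angle_head(angles)], dim=-1)
```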