In this project, we applied reinforcement learning and planning to large-scale systems. We developed an optimistic planning algorithm, SOOP (Simultaneous Optimistic Optimization for Planning), described in our publication. SOOP is an online solver for Markov Decision Processes (MDPs), implemented in C++. It was tested in simulation and on an inverted rotary pendulum. The hardware is from Quanser, and communication with the hardware goes through the Quanser Hardware in the Loop C API. The mathematical model and its parameters can be found here, and the equation is:
The control loop implementation has three threads:
1) computing the control sequence,
2) applying the control sequence, and
3) logging data.
The method can efficiently compute longer control sequences even with a short computation time. All threads are synchronized with a barrier. The Compute U and Apply U loops run at 20 Hz, so the sampling time is 50 ms. In each cycle, the computed control sequence had length five, while the applied sequence had length one, that is, only the first action of each computed sequence was sent to the system.
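The relation between the computed sequence (length five) and the applied sequence (length one) can be illustrated with a minimal receding-horizon sketch. This is not the SOOP implementation: `planSequence` is a hypothetical stand-in for the planner, and the scalar model `x_{k+1} = x_k + u_k` is a toy assumption chosen only to keep the example short.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Length of the computed control sequence (five, as stated in the text).
constexpr std::size_t kHorizon = 5;

using ControlSequence = std::array<double, kHorizon>;

// Hypothetical stand-in for the planner (NOT SOOP): it rolls a toy model
// x_{k+1} = x_k + u_k forward and picks u_k = -0.5 * x_k at every step.
ControlSequence planSequence(double state) {
    ControlSequence u{};
    for (std::size_t k = 0; k < kHorizon; ++k) {
        u[k] = -0.5 * state;
        state += u[k];
    }
    return u;
}

// Runs `steps` control cycles. Each cycle replans a full sequence of
// length kHorizon but applies only its first element to the toy model.
std::vector<double> runRecedingHorizon(double state, int steps) {
    assert(steps >= 0);
    std::vector<double> trajectory{state};
    for (int i = 0; i < steps; ++i) {
        const ControlSequence u = planSequence(state);
        state += u[0];  // apply a sequence of length one, discard the rest
        trajectory.push_back(state);
    }
    return trajectory;
}
```

Replanning every cycle means the four unused actions are discarded, but each applied action is always based on the most recent state.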
The logger thread ran at a higher sampling frequency, 40 Hz, to save the lambda and theta angles, the control signal, and the reward.
Thread synchronization
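The three-thread structure can be sketched as follows. This is an illustrative skeleton, not the project's code: the `Barrier` class, `runControlLoop`, and the counters are hypothetical names, the thread bodies are placeholders, and it is an assumption that all three threads meet at the barrier once per control period while the logger takes two samples per period to reach twice the control rate.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Minimal reusable barrier built on a mutex and a condition variable.
// (C++20 provides std::barrier; this version also compiles as C++11.)
class Barrier {
public:
    explicit Barrier(int count) : threshold_(count), waiting_(0), generation_(0) {}

    void arriveAndWait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const long gen = generation_;
        if (++waiting_ == threshold_) {
            waiting_ = 0;   // last thread to arrive releases the others
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const int threshold_;
    int waiting_;
    long generation_;
};

struct LoopCounts { int computed; int applied; int logged; };

// Runs `cycles` control periods with three threads meeting at a barrier.
LoopCounts runControlLoop(int cycles, std::chrono::milliseconds period) {
    assert(cycles >= 0);
    Barrier barrier(3);
    std::atomic<int> computed{0}, applied{0}, logged{0};

    auto computeU = [&] {
        for (int i = 0; i < cycles; ++i) {
            ++computed;  // placeholder: plan the next control sequence
            barrier.arriveAndWait();
        }
    };
    auto applyU = [&] {
        auto next = std::chrono::steady_clock::now();
        for (int i = 0; i < cycles; ++i) {
            ++applied;   // placeholder: send the control to the hardware
            next += period;
            std::this_thread::sleep_until(next);  // paces the whole loop
            barrier.arriveAndWait();
        }
    };
    auto logger = [&] {
        for (int i = 0; i < cycles; ++i) {
            ++logged;    // placeholder: sample at the start of the period
            std::this_thread::sleep_for(period / 2);
            ++logged;    // placeholder: sample at mid-period
            barrier.arriveAndWait();
        }
    };

    std::thread t1(computeU), t2(applyU), t3(logger);
    t1.join();
    t2.join();
    t3.join();
    return {computed.load(), applied.load(), logged.load()};
}
```

With the rates from the text, a call such as `runControlLoop(200, std::chrono::milliseconds(50))` would correspond to 10 s of operation at 20 Hz, with the logger sampling at 40 Hz.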
It can be observed that the system stabilizes in less than 0.6 s. This is not an optimal solution, because the pendulum required more than one swing to reach the stable upright position. Fine-tuning the control parameters and optimizing the code would bring the result closer to an optimal solution.
Experimental results
The Git source can be found here: https://bitbucket.org/ElodP/soopirp/
Reference publication: Lucian Busoniu, Elod Páll, Remi Munos, "Discounted near-optimal control of general nonlinear systems using optimistic planning", American Control Conference (ACC-16), 2016.