The cart pole experiment, based on the Gymnasium environment of the same name, was used to test mid-experiment Unity-Python communication and the full pipeline from training a deep neural network to performing inference in Unity with Barracuda.
The agent is based on the first Cart Pole agent in "Deep Reinforcement Learning in Action" by Alexander Zai and Brandon Brown (pp. 106-108).
Below is an interactive demo showcasing the most recent agent.
The reward system is identical to Gymnasium's cart pole: +1 is awarded for every update in which the cart pole remains in a valid state.
The agent can move a maximum of 5 meters from its starting position; the pole can swing at most 28 degrees from its starting angle.
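The reward and bounds described above can be sketched as a per-step check. This is an illustrative reconstruction, not the project's actual code; the constant names and function signature are assumptions.

```python
import math

# Bounds described in the text (illustrative names).
MAX_DISTANCE_M = 5.0               # cart may move at most 5 m from its start
MAX_ANGLE_RAD = math.radians(28)   # pole may swing at most 28 degrees

def step_reward(cart_x, pole_angle):
    """Return (reward, done): +1 while the state is valid, else the episode ends."""
    valid = abs(cart_x) <= MAX_DISTANCE_M and abs(pole_angle) <= MAX_ANGLE_RAD
    return (1.0, False) if valid else (0.0, True)
```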
Currently, the agent reliably stays within bounds when there is no wind. It struggles when there is wind, but it has learned to correctly tilt the pole toward the source of the wind.
The current implementation of wind simply adds a continuous force to the pole. It does not account for exposed surface area, nor does it try to drive the cart toward a target velocity (as real wind would).
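The difference from real wind can be seen in a minimal sketch of this model: the same force is applied every physics update, independent of how fast the pole is already moving. The function name, parameters, and integration scheme are assumptions for illustration.

```python
def apply_wind(pole_velocity, wind_force, pole_mass, dt):
    """One physics update of the constant-force wind model.

    The force is added unconditionally each step; a drag-based model
    would instead scale with exposed area and relative velocity.
    """
    return pole_velocity + (wind_force / pole_mass) * dt
```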
Currently, the agent is given no information about the wind. It receives only the same observations as the traditional cart pole experiment (cart position, cart velocity, pole angle, and pole angular velocity).
The agent is trained with a softmax policy to encourage exploration, but softmax is bypassed in the demo so that the agent always chooses the action it deems optimal (a workaround for an issue with importing ONNX files into Barracuda).
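The two action-selection modes can be sketched as follows: sample from the softmax distribution over the network's outputs during training, and take the argmax at inference time, as the demo does. This is a hedged sketch with illustrative names; the network producing the logits is not shown.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over the policy network's raw outputs."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def choose_action(logits, training):
    """Sample from the softmax distribution to explore during training;
    at inference (as in the demo), deterministically take the argmax."""
    if training:
        probs = softmax(logits)
        return random.choices(range(len(logits)), weights=probs)[0]
    return max(range(len(logits)), key=lambda i: logits[i])
```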
When the agent is disabled in the demo, it is replaced with an agent that makes random decisions.