LEARN
Cooperative Reinforcement Learning
Team: Lucus Wu, Olamide Falayi, Erol Ozel, Zachary Akycampong, Zoey Zhang
Faculty/Graduate Students: Kaiqing Zhang, Souradip Chakraborty, Xiangyu Liu
I4C Teaching Assistant: Christopher Song
How did we try to solve this problem?
Create Environment
Train Agent
Attempt to Increase Risk-Seeking by Changing Reward Values
Determine Conclusion
Creating an environment (in our case) consists of creating the road with the pre-programmed blue cars, which move according to a simple algorithm. In reinforcement learning, the environment is what the agent (the car) interacts with; it gives the car its reward, punishment, and new state (the options for the next step).
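To make this concrete, here is a minimal sketch of creating and stepping through such an environment in Python, assuming the open-source highway-env package (which provides intersection and roundabout scenarios like ours); the random action is only a placeholder for the agent's learned policy.

```python
# Minimal sketch, assuming the open-source highway-env package
# (gymnasium interface); the random action stands in for a learned policy.
import gymnasium as gym
import highway_env  # registers "intersection-v0", "roundabout-v0", etc.

env = gym.make("intersection-v0")  # road plus pre-programmed traffic cars
obs, info = env.reset()

done = False
while not done:
    action = env.action_space.sample()  # placeholder for the agent's policy
    # The environment returns the reward (or punishment) and the new state,
    # plus whether the episode ended (e.g., a crash or reaching the goal).
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```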
So, How Did Our Cars Perform?
Looking at the intersection environment, our car agent performed poorly. We wanted it to move quickly through the intersection without any collisions, but in both videos that was not always the case. Because the car agent did not meet this goal, we can conclude that our original reward incentive was not good at promoting risk-seeking behavior (like speeding up through the intersection).
The car did not perform very well. How do we know? The green car (our agent) hardly moved across the intersection.
The car also did not perform very well. How do we know again? The green car (our agent) moved slowly across the intersection and ended up crashing into another car.
Here, we are looking at the roundabout environment, and again our car agent performed quite poorly. In one video it hardly moves, in another it crashes into a car, and in another it moves very slowly: three outcomes opposite to our goal. So we can conclude that our original reward incentive was bad at promoting good car performance and risk-seeking.
So, getting to our question...
If we change our car's reward incentive, can we make it reach our goal (driving quickly through the roundabout/intersection)?
We tested this out by exponentiating our rewards!
Here's our process and outcome:
Exponentiating involves replacing each current reward r with e^(β·r), where β (beta) is a factor that you choose; a larger β amplifies the difference between high and low rewards, which encourages risk-seeking behavior.
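As a minimal sketch, this transformation can be written as a reward wrapper around the environment; the gymnasium RewardWrapper class is standard, but the beta value shown here is just an example, not the one we tuned.

```python
import math

import gymnasium as gym


class ExponentiatedReward(gym.RewardWrapper):
    """Replaces each reward r with e^(beta * r) to encourage risk-seeking."""

    def __init__(self, env, beta=1.0):
        super().__init__(env)
        self.beta = beta

    def reward(self, reward):
        # Raise e to the power of (beta * reward); a larger beta amplifies
        # the gap between high and low rewards.
        return math.exp(self.beta * reward)


# Usage (environment id and beta value are illustrative):
# env = ExponentiatedReward(gym.make("roundabout-v0"), beta=2.0)
```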
For our roundabout environment, we can clearly see how the agent improved after exponentiating: not only did it move much faster, but it also avoided collisions better and changed lanes very smoothly.
After exponentiating
In this multi-agent version of the intersection, two cars controlled by different agents must go straight through an intersection. They get a penalty for crashing and a reward for arriving at the other side.
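A hedged sketch of what such a per-agent reward might look like (the specific numbers are illustrative, not the values used in our experiments):

```python
def agent_reward(crashed: bool, arrived: bool) -> float:
    """Illustrative per-agent reward for the multi-agent intersection:
    a penalty for crashing, a reward for reaching the other side."""
    if crashed:
        return -5.0  # collision penalty (example value)
    if arrived:
        return 1.0   # bonus for arriving at the other side (example value)
    return 0.0       # otherwise, no reward this step
```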
The regular agent is too risk-averse: it brakes and does not go through the intersection.
However, because the exponentiated agent is more risk-seeking, it successfully navigates the intersection.