(again, the reference driver did it more slowly, with a smoother transition)
(reference driver hasn't even started his maneuver)
(steering too wobbly)
(actually one and a half: the agent was spawned exactly between two lanes and seemingly learned that the maneuver must not be too short in order to get a reward)
(was able to wait until the car in front of him moved off)
(had to hit the brakes a bit at the roundabout)
(stuck for too long, decided to move off, afraid of a timeout)
(a rare example for a trained agent)