Foundation model accuracy test: flipped over -> 0, standing normally -> 1.
Start of training (1/3)
Output scores per frame
acc: 73%
Middle of training (2/3)
Output scores per frame
acc: 70%
End of training (3/3)
Output scores per frame
acc: 96%
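As a rough illustration of how an accuracy figure like the ones above could be derived from per-frame scores, here is a minimal sketch. It assumes each score lies in [0, 1] and is thresholded at 0.5 to get a 0/1 prediction (flipped over -> 0, standing normally -> 1). The threshold, the function name `frame_accuracy`, and the sample numbers are all hypothetical and are not taken from the actual evaluation.

```python
def frame_accuracy(scores, labels, threshold=0.5):
    """Fraction of frames whose thresholded score matches the 0/1 label.

    Assumption: a score >= threshold means "standing normally" (1),
    otherwise "flipped over" (0). The real test may use another rule.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one frame")
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up per-frame scores and ground-truth labels, for illustration only.
scores = [0.9, 0.8, 0.4, 0.1, 0.2, 0.7]
labels = [1, 1, 1, 0, 0, 0]
accuracy = frame_accuracy(scores, labels)
print(f"acc: {accuracy:.0%}")
```

With this rule, a frame counts as correct only when the thresholded score agrees with its label, so the reported accuracy depends on the chosen threshold.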
Reward averaged over the whole episode during training. The rewards all increase, indicating that the RL optimization part is working well.