Furuta Pendulum

DP cFVI - Dynamic Programming Continuous Fitted Value Iteration

RTDP cFVI - Real-Time Dynamic Programming Continuous Fitted Value Iteration

SAC-N - SAC with Gaussian Initial State Distribution

SAC-N & UDR - SAC with Gaussian Initial State Distribution & Uniform Domain Randomization

SAC-U - SAC with Uniform Initial State Distribution

SAC-U & UDR - SAC with Uniform Initial State Distribution & Uniform Domain Randomization

PPO-N - PPO with Gaussian Initial State Distribution

PPO-N & UDR - PPO with Gaussian Initial State Distribution & Uniform Domain Randomization

PPO-U - PPO with Uniform Initial State Distribution

PPO-U & UDR - PPO with Uniform Initial State Distribution & Uniform Domain Randomization

DDPG-N - DDPG with Gaussian Initial State Distribution

DDPG-N & UDR - DDPG with Gaussian Initial State Distribution & Uniform Domain Randomization

DDPG-U - DDPG with Uniform Initial State Distribution

DDPG-U & UDR - DDPG with Uniform Initial State Distribution & Uniform Domain Randomization
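
The suffixes in the labels above differ only in how each training episode is set up. The sketch below is one way to read them, not the configuration used for these experiments: "-N" draws the initial state from a Gaussian, "-U" draws it uniformly over a box, and "UDR" additionally draws the physical parameters uniformly around nominal values. Every name, parameter value, state ordering, and range here is a hypothetical placeholder.

```python
import math
import random

# Hypothetical nominal parameters; placeholders, not the values of any real system.
NOMINAL_PARAMS = {"pendulum_mass": 0.02, "pendulum_length": 0.1, "arm_length": 0.1}

# Assumed state layout: (arm angle, pendulum angle, arm velocity, pendulum velocity).
# Which pendulum angle counts as "hanging down" is a convention; pi is assumed here.
DOWNWARD_STATE = (0.0, math.pi, 0.0, 0.0)


def sample_gaussian_initial_state(rng, mean=DOWNWARD_STATE, std=0.05):
    """'-N' variant: initial state concentrated around one configuration."""
    return tuple(rng.gauss(m, std) for m in mean)


def sample_uniform_initial_state(rng, angle_limit=math.pi, velocity_limit=1.0):
    """'-U' variant: initial state spread uniformly over a box in state space."""
    return (
        rng.uniform(-angle_limit, angle_limit),
        rng.uniform(-angle_limit, angle_limit),
        rng.uniform(-velocity_limit, velocity_limit),
        rng.uniform(-velocity_limit, velocity_limit),
    )


def sample_randomized_params(rng, nominal=NOMINAL_PARAMS, rel_range=0.15):
    """'UDR': each parameter drawn uniformly within +/- rel_range of nominal."""
    return {
        name: rng.uniform(value * (1.0 - rel_range), value * (1.0 + rel_range))
        for name, value in nominal.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # One episode setup for a "-U & UDR" style run.
    print(sample_uniform_initial_state(rng))
    print(sample_randomized_params(rng))
```

In this reading, a "-N" run without UDR would call only sample_gaussian_initial_state and keep NOMINAL_PARAMS fixed for every episode.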