DCUR: Data Curriculum for Teaching via Samples with Reinforcement Learning

Daniel Seita, Abhinav Gopal, Zhao Mandi, John Canny

University of California, Berkeley

Questions? dseita@andrew.cmu.edu

As of September 14, 2021, the full paper (PDF), including the appendix, is available here.

Code [LINK]

The code is built on top of SpinningUp from OpenAI. The main things to know about the code are:

  • We added the following sub-package: spinup/teaching.

  • We slightly modified the replay buffer in the TD3 code. You can find it in: spinup/algos/pytorch/td3/td3.py

  • The scripts we used to run the experiments in the paper are in this directory: bash/paper. To run one, use: ./bash/paper/script_name.sh

If you have questions about how to use the code, please open an issue on the GitHub issue tracker.

Teacher Data [LINK]

For the paper, we use one teacher per environment per algorithm, which gives 4 TD3 teachers and 4 SAC teachers. The compressed file is 1.63GB. After you download and extract it, you should see the following 8 teachers that we used for this paper (note the file sizes):

$ du -sh */*
1013M ant_sac_alpha0-2_fix_alpha/ant_sac_alpha0-2_fix_alpha_s50
1.1G ant_td3_act0-1/ant_td3_act0-1_s50
252M halfcheetah_sac_alpha0-2_fix_alpha/halfcheetah_sac_alpha0-2_fix_alpha_s40
320M halfcheetah_td3_act0-1/halfcheetah_td3_act0-1_s40
170M hopper_sac_alpha0-2_fix_alpha/hopper_sac_alpha0-2_fix_alpha_s40
234M hopper_td3_act0-1/hopper_td3_act0-1_s40
251M walker2d_sac_alpha0-2_fix_alpha/walker2d_sac_alpha0-2_fix_alpha_s50
317M walker2d_td3_act0-1/walker2d_td3_act0-1_s50

From there, you should be able to run students by specifying these teachers in command line arguments. See the bash scripts that we use for further details.
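The teacher directory names in the listing above appear to follow the pattern <env>_<algo>_<config>_s<seed>. This pattern is inferred from the eight names shown, not from the code. As a minimal sketch under that assumption, the following Python helper splits a teacher path into its parts, which can be handy when checking that all 8 teachers extracted correctly. The names Teacher and parse_teacher_dir are hypothetical and are not part of the released code.

```python
import re
from typing import NamedTuple, Optional


class Teacher(NamedTuple):
    """Fields recovered from a teacher directory name (hypothetical helper type)."""
    env: str
    algo: str
    config: str
    seed: int


# Assumed naming pattern, inferred only from the listing: <env>_<algo>_<config>_s<seed>
_TEACHER_RE = re.compile(
    r"^(?P<env>[a-z0-9]+)_(?P<algo>sac|td3)_(?P<config>.+)_s(?P<seed>\d+)$"
)


def parse_teacher_dir(path: str) -> Optional[Teacher]:
    """Parse the last path component; return None if it does not fit the pattern."""
    name = path.rstrip("/").split("/")[-1]
    match = _TEACHER_RE.match(name)
    if match is None:
        return None
    return Teacher(match["env"], match["algo"], match["config"], int(match["seed"]))


if __name__ == "__main__":
    for p in [
        "ant_td3_act0-1/ant_td3_act0-1_s50",
        "hopper_sac_alpha0-2_fix_alpha/hopper_sac_alpha0-2_fix_alpha_s40",
    ]:
        print(parse_teacher_dir(p))
```

A path that stops at the parent directory (for example ant_td3_act0-1, with no seed suffix) returns None, so the helper can also flag entries that do not look like a teacher run.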

Student Results [LINK]

In the provided link, you can find results from student training (typically 5 random seeds per setting). The zipped file is 24.94GB.
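Because the student results archive is large, it may be worth checking free disk space before downloading. The sketch below uses Python's standard shutil.disk_usage. The function name has_room and the default headroom factor of 2.0 (room for both the archive and its extracted contents) are assumptions for illustration; the extracted size is not stated here and may be larger. The sketch also assumes the quoted size is in decimal gigabytes.

```python
import shutil


def has_room(path: str, required_gb: float, headroom: float = 2.0) -> bool:
    """Return True if the filesystem holding `path` has at least
    required_gb * headroom gigabytes free (1 GB taken as 1e9 bytes)."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * headroom * 1e9


if __name__ == "__main__":
    # 24.94 is the archive size quoted above; "." is the intended download directory.
    print("Enough space for student results:", has_room(".", 24.94))
```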

Acknowledgments

We thank members of the CannyLab for helpful discussions. During portions of this research, Daniel Seita was supported by a fellowship from Graduate Fellowships for STEM Diversity.