Massive Computational Experiments, Painlessly

Stanford University, Autumn 2023
Instructors: David Donoho, Apratim Dey, and X.Y. Han
IT Maven: Andrew Donoho

Mondays 3:30 PM - 5:20 PM
Littlefield Center, Room 103

Ambitious Data Science requires massive computational experimentation. In some fields, the entry ticket for a solid PhD is now conducting experiments that consume 1 million CPU hours. Recently, several groups have built efficient computational environments that make it painless to run such massive experiments. This course reviews state-of-the-art practices for running massive computational experiments on compute clusters in a painless and reproducible manner. Students will learn to automate their computing experiments, first with nuts-and-bolts tools such as Perl and Bash, and later with comprehensive frameworks such as CodaLab and Vizier, which enable them to take on ambitious Data Science projects. The course also features guest lectures by renowned scientists in the field of Data Science. Students should be familiar with the need for computational experiments in their own field and be fluent in a high-level programming language such as R, Matlab, or Python.