Resources for Classes
We support educational activities at NYU in several ways.
If you are an instructor or TA of a course, please take a look at the options below. If you think we could support your teaching in other ways, please let us know (Contact us).
Please note that we do not have a separate cluster dedicated to courses.
General HPC (Greene)
Classes may use the Greene cluster if this is absolutely necessary, for example because of large datasets or heavy computations.
Notes:
Jobs are submitted to the general queue.
Students in your class can use the Greene cluster to run their jobs, like any other user of the HPC cluster.
The cluster tends to get busier during exam periods, so the wait time for resources may become significantly longer.
If you would like your students to use the cluster, each of them must get an HPC account. Please submit this form to request accounts (use this form only if you are full-time faculty).
HPC has a maintenance schedule that may interfere with your class exams, so we recommend NOT relying on the HPC cluster for exams.
To share data with students, put it on /scratch/ and control access using ACLs.
Users can work on the cluster in the classical way (from a terminal) or through the OOD web interface.
There is no cost for using the Greene cluster for your classes.
There is no student limit. Any NYU student with a valid NetID can obtain an HPC account and get access.
More information is available here.
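As an illustration of the ACL note above, the following minimal sketch builds the `setfacl` commands an instructor might run to give a list of students read access to a shared directory. The directory path, the NetIDs, and the choice of `rX` permissions are assumptions made for this example, not a recommended configuration; please check with HPC staff before changing ACLs on /scratch/.

```python
import shlex


def build_setfacl_commands(shared_dir, netids, perms="rX"):
    """Return one setfacl command (as an argument list) per student.

    -R applies the entry recursively and -m modifies (adds) an ACL entry.
    The capital X grants execute/search permission only on directories
    and on files that are already executable.
    """
    return [
        ["setfacl", "-R", "-m", f"u:{netid}:{perms}", shared_dir]
        for netid in netids
    ]


# Hypothetical class directory and NetIDs, for illustration only.
commands = build_setfacl_commands(
    "/scratch/instructor_netid/class_data", ["abc123", "xyz789"]
)
for cmd in commands:
    # Print the commands for review instead of running them.
    print(shlex.join(cmd))
```

The sketch only prints the commands so they can be reviewed first. Students also need search (execute) permission on every parent directory of the shared path before they can reach it.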
Special pre-allocation of resources
In some rare cases it is possible to get a special allocation of nodes for a specific class. However, this is very uncommon and requires strong justification, because it prevents other users from using the hardware.
Sometimes resources can instead be allocated in the cloud using a cloud-bursting approach, which has a lower impact on other users' work. Please contact us for details if you believe your class needs this kind of resource. Additional costs may be associated with this approach.
Hadoop (Dataproc): Big Data / HDFS / Spark
If you are an instructor or student in a Big Data class that requires learning Hadoop, Dataproc is a potential option. You can teach and learn the ecosystem of Hadoop/Spark technologies used at many companies that work with large data.
For work on large datasets, traditional HPC relies on specialized shared file systems (such as Lustre and BeeGFS) and a high-speed network. In contrast, many companies in industry rely on Hadoop/HDFS. Hadoop offers a model of horizontal scaling: when data input/output requirements grow (for example, the number of users reading a website or sending queries), additional nodes can be added to allow faster reads and writes. Hadoop's MapReduce approach lets you write code that brings the computation to the nodes where the data is stored, so the relatively slow inter-node communication matters less.
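To make the map-reduce idea concrete, here is a minimal single-process sketch of a word count in plain Python. It only imitates the three phases (map, shuffle, reduce) and does not use the Hadoop or Spark APIs; the function names and the sample data are made up for illustration. On a real cluster, each partition would be mapped on the node that stores it, and only the grouped intermediate pairs would cross the network.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(partitions):
    """Run map over every partition, then shuffle and reduce the results."""
    mapped = []
    for partition in partitions:  # each partition stands in for one data node
        for line in partition:
            mapped.extend(map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Two hypothetical partitions, standing in for blocks stored on two nodes.
partitions = [["big data big"], ["data moves slowly"]]
counts = word_count(partitions)
print(counts)
```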
Notes
Any NYU instructor may use the Hadoop cluster for their class. However, we ask instructors to notify us before the beginning of the semester if they plan to use the Dataproc cluster for their class.
If you would like your students to use Dataproc, submit a bulk request form and your students will be added to the cluster.
There are no pre-allocated resources for specific classes.
To share data with students, put it on the HDFS file system and control access using ACLs.
There is no cost for using the Hadoop cluster for your classes.
There is no student limit. Any NYU student with a valid NetID can obtain an HPC account and get access.
After your class has been added to the cluster, you can manage which students have access by adding/removing students from a Google Group. The Google Group will have a name of the form Dataproc-<Course Catalog Number>-<Fall/Summer/Spring/Winter>-<Last Two Digits of Year>@nyu.edu.
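The naming pattern above can be sketched as a small helper that assembles the expected Google Group address. This is only an illustration of the pattern as described: the function name is hypothetical, and the catalog number used in the example is a made-up placeholder, since the exact formatting of catalog numbers is not specified here.

```python
VALID_TERMS = ("Fall", "Summer", "Spring", "Winter")


def dataproc_group_address(catalog_number, term, year):
    """Build a group address of the form
    Dataproc-<Course Catalog Number>-<Term>-<Last Two Digits of Year>@nyu.edu.
    """
    if term not in VALID_TERMS:
        raise ValueError(f"term must be one of {VALID_TERMS}, got {term!r}")
    # year % 100 keeps the last two digits; :02d pads years such as 2005 to "05".
    return f"Dataproc-{catalog_number}-{term}-{year % 100:02d}@nyu.edu"


# Hypothetical catalog number, for illustration only.
address = dataproc_group_address("ABCD-1234", "Fall", 2024)
print(address)
```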
Dedicated course coding environment (JupyterHub on GCP)
A dedicated environment provides some advantages compared to students working on their laptops or on the HPC cluster, including high availability, no HPC queue, and simple management of the environment. For more information (including "who is paying"), see JupyterHub at ResearchCloud. We support classes of various sizes, from very small classes to classes with hundreds of students.
Google Cloud (GCP)
Instructors: If you need GCP resources for your classes, please email us. There may be costs associated with GCP services.
Students: If this is the first time you are using GCP as an NYU student, you may be eligible to obtain Google credits.
If this is the case, please apply for credits using your NYU account (https://cloud.google.com/free/)
List of classes using our services
Other resources you may find useful
DataRobot (AutoML) - Academic program application