Getting Started with AMD Nodes
Getting Maximum Performance from AMD GPUs
AMD nodes are dedicated to AI/GPU workloads. Do not run CPU-only jobs on AMD nodes.
Using AMD nodes
To work with AMD nodes, log in to Greene. Please refer to the Accessing HPC page for other access options.
SSH using Terminal or Command Prompt
Simply open a terminal (Linux, Mac) or the Command Prompt (Windows 10) and enter the commands:
ssh <NetID>@gw.hpc.nyu.edu   ## you can skip this step if you are on the NYU network or using the NYU VPN
ssh <NetID>@greene.hpc.nyu.edu
NOTE: When you are asked for your password, just type it (characters will not be displayed) and then hit "Enter".
Submitting a Job to AMD nodes
You submit a job to the AMD GPUs the same way you would on Greene, with one major change: specify --gres=gpu:mi50:1 (or another appropriate AMD GPU model).
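For example, an interactive session on an AMD node can be requested like this (a sketch only: it assumes one MI50 GPU; adjust CPUs, memory, and time to your needs):
srun --nodes=1 --cpus-per-task=1 --mem=8GB --time=1:00:00 \
    --gres=gpu:mi50:1 --pty /bin/bash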
Use Singularity images containing pre-installed software. You can use an overlay to install additional packages, as described here.
Software/Modules
The modules installed on the cluster are built for Intel CPUs, not AMD, so do not use them.
We recommend using Singularity and setting up an Anaconda environment to manage your packages. For information on setting up a conda environment please see our reference page.
The usable Singularity containers are located at /scratch/work/public/singularity/hudson/images and begin with "rocm4.X". The other Singularity containers there are compiled for NVIDIA GPUs and should not be used.
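To see which ROCm images are currently available, you can list that directory (the exact output will vary as images are updated):
ls /scratch/work/public/singularity/hudson/images/ | grep rocm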
To learn how to install Python packages using Singularity and overlay files, please read the Singularity with Miniconda page.
Python packages
Use versions of packages that are designed for AMD (ROCm).
When installing, make sure the installation process compiles packages from source rather than installing a precompiled wheel or conda package.
As an example, users of TensorFlow or PyTorch can install the following packages:
pip install torch torchvision==0.10.1 -f https://download.pytorch.org/whl/rocm4.2/torch_stable.html
pip install tensorflow-rocm
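To confirm that a ROCm build of PyTorch was installed, you can check the HIP version it reports (a quick sanity check; the exact version string depends on the wheel you installed, and torch.version.hip is None in non-ROCm builds):
python -c "import torch; print(torch.__version__, torch.version.hip)"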
Minimal PyTorch Example
For a simple example, you can use the overlay image with ROCm versions of PyTorch and TensorFlow installed: overlay-10GB-400K-rocm-tensorflow-pytorch.ext3
OVERLAY_FILE=/scratch/work/public/examples/amd-getting-started/overlay-10GB-400K-rocm-tensorflow-pytorch.ext3
SINGULARITY_IMAGE=/scratch/work/public/singularity/hudson/images/rocm4.2-ubuntu20.04.sif
singularity exec --overlay $OVERLAY_FILE $SINGULARITY_IMAGE /bin/bash
source /ext3/miniconda3/bin/activate
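Once inside the container with the environment activated, a quick check (run on a GPU node) confirms that PyTorch can see the AMD GPU:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"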
Using your Singularity Container in a SLURM Batch Job
Below is an example of how to call a Python script, in this case torch-test.py, from a SLURM batch job using your new Singularity image.
torch-test.py: (for convenience it is available at /scratch/work/public/examples/amd-getting-started/torch-test.py )
#!/bin/env python
import torch

print(torch.__file__)
print(torch.__version__)

# How many GPUs are there?
print(torch.cuda.device_count())

# Get the name of the current GPU
print(torch.cuda.get_device_name(torch.cuda.current_device()))

# Is PyTorch using a GPU?
print(torch.cuda.is_available())
Now we will write the SLURM job script, run-test.SBATCH, that will start our Singularity Image and call the torch-test.py script.
run-test.SBATCH: (for convenience it is available at /scratch/work/public/examples/amd-getting-started/run-test.SBATCH )
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=1:00:00
#SBATCH --mem=2GB
#SBATCH --gres=gpu:mi50:1   ## USE HUDSON CLUSTER
#SBATCH --job-name=torch
module purge
singularity exec \
    --overlay /scratch/work/public/examples/amd-getting-started/overlay-10GB-400K-rocm-tensorflow-pytorch.ext3:ro \
    /scratch/work/public/singularity/hudson/images/rocm4.2-ubuntu20.04.sif \
    /bin/bash -c "source /ext3/env.sh; \
    python /scratch/work/public/examples/amd-getting-started/torch-test.py"
Run the run-test.SBATCH script
sbatch /scratch/work/public/examples/amd-getting-started/run-test.SBATCH
Check your SLURM output for the results; an example is shown below.
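While the job is pending or running, you can check its status with a standard SLURM command, for example:
squeue -u $USER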
cat slurm-3752662.out
# example output:
# /ext3/miniconda3/lib/python3.8/site-packages/torch/__init__.py
# 1.9.1+rocm4.2
# 1
# VEGA
# True
Software installation
To learn how to install Python packages using Singularity and overlay files, please read the Singularity with Miniconda page.
Note: the example above uses an overlay image that was created by installing:
pip install torch torchvision==0.10.1 -f https://download.pytorch.org/whl/rocm4.2/torch_stable.html
pip install tensorflow-rocm
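If you need additional packages, a typical approach is to launch the container with your own writable copy of an overlay mounted read-write and install into it. A sketch is shown below; my-overlay.ext3 is a placeholder for your own overlay copy, and the /ext3/env.sh activation script assumes the overlay was set up as in the example above (see the Singularity with Miniconda page for the full procedure):
singularity exec --overlay my-overlay.ext3:rw \
    /scratch/work/public/singularity/hudson/images/rocm4.2-ubuntu20.04.sif /bin/bash
source /ext3/env.sh        ## activate the conda environment inside the overlay
pip install tensorflow-rocm   ## or any other ROCm-compatible package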
You can find up-to-date versions of these packages here:
PyTorch (go to Install section, choose Linux - ROCm)
AMD GPU Tutorials
AMD Tutorials (COVID-19 HPC Fund - Hackathon Trainings)