Getting Started with AMD nodes

Getting Maximum Performance from AMD GPUs

AMD nodes are dedicated to AI/GPU workloads. Do not run CPU-only jobs on AMD nodes.

Using AMD nodes

To work with AMD nodes, log in to Greene. See the Accessing HPC page for other access options.

SSH using Terminal or Command Prompt

Simply open a terminal (Linux, Mac) or the Command Prompt (Windows 10) and enter the commands:

ssh <NetID>@gw.hpc.nyu.edu ## you can skip this step if you are on the NYU Network or using the NYU VPN
ssh <NetID>@greene.hpc.nyu.edu

NOTE: When asked for your password, type it (the characters will not be displayed) and then press Enter.

Submitting a Job to AMD nodes

Software/Modules

The software modules installed on the cluster are built for Intel CPUs, not AMD CPUs, so do not use them on AMD nodes.

We recommend using Singularity and setting up an Anaconda environment to manage your packages. For information on setting up a conda environment, see our reference page.

The Singularity containers that can be used on AMD nodes are located in /scratch/work/public/singularity/hudson/images, and their names begin with "rocm4.X". The other Singularity containers are built for NVIDIA GPUs and should not be used.
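If you want to filter that directory down to the ROCm images, a short Python helper can do it. This is a minimal sketch that assumes the image files use the .sif extension, as rocm4.2-ubuntu20.04.sif does. The function name find_rocm_images is hypothetical and is not a tool provided on the cluster.

```python
from pathlib import Path

# Directory named in the text above.
IMAGE_DIR = "/scratch/work/public/singularity/hudson/images"


def find_rocm_images(directory=IMAGE_DIR):
    """Return the sorted names of Singularity images whose names start with "rocm4".

    Assumption (hypothetical helper): images use the .sif extension.
    Images for NVIDIA GPUs do not start with "rocm4" and are skipped.
    """
    path = Path(directory)
    if not path.is_dir():
        # Directory not visible (e.g. not on the cluster): nothing to list.
        return []
    return sorted(
        p.name
        for p in path.iterdir()
        if p.is_file() and p.name.startswith("rocm4") and p.suffix == ".sif"
    )


if __name__ == "__main__":
    for name in find_rocm_images():
        print(name)
```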


Python packages

Use versions of packages built for AMD GPUs (ROCm).

When installing, make sure packages are compiled from source rather than installed from a precompiled wheel or conda package.
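With pip, one way to force a source build is the --no-binary option. It accepts :all: or a comma-separated list of package names. The sketch below builds such a command for the current interpreter. The helper name pip_source_install_cmd is hypothetical. Whether a given package actually builds correctly against ROCm depends on that package's own build system, so treat this only as a starting point.

```python
import sys


def pip_source_install_cmd(packages, source_only=None):
    """Build a pip command (as an argument list) that installs `packages`.

    source_only: names of packages pip must build from source instead of
    using a prebuilt wheel; pass [":all:"] to disable wheels entirely.
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if source_only:
        cmd += ["--no-binary", ",".join(source_only)]
    cmd += list(packages)
    return cmd


if __name__ == "__main__":
    # Hypothetical example: build numpy from source rather than from a wheel.
    cmd = pip_source_install_cmd(["numpy"], source_only=["numpy"])
    print(" ".join(cmd))
    # To actually install: subprocess.run(cmd, check=True)
```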

For example, TensorFlow and PyTorch users can install the following packages:


pip install torch torchvision==0.10.1 -f https://download.pytorch.org/whl/rocm4.2/torch_stable.html
pip install tensorflow-rocm

Minimal PyTorch Example

For a simple example, you can use the overlay image overlay-10GB-400K-rocm-tensorflow-pytorch.ext3, which has ROCm versions of PyTorch and TensorFlow installed:

OVERLAY_FILE=/scratch/work/public/examples/amd-getting-started/overlay-10GB-400K-rocm-tensorflow-pytorch.ext3
SINGULARITY_IMAGE=/scratch/work/public/singularity/hudson/images/rocm4.2-ubuntu20.04.sif

singularity exec --overlay $OVERLAY_FILE $SINGULARITY_IMAGE /bin/bash
source /ext3/miniconda3/bin/activate

Using your Singularity Container in a SLURM Batch Job

Below is an example of how to call a Python script, in this case torch-test.py, from a SLURM batch job using your Singularity image.

torch-test.py (for convenience, it is available at /scratch/work/public/examples/amd-getting-started/torch-test.py):

#!/bin/env python
import torch

print(torch.__file__)
print(torch.__version__)

# How many GPUs are there?
print(torch.cuda.device_count())

# Get the name of the current GPU
print(torch.cuda.get_device_name(torch.cuda.current_device()))

# Is PyTorch using a GPU?
print(torch.cuda.is_available())
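On ROCm builds of PyTorch, the torch.cuda calls above are backed by HIP. On those builds, torch.version.hip holds a version string; on other builds it is None. The variant below also reports which backend the installed PyTorch was built for, and it degrades gracefully if PyTorch is not importable. It is a sketch, and the function name backend_report is hypothetical.

```python
def backend_report():
    """Summarize the installed PyTorch build and its GPU visibility."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    # getattr guards against releases that lack these attributes.
    report = {
        "torch": torch.__version__,
        "hip": getattr(torch.version, "hip", None),   # set on ROCm builds
        "cuda": getattr(torch.version, "cuda", None), # set on CUDA builds
        "gpu_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if report["gpu_available"]:
        report["device_name"] = torch.cuda.get_device_name(torch.cuda.current_device())
    return report


if __name__ == "__main__":
    for key, value in backend_report().items():
        print(f"{key}: {value}")
```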

Now we will write the SLURM job script, run-test.SBATCH, which starts our Singularity image and calls the torch-test.py script.

run-test.SBATCH (for convenience, it is available at /scratch/work/public/examples/amd-getting-started/run-test.SBATCH):

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=1:00:00
#SBATCH --mem=2GB
#SBATCH --gres=gpu:mi50:1  ## USE HUDSON CLUSTER
#SBATCH --job-name=torch

module purge

singularity exec \
    --overlay /scratch/work/public/examples/amd-getting-started/overlay-10GB-400K-rocm-tensorflow-pytorch.ext3:ro \
    /scratch/work/public/singularity/hudson/images/rocm4.2-ubuntu20.04.sif \
    /bin/bash -c "source /ext3/env.sh; \
    python /scratch/work/public/examples/amd-getting-started/torch-test.py"

Submit the run-test.SBATCH script:

sbatch /scratch/work/public/examples/amd-getting-started/run-test.SBATCH
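If you submit jobs from a Python workflow, note that sbatch prints "Submitted batch job <id>" by default. With --parsable it prints just "<id>", or "<id>;<cluster>". SLURM's default output file for a batch job is slurm-<id>.out in the directory you submitted from. The sketch below wraps this behavior. The helper names parse_job_id and submit are hypothetical, and submit only works on a machine where sbatch is available.

```python
import re
import subprocess


def parse_job_id(sbatch_stdout):
    """Extract the numeric job ID from sbatch's standard output.

    Handles the default "Submitted batch job 12345" message and the
    --parsable forms "12345" and "12345;cluster".
    """
    text = sbatch_stdout.strip()
    match = re.search(r"Submitted batch job (\d+)", text)
    if match:
        return int(match.group(1))
    first = text.split(";", 1)[0]
    if first.isdigit():
        return int(first)
    raise ValueError(f"could not find a job ID in: {sbatch_stdout!r}")


def submit(script_path):
    """Submit a batch script with sbatch and return its job ID (needs SLURM)."""
    result = subprocess.run(
        ["sbatch", "--parsable", script_path],
        capture_output=True, text=True, check=True,
    )
    return parse_job_id(result.stdout)
```

For example, job_id = submit("/scratch/work/public/examples/amd-getting-started/run-test.SBATCH") would return the ID whose slurm-<id>.out file you then inspect.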

Check your SLURM output file for the results. An example is shown below:

cat slurm-3752662.out

# example output:
# /ext3/miniconda3/lib/python3.8/site-packages/torch/__init__.py
# 1.9.1+rocm4.2
# 1
# VEGA
# True
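To check these results programmatically, you can map the five lines printed by torch-test.py to named fields. The sketch below assumes the script's output is the last five non-empty lines of the file, in the order shown above. The function name parse_torch_test_output is hypothetical. The "+rocm" check relies on the version string format seen in the example output (1.9.1+rocm4.2).

```python
def parse_torch_test_output(text):
    """Map the last five non-empty lines of torch-test.py output to fields.

    Expected order: module path, torch version, GPU count, GPU name,
    torch.cuda.is_available() result.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 5:
        raise ValueError(f"expected at least 5 lines of output, got {len(lines)}")
    path, version, count, name, available = lines[-5:]
    return {
        "torch_path": path,
        "torch_version": version,
        "rocm_build": "+rocm" in version,  # e.g. "1.9.1+rocm4.2"
        "device_count": int(count),
        "device_name": name,
        "gpu_available": available == "True",
    }


# Usage (on the cluster, with your own job ID):
# with open("slurm-3752662.out") as f:
#     print(parse_torch_test_output(f.read()))
```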

Software installation

To learn how to install Python packages using Singularity and overlay files, read the Singularity with Miniconda page.

Note: The example above uses an overlay image that was created by installing:


pip install torch torchvision==0.10.1 -f https://download.pytorch.org/whl/rocm4.2/torch_stable.html
pip install tensorflow-rocm

Up-to-date versions of these packages can be found here.

AMD GPU Tutorials

Tutorial from NYU HPC users