Getting Started with AMD Nodes
Getting Maximum Performance from AMD GPUs
AMD nodes are dedicated to AI/GPU workloads. Do not run CPU-only jobs on AMD nodes.
Using AMD nodes
To work with AMD nodes, log in to Greene. Please refer to the Accessing HPC page for other access options.
SSH using Terminal or Command Prompt
Simply open a terminal (Linux, Mac) or the Command Prompt (Windows 10) and enter the commands:
ssh <NetID>@gw.hpc.nyu.edu   ## you can skip this step if you are on the NYU network or using the NYU VPN
ssh <NetID>@greene.hpc.nyu.edu
NOTE: When you are asked for your password, just type it (characters will not be displayed) and then hit "Enter".
Submitting a Job to AMD nodes
You submit a job to the AMD GPUs the same way you would on Greene, with one major change: specify --gres=gpu:mi50:1 (or another appropriate AMD GPU model).
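For example, an interactive session on an AMD node can be requested like this (a sketch only: it assumes one MI50 GPU; adjust CPUs, memory, and time to your needs):
srun --nodes=1 --cpus-per-task=1 --mem=8GB --time=1:00:00 \
    --gres=gpu:mi50:1 --pty /bin/bash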
Use Singularity images containing pre-installed software. You can use an overlay to install additional packages, as described here.
Software/Modules
The modules installed on the cluster are built for Intel CPUs, not AMD, so do not use them.
We recommend using Singularity and setting up an Anaconda environment to manage your packages. For information on setting up a conda environment please see our reference page.
The usable Singularity containers are located at /scratch/work/public/singularity/hudson/images and begin with "rocm4.X". The other Singularity containers there are compiled for NVIDIA GPUs and should not be used.
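To see which ROCm images are currently available, you can list that directory (the exact output will vary as images are updated):
ls /scratch/work/public/singularity/hudson/images/ | grep rocm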
To learn how to install Python packages using Singularity and overlay files, please read the Singularity with Miniconda page.
Python packages
Use versions of packages that are designed for AMD (ROCm).
When installing, make sure the installation process compiles packages from source rather than installing a precompiled wheel or conda package.
As an example, users of TensorFlow or PyTorch can install the following packages:
pip install torch torchvision==0.10.1 -f https://download.pytorch.org/whl/rocm4.2/torch_stable.html
pip install tensorflow-rocm
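To confirm that a ROCm build of PyTorch was installed, you can check the HIP version it reports (a quick sanity check; the exact version string depends on the wheel you installed, and torch.version.hip is None in non-ROCm builds):
python -c "import torch; print(torch.__version__, torch.version.hip)"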
Minimal PyTorch Example
For a simple example, you can use the overlay image with ROCm versions of PyTorch and TensorFlow installed: overlay-10GB-400K-rocm-tensorflow-pytorch.ext3
OVERLAY_FILE=/scratch/work/public/examples/amd-getting-started/overlay-10GB-400K-rocm-tensorflow-pytorch.ext3
SINGULARITY_IMAGE=/scratch/work/public/singularity/hudson/images/rocm4.2-ubuntu20.04.sif
singularity exec --overlay $OVERLAY_FILE $SINGULARITY_IMAGE /bin/bash
source /ext3/miniconda3/bin/activate
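Once inside the container with the environment activated, a quick check (run on a GPU node) confirms that PyTorch can see the AMD GPU:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"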
Using your Singularity Container in a SLURM Batch Job
Below is an example of how to call a Python script, in this case torch-test.py, from a SLURM batch job using your new Singularity image.
torch-test.py: (for convenience it is available at /scratch/work/public/examples/amd-getting-started/torch-test.py )
#!/bin/env python
import torch

print(torch.__file__)
print(torch.__version__)

# How many GPUs are there?
print(torch.cuda.device_count())

# Get the name of the current GPU
print(torch.cuda.get_device_name(torch.cuda.current_device()))

# Is PyTorch using a GPU?
print(torch.cuda.is_available())
Now we will write the SLURM job script, run-test.SBATCH, that will start our Singularity Image and call the torch-test.py script.
run-test.SBATCH: (for convenience it is available at /scratch/work/public/examples/amd-getting-started/run-test.SBATCH )
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=1:00:00
#SBATCH --mem=2GB
#SBATCH --gres=gpu:mi50:1   ## USE HUDSON CLUSTER
#SBATCH --job-name=torch
module purge
singularity exec \
    --overlay /scratch/work/public/examples/amd-getting-started/overlay-10GB-400K-rocm-tensorflow-pytorch.ext3:ro \
    /scratch/work/public/singularity/hudson/images/rocm4.2-ubuntu20.04.sif \
    /bin/bash -c "source /ext3/env.sh; \
    python /scratch/work/public/examples/amd-getting-started/torch-test.py"
Run the run-test.SBATCH script
sbatch /scratch/work/public/examples/amd-getting-started/run-test.SBATCH
Check your SLURM output for the results; an example is shown below.
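While the job is pending or running, you can check its status with a standard SLURM command, for example:
squeue -u $USER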
cat slurm-3752662.out
# example output:
# /ext3/miniconda3/lib/python3.8/site-packages/torch/__init__.py
# 1.9.1+rocm4.2
# 1
# VEGA
# True
Software installation
To learn how to install Python packages using Singularity and overlay files, please read the Singularity with Miniconda page.
Note: the example above uses an overlay image that was created by installing:
pip install torch torchvision==0.10.1 -f https://download.pytorch.org/whl/rocm4.2/torch_stable.html
pip install tensorflow-rocm
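If you need additional packages, a typical approach is to launch the container with your own writable copy of an overlay mounted read-write and install into it. A sketch is shown below; my-overlay.ext3 is a placeholder for your own overlay copy, and the /ext3/env.sh activation script assumes the overlay was set up as in the example above (see the Singularity with Miniconda page for the full procedure):
singularity exec --overlay my-overlay.ext3:rw \
    /scratch/work/public/singularity/hudson/images/rocm4.2-ubuntu20.04.sif /bin/bash
source /ext3/env.sh        ## activate the conda environment inside the overlay
pip install tensorflow-rocm   ## or any other ROCm-compatible package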
You can find up-to-date versions of these packages here:
PyTorch (go to Install section, choose Linux - ROCm)
AMD GPU Tutorials
AMD Tutorials (COVID-19 HPC Fund - Hackathon Trainings)