LLM on HPC
Here we provide an example of how one can run a large language model (LLM) on the NYU Greene cluster.
Prepare environment
Set up a Singularity overlay environment using the instructions here (choose the 50 GB overlay file)
After launching the Singularity container in read/write mode, install the transformers library using pip
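For example, from inside the container (installing torch and numpy alongside transformers is an assumption, since the script below imports them and your overlay may not already provide them):

pip install transformers torch numpy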
Prepare script
Create a Python script containing the code from steps 1-9 below and save it in a file called huggingface.py
1. Import the necessary modules:
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModel
2. Create a list of reviews:
texts = ["How do I get a replacement Medicare card?",
         "What is the monthly premium for Medicare Part B?",
         "How do I terminate my Medicare Part B (medical insurance)?",
         "How do I sign up for Medicare?",
         "Can I sign up for Medicare Part B if I am working and have health insurance through an employer?",
         "How do I sign up for Medicare Part B if I already have Part A?"]
3. Choose the model name from Hugging Face's model hub and instantiate the model and tokenizer objects for the given model. We set output_hidden_states to True because we want the model's output to include not only the loss but also the embeddings for the sentences:
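The checkpoint below is an assumption, chosen to match the RoBERTa model and 768-dimensional embeddings mentioned in step 9; any encoder model from the hub works:

# assumed checkpoint; matches the 768-dim RoBERTa embeddings noted in step 9
model_name = "roberta-base"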
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
4. Create the ids to be used in the model using the tokenizer object. We set return_tensors to "pt" because we want PyTorch tensors of the ids:
ids = tokenizer(texts, padding=True, return_tensors="pt")
5. Set the device to cuda, and move the model and the tokenized inputs there as well. Since we will only be extracting embeddings, we will only perform a forward pass, so we set the model to evaluation mode using eval():
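A minimal sketch of this step; the CPU fallback is an assumption so the script also runs on machines without a GPU:

# pick the GPU if available, otherwise fall back to CPU (assumption)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
ids = ids.to(device)  # BatchEncoding.to() moves all input tensors at once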
model.eval()
6. Perform the forward pass and store the output in out:
with torch.no_grad():
    out = model(**ids)
7. Extract the embeddings of each review from the last layer:
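For example, via the last_hidden_state field of the model output (the variable name here is illustrative):

# hidden states of the last layer, shape (batch, sequence_length, hidden_size)
last_hidden_states = out.last_hidden_state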
8. For the purpose of classification, extract the CLS token, which is the first embedding in the embedding list for each review:
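Index 0 along the sequence dimension selects the CLS token; sentence_embeddings is an illustrative name:

# first token (CLS) of each review, shape (batch, hidden_size)
sentence_embeddings = last_hidden_states[:, 0, :]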
9. We can check the shape of the final sentence embeddings for all the reviews. The output should look like torch.Size([6, 768]), where 6 is the batch size, since we input 6 reviews in step 2, and 768 is the embedding size of the RoBERTa model used:
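Continuing with the names above:

print(sentence_embeddings.shape)  # expected: torch.Size([6, 768])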
Prepare the SBATCH file
After saving the above code in a script called huggingface.py, create a file called run.SBATCH with the following code:
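A typical preamble for run.SBATCH; the resource requests below are assumptions to adjust as needed, and the output file name matches the huggingface.out file referenced in the next step:

#!/bin/bash
#SBATCH --job-name=huggingface
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16GB
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1          # assumption: request one GPU for CUDA
#SBATCH --output=huggingface.out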
module purge
if [ -e /dev/nvidia0 ]; then nv="--nv"; fi
singularity exec $nv \
    --overlay /scratch/netID/my_env/overlay-50G-10M.ext3:rw \
    /scratch/work/public/singularity/cuda11.2.2-cudnn8-devel-ubuntu20.04.sif \
    /bin/bash -c "source /ext3/env.sh; python /scratch/netID/path-to-script/huggingface.py"
Run the run.SBATCH file
sbatch run.SBATCH

The output can be found in huggingface.out.
Acknowledgements
Instructions were developed and provided by Laiba Mehnaz, a member of AIfSR.