LLM on HPC
Here we provide an example of how one can run a large language model (LLM) on the NYU Greene cluster.
Prepare environment
Set up a Singularity overlay environment using the instructions here (choose the 50 GB overlay file)
After launching the Singularity container in read/write mode, install the transformers library using pip
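For example, from inside the container (installing torch and numpy alongside transformers is an assumption, since the script below imports them and your overlay may not already provide them):

pip install transformers torch numpy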
Prepare script
Create a Python script containing the code from steps 1-9 below and save it in a file called huggingface.py
1. Import the necessary modules:
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModel
2. Create a list of reviews:
texts = ["How do I get a replacement Medicare card?",
         "What is the monthly premium for Medicare Part B?",
         "How do I terminate my Medicare Part B (medical insurance)?",
         "How do I sign up for Medicare?",
         "Can I sign up for Medicare Part B if I am working and have health insurance through an employer?",
         "How do I sign up for Medicare Part B if I already have Part A?"]
3. Choose the model name from Hugging Face's model hub and instantiate the model and tokenizer objects for the given model. We set output_hidden_states to True because we want the model's output to include not only the loss but also the embeddings for the sentences:
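The checkpoint below is an assumption, chosen to match the RoBERTa model and 768-dimensional embeddings mentioned in step 9; any encoder model from the hub works:

# assumed checkpoint; matches the 768-dim RoBERTa embeddings noted in step 9
model_name = "roberta-base"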
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
4. Create the ids to be used in the model using the tokenizer object. We set return_tensors to "pt" because we want PyTorch tensors of the ids:
ids = tokenizer(texts, padding=True, return_tensors="pt")
5. Set the device to cuda, and move the model and the tokenized inputs there as well. Since we will only be extracting embeddings, we will only perform a forward pass, so we set the model to evaluation mode using eval():
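A minimal sketch of this step; the CPU fallback is an assumption so the script also runs on machines without a GPU:

# pick the GPU if available, otherwise fall back to CPU (assumption)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
ids = ids.to(device)  # BatchEncoding.to() moves all input tensors at once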
model.eval()
6. Perform the forward pass and store the output in out:
with torch.no_grad():
    out = model(**ids)
7. Extract the embeddings of each review from the last layer:
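For example, via the last_hidden_state field of the model output (the variable name here is illustrative):

# hidden states of the last layer, shape (batch, sequence_length, hidden_size)
last_hidden_states = out.last_hidden_state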
8. For the purpose of classification, extract the CLS token, which is the first embedding in the embedding list for each review:
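Index 0 along the sequence dimension selects the CLS token; sentence_embeddings is an illustrative name:

# first token (CLS) of each review, shape (batch, hidden_size)
sentence_embeddings = last_hidden_states[:, 0, :]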
9. We can check the shape of the final sentence embeddings for all the reviews. The output should look like torch.Size([6, 768]), where 6 is the batch size, since we input 6 reviews in step 2, and 768 is the embedding size of the RoBERTa model used:
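Continuing with the names above:

print(sentence_embeddings.shape)  # expected: torch.Size([6, 768])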
Prepare the SBATCH file
After saving the above code in a script called huggingface.py, create a file called run.SBATCH with the following code:
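A typical preamble for run.SBATCH; the resource requests below are assumptions to adjust as needed, and the output file name matches the huggingface.out file referenced in the next step:

#!/bin/bash
#SBATCH --job-name=huggingface
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16GB
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1          # assumption: request one GPU for CUDA
#SBATCH --output=huggingface.out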
module purge
if [ -e /dev/nvidia0 ]; then nv="--nv"; fi
singularity exec $nv \
    --overlay /scratch/netID/my_env/overlay-50G-10M.ext3:rw \
    /scratch/work/public/singularity/cuda11.2.2-cudnn8-devel-ubuntu20.04.sif \
    /bin/bash -c "source /ext3/env.sh; python /scratch/netID/path-to-script/huggingface.py"
Run the run.SBATCH file
sbatch run.SBATCH

The output can be found in huggingface.out.
Acknowledgements
Instructions were developed and provided by Laiba Mehnaz, a member of AIfSR.