Here we provide an example of how to run a large language model (LLM) on the NYU Greene cluster
Install Singularity using the instructions here (choose the 50 GB overlay file)
After launching the Singularity container in read/write mode, install the transformers library using pip
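Inside the running container, the installation is a single pip command (package names are the standard PyPI ones; torch is included here because the script below uses it):

```shell
# Run inside the Singularity container launched in read/write mode.
pip install transformers torch
```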
Create a Python script using the following code from sections 1-9 and save it in a file called huggingface.py
Importing the necessary modules:
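Assuming the script uses PyTorch and the transformers library (as the later steps imply), the import section might look like:

```python
# Section 1: libraries used throughout the script.
import torch
from transformers import AutoModel, AutoTokenizer
```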
Create a list of reviews
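A toy batch of six reviews (the text here is made up; substitute your own data). Six items are used so the shapes match the check described in the final step:

```python
# Section 2: a small batch of example reviews (placeholder text).
reviews = [
    "The movie was absolutely wonderful.",
    "I did not enjoy the plot at all.",
    "Great acting, but the pacing dragged.",
    "A masterpiece from start to finish.",
    "The soundtrack was forgettable.",
    "Would happily watch this again.",
]
```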
Choose the model name from Hugging Face's Model Hub and instantiate the tokenizer and model objects for that model. We set output_hidden_states to True because we want the output to contain not only the loss but also the embeddings for the sentences.
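A sketch of this step; roberta-base is an illustrative choice of checkpoint (any Hub model works, though the output shape reported in the final step matches RoBERTa-base):

```python
from transformers import AutoModel, AutoTokenizer

# "roberta-base" is an illustrative checkpoint from the Model Hub.
model_name = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# output_hidden_states=True makes the model return the hidden states
# (embeddings) from every layer in addition to the usual outputs.
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
```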
Create the input IDs for the model using the tokenizer object. We set return_tensors to "pt" because we want the IDs returned as PyTorch tensors.
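For example (again with roberta-base as a stand-in checkpoint; padding=True pads the batch to a common length so the tensors are rectangular):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")  # illustrative checkpoint
reviews = ["A short example review.", "Another, slightly longer example review."]
# return_tensors="pt" yields PyTorch tensors instead of Python lists.
ids = tokenizer(reviews, padding=True, truncation=True, return_tensors="pt")
```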
Set the device to cuda, and move the model and the tokenized inputs to the GPU as well. Since we will only be extracting embeddings, we perform just a forward pass of the model, so we set the model to evaluation mode using eval():
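A sketch of the device setup (the CPU fallback is only there so the snippet also runs on machines without a GPU; on Greene you would request a GPU node and land on cuda):

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("roberta-base", output_hidden_states=True)
# Prefer the GPU when one is available; otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()  # evaluation mode: disables dropout for a deterministic forward pass
```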
Performing the forward pass and storing the output tuple in out:
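The forward pass, sketched with a one-review batch (roberta-base is again an assumed checkpoint); torch.no_grad() skips gradient bookkeeping since we never backpropagate:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base", output_hidden_states=True).eval()
ids = tokenizer(["An example review."], return_tensors="pt")
with torch.no_grad():  # no gradients needed when only extracting embeddings
    out = model(**ids)
```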
Extracting the embeddings of each review from the last layer:
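Continuing the same setup, the last layer's embeddings are the final element of the hidden_states tuple (one tensor per layer, plus the input embeddings at index 0):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base", output_hidden_states=True).eval()
ids = tokenizer(["An example review."], return_tensors="pt")
with torch.no_grad():
    out = model(**ids)
# hidden_states[-1] is the output of the final layer,
# shaped (batch_size, sequence_length, hidden_size).
last_hidden = out.hidden_states[-1]
```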
For the purpose of classification, we extract the CLS token, which is the first embedding in the embedding sequence of each review:
We can check the shape of the final sentence embeddings for all the reviews. The output should look like torch.Size([6, 768]), where 6 is the batch size (we input 6 reviews in step 2b) and 768 is the hidden size of the RoBERTa model used.
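Putting the CLS extraction and the shape check together (review text and the roberta-base checkpoint are illustrative; the shape matches the one described above):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base", output_hidden_states=True).eval()
reviews = [f"Placeholder review number {i}." for i in range(6)]  # 6 reviews, as in step 2b
ids = tokenizer(reviews, padding=True, return_tensors="pt")
with torch.no_grad():
    out = model(**ids)
last_hidden = out.hidden_states[-1]
# The CLS token sits at position 0 of every sequence.
sentence_embeddings = last_hidden[:, 0, :]
print(sentence_embeddings.shape)  # torch.Size([6, 768])
```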
After saving the above code in a script called huggingface.py, create a file called run.SBATCH with the following code:
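A typical Greene batch script looks like the sketch below; the resource requests, overlay path, and container image are placeholders that you must adapt to your own account and the overlay file you created earlier:

```shell
#!/bin/bash
#SBATCH --job-name=huggingface
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16GB
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --output=huggingface.out

# Paths below are illustrative; substitute your own overlay and image files.
singularity exec --nv \
    --overlay /scratch/<NetID>/overlay-50G-10GB.ext3:ro \
    /scratch/work/public/singularity/<cuda-image>.sif \
    /bin/bash -c "python huggingface.py"
```

Submit it with `sbatch run.SBATCH`; the `--output` line is what directs the job's output to huggingface.out, as noted below.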
The output can be found in huggingface.out
Instructions are developed and provided by Laiba Mehnaz, a member of AIfSR