Installing Local Python Modules

There are several ways to install a Python module. This guide presents some of the options.

For this example, we will use Python 3.7.0 from the GCC + OpenMPI hierarchy. The procedure is the same for other Python 3.X.Y versions. Note that some Python versions, such as python/3.8.6, are not compiled with OpenMPI and do not require loading openmpi. See the output of 'module avail python' from the 'gcc/6.3.0, openmpi/2.0.1' module hierarchy.

Before You Start

Request a compute node:

srun --mem=8gb --pty /bin/bash

Then load Python 3.7.0:

module swap intel gcc

module load python/3.7.0

NOTE: Sometimes you will need to load extra modules, in addition to Python, in order to compile certain packages. Please follow the steps at the beginning of the Software Installation guide for selecting a module hierarchy and loading the base module.

Setting up a PYTHONUSERBASE

If you do not have one, create a directory in which to store the installed Python modules. Keep in mind that you will need a different directory for each version of Python, even if the versions share the same branch. In other words, a package installed for Python 3.6.6 may not work with Python 3.7.0.

This will be your PYTHONUSERBASE.

For this example, we will create the following PYTHONUSERBASE:

export PYTHONUSERBASE=$HOME/.usr/local/python/3.7.0

mkdir -p $PYTHONUSERBASE

This will set up our PYTHONUSERBASE.
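
As a quick sanity check, the following minimal sketch shows where Python is expected to look for user-installed packages under a given PYTHONUSERBASE. The helper name expected_user_site is hypothetical, and the lib/pythonX.Y/site-packages layout is the standard Linux user scheme, assumed here rather than taken from the cluster configuration. Run as a script, it compares that expectation with what the interpreter itself reports.

```python
import posixpath
import site
import sys


def expected_user_site(userbase, version=None):
    """Return the user site-packages path for a PYTHONUSERBASE on Linux.

    Assumes the standard Linux layout: <userbase>/lib/pythonX.Y/site-packages.
    """
    if version is None:
        version = sys.version_info
    major, minor = version[0], version[1]
    return posixpath.join(
        userbase, "lib", "python{}.{}".format(major, minor), "site-packages"
    )


if __name__ == "__main__":
    # site.getuserbase() honors PYTHONUSERBASE when it is set.
    base = site.getuserbase()
    print("User base:         ", base)
    print("Expected user site:", expected_user_site(base))
    print("Reported user site:", site.getusersitepackages())
```

If the two paths printed at the end differ, the PYTHONUSERBASE variable was probably not exported in the current shell.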

Installing modules with PIP

The most common mistake that our HPC users make when installing Python packages is trying to install a package the way they would on their local machine:

pip install pkg-name # This is a mistake in HPC

This will not work on our HPC cluster, since users do not have permission to install packages directly into our Python path.

However, as a user, you can use PIP to install extra packages in your local home directory. As an example, we will install the package called twill.

First, we set the environment variable PYTHONUSERBASE. Then we install the package with pip.

export PYTHONUSERBASE=$HOME/.usr/local/python/3.7.0

pip install --user twill

This will install the package twill into the PYTHONUSERBASE directory. To check that twill has been installed, we could run the following command:

ls $PYTHONUSERBASE/lib/python3.7/site-packages

Output:

twill twill-3.0.dist-info

We can also check that twill is installed with the command:

pip freeze | grep twill # use "pip freeze --user" to list only locally installed packages; you can also use "pip list"

Output:

twill==3.0
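
The same check can be made from inside Python, which also confirms that the interpreter will actually pick the package up from PYTHONUSERBASE. The sketch below uses the standard importlib.util.find_spec call; the helper name locate_package is hypothetical.

```python
import importlib.util


def locate_package(name):
    """Return the file that 'import name' would load, or None if not found.

    Namespace packages have no single file, so their origin may also be None.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # "twill" is the example package installed above.
    location = locate_package("twill")
    if location is None:
        print("twill is not importable; check PYTHONUSERBASE")
    else:
        print("twill will be imported from", location)
```

The printed location should start with your PYTHONUSERBASE directory if the user installation is the one being used.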

Case Study: Installing and Running PyTorch 

Request a compute node and load a Python module (e.g. python/3.8.6), following the section 'Before You Start'.

Go to the PyTorch page (https://pytorch.org/) and select PyTorch Build: Stable, Your OS: Linux, Package: Pip, Language: Python, Compute Platform: CUDA 11.1. This generates a pip3 command similar to the one below. Run this command with the --user flag.

pip3 install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html --user

Check the PyTorch packages:

pip3 freeze --user

Output:

torch==1.9.1+cu111

torchaudio==0.9.1

torchvision==0.10.1+cu111


Follow the tutorial Deep Learning with PyTorch (https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html). You can also find the Python scripts at /usr/local/doc/PYTORCH/.

Install the dependency matplotlib:

pip3 install matplotlib --user

Request a GPU node with 1 GPU (use --x11 for a GUI, or use the OnDemand Desktop with 8gb of memory for visualization):

srun --x11 --mem=8gb -p class --gres=gpu:1 --pty /bin/bash # for Markov Cluster 

srun --x11 --mem=8gb -p gpu -C gpup100 --gres=gpu:1 --pty /bin/bash # for Rider Cluster

Load the python/3.8.6 and cuda/11.2 modules:

module swap intel gcc

module load cuda/11.2  python/3.8.6

Run the python script:

python3 <python-script>
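
Before running a long tutorial script, it can help to confirm that the installed PyTorch sees the GPU allocated to the job. The following is a minimal sketch of such a check; the function name gpu_report is hypothetical, and the script could be saved under any name (for example check_gpu.py, also an assumption) and run with python3 on the GPU node.

```python
def gpu_report():
    """Return a few lines describing the installed torch and the visible GPU."""
    try:
        import torch
    except ImportError:
        return ["torch is not importable; check PYTHONUSERBASE and the loaded modules"]

    lines = ["torch version: {}".format(torch.__version__)]
    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version the wheel was built for.
        lines.append("CUDA version used by torch: {}".format(torch.version.cuda))
        lines.append("GPU 0: {}".format(torch.cuda.get_device_name(0)))
    else:
        lines.append("No CUDA device visible; was the job started with --gres=gpu:1?")
    return lines


if __name__ == "__main__":
    print("\n".join(gpu_report()))
```

If no CUDA device is reported, check that you are on a GPU node and that the cuda module is loaded before debugging the script itself.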

Installing packages from source

Another way to install Python modules is to build them from source. Some modules will gain extra performance if we compile them. For this particular example, we will compile NumPy.

Installing NumPy with pip (pip install --user numpy) on HPC is usually a bad idea, since the pre-compiled package does not offer the performance that a locally compiled package can. In this case, we will install NumPy 1.20.3 from source. Our Python module already includes NumPy 1.18.1, which was compiled using the BLAS and LAPACK libraries that come with the Intel Math Kernel Library (MKL), available through the module 'base'.

Download the source archive, numpy-1.20.3.zip. Then extract the package and move into the source directory.

unzip numpy-1.20.3.zip

cd numpy-1.20.3

Set up your PYTHONUSERBASE variable:

export PYTHONUSERBASE=$HOME/.usr/local/python/3.7.0

Then compile the package with the following command (notice the dot at the end of the command):

pip install --user .

Output:

Installing collected packages: numpy

  Running setup.py install for numpy ... done

Successfully installed numpy-1.20.3

pip will install the package for you, acting as a black box. This process can take a long time. If you want to see everything that pip is running, use verbose mode:

pip install -v --user .
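
Once the build finishes, it is worth confirming which NumPy the interpreter actually imports and which BLAS/LAPACK libraries it was built against. The sketch below is one way to do that; the helper name numpy_build_info is hypothetical. Run it from outside the numpy-1.20.3 source directory, since NumPy refuses to be imported from its own source tree.

```python
def numpy_build_info():
    """Return (version, install_path) for the NumPy that Python imports, or None."""
    try:
        import numpy
    except ImportError:
        return None
    return numpy.__version__, numpy.__file__


if __name__ == "__main__":
    info = numpy_build_info()
    if info is None:
        print("NumPy is not importable")
    else:
        print("NumPy {} loaded from {}".format(*info))
        import numpy

        # Prints the BLAS/LAPACK libraries NumPy was built against.
        numpy.show_config()
```

If the reported version is still 1.18.1, the interpreter is using the NumPy bundled with the Python module, which usually means PYTHONUSERBASE is not set in the current shell.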

Using The Installed Modules

If your package is just a library that you run within Python, then just setting up the environment variable PYTHONUSERBASE is enough. However, if your package also includes binaries, you need to include those in the path.

The best way to set these variables up is by using a module file.

For our local directory of packages, we will create a module file that will set those variables for us. We will call the module "python-modules" and we will set the version of the module to the version of Python we are using.

We need to create the following directory (just once):

PYTHONMODULES=$HOME/.usr/local/share/modulefiles/python-modules

mkdir -p $PYTHONMODULES

cd $PYTHONMODULES

We will then create a file named 3.7.0-gcc.lua with the following content:

-- This is a Lua module file for our local Python

-- modules.

--

-- To use it just run

--    module load python-modules/3.7.0-gcc

--

load("gcc/6.3.0","openmpi/2.0.1","python/3.7.0")

pushenv("PYTHONUSERBASE",pathJoin(os.getenv("HOME"),".usr/local/python/3.7.0"))

prepend_path("PATH",pathJoin(os.getenv("HOME"),".usr/local/python/3.7.0/bin"))

To use the module, just run:

module load python-modules/3.7.0-gcc

The PYTHONUSERBASE directory, when using this template, will be /home/<case_ID>/.usr/local/python/3.7.0
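
To confirm that the module file took effect, you can check from Python that PYTHONUSERBASE is set and that its bin directory is on PATH. This is a minimal sketch under the assumption that the module was loaded in the same shell; the helper name userbase_bin_on_path is hypothetical.

```python
import os


def userbase_bin_on_path(userbase, path):
    """Return True if <userbase>/bin is one of the entries in a PATH string."""
    wanted = os.path.normpath(os.path.join(userbase, "bin"))
    entries = [os.path.normpath(p) for p in path.split(os.pathsep) if p]
    return wanted in entries


if __name__ == "__main__":
    base = os.environ.get("PYTHONUSERBASE")
    if base is None:
        print("PYTHONUSERBASE is not set; is python-modules loaded?")
    else:
        ok = userbase_bin_on_path(base, os.environ.get("PATH", ""))
        print("PYTHONUSERBASE:", base)
        print("bin directory on PATH:", ok)
```

Executables installed by pip into the user base can only be run by name when the second line prints True.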

Using Virtual Environments

This is the simplest way. However, it may take up a significant part of your disk quota. This topic is covered in our guide for Python Virtual Environments.

Pip install with PyPI

For information about using pip install with PyPI, visit https://packaging.python.org/en/latest/tutorials/installing-packages/#installing-from-pypi

New Version of Python

If you need a newer version of Python than those available as modules on HPC, you should be able to install it from source (https://devguide.python.org/getting-started/setup-building/). The Python source releases are available at https://www.python.org/downloads/source/. You just need to set the installation prefix. For example:


./configure --prefix=$HOME/csds312


However, use the latest existing Python module whenever possible.