Installing Local Python Modules
There are several ways to install (and, where needed, compile) a Python module. This guide presents some of the options.
For this example, we will use Python 3.7.0 from the GCC + OpenMPI hierarchy. The procedure is the same for other Python 3.X.Y versions. Note that some Python versions, such as python/3.8.6, are not compiled with OpenMPI and do not require loading openmpi. See the output of 'module avail python' from the 'gcc/6.3.0, openmpi/2.0.1' module hierarchy.
Before You Start
Request a compute node:
srun --mem=8gb --pty /bin/bash
Then load Python 3.7.0:
module swap intel gcc
module load python/3.7.0
NOTE: sometimes you will need to load extra packages, in addition to Python, in order to compile certain packages. Please follow the steps at the beginning of the Software Installation guide related to selecting a module hierarchy and loading the base module.
Setting up a PYTHONUSERBASE
If you do not have one, create a directory in which to store the installed Python modules. Keep in mind that you need a separate directory for each version of Python, even when the versions share the same branch. In other words, a package installed for Python 3.6.6 may not work with Python 3.7.0.
This will be your PYTHONUSERBASE.
For this example, we will create the following PYTHONUSERBASE:
export PYTHONUSERBASE=$HOME/.usr/local/python/3.7.0
mkdir -p $PYTHONUSERBASE
This will set up our PYTHONUSERBASE.
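To see where packages will end up once PYTHONUSERBASE is set, the following minimal sketch computes the site-packages directory for a given user base. The helper name user_site_for and the home directory in the example are hypothetical; the layout it assumes (lib/pythonX.Y/site-packages under the user base) is the usual one on Linux.

```python
import posixpath
import sys


def user_site_for(userbase, version=None):
    """Return the site-packages directory that "pip install --user" would use.

    Assumes the usual Linux layout: <userbase>/lib/pythonX.Y/site-packages.
    """
    major, minor = version if version is not None else sys.version_info[:2]
    return posixpath.join(userbase, "lib", f"python{major}.{minor}", "site-packages")


if __name__ == "__main__":
    # Hypothetical home directory, used only for illustration.
    print(user_site_for("/home/abc123/.usr/local/python/3.7.0", (3, 7)))
```

Calling the helper without a version uses the version of the interpreter that runs it, which is a quick way to confirm that the loaded Python module matches the directory you created.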
Installing modules with PIP
The most common mistake that our HPC users make when installing Python packages is trying to install them the way they would on their local machines:
pip install pkg-name # This is a mistake in HPC
This will not work on our HPC cluster, since users do not have permission to install packages directly into the cluster-wide Python path.
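If you want to confirm the permission issue for yourself, a small sketch like the one below reports whether the current user can write to the site-packages directory of the running interpreter. The function name system_site_writable is hypothetical; on the cluster we would expect it to return False for the centrally installed Python.

```python
import os
import sysconfig


def system_site_writable():
    """Return True if the current user may write to the interpreter's site-packages."""
    site_packages = sysconfig.get_paths()["purelib"]
    return os.access(site_packages, os.W_OK)


if __name__ == "__main__":
    print("site-packages:", sysconfig.get_paths()["purelib"])
    print("writable by you:", system_site_writable())
```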
However, as a user, you can use pip to install extra packages in your home directory. As an example, we will install the package called twill.
First, we set the environment variable PYTHONUSERBASE. Then we install the package with pip.
export PYTHONUSERBASE=$HOME/.usr/local/python/3.7.0
pip install --user twill
This will install the package twill into the PYTHONUSERBASE directory. To check that twill has been installed, we could run the following command:
ls $PYTHONUSERBASE/lib/python3.7/site-packages
Output:
twill twill-3.0.dist-info
We can also check that twill is installed with the command:
pip freeze | grep twill # use "pip freeze --user" to list only locally installed packages; "pip list" also works
Output:
twill==3.0
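The same check can be made from inside Python. The sketch below, with a hypothetical helper named locate, reports the file a top-level package would be imported from, which also shows whether it comes from your PYTHONUSERBASE or from the central installation.

```python
import importlib.util


def locate(package_name):
    """Return the file a top-level package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(package_name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Prints a path under $PYTHONUSERBASE if twill was installed with --user,
    # or None if Python cannot find the package at all.
    print(locate("twill"))
```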
Case Study: Installing and Running PyTorch
Request a compute node and load a Python module (e.g. python/3.8.6) by following the section 'Before You Start'.
Go to the PyTorch page (https://pytorch.org/) and select PyTorch Build: Stable, Your OS: Linux, Package: pip, Language: Python, Compute Platform: CUDA 11.1. The page will generate a pip3 command similar to the one below. Run this command with the --user flag.
pip3 install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html --user
Check the PyTorch packages:
pip3 freeze --user
Output:
torch==1.9.1+cu111
torchaudio==0.9.1
torchvision==0.10.1+cu111
Follow the tutorial Deep Learning with PyTorch (https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html). You can also find the Python scripts at /usr/local/doc/PYTORCH/.
Install the dependency matplotlib:
pip3 install matplotlib --user
Request a GPU node with 1 GPU and 8gb of memory (use --x11 for GUI applications, or use the OnDemand Desktop for visualization):
srun --x11 --mem=8gb -p class --gres=gpu:1 --pty /bin/bash # for Markov Cluster
srun --x11 --mem=8gb -p gpu -C gpup100 --gres=gpu:1 --pty /bin/bash # for Rider Cluster
Load the python/3.8.6 and cuda/11.2 modules:
module swap intel gcc
module load cuda/11.2 python/3.8.6
Run the python script:
python3 <python-script>
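Before running a long tutorial script, it can help to check that PyTorch imports and sees the GPU you requested. The following is a minimal sketch, not part of the PyTorch tutorial; the function name describe_torch is hypothetical, and it falls back to a message if torch is not installed.

```python
def describe_torch():
    """Return a one-line summary of the PyTorch install and the visible GPU, if any."""
    try:
        import torch
    except ImportError:
        return "torch is not importable in this environment"
    if torch.cuda.is_available():
        return f"torch {torch.__version__}, GPU: {torch.cuda.get_device_name(0)}"
    return f"torch {torch.__version__}, no CUDA device visible"


if __name__ == "__main__":
    print(describe_torch())
```

If the summary reports no CUDA device on a GPU node, a likely cause is that the cuda module was not loaded or the job was started without --gres=gpu:1.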
Installing packages from source
Another way to download and install Python modules is from source. Some modules will gain extra performance if we compile them. For this particular example, we will compile NumPy.
Installing NumPy with pip (pip install --user numpy) on HPC is usually a bad idea, since the pre-compiled package does not offer the performance of a package compiled on the cluster. In this case, we will install NumPy 1.20.3 from source. Our Python module already includes NumPy 1.18.1, which was compiled against the BLAS and LAPACK libraries that come with the Intel Math Kernel Library (MKL), available through the module 'base'.
Download the source archive numpy-1.20.3.zip. Then extract the package and move into the source directory.
unzip numpy-1.20.3.zip
cd numpy-1.20.3
Set up your PYTHONUSERBASE variable:
export PYTHONUSERBASE=$HOME/.usr/local/python/3.7.0
Then compile the package with the following command (notice the dot at the end of the command):
pip install --user .
Output:
Installing collected packages: numpy
Running setup.py install for numpy ... done
Successfully installed numpy-1.20.3
pip will install the package for you, acting as a black box. This process can take a long time. If you want to see all the steps that pip is running, use verbose mode:
pip install -v --user .
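Once the build finishes, you may want to confirm which NumPy Python actually picks up and which BLAS/LAPACK libraries it was built against. The sketch below is one way to do that; the helper name numpy_summary is hypothetical. Run it from a directory other than the NumPy source tree, since importing NumPy from inside its own source directory generally fails.

```python
def numpy_summary():
    """Return the version and location of the NumPy that Python imports, or None."""
    try:
        import numpy
    except ImportError:
        return None
    return {"version": numpy.__version__, "location": numpy.__file__}


if __name__ == "__main__":
    info = numpy_summary()
    print(info)
    if info is not None:
        import numpy

        # Prints the build configuration, including BLAS/LAPACK details.
        numpy.show_config()
```

A location under your PYTHONUSERBASE indicates that the locally compiled copy is being used instead of the one bundled with the Python module.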
Using The Installed Modules
If your package is just a library that you run within Python, then just setting up the environment variable PYTHONUSERBASE is enough. However, if your package also includes binaries, you need to include those in the path.
The best way to set these variables up is by using a module file.
For our local directory of packages, we will create a module file that will set those variables for us. We will call the module "python-modules" and we will set the version of the module to the version of Python we are using.
We need to create the following directory (just once):
PYTHONMODULES=$HOME/.usr/local/share/modulefiles/python-modules
mkdir -p $PYTHONMODULES
cd $PYTHONMODULES
We will then create a file named 3.7.0-gcc.lua with the following content:
-- This is a Lua module file for our local Python
-- modules.
--
-- To use it just run
-- module load python-modules/3.7.0-gcc
--
load("gcc/6.3.0","openmpi/2.0.1","python/3.7.0")
pushenv("PYTHONUSERBASE",pathJoin(os.getenv("HOME"),".usr/local/python/3.7.0"))
prepend_path("PATH",pathJoin(os.getenv("HOME"),".usr/local/python/3.7.0/bin"))
To use the module just run
module load python-modules/3.7.0-gcc
The PYTHONUSERBASE directory, when using this template, will be /home/<case_ID>/.usr/local/python/3.7.0
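After loading the module, a quick way to confirm that both variables were set is to check that the bin directory under PYTHONUSERBASE appears in PATH. The sketch below assumes a colon-separated PATH as on Linux; the function name user_bin_on_path is hypothetical.

```python
import os
import posixpath


def user_bin_on_path(userbase, path_value):
    """Return True if <userbase>/bin is one of the entries in a PATH-style string."""
    wanted = posixpath.normpath(posixpath.join(userbase, "bin"))
    entries = [posixpath.normpath(p) for p in path_value.split(":") if p]
    return wanted in entries


if __name__ == "__main__":
    base = os.environ.get("PYTHONUSERBASE")
    if base is None:
        print("PYTHONUSERBASE is not set; is the module loaded?")
    else:
        print(user_bin_on_path(base, os.environ.get("PATH", "")))
```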
Using Virtual Environments
Using a virtual environment is the simplest approach. However, it may take up a significant part of your disk quota. This topic is covered in our guide for Python Virtual Environments.
Pip install with PyPI
For information about using pip install with PyPI, visit https://packaging.python.org/en/latest/tutorials/installing-packages/#installing-from-pypi
New Version of Python
If you need a newer version of Python that is not available as a module on HPC, you should be able to build Python from source (https://devguide.python.org/getting-started/setup-building/). The Python source releases are available at https://www.python.org/downloads/source/. You just need to set the installation prefix to a directory you can write to. For example:
./configure --prefix=$HOME/csds312
However, use the latest existing Python module whenever possible.