Containerized Applications
Running TensorFlow in HPC
Copy the tensorflow files to your home directory and cd to it:
cp -r /usr/local/doc/SINGULARITY/singularity/tensorflow .
cd tensorflow
Interactive job submission
Request a GPU node with 8 GB of memory:
srun -p gpu -C gpup100 --gres=gpu:1 --mem=8gb --pty bash
Load the Singularity module
module load singularity
Run the Python matrix multiplication code:
singularity exec -B /scratch --nv $TENSORFLOW python log-device-placement.py
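For reference, log-device-placement.py follows the classic TensorFlow 1.x device-placement example and looks roughly like the sketch below; the course copy may differ.

import tensorflow as tf

# Two constant matrices; log_device_placement=True makes TensorFlow
# report the device (CPU or GPU) that each operation is placed on.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))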
Output:
2019-05-07 13:54:21.959086: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-07 13:54:22.117470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla P100-PCIE-12GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:03:00.0
totalMemory: 11.91GiB freeMemory: 11.63GiB
2019-05-07 13:54:22.117517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-05-07 13:54:22.714286: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-07 13:54:22.714336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-05-07 13:54:22.714345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
...
2019-05-07 13:54:22.716203: I tensorflow/core/common_runtime/placer.cc:927] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
[49. 64.]]
Batch job submission
Find the tensor.slurm job file in the tensorflow directory and submit the job:
sbatch tensor.slurm
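For reference, tensor.slurm is likely similar to the following sketch, which reproduces the interactive steps above; the provided file may differ.

#!/bin/bash
#SBATCH -p gpu
#SBATCH -C gpup100
#SBATCH --gres=gpu:1
#SBATCH --mem=8gb

module load singularity
singularity exec -B /scratch --nv $TENSORFLOW python log-device-placement.py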
Check the output file:
cat slurm-<jobid>.out
You should see the same output as in the interactive run.
Running RAPIDS in HPC
RAPIDS accelerates the complete data science pipeline, from data ingestion and manipulation to machine learning training.
It utilizes NVIDIA CUDA and exposes GPU parallelism and high memory bandwidth through user-friendly Python interfaces modeled on pandas, scikit-learn, and similar libraries.
With Apache Spark or Dask, RAPIDS can scale out to multi-node, multi-GPU clusters.
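As a minimal sketch of that drop-in style (assuming a working RAPIDS environment, such as the container used below), cuDF can stand in for pandas with the data held in GPU memory:

import cudf

# cuDF mirrors the pandas DataFrame API; the data and the
# groupby computation live on the GPU.
df = cudf.DataFrame({"key": ["a", "b", "a", "b"],
                     "val": [1.0, 2.0, 3.0, 4.0]})
print(df.groupby("key").mean())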
Scikit-learn vs. cuML using RAPIDS
Access the Markov Desktop (Interactive Apps) from ondemand.case.edu (with a GPU allocation)
Get the Jupyter notebook “kmeans_demo.ipynb” from the RAPIDS AI GitHub page, or copy it from /usr/local/doc/SINGULARITY/singularity/rapids:
cp /usr/local/doc/SINGULARITY/singularity/rapids/kmeans_demo.ipynb .
Load the Singularity module and (optionally) download the latest RAPIDS AI container
module load singularity
(Optional) If you want a more recent version of the image than the existing one, pull the container. Make sure you use storage space other than your home directory to avoid a quota violation:
singularity pull docker://rapidsai/rapidsai:latest
Add this environment variable (variables prefixed with SINGULARITYENV_ are passed into the container; TINI_SUBREAPER lets the container's tini init process reap children even when it is not PID 1):
export SINGULARITYENV_TINI_SUBREAPER=1
Run the RAPIDS AI container.
Its path is provided in the environment variable $RAPIDSAI (check with "module display singularity"):
singularity run --nv -B /mnt $RAPIDSAI
Open Jupyter Lab
/conda/envs/rapids/bin/jupyter-lab --allow-root --ip=0.0.0.0 &
You will be prompted to copy and paste one of the URLs.
Open the Firefox browser on the same node and type (or simply paste the link address of) one of the URLs into the browser:
http://classt01:8889/?token=xxxx
Start executing the Python commands in the Jupyter notebook.
Although both methods find the same centroids (within a threshold), cuML runs much faster.
The graph shows the results: blue filled circles for scikit-learn and red circles for cuML.
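The heart of the comparison can be sketched as follows (a hypothetical, simplified version; the notebook's actual data sizes and plotting code differ):

import time
import numpy as np
from sklearn.cluster import KMeans as skKMeans
from sklearn.datasets import make_blobs
from cuml.cluster import KMeans as cuKMeans

# Synthetic clustering problem generated on the host (sizes are illustrative).
X, _ = make_blobs(n_samples=100000, n_features=2, centers=5, random_state=0)
X = X.astype(np.float32)

# CPU: scikit-learn
t0 = time.time()
sk_model = skKMeans(n_clusters=5, random_state=0).fit(X)
print("scikit-learn: %.2f s" % (time.time() - t0))

# GPU: cuML (accepts host NumPy arrays and copies them to the GPU)
t0 = time.time()
cu_model = cuKMeans(n_clusters=5, random_state=0).fit(X)
print("cuML:         %.2f s" % (time.time() - t0))

# Both sets of centroids should agree within a small tolerance.
print(sk_model.cluster_centers_)
print(cu_model.cluster_centers_)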