Containerized Applications

Running TensorFlow in HPC

Copy the TensorFlow example files to your home directory and change into the new directory:

cp -r /usr/local/doc/SINGULARITY/singularity/tensorflow .

cd tensorflow

Interactive job submission

Request a GPU node with 8 GB of memory

srun -p gpu -C gpup100 --gres=gpu:1 --mem=8gb --pty bash

Load the Singularity module

module load singularity

Run the Python matrix multiplication code

singularity exec -B /scratch --nv $TENSORFLOW python log-device-placement.py

Output:

2019-05-07 13:54:21.959086: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

2019-05-07 13:54:22.117470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:

name: Tesla P100-PCIE-12GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285

pciBusID: 0000:03:00.0

totalMemory: 11.91GiB freeMemory: 11.63GiB

2019-05-07 13:54:22.117517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0

2019-05-07 13:54:22.714286: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:

2019-05-07 13:54:22.714336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0

2019-05-07 13:54:22.714345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N

...

2019-05-07 13:54:22.716203: I tensorflow/core/common_runtime/placer.cc:927] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0

[[22. 28.]
 [49. 64.]]
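The last two lines of the output are the matrix product itself. As a quick sanity check that does not need the container or a GPU, the same product can be reproduced in plain Python. This is only a sketch: the input matrices below are an assumption (the contents of log-device-placement.py are not shown here), chosen because they reproduce the result printed above, and the helper name matmul is illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# Assumed inputs (not taken from the script): a 2x3 and a 3x2 matrix.
a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
b = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]

product = matmul(a, b)
print(product)  # [[22.0, 28.0], [49.0, 64.0]]
```

If the job prints a different matrix, the script's inputs differ from the ones assumed here; the device placement lines are still the part to check for GPU use.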

Batch job submission

Find the tensor.slurm job file in the tensorflow directory and submit the job:

sbatch tensor.slurm

Check the output file:

cat slurm-<jobid>.out

The file should contain the same output as the interactive run.
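On success, sbatch prints a line of the form "Submitted batch job <jobid>", and by default Slurm writes the job's output to slurm-<jobid>.out in the submission directory. A minimal sketch of deriving the file name from that message follows; the function name and the job ID 123456 are hypothetical, and the sketch assumes tensor.slurm does not override the output file name.

```python
import re


def slurm_output_file(sbatch_stdout):
    """Return the default Slurm output file name for a submitted job."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError("no job ID found in sbatch output")
    return f"slurm-{match.group(1)}.out"


# Hypothetical sbatch message; a real job ID will differ.
print(slurm_output_file("Submitted batch job 123456"))  # slurm-123456.out
```

In a real session the input string would be the text sbatch printed to the terminal when the job was submitted.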


Running RAPIDS in HPC


scikit-learn vs. cuML using RAPIDS


cp /usr/local/doc/SINGULARITY/singularity/rapids/kmeans_demo.ipynb .

module load singularity

singularity pull docker://rapidsai/rapidsai:latest

export SINGULARITYENV_TINI_SUBREAPER=1  

singularity run --nv -B /mnt $RAPIDSAI

/conda/envs/rapids/bin/jupyter-lab --allow-root --ip=0.0.0.0 &

You will be prompted to copy and paste one of the URLs into your browser, for example:

 http://classt01:8889/?token=xxxx

Both methods find the same centroids (to within a threshold value), but cuML runs much faster.
The graph shows the results: blue filled circles for scikit-learn and red circles for cuML.
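To make "within a threshold value" concrete, here is a minimal sketch of how two sets of centroids could be compared. It assumes the centroids are available as lists of coordinate tuples and that the two libraries may number the clusters in a different order; the function name, the sample coordinates, and the tolerances are all hypothetical and are not taken from the notebook.

```python
import math
from itertools import permutations


def centroids_match(a, b, tol):
    """True if some pairing of centroids in a and b is within tol everywhere."""
    if len(a) != len(b):
        return False
    # Cluster numbering is arbitrary, so try every ordering of b.
    # This is fine for a handful of clusters, too slow for many.
    return any(
        all(math.dist(p, q) <= tol for p, q in zip(a, perm))
        for perm in permutations(b)
    )


# Hypothetical centroids from two runs, listed in different orders.
sklearn_centroids = [(0.0, 0.0), (5.0, 5.0)]
cuml_centroids = [(5.001, 4.999), (0.0005, -0.0003)]

print(centroids_match(sklearn_centroids, cuml_centroids, tol=0.01))  # True
print(centroids_match(sklearn_centroids, cuml_centroids, tol=1e-4))  # False
```

The tolerance is a judgment call: small differences between the two implementations are expected from floating-point arithmetic and from different initialization.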

Running crYOLO in HPC

crYOLO is a fast and accurate particle-picking procedure. It is based on convolutional neural networks and uses the popular You Only Look Once (YOLO) object detection system. The detailed guide is available here.