Caffe
Caffe [1] is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors.
Running Caffe in HPC
Interactive Job
Request a GPU node with one Tesla K40 GPU:
srun -p gpu -C gpuk40 --gres=gpu:1 --pty bash
Caffe is compiled with GCC, so switch from the Intel environment to the GCC environment:
module swap intel gcc
Load the Caffe module:
module load caffe
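After loading the modules, a quick check from Python can confirm that the environment matches what the rest of this guide expects. The sketch below is a hypothetical helper, not something shipped with Caffe or the cluster. It assumes that "module load caffe" places the caffe command-line tool on PATH and the pycaffe package on PYTHONPATH, and that Slurm sets CUDA_VISIBLE_DEVICES for the GPU requested with --gres=gpu:1.

```python
import importlib.util
import os
import shutil


def check_caffe_environment(env=None):
    """Report whether the pieces used in this guide appear to be available.

    Hypothetical helper. Assumptions:
      * 'module load caffe' puts the 'caffe' tool on PATH and pycaffe on PYTHONPATH;
      * Slurm exports CUDA_VISIBLE_DEVICES for GPUs requested with --gres=gpu:N.
    """
    env = os.environ if env is None else env
    return {
        # Is a 'caffe' executable found on the PATH given in env? (empty PATH -> not found)
        "caffe_cli": shutil.which("caffe", path=env.get("PATH", "")) is not None,
        # Can this interpreter locate an importable 'caffe' package on sys.path?
        "pycaffe": importlib.util.find_spec("caffe") is not None,
        # Did the scheduler expose at least one GPU to this process?
        "gpus_visible": bool(env.get("CUDA_VISIBLE_DEVICES", "").strip()),
    }


if __name__ == "__main__":
    for name, ok in check_caffe_environment().items():
        print(f"{name:13s} {'ok' if ok else 'MISSING'}")
```

Run it as a script inside the srun session (for example, python check_env.py, where the file name is hypothetical). An entry reported as MISSING points to a module swap, module load, or GPU request that did not take effect.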
Using the Python wrapper (pycaffe), start Python:
python
Example interactive session, which imports Caffe and selects GPU 0:
Python 2.7.12 |Anaconda custom (64-bit)| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from PIL import Image
>>> import caffe
>>> caffe.set_device(0)
>>> caffe.set_mode_gpu()
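The session above hard-codes GPU 0. Inside a GPU job this works because CUDA numbers the devices visible to a process starting from 0, and Slurm typically restricts visibility to the allocated GPU through CUDA_VISIBLE_DEVICES. If the same script may also run where no GPU was requested, a minimal sketch like the following can fall back to CPU mode. The helper names are hypothetical; only set_device, set_mode_gpu and set_mode_cpu are pycaffe calls, and treating an unset or empty CUDA_VISIBLE_DEVICES as "no GPU" is an assumption about how the cluster's Slurm is configured.

```python
import os


def select_caffe_mode(env=None):
    """Return ("gpu", 0) if at least one GPU is visible, else ("cpu", None).

    Hypothetical helper. Assumes Slurm sets CUDA_VISIBLE_DEVICES to the GPU(s)
    granted by --gres=gpu:N. CUDA renumbers visible GPUs from 0, so the first
    allocated GPU is device 0 inside the job regardless of its physical index.
    """
    env = os.environ if env is None else env
    visible = [d for d in env.get("CUDA_VISIBLE_DEVICES", "").split(",") if d.strip()]
    if visible:
        return "gpu", 0
    return "cpu", None


def configure_caffe(caffe_module, env=None):
    """Apply the selected mode through pycaffe, mirroring the session above."""
    mode, device = select_caffe_mode(env)
    if mode == "gpu":
        caffe_module.set_device(device)
        caffe_module.set_mode_gpu()
    else:
        caffe_module.set_mode_cpu()
    return mode


if __name__ == "__main__":
    try:
        import caffe
    except ImportError:
        print("pycaffe not found; run 'module load caffe' first")
    else:
        print("Caffe running in", configure_caffe(caffe), "mode")
```

Passing the caffe module in as an argument keeps the mode-selection logic independent of whether pycaffe is importable, which makes it easy to check on a login node.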
Batch Job - Training LeNet on MNIST with Caffe
Copy the Caffe example files, including the job script "HPCrun.slurm", from /usr/local/doc/CAFFE to your working directory. For more details, see the Training LeNet on MNIST with Caffe tutorial [2].
cp -r /usr/local/doc/CAFFE/* .
Submit the job:
sbatch HPCrun.slurm
The output is written to slurm.o<jobID>:
I0810 12:27:13.292641 111449 caffe.cpp:185] Using GPUs 0
I0810 12:27:13.343544 111449 caffe.cpp:190] GPU 0: Tesla K40m
I0810 12:27:13.934409 111449 solver.cpp:48] Initializing solver from parameters:
test_iter: 100
test_interval: 500
base_lr: 0.01
display: 100
max_iter: 10000
lr_policy: "inv"
...
I0810 12:27:53.956357 111449 solver.cpp:404] Test net output #0: accuracy = 0.991
I0810 12:27:53.956373 111449 solver.cpp:404] Test net output #1: loss = 0.0270981 (* 1 = 0.0270981 loss)
I0810 12:27:53.956378 111449 solver.cpp:322] Optimization Done.
I0810 12:27:53.956382 111449 caffe.cpp:222] Optimization Done.
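The lines to look for in slurm.o<jobID> are the "Test net output" entries; the last ones report the final test accuracy and loss. A short Python sketch, assuming the log format shown above, can pull those values out of one or more log files. The function and script names are hypothetical.

```python
import re

# Matches solver lines such as:
#   ... solver.cpp:404] Test net output #0: accuracy = 0.991
#   ... solver.cpp:404] Test net output #1: loss = 0.0270981 (* 1 = 0.0270981 loss)
TEST_OUTPUT = re.compile(
    r"Test net output #\d+: (\S+)\s*=\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)"
)


def last_test_outputs(lines):
    """Return the most recent value of each 'Test net output' metric."""
    metrics = {}
    for line in lines:
        match = TEST_OUTPUT.search(line)
        if match:
            name, value = match.groups()
            # Later test passes overwrite earlier ones, so the final values win.
            metrics[name] = float(value)
    return metrics


if __name__ == "__main__":
    import sys

    # Usage: python parse_log.py slurm.o<jobID> [more log files ...]
    for path in sys.argv[1:]:
        with open(path) as log:
            print(path, last_test_outputs(log))
```

For the log excerpt above, the function returns {'accuracy': 0.991, 'loss': 0.0270981}.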
Refer to the HPC Guide to Deep Learning and the HPC Software Guide for more information.
References:
[1] Caffe home
[2] Caffe Tutorial