Caffe

Caffe [1] is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors.

Running Caffe on HPC

Interactive Job

Request an interactive session on a GPU node (here, one Tesla K40 GPU in the gpu partition):

srun -p gpu -C gpuk40 --gres=gpu:1 --pty bash
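To confirm which GPU the allocation granted, you can inspect the CUDA_VISIBLE_DEVICES environment variable. Slurm typically sets it for jobs that request GPUs through --gres, but this depends on the site's GRES configuration, so treat it as an assumption. The helper below is a minimal sketch; the function name visible_gpus is hypothetical and not part of Caffe or Slurm. It avoids Python 3-only syntax so it also runs under the Python 2.7 shown later on this page.

```python
import os


def visible_gpus(env=None):
    """Return the GPU identifiers listed in CUDA_VISIBLE_DEVICES.

    An empty list means the variable is unset or empty, in which case
    this helper cannot tell which GPUs the job was given.
    """
    if env is None:
        env = os.environ
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    # Entries are comma-separated device indices (or UUIDs on some systems).
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    print("Visible GPUs: %s" % (visible_gpus(),))
```

Because CUDA renumbers the visible devices starting from 0, caffe.set_device(0) refers to the first GPU in this list, whatever its physical index on the node.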

Caffe is compiled with GCC, so switch from the Intel to the GCC environment:

module swap intel gcc

Load the Caffe module:

module load caffe

To use the Python wrapper (pycaffe), start Python:

python

Python Interactive Session:

Python 2.7.12 |Anaconda custom (64-bit)| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from PIL import Image
>>> import caffe
>>> caffe.set_device(0)
>>> caffe.set_mode_gpu()
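Before a forward pass, images usually have to be rearranged into the layout Caffe's input blobs expect. The sketch below reproduces with plain NumPy what the Caffe examples commonly configure caffe.io.Transformer to do: scale [0, 1] values to [0, 255], swap RGB to BGR, subtract a per-channel mean, and transpose height x width x channels to channels x height x width. It assumes an RGB float image such as caffe.io.load_image returns. Whether a given model needs BGR order, the 0-255 range, or a particular mean depends on how it was trained, so the defaults here (including the zero mean) are placeholders, and to_caffe_blob is a hypothetical helper name.

```python
import numpy as np


def to_caffe_blob(image_rgb, mean_bgr=(0.0, 0.0, 0.0)):
    """Turn an H x W x 3 RGB image with values in [0, 1] into a
    1 x 3 x H x W float32 array in BGR order, scaled to [0, 255],
    with a per-channel mean subtracted.

    mean_bgr is a placeholder; use the mean your model was trained with.
    """
    img = np.asarray(image_rgb, dtype=np.float32)
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image, got shape %r" % (img.shape,))
    img = img * 255.0                                   # [0, 1] -> [0, 255]
    img = img[:, :, ::-1]                               # RGB -> BGR
    img = img - np.asarray(mean_bgr, dtype=np.float32)  # broadcast over H and W
    img = img.transpose(2, 0, 1)                        # H x W x C -> C x H x W
    return img[np.newaxis, ...]                         # add batch axis: N x C x H x W
```

With a loaded net whose input blob is named 'data' (an assumption about the model), the result could be copied in with net.blobs['data'].reshape(*blob.shape) followed by net.blobs['data'].data[...] = blob.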

Batch Job - Training LeNet on MNIST with Caffe

Copy the Caffe example files, including the job script "HPCrun.slurm", from /usr/local/doc/CAFFE into your working directory. For more details, see Training LeNet on MNIST with Caffe [2].

cp -r /usr/local/doc/CAFFE/* .

Submit the job:

sbatch HPCrun.slurm
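sbatch replies with a line of the form "Submitted batch job <jobID>". If you script submissions, a minimal sketch like the following can recover the job ID and the matching log file name. The slurm.o<jobID> name comes from this page and is presumably set by the job script rather than by Slurm itself (Slurm's default is slurm-<jobID>.out), so the pattern argument is an assumption. The function names job_output_file and submit are hypothetical.

```python
import re
import subprocess

# sbatch confirms a submission with a line like "Submitted batch job 123456".
_SUBMITTED = re.compile(r"Submitted batch job (\d+)")


def job_output_file(sbatch_stdout, pattern="slurm.o%s"):
    """Return (job_id, log_file_name) parsed from sbatch's output.

    pattern follows this page's slurm.o<jobID> naming; adjust it if your
    job script writes its output elsewhere.
    """
    match = _SUBMITTED.search(sbatch_stdout)
    if match is None:
        raise ValueError("no job ID found in sbatch output: %r" % (sbatch_stdout,))
    job_id = match.group(1)
    return job_id, pattern % job_id


def submit(script="HPCrun.slurm"):
    """Submit a job script with sbatch and return (job_id, log_file_name)."""
    out = subprocess.check_output(["sbatch", script])
    return job_output_file(out.decode("utf-8"))
```

Calling job_id, log_name = submit() from the directory containing HPCrun.slurm would give the file to watch, for example with tail -f.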

The output is written to slurm.o<jobID>:

I0810 12:27:13.292641 111449 caffe.cpp:185] Using GPUs 0
I0810 12:27:13.343544 111449 caffe.cpp:190] GPU 0: Tesla K40m
I0810 12:27:13.934409 111449 solver.cpp:48] Initializing solver from parameters:
test_iter: 100
test_interval: 500
base_lr: 0.01
display: 100
max_iter: 10000
lr_policy: "inv"
...
I0810 12:27:53.956357 111449 solver.cpp:404] Test net output #0: accuracy = 0.991
I0810 12:27:53.956373 111449 solver.cpp:404] Test net output #1: loss = 0.0270981 (* 1 = 0.0270981 loss)
I0810 12:27:53.956378 111449 solver.cpp:322] Optimization Done.
I0810 12:27:53.956382 111449 caffe.cpp:222] Optimization Done.
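The solver above uses lr_policy "inv", under which Caffe decays the learning rate as base_lr * (1 + gamma * iter)^(-power). gamma and power fall in the elided part of the log; the sketch below assumes gamma = 0.0001 and power = 0.75, the values in Caffe's stock MNIST LeNet solver, so check them against the solver file you copied. The function name inv_learning_rate is hypothetical.

```python
def inv_learning_rate(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate under Caffe's "inv" policy at a given iteration.

    base_lr matches the log above; gamma and power are assumed values.
    """
    return base_lr * (1.0 + gamma * iteration) ** (-power)


if __name__ == "__main__":
    # Print the rate every 2500 iterations up to max_iter (10000).
    for it in range(0, 10001, 2500):
        print("iter %5d  lr %.6f" % (it, inv_learning_rate(it)))
```

Under these assumptions the rate falls from 0.01 to roughly 0.0059 by iteration 10000, since (1 + 0.0001 * 10000)^(-0.75) = 2^(-0.75), about 0.59.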
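The final test accuracy and loss can also be pulled out of the log programmatically, for example to compare several runs. This minimal sketch matches the "Test net output #N: name = value" lines shown in the log and keeps the last value reported for each output name. The regular expression is written against this log format only, and the function name final_test_outputs is hypothetical.

```python
import re

# Matches lines such as:
#   ... solver.cpp:404] Test net output #0: accuracy = 0.991
_TEST_OUTPUT = re.compile(r"Test net output #(\d+): (\w+) = ([0-9.eE+-]+)")


def final_test_outputs(log_text):
    """Return {name: value} for the last reported value of each test output."""
    results = {}
    for line in log_text.splitlines():
        match = _TEST_OUTPUT.search(line)
        if match:
            # Later test phases overwrite earlier ones, so the final values win.
            results[match.group(2)] = float(match.group(3))
    return results
```

For instance, reading slurm.o<jobID> into a string and passing it to final_test_outputs would return the accuracy and loss from the last test phase.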

Refer to the HPC Guide to Deep Learning and the HPC Software Guide for more information.

References:

[1] Caffe home

[2] Caffe Tutorial