Jacket

Jacket (www.accelereyes.com) is a complete runtime system that brings GPU speed and visual computing capability to MATLAB programs. It introduces GPU data types such as GDOUBLE and GSINGLE into MATLAB and transparently overloads CPU-based functions with GPU-based equivalents.

Important Notes

  • Only a limited number of Jacket licenses are available for the GPU nodes (see the GPU resource page). Please check out as few licenses as possible, out of consideration for other users. To see current usage, refer to the Checking Licenses section.
  • You can request one or two GPUs (gpus=1 or gpus=2) depending on your needs. Please request the minimum, out of consideration for other GPU users. Before requesting gpus=2, make sure your script (job) actually uses both GPUs in the node; otherwise use gpus=1. For gpus=1 use ppn=6, and for gpus=2 use ppn=12.
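
The pairing between gpus and ppn described above can be captured in a small shell helper. This is a minimal sketch: the function name `gpu_resources` is hypothetical (it is not a site-provided command), and the queue name gpufermi is taken from the interactive example on this page.

```shell
# Build the -l resource string for a one-node GPU job.
# gpu_resources is a hypothetical helper name, not a site-provided command.
gpu_resources() {
  case "$1" in
    1) echo "nodes=1:ppn=6:gpus=1" ;;
    2) echo "nodes=1:ppn=12:gpus=2" ;;
    *) echo "gpus must be 1 or 2" >&2; return 1 ;;
  esac
}

# Example: print (not run) the qsub command for an interactive one-GPU session.
echo "qsub -q gpufermi -l $(gpu_resources 1) -I"
```

Keeping the rule in one place makes it harder to request gpus=2 with ppn=6 by accident.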

Installed Versions

Jacket works together with CUDA and MATLAB, so you may want to check the available versions of CUDA and MATLAB.
To get the installed version of Jacket, use an interactive job as described in the next section.

Interactive Job Submission

Load the appropriate MATLAB and CUDA modules:
module load matlab
module load cuda

Get a CUDA-enabled node by requesting a node of type gpufermi:
qsub -q gpufermi -l nodes=1:ppn=6:gpus=1 -I
You will be assigned a GPU node.

To make good use of the GPUs, issue the command:
export CUDA_VISIBLE_DEVICES=`gpu-free`

Run MATLAB with the -nodisplay option:
matlab -nodisplay

At the MATLAB prompt, type:
ginfo
Output:
Jacket v2.3 (build b55c105) by AccelerEyes (64-bit Linux)
License: /usr/local/jacket/jacket/engine/cn-jacket-linux-x64/jlicense.dat
Addons: MGL16, DLA
CUDA toolkit 5.0, driver 5.0 (310.19)
GPU1 Tesla M2090, 5376 MB, Compute 2.0 (single,double) (in use)
GPU2 Tesla M2090, 5376 MB, Compute 2.0 (single,double)
Memory Usage: 5305 MB free (5376 MB total)

Batch Job Submission

Copy the PBS script "cuda_mat.pbs" and the MATLAB script file "float_pi.m" from /usr/local/doc/JACKET to your home directory:
cp /usr/local/doc/JACKET/cuda_mat.pbs .
cp /usr/local/doc/JACKET/float_pi.m .

Submit your job:
qsub cuda_mat.pbs
Find your output in the file Jacket_test.o<jobid>.
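
The contents of cuda_mat.pbs are not reproduced on this page. As a rough guide, a PBS script for a one-GPU Jacket job might look like the sketch below, which writes an example script to a file. Everything in it is assembled from the interactive steps on this page and is an assumption about the real script: the file name my_jacket.pbs, the walltime, the `-j oe` option, and the `-N Jacket_test` job name (inferred from the output file name) are all illustrative.

```shell
# Write a hypothetical PBS script for a one-GPU Jacket job.
# This is NOT the site's cuda_mat.pbs; it only mirrors the interactive steps.
cat > my_jacket.pbs <<'EOF'
#!/bin/bash
#PBS -N Jacket_test
#PBS -q gpufermi
#PBS -l nodes=1:ppn=6:gpus=1
#PBS -l walltime=00:10:00
#PBS -j oe

cd "$PBS_O_WORKDIR"
module load matlab
module load cuda
export CUDA_VISIBLE_DEVICES=`gpu-free`
matlab -nodisplay -r "float_pi; exit"
EOF

echo "wrote my_jacket.pbs; submit with: qsub my_jacket.pbs"
```

Compare the generated file with the provided cuda_mat.pbs before relying on it, since the site script may set options that this sketch omits.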

If you would like to compare the performance of CPU vs GPU and MATLAB vs Jacket, please refer to the section "Benchmarking". You can also use the MATLAB script "matrix_mult_add.m" available in /usr/local/doc/JACKET.

Checking Licenses

module load matlab
module load cuda
$MATLAB/etc/lmstat -a -c $JACKET/cn-jacket-linux-x64/jlicense.dat
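
lmstat reports usage in lines of the form "Users of <feature>: (Total of N licenses issued; Total of M licenses in use)". The sketch below filters that output to show how many licenses are free per feature. The function name `free_licenses` is hypothetical, and the feature name in the sample input is made up for illustration.

```shell
# Read lmstat -a output on stdin and print "<feature> <free>" per feature.
# free_licenses is a hypothetical helper name.
free_licenses() {
  awk '/^Users of / {
         feature = $3; sub(/:$/, "", feature)
         # Fields: Users of F: (Total of N licenses issued; Total of M licenses in use)
         issued = $6; in_use = $11
         print feature, issued - in_use
       }'
}

# Sample line with a made-up feature name. On the cluster, pipe lmstat instead:
#   $MATLAB/etc/lmstat -a -c $JACKET/cn-jacket-linux-x64/jlicense.dat | free_licenses
echo "Users of example_feature:  (Total of 4 licenses issued;  Total of 1 license in use)" | free_licenses
```

Checking the free count before submitting helps avoid jobs that start and then fail to obtain a license.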

Running Jacket Job in Multiple GPUs

You can use the working template script provided on the Jacket website (http://wiki.accelereyes.com/wiki/index.php/Jacket_MGL), which generates random values and performs FFTs in parallel across all available devices:

ngpu = getfield(ginfo, 'gpu_count');
for i = 1:ngpu
  gselect(i)     % switch device
  out{i} = fft(rand(2048, gsingle));
end
gsync('all')  % wait for all devices to finish

Benchmarking

Testing CPU vs GPU:

The test was run using the example code from this website: http://ircs.seas.harvard.edu/display/USERDOCS/How+to+use+Jacket+(GPU+based+Matlab+accelerator)

The result for 0.78 GB of data with an error of 10^-4:

Time (s), CPU vs GPU

ans =

  260.7151    8.2173
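
The two timings correspond to a speedup of roughly 32x for the GPU run. The ratio can be computed directly; this is only the arithmetic on the numbers above, and the function name `speedup` is hypothetical.

```shell
# Ratio of CPU time to GPU time, to one decimal place.
speedup() {
  awk -v cpu="$1" -v gpu="$2" 'BEGIN { printf "%.1f\n", cpu / gpu }'
}

speedup 260.7151 8.2173   # prints 31.7
```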



Matrix Multiplication
The scripts runjack.m and runmatGpu.m perform matrix multiplication over several data set sizes specified in the code, calling the main functions jacket and matGpu, respectively. The mean and standard deviation are taken over several trials, and the results are plotted. The two functions differ in that jacket uses Jacket (v 1.4) routines for matrix multiplication, while matGpu uses the native CUDA-specific routines of MATLAB R2010b.
In general, Jacket seems to scale better with larger data sets, but it slows down the CPU (non-GPU) component: the CPU timings listed for jacket.m are roughly twice those listed for matGpu.m. The native MATLAB GPU routines, as expected, perform comparably with their GPU counterpart.



The FLOPs are computed by estimating the number of floating point operations performed by each routine and dividing by the time taken for the routine to complete (see note a).
To reproduce the results, copy runjack.m, jacket.m, matGpu.m, and runmatGpu.m to a folder, cd to that folder, and run:

~$ module load cuda
~$ module load matlab
~$ matlab -r runjack
~$ matlab -r runmatGpu

Routine          1024x1024      2048x2048      4096x4096       8192x8192      11585x11585
jacket.m         0.0248 s       0.0966 s       0.3843 s        1.6454 s       3.5 s
                 86.6 Gflops    177 Gflops     357.64 Gflops   668 Gflops     888.5 Gflops
CPU (jacket.m)   0.0930 s       0.4149 s       2.7407 s        24.7407 s      69.6144 s
matGpu.m         0.0162 s       0.0457 s       0.3398 s        2.6771 s       10.0234 s
                 66.28 Gflops   375 Gflops     404.5 Gflops    410 Gflops     310.2 Gflops
CPU (matGpu.m)   0.0499 s       0.2219 s       1.5407 s        11.1573 s      33.523 s

Note a: The computation time does not include variable initialization or GPU data transfer time.
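
The Gflops figures can be related to the timings with a short calculation. The sketch below assumes the common estimate of 2n^3 floating point operations for multiplying two n x n matrices. The page does not state the exact operation count that was used, but this estimate reproduces the jacket.m values listed above (for example, 86.6 Gflops at 1024x1024). The function name `gflops` is hypothetical.

```shell
# Gflops for an n x n matrix multiplication that took t seconds,
# assuming 2*n^3 floating point operations (an assumption, see above).
gflops() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 2 * n * n * n / t / 1e9 }'
}

gflops 1024 0.0248    # prints 86.6
gflops 11585 3.5      # prints 888.5
```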

Troubleshooting

If two jobs, each requesting one GPU, are assigned to the same node at nearly the same time (less than 10 seconds apart), the second job may be terminated with the following error, without affecting the first job:

??? Error using ==> gpu_entry

src/cuda/context.cpp:361: CUDA driver error: invalid device (101)
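
Since the error is reported when two jobs land on the same node within about 10 seconds of each other, one possible workaround is to space out your own submissions. The sketch below is an assumption rather than a confirmed fix: it cannot control when the scheduler actually starts the jobs, and the function name `submit_staggered` and the SUBMIT and DELAY variables are hypothetical.

```shell
# Submit each PBS script in turn, pausing between submissions.
# SUBMIT and DELAY are hypothetical knobs; defaults are qsub and 15 seconds.
submit_staggered() {
  first=1
  for script in "$@"; do
    if [ "$first" -eq 0 ]; then
      sleep "${DELAY:-15}"
    fi
    first=0
    ${SUBMIT:-qsub} "$script"
  done
}

# Dry run: print the script names instead of submitting them.
SUBMIT=echo DELAY=0 submit_staggered job1.pbs job2.pbs
```

If a job still fails with this error, resubmitting it is the simplest recovery, since the first job on the node is not affected.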



