www.accelereyes.com ) is a full runtime system that brings visual-computing capability and GPU speed to MATLAB programs by introducing data types such as gdouble and gsingle into MATLAB. It transparently overloads CPU-based functions with their GPU-based counterparts.
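As a quick illustration of the overloading (a sketch; it assumes Jacket is installed and on the MATLAB path, and the matrix size is arbitrary), casting an array with gdouble makes subsequent operations on it run on the GPU, while double() copies the result back to the CPU:

```shell
# One-shot MATLAB session: cast data to Jacket's gdouble type, multiply on the
# GPU, then bring the result back to the CPU with double().
matlab -nodisplay -r "A = gdouble(rand(1000)); B = A*A; disp(class(B)); C = double(B); disp(class(C)); exit"
```

Because the `*` operator is overloaded for gdouble, the multiplication itself needs no Jacket-specific syntax.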
Jacket works together with specific versions of CUDA and MATLAB, so check which CUDA and MATLAB versions are available before you begin.
Get the version of Jacket using an interactive job submission:
Load the appropriate MATLAB and CUDA modules
Get a CUDA-enabled node by requesting the gpufermi node type:
You will be assigned a GPU node.
To make good use of the GPUs, issue the command
Run MATLAB with the -nodisplay option
At the MATLAB prompt, type:
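Put together, the interactive steps above might look like the following session (the resource request and unversioned module names are illustrative assumptions for this cluster; ginfo is Jacket's command for reporting the Jacket, CUDA, and GPU configuration):

```shell
# Request an interactive session on a GPU node (gpufermi node property, as above)
qsub -I -l nodes=1:ppn=1:gpufermi

# On the assigned GPU node, load the modules (add version strings as needed)
module load matlab
module load cuda

# Start MATLAB without a display
matlab -nodisplay

# At the MATLAB prompt, Jacket's ginfo reports the Jacket version and GPU devices:
#   >> ginfo
```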
Copy the PBS script "cuda_mat.pbs" and the MATLAB script "float_pi.m" from /usr/local/doc/JACKET to your home directory.
Submit your job:
Find your output in the file Jacket_test.o<jobid>
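For reference, a minimal PBS script along the lines of cuda_mat.pbs might look like the sketch below (the job name, resource line, and module lines are assumptions; the script shipped in /usr/local/doc/JACKET is the authoritative version for this cluster):

```shell
# Write an illustrative PBS script; the real cuda_mat.pbs in /usr/local/doc/JACKET
# is the one to submit on this cluster.
cat > cuda_mat_example.pbs <<'EOF'
#PBS -N Jacket_test
#PBS -l nodes=1:ppn=1:gpufermi
#PBS -j oe

cd $PBS_O_WORKDIR
module load matlab
module load cuda
matlab -nodisplay -r "float_pi; exit"
EOF

# Submit with: qsub cuda_mat_example.pbs
# stdout/stderr land in Jacket_test.o<jobid> once the job finishes.
```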
If you would like to compare performance between CPU and GPU, or between MATLAB and Jacket, refer to the section "Benchmark". You can also use the MATLAB script "matrix_mult_add.m" available at /usr/local/doc/JACKET.
You can use the working template script provided on the Jacket website (http://wiki.accelereyes.com/wiki/index.php/Jacket_MGL), which generates random values and performs FFTs in parallel across all available devices.
Testing CPU vs GPU:
Running the test using the example code from this website: http://ircs.seas.harvard.edu/display/USERDOCS/How+to+use+Jacket+(GPU+based+Matlab+accelerator):
The scripts runjack.m and runmatGpu.m perform matrix multiplication over several data-set sizes specified in the code, using the functions jacket and matGpu, respectively. A mean and standard deviation are taken over multiple trials, and the results are plotted. The two functions differ in that jacket uses Jacket (v1.4) routines for matrix multiplication, while matGpu uses the native MATLAB R2010b CUDA-specific routines.
While Jacket generally seems to scale better with larger data sets, it slows down the CPU (non-GPU) component, whereas the native MATLAB GPU routines, as expected, perform comparably with their CPU counterparts.
The FLOPs are computed by estimating the number of floating-point operations performed by each routine and the time taken for the routine to complete.*
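For example, multiplying two N x N matrices takes roughly 2N^3 floating-point operations (about N multiplies and N adds per output element, over N^2 elements), so the achieved rate can be estimated as:

```
FLOPs (N x N matrix multiply)  ~  2 * N^3
GFLOPS                         ~  2 * N^3 / (t * 10^9)     where t = measured run time in seconds
```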
To reproduce the results, copy runjack.m, jacket.m, matGpu.m, and runmatGpu.m to a folder, and cd into that folder.
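A sketch of those reproduction steps (the driver-script name and the MATLAB invocation are assumptions for illustration; run it from the folder holding the four .m files):

```shell
# Create a small driver that runs both benchmarks in one MATLAB session
cat > run_benchmarks.m <<'EOF'
% Driver: run the Jacket and native-GPU matrix-multiplication benchmarks
runjack;      % Jacket (v1.4) routines
runmatGpu;    % native MATLAB R2010b GPU routines
EOF

# From the folder containing runjack.m, jacket.m, matGpu.m, runmatGpu.m:
#   matlab -nodisplay -r "run_benchmarks; exit"
```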
* The computation time does not include variable initialization and GPU data-transfer time.
If two jobs, each requesting one GPU, are assigned to the same node almost simultaneously (less than a 10-second delay between the two submissions), the second job may be terminated with the following error, without affecting the first job: