CUDA (Compute Unified Device Architecture) is a parallel computing architecture developed by Nvidia; it is the compute engine of Nvidia graphics processing units (GPUs). See its Wikipedia page for more information.
As a Java programmer, I wanted to explore how to make a Java program talk to Nvidia hardware using CUDA C code. CUDA hardware is good at handling matrices, and I had a problem that could be modeled with matrices, so I thought I'd use CUDA for the matrix processing. For bookkeeping and program logic I feel more comfortable with Java's various packages, such as Swing for the GUI. So I designed my solution so that the calculation-intensive parts run on the parallel CUDA hardware while the less calculation-intensive parts run in the JVM.
The CUDA documentation provided by Nvidia was very helpful in setting up the development environment on my Ubuntu machine. The toolkit and SDK are essential for running tests in emulation mode. Oh, I should mention that I do not have CUDA-enabled hardware, so everything described here was done in emulation mode. According to the CUDA documentation, what works in emulation mode should also work on the hardware.
llpanorama has posted an interesting tutorial on how to get started with your first CUDA program. The first step for anyone starting with CUDA is to get a simple example, like the one presented by llpanorama, running on their machine. The tutorials explain how to produce an executable that runs routines on CUDA-enabled hardware, which was not what I wanted to do. I wanted to call CUDA code, that is, code executed on the GPU cores, from a Java program through JNI. This meant that I needed to create a shared library (a .so file, or a DLL on Windows) containing the CUDA code and link it to my Java code. Since I was inexperienced in C programming, finding out how to do this took some time.
The following diagram gives an overview of the steps I used to achieve my goal.
The boxes colored in pink above indicate where code was written.
To start with, a Main.java file was created containing simple code that declares a native function, declares three arrays, initializes two of them, and calls the JNI function.
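Because the original listing is not shown here, the following is a minimal sketch of what such a Main.java could look like. The native method name addArrays, its signature (element-wise c[i] = a[i] + b[i]), the array size, and the hard-coded ./program.so path are all assumptions for illustration, not the author's code. A plain-Java version of the same computation is included so the values written back by the native call can be checked, which is useful when comparing emulation mode against real hardware.

```java
// Main.java: hypothetical sketch of the Java side. The method name addArrays,
// its element-wise addition semantics, n = 16 and ./program.so are assumptions.
public class Main {

    // Implemented in native code (proxy.c), which forwards to the CUDA kernel.
    // Assumed contract: c[i] = a[i] + b[i] for every index i.
    public static native void addArrays(float[] a, float[] b, float[] c);

    // Fills an array with start, start + 1, start + 2, ...
    static float[] ramp(int n, float start) {
        float[] out = new float[n];
        for (int i = 0; i < n; i++) {
            out[i] = start + i;
        }
        return out;
    }

    // Pure-Java reference for the assumed native computation.
    static float[] addOnCpu(float[] a, float[] b) {
        float[] c = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] + b[i];
        }
        return c;
    }

    public static void main(String[] args) {
        final int n = 16;            // assumed array size
        float[] a = ramp(n, 0f);     // first input, initialized
        float[] b = ramp(n, 100f);   // second input, initialized
        float[] c = new float[n];    // output, filled in by the native code

        // The library must be loaded before the native call. The path is an
        // assumption: program.so in the current working directory.
        System.load(new java.io.File("program.so").getAbsolutePath());
        addArrays(a, b, c);

        System.out.println(java.util.Arrays.equals(c, addOnCpu(a, b))
                ? "native result matches the CPU reference"
                : "native result differs from the CPU reference");
    }
}
```

The library is loaded inside main rather than in a static block only so that the helper methods stay usable without the native library present.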
Step 1: Compile the Java code
The .java file is compiled with the conventional javac command, which creates the Main.class file.
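Step 2: Generate the C header file
The .h file containing the C prototype of the native function is generated from the compiled class with the JDK's javah tool.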
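The header produced in this step declares the C function that proxy.c has to implement, and its name is dictated by the JNI naming convention: "Java_", then the mangled fully qualified class name, an underscore, and the mangled method name. As a sketch, the Java helper here reproduces that short-name rule. The class JniNames and the method name addArrays are hypothetical, and the "__" signature suffix that JNI adds for overloaded native methods is not handled.

```java
// Hypothetical helper reproducing the JNI short-name rule used for the
// C function names in the generated header.
public class JniNames {

    // Escapes a class or method name per the JNI mangling table.
    static String mangle(String s) {
        StringBuilder sb = new StringBuilder();
        for (char ch : s.toCharArray()) {
            if (ch == '.' || ch == '/') {
                sb.append('_');                    // package separator
            } else if (ch == '_') {
                sb.append("_1");                   // literal underscore
            } else if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
                    || (ch >= '0' && ch <= '9')) {
                sb.append(ch);                     // ASCII letters and digits unchanged
            } else {
                // Any other character becomes _0 plus four lowercase hex digits.
                sb.append(String.format("_0%04x", (int) ch));
            }
        }
        return sb.toString();
    }

    // Name of the C function implementing methodName declared in className.
    static String nativeFunctionName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        // Assumed native method addArrays declared in class Main.
        System.out.println(nativeFunctionName("Main", "addArrays")); // Java_Main_addArrays
    }
}
```

If the function name in proxy.c does not match this rule exactly, the JVM cannot bind the native method and throws an UnsatisfiedLinkError when it is called.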
Step 3: Compile the C code into an object file
Our native code is split across two files, proxy.c and kernel_code.cu, and in real projects it may be spread over many more. The best strategy for building a single library is to compile each source file into an object file and then combine all the object files into one library. This step uses the gcc compiler to compile the native C code in proxy.c.
Step 4: Compile the CUDA code into an object file
Because .cu files contain CUDA code rather than plain C, kernel_code.cu cannot be compiled with gcc. Instead, we use nvcc, the compiler that comes with the CUDA toolkit.
UPDATE: On some Linux installations you may need to do Step 4 before Step 3, because proxy.c contains a call to a function defined in kernel_code.cu.
Step 5: Combine all object files into one dynamic linkable library
Now that all of our object files have been created (hopefully without any compilation errors), we need to combine them into a library. On Linux such libraries usually have the .so extension (.dll on Windows). We use gcc again to link the object files into the library.
The -lcufftemu and -lcublasemu switches are there in case your code calls functions in those libraries. Although my sample code does not, I've included these switches in the gcc command for future use. The suffix "emu" indicates emulation mode; if you have CUDA hardware, change them to -lcufft and -lcublas respectively.
If everything goes well, you'll find a library named program.so in your current folder. Congratulations! The hard part is over; now all we need to do is execute the Java program.
The Java code loads the dynamically linked library in a static initializer block.
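The original static block is not reproduced here, so this is a sketch under assumptions: the class name LibraryLoading, the prompt text, and reading the name from standard input are hypothetical, inferred from the fact that the user types the library name when the program runs. System.load is used instead of System.loadLibrary because loadLibrary("program") would look for libprogram.so on java.library.path, whereas System.load accepts a file of any name but requires an absolute path.

```java
import java.io.File;
import java.util.Scanner;

// Hypothetical loader: asks for a library file name and loads it.
public class LibraryLoading {

    // System.load needs an absolute path, so resolve the typed name
    // (e.g. "program.so") against the current working directory.
    static String absolutePathOf(String fileName) {
        return new File(fileName.trim()).getAbsolutePath();
    }

    // Reads a library file name from standard input and loads it. Throws
    // UnsatisfiedLinkError if the file is missing or not a valid library.
    static void promptAndLoad() {
        System.out.print("Enter the library name: ");
        Scanner in = new Scanner(System.in);
        String name = in.nextLine();
        System.load(absolutePathOf(name));
    }

    public static void main(String[] args) {
        promptAndLoad();
        System.out.println("Library loaded.");
    }
}
```

In Main.java the call would sit in a static initializer, `static { LibraryLoading.promptAndLoad(); }`, so the library is loaded as soon as the class is initialized and before any native method runs.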
To execute the code, all we need to do is run the Main program with the java launcher and enter the name of the library, which in our case is program.so.
I've created a simple Makefile to make my life easier. One target executes Steps 1 and 2, that is, it compiles the .java file into a .class file and then builds the .h file that contains the native function headers. Another target compiles the .cu file and the .c files and then combines them into the single program.so library.
You may need to modify the Makefile to match your system configuration.