Parallel Computing
To learn TornadoVM, let us perform this simple exercise of rewriting the "Ray Tracing in One Weekend" tutorial in Java and TornadoVM.
Assumption:
You have installed TornadoVM on your machine and added the TornadoVM javaFlags to your IntelliJ IDEA project (see this tutorial).
Create a new Maven project. Then, modify the pom.xml file of the project to the following:
Here, we add the "TornadoVM" repository, to be used later.
In the dependencies, we add the TornadoVM dependencies.
We will also use Lombok and Slf4j for logging.
Output an Image
The tutorial starts with a code snippet that saves an image in PPM format. The C++ code looks like this:
C++
Source: Ray Tracing in One Weekend
 Since Java is an object-oriented language, we will use OO methodology to develop our code. We will start with an Interface for an Image. The approach is to:
Translate the sequential C++ code into sequential Java code. Then,
Convert the sequential Java code into parallel code based on the programming model of TornadoVM.
Let us start with an Interface that will be implemented by all classes containing the translated code.
Java
Though the tutorial only covers saving an image in PPM format, we will add two additional functionalities: saving the image in JPG format and converting image data into a BufferedImage object for further internal processing.
Next, we create an ImageCPU class that implements the interface and contains all the sequential code given in the tutorial.
Java
The class contains three float arrays, one for each color channel (red, green, and blue), along with the dimensions of the image (width and height). The constructor initializes all the instance variables.
Then, we implement (override) the saveImageInPPMFormat() method for this class. This is a direct translation of the C++ code in the "Ray Tracing in One Weekend" tutorial.
Java
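As a rough illustration of this step, here is a minimal sketch of a PPM writer. It is not the ImageCPU code itself: the class name PpmSketch, the method toPpm(), and the row-major array layout are assumptions made for this example. The 255.999 scale factor follows the C++ tutorial.

```java
// Hypothetical helper for illustration only; not the article's ImageCPU class.
class PpmSketch {
    // Builds a plain-text (P3) PPM image from per-channel floats in [0, 1].
    static String toPpm(float[] r, float[] g, float[] b, int width, int height) {
        StringBuilder sb = new StringBuilder();
        sb.append("P3\n").append(width).append(' ').append(height).append("\n255\n");
        for (int j = 0; j < height; j++) {        // rows, top to bottom
            for (int i = 0; i < width; i++) {     // columns, left to right
                int idx = j * width + i;          // row-major index (assumed layout)
                sb.append(toByte(r[idx])).append(' ')
                  .append(toByte(g[idx])).append(' ')
                  .append(toByte(b[idx])).append('\n');
            }
        }
        return sb.toString();
    }

    // Maps a channel value in [0, 1] to an integer in [0, 255].
    static int toByte(float c) {
        return (int) (255.999f * c);
    }
}
```

The returned string can be written to disk with java.nio.file.Files.writeString(). Building the text in memory keeps the sketch free of checked exceptions.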
Let us implement the getImage() method that returns the image as a BufferedImage object.
Java
Now we can implement (override) the saveImageInJPGFormat() method.
Java
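The two steps above (building a BufferedImage and writing it as a JPG) can be sketched with the standard java.awt and javax.imageio classes. The class name JpgSketch, the method names, and the row-major layout are assumptions for this example and may differ from the article's ImageCPU implementation.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration only.
class JpgSketch {
    // Packs per-channel floats in [0, 1] into an RGB BufferedImage.
    static BufferedImage toImage(float[] r, float[] g, float[] b, int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int j = 0; j < height; j++) {
            for (int i = 0; i < width; i++) {
                int idx = j * width + i;          // row-major index (assumed layout)
                int rgb = (toByte(r[idx]) << 16) | (toByte(g[idx]) << 8) | toByte(b[idx]);
                img.setRGB(i, j, rgb);
            }
        }
        return img;
    }

    // Clamps to [0, 1] before scaling so out-of-range values cannot overflow a channel.
    static int toByte(float c) {
        float clamped = Math.max(0f, Math.min(1f, c));
        return (int) (255.999f * clamped);
    }

    // Writes the image as JPG; ImageIO.write returns false if no writer was found.
    static boolean saveJpg(BufferedImage img, File file) throws IOException {
        return ImageIO.write(img, "jpg", file);
    }
}
```

TYPE_INT_RGB is used because JPG has no alpha channel.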
We are ready to write the code that outputs the gradient test pattern.
Add the following method in the ImageCPU class:
Note:
Our approach is that as we progress in the tutorial, each fundamental change is implemented in a stand-alone method inside the relevant class.
Java
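For orientation, a minimal sketch of the gradient test pattern is given here. The class name GradientSketch is hypothetical. The loop direction and the constant blue value differ between editions of the C++ tutorial, so both are assumptions in this sketch.

```java
// Hypothetical helper for illustration only.
class GradientSketch {
    // Fills the channel arrays with a red/green gradient.
    // Requires width > 1 and height > 1 to avoid dividing by zero.
    static void fill(float[] r, float[] g, float[] b, int width, int height) {
        for (int j = 0; j < height; j++) {
            for (int i = 0; i < width; i++) {
                int idx = j * width + i;             // row-major index (assumed layout)
                r[idx] = (float) i / (width - 1);    // red grows left to right
                g[idx] = (float) j / (height - 1);   // green grows top to bottom
                b[idx] = 0.0f;                       // constant blue (assumed value)
            }
        }
    }
}
```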
To see some output, we now implement the Main class, which contains the public static void main() entry point of our project.
Java
After execution, two new files will be created in the resources folder: one with a JPG extension and the other with a PPM extension.
Both files should contain an image like the one shown here.
Convert the Sequential Code into Parallel Code and execute it using TornadoVM
Some bookkeeping:
We create an AppManager class to maintain application-level constants.
Java
In this singleton class, we will define a few variables for use later.
A snapshot of the devices on one of my machines.
Java
We make a slight modification to the main() method to print out the TornadoVM information.
Implement a Container for Parallel Code
Java
The ImageGPU class uses the TornadoVM programming model. It uses FloatArray, TornadoVM's off-heap array type, to hold the pixel data and to move it between host and device.
Write Parallel Code for generating the test pattern on the GPU
We will implement the parallel version of ImageCPU.render1_TestPattern() inside the ImageGPU class.
We use the TornadoVM programming model, where a TaskGraph is defined, configured, and then executed. This is done in ImageGPU.render1_TestPattern() method.
The actual parallel code is found in ImageGPU.render_GPU_TestPattern(). This is the code that will be translated to GPU code (OpenCL, PTX, etc.). Note the @Parallel loop annotation.
Java
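A loop can be marked with @Parallel when its iterations are independent: each iteration reads only its own indices and writes to its own slot. The sketch below illustrates that property in plain Java. It is not TornadoVM code: java.util.stream stands in for the data-parallel execution so that the example runs without the TornadoVM dependency, and the class and method names are hypothetical.

```java
import java.util.stream.IntStream;

// Hypothetical illustration; plain Java, no TornadoVM types.
class IndependentIterationsSketch {
    // Kernel-style body: depends only on (i, j) and writes exactly one slot per channel.
    static void shadePixel(float[] r, float[] g, float[] b,
                           int width, int height, int i, int j) {
        int idx = j * width + i;
        r[idx] = (float) i / (width - 1);
        g[idx] = (float) j / (height - 1);
        b[idx] = 0.0f;
    }

    // Sequential reference. In a TornadoVM kernel, the two loop variables
    // would carry the @Parallel annotation.
    static void renderSequential(float[] r, float[] g, float[] b, int width, int height) {
        for (int j = 0; j < height; j++) {
            for (int i = 0; i < width; i++) {
                shadePixel(r, g, b, width, height, i, j);
            }
        }
    }

    // Same body run in arbitrary order on several threads; safe because no two
    // iterations touch the same array element.
    static void renderParallel(float[] r, float[] g, float[] b, int width, int height) {
        IntStream.range(0, width * height).parallel().forEach(p ->
                shadePixel(r, g, b, width, height, p % width, p / width));
    }
}
```

Because the body has no shared mutable state beyond the per-pixel slots, both versions produce the same arrays regardless of execution order.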
The main() method is changed as follows. This will execute both the sequential code and the parallel code.
Java
The project structure
Once the code is executed successfully, the resources folder will contain four files as shown in the screenshot. Files with the _cpu suffix are rendered using the sequential code, and files with the _gpu suffix are rendered using the parallel code.
You must have noticed that PPM files are large compared to JPG files. That is because JPG files are compressed. From now on, in this tutorial, we will save our files in JPG format unless otherwise specified.
The Vec3 Class
The vec3 class in the Ray Tracing In One Weekend uses a double array of 3 elements to save the three components of a vector. In our implementation, we will take a different approach. We will implement a Vec3 class with three individual instance variables of type float.
Note: Although the C++ code uses double variables, we will use float variables in our Java implementation. This is because not all accelerator devices support double precision. Since TornadoVM works with heterogeneous accelerators, we avoid future compatibility issues by using float variables.
The tutorial also declares aliases for color and point. For compatibility purposes with TornadoVM, we will not define new classes for color and point. Instead, we will add semantic getters and setters to the Vec3 class. For color, we will add getR(), getG(), and getB() methods to the Vec3 class, and for point, we'll add getX(), getY(), and getZ() methods. This way, we can use a Vec3 object and call the appropriate method based on the context without confusion.
Java
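A minimal sketch of such a class is shown here to make the design concrete. The name Vec3Sketch and the helper methods (add, scale, dot, length, unit) are assumptions for illustration. Only the three float fields and the getX()/getR() style accessors come from the description above.

```java
// Hypothetical stand-in for the article's Vec3; primitive float fields only.
class Vec3Sketch {
    private final float x, y, z;

    Vec3Sketch(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Point-style accessors.
    float getX() { return x; }
    float getY() { return y; }
    float getZ() { return z; }

    // Color-style accessors over the same three fields.
    float getR() { return x; }
    float getG() { return y; }
    float getB() { return z; }

    Vec3Sketch add(Vec3Sketch o) { return new Vec3Sketch(x + o.x, y + o.y, z + o.z); }

    Vec3Sketch scale(float s) { return new Vec3Sketch(x * s, y * s, z * s); }

    float dot(Vec3Sketch o) { return x * o.x + y * o.y + z * o.z; }

    float length() { return (float) Math.sqrt(dot(this)); }

    // Returns a vector of length 1 pointing in the same direction.
    Vec3Sketch unit() { return scale(1.0f / length()); }
}
```

The color and point accessors read the same fields, so one object can serve either role depending on context.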
Let us use Vec3 class to render the same test pattern.
 Sequential Code:
 We will add render2_TestPattern() to the ImageCPU class.
Java
Parallel Code:
 Let us write the parallel version of the code that uses the Vec3 class. We will add the following methods to the ImageGPU class.
Java
Note:
 Vec3 is used inside the loop annotated with @Parallel. TornadoVM is able to transpile the Java Vec3 object into native code because the Vec3 class itself is composed of primitive types that are handled by TornadoVM, as per TornadoVM specifications.
If Vec3 contained complex instance variables, i.e., standard Java objects, TornadoVM would fail to produce the corresponding accelerator code (OpenCL, PTX, etc.).
Produce Images Using the New Implementation with Vec3 Class
Java
If everything works, the render2_TestPattern() in both ImageCPU and ImageGPU should output the same image as render1_TestPattern().
 The difference here is that we demonstrated that we can use Vec3 with TornadoVM.
Rays
Following along with the Ray Tracing In One Weekend tutorial, we will implement the Ray class.
Java
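In the C++ tutorial a ray is the function P(t) = A + t*b, where A is the origin and b is the direction. A minimal sketch of that idea follows. RaySketch and its nested V3 type are hypothetical names, and V3 is only a stand-in for Vec3 so that the block compiles on its own.

```java
// Hypothetical illustration of a ray; not the article's Ray class.
class RaySketch {
    // Minimal stand-in for Vec3 so this block is self-contained.
    static final class V3 {
        final float x, y, z;

        V3(float x, float y, float z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    final V3 origin;
    final V3 direction;

    RaySketch(V3 origin, V3 direction) {
        this.origin = origin;
        this.direction = direction;
    }

    // Point along the ray at parameter t: P(t) = origin + t * direction.
    V3 at(float t) {
        return new V3(origin.x + t * direction.x,
                      origin.y + t * direction.y,
                      origin.z + t * direction.z);
    }
}
```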
Sending Rays Into the Scene
As per the tutorial, we need to create a simple Camera class that encapsulates all the needed values, such as the camera location, focal length, and viewport information.
Java
Now we need to implement a utility method inside the Camera class that computes the Ray for a given pixel at position (i,j) on the screen.
Java
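The pixel-to-ray mapping can be sketched as below. This follows the viewport formulation of recent editions of the C++ tutorial (pixel centers offset by half a pixel, camera at the origin looking along -z). Whether the article's Camera class uses exactly this formulation is an assumption, and the class and method names are hypothetical.

```java
// Hypothetical illustration; returns the ray direction as {x, y, z}.
class CameraRaySketch {
    // Direction from a camera at the origin through the center of pixel (i, j).
    static float[] rayDirection(int i, int j, int imageWidth, int imageHeight,
                                float viewportHeight, float focalLength) {
        float viewportWidth = viewportHeight * ((float) imageWidth / imageHeight);
        float du = viewportWidth / imageWidth;      // horizontal step per pixel
        float dv = -viewportHeight / imageHeight;   // vertical step (image y points down)
        // Center of pixel (0, 0): upper-left viewport corner plus half a pixel.
        float x00 = -viewportWidth / 2 + 0.5f * du;
        float y00 = viewportHeight / 2 + 0.5f * dv;
        return new float[] { x00 + i * du, y00 + j * dv, -focalLength };
    }
}
```

The returned direction is not normalized, which matches the C++ tutorial at this stage.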
If the Camera, Ray, and Vec3 classes are placed in the ***.math package, the project's structure will be as shown.
Now, we can use the Camera object to render our images. Let us test our code by rendering the gradient background in Listing 10 of the "Ray Tracing in One Weekend" tutorial. Add the following method in the Camera class.
Java
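The gradient in the C++ tutorial is a linear blend between white and light blue, driven by the y component of the normalized ray direction: color = (1 - a) * white + a * (0.5, 0.7, 1.0), with a = 0.5 * (y + 1). A minimal sketch, using a hypothetical class and scalar parameters instead of Vec3 objects:

```java
// Hypothetical illustration of the background gradient; returns {r, g, b}.
class SkyGradientSketch {
    static float[] skyColor(float dx, float dy, float dz) {
        float len = (float) Math.sqrt(dx * dx + dy * dy + dz * dz);
        float a = 0.5f * (dy / len + 1.0f);     // 0 when looking down, 1 when looking up
        return new float[] {
            (1.0f - a) * 1.0f + a * 0.5f,       // red:   white -> 0.5
            (1.0f - a) * 1.0f + a * 0.7f,       // green: white -> 0.7
            (1.0f - a) * 1.0f + a * 1.0f        // blue stays at 1.0
        };
    }
}
```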
To use Camera.render3_GradientColorForPixel() method, we will write the ImageCPU.render3_GradientUserCamera() method.
Sequential Code
Java
Finally, we can render the test image by using the pixel coloring code in the Camera class. Change the code in the main() method to the following:
Java
Running the code will produce the image shown here.
We are done with the sequential part of the code. Let us write the equivalent parallel code.
Parallel Code
Add the following two methods to the ImageGPU class.
Note that the ImageGPU.render_3_GPU_GradientUseCamera() method declares a camera object, then uses this object inside the nested @Parallel loops.
Java
To run the parallel code, we modify the main() method as follows:
Java
Once executed, two identical images will be produced that look like the one above. One image is produced through the sequential code executed on the CPU, and the other through the parallel code executed on the GPU.
Add a Sphere
Following the "Ray Tracing in One Weekend" tutorial, we proceed with the code that renders a red sphere in the middle of the image. We will rewrite the sequential code given in Listing 11 of the tutorial and then write the parallel version.
We need to add a method to the Camera class that determines whether a Ray hits a sphere or not.
Java
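The hit test reduces to the sign of a quadratic's discriminant: a ray with origin Q and direction d hits a sphere with center C and radius r when b^2 - 4ac >= 0, with a = d.d, b = -2 d.(C - Q), and c = (C - Q).(C - Q) - r^2. The sketch below uses that sign convention, which is the one in recent editions of the C++ tutorial. Older editions use Q - C instead, and the article's method may follow either. The class name and scalar parameter list are hypothetical.

```java
// Hypothetical illustration of the boolean ray-sphere test.
class HitSphereSketch {
    static boolean hitSphere(float cx, float cy, float cz, float radius,
                             float ox, float oy, float oz,
                             float dx, float dy, float dz) {
        // oc = center - origin
        float ocx = cx - ox, ocy = cy - oy, ocz = cz - oz;
        float a = dx * dx + dy * dy + dz * dz;
        float b = -2.0f * (dx * ocx + dy * ocy + dz * ocz);
        float c = ocx * ocx + ocy * ocy + ocz * ocz - radius * radius;
        // A non-negative discriminant means the quadratic has real roots.
        return b * b - 4.0f * a * c >= 0.0f;
    }
}
```

This test only checks that the line through the ray meets the sphere, so it also reports a hit for a sphere behind the camera.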
Sequential Code
Recall that we have shifted to writing sequential rendering code inside the Camera class.
Add the Camera.render4_RedSphere(int, int) method shown below to compute the color of the pixel (i, j).
Java
Color a single pixel
Add the ImageCPU.render4_RedSphere() method, as shown below, to compute the color of all pixels sequentially.
I have used lousy naming criteria for methods that might confuse the reader. Please note that there are three methods in three different classes with the same name:
Camera.render4_RedSphere(int, int) that computes the color of a single pixel.
ImageCPU.render4_RedSphere() that computes the color of all pixels in the image.
ImageGPU.render4_RedSphere() is the entry point for the parallel implementation of the rendering code.
Java
Color all pixels sequentially
Parallel Code
Add the following two methods to ImageGPU:
ImageGPU.render4_RedSphere()
ImageGPU.render4_GPU_RedSphere()
Note:
We had to explicitly inline (unfold) the code intended to be executed on the GPU. See the code of ImageGPU.render4_GPU_RedSphere().
This is because, when code is encapsulated within objects, method calls involve dereferencing (pointers to) objects, which many GPU programming models do not allow. Hence, it is better to avoid pointer references in GPU code.
Java
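To show what inlining means in practice, here is a plain-Java sketch of a kernel-style loop in which the camera ray, the sphere test, and the coloring are all written out with scalar floats. It makes no method calls on objects inside the loop. It is not the article's ImageGPU code and uses no TornadoVM types. The class name, the viewport constants, and the sphere position (center (0, 0, -1), radius 0.5) are assumptions taken from the C++ tutorial's defaults.

```java
// Hypothetical illustration of an inlined (unfolded) render loop.
class InlinedKernelSketch {
    static void renderRedSphere(float[] r, float[] g, float[] b, int width, int height) {
        float viewportHeight = 2.0f, focalLength = 1.0f;
        float viewportWidth = viewportHeight * ((float) width / height);
        float du = viewportWidth / width, dv = -viewportHeight / height;
        float x00 = -viewportWidth / 2 + 0.5f * du;
        float y00 = viewportHeight / 2 + 0.5f * dv;
        float cx = 0.0f, cy = 0.0f, cz = -1.0f, radius = 0.5f;

        for (int j = 0; j < height; j++) {          // would be @Parallel in TornadoVM
            for (int i = 0; i < width; i++) {       // would be @Parallel in TornadoVM
                // Ray direction through pixel (i, j); the camera sits at the origin.
                float dx = x00 + i * du, dy = y00 + j * dv, dz = -focalLength;
                // Inlined sphere test (oc = center - origin = center).
                float a = dx * dx + dy * dy + dz * dz;
                float bq = -2.0f * (dx * cx + dy * cy + dz * cz);
                float c = cx * cx + cy * cy + cz * cz - radius * radius;
                boolean hit = bq * bq - 4.0f * a * c >= 0.0f;

                int idx = j * width + i;
                if (hit) {
                    r[idx] = 1.0f; g[idx] = 0.0f; b[idx] = 0.0f;   // red sphere
                } else {
                    // Inlined background gradient.
                    float t = 0.5f * (dy / (float) Math.sqrt(a) + 1.0f);
                    r[idx] = (1.0f - t) + t * 0.5f;
                    g[idx] = (1.0f - t) + t * 0.7f;
                    b[idx] = (1.0f - t) + t * 1.0f;
                }
            }
        }
    }
}
```

The cost of this style is duplicated logic; the benefit is that the loop body touches only primitives and flat arrays.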
We are ready to execute the sequential and the parallel code to render a red sphere in the middle of the image. The main() method is modified accordingly (see below).
Java
Executing the code will produce two identical images with a red sphere in the middle of the image.
Shading with Surface Normals
Moving along in the "Ray Tracing in One Weekend" tutorial, the implementation of the hitSphere() method changes: it now returns a double value instead of a boolean value.
We will overload the hitSphere() method in the Camera class. Because Java cannot overload a method on its return type alone, we need to change the order of the method parameters in the method signature.
Also, recall that we'll use float type instead of double in our implementation.
Java
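A minimal sketch of the value-returning variant and of the normal-based shading it enables. The method returns the smaller root of the quadratic, or -1 when the ray misses. The unit normal N at the hit point is then mapped to a color with 0.5 * (N + 1). The class name, method names, and scalar parameters are hypothetical, and the sign convention (oc = center - origin) is an assumption that matches recent editions of the C++ tutorial.

```java
// Hypothetical illustration; not the article's Camera methods.
class ShadedSphereSketch {
    // Returns the nearest ray parameter t at which the ray meets the sphere, or -1 on a miss.
    static float hitSphere(float cx, float cy, float cz, float radius,
                           float ox, float oy, float oz,
                           float dx, float dy, float dz) {
        float ocx = cx - ox, ocy = cy - oy, ocz = cz - oz;
        float a = dx * dx + dy * dy + dz * dz;
        float b = -2.0f * (dx * ocx + dy * ocy + dz * ocz);
        float c = ocx * ocx + ocy * ocy + ocz * ocz - radius * radius;
        float disc = b * b - 4.0f * a * c;
        if (disc < 0.0f) {
            return -1.0f;
        }
        return (-b - (float) Math.sqrt(disc)) / (2.0f * a);
    }

    // Colors the hit point P = origin + t * direction by its unit surface normal.
    static float[] normalColor(float t, float cx, float cy, float cz,
                               float ox, float oy, float oz,
                               float dx, float dy, float dz) {
        float nx = ox + t * dx - cx, ny = oy + t * dy - cy, nz = oz + t * dz - cz;
        float len = (float) Math.sqrt(nx * nx + ny * ny + nz * nz);
        // Each normal component lies in [-1, 1]; shift and scale it to [0, 1].
        return new float[] { 0.5f * (nx / len + 1.0f),
                             0.5f * (ny / len + 1.0f),
                             0.5f * (nz / len + 1.0f) };
    }
}
```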
Now, we can implement the shading of the sphere surface using the normals.
Sequential Code
Note that the code below uses the new implementation of the hitSphere() method.
Java
Add this method to the Camera class
Lousy naming convention
Again, note that there are three methods in three different classes with the same name:
Camera.render5_ShadedSphere(int, int) that computes the color of a single pixel.
ImageCPU.render5_ShadedSphere() that computes the color of all pixels in the image.
ImageGPU.render5_ShadedSphere() is the entry point for the parallel version of the code.
Now, add the following implementation in the ImageCPU.render5_ShadedSphere() method.
Java
Parallel Code
Once again, note that the code intended to be executed on the GPU is inlined. For example, we do not call the Camera.hitSphere() method inside the @Parallel loop. Instead, we spell out the method's code in the @Parallel loop and avoid method calls on object instances.
This is the workaround that worked for me during my experiments with TornadoVM. I found that writing GPU code in a sequential manner (unfolding the code) and avoiding calls to object methods prevented execution bugs in TornadoVM.
Add the following code in the ImageGPU class.
Java
To see the result of our changes, we can rewrite the main() method as given below.
Java
Executing the main() method now will produce two identical images similar to the one shown here.
One image is produced through the sequential code executed on the CPU, and the other through the parallel code executed on the GPU.
Simplifying the Ray-Sphere Intersection Code
The "Ray Tracing in One Weekend" tutorial optimized the Ray-Sphere intersection code to reduce operations. Let us implement this change in our code. We will write the optimized code in the Camera.hitSphereSimplified() method.
Java
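The simplification comes from substituting b = -2h into the quadratic formula: (-b - sqrt(b^2 - 4ac)) / (2a) becomes (h - sqrt(h^2 - ac)) / a, where h = d.(C - Q). A minimal sketch under that convention follows. Older editions of the C++ tutorial define half_b with the opposite sign, so the article's hitSphereSimplified() may differ in sign handling. The class name and scalar parameters are hypothetical.

```java
// Hypothetical illustration of the simplified ray-sphere intersection.
class HitSphereSimplifiedSketch {
    // Returns the nearest ray parameter t, or -1 on a miss.
    static float hitSphereSimplified(float cx, float cy, float cz, float radius,
                                     float ox, float oy, float oz,
                                     float dx, float dy, float dz) {
        float ocx = cx - ox, ocy = cy - oy, ocz = cz - oz;   // oc = center - origin
        float a = dx * dx + dy * dy + dz * dz;               // squared length of direction
        float h = dx * ocx + dy * ocy + dz * ocz;            // b = -2h
        float c = ocx * ocx + ocy * ocy + ocz * ocz - radius * radius;
        float disc = h * h - a * c;                          // (b^2 - 4ac) / 4
        if (disc < 0.0f) {
            return -1.0f;
        }
        return (h - (float) Math.sqrt(disc)) / a;
    }
}
```

Compared with the unsimplified form, this drops the factors of 2 and 4 but returns the same t.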
Now, let us use this new optimized/simplified method to render the images sequentially and in parallel.
Sequential Code
Java
Add the following code to the Camera.render6_ShadedSphere(int, int) method. This will compute the color of a single pixel.
Java
Add the following code to ImageCPU.render6_ShadedSphere() method. This will compute the color of all pixels in the image.
Lousy naming convention
Again, note that there are three methods in three different classes with the same name:
Camera.render6_ShadedSphere(int, int) that computes the color of a single pixel.
ImageCPU.render6_ShadedSphere() that computes the color of all pixels in the image.
ImageGPU.render6_ShadedSphere() is the entry point for the parallel version of the code.
Parallel Code
Again, we unfold the code intended to be executed on the GPU inside the @Parallel loop to avoid pointer reference errors.
Java
Write the following two methods in the ImageGPU class.
Modify the main() method accordingly...
Java
The new modification will produce images identical to the ones produced in the previous step, i.e., using the render5_*() methods. However, this time, the implementation uses optimized code for computing the Ray-Sphere hit.
Congratulations. You have made it this far. Now that we've started getting the hang of converting sequential code to parallel, let us continue with the tutorial. But first, we need to follow the abstraction of Hittable objects as covered in the "Ray Tracing in One Weekend" tutorial.