Parallel Computing

To learn TornadoVM, let us work through a simple exercise: rewriting the "Ray Tracing in One Weekend" tutorial in Java with TornadoVM.

Assumption: 

You have installed TornadoVM on your machine and added the TornadoVM javaFlags to your IntelliJ IDEA project (see this tutorial).

Create a new Maven project. Then, modify the pom.xml file of the project to the following:

Here, we add the "TornadoVM" repository, to be used later. 

In the dependencies, we add the TornadoVM dependencies. 

We will also use Lombok and Slf4j for logging.

Output an Image

The tutorial starts with a code snippet that saves an image in PPM format. The C++ code looks like this:

Since Java is an object-oriented language, we will develop our code in an object-oriented style. Let us start with an interface for an image that will be implemented by all classes containing the translated code.

Java

Though the tutorial only covers saving an image in PPM format, we will add two additional functionalities: saving the image in JPG format and converting image data into a BufferedImage object for further internal processing.

Next, we create an ImageCPU class that implements the interface and contains all the sequential code given in the tutorial.

Java

The class contains three float arrays, one for each color channel (red, green, and blue), and the dimensions of the image (width and height). The constructor initializes all the instance variables.

Then, we implement (override) the saveImageInPPMFormat() method for this class. This is a direct translation of the C++ code in the "Ray Tracing in One Weekend" tutorial.
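As a rough illustration of what such a translation might look like, the sketch below builds the plain-text (P3) PPM layout from three float arrays. The class name PpmSketch, the toPpm() helper, and the row-major layout of the arrays are assumptions made for this example; the actual ImageCPU code may differ.

```java
// Hypothetical helper for illustration; not the article's ImageCPU class.
class PpmSketch {

    // Builds a plain-text (P3) PPM document from per-channel values in [0, 1].
    // Assumes pixel (i, j) is stored at index j * width + i.
    static String toPpm(float[] r, float[] g, float[] b, int width, int height) {
        StringBuilder sb = new StringBuilder();
        sb.append("P3\n").append(width).append(' ').append(height).append("\n255\n");
        for (int j = 0; j < height; j++) {
            for (int i = 0; i < width; i++) {
                int idx = j * width + i;
                // Scale each channel to 0..255 by multiplying with 255.999 and truncating.
                int ir = (int) (255.999f * r[idx]);
                int ig = (int) (255.999f * g[idx]);
                int ib = (int) (255.999f * b[idx]);
                sb.append(ir).append(' ').append(ig).append(' ').append(ib).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A 2x1 image: one red pixel, one blue-ish pixel.
        float[] r = {1.0f, 0.0f};
        float[] g = {0.0f, 0.5f};
        float[] b = {0.0f, 1.0f};
        System.out.print(toPpm(r, g, b, 2, 1));
    }
}
```

The resulting string can be written to disk with java.nio.file.Files.writeString().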

Java

Let us implement the getImage() method that returns the image as a BufferedImage object.
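One possible way to do this conversion is sketched below. The class and helper names (BufferedImageSketch, toImage(), to8Bit()) and the row-major array layout are assumptions for this example only; clamping to [0, 1] is an extra safety step that the original code may or may not perform.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; not the article's ImageCPU class.
class BufferedImageSketch {

    // Packs three float channels (values in [0, 1]) into a TYPE_INT_RGB image.
    // Assumes pixel (i, j) is stored at index j * width + i.
    static BufferedImage toImage(float[] r, float[] g, float[] b, int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int j = 0; j < height; j++) {
            for (int i = 0; i < width; i++) {
                int idx = j * width + i;
                int rgb = (to8Bit(r[idx]) << 16) | (to8Bit(g[idx]) << 8) | to8Bit(b[idx]);
                img.setRGB(i, j, rgb);
            }
        }
        return img;
    }

    // Clamps to [0, 1] and scales to 0..255.
    static int to8Bit(float v) {
        float clamped = Math.max(0.0f, Math.min(1.0f, v));
        return (int) (255.999f * clamped);
    }

    public static void main(String[] args) {
        BufferedImage img = toImage(new float[]{1f}, new float[]{0f}, new float[]{0f}, 1, 1);
        System.out.printf("%06X%n", img.getRGB(0, 0) & 0xFFFFFF);
    }
}
```

An image built this way can be passed to javax.imageio.ImageIO.write(image, "jpg", file) to save it as a JPG.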

Java

Now we can implement (override) the saveImageInJPGFormat() method.

Java

We are ready to write the code that outputs the gradient test pattern. 

Add the following method in the ImageCPU class:

Note: 

Our approach is that as we progress in the tutorial, each fundamental change is implemented in a stand-alone method inside the relevant class.
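For orientation, a minimal sketch of the gradient test pattern follows. It assumes row-major float arrays and sets the blue channel to 0; the class name TestPatternSketch is hypothetical, and the blue value is an assumption because it differs between versions of the original tutorial.

```java
// Hypothetical stand-alone version of the gradient test pattern.
class TestPatternSketch {

    // Red grows from left to right, green grows from top to bottom.
    // Assumes width > 1 and height > 1, and pixel (i, j) at index j * width + i.
    static void render(float[] r, float[] g, float[] b, int width, int height) {
        for (int j = 0; j < height; j++) {
            for (int i = 0; i < width; i++) {
                int idx = j * width + i;
                r[idx] = (float) i / (width - 1);
                g[idx] = (float) j / (height - 1);
                b[idx] = 0.0f; // assumed constant
            }
        }
    }

    public static void main(String[] args) {
        int width = 3, height = 2;
        float[] r = new float[width * height];
        float[] g = new float[width * height];
        float[] b = new float[width * height];
        render(r, g, b, width, height);
        System.out.println(java.util.Arrays.toString(r));
        System.out.println(java.util.Arrays.toString(g));
    }
}
```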

Java

To see some output, we now implement the Main class, which contains the public static void main() entry point of our project.

Java

After execution, two new files will be created in the resources folder: one with a JPG extension and the other with a PPM extension. 

Both files should contain an image like the one shown here.

Convert the Sequential Code into Parallel Code and execute it using TornadoVM

Some bookkeeping: 

We create an AppManager class to maintain application-level constants. 

Java

In this singleton class, we will define a few variables for use later.

A snapshot of the devices on one of my machines.

Java

A slight modification to the main() method prints out the TornadoVM information.

Implement a Container for Parallel Code

Java

The ImageGPU class uses the TornadoVM programming model. It stores its pixel data in FloatArray objects, which TornadoVM keeps in off-heap memory and uses to move data between host and device.

Write Parallel Code for generating the test pattern on the GPU

We will implement the parallel version of ImageCPU.render1_TestPattern() inside the ImageGPU class.

We use the TornadoVM programming model, where a TaskGraph is defined, configured, and then executed. This is done in ImageGPU.render1_TestPattern() method.

The actual parallel code is found in ImageGPU.render_GPU_TestPattern(). This is the code that will be translated to GPU code (OpenCL, PTX, etc.). Note the @Parallel loop annotation.
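The test pattern suits this model because every pixel is computed independently and written to its own array index, so the iterations can run in any order. The sketch below illustrates only that property, using plain arrays and java.util.stream parallel streams as a stand-in. It does not use TornadoVM, @Parallel, or FloatArray, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Stand-in illustration of data-parallel pixel loops; NOT TornadoVM code.
class IndependentPixelsSketch {

    // Reference version: ordinary nested loops.
    static void renderSequential(float[] r, float[] g, int width, int height) {
        for (int j = 0; j < height; j++) {
            for (int i = 0; i < width; i++) {
                int idx = j * width + i;
                r[idx] = (float) i / (width - 1);
                g[idx] = (float) j / (height - 1);
            }
        }
    }

    // Parallel version: each index is handled by exactly one iteration,
    // and no iteration reads what another one writes.
    static void renderParallel(float[] r, float[] g, int width, int height) {
        IntStream.range(0, width * height).parallel().forEach(idx -> {
            int i = idx % width;
            int j = idx / width;
            r[idx] = (float) i / (width - 1);
            g[idx] = (float) j / (height - 1);
        });
    }

    public static void main(String[] args) {
        int width = 16, height = 9;
        float[] r1 = new float[width * height], g1 = new float[width * height];
        float[] r2 = new float[width * height], g2 = new float[width * height];
        renderSequential(r1, g1, width, height);
        renderParallel(r2, g2, width, height);
        System.out.println(Arrays.equals(r1, r2) && Arrays.equals(g1, g2));
    }
}
```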

Java

The main() method is changed as follows. This will execute both the sequential code and the parallel code. 

Java

The project structure

Once the code is executed successfully, the resources folder will contain four files as shown in the screenshot. Files with the _cpu suffix are rendered using the sequential code, and files with the _gpu suffix are rendered using the parallel code.

You may have noticed that the PPM files are much larger than the JPG files. That is because JPG files are compressed, whereas PPM files are not. From now on, in this tutorial, we will save our files in JPG format unless otherwise specified.

The Vec3 Class

The vec3 class in "Ray Tracing in One Weekend" uses an array of three doubles to store the three components of a vector. In our implementation, we will take a different approach: a Vec3 class with three individual instance variables of type float.

 Note: Although the C++ code uses double variables, we will use float variables in our Java implementation. This is because not all accelerator devices support double precision. Since TornadoVM works with heterogeneous accelerators, we avoid future compatibility issues by using float variables.

The tutorial also declares aliases for color and point. For compatibility purposes with TornadoVM, we will not define new classes for color and point. Instead, we will add semantic getters and setters to the Vec3 class. For color, we will add getR(), getG(), and getB() methods to the Vec3 class, and for point, we'll add getX(), getY(), and getZ() methods. This way, we can use a Vec3 object and call the appropriate method based on the context without confusion.
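A minimal sketch of this idea is shown below. The class is named Vec3Sketch to make clear that it is an illustration, and the helper methods (add(), scale(), dot(), length()) are assumptions about what such a class would typically offer. Setters would follow the same pattern as the getters.

```java
// Hypothetical illustration of a float-based vector with semantic getters.
class Vec3Sketch {
    private final float x;
    private final float y;
    private final float z;

    Vec3Sketch(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Point view of the three components.
    float getX() { return x; }
    float getY() { return y; }
    float getZ() { return z; }

    // Color view of the same three components.
    float getR() { return x; }
    float getG() { return y; }
    float getB() { return z; }

    Vec3Sketch add(Vec3Sketch o) { return new Vec3Sketch(x + o.x, y + o.y, z + o.z); }

    Vec3Sketch scale(float s) { return new Vec3Sketch(x * s, y * s, z * s); }

    float dot(Vec3Sketch o) { return x * o.x + y * o.y + z * o.z; }

    float length() { return (float) Math.sqrt(dot(this)); }

    public static void main(String[] args) {
        Vec3Sketch color = new Vec3Sketch(0.5f, 0.7f, 1.0f);
        System.out.println(color.getR() + " " + color.getG() + " " + color.getB());
    }
}
```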

Java

Let us use the Vec3 class to render the same test pattern.

 Sequential Code:

 We will add render2_TestPattern() to the ImageCPU class.

Java

Parallel Code:

 Let us write the parallel version of the code that uses the Vec3 class. We will add the following methods to the ImageGPU class.

Java

Note:

 Vec3 is used inside the loop annotated with @Parallel. TornadoVM is able to transpile the Java Vec3 object into native code because the Vec3 class itself is composed of primitive types that are handled by TornadoVM, as per TornadoVM specifications.

 If Vec3 contained complex instance variables, i.e., standard Java objects, TornadoVM would fail to produce the corresponding accelerator code (in OpenCL, PTX, etc.).

Produce Images Using the New Implementation with Vec3 Class

Java

If everything works, the render2_TestPattern() in both ImageCPU and ImageGPU should output the same image as render1_TestPattern().

 The difference here is that we demonstrated that we can use Vec3 with TornadoVM.

Rays

Following along with the Ray Tracing In One Weekend tutorial, we will implement the Ray class.
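In the original tutorial, a ray is the function P(t) = A + t·b, where A is the origin and b is the direction. The sketch below captures just that. To stay self-contained it stores the components as plain floats rather than Vec3 objects, which is an assumption of this example and not necessarily how the Ray class in this project is written.

```java
// Hypothetical illustration of a ray P(t) = origin + t * direction.
class RaySketch {
    final float ox, oy, oz; // origin
    final float dx, dy, dz; // direction

    RaySketch(float ox, float oy, float oz, float dx, float dy, float dz) {
        this.ox = ox;
        this.oy = oy;
        this.oz = oz;
        this.dx = dx;
        this.dy = dy;
        this.dz = dz;
    }

    // Returns the point {x, y, z} reached after travelling t along the ray.
    float[] at(float t) {
        return new float[] {ox + t * dx, oy + t * dy, oz + t * dz};
    }

    public static void main(String[] args) {
        RaySketch ray = new RaySketch(1f, 0f, 0f, 0f, 1f, -1f);
        System.out.println(java.util.Arrays.toString(ray.at(2f)));
    }
}
```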

Java

Sending Rays Into the Scene

As per the tutorial, we need to create a simple Camera class that encapsulates all the needed values, such as the camera location, focal length, and viewport information.

Java

Now we need to implement a utility method inside the Camera class that computes the Ray for a given pixel at position (i,j) on the screen.
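The geometry behind such a method can be sketched as follows, assuming the setup of the original tutorial: camera center at the origin, a viewport of height 2.0 at focal length 1.0, and pixel centers offset by half a pixel from the viewport's upper-left corner. The class and method names are hypothetical, and the direction is returned as a plain float array to keep the example self-contained.

```java
// Hypothetical illustration of mapping pixel (i, j) to a ray direction.
class CameraRaySketch {
    final int imageWidth;
    final int imageHeight;
    final float focalLength = 1.0f;    // assumed, as in the original tutorial
    final float viewportHeight = 2.0f; // assumed, as in the original tutorial
    final float viewportWidth;

    CameraRaySketch(int imageWidth, int imageHeight) {
        this.imageWidth = imageWidth;
        this.imageHeight = imageHeight;
        this.viewportWidth = viewportHeight * ((float) imageWidth / imageHeight);
    }

    // Direction {dx, dy, dz} from the camera center (0, 0, 0) to the center of pixel (i, j).
    // i counts columns from the left, j counts rows from the top.
    float[] rayDirection(int i, int j) {
        float pixelDeltaU = viewportWidth / imageWidth;   // horizontal pixel spacing
        float pixelDeltaV = viewportHeight / imageHeight; // vertical pixel spacing
        float x = -viewportWidth / 2 + (i + 0.5f) * pixelDeltaU;
        float y = viewportHeight / 2 - (j + 0.5f) * pixelDeltaV;
        float z = -focalLength; // the camera looks down the negative z axis
        return new float[] {x, y, z};
    }

    public static void main(String[] args) {
        CameraRaySketch camera = new CameraRaySketch(3, 3);
        System.out.println(java.util.Arrays.toString(camera.rayDirection(1, 1)));
    }
}
```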

Java

If the Camera, Ray, and Vec3 classes are placed in the ***.math package, the project's structure will be as shown.

Now, we can use the Camera object to render our images. Let us test our code by rendering the gradient background in Listing 10 of the "Ray Tracing in One Weekend" tutorial. Add the following method in the Camera class.
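The gradient itself is a linear blend between white and light blue, driven by the y component of the normalized ray direction. A minimal sketch of that computation is given below; the class and method names are hypothetical, and the blue tone (0.5, 0.7, 1.0) is the one used in the original tutorial.

```java
// Hypothetical illustration of the background gradient for one ray direction.
class GradientSketch {

    // Returns {r, g, b} for a ray with direction (dx, dy, dz).
    static float[] gradientColor(float dx, float dy, float dz) {
        float length = (float) Math.sqrt(dx * dx + dy * dy + dz * dz);
        float unitY = dy / length;          // in [-1, 1]
        float a = 0.5f * (unitY + 1.0f);    // remapped to [0, 1]
        // Blend: (1 - a) * white + a * blue
        float r = (1.0f - a) * 1.0f + a * 0.5f;
        float g = (1.0f - a) * 1.0f + a * 0.7f;
        float b = (1.0f - a) * 1.0f + a * 1.0f;
        return new float[] {r, g, b};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(gradientColor(0f, 1f, 0f)));
        System.out.println(java.util.Arrays.toString(gradientColor(0f, -1f, 0f)));
    }
}
```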

Java

From this point forward, we will write the sequential rendering code inside the Camera class and call this code from within the ImageCPU class. For the parallel rendering code, we will take a different approach, shown later in this tutorial.

To use the Camera.render3_GradientColorForPixel() method, we will write the ImageCPU.render3_GradientUserCamera() method.

Sequential Code

Java

Finally, we can render the test image by using the pixel coloring code in the Camera class. Change the code in the main() method to the following:

Java

Running the code will produce the image shown here.

We are done with the sequential part of the code. Let us write the equivalent parallel code.

Parallel Code

Add the following two methods to the ImageGPU class.

Note that the ImageGPU.render_3_GPU_GradientUseCamera() method declares a Camera object and then uses this object inside the nested @Parallel loops.

Java

To run the parallel code, we modify the main() method as follows:

Java

Once executed, two identical images will be produced that look like the one above. One image is produced through the sequential code executed on the CPU, and the other through the parallel code executed on the GPU.

Add a Sphere

Following the "Ray Tracing in One Weekend" tutorial, we proceed with the code that renders a red sphere in the middle of the image. We will rewrite the sequential code given in Listing 11 of the tutorial and then write the parallel version. 

We need to add a method to the Camera class that determines whether a Ray hits a sphere or not.
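The test boils down to solving a quadratic in t and checking the sign of its discriminant. Here is a minimal sketch of that check, using plain float arrays instead of Vec3 and Ray so it stays self-contained; the class name and parameter list are assumptions for this example.

```java
// Hypothetical illustration of the ray-sphere hit test via the discriminant.
class HitSphereSketch {

    static float dot(float[] u, float[] v) {
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
    }

    // True if the line through 'origin' along 'direction' touches the sphere.
    // As in the original tutorial at this stage, hits behind the origin also count.
    static boolean hitSphere(float[] center, float radius, float[] origin, float[] direction) {
        float[] oc = {center[0] - origin[0], center[1] - origin[1], center[2] - origin[2]};
        float a = dot(direction, direction);
        float b = -2.0f * dot(direction, oc);
        float c = dot(oc, oc) - radius * radius;
        float discriminant = b * b - 4.0f * a * c;
        return discriminant >= 0.0f;
    }

    public static void main(String[] args) {
        float[] center = {0f, 0f, -1f};
        float[] origin = {0f, 0f, 0f};
        System.out.println(hitSphere(center, 0.5f, origin, new float[] {0f, 0f, -1f}));
        System.out.println(hitSphere(center, 0.5f, origin, new float[] {0f, 1f, 0f}));
    }
}
```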

Java

Sequential Code


Recall that we have shifted to writing sequential rendering code inside the Camera class. 

Add the Camera.render4_RedSphere(int, int) method shown below to compute the color of the pixel (i, j).

Java

Color a single pixel

Add the ImageCPU.render4_RedSphere() method, as shown below, to compute the color of all pixels sequentially.

I have used a poor naming convention for methods, which might confuse the reader. Please note that there are three methods in three different classes with the same name:

Java

Color all pixels sequentially

Parallel Code 

Add the following two methods to ImageGPU:

Note:

 We had to explicitly inline (unfold) the code intended to be executed on the GPU instead of calling methods on objects. See the code of ImageGPU.render4_GPU_RedSphere().

The reason is that when code is encapsulated in objects, method calls involve dereferencing pointers to those objects, which many GPU programming models do not allow. Hence, it is better to avoid object references in GPU code.

Java

We are ready to execute the sequential and the parallel code to render a red sphere in the middle of the image. The main() method is modified accordingly (see below).

Java

Executing the code will produce two identical images with a red sphere in the middle of the image.

Shading with Surface Normals

Moving along in the "Ray Tracing in One Weekend" tutorial, we notice that the implementation of the hitSphere() method changes: the method now returns a double value instead of a boolean value.

We will overload the hitSphere() method in the Camera class. However, Java does not allow two overloads that differ only in their return type, so we need to change the order of the parameters in the method signature.

Also, recall that we'll use float type instead of double in our implementation.
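The two points above can be sketched together as follows. Both overloads are shown side by side: the boolean version and a float version that returns the nearest intersection parameter t, or -1 on a miss. The parameter lists are hypothetical and use float arrays for self-containment; what matters is that the parameter order differs, which is what lets Java tell the overloads apart.

```java
// Hypothetical illustration of overloading hitSphere() by parameter order.
class HitSphereOverloadSketch {

    static float dot(float[] u, float[] v) {
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
    }

    // Earlier version: only reports whether there is a hit.
    static boolean hitSphere(float[] center, float radius, float[] origin, float[] direction) {
        return hitSphere(origin, direction, center, radius) != -1.0f;
    }

    // Newer version: same inputs in a different order, returns t of the nearest
    // intersection along the ray, or -1 if the ray's line misses the sphere.
    static float hitSphere(float[] origin, float[] direction, float[] center, float radius) {
        float[] oc = {center[0] - origin[0], center[1] - origin[1], center[2] - origin[2]};
        float a = dot(direction, direction);
        float b = -2.0f * dot(direction, oc);
        float c = dot(oc, oc) - radius * radius;
        float discriminant = b * b - 4.0f * a * c;
        if (discriminant < 0.0f) {
            return -1.0f;
        }
        return (-b - (float) Math.sqrt(discriminant)) / (2.0f * a);
    }

    public static void main(String[] args) {
        float[] origin = {0f, 0f, 0f};
        float[] center = {0f, 0f, -1f};
        float[] direction = {0f, 0f, -1f};
        System.out.println(hitSphere(origin, direction, center, 0.5f));
        System.out.println(hitSphere(center, 0.5f, origin, direction));
    }
}
```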

Java

Now, we can implement the shading of the sphere surface using the normals. 

Sequential Code 

Note that the code below uses the new implementation of the hitSphere() method.

Java

Add this method to the Camera class

Lousy naming convention

Again, note that there are three methods in three different classes with the same name:

Now, add the following implementation in the ImageCPU.render5_ShadedSphere() method.

Java

Parallel Code

Once again, note that the code intended to be executed on the GPU is inlined (unfolded). For example, we do not call the Camera.hitSphere() method inside the @Parallel loop. Instead, we spell out the method's code in the @Parallel loop and avoid method calls on object instances.

This is the workaround that worked for me during my experiments with TornadoVM. I found that writing the GPU code in this unfolded manner and avoiding calls to object methods prevented execution bugs in TornadoVM.
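To make the unfolding style concrete, here is a sketch of a shaded-sphere loop in which the ray setup, the hit test, and the shading are all written inline with local float variables. It uses plain float arrays and ordinary loops so that it runs on a stock JDK; in a TornadoVM version the loops would carry @Parallel and the arrays would be FloatArray. The class name, the camera constants, and the sphere at (0, 0, -1) with radius 0.5 are assumptions taken from the original tutorial's setup.

```java
// Hypothetical illustration of "unfolded" per-pixel code; NOT TornadoVM code.
class UnfoldedKernelSketch {

    static void renderShadedSphere(float[] r, float[] g, float[] b, int width, int height) {
        float viewportHeight = 2.0f;
        float viewportWidth = viewportHeight * ((float) width / height);
        float focalLength = 1.0f;
        float cx = 0.0f, cy = 0.0f, cz = -1.0f, radius = 0.5f; // sphere
        for (int j = 0; j < height; j++) {
            for (int i = 0; i < width; i++) {
                int idx = j * width + i;
                // Ray direction, unfolded from the camera (camera center at the origin).
                float dx = -viewportWidth / 2 + (i + 0.5f) * viewportWidth / width;
                float dy = viewportHeight / 2 - (j + 0.5f) * viewportHeight / height;
                float dz = -focalLength;
                // Hit test, unfolded; with the origin at (0, 0, 0), oc equals the center.
                float a = dx * dx + dy * dy + dz * dz;
                float bq = -2.0f * (dx * cx + dy * cy + dz * cz);
                float c = cx * cx + cy * cy + cz * cz - radius * radius;
                float discriminant = bq * bq - 4.0f * a * c;
                float t = discriminant < 0.0f
                        ? -1.0f
                        : (-bq - (float) Math.sqrt(discriminant)) / (2.0f * a);
                if (t > 0.0f) {
                    // Surface normal at the hit point, mapped from [-1, 1] to [0, 1].
                    r[idx] = 0.5f * ((t * dx - cx) / radius + 1.0f);
                    g[idx] = 0.5f * ((t * dy - cy) / radius + 1.0f);
                    b[idx] = 0.5f * ((t * dz - cz) / radius + 1.0f);
                } else {
                    // Background gradient between white and light blue.
                    float blend = 0.5f * (dy / (float) Math.sqrt(a) + 1.0f);
                    r[idx] = (1.0f - blend) + blend * 0.5f;
                    g[idx] = (1.0f - blend) + blend * 0.7f;
                    b[idx] = (1.0f - blend) + blend * 1.0f;
                }
            }
        }
    }

    public static void main(String[] args) {
        float[] r = new float[9], g = new float[9], b = new float[9];
        renderShadedSphere(r, g, b, 3, 3);
        System.out.println(r[4] + " " + g[4] + " " + b[4]);
    }
}
```

The price of this style is duplication: the same hit-test arithmetic appears in both the sequential and the unfolded version and has to be kept in sync by hand.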

Add the following code in the ImageGPU class.

Java

To see the result of our changes, we can rewrite the main() method as given below.

Java

Executing the main() method now will produce two identical images similar to the one shown here.

One image is produced through the sequential code executed on the CPU, and the other through the parallel code executed on the GPU.

Simplifying the Ray-Sphere Intersection Code

The "Ray Tracing in One Weekend" tutorial simplifies the ray-sphere intersection code to reduce the number of operations. Let us implement this change in our code. We will write the optimized code in the Camera.hitSphereSimplified() method.
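The simplification comes from substituting b = -2h into the quadratic formula, which cancels the factors of 2 and leaves t = (h - sqrt(h·h - a·c)) / a for the nearest root. A minimal sketch of that form is given below; the parameter list with float arrays is an assumption for this example and may not match the actual Camera.hitSphereSimplified() signature.

```java
// Hypothetical illustration of the simplified ray-sphere intersection.
class HitSphereSimplifiedSketch {

    static float dot(float[] u, float[] v) {
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
    }

    // Returns t of the nearest intersection, or -1 if there is none.
    static float hitSphereSimplified(float[] origin, float[] direction, float[] center, float radius) {
        float[] oc = {center[0] - origin[0], center[1] - origin[1], center[2] - origin[2]};
        float a = dot(direction, direction);   // squared length of the direction
        float h = dot(direction, oc);          // h = -b / 2
        float c = dot(oc, oc) - radius * radius;
        float discriminant = h * h - a * c;    // a quarter of b*b - 4*a*c
        if (discriminant < 0.0f) {
            return -1.0f;
        }
        return (h - (float) Math.sqrt(discriminant)) / a;
    }

    public static void main(String[] args) {
        float[] origin = {0f, 0f, 0f};
        float[] center = {0f, 0f, -1f};
        System.out.println(hitSphereSimplified(origin, new float[] {0f, 0f, -1f}, center, 0.5f));
    }
}
```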

Java

Now, let us use this new optimized/simplified method to render the images sequentially and in parallel.

Sequential Code

Java

Add the following code to the Camera.render6_ShadedSphere(int, int) method. This will compute the color of a single pixel.

Java

Add the following code to ImageCPU.render6_ShadedSphere() method. This will compute the color of all pixels in the image.

Lousy naming convention

I do feel embarrassed opting for lousy naming conventions. 

Again, note that there are three methods in three different classes with the same name:

Parallel Code

Again, we unfold the code intended to be executed on the GPU inside the @Parallel loop to avoid pointer reference errors.

Java

Write the following two methods in the ImageGPU class.

Modify the main() method accordingly...

Java

The new modification will produce images identical to the ones produced in the previous step, i.e., using the render5_*() methods. This time, however, the implementation uses the optimized code for computing the ray-sphere hit.

Congratulations. You have made it this far. Now that we've started getting the hang of converting sequential code to parallel, let us continue with the tutorial. But first, we need to follow the abstraction of Hittable objects as covered in the "Ray Tracing in One Weekend" tutorial.