By Albert Lee
With the explosion of AI applications, the demand for parallel compute is growing exponentially. However, while specialized components like GPUs that accelerate these applications are nearly ubiquitous. Their utilization in consumer hardware usually is limited to driving the display, and rendering graphics.
WebGPU intends to bridge this gap, by giving the browser the ability to interface with the user’s GPU regardless of operating system and GPU model. However this comes with heavy security considerations. It’s critical for WGSL (WebGPU Shading Language) to be safe, meaning it can’t access resources from other applications. Otherwise it can expose a critical attack vector that can be activated simply by entering a website.
Our work centers around fuzzing, a software testing technique that generates random but valid WGSL programs. These programs will stress certain parts of the compiler in order to find bugs. We create patterns that are designed to test certain undefined behaviors such as integer overflow and division by zero. Then using register pressure which acts as a catalyst for exposing these bugs leaking data from other applications.
The primary work we draw from is the paper “Bounding Data Races in Space and Time” by Dolan et. al, which created a model for evaluating semantics for parallel programs. The concept we focus on is data races in space.
The example displayed in the paper, shows how a data race can affect memory unrelated to what the data race touches through a overoptimistic compiler not anticipating concurrent accesses.
Assume we have a statement, `b = a + 10`. And there are no other accesses to 'a' or 'b'. The logical outcome after this statement is `b = a + 10`
However, assume we have a shared variable called c which is initialized to 1. And the program is as follows:
The compiler saving two redundant computations, might reinterpret this statement as:
Then register pressure might cause a smart register allocator to save some stack space by moving c to b.
However this runs into a critical flaw. We assume that the result of b = a + 10. However, b erroneously equals 1 when the program terminates. Thus a data race happens to affect memory wholly unrelated where the data race happened!
Thread 1: Thread 2:
b = a + 10
----------------------------- -----------------------------
c = a + 10;
// some computation c = 1
b = a + 10;
----------------------------- -----------------------------
t = a + 10;
c = t;
// some computation c = 1
b = t;
----------------------------- -----------------------------
t = a + 10;
c = t;
// some computation c = 1
b = c;
An example shader being run on the website
My primary contribution is the website which interfaces with WebGPU through the Chrome Browser. https://gpuharbor.ucsc.edu/webgpu-race-testing/
This website has the capability of generating random shaders using WGSLsmith and running them in the browser.
This allows many different systems to run these fuzzed shaders allowing tests to be run on a wide variety operating systems and GPUs.
A example run occurring in WGSL
When accessing memory in WGSL, when an out of bounds memory access occurs there are two scenarios. Either it is clamped, meaning that there is a min call so that the resulting index is within the array. Or the access returns 0 no matter the contents of the array.
To exploit this, we generate this pattern that attempts to remove this min call, accessing the data beyond the buffer. An example of this is shown below.
A psuedo-code snippet showing the pattern
The central part of this pattern is the if statement. This convinces the compiler to remove the min call for clamping, then we insert a data race as the bottom. We change the index buffer to some large number to access a value beyond the data buffer. We also modify this pattern to do different actions such as integer overflow by addition/multiplication, and divide by zero.
An example shader with register pressure enabled.
However, there is an important facet when it comes to generating these shaders. GPUs have a lot of registers per individual thread which can store variables local to it. However, this is bad for pattern generation since we don’t want these variables to remain in the register, unable to be changed by other threads. So we want to encourage the compiler to swap these registers out into memory when necessary.
Thus in addition to the pattern, we pad every statement between the pattern in order to force data to be exposed to other threads. This is done by generating a lot of variables, and then using them continously through out the shader.
The primary violations we we're looking for are: mismatches, out of bounds nonzero, and uninitialized violations.
Mismatches are when unrelated sections of memory are affected by a data race in the program.
Out of bounds nonzero violations are when an buffer is accessed with an index larger than its size, and it returns a non-zero value.
Uninitialized violations are when a uninitialized variable has a value other than zero.
A sample of outcomes from our website shows a variety of violations.
The Apple M3 with Metal has some non zero out of bounds violations meaning that it extracted program data from another application that was run on the system.
And both the Nvidia GPU on Direct3D and the Apple M1 with Metal has shown some mismatches, which mean that a data race has affected memory in an unrelated location.
We are still collecting data from a large scale study being done, to characterize the behavior of a large range of systems.
Thank you Koret Foundation for the opportunity to pursue my research and allowing me to dedicate my time towards this project.
Thank you, Reese Levine and Tyler Sorensen, my mentors, for supporting me and my research. Without your guidance through so many trials, none of this would be possible.
Dolan, Stephen, et al. “Bounding data races in space and Time.” Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 11 June 2018, https://doi.org/10.1145/3192366.3192421.