GPU acceleration: 20x faster

A Free-Vortex Wake model can become quite time-intensive when modeling additional turns of wake filaments, multiple rotors, or ground effect. This is because each wake filament marker interacts with all the other markers, so the cost scales as O(N^2) in the number of markers N. One acceleration technique is to use a fast multipole method, which reduces the complexity to O(N log N). Another is to run the computation on an Nvidia GPU using CUDA. With hundreds of processors on these relatively inexpensive devices, many computations can be executed simultaneously. Here, I give a brief overview of how code executes on the GPU.
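To make the N-squared cost concrete, here is a minimal sketch of an all-pairs velocity evaluation. It assumes a 2D regularized point-vortex kernel as a stand-in for the 3D filament calculation used in an actual free-vortex wake code; the function name, the `core_radius` parameter, and the marker layout `(x, y, strength)` are hypothetical choices for illustration only.

```python
import math

def induced_velocities(markers, core_radius=0.1):
    """Velocity induced at each marker by every other marker.

    markers: list of (x, y, gamma) tuples (assumed layout, 2D point vortices).
    Returns (velocities, interactions), where interactions counts the
    pairwise evaluations performed.
    """
    velocities = []
    interactions = 0
    for i, (xi, yi, _) in enumerate(markers):
        u = v = 0.0
        for j, (xj, yj, gamma_j) in enumerate(markers):
            if i == j:
                continue  # a marker does not induce velocity on itself
            dx, dy = xi - xj, yi - yj
            # Regularized squared distance avoids the singularity at r = 0
            r2 = dx * dx + dy * dy + core_radius ** 2
            coeff = gamma_j / (2.0 * math.pi * r2)
            u += -coeff * dy
            v += coeff * dx
            interactions += 1
        velocities.append((u, v))
    return velocities, interactions
```

Doubling the number of markers roughly quadruples `interactions`, since the inner loop runs N - 1 times for each of the N markers. Every outer-loop iteration is independent of the others, which is what makes the problem a good fit for the GPU.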

A thread is the smallest unit of execution; it performs a single task (e.g. one addition).

A block is a set of threads that perform the same task on different data. To add two sets of numbers, we can assign each addition to its own thread and execute all of them in parallel.

A grid is a set of blocks, like a matrix of threads.

A grid can arrange its blocks, and a block its threads, in up to 3 dimensions.
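The thread/block/grid hierarchy can be sketched on the CPU as follows. This is an emulation written in Python, not GPU code: the two loops stand in for blocks and threads that would run concurrently on the device, and the function name and the `threads_per_block` value are hypothetical. In CUDA, the index computed below corresponds to `blockIdx.x * blockDim.x + threadIdx.x`.

```python
def launch_vector_add(a, b, threads_per_block=4):
    """Emulate a 1D grid of 1D blocks adding two equal-length lists."""
    n = len(a)
    # Enough blocks to cover all n elements (rounded up)
    blocks_per_grid = (n + threads_per_block - 1) // threads_per_block
    out = [0.0] * n
    for block_idx in range(blocks_per_grid):          # blocks in the grid
        for thread_idx in range(threads_per_block):   # threads in one block
            # Global index of this thread within the whole grid
            i = block_idx * threads_per_block + thread_idx
            if i < n:  # the last block may have idle threads
                out[i] = a[i] + b[i]
    return out, blocks_per_grid
```

For example, five additions with four threads per block need two blocks, and three threads of the second block do nothing. The bounds check is what allows the element count to be something other than a multiple of the block size.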

In a practical implementation, we can immediately replace 3 nested loops (do-loops in Fortran) with thread/block combinations. (We could also do more, with loop unrolling and reassignment.) Hardware-wise, an individual GPU processor is slower than a CPU core, but the GPU more than makes up for this through sheer numbers.
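One way to picture the replacement of three nested loops is to give every (i, j, k) combination its own thread and recover the loop indices from the thread's position. The sketch below emulates this in Python with a single flat thread index; the loop bounds and function names are hypothetical, and a real CUDA kernel could equally use the built-in 3D block and thread indices rather than a flat one.

```python
def nested_loops(ni, nj, nk):
    """Serial version: visit every (i, j, k) with three nested loops."""
    return [(i, j, k)
            for i in range(ni)
            for j in range(nj)
            for k in range(nk)]

def flat_index_to_ijk(tid, nj, nk):
    """Recover the three loop indices from one flat thread index."""
    i, rem = divmod(tid, nj * nk)
    j, k = divmod(rem, nk)
    return i, j, k

def thread_style(ni, nj, nk):
    """Parallel-style version: one independent work item per flat index."""
    return [flat_index_to_ijk(tid, nj, nk) for tid in range(ni * nj * nk)]
```

Both versions visit the same index triples in the same order, but in the second each work item depends only on its own index, so all of them could execute at once.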

Operations like the addition/accumulation of induced velocities can be restructured as binary-tree reductions, so that a sum of n terms completes in about log2(n) parallel steps instead of n - 1 sequential additions.
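The binary-tree accumulation can be sketched as below. This is a CPU emulation in Python under the assumption that all pair-additions within one pass of the inner loop would run simultaneously on the GPU; the function name is hypothetical, and a production CUDA reduction would typically also use shared memory and synchronization, which are not shown here.

```python
def tree_sum(values):
    """Sum values with a binary-tree reduction.

    Returns (total, steps), where steps is the number of parallel passes.
    """
    data = list(values)
    n = len(data)
    steps = 0
    stride = 1
    while stride < n:
        # The additions in this pass touch disjoint pairs, so on a GPU
        # each one could be handled by a different thread at the same time.
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
        steps += 1
    total = data[0] if data else 0.0
    return total, steps
```

Summing 1024 contributions this way takes 10 passes rather than 1023 sequential additions, provided there are enough threads to perform each pass in one go.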

Each of these accumulation operations on various Lagrangian wake markers can occur simultaneously, yielding another dimension for parallelism.

The final outcome on an Nvidia 560Ti vs. a 3.2GHz i7 (single core) is a 20x speedup for double precision operations. In single precision, the speedup (relative to the serial CPU code in double precision) is 40x. Using this GPU, the Maryland Free-Vortex Wake simulation was combined with an in-house flight dynamics solver and optimization methodology to predict the performance and improve the rotor design of a coaxial compound helicopter. The speedup afforded by the GPU is critical for obtaining realistic runtimes on a single desktop. Shown below is an animation of the coaxial rotor wake in hover. The two colors distinguish the wakes of the individual rotors.

The following movies show the wake geometry for a coaxial rotor configuration in high-speed forward flight.

Side view of coaxial rotor wake in high speed forward flight

Top view of coaxial rotor wake in high speed forward flight