Hongyuan Liu

Why Graphics Processing Units are Slow at Executing Non-Deterministic Finite Automata and How to Make Them Faster

Computer Science | William & Mary

Co-Author: S.P. Pai

Advisor: Adwait Jog

Abstract

Non-deterministic Finite Automata (NFA) are space efficient finite state machines that have significant applications in domains such as pattern matching and data analytics. In this paper, we investigate why the Graphics Processing Unit (GPU) – a massively parallel computational device with the highest memory bandwidths available on general-purpose processors – cannot efficiently execute NFAs. First, we identify excessive data movement in the GPU memory hierarchy and describe how to effectively use the GPU’s memory hierarchy to privatize reads to limit this excessive data movement. We also show that in several cases, indirect table lookups in NFAs can be eliminated by converting memory reads into computation, to further reduce the number of memory reads. Although our optimization techniques significantly alleviate these memoryrelated bottlenecks, a side effect of these techniques is the static assignment of work to cores. This leads to poor compute utilization, where GPU cores are wasted on idle NFA states. Therefore, we propose a new dynamic scheme that effectively balances compute utilization with reduced memory usage. Our combined optimizations provide a significant improvement over the previous state-of-the-art GPU implementations of NFAs. Moreover, they enable current GPUs to outperform the domainspecific accelerator for NFAs (i.e., Automata Processor) across several applications while performing within an order of magnitude for the rest of the applications.

Bio

Hongyuan Liu is a third-year Ph.D. candidate in the Department of Computer Science at William & Mary. His research interests are computer architectures and highperformance computing.

Liu, Hongyuan.pptx

* these slides contain Notes that can be accessed when opened in Microsoft PowerPoint