Radeon GPU Profiler (2017 - current) AMD BDC Innovation Winner 2017
Multiple AMD groups collaborated to develop a powerful new graphics optimization tool that correlates custom, built-in hardware thread tracing with rich data from the software stack. It is quickly becoming a must-have tool for graphics optimization. I oversaw the cross-functional development of many of the tool's fundamental building blocks, including its interoperability with RenderDoc. Check out the video and blog post.
Technologies: Direct3D 12, Vulkan, SQTT, GCN, C/C++ and Qt.
ROCm GDB and CodeXL 2.0 (2016)
Continuing AMD's initiative to develop debug solutions for AMD platforms, I am leading global teams to extend the GCN core debugger technology from APUs on HSA to discrete GPUs running on the Radeon Open Compute platform (ROCm). This core debugger technology is used by ROCm GDB and CodeXL 2.0 to debug GPU kernels on ROCm. ROCm GDB can support debugging for all language runtimes built on top of ROCm. Check out the blog post.
Technologies: ROCm, HCC, HIP, GCN, C/C++ and GDB internals.
HSA Debugger (2012-2015) (Release Announcement)
I led AMD hardware, firmware, driver, runtime, compiler and tools engineers and architects to define and develop the software and hardware foundation for a hardware-based debugger on AMD Graphics Core Next (GCN) GPUs. As the first application of this core debugger technology, we extended GDB to seamlessly debug both host CPU code and GPU kernel code in HSA applications.
The hardware-based implementation in the GCN debugger technology is a vast improvement over the previous implementation in the CodeXL OpenCL™ debugger, which relies on repeated kernel recompilation and replay. Using the GCN debugger technology, we can stop all the massively parallel threads in the GPU at a breakpoint, inspect the machine state by reading registers and memory, and then resume execution of all the GPU threads. The instruction pointer at the ISA level can be correlated with the HSAIL line.
Technologies: HSAIL, GCN, C/C++ and GDB.
OpenCL™ GPU Kernel Debugger (2011) AMD Executive Spotlight Award 2011
I led the GPU Compute Tools team that developed the core software technology for debugging OpenCL™ kernels on AMD Radeon GPUs and APUs. You can view variables, set breakpoints and run to breakpoints in an OpenCL™ kernel executing on the GPU. These operations can be performed inside the familiar Visual Studio environment on a single machine. This was the first OpenCL™ GPU kernel debugger in the industry. The core technology has been incorporated into CodeXL (previously gDebugger), a comprehensive tool for debugging OpenCL™ applications. The team was recognized with the company-wide AMD Executive Spotlight Award in 2011.
Technologies: C++ and OpenCL™.
AMD APP Profiler (2009-2011) AMD Executive Spotlight Award 2009
The AMD APP Profiler (formerly ATI Stream Profiler) is a runtime profiler integrated into Microsoft Visual Studio® that gathers performance data from the GPU as your application runs. Developers can use this information to find the bottlenecks in their OpenCL™ applications and ways to optimize performance. As the lead developer of the project, I led the design and development of the tool and coordinated the efforts of multiple groups in AMD.
Together with the OpenCL™ team, I was awarded the AMD Executive Spotlight Award 2009 for outstanding contribution to AMD's continuing success.
Technologies: C#/.Net/VSIP (client), C++, OpenCL™ and DirectCompute (backend).
AMD APP KernelAnalyzer (2008-2010)
Due to the popularity of GPU ShaderAnalyzer, in 2008 I started the development of AMD APP KernelAnalyzer, a tool for statically analyzing the performance of stream kernels (OpenCL™ C, Brook+ and IL kernels). KernelAnalyzer compiles stream kernels down to the actual instructions used to program the GPU. It then performs a static analysis of the instruction stream and reports a variety of information to the developer, including register usage, ALU utilization and memory contention, all without having to run the application on actual hardware.
Technologies: C/C++, MFC, OpenCL™.
AMD Shader Debugger (2009)
I led the development of the HLSL shader debugger component, which allows you to debug HLSL pixel shaders inside your application. You can view variables, constants and values at each pixel. You can also insert breakpoints, run to them, and step forwards and backwards.
Technologies: C#/.Net (client), C++ and DirectX 10 (server).
AMD GPU ShaderAnalyzer (2008)
In 2008, I took over ownership of GPU ShaderAnalyzer, a static analysis tool for GPU shaders on ATI graphics cards. You can view many shader statistics (ALU, tex, registers, throughput and estimated cycles) and the disassembly of the generated hardware shader.
Technologies: C/C++, MFC, OpenGL, DirectX 9 and 10.
AMD GPU MeshMapper (2008)
GPU MeshMapper is a tool that generates normal, displacement and ambient occlusion (AO) maps from low- and high-resolution meshes. You can visualize the resulting maps applied to the low-resolution mesh inside the tool. It also supports ATI hardware tessellation. In 2008, I led the development of version 1.1, which added many new features (multi-core support, seamless mapping, seamless tangent space) and fixed many bugs.
Technologies: C++, MFC, and DirectX 9.
ATI Tessellation Library (2009)
This library allows developers to add ATI hardware tessellation support to their game engines. In 2009, I picked up the maintenance of this library and successfully shepherded it through its initial public release.
Technologies: C++ and DirectX 9.
AMD Tootle (2008) (Open source in 2016)
In 2008, I assumed ownership of AMD Tootle and led the development of v2.0. Tootle is a tool that optimizes 3D meshes for pixel overdraw and vertex cache performance.
Technologies: C/C++ and DirectX 9 (a Linux version comes with software rendering).
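As a rough illustration of what "vertex cache performance" means, the sketch below computes the average cache miss ratio (ACMR) of an index buffer, a common metric for this kind of optimization. It is not Tootle's code or algorithm: the function name, the FIFO replacement policy and the default cache size of 16 are assumptions chosen only for the example.

```python
from collections import deque


def average_cache_miss_ratio(indices, cache_size=16):
    """Return vertex cache misses per triangle for a triangle-list index buffer.

    Assumes a simple FIFO post-transform vertex cache (hypothetical model,
    not the behavior of any particular GPU).
    """
    if len(indices) % 3 != 0:
        raise ValueError("index buffer must contain whole triangles")

    cache = deque(maxlen=cache_size)  # the oldest entry is evicted when full
    misses = 0
    for vertex in indices:
        if vertex not in cache:
            misses += 1
            cache.append(vertex)

    triangles = len(indices) // 3
    return misses / triangles if triangles else 0.0


# Two triangles sharing an edge: 4 unique vertices, so 4 misses over 2 triangles.
print(average_cache_miss_ratio([0, 1, 2, 2, 1, 3]))  # 2.0
```

A lower value means more vertices are reused from the cache; reordering triangles so that neighbors are drawn close together is what drives the number down.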
DesignMentor (2001)
As a graduate student at Michigan Tech University, I was the project leader and developer of DesignMentor v2.0, a tool that helps students learn Bezier, B-spline and NURBS techniques. It is a popular tool that has been used as a teaching aid in many universities.
Technologies: C and OpenGL.
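For readers unfamiliar with the curves mentioned above, here is a minimal sketch of de Casteljau's algorithm, the standard way to evaluate a Bezier curve by repeated linear interpolation. It is an independent illustration, not DesignMentor's code; the function name and the sample control points are made up for the example.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bezier curve at parameter t (0 <= t <= 1).

    control_points is a non-empty sequence of points, each a tuple of
    coordinates with the same dimension.
    """
    points = [tuple(p) for p in control_points]
    if not points:
        raise ValueError("at least one control point is required")

    # Each pass interpolates between neighbors, leaving one point fewer.
    while len(points) > 1:
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]


# Quadratic curve with hypothetical control points; the midpoint is (1.0, 1.0).
print(de_casteljau([(0, 0), (1, 2), (2, 0)], 0.5))  # (1.0, 1.0)
```

The curve starts at the first control point (t = 0) and ends at the last (t = 1); the interior control points only pull the curve toward them.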
I have also architected and co-developed many other systems, including a seamless 3D mesh parameterization system (STA), a novel 3D walkthrough system that combines visibility and level of detail (vLOD), a vertex compression system for GPUs, a geometry processing system for 3D scanned objects, a camera projector system for surgeons (NOMAD) and a compiler optimization system.