I received my PhD from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) in May 2006 (thesis title: Automating the Construction of Compiler Heuristics Using Machine Learning).

I joined Nvidia Research in February 2014. Before that, I was a research staff member at IBM's Austin Research Laboratory. I have worked on a range of compiler and architecture projects, including:

  • Mainstream computing is a prototype system that aims to detect anomalous program behavior (e.g., a malicious attack on a vulnerable program). It is a collaborative system that automatically constructs models of mainstream program behavior and uses these models to dynamically check that a program is running according to expectations. While such systems exist in various flavors, the novelty of ours is that it uses a statistical approach to generate models of mainstream behavior; this lets users select an upper bound on their tolerance for annoying red flags raised during program execution. Based on the user's tolerance, the system generates an appropriate set of runtime assertions to apply. Our prototype was able to detect denial-of-service attacks, integer overflows, frees of uninitialized memory, boundary violations, and an injection attack. (CGO 2010 paper, slides).
  • Predication support for out-of-order processors. The benefits of out-of-order (OOO) processing are well known, as is the effectiveness of predicated execution for unpredictable control flow. However, as previous research has demonstrated, these orthogonal techniques are at odds with one another. One common approach to reconciling their differences is to simplify the form of predication supported by the architecture. For instance, the only form of predication supported by modern OOO processors is a simple conditional move. We argue that it is the simplicity of conditional move that has allowed its widespread adoption, but we also show that this simplicity compromises its effectiveness as a compilation target. In this paper, we introduce a generalized form of hammock predication that requires few modifications to an existing processor pipeline, yet presents the compiler with abundant predication opportunities. (HPCA 2009 paper).
  • Applications for the Power 775 (PERCS): As part of IBM's PERCS DARPA commitment, I worked on developing and fine-tuning several benchmarks and micro-benchmarks for various incarnations of the Power 775 system (functional simulation, cycle-accurate simulation, FPGA-based hardware simulation, and the real architecture). I was pulled into the project specifically to help with the FFT, because initial runs suggested that its performance on the system was well below IBM's commitment to DARPA. In the end we substantially over-delivered. I augmented GCC's backend and binutils to target the Power 775's new vector instruction set (called VSX), modified FFTW to use the VSX instructions, and "stitched" together an FFT using FFTW and a super-fast cache-oblivious transpose I wrote. As of November 2012, this FFT held the world record for FFT GFLOPS. (ICS 2013 paper).

Please see my CV for more information, and this page for an updated list of my publications.