Distinguished Engineer at NVIDIA

Summary

John is a Distinguished Engineer at NVIDIA, where he contributes to the company's Mathematical Libraries effort. Until January 2024, he was also the manager of the cuQuantum team, whose mission is to provide developers with a library that can speed up quantum simulations based on state vector and tensor network methods by orders of magnitude on NVIDIA GPUs.

Immediately before joining NVIDIA, he was a Principal Engineer at Amazon Web Services (AWS), focused on creating high-performance cloud-computing solutions (HPCCS) that anticipated customer demand.  At the same time, he served on the Advisory Board of SheQuantum.

Prior to joining Amazon in June 2020, John was the manager of the Qiskit software team and a member of the Quantum Theory, Software, and Applications group within IBM Quantum Research. From 2016 to 2017, Dr. Gunnels was the manager of the Reasoning Systems effort in IBM Research AI and a Distinguished Research Staff Member, simultaneously contributing to IBM's quantum computing effort. From 2015 to 2016, John was the Program Director of the Computational Sciences Center and a Senior Manager in the Data-Centric Systems department at the Thomas J. Watson Research Center. For several years prior to that (2001-2015), he was a member of (and managed) the High Performance Analytics team as a Distinguished Research Staff Member in the Mathematical Sciences department.

John has extensive experience as an individual contributor (researcher/engineer), manager, senior manager, and program manager, working with technology research and development across HPC, AI, Edge,  and Quantum computing, focusing on the design and deployment of high performance computing solutions (hardware and software) at scale.  He was also instrumental in the co-design, bring-up, deployment, and acceptance of several supercomputing systems.

While at IBM Research, John was a member of the IBM Academy of Technology, was named Master Inventor multiple times, and received seven Outstanding Technical Achievement Awards, an IBM Corporate Award, the Gerstner Award for Client Excellence, the 2020 Supercomputing Test of Time Award, the 2020 SIAM/SIAG Best Paper Award, a 2016 IPDPS Best Paper Award, and three Gordon Bell Awards. He is a prolific inventor, with dozens of patents related to high-performance computing, natural language processing, quantum simulation, distributed reasoning, device technology, cache technologies, green technology, and edge computing. John has the singular distinction of writing the Linpack benchmark (HPL) code for the IBM Blue Gene/L, IBM Blue Gene/P, IBM Blue Gene/Q, and IBM Roadrunner supercomputers, as well as being a member of eight Gordon Bell Prize finalist teams.

His research interests include massive scale analytics, parallel algorithms and programming, quantum computing, code verification, high performance library specification and development, large scale scientific computing, the development and optimization of large-scale applications related to graph theory and machine learning algorithms, and the effective use of both exotic and commercially successful hardware accelerators.

Professional Experience

Distinguished Engineer, Mathematical Libraries, NVIDIA Corp. (02/2024 - Present)

Distinguished Engineer & Manager, Mathematical Libraries/Quantum Computing, NVIDIA Corp. (09/2022 - 02/2024)

Principal Engineer, Amazon Web Services (06/2020 - 09/2021)

Distinguished Research Staff Member and Manager, Quantum Computing, IBM T.J. Watson Research Center, Yorktown Heights, NY (11/2017 – 05/2020)

Distinguished Research Staff Member and Manager, Artificial Intelligence Research, IBM T.J. Watson Research Center, Yorktown Heights, NY (04/2016 – 11/2017)

Distinguished Research Staff Member, Program Director, and Manager, Systems Research/Data-Centric Systems, IBM T.J. Watson Research Center, Yorktown Heights, NY (02/2015 – 04/2016)

Distinguished Research Staff Member and Manager, Mathematical Sciences, IBM T.J. Watson Research Center, Yorktown Heights, NY (10/2001 – 02/2015)

Education

Ph.D. in Computer Science, The University of Texas at Austin, 2001

Honors and Awards

IBM

Academic

Research Interests

The design and analysis of parallel algorithms

Automated algorithm and code generation, hybridization, and analysis

API design

Compiler verification/automated theorem proving

Benchmarking metrics and implementations

Graphics processors (and other processing units) as general purpose processors

Program visualization

AI (Machine Learning/Probabilistic Reasoning)

Quantum computing

Skills

Programming Languages: C, C++, Mathematica, Fortran, Python, Java, Scheme, Matlab, LISP, Pascal, Assembly (various), Prolog

Operating Systems: Windows, AIX, Linux, BLRTS, Solaris, UNIX, MacOS

Programs/Packages: MPI, OpenMP/Pthreads, ANTLR/Sorcerer, BLAS, CUDA, LAPACK/ScaLAPACK, Eclipse, Visual Studio, GNU (gcc/g77), VisualAge

Recent Patents Issued (2021 - Present)

Co-Scheduling Quantum Computing Jobs
Issued April 30, 2024 | US 11972321

Establishing a Logical Connection Between an Indirect Utterance and a Transaction
Issued Apr 9, 2024 | US 11954613

Quantum Sparse Fourier Transform
Issued Mar 19, 2024 | US 11934479

Quantum Circuit Decomposition by Integer Programming
Issued Oct 3, 2023 | US 11775721

Cached Result Use Through Quantum Gate Rewrite
Issued May 9, 2023 | US 11645203

Thread Embedded Cache Management
Issued Feb 28, 2023 | US 11593167

Physical Cursor Control in Microfluidic Display Devices
Issued Oct 25, 2022 | US 11481069

Adaptive Error Correction in Quantum Computing
Issued Oct 18, 2022 | US 11475189

Quantum Walk for Community Clique Detection
Issued Sep 27, 2022 | US 11455562

Adaptive Error Correction in Quantum Computing
Issued Sep 13, 2022 | US 11443086

Sketching Using a Hybrid Quantum-Classical System
Issued June 28, 2022 | US 11372895

Liquid Bottle Processing and Refilling
Issued June 21, 2022 | US 11367324

Simulating Quantum Circuits
Issued Feb 15, 2022 | US 11250190

Quantum Space Distance Estimation for Classifier Training Using Hybrid Classical-Quantum Computing System
Issued Nov 2, 2021 | US 11164099

Simulating Quantum Circuits on a Computer Using Hierarchical Storage
Issued Aug 24, 2021 | US 11100417

Quantum Circuit Decomposition by Integer Programming
Issued Aug 17, 2021 | US 11093679

Adaptive Error Correction in Quantum Computing
Issued Jun 29, 2021 | US 11048839

Co-Scheduling Quantum Computing Jobs
Issued May 4, 2021 | US 10997519

Predicting Intent of a User from Anomalous Profile Data
Issued Feb 2, 2021 | US 10909152

Cached Result Use Through Quantum Gate Rewrite
Issued Jan 26, 2021 | US 10901896

Generating Driving Behavior Models
Issued Jan 26, 2021 | US 10901423

Customizing Responses to Users in Automated Dialogue Systems
Issued Jan 12, 2021 | US 10891956

Selected Publications:

"cuQuantum SDK: A High-Performance Library for Accelerating Quantum Science," Bayraktar, et al. 2023. arXiv. https://arxiv.org/abs/2308.01999. Also published in 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), pp. 1050-1061.

“Supercomputer-Based Ensemble Docking Drug Discovery Pipeline with Application to Covid-19,” Acharya, et al. Journal of Chemical Information and Modeling, Volume 60, Issue 12, 2020. Also published under the same title at: https://chemrxiv.org/articles/preprint/Supercomputer-Based_Ensemble_Docking_Drug_Discovery_Pipeline_with_Application_to_Covid-19/12725465

“Leveraging Secondary Storage to Simulate Deep 54-qubit Sycamore Circuits,” Edwin Pednault, John A. Gunnels, Giacomo Nannicini, Lior Horesh, and Robert Wisnieff. arXiv, October 21, 2019 (updated on October 22, 2019).

“Preparation and optimization of a diverse workload for a large-scale heterogeneous system,” Sierra Supercomputing Team, ACM/IEEE Conference for High Performance Computing, Network, Storage, and Analysis (Supercomputing 2019).

“Computing the ankle-brachial index with parallel computational fluid dynamics,” John Gounley, Erik W Draeger, Tomas Oppelstrup, William D Krauss, John A Gunnels, Rafeed Chaudhury, Priya Nair, David Frakes, Jane A Leopold, and Amanda Randles. Journal of Biomechanics.  October 2018.

“Breaking the 49-Qubit Barrier in the Simulation of Quantum Circuits,” Edwin Pednault, John A. Gunnels, Giacomo Nannicini, Lior Horesh, Thomas Magerlein, Edgar Solomonik, Erik W. Draeger, Eric T. Holland, and Robert Wisnieff.  arXiv, 16 Oct 2017 (update 12 Nov 2018).

“A Knowledge and Reasoning Toolkit for Cognitive Applications,” Mustafa Canim, Cristina Cornelio, Robert Farrell, Achille Fokoue, Kyle Gao, John Gunnels, Arun Iyengar, Ryan Musa, Mariano Rodriguez-Muro, Rosario Uceda-Sosa.  HotWeb 2017.  October 2017.

“Massively Parallel First-Principles Simulation of Electron Dynamics in Materials,” Erik Draeger, Xavier Andrade, John Gunnels, Abhinav Bhatele, Andre Schieffe, and Alfredo Correa.  Journal of Parallel and Distributed Computing.  Volume 106, pp. 204-214.  August 2017.

“Parallel Deep Neural Network Training for Big Data on Blue Gene/Q,” I. Chung, T. N. Sainath, B. Ramabhadran, M. Picheny, J. Gunnels, V. Austel, U. Chaudhari and B. Kingsbury, IEEE Transactions on Parallel and Distributed Systems, Volume 28, Issue 6, pp. 1703-1714.  June 2017.

“Massively Parallel First-Principles Simulation of Electron Dynamics in Materials,” Erik Draeger, Xavier Andrade, John Gunnels, Abhinav Bhatele, Andre Schieffe, and Alfredo Correa. International Parallel and Distributed Processing Symposium (IPDPS), 2016. Best Paper Award

“An Early Performance Study of Large-Scale Power8 SMP Systems,” Xing Liu, Daniele Buono, Fabio Checconi, Jee W Choi, Xinyu Que, Fabrizio Petrini, John Gunnels, and Jeff Stuecheli. International Parallel and Distributed Processing Symposium (IPDPS), 2016.

"The BLIS Framework: Experiments in Portability." Field G. Van Zee, Tyler Smith, Bryan Marker, Tze Meng Low, Robert A. van de Geijn, Francisco Igual, Mikhail Smelyanskiy, Xianyi Zhang, Michael Kistler, Vernon Austel, John A. Gunnels, Lee Killough. ACM Transactions on Mathematical Software (TOMS), 42(2):12:1-12:19, 2016. SIAM  Supercomputing Group Best Paper Prize for 2020

“Massively Parallel Models of the Human Circulatory System,” Amanda Randles, Erik Draeger, Tomas Oppelstrup, Liam Krauss, and John A. Gunnels. ACM/IEEE Conference for High Performance Computing, Network, Storage, and Analysis (Supercomputing 2015). Gordon Bell Award Finalist

“Optimizing Sparse Linear Algebra for Large-Scale Graph Analytics,” Daniele Buono, John A. Gunnels, Xinyu Que, Fabio Checconi, and Fabrizio Petrini. IEEE Computer, Volume 48, Issue 8, pp. 26-34. August, 2015.

“Scalable Community Detection with the Louvain Algorithm,” Xinyu Que, Fabio Checconi, Fabrizio Petrini, and John A. Gunnels. International Parallel and Distributed Processing Symposium (IPDPS), 2015.

“Active Memory Cube: A Processing-in-Memory Architecture for Exascale Systems,” IBM AMC Team. IBM Journal of Research and Development. Volume 59, Issue 2/3. 2015.

“Parallel Deep Neural Network Training for Big Data on Blue Gene/Q,” I. Chung, T. N. Sainath, B. Ramabhadran, M. Picheny, J. Gunnels, V. Austel, U. Chaudhari and B. Kingsbury, Supercomputing 2014, November 2014.

“Parallel Deep Neural Network Training for LVCSR on Blue Gene/Q,” T. N. Sainath, I. Chung, B. Ramabhadran, M. Picheny, J. Gunnels, B. Kingsbury, G. Saon, V. Austel and U. Chaudhari. In Proceedings of Interspeech, September 2014.

“Low Power Massively Parallel Energy Efficient Supercomputer,” Blue Gene Team. Green Computing: Large-Scale Energy Efficiency, Randy Cohen, Ed. 2014. [Book Chapter]

“Trends and Outlook for the Massive-Scale Analytics Stack,” Amol Ghoting, John A. Gunnels, Prabhanjan Kambadur, Edwin Pednault, and Mark Squillante. IBM Journal of Research and Development. Volume 57, Number 3/4. 2013.

“Towards Real-Time Simulation of Cardiac Electrophysiology in a Human Heart at High Resolution,” David F Richards, James N Glosli, Erik W Draeger, Arthur A Mirin, Bor Chan, Jean-Luc Fattebert, William D Krauss, Tomas Oppelstrup, Chris J Butler, John A Gunnels, Viatcheslav Gurev, Changhoan Kim, John Magerlein, Matthias Reumann, Hui-Fang Wen, John Jeremy Rice. Computer Methods in Biomechanics and Biomedical Engineering. 06/2013.

“Science at LLNL with IBM Blue Gene/Q,” LLNL and IBM Blue Gene/Q Teams. IBM Journal of Research and Development. Volume 57, Number 1/2. 2013.

“Design for Low Power and Power Management in IBM Blue Gene/Q,” K. Sugavanam, C.-Y. Cher, J. A. Gunnels, R. A. Haring, P. Heidelberger, H. M. Jacobson, M. K. McManus, D. P. Paulsen, D. L. Satterfield, Y. Sugawara, and R. Walkup. IBM Journal of Research and Development. Volume 57, Number 1/2. 2013.

“Design of the Blue Gene/Q Compute Chip,” Blue Gene/Q Team. IBM Journal of Research and Development. Volume 57, Number 1/2. 2013.

“Modeling, Validation, and Co-Design of IBM Blue Gene/Q: Tools and Examples,” Blue Gene/Q Team. IBM Journal of Research and Development. Volume 57, Number 1/2. 2013.

“The IBM Blue Gene Project,” Blue Gene Team. IBM Journal of Research and Development. Volume 57, Number 1/2. 2013.

“Blue Gene/Q: Sequoia and Mira,” Blue Gene Team. Contemporary High Performance Computing: From Petascale toward Exascale. Chapman and Hall/CRC. Jeffrey Vetter, Ed. April, 2013. [Book Chapter]

“Toward Real-Time Modeling of Human Heart Ventricles at Cellular Resolution: Simulation of Drug-Induced Arrhythmias,” Arthur A. Mirin, David F. Richards, James N. Glosli, Erik W. Draeger, Bor Chan, Jean-luc Fattebert, William D. Krauss, Tomas Oppelstrup, John Jeremy Rice, John A. Gunnels, Viatcheslav Gurev, Changhoan Kim, John Magerlein, Matthias Reumann, Hui-Fang Wen. Supercomputing 2012. Gordon Bell Award Finalist

“Deriving Dense Linear Algebra Libraries,” Paolo Bientinesi, John A. Gunnels, Margaret E. Myers, Enrique S. Quintana-Ortí, Tyler Rhodes, Robert A. van de Geijn, and Field G. Van Zee. Formal Aspects of Computing. January 2012.

“Optimizing the Performance of Streaming Numerical Kernels on the IBM Blue Gene/P PowerPC 450 processor,” T. Malas, A. Ahmadia, J. Brown, J. Gunnels, and D. Keyes. International Journal of High Performance Computing Applications. May 2012.

"PLAPACK," John A. Gunnels. Encyclopedia of Parallel Computing. David Padua, Ed. September, 2011. [Book Chapter]

“Massive Scale Analytics,” Mark Squillante, Amol Ghoting, and John Gunnels. Encyclopedia of Parallel Computing. David Padua, Ed. September, 2011. [Book Chapter]

“Efficient High-precision Dense Matrix Algebra on Parallel Architectures for Nonlinear Discrete Optimization,” J. Gunnels, J. Lee, S. Margulies. Mathematical Programming Computation, 2(2), pg. 103-124, 2010.

“Architecture of the Component Collective Messaging Interface,” Sameer Kumar, Ahmad Faraj, Amith R Mamidala, Brian Smith, Gabor Dozsa, Jeremy Berg, Bob Cernohous, John Gunnels, Douglas Miller, Joseph Ratterman, Philip Heidelberger. 2010 International Journal of High Performance Computing Applications (pp. 16-33).

“Beyond Homogeneous Decomposition: Scaling Long-Range Forces on Massively Parallel Systems.” D. F. Richards, J. N. Glosli, B. Chan, M. R. Dorr, E. W. Draeger, J.-L. Fattebert, W. D. Krauss, T. Spelce, F. H. Streitz, M. P. Surh, and J. A. Gunnels. Supercomputing 2009. Gordon Bell Award Finalist

“MPI Collective Communications on The Blue Gene/P Supercomputer, Algorithms and Optimizations,” Ahmad Faraj, Sameer Kumar, Brian Smith, Amith Mamidala, John Gunnels. IEEE's 17th Hot Interconnects 2009.

“Petascale Computing with Accelerators.” Michael Kistler, John Gunnels, Daniel Brokenshire, and Brad Benton. Principles and Practices of Parallel Programming (PPoPP 2009).

“Programming the Linpack Benchmark for the IBM PowerXCell 8i Processor,” Michael Kistler, John Gunnels, Daniel Brokenshire, and Brad Benton. Special Issue of Scientific Programming. Accepted. Volume 17, Issue 1-2 (January 2009).

“Programming the Linpack Benchmark for Roadrunner,” Michael Kistler, John Gunnels, Daniel Brokenshire, and Brad Benton. IBM Journal of Research and Development. Volume 53, Number 5. 2009.

“Overview of the Blue Gene/P Project,” Blue Gene/P Team. IBM Journal of Research and Development. Volume 52, 1/2, pp. 199-220. 2008.

“Fine grained parallelization of the Car-Parrinello ab initio MD method on Blue Gene/L,” Eric Bohm, Abhinav Bhatele, Laxmikant V. Kale, Mark E. Tuckerman, Sameer Kumar, John A. Gunnels, and Glenn J. Martyna. IBM Journal of Research and Development. 52, 1/2, pp. 159-176. 2008.

“Optimization of Fast Fourier Transforms on the Blue Gene/L Supercomputer.” Yogish Sabharwal, Saurabh K. Garg, Rahul Garg, John A. Gunnels, and Ramendra K. Sahoo. International Conference on High Performance Computing (HiPC) 2008.

“Optimization of BLAS on the Cell Processor,” Vaibhav Saxena, Prashant Agrawal, Yogish Sabharwal, Vijay K. Garg, Vimitha Kuruvilla, and John A. Gunnels. International Conference on High Performance Computing (HiPC) 2008.

“Extending Stability Beyond CPU Millennium: A Micron-Scale Atomistic Simulation of Kelvin-Helmholtz Instability,” J.N. Glosli, K.J. Caspersen, J.A. Gunnels, D.F. Richards, R.E. Rudd, and F.H. Streitz. Supercomputing 2007. Gordon Bell Award Finalist. Gordon Bell Award Winner

“An Experimental Comparison of Cache-oblivious and Cache-aware Programs,” Kamen Yotov, Thomas Roeder, Keshav Pingali, John Gunnels, and Fred Gustavson. 19th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA ’07).

“Large Scale Drop Impact Analysis of a Mobile Phone on Blue Gene/L: Introduction to a Work Selected as the Finalist of Gordon Bell Prize,” Hiroshi Akiba, Shinobu Yoshimura, Hirohisa Noguchi, John A Gunnels, and Yogish Sabharwal. The Japan Society for Industrial and Applied Mathematics (JSIAM). Appeared 10/2007.

“Large-Scale Electronic Structure Calculations of High-Z Metals on the BlueGene/L Platform,” Francois Gygi, Erik W. Draeger, Martin Schulz, Bronis R. De Supinski (LLNL), John A. Gunnels, Vernon Austel, James C. Sexton (IBM), Franz Franchetti, Stefan Kral, Christoph Ueberhuber, Juergen Lorenz (U Vienna), Supercomputing 2006. Gordon Bell Award Finalist. Gordon Bell Award Winner

“Large Scale Drop Impact Analysis of Mobile Phone Using ADVC on Blue Gene/L,” Hiroshi Akiba, Tomonobu Ohyama, Yoshinori Shibata, Kiyoshi Yuyama, Yoshikazu Katai, Ryuichi Takeuchi, Takeshi Hoshino, Shinobu Yoshimura, Hirohisa Noguchi, Manish Gupta, John Gunnels, Vernon Austel, Yogish Sabharwal, Rahul Garg, Shoji Kato, Takashi Kawakami, Satoru Todokoro, Junko Ikeda. Supercomputing 2006. Gordon Bell Award Finalist

“Is Cache-Oblivious DGEMM Viable?,” John A. Gunnels, Fred G. Gustavson, Keshav Pingali, Kamen Yotov. PARA'06: State-of-the-Art in Scientific Computing, 2006, Umea, Sweden.

“Minimal Data Copy for Dense Linear Algebra Factorization,” Fred G. Gustavson, John A. Gunnels, James C. Sexton. PARA'06: State-of-the-Art in Scientific Computing, 2006, Umea, Sweden.

“100+ TFlop Solidification Simulations on BlueGene/L,” Frederick H. Streitz, James N. Glosli, Mehul V. Patel, Bor Chan, Robert K. Yates, Bronis R. de Supinski (Lawrence Livermore National Laboratory), James Sexton, John A. Gunnels (IBM). Supercomputing 2005. Gordon Bell Award Finalist. Gordon Bell Award Winner

“Large-Scale First-Principles Molecular Dynamics Simulations on the BlueGene/L Platform using the Qbox Code,” F. Gygi, E. Draeger, B. R. de Supinski, R. K. Yates, F. Franchetti, S. Kral, J. Lorenz, C. W. Ueberhuber, J. Gunnels, J. Sexton. Supercomputing 2005. Gordon Bell Award Finalist

“Early Experience with Scientific Applications on the Blue Gene/L Supercomputer,” George Almasi, Gyan Bhanot, Dong Chen, Maria Eleftheriou, Blake Fitch, Alan Gara, Robert Germain, John Gunnels, Manish Gupta, Philip Heidelberger, Mike Pitman, Aleksandr Rayshubskiy, James Sexton, Frank Suits, Pavlos Vranas, Bob Walkup, Chris Ward, Yuriy Zhestkov, Alessandro Curioni, Wanda Andreoni, Charles Archer, Jose Moreira, Richard Loft, Henry Tufo, Theron Voran, and Katherine Riley. Europar 2005.

"The Science of Deriving Dense Linear Algebra Algorithms", Paolo Bientinesi, John A. Gunnels, Margaret E. Myers, Enrique S. Quintana-Orti, and Robert A. van de Geijn. ACM Transactions on Mathematical Software (TOMS) 31(1):1-26 (March 2005).

“A Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm,” B.S. Andersen, J.A. Gunnels, F.G. Gustavson, J.K. Reid, and J. Wasniewski. TOMS 31(2): 201-227 (2005).

“BlueGene/L Performance Tools,” Xavier Martorell, Nils Smeds, Bob Walkup, Jose R. Brunheroto, George Almasi, John Gunnels, Luiz DeRose, Jesus Labarta, Francesc Escale, Judit Gimenez, Harald Servat, and Jose E. Moreira. IBM Journal of Research and Development, 49, 2/3, pp. 407-424. 2005.

“Design and implementation of message passing services for the Blue Gene/L supercomputer,” G. Almási, C. Archer, J. G. Castaños, C. C. Erway, J. A. Gunnels, P. Heidelberger, X. Martorell, J. E. Moreira, K. Pinnow, J. Ratterman, B. D. Steinmacher-Burow, W. Gropp, B. Toonen. IBM Journal of Research and Development, 49, 2/3, pp. 393–406. 2005.

“Exploiting the Floating Point and Memory Subsystems on the Blue Gene/L Node: Architecture, Compilers, and Algorithm Design,” G. Almasi, L. R. Bachega, L. H. Ceze, S. Chatterjee, K. A. Dockser, J. A. Gunnels, M. Gupta, F. G. Gustavson, D. Hoenicke, C. A. Lapkowski, G. K. Liu, M. P. Mendell, M. Ohmacht, K. Strauss, C. D. Wait, and T.J. C. Ward. IBM Journal of Research and Development, 49, 2/3, pp. 377-392. 2005.

“Unlocking the Performance of the BlueGene/L Supercomputer.” George Almasi, Siddhartha Chatterjee, Alan Gara, John Gunnels, Manish Gupta, Amy Henning, Jose Moreira, Bob Walkup (IBM Thomas J. Watson Research Center), Alessandro Curioni (IBM Zurich Research Laboratory), Charles Archer (IBM Systems and Technology Group), Leonardo Bachega (LARC - University of Sao Paulo), Bor Chan, Bruce Curtis (Lawrence Livermore National Laboratory), Maciej Brodowicz, Sharon Brunett, Ed Upchurch (Caltech), Giri Chukkapalli, Robert Harkness, Wayne Pfeiffer (San Diego Supercomputer Center). Supercomputing 2004.

“A High-Performance SIMD Floating Point Unit for BlueGene/L: Architecture, Compilation, and Algorithm Design.” Parallel Architecture and Compilation Techniques, 13th International Conference on (PACT’04) September 29 - October 03, 2004 Antibes Juan-les-Pins, France, Leonardo Bachega, Siddhartha Chatterjee, Kenneth A. Dockser, John A. Gunnels, Manish Gupta, Fred G. Gustavson, Christopher A. Lapkowski, Gary K. Liu, Mark P. Mendell, Charles D. Wait, T. J. Chris Ward. 2004, pp. 85–96.

“Architecture and Performance of the BlueGene/L Message Layer,” George Almasi, Charles Archer, John Gunnels, Philip Heidelberger, Xavier Martorell, Jose E. Moreira. DAPSYS 2004: 5th Austrian-Hungarian Workshop on Distributed and Parallel Systems in conjunction with EuroPVM/MPI 2004.

“BlueGene/L Supercomputer,” (BG/L Team) Future Directions in IC and Package Design Workshop, 2003. Invited paper.

“An Overview of the BlueGene/L Supercomputer,” (with The Blue Gene/L Team), IEEE Supercomputing 2002.  Supercomputing Test of Time Award, Supercomputing 2020

“The Science of Programming High-Performance Linear Algebra Libraries,” (with Paolo Bientinesi, Fred G. Gustavson, Greg M. Henry, Margaret E. Myers, Enrique S. Quintana-Orti, and Robert A. van de Geijn), Proceedings of Performance Optimization for High-Level Languages and Libraries (POHLL-02), a workshop in conjunction with the 16th Annual ACM International Conference on Supercomputing (ICS'02), June 21, 2002.

“FLAME: Formal Linear Algebra Methods Environment,” (with Fred G. Gustavson, Greg M. Henry, and Robert A. van de Geijn), TOMS, 27(4):422-455, December 2001.

“A Family of High-Performance Matrix Algorithms,” (with Greg M. Henry and Robert A. van de Geijn), Computational Science 2001 Part I, Lecture Notes in Computer Science 2073, pp. 51-60, Springer, 2001.

“Fault-Tolerant High-Performance Matrix-Matrix Multiplication: Theory and Practice,” (with Daniel S. Katz, Enrique S. Quintana-Orti, and Robert van de Geijn), The International Conference for Dependable Systems and Networks (DSN-2001), pp. 47-56, July, 2001.

“Formal Methods for High-Performance Linear Algebra Libraries,” (with Robert A. van de Geijn), The Architecture of Scientific Software, (R. F. Boisvert and P. T. Tang, editors), pp. 193-210, Kluwer Academic Press, 2001.

Using PLAPACK: Parallel Linear Algebra Package, (Robert A. van de Geijn) MIT Press, Spring 1997. Co-author of Chapters 2, 6-8. [Text Book]