John A. Gunnels
Distinguished Research Staff Member, Program Director, and Manager
IBM Research
IBM T.J. Watson Research Center, Yorktown Heights, NY
https://www.linkedin.com/in/johnagunnels
john.a.gunnels@gmail.com
(914) 224-4381




Summary

John joined IBM Research in 2001 after receiving his Ph.D. from the University of Texas at Austin. He is currently the Program Director of the Computational Sciences Center, a Distinguished Research Staff Member, a member of the Data Centric Computing group at the Thomas J. Watson Research Center, and the manager of the Applications and Workflows Analysis team. John has over eight years of experience as a manager and senior manager and over fifteen years of experience in industry working with cutting-edge technology in a research and development setting. His primary focus is on parallel programming and parallel system development. He is a member of the IBM Academy of Technology and a past Master Inventor, and has received six Outstanding Technical Achievement Awards, an IBM Corporate Award, the Gerstner Award for Client Excellence, and three Gordon Bell Awards. His research interests include analytics, big data, parallel algorithms and programming, code verification, high performance library specification and development, large scale scientific computing, the development and optimization of large-scale applications related to graph theory and machine learning algorithms, and the effective use of hardware accelerators. He currently holds 29 US patents related to these and other areas of research.


Professional Experience

Distinguished Research Staff Member, Program Director, and Manager, IBM T.J. Watson Research Center, Data Centric Systems, Yorktown Heights, NY (Feb. 2015 – present).

Distinguished Research Staff Member and Manager, IBM T.J. Watson Research Center, Business Analytics and Mathematical Sciences, Yorktown Heights, NY (Sept. 2007 – Feb. 2015).

Research Staff Member, IBM T.J. Watson Research Center, Business Analytics and Mathematical Sciences, Yorktown Heights, NY (Oct. 2002 – Sept. 2007).

Postdoctoral Researcher, IBM T.J. Watson Research Center, Mathematical Sciences Department, Yorktown Heights, NY (Oct. 2001 – Sept. 2002).


Education


Ph.D. in Computer Science, The University of Texas at Austin, 2001.


Honors and Awards


IBM

Six Outstanding Technical Achievement Awards
Member of the IBM Academy of Technology
Master Inventor
Eighth Patent Plateau
Corporate Award
Five-time Corporate Technical Recognition Event (CTRE) Invitee
Gerstner Award for Client Excellence

External

Best Paper Award (IPDPS 2016)
Lawrence Livermore National Laboratory Director’s Science & Technology Award
Three-time Gordon Bell Award Winner
Eight-time Gordon Bell Award Finalist
Twenty-Nine US Patents Issued
Team Lead for Three First Place HPC Challenge Class 1 Entries

Academic

IBM Microelectronics and Computer Development (MCD) Fellowship
Pacific Northwest National Laboratories Doctoral Fellowship
Intel Doctoral Fellowship

Research Interests

The design and analysis of parallel algorithms
Automated algorithm and code generation, hybridization, and analysis
Application Programming Interface design
Compiler verification
Benchmarking metrics and implementations
Graphics processors (and other processing units) as general purpose processors
Program visualization


Skills

Programming Languages: C, C++, Mathematica, Fortran, Java, Scheme, Matlab, LISP, Pascal, Assembly (various), Prolog

Operating Systems: Windows, AIX, Linux, BLRTS, Solaris, UNIX

Programs/Packages: MPI, OpenMP/Pthreads, ANTLR/Sorcerer, BLAS, CUDA, LAPACK/ScaLAPACK, Eclipse, Visual Studio, GNU (gcc/g77), VisualAge


Selected Publications

“Massively Parallel First-Principles Simulation of Electron Dynamics in Materials,” Erik Draeger, Xavier Andrade, John Gunnels, Abhinav Bhatele, André Schleife, and Alfredo Correa. International Parallel and Distributed Processing Symposium (IPDPS), 2016. [Best Paper Award, To Appear]

“An Early Performance Study of Large-Scale Power8 SMP Systems,” Xing Liu, Daniele Buono, Fabio Checconi, Jee W Choi, Xinyu Que, Fabrizio Petrini, John Gunnels, and Jeff Stuecheli. International Parallel and Distributed Processing Symposium (IPDPS), 2016. [To Appear]

“Massively Parallel Models of the Human Circulatory System,” Amanda Randles, Erik Draeger, Tomas Oppelstrup, Liam Krauss, and John A. Gunnels. ACM/IEEE Conference for High Performance Computing, Networking, Storage, and Analysis (Supercomputing 2015). Gordon Bell Award Finalist.

“Optimizing Sparse Linear Algebra for Large-Scale Graph Analytics,” Daniele Buono, John A. Gunnels, Xinyu Que, Fabio Checconi, and Fabrizio Petrini. IEEE Computer, Volume 48, Issue 8, pp. 26-34. August, 2015.

“Scalable Community Detection with the Louvain Algorithm,” Xinyu Que, Fabio Checconi, Fabrizio Petrini, and John A. Gunnels. International Parallel and Distributed Processing Symposium (IPDPS), 2015.

“Active Memory Cube: A Processing-in-Memory Architecture for Exascale Systems,” IBM AMC Team. IBM Journal of Research and Development. Volume 59, Issue 2/3. 2015.

“Parallel Deep Neural Network Training for Big Data on Blue Gene/Q,” I. Chung, T. N. Sainath, B. Ramabhadran, M. Picheny, J. Gunnels, V. Austel, U. Chaudhari and B. Kingsbury, Supercomputing 2014, November 2014.

“Parallel Deep Neural Network Training for LVCSR on Blue Gene/Q,” T. N. Sainath, I. Chung, B. Ramabhadran, M. Picheny, J. Gunnels, B. Kingsbury, G. Saon, V. Austel and U. Chaudhari. In Proceedings of Interspeech, September 2014.

“Low Power Massively Parallel Energy Efficient Supercomputer,” Blue Gene Team. Green Computing: Large-Scale Energy Efficiency, Randy Cohen, Ed. 2014. [Book Chapter]

“The BLIS Framework: Experiments in Portability,” Field Van Zee, Tyler Smith, Bryan Marker, Tze Meng Low, Robert A. van de Geijn, Francisco Igual, Mikhail Smelyanskiy, Xianyi Zhang, Michael Kistler, Vernon Austel, John A. Gunnels, and Lee Killough. ACM Transactions on Mathematical Software (TOMS), 2013.

“Trends and Outlook for the Massive-Scale Analytics Stack,” Amol Ghoting, John A. Gunnels, Prabhanjan Kambadur, Edwin Pednault, and Mark Squillante. IBM Journal of Research and Development. Volume 57, Number 3/4. 2013.

“Towards Real-Time Simulation of Cardiac Electrophysiology in a Human Heart at High Resolution,” David F Richards, James N Glosli, Erik W Draeger, Arthur A Mirin, Bor Chan, Jean-Luc Fattebert, William D Krauss, Tomas Oppelstrup, Chris J Butler, John A Gunnels, Viatcheslav Gurev, Changhoan Kim, John Magerlein, Matthias Reumann, Hui-Fang Wen, John Jeremy Rice. Computer Methods in Biomechanics and Biomedical Engineering. 06/2013.

“Science at LLNL with IBM Blue Gene/Q,” LLNL and IBM Blue Gene/Q Teams. IBM Journal of Research and Development. Volume 57, Number 1/2. 2013.

“Design for Low Power and Power Management in IBM Blue Gene/Q,” K. Sugavanam, C.-Y. Cher, J. A. Gunnels, R. A. Haring, P. Heidelberger, H. M. Jacobson, M. K. McManus, D. P. Paulsen, D. L. Satterfield, Y. Sugawara, and R. Walkup. IBM Journal of Research and Development. Volume 57, Number 1/2. 2013.

“Design of the Blue Gene/Q Compute Chip,” Blue Gene/Q Team. IBM Journal of Research and Development. Volume 57, Number 1/2. 2013.

“Modeling, Validation, and Co-Design of IBM Blue Gene/Q: Tools and Examples,” Blue Gene/Q Team. IBM Journal of Research and Development. Volume 57, Number 1/2. 2013.

“The IBM Blue Gene Project,” Blue Gene Team. IBM Journal of Research and Development. Volume 57, Number 1/2. 2013.

“Blue Gene/Q: Sequoia and Mira,” Blue Gene Team. Contemporary High Performance Computing: From Petascale toward Exascale. Chapman and Hall/CRC. Jeffrey Vetter, Ed. April, 2013. [Book Chapter]

“Toward Real-Time Modeling of Human Heart Ventricles at Cellular Resolution: Simulation of Drug-Induced Arrhythmias,” Arthur A. Mirin, David F. Richards, James N. Glosli, Erik W. Draeger, Bor Chan, Jean-luc Fattebert, William D. Krauss, Tomas Oppelstrup, John Jeremy Rice, John A. Gunnels, Viatcheslav Gurev, Changhoan Kim, John Magerlein, Matthias Reumann, Hui-Fang Wen. Supercomputing 2012. Gordon Bell Award Finalist.

“Deriving Dense Linear Algebra Libraries,” Paolo Bientinesi, John A. Gunnels, Margaret E. Myers, Enrique S. Quintana-Ortí, Tyler Rhodes, Robert A. van de Geijn, and Field G. Van Zee. Formal Aspects of Computing. January 2012.

“Optimizing the Performance of Streaming Numerical Kernels on the IBM Blue Gene/P PowerPC 450 processor,” T. Malas, A. Ahmadia, J. Brown, J. Gunnels, and D. Keyes. International Journal of High Performance Computing Applications. May 2012.

"PLAPACK," John A. Gunnels. Encyclopedia of Parallel Computing. David Padua, Ed. September, 2011. [Book Chapter]

“Massive Scale Analytics,” Mark Squillante, Amol Ghoting, and John Gunnels. Encyclopedia of Parallel Computing. David Padua, Ed. September, 2011. [Book Chapter]

“Efficient High-precision Dense Matrix Algebra on Parallel Architectures for Nonlinear Discrete Optimization,” J. Gunnels, J. Lee, S. Margulies. Mathematical Programming Computation, 2(2), pp. 103-124, 2010.

“Architecture of the Component Collective Messaging Interface,” Sameer Kumar, Ahmad Faraj, Amith R Mamidala, Brian Smith, Gabor Dozsa, Jeremy Berg, Bob Cernohous, John Gunnels, Douglas Miller, Joseph Ratterman, Philip Heidelberger. 2010 International Journal of High Performance Computing Applications (pp. 16-33).

“Beyond Homogeneous Decomposition: Scaling Long-Range Forces on Massively Parallel Systems,” D. F. Richards, J. N. Glosli, B. Chan, M. R. Dorr, E. W. Draeger, J.-L. Fattebert, W. D. Krauss, T. Spelce, F. H. Streitz, M. P. Surh, and J. A. Gunnels. Supercomputing 2009. Gordon Bell Award Finalist.

“MPI Collective Communications on The Blue Gene/P Supercomputer, Algorithms and Optimizations,” Ahmad Faraj, Sameer Kumar, Brian Smith, Amith Mamidala, John Gunnels. IEEE's 17th Hot Interconnects 2009.

“Petascale Computing with Accelerators,” Michael Kistler, John Gunnels, Daniel Brokenshire, and Brad Benton. Principles and Practices of Parallel Programming (PPoPP 2009).

“Programming the Linpack Benchmark for the IBM PowerXCell 8i Processor,” Michael Kistler, John Gunnels, Daniel Brokenshire, and Brad Benton. Special Issue of Scientific Programming. Accepted. Volume 17, Issue 1-2 (January 2009).

“Programming the Linpack Benchmark for Roadrunner,” Michael Kistler, John Gunnels, Daniel Brokenshire, and Brad Benton. IBM Journal of Research and Development. Volume 53, Number 5. 2009.

“Overview of the Blue Gene/P Project,” Blue Gene/P Team. IBM Journal of Research and Development. Volume 52, 1/2, pp. 199-220. 2008.

“Fine grained parallelization of the Car-Parrinello ab initio MD method on Blue Gene/L,” Eric Bohm, Abhinav Bhatele, Laxmikant V. Kale, Mark E. Tuckerman, Sameer Kumar, John A. Gunnels, and Glenn J. Martyna. IBM Journal of Research and Development. Volume 52, 1/2, pp. 159-176. 2008.

“Optimization of Fast Fourier Transforms on the Blue Gene/L Supercomputer,” Yogish Sabharwal, Saurabh K. Garg, Rahul Garg, John A. Gunnels, and Ramendra K. Sahoo. International Conference on High Performance Computing (HiPC) 2008.

“Optimization of BLAS on the Cell Processor,” Vaibhav Saxena, Prashant Agrawal, Yogish Sabharwal, Vijay K. Garg, Vimitha Kuruvilla, and John A. Gunnels. International Conference on High Performance Computing (HiPC) 2008.

“Extending Stability Beyond CPU Millennium: A Micron-Scale Atomistic Simulation of Kelvin-Helmholtz Instability,” J.N. Glosli, K.J. Caspersen, J.A. Gunnels, D.F. Richards, R.E. Rudd, and F.H. Streitz. Supercomputing 2007. Gordon Bell Award Finalist. Gordon Bell Award Winner.

“An Experimental Comparison of Cache-oblivious and Cache-aware Programs,” Kamen Yotov, Thomas Roeder, Keshav Pingali, John Gunnels, and Fred Gustavson. 19th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA ’07).

“Large Scale Drop Impact Analysis of a Mobile Phone on Blue Gene/L: Introduction to a Work Selected as the Finalist of Gordon Bell Prize,” Hiroshi Akiba, Shinobu Yoshimura, Hirohisa Noguchi, John A Gunnels, and Yogish Sabharwal. The Japan Society for Industrial and Applied Mathematics (JSIAM). Appeared 10/2007.

“Large-Scale Electronic Structure Calculations of High-Z Metals on the BlueGene/L Platform,” Francois Gygi, Erik W. Draeger, Martin Schulz, Bronis R. De Supinski (LLNL), John A. Gunnels, Vernon Austel, James C. Sexton (IBM), Franz Franchetti, Stefan Kral, Christoph Ueberhuber, Juergen Lorenz (U Vienna), Supercomputing 2006. Gordon Bell Award Finalist. Gordon Bell Award Winner.

“Large Scale Drop Impact Analysis of Mobile Phone Using ADVC on Blue Gene/L,” Hiroshi Akiba, Tomonobu Ohyama, Yoshinori Shibata, Kiyoshi Yuyama, Yoshikazu Katai, Ryuichi Takeuchi, Takeshi Hoshino, Shinobu Yoshimura, Hirohisa Noguchi, Manish Gupta, John Gunnels, Vernon Austel, Yogish Sabharwal, Rahul Garg, Shoji Kato, Takashi Kawakami, Satoru Todokoro, Junko Ikeda. Supercomputing 2006. Gordon Bell Award Finalist.

“Is Cache-Oblivious DGEMM Viable?,” John A. Gunnels, Fred G. Gustavson, Keshav Pingali, Kamen Yotov. PARA'06: State-of-the-Art in Scientific Computing, 2006, Umea, Sweden.

“Minimal Data Copy for Dense Linear Algebra Factorization,” Fred G. Gustavson, John A. Gunnels, James C. Sexton. PARA'06: State-of-the-Art in Scientific Computing, 2006, Umea, Sweden.

“100+ TFlop Solidification Simulations on BlueGene/L,” Frederick H. Streitz, James N. Glosli, Mehul V. Patel, Bor Chan, Robert K. Yates, Bronis R. de Supinski (Lawrence Livermore National Laboratory), James Sexton, John A. Gunnels (IBM). Supercomputing 2005. Gordon Bell Award Finalist. Gordon Bell Award Winner.

“Large-Scale First-Principles Molecular Dynamics Simulations on the BlueGene/L Platform using the Qbox Code,” F. Gygi, E. Draeger, B. R. de Supinski, R. K. Yates, F. Franchetti, S. Kral, J. Lorenz, C. W. Ueberhuber, J. Gunnels, J. Sexton. Supercomputing 2005. Gordon Bell Award Finalist.

“Early Experience with Scientific Applications on the Blue Gene/L Supercomputer,” George Almasi, Gyan Bhanot, Dong Chen, Maria Eleftheriou, Blake Fitch, Alan Gara, Robert Germain, John Gunnels, Manish Gupta, Philip Heidelberger, Mike Pitman, Aleksandr Rayshubskiy, James Sexton, Frank Suits, Pavlos Vranas, Bob Walkup, Chris Ward, Yuriy Zhestkov, Alessandro Curioni, Wanda Andreoni, Charles Archer, Jose Moreira, Richard Loft, Henry Tufo, Theron Voran, and Katherine Riley. Euro-Par 2005.

"The Science of Deriving Dense Linear Algebra Algorithms", Paolo Bientinesi, John A. Gunnels, Margaret E. Myers, Enrique S. Quintana-Orti, and Robert A. van de Geijn. ACM Transactions on Mathematical Software (TOMS) 31(1):1-26 (March 2005).

“A Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm,” B.S. Andersen, J.A. Gunnels, F.G. Gustavson, J.K. Reid, and J. Wasniewski. TOMS 31(2): 201-227 (2005).

“BlueGene/L Performance Tools,” Xavier Martorell, Nils Smeds, Bob Walkup, Jose R. Brunheroto, George Almasi, John Gunnels, Luiz DeRose, Jesus Labarta, Francesc Escale, Judit Gimenez, Harald Servat, and Jose E. Moreira. IBM Journal of Research and Development, 49, 2/3, pp. 407-424. 2005.

“Design and implementation of message passing services for the Blue Gene/L supercomputer,” G. Almási, C. Archer, J. G. Castaños, C. C. Erway, J. A. Gunnels, P. Heidelberger, X. Martorell, J. E. Moreira, K. Pinnow, J. Ratterman, B. D. Steinmacher-Burow, W. Gropp, B. Toonen. IBM Journal of Research and Development, 49, 2/3, pp. 393–406. 2005.

“Exploiting the Floating Point and Memory Subsystems on the Blue Gene/L Node: Architecture, Compilers, and Algorithm Design,” G. Almasi, L. R. Bachega, L. H. Ceze, S. Chatterjee, K. A. Dockser, J. A. Gunnels, M. Gupta, F. G. Gustavson, D. Hoenicke, C. A. Lapkowski, G. K. Liu, M. P. Mendell, M. Ohmacht, K. Strauss, C. D. Wait, and T.J. C. Ward. IBM Journal of Research and Development, 49, 2/3, pp. 377-392. 2005.

“Unlocking the Performance of the BlueGene/L Supercomputer,” George Almasi, Siddhartha Chatterjee, Alan Gara, John Gunnels, Manish Gupta, Amy Henning, Jose Moreira, Bob Walkup (IBM Thomas J. Watson Research Center), Alessandro Curioni (IBM Zurich Research Laboratory), Charles Archer (IBM Systems and Technology Group), Leonardo Bachega (LARC - University of Sao Paulo), Bor Chan, Bruce Curtis (Lawrence Livermore National Laboratory), Maciej Brodowicz, Sharon Brunett, Ed Upchurch (Caltech), Giri Chukkapalli, Robert Harkness, Wayne Pfeiffer (San Diego Supercomputer Center). Supercomputing 2004.

“A High-Performance SIMD Floating Point Unit for BlueGene/L: Architecture, Compilation, and Algorithm Design,” Leonardo Bachega, Siddhartha Chatterjee, Kenneth A. Dockser, John A. Gunnels, Manish Gupta, Fred G. Gustavson, Christopher A. Lapkowski, Gary K. Liu, Mark P. Mendell, Charles D. Wait, T. J. Chris Ward. 13th International Conference on Parallel Architecture and Compilation Techniques (PACT’04), Antibes Juan-les-Pins, France, September 29 - October 03, 2004, pp. 85–96.

“Architecture and Performance of the BlueGene/L Message Layer,” George Almasi, Charles Archer, John Gunnels, Philip Heidelberger, Xavier Martorell, Jose E. Moreira. DAPSYS 2004: 5th Austrian-Hungarian Workshop on Distributed and Parallel Systems in conjunction with EuroPVM/MPI 2004.

“BlueGene/L Supercomputer,” (BG/L Team) Future Directions in IC and Package Design Workshop, 2003. Invited paper.

“An Overview of the BlueGene/L Supercomputer,” (with The Blue Gene/L Team), IEEE Supercomputing 2002.

“The Science of Programming High-Performance Linear Algebra Libraries,” (with Paolo Bientinesi, Fred G. Gustavson, Greg M. Henry, Margaret E. Myers, Enrique S. Quintana-Orti, and Robert A. van de Geijn), Proceedings of Performance Optimization for High-Level Languages and Libraries (POHLL-02), a workshop in conjunction with the 16th Annual ACM International Conference on Supercomputing (ICS'02), June 21, 2002.

“FLAME: Formal Linear Algebra Methods Environment,” (with Fred G. Gustavson, Greg M. Henry, and Robert A. van de Geijn), TOMS, 27(4):422-455, December 2001.

“A Family of High-Performance Matrix Algorithms,” (with Greg M. Henry and Robert A. van de Geijn), Computational Science 2001 Part I, Lecture Notes in Computer Science 2073, pp. 51-60, Springer, 2001.

“Fault-Tolerant High-Performance Matrix-Matrix Multiplication: Theory and Practice,” (with Daniel S. Katz, Enrique S. Quintana-Orti, and Robert van de Geijn), The International Conference for Dependable Systems and Networks (DSN-2001), pp. 47-56, July, 2001.

“Formal Methods for High-Performance Linear Algebra Libraries,” (with Robert A. van de Geijn), The Architecture of Scientific Software, (R. F. Boisvert and P. T. Tang, editors), pp. 193-210, Kluwer Academic Press, 2001.

Using PLAPACK: Parallel Linear Algebra Package, (Robert A. van de Geijn) MIT Press, Spring 1997. Co-author of Chapters 2, 6-8. [Text Book]