Massively Parallel Modeling and Simulation of Next Generation Hybrid Neuromorphic Supercomputer Systems

In recent years, a new class of processor technology has emerged: neuromorphic computing. These processors provide a brain-like computational model that enables complex neural network computations to be performed using significantly less power than current processors. For example, IBM has created an instance of the TrueNorth architecture with 5.4 billion transistors arranged into 4096 neurosynaptic cores, yielding a total of 1 million spiking neurons and 256 million reconfigurable synapses. This architecture consumes only 63 milliwatts while executing a multi-object detection and classification program on real-time video input (30 fps) at a 400x240-pixel frame size.
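
To illustrate the style of computation such a chip performs, the sketch below shows a minimal leaky integrate-and-fire neuron update in C. The structure, parameter names, and dynamics are our own illustration and are not taken from the TrueNorth specification; a neurosynaptic core evaluates many such neurons per tick and routes their spikes to other cores.

    #include <stdbool.h>

    /* Illustrative leaky integrate-and-fire (LIF) neuron; the field
     * names and dynamics are placeholders, not TrueNorth's model. */
    typedef struct {
        double potential;   /* membrane potential            */
        double leak;        /* amount leaked each tick       */
        double threshold;   /* firing threshold              */
        double reset;       /* potential value after a spike */
    } lif_neuron;

    /* Advance one neuron by one tick given the summed, weighted input
     * arriving on its synapses; returns true if the neuron fires. */
    static bool lif_step(lif_neuron *n, double synaptic_input)
    {
        n->potential += synaptic_input - n->leak;
        if (n->potential < 0.0)
            n->potential = 0.0;          /* clamp at resting level */
        if (n->potential >= n->threshold) {
            n->potential = n->reset;     /* fire and reset */
            return true;
        }
        return false;
    }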

This is particularly interesting as next generation HPC systems are about to experience a radical shift in their design and implementation. The current configuration of leadership-class supercomputers provides much greater off-node parallelism than on-node parallelism. For example, the 20 PF “Sequoia” Blue Gene/Q supercomputer located at LLNL has over 98K compute nodes, but each compute node provides at most 64 threads of execution. However, in order to reach exascale compute capabilities, a next-generation system must be roughly 50x more power efficient than today's systems. This dominating demand for power efficiency is driving future designs that dramatically decrease the number of compute nodes while increasing the computational power and number of processing cores per node. For example, a recent NASA vision report [56] predicts that exascale-class supercomputers in the 2030 time frame will have only 20,000 compute nodes, with the number of parallel streams per node rising to nearly 16,000.

To meet the computational demands of these future designs, it has become a widely held view that on-node accelerator processors working in close coordination with multi-core CPUs will play an important role in compute-node designs [56]. These accelerators currently come in two forms. The first is the graphics processing unit (GPU), which offers a wide, single-instruction-multiple-data (SIMD) approach to parallelism that matches the execution paradigm of graphics applications. GPUs offer a massive amount of numerical compute power at a very affordable price. For example, the NVIDIA Titan Black GPU sells for less than $1,000 (see: newegg.com) and provides a peak performance of over 5.7 single-precision and 1.8 double-precision teraFLOPS using only 250 watts of power (or roughly 22.8 and 7.2 gigaFLOPS per watt, respectively). The second form of compute-node accelerator is a mesh processor architecture such as the Intel Phi. Here, a collection of lower clock-rate x86 cores is interconnected over an on-chip mesh network. The current version provides 1.2 teraFLOPS of compute power, consumes 270 watts of power, and costs over $4,000. From a pure FLOPS-and-cost perspective, the GPU is winning, but from a programming point of view, the Intel Phi provides a much cleaner path to executing legacy HPC codes. In terms of real leadership-class supercomputer systems, the GPU approach is used in the DOE Titan system, currently ranked #2 in the world at 17.6 petaFLOPS, and the mesh processor architecture is used in the Tianhe-2 system, currently ranked #1 in the world at 33.9 petaFLOPS (see top500.org).
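
As a quick sanity check on the efficiency figures above, the snippet below recomputes performance per watt directly from the peak ratings and power draws quoted in this section (the inputs are taken from the text, not re-measured):

    #include <stdio.h>

    /* Recompute GFLOPS per watt from the peak ratings and power
     * figures quoted in the text. */
    int main(void)
    {
        double titan_sp = 5.7, titan_dp = 1.8, titan_w = 250.0;  /* teraFLOPS, watts */
        double phi_dp   = 1.2, phi_w    = 270.0;

        printf("Titan Black: %.1f SP, %.1f DP GFLOPS/W\n",
               titan_sp * 1000.0 / titan_w,     /* ~22.8 */
               titan_dp * 1000.0 / titan_w);    /* ~7.2  */
        printf("Intel Phi:   %.1f DP GFLOPS/W\n",
               phi_dp * 1000.0 / phi_w);        /* ~4.4  */
        return 0;
    }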

Given the advent of neuromorphic computing, the question we will address in this work is how a neuromorphic “accelerator” processor might be used to improve the application performance, power consumption, and overall system reliability of future exascale systems. This systems design question is driven by the recent DOE SEAB report on high-performance computing [35], which highlights the neuromorphic architecture as a key technology in future supercomputer systems, especially for addressing the needs of next generation, large-scale data processing, and identifies it as “an emergent area for exploitation”. We will also explore the embedding of neuromorphic processors into larger systems.

Project Objectives

To tackle this overarching research question, we propose the following five major research thrusts, which together provide an end-to-end modeling and simulation capability: a “what-if” platform for asking metric-driven questions about the capabilities of potential hybrid HPC-neuromorphic supercomputer system designs. These core research thrusts (CRTs) include:

  1. Design and implementation of discrete-event neuromorphic processor models that can be efficiently executed on a massively parallel supercomputer (see the sketch following this list).
  2. Design and implementation of integrated perceptual and mined data processing algorithms that can execute on a prototype neural network architecture / simulated neuromorphic chip. Potential scenarios include: (i) streaming failure sensor data coupled with HPC application hardware performance data to provide a self-aware capability and improve the system’s detection of and resilience to component failures; (ii) mining of performance pattern data from live running HPC applications that may help improve application execution time and lower overall power consumption.
  3. Design and implementation of hybrid CPU, GPU, neuromorphic node models that can be efficiently executed on a massively parallel supercomputer.
  4. Design and implementation of HPC network models to assess their ability to meet the latency and bandwidth demands of hybrid CPU, GPU and neuromorphic compute nodes.
  5. Design and implementation of a live demonstration experiment that links the “AMOS” Blue Gene/Q supercomputer located at the CCI with a live field experiment running at AFRL. This demonstration would connect a massively parallel ROSS simulation with the live field experiment via the DREN/NYSERNet high-performance network links that are available at both research sites.
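
As a concrete starting point for CRT 1, the sketch below shows what a single neurosynaptic-core logical process (LP) might look like as a forward event handler in C. The handler signature and the tw_event_new / tw_event_data / tw_event_send calls follow the ROSS API as we understand it; the state layout, message fields, neuron dynamics, and core-to-core mapping are hypothetical placeholders, and a complete model would also supply the matching reverse handler that ROSS requires for optimistic rollback.

    #include <ross.h>

    #define NEURONS_PER_CORE 256     /* placeholder core size           */
    #define TOTAL_CORES      4096    /* placeholder number of core LPs  */

    typedef struct {
        double potential[NEURONS_PER_CORE];  /* membrane potentials     */
        double threshold;                    /* shared firing threshold */
        long   spikes_sent;                  /* per-core statistic      */
    } core_state;

    typedef struct {
        int    dest_neuron;   /* target neuron within the receiving core */
        double weight;        /* synaptic weight carried by the spike    */
    } spike_msg;

    /* Forward handler: deliver one spike event to this core LP. */
    static void core_event(core_state *s, tw_bf *bf, spike_msg *m, tw_lp *lp)
    {
        (void)bf;
        int n = m->dest_neuron;
        s->potential[n] += m->weight;

        if (s->potential[n] >= s->threshold) {
            s->potential[n] = 0.0;           /* reset after firing */
            s->spikes_sent++;

            /* Route the resulting spike to a downstream core after a
             * unit axonal delay; the destination mapping is a placeholder. */
            tw_lpid    dest = (lp->gid + 1) % TOTAL_CORES;
            tw_event  *e    = tw_event_new(dest, 1.0, lp);
            spike_msg *out  = (spike_msg *) tw_event_data(e);
            out->dest_neuron = n;
            out->weight      = 1.0;
            tw_event_send(e);
        }
    }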

To provide this end-to-end system simulation capability, these models will be implemented on top of the massively parallel ROSS simulation engine. In doing so, we will leverage existing investments in supercomputer systems (both at Rensselaer and AFRL) to create a high-fidelity, hybrid-architecture, co-design analysis tool that enables the understanding of key architectural trade-offs. Currently, no such capability exists. This proposal represents a three-year collaboration between Rensselaer’s Center for Computational Innovations (CCI), which houses our 5-rack Blue Gene/Q “AMOS” supercomputer system, and the newly formed Institute for the Exploration of Data and Applications (IDEA). IDEA’s mission is to serve as an enabler for research across Rensselaer via the development of critical computational methodologies, including data-intensive supercomputing, large-scale agent-based simulation, and cognitive computing technologies.