SPIRAL: AI for High Performance Code
Abstract:
This talk provides a current and comprehensive overview of the SPIRAL system, that has been developed over 20 years at Carnegie Mellon University, and is now available as BSD Open Source System. We show that SPIRAL is a rule based AI system that captures the knowledge of how algorithms, computer architecture, and program transformations are defined and interact. We develop the underlying formal framework to capture computational algorithms, computing platforms, and program transformations of interest, using a unifying mathematical formalism we call operator language (OL). Then we cast the problem of synthesizing highly optimized computational kernels for a given machine as a strongly constrained optimization problem that is solved by a multi-stage rewriting system. Since all rewrite steps are semantics preserving identity operations, our approach allows us to formally prove the equivalence between the kernel specification and the synthesized program. Finally we present a first look at FFTX and SpectralPack. We aim at translating the LAPACK/BLAS approach from the numerical linear algebra world to the N log N/spectral algorithm domain.
Bio:
Franz Franchetti is Professor in the Department of Electrical and Computer Engineering at Carnegie Mellon University. He received the Dipl.-Ing. (M.Sc.) degree in Technical Mathematics and the Dr. techn. (Ph.D.) degree in Computational Mathematics from the Vienna University of Technology in 2000 and 2003, respectively. Dr. Franchetti's research focuses on automatic performance tuning and program generation for emerging parallel platforms and algorithm/hardware co-synthesis. Within the Spiral effort, his research goal is to enable automatic generation of highly optimized software libraries for important kernel functionality.
Summary:
Evolution of our tools
Mathematical algorithms evolve slowly (over centuries)
Computer hardware evolves quickly (over few years)
Programming languages and software are in the middle
Evolve over decades
Programmers don’t want to rewrite their applications to run well on new hardware
SPIRAL:
Given
Mathematical description of an algorithm
Target hardware platform
Derive algorithm’s implementation on the platform
As efficient as the best hand-optimized implementation
And provably correct
Traditional
Manual optimization
Supported by compilers
Very general tools
Often don’t work that well at optimizing any particular algorithms
Specifying the mathematical algorithms in SPIRAL
Vector->vector operators
Stateless
Potentially non-linear
Higher-dimensional data is linearized
Language: math at the level of linear algebra & calculus
Tensorflow
A more general algorithm specification
The cost of generality is that TF system can do fewer optimizations but is more flexible
SPIRAL needs to know everything about the computation and is thus able to optimize more effectively
Representations:
Higher-level: computer algebra language
Lower-level: functional language (like OCaml) with map, zip, fold, etc. operations
Design:
Specify high-level mathematical operations and how they map to low-level implementations
Specification is recursive (break down larger data units into smaller ones)
Compiling the high-level specifications into implementation requires applying rules many times until a good implementation is found
Challenge: search algorithm to find good sequence of compilation rules to find good implementation
Design is a tradeoff between space of possible computer architectures and space of possible algorithms
Tensor algebra is the language for representing operations on data
Tensors can be mapped onto a wide range of computer architectures
SPIRAL includes many different domain theories for compiling mathematical specifications
Linear algebra
Software defined ratio
Partial differential equations (for HPC)
Synthetic Aperture Radar
Graph computations
Formal approach for all types of parallelism
Multi-threading
Vector SIMD
Message passing
Streaming
GPUs
FPGAs
Compilation process is a join between the hardware constraints and infinite space of possible optimizations, reducing the space of implementations to something that can be searched
They generate concrete implementations
Measure their performance
Keep searching in promising regions of implementation space
Making SPIRAL easy to use:
Let programmers write their algorithm using special SPIRAL API
Run the code, observe its execution trace
SPIRAL now optimizes that algorithm so that it runs efficiently on any platform