Providing Swift's implicitly parallel functional programming model and distributed in-memory file and object exchange for petascale and exascale applications.

The ever-increasing power of supercomputer systems is both driving and enabling the emergence of new problem-solving methods that require the efficient execution of many concurrent and interacting tasks. Methodologies such as rational design (e.g., in materials science), uncertainty quantification (e.g., in engineering), parameter estimation (e.g., for chemical and nuclear potential functions, and in economic energy systems modeling), massive dynamic graph pruning (e.g., in phylogenetic searches), Monte-Carlo-based iterative fixing (e.g., in protein structure prediction), and inverse modeling (e.g., in reservoir simulation) all have these requirements. These many-task applications frequently have aggregate computing needs that demand the fastest computers. For example, proposed next-generation climate model ensemble studies will involve 1,000 or more runs, each requiring 10,000 cores for a week, to characterize model sensitivity to initial condition and parameter uncertainty.
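To give a sense of scale, the climate ensemble example above can be turned into a rough core-hour estimate. The sketch below uses only the figures quoted in the text (1,000 runs, 10,000 cores each, one week per run) and assumes each run occupies its cores continuously for the full week, which is our simplification rather than a stated property of the study.

```python
# Back-of-the-envelope cost of the ensemble study described above.
# Assumption: each run holds its cores for a full 7 x 24 hour week.
runs = 1_000               # ensemble members (lower bound quoted in the text)
cores_per_run = 10_000     # cores used by each run
hours_per_week = 7 * 24    # wall-clock hours in one week

core_hours_per_run = cores_per_run * hours_per_week
total_core_hours = runs * core_hours_per_run

print(f"{core_hours_per_run:,} core-hours per run")
print(f"{total_core_hours:,} core-hours for the whole ensemble")
```

Under these assumptions the ensemble needs about 1.68 billion core-hours, which is why such studies are treated as leadership-class workloads.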

ExM Architecture

ExM architecture - distributed functional dataflow evaluation with RAM-based file and object storage.
Applications shown are SWAT hydrology modeling (top) and ParVis AMWG climate model analysis (bottom).

ExM (for Exascale Many-task) focuses on a novel stack of system services that will enable a broader range of applications to be easily developed and efficiently executed on extreme-scale platforms. The ExM middleware stack will support a highly parallel, functional data model that exposes and automates the many levels of execution needed to efficiently leverage extreme-scale computing systems for complex, many-task applications.

The goal of the ExM project is to achieve the technical advances required to execute such many-task applications efficiently, reliably, and easily on petascale and exascale computers. In this way, we will open up extreme-scale computing to new problem-solving methods and application classes.

ExM is a collaborative project, led by Argonne National Laboratory, with the University of Chicago and the University of British Columbia.

The ExM project is part of the X-Stack program funded by the Department of Energy Office of Science Advanced Scientific Computing Research (ASCR) and benefits from resources acquired through the Department of Energy INCITE Leadership Computing program.

The ExM approach 
  • Distributed, load balanced task manager 
    • Fast, lightweight scheduler and resource provisioner: scales to scripts with over 1M tasks
  • Distributed, in-memory storage system
    • Full POSIX semantics on BG/P compute nodes
  • Scalability 
    • Collective data management provides efficient ways to integrate these layers 
  • Novel programming schemes
    • Functional data flow model with pervasive implicit parallelism
  • Success criteria
    • Quantitative measurement of target science codes on petascale systems and exascale simulations: seeking >1M sustained tasks per second
    • Subjective evaluation of programmability through application user engagement
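The "functional data flow model with pervasive implicit parallelism" listed above can be illustrated with a small sketch. This is not Swift/T syntax and not the ExM runtime: it is a Python approximation in which futures stand in for dataflow variables and a thread pool stands in for the distributed task manager. The functions simulate and analyze are hypothetical placeholders for real science codes.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def simulate(member: int) -> float:
    """Hypothetical stand-in for one independent simulation task."""
    return member * 0.5


def analyze(results: list[float]) -> float:
    """Hypothetical reduction that needs every simulation result."""
    return sum(results) / len(results)


def run_ensemble(members: int, workers: int = 4) -> float:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The simulate() calls share no data, so nothing orders them:
        # the pool is free to run them concurrently.
        futures = [pool.submit(simulate, m) for m in range(members)]
        # analyze() consumes every result, so it can only start once all
        # futures have resolved; result() blocks until each one is ready.
        return analyze([f.result() for f in futures])


mean = run_ensemble(8)
print(mean)
```

The point of the sketch is that the programmer states only which values each step consumes. Concurrency follows from the data dependencies rather than from explicit thread or message management.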
The ExM software stack contains:
  • A language and runtime, Swift/T, that dynamically creates and executes data-dependent tasks, varying in granularity, on high-component-count platforms;
  • A data model that enables the exchange and processing of data objects that can reside in memory or on mass storage services;
  • A distributed virtual data store that enables the fast exchange and caching of these objects within a distributed namespace. The virtual data store efficiently and productively meets the data interchange needs of the task execution model. 
The ExM services work together to meet the resilience and scalability demands of extreme execution environments. They make object location (“in memory” vs. “on disk”) transparent to the programmer and user. The services are callable from Fortran, C/C++, Java, and Python, and thus from evolving high-productivity HPC languages, as well as from compact parallel Swift scripts.
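As a rough illustration of location transparency, the sketch below shows a toy, single-process object store in which callers use the same put and get calls whether an object ends up in memory or on disk. The ObjectStore class, its method names, and the size threshold are all hypothetical and chosen for illustration; they are not the ExM data store API, and the real store is distributed rather than local.

```python
import os
import pickle
import tempfile


class ObjectStore:
    """Toy store: small objects stay in RAM, larger ones spill to disk."""

    def __init__(self, spill_dir: str, memory_limit_bytes: int = 1024):
        self._spill_dir = spill_dir
        self._memory_limit = memory_limit_bytes
        self._in_memory = {}   # name -> serialized bytes
        self._on_disk = {}     # name -> file path
        self._counter = 0      # used to build unique spill file names

    def put(self, name: str, obj: object) -> None:
        payload = pickle.dumps(obj)
        self.delete(name)  # drop any earlier copy stored under this name
        if len(payload) <= self._memory_limit:
            self._in_memory[name] = payload
        else:
            self._counter += 1
            path = os.path.join(self._spill_dir, f"obj-{self._counter}.pkl")
            with open(path, "wb") as f:
                f.write(payload)
            self._on_disk[name] = path

    def get(self, name: str) -> object:
        # Callers never say where the object lives; the store decides.
        if name in self._in_memory:
            return pickle.loads(self._in_memory[name])
        with open(self._on_disk[name], "rb") as f:
            return pickle.loads(f.read())

    def delete(self, name: str) -> None:
        self._in_memory.pop(name, None)
        path = self._on_disk.pop(name, None)
        if path is not None:
            os.remove(path)

    def location(self, name: str) -> str:
        """Diagnostic only; normal callers do not need this."""
        return "memory" if name in self._in_memory else "disk"


with tempfile.TemporaryDirectory() as spill_dir:
    store = ObjectStore(spill_dir, memory_limit_bytes=1024)
    store.put("params", {"alpha": 0.1})        # small: kept in memory
    store.put("field", list(range(10_000)))    # large: spilled to disk
    small = store.get("params")
    large = store.get("field")
    locations = (store.location("params"), store.location("field"))

print(small, len(large), locations)
```

The caller's code is identical for both objects. Only the diagnostic location call reveals that one was served from memory and the other from a file.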

Project Overview Materials
  • Poster (April 2012) (pdf)
  • Handout (April 2012) (pdf)
  • Description (Mar 2011) (pdf)
  • Short Presentation (Mar 2011) (pdf)
  • Poster (Mar 2011) (pdf)
  • Handout (Oct 2011) (pdf)
  • Quad Chart (Oct 2011) (pdf)