HPCMASPA 2014
Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications
in conjunction with IEEE Cluster 2014
Madrid, Spain
Architects, administrators, and users of modern high-performance computing (HPC) systems strive to meet goals of energy, resource, and application efficiency. Optimizing for any or all of these can only be accomplished through analysis of appropriate system and application information. While application performance analysis tools can provide application performance insight, the associated overhead typically decreases application performance while the tools are being employed. Thus, they are typically used for application performance tuning but not for actual user application runs. Likewise, traditional system monitoring tools, in conjunction with analysis tools, can provide insight into run-time system resource utilization. However, due to overhead and impact concerns, such tools are often run with collection periods on the order of minutes, or are used only to solve problems and not during normal HPC system operation. There are currently few, if any, tools that provide continuous, low-impact, high-fidelity system monitoring, analysis, and feedback that meet the increasingly urgent resource efficiency optimization needs of HPC systems.
Modern processors and operating systems used in HPC systems expose a wealth of information about how system resources, including energy, are being utilized. Lightweight tools that gather and analyze this information could provide feedback, including at run time, to increase application performance, optimize system resource utilization, and drive more efficient future HPC system design.
The main goal of this workshop is to provide an opportunity for researchers to exchange new ideas, research, techniques, and tools in the area of HPC system level monitoring, analysis, and feedback as it relates to increasing efficiency with respect to energy, resource utilization, and application run-time.
Topics
Data collection, transport, and storage
Design of systems and frameworks for HPC monitoring which address HPC requirements such as:
Extreme scalability
Run-time data collection and transport
Analysis on actionable timescales
Feedback on actionable timescales
Minimal application impact
Extraction and evaluation of resource utilization and state information from current and next generation components (e.g., GPUs, MICs)
Monitoring methodologies and results for all HPC system components and support infrastructure (e.g., compute, network, storage, power)
How not to do it, with explanations, benchmarks, or analysis of code to save the rest of us from trying it again
Analysis of monitored data and system information
Extraction of meaningful information from raw data, such as system and resource health, contention, or bottlenecks
Methodologies and applications of analysis algorithms on large scale HPC system data
Visualization techniques for large scale HPC data (addressing size, timescales, presentation within a meaningful context)
Evaluation of correlative relationships between system state and application performance via use of monitored system data
Response to and utilization of processed data and system information
Mechanisms for feedback and response to applications and system software (e.g., informing schedulers, down-clocking CPUs)
HPC application design and implementation that take advantage of monitored system data (e.g., dynamic task placement or rank-to-core mapping)
System-level and job-level feedback and responses to monitored system data
Job scheduling and allocation based on monitored system information (e.g., contention for storage or network resources)
Use of monitored system data for evaluation of future systems specifications and requirements
Use of monitored system data for validation of systems simulations
Important dates
May 31 AOE - Abstract due
June 7 AOE - Papers due (contingent upon abstract submission by 5/31)
June 30 - Acceptance notification
July 21 - Camera ready papers due
Sept 26 - Workshop
Format
HPCMASPA 2014 welcomes submissions for the following:
Technical Papers (30 minute presentation):
These submissions consist of work not previously published nor under review by another conference or journal. Accepted papers will be included in the workshop proceedings published by IEEE.
Mini-talks (15 minute presentation):
These submissions can consist of previously published work and can address work-in-progress, highlight gap areas, motivate research areas, etc. Accepted mini-talks will not be published in the proceedings.
Submission Guidelines:
Submissions (either type) must be compliant with the IEEE Xplore format for publication. LaTeX (preferred) and Word Templates are available here. Additional instructions can be found on the IEEE Cluster site.
Maximum 8 pages for Technical Papers
Maximum 4 pages descriptive text and figures for Mini-talks
Web-based submission through EasyChair. PDFs only.
Submissions must be in English.
Submission implies the willingness of at least one of the authors to register and present the work associated with the submission.
Submissions will be evaluated on their originality, technical soundness, significance, presentation, and interest to the workshop attendees.
Organization
Organizing Committee
Benjamin Allan, Sandia National Laboratories
Jim Brandt, Sandia National Laboratories
Ann Gentile, Sandia National Laboratories
Cory Lueninghoener, Los Alamos National Laboratory
Nichamon Naksinehaboon, Open Grid Computing
Boyana Norris, University of Oregon
Narate Taerat, Open Grid Computing
Program Committee
Jon Cook, New Mexico State University
Narayan Desai, Argonne National Laboratory
Richard Gerber, NERSC
Forest Godfrey, Cray
Yun (Helen) He, Lawrence Berkeley National Laboratory
Karen Karavanic, Portland State University
Zhiling Lan, Illinois Institute of Technology
Box Leangsuksun, Louisiana Tech University
Mike Mason, Los Alamos National Laboratory
Henry Neeman, Oklahoma University Supercomputing Center for Education & Research
Martin Schulz, Lawrence Livermore National Laboratory
Mike Showerman, NCSA
David Thompson, Kitware
Ziming Zheng, HP Vertica
Related Future Events:
Monitoring Large-Scale HPC Systems: Issues and Approaches - BOF at SC14 Wed Nov 19 5:30-7:00 pm
HPCMASPA 2015 at IEEE Cluster 2015 Sept 8, 2015.