HPCMASPA 2016

Architects, administrators, and users of modern high-performance computing (HPC) systems strive to meet goals of energy, resource, and application efficiency. Optimizing for any or all of these can only be accomplished through analysis of appropriate system and application information. While application performance analysis tools can provide insight into application performance, their overhead typically degrades that performance while they are in use. Thus they are typically used for application performance tuning but not for production application runs. Likewise, traditional system monitoring tools in conjunction with analysis tools can provide insight into run-time system resource utilization. However, due to overhead and impact concerns, such tools are often run with collection periods on the order of minutes, or are used only to solve problems and not during normal HPC system operation. There are currently few, if any, tools that provide continuous, low-impact, high-fidelity system monitoring, analysis, and feedback that meet the increasingly urgent resource efficiency optimization needs of HPC systems.

Modern processors and operating systems used in HPC systems expose a wealth of information about how system resources, including energy, are being utilized. Lightweight tools that gather and analyze this information could provide feedback, including at run time, to increase application performance; optimize system resource utilization; and drive more efficient future HPC system design.

The goal of this workshop is to provide an opportunity for researchers to exchange new ideas, research, techniques, and tools in the area of HPC monitoring, analysis, and feedback as it relates to increasing efficiency with respect to energy, resource utilization, and application run-time.

Keynote Speaker: William (Bill) T. C. Kramer, NCSA Blue Waters Director and PI

Failure and Resiliency in the Shadow of Exascale – Will our Current Assumptions Take us in the Right Direction?

Panel: Accessible Analytics and Visualizations

Topics

Data collection, transport, and storage

Design of systems and frameworks for HPC monitoring which address HPC requirements such as:
- Extreme scalability
- Run time data collection and transport
- Analysis on actionable timescales
- Feedback on actionable timescales
- Minimal application impact
Extraction and evaluation of resource utilization and state information from current and next generation components
Monitoring methodologies and results for all HPC system components and support infrastructure (e.g., compute, network, storage, power, facilities)

Analysis of monitored data and system information

Extraction of meaningful information from raw data, such as system and resource health, contention, or bottlenecks
Methodologies and applications of analysis algorithms on large scale HPC system data
Visualization techniques for large scale HPC data (addressing size, timescales, presentation within a meaningful context)
Evaluation of correlative relationships between system state and application performance via use of monitored system data

Response to and utilization of processed data and system information

Mechanisms for feedback and response to applications and system software (e.g., informing schedulers, down-clocking CPUs)
HPC application design and implementation that take advantage of monitored system data (e.g., dynamic task placement or rank-to-core mapping)
System-level and job-level feedback and responses to monitored system data
Job scheduling and allocation based on monitored system information (e.g., contention for storage or network resources)
- Integration of system and facilities data for system and site operational decisions
Use of monitored system data for evaluation of future systems specifications and requirements
Use of monitored system data for validation of systems simulations

Experience reports and System operations

Design and implementation of monitoring tools as part of HPC operations
Experiences with monitoring and analysis methodologies and tools in HPC applications
- Note: this is not meant to include application performance analysis tools such as Open|SpeedShop or CrayPat
Experiences with monitoring and analysis tools for HPC systems specification/selection
Sub-optimal approaches taken because there currently isn’t another way (include associated gap analysis)
How not to do it, with explanations, benchmarks, or analysis of code to save the rest of us from trying it again

Important dates (AOE):

Abstract Due: ~~Jan 9, 2016~~ ~~JAN 16, 2016~~
Paper Due (contingent on abstract submission by Jan 16): ~~Jan 16, 2016~~ ~~JAN 23, 2016~~ CLOSED
Notification: ~~Feb 15, 2016~~
Camera Ready: Mar 7, 2016 Updated
Workshop: May 27, 2016

HPCMASPA 2016 will be held in conjunction with IEEE IPDPS (May 23-27, 2016 in Chicago, IL)

Format

HPCMASPA 2016 will be a full-day workshop consisting of talks from refereed papers, a panel discussion with representative researchers and practitioners, and a keynote address.

Talks from refereed papers

HPCMASPA 2016 welcomes submissions of original work that is neither previously published nor under review by another conference or journal. Proceedings of the workshop will be distributed at the conference and will be submitted for inclusion in the IEEE Xplore Digital Library.

Categories:

Full length technical papers (preferred) addressing completed research, best practice whitepapers, and other in-depth research, etc. 10 pages max, including references. 30 min presentation.
Short technical papers addressing timely topics such as work in progress, experience reports, and tool surveys, etc. 5 pages max, including references. 20 min presentation.

Guidelines:

Submissions must comply with the IEEE format for conference proceedings. LaTeX and Word templates can be found at the IEEE IPDPS 2016 website.
Submissions are web-based through EasyChair. PDFs only.
Submissions must be in English.
Submission implies the willingness of at least one of the authors to register and present the work associated with the submission.
Submissions will be evaluated on their technical soundness, significance, presentation, originality of work, and relevance and interest to the workshop scope.

30th IEEE International Parallel & Distributed Processing Symposium
