CFP and Submissions

HPCMASPA 2018 welcomes submissions of original work that has not been previously published and is not under review at another conference or journal. Papers in all categories will be peer-reviewed and published in proceedings arranged by IEEE Cluster.

Categories: (All dates AoE, Anywhere on Earth)

1. Technical Papers: (8 pages + additional reference-only pages)

Technical papers address completed research, best-practice whitepapers, and other in-depth research and experience. Presentations are approximately 20 minutes, plus 5 minutes for questions.

  • Submissions Open: Apr 15
  • Abstracts: Waived
  • Papers: Jun 11 - Extended from May 29 (Hard Deadline)
  • Notification: Jul 3
  • Camera Ready: Jul 31

2. Short and Work in Progress Papers: (4 pages + additional reference-only pages)

At least one session will be dedicated to Short/Work in Progress (WIP) papers to encourage interactive audience discussion. The presentation time limit will be finalized based on the number of WIP papers accepted.

  • Submissions Open: Jul 5 - Coming soon!
  • Papers: Jul 18 - Extended (Hard Deadline)
  • Notification: Jul 23
  • Camera Ready: Jul 31

Submissions:

  • Web-based submissions through the HPCMASPA 2018 EasyChair site.
  • PDFs only.
  • Submissions must comply with the format used by IEEE Cluster. LaTeX and Word templates can be found here.
  • NO additional pages can be purchased for this workshop.
  • Submissions must be in English.
  • Submission implies the willingness of at least one of the authors to register and present the work associated with the submission.
  • Submissions will be evaluated on their technical soundness, significance, presentation, originality of work, and relevance and interest to the workshop scope.

Topics:

Including, but not limited to:

Data collection, transport, and storage

  • Monitoring methodologies and results for all HPC system components and support infrastructure (e.g., compute, network, storage, power, facilities)
  • Design of systems and frameworks for HPC monitoring which address HPC requirements such as:
    • Extreme scalability
    • Run-time data collection and transport
    • Analysis on actionable timescales
    • Feedback on actionable timescales
    • Minimal application impact
  • Extraction and evaluation of resource utilization and state information from current and next generation components

Analysis of large-scale data and system information

  • Extraction of meaningful information from raw data, such as system and resource health, contention, or bottlenecks
  • Methodologies and applications of analysis algorithms on large-scale HPC system data
  • Visualization techniques for large-scale data (addressing size, timescales, presentation within a meaningful context)
  • Evaluation of correlative relationships between system state and application performance via use of monitored system data

Response to and utilization of processed data and system information

  • Mechanisms for feedback and response to applications and system software (e.g., informing schedulers, down-clocking CPUs)
  • HPC application design and implementation that take advantage of monitored system data (e.g., dynamic task placement or rank-to-core mapping)
  • System-level and job-level feedback and responses to monitored system data
  • Job scheduling and allocation based on monitored system information (e.g., contention for storage or network resources)
  • Integration of system and facilities data for system and site operational decisions
  • Use of monitored system data for evaluation of future systems specifications and requirements
  • Use of monitored system data for validation of systems simulations

Experience reports and System Operations

  • Design and implementation of monitoring tools as part of HPC operations
  • Experiences with monitoring and analysis methodologies and tools in HPC applications
    • Note: this does not include application performance analysis tools such as Open|SpeedShop or CrayPat
  • Experiences with monitoring and analysis tools for HPC systems specification/selection
  • Sub-optimal approaches taken because there currently isn’t another way (include associated gap analysis)
  • How not to do it, with explanations, benchmarks, or analysis of code to save the rest of us from trying it again