FTXS 2017 - Washington, D.C.

UPDATE: Deadline for submissions extended to April 8th.

The 7th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS) 2017

Workshop Keynote

John Daly - Laboratory for Physical Sciences

Faults and How to Live With Them

Workshop Agenda

Submission Essential Information

Submissions are expected in the following categories:

  • Regular papers presenting innovative ideas improving the state of the art in resilience, reliability, dependability, and/or fault-tolerance at the extreme scale.

  • Experience papers discussing the issues seen on existing extreme-scale systems, including analysis, evaluation, and interpretation.

Authors are invited to submit papers describing original, unpublished work, with a maximum of eight (8) pages for regular papers and six (6) pages for experience papers. Please follow the US Letter guidelines for ACM Proceedings Style.

Papers will be peer-reviewed, and accepted papers will be published in the workshop proceedings as part of the ACM Digital Library. Submission implies the willingness of at least one of the authors to register and present the paper.

Submit a paper.

Important Dates

Submission of papers: April 8th, 2017 (extended from March 16th, 2017)

Author notification: April 20th, 2017

Camera-ready papers: May 5th, 2017 (hard deadline set by HPDC)

Workshop: June 26th, 2017

Workshop Topics

Topics include, but are not limited to:

    • Failure data analysis and field studies

    • Power, performance, resilience (PPR) assessments / tradeoffs

    • Novel fault-tolerance techniques and implementations

    • Emerging hardware and software technology for resilience

    • Silent data corruption (SDC) detection / correction techniques

    • Advances in reliability monitoring, analysis, and control of highly complex systems

    • Failure prediction, error preemption, and recovery techniques

    • Fault-tolerant programming models

    • Models for software and hardware reliability

    • Metrics and standards for measuring, improving, and enforcing effective fault-tolerance

    • Scalable Byzantine fault-tolerance and security from single-fault and fail-silent violations

    • Atmospheric evaluations relevant to HPC systems (terrestrial neutrons, temperature, voltage, etc.)

    • Near-threshold-voltage implications and evaluations for reliability

    • Benchmarks and experimental environments including fault injection

    • Frameworks and APIs for fault-tolerance and fault management

Workshop Chair

Nathan DeBardeleben - Los Alamos National Laboratory

Workshop Organizing Committee

Keita Teranishi – Sandia National Laboratories

John Daly – Laboratory for Physical Sciences

Program Committee

Emmanuel Agullo – INRIA Bordeaux

Rizwan Ashraf – Oak Ridge National Laboratory

Leonardo Bautista Gomez – Barcelona Supercomputing Center

Aurélien Bouteiller – University of Tennessee Knoxville

Robert Clay – Sandia National Laboratories

James Elliott – Sandia National Laboratories

Christian Engelmann – Oak Ridge National Laboratory

Kurt Ferreira – Sandia National Laboratories

Marc Gamell – Rutgers University

Qiang Guan – Los Alamos National Laboratory

Sudhanva Gurumurthi – AMD

Saurabh Hukerikar – Oak Ridge National Laboratory

Hideyuki Jitsumoto – Tokyo Institute of Technology

Zhiling Lan – Illinois Institute of Technology

Scott Levy – Sandia National Laboratories

Naoya Maruyama – RIKEN AICS

Bogdan Nicolae – Huawei Research Germany

Yves Robert – ENS Lyon & University of Tennessee Knoxville

Vilas Sridharan – AMD

Peter Strazdins – Australian National University

Abhinav Vishnu – Pacific Northwest National Laboratory

Panruo Wu – University of California at Riverside

Questions?

Please address FTXS workshop questions to Nathan DeBardeleben, Los Alamos National Laboratory (ndebard@lanl.gov).