FTXS 2017 - Washington, D.C.
UPDATE: Deadline for submissions extended to April 8th.
The 7th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS) 2017
CALL FOR PAPERS
Workshop Keynote
John Daly - Laboratory for Physical Sciences
Faults and How to Live With Them
Workshop Agenda
Submission Essential Information
Submissions are expected in the following categories:
Regular papers presenting innovative ideas improving the state of the art in resilience, reliability, dependability, and/or fault-tolerance at the extreme scale.
Experience papers discussing the issues seen on existing extreme-scale systems, including analysis, evaluation, and interpretation.
Authors are invited to submit unpublished, original work of at most eight (8) pages for regular papers and six (6) pages for experience papers. Please follow the US Letter guidelines for ACM Proceedings Style.
Papers will be peer-reviewed, and accepted papers will be published in the workshop proceedings as part of the ACM Digital Library. Submission implies the willingness of at least one of the authors to register and present the paper.
Important Dates
Submission of papers: March 16th, 2017 (extended to April 8th, 2017)
Author notification: April 20th, 2017
Camera-ready papers: May 5th, 2017 (HARD DEADLINE BY HPDC!)
Workshop: June 26, 2017
Workshop Topics
Topics include, but are not limited to:
Failure data analysis and field studies
Power, performance, resilience (PPR) assessments / tradeoffs
Novel fault-tolerance techniques and implementations
Emerging hardware and software technology for resilience
Silent data corruption (SDC) detection / correction techniques
Advances in reliability monitoring, analysis, and control of highly complex systems
Failure prediction, error preemption, and recovery techniques
Fault-tolerant programming models
Models for software and hardware reliability
Metrics and standards for measuring, improving, and enforcing effective fault-tolerance
Scalable Byzantine fault-tolerance and security from single-fault and fail-silent violations
Atmospheric evaluations relevant to HPC systems (terrestrial neutrons, temperature, voltage, etc.)
Near-threshold-voltage implications and evaluations for reliability
Benchmarks and experimental environments including fault injection
Frameworks and APIs for fault-tolerance and fault management
Workshop Chair
Nathan DeBardeleben - Los Alamos National Laboratory
Workshop Organizing Committee
Keita Teranishi – Sandia National Laboratories
John Daly – Laboratory for Physical Sciences
Program Committee
Emmanuel Agullo – INRIA Bordeaux
Rizwan Ashraf – Oak Ridge National Laboratory
Leonardo Bautista Gomez – Barcelona Supercomputing Center
Aurélien Bouteiller – University of Tennessee Knoxville
Robert Clay – Sandia National Laboratories
James Elliott – Sandia National Laboratories
Christian Engelmann – Oak Ridge National Laboratory
Kurt Ferreira – Sandia National Laboratories
Marc Gamell – Rutgers University
Qiang Guan – Los Alamos National Laboratory
Sudhanva Gurumurthi – AMD
Saurabh Hukerikar – Oak Ridge National Laboratory
Hideyuki Jitsumoto – Tokyo Institute of Technology
Zhiling Lan – Illinois Institute of Technology
Scott Levy – Sandia National Laboratories
Naoya Maruyama – RIKEN AICS
Bogdan Nicolae – Huawei Research Germany
Yves Robert – ENS Lyon & Univ. Tenn. Knoxville
Vilas Sridharan – AMD
Peter Strazdins – Australian National University
Abhinav Vishnu – Pacific Northwest National Lab.
Panruo Wu – University of California at Riverside
Questions?
Please address FTXS workshop questions to Nathan DeBardeleben, Los Alamos National Laboratory (ndebard@lanl.gov).