FTXS 2026

Workshop Overview

As the redefinition of the FTXS initialism indicates, we have changed the focus of the workshop. FTXS 2026 builds on the workshop's long history and continues to explore fault tolerance on large-scale systems. However, the focus is now on understanding and mitigating faults that occur in AI systems (construed broadly). This website is currently a placeholder, but watch this space for more details in the very near future.

News & Announcements

[May 7, 2026] Submission dates added.
[April 9, 2026] Our proposal was accepted! FTXS 2026 will be held in conjunction with SC26 in Chicago, IL. 🎉🎉🎉

Important Dates

Paper submission closes: August 6, 2026
Author notification: September 4, 2026
Camera-ready papers: September 25, 2026
Workshop: Monday, November 16, 2026 (1:30pm - 5:00pm CST)

All deadlines are Anywhere on Earth (AoE); the workshop start and end times are in Central Standard Time (CST).

Workshop Details

WHEN : Monday, November 16, 2026 (1:30pm - 5:00pm CST)
WHERE : Chicago, IL, USA
VENUE : McCormick Place
REGISTRATION : Register to attend SC26 HERE (registration opens July 8, 2026)
SUBMISSION : Papers should be submitted at: https://submissions.supercomputing.org/
UPDATES : Follow us on Twitter ( @ftxsworkshop ) for the latest news and updates on the workshop
QUESTIONS : TBD

Workshop Program

TBD

Workshop Scope

As AI systems and large language models (LLMs) scale to unprecedented levels of complexity and deployment, addressing their faults, trustworthiness, and explainability becomes critical to ensure reliable and responsible operation in real-world environments.

FAULTS:

Modern AI pipelines, including training, inference, retrieval-augmented generation (RAG), and agentic systems, are vulnerable to a variety of faults and subtle failures. These include distribution drift, prompt and tool variability, infrastructure noise, and limited-precision effects. Such faults can lead to cascading issues like hallucinations, instability, miscalibration, and silent regressions that waste computational resources, degrade system performance, and may mislead users.

TRUSTWORTHINESS:

Building trust in AI systems at scale requires robust mechanisms to detect, diagnose, and defend against faults and failures. Trustworthiness encompasses system reliability, reproducibility, and resilience under real-world variability. It also involves establishing rigorous measurement, benchmarking, and validation practices that provide confidence in AI-assisted decisions and operational outcomes.

EXPLAINABILITY:

Explainability plays a vital role as a diagnostic and attribution tool across the AI pipeline. By providing insights into the behavior of models, retrieval components, tools, infrastructure, and precision layers, explainability helps identify the root causes of faults and supports transparent evaluation. It enables stakeholders to understand, interpret, and trust AI system outputs, especially when addressing complex failure modes at scale.

We invite original research papers, case studies, and position papers on topics including, but not limited to:

- Faults, failures, and variability in AI/LLM pipelines at scale

- Detection and diagnosis of hallucinations, regressions, and silent errors

- Explainability and interpretability methods for fault attribution

- Measurement, benchmarking, and evaluation protocols for AI system reliability

- Techniques for distribution drift detection and mitigation

- Robustness and fault tolerance in training and inference systems

- Telemetry, monitoring, and observability for large-scale AI deployments

- Impact of infrastructure noise and limited-precision arithmetic on AI outputs

- Reproducibility and validation frameworks for AI systems at scale

- Case studies on operational AI system failures and recovery strategies

- Cross-disciplinary approaches combining HPC, AI, and systems reliability

- Tools and checklists to improve AI system trustworthiness in production

- Security and adversarial fault injection in AI pipelines

Submission Details

Submissions are solicited in the following categories:

- Regular papers presenting innovative ideas improving the state of the art or discussing the issues seen on existing extreme-scale systems, including some form of analysis and evaluation. Regular papers must be at least six (6) pages and should not exceed eleven (11) pages including all text, appendices, figures, and references. Accepted regular papers that meet these requirements will be published.

- Extended abstracts presenting preliminary results, proposing disruptive ideas, or challenging assumptions in the field. The inclusion of some form of preliminary results is encouraged. Extended abstracts should not exceed four (4) pages, including all text, figures, and references. Extended abstracts will be evaluated separately and given shorter oral presentations. Given minimum publication requirements imposed by SC26, extended abstracts WILL NOT be published.

Papers must be submitted at https://submissions.supercomputing.org/ and must conform to the formatting requirements established by IEEE at: https://www.ieee.org/conferences/publishing/templates.html. LaTeX and MS Word templates are also available at this link.

At least one author of every accepted paper is expected to attend the conference in person and present the paper. Remote presentations are not permitted.

Reproducibility

Reproducibility is an important component of large-scale system research. However, the goal of our workshop is to encourage and facilitate discussion of novel approaches and preliminary results. As a result, it may not always be feasible to release reproducibility artifacts. Moreover, to the greatest extent possible, we want to minimize unnecessary obstacles to socializing new ideas. Therefore, while we strongly encourage authors to make their work as public and reproducible as possible, we do not explicitly require it.

Publication

Subject to publisher constraints, the workshop will publish all accepted regular papers. As noted under Submission Details, extended abstracts will not be published.

Workshop Chairs

TBD

Workshop Organizing Committee

TBD

Program Committee

TBD
