CoDaC

A CAV 2024 Workshop on Correct Data Compression, July 22, 2024, Montreal, Canada

Workshop Goals

Aims of CoDaC:

Computing is bottlenecked by the cost of moving data, which especially threatens the future scaling of HPC and ML. Moreover, many instruments produce so much data that most of it is never examined (think of rare events in space observations or medical images). Lossy compression of scientific-computing and ML datasets can reduce their sizes by a large factor (20-50x in many cases), and ML pipelines can often be trained directly on compressed data representations instead of the original data. But all of this raises important correctness questions: Are the introduced errors tolerable? Are they guaranteed to stay within the stated error bounds? Are artifacts introduced? There are also lower-level issues: floating-point representations may introduce their own errors, and future hardware will likely exhibit bit-flips more often, introducing yet another source of error. How can time-tested verification approaches such as deductive reasoning, symbolic execution, model checking, and static analysis help answer these questions? CoDaC is perhaps the first workshop that brings the data-compression and verification communities together to discuss solutions for these pressing problems.
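
To make the error-bound question concrete, here is a minimal sketch (in Python with NumPy) of one simple error-bounded lossy scheme, uniform scalar quantization with an absolute error bound, together with an empirical check of that bound. The function names and the bound value are illustrative assumptions, not the API of any real compressor such as SZ or ZFP.

import numpy as np

def quantize(data, abs_bound):
    # Quantization step of 2*abs_bound keeps the reconstruction error
    # within abs_bound in exact arithmetic; floating-point rounding can
    # still nibble at that guarantee, which is part of the problem.
    step = 2.0 * abs_bound
    return np.round(data / step).astype(np.int64)

def dequantize(codes, abs_bound):
    return codes.astype(np.float64) * (2.0 * abs_bound)

rng = np.random.default_rng(0)
original = rng.normal(size=1_000_000)   # stand-in for a scientific dataset
bound = 1e-3                            # illustrative absolute error bound

reconstructed = dequantize(quantize(original, bound), bound)
max_err = np.max(np.abs(original - reconstructed))
print(f"max abs error = {max_err:.3e}, bound = {bound:.3e}, "
      f"satisfied = {bool(max_err <= bound)}")

Such a test only gives evidence on sampled inputs; the verification question CoDaC asks is whether bounds like this can be guaranteed for all inputs and under floating-point rounding.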

As some of the CoDaC topics may be unfamiliar to the audience, we will provide background through a tutorial on lossy compression by two leading experts, a keynote on current challenges and future directions in the verification of data-compression algorithms, a capsule summary of established verification methods (covered in a talk), and (last but not least) a panel where you can ask compression and verification experts anything. The program also includes several opportunities for presentations. Please join us and help us brainstorm solutions.

Workshop Program (July 22nd, Eastern Daylight Time)

8:30 - 9:00 : Breakfast/Coffee (provided)

9:00 - 9:40 : Tutorial-1 on Data Compression - Peter Lindstrom (Lawrence Livermore National Laboratory)

9:40 - 10:20 : Tutorial-2 on Data Compression - Sheng Di (Argonne National Laboratory)

10:20 - 10:30 : Discussions, Q/A

10:30 - 11:00 : Coffee Break

11:00 - 11:45 : Keynote-1 : Allison Baker (NCAR)

"Lossy Compression and Climate Simulation Data: Reducing Data Volume While Preserving Information

11:45 - 12:15 : Invited Talk : Hari Subramoni and DK Panda (Ohio State University)

"Accelerating Deep Learning - GPU-Based On-the-Fly Compression"

12:15 - 2:15 : Lunch (on your own)

2:15 - 3:00 : Keynote-2 : Johannes Ballé (Google)

"Learned Data Compression"

3:00 - 3:30 : Invited Talk : Harvey Dam (University of Utah) and Vinu Joseph (NVIDIA)

"Correctness-Preserving Compression of Foundation Models"

3:30 - 4:00 : Coffee Break

4:00 - 4:20 : Contributed Talk : Alex Fallin and Martin Burtscher (Texas State University)

"Lessons Learned on the Path to Guaranteeing the Error Bound in Lossy Quantizers"

4:20 - 5:10 : Panel on Data Compression Verification Methods

5:10 - 5:30 : Open mic

5:30 - 6:00 : Summary Readout (scribed summary by organizers)