Practically FAIR 2023 Workshop

During ICPE 2023 in Portugal

15-16 April 2023

Coimbra, Portugal (in-person attendance only)

The FAIR Data Principles have emerged as a convenient way to describe the practice of making data sufficiently available to the public, and in formats usable enough, that others can either confirm research results or pursue new investigations. While these principles offer good guidance for data, they are silent on how to achieve these goals in systems that use data. Further, not all data categories are covered; in particular, HPC data sets and non-data digital assets in general are omitted.

The FAIR Data Principles seek to address the Findability, Accessibility, Interoperability, and Reuse of digital assets. While this seems straightforward on the surface, delving a little deeper reveals challenges. For example, is the metadata provided meaningful for anyone beyond the original data providers? How do we tell if a particular copy of the data is authentic or if it has been tampered with? Is the data format readable by standard tools, and are the chunk sizes reasonable for science yet workable for tools on a given platform? How do we describe data in a new domain without agreed-upon metadata? For data sets derived from massive sources too large to access generally, how do we describe them such that others could generate the same subsets?

These kinds of questions offer insights into the practical coding and software engineering aspects of adopting FAIR. This workshop seeks new contributions that work to answer these and similar questions, as well as experience reports about adopting FAIR within an application that help others see potential hurdles and how they were overcome.

The application domains targeted by this workshop run the gamut from small, embedded systems all the way up to the largest scale-out and scale-up data and workflows. Each application has lessons that may help other applications at wildly different scales.

Agenda

9:00-9:05 Welcome and Introduction

9:05-10:05 Keynote: "FAIR enabling re-use of data-intensive workflows and scientific reproducibility", Dr. Line Pouchard, Brookhaven National Laboratory, US Department of Energy

10:05-10:30 Fadoua Rafii, Horacio Gonzalez-Velez and Adriana E. Chis, "Automatic FAIR Provenance Collection and Visualization for Time Series"

10:30-11:00 Break

11:00-11:30 Invited talk: Daniel Pedro de Jesus Faria, "FAIR data and the biological sciences"

11:30-12:30 Open discussion

Topics of Interest:

Position and experience papers on scientific applications and platforms that address related topics (particularly the topics listed below)
Big Data or AI workflow systems like Spark, Hadoop, and TensorFlow in conjunction with data management and reproducibility efforts and techniques
Domain-specific metadata hierarchy development efforts
Data formatting and chunking issues and related support libraries
Approaches for testing FAIR compliance automatically
Approaches for dealing with the derived data space for FAIR
Data authentication approaches with low barriers to support while still offering strong guarantees
Programming framework support to better address FAIR principles
Testing of claimed FAIR data compliance to validate, for a third-party investigator, what the compliance level actually is
Cross-language and cross-platform data portability issues that would affect FAIR compliance
Provenance tracking for FAIR-marked artifacts within a workflow
Working with non-data FAIR digital assets using any technique or addressing any problem like those described above, or domain-specific examples
And any related topics.

Submissions accepted in EasyChair:

https://easychair.org/conferences/?conf=pfair23

Papers should be formatted in ACM format following ICPE formatting rules and can be up to 5 pages, not including references.

Important Dates:

Submission Deadline (firm): Jan 27th 2023
Responses to Authors: Feb 10th 2023
Camera Ready due: Feb 20th 2023
Workshop: 15 or 16 April 2023

Proposed Program Committee:

Dmitry Duplyakin (Utah)
Rosa Filgueira (University of St. Andrews)
Balazs Gerofi (Intel)
Bingsheng He (Singapore)
Shadi Ibrahim (INRIA)
Brian Kocoloski (ISI)
Jakob Luettgau (UTK)
Tom Peterka (ANL)
Reed Milewicz (SNL)
Rocio Carratala Saez (UJI)
Karly Harrod (ORNL)
Hariharan Devarajan (LLNL)
Matthew Wolf (ORNL)

Organizing Committee:

Jay Lofstead (Sandia) gflofst@sandia.gov
Paula Olaya (UTK)