Data and software management & re-use with FAIR in the age of Big Data

A half-day tutorial, part of the HICSS 59 SWT program

Hyatt Regency (Maui), January 6-9, 2026

This tutorial was first offered at HICSS 55 in January 2022 and was offered again in 2023, 2024, and 2025 with great success. In 2026 we will emphasize data and software management practices with FAIR principles for AI.

https://hicss.hawaii.edu/

For those interested in submitting a paper, see the related mini-track under Software Technology:

https://sites.google.com/view/cfp-hicss-trustworthy-ai/home

Which researcher has not wished, when a key lab member graduates, that they could avoid spending precious time and resources training new people on seemingly mundane data and tool issues? Good data and software management ensures preservation of knowledge and smooth continuation despite staff and student turnover. Research data management plans require detailed specifications of how data produced in a proposed effort will be made available. Journal publishers increasingly require researchers to make their data and code available during peer review, either as supplemental materials or through a trusted digital repository. Some publishers now offer new journals focused exclusively on data. The most prominent example is “Scientific Data”, from the Nature publishing group.

Good stewardship practices are required to ensure that data is properly described for re-use, stored in media and formats accessible to future users, and simply not lost in the lab. In the last ten years, there has been extensive research in the area of research data and software management. This includes the development of community practices, tools, and platforms to facilitate stewardship at every step of the data and software life cycle. As science becomes more collaborative, the Open Source movement has fueled growth in the development of collaboration tools that are fundamental to Research Data Management. In addition, with the advent of Big Data and the increased use of Artificial Intelligence methods in science, numerous datasets from independent sources are integrated to reveal systematic relationships between groups of variables, and analysis methods may be non-deterministic. Discoveries resulting from this use require more transparency, a different kind of scrutiny, and more data and software sharing than those resulting from smaller, less heterogeneous datasets where analysis methods are better understood. State-of-the-art data and software stewardship research emphasizes the goal of establishing reproducibility of experiments and results. Research data management provides a systematic approach to reproducibility and thus inspires trust in data and methods.

Benefits

Establishing good data and software management practices provides researchers from all disciplines with the following benefits:

Data, software, and knowledge preservation throughout the life cycle of a project and beyond
Improved training and communication through tools and protocols
Systematic training practices that can be tailored to individual labs or projects
Improved collaboration within teams through well-established data and software development practices
Ease of re-use for future users and students, lab members, external collaborators, and stakeholders
Compliance with funding agency requirements
Interoperability with publishing platforms and fulfilment of publishers’ requirements
Data and software re-use embedded in current and future practices

In this tutorial, HICSS participants will learn state-of-the-art data and software management practices to support preservation and reproducibility in 2026. The content of this tutorial is broadly applicable to the academic research disciplines represented at HICSS, as data and software have become an integral part of system sciences. The tutorial will be organized around three mini-sessions, each with a short demo or presentation of tools, with the bulk of the time devoted to hands-on participatory activities. Tools will be grouped around three themes: openness and FAIR evaluation; using DMP planning tools to describe how you’ll manage data and software in your lab and funded projects; and furthering reproducible research through the ACM/SC initiative, the NeurIPS checklist, and IEEE/CodeOcean. The tutorial will conclude with a panel of open questions, including suggestions for future areas of emphasis for this tutorial.