GASP 2020 - Kiegan Rice

Kiegan Rice

NORC at the University of Chicago

Dr. Heike Hofmann, Iowa State University

Title: Adaptive Computational Reproducibility in a Shifting R Package Landscape

Abstract:

Modern data analysis processes can be conceptualized as modularized, linear pipelines of sequential decisions. In a complete data pipeline, a series of decisions is made about how to collect data, how to clean and transform data, and how to train models and quantify model success. These processes rely on the software tools used and the actions applied to data throughout the analysis pipeline. R, one of the most widely used statistical software frameworks for data analysis, relies on user-developed “packages” for many data science and data analysis tasks. These packages are subject to change over time, which can undermine computational reproducibility efforts and frustrate users who are left to identify problem areas in broken data analysis code.

Most currently available tools for managing computational reproducibility in R focus on capturing static versions of code and packages, and so provide no assistance when users need to incorporate package updates and additional functionality. We propose an adaptive approach to computational reproducibility in R that focuses on identifying changes in packages over time and differences in package versions across users and machines. In this presentation, I will describe the current state of computational reproducibility tools for R and present our proposed framework for managing computational reproducibility in R. Using an R package case study, I will also demonstrate the software tools we are developing to assist users as part of the “manager” package.
