Scientific Workflows and Dynamic Applications

Scientific discovery in the era of big data requires scientists to apply a complex sequence of computations, transformations, and reductions to a formidable volume of data. The Notre Dame Scientific Workflows and Dynamic Applications team develops, integrates, deploys, and operates cyberinfrastructure that enables data-intensive scientific exploration leveraging heterogeneous computational resources at scale. From data-processing pipelines to dynamic, adaptive, and portable computational frameworks, the Scientific Workflows and Dynamic Applications team is interested in any approach that enables and accelerates data-intensive science.

Meet the Team

Developers and Engineers

Kenyi Hurtado

Dr. Kenyi Hurtado is an HPC Engineer at the Notre Dame Center for Research Computing and a member of the CMS collaboration. He works on submission infrastructure operations and on enhancing and maintaining core workflow management components in CMS.

Cody Kankel

Cody Kankel is an HPC Administrator with the Center for Research Computing at the University of Notre Dame. He manages the software stack and batch scheduler for the CRC's computing infrastructure and develops HPC submission techniques for the SCAILFIN project.

Benjamin Tovar

Dr. Benjamin Tovar is a research software engineer at the Cooperative Computing Lab at the University of Notre Dame. In his current role, he is the lead maintainer of CCTools, a suite of tools that enables scientists to quickly harness distributed, high-throughput computing.

Faculty

Paul Brenner

Professor Brenner's research centers on the advancement of computational infrastructure for scientific discovery. Three current areas of work are Secure Cloud and Distributed Systems for Science, Architecture-Aware Scientific Algorithms, and Computational Social Science. He directs operations for the Notre Dame CRC.

Mike Hildreth

A professor in the Physics Department and Associate Dean for Research and Graduate Studies in the College of Science, Mike Hildreth is a particle physicist and a member of the CMS Experiment at the CERN LHC. He is currently Offline Software Coordinator for the CMS Experiment. He has led various efforts in reproducibility and distributed computing and is currently working on combining these to accelerate machine learning.

Kevin Lannon

A professor in the Physics Department, Kevin is an experimental particle physicist and member of the CMS collaboration. He is co-leader of the U.S. CMS University Computing Facilities group. He is particularly interested in leveraging portability and reproducibility techniques to run complex and data-intensive HEP computational workflows on heterogeneous resources.

Douglas Thain

Douglas Thain is Professor and Associate Chair in the Computer Science and Engineering Department, and directs the Cooperative Computing Lab, which develops and studies workflow systems and other tools for large scale distributed computing on clusters, clouds, and grids.

Projects

CMS Workflow Management

The CMS experiment generates several petabytes (PB) of data per year, and several more PB of simulated data are produced to facilitate physics analysis. The CMS workflow management system is responsible for translating the data processing plan expressed by the physicists into computational jobs running on the available worldwide distributed computational resources.

Makeflow

Makeflow is a portable workflow manager in which the tasks to be executed are described in terms of their input and output files in a technology-neutral way. Makeflow automatically figures out which tasks in the workflow depend on each other and which tasks can be executed in parallel, and executes them appropriately on a variety of different systems, including HTCondor, Slurm, Amazon EC2, and the bundled Work Queue system.
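
As a rough sketch, each rule in a Makeflow file names its output files, its input files, and the command that produces the outputs; from a hypothetical two-step pipeline like the one below (the script and data file names are placeholders), Makeflow can infer that the second task must wait for the first:

    # Hypothetical pipeline: simulate, then summarize the result.
    output.dat: simulate.py input.dat
        python simulate.py input.dat > output.dat

    summary.txt: analyze.py output.dat
        python analyze.py output.dat > summary.txt

The same file can then be dispatched to different systems by choosing a batch type at run time, for example makeflow -T slurm pipeline.mf, assuming Makeflow from CCTools is installed and the rules above are saved as pipeline.mf.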

Work Queue

Work Queue is a framework for building master-worker applications that span thousands of machines drawn from clusters, clouds, and grids. Work Queue applications are written in C, Python, or Perl using a simple API that allows users to define tasks, submit them to the queue, and wait for completion. Tasks are executed by a standard worker process that can run on any available machine. Each worker calls home to the master process, arranges for data transfer, and executes the tasks. The system handles a wide variety of failures, allowing for dynamically scalable and robust applications.
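
A minimal sketch of a Work Queue master in Python, assuming the work_queue bindings that ship with CCTools are installed (the port, command, and file names below are placeholders):

    import work_queue as wq

    # Create a master that listens for workers on the given port.
    q = wq.WorkQueue(port=9123)

    # Describe a task by its command line and its input and output files.
    t = wq.Task("python simulate.py input.dat > output.dat")
    t.specify_input_file("simulate.py")
    t.specify_input_file("input.dat")
    t.specify_output_file("output.dat")

    # Submit the task and wait for a worker to finish it.
    q.submit(t)
    while not q.empty():
        t = q.wait(5)
        if t:
            print("task %d exited with status %d" % (t.id, t.return_status))

Workers are started separately on whatever machines are available, for example with the work_queue_worker command pointed at the master's host and port, and the system matches them to queued tasks.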

Lobster

CMS lacks tools for users to run their jobs at sites that do not conform to WLCG standards. Lobster aims to alleviate this problem by providing a workflow management tool for running CMSSW-based jobs that imposes no specific requirements on the cluster and needs no administrative privileges.

SCAILFIN

The Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) Project deploys artificial intelligence and likelihood-free inference techniques and software using scalable cyberinfrastructure (CI) that is developed to be integrated into existing CI elements, such as the REANA system. The analysis of LHC data is the project’s primary science driver, yet the technology is sufficiently generic to be widely applicable to other data-intensive domains.

VC3

VC3 automates deployment of cluster frameworks to access diverse computing resources for collaborative science teams.