Introduction

Some of the most challenging computational problems faced by scientists result from the need to combine multiple applications in novel ways across distributed computing resources. Examples include weather forecasting, earthquake damage estimation, drug discovery, and simulating dark energy. To enable such applications, XSEDE provides a set of scientific workflow resources:
  • A host from which to submit and manage workflows - see below
  • A vetted set of workflow systems, available on the submit host along with example workflows
  • ECSS - allocatable support for workflow problems
  • Remote job and data management services on the resources
The video to the right provides an overview of scientific workflows on XSEDE.


Workflow Submit Host


We are currently evaluating an interactive submit host as a solution. The host will have some workflow systems pre-installed, along with example workflows that can be used to try the workflow systems out and as a starting point for plugging your own codes into a workflow.

The system is available to XSEDE users with a current allocation. Use your XSEDE portal username and password and ssh to:

workflow.iu.xsede.org

If you have problems or questions regarding this host, please contact the ECSS workflows team at ecss-workflows@xsede.org.


Available workflow systems

Swift


Swift is a simple language for writing parallel scripts that run many copies of ordinary programs concurrently as soon as their inputs are available, reducing the need for complex parallel programming. The same script runs on multi-core computers, clusters, grids, clouds, and supercomputers, and is thus a useful tool for moving your computations from a laptop or workstation to any XSEDE resource. Swift can run a million programs, thousands at a time, launching hundreds per second. This hands-on tutorial will give participants a taste of running simple parallel scripts on XSEDE systems and provide pointers for applying Swift to your own scientific work.


Makeflow


Makeflow is a workflow engine for executing large, complex workflows of up to thousands of tasks and hundreds of gigabytes of data. In this section of the tutorial, users will learn the basics of writing a Makeflow, which is based on the traditional Make construct. In the hands-on example, users will learn to write Makeflow rules, run a Makeflow locally, and run its tasks on XSEDE resources. Users will be introduced to Work Queue, a scalable master/worker framework, and will create workers on XSEDE resources and connect them to the Makeflow. Users will also learn to use Work Queue to monitor workflows, along with the basics of debugging Makeflows.
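
To give a feel for the master/worker model, here is a minimal sketch that uses the Work Queue Python bindings (the work_queue module distributed with CCTools) to submit a small bag of independent tasks. The program simulate.sh, the file names, and the port number are illustrative placeholders, and method names can differ slightly between CCTools releases.

    import work_queue as wq

    # Start a Work Queue master; the port number is illustrative.
    q = wq.WorkQueue(port=9123)

    # Describe independent tasks: the command line plus the files each
    # task reads and writes. simulate.sh is a hypothetical program.
    for i in range(10):
        t = wq.Task("./simulate.sh input.%d.dat output.%d.dat" % (i, i))
        t.specify_input_file("simulate.sh")
        t.specify_input_file("input.%d.dat" % i)
        t.specify_output_file("output.%d.dat" % i)
        q.submit(t)

    # Workers started on XSEDE resources (work_queue_worker <master-host> 9123)
    # connect to this master and execute the tasks.
    while not q.empty():
        t = q.wait(5)
        if t:
            print("task %d finished with status %d" % (t.id, t.return_status))

Workers can be started interactively or through batch jobs on XSEDE resources, and Makeflow can dispatch its rules through the same Work Queue master.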

Pegasus


Pegasus WMS is a configurable system for mapping and executing abstract application workflows over a wide range of execution environments, including a laptop, a campus cluster, a Grid, or a commercial or academic cloud. Pegasus is a directed acyclic graph (DAG) based workflow system built on top of HTCondor DAGMan. On XSEDE, the workflow system uses GRAM5 for remote job management and can handle serial, multi-threaded, and MPI jobs. The tutorial covers running workflows across multiple XSEDE sites, running MPI jobs, and running sets of serial tasks using the pegasus-mpi-cluster tool.
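
As a small illustration, the sketch below builds a two-job abstract workflow with the Pegasus DAX3 Python API and writes it out as a DAX file. The transformation and file names are hypothetical, and the resulting DAX still has to be planned and submitted with pegasus-plan against your site and transformation catalogs.

    from Pegasus.DAX3 import ADAG, Job, File, Link

    # Abstract workflow (a "DAX") with two dependent jobs.
    dax = ADAG("example")

    raw     = File("data.raw")
    cleaned = File("data.clean")

    preprocess = Job(name="preprocess")
    preprocess.addArguments("-i", raw, "-o", cleaned)
    preprocess.uses(raw, link=Link.INPUT)
    preprocess.uses(cleaned, link=Link.OUTPUT)
    dax.addJob(preprocess)

    analyze = Job(name="analyze")
    analyze.addArguments("-i", cleaned)
    analyze.uses(cleaned, link=Link.INPUT)
    dax.addJob(analyze)

    # The DAG edge: analyze runs only after preprocess has finished.
    dax.depends(parent=preprocess, child=analyze)

    with open("example.dax", "w") as f:
        dax.writeXML(f)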


RADICAL-Cybertools


RADICAL-Pilot allows a user to run large numbers of tasks (e.g. simulations, data analysis) concurrently on a multitude of distributed computing resources. RADICAL-Pilot is a Pilot-Job system developed in Python; its API gives the user the freedom to select which resources to allocate and how to distribute the tasks over them. Because RADICAL-Pilot provides an overlay on top of the allocated resources, the user does not need to worry about the heterogeneity of the underlying infrastructure and middleware. Once the overlay has been established, the user's individual tasks do not need to go through the queuing systems of the respective resources. Additionally, the user can specify input and output data for the tasks, which the system handles transparently.

In this tutorial we demonstrate, and let the user experiment with, writing simple Python applications that use RADICAL-Pilot to execute tasks on distributed computing resources following a selection of patterns (e.g. Bag-of-Tasks, chained tasks, coupled tasks).
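
As a taste of what such an application looks like, here is a minimal Bag-of-Tasks sketch against the RADICAL-Pilot API. The resource label, core count, and runtime are placeholders for whatever your allocation provides, and class names (e.g. ComputePilotDescription versus the TaskDescription used in newer releases) vary between RADICAL-Pilot versions.

    import radical.pilot as rp

    session = rp.Session()
    try:
        pmgr = rp.PilotManager(session=session)
        umgr = rp.UnitManager(session=session)

        # Request one pilot on an XSEDE resource; the label is hypothetical.
        pdesc = rp.ComputePilotDescription()
        pdesc.resource = "xsede.stampede2"
        pdesc.cores    = 32
        pdesc.runtime  = 30      # minutes

        pilot = pmgr.submit_pilots(pdesc)
        umgr.add_pilots(pilot)

        # A Bag-of-Tasks: 128 independent, single-core tasks that run
        # inside the pilot without re-entering the batch queue.
        cuds = []
        for i in range(128):
            cud = rp.ComputeUnitDescription()
            cud.executable = "/bin/echo"
            cud.arguments  = ["task", str(i)]
            cuds.append(cud)

        umgr.submit_units(cuds)
        umgr.wait_units()
    finally:
        session.close()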