The 2nd International Workshop on
Scalable Workflow Enactment Engines and Technologies (SWEET'13)

held in conjunction with the 2013 SIGMOD conference -- 
June 23, 2013 (morning only) -- New York City, NY, USA

NEW: There will be an open call for a special issue of Future Generation Computer Systems (Elsevier), to which extended versions of workshop papers can be submitted.


Logistics: please see the main SIGMOD/PODS pages

Workshop program

Session I
8:55am - Opening remarks

9:00 Keynote: Realizing the Potential of the Cloud for Workflow: Scalability, Security and Reproducibility [slides] [video: Accelerometers] [video2: e-Science Central]
        Prof. Paul Watson, Newcastle University, UK.
        Abstract:
Workflow has become an important method for building systems, as it provides a high-level way to combine software components to create applications. With the advent of cloud computing, it was inevitable that there would be interest in cloud-based workflow systems. This creates a set of opportunities and challenges that are the subject of this talk. In it, we discuss how clouds have the potential to overcome some of the limitations of previous, service-based approaches to workflows. We believe that the greatest potential comes in three areas: Scalability, Security and Reproducibility. However, we argue that realizing this potential will not be achieved just by porting existing workflow systems to the cloud; instead, work is needed to fundamentally redesign workflow systems in order to fully exploit cloud computing. Throughout the talk, we will use examples from our work on the e-Science Central cloud platform to illustrate how that potential can be realised. e-Science Central is an open-source workflow platform that is supporting a wide range of applications in the cloud, including those that run for weeks on hundreds of nodes.

9:45 Research papers:
  • Marc Nicolas Bux and Ulf Leser. DynamicCloudSim: Simulating Heterogeneity in Computational Clouds [slides]
  • Panayiotis Neophytou, Panos Chrysanthis and Alexandros Labrinidis. A Continuous Workflow Scheduling Framework [slides]
10:30 - 10:45am Break

Session II
10:45 Research papers:
  • Nenad Stojnic and Heiko Schuldt. OSIRIS-SR – A Scalable yet Reliable Distributed Workflow Execution Engine [slides]
  • Marta Mattoso, Jonas Dias, Daniel de Oliveira, Kary Ocaña, Eduardo Ogasawara, Flavio Costa, Felipe Horta, Vítor Sousa and Igor Araújo. User-Steering on HPC Workflows: State of the Art and Future Directions [slides]
11:30 Invited talk: The Google Cloud Platform. [slides]
         Jelena Pjesivac-Grbovic, Google, Inc.
Abstract:
Google’s mission is to organize the world’s information and make it universally accessible and useful. This goal requires an infrastructure that is high-performance, scalable, secure, geographically distributed and efficient. Today, Google's infrastructure indexes billions of web pages, serves 3 billion hours of video per month, provides 350 million Gmail users with 10GB of free storage, and much more. Google Cloud Platform allows external users to build their applications and run their computations on top of the very same infrastructure that powers Google's own applications and tools. In this talk I will discuss the different products offered through Google Cloud Platform, with a focus on Google Compute Engine and Google Cloud Storage.

12:15pm Closing


Invited speakers' bio:

Paul Watson is Professor of Computer Science and Director of the Digital Institute at Newcastle University, UK. There he leads a range of projects that design and exploit cloud computing solutions, including the $20M RCUK "Social Inclusion through the Digital Economy" research hub. His current research is focussed on the high-level "e-Science Central" cloud platform. Professor Watson joined Newcastle University from industry (ICL High Performance Systems), where he designed scalable database systems. He is a Chartered Engineer and a Fellow of the British Computer Society.

Jelena Pjesivac-Grbovic is a senior software engineer in Systems Infrastructure at Google, focusing on distributed data processing frameworks. She received her PhD from the University of Tennessee, Knoxville, under the supervision of Dr. Jack J. Dongarra. Prior to joining Google, she contributed actively to the implementation and optimization of MPI collective operations in the OpenMPI and FT-MPI projects.


Workshop Motivation

One of the goals of computer system engineering has always been to develop systems that are easy to use and understand, yet put great computational power at the fingertips of end users. The cloud computing model has the potential to make this a realistic goal in the area of business and scientific data processing, by enabling simple access to large pools of data storage and computational resources. More specifically, we observe that cloud computing is facilitating the convergence of workflow-based processing with traditional data management, thereby providing users with the best of both worlds. Workflows are used extensively, both for business applications and in computational science. Common to the broad range of workflow systems currently in use are their relatively simple programming models, which are usually exposed through a visual programming style and backed by a well-defined model of computation. While the flexibility of workflows for rapid prototyping of science pipelines makes them appealing to computational scientists, recent applications of workflow technology to data-intensive science show the need for a robust underlying data management infrastructure. At the same time, on the data management side of science and business, workflow-like models and languages are beginning to emerge that make it possible for users who are close to the data domain, but have no application development resources, to assemble complex data processing pipelines.
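To make the workflow notion concrete, the following is a deliberately minimal sketch in plain Python, tied to no particular workflow system (all component names are hypothetical): a pipeline of independent components with data flowing along the edges, which is the model of computation that such systems expose. Real engines add scheduling, provenance tracking, parallelism and fault tolerance on top of this basic model.

# A toy dataflow "workflow": each step is an independent component,
# and the "engine" simply threads every input item through the
# pipeline in order. Illustrative only.

def parse(record):
    # component 1: raw text -> (name, value) tuple
    name, value = record.split(",")
    return name.strip(), float(value)

def threshold(item, cutoff=10.0):
    # component 2: a filter; returning None drops the item
    return item if item[1] >= cutoff else None

def annotate(item):
    # component 3: enrich the data for downstream consumers
    name, value = item
    return {"name": name, "value": value, "flagged": value > 100}

PIPELINE = [parse, threshold, annotate]

def run(records):
    # the entire "enactment engine" of this toy model
    for record in records:
        item = record
        for step in PIPELINE:
            item = step(item)
            if item is None:  # a filter dropped the item
                break
        else:
            yield item

print(list(run(["a, 5", "b, 42", "c, 250"])))
# -> the records for "b" and "c"; "a" is filtered out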

Workshop Focus and Goals


The goal of the workshop is to bring together researchers and practitioners to explore the state of the art in workflow-based programming for data-intensive applications, and the potential of cloud-based computing in this area. Concretely, the workshop is expected to provide insight into:
  • performance: efficient workflow-based data processing at scale and workflow parallelization,
  • reproducibility and preservation of workflows,
  • modelling: best practices in data-intensive workflow modelling and enactment,
  • applications: new applications that exploit the use of workflows for large-scale data processing.
Indeed, while it appears that workflow technology is well-positioned to benefit from the scalability of computing resources offered by a cloud infrastructure, at present only a few examples of cloud-based workflow systems exist (Pegasus, e-Science Central), along with experimental prototypes that show how MapReduce implementations can be exposed as workflow patterns. Conversely, the database-to-workflow trajectory would benefit from existing formal workflow models, as well as from the componentization of data processing functions, the visual programming paradigm, and the formal validation and analysis tools that are currently available in the workflow space. However, some hybrid database-cloud solutions have already been developed, such as HadoopDB, an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. To further illustrate this convergence, below are some of the recent developments that workshop contributors may want to be aware of.
  • In business workflow management systems, progress has been made on developing expressive graphical languages to represent complex data-intensive workflows, e.g., in BPMN and YAWL. At the same time, the scalability of enactment engines to large datasets continues to be an issue for some of the most popular scientific workflow systems, including Taverna, Kepler and Galaxy, amongst others.
  • In the area of data integration, a number of data mashup frameworks and toolkits are available for ad hoc, often temporary, integration across multiple heterogeneous data sources. These include Yahoo! Pipes and W3C’s XProc for pipelines of XML transformations, amongst others. The problem of optimizing such transformations is closely related to the database problem of optimizing queries that contain complex user-defined functions.
  • In the area of cloud computing and data management, new data storage and data transformation techniques have been developed to store data and execute complex computational tasks over it in a distributed and scalable fashion, e.g., the Google File System and the MapReduce framework (a minimal sketch of the MapReduce model follows this list). This also includes the development of related, easy-to-use data processing languages such as Yahoo!’s Pig Latin.
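As background for the MapReduce model referenced in the last bullet, here is a minimal single-process sketch in Python of its map, shuffle and reduce phases, using the classic word-count example. It illustrates the programming model only, not any framework's actual API.

from collections import defaultdict

# Minimal single-process sketch of the MapReduce programming model
# (word count). Real frameworks run many map and reduce tasks on
# different machines and perform the shuffle over the network.

def map_phase(document):
    # emit one ("word", 1) pair per word occurrence
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    # group all emitted values by key, as a framework's shuffle would
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # aggregate all values observed for one key
    return key, sum(values)

docs = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = (p for doc in docs for p in map_phase(doc))
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts)  # e.g., {'the': 3, 'fox': 2, 'quick': 1, ...}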

Topics


The workshop aims to address issues of (i) Architectures, (ii) Models and Languages, and (iii) Applications of cloud-based workflows. The topics of the workshop include, but are not limited to:

Architectures:

  • scalable and parallel workflow enactment architectures,
  • cloud-based workflows,
  • efficient data storage for data-intensive workflows,
  • optimizing execution of data-intensive workflows,
  • workflow scheduling on HPC and cloud infrastructures.

Models and Languages:
  • design for reproducibility,
  • languages for data-intensive workflows, data processing pipelines and data-mashups,
  • verification and validation of data-intensive workflows,
  • workflow-based programming models for cloud computing,
  • access control and authorisation models, privacy, security, risk and trust issues,
  • workflow patterns for data-intensive workflows.

Applications of cloud-based workflows:
  • bioinformatics,
  • data mashups,
  • semantic web data management,
  • big data analytics.

Acknowledgements



The SWEET organizers are grateful to [sponsor logo] for sponsoring the workshop.