Home

The 1st International Workshop on
Scalable Workflow Enactment Engines and Technologies (SWEET'12)

Held in conjunction with the 2012 SIGMOD conference in Scottsdale, Arizona, USA, on May 20th, 2012


News


Accepted papers and slides from presentations are available for download.

As an added incentive to publish papers in SIGMOD workshops, the SIGMOD organizers plan to include a plenary poster session for workshop papers in the conference program. During this session, workshop authors who have registered for the main conference will be given the option of presenting their work to the SIGMOD conference audience.


Keynotes and Tutorials

SWEET 2012 will feature the following keynotes and tutorials:
  • (Keynote) Flexibility without Anarchy: Analytics Infrastructure at Twitter
    Speaker: Jimmy Lin, Twitter
  • (Keynote) Data Processing Workflows @ Google
    Speaker: Pawel Garbacki, Google Inc.
  • (Tutorial) Oozie: a workflow engine for Hadoop
    Speaker: Mohammad Islam, Yahoo/Cloudera

For more details, see the keynotes and tutorials page.

Motivation


One of the goals of computer system engineering has always been to develop systems that are easy to use and understand, yet put great computational power at the fingertips of end users. The cloud computing model has the potential to make this goal realistic for business and scientific data processing by enabling simple access to large pools of data storage and computational resources. More specifically, we observe that cloud computing is facilitating the convergence of workflow-based processing with traditional data management, thereby providing users with the best of both worlds. Workflows are used extensively, both for business applications and in computational science. Common to the broad range of workflow systems currently in use is a relatively simple programming model, usually exposed through a visual programming style and backed by a well-defined model of computation. While the flexibility of workflows for rapid prototyping of science pipelines makes them appealing to computational scientists, recent applications of workflow technology to data-intensive science show the need for a robust underlying data management infrastructure. At the same time, on the data management side of science and business, workflow-like models and languages are beginning to emerge, making it possible for users who lack application development resources, but who are close to the data domain, to assemble complex data processing pipelines.

Workshop Focus and Goals


The goal of the workshop is to bring together researchers and practitioners to explore the state of the art in workflow-based programming for data-intensive applications, and the potential of cloud-based computing in this area. Concretely, the workshop is expected to provide insight into:
  • performance: efficient workflow-based data processing at scale, and workflow parallelization;
  • reproducibility: reproducibility and preservation of workflows;
  • modeling: best practices in data-intensive workflow modeling and enactment;
  • applications: new applications that exploit workflows for large-scale data processing.
Indeed, while workflow technology appears well-positioned to benefit from the scalable computing resources offered by a cloud infrastructure, at present only a few examples of cloud-based workflow systems exist (Pegasus, eScience Central), along with experimental prototypes that show how MapReduce implementations can be exposed as workflow patterns. Conversely, the database-to-workflow trajectory would benefit from existing formal workflow models, as well as from the componentization of data processing functions, the visual programming paradigm, and the formal validation and analysis tools currently available in the workflow space. Some hybrid database-cloud solutions have already been developed, such as HadoopDB, an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. To further illustrate this convergence, below are some recent developments that workshop contributors may want to be aware of.
  • In business workflow management systems, progress has been made on developing expressive graphical languages to represent complex data-intensive workflows, e.g., in BPMN and YAWL. At the same time, the scalability of enactment engines over large datasets continues to be an issue for some of the most popular scientific workflow systems, including Taverna, Kepler, and Galaxy, amongst others.
  • In the area of data integration, a number of data mashup frameworks and toolkits are available for ad hoc, often temporary, integration across multiple heterogeneous data sources. These include Yahoo! Pipes and W3C’s XProc for pipelines of XML transformations, amongst others. The problem of optimizing such transformations is closely related to the problem, in database systems, of optimizing queries that contain complex user-defined functions.
  • In the area of cloud computing and data management, new data storage and data transformation techniques have been developed to store data and execute complex computational tasks over it in a distributed and scalable fashion, e.g., the Google File System and the MapReduce framework. This also includes the development of related, easy-to-use data processing languages such as Yahoo’s Pig Latin.

Topics


The workshop aims to address issues of (i) architectures, (ii) models and languages, and (iii) applications of cloud-based workflows. Topics of the workshop include, but are not strictly limited to:

Architectures:

  • scalable and parallel workflow enactment architectures,
  • cloud-based workflows,
  • efficient data storage for data-intensive workflows,
  • optimizing execution of data-intensive workflows,
  • workflow scheduling on HPC and cloud infrastructures.

Models and Languages:
  • design for reproducibility,
  • languages for data-intensive workflows, data processing pipelines and data-mashups,
  • verification and validation of data-intensive workflows,
  • workflow-based programming models for cloud computing,
  • access control and authorisation models, privacy, security, risk and trust issues,
  • workflow patterns for data-intensive workflows.

Applications of cloud-based workflow:
  • bioinformatics,
  • data mashups,
  • semantic web data management,
  • big data analytics.

Acknowledgements


The SWEET organizers are grateful to
for sponsoring the workshop.