The 1st international workshop on
Scalable Workflow Enactment Engines and Technologies (SWEET'12)
held in conjunction with the 2012 SIGMOD conference in Scottsdale, Arizona, USA, on May 20th, 2012
News
Accepted papers and slides from presentations are available for download.
As an added incentive to publish papers in SIGMOD workshops, the SIGMOD organizers plan to include a plenary poster session for workshop papers in the conference program. During this session, workshop authors who have registered for the main conference will have the option of presenting their work to the SIGMOD conference audience.
Keynotes and Tutorials
SWEET 2012 will feature the following keynotes and tutorials:
- (Keynote) Flexibility without Anarchy: Analytics Infrastructure at Twitter
Speaker: Jimmy Lin, Twitter
- (Keynote) Data Processing Workflows @ Google
Speaker: Pawel Garbacki, Google Inc.
- (Tutorial) Oozie: a workflow engine for Hadoop
Speaker: Mohammad Islam, Yahoo/Cloudera
Motivation
One of the goals of computer system engineering has always been to develop systems that are easy to use and understand, yet put great computational power at the fingertips of end users. The cloud computing model has the potential to make this a realistic goal for business and scientific data processing, by enabling simple access to large pools of data storage and computational resources. More specifically, we observe that cloud computing is facilitating the convergence of workflow-based processing with traditional data management, thereby providing users with the best of both worlds. Workflows are used extensively, both for business applications and in computational science. Common to the broad range of workflow systems currently in use are relatively simple programming models, which are usually exposed through a visual programming style and backed by a well-defined model of computation. While the flexibility of workflows for rapid prototyping of science pipelines makes them appealing to computational scientists, recent applications of workflow technology to data-intensive science show the need for a robust underlying data management infrastructure. At the same time, on the data management side of science and business, workflow-like models and languages are beginning to emerge, making it possible for users who are close to the data domain but lack application development resources to assemble complex data processing pipelines.
Workshop Focus and Goals
The goal of the workshop is to bring together researchers and practitioners to explore the state of the art in workflow-based programming for data-intensive applications, and the potential of cloud-based computing in this area. Concretely, the workshop is expected to provide insight into:
- performance: efficient workflow-based data processing at scale, and workflow parallelization;
- reproducibility and preservation of workflows;
- modelling: best practices in data-intensive workflow modelling and enactment;
- applications: new applications that exploit workflows for large-scale data processing.
The workshop builds on developments in several related areas:
- In business workflow management systems, progress has been made on developing expressive graphical languages, such as BPMN and YAWL, to represent complex data-intensive workflows. At the same time, the scalability of enactment engines for large datasets remains an issue for some of the most popular scientific workflow systems, including Taverna, Kepler and Galaxy, amongst others.
- In the area of data integration, a number of data mashup frameworks and toolkits are available for ad hoc, often temporary, integration across multiple heterogeneous data sources. These include Yahoo! Pipes and W3C's XProc for pipelines of XML transformations, amongst others. Optimizing such transformations is closely related to the database problem of optimizing queries that contain complex user-defined functions.
- In the area of cloud computing and data management, new data storage and data transformation techniques, e.g., the Google File System and the MapReduce framework, have been developed to store data and execute complex computational tasks over it in a distributed and scalable fashion. This also includes the development of related, easy-to-use data processing languages such as Yahoo's Pig Latin.
Topics
The workshop aims to address issues of (i) architectures, (ii) models and languages, and (iii) applications of cloud-based workflows. Topics of interest include, but are not limited to:
Architectures:
- scalable and parallel workflow enactment architectures,
- cloud-based workflows,
- efficient data storage for data-intensive workflows,
- optimizing execution of data-intensive workflows,
- workflow scheduling on HPC and cloud infrastructures.
Models and Languages:
- design for reproducibility,
- languages for data-intensive workflows, data processing pipelines and data-mashups,
- verification and validation of data-intensive workflows,
- workflow-based programming models for cloud computing,
- access control and authorisation models, privacy, security, risk and trust issues,
- workflow patterns for data-intensive workflows.
Applications of cloud-based workflows:
- bioinformatics,
- data mashups,
- semantic web data management,
- big data analytics.
Acknowledgements
The SWEET organizers are grateful to
