Workshop on Dynamic Distributed Data-Intensive Applications, Programming Abstractions, and Systems (3DAPAS)

To be held in conjunction with HPDC-2011, 8 June 2011, San Jose, CA

Considerable effort has gone into managing and distributing tasks in which the computational load dominates. Such applications have, after all, historically been the drivers of "grid" computing. Relatively less effort, however, has been devoted to tasks in which the computational load is matched, or even dominated, by the data load. For such tasks to operate at scale, conceptually simple run-time trade-offs must be made, such as whether to move data to the computation, to keep data localized and move computational tasks to operate on it in situ, or to do neither and regenerate the data on the fly. Because resource availability and capabilities fluctuate, and because prior information about application requirements is often insufficient, such decisions must be made at run time. Furthermore, resource, connectivity, and/or storage constraints may require the data to be manipulated in transit so that it is “made-right” for the consumer. Currently, it is very difficult to implement these dynamic decisions, or the underlying mechanisms, in a general-purpose and scalable fashion.

Although the increasing volume and complexity of data will make many problems dominated by data load, their computational requirements will still be high. In practice, data-intensive applications will encompass data-driven applications. For example, many data-driven applications will involve computational activities triggered by independently created data; thus, it is imperative for an application to be able to respond to unplanned changes in data load or content. Understanding how to support dynamic computations is therefore a fundamental but currently missing element of data-intensive computing.

This workshop will operate at the triple point of the dynamic, distributed, and data-intensive (3D) attributes. It will also focus on innovative approaches to scalability in the end-to-end, real-time processing of scientific data. By 3D applications we mean those that are data-intensive, that must support and respond to dynamic data, and that either are fundamentally distributed or need to be. We are interested in papers that span the spectrum from the design of cyberinfrastructure supporting 3D applications to novel application examples. We also aim to bring researchers together to consider holistic, rather than piecewise, approaches to the end-to-end processing and management of scientific data.

3DAPAS builds upon a three-year research theme on Distributed Programming Abstractions (DPA), which has held a series of related workshops (see DPA Past Events), including those at e-Science 2008 and Euro-Par 2008 and the CLADE series. 3DAPAS will also draw on ideas from the ongoing 3DPAS Research Theme, funded by the NSF and the UK EPSRC.

Topics of interest include, but are not limited to:

  • Case studies of development, deployment and execution of representative 3D applications
  • Programming systems, abstractions, and models for 3D applications
  • What are the common, minimally complete characteristics of 3D applications?
  • What are the major barriers to the development, deployment, and execution of 3D applications? What are the primary challenges of 3D applications at scale?
  • What patterns exist within 3D applications, and are there commonalities in the way such patterns are used?
  • How can programming models, abstractions, and systems for data-intensive applications be extended to support dynamic data applications?
  • Tools, environments, and programming support that enable emerging distributed infrastructure to meet the requirements of dynamic applications (including but not limited to streaming data and in-transit data analysis)
  • Data-intensive dynamic workflow and in-transit data manipulation
  • Abstractions and mechanisms for dynamic code deployment and "moving the code to the data"
  • Application drivers for end-to-end scientific data management
  • Runtime support for in-situ analysis
  • System support for high-end workflows
  • Hybrid computing solutions for in-situ analysis
  • Technologies to enable multi-platform workflows