About the project

In this JCJC project, we investigate data management approaches that enable the efficient execution of geographically distributed workflows on multi-site clouds. We focus on a common and particularly challenging scenario: workflows that generate and process a huge number of small files. Because such workloads produce a deluge of small, independent I/O operations, efficient handling of both data and metadata is critical. We will explore ways to better hide the latency of data and metadata access and to optimise transfers, in order to improve overall performance. The targeted solution leverages both the workflow semantics (e.g. data-access patterns) and the practical tools available on today's clouds (e.g. caching services for PaaS clouds) to propose several strategies for decentralised data management. The resulting system will be used by real-life applications from bioinformatics, smart cities and nuclear physics.

OverFlow proposes a new, pioneering paradigm, Workflow Data Management as a Service: a general, easy-to-use, cloud-provided service that, for the first time, bridges the gap between single-site and multi-site workflow data management. It aims to reap economic benefits from geo-diversity while accelerating scientific discovery by democratising access to globally distributed data.

The goal of OverFlow is to study, design, implement and evaluate Workflow Data Management as a Service. The service will treat data storage, metadata management and file transfers as first-class citizens by building on a consistent, global view of the entire distributed datacenter environment. It is intended to be offered by cloud providers to non-specialist users, giving them simple access to data management for multi-site applications.
Main objectives:

  • Workflow Data Management as a Service
  • Fast access to huge sets of small data objects
  • Enabling cloud-based distributed metadata management
  • New strategies for efficient data transfers to/from the cloud and between datacenters


Duration: 01.10.2015 - 30.09.2019

Code: ANR-15-CE25-0003

Coordinator: Alexandru COSTAN