11th Workshop on Scientific Cloud Computing

ScienceCloud 2021

Program

The workshop will be held virtually on June 21, 2021. Times are EDT (UTC-4:00). The connection link is available upon registration. Workshop start time: 6am PDT, 8am CDT, 9am EDT (USA), 3pm CEST (France, Sweden, Netherlands), 10pm JST (Japan).

The workshop recording is available here. Passcode: j*54wb?%


Keynote Speaker: Kate Keahey

Kate Keahey is one of the pioneers of infrastructure cloud computing. She created the Nimbus project, recognized as the first open source Infrastructure-as-a-Service implementation, and continues to work on research aligning cloud computing concepts with the needs of scientific datacenters and applications. To facilitate such research for the community at large, Kate leads the Chameleon project, providing a deeply reconfigurable, large-scale, and open experimental platform for Computer Science research. To foster the recognition of contributions to science made by software projects, Kate co-founded and serves as co-Editor-in-Chief of the SoftwareX journal, a new format designed to publish software contributions. Kate is a Scientist at Argonne National Laboratory and a Senior Fellow at the Computation Institute at the University of Chicago.

Taking Science from Cloud to Edge

New research ideas require an instrument where they can be developed, tested, and shared. To support Computer Science experiments, such an instrument has to offer a diversity of hardware configurations, deployment at scale, deep reconfigurability, and mechanisms for sharing so that new results can trigger further innovation. Most importantly, since science does not stand still, such an instrument requires constant adaptation to support an ever-increasing range of experiments driven by emergent ideas and opportunities.

The NSF-funded Chameleon testbed for systems research and education (www.chameleoncloud.org) has been developed to provide all of those capabilities. The testbed provides many thousands of cores and over 5 PB of storage hosted at three sites (University of Chicago, Texas Advanced Computing Center, and Northwestern University) connected by 100 Gbps networks. The hardware consists of large homogeneous partitions to facilitate experiments at scale, alongside a diverse set of hardware including accelerators, storage hierarchy nodes with a mix of HDDs, SSDs, and NVMe, high-bandwidth I/O storage, SDN-enabled networking hardware, and edge devices. To support experiments ranging from work in operating systems through networking to edge computing, Chameleon provides a range of reconfigurability options from bare metal to virtual machine management. To date, the testbed has supported 5,000+ users and 700+ research and education projects and has just been renewed through the end of 2024.

This talk will describe the goals, design strategy, and the existing and future capabilities of the testbed, as well as some of the research and education projects our users are working on. I will also describe how Chameleon is evolving to support new research directions, in particular edge and IoT-based research and applications. Finally, I will introduce the services and tools we created to support sharing of experiments, curricula, and other digitally expressed artifacts that allow science to be shared via active involvement and foster reproducibility.

eScience Invited Talks

Christoph Kessler: Portable high-level programming of heterogeneous parallel systems with SkePU

Abstract:

We live in the era of parallel and heterogeneous computer systems, with multi- and many-core CPUs, GPUs, and other types of accelerators being omnipresent. The execution and programming models exposed by modern computer architectures are diverse, parallel, heterogeneous, distributed, and far away from the sequential von Neumann model of the early days of computing. Yet the convenience of single-threaded programming, together with technical debt from legacy code, makes us mentally stick to programming interfaces that follow the familiar von Neumann model, typically extended with various platform-specific APIs that allow explicit control of parallelism and accelerator usage. High-level parallel programming based on generic, portable programming constructs known as algorithmic skeletons can raise the level of abstraction and bridge the semantic gap between a sequential-looking, platform-independent single-source program code and the heterogeneous and parallel hardware. We present the design principles of one such framework, the latest generation of our open-source programming framework SkePU for heterogeneous parallel systems and clusters. The SkePU high-level programming interface is based on modern C++, leveraging variadic template metaprogramming and a custom source-to-source pre-compiler. SkePU currently provides seven (fully variadic) skeletons for data-parallel patterns such as map, reduce, stencil, and scan, together with high-level data abstractions for skeleton call operands. SkePU can perform automated optimizations of the high-level execution flow, such as context-dependent selection of the best backend among the supported platforms, as well as operand data transfer and memory optimizations.
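SkePU itself is a C++ framework, so the sketch below is only a language-agnostic analogue of the skeleton idea, not SkePU's actual interface: the programmer writes a plain scalar operator, and a hypothetical map skeleton decides how (and on which backend) to apply it elementwise over whole containers.

    # Conceptual illustration of an algorithmic "map" skeleton (not SkePU's C++ API).
    # The user supplies only the scalar operator; the skeleton owns the execution strategy.
    from multiprocessing import Pool

    def map_skeleton(user_function, *operands, backend="sequential"):
        """Apply user_function elementwise over the operands on the chosen backend."""
        if backend == "parallel":
            with Pool() as pool:
                return pool.starmap(user_function, zip(*operands))
        # Fallback backend: plain sequential execution.
        return [user_function(*args) for args in zip(*operands)]

    def add(a, b):
        # Scalar "user function"; the skeleton lifts it to whole containers.
        return a + b

    if __name__ == "__main__":
        x = list(range(1000))
        y = list(range(1000))
        print(map_skeleton(add, x, y, backend="parallel")[:5])  # [0, 2, 4, 6, 8]

In SkePU the same separation of concerns is expressed in C++, and the backend choice (e.g., sequential CPU, OpenMP, CUDA, OpenCL, or cluster execution) is made by the framework rather than by the application code.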


Amit Chourasia: Democratizing Scientific Data Management with SeedMeLab

Abstract:

Researchers have an increasing need to manage preliminary and transient results. This evolving and growing corpus of data needs to be paired with its contextual information and findings. However, this amalgamation of data, metadata, context, and insights is often highly fragmented and dispersed across many systems, including local/remote file servers, emails, presentations, and meeting notes. Much of this information becomes increasingly cumbersome to assimilate, use, and reuse over time. Researchers often create ad hoc strategies to manage this data using a variety of tools that are loosely glued together; however, such fragile systems run into many limitations and burden researchers with continued maintenance investments. In this talk I will introduce SeedMeLab, an open-source scientific data management system that overcomes the limitations of ad hoc systems by providing a robust feature set that includes data annotation, data discussion, data visualization, and discoverability, along with modular extensibility. It also provides full ownership, access control, and branding; it can be deployed on-premises or in the cloud, and is also available as a managed service.


Ben van Werkhoven: GPU code optimization and auto-tuning made easy with Kernel Tuner

Abstract:

Graphics Processing Units (GPUs) have revolutionized the computing landscape in the past decade and are seen as one of the enabling factors in recent breakthroughs in Artificial Intelligence. While GPUs are used to enable scientific computing workloads in many fields, including climate modeling, artificial intelligence, and quantum physics, it is actually very hard to unlock the full computational power of the GPU. This is because there are many degrees of freedom in GPU programming, and often there are only a handful of specific combinations of thread block dimensions and other code optimization parameters, such as tiling or unrolling factors, that result in dramatically higher performance than other kernel configurations. To obtain such highly efficient kernels it is often necessary to search vast and discontinuous spaces that consist of all possible combinations of values for all tunable parameters, which is infeasible to do by hand. This talk gives a brief introduction to Kernel Tuner, an easy-to-use tool for testing and auto-tuning OpenCL, CUDA, Fortran, and C kernels, with support for many search optimization algorithms that accelerate the tuning process.
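As a hedged illustration (a minimal sketch following the vector-add example in Kernel Tuner's public documentation; it assumes kernel_tuner is installed and a CUDA-capable GPU is available, and exact argument names may vary between versions), a CUDA kernel is passed as a string together with its tunable parameters, and tune_kernel compiles and benchmarks every configuration in the search space:

    # Minimal Kernel Tuner sketch (assumes: kernel_tuner installed, CUDA GPU present).
    import numpy as np
    import kernel_tuner

    # A simple CUDA vector-add kernel; the thread block size is the tunable parameter.
    kernel_string = """
    __global__ void vector_add(float *c, float *a, float *b, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            c[i] = a[i] + b[i];
        }
    }
    """

    size = 10_000_000
    a = np.random.randn(size).astype(np.float32)
    b = np.random.randn(size).astype(np.float32)
    c = np.zeros_like(a)
    n = np.int32(size)

    # The search space: every listed block_size_x value is compiled and benchmarked.
    tune_params = {"block_size_x": [32, 64, 128, 256, 512, 1024]}

    results, env = kernel_tuner.tune_kernel(
        "vector_add", kernel_string, size, [c, a, b, n], tune_params
    )
    print(min(results, key=lambda r: r["time"]))  # best-performing configuration

Real tuning setups typically add further parameters such as tiling and unrolling factors, which is where the combinatorial search space and the search optimization algorithms mentioned above become essential.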


Eric Coulter: HPC From the Ground to the Cloud

Abstract:

This talk describes how we've created and used a “Virtual Cluster toolkit” (VC) and “Container Template Library” (CTL) for building elastic, cloud-based HPC resources to support both Science Gateways and new HPC administrators. The VC toolkit has been invaluable for training new HPC administrators and users, in addition to creating production resources for approximately 25 projects, including SeaGrid, UltraScan, and multiple fast-track COVID-19 research projects. This work was based on efforts by the XSEDE Cyberinfrastructure and Resource Integration (XCRI) team and the Cyberinfrastructure Integration Research Center (CIRC). These teams have been addressing the problem of disconnected and unequally distributed research computing on three fronts: by helping under-resourced institutions build XSEDE-like resources (XCRI), by providing container templates to ease installation and increase portability of scientific software (XCRI), and by enabling easy access to compute resources through science gateways (CIRC). The XCRI team provides both toolkits and hands-on consultation to growing institutions. CIRC supports scientific software developers through Science Gateways using the Airavata middleware, in addition to working with gateway providers to develop software and grow their communities. Collaboration between the two resulted in the creation of the toolkits, which have proven beneficial to users and researchers at the edges of the cyberinfrastructure community in the US. In particular, the VC and CTL support several needs surrounding scientific software development in areas less served by established community software, such as rapid development cycles, easing deployment to extant HPC systems, and efficient use of limited cloud resources.

Workshop Overview

The 11th Workshop on Scientific Cloud Computing (ScienceCloud) will provide the scientific community with a dedicated forum for discussing new research, development, and deployment efforts in running scientific computing workloads in the complex ecosystem that results from the convergence of cloud computing, big data, and machine learning. The focus is on the use of cloud-based technologies to meet new convergence challenges that are not well served by current supercomputers and dedicated data centers. The workshop aims to address questions such as: What architectural changes to current cloud frameworks (hardware, operating systems, networking, and/or programming models) are needed to support this triple convergence? Dynamic information derived from remote instruments, coupled simulations, and sensor ensembles that stream data for real-time analysis and machine learning is an important emerging trend; how can cloud technologies enable and adapt to these patterns? How are scientists using clouds? Are there scientific workloads that are suitable candidates to take advantage of emerging cloud computing resources with high efficiency? What factors are limiting the use of clouds, or would make them more usable and efficient?

Topics

  • Scientific application case studies on cloud infrastructures

  • Performance evaluation of cloud environments and technologies

  • Fault tolerance and reliability in cloud systems

  • Data-intensive workloads and tools in clouds

  • Use of programming models (e.g., Spark, MapReduce) and their implementations in cloud settings

  • Scalable and elastic architectures for cloud storage and I/O

  • Workflow and resource management in the cloud

  • Use of cloud technologies (e.g., NoSQL databases) for scientific applications

  • Data streaming and dynamic applications in the cloud

  • Heterogeneous resources (network, storage, compute) and edge/fog infrastructure

  • Application of cloud concepts in HPC environments or vice versa

  • Virtualized high performance I/O network interconnects

  • Virtualization, containers, and dynamic provisioning

  • Analysis of management complexity, cost, variability, and reproducibility of cloud and IoT environments

  • Stream processing for scientific applications in the cloud

  • Edge, fog, and hybrid cloud-edge/fog computing

  • High-performance AI and deep learning frameworks (e.g., TensorFlow) and their implementation in cloud settings

  • Scalability and cost-effective elasticity of AI and deep learning (e.g. data-parallel training) for cloud infrastructures

Submission

Important Dates:

Paper submission deadline: April 14, 2021 (extended to April 21, 2021, AoE)

Paper notification due: April 25, 2021

Camera-ready papers: April 28, 2021

Workshop: June 21, 2021


Paper Categories:

Authors are invited to submit:

    • Full 8-page papers

    • Short/work-in-progress 4-page papers

Formatting:

Authors are invited to submit papers describing unpublished, original research. All submitted manuscripts should be formatted using the ACM Master Template with the sigconf format (please be sure to use the current version). All necessary documentation can be found at: https://www.acm.org/publications/proceedings-template. The maximum lengths are 8 and 4 pages (including all text, figures, and references). All papers must be in English. We use a single-blind reviewing process, so please keep the authors' names, publications, etc., in the text.

Papers will be peer-reviewed, and accepted papers will be published in the workshop proceedings as part of the ACM Digital Library.

Papers conforming to these guidelines should be submitted through EasyChair: https://easychair.org/conferences/?conf=sciencecloud2021

Chairs

Alexandru Costan, IRISA / INSA Rennes, France (alexandru.costan@irisa.fr)

Bogdan Nicolae, Argonne National Laboratory, USA (bogdan.nicolae@acm.org)

Kento Sato, RIKEN R-CCS, Japan (kento.sato@riken.jp)

Program Committee

Michael Sevilla, University of California, Santa Cruz, USA

Daniel S. Katz, University of Illinois Urbana-Champaign, USA

Dongfang Zhao, University of Nevada, USA

Elena Apostol, University Politehnica of Bucharest, Romania

Kevin Brown, Argonne National Laboratory, USA

Anthony Kougkas, Illinois Institute of Technology, USA

Ryan Chard, Argonne National Laboratory, USA

Teng Wang, Florida State University, USA

Takaaki Fukai, RIKEN, Japan

Radu Prodan, University of Klagenfurt, Austria

Mustafa Rafique, Rochester Institute of Technology, USA

Michael Schöttner, University of Düsseldorf, Germany

Pedro Silva, Hasso Plattner Institute, Germany