Abstracts and Bio of speakers

Scalability and data security: deep learning with health data on future HPC platforms, Georgia (Gina) Tourassi, ORNL

Abstract

Performing health data analytics at scale presents several challenges to classic HPC environments. Datasets contain personal health information (PHI) and are updated regularly, complicating data access on publicly accessible HPC systems. Moreover, the diverse set of tasks and models – ranging from neural networks for information extraction to knowledge bases for predictive modeling – has widely varying scales, hardware preferences, and software requirements. Both exascale systems and cloud-based environments have important roles to play in addressing data security and performance portability. Cloud platforms provide out-of-the-box solutions for maintaining data security, while recent work has extended secure computing environments to systems like OLCF Summit. In this talk I will discuss how we reconcile the need for scalable HPC resources with the data security requirements inherent in working with personal health information, in the context of the interagency partnership between the Department of Energy and the National Cancer Institute. As part of this partnership, we are developing state-of-the-art deep learning models that perform information extraction from cancer pathology reports for near real-time cancer incidence reporting. Our approach to addressing the patient privacy complexities gives integral roles to both traditional HPC resources and cloud-like platforms, playing to the relative strengths of both modalities.
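To make the setting concrete, the sketch below shows the general shape of a convolutional text classifier of the kind commonly used for information extraction from free-text pathology reports. It is an illustrative assumption, not the project's actual model: the framework (PyTorch), vocabulary size, label count, and all other parameters are placeholders.

# Illustrative sketch only: a simple convolutional text classifier for
# assigning a label (e.g. a cancer site code) to a tokenised pathology report.
# All sizes and names are hypothetical.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=128,
                 kernel_sizes=(3, 4, 5), num_filters=100, num_classes=25):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                         # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)     # (batch, embed_dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))  # class logits

# Forward pass on a dummy batch of eight 512-token reports.
model = TextCNN()
logits = model(torch.randint(1, 20000, (8, 512)))         # shape: (8, 25)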

Presenter Bio

Georgia (Gina) Tourassi received a B.S. degree in physics from the Aristotle University of Thessaloniki, Greece, and a Ph.D. in biomedical engineering from Duke University. She joined ORNL in 2011 as the director of the Biomedical Sciences and Engineering Center after a long academic career in the department of radiology and the medical physics graduate program at Duke University Medical Center. She is currently a distinguished research scientist, director of the Health Data Sciences Institute (HDSI), and group leader of the Biomedical Science, Engineering, and Computing (BSEC) group at ORNL. As institute director, Dr. Tourassi develops and manages its strategic agenda, scientific priorities, and roadmap while continuing to lead her independent research activities. In addition, she is an adjunct professor of radiology at Duke University and the University of Tennessee Graduate School of Medicine, and joint UT-ORNL faculty in Mechanical, Aerospace, and Biomedical Engineering at the University of Tennessee, Knoxville and the Bredesen Center.

Dr. Tourassi’s research background and interests are in medical imaging, computer-assisted decision making, clinical informatics, human-computer interaction, and scalable data-driven biomedical discovery. She has made major contributions as a medical imaging informatics researcher, a pioneer in digital cancer epidemiology, and a leader in health data sciences. In medical imaging informatics, her research has been featured in numerous high-profile publications and resulted in one patent, four invention disclosures, and a 2014 R&D 100 Award. Building upon her multi-modality knowledge discovery expertise and taking advantage of the unique computing resources at ORNL, Dr. Tourassi pursued a new big health data research direction in digital cancer epidemiology, for which she secured competitive funding under the Provocative Research Questions mechanism from the National Cancer Institute. This research has resulted in a pending patent, four invention disclosures, placement on the 2015 R&D 100 finalist list, and recognition by NCI as a highly successful investigator.

HPC and Cloud Operations at CERN, Maria Girone, CERN openlab

Abstract

CERN was established in 1954, with the mission of advancing science for peace and exploring fundamental physics questions, primarily through elementary particle research. The Large Hadron Collider (LHC) at CERN is the world's most powerful particle accelerator, colliding bunches of protons 40 million times every second. This extremely high collision rate makes it possible to identify rare phenomena and to make new discoveries such as the Higgs boson in 2012. The high-energy physics (HEP) community has long been a driver in processing enormous scientific datasets and in managing the largest-scale high-throughput computing centres. Today, the Worldwide LHC Computing Grid is a collaboration of more than 170 computing centres in 42 countries, spread across five continents. Recently, demonstrations at scale have been performed on both commercial cloud providers and HPC centres.

In 2026 we will launch the High Luminosity LHC (HL-LHC), which will represent a true exascale computing challenge. The processing capacity required by the experiments is expected to be 50 to 100 times greater than today, with storage needs expected to be on the order of exabytes. Neither the rate of technology improvement nor the computing budget will increase fast enough to satisfy these needs, and new sources of computing and new ways of working will be needed to fully exploit the physics potential of this challenging accelerator. The growth of commercial clouds and HPC centres into the exascale represents a huge opportunity to increase the potential total resource pool, but even together this ecosystem may not be sufficient to satisfy the needs of our scientific workflows. The total computing required is pushing us to investigate alternative architectures and alternative methods of processing and analysis. In this presentation we will discuss the R&D activities to utilize HPC and cloud providers. We will summarize our progress and challenges in operating on dedicated resources and on shared and purchased allocations on HPC and cloud. We will outline the biggest impediments to interoperating these facilities, which often have similar challenges in data handling and scale but very different challenges in flexibility and operations. We will close by addressing forward-looking projects with industry partners that use techniques such as machine learning and optimized hardware to fundamentally change how many resources are needed to extract science from the datasets.

Presenter Bio

Maria has a PhD in particle physics. She also has extensive knowledge in computing for high-energy physics experiments, having worked in scientific computing since 2002.

Maria has worked for many years on the development and deployment of services and tools for the Worldwide LHC Computing Grid (WLCG), the global grid computing system used to store, distribute, and analyse the data produced by the experiments on the Large Hadron Collider (LHC).

Maria was the founder of the WLCG operations coordination team, which she also previously led. This team is responsible for overseeing core operations and commissioning new services.

Throughout 2014 and 2015, Maria was the software and computing coordinator for one of the four main LHC experiments, called CMS. She was responsible for about seventy computing centres on five continents, and managed a distributed team of several hundred people.

Prior to joining CERN, Maria was a Marie Curie fellow and research associate at Imperial College London. She worked on hardware development and data analysis for another of the LHC experiments, called LHCb — as well as for an experiment called ALEPH, built on the accelerator that preceded the LHC.

Cloud and Supercomputing Platforms at NCI Australia: the Why, the How and the Future, Allan Williams, NCI

Abstract

Australia’s National Computational Infrastructure (NCI Australia) is a tier 1 provider of high performance computing and data services for Australian researchers spanning the higher education, government agency and industry sectors. NCI’s HPC facility lies in the range 20-150 on the Top500 list, depending on the stage within its lifecycle. The facility also manages on the order of 70 PB of high-performance data storage capacity, comprising both project-based data spaces and highly curated, functionalized FAIR data collections of national significance. Alongside its HPC capability provisioning, NCI has run a cloud architecture for internal and selected external purposes, and over the past five years has progressively evaluated the most effective functional role that a cloud infrastructure might play in the context of a national facility. Strategically, its current focus with cloud is to build the infrastructure for major research communities that have the demand, the national strategic priority, and the resourcing capabilities to partner with NCI in developing services and functionalities beyond the provision of “bare metal” hardware as an infrastructure. One of the significant technical challenges is the need for data analytics that access the petabyte-scale datasets residing on NCI’s high-performance storage file systems, necessitating a level of data transfer bandwidth and compute resourcing that is not typical of “conventional” cloud. In this presentation I will give an overview of these issues as NCI Australia encounters them today, providing examples of current activities and sketching the future as we see it at this point.
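As an illustration of the data-proximate analytics pattern described above, the sketch below opens a large collection stored on a high-performance filesystem and computes on it lazily, so that only the required chunks are read. The path, variable name, and chunk sizes are hypothetical, and xarray with Dask stand in for whichever analytics stack a given community uses.

# Illustrative sketch only: lazy, chunked analysis of a large gridded
# collection on a Lustre-mounted path, using xarray backed by Dask.
import xarray as xr

ds = xr.open_mfdataset(
    "/g/data/example_collection/*.nc",   # hypothetical Lustre-mounted path
    combine="by_coords",
    chunks={"time": 24},                 # lazy, chunked access via Dask
)

# A reduction over the whole collection; only the needed chunks are read.
monthly_mean = ds["temperature"].groupby("time.month").mean("time")
result = monthly_mean.compute()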

Presenter Bio

Mr Allan Williams is the Associate Director for Services and Technology at the National Computational Infrastructure (NCI), based at the Australian National University in Canberra, Australia. Allan is responsible for all operational and infrastructure aspects of the NCI, including user and research support services, Australia’s high performance cloud, Lustre filesystems, and Raijin, Australia’s largest research HPC system. During 2019 he has been overseeing the installation and commissioning of the next-generation research supercomputer, expected to be in full operation by 2020, along with planning for NCI’s new cloud.

OpenStack and the Software-Defined Supercomputer, Stig Telfer, StackHPC

Abstract

Two long-held aspirations, "agile supercomputing" and "performant cloud", are converging. Supercomputing performance can now be achieved in a cloud-native context, while cloud flexibility ensures that different classes of workload can be targeted at the most effective resources.

For private cloud, OpenStack has become the de facto standard for IaaS. This presentation will introduce an open project to create private and hybrid cloud infrastructure specifically aimed at addressing the requirements of HPC. Through pooled development effort and a continuous process of validation, many complexities and overheads commonly associated with OpenStack operation are eliminated.
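To give a flavour of what such a programmable infrastructure layer looks like in practice, the sketch below provisions a small pool of identically configured compute nodes through the OpenStack SDK. The cloud name, image, flavor, network, and keypair are hypothetical and would come from a site's own deployment; this is not the project's tooling itself.

# Illustrative sketch only: launching HPC-flavoured compute nodes via the
# OpenStack SDK. All resource names below are placeholders.
import openstack

conn = openstack.connect(cloud="example-cloud")    # reads clouds.yaml

image = conn.compute.find_image("CentOS-HPC")      # hypothetical image
flavor = conn.compute.find_flavor("hpc.64c.256g")  # hypothetical flavour
network = conn.network.find_network("highspeed")   # hypothetical network

for i in range(4):
    server = conn.compute.create_server(
        name=f"compute-{i:02d}",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
        key_name="admin-key",                      # hypothetical keypair
    )
    conn.compute.wait_for_server(server)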

Drawing on extensive experience of both cloud and HPC infrastructure, the project has identified and addressed the functional gaps and performance bottlenecks of conventional cloud infrastructure, while maintaining the advantages in agility, flexibility and interoperability heralded by the cloud-native era.

This presentation will provide technical insights, an update on the capabilities of modern OpenStack, and an opportunity to become involved in a project that aims to deliver on the promise of cloud without sacrificing performance.

Presenter Bio

Stig has a background in R&D working for various prominent technology companies, particularly in HPC and software-defined networking. Stig is now CTO for StackHPC, a consultancy specialising in the convergence of cloud, HPC, AI and big data. Stig is also co-chair of the OpenStack Scientific SIG, a globally distributed group of research institutions using OpenStack for research computing use cases.

Computing Without Borders: Combining Cloud and HPC to Advance Experimental Science, Deborah Bard, NERSC

Abstract

Computing has always been part of experimental science, and it is increasingly playing a central role in enabling scientific discovery. Technological advances are providing more powerful supercomputers and highly efficient specialized chip architectures; they are also advancing instrument technology, resulting in higher-resolution detectors that produce orders of magnitude more data and open up exciting new scientific domains. The scale of the compute needs of such instruments is typically mixed, sometimes requiring cloud or small-cluster computing and sometimes requiring dedicated access to a supercomputer. Because many experiments require real-time data analysis, there is a vital dependence on how quickly compute power can be made available and how quickly the data can be transferred to the compute site. Using real-life examples from high energy physics, microscopy and genomics, I will discuss how experimental science is taking advantage of both cloud and near-exascale HPC resources. I will outline some of the challenges we see in operating such workflows across multiple sites, with a focus on system responsiveness and data management issues.
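As a concrete, if simplified, picture of the real-time pattern described above, the sketch below polls a landing directory for newly arrived detector files and submits an analysis job for each one to a Slurm batch system. The paths, QOS name, and analysis script are hypothetical and not a description of any particular facility's service.

# Illustrative sketch only: submit an analysis job as each new data file lands.
import subprocess
import time
from pathlib import Path

INCOMING = Path("/data/incoming")      # hypothetical landing directory
seen = set()

while True:
    for f in sorted(INCOMING.glob("*.h5")):
        if f not in seen:
            seen.add(f)
            subprocess.run(
                ["sbatch", "--qos=realtime", "analyze.sh", str(f)],
                check=True,            # fail loudly if submission fails
            )
    time.sleep(10)                     # poll every 10 seconds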

Presenter Bio

Dr Debbie Bard leads the Data Science Engagement Group at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory. A native of the UK, she has pursued research in particle physics, cosmology and computing on both sides of the Atlantic. She obtained her PhD at Edinburgh University and worked at Imperial College London and SLAC National Accelerator Laboratory before joining NERSC, where her group supports supercomputing for experimental science. Her interests focus on data-intensive computing and machine learning at scale.

Perform Like a Supercomputer, Run Like a Cloud, Stathis Papaefstathiou, Cray Inc

Abstract

Without a doubt, cloud has reshaped the enterprise datacenter, and there is an ongoing debate as to what extent this disruption will also spill into Supercomputing. Many of the concepts and capabilities that we see today in cloud were spearheaded by the grid computing initiatives that this community introduced in the 1990s. However, the technology and economics have dramatically evolved since then. Cloud is an overloaded term used to capture a wide range of characteristics including business models, technologies, operational models and user access paradigms. Although some of these considerations will be assessed based on the individual requirements of an organization, there are cloud technologies that are becoming industry standards and an integral part of the Supercomputing industry.

This presentation will focus on how leveraging cloud technologies, adapted to meet the needs of Supercomputing workloads, is a dominant trend, one that empowers system administrators to be more productive and gives end users an experience analogous to cloud environments.

In a cloud-based system management solution, all capabilities are exposed to system administrators and DevOps personnel as an open, programmable fabric that can either be integrated into a broader management ecosystem or operated separately with open source, commercial, or home-grown tools. This is the foundation for providing the automation and programmability needed to easily expose capabilities such as policy-based self-service, multi-tenancy, and elasticity.
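The sketch below illustrates the kind of automation such an open, programmable fabric enables, here creating a tenant partition with an elasticity policy through a REST call. The endpoint, resource names, and payload are entirely hypothetical; a real deployment would use the vendor's documented API.

# Illustrative sketch only: driving a hypothetical management REST API.
import requests

BASE = "https://mgmt.example.org/api/v1"    # hypothetical management endpoint
AUTH = {"Authorization": "Bearer <token>"}  # placeholder credential

tenant = {
    "name": "astro-group",
    "nodes": {"min": 16, "max": 128},       # elastic bounds for the partition
    "policy": "scale-on-queue-depth",       # hypothetical policy name
}
resp = requests.post(f"{BASE}/tenants", json=tenant, headers=AUTH, timeout=30)
resp.raise_for_status()
print("created tenant:", resp.json().get("id"))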

The cloudification of the supercomputer system management infrastructure is also a requirement for the interoperation and integration of supercomputers with public cloud environments. Although today most organizations make a binary decision to deploy their solution to a public or private cloud, it is possible that in the future hybrid cloud might become more prevalent in our industry, especially with the proliferation of processor architectures and silicon specialization.

Finally, our community has been at the forefront of researching technologies and solutions that meet the extreme requirements of our industry. The availability of an open and programmable compute fabric removes barriers to collecting data and experimenting.

Presenter Bio

Stathis has been leading R&D at Cray Inc since 2017. He was hired to drive the development of the exascale generation of products, including the Shasta supercomputers, the Slingshot network, and storage products. This effort included a major refresh of Cray's technologies, engineering approaches, and overall architecture to meet the requirements of a fast-moving HPC market. Prior to joining Cray, Stathis held senior leadership positions at Microsoft, F5 Networks, and other companies, where he shipped cloud, enterprise networking, OS, and hardware products. Stathis received his PhD from the University of Warwick and worked as a researcher in performance modeling for HPC systems, both in academia and at Microsoft Research.