Safe Data: Paradigms & Platforms

A workshop on enabling collaborative analysis of sensitive data

eScience 2017 | October 24th | Auckland, New Zealand

Workshop Description

Data is fast becoming a crucial, if not defining, asset for researchers. Entire fields, including those new to large-scale computation (e.g., Science of Science), are quickly embracing data-driven research. However, the ever-increasing scale and complexity of data and analyses are creating unique new challenges for researchers in the computational sciences. Today, the need for highly specialized computation has allowed a handful of well-resourced researchers to design bespoke models, analytical tools, and metrics around powerful but highly guarded datasets, thereby balkanizing the research landscape. As a result, the returns to science and society on investment in these datasets have been comparatively low.

Without advances in collaborative, computational data infrastructure, producing and leveraging large, highly sensitive datasets will remain both overly expensive and inordinately challenging from a technical standpoint. Given the current funding environment and the resources allocated to, for example, SBE research, both the compute infrastructure and the ‘data science’ tools, techniques, and methods required to truly capitalize on the deluge of high-quality but sensitive data are, sadly, available only to an elite few researchers. Reducing these barriers will not only democratize access to data and techniques, but also create new opportunities for deriving, obtaining, and disseminating new datasets. To address these challenges, we require research infrastructure that can safely and securely democratize access to sensitive data, to massive storage and compute, and to the analytical tools and techniques needed to produce transformative research.

These challenges motivate our goal in convening this full-day workshop. We aim to bring together a diverse group of experts around the motivating applications that illustrate the opportunities inherent in new approaches; the legal, policy, and ethical frameworks that define the risks associated with inappropriate data release or analyses; and the data management, data analysis, security, and other technologies that determine what the shape and scope of flexibly accessible data enclaves can and should be. Specifically, we will charge this group with defining the new principles that may govern the collection and use of proprietary or human subjects data in this new era, and with identifying promising new methods, policies, and technologies that may allow for the realization of those principles in different settings.

Our workshop’s panels and presentations will address six significant challenges facing a data-driven scientific community:

  1. Research data must be stored, accessed, analyzed, shared, and published in a secure manner while simultaneously ensuring that the technical restrictions do not inhibit research and innovation.
  2. Cutting-edge, best-practice data analytics techniques are not readily accessible to the “99%” of computational scientists. The technical “chasm” that exists between large- and small-scale research groups compounds the competitive advantage of well-resourced groups.
  3. Research data is increasingly recognized as a primary scientific output. To increase research impact and maximize returns on research investments, methods and data must be easy to disseminate so that others can explore and analyze that data.
  4. Large-scale, specialized compute resources are required to keep pace with advances in analytics and the velocity of data. While cloud providers like Amazon Web Services offer “unlimited” capacity to users, obtaining the expertise required to take advantage of that capacity remains a significant technical hurdle.
  5. Compute resources are often geographically and administratively distant from the storage systems on which datasets reside. Thus, costly (both in terms of time and financial cost) data transfers are required to move increasingly large data.
  6. Legal, policy, and ethical frameworks around handling highly sensitive data have not evolved in step with technologies. As a result, many capabilities exist that, if supported by appropriately current frameworks, could yield enormous benefits to science and society.

Agenda: October 24th

9:00am - 9:15am: Welcome and Opening Remarks, Ian Foster

9:15am - 10:00am: (Invited Speaker) Mark Gahegan, Director, Center for eResearch, The University of Auckland

10:00am - 10:30am: A Review of Privacy and Consent Management in Healthcare: A Focus on Emerging Data Sources. Muhammad Rizwan Asghar, TzeHowe Lee, Mirza Mansoor Baig, Ehsan Ullah, Giovanni Russello, Gillian Dobbie

10:30am - 11:00am: BREAK

11:00am - 11:30am: Safe Double Blind Studies as a Service. Tyler J. Skluzacek, Suhail Rehman and Ian Foster.

11:30am - 12:15pm: (Invited Speaker) Sue Bridger, Principal Solution Specialist, Microsoft New Zealand

12:15pm - 1:30pm: LUNCH

1:30pm - 2:00pm: A Comparative Evaluation of Blockchain Systems for Application Sharing Using Containers. Ian Taylor, Jarek Nabrzyski and Joel Neidig

2:00pm - 2:30pm: ERMRest: A Collaborative Data Catalog with Fine Grain Access Control. Karl Czajkowski, Carl Kesselman, Robert Schuler

2:30pm - 3:00pm: Safe Collections and Stewardship on Cloud Kotta. Yadu Babuji, Kyle Chard, Eamon Duede and Ian Foster

3:00pm: BREAK

Keynote Speaker

Liz MacPherson

Chief Executive and Government Statistician

Statistics New Zealand

Call for Papers

Important Dates

Workshop: October 24th, 9:00am

Submission deadline: July 14th, 2017 (extended from July 3rd, 2017)

Camera-ready version due: August 23rd, 2017


Organizers:

Eamon Duede (UChicago) & Ian Foster (UChicago)

Program Committee:

Julia Lane (NYU)

Beth Plale (Indiana U.)

Adrian Thorogood (Global Alliance for Genomics and Health)

Charlie Catlett (Argonne National Lab)

Bill Howe (U. Washington)

Liz MacPherson (Statistics New Zealand)

Ari Feldman (UChicago)
