Data is fast becoming a crucial, if not defining, asset for researchers. Entire fields, including those new to large-scale computation (e.g., Science of Science), are quickly embracing data-driven research. However, the ever-increasing scale and complexity of data and analyses are creating new challenges for researchers in the computational sciences. Today, the need for highly specialized computation has allowed a handful of well-resourced researchers to design bespoke models, analytical tools, and metrics around powerful but highly guarded datasets, thereby balkanizing the research landscape. As a result, the returns to science and society on investment in these datasets have been comparatively low.
Without advances in collaborative, computational data infrastructure, producing and leveraging large, highly sensitive datasets will remain both overly expensive and technically challenging. Given the current funding environment and the resources allocated to, for example, SBE research, both the compute infrastructure and the ‘data science’ tools, techniques, and methods required to truly capitalize on the deluge of high-quality but sensitive data are available only to an elite few researchers. Reducing these barriers will not only democratize access to data and techniques, but also create new opportunities for deriving, obtaining, and disseminating new datasets. To address these challenges, we require research infrastructure that can safely and securely democratize access to sensitive data, massive storage and compute, and the analytical tools and techniques needed to produce transformative research.
These challenges motivate our goal in convening this full-day workshop. We aim to bring together a diverse group of experts around the motivating applications that illustrate the opportunities inherent in new approaches; the legal, policy, and ethical frameworks that define the risks associated with inappropriate data release or analyses; and the data management, data analysis, security, and other technologies that determine what the shape and scope of flexibly accessible data enclaves can and should be. Specifically, we will charge this group with defining the new principles that may govern the collection and use of proprietary or human subjects data in this new era, and with identifying promising new methods, policies, and technologies that may allow for the realization of those principles in different settings.
Our workshop’s panels and presentations will aim to address six significant challenges facing a data-driven scientific community:
9:00am - 9:15am: Welcome and Opening Remarks, Ian Foster
9:15am - 10:00am: (Invited Speaker) Mark Gahegan, Director, Center for eResearch, The University of Auckland
10:00am - 10:30am: A Review of Privacy and Consent Management in Healthcare: A Focus on Emerging Data Sources. Muhammad Rizwan Asghar, TzeHowe Lee, Mirza Mansoor Baig, Ehsan Ullah, Giovanni Russello, Gillian Dobbie
10:30am - 11:00am: BREAK
11:00am - 11:30am: Safe Double Blind Studies as a Service. Tyler J. Skluzacek, Suhail Rehman and Ian Foster.
11:30am - 12:15pm: (Invited Speaker) Sue Bridger, Principal Solution Specialist, Microsoft New Zealand
12:15pm - 1:30pm: LUNCH
1:30pm - 2:00pm: A Comparative Evaluation of Blockchain Systems for Application Sharing Using Containers. Ian Taylor, Jarek Nabrzyski and Joel Neidig
2:00pm - 2:30pm: ERMRest: A Collaborative Data Catalog with Fine Grain Access Control. Karl Czajkowski, Carl Kesselman, Robert Schuler
2:30pm - 3:00pm: Safe Collections and Stewardship on Cloud Kotta. Yadu Babuji, Kyle Chard, Eamon Duede and Ian Foster
3:00pm: BREAK
Liz MacPherson
Chief Executive and Government Statistician
Statistics New Zealand
Workshop: October 24th, 9:00am
Submission deadline: July 14th, 2017 (extended from July 3rd, 2017)
Camera ready version due: August 23rd, 2017