Safe Data: Paradigms & Platforms
A workshop on enabling collaborative analysis of sensitive data
eScience 2017 | October 24th | Auckland, New Zealand
Data is fast becoming a crucial, if not defining, asset for researchers. Entire fields, including those new to large-scale computation (e.g., the Science of Science), are quickly embracing data-driven research. However, the ever-increasing scale and complexity of data and analyses are creating new challenges for researchers in the computational sciences. Today, the need for highly specialized computation has allowed a handful of well-resourced researchers to design bespoke models, analytical tools, and metrics around powerful but highly guarded datasets, thereby balkanizing the research landscape. As a result, the returns to science and society on investment in these datasets have been comparatively low.
Without advances in collaborative, computational data infrastructure, producing and leveraging large, highly sensitive datasets will remain both overly expensive and technically challenging. Given the current funding environment and the resources allocated to, for example, SBE research, both the compute infrastructure and the ‘data science’ tools, techniques, and methods required to truly capitalize on the deluge of high-quality but sensitive data are, sadly, available only to an elite few researchers. Reducing these barriers will not only democratize access to data and techniques, but also create new opportunities for deriving, obtaining, and disseminating new datasets. Addressing these challenges requires research infrastructure that can safely and securely democratize access to sensitive data, to massive storage and compute, and to the analytical tools and techniques needed to produce transformative research.
These challenges motivate our goal in convening this full-day workshop. We aim to bring together a diverse group of experts around the motivating applications that illustrate the opportunities inherent in new approaches; the legal, policy, and ethical frameworks that define the risks associated with inappropriate data release or analyses; and the data management, data analysis, security, and other technologies that determine what the shape and scope of flexibly accessible data enclaves can and should be. Specifically, we will charge this group with defining the new principles that may govern the collection and use of proprietary or human-subjects data in this new era, and with identifying promising new methods, policies, and technologies that may allow for the realization of those principles in different settings.
Our workshop’s panels and presentations will aim to address six significant challenges facing a data-driven scientific community:
- Research data must be stored, accessed, analyzed, shared, and published in a secure manner while simultaneously ensuring that the technical restrictions do not inhibit research and innovation.
- Cutting-edge, best-practice data analytics techniques are not readily accessible to the “99%” of computational scientists. The technical “chasm” that exists between large- and small-scale research groups compounds the competitive advantage of well-resourced groups.
- Research data is increasingly recognized as a primary scientific output. To increase research impact and maximize returns on research investments, methods and data must be disseminated in forms that enable others to explore and analyze them.
- Large-scale, specialized compute resources are required to keep pace with advances in analytics and the velocity of data. While cloud providers like Amazon Web Services offer “unlimited” capacity to users, acquiring the expertise required to take advantage of that capacity remains a significant technical hurdle.
- Compute resources are often geographically and administratively distant from the storage systems on which datasets reside. Thus, costly (in both time and money) data transfers are required to move increasingly large datasets.
- Legal, policy, and ethical frameworks around handling highly sensitive data have not evolved in step with technologies. As a result, many capabilities exist that, if supported by appropriately current frameworks, could yield enormous benefits to science and society.
Beth A. Plale
Director, Data to Insight Center
Science Director, Pervasive Technology Institute (PTI)
Professor of Informatics, School of Informatics and Computing
Indiana University Bloomington
Chief Executive and Government Statistician
Statistics New Zealand
Call for Papers
We invite submissions of unpublished, original research on methods for addressing any of the six challenges listed above. Papers should be 4 to 8 pages in length, formatted according to the IEEE 8.5 × 11 manuscript guidelines: double-column text using single-spaced 10-point font on 8.5 × 11 inch pages. Templates are available from:
Papers conforming to the above guidelines can be submitted through the workshop's paper submission system:
All submissions will be peer-reviewed. Accepted papers will be published in the companion proceedings of the eScience 2017 conference.
Paper submissions due: July 3rd, 2017
Camera-ready version due: August 23rd, 2017