The current “rising tide of scientific data” accelerates the need for e-infrastructures to support the lifecycle of data in research, from creation to reuse [RTW]. Different types of e-infrastructures address this need. Consortia like GÉANT and EGI build technical infrastructures for networking and grid computing, whereas social sciences & humanities (SSH) consortia like ESS, SHARE, CESSDA, CLARIN and DARIAH build SSH research infrastructures upon such technical infrastructures, providing services to support digital research within the social sciences and the humanities. These services vary from online survey tools and repositories for creating and preserving data to resolvers and authentication services for enabling seamless access to different services or data. Together these services form a data infrastructure. Apart from building and maintaining such a data infrastructure, important functions of research infrastructures include standardisation and engagement.
Challenges
The consortia and their infrastructures are primarily challenged by their own distributed nature. They consist of organisations that provide various types of services in different countries, with different business models and national environments, leading to different implementations of the services and standards. In addition, interdisciplinary research requires interdisciplinary reuse of data and services, which makes the challenges even greater. An important challenge is to federate similar services, or to connect complementary services, within and across the data infrastructures. A prerequisite for addressing these challenges is a common understanding of the data infrastructures, their objectives, functions and solutions. Currently these data infrastructures are often troubled by fundamental misunderstandings about concepts like data and collections, or archives and virtual research environments (VREs).
A common vision, model and architecture are instrumental in addressing these challenges. They provide the foundation for a common understanding. From the common understanding it is possible to compare the data infrastructures and identify where their components are similar and where they diverge. They provide a framework to identify and relate important aspects for central stakeholders. This allows the alignment of the technical implementations with the high-level goals.
The model addressed here is a reference model. The reference model is an abstract model that provides a common comprehension of data infrastructures, which is fundamental to both the vision and the architecture. It identifies and describes the concerns and related behaviour that the data infrastructures share and translates these into common types of information and computation. The reference model can guide the different consortia in defining these elements and corresponding policies for their data infrastructures. This allows for consistent internal development, maintenance and operation. Subsequently, the model allows a structured comparison of the data infrastructures and the identification of opportunities for developing interoperable or even shared services and standards.
This work is conducted as part of the DASISH FP7 project, which brings together the five SSH consortia to identify and address their synergies. It describes a reference model for a generic data infrastructure for social sciences and/or humanities, based on the experience of the authors and an inventory of documentation from the existing SSH research infrastructures. It utilises the framework from the Reference Model of Open Distributed Processing [RM-ODP] to allow the description of the distributed nature and to support the translation from high-level representations into more technical descriptions. The reference model is in a draft state, as it is yet to be trialled by the SSH research infrastructures. As such, this version is not yet an authoritative reference model. However, it may be used as a methodology and example for describing specific data infrastructures. The results of such descriptions should feed back into this reference model and contribute to the development of the reference architecture. The primary audience for this work is therefore those involved in defining and building data infrastructures. Secondarily, this work should help other readers to understand and discuss data infrastructures.
The document is divided into three parts. Part one provides this introduction, the background on research infrastructures and architectural methods, and a motivation for the current reference model. Part two describes the three viewpoints, namely the enterprise viewpoint, the information viewpoint and the computational viewpoint. Part three discusses the current version and how it can be applied. Appendix A provides examples of applying the reference model.
[RTW] Riding The Wave. http://cordis.europa.eu/fp7/ict/e-infrastructure/docs/hlg-sdi-report.pdf
[RM-ODP] Reference Model of Open Distributed Processing