This chapter describes SSH data infrastructures from an information viewpoint. It starts with an explanation of the viewpoint, its language and its construction. It then describes the information types within SSH data infrastructures.
In this viewpoint, the focus is on the information itself. Representation, implementation and distribution details, as well as platform-specific execution details, are not considered, so that a common abstract model can be created for the shared information in the system. The aim of the information viewpoint is to provide all stakeholders of the reference model with a common understanding of the model of the shared information. The information viewpoint is independent of the functions that transform and manipulate the data and of the computational interfaces. This viewpoint defines a configuration of information objects, their behaviours, the actions that can be performed upon them, and the constraints on them.
Information objects model the data about entities in the real world, their state and behaviour, and their interactions with other information objects. Each information object has a type, and objects of the same type share a common set of features and behaviours.
Processing of information is modelled as information actions. Actions also have types, which characterise the actions that share the same properties. Actions cause state changes in the information objects.
The information viewpoint language also defines three kinds of schema:
Dynamic schema, allowing organisation and description in terms of behaviours and state changes of information.
Invariant schema, allowing organisation and description in terms of predicates constraining information objects.
Static schema, allowing for configuration of assertions about information objects at a point in time (a given state).
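As an illustration only, the three kinds of schema can be sketched in Python for a hypothetical dataset information object. The class name, its fields, the state names and the set of allowed changes are assumptions made for this sketch; they are not taken from the reference model.

```python
from dataclasses import dataclass


# Hypothetical information object, used only for illustration.
@dataclass
class DatasetObject:
    identifier: str
    state: str          # assumed states: "raw" or "processed"
    observations: int   # number of observations in the dataset


# Invariant schema: a predicate that must hold in every state.
def invariant(obj: DatasetObject) -> bool:
    return obj.identifier != "" and obj.observations >= 0


# Static schema: an assertion about the object at one point in time,
# here the assumed initial state.
def static_schema_initial(obj: DatasetObject) -> bool:
    return obj.state == "raw"


# Dynamic schema: the permitted state changes (assumed for this sketch).
ALLOWED_CHANGES = {("raw", "processed")}


def apply_action(obj: DatasetObject, new_state: str) -> DatasetObject:
    """Perform an information action that changes the object's state."""
    if (obj.state, new_state) not in ALLOWED_CHANGES:
        raise ValueError(f"change {obj.state} -> {new_state} is not allowed")
    changed = DatasetObject(obj.identifier, new_state, obj.observations)
    if not invariant(changed):
        raise ValueError("invariant violated")
    return changed
```

In this sketch the invariant is checked after every action, so the dynamic schema can never lead to a state that the invariant schema forbids.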
Modelling Approach
The information types were identified from an inventory of the communities. These were then abstracted to common information types that are independent of the communities. For the purpose of overview, we have grouped these information types into four categories: Data, Agent, Service and Contract.
Each of the information types is then described in terms of typical invariants, possible states and transitions between those states. However, these descriptions are illustrative, as we have not been able to derive them from the existing research infrastructures.
In this initial version, we have looked at existing communities and their behaviours, identified the information types used by these communities, and grouped them for the purpose of overview. Types within a group are likely to share common attributes and actions in the schemata.
This overview identifies the information object types and groups them for further discussion.
Observations that result from research activity are collected into a dataset, usually of similar format and structure. Concepts are used to describe the observations and datasets consistently and semantically; concepts can be part of an ontology, which is an inventory information type. Additional information about the observations is collected and created as metadata, which can be included in an inventory to facilitate discoverability and future reuse. Standards are used to ensure consistency in the definition, structure and description of the data information object types.
Data information object types can be content objects (observation, concept, and metadata), or container objects (dataset, inventory) grouping similar content object types.
Observation
An observation information object type consists of the data and metadata recorded as the result of conducting an observation.
Dataset
A dataset is a collection information object type consisting of a number of observation information objects.
Concept
A concept is an abstracted information object type that is used to describe another information object in a controlled format, for example a thesaurus term or an entity that has been formally defined.
Inventory
An inventory is a collection consisting of metadata about information objects, or concepts. Inventories are used to uniformly describe information objects to facilitate the activities of a research infrastructure.
Metadata
Metadata is additional data about an information object and can be "descriptive" or "structural" in nature. A standard is normally employed to define the form the metadata takes.
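The content and container types described above could be sketched as Python data classes. This is a minimal sketch under stated assumptions: all field names, the `find` helper and the example values are hypothetical, and the inventory is simplified to hold metadata entries only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    label: str  # e.g. a thesaurus term


@dataclass
class Metadata:
    kind: str       # "descriptive" or "structural"
    standard: str   # name of the standard defining the form (assumed field)
    concepts: List[Concept] = field(default_factory=list)


@dataclass
class Observation:
    data: str           # the recorded data (simplified to a string)
    metadata: Metadata


@dataclass
class Dataset:
    # Container object: groups observation objects.
    observations: List[Observation] = field(default_factory=list)


@dataclass
class Inventory:
    # Container object: groups metadata about information objects.
    entries: List[Metadata] = field(default_factory=list)

    def find(self, label: str) -> List[Metadata]:
        """Return the entries described by the given concept label."""
        return [m for m in self.entries
                if any(c.label == label for c in m.concepts)]


# Hypothetical usage: one observation, described by one concept.
term = Concept("migration")
meta = Metadata("descriptive", "example-standard", [term])
obs = Observation("interview transcript", meta)
dataset = Dataset([obs])
inventory = Inventory([meta])
```

Looking up the concept label in the inventory returns the metadata entry, which is how an inventory could support discoverability in this sketch.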
Typically the information actions performed upon these information object types are CRUD (Create, Read, Update and Delete) actions. Observations are often subject to processing actions such as combination, harmonisation, transformation, and transferal.
The research lifecycle determines the transitions between states. Data information objects tend to transition through the following significant state changes:
private to public
raw to processed
temporary to persistent
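The three state changes listed above can be sketched as independent dimensions of a data object's state. Treating them as independent and one-directional is an assumption of this sketch, as are the names `DataState` and `advance`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DataState:
    public: bool = False      # False = private, True = public
    processed: bool = False   # False = raw, True = processed
    persistent: bool = False  # False = temporary, True = persistent


def advance(state: DataState, dimension: str) -> DataState:
    """Move one dimension forward, e.g. "public" for private to public."""
    if dimension not in ("public", "processed", "persistent"):
        raise ValueError(f"unknown dimension: {dimension}")
    if getattr(state, dimension):
        raise ValueError(f"state is already {dimension}")
    # The state is immutable, so a changed copy is returned.
    return replace(state, **{dimension: True})
```

A research lifecycle would then correspond to a sequence of `advance` calls, for example first "processed", then "persistent", then "public".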
In the Agent category, the identified information object types are agents, groups and roles. Agents are typically human users or machines acting on behalf of human users. They can be assigned roles, which give them certain privileges. In order to give multiple agents the same role, they can become members of the same group.
These types of information objects are usually administrative and used for either authorization or for capturing provenance. For authorization, the information can become inactive, as roles can be withdrawn. For provenance such information should still be retained.
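The difference between authorization and provenance can be illustrated with a small sketch in which a withdrawn role is marked inactive instead of being deleted. The class and method names are hypothetical and chosen for this example only.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RoleAssignment:
    role: str
    active: bool = True  # a withdrawn role is kept, but marked inactive


@dataclass
class Group:
    name: str
    assignments: List[RoleAssignment] = field(default_factory=list)


@dataclass
class Agent:
    name: str
    groups: List[Group] = field(default_factory=list)
    assignments: List[RoleAssignment] = field(default_factory=list)

    def _all_assignments(self) -> List[RoleAssignment]:
        # Roles held directly plus roles obtained through group membership.
        found = list(self.assignments)
        for group in self.groups:
            found.extend(group.assignments)
        return found

    def active_roles(self) -> Set[str]:
        """Roles that currently count for authorization."""
        return {a.role for a in self._all_assignments() if a.active}

    def role_history(self) -> Set[str]:
        """All roles ever held, retained for provenance."""
        return {a.role for a in self._all_assignments()}
```

Withdrawing a role by setting `active` to `False` removes it from `active_roles()` while `role_history()` still reports it.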
In the Service category, the identified information object types are services and instances. Services can be human or automated and are described by their behaviour or interaction. Services are instantiated for a specific context, in which case the parameters of the context are described.
These information objects are used to promote the services as well as for provenance. They may change over time and may be part of contracts.
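A service and its instantiation for a specific context might be sketched as follows. The names `Service`, `ServiceInstance` and `instantiate`, and the representation of context parameters as a dictionary of strings, are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Service:
    name: str
    automated: bool   # False for a human service
    behaviour: str    # description of the behaviour or interaction


@dataclass
class ServiceInstance:
    service: Service
    # Parameters describing the context of this instance.
    context: Dict[str, str] = field(default_factory=dict)


def instantiate(service: Service, **context: str) -> ServiceInstance:
    """Create an instance of a service for a specific context."""
    return ServiceInstance(service, dict(context))
```

For example, `instantiate(Service("search", True, "full-text search"), community="example")` yields an instance whose context records the hypothetical community it serves.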
In the Contract category, the identified information object types are standards and contracts. Both describe agreements: contracts usually exist between two agents, whereas standards exist within a community.
The most important states and transitions of a contract concern its validity.
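The validity states of a contract could be sketched as a function of the date on which the contract is inspected. The state names ("pending", "valid", "expired", "revoked") and all field names are assumptions of this sketch and do not come from the reference model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Contract:
    parties: Tuple[str, str]            # the two agents
    valid_from: date
    valid_until: Optional[date] = None  # None means no end date
    revoked: bool = False

    def state(self, on: date) -> str:
        """Return the assumed validity state on the given date."""
        if self.revoked:
            return "revoked"
        if on < self.valid_from:
            return "pending"
        if self.valid_until is not None and on > self.valid_until:
            return "expired"
        return "valid"
```

In this sketch the transitions pending to valid and valid to expired happen through the passage of time, while revocation is an explicit action on the contract.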