What are the key enablers of trust within a dataspace?
Data governance
Establishes rules and policies for data management, ensuring data quality, security, and compliance with relevant regulations.
Participant roles and responsibilities
Defines the roles of different participants (e.g. data owners, data consumers, intermediaries) and their responsibilities within the dataspace.
Access and usage policies
Specifies who can access data, under what conditions, and how data can be used.
Compliance and certification
Ensures participants and components comply with standards and regulations through certification processes and regular audits.
Trust framework
A set of rules that builds and maintains trust among participants by specifying machine-actionable identity management, authentication, and authorisation mechanisms. According to the International Data Spaces Association (IDSA), “[a] trust anchor is an entity that issues certifications about an attribute. The accompanying trust framework is the set of rules imposed by the trust anchor to comply with its policies.”
Dispute resolution
Provides mechanisms, operated through the clearing house, for resolving conflicts and disputes between participants regarding data usage and compliance.
Monitoring and auditing
Implements continuous monitoring and auditing to ensure compliance with policies and detect any breaches or misuse of data.
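The trust anchor described under the trust framework entry above can be illustrated with a small sketch: one entity certifies an attribute of a participant, and anyone relying on that certification checks it before trusting the claim. This is a minimal illustration only. The function names, the `security_profile` attribute, and the shared secret are all hypothetical, and the HMAC signature is a stand-in for the asymmetric signatures and certificates a real trust framework would more likely use.

```python
import hashlib
import hmac
import json

# Hypothetical secret held by the trust anchor (assumption for this sketch).
# A production trust framework would typically use asymmetric keys instead.
ANCHOR_KEY = b"demo-only-secret"


def _canonical(claim: dict) -> bytes:
    """Serialise a claim deterministically so signatures are reproducible."""
    return json.dumps(claim, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue_certification(participant_id: str, attribute: str, value: str) -> dict:
    """Trust anchor side: certify one attribute of one participant."""
    claim = {"participant": participant_id, "attribute": attribute, "value": value}
    signature = hmac.new(ANCHOR_KEY, _canonical(claim), hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify_certification(cert: dict) -> bool:
    """Verifier side: accept the claim only if the signature still matches."""
    expected = hmac.new(ANCHOR_KEY, _canonical(cert["claim"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert["signature"])


# A genuine certification, and a copy whose attribute value was altered afterwards.
cert = issue_certification("org-a", "security_profile", "base")
tampered = {
    "claim": {**cert["claim"], "value": "advanced"},
    "signature": cert["signature"],
}
```

Here `verify_certification(cert)` succeeds while the tampered copy is rejected, which is the property that lets participants rely on attributes they did not check themselves.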
What functions, or building blocks, are required in a dataspace?
Dataspace connector
The connector provides the interface between different participants’ systems and enables data transfer. A connector provides a common way to express, negotiate, and document the rules under which data is shared, and with whom. These rules are expressed not just in plain text but also in a machine-readable and technically enforceable form.
Broker
The broker, or catalogue, acts as an intermediary that facilitates the discovery and connection of data providers and consumers. It helps participants find and access the datasets and services available within the dataspace. This is roughly equivalent to the function that ARDC Research Data Australia, AODN, or the Australian Data Archive provide.
Clearing house
The clearing house tracks and logs data exchanges, provides transparency and accountability in the usage of data, and ensures that data transactions within the dataspace are conducted according to the agreed-upon terms.
Vocabulary hub
The vocabulary hub holds the standardised set of terms, definitions, and data models used within the dataspace to ensure consistency and interoperability across different systems and organisations. This is roughly equivalent to the service that ARDC Research Vocabularies Australia provides.
Identity provider
The identity provider manages the authentication and verification of participants in the dataspace, ensuring that only trusted entities and authorised users can access and share data, thereby maintaining security and trust. This can be centralised or decentralised as part of a connector.
App store
The app store within a dataspace provides a curated selection of applications, tools, and services that participants can use to process, analyse, and interact with data. The coherent integrated approach to data management within a dataspace supports business intelligence applications, and machine learning or artificial intelligence applications. Services could include those in EcoCommons, Biosecurity Commons or Galaxy Australia.
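To make the broker (catalogue) building block more concrete, the following minimal sketch shows providers registering dataset metadata and a consumer searching it by keyword. It assumes a simple in-memory store. The class names, field names, and `example` endpoints are hypothetical and are not taken from any particular dataspace implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetEntry:
    """Metadata a provider publishes; the data itself stays with the provider."""
    dataset_id: str
    title: str
    provider: str
    keywords: frozenset[str]
    connector_endpoint: str  # hypothetical address of the provider's connector


@dataclass
class Catalogue:
    """In-memory stand-in for a broker: stores metadata only, never the data."""
    entries: dict[str, DatasetEntry] = field(default_factory=dict)

    def register(self, entry: DatasetEntry) -> None:
        self.entries[entry.dataset_id] = entry

    def search(self, keyword: str) -> list[DatasetEntry]:
        """Return entries tagged with the keyword (case-insensitive), sorted by id."""
        wanted = keyword.lower()
        return sorted(
            (e for e in self.entries.values()
             if wanted in {k.lower() for k in e.keywords}),
            key=lambda e: e.dataset_id,
        )


catalogue = Catalogue()
catalogue.register(DatasetEntry(
    "ds-1", "Coastal temperature", "provider-a",
    frozenset({"ocean", "temperature"}), "https://connector.provider-a.example",
))
catalogue.register(DatasetEntry(
    "ds-2", "Soil moisture", "provider-b",
    frozenset({"soil"}), "https://connector.provider-b.example",
))
hits = catalogue.search("Ocean")
```

A search result only tells the consumer where the provider's connector is. Access to the data is still negotiated with that connector under the provider's policies.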
What is a dataspace connector?
A dataspace connector is the service that provides access to data in accordance with the rules defined in the governance framework of that dataspace. A connector serves as the interface through which data owners and data consumers exchange data securely by enforcing access and usage policies.
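As a rough illustration of what enforcing access and usage policies can mean in practice, the sketch below evaluates a request against a machine-readable policy before any data is released. The policy fields (permitted consumers, permitted purposes, expiry date) and all names are assumptions chosen for the example. Real connectors generally use a richer policy language.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UsagePolicy:
    """Hypothetical machine-readable policy attached to a dataset."""
    allowed_consumers: frozenset[str]
    allowed_purposes: frozenset[str]
    valid_until: date


@dataclass(frozen=True)
class AccessRequest:
    consumer_id: str
    purpose: str
    requested_on: date


def evaluate(policy: UsagePolicy, request: AccessRequest) -> tuple[bool, str]:
    """Return (decision, reason); data is released only when decision is True."""
    if request.consumer_id not in policy.allowed_consumers:
        return False, "consumer is not covered by the policy"
    if request.purpose not in policy.allowed_purposes:
        return False, "purpose is not permitted"
    if request.requested_on > policy.valid_until:
        return False, "policy has expired"
    return True, "access granted"


policy = UsagePolicy(
    allowed_consumers=frozenset({"org-b"}),
    allowed_purposes=frozenset({"research"}),
    valid_until=date(2025, 12, 31),
)
granted = evaluate(policy, AccessRequest("org-b", "research", date(2025, 6, 1)))
wrong_purpose = evaluate(policy, AccessRequest("org-b", "marketing", date(2025, 6, 1)))
expired = evaluate(policy, AccessRequest("org-b", "research", date(2026, 1, 1)))
```

Returning a reason alongside the decision is a design choice made for this sketch: it gives the connector something meaningful to log and to report back to the consumer.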
There are a variety of dataspace connectors, including several popular open source connectors that vary in complexity and in the level of security they provide. Examples include the FIWARE TRUE Connector, the Prometheus-X Dataspace Connector, the Eclipse Dataspace Components (EDC) Connector, and the IDS Connector.
Sovity demo of connector and associated UI
A full-service connector provides:
Security
It includes measures like encryption, authentication, and authorisation to ensure secure data exchanges.
Interoperability
It supports various data formats and communication protocols to enable smooth integration between different participants’ systems.
Data sovereignty
It enforces data usage policies to ensure that data providers retain control over their data.
Standardised interfaces
It uses standard APIs and protocols to facilitate consistent and reliable communication between participants.
Auditability
It monitors and logs data exchanges so that compliance with regulations can be tracked.
Data connectors are secure, compartmentalised components that ensure data flows safely between participants. Compartmentalisation is similar to quarantining, but where quarantining blocks all traffic until threats are eliminated, compartmentalisation isolates only the affected components. This limits the impact of a potential breach on the rest of the system and allows controlled, monitored data exchanges to continue without complete isolation from the network.
Typically, the clearing house provides reports on data transactions, including the number of transactions. It supports transparency by logging what data has been shared and under what conditions, and whether the exchange complied with the rules.
The data connectors relay the necessary information about transactions to the clearing house to ensure adherence to governance policies and enable tracking for audits.
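The relationship between connectors and the clearing house can be sketched as an append-only log with a summary report. This is an illustrative assumption about how such a component might be structured. The record fields and class names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TransactionRecord:
    """What a connector relays about one exchange (hypothetical fields)."""
    provider: str
    consumer: str
    dataset_id: str
    policy_id: str
    compliant: bool
    logged_at: datetime


class ClearingHouse:
    """Append-only log of exchanges with a simple summary report."""

    def __init__(self) -> None:
        self._log: list[TransactionRecord] = []

    def record(self, provider: str, consumer: str, dataset_id: str,
               policy_id: str, compliant: bool) -> TransactionRecord:
        entry = TransactionRecord(provider, consumer, dataset_id, policy_id,
                                  compliant, datetime.now(timezone.utc))
        self._log.append(entry)
        return entry

    def report(self) -> dict:
        """Summarise how many exchanges happened and how many broke the rules."""
        return {
            "total_transactions": len(self._log),
            "non_compliant": sum(1 for r in self._log if not r.compliant),
            "per_dataset": dict(Counter(r.dataset_id for r in self._log)),
        }


house = ClearingHouse()
house.record("provider-a", "org-b", "ds-1", "policy-7", compliant=True)
house.record("provider-a", "org-c", "ds-1", "policy-7", compliant=False)
house.record("provider-b", "org-b", "ds-2", "policy-9", compliant=True)
summary = house.report()
```

Only metadata about each exchange is logged here. The shared data itself never passes through the clearing house in this sketch.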
How does a dataspace allow me to retain control of my restricted access data?
In many data sharing arrangements, data owners effectively lose control over their data once it leaves their infrastructure, relying on the data consumer’s internal processes to adhere to agreed terms. A dataspace mitigates this concern by ensuring that data owners retain full control over their data through automated enforcement mechanisms implemented by the dataspace connector and other associated components, such as an authentication service, a broker (catalogue), and a clearing house (Figure 2).
Additionally, before accessing any data, data consumers must become members of the dataspace and complete an onboarding process. This process ensures that they understand the applicable data usage policies and the consequences of misuse. Once onboarded, data consumers are subject to automated monitoring and compliance verification systems that continuously track data usage to ensure adherence to the established rules. Audits can also be conducted to verify that data consumers are abiding by these conditions. These measures provide data owners with greater assurance that their data remains under their control throughout its lifecycle, even after it has been shared.
If a data consumer misuses data, the provider has several options, such as:
revoking access
deleting data from the consumer’s environment
imposing fines or financial penalties
taking legal action.
Misuse can also result in reputational damage, loss of certification and increased scrutiny through additional audits.
How does a dataspace address the high costs of custom data sharing agreements?
Establishing bespoke, bilateral data sharing agreements is usually a time-consuming, highly granular, and costly process. This can create significant barriers to effective data sharing.
Dataspaces address this issue by offering a standardised approach to data governance and handling. This standardisation significantly reduces the need for crafting custom agreements and technical solutions for each data exchange. By using a common framework for data sharing agreements and interoperable secure systems, dataspaces streamline the process, lowering the time, effort, and costs associated with negotiating and enforcing these agreements. Additionally, the transparent governance provided by dataspaces ensures that all participants adhere to the established rules, further reducing the need for complex and costly bespoke arrangements.
What makes a dataspace interoperable?
Common standards are the key to interoperability. The more standardised the data, governance, semantics and infrastructure are, the easier a dataspace is to set up and manage.
Common standards include common data models, data formats, reference architectures, and communication protocols. Further, by using shared vocabularies and ontologies, dataspaces can facilitate high levels of semantic interoperability.
Dataspaces provide a unified governance structure and policies, as well as technical infrastructure such as APIs and middleware, to facilitate exchange as agreed. Decentralised, federated governance and flexible, modular infrastructure can also facilitate interoperability with less standardised organisations, although growth in connectivity is associated with increased standardisation.
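As a small illustration of semantic interoperability through a shared vocabulary, the sketch below maps two providers' local field names onto the same shared terms so that their records become directly comparable. The term names, field names, and mappings are invented for the example, and unit conversion is deliberately left out.

```python
# Hypothetical shared terms that a vocabulary hub might publish.
SHARED_VOCABULARY = {"sea_surface_temperature", "observation_time"}

# Hypothetical per-provider mappings from local field names to shared terms.
PROVIDER_MAPPINGS = {
    "provider-a": {"sst": "sea_surface_temperature", "time": "observation_time"},
    "provider-b": {"water_temp_c": "sea_surface_temperature",
                   "timestamp": "observation_time"},
}


def harmonise(provider: str, record: dict) -> dict:
    """Rename a provider's local fields to the shared vocabulary terms."""
    mapping = PROVIDER_MAPPINGS[provider]
    unknown = set(record) - set(mapping)
    if unknown:
        # Refuse to guess: unmapped fields have no agreed meaning.
        raise KeyError(f"no shared term for fields: {sorted(unknown)}")
    return {mapping[name]: value for name, value in record.items()}


record_a = harmonise("provider-a", {"sst": 18.4, "time": "2024-03-01T00:00:00Z"})
record_b = harmonise("provider-b", {"water_temp_c": 17.9,
                                    "timestamp": "2024-03-01T00:00:00Z"})
```

After harmonisation both records use the same keys, so a consumer can combine them without knowing either provider's internal naming.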
Dataspaces enable the creation of dynamic data ecosystems by fostering secure and decentralised data sharing between multiple stakeholders, including industries, governments, and research institutions. These ecosystems are built on distributed infrastructures, where data remains with the data owner and is shared only when needed, ensuring data sovereignty and trust. Dataspaces integrate participants across different domains through interoperable standards and common governance frameworks, facilitating smooth data exchange without the need for centralised control.
Figure 4. Scenarios of various data ecosystems and connected dataspaces. Source: Moller et al. 2024