OIMS

Latest news

Collaboration with Excellence in Breeding

May 23, 2019. The OIMS working group is joining forces with the team in the EiB Platform that is developing an Enterprise Breeding System (EBS). The EBS is based on different applications linked through a service layer that acts as a data control. What was missing in the framework was a standard for metadata exchange. Moreover, OIMS offers scope for communication between applications based on metadata.

Presentations and discussions

May 21, 2019. Over the next few weeks we will be organizing a series of presentations and discussions on OIMS, the philosophy behind it, the key elements and the gaps that need to be filled (also see Events): [past events are in italics]

May 23, 2019: CIMMYT HQ and Zoom
May 23, 2019: EiB-EBS team
May 28, 2019: GEMS platform, UMN, Minneapolis, St. Paul MN, USA
June 27, 2019: CGIAR metadata working group
Date TBD: CGIAR Platform for Big Data in Agriculture webinar
2020, date TBD: CCSS at Utrecht University

Interested in joining the team, or want a presentation as part of an event? Contact us.

Introduction

source: courtesy of Gideon Kruseman

Context

Increasingly, funders are requiring publicly funded research organizations to make the data they collect available as global public goods. While this is the main driver behind many open access and open data initiatives, there are other, more compelling reasons to work toward well-organized data repositories. The cost of collecting data (again) often outweighs the cost of organizing it, and once the data is well organized it can be repurposed for other research. Very often only part of the data collected is actually used in the research for which it was initially intended, so making the data accessible can create a lot of value for little money.

Good metadata is at the center of any data environment focused on preventing the data lake from turning into a data swamp.

The key to managing the data lake is good metadata. However, for that metadata to be usable it needs to be interoperable and machine-actionable. At the same time it needs to be human-readable, otherwise no one will be able to use it. We have been developing a metadata schema to do just that. OIMS is a human-readable, machine-actionable, flexible and extensible, ontology-independent metadata schema focused on making messy socio-economic data FAIRER. FAIRER stands for Findable, Accessible, Interoperable, Reusable, Ethical and responsible, and Reproducible. It builds on the FAIR data guidelines, but takes them further.

Philosophy

OIMS is a platform-independent, ontology-agnostic, machine-readable and human-intelligible, flexible and extensible metadata schema. Originally, it was intended to make the messy socio-economic data, consisting of structured, semi-structured and unstructured data with a high degree of variability and veracity, interoperable. Interoperability is a key aspect of the effort to make CGIAR open data FAIRER or FAIRRR (findable, accessible, interoperable, reusable, ethical or responsible and reproducible).

Because interoperability in the interdisciplinary and transdisciplinary setting in which we operate implies cutting across scientific disciplines, OIMS has been developed in close collaboration with biological sciences data managers.

We distinguish three types of metadata:

technical metadata
descriptive metadata
structural metadata

Principles of OIMS

Introduction

A key goal of the Community of Practice on Socio-economic Data has been the development of a machine-actionable, human-readable, platform- and ontology-independent, flexible and extensible metadata schema. The schema should make socio-economic data interoperable both with other data within the social sciences and with data from other scientific disciplines.

Design principles

Data entities

The notion of data entities is based on the fact that what we understand by data varies. At a high level of aggregation we have data collections or studies. These may have sub-collections. Within a collection we have data sets. Data sets are groupings of one or more files with a persistent identifier. The files are the tangible assets, and within those files we can have things like tables, variables and records. There is thus a hierarchy among the data entities, although not all entity types are relevant for each case we are describing.

Moreover, we distinguish between primary data entities, secondary data entities and metadata. Primary data entities are core data entities such as collections, data sets, data files, tables and variables. Secondary data entities are supporting documentation and information linked to a primary data entity. Any data entity can have metadata.

In order to understand how these data entities are related, we need to document the relationships in a standardized form.

JSON

We have opted to use the JSON format for the metadata schema, because it is flexible and can be read by both humans and machines.

More information about JSON can be found on the JSON website.
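To illustrate why JSON suits both audiences, here is a minimal Python sketch that writes a small metadata record as indented JSON text and reads it back. The key names (entity_type, title, files, variables) are assumptions made for this example and are not actual OIMS field names.

```python
import json

# Hypothetical metadata record; the key names are illustrative only.
record = {
    "entity_type": "dataset",
    "title": "Household survey, round 1",
    "files": [
        {"name": "households.csv", "variables": ["household_id", "income"]}
    ],
}

# Indented JSON text is easy for a person to read and edit.
text = json.dumps(record, indent=2)

# The same text can be parsed by a program without loss of structure.
parsed = json.loads(text)
variables = parsed["files"][0]["variables"]
```

The nested lists and objects also show the flexibility mentioned above: new keys can be added to a record without changing how existing ones are read.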

Source: Courtesy of Gideon Kruseman

Foundation

The foundation of the OIMS philosophy is a relatively small metadata schema that describes itself. It can also be used to describe any schema that describes a particular metadata schema. This means that any metadata schema can be used, as long as someone takes the effort to describe how that schema is structured in terms of the basic schema. The basic schema can thus be used to describe any other schema that describes schemas.
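The idea of a schema that describes itself can be sketched in Python as below. This is not the actual OIMS base schema: BASE_SCHEMA, CITATION_SCHEMA, the field names and the conforms helper are all hypothetical, chosen only to show the principle.

```python
# Hypothetical base schema: it lists the fields that every schema
# description must contain. Names and types are assumptions.
BASE_SCHEMA = {
    "schema_name": "base",
    "fields": [
        {"name": "schema_name", "type": "string"},
        {"name": "fields", "type": "list"},
    ],
}

# Mapping from the illustrative type labels to Python types.
TYPE_CHECKS = {"string": str, "list": list}


def conforms(document: dict, schema: dict) -> bool:
    """Return True if the document has every field the schema lists,
    with a value of the expected type."""
    for spec in schema["fields"]:
        value = document.get(spec["name"])
        if not isinstance(value, TYPE_CHECKS[spec["type"]]):
            return False
    return True


# The base schema is itself a valid schema description.
self_describing = conforms(BASE_SCHEMA, BASE_SCHEMA)

# Any other schema can be used once it is described in the same terms.
CITATION_SCHEMA = {
    "schema_name": "citation",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "authors", "type": "list"},
    ],
}
citation_is_schema = conforms(CITATION_SCHEMA, BASE_SCHEMA)

# Metadata records are then checked against the schema that describes them.
record_ok = conforms({"title": "Survey report", "authors": ["A. Author"]}, CITATION_SCHEMA)
record_bad = conforms({"title": "Survey report"}, CITATION_SCHEMA)
```

In this sketch the same conforms function serves at every level: it checks records against a schema, schemas against the base schema, and the base schema against itself.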

Some specific OIMS schemas

Data entity linkages

to be completed

Metadata schema information

to be completed

Developing an ontology to describe OIMS

Introduction

The ontology describing OIMS is being developed in collaboration with the Ontologies working group of the Big Data Platform.

Data entities

We distinguish between data entity objects (DataEntityObject) and data entity concepts (DataEntityConcepts).

More information

Contact:

Effort coordinator: Gideon Kruseman <g.kruseman@cgiar.org>

Links
