Biological Assay Digitalization

About

In the biotechnology and biomedical domains, recent text mining efforts advocate machine-interpretable, and preferably semantified, documentation formats for laboratory processes. These include wet-lab protocols, (in)organic materials synthesis reactions, and genetic manipulations and procedures, documented in a form that enables faster computer-mediated analysis and prediction. In this project, we devised a representation of semantified bioassays in the Open Research Knowledge Graph (ORKG). In particular, we developed a semantification system that automatically and quickly generates the critical mass of semantified bioassay data needed to attract a consistent audience of users who adopt the ORKG for recording their bioassays, and to facilitate the organisation of research according to the FAIR principles.

The ORKG-Assays Microservice for Biological Assay Semantification

A bioassay is, by definition, a standard biochemical test procedure used to determine the concentration or potency of a stimulus (physical, chemical, or biological) by its effect on living cells or tissues. Toward machine-interpretability, a bioassay description can be represented as fine-grained semantified triples using the Bioassay Ontology (BAO). The BAO defines the main information categories, such as perturbagen, format, design, detection technology, meta target, and endpoint, that must be captured for the representation to be semantically meaningful. It also imports other ontologies, such as the Cell Line Ontology (CLO), the Gene Ontology (GO), and the NCBI Taxonomy.
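To make the idea of "fine-grained semantified triples" concrete, here is a minimal sketch in plain Python. It is an illustration only: the `Triple` type, the `semantify` helper, the `assay:example-1` identifier, the "has ..." predicate naming, and the annotation values are all hypothetical and are not taken from the BAO or from the ORKG-Assays service.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """One (subject, predicate, object) statement about a bioassay."""
    subject: str
    predicate: str
    obj: str


def semantify(assay_id: str, annotations: Dict[str, List[str]]) -> List[Triple]:
    """Flatten category -> values annotations into one triple per value."""
    triples = []
    for category, values in annotations.items():
        predicate = "has " + category  # hypothetical predicate naming
        for value in values:
            triples.append(Triple(assay_id, predicate, value))
    return triples


# Hypothetical annotations for a single assay; real ones would use BAO terms.
example = {
    "assay format": ["cell-based format"],
    "detection technology": ["luminescence"],
    "endpoint": ["IC50"],
}

triples = semantify("assay:example-1", example)
for triple in triples:
    print(triple.subject, "|", triple.predicate, "|", triple.obj)
```

In a real setting, each predicate and object would be an ontology term identifier rather than a free-text string, so that descriptions from different laboratories become directly comparable.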

To this end, the ORKG-Assays microservice supports the rapid assimilation of digitalized knowledge in the scholarly data domain of biological assays (bioassays) with the help of an AI-based semantification service that uses a clustering approach. The coronavirus pandemic has highlighted the importance of advancing the drug development research lifecycle, for which bioassays are crucial; hence, we addressed the automatic semantification of this domain.
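The text above does not specify which clustering algorithm the service uses. The following sketch only illustrates the general idea of clustering-based semantification: group similar assay descriptions, then suggest to unlabelled descriptions the labels already known within their cluster. Jaccard token overlap, the greedy single-pass clustering, the 0.5 threshold, and all example texts and labels are assumptions chosen for brevity, not the authors' method.

```python
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens of a description."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(texts: List[str], threshold: float = 0.5) -> List[List[int]]:
    """Greedy clustering: join the first cluster whose seed text is similar enough."""
    clusters: List[List[int]] = []
    for i, text in enumerate(texts):
        for members in clusters:
            seed = texts[members[0]]
            if jaccard(tokens(text), tokens(seed)) >= threshold:
                members.append(i)
                break
        else:  # no sufficiently similar cluster found: start a new one
            clusters.append([i])
    return clusters


def propagate(clusters: List[List[int]], known: Dict[int, Set[str]]) -> Dict[int, Set[str]]:
    """Suggest, for each unlabelled text, the labels pooled from its cluster."""
    suggested: Dict[int, Set[str]] = {}
    for members in clusters:
        pooled: Set[str] = set()
        for i in members:
            pooled |= known.get(i, set())
        for i in members:
            if i not in known:
                suggested[i] = set(pooled)
    return suggested


# Hypothetical descriptions; only the first one has been annotated.
texts = [
    "luciferase reporter assay in HEK293 cells",
    "luciferase reporter assay in HeLa cells",
    "binding assay with purified enzyme",
]
known = {0: {"detection technology: luminescence"}}

clusters = cluster(texts)
suggested = propagate(clusters, known)
print(clusters)
print(suggested)
```

Suggestions produced this way are candidates for a curator to confirm or reject, which is how a semantification service can speed up annotation without replacing expert review.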

Jennifer D'Souza, Anita Monteverdi, Muhammad Haris, Marco Anteghini, Kheir Eddine Farfar, Markus Stocker, Vitor AP Santos, and Sören Auer. The Digitalization of Bioassays in the Open Research Knowledge Graph. arXiv preprint arXiv:2203.14574 (2022). The official citation for our accepted paper in the DEXA 2022 proceedings is forthcoming.

Marco Anteghini, Jennifer D'Souza, Vitor AP Santos, and Sören Auer. Easy Semantification of Bioassays. arXiv preprint arXiv:2111.15182 (2021). The official citation for our accepted paper in the AIxIA 2021 proceedings is forthcoming.

Early Concept Development

Here we present our early development work on the representation of semantified bioassays in the Open Research Knowledge Graph (ORKG). In particular, we describe a work-in-progress, transformer-model-based semantification system that automatically and quickly generates the critical mass of semantified bioassay data needed to attract a consistent audience of users who adopt the ORKG for recording their bioassays, and to facilitate the organisation of research according to the FAIR principles.

Marco Anteghini, Jennifer D'Souza, Vitor AP Martins dos Santos, and Sören Auer. Representing semantified biological assays in the Open Research Knowledge Graph. In International Conference on Asian Digital Libraries, pp. 89-98. Springer, Cham, 2020. [Preprint available at https://arxiv.org/abs/2009.07642]

Marco Anteghini, Jennifer D'Souza, Vitor AP Martins dos Santos, and Sören Auer. SciBERT-based Semantification of Bioassays in the Open Research Knowledge Graph. In Proceedings of the EKAW 2020 Posters and Demonstrations Session co-located with 22nd International Conference on Knowledge Engineering and Knowledge Management (EKAW 2020). Aachen: RWTH, 2020. A demonstration video of our work-in-progress concept is shown below.

Project Beginnings: #EUvsVirus Hackathon

Back in 2020, at the height of the COVID-19 pandemic, hackathons were being organized at both global and member-state level. The European Commission, in close collaboration with EU member states, hosted a pan-European hackathon to connect civil society, innovators, partners, and buyers across Europe and to develop innovative solutions to the coronavirus. We participated as a four-member team and proposed the idea of the digitalized publishing of biological assays. Our team submission was promoted in the following video. Check it out!