Hackathon:

Value in Knowledge Graphs

Privacy-Preserving Information Extraction with Bloom Filters

The hackathon is part of the workshop and will take place after the paper presentations on Sunday, June 3, 2018.

Tutors

Nils Lukas (Fraunhofer FIT, RWTH Aachen University)

Dr. Oya Beyan (Fraunhofer FIT, RWTH Aachen University)

Dr. Ali Hasnain (Insight Centre for Data Analytics, National University of Ireland, Galway)



Introduction

Knowledge Graphs in the biomedical domain are especially large and may capture data on many different topics. Integrating complementary graphs yields an even richer graph that captures more information and is therefore more valuable. As part of the Personal Health Train (PHT) concept, we explore a privacy-preserving approach to predicting the value of such an integration. Being able to find suitable data sources without revealing their content makes it possible to train machine learning algorithms on those sources without compromising the data itself. As a result, the benefits of machine learning can be leveraged in domains where privacy plays a fundamental role. To make this possible, in this short hackathon we will work with Knowledge Graphs that capture biomedical data, information extraction, and Bloom filters.
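For participants new to the data structure, here is a minimal Python sketch of a Bloom filter. It assumes that a data owner inserts the entity IRIs of their Knowledge Graph and shares only the resulting bit array. The class name `BloomFilter`, the SHA-256 double-hashing scheme, the sizing helper and the example IRIs are our own illustrative choices, not the hackathon's prescribed tooling. Others can then test candidate items for membership without receiving the graph itself. A membership test may return a false positive but never a false negative.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter over strings, e.g. entity IRIs of a knowledge graph.

    Illustrative sketch only: hashing and sizing choices are assumptions.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits          # size of the bit array
        self.k = num_hashes        # number of hash positions per item
        self.bits = bytearray((num_bits + 7) // 8)

    @classmethod
    def for_capacity(cls, n: int, fp_rate: float) -> "BloomFilter":
        # Standard sizing: m = -n ln(p) / (ln 2)^2, k = (m / n) ln 2
        m = math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2))
        k = max(1, round(m / n * math.log(2)))
        return cls(m, k)

    def _positions(self, item: str) -> list[int]:
        # Double hashing: derive k positions from two 64-bit slices of one SHA-256 digest
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical usage: the data owner builds and shares the filter
owner = BloomFilter.for_capacity(n=1000, fp_rate=0.01)
for iri in ["http://example.org/drug/D001", "http://example.org/gene/BRCA1"]:
    owner.add(iri)
print("http://example.org/gene/BRCA1" in owner)  # True: inserted items always match
```

The sizing helper picks the bit-array length and number of hash functions from the expected number of items and a target false-positive rate, which is the trade-off participants will tune when valuing graphs they cannot see.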

Why should you join?

You are warmly invited to join us for this hackathon if…

  • … you are interested in combining privacy and knowledge discovery in biomedical systems
  • … you want to learn more about information extraction from Knowledge Graphs
  • … you wonder why open data needs to handle privacy issues
  • … you like practical work such as programming and experimenting and want to learn about new frameworks in this emerging field

Don’t worry if you have little or no prior knowledge; we are happy to work with both novice and advanced participants.


What will be achieved?

At the beginning of the hackathon, we will give a short introduction to the prerequisites, such as Bloom filters, general privacy issues, and the frameworks that can be used (Python or KNIME). Then, each team will be given a unique Knowledge Graph to which they can apply information retrieval techniques to gain experience with the chosen framework. Next, we will apply Bloom filters and discuss suitable metrics for valuing an unseen Knowledge Graph based on a query response that may contain false positives. Finally, each team should formulate queries for estimating the worth of an unseen Knowledge Graph and decide which other team’s Knowledge Graph best complements their own.
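One candidate metric for the valuation step is sketched below in Python, under stated assumptions. A team queries its own N entity IRIs against another team's Bloom filter and counts the hits. The owner is assumed to publish the filter parameters m (bits), k (hash functions) and n (inserted items). Each non-member reports a hit with probability p ≈ (1 - e^(-kn/m))^k, so for a true overlap t the expected hit count is t + (N - t)p, which can be inverted to estimate t. The function names and the choice to rank candidate graphs by estimated overlap fraction are hypothetical; the hackathon leaves the choice of metric open for discussion.

```python
import math


def false_positive_rate(m: int, k: int, n: int) -> float:
    """Approximate false-positive probability of a Bloom filter (m bits, k hashes, n items)."""
    return (1.0 - math.exp(-k * n / m)) ** k


def estimate_true_hits(hits: int, num_queries: int, fp: float) -> float:
    """Correct a raw hit count for expected false positives.

    E[hits] = t + (N - t) * fp  =>  t = (hits - N * fp) / (1 - fp),
    clipped to the feasible range [0, N].
    """
    t = (hits - num_queries * fp) / (1.0 - fp)
    return min(max(t, 0.0), float(num_queries))


def rank_candidates(num_queries: int,
                    candidates: dict[str, tuple[int, int, int, int]]) -> list[tuple[str, float]]:
    """Rank unseen graphs by estimated overlap fraction (hypothetical metric).

    candidates maps a graph name to (observed hits, m, k, n) for that graph's filter.
    """
    if num_queries <= 0:
        raise ValueError("num_queries must be positive")
    scores = {}
    for name, (hits, m, k, n) in candidates.items():
        fp = false_positive_rate(m, k, n)
        scores[name] = estimate_true_hits(hits, num_queries, fp) / num_queries
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With made-up numbers, `rank_candidates(100, {"A": (50, 9586, 7, 1000), "B": (5, 9586, 7, 1000)})` ranks graph A first. Depending on the goal, a team might instead prefer graphs with a small overlap (more new entities), so the sort key is a design decision rather than a fixed rule.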


What practical issues can be solved with this?

The Personal Health Train (PHT) concept allows highly specialized machine learning algorithms to be trained on private, distributed data (e.g. medical data in hospitals) without compromising the data at all. A reliable, privacy-preserving method for finding the most useful Knowledge Graph among many for training purposes is a fundamental research element of the project, and you will contribute to it in this hackathon.