Workshop on Data Engineering for Data Science (DE4DS)
Ralf Schenkel, Universität Trier | Ansgar Scherp, Universität Ulm
Workshop 3
✳ Tue, 7th
☷ 09:00 - 12:30
☷ 13:30 - 15:00
☉ APB E006
Data engineering is a crucial part of any data science project: data collection and metadata management are prerequisites of any meaningful analysis and, in practice, take up the bulk of the time spent in data science projects. This workshop is an initiative of the DBIS working group "Data Engineering for Data Science". We intend to provide a venue for discussions, interactions, and collaborations on the potential of data management research for data science projects.
09:00 - 10:00 (Room APB E001!)
Keynote by Markus Stocker: Machine actionable scientific information: State of the art and the road ahead
10:30 - 11:30 Provenance
W-3 1 [↗PDF] A Provenance Management Framework for Knowledge Graph Generation in a Web Portal (Kleinsteuber, Erik; Babalou, Samira; König-Ries, Birgitta)
W-3 2 [↗PDF] MLProvLab: Provenance Management for Data Science Notebooks (Kerzel, Dominik; König-Ries, Birgitta; Samuel, Sheeba)
W-3 3 [↗PDF] Recursive SQL and GPU-Support for In-Database Machine Learning (Schüle, Maximilian Emanuel)
11:30 - 12:30 Fairness, Responsibility
W-3 4 [↗PDF] VERIFAI - A Step Towards Evaluating the Responsibility of AI-Systems (Göllner, Sabrina; Tropmann-Frick, Marina)
W-3 5 [↗PDF] FAIR is not enough – A Metrics Framework to ensure Data Quality through Data Preparation (Restat, Valerie; Klettke, Meike; Störl, Uta)
W-3 6 [↗PDF] Predictive Maintenance for the Optical Synchronization System of the European XFEL: A Systematic Literature Survey (Grünhagen, Arne; Tropmann-Frick, Marina; Eichler, Annika; Fey, Görschwin)
13:30 - 15:00 Mining
W-3 7 [↗PDF] Data Extraction for Associative Classification using Mined Rules in Pediatric Intensive Care Data (Das, Pronaya Prosun)
W-3 8 [↗PDF] Reliable Rules for Relation Extraction in a Multimodal Setting (Engelmann, Björn; Schaer, Philipp)
W-3 9 [↗PDF] Workload Prediction for IoT Data Management Systems (Burrell, David; Chatziliadis, Xenofon; Tzirita Zacharatou, Eleni; Zeuch, Steffen; Markl, Volker)
W-3 10 [↗PDF] SportsTables: A new Corpus for Semantic Type Detection (Langenecker, Sven; Sturm, Christoph; Schalles, Christian; Binnig, Carsten)
We call for papers that report on novel contributions in this area, such as:
Interplay between data engineering and data science.
Dedicated database and dataflow architectures.
Managing data and event streams.
Scalable data processing in data science.
Managing metadata in data science projects.
Data provenance in data science projects.
Reproducibility and replicability of data analysis.
Knowledge discovery in data science applications.
Data and information visualization.
Data and information flow engineering and management.
Privacy-preserving data, information, and information systems.
Impact of data quality and data preprocessing on the fairness of ML predictions.
Data management in machine learning applications.
Definition, execution and optimization of complex data science pipelines.
Dataset preprocessing and data integration for data science projects.
Handling of non-i.i.d. data, such as graphs, in data science projects.
Development of dedicated benchmarks for evaluating data engineering solutions.
Reports about data science projects.
We also call for short reports on work in progress, applications, and tools (e.g., interesting use cases, problems, data sets, benchmarks, visionary ideas, system designs, and descriptions of system components and tools).
In addition to these regular submissions, we also welcome presentations of papers already accepted (and possibly already presented) at major database conferences or journals, as long as they are a good match for the topics.
Submission format
Regular Papers: up to 12 pages ↗ LNI, plus unlimited references
Short reports on work-in-progress, applications, and tools: up to 6 pages LNI, plus unlimited references
Presentations of papers accepted at major database conferences and journals: 1 page LNI with the abstract and a reference to the original paper
All accepted papers will be presented at the workshop. Regular papers and short reports (but not papers accepted at major database conferences and journals) will be published in the proceedings of the BTW conference. The authors of a selection of the best workshop papers will be invited to submit an extended article to Datenbank Spektrum.
Organization
Ralf Schenkel, Universität Trier
Ansgar Scherp, Universität Ulm