ECMLPKDD Workshop on Automating Data Science (ADS2019)

Würzburg, Germany, Friday 20 September 2019

Description

Scope

Data science is concerned with the extraction of knowledge and insight, and ultimately societal or economic value, from data. It complements traditional statistics in that its object is data as it presents itself in the wild (often complex and heterogeneous, noisy, loosely structured, biased, etc.), rather than well-structured data sampled in carefully designed studies. It also has a strong computer science focus, and is related to popular areas such as big data, machine learning, data mining and knowledge discovery. It is therefore highly relevant to the ECMLPKDD community.

Data science is becoming increasingly important with the abundance of big data, while the number of skilled data scientists is lagging. This has raised, in several contexts, the question of whether data science can be automated. First, from an artificial intelligence perspective, it is interesting to investigate whether (data) science (or portions of it) can be automated, as it is an activity currently requiring high levels of human expertise. Second, the field of machine learning has a long-standing interest in applying machine learning at the meta-level, in order to obtain better machine learning algorithms, yielding recent successes in automated parameter tuning, algorithm configuration and algorithm selection. Third, there is an interest in automating not only the model building process itself (cf. the Automated Statistician) but also the preprocessing steps (data wrangling) and the postprocessing steps (model deployment, monitoring and maintenance).

Topics

This ECMLPKDD workshop aims to bring together researchers from all areas concerned with data science in order to study whether, to what extent, and how data science can be automated. It will focus on the following Data Science topics:

Data Wrangling
Predictive Modeling
Exploratory Data Analysis
Inductive querying
Probabilistic Programming
Visual Analytics

and will aim at answering the following questions:

How can we automatically tune the parameters or configure algorithms? How can we apply this to machine learning and data science algorithms? This is related to expert / rule-based systems, information criteria, statistical learning theory, learning to learn, meta-learning, etc.
How can we assist users in their exploratory data mining tasks? Can we automate it? What type of interactivity is needed? How can we obtain models of the user and of interestingness?
How can we support the data-wrangling process? How can inductive programming techniques help? Can it be realized fully automatically? What are the limitations and opportunities?
How can we automate data-driven story-telling? How can we explain learned models to the user? To what extent can natural language be used?
Can we (partially) automate Visual Analytics? Can we automatically visualize what is of interest to the user?
What is the trade-off between automation and interaction? To what extent is automation (un)desirable?
How can probabilistic programming and inductive querying techniques be used to facilitate data science?
How can automation be married with the increasing tendency for personalization? With the impact on privacy and society of data science, are there any additional ethical issues to be taken into account?
How can the logs and other recorded information from the work of data scientists be used for assisting and automating part of their workflow?
Data Science for the expert versus for the layperson: different optimal trade-offs?

Detailed programme

10:30-11:15 Invited talk: Holger Hoos

11:15-12:30 Contributed talks (15 mins each incl. handover & questions)

The Extended Dawid-Skene Model: Fusing Information from Multiple Data Schemas. Michael P. J. Camilleri and Christopher K. I. Williams. [paper]
Automating Common Data Science Matrix Transformations. Lidia Contreras-Ochando, Cesar Ferri and Jose Hernandez-Orallo. [paper]
DeepNotebooks: Deep Probabilistic Models Construct Python Notebooks for Reporting Datasets. Claas Alexander Voelcker, Alejandro Molina, Johannes Neumann, Dirk Westermann and Kristian Kersting. [paper]
Significance of Patterns in Data Visualisations. Rafael Savvides, Andreas Henelius, Emilia Oikarinen and Kai Puolamäki. [paper]
Supervised Human-guided Data Exploration. Emilia Oikarinen, Kai Puolamäki, Samaneh Khoshrou and Mykola Pechenizkiy. [paper]

12:30-12:45 Poster spotlights (2 minutes each incl. handover)

Meta-learning Based Evolutionary Clustering Algorithm. Dmitry Tomp, Sergey Muravyov, Andrey Filchenkov and Vladimir Parfenov. [paper]
Hyperboost: Hyperparameter optimization by gradient boosting surrogate models. Jeroen van Hoof and Joaquin Vanschoren. [paper]
Overview and unifying conceptualization of Automated Machine Learning. Zhengying Liu, Zhen Xu, Meysam Madadi, Julio Jacques Junior, Sergio Escalera, Shangeth Rajaa and Isabelle Guyon. [paper]
Automating Feature Construction for Multi-View Time Series Data. Arne De Brabandere, Pieter Robberechts, Tim Op De Beéck and Jesse Davis. [paper]
HyperUCB: Hyperparameter Optimization using Contextual Bandits. Maryam Tavakol, Sebastian Mair and Katharina Morik. [paper]
AutoxgboostMC – A system for multi-criteria AutoML. Florian Pfisterer, Stefan Coors, Janek Thomas and Bernd Bischl. [paper]
The autofeat Python Library for Automated Feature Engineering and Selection. Franziska Horn, Robert Pack and Michael Rieger. [paper]
Learning to go with the flow: on the adaptability of automated machine learning to evolving data. Bilge Celik and Joaquin Vanschoren. [paper]

Lunch break

14:00-14:45 Invited talk: Jesse Davis

14:45-15:45 Contributed talks (15 mins each incl. handover & questions)

Generic adaptation strategies for automated machine learning. Rashid Bakirov, Bogdan Gabrys and Damien Fay. [paper]
Meta-learning of textual representations. Jorge Madrid, Hugo Jair Escalante and Eduardo Morales. [paper]
Simplifying the Algorithm Selection Using Reduction of Rankings of Classification Algorithms. Salisu Abdulrahman and Pavel Brazdil. [paper]
ReinBo: Machine Learning pipeline conditional hierarchy search and configuration with Bayesian Optimization embedded Reinforcement Learning. Xudong Sun, Jiali Lin and Bernd Bischl. [paper]

15:45-16:00 Poster spotlights (2 mins each incl. handover)

Towards Automated Configuration of Stream Clustering Algorithms. Matthias Carnein, Heike Trautmann, Albert Bifet and Bernhard Pfahringer. [paper]
Learning parsers for technical drawings. Dries Van Daele, Wannes Meert, Nicholas Decleyre and Herman Dubois. [paper]
On Predictive Spreadsheet Autocompletion with Constraints. Samuel Kolb, Stefano Teso and Luc De Raedt. [paper]
The ABC of Data: A Classifying Framework for Data Readiness. Laurens A. Castelijns, Yuri Maas and Joaquin Vanschoren. [paper]
SynthLog: A Language for Synthesising Inductive Data Models. Yann Dauxais, Clément Gautrais, Anton Dries, Arcchit Jain, Samuel Kolb, Mohit Kumar, Stefano Teso, Elia Van Wolputte, Gust Verbruggen and Luc De Raedt. [paper]
Pyconstruct: Constraint Programming Meets Structured Prediction. Paolo Dragone, Stefano Teso and Andrea Passerini. [paper]
Towards Automated Technical Analysis for Foreign Exchange Data. Fabian G.B. Schut, Jan N. van Rijn and Holger H. Hoos. [paper]

Coffee break (& start poster session)

16:30-18:00 Poster session

Keynote speakers

Holger Hoos (Leiden University)
Jesse Davis (KU Leuven)

Call for Contributions

Types and format

We welcome submissions to the workshop of the following types:

1. Presentations of relevant work that has recently been published or has already been accepted for publication in journals such as DMKD, MLJ, JMLR, AIJ, JAIR, and major conferences such as SIGKDD, ICML, IJCAI, etc. The submission should in this case only consist of a copy of the other paper.
2. Long papers reporting on new material. Papers can be at most 16 pages in the Springer LNCS format. Please note that shorter papers are also welcome.
3. Extended abstracts that report on novel and preliminary ideas. Extended abstracts can be at most 6 pages in LNCS format.
4. Short position statements on automating data science, at most 6 pages in LNCS format.

Review process and LNCS proceedings

The program committee will review all submissions. It will also decide which accepted submissions can be presented orally, in spotlights and/or as posters. Authors of original accepted submissions (i.e. of types 2, 3, and 4) will be given the option of including their submission in an LNCS proceedings volume. Authors who prefer their contribution not to be formally published, so as not to preclude publication elsewhere, can opt out.

Note that for each accepted contribution at least one author must register for the conference.

Submission instructions

Submissions are made via EasyChair.

Dates

Submission deadline: ~~Friday, June 7, 2019~~ Extended: Friday, June 14, 2019
Acceptance notification: Friday, July 19, 2019
Camera-ready deadline: Monday, July 26, 2019

Organizers

Workshop chairs

Tijl De Bie (UGent, Belgium)
Luc De Raedt (KU Leuven, Belgium)
Jose Hernandez-Orallo (Universitat Politecnica de Valencia, Spain)

Programme Committee

Pavel Brazdil
Jesse Davis
Peter Flach
Cesar Ferri
Elisa Fromont
Holger Hoos
Ernesto Jimenez-Ruiz
Jefrey Lijffijt
Alfredo Nazabal
Siegfried Nijssen
Jose Oramas M.
Andrea Passerini
Maria Perez Ortiz
Bernhard Pfahringer
Padhraic Smyth
Alexandre Termier
Heike Trautmann
Gertjan van den Burg
Matthijs van Leeuwen
Joaquin Vanschoren
Chris Williams
