## Master Class

The conference will be preceded by a Master Class on Tuesday June 26. The topic is *Data Science meets Combinatorial Optimization*.

For questions about the master class please contact the chair David Bergman (email).

**Master Class Schedule (Tuesday June 26)**

*08:55 - 09:00* David Bergman: Introduction to the Master Class

*09:00 - 10:00* Siegfried Nijssen: Introduction to Machine Learning and Data Mining

*10:00 - 10:45* Tias Guns: Data Mining using Constraint Programming

*10:45 - 11:15* Coffee Break

*11:15 - 12:15* Kate Smith-Miles: Instance Spaces for Objective Assessment of Algorithms and Benchmark Test Suites

*12:15 - 13:30* Lunch

*13:30 - 14:30* Bistra Dilkina: Machine Learning for Branch and Bound

*14:30 - 15:15* Elias B. Khalil: Learning Combinatorial Optimization Algorithms over Graphs

*15:15 - 15:45* Coffee Break

*15:45 - 16:45* Barry O'Sullivan: Recent Applications of Data Science in Optimization and Constraint Programming

**Speaker List and Abstracts**

- Bistra Dilkina (University of Southern California)

Branch and Bound solvers for Mixed Integer Programs (MIP) such as CPLEX, Gurobi and SCIP are used daily across different domains and industries to find solutions with optimality guarantees for NP-hard combinatorial problems. However, branch and bound involves a number of decisions that are currently addressed heuristically in solvers. Instead, the use of machine learning (ML) in this context promises to make better-informed, input-specific decisions in MIP branch-and-bound.

In this presentation, we will look at several components of the branch and bound algorithm that are amenable to data-driven decision-making. Leveraging the plethora of rich and useful data generated during the solving process, we illustrate the potential for ML in MIP on two crucial tasks in branch and bound: branching variable selection and primal heuristic selection. Empirical results show that such approaches can significantly improve the performance of a solver on both heterogeneous benchmark instances as well as homogeneous families of instances.

- Tias Guns (Vrije Universiteit Brussel)

In todays data-rich world, pattern mining techniques allow us to extract knowledge from data. However, such knowledge can take many forms and often depends on the application at hand. This calls for generic techniques that can be used in a wide range of settings. Declarative constraint-based mining offers an answer to this question.

By building on constraint programming solvers, a generic method that fits many pattern mining settings is obtained. However, there is a trade-off between generality and efficiency. This talk will discuss multiple efforts to reduce the trade-off, both at the modeling level and at the solving level.

- Elias B. Khalil (Georgia Tech)

The design of good heuristics or approximation algorithms for NP-hard combinatorial optimization problems often requires significant specialized knowledge and trial-and-error. Can we automate this challenging, tedious process, and learn the algorithms instead? In many real-world applications, it is typically the case that the same optimization problem is solved again and again on a regular basis, maintaining the same problem structure but differing in the data. This provides an opportunity for learning heuristic algorithms that exploit the structure of such recurring problems.

In this presentation, we will explore a set of recent approaches that combine deep neural networks with either supervised or reinforcement learning, with the aim of either predicting solutions or learning policies that construct solutions. A common theme arising from this research thread is that families of graph optimization instances often share structural similarities that can be discovered with machine learning, and eventually leveraged to obtain powerful specialized heuristics.

- Siegfried Nijssen (Université catholique de Louvain )

What is classification, regression, ranking, clustering, pattern mining, feature selection and probabilistic modeling? Which important variations of these problems have been studied in the literature? Which algorithms are most popular? These are the questions that this presentation intends to answer. It will provide an introduction to the main problems studied in machine learning and data mining. The specific approach that we will take is that we will treat data mining and machine learning problems as constraint satisfaction and optimization problems. This allows us to focus on the differences between the different problem settings and allows us to avoid discussing too much algorithmic detail. The presentation should give a solid basis in machine learning and data mining for the later presentations in the master class.

- Barry O'Sullivan (University College Cork)

Over the past number of years there has been a significant growth in interest in the fields of data analytics and data science in the context of operations research and constraint programming. We now often find ourselves incorporating techniques from machine learning and data mining into our optimisation tool-chains. On the other hand, viewing data science tasks through the lens of optimisation and constraint programming can create interesting opportunities to develop interesting new techniques and applications. In this class we will review some of the recent applications that sit at the intersection of data science and optimisation.

- Kate Smith-Miles (University of Melbourne)

Objective assessment of algorithm performance is notoriously difficult, with conclusions often inadvertently biased towards the chosen test instances. Rather than reporting average performance of algorithms across a set of chosen instances, we discuss a new methodology to enable the strengths and weaknesses of different algorithms to be compared across a broader generalised instance space. Initially developed for combinatorial optimisation, the methodology has recently been extended for machine learning classification, and to ask whether the UCI repository and OpenML are sufficient as benchmark test suites. Results will be presented to demonstrate: (i) how pockets of the instance space can be found where algorithm performance varies significantly from the average performance of an algorithm; (ii) how the properties of the instances can be used to predict algorithm performance on previously unseen instances with high accuracy; (iii) how the relative strengths and weaknesses of each algorithm can be visualized and measured objectively; and (iv) how new test instances can be generated to fill the instance space and offer greater insights into algorithmic power.