June 22nd: 9am - 12pm US Eastern Time

The Recent Past and Near Future of Clustering

STOC 2021

Online Workshop

Organizers:

Vincent Cohen-Addad -- Google Zürich (vcohenad@gmail.com),

Frederik Mallmann-Trenn -- King's College London (frederik.mallmann-trenn@kcl.ac.uk),

Chris Schwiegelshohn -- Aarhus University (cschwiegelshohn@gmail.com)



Description:

Partitioning a metric space into dense regions, or a graph into densely connected subgraphs, is a classic and fundamental problem arising in data analysis, machine learning, and operations research. The landscape of clustering problems has recently seen important developments, driven both by theoretical results and by fresh practical applications.


In the era of big data, machine learning, and ethical computing, clustering problems often take center stage. Classic challenges, such as computing a good clustering as quickly as possible, must now also account for additional constraints: available resources, privacy, fairness, or explainability. Many of these non-standard model assumptions and constraints have only emerged recently, breathing new life into older fields while offering challenging problems for future algorithm design.


In this workshop we aim to integrate these recent developments in both classic algorithmic results and newer models. The scope ranges from complexity results to fairness, new objective functions, and different models of computation.


Format:

The talks will be organized using the online platform Airmeet. The workshop has two 85-minute sessions.

Each session consists of two 30-minute talks, each followed by a 5-minute question period. After the two talks of a session, participants can join a 15-minute open problem session on topics related to the talks.

There will be a 5-minute break between the two sessions.



To access talks, join us in the STOC venue: http://acm-stoc.org/stoc2021/venue.html

Talk 1:

Coresets for Clustering Problems

by Melanie Schmidt


Abstract: The last two decades have seen a rise of streaming one-pass algorithms for clustering problems. One concept at the core of this success story is the so-called coreset: a small summary of the input data which approximately preserves the cost of any solution. For example, for the k-means problem, a coreset is a weighted set of points S which satisfies that for any choice of k centers from d-dimensional Euclidean space (a k-means solution), the sum of the squared distances of all points to their closest center (the k-means objective) is approximately the same for S and the original point set P. Coresets are immensely useful because a) they allow essentially any algorithm to be run on the coreset instead of the huge original set, b) they can often be computed in one pass over the data, and c) there exist very small coresets for classical clustering problems like k-means and k-median. In this talk, we review coreset results and point out directions for future research.
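For concreteness, the coreset guarantee sketched above is usually stated as follows (this formulation and the notation, with approximation parameter \varepsilon and coreset weights w, are ours rather than the talk's):

\[
\forall\, C \subset \mathbb{R}^d,\ |C| = k:\quad
(1-\varepsilon)\sum_{p \in P} \min_{c \in C}\lVert p-c\rVert^2
\;\le\; \sum_{s \in S} w(s)\,\min_{c \in C}\lVert s-c\rVert^2
\;\le\; (1+\varepsilon)\sum_{p \in P} \min_{c \in C}\lVert p-c\rVert^2 .
\]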





Melanie Schmidt is an assistant professor at the University of Cologne. She graduated from TU Dortmund with a thesis on "Coresets and streaming algorithms for the k-means problem and related clustering objectives" and spent postdoctoral periods at Carnegie Mellon University in Pittsburgh and at the University of Bonn. Her research interests lie in approximation algorithms for algorithmic data analysis, in particular clustering.



Talk 2:

Hardness of Approximation for Metric Clustering.

by Karthik C. S.

Abstract: Over the last three decades, the theory community has invested a lot of attention in understanding the efficiency of (even approximately) computing optimal clusterings. Despite this focus, our understanding of the approximability of these clustering tasks remains poor, and our understanding of their hardness of approximation has been even worse.


In this talk, I will survey the recently established conceptual framework for tackling the latter issue, the hardness of approximation of clustering in L_p metrics (and in particular the Euclidean metric), through which we are now able to significantly improve on the inapproximability bounds established more than twenty years ago.





Karthik is currently a Postdoctoral Fellow at the Courant Institute of Mathematical Sciences, New York University, hosted by Subhash Khot. He received his Ph.D. in 2019 from the Weizmann Institute of Science, where he was advised by Irit Dinur. He will be starting as an Assistant Professor at Rutgers University this Fall. His main research interests are at the intersection of complexity theory and combinatorial geometry.



Talk 3:

On Fair Clustering.


by Sara Ahmadian.


Abstract: From digital assistants to movie recommendations and self-driving cars, machine learning is behind many day-to-day interactions with technology. While learning algorithms are not inherently biased, they may pick up and amplify the bias already present in the training data. Thus a recent line of work has emerged on revising traditional algorithms, or devising new ones, to factor in fairness. In this talk, I focus on adding fairness to clustering, a fundamental problem in data mining and unsupervised machine learning. We introduce a notion of fairness that requires a bounded representation of the various groups of a sensitive feature (e.g., race or gender) in each cluster.
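As a concrete illustration (this notation is ours and not necessarily the exact definition used in the talk), bounded representation can be formalized by fixing bounds \alpha_g \le \beta_g for each group g of the sensitive feature and requiring every cluster C to satisfy

\[
\alpha_g \;\le\; \frac{|C \cap g|}{|C|} \;\le\; \beta_g .
\]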


In clustering, the goal is to organize objects into clusters such that elements in the same cluster are “similar”. There are various ways to express the similarity of objects. In the metric setting, we are given a distance measure on the objects; in the non-metric setting, we are given labels in {+, -} for pairs of objects, indicating whether the two objects are similar (label +) or not (label -). We look at fair k-center for the metric case and fair correlation clustering for the non-metric case. In addition to flat clustering, which outputs just one set of clusters, I will talk about hierarchical clustering and how the algorithms for this problem can be modified to accommodate fairness constraints.
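For reference, the standard (unconstrained) correlation clustering objective on such labeled pairs is to minimize the number of disagreements: for a partition \mathcal{C} of the objects,

\[
\mathrm{disagreements}(\mathcal{C}) \;=\; \big|\{\{u,v\} \text{ labeled } + : u,v \text{ in different clusters}\}\big|
\;+\; \big|\{\{u,v\} \text{ labeled } - : u,v \text{ in the same cluster}\}\big| .
\]

The fair variant discussed in the talk additionally imposes fairness constraints on the clusters.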





Sara Ahmadian is a research scientist in the Large-Scale Optimization research team, which is part of the broader NYC Algorithms and Optimization team at Google. Sara earned degrees in Combinatorics and Optimization (M.Math. 2010, Ph.D. 2017) from the University of Waterloo, where she was advised by Chaitanya Swamy and supported by an NSERC Fellowship. Sara received the 2017 University of Waterloo Outstanding Achievement in Graduate Studies (Ph.D.) designation for her Ph.D. thesis. She worked as a software developer for a start-up company in Waterloo after completing her Master's and before starting her Ph.D. Prior to that, she completed a Bachelor of Science in Computer Engineering at Sharif University of Technology, Tehran, Iran. Her research interests include combinatorial optimization, approximation algorithms, and the design and analysis of algorithms.



Talk 4:

Hierarchical Clustering of Graphs.

by Vaggos Chatziafratis

Abstract: Hierarchical Clustering is a popular tool in unsupervised learning that partitions a dataset in a hierarchical manner. Despite its long history and a plethora of heuristics, a principled framework for understanding its optimization properties had been missing. A few years ago, Dasgupta proposed a new objective function for this problem, spurring a flurry of work on designing new algorithms and shedding light on old ones for hierarchical clustering. In this talk, I will provide an overview of some of these developments and highlight several open questions in the area.
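For reference, Dasgupta's objective for a weighted graph G = (V, E, w) and a hierarchical clustering given by a rooted tree T whose leaves are the vertices V is (smaller is better)

\[
\mathrm{cost}(T) \;=\; \sum_{\{i,j\} \in E} w_{ij} \cdot \big|\mathrm{leaves}(T[i \vee j])\big| ,
\]

where T[i \vee j] denotes the subtree of T rooted at the lowest common ancestor of leaves i and j.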



Vaggos Chatziafratis' primary interests are in Algorithms and Machine Learning Theory. He is currently a Visiting Faculty Researcher at Google Research in New York, hosted by Mohammad Mahdian and Vahab Mirrokni, where he is part of the Algorithms and Graph Mining team. Prior to that, he received his Ph.D. in Computer Science at Stanford, where he was part of the Theory group, advised by Tim Roughgarden and co-advised by Moses Charikar. His Ph.D. thesis was on algorithms and their limitations for Hierarchical Clustering. Prior to Stanford, he received a Diploma in EECS from the National Technical University of Athens, Greece.