10:10 - 10:50
Speaker: Professor Julia Stoyanovich, New York University
Abstract: Algorithmic rankers take a collection of candidates as input and produce a ranking (permutation) of the candidates as output. The simplest kind of ranker is score-based; it computes a score for each candidate independently and returns the candidates in score order. Another common kind of ranker is learning-to-rank, where supervised learning is used to predict the ranking of unseen candidates. For both kinds of rankers, we may output the entire permutation or only the k highest-scoring candidates, the top-k. Set selection is a special case of ranking that ignores the relative order among the top-k. In the past few years, there has been much work on incorporating fairness and diversity requirements into algorithmic rankers, with contributions coming from the data management, algorithms, information retrieval, and recommender systems communities. In my talk I will offer a broad perspective that connects formalizations and algorithmic approaches across subfields, grounding them in a common narrative around the value frameworks that motivate specific fairness- and diversity-enhancing interventions.
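To make the score-based setting concrete, here is a minimal sketch; the candidate data and the linear scoring function are hypothetical illustrations, not from the talk. Each candidate is scored independently of the others, the full ranking is the candidates in descending score order, the top-k is a prefix of that ranking, and set selection keeps the top-k as an unordered set.

```python
# Minimal sketch of a score-based ranker (hypothetical data and weights).
candidates = {"a": [0.9, 0.2], "b": [0.4, 0.8], "c": [0.7, 0.7]}
weights = [0.6, 0.4]  # assumed linear scoring function

def score(features):
    # Each candidate is scored independently of all other candidates.
    return sum(w * f for w, f in zip(weights, features))

# Full ranking: a permutation of the candidates, in descending score order.
ranking = sorted(candidates, key=lambda c: score(candidates[c]), reverse=True)

k = 2
top_k = ranking[:k]     # top-k: the k highest-scoring candidates, in order
selection = set(top_k)  # set selection: the same candidates, order ignored
```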
11:00 - 11:40
Speaker: Dr. Behzad Golshan
Abstract: Creating and collecting labeled data is one of the major bottlenecks in machine learning pipelines, and the emergence of large-scale deep neural models, which typically require a lot of training data, has further exacerbated the problem. While weak-supervision techniques can circumvent this bottleneck, existing frameworks either require experts to write a set of diverse, high-quality rules to label data, or require a labeled subset of the data to automatically mine rules. The process of manually writing rules can be tedious and time-consuming. At the same time, creating a labeled subset of the data can be costly and even infeasible in imbalanced settings. In this talk, we present Darwin, an interactive system designed to alleviate the task of writing rules for labeling text data in weakly supervised settings. Given an initial labeling rule, Darwin automatically generates a set of candidate rules for the labeling task at hand, and utilizes the annotator's feedback to adapt the candidate rules. We describe how Darwin operates over large text corpora (e.g., more than 1 million sentences) and supports a wide range of labeling functions (i.e., any function that can be specified using a context-free grammar). Finally, we demonstrate with a suite of experiments how Darwin enables annotators to generate weakly supervised labels efficiently and at low cost.
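The abstract does not spell out Darwin's actual interface, so the sketch below only illustrates the general shape of the workflow it builds on: a seed labeling rule, machine-generated candidate rules, and an annotator feedback step that keeps or discards candidates. The rules, corpus, and feedback oracle here are all hypothetical.

```python
import re

# Hypothetical seed rule: label a sentence POSITIVE if it matches a pattern.
def seed_rule(sentence):
    return "POSITIVE" if re.search(r"\bexcellent\b", sentence, re.I) else None

# Hypothetical candidate rules a system might generate by varying the seed.
candidate_rules = [
    lambda s: "POSITIVE" if re.search(r"\bgreat\b", s, re.I) else None,
    lambda s: "POSITIVE" if re.search(r"\bterrible\b", s, re.I) else None,  # bad rule
]

corpus = ["An excellent result.", "A great talk.", "A terrible bug."]

# Stand-in for annotator feedback: accept or reject a proposed labeling.
def annotator_accepts(sentence, label):
    return "terrible" not in sentence.lower() or label != "POSITIVE"

# Keep only the candidate rules whose labelings the annotator accepts.
kept = []
for rule in candidate_rules:
    labeled = [(s, rule(s)) for s in corpus if rule(s) is not None]
    if all(annotator_accepts(s, lbl) for s, lbl in labeled):
        kept.append(rule)

# Weak labels produced by the seed rule plus the surviving candidates.
weak_labels = {s: next((r(s) for r in [seed_rule] + kept if r(s)), None)
               for s in corpus}
print(weak_labels)
```

In this toy run the second candidate rule is rejected because the annotator refuses its labeling of the "terrible" sentence, while the first is kept and extends the seed rule's coverage; an interactive system amortizes exactly this kind of accept/reject feedback across many machine-proposed rules.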