Invited Speakers

Quang Duong (Google Brain Medical)

Title: Medical ground-truth data: diagnosis at scale and its challenges.

Abstract: Crowdsourcing has enabled the collection, aggregation, and refinement of human knowledge and judgement in increasingly complex problem domains at scale, especially for the purpose of developing machine learning (ML) technologies. The scale of such ground-truth data generation efforts poses significant challenges to quality control, especially in the domain of medical labeling, i.e., crowdsourcing for building medical ground truth and knowledge. In this talk, we survey medicine-specific quality control problems: expertise diversity, worker fatigue, expert scarcity, uncertainty in the definition of ground truth, and labor cost. In particular, we posit that expertise diversity and worker fatigue introduce additional variability: some physicians may perform better when diagnosing certain classes of cases and/or at different times. We present our analytical findings comparing physician labelers’ work patterns with those of typical crowdsourcing workers, and quantitatively compare different methods of compensating for low-quality work.
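As a minimal illustration of one family of methods for compensating for low-quality work (not drawn from the talk itself), the sketch below aggregates the noisy labels of several annotators by plain majority vote versus a vote weighted by each annotator's estimated accuracy; the labels and accuracy values are made-up numbers.

```python
# Sketch only: two common ways to aggregate noisy annotator labels.
import numpy as np

def majority_vote(labels):
    """labels: (n_items, n_annotators) array of class ids."""
    n_classes = labels.max() + 1
    votes = np.zeros((labels.shape[0], n_classes))
    for k in range(n_classes):
        votes[:, k] = (labels == k).sum(axis=1)   # one vote per annotator
    return votes.argmax(axis=1)

def weighted_vote(labels, annotator_accuracy):
    """Weight each annotator's vote by an externally estimated accuracy."""
    n_classes = labels.max() + 1
    votes = np.zeros((labels.shape[0], n_classes))
    for j, w in enumerate(annotator_accuracy):
        for k in range(n_classes):
            votes[:, k] += w * (labels[:, j] == k)
    return votes.argmax(axis=1)

# Toy example: 4 cases, 3 annotators, binary diagnosis (hypothetical data).
labels = np.array([[1, 1, 0],
                   [0, 0, 0],
                   [1, 0, 0],
                   [1, 1, 1]])
print(majority_vote(labels))
print(weighted_vote(labels, annotator_accuracy=[0.9, 0.6, 0.5]))
```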


Phoebe Liu (Figure Eight)

Title: Crowdsourcing human behaviors for conversational social robots.

Abstract: The development of social robots that can interact with people using speech, gesture, and locomotion presents many difficult challenges, from action segmentation and scene understanding based on multimodal behavioral inputs like speech and proxemics, to planning and executing complex actions combining natural language, gesture, and navigation. Furthermore, manual programming of interaction logic can be tedious and difficult. In this talk, I will present an approach for learning-by-imitation of generative autonomous robot behaviors using training data crowdsourced from real-world human-human interactions. There are several significant challenges in doing this. First, perception and scene understanding must be performed on continuous data from sensors that often contain a high rate of noise (e.g. ASR). Second, human social behavior spans a broad action space and contains many natural variations of trajectories, utterances, and gestures. Finally, collection of live, in-situ behavior data is much more costly than online crowdsourcing, so data-efficient techniques are necessary. To address these challenges, we have developed techniques for acquiring, abstracting, and reproducing verbal and non-verbal dialog behavior from a “crowd” of live people. This approach requires no manual design, annotation, or natural-language understanding, and I will show examples of how data-driven techniques can be used to learn not only basic interaction logic, but also to capture an individual’s unique interaction style. Finally, I will discuss other field studies in which we have used data-driven techniques to enable social robots to provide guidance and services to people.


Steve Mussmann (Stanford University)

Title: Understanding the Bias and Data Efficiency of Uncertainty Sampling.

Abstract: Uncertainty sampling is perhaps the most common active learning algorithm used in practice. In this talk, we describe two phenomena observed with uncertainty sampling and the theoretical explanations we developed for them. First, given infinite samples on a given dataset, uncertainty sampling on a convex loss can converge to a wide variety of final error rates, sometimes even lower than the error of random sampling. We explain this phenomenon by showing that uncertainty sampling is implicitly optimizing the nonconvex zero-one loss. Second, uncertainty sampling with logistic regression on different datasets yields a wide range of data efficiencies (the factor reduction in the number of samples needed to reach the same error). We find, both empirically and theoretically, that this variation in data efficiency is inversely related to the error rate of the final classifier.
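For readers unfamiliar with the setup, the sketch below (not the speakers' code; synthetic data and scikit-learn's LogisticRegression are assumptions) runs uncertainty sampling and random sampling side by side and tracks test error as labels are acquired; data efficiency can then be read off by comparing how many labels each strategy needs to reach the same error.

```python
# Sketch: uncertainty sampling vs. random sampling with logistic regression.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_pool, y_pool, X_test, y_test = X[:4000], y[:4000], X[4000:], y[4000:]

def run(strategy, n_init=20, n_rounds=200):
    labeled = list(rng.choice(len(X_pool), n_init, replace=False))
    errors = []
    for _ in range(n_rounds):
        clf = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_pool[labeled])
        errors.append(1 - clf.score(X_test, y_test))
        unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
        if strategy == "uncertainty":
            # Query the point whose predicted probability is closest to 0.5.
            p = clf.predict_proba(X_pool[unlabeled])[:, 1]
            pick = unlabeled[np.argmin(np.abs(p - 0.5))]
        else:
            pick = rng.choice(unlabeled)
        labeled.append(int(pick))
    return errors

err_unc = run("uncertainty")
err_rand = run("random")
print("final error (uncertainty):", err_unc[-1])
print("final error (random):     ", err_rand[-1])
```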


Jennifer Prendki (Alectio)

Title: Labeling Quality vs. Data Quantity: Are Larger Training Sets Always Better?

Abstract: Building highly accurate Machine Learning models, especially Deep Learning models, is contingent on obtaining large volumes of data; that is why Big Data has triggered so much excitement among ML scientists. However, because most applications are still built with a supervised learning approach, those large volumes of data need to be labeled, and this is no easy task since most labeling is still done manually: this can be referred to as the “Big Data Labeling Crisis”. No labeling process can ever be 100% accurate, just as no model can be. And if the data is imperfect, what is the point of trying to reach perfect model accuracy? To counterbalance labeling imperfections in a training set, ML experts have traditionally relied on volume, since a higher amount of label noise can usually be tolerated when the dataset is larger. In short, the larger the training set, the more imperfect we can afford it to be. We present here the results of a series of studies aimed at understanding in depth the relationship between the fraction of incorrect labels and the size of the training set in the context of classification, as well as the impact of such noise on the confusion matrix. In particular, we aim to show that the labeling budget can be better spent on gathering annotations from more labelers than on labeling a larger volume of data, and that the maximum acceptable amount of noise differs from class to class. We also offer thoughts on how those findings should affect the relative amount of data across classes.
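A minimal sketch of the kind of experiment the abstract describes, on synthetic data with scikit-learn (not the speaker's actual study): vary the training-set size and the fraction of flipped labels, and observe the effect on held-out accuracy.

```python
# Sketch: training-set size vs. fraction of flipped labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_train, y_train, X_test, y_test = X[:15000], y[:15000], X[15000:], y[15000:]

for n in (500, 2000, 8000):
    for noise in (0.0, 0.1, 0.3):
        idx = rng.choice(len(X_train), n, replace=False)
        y_noisy = y_train[idx].copy()
        flip = rng.random(n) < noise          # flip roughly a `noise` fraction of labels
        y_noisy[flip] = 1 - y_noisy[flip]
        acc = LogisticRegression(max_iter=1000).fit(X_train[idx], y_noisy).score(X_test, y_test)
        print(f"n={n:5d}  label-noise={noise:.1f}  test accuracy={acc:.3f}")
```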


Yisong Yue (California Institute of Technology)

Title: Real-World Bayesian Optimization.

Abstract: Experiment design is a hallmark of virtually all research disciplines. In many settings, one important challenge is how to automatically design experiments over large action/design spaces. Furthermore, it is important for such a procedure to be adaptive, i.e., to adapt to the outcomes of previous experiments. In this talk, I will describe recent progress in using data-driven algorithmic techniques for adaptive experiment design, also known as active learning and Bayesian optimization in the machine learning community. Building upon the Gaussian process (GP) framework, I will describe case studies in personalized clinical therapy, nanophotonic structure design, protein engineering, and other real-world applications. Motivated by these applications, I will show how to incorporate real-world considerations such as safety, preference elicitation, and multi-fidelity experiment design into the GP framework, with new algorithms, theoretical guarantees, and empirical validation.
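For readers new to the topic, the sketch below shows a bare-bones GP-based Bayesian optimization loop on a toy one-dimensional objective, using scikit-learn's GaussianProcessRegressor and an upper-confidence-bound acquisition rule; the real-world extensions mentioned in the abstract (safety, preference elicitation, multi-fidelity design) are not modeled here, and the toy function is an assumption.

```python
# Sketch: Bayesian optimization with a Gaussian process surrogate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):                      # "unknown" function we want to maximize
    return np.sin(3 * x) + 0.5 * np.cos(5 * x)

rng = np.random.default_rng(0)
grid = np.linspace(0, 3, 300).reshape(-1, 1)   # candidate experiment designs
X = rng.uniform(0, 3, size=(3, 1))             # a few initial experiments
y = objective(X).ravel()

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                                  normalize_y=True).fit(X, y)
    mean, std = gp.predict(grid, return_std=True)
    ucb = mean + 2.0 * std                     # favor designs with high predicted value or high uncertainty
    x_next = grid[np.argmax(ucb)].reshape(1, 1)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print("best design found:", X[np.argmax(y)], "value:", y.max())
```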