Time: 1:00 PM - 5:00 PM, August 4, 2019
Location: Tubughneng 5 (Dena'ina Convention Center)
Slides (Google Drive shared link)
The increasing need for labeled data has fueled the rapid growth of crowdsourcing in a wide range of high-impact real-world applications, such as collaborative knowledge (e.g., data annotation, language translation), collective creativity (e.g., analogy mining, crowdfunding), and reverse Turing tests (e.g., CAPTCHA-like systems). In the context of supervised learning, crowdsourcing refers to the annotation process in which data items are outsourced to and labeled by a group of mostly non-expert online workers. Researchers and organizations are thus able to collect large amounts of information from the crowd in a short time and at low cost.
Despite the wide adoption of crowdsourcing services, several of its fundamental problems remain unsolved, especially at the information and cognitive levels, with respect to incentive design, information aggregation, and heterogeneous learning. This tutorial aims to (1) provide a comprehensive review of recent advances in exploring the power of crowdsourcing from the perspective of optimizing the wisdom of the crowd, and (2) identify the open challenges and provide insights into future trends in the context of human-in-the-loop learning. We believe this is an emerging and potentially high-impact topic in computational data science that will attract both researchers and practitioners from academia and industry.
Compared with previous tutorials and workshops on crowdsourcing, this tutorial emphasizes the following aspects: (1) the history of, and recently emerging techniques for, the truth-inference problem in crowdsourcing (i.e., inferring the ground-truth labels of crowdsourced items); (2) active learning with imperfect oracles (i.e., answering the questions of which item should be labeled next and which oracle should be queried) and heterogeneous learning with multiple labelers (i.e., multi-task and multi-view learning with crowdsourced labels); and (3) supervising crowd workers to learn and label in the form of teaching (i.e., teaching crowdsourcing workers a concept, such as labeling an image or categorizing a document). A minimal code sketch of the truth-inference setting appears below.
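To make the setting concrete, here is a small illustrative sketch (our own, not material from the tutorial slides) that infers binary ground-truth labels from a toy item-by-worker label matrix. It implements majority voting and a simplified one-coin variant of the classic Dawid-Skene EM algorithm, in which each worker is modeled by a single accuracy parameter; the `next_query` helper is a hypothetical active-learning step in the spirit of Part II. The variable names and toy data are assumptions made for illustration.

```python
import numpy as np

# Toy crowdsourced label matrix: rows = items, columns = workers.
# Entries are 0/1 labels; -1 marks an item a worker did not label.
labels = np.array([
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, -1],
])

def majority_vote(labels):
    """Baseline truth inference: per-item majority over observed labels."""
    return np.array([int(row[row >= 0].mean() >= 0.5) for row in labels])

def one_coin_em(labels, n_iters=20):
    """Simplified (one-coin) Dawid-Skene EM: each worker j has a single
    accuracy acc[j]; alternate between estimating item truths and acc."""
    n_items, n_workers = labels.shape
    truth_prob = np.full(n_items, 0.5)   # P(true label = 1) per item
    acc = np.full(n_workers, 0.7)        # initial worker accuracies
    for _ in range(n_iters):
        # E-step: posterior over each item's label given worker accuracies.
        for i in range(n_items):
            log_odds = 0.0
            for j in range(n_workers):
                if labels[i, j] < 0:
                    continue
                p = np.clip(acc[j], 1e-6, 1 - 1e-6)
                # A vote for 1 raises the odds by p/(1-p); a vote for 0 lowers it.
                log_odds += np.log(p / (1 - p)) if labels[i, j] == 1 \
                            else np.log((1 - p) / p)
            truth_prob[i] = 1.0 / (1.0 + np.exp(-log_odds))
        # M-step: re-estimate each worker's accuracy against the soft truths.
        for j in range(n_workers):
            mask = labels[:, j] >= 0
            agree = np.where(labels[mask, j] == 1,
                             truth_prob[mask], 1 - truth_prob[mask])
            acc[j] = agree.mean()
    return (truth_prob >= 0.5).astype(int), truth_prob, acc

def next_query(truth_prob, acc):
    """Toy active-learning step (Part II): query the most uncertain item
    (posterior closest to 0.5) and route it to the most accurate worker."""
    return int(np.argmin(np.abs(truth_prob - 0.5))), int(np.argmax(acc))

print("majority vote:", majority_vote(labels))
truths, probs, acc = one_coin_em(labels)
print("EM truths:", truths, "worker accuracies:", np.round(acc, 2))
print("next (item, worker) to query:", next_query(probs, acc))
```

The full Dawid-Skene model replaces the single accuracy with a per-worker confusion matrix and handles multi-class labels; Part I of the tutorial covers this and more recent truth-inference models.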
1. Introduction (15 minutes)
2. Part I: Truth Inference (85 minutes)
Coffee Break (30 minutes)
3. Part II: Learning with Crowdsourcing (50 minutes)
4. Part III: Teaching with Crowdsourcing (50 minutes)
5. Future Outlook and Open Problems (10 minutes)