Time: 1:00 PM - 5:00 PM, August 4, 2019
Location: Tubughneng 5 (Dena'ina Convention Center)
Slides (Google Drive shared link)
The increasing need for labeled data has fueled the rapid growth of crowdsourcing in a wide range of high-impact real-world applications, such as collaborative knowledge (e.g., data annotation, language translation), collective creativity (e.g., analogy mining, crowdfunding), and reverse Turing tests (e.g., CAPTCHA-like systems). In the context of supervised learning, crowdsourcing refers to the annotation process in which data items are outsourced to and labeled by a group of mostly non-expert online workers. Researchers and organizations are thus able to collect large amounts of information from the crowd in a short time and at low cost.
Despite the wide adoption of crowdsourcing services, several of its fundamental problems remain unsolved, especially at the information and cognitive levels, with respect to incentive design, information aggregation, and heterogeneous learning. This tutorial aims to (1) provide a comprehensive review of recent advances in exploring the power of crowdsourcing from the perspective of optimizing the wisdom of the crowd, and (2) identify the open challenges and provide insights into future trends in the context of human-in-the-loop learning. We believe this is an emerging and potentially high-impact topic in computational data science that will attract both researchers and practitioners from academia and industry.
Compared with previous tutorials and workshops on crowdsourcing, this tutorial emphasizes the following aspects: (1) the history of, and recently emerging techniques for, the truth-inference problem in crowdsourcing (i.e., inferring the ground-truth labels of crowdsourced items); (2) active learning with imperfect oracles (i.e., answering the questions of which item should be labeled next and which oracle should be queried) and heterogeneous learning with multiple labelers (i.e., multi-task and multi-view learning with crowdsourced labels); and (3) supervising crowd workers to learn and label in the form of teaching (i.e., teaching crowdsourcing workers a concept, such as labeling an image or categorizing a document). A minimal code sketch of the truth-inference setting appears below.
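To make the setting concrete, here is a small illustrative sketch (our own, not material from the tutorial slides) that infers binary ground-truth labels from a toy item-by-worker label matrix. It implements majority voting and a simplified one-coin variant of the classic Dawid-Skene EM algorithm, in which each worker is modeled by a single accuracy parameter; the `next_query` helper is a hypothetical active-learning step in the spirit of Part II. The variable names and toy data are assumptions made for illustration.

```python
import numpy as np

# Toy crowdsourced label matrix: rows = items, columns = workers.
# Entries are 0/1 labels; -1 marks an item a worker did not label.
labels = np.array([
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, -1],
])

def majority_vote(labels):
    """Baseline truth inference: per-item majority over observed labels."""
    return np.array([int(row[row >= 0].mean() >= 0.5) for row in labels])

def one_coin_em(labels, n_iters=20):
    """Simplified (one-coin) Dawid-Skene EM: each worker j has a single
    accuracy acc[j]; alternate between estimating item truths and acc."""
    n_items, n_workers = labels.shape
    truth_prob = np.full(n_items, 0.5)   # P(true label = 1) per item
    acc = np.full(n_workers, 0.7)        # initial worker accuracies
    for _ in range(n_iters):
        # E-step: posterior over each item's label given worker accuracies.
        for i in range(n_items):
            log_odds = 0.0
            for j in range(n_workers):
                if labels[i, j] < 0:
                    continue
                p = np.clip(acc[j], 1e-6, 1 - 1e-6)
                # A vote for 1 raises the odds by p/(1-p); a vote for 0 lowers it.
                log_odds += np.log(p / (1 - p)) if labels[i, j] == 1 \
                            else np.log((1 - p) / p)
            truth_prob[i] = 1.0 / (1.0 + np.exp(-log_odds))
        # M-step: re-estimate each worker's accuracy against the soft truths.
        for j in range(n_workers):
            mask = labels[:, j] >= 0
            agree = np.where(labels[mask, j] == 1,
                             truth_prob[mask], 1 - truth_prob[mask])
            acc[j] = agree.mean()
    return (truth_prob >= 0.5).astype(int), truth_prob, acc

def next_query(truth_prob, acc):
    """Toy active-learning step (Part II): query the most uncertain item
    (posterior closest to 0.5) and route it to the most accurate worker."""
    return int(np.argmin(np.abs(truth_prob - 0.5))), int(np.argmax(acc))

print("majority vote:", majority_vote(labels))
truths, probs, acc = one_coin_em(labels)
print("EM truths:", truths, "worker accuracies:", np.round(acc, 2))
print("next (item, worker) to query:", next_query(probs, acc))
```

The full Dawid-Skene model replaces the single accuracy with a per-worker confusion matrix and handles multi-class labels; Part I of the tutorial covers this and more recent truth-inference models.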
1. Introduction (15 minutes)
2. Part I: Truth Inference (85 minutes)
Coffee Break (30 minutes)
3. Part II: Learning with Crowdsourcing (50 minutes)
4. Part III: Teaching with Crowdsourcing (50 minutes)
5. Future Outlook and Open Problems (10 minutes)