Full-day Tutorial at ISWC 2013, Sydney, Australia

SLIDES AND SOURCE CODE AVAILABLE AT: https://github.com/maribelacosta/crowdsourcing-tutorial

Abstract

The Semantic Web has traditionally found itself in a situation in which a relatively small number of actors are responsible for publishing a significant amount of structured content on the Web to facilitate the development of novel services and applications. In recent times this process has been fundamentally democratized as social computing ideas and tools have become pervasive. In this new scenario, it is the ‘crowd’ that drives content provisioning and maintenance – an open, ever-growing group of individuals and organizations collaborating to build and curate large knowledge bases such as DBpedia, Freebase, and Wikidata, defining links between these sources, or describing digital artifacts in terms of ontological concepts and properties. Microtask crowdsourcing platforms, as one of the most popular instances of social computing technologies, are increasingly used to support such massively collaborative projects on semantic content management. This can be observed, among other things, in the increasing number of publications and development projects showcasing how specific Semantic Web problems such as entity extraction and linking, ontology mapping, semantic annotation, conceptual modeling, or query resolution and processing could be approached by assemblies of scalable automatic components and high-quality human components.

In this tutorial we will introduce the most popular approaches to microtask crowdsourcing for Semantic Web problems, as a means to realize such hybrid content management architectures. We will explain the core notions and technologies, including Amazon Mechanical Turk, CrowdFlower, and purpose-built tools building upon the functionality of these platforms. We will address questions related to quality assurance, resource management, and workflow design, and discuss a series of technical and socio-economic challenges and open issues related to the application of microtask crowdsourcing in given Semantic Web scenarios.

This tutorial is supported by SemData. SemData is a four-year project, started in October 2013, funded under the International Research Staff Exchange Scheme (IRSES) of the EU Marie Curie Actions.

Organizers

Gianluca Demartini

  • Institution: eXascale Infolab, University of Fribourg, Switzerland
  • Email: gianluca.demartini[at]unifr.ch 

Elena Simperl

  • Institution: Web Science and Internet Group, University of Southampton, United Kingdom
  • Email: e.simperl[at]soton.ac.uk 

Maribel Acosta

  • Institution: AIFB Institute, Karlsruhe Institute of Technology, Germany
  • Email: maribel.acosta[at]kit.edu 

Schedule

Welcome (15 min)

  • Time: 09:00 – 09:15
  • Presenters: Organizers
  • Short introduction of the tutorial topic and presenters

Microtask crowdsourcing fundamentals

  • Time: 09:15 – 10:30
  • Presenters: Elena Simperl
  • Introduction of the core concepts and definitions behind microtask crowdsourcing, including task design, interface and experience design, quality assurance, resource management, incentives and motivators
  • Overview of well-known microtask crowdsourcing platforms
  • Examples of hybrid human-machine systems for semantic technologies, databases, and information retrieval

Break

  • Time: 10:30 – 11:00

Amazon’s Mechanical Turk hands-on

  • Time: 11:00 – 12:45
  • Presenters: Maribel Acosta
  • Use of the Mechanical Turk Web interface for requesters to easily crowdsource different types of microtasks
  • Use of the Mechanical Turk SDK with examples of programmatic task creation, status review, and result collection
  • Use of CrowdFlower as an alternative to Mechanical Turk

Lunch break

  • Time: 12:45 – 13:45

Microtask management and quality control

  • Time: 13:45 – 15:30
  • Presenters: Gianluca Demartini
  • Main components of a microtask crowdsourcing platform
  • Extensions for complex tasks, task design patterns, time and performance estimation, work assignment
  • Quality control, including the effect of UI design on quality, qualification tests, master workers
  • Quality assessment: majority voting, machine learning techniques, manual assessment through the crowd
  • Payment strategies

Break

  • Time: 15:30 – 16:00

Applications in semantic content management

  • Time: 16:00 – 17:00
  • Presenters: Organizers
  • Discussion of two extended examples of how to use microtask crowdsourcing for entity resolution and Linked Data curation

Current research directions in microtask crowdsourcing

  • Time: 17:00 – 17:15
  • Presenters: Gianluca Demartini
  • The future of crowd work
  • Improving the quality of work by motivating the crowd
  • Ethics
  • Crowd training and management
  • Crowd career trajectories

Wrap-up

  • Time: 17:15 – 17:30
  • Presenters: Organizers
  • Summary of the tutorial and discussion with participants on open issues and potential new applications of microtask crowdsourcing to semantic technologies