Contextual Emotion Annotation Using Crowdsourcing

Crowdsourcing techniques to collect contextual emotion labels in videos (KIXLab):

Dialog videos contain rich contextual, emotional, and intentional cues about the characters and their surroundings. In this project, we aim to design a crowdsourcing technique for collecting these rich emotion labels. Collection and aggregation are challenging because the temporal dimension of the data has to be considered, and the labels are multi-dimensional and can be highly subjective. We address these challenges by exploring crowdsourced workflow designs and answer aggregation methods that efficiently collect multi-dimensional labels and cope with the subjective nature of the collected annotations.
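
To illustrate the aggregation challenge, the sketch below (in Python) shows one plausible way to aggregate multi-dimensional, subjective labels over time: annotations are binned by timestamp, continuous dimensions are averaged per bin, and categorical emotion labels are resolved by majority vote. The label schema (valence/arousal plus a categorical emotion), bin size, and field names are assumptions made for illustration, not the project's actual data model.

    from collections import Counter, defaultdict
    from statistics import mean

    # Hypothetical annotation records: each worker submission is assumed to carry
    # a timestamp in seconds, continuous valence/arousal scores, and a categorical label.
    annotations = [
        {"t": 3.2, "valence": 0.7, "arousal": 0.4, "emotion": "joy"},
        {"t": 3.8, "valence": 0.6, "arousal": 0.5, "emotion": "joy"},
        {"t": 4.1, "valence": -0.2, "arousal": 0.6, "emotion": "surprise"},
    ]

    def aggregate(annotations, bin_size=5.0):
        """Aggregate per time bin: mean for continuous dimensions, majority vote for categories."""
        bins = defaultdict(list)
        for a in annotations:
            bins[int(a["t"] // bin_size)].append(a)

        aggregated = {}
        for b, items in sorted(bins.items()):
            aggregated[b] = {
                "valence": mean(a["valence"] for a in items),
                "arousal": mean(a["arousal"] for a in items),
                "emotion": Counter(a["emotion"] for a in items).most_common(1)[0][0],
            }
        return aggregated

    print(aggregate(annotations))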

Introduction: When it comes to contextual emotion annotation, existing crowdsourcing approaches to video annotation suffer from a trade-off between two key factors in task design: 1) keeping workers aware of the overall context of the video, and 2) collecting fine-grained data points at scale. For example, a task designer can assign an entire video to a single worker so that the worker annotates it with full awareness of the context, but the granularity and quantity of the annotations will be low because of the excessive workload. On the other hand, if the designer divides the task into microtasks, fine-grained data collection is possible, but awareness of the overall context of the video has to be sacrificed. These design issues can affect the quality of the collected annotations, which is a crucial factor for affective systems trained on emotion datasets.

Technical aspects:

  • Django, jQuery, JSON

  • HTML, CSS, Bootstrap (see the backend sketch below)

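As a rough illustration of how this stack could store the collected data, here is a minimal Django model sketch for time-stamped, multi-dimensional emotion annotations. The model and field names are hypothetical and not taken from the actual project code; the front end (jQuery + Bootstrap) would submit these fields as JSON to a Django view.

    # models.py -- hypothetical sketch of an annotation record in Django.
    from django.db import models

    class EmotionAnnotation(models.Model):
        """One worker's emotion label for a segment of a video clip."""
        worker_id = models.CharField(max_length=64)       # e.g., MTurk worker ID
        video_id = models.CharField(max_length=64)
        start_time = models.FloatField()                  # segment start, in seconds
        end_time = models.FloatField()                    # segment end, in seconds
        valence = models.FloatField()                     # assumed continuous dimension
        arousal = models.FloatField()                     # assumed continuous dimension
        emotion = models.CharField(max_length=32)         # assumed categorical label
        created_at = models.DateTimeField(auto_now_add=True)
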
Solution & Results: We explore the design space of context-aware emotion annotation and compare three workflow conditions by investigating the related design factors. The three conditions we experimented with on Amazon Mechanical Turk are: 1) a whole video being annotated by one worker, 2) the video being uniformly divided into microtasks distributed across multiple workers, and 3) the video being manually divided into clips with consistent local context (i.e., no drastic context change within a clip), with each microtask additionally providing a one-line global summary of the video to multiple workers. From the experimental results, we observed that the third condition produced the highest-quality annotations. We were also able to verify that microtask-based workflows enable collecting more fine-grained annotations. Lastly, we observed the potential for improving the quality of crowdsourced annotations by providing an artifact that conveys global context to workers who only see local context.
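
To make the three workflow conditions concrete, the sketch below builds the list of tasks each condition would post: one whole-video task, uniform fixed-length microtasks, and manually segmented clips that each carry a one-line global summary. The video duration, segment length, clip boundaries, summary text, and worker counts are illustrative assumptions, not the study's actual parameters.

    # Hypothetical task construction for the three workflow conditions.
    VIDEO_ID = "ep01"
    DURATION = 600.0          # assumed video length in seconds
    SEGMENT_LEN = 60.0        # assumed microtask length for condition 2
    MANUAL_CLIPS = [(0, 140), (140, 320), (320, 600)]                # assumed context-consistent clips
    GLOBAL_SUMMARY = "Two friends argue about a surprise party."     # assumed one-line summary

    def condition_1():
        """Whole video annotated by a single worker."""
        return [{"video": VIDEO_ID, "start": 0.0, "end": DURATION, "workers": 1}]

    def condition_2():
        """Uniformly divided microtasks, each given to multiple workers."""
        starts = [i * SEGMENT_LEN for i in range(int(DURATION // SEGMENT_LEN))]
        return [{"video": VIDEO_ID, "start": s, "end": min(s + SEGMENT_LEN, DURATION),
                 "workers": 3} for s in starts]

    def condition_3():
        """Manually segmented, context-consistent clips plus a one-line global summary."""
        return [{"video": VIDEO_ID, "start": float(s), "end": float(e),
                 "summary": GLOBAL_SUMMARY, "workers": 3} for s, e in MANUAL_CLIPS]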