AAAI 2022

D-Vlog: Multimodal Vlog Dataset for Depression Detection

Jeewoo Yoon, Chaewon Kang, Seungbae Kim, and Jinyoung Han

Abstract

Detecting depression based on non-verbal behaviors has received great attention. However, most prior work on depression detection has focused on depressed individuals in laboratory settings, which is difficult to generalize in practice, and little attention has been paid to analyzing the non-verbal behaviors of depressed individuals in the wild. In this paper, we therefore present a multimodal depression dataset, D-Vlog, which consists of 961 vlogs (around 160 hours) collected from YouTube and can be used to develop depression detection models based on the non-verbal behavior of individuals in real-world scenarios. We develop a multimodal deep learning model that uses acoustic and visual features extracted from the collected data to detect depression. The proposed model employs a cross-attention mechanism to effectively capture the relationship between acoustic and visual features and generate useful multimodal representations for depression detection. Extensive experimental results demonstrate that the proposed model significantly outperforms other baseline models. We believe our dataset and the proposed model are useful for analyzing and detecting depression based on non-verbal behavior.
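The cross-attention fusion mentioned above can be sketched as scaled dot-product attention in which queries come from one modality and keys/values from the other. The snippet below is a minimal NumPy illustration of that idea, not the authors' exact architecture; the sequence lengths and feature dimension are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats):
    """Scaled dot-product cross-attention: query_feats attends over context_feats.

    query_feats:   (T_q, d) features from one modality (e.g., acoustic)
    context_feats: (T_c, d) features from the other modality (e.g., visual)
    Returns a (T_q, d) representation of the query modality enriched
    with information from the context modality.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)  # (T_q, T_c)
    weights = softmax(scores, axis=-1)                   # attend over context frames
    return weights @ context_feats                       # (T_q, d)

# Illustrative shapes: 50 acoustic frames and 30 visual frames, 64-dim each
rng = np.random.default_rng(0)
acoustic = rng.standard_normal((50, 64))
visual = rng.standard_normal((30, 64))
fused = cross_attention(acoustic, visual)  # acoustic queries attend to visual
```

In practice such a block is applied in both directions (acoustic-to-visual and visual-to-acoustic) with learned projection matrices before the fused representations are passed to a classifier.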

Data Collection and Annotation Process

Data Collection

We aim to build a dataset containing balanced numbers of depression and non-depression vlogs, thereby helping a model learn depression-related features that are distinct from non-depression features. To this end, we first collected YouTube videos posted between January 1, 2020 and January 31, 2021 (13 months) using search keywords.

To label the collected videos as depression or non-depression vlogs, we recruited four annotators. We educated the annotators on the annotation criteria, along with sample videos, to ensure a similar level of annotation quality across all vlogs. The details of the data collection methods and the proposed model are described in our paper "D-Vlog: Multimodal Vlog Dataset for Depression Detection," published at AAAI '22.

Data Download

To download the dataset, please fill out the following form.

After submitting the form, please send a request email to yoonjeewoo@gmail.com.

  • 961 vlogs

  • 816 subjects

  • acoustic features (npy files): 112MB

  • visual features (npy files): 597MB

  • depression/non-depression labels (csv file): 70KB
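Once downloaded, the .npy feature files and the .csv label file can be read with standard tools. A minimal loading sketch, assuming per-vlog NumPy arrays and a header row in the label file (the actual file names, array layouts, and column names ship with the dataset):

```python
import csv
import numpy as np

def load_vlog_features(acoustic_path, visual_path):
    """Load one vlog's acoustic and visual feature arrays from .npy files."""
    return np.load(acoustic_path), np.load(visual_path)

def load_labels(csv_path):
    """Read the depression/non-depression label file into a list of dict rows."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))
```

Feature arrays are typically shaped (num_frames, feature_dim), so acoustic and visual streams from the same vlog can be aligned or pooled before being fed to a model.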