NAACL 2024

Detecting Bipolar Disorder from Misdiagnosed Major Depressive Disorder with Mood-Aware Multi-Task Learning

  Daeun Lee1, Hyolim Jeon1, Sejung Son1, Chaewon Park, Jihyun An, Seungbae Kim2, Jinyoung Han*1


1Department of Applied Artificial Intelligence, Sungkyunkwan University

2Department of Computer Science & Engineering, University of South Florida

(* = corresponding author)

Abstract

Bipolar Disorder (BD) is a mental disorder characterized by intense mood swings, from depression to manic states. Individuals with BD are at a higher risk of suicide, but BD is often misdiagnosed as Major Depressive Disorder (MDD) due to shared symptoms, resulting in delays in appropriate treatment and increased suicide risk. While early intervention based on social media data has been explored to uncover latent BD risk, little attention has been paid to detecting BD from those misdiagnosed as MDD. Therefore, this study presents a novel approach for identifying BD risk in individuals initially misdiagnosed with MDD. A unique dataset, BD-Risk, is introduced, incorporating mental disorder types and BD mood levels verified by two clinical experts. The proposed multi-task learning for predicting BD risk and BD mood level outperforms the state-of-the-art baselines. Also, the proposed dynamic mood-aware attention can provide insights into the impact of BD mood on future risk, potentially aiding interventions for at-risk individuals.

Annotation Process

To label the collected Reddit dataset, we recruited three researchers who are knowledgeable in psychology and fluent in English as annotators. Under the supervision of a psychiatrist, the three trained annotators labeled 1,025 users and their 7,346 anonymized Reddit posts using the open-source text annotation tool Doccano. During annotation, we mainly considered two label categories: (i) Diagnosis Type (e.g., MDD, BD) and (ii) BD Mood Level, on a scale ranging from -3 to 3. Whenever the annotators' labels conflicted, all annotators discussed the case and reached an agreement under the supervision of the psychiatrists.
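As a rough illustration of the two label categories and the conflict rule described above, the following Python sketch models one annotator's labels for a single item and flags items that would need discussion. It is not the authors' tooling: the class and function names, the restriction of Diagnosis Type to only MDD and BD, and the assumption that mood levels are integers are all hypothetical choices made for this example.

```python
from dataclasses import dataclass

# Assumption: the text gives MDD and BD only as examples of diagnosis types;
# the real label set may be larger.
DIAGNOSIS_TYPES = {"MDD", "BD"}
# The -3 to 3 range comes from the text; integer steps are an assumption.
MOOD_MIN, MOOD_MAX = -3, 3


@dataclass(frozen=True)
class Annotation:
    """Labels assigned by one annotator to one item (hypothetical schema)."""
    annotator: str   # hypothetical annotator identifier
    diagnosis: str   # one of DIAGNOSIS_TYPES
    mood_level: int  # integer in [MOOD_MIN, MOOD_MAX]

    def __post_init__(self):
        # Reject labels outside the assumed label set or the stated scale.
        if self.diagnosis not in DIAGNOSIS_TYPES:
            raise ValueError(f"unknown diagnosis type: {self.diagnosis}")
        if not MOOD_MIN <= self.mood_level <= MOOD_MAX:
            raise ValueError(f"mood level out of range: {self.mood_level}")


def needs_discussion(annotations):
    """Return True if the annotators disagree on either label category."""
    diagnoses = {a.diagnosis for a in annotations}
    moods = {a.mood_level for a in annotations}
    return len(diagnoses) > 1 or len(moods) > 1


if __name__ == "__main__":
    labels = [
        Annotation("annotator_1", "BD", 2),
        Annotation("annotator_2", "MDD", 2),
        Annotation("annotator_3", "BD", 2),
    ]
    # The diagnosis labels differ, so this item would go to discussion.
    print(needs_discussion(labels))
```

In this sketch any disagreement, however small, sends the item to discussion, which mirrors the statement that conflicts are resolved by agreement rather than by majority vote.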

Ethical Concerns

We carefully consider potential ethical issues in this work: (i) protecting users' privacy on Reddit and (ii) avoiding potentially harmful uses of the proposed dataset. The Reddit privacy policy explicitly authorizes third parties to copy user content through the Reddit API. We follow the widely accepted social media research ethics policies that allow researchers to utilize user data without explicit consent if anonymity is protected (Benton et al., 2017; Williams et al., 2017). We did not collect any metadata that could be used to identify the author. In addition, all content was manually scanned to remove personally identifiable information and to mask all named entities. More importantly, the BD dataset will be shared only with other researchers who have agreed to the ethical use of the dataset. This study was reviewed and approved by the Institutional Review Board (SKKU2022-11-038).

How to Request Access

While it is important to ensure that all necessary precautions are taken, we are enthusiastic about sharing this valuable resource with fellow researchers. To request access to the dataset, please contact Daeun Lee (delee12@skku.edu). Access requests should follow the format of the sample application provided below, which consists of three parts:

Part 0: Download a sample application form

Part 1: Applicant Information

*Please provide the following details about yourself:

Part 2: Dataset Access Application

*Please address the following points in your application:

Part 3: Ethical Review by Your Organization

*Please attach the ethical review by your organization in your application:

The dataset was produced at Sungkyunkwan University (SKKU) in South Korea, and the research conducted on this dataset at SKKU has been granted exemption from Institutional Review Board (IRB) evaluation by SKKU's IRB (SKKU2022-11-038). This exemption applies to the analysis of pre-existing data that is publicly accessible or involves individuals who cannot be directly identified or linked to identifiable information. Nevertheless, due to the potentially sensitive nature of this data, we require that researchers who receive the data obtain ethical approval from their respective organizations.

Please submit your access request to Daeun Lee (delee12@skku.edu) and ensure that you include all the necessary information and address the points outlined in the sample application.


Dataset Availability and Governance Plan

Inspired by the data sharing system of previous research (Zirikly et al., 2019), we have decided to establish a governance process for researcher access to the dataset, following the procedure outlined below.

Due to limitations in the number of available individuals, three out of the five authors will be selected to review access requests submitted in the format specified above. The outcomes of the review will result in the following responses:

The authors will prioritize and promote diversity and inclusivity among the reviewers and the community of researchers utilizing the dataset.

Reference

Zirikly, A., Resnik, P., Uzuner, O., & Hollingshead, K. (2019, June). CLPsych 2019 Shared Task: Predicting the Degree of Suicide Risk in Reddit Posts. In Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology (pp. 24-33).

Paper URL

https://aclanthology.org/2024.naacl-long.278/


Code

https://github.com/DSAIL-SKKU/Detecting-BD-from-Misdiagnosed-MDD_NAACL_2024

If our work was helpful in your research, please kindly cite this work:

BIBTEX

@inproceedings{lee2024detecting,
  title={Detecting Bipolar Disorder from Misdiagnosed Major Depressive Disorder with Mood-Aware Multi-Task Learning},
  author={Lee, Daeun and Jeon, Hyolim and Son, Sejung and Park, Chaewon and An, Jihyun and Kim, Seungbae and Han, Jinyoung},
  booktitle={Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)},
  pages={4954--4970},
  year={2024}
}


Acknowledgments

This research was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2022S1A5A8054322), and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2023R1A2C2007625).