
Data Request

MDD&BD Risk (NAACL 2024)

Under the supervision of a psychiatrist, three trained annotators labeled 1,025 users and their 7,346 anonymized Reddit posts using the open-source text annotation tool Doccano. During annotation, we considered two main label categories: (i) Diagnosis Type (e.g., MDD, BD) and (ii) BD Mood Level, on a scale ranging from -3 to 3. When the annotators' labels conflicted, all annotators discussed the case and reached an agreement under the supervision of the psychiatrists.

The BD dataset, clinically validated by psychiatrists, includes 14 years of posts on bipolar-related subreddits by 818 BD patients, along with annotations of future suicidality and BD symptoms.

This dataset contains suicidality severity assessments for 866 Reddit users who posted on the r/SuicideWatch subreddit from 2008 to 2015, together with their 79,569 posts across 37,083 subreddits.

These datasets contain suicide-related and non-suicide-related Korean posts from Naver Cafe, as well as suicide-related dictionary data for generating suicide word embeddings in Chinese, English, and Korean.