Data Request
Under the supervision of a psychiatrist, three trained annotators labeled 1,025 users and their 7,346 anonymized Reddit posts using the open-source text annotation tool Doccano. During annotation, we mainly consider two label categories: (i) Diagnosis Type (e.g., MDD, BD) and (ii) BD Mood Level, on a scale ranging from -3 to 3. If the annotators' labels conflict, all annotators discuss the case and reach an agreement under the supervision of the psychiatrists.
The BD dataset, clinically validated by psychiatrists, includes 14 years of posts on bipolar-related subreddits by 818 BD patients, along with annotations of future suicidality and BD symptoms.
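As a rough illustration of the labeling scheme described above, the sketch below shows one way a single annotation could be represented and how disagreements might be flagged for discussion. It is not the format of the released data: the class name `Annotation`, its fields, and the helper `needs_discussion` are hypothetical, and only the two label categories and the -3 to 3 mood range come from the description.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Range of the BD Mood Level scale stated in the dataset description.
MOOD_MIN, MOOD_MAX = -3, 3


@dataclass(frozen=True)
class Annotation:
    """One annotator's labels for one post (hypothetical layout)."""
    annotator_id: str                  # assumed identifier, not from the source
    post_id: str                       # assumed identifier, not from the source
    diagnosis: str                     # Diagnosis Type, e.g. "MDD" or "BD"
    mood_level: Optional[int] = None   # BD Mood Level, None if not applicable

    def __post_init__(self) -> None:
        # Reject mood levels outside the documented -3..3 scale.
        if self.mood_level is not None and not (MOOD_MIN <= self.mood_level <= MOOD_MAX):
            raise ValueError(f"mood_level must be in [{MOOD_MIN}, {MOOD_MAX}]")


def needs_discussion(annotations: Sequence[Annotation]) -> bool:
    """Return True if the annotators disagree on either label for a post."""
    labels = {(a.diagnosis, a.mood_level) for a in annotations}
    return len(labels) > 1
```

For example, three annotations of the same post with mood levels 2, 2, and 1 would be flagged by `needs_discussion`, which corresponds to the cases the annotators resolve together.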
User posts (pkl file)
This dataset contains severity assessments of suicidality for 866 Reddit users who posted on the r/SuicideWatch subreddit from 2008 to 2015, along with their 79,569 posts across 37,083 subreddits.
Suicide Dictionary (csv file): 5.6 KB
Suicide-oriented Word Embedding & Suicide Dictionary for English, Chinese, and Korean (EMNLP Findings 2020)
These datasets contain suicide-related and non-suicide-related Korean posts from Naver Cafe, as well as suicide-related dictionary data for generating suicide-oriented word embeddings in Chinese, English, and Korean.
Suicide Dictionary
Suicide-oriented Word embedding