EMNLP Findings 2020

Cross-Lingual Suicidal-Oriented Word Embedding toward Suicide Prevention

Daeun Lee1, Soyoung Park2, Jiwon Kang1, Daejin Choi3, Jinyoung Han*1


1Department of Applied Artificial Intelligence, Sungkyunkwan University

2National Assembly Research Service

3Department of Computer Science & Engineering, Incheon National University

(* = corresponding author)

Abstract

Early intervention for suicide risks using social media data has received increasing attention. Using a suicide dictionary created by mental health experts is one of the effective ways to detect suicidal ideation. However, little attention has been paid to validating whether and how the existing dictionaries for other languages (i.e., English and Chinese) can be used to predict suicidal ideation in a low-resource language (i.e., Korean) for which a knowledge-based suicide dictionary has not yet been developed. To this end, we propose a cross-lingual suicidal ideation detection model that identifies whether a given social media post includes suicidal ideation. To utilize the existing suicide dictionaries developed for other languages (i.e., English and Chinese) in word embedding, our model translates a post written in the target language (i.e., Korean) into English and Chinese, and then uses the separate suicidal-oriented word embeddings developed for English and Chinese, respectively. By applying an ensemble approach across the different languages, the model achieves high accuracy, over 87%. We believe our model is useful for assessing suicidal ideation from social media data in order to prevent potential suicide risk at an early stage.

Method

Data collection: To develop models for predicting suicidal ideation in posts written in Korean, we collected suicide-related and non-suicide-related Korean posts from Naver Cafe (http://cafe.naver.com/). To improve model performance, we further collected suicide-related dictionary data for generating suicide word embeddings for Chinese, English, and Korean, respectively. Note that all the collected data is anonymized, so no user can be identified.

Cross-lingual Suicidal Ideation Model: We propose a suicidal ideation detection model that identifies whether a given post includes suicidal ideation. To utilize the existing suicide-related dictionaries developed for other languages (i.e., English and Chinese) in word embedding, our model translates a post written in the target language (i.e., Korean) into English and Chinese, and then uses the separate word embeddings developed for English and Chinese, respectively. By applying an ensemble approach across the different languages, our proposed model produces the final prediction of suicidal ideation for the given Korean post.
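The translate-then-ensemble flow can be illustrated with a minimal sketch. It is not the released implementation: the ensemble rule is assumed here to be a simple average of per-language probabilities (soft voting), and the names `ensemble_predict`, `translators`, and `scorers` are hypothetical. The translators and classifiers are replaced by stand-in functions so the example runs on its own.

```python
from statistics import mean
from typing import Callable, Dict, Tuple

Translator = Callable[[str], str]  # hypothetical: Korean text -> translated text
Scorer = Callable[[str], float]    # hypothetical: text -> P(suicidal ideation)


def ensemble_predict(
    post_ko: str,
    translators: Dict[str, Translator],
    scorers: Dict[str, Scorer],
    threshold: float = 0.5,
) -> Tuple[bool, float, Dict[str, float]]:
    """Translate the Korean post, score it per language, and average the scores.

    Averaging (soft voting) is an assumption made for this sketch.
    """
    per_language = {}
    for lang, scorer in scorers.items():
        translated = translators[lang](post_ko)
        per_language[lang] = scorer(translated)
    average = mean(per_language.values())
    return average >= threshold, average, per_language


# Stand-ins for a translation service and two trained per-language classifiers.
translators = {"en": lambda text: "[en] " + text, "zh": lambda text: "[zh] " + text}
scorers = {"en": lambda text: 0.9, "zh": lambda text: 0.7}

flagged, score, detail = ensemble_predict("예시 게시글", translators, scorers)
print(flagged, round(score, 2), detail)
```

In practice each entry in `scorers` would wrap a classifier built on the word embedding for that language, and the averaging step could be swapped for weighted or majority voting.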

Post Attention Model: We apply the attention mechanism to reflect the important suicide-related information.
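As a rough illustration of how attention can emphasize the most informative parts of a post, the sketch below pools a sequence of token vectors using softmax weights derived from a dot product with a context vector. The exact attention formulation used in the paper is not given here, so this dot-product form, the function names, and the toy vectors are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden: Sequence[Sequence[float]], context: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Weight each token vector by its similarity to a context vector, then sum."""
    scores = [sum(h_d * c_d for h_d, c_d in zip(h, context)) for h in hidden]
    weights = softmax(scores)
    dim = len(context)
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return pooled, weights


# Toy token representations (hypothetical values) and a context vector that
# would normally be learned during training.
hidden = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
context = [2.0, 0.0]

pooled, weights = attention_pool(hidden, context)
print([round(w, 3) for w in weights])
```

Tokens whose vectors align with the context vector receive larger weights, so they contribute more to the pooled post representation.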

Word Embedding Model: We adopt a suicidal-oriented word embedding that refines a word embedding to capture domain knowledge from a pre-built suicide-related dictionary. The model identifies whether a given sentence contains suicidal expression or not. We made three different suicidal word embeddings, since different suicidal dictionaries exist (e.g., word-level for Korean and Chinese vs. sentence-level for English).
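The difference between word-level and sentence-level dictionaries can be made concrete with a small sketch of how training labels ("contains a suicidal expression or not") might be derived from each kind. This is only an illustration: whitespace tokenization, substring matching for sentence-level entries, and the example lexicon entries are assumptions, not the authors' preprocessing.

```python
from typing import Iterable, Set


def label_by_words(sentence: str, lexicon: Set[str]) -> int:
    """Word-level dictionary: label 1 if any token appears in the lexicon."""
    tokens = sentence.lower().split()  # assumption: whitespace tokenization
    return int(any(token in lexicon for token in tokens))


def label_by_phrases(sentence: str, phrases: Iterable[str]) -> int:
    """Sentence-level dictionary: label 1 if any dictionary phrase occurs in the text."""
    text = sentence.lower()
    return int(any(phrase in text for phrase in phrases))


# Hypothetical dictionary entries, used only to make the example runnable.
word_lexicon = {"hopeless", "worthless"}
phrase_lexicon = ["no reason to go on"]

print(label_by_words("I feel hopeless today", word_lexicon))
print(label_by_phrases("There is no reason to go on", phrase_lexicon))
print(label_by_words("The weather is nice", word_lexicon))
```

Labels produced this way could serve as the target of the sentence classifier while the word embedding is refined.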

Data Download

To download the dataset, please send an email to delee12@skku.edu.


Paper URL

https://aclanthology.org/2020.findings-emnlp.200/


Code

https://github.com/DSAIL-SKKU/Cross_Lingual_Suicidal_Oriented_Word_Embedding_EMNLP_Findings_2020

If our work was helpful in your research, please kindly cite this work:

BIBTEX

@inproceedings{lee-etal-2020-cross,
  title = {Cross-Lingual Suicidal-Oriented Word Embedding toward Suicide Prevention},
  author = {Lee, Daeun and Park, Soyoung and Kang, Jiwon and Choi, Daejin and Han, Jinyoung},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2020},
  year = {2020},
  publisher = {Association for Computational Linguistics},
  url = {https://www.aclweb.org/anthology/2020.findings-emnlp.200},
  pages = {2208--2217},
}

Acknowledgments

This research was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1D1A1A02085647) and by the MSIT (Ministry of Science and ICT), Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) program (2020-0-01816) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation).