Social Health Text Mining

Mental Health Research

This project aims to understand mental health issues, including depression and suicidal tendencies, by analyzing health-related social media text. One component identifies the psychological and linguistic signals that distinguish posts about general depression from posts expressing suicidal thoughts. The other component seeks to determine the key determinants of suicidal tendencies (in review).

Medical Named Entity Corpus

Recognizing biomedical entities in text is important for biomedical and health science research because it supports many downstream tasks, including entity linking, relation extraction, and entity resolution. While English and a few other widely used languages enjoy ample resources for automatic biomedical entity recognition, this is not the case for Bangla, a low-resource language. To address this gap, we introduce BanglaBioMed, a Bangla biomedical named entity (NE) annotated dataset in standard IOB format, the first of its kind, consisting of over 12,000 tokens annotated with biomedical entities. The corpus was created by collecting Bangla text from a list of health articles and annotating it with four entity types: Anatomy (AN), Chemical and Drugs (CD), Disease and Symptom (DS), and Medical Procedure (MP).
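To make the IOB format concrete, the following is a minimal sketch of how IOB tags over the four entity types could be grouped into entity spans. It is not taken from the BanglaBioMed release: the function name `iob_to_spans`, the tag spelling (`B-DS`, `I-DS`, `O`), and the English placeholder tokens are assumptions made for illustration, and the actual corpus contains Bangla text.

```python
from typing import List, Tuple

# The four entity types named in the corpus description.
ENTITY_TYPES = {"AN", "CD", "DS", "MP"}

Span = Tuple[str, int, int, str]  # (entity_type, start, end_exclusive, text)


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Group IOB-tagged tokens into entity spans.

    Assumed tag spelling: "B-<TYPE>" opens an entity, "I-<TYPE>" continues it,
    and "O" marks a token outside any entity. A stray "I-" tag (after "O" or
    after a different type) is treated leniently as the start of a new span.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Span] = []
    current = None  # entity type of the open span, if any
    start = None    # index where the open span begins

    for i, tag in enumerate(tags):
        continues = tag.startswith("I-") and tag[2:] == current
        # Close the open span unless this tag continues it.
        if current is not None and not continues:
            spans.append((current, start, i, " ".join(tokens[start:i])))
            current, start = None, None
        # Open a new span on "B-", or on an "I-" with nothing open.
        if tag != "O" and current is None:
            current, start = tag[2:], i

    if current is not None:
        spans.append((current, start, len(tags), " ".join(tokens[start:])))
    return spans


# Hypothetical example sentence (English placeholders, not corpus data).
tokens = ["patient", "has", "high", "fever", "and", "took", "paracetamol"]
tags = ["O", "O", "B-DS", "I-DS", "O", "O", "B-CD"]
spans = iob_to_spans(tokens, tags)
```

Here `spans` holds one Disease and Symptom span covering "high fever" and one Chemical and Drugs span covering "paracetamol". The lenient handling of stray `I-` tags is a design choice for this sketch; a stricter reader could raise an error instead.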

Health Text Discourse Mode Classification 

This study introduces BanglaSocialHealth, a corpus of health-related social media posts in Bengali, annotated at the sentence level for four expression modes: narrative (NAR), informative (INF), suggestive (SUG), and inquiring (INQ). We outline the annotation process and present various statistics, such as the median and mean word lengths across different sentence modes. Furthermore, we employ classical machine learning (CML) classifiers and transformer-based language models to categorize sentence modes. This corpus and analysis make a valuable contribution to Bengali NLP research in medical and health fields, offering potential applications in tasks like question-answering, misinformation detection, and information retrieval.
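As an illustration of the per-mode statistics mentioned above, the following sketch computes the mean and median sentence length for each expression mode. It assumes that "word length" refers to the number of whitespace-separated words in a sentence. The function name `length_stats_by_mode` and the English sample sentences are hypothetical and do not come from BanglaSocialHealth.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, Tuple

# The four expression modes named in the corpus description.
MODES = ("NAR", "INF", "SUG", "INQ")


def length_stats_by_mode(
    samples: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Return mean and median word counts per mode.

    `samples` is an iterable of (sentence, mode) pairs. Words are split on
    whitespace, which is an assumption; the original study may tokenize
    differently.
    """
    lengths = defaultdict(list)
    for sentence, mode in samples:
        lengths[mode].append(len(sentence.split()))
    return {
        mode: {"mean": mean(counts), "median": median(counts)}
        for mode, counts in lengths.items()
    }


# Hypothetical annotated sentences (English placeholders, not corpus data).
samples = [
    ("I had a fever last week", "NAR"),
    ("What should I do", "INQ"),
    ("Drink water", "SUG"),
    ("Rest and drink plenty of water", "SUG"),
]
stats = length_stats_by_mode(samples)
```

Only modes that occur in the input appear in the result, so a mode with no sentences (INF in this toy sample) is simply absent rather than reported with a zero length.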

[C1] Sazzed, S., Comparative analysis of affective and linguistic features in online depression and suicidal discussion forums, In 34th ACM Conference on Hypertext and Social Media (ACM HT), 2023.


[C1] Sazzed, S., BanglaBioMed: A Biomedical Named-Entity Annotated Corpus for Bangla, In BioNLP @ Association for Computational Linguistics (ACL), 2022.


[C1] Sazzed, S., Discourse Mode Categorization of Bengali Social Media Health Text, In WASSA @ Association for Computational Linguistics (ACL), 2023.