EMNLP-IJCNLP

Training Data Augmentation for Detecting Adverse Drug Reactions in User-Generated Content

Reddit dataset:

https://github.com/mesbahs/ADR_EMNLP

Datasets:

Twitter dataset: The PSB 2016 Social Media Shared Task Twitter dataset (collected as described in Nikfarjam et al., 2015) is a widely used, manually annotated training set for ADR detection. The original dataset contained 2,000 tweet IDs, but at the time of this study we were able to retrieve the text of only 643 tweets, which we acknowledge might affect the performance of the trained models.
Reddit dataset: Reddit is a discussion website where users share and discuss problems and ideas on different topics. It also hosts sub-reddits such as AskDocs, DiagnoseMe, or Bipolar, where users share information about their health-related issues. To create a labeled training set, we used the drug names listed in http://diego.asu.edu/downloads/publications/ADRMine/drug_names.txt to collect 1,626 Reddit posts containing at least one drug name. We then recruited a medical doctor (with more than 20 years of experience) to annotate the ADRs (mentions of adverse drug reactions) in the collected posts, following the annotation guidelines suggested in (Karimi et al., 2015), which specify: 1) exclude leading prepositions, qualifiers, and possessive adjectives from the ADR span, to avoid inconsistency. For instance, in the sentence “it increases my anxiety”, only “anxiety” should be annotated; and 2) annotate all relevant context for an ADR concept. For example, in the sentence “I have a severe muscle pain”, “severe muscle pain” should be annotated (not just “muscle pain”). To validate the labels, two of the authors manually re-checked the annotations and found some ADRs that had been missed by the annotator; ambiguous ADRs were also identified and discussed with the medical expert. Of all the annotated posts, 600 posts with 9,326 sentences contained at least one ADR; these were split into training and test sets as shown in Table 2. We disclose a small subset of our dataset as supplementary material. Upon acceptance we will publish the complete annotated dataset.
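The collection and annotation steps above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the helper names (contains_drug, bio_tags), the three-entry drug list, and the example posts are hypothetical, and the B-ADR/I-ADR/O tagging scheme is an assumption about how annotated spans might be encoded for a sequence labeler. The sketch also assumes single-word drug names and whitespace tokenization.

```python
import re

def contains_drug(post, drug_names):
    """Return True if the post mentions at least one drug name as a whole token.

    Assumes single-word drug names; multi-word names would need phrase matching.
    """
    tokens = set(re.findall(r"[a-z0-9]+", post.lower()))
    return any(name.lower() in tokens for name in drug_names)

def bio_tags(tokens, span_tokens):
    """Tag the first occurrence of span_tokens with B-ADR / I-ADR, everything else O."""
    tags = ["O"] * len(tokens)
    n = len(span_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == span_tokens:
            tags[i] = "B-ADR"
            for j in range(i + 1, i + n):
                tags[j] = "I-ADR"
            break
    return tags

# Hypothetical stand-in for the entries of drug_names.txt.
drug_names = ["effexor", "cymbalta", "prozac"]
posts = ["I started effexor last week", "Feeling fine today"]
kept = [p for p in posts if contains_drug(p, drug_names)]

# Guideline 1: leading qualifiers and possessives stay outside the span.
tags_1 = bio_tags("it increases my anxiety".split(), ["anxiety"])

# Guideline 2: all relevant context for the ADR concept is inside the span.
tags_2 = bio_tags("I have a severe muscle pain".split(),
                  ["severe", "muscle", "pain"])

print(kept)
print(tags_1)
print(tags_2)
```

Under these assumptions, only the first post is kept, "my" is left outside the span in the first sentence, and "severe" is included in the span in the second.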

Qualitative Analysis:

False Positives. Manual inspection of the posts reveals that most of the false positives are due to: 1) Mis-recognizing indications as ADRs, i.e. an illness for which the drug has been prescribed is recognized as an adverse drug reaction (Chowdhury et al., 2018). For instance, in the two posts “I started effexor after having pretty severe post-partum depression” and “depression hurts cymbalta can help”, depression is labeled as an ADR even though it is an indication. However, depression commonly occurs as an ADR in other posts, which might be the cause of this error (Chowdhury et al., 2018); 2) Ignoring negative verbs. As an example, the word manic in “The only one that didn’t make me manic, Wellbrutin” and vomiting in “@uclaibd I never had bleeding or vomiting just a lot of fatigue” are detected as ADRs due to the structure of the posts, because the model was not able to recognize the negation; 3) Mis-labeling ADR-related words as an ADR. For instance, in the post “temperature would start to rise, depression weakens” the word depression was recognized as an ADR; 4) Mistakes in the manual annotation of the test data. For instance, in the tweet “Ive had no appetite since I started on prozac”, the annotators did not annotate no appetite as an ADR. Our model correctly predicted it as an ADR, but because of this mistake in the test data the prediction is counted as a false positive.
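To make item 4 concrete, the following sketch shows how a missing gold annotation turns a correct prediction into a counted false positive. It assumes exact-match, span-level scoring over token offsets, which may differ from the evaluation actually used; the function span_counts and the offsets are hypothetical.

```python
def span_counts(gold, predicted):
    """Return (true positives, false positives, false negatives) under exact span match."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    return tp, fp, fn

tokens = "Ive had no appetite since I started on prozac".split()

# Token span (start, end-exclusive) covering "no appetite".
predicted = [(2, 4)]

gold_as_annotated = []        # the annotators missed the ADR
gold_corrected = [(2, 4)]     # what the gold label arguably should have been

counts_as_annotated = span_counts(gold_as_annotated, predicted)
counts_corrected = span_counts(gold_corrected, predicted)

print(" ".join(tokens[2:4]))
print(counts_as_annotated)
print(counts_corrected)
```

With the gold labels as annotated, the prediction is scored as one false positive; with the corrected gold span it would be scored as one true positive.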

False Negatives. False negatives are likely to occur in posts that are ambiguous or overly complex. For example, in the post “Im just wondering if its safe to take tramadol 15h after vyanse and if promethazine and melatonin would lower my chances of a seizure” the word seizure was not detected as an ADR. Note that in this specific case even the human annotators debated whether seizure is indeed an ADR of tramadol or an indication of Vyvanse. In another example, “Am I the only one that grinds the shit out of their teeth on Vyvanse”, the expression grinds the shit out of their teeth is a long description of the slang ADR teeth grind, expressed in a very unstructured and informal way. This is hard to handle for phrase detectors like CRF or BLSTM-RNN, as some level of abstraction would be necessary to deal with it.

True Positives. VAE was able to detect terms such as "tiiiiired", "zombieish", and "stomach hurt", which were not detected by the other methods. In general, VAE is good at detecting unigrams and bigrams even with a small amount of data; however, it is still not able to recognise long phrases such as "may not switch your brain", "3 days of hell", or "electronic shocks in your brain", which are not detected by the other comparison methods either.
