Using Natural Language Processing to Detect Drug Discussion on Reddit
The National Drug Early Warning System (NDEWS), funded by the National Institute on Drug Abuse, aims to detect early signs of emerging drug use by analyzing data from various sources and to disseminate that information to public health, addiction, and drug epidemiology researchers. Its mission responds to rising numbers of deaths from drug use, the continued emergence of new psychoactive substances, changes in drug use, and the association between COVID-19 and drug use. In this research, conducted as part of NDEWS, we use Natural Language Processing (NLP) to train a Transformer neural network to monitor drug-related communities on Reddit. We quantify drug-related discussion and identify trends in that discussion that can be used to predict emerging drug trends. Posts and comments dating back to 2010 from approximately 80 subreddits dedicated to drug discussion are matched by keyword search to historical drug trends surrounding the emergence of eight new psychoactive substances. A Transformer network pretrained for NLP is fine-tuned for Named Entity Recognition (NER) to determine whether a given post or comment contains a drug-related term. The network is then evaluated by how accurately it classifies drug trends that were withheld during training. Future work includes modifying the datasets and the network to support more advanced queries, such as flagging posts that contain a specific term of interest or determining which term is being discussed in posts already flagged as containing a drug term.
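As an illustration of the keyword-matching step described above, the following is a minimal Python sketch of how posts might be tagged by keyword search to produce token-level labels suitable for NER fine-tuning. It is not the study's implementation: the term list (`DRUG_TERMS`), the `B-DRUG`/`I-DRUG`/`O` tag scheme, the simple regular-expression tokenizer, and the function names are all assumptions made for the example, and the placeholder terms stand in for the actual keywords, which are not listed here.

```python
import re

# Hypothetical placeholder terms; the study's actual keyword list is not given here.
DRUG_TERMS = ["drugalpha", "drug beta"]


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bio_tags(tokens, terms=DRUG_TERMS):
    """Tag tokens as B-DRUG / I-DRUG / O by exact keyword match."""
    tags = ["O"] * len(tokens)
    # Tokenize each term the same way as the text; skip terms that become empty.
    term_tokens = [tt for tt in (tokenize(t) for t in terms) if tt]
    for i in range(len(tokens)):
        for tt in term_tokens:
            n = len(tt)
            window_free = all(tag == "O" for tag in tags[i:i + n])
            if tokens[i:i + n] == tt and window_free:
                tags[i] = "B-DRUG"
                for j in range(i + 1, i + n):
                    tags[j] = "I-DRUG"
    return tags


def contains_drug_term(text, terms=DRUG_TERMS):
    """Return True if any keyword occurs in the post or comment."""
    return "B-DRUG" in bio_tags(tokenize(text), terms)


if __name__ == "__main__":
    post = "Anyone tried Drug Beta or drugalpha?"
    tokens = tokenize(post)
    for token, tag in zip(tokens, bio_tags(tokens)):
        print(f"{token}\t{tag}")
```

Labels produced this way are only as good as the keyword list, which is one reason a fine-tuned network is useful: it can, in principle, generalize to terms that were withheld from the keyword set during training.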