Lexicon
Sentiment Lexicon
Sentiment analysis is an approach to text classification that assesses the degree of positive or negative opinion or emotion in a given text; it originated in efforts to gauge consumer sentiment toward a given product (Ignatow & Mihalcea, 2018). Sentiment analysis involves a type of artificial intelligence that iteratively teaches the computer to detect sentiment in text (Ignatow & Mihalcea, 2018). Sentiment analyses are usually conducted by referencing a prebuilt, open-source library of sentiments (a “sentiment lexicon”), which is a collection of words labeled as carrying positive or negative sentiment (e.g., OpinionFinder). The sentiment score indicates how positive or negative the words comprising the narrative are. The higher the score, the greater the number of positive words making up the text.
Rule-based sentiment scoring and machine learning-based sentiment scoring are two commonly used methods in sentiment analysis. Because we do not have training data for the machine learning method, and no machine learning-based sentiment tool trained on police reports exists, we apply the rule-based method. The rule-based method requires a dictionary, or lexicon, that defines a sentiment score for each word, together with specific rules for handling negation and relationships between words, such as adverbs that modify intensity (e.g., “very happy” is more positive than “happy”). We used this lexicon-based approach to calculate the sentiment score for the entire report and the maximum sentiment score for a paragraph (Baccianella et al., 2010).
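To illustrate how a rule-based scorer of this kind can work, the following is a minimal Python sketch, not the research team's implementation. The lexicon entries, word scores, negator list, and intensifier weights are all hypothetical placeholders, as are the function names score_text and score_report. The sketch also assumes that paragraphs are separated by blank lines and that the report score is the sum of its paragraph scores.

```python
import re

# Hypothetical toy lexicon; a real analysis would load a full sentiment lexicon.
LEXICON = {"happy": 1.0, "good": 1.0, "bad": -1.0, "upset": -1.0}
NEGATORS = {"not", "never", "no"}                 # assumed negation words
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}    # assumed adverb weights


def score_text(text):
    """Sum lexicon scores over a text, applying intensifier and negation rules."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        value = LEXICON[token]
        j = i - 1
        # Rule 1: an intensifier directly before the word scales its score.
        if j >= 0 and tokens[j] in INTENSIFIERS:
            value *= INTENSIFIERS[tokens[j]]
            j -= 1
        # Rule 2: a negator before the word (or before its intensifier) flips the sign.
        if j >= 0 and tokens[j] in NEGATORS:
            value = -value
        total += value
    return total


def score_report(report):
    """Return the whole-report score and the maximum paragraph score."""
    paragraphs = [p for p in report.split("\n\n") if p.strip()]
    scores = [score_text(p) for p in paragraphs]
    return {"report": sum(scores), "max_paragraph": max(scores, default=0.0)}


if __name__ == "__main__":
    example = "She was very happy.\n\nHe was not happy and upset."
    print(score_report(example))
```

With these placeholder weights, “very happy” scores higher than “happy”, and “not happy” scores negative. In practice the word scores would come from an established lexicon rather than a hand-written dictionary.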
The chart below links to an open-source, adaptable sentiment lexicon that shows a list of words and their iterations. This list was developed by our research team as a result of coding thousands of sexual assault police reports and observing examples of signaling language.
References:
Ignatow, G., & Mihalcea, R. (2018). An introduction to text mining: Research design, data collection, and analysis. SAGE Publications, Inc.
Baccianella, S., Esuli, A., & Sebastiani, F. (2010, May). SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). LREC 2010, Valletta, Malta. http://www.lrec-conf.org/proceedings/lrec2010/pdf/769_Paper.pdf