Police officers spend many hours each week documenting their findings when responding to 911 calls. These reports contain a great deal of detail, yet they remain an untapped resource for data analytics by the police department. Human experts currently analyze the reports and sort them into the following categories: “Substance Abuse”, “Mental Health”, “Domestic/Social”, “Non-domestic/Social”, and “Other”. To assist the experts and reduce the time spent reading and analyzing reports, we propose using large language models (LLMs) to tag police reports based on their content.
Kennesaw State University has been given four years of police reports and the corresponding data for research, approximately 196,000 reports in total. Of these, 300 have been annotated by social workers with tags such as "Domestic/Social", "Non-domestic/Social", "Mental Health", and "Substance Abuse". Two models, Mistral-7B and TinyLlama, have been trained and fine-tuned to tag the reports and so reduce the time needed to analyze police reports. Both models output the tag along with the reason for choosing it, so one potential use is training human analysts in the future.
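
To illustrate what a tag-plus-reason output could look like in practice, the following is a minimal Python sketch. The prompt wording, the `Tag:` / `Reason:` reply format, and the names `build_prompt`, `parse_response`, and `TaggedReport` are all assumptions made for illustration; the project's actual prompt and output format are not described here.

```python
import re
from dataclasses import dataclass

# Categories named in the text; "Other" doubles as the fallback.
TAGS = [
    "Substance Abuse",
    "Mental Health",
    "Domestic/Social",
    "Non-domestic/Social",
    "Other",
]


@dataclass
class TaggedReport:
    tag: str
    reason: str


def build_prompt(report_text: str) -> str:
    """Build a hypothetical classification prompt (not the project's real prompt)."""
    options = ", ".join(TAGS)
    return (
        f"Classify the police report into one of: {options}.\n"
        "Answer in the form:\n"
        "Tag: <tag>\n"
        "Reason: <one sentence>\n\n"
        f"Report: {report_text}\n"
    )


def parse_response(response: str) -> TaggedReport:
    """Extract the tag and reason from a reply in the assumed 'Tag:/Reason:' format."""
    flags = re.MULTILINE | re.IGNORECASE
    tag_match = re.search(r"^Tag:[ \t]*(.+)$", response, flags)
    reason_match = re.search(r"^Reason:[ \t]*(.+)$", response, flags)
    raw_tag = tag_match.group(1).strip() if tag_match else ""
    # Normalize to a known category; anything unrecognized falls back to "Other".
    tag = next((t for t in TAGS if t.lower() == raw_tag.lower()), "Other")
    reason = reason_match.group(1).strip() if reason_match else ""
    return TaggedReport(tag=tag, reason=reason)
```

In this sketch, the string returned by `build_prompt` would be sent to whichever model is in use, and the model's reply would be passed to `parse_response`. Falling back to "Other" for unrecognized tags keeps the output within the fixed category set, which is one possible design choice rather than the only one.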