KCDD Dataset

This page hosts the dataset and benchmark code for review.

Dataset Card 


This paper presents the Korean Crime Dialogue Dataset (KCDD), the first Korean dialogue dataset for classifying violence that occurred in offline settings. KCDD contains 22,249 dialogues in five classes: four criminal classes that meet the international legal standards (ICCS) and one clean class. The classes are Serious Threats, Extortion or Blackmail, Harassment in the Workplace, Other Harassment, and Clean Dialogue.
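As a minimal sketch of how the five classes might be encoded for a classifier: the class names come from the dataset card above, but the integer ids, the ordering, and the helper `is_criminal` are assumptions made for illustration and may not match the label encoding used in the released files.

```python
# Class names are taken from the dataset card; the integer ids and
# their ordering are assumptions, not the dataset's official encoding.
KCDD_CLASSES = [
    "Serious Threats",
    "Extortion or Blackmail",
    "Harassment in the Workplace",
    "Other Harassment",
    "Clean Dialogue",
]

# Forward and reverse lookups, as commonly passed to a classifier config.
LABEL2ID = {name: idx for idx, name in enumerate(KCDD_CLASSES)}
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}


def is_criminal(label: str) -> bool:
    """Return True for the four criminal classes, False for the clean class."""
    if label not in LABEL2ID:
        raise ValueError(f"unknown KCDD class: {label!r}")
    return label != "Clean Dialogue"
```

Keeping the clean class last makes it easy to collapse the task into a binary criminal/clean split, although that ordering is a design choice of this sketch only.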


KCDD may be used for noncommercial purposes under the CC-BY-NC 4.0 license.

Benchmark Description 

We propose Relationship-Aware BERT as a strong baseline for the dataset. The model shows that understanding the varying relationships among interlocutors improves the performance of crime dialogue classification.


1) Requirements

 pip install -r requirements.txt 

2) Train & Eval

You can adjust the hyper-parameters via the config file inside the config folder.

Running the training script automatically evaluates each model on the dev and test sets.

The trained models and evaluation results are saved in the ./ckpt folder.

python3 Relationship_aware_BERT.py --config_file Relationship_aware_BERT.json
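
To try several hyper-parameter settings without editing the original config by hand, a small helper can copy the JSON file and override selected keys. This is only a sketch: the functions `load_config` and `write_variant` are hypothetical, and the key names in the usage note (`learning_rate`, `batch_size`) are assumptions, since the actual fields depend on the config file shipped in the config folder.

```python
import json
from pathlib import Path


def load_config(path):
    """Read a JSON config file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def write_variant(base_path, out_path, **overrides):
    """Copy a config, override the given keys, and save it under a new name.

    Returns the merged config so the caller can inspect it.
    """
    config = load_config(base_path)
    config.update(overrides)  # hypothetical keys; check the real config first
    Path(out_path).write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return config
```

For example, `write_variant("config/Relationship_aware_BERT.json", "config/Relationship_aware_BERT_lr3e-5.json", learning_rate=3e-5)` would produce a second config file, assuming the config lives at that path and uses a `learning_rate` key. The new file name could then be passed to `--config_file` in the same way as the original.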