First Set of Training Data

The first set of training data contains a total of 900 training instances, each assigned to one of 6 fine-grained categories: five types of insincere question, plus a category for questions that are not insincere. Each instance contains two fields: qid and Label.

qid is the question ID from the labelled Quora dataset. Due to Quora's data-sharing constraints, we cannot provide the question text corresponding to each qid in our dataset. Please extract the corresponding question text from the Quora Insincere Questions Classification dataset available at: https://www.kaggle.com/c/quora-insincere-questions-classification/data.
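As a minimal sketch of that extraction step, the Python below joins the labels to the question text by qid using only the standard library. It assumes the fine-grained labels are available as a CSV file with a header row `qid,Label` (the actual distribution format is not specified here), and that the Kaggle file used (for example, train.csv) has `qid` and `question_text` columns. The function names `load_question_text` and `attach_text` are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def load_question_text(kaggle_csv: TextIO) -> Dict[str, str]:
    """Map each qid to its question text from a Kaggle Quora CSV file."""
    return {row["qid"]: row["question_text"] for row in csv.DictReader(kaggle_csv)}


def attach_text(
    labels_csv: TextIO, qid_to_text: Dict[str, str]
) -> Tuple[List[dict], List[str]]:
    """Join fine-grained labels (assumed columns: qid, Label) with question text.

    Returns the joined rows and the list of qids that had no matching text.
    """
    joined: List[dict] = []
    missing: List[str] = []
    for row in csv.DictReader(labels_csv):
        qid = row["qid"]
        text = qid_to_text.get(qid)
        if text is None:
            # Record unmatched qids instead of silently dropping them.
            missing.append(qid)
            continue
        joined.append({"qid": qid, "Label": int(row["Label"]), "question_text": text})
    return joined, missing


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the real files (hypothetical contents).
    kaggle = io.StringIO("qid,question_text,target\nabc,Why is X?,1\n")
    labels = io.StringIO("qid,Label\nabc,1\nzzz,3\n")
    rows, missing = attach_text(labels, load_question_text(kaggle))
    print(rows)     # [{'qid': 'abc', 'Label': 1, 'question_text': 'Why is X?'}]
    print(missing)  # ['zzz']
```

With the real files, pass handles opened with `open(path, newline="", encoding="utf-8")`. Any qid that is not found is reported in `missing`, which makes gaps easy to spot (for example, if only one of the Kaggle CSV files was loaded).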

Label is the manually assigned fine-grained category of the question. The label-to-category mapping is as follows:

Label   Category
1       rhetorical
2       sexual content
3       hate speech
4       hypothetical
5       other
0       Not an insincere question
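For code that consumes the labels, the mapping can be expressed as a lookup table. This is a minimal Python sketch; the names `LABEL_TO_CATEGORY`, `category_name`, and `is_insincere` are hypothetical helpers, not part of the dataset release.

```python
from typing import Dict

# Label-to-category mapping, copied from the table in this section.
LABEL_TO_CATEGORY: Dict[int, str] = {
    0: "Not an insincere question",
    1: "rhetorical",
    2: "sexual content",
    3: "hate speech",
    4: "hypothetical",
    5: "other",
}


def category_name(label: int) -> str:
    """Return the category name for a numeric label; raise on unknown labels."""
    try:
        return LABEL_TO_CATEGORY[label]
    except KeyError:
        raise ValueError(f"unknown label: {label!r}") from None


def is_insincere(label: int) -> bool:
    """Labels 1-5 are insincere-question categories; 0 means not insincere."""
    category_name(label)  # validate the label before classifying it
    return label != 0
```

Raising on unknown labels, rather than returning a default, surfaces malformed rows early.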


Training data will be shared with the registered participants.

Please note that use of this dataset is restricted to academic research only.