Opinion dataset with classes (noise, objective, positive, negative, neutral sentiment, question, ads, miscellaneous)
Dataset length - 5000 posts for each social media platform (Twitter and Reddit)
This dataset has three levels of annotation:
Level 1: It has three classes, NOISE, OBJECTIVE and SUBJECTIVE, marked with 0, 1, 2 respectively.
Level 2: It divides the SUBJECTIVE class further into three categories, NEUTRAL, NEGATIVE and POSITIVE, marked with 0, 1, 2 respectively.
Level 3: It divides the NEUTRAL class further into four categories, NEUTRAL SENTIMENTS, QUESTIONS, ADVERTISEMENTS and MISCELLANEOUS, marked with 0, 1, 2, 3 respectively.
A post in the QUESTIONS class will have:
Level 1 marking - 2
Level 2 marking - 0
Level 3 marking - 1
A post in the OBJECTIVE class will have:
Level 1 marking - 1
Level 2 marking - [Blank]
Level 3 marking - [Blank]
A post in the NEGATIVE class will have:
Level 1 marking - 2
Level 2 marking - 1
Level 3 marking - [Blank]
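The three-level marking scheme can be sketched as a small lookup in Python. This is only an illustration of the rules stated above, not part of the dataset tooling: the function names encode and decode are hypothetical, and a [Blank] marking is represented here as None, which is an assumption about how blanks would be read from the file.

```python
from typing import Optional, Tuple

# Markings exactly as given in the level descriptions.
LEVEL1 = {"NOISE": 0, "OBJECTIVE": 1, "SUBJECTIVE": 2}
LEVEL2 = {"NEUTRAL": 0, "NEGATIVE": 1, "POSITIVE": 2}
LEVEL3 = {
    "NEUTRAL SENTIMENTS": 0,
    "QUESTIONS": 1,
    "ADVERTISEMENTS": 2,
    "MISCELLANEOUS": 3,
}

# Reverse lookups (marking -> class name) used by decode().
LEVEL1_NAMES = {v: k for k, v in LEVEL1.items()}
LEVEL2_NAMES = {v: k for k, v in LEVEL2.items()}
LEVEL3_NAMES = {v: k for k, v in LEVEL3.items()}

# (Level 1, Level 2, Level 3); None stands for [Blank] (assumption).
Marking = Tuple[int, Optional[int], Optional[int]]


def encode(final_class: str) -> Marking:
    """Return the three level markings for one of the eight final classes."""
    name = final_class.upper()
    if name in LEVEL3:
        # Level 3 classes are sub-classes of SUBJECTIVE -> NEUTRAL.
        return (LEVEL1["SUBJECTIVE"], LEVEL2["NEUTRAL"], LEVEL3[name])
    if name in ("NEGATIVE", "POSITIVE"):
        # SUBJECTIVE but not NEUTRAL: Level 3 stays blank.
        return (LEVEL1["SUBJECTIVE"], LEVEL2[name], None)
    if name in ("NOISE", "OBJECTIVE"):
        # Not SUBJECTIVE: Level 2 and Level 3 stay blank.
        return (LEVEL1[name], None, None)
    raise ValueError(f"unknown class: {final_class!r}")


def decode(level1: int,
           level2: Optional[int] = None,
           level3: Optional[int] = None) -> str:
    """Return the final class name for a triple of level markings.

    Raises KeyError if a marking needed at some level is blank or invalid.
    """
    l1_name = LEVEL1_NAMES[level1]
    if l1_name != "SUBJECTIVE":
        return l1_name
    l2_name = LEVEL2_NAMES[level2]
    if l2_name != "NEUTRAL":
        return l2_name
    return LEVEL3_NAMES[level3]
```

For example, encode("QUESTIONS") gives (2, 0, 1) and encode("OBJECTIVE") gives (1, None, None), matching the markings listed above.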
QnA dataset
Queries and comments
Length - approx. 30k for the Reddit dataset
Marked as relevant or not relevant.
Every question is given with its respective comments and the corresponding score/likes.
The column "Relevant" contains binary labels: relevant or not relevant.
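A minimal sketch of how the relevant comments could be collected per question is given below. Only the column name "Relevant" comes from the description; the column names "Question" and "Comment", the helper name relevant_comments_by_question, and the encoding of the relevant label as 1 are all assumptions and should be checked against the mailed files.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def relevant_comments_by_question(
    rows: Iterable[Mapping[str, object]],
    relevant_value: str = "1",  # assumed encoding of the "relevant" label
) -> Dict[str, List[str]]:
    """Group the comments labelled relevant under their question.

    Each row is a mapping with the keys "Question", "Comment" (both
    hypothetical column names) and "Relevant" (named in the description).
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        # Compare as strings so that 1 and "1" are treated alike.
        if str(row["Relevant"]).strip() == relevant_value:
            grouped[str(row["Question"])].append(str(row["Comment"]))
    return dict(grouped)
```

Rows read with csv.DictReader from the standard library can be passed in directly, since each of them is a mapping from column name to value.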
Training data: Mailed to the registered candidates.
Test data: Mailed to the registered candidates.