Dataset

Dataset for Task 1

An opinion dataset with the classes NOISE, OBJECTIVE, POSITIVE, NEGATIVE, NEUTRAL SENTIMENTS, QUESTIONS, ADVERTISEMENTS and MISCELLANEOUS, organized into three levels.

Level 1: three classes, NOISE, OBJECTIVE and SUBJECTIVE, marked 0, 1 and 2 respectively.

Level 2: divides the SUBJECTIVE class into three categories, NEUTRAL, NEGATIVE and POSITIVE, marked 0, 1 and 2 respectively.

Level 3: divides the NEUTRAL class into four categories, NEUTRAL SENTIMENTS, QUESTIONS, ADVERTISEMENTS and MISCELLANEOUS, marked 0, 1, 2 and 3 respectively.
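Because the three levels form a hierarchy, each finest-grained class corresponds to exactly one combination of markings. The sketch below encodes that mapping; the function name `encode` is hypothetical, `None` stands for a blank marking, and treating NOISE like OBJECTIVE (Levels 2 and 3 blank) is an assumption, since the page only shows the OBJECTIVE case.

```python
from typing import Optional, Tuple

# Markings for each level, as listed on this page.
LEVEL1 = {"NOISE": 0, "OBJECTIVE": 1, "SUBJECTIVE": 2}
LEVEL2 = {"NEUTRAL": 0, "NEGATIVE": 1, "POSITIVE": 2}
LEVEL3 = {"NEUTRAL SENTIMENTS": 0, "QUESTIONS": 1, "ADVERTISEMENTS": 2, "MISCELLANEOUS": 3}

# (Level 1, Level 2, Level 3); None means the level is left blank.
Marking = Tuple[int, Optional[int], Optional[int]]

def encode(label: str) -> Marking:
    """Return the three level markings for a finest-grained class (hypothetical helper)."""
    label = label.upper()
    if label in ("NOISE", "OBJECTIVE"):
        # Not SUBJECTIVE, so no deeper level applies (assumed for NOISE).
        return (LEVEL1[label], None, None)
    if label in ("NEGATIVE", "POSITIVE"):
        # SUBJECTIVE with a polarity; Level 3 only refines NEUTRAL.
        return (LEVEL1["SUBJECTIVE"], LEVEL2[label], None)
    if label in LEVEL3:
        # SUBJECTIVE -> NEUTRAL -> one of the four Level 3 categories.
        return (LEVEL1["SUBJECTIVE"], LEVEL2["NEUTRAL"], LEVEL3[label])
    raise ValueError(f"unknown class: {label!r}")
```

For instance, `encode("QUESTIONS")` gives `(2, 0, 1)`, matching the QUESTIONS example on this page.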

For example, a post in the QUESTIONS class is marked:
Level 1 marking - 2
Level 2 marking - 0
Level 3 marking - 1

A post in the OBJECTIVE class is marked:
Level 1 marking - 1
Level 2 marking - [Blank]
Level 3 marking - [Blank]

A post in the NEGATIVE class is marked:
Level 1 marking - 2
Level 2 marking - 1
Level 3 marking - [Blank]
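Going the other way, a post's class can be recovered from its markings by walking down the levels and stopping once a level is no longer refined, which is why blank markings are expected for OBJECTIVE and NEGATIVE posts. The sketch below assumes each marking arrives as a raw string cell in which an empty cell means [Blank] and non-empty cells hold plain integers; the function names `parse_marking`, `name_at` and `decode` are hypothetical.

```python
from typing import List, Optional

LEVEL1_NAMES = ["NOISE", "OBJECTIVE", "SUBJECTIVE"]
LEVEL2_NAMES = ["NEUTRAL", "NEGATIVE", "POSITIVE"]
LEVEL3_NAMES = ["NEUTRAL SENTIMENTS", "QUESTIONS", "ADVERTISEMENTS", "MISCELLANEOUS"]

def parse_marking(cell: str) -> Optional[int]:
    """Convert a raw cell to an int; an empty cell is a blank marking (None)."""
    cell = cell.strip()
    return int(cell) if cell else None

def name_at(names: List[str], marking: Optional[int], level: int) -> str:
    """Look up a class name, rejecting blank or out-of-range markings."""
    if marking is None or not 0 <= marking < len(names):
        raise ValueError(f"invalid Level {level} marking: {marking!r}")
    return names[marking]

def decode(l1: str, l2: str = "", l3: str = "") -> str:
    """Recover the finest-grained class from the three raw level markings."""
    c1 = name_at(LEVEL1_NAMES, parse_marking(l1), 1)
    if c1 != "SUBJECTIVE":
        return c1  # NOISE or OBJECTIVE: Levels 2 and 3 are not consulted
    c2 = name_at(LEVEL2_NAMES, parse_marking(l2), 2)
    if c2 != "NEUTRAL":
        return c2  # NEGATIVE or POSITIVE: Level 3 is not consulted
    return name_at(LEVEL3_NAMES, parse_marking(l3), 3)
```

With this sketch, `decode("2", "0", "1")` returns `"QUESTIONS"` and `decode("1", "", "")` returns `"OBJECTIVE"`, while a SUBJECTIVE post with a blank Level 2 raises `ValueError`.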

Dataset for Task 2

For Reddit:

Queries: approx. 1K
Comments: approx. 26K

Each query is given with its comments and their scores/likes. Each comment is marked as relevant or not relevant to its query: the column "Relevant" holds this binary label.
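Since every query comes with its comments, scores and a binary "Relevant" label, a common first step is to group the relevant comments under their query. The sketch below uses only the standard library and is not tied to the released file layout: apart from "Relevant", the column names "Question", "Comment" and "Score" are hypothetical, and it assumes the label is stored as 1 (relevant) or 0 and the score as an integer.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

def relevant_comments_by_query(csv_text: str) -> Dict[str, List[Tuple[int, str]]]:
    """Group relevant comments under their query, highest score first.

    Assumes hypothetical columns "Question", "Comment", "Score" alongside the
    documented "Relevant" column, with Relevant stored as 1 or 0.
    """
    grouped: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Relevant"].strip() == "1":  # keep only comments labelled relevant
            grouped[row["Question"]].append((int(row["Score"]), row["Comment"]))
    for comments in grouped.values():
        comments.sort(key=lambda pair: pair[0], reverse=True)  # by score, descending
    return dict(grouped)
```

Reading the text from a file with `open(path, newline="")` and passing its contents would work the same way; the column names would need adjusting once the actual file format is known.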

For YouTube:

Two sub-tasks are built from cryptocurrency YouTube transcripts:
1. Question Answering (Q&A): answer a question using its source transcript.
2. Multiple-Choice (MCQ): select the correct option (1 of 4) for a question.
Each sub-task is split into train / validation / test (1000 / 250 / 500).
Detailed file formats and fields will be provided with the data.
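Until the detailed formats are released, the split sizes and the four-option MCQ structure stated above are the only properties a loader can rely on. The sketch below checks just those; the function names are hypothetical, each split is assumed to be a sequence of records, and the MCQ answer is assumed to be a 0-based option index.

```python
from typing import Dict, List, Sequence

# Split sizes stated on this page, shared by the Q&A and MCQ sub-tasks.
EXPECTED_SIZES = {"train": 1000, "validation": 250, "test": 500}

def check_split_sizes(splits: Dict[str, Sequence[object]]) -> List[str]:
    """Return a list of problems; an empty list means every split has the stated size."""
    problems = []
    for name, expected in EXPECTED_SIZES.items():
        if name not in splits:
            problems.append(f"missing split: {name}")
        elif len(splits[name]) != expected:
            problems.append(f"{name}: expected {expected} items, got {len(splits[name])}")
    return problems

def check_mcq_item(options: Sequence[str], answer: int) -> bool:
    """An MCQ item should offer exactly four options and point at one of them.

    `answer` is assumed to be a 0-based option index.
    """
    return len(options) == 4 and 0 <= answer < 4
```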
