Classification of Insincere Questions - Detailed Task Description

Community Question Answering (CQA) has seen a spectacular increase in popularity in the recent past. With the advent and popularity of sites like Yahoo! Answers, Cross Validated, Stack Overflow and Quora, more and more people now use these web forums to get answers to their questions. These forums let people post their queries online and have them answered by experts across the world, while also letting them contribute their own opinions or expertise to help other users. This quality encourages participation and has consequently led to their popularity.

While general-purpose CQA forums typically contain Information Seeking Questions (ISQ), the ugly spectre of misbehavior often raises its head on these platforms: individuals post questions intended to troll specific groups or communities, to incite hate speech, or to spread objectionable content. Individuals also post opinions in the guise of questions, thereby inciting arguments on such CQA forums, and rhetorical or insincere questions are often posted as jokes. Content moderators on such CQA forums therefore need to filter these questions so that the site retains good-quality, non-offensive content. Questions that are not true requests for information are termed Non-Information Seeking Questions (NISQ) or Insincere Questions.

Given the scale of CQA forums on the web, identifying insincere questions is challenging for human moderators alone. Hence automated approaches are needed, in addition to manual moderation, to filter insincere questions on CQA sites. While this can be simplistically formulated as a binary classification task (a question is either sincere or insincere), insincere or non-information seeking questions exhibit diverse characteristics, and there are a number of reasons why a question may be classified as insincere. These include the following categories:

● Rhetorical questions: questions that are non-neutral and convey an opinion or take a stand. A true request for information does not take a stance or express an opinion; questions that phrase an opinion as a question are not requests for information and are typically insincere. Such questions are rhetorical in nature. A few examples of such questions are:

o Q1: Can we all now admit that President Trump doesn't really want Congress to pass legislation replacing DACA to protect dreamers?

o Q2: How much more political fumbling will it take for Republicans to turn on Trump?

Question Q1 carries the implicit opinion that Trump is against DACA, and Q2 carries the implicit opinion that Trump has been fumbling in US politics. Although both are phrased as questions, neither is information seeking in nature. We call such insincere questions rhetorical questions.

● Disparaging/Inflammatory questions: These are questions intended to insult or attack certain groups of people. For instance, sexist and trolling comments are often posted as questions intended to offend specific minority groups or individuals. A few examples of such questions are:

o Q3: Do Trump haters, by definition, realize that they are bigots?

o Q4: When will Hillary Clinton finally pay for her crimes and go to jail?

o Q5: Why are there so many dirty Jews on Quora?

On examining Q5, it is clear that the question does not seek any information and is intended to be disparaging and hurtful to a specific group of people, namely Jews. Similarly, Q4 is intended to disparage an individual, namely Hillary Clinton.

● Hypothetical or Unreal questions: These are questions about situations that are not real and are meant to be fictitious. Typically they are hypothetical and have an unreal context, so they are not true requests for information. Examples of such questions include:

o Q6: Have you ever shot your mom?

o Q7: If both Honey Singh and Justin Bieber fall from the 5th floor, who will survive?

It is easy to see that both Q6 and Q7 portray unreal situations and are not intended as requests for information.

● Objectionable content questions: These are questions that use sexual content for shock value.

Given that insincere questions on CQA forums carry subtly different nuances, we propose the task of Fine Grained Categorization of Insincere Questions (FGCIQ) on CQA forums. Instead of treating this as a simple binary short text classification problem (CQA questions are typically short texts), we propose that segmenting insincere/non-information seeking questions into categories will not only improve identification but also enable effective countermeasures based on the fine-grained category a question belongs to. For example, sexually explicit questions need an immediate takedown of the question and suspension of the user, whereas rhetorical questions may only require cautioning the person who posted the question.

Literature:

There has been no prior work on FGCIQ. Prior work has treated this problem as a binary short text classification problem. We propose a fine-grained categorization of CQA questions for identifying insincere questions, using the following six categories:

1. Rhetorical questions

2. Hate speech/inflammatory questions

3. Hypothetical questions

4. Sexually explicit/objectionable content questions

5. Other (a catch-all bucket for insincere questions that cannot be classified into any of the first four categories)

6. Sincere/true Information Seeking questions
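The six categories can be read as a label set from which the binary sincere/insincere formulation is recovered by collapsing categories 1 to 5 into "insincere". The following is a minimal Python sketch under stated assumptions: the enum name `FGCIQLabel`, its member names and integer ids, and the helpers `is_insincere` and `suggested_action` are hypothetical and not part of any released dataset or tooling. The only countermeasures encoded are the two this description states (takedown/suspension for sexually explicit questions, a caution for rhetorical ones); other categories return `None` rather than a guessed policy.

```python
from enum import Enum
from typing import Optional


class FGCIQLabel(Enum):
    """Hypothetical encoding of the six FGCIQ categories (ids are illustrative)."""
    RHETORICAL = 1
    HATE_SPEECH_INFLAMMATORY = 2
    HYPOTHETICAL = 3
    SEXUALLY_EXPLICIT = 4
    OTHER = 5      # catch-all for insincere questions outside categories 1-4
    SINCERE = 6    # true information-seeking question


def is_insincere(label: FGCIQLabel) -> bool:
    """Collapse a fine-grained label to the binary sincere/insincere task."""
    return label is not FGCIQLabel.SINCERE


# Only the countermeasures named in the task description; the rest are unspecified.
SUGGESTED_ACTION = {
    FGCIQLabel.SEXUALLY_EXPLICIT: "take down question and suspend user",
    FGCIQLabel.RHETORICAL: "caution the poster",
}


def suggested_action(label: FGCIQLabel) -> Optional[str]:
    """Return the stated countermeasure for a category, or None if unspecified."""
    return SUGGESTED_ACTION.get(label)


if __name__ == "__main__":
    # Categories the description assigns to example questions Q1, Q5 and Q7.
    examples = {
        "Q1": FGCIQLabel.RHETORICAL,
        "Q5": FGCIQLabel.HATE_SPEECH_INFLAMMATORY,
        "Q7": FGCIQLabel.HYPOTHETICAL,
    }
    for qid, label in examples.items():
        print(qid, label.name, is_insincere(label), suggested_action(label))
```

Keeping the fine-grained label as the primary annotation means the binary view is always derivable, while the reverse mapping (binary to fine-grained) is not.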