Related Work

Misinformation Detection

The internet is open and accessible to everyone: anyone can read, share, and publish information in any format (e.g., text, books, or videos). As a result, information and knowledge can be produced and transmitted at unprecedented speed and breadth, which has contributed positively to the well-being of the world's population. Yet this ease of producing and sharing information at such scale and speed has a drawback: the prevalence of misinformation on the web.

As a breeding ground for misinformation, the web has attracted scholars from many fields who study and detect misinformation. For example, researchers have investigated the factors that raise or lower the credibility of information, as well as the methods users can apply to evaluate and assess the credibility of online sources. Several surveys have assessed the credibility of information on the web, and on Twitter in particular. Scholars have also proposed models of how misinformation propagates in social networks and how its spread may be mitigated. Further research has sought to automate the assessment of posts on social media; the majority of this work has focused on engineering features that enable the automatic detection of fake news, rumors, and falsified content on social media platforms. For instance, Gupta et al. employ influence patterns and social reputation to predict whether images shared on Twitter are fake or genuine, while Kwon et al. identify linguistic, structural, and temporal features of rumors on Twitter.
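To make the feature-engineering approach concrete, the sketch below trains a toy classifier on a handful of hand-crafted linguistic, structural, and temporal features. The specific features, the toy posts, and the logistic-regression choice are illustrative assumptions for exposition; they are not the feature sets used by Gupta et al. or Kwon et al.

```python
# A minimal, illustrative sketch of the feature-engineering approach:
# map each post to linguistic, structural, and temporal features and
# fit a standard classifier. All features and data here are toy
# stand-ins, not the published feature sets.
from dataclasses import dataclass
from sklearn.linear_model import LogisticRegression

@dataclass
class Post:
    text: str
    retweets: int        # structural: how widely the post spread
    hours_active: float  # temporal: lifetime of the discussion

def extract_features(post: Post) -> list[float]:
    """Map a post to a small linguistic/structural/temporal feature vector."""
    return [
        post.text.count("?"),        # linguistic: uncertainty cues
        post.text.count("!"),        # linguistic: emphasis cues
        float("http" in post.text),  # linguistic: presence of a link
        float(post.retweets),        # structural: propagation volume
        post.hours_active,           # temporal: burst vs. sustained activity
    ]

# Toy training data; real studies use thousands of labeled posts.
posts = [
    Post("BREAKING!!! shocking footage, share now http://t.co/x", 900, 2.0),
    Post("Official update from the city council on road closures", 40, 48.0),
]
labels = [1, 0]  # 1 = rumor/fake, 0 = legitimate

clf = LogisticRegression().fit([extract_features(p) for p in posts], labels)
print(clf.predict([extract_features(posts[0])]))  # -> [1]
```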

Qazvinian et al. aim to predict the factuality of tweets and to discover the sources of misinformation. In recent years, a particular strain of misinformation has seen a resurgence of interest, commonly known as fake news. The increased interest stems largely from growing concern about the impact of fake news on shaping public opinion. One major concern is the speed and scope with which misinformation can spread online: Fuller et al. highlighted that, with the tremendous growth of online communication, people's exposure to deception through online systems has also grown. The spread of fake news can have serious financial and political consequences. For example, Gentzkow found that fake news was widely shared during the three months prior to the 2016 U.S. presidential election, with 115 known pro-Trump fake stories shared a total of 30 million times on Facebook and 41 known pro-Clinton fake stories shared 7.6 million times. Likewise, fake news surrounding natural disasters, such as Hurricane Sandy and the 2011 Japanese earthquake, can amplify panic and disorder.

While fake news has received considerable attention from the public, it has also received increasing attention from the research community. Several attempts have been made to detect fake news and to understand its sources, characteristics, and nature. These attempts can be classified as (a) content-based identification, which classifies news as fake or legitimate based on the content of the published information; (b) feedback-based methods, which rely on the feedback that posts receive from users on social media; and (c) intervention-based techniques, which aim to build computational solutions that actively detect and mitigate the spread of fake news. Content-based detection can rely on linguistic cues and features to discriminate fake news from legitimate news. For example, Driscoll et al. made one of the earliest attempts, studying transcripts of statements made by individuals in criminal investigations; they determined the credibility of the information in those statements using a method called Scientific Content Analysis (SCAN), which identifies cues of deception in criminal investigation statements. Such a method is impractical at scale, since it relies on experienced, trained professionals to judge the veracity of information. Moreover, methods based on linguistic cues have the limitation of not generalizing well across different topics and domains, as the sketch below illustrates.
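As a concrete, if simplified, illustration of content-based identification, the sketch below classifies news text using bag-of-words linguistic features. The pipeline and the toy headlines are assumptions made for exposition, not a reproduction of any system cited above; the learned vocabulary ties the model to its training topics, echoing the generalization limitation just discussed.

```python
# A minimal sketch of content-based fake news detection: classify an
# article from its text alone using bag-of-words (TF-IDF) features.
# Training data and model choice are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "Miracle cure doctors don't want you to know about",
    "Senate passes budget bill after lengthy debate",
    "Celebrity secretly replaced by clone, insiders claim",
    "Central bank holds interest rates steady this quarter",
]
train_labels = [1, 0, 1, 0]  # 1 = fake, 0 = legitimate

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(train_texts, train_labels)

# Because the learned vocabulary reflects the training topics, the same
# model often degrades on unseen domains -- the generalization
# limitation of linguistic-cue-based methods noted above.
print(model.predict(["Aliens endorse presidential candidate, sources say"]))
```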
