Results and Discussion

After cleaning and preprocessing both datasets as described in the previous section, we trained classifiers as follows:

  • For the YouTube dataset, we trained three classification algorithms: Multinomial Naïve Bayes (MNB), Stochastic Gradient Descent (SGD), and Logistic Regression (LR) on each topic individually. We found that logistic regression outperformed the other classifiers on every misinformation topic, reporting the best accuracy, F1-score, precision, and recall.

  • For the Amazon reviews dataset, we trained two classification algorithms: Multinomial Naïve Bayes (MNB) and the Linear Support Vector Classifier (LSVC). The LSVC outperformed MNB, reporting an accuracy of 80.93%, with an F1-score of 70.61%, precision of 77.73%, and recall of 67.01%.

  • The following table reports the accuracy, F1-score, precision, and recall of all algorithms trained on each topic for each platform.

Table: Testing accuracy, precision, recall, and F1-score of each classifier trained on each dataset: MNB, SGD, and LR on YouTube; MNB and LSVC on Amazon.
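The per-topic training setup described above can be sketched with scikit-learn. The toy comments, labels, and hyperparameters below are invented placeholders for illustration only, not the actual YouTube data or the settings used in our experiments:

```python
# Minimal sketch: fit MNB, SGD, and LR on TF-IDF features of comments
# for one topic, and report testing accuracy and F1-score.
# All data below is a toy placeholder (1 = misinformative, 0 = not).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

comments = [
    "vaccines cause autism wake up people",
    "the earth is flat they are lying to us",
    "chemtrails are poisoning us every day",
    "5g towers spread the virus do not trust them",
    "great explanation of how vaccines actually work",
    "thanks for the clear science in this video",
    "well sourced and very informative content",
    "good overview of the published evidence",
] * 4  # repeated so the train/test split is non-trivial
labels = [1, 1, 1, 1, 0, 0, 0, 0] * 4

X_tr, X_te, y_tr, y_te = train_test_split(
    comments, labels, test_size=0.25, stratify=labels, random_state=0)

scores = {}
for name, clf in [("MNB", MultinomialNB()),
                  ("SGD", SGDClassifier(random_state=0)),
                  ("LR", LogisticRegression(max_iter=1000))]:
    pipe = make_pipeline(TfidfVectorizer(), clf)  # TF-IDF features + classifier
    pipe.fit(X_tr, y_tr)
    preds = pipe.predict(X_te)
    scores[name] = (accuracy_score(y_te, preds), f1_score(y_te, preds))
    print(name, scores[name])
```

In the real experiments, this loop would be repeated once per misinformation topic, with the winning classifier selected by the metrics in the table above.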

Why does the Linear Support Vector Classifier (LSVC) outperform Multinomial Naïve Bayes (MNB)?

It is commonly known that Naïve Bayes treats features as conditionally independent, whereas an SVM accounts for interactions between features to a certain degree. We observe this in our results on the Amazon reviews dataset, where the features are not independent as Naïve Bayes assumes; the linear SVM therefore captures the feature relationships better and outperforms the Multinomial Naïve Bayes classifier.
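The two-model setup behind this comparison can be sketched as follows. The toy reviews and labels are invented for illustration and make no claim about the actual Amazon results; note how words such as "miracle" and "cure" always co-occur, the kind of feature dependence discussed above:

```python
# Sketch of the Amazon-style comparison: MNB vs. LinearSVC on bag-of-words
# counts. The reviews below are invented placeholders (1 = misinformative).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reviews = [
    "miracle cure the doctors hide this miracle cure",
    "secret remedy big pharma does not want you to know",
    "pseudoscience nonsense avoid this book",
    "well researched and properly cited throughout",
    "accurate information with solid references",
    "a balanced evidence based account",
] * 4
labels = [1, 1, 1, 0, 0, 0] * 4

accs = {}
for name, clf in [("MNB", MultinomialNB()), ("LSVC", LinearSVC())]:
    pipe = make_pipeline(CountVectorizer(), clf)  # raw counts + classifier
    pipe.fit(reviews, labels)
    accs[name] = pipe.score(reviews, labels)
    print(name, accs[name])
```

MNB multiplies per-word likelihoods as if each word were independent evidence, so perfectly correlated words are double-counted; LinearSVC learns all weights jointly, so correlated words end up sharing weight.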

Why does Logistic Regression outperform the other algorithms on the YouTube datasets?

  • Logistic regression vs. Multinomial Naïve Bayes: It has been shown mathematically that as the training data grows large, a discriminative model such as logistic regression outperforms a generative model such as Multinomial Naïve Bayes. This holds in our case, since we have hundreds of thousands of comments under each topic, as shown in the table; hence logistic regression outperforms Multinomial Naïve Bayes.

  • Logistic regression vs. Stochastic Gradient Descent: Both are discriminative, but Stochastic Gradient Descent is an optimization technique well known for scaling to large datasets, at the cost of requiring a number of hyperparameters to be tuned. Moreover, when logistic regression is trained with (batch) gradient descent, the optimizer accumulates the error over all data points before each weight update, converging to the global optimum of the convex loss after several iterations. Stochastic gradient descent instead updates the weights from randomly sampled data points rather than the full dataset, so its updates are noisier; this explains why logistic regression achieved the better classifier in our experiments.

Due to the varying sizes of the two datasets, we tested different classification algorithms. And, as we expected, user comments and reviews have great potential to help in detecting misinformative content on online platforms.

Future Work

Although user comments and reviews gave great accuracy in detecting misinformative content, there are other promising features that could help improve the classification results, though they require extensive feature engineering in the future. Such features include, but are not limited to:

  • Video statistics: such as the counts of views, likes, dislikes, and comments.

  • Video transcription: the textual transcription of videos would greatly help in classifying their contents.

  • Like count on a comment: would give a signal of how important a comment is with respect to other comments on the same video.

  • Item description: would be a strong indicator of the stance of an item toward misinformation.

  • Number of ratings on an item: a potential feature that could help in detecting misinformative items on Amazon.

  • Helpful votes on a review: the number of people who found a review of an Amazon item helpful is similar to the like count on a comment under a YouTube video, and could help signify the importance of a review with respect to other reviews on the same item.

  • Item rating: while annotating Amazon items, we noticed that reviews with lower ratings are a great source of information on whether an item is misinformative or biased; customers with a point of view opposing the content of an item (book or video) usually give such items lower ratings when reviewing them.

In the future, we will also investigate the possibility of a general, dynamic model that could detect misinformative content based on user comments regardless of the misinformation topic. We can approach this by investigating how the classification algorithms perform when trained on all topics together instead of training a separate classifier for each misinformation topic.