Thanks to the growth of social networks, users can spread information to others quickly and easily. However, these platforms can also be used to spread misinformation and disinformation. Checking the context of tweets and of their replies is therefore crucial for blocking fake information.
Our dataset was collected from Twitter and contains 209,008 samples. We provide three files: training data, development data for use during practice, and evaluation data used to rank submissions (to be released on the first day of the competition). Positive samples are tweets with the hashtag #fakenews. Negative samples come from last year's EmotionGIF data.
More information about our dataset is available here.
Given the labeled training data, you will need to predict the label (fake news or not) for every tweet in the unlabeled evaluation dataset.
Participants must register with both the google_sheet and a codalab account to join the SocialNLP2021 competition. The SocialNLP2021 competition link will be sent to participants' email addresses on April 30; from there you can download our datasets and upload your predictions.
Entries will be evaluated using the F1-score as implemented in Scikit-learn.
More information about the reported metrics.
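As a rough illustration of the metric, the sketch below computes a binary F1-score in plain Python. It assumes labels are encoded as 1 (fake news) and 0 (not fake news) and that the positive class is 1; the label encoding, the helper name `f1_binary`, and the toy label lists are assumptions for illustration, not part of the competition specification. Under these assumptions the result should match `sklearn.metrics.f1_score` with its default `average='binary'` setting, but the averaging mode actually used for ranking is not stated here.

```python
def f1_binary(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class.

    Hypothetical helper for illustration; labels are assumed to be 0/1.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # With no true positives, F1 is 0 (or undefined); return 0.0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up labels (not competition data).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = f1_binary(y_true, y_pred)
print(f"F1 = {score:.4f}")
```

In this toy example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1-score is also 2/3.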