Datasets
We strive to design and create Arabic datasets for practical tasks and make them publicly available for the community to advance research on Arabic IR and Arabic NLP.
Authority Finding in Twitter Dataset
This dataset is offered as a shared task (Task 5: Authority Finding in Twitter) at the CheckThat! 2023 lab. The task is defined as follows: given a tweet stating a rumor, a model has to retrieve a ranked list of authority Twitter accounts that can help verify the rumor, i.e., accounts that may tweet evidence supporting or denying it. The dataset is offered in Arabic. The collection comprises 150 rumors (expressed in tweets) associated with a total of 1,044 authority accounts, plus a user collection of 395,231 Twitter accounts (members of 1,192,284 unique Twitter lists).
Download
Download the dataset from here.
Related Publications
Alberto Barrón-Cedeño, Firoj Alam, Tommaso Caselli, Giovanni Da San Martino, Tamer Elsayed, Andrea Galassi, Fatima Haouari, Federico Ruggeri, Julia Maria Struß, Rabindra Nath Nandi, Gullal S. Cheema, Dilshod Azizov, and Preslav Nakov. The CLEF-2023 CheckThat! Lab: Checkworthiness, Subjectivity, Political Bias, Factuality, and Authority of News Articles and Their Sources. ECIR 2023.
Fatima Haouari and Tamer Elsayed: Detecting Stance of Authorities towards Rumors in Arabic Tweets: A Preliminary Study. ECIR 2023.
AuSTR: The First Authority STance towards Rumors Dataset
AuSTR is the first Authority STance towards Rumors dataset, where evidence is retrieved from authority timelines on Arabic Twitter. AuSTR contains 409 tweet pairs covering 171 unique claims, of which 41 are true and 130 are false. Among those pairs, 118 are disagree (29%), 62 are agree (15%), and 229 are unrelated (56%).
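As a quick sanity check, the stance distribution above can be reproduced from the reported pair counts alone (a minimal Python sketch using only the figures quoted here):

```python
# Stance label counts as reported for the 409 AuSTR pairs.
counts = {"disagree": 118, "agree": 62, "unrelated": 229}

total = sum(counts.values())
assert total == 409  # matches the reported number of pairs

# Round each label's share to the nearest whole percent.
shares = {label: round(100 * n / total) for label, n in counts.items()}
print(shares)  # {'disagree': 29, 'agree': 15, 'unrelated': 56}
```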
Download
Download the dataset from here.
Related Publication
Fatima Haouari and Tamer Elsayed: Detecting Stance of Authorities towards Rumors in Arabic Tweets: A Preliminary Study. ECIR 2023.
IDRISI: Large-scale Twitter Location Mention Prediction Dataset
IDRISI is the largest-scale publicly-available Twitter Location Mention Prediction (LMP) dataset, covering both English and Arabic. It is named after Muhammad Al-Idrisi, one of the pioneers and founders of advanced geography.
Download
Download the dataset from here.
Related Publications
To be listed soon.
ArPFN: Arabic User Credibility Dataset
ArPFN is the first Arabic user dataset developed for the task of identifying users who are prone to spread fake news on Arabic Twitter. It was built by leveraging two Arabic misinformation datasets, ArCOV19-Rumors and AraFacts. ArPFN consists of 1,546 users, of which 541 are prone to spread fake news.
Download
Download the dataset from here.
Related Publication
Zien Sheikh Ali, Abdulaziz Al-Ali, and Tamer Elsayed: Detecting Users Prone to Spread Fake News on Arabic Twitter. OSACT 2022
QRCD: Qur'anic Reading Comprehension Dataset
QRCD is composed of 1,093 tuples of question-passage pairs that are coupled with their extracted answers to constitute 1,337 question-passage-answer triplets. A question might have more than one answer in the passage; therefore, a typical reading comprehension system is expected to extract all of them and return a ranked list of answer spans.
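Since a question may have several answers in the same passage, a system must return every matching span, not just the first. A minimal sketch of enumerating all occurrences of an answer string as character-offset spans (the passage and answer here are hypothetical placeholders, not QRCD data):

```python
def answer_spans(passage: str, answer: str):
    """Return all (start, end) character offsets where `answer` occurs in `passage`."""
    spans, start = [], passage.find(answer)
    while start != -1:
        spans.append((start, start + len(answer)))
        start = passage.find(answer, start + 1)
    return spans

# Hypothetical example: the answer string appears twice in the passage.
print(answer_spans("patience is a virtue; patience is rewarded", "patience"))
# [(0, 8), (22, 30)]
```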
Download
Related Publications
Rana Malhas and Tamer Elsayed. Arabic Machine Reading Comprehension on the Holy Qur’an using CL-AraBERT. Information Processing & Management, 59(6), p.103068, 2022.
Rana Malhas and Tamer Elsayed: AyaTEC: Building a Reusable Verse-Based Test Collection for Arabic Question Answering on the Holy Qur’an. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 19(6), pp.1-21, 2020.
AyaTEC: Reusable Verse-Based Test Collection for Arabic Question Answering on the Holy Qur’an
AyaTEC is a reusable test collection for verse-based question answering on the Holy Qur’an, which serves as a common experimental testbed for this task. AyaTEC includes 207 questions (with their corresponding 1,762 answers) covering 11 topic categories of the Holy Qur’an that target the information needs of both curious and skeptical users. The answers to the questions (each represented as a sequence of verses) in AyaTEC were exhaustive—that is, all qur’anic verses that directly answered the questions were exhaustively extracted and annotated.
Download
Related Publication
Rana Malhas and Tamer Elsayed: AyaTEC: Building a Reusable Verse-Based Test Collection for Arabic Question Answering on the Holy Qur’an. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 19(6), pp.1-21, 2020.
ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection
ArCOV19-Rumors is an Arabic COVID-19 Twitter dataset for misinformation detection composed of tweets containing claims posted from 27 January until the end of April 2020. We collected 138 verified claims, mostly from popular fact-checking websites, and identified 9.4K tweets relevant to those claims. We then manually annotated the tweets by veracity to support research on misinformation detection, one of the major problems faced during a pandemic. We aim to support two classes of misinformation detection problems over Twitter: verifying free-text claims (claim-level verification) and verifying claims expressed in tweets (tweet-level verification). In addition to health, our dataset covers claims related to other topical categories influenced by COVID-19, namely social, political, sports, entertainment, and religious topics.
Download
Download the dataset from here.
Related Publication
Fatima Haouari, Maram Hasanain, Reem Suwaileh, and Tamer Elsayed: ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection. WANLP 2021.
ArCOV-19: Arabic COVID-19 Twitter Dataset
ArCOV-19 is an Arabic COVID-19 Twitter dataset that covers the period from 27 January to 31 March 2020 (and collection is still ongoing). It is the first publicly-available Arabic Twitter dataset covering the COVID-19 pandemic, and it includes around 748K popular tweets (according to Twitter's search criterion) alongside the propagation networks of the most popular subset of them. The propagation networks include both retweets and conversational threads (i.e., threads of replies). ArCOV-19 is designed to enable research in several domains, including natural language processing, data science, and social computing, among others.
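Because the propagation networks include conversational threads, a common first step when working with such data is rebuilding reply trees from (tweet, replied-to) edges. A minimal sketch with hypothetical tweet IDs (the actual ArCOV-19 file format may differ):

```python
from collections import defaultdict

# Hypothetical reply edges: (tweet_id, in_reply_to_id); None marks a root tweet.
edges = [("t1", None), ("t2", "t1"), ("t3", "t1"), ("t4", "t2")]

children = defaultdict(list)
roots = []
for tweet, parent in edges:
    if parent is None:
        roots.append(tweet)
    else:
        children[parent].append(tweet)

def thread(tweet_id):
    """Depth-first flattening of the conversational thread rooted at tweet_id."""
    out = [tweet_id]
    for child in children[tweet_id]:
        out.extend(thread(child))
    return out

print(thread("t1"))  # ['t1', 't2', 't4', 't3']
```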
Download
Download the dataset from here.
Related Publication
Fatima Haouari, Maram Hasanain, Reem Suwaileh, and Tamer Elsayed: ArCOV-19: The First Arabic COVID-19 Twitter Dataset with Propagation Networks. WANLP 2021.
CheckThat! 2021 Fact Checking Arabic Datasets (Tasks 1,2)
Our members, Maram Hasanain, Fatima Haouari, Watheq Mansour, Zien Sheikh Ali, and Dr. Tamer Elsayed, built the Arabic datasets for Tasks 1 and 2 at CheckThat! 2021 lab. The tasks are defined as follows:
Task 1 - Check-Worthiness Estimation: Given a claim, detect whether it is worth fact-checking.
Task 2 - Verified Claim Retrieval: Given a check-worthy claim, and a set of previously fact-checked claims, determine whether the claim has been previously fact-checked.
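Task 2 is essentially a ranking problem over previously fact-checked claims. As an illustration only (not the lab's official baseline), a toy matcher can rank verified claims by token-overlap (Jaccard) similarity with the input claim; the claims below are hypothetical:

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two claims."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

# Hypothetical collection of previously fact-checked claims.
verified = [
    "garlic cures the flu",
    "drinking water prevents dehydration",
    "garlic water cures covid",
]

query = "garlic cures covid"
ranked = sorted(verified, key=lambda c: jaccard(query, c), reverse=True)
print(ranked[0])  # "garlic water cures covid"
```

A real system would replace the lexical overlap with a learned semantic similarity, but the ranking setup is the same.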
Download
Related Publications
Shaden Shaar, Maram Hasanain, Bayan Hamdan, Zien Sheikh Ali, Fatima Haouari, Alex Nikolov, Mucahid Kutlu, Yavuz Selim Kartal, Firoj Alam, Giovanni Da San Martino, Alberto Barrón-Cedeño, Ruben Miguez, Javier Beltrán, Tamer Elsayed, Preslav Nakov: Overview of the CLEF-2021 CheckThat! Lab Task 1 on Check-Worthiness Estimation in Tweets and Political Debates. Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum
Shaden Shaar, Fatima Haouari, Watheq Mansour, Maram Hasanain, Nikolay Babulkov, Firoj Alam, Giovanni Da San Martino, Tamer Elsayed, Preslav Nakov: Overview of the CLEF-2021 CheckThat! Lab Task 2 on Detecting Previously Fact-Checked Claims in Tweets and Political Debates. Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum
Preslav Nakov, Giovanni Da San Martino, Tamer Elsayed, Alberto Barrón-Cedeño, Rubén Míguez, Shaden Shaar, Firoj Alam, Fatima Haouari, Maram Hasanain, Watheq Mansour, Bayan Hamdan, Zien Sheikh Ali, Nikolay Babulkov, Alex Nikolov, Gautam Kishore Shahi, Julia Maria Struß, Thomas Mandl, Mucahid Kutlu, and Yavuz Selim Kartal: Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. CLEF 2021
Preslav Nakov, Giovanni Da San Martino, Tamer Elsayed, Alberto Barrón-Cedeño, Rubén Míguez, Shaden Shaar, Firoj Alam, Fatima Haouari, Maram Hasanain, Nikolay Babulkov, Alex Nikolov, Gautam Kishore Shahi, Julia Maria Struß, and Thomas Mandl: The CLEF-2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. ECIR 2021
ArTest: The First Test Collection for Arabic Web Search with Relevance Rationales
ArTest is the first large-scale test collection designed for the evaluation of ad-hoc search over the Arabic Web. ArTest uses ArabicWeb16, a collection of around 150M Arabic Web pages, as the document collection, and includes 50 topics, 10,529 relevance judgments, and (more importantly) a rationale behind each judgment.
Download
Download the dataset from here.
Related Publication
Maram Hasanain, Yassmine Barkallah, Reem Suwaileh, Mucahid Kutlu, and Tamer Elsayed: ArTest: The First Test Collection for Arabic Web Search with Relevance Rationales. SIGIR 2020.
Background Relevance Dataset: Annotations and Analysis for Background Linking
We built this dataset by annotating a subset of the query articles and their corresponding judged articles provided by the TREC 2018 news track dataset. We annotated 227 articles in total: 25 query articles and 202 judged articles (an average of about 8 per query), distributed as follows: 51 judged articles of relevance 4, 35 of relevance 3, 33 of relevance 2, 33 of relevance 1, and 50 of relevance 0.
Download
Download the dataset from here.
Related Publication
Marwa Essam and Tamer Elsayed: Why is That a Background Article: A Qualitative Analysis of Relevance for News Background Linking. CIKM 2020
CheckThat! 2020 Arabic Datasets (Tasks 1,2,3)
Our members, Maram Hasanain, Fatima Haouari, Reem Suwaileh, Zien Sheikh Ali, and Dr. Tamer Elsayed, built the Arabic datasets for Tasks 1, 2, and 3 at CheckThat! 2020 lab. Tasks are defined as follows:
Task 1 - Check-Worthiness on tweets: Predict which tweet from a stream of tweets on a topic should be prioritized for fact-checking.
Task 2 - Verified claim retrieval: Given a check-worthy tweet claim, and a set of previously-checked claims, determine whether the claim has been already fact-checked.
Task 3 - Evidence retrieval: Given a check-worthy claim on a specific topic and a set of text snippets extracted from potentially-relevant webpages, return a ranked list of evidence snippets for the claim.
Related Publications
Maram Hasanain, Fatima Haouari, Reem Suwaileh, Zien Sheikh Ali, Bayan Hamdan, Tamer Elsayed, Alberto Barrón-Cedeño, Giovanni Da San Martino, Preslav Nakov: Overview of CheckThat! 2020 Arabic: Automatic Identification and Verification of Claims in Social Media. Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum
Shaden Shaar, Alex Nikolov, Nikolay Babulkov, Firoj Alam, Alberto Barrón-Cedeño, Tamer Elsayed, Maram Hasanain, Reem Suwaileh, Fatima Haouari, Giovanni Da San Martino, Preslav Nakov: Overview of CheckThat! 2020 English: Automatic Identification and Verification of Claims in Social Media. Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum
Alberto Barrón-Cedeño, Tamer Elsayed, Preslav Nakov, Giovanni Da San Martino, Maram Hasanain, Reem Suwaileh, Fatima Haouari, Nikolay Babulkov, Bayan Hamdan, Alex Nikolov, Shaden Shaar, and Zien Sheikh Ali: Overview of CheckThat! 2020: Automatic Identification and Verification of Claims in Social Media. CLEF 2020
CheckThat! 2019 Arabic Dataset (Task 2)
Our members, Maram Hasanain, Reem Suwaileh, and Dr. Tamer Elsayed, built the Task 2 dataset at the CheckThat! 2019 lab.
Task Definition
Given a claim associated with a set of Web pages P (that constitute the results of Web search in response to using the claim as a search query), identify which of the Web pages (and passages of those Web pages) can be useful in assisting a human who is fact-checking the claim.
More details about the task can be found here.
Download
You can download data from here.
Related Publication
Maram Hasanain, Reem Suwaileh, Tamer Elsayed, Alberto Barrón-Cedeño, Preslav Nakov: Overview of the CLEF-2019 CheckThat! Lab: Automatic Identification and Verification of Claims. Task 2: Evidence and Factuality. Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum
Tamer Elsayed, Preslav Nakov, Alberto Barrón-Cedeño, Maram Hasanain, Reem Suwaileh, Giovanni Da San Martino, and Pepa Atanasova: Overview of the CLEF-2019 CheckThat! Lab: Automatic Identification and Verification of Claims. CLEF 2019
CheckThat! 2018 Arabic Datasets (Tasks 1,2)
Our members, Reem Suwaileh and Dr. Tamer Elsayed, built the Task 1 and Task 2 datasets at the CheckThat! 2018 lab.
Task 1 Definition: Given a transcription of a political debate/speech, predict which claims should be prioritized for fact-checking.
Task 2 Definition: Given a check-worthy claim in the form of a (transcribed) sentence, determine whether the claim is likely to be true, half-true, or false.
Download
You can download data from here.
Related Publications
Preslav Nakov, Alberto Barrón-Cedeño, Tamer Elsayed, Reem Suwaileh, Lluís Màrquez, Wajdi Zaghouani, Pepa Atanasova, Spas Kyuchukov, and Giovanni Da San Martino: Overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims. CLEF 2018
Pepa Atanasova, Lluís Màrquez, Alberto Barrón-Cedeño, Tamer Elsayed, Reem Suwaileh, Wajdi Zaghouani, Spas Kyuchukov, Giovanni Da San Martino, Preslav Nakov: Overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims. Task 1: Check-Worthiness. Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum
Alberto Barrón-Cedeño, Tamer Elsayed, Reem Suwaileh, Lluís Màrquez, Pepa Atanasova, Wajdi Zaghouani, Spas Kyuchukov, Giovanni Da San Martino, Preslav Nakov: Overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims. Task 2: Factuality. Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum
Web Search for Fact Checking Dataset
This dataset supports the problem of re-ranking Web search results for better fact-checking. It is a test collection comprising 22 claims, with 20 Web search results per claim collected from a commercial search engine.
Download
Download the dataset from here.
Related Publication
Khaled Yasser, Mucahid Kutlu, and Tamer Elsayed: Re-ranking Web Search Results for Better Fact-Checking: A Preliminary Study. Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM ’18), October 22–26, 2018, Torino, Italy.
WebCrowd25k
The WebCrowd25k dataset includes three related parts:
Crowd Relevance Judgments. 25,099 information retrieval relevance judgments collected on Amazon’s Mechanical Turk platform. For each of the 50 search topics from the 2014 NIST TREC WebTrack, we selected 100 ClueWeb12 documents to be re-judged (without reference to the original TREC assessor judgment) by 5 MTurk workers each (50 topics x 100 documents x 5 workers = 25K crowd judgments). Individual worker IDs from the platform are hashed to new identifiers. We collect relevance judgments on a 4-point graded scale. (See SIGIR’18 & HCOMP’18 papers).
Behavioral Data. For a subset of the judgments, we also collected behavioral data characterizing worker behavior while performing the relevance judging. Behavioral data was recorded using MmmTurkey, which captures a variety of worker interaction behaviors during the completion of MTurk Human Intelligence Tasks. (See HCOMP’18 paper.)
Disagreement Analysis. We inspected 1,000 crowd judgments for 200 documents (5 judgments per document, where the aggregated crowd judgment differs from the original TREC assessor judgment), and we classified each disagreement according to our disagreement taxonomy. (See SIGIR’18 paper.)
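With five crowd judgments per document, one simple way to obtain an aggregated crowd judgment like the one referenced above is majority voting over the 4-point scale. This is a sketch only; the associated papers may use a different aggregation rule:

```python
from collections import Counter

def aggregate(judgments):
    """Majority vote over one document's crowd judgments (4-point scale, 0-3).
    Ties are broken in favor of the lower relevance grade."""
    votes = Counter(judgments)
    best = max(votes.values())
    return min(grade for grade, n in votes.items() if n == best)

# Hypothetical judgments from 5 workers for one topic-document pair.
print(aggregate([2, 3, 2, 0, 2]))  # 2
```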
Download
Download the entire dataset here. Please refer to the included README files and associated publications for further details.
Another source for download is here.
Related Publications
Tanya Goyal, Tyler McDonnell, Mucahid Kutlu, Tamer Elsayed, and Matthew Lease. Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to Ensure Quality Relevance Annotations. In Proceedings of the 6th AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 2018.
Mucahid Kutlu, Tyler McDonnell, Yassmine Barkallah, Tamer Elsayed, and Matthew Lease. Crowd vs. Expert: What Can Relevance Judgment Rationales Teach Us About Assessor Disagreement? In Proceedings of the 41st international ACM SIGIR conference on Research and development in Information Retrieval, 2018.
Tyler McDonnell, Matthew Lease, Mucahid Kutlu, and Tamer Elsayed. Why Is That Relevant? Collecting Annotator Rationales for Relevance Judgments. In Proceedings of the 4th AAAI Conference on Human Computation and Crowdsourcing (HCOMP), pages 139-148, 2016. Best Paper Award. [ pdf | blog-post | data | slides ]
Brandon Dang, Miles Hutson, and Matthew Lease. MmmTurkey: A Crowdsourcing Framework for Deploying Tasks and Recording Worker Behavior on Amazon Mechanical Turk. In 4th AAAI Conference on Human Computation and Crowdsourcing (HCOMP): Works-in-Progress Track, 2016. 3 pages. arXiv:1609.00945. [ pdf | sourcecode ]
EveTAR: The First Arabic Test Collection for Multiple Information Retrieval Tasks in Twitter
EveTAR is the first Arabic test collection for multiple information retrieval tasks in Twitter, supporting event detection, ad-hoc search, timeline generation, and real-time summarization. EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events, for which about 62K tweets were judged with a substantial average inter-annotator agreement (Kappa value of 0.71).
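The agreement figure above is a Cohen's Kappa. As a reminder of how such a statistic is computed for two annotators, here is a self-contained sketch; the toy labels are illustrative, not EveTAR data:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's Kappa for two annotators' parallel label sequences."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items both annotators labeled identically.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement under chance, from each annotator's label distribution.
    ca, cb = Counter(a), Counter(b)
    expected = sum((ca[lab] / n) * (cb[lab] / n) for lab in set(a) | set(b))
    return (observed - expected) / (1 - expected)

# Toy example: two annotators judging 10 tweets as relevant (R) or not (N).
a = ["R", "R", "R", "R", "N", "N", "N", "N", "R", "N"]
b = ["R", "R", "R", "N", "N", "N", "N", "R", "R", "N"]
print(round(cohens_kappa(a, b), 2))  # 0.6
```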
Related publications
[December 21st, 2017] Final published Information Retrieval Journal (IRJ) article on Springer that describes the 2nd version of the collection:
Maram Hasanain, Reem Suwaileh, Tamer Elsayed, Mucahid Kutlu, and Hind Almerekhi: EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets. Information Retrieval Journal, 2017. https://doi.org/10.1007/s10791-017-9325-7
[July 17th, 2016] SIGIR 2016 paper that describes the first version of the collection:
Hind Almerekhi, Maram Hasanain, and Tamer Elsayed. EveTAR: A New Test Collection for Event Detection in Arabic Tweets. Proceedings of the 39th annual international ACM SIGIR conference on Research and development in information retrieval: SIGIR ’16, Pisa, Italy, July 2016. Download.
Download
ArabicWeb16: Largest Public Arabic Web Crawl
ArabicWeb16 is a public Web crawl of 150,211,934 Arabic Web pages with high coverage of dialectal Arabic as well as Modern Standard Arabic (MSA). We expect ArabicWeb16 to support various research areas such as ad-hoc search, question answering, filtering, cross-dialect search, dialect detection, entity search, blog search, and spam detection, among others.
Download
For further information on dataset download and statistics, visit ArabicWeb16 website.
Related Publication
Reem Suwaileh, Mucahid Kutlu, Nihal Fathima, Tamer Elsayed, and Matthew Lease. ArabicWeb16: A New Crawl for Today’s Arabic Web. Proceedings of the 39th annual international ACM SIGIR conference on Research and development in information retrieval: SIGIR ’16, Pisa, Italy, July 2016.
DART: Dialectal Arabic Tweets Dataset
The Dialectal ARabic Tweets (DART) dataset is a large, manually-annotated, multi-dialect dataset of about 25K Arabic tweets. The tweets were annotated via crowdsourcing, and the dataset is well-balanced over five main groups of Arabic dialects: Egyptian, Maghrebi, Levantine, Gulf, and Iraqi.
Download
Download the dataset from here (.zip).
Related Publication
Israa Alsarsour, Esraa Mohamed, Reem Suwaileh, and Tamer Elsayed: DART: A Large Dataset of Dialectal Arabic Tweets. LREC 2018
AutoTweet: Dataset for Detecting Automatically-Generated Arabic Tweets
We provide two datasets to study automation behavior in Arabic tweets. Both are released as tab-separated text files. We describe the content of each as follows:
Full Dataset: contains a total of 11,764 unique UserIDs and a total of 1,281,708 TweetIDs.
Labeled Tweets dataset: contains the TweetIDs, TweetText, and Labels of 3,503 tweets obtained via crowdsourcing; 1,944 of the tweets are labeled as automated and 1,559 as manual.
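Since both files are tab-separated, the labeled tweets can be read with Python's standard csv module. A sketch assuming a hypothetical column order (TweetID, TweetText, Label); check the released files for the exact layout:

```python
import csv
import io

# Stand-in for the released labeled-tweets TSV (hypothetical rows).
tsv = "1001\tsample tweet text one\tautomated\n1002\tsample tweet text two\tmanual\n"

rows = list(csv.reader(io.StringIO(tsv), delimiter="\t"))
labels = [label for _tweet_id, _text, label in rows]
print(labels)  # ['automated', 'manual']
```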
Download
Download AutoTweet-Dataset-v1.0 from here (.zip).
Related Publications
Hind Almerekhi and Tamer Elsayed: Detecting Automatically-Generated Arabic Tweets. AIRS 2015
Journalists Questions on Twitter
We provide two datasets to support question identification and question-type classification in Arabic tweets posted by journalists. Both are released as tab-separated text files. We describe the content of each as follows:
Labelled Tweets dataset: A list of tweet IDs for Arabic tweets labelled via crowdsourcing. Each tweet is associated with one label: question tweet or not. A question tweet is a tweet that contains at least one interrogative sentence.
Labelled Question Tweets dataset: A list of tweet IDs for Arabic question tweets labelled by in-house annotators. Each question tweet is associated with one label, namely the question type (given a taxonomy of 8 types).
Download
ArQAT-JQ-Dataset-v1.0: download zip file.
Related Publication
Maram Hasanain, Mossaab Bagdouri, Tamer Elsayed, Douglas Oard: What Questions Do Journalists Ask on Twitter? The Workshops of the AAAI Conference on Web and Social Media, 2016
Answerable Question Identification in Arabic Tweets
Download
ArQAT-AQI-Dataset-v1.0: download txt file.
Related Publication
Maram Hasanain, Tamer Elsayed, and Walid Magdy: Identification of Answer-Seeking Questions in Arabic Microblogs. CIKM 2014
Question Identification in Arabic Tweets
Download
ArQAT-QI-Dataset-v1.0: download zip file.
Related Publication
Maram Hasanain, Tamer Elsayed, and Walid Magdy: Identification of Answer-Seeking Questions in Arabic Microblogs. CIKM 2014