Dr. Sanaa Kaddoura

Datasets

This page presents datasets developed as part of my research in artificial intelligence and large language models (LLMs). Each dataset has been curated and annotated for developing and evaluating machine learning models, including LLMs. These resources are intended for researchers, students, and practitioners working on socially impactful AI applications. Citation and access information is provided for each dataset below.

Multiclass English Hate Speech Dataset

If you use the dataset, cite this paper: "Mapping Multiclass-Targeted Hate Speech in Online Discourse: An Open Dataset"

Paper Link

Dataset for Arabic Word Sense Disambiguation

If you use the dataset, cite this paper: "A Comprehensive Dataset for Arabic Word Sense Disambiguation." Data in Brief (2024): 110591.

Paper Link

Dataset for Arabic Spam and Ham Tweets

If you use the dataset, cite this paper: "Dataset of Arabic spam and ham tweets"

Paper Link
