Spam SMS in Dravidian Languages
The Dravidian Spam SMS dataset contains spam and ham messages in English, Tamil, Telugu, Kannada, and Malayalam. Nearly 7,700 messages were collected through a Google Form circulated to friends and other contacts. Annotators with reading and writing proficiency in each language carefully labeled the messages in that language. The dataset also includes Tamil messages typed verbatim in English letters, for example, “Nee Nalama”. The ham messages are mostly ordinary, everyday messages. The spam messages include business promotions and annoying or unnecessary messages sent by anonymous users. Detailed information on the dataset is given in the image. Unlike other datasets, this dataset does not contain users' personal or banking information.
The dataset is available on IEEE Dataport. To download it, please refer to the link: https://dx.doi.org/10.21227/dcym-pd69
Cite this dataset as: Ramanujam Elangovan, Abirami A M, June 2, 2023, "Spam SMS in Dravidian Languages", IEEE Dataport, doi: https://dx.doi.org/10.21227/dcym-pd69.
Spam SMS in Hindi Language
The Hindi Spam SMS dataset comprises 3,894 messages, each labeled as either spam or ham. It was curated with contributions from students, who gathered messages they had received themselves as well as messages shared by friends and peers, giving a diverse and realistic sample of real-world SMS communication in Hindi. The dataset primarily contains messages written in Hindi, reflecting the linguistic and cultural context in which it was collected. The ham messages are normal conversations, while the spam messages typically consist of unsolicited promotional content, irrelevant information, or annoying messages from anonymous users. The dataset was curated with privacy in mind and does not include sensitive personal or financial information, which distinguishes it from other datasets in this domain.
The dataset is available on IEEE Dataport. To download it, please refer to the link: https://dx.doi.org/10.21227/5y8x-n678
Cite this dataset as: Rajkamal Tutu Ponnekanty Y, Ashutosh Sahoo, Faizal Shanavas Puthiyaveettil, Ramanujam Elangovan, Abirami A M, December 18, 2024, "Spam SMS in Hindi Language", IEEE Dataport. https://dx.doi.org/10.21227/5y8x-n678.
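Because the two datasets together span several languages, and the Dravidian dataset also contains Tamil typed in English letters, it can be useful to group messages by writing script before further analysis. The following is only a minimal sketch and not part of either dataset: the names SCRIPT_RANGES, char_script, and dominant_script are hypothetical, and it assumes that the Hindi messages are stored in Devanagari and the Dravidian-language messages in their native scripts. It relies only on the standard Unicode block ranges for each script and does not read the dataset files, whose format is not described here.

```python
from collections import Counter

# Unicode block ranges (inclusive) for the scripts of the languages named above.
# Assumption: messages use these native scripts unless typed in Latin letters.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),  # used for Hindi
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}


def char_script(ch):
    """Return the script name for one character, or None if it is not a letter
    of any script tracked here (digits, spaces, punctuation, emoji, ...)."""
    cp = ord(ch)
    for name, (lo, hi) in SCRIPT_RANGES.items():
        if lo <= cp <= hi:
            return name
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return None


def dominant_script(message):
    """Return the script that accounts for the most characters in the message,
    or "Unknown" if no tracked script appears at all."""
    counts = Counter(s for s in map(char_script, message) if s is not None)
    if not counts:
        return "Unknown"
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for text in ["Nee Nalama", "நீ நலமா", "नमस्ते", "12345"]:
        print(repr(text), "->", dominant_script(text))
```

Note that a result of "Latin" does not distinguish English from Tamil typed in English letters (such as “Nee Nalama”). Separating those would need a language-level method, which this sketch does not attempt.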