Abstract
A meme is a combination of image(s) captioned with a witty, catchy, or sarcastic text. Though memes are created with the intention of having fun or evoking humor, a person or group of people may become upset if a meme crosses the boundary of humor. Such memes are referred to as troll memes, and memes shared with the intention of trolling need to be filtered out from social media, as they may hurt people's sentiments and create an unhealthy atmosphere in society. The growing number of social media users and the rising volume of troll memes demand automated tools to identify and filter out such memes. Processing memes in low-resource regional languages like Kannada, Tamil, Telugu, etc., and classifying them as Trolls or Not_Trolls is a challenging task due to the scarcity of resources and computational tools for these languages. Further, as memes are made up of both image and text, processing memes presents a unique opportunity for multi-modal learning approaches. Analyzing both the visual and textual elements of these memes can enhance the learning ability of the models. To address the identification of troll memes, we intend to organize a demonstration on "Multi-modal Troll Memes Identification in Dravidian Languages" that addresses multi-modal (text and image) learning with different feature representations for Dravidian languages (Kannada, Tulu, and Tamil). The demonstration covers an introduction to multi-modal learning and hands-on sessions (demo code) on feature representations to identify trolling memes.
A meme is a combination of image(s) captioned with a witty, catchy, or sarcastic text. Though memes are created with the intention of having fun or evoking humor, a person or group of people may become upset if a meme crosses the boundary of fun or humor. Further, such memes may be offensive or threatening, targeting an individual or a group. In recent years, memes have become popular specifically as a vehicle for trolling.
As per the definition, "Trolling is the activity of posting a message via social media that tends to be offensive, provocative, or menacing, or of posting distracting, digressive, or off-topic content with the intent of provoking the audience" (Bishop, 2014). People exploit the anonymity and freedom provided by social media to create interesting and funny memes of their choice, using pictures of celebrities, caricatures, or cartoons and embedding textual content on these images in their native language or in code-mixed text. Memes can be broadly classified into: i) Trolling memes - memes created with humorous content with the intent of targeting or provoking an individual, a group, or a community, and ii) Non-Trolling memes - memes created just for fun or to convey some information directly without hurting anybody's sentiments.
Most of the research carried out to identify troll memes has focused on the English language, leaving behind other languages, specifically under-resourced Indian languages like Kannada, Malayalam, Telugu, Tamil, etc. Memes comprise both image and text, and considering only the text or only the image may not be a feasible approach for developing learning models. This has encouraged researchers to develop multi-modal approaches that take both the text and image modalities into account when training learning models. Further, the textual information in such trolls usually comes in a code-mixed format, where a comment/troll can contain words belonging to more than one language at the sentence/word/sub-word level, adding another dimension to developing learning models for classifying memes. As troll memes are increasing, there is a huge demand to develop tools/models to identify troll memes in code-mixed, under-resourced languages such as Kannada, Telugu, Malayalam, etc.
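To make the code-mixed aspect concrete, the following minimal sketch represents code-mixed captions with character n-gram TF-IDF features and trains a logistic regression classifier, in the spirit of the character-level n-gram study of Balouchzahi et al. (2022). It assumes scikit-learn is installed; the toy Kannada-English captions and labels are purely illustrative and are not part of the demonstration's data or code.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy romanized Kannada-English code-mixed captions; labels: 1 = Troll, 0 = Not_Troll.
texts = [
    "yen guru ee movie full waste",          # mocking caption
    "super agide ee meme nodi nagu banthu",  # harmless, humorous caption
    "avan acting nodidre nakku barutte",     # mocking caption
    "thumba chennagide good information",    # harmless, informative caption
]
labels = [1, 0, 1, 0]

# Character n-grams capture sub-word patterns, making the representation robust
# to the spelling and script variation common in code-mixed text.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(texts, labels)
print(pipeline.predict(["ee meme full comedy guru"]))  # predicted label for a new caption

Such text-only features form one half of a multi-modal pipeline; the image modality is handled separately and combined with the text representation, as outlined later in the demo.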
The demonstration comprises the following points:
• Code-mixed text
• Multi-modal data
• Feature representation for multi-modal data
• Multi-modal learning models for the identification of troll memes
By focusing on the above-mentioned aspects, our demo aims to provide valuable insights into the classification of troll memes, particularly in Dravidian languages. Through this demonstration, we emphasize the significance of employing multi-modal learning techniques; a minimal sketch of such a model is given below.
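As an illustration of the multi-modal learning covered in the demo, the sketch below builds a joint (early-fusion) representation by concatenating a multilingual text embedding with a CNN image embedding and feeding the result to a linear classifier. This is a minimal sketch assuming PyTorch, torchvision, and Hugging Face transformers are available; the model names, feature dimensions, and toy inputs are illustrative assumptions, not the exact configuration used in the demonstration.

import torch
import torch.nn as nn
from torchvision.models import resnet18
from transformers import AutoTokenizer, AutoModel

class TrollMemeClassifier(nn.Module):
    """Joint representation: concatenated text and image features fed to a linear classifier."""
    def __init__(self, text_model="bert-base-multilingual-cased", num_classes=2):
        super().__init__()
        # Multilingual text encoder (768-dim [CLS] embedding); illustrative choice.
        self.text_encoder = AutoModel.from_pretrained(text_model)
        # Image encoder: ResNet-18 with its classification head removed (512-dim features).
        cnn = resnet18(weights=None)
        cnn.fc = nn.Identity()
        self.image_encoder = cnn
        # Classifier over the joint (concatenated) representation.
        self.classifier = nn.Linear(768 + 512, num_classes)

    def forward(self, input_ids, attention_mask, pixel_values):
        text_feat = self.text_encoder(input_ids=input_ids,
                                      attention_mask=attention_mask).last_hidden_state[:, 0]
        image_feat = self.image_encoder(pixel_values)
        joint = torch.cat([text_feat, image_feat], dim=-1)  # early fusion of the two modalities
        return self.classifier(joint)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = TrollMemeClassifier()
enc = tokenizer(["yen guru ee meme nodi"], return_tensors="pt", padding=True, truncation=True)
dummy_image = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed meme image
logits = model(enc["input_ids"], enc["attention_mask"], dummy_image)
print(logits.shape)  # torch.Size([1, 2]) -> scores for Troll vs. Not_Troll

Concatenation is only the simplest form of joint representation; the hands-on part of the demonstration walks through different feature representations for the text and image modalities.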
• Natural Language Processing
• Under-resourced language
• Multi-modal learning models
• Joint representation
Hosahalli Lakshmaiah Shashirekha
Email: hlsrekha@mangaloreuniversity.ac.in
Website: https://mangaloreuniversity.ac.in/shashirekha
Professor, Department of Computer Science,
Mangalore University, Mangalore - 574199, India
Additional Link:
Asha Hegde
Email: hegdekasha@gmail.com
Website: https://sites.google.com/view/ashahegde/home
PhD student, Department of Computer Science,
Mangalore University, Mangalore - 574199, India
Additional Link:
The team has wide experience in processing code-mixed, Dravidian, and low-resource language texts and has successfully organized a shared task on word-level Language Identification (LI) in code-mixed Tulu at FIRE 2023 (Balouchzahi et al., 2022; Lakshmaiah et al., 2022).
H. L. Shashirekha is a Professor of Computer Science at Mangalore University, India. Her specialization lies in NLP for low-resource and Dravidian languages, with broad experience in code-mixed text processing, especially in the Kannada language (Hegde et al., 2022, 2023). She has co-authored more than 70 scientific publications with an h-index of 13. She is an active researcher and has organized shared tasks in various workshops, including DravidianLangTech 2022, ICON 2022, DravidianLangTech 2023, and FIRE 2023.
Asha Hegde has participated in more than 10 shared tasks and is a highly experienced researcher in organizing shared tasks on low-resource languages, such as Machine Translation in Dravidian Languages at ACL 2022, the CoLI-Kanglish shared task at ICON 2022, Sentiment Analysis in Tamil and Tulu at DravidianLangTech@RANLP 2023, and Tulu word-level LI (CoLI-Tunglish). She has published more than 13 research articles with H. L. Shashirekha, most of them on applications for Dravidian languages (Hegde et al., 2022, 2023).
Email: kavyamujk@gmail.com
Lecturer, Department of Computer Science,
Mangalore University, Mangalore - 574199, India
Email: sharalmucs@gmail.com
PhD student, Department of Computer Science,
Mangalore University, Mangalore - 574199, India
Fazlourrahman Balouchzahi, Hosahalli Lakshmaiah Shashirekha, Grigori Sidorov, and Alexander Gelbukh. 2022. A Comparative Study of Syllables and Character Level N-grams for Dravidian Multi-script and Code-mixed Offensive Language Identification. Journal of Intelligent & Fuzzy Systems, (Preprint):1–11.
Jonathan Bishop. 2014. Representations of 'Trolls' in Mass Media Communication: A Review of Media-Texts and Moral Panics Relating to 'Internet Trolling'. International Journal of Web Based Communities, pages 7–24. Inderscience Publishers Ltd.
Asha Hegde, Sharal Coelho, and Hosahalli Shashirekha. 2022. MUCS@DravidianLangTech@ACL2022: Ensemble of Logistic Regression Penalties to Identify Emotions in Tamil Text. In Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pages 145–150.
Asha Hegde, Hosahalli Lakshmaiah Shashirekha, Anand Kumar Madasamy, and Bharathi Raja Chakravarthi. 2023. A Study of Machine Translation Models for Kannada-Tulu. In Third Congress on Intelligent Systems: Proceedings of CIS 2022, pages 145–161. Springer.
Shashirekha Hosahalli Lakshmaiah, Fazlourrahman Balouchzahi, Mudoor Devadas Anusha, and Grigori Sidorov. 2022. CoLI-Machine Learning Approaches for Code-mixed Language Identification at the Word Level in Kannada-English Texts. Acta Polytechnica Hungarica, pages 123–141.