AIMS

The project aims to analyze online hate speech by examining the linguistic forms and the pragmatic strategies that prejudice and discrimination exhibit in computer-mediated communication.

Hate speech is a phenomenon that spans a variety of linguistic levels: it operates not only at the lexical, morphological, syntactic, and semantic levels, but also (inter)actionally and pragmatically. It is therefore important to relate it to the social variables from which it stems, and to the corresponding analytical dimensions, such as age, gender, ethnicity, nationality and, more broadly, identity (be it sexual, social, ideological, or religious).

First, the project aims to construct comparable corpora of texts from various forums (comment boards of news sites and social networks) that represent these variables.

Then, it aims to analyze and annotate the linguistic structures and the discoursal organization of these texts.

Finally, it aims to contextualize the data obtained in the sociological, cultural, psychological, legal, and educational reality of the community, so as to design concerted strategies for raising awareness of hate speech and combating it.

METHODOLOGY

The methodology is primarily linguistic, while drawing on a number of neighbouring scientific domains. The project combines two main areas of Linguistics: Corpus Linguistics and Pragmatics. It also includes approaches from other academic areas, such as Computer Science, Culture Studies, Sociology, Psychology, Education, and Law.

The first step will be the compilation of comparable corpora of online written interaction, covering a range of similar topics in two languages: Portuguese and English.

The second step will be the annotation of the compiled texts at different levels, that is, parsing and tagging them according to different analytical purposes. Annotation will begin with metadata, which, among other information, will stratify the textual samples according to sociolinguistic variables such as age, gender, ethnicity, nationality, and social class. This will pave the way for studies in language variation.
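As an illustration of how metadata could stratify the samples, the following minimal Python sketch groups sample identifiers by one sociolinguistic variable. The record layout, the field names, and the `stratify` helper are assumptions made for this example only; the project does not specify a metadata schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SampleMetadata:
    """Hypothetical metadata record for one text sample (not the project's schema)."""
    sample_id: str
    language: str                      # e.g. "pt" or "en"
    source: str                        # e.g. "news_comments" or "social_network"
    age_group: Optional[str] = None
    gender: Optional[str] = None
    ethnicity: Optional[str] = None
    nationality: Optional[str] = None
    social_class: Optional[str] = None


def stratify(samples, variable):
    """Group sample ids by the value of one sociolinguistic variable.

    Samples with no recorded value are collected under "unknown".
    """
    strata = defaultdict(list)
    for sample in samples:
        value = getattr(sample, variable)
        key = value if value is not None else "unknown"
        strata[key].append(sample.sample_id)
    return dict(strata)


# Invented toy records, for illustration only.
samples = [
    SampleMetadata("s1", "pt", "news_comments", age_group="18-25", gender="f"),
    SampleMetadata("s2", "en", "social_network", age_group="18-25"),
    SampleMetadata("s3", "en", "news_comments", age_group="40-60", gender="m"),
]
by_age = stratify(samples, "age_group")
by_gender = stratify(samples, "gender")
```

Keeping missing values in an explicit "unknown" stratum, rather than dropping the sample, reflects the fact that online texts often come without reliable information about their authors.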

Structural taggers will be applied next, providing information about the morpho-syntactic organization of the texts, including segmentation, tokenization, part-of-speech tagging, and lemmatization. This should allow for an analysis of instances of pronominalization, passivization (as in auxiliary verb/participle constructions), nominalization (as in -ing and -tion suffixes), and modalization, which, among other structural mechanisms, carry discursive intent.
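The kind of structural cue mentioned above can be sketched with a few lines of Python. This is a rough, English-only heuristic written for illustration, not the project's tagger: the token pattern, the list of forms of "be", and the suffix tests are all assumptions, and a real pipeline would rely on a proper part-of-speech tagger and lemmatizer. The heuristics over-generate (for instance, "morning" would count as an -ing candidate) and miss irregular participles.

```python
import re

# Words (optionally with internal hyphen or apostrophe) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

# Assumed list of auxiliary "be" forms used to spot passive candidates.
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}

# Suffixes named in the text as signals of nominalization.
NOMINAL_SUFFIXES = ("tion", "ing")


def tokenize(text):
    """Split a text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def nominalization_candidates(tokens):
    """Return tokens longer than five characters ending in -tion or -ing."""
    return [t for t in tokens if len(t) > 5 and t.lower().endswith(NOMINAL_SUFFIXES)]


def passive_candidates(tokens):
    """Return (auxiliary, participle-like) pairs of adjacent tokens."""
    pairs = []
    for first, second in zip(tokens, tokens[1:]):
        if first.lower() in BE_FORMS and second.lower().endswith(("ed", "en")):
            pairs.append((first, second))
    return pairs


# Invented example sentence.
text = "The publication was removed, and the posting of comments was forbidden."
tokens = tokenize(text)
nominals = nominalization_candidates(tokens)   # ['publication', 'posting']
passives = passive_candidates(tokens)          # [('was', 'removed'), ('was', 'forbidden')]
```

Candidates found this way would still need manual or tagger-based validation before being counted as instances of nominalization or passivization.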

Finally, content taggers will help design a descriptive ontology of the corpora that will lend itself to various interpretive tasks. This analytical treatment of language will deal not only with the lexical-semantic recurrences and the stylistic patterns of the electronic polylogues, but also with the discursive and pragmatic organization of the texts.