AIMS
The project aims to analyze online hate speech by examining the linguistic forms and the pragmatic strategies through which prejudice and discrimination are expressed in computer-mediated communication.
Hate speech is a phenomenon that spans multiple linguistic levels—not only lexical, morphological, syntactic, and semantic, but also interactional and pragmatic. It is therefore essential to examine it in relation to the social variables, and corresponding analytical dimensions, from which it arises, such as age, gender, ethnicity, nationality, and, more broadly, identity (sexual, social, ideological, or religious).
First, the project seeks to construct comparable corpora of texts that represent these variables, drawn from various online forums, including comment sections of news sites and social networks.
Second, it will analyze and annotate the linguistic structures and the discoursal organization of the collected texts.
Finally, the project aims to contextualize the resulting data within the sociological, cultural, psychological, legal, and educational realities of the community, with a view to designing coordinated strategies for awareness-raising and intervention.
METHODOLOGY
The methodology is primarily linguistic, while drawing on several adjacent disciplines. The project combines two main areas of Linguistics: Corpus Linguistics and Pragmatics. It also incorporates approaches from other academic areas, such as Computer Science, Cultural Studies, Sociology, Psychology, Education, and Law.
The first step will consist in compiling comparable corpora produced through online written interaction, focusing on a range of similar topics but in different languages, namely Portuguese and English.
The second step will involve the annotation of the compiled texts at different levels, that is, parsing and tagging them according to distinct analytical purposes. Annotation will begin with metadata, which, among other information, will stratify the textual samples according to sociolinguistic variables such as age, gender, ethnicity, nationality, and social class. This will pave the way for studies in language variation.
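As an illustration only, the metadata layer described above could be represented as one record per textual sample, with a helper that groups samples by any single variable. The field names, the value labels, and the placeholder texts below are assumptions made for this sketch, not the project's actual annotation scheme.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One textual sample plus hypothetical metadata fields."""
    text: str
    language: str   # assumed codes: "pt" or "en"
    source: str     # assumed labels, e.g. "news_comments"
    age_group: str  # assumed age bands
    gender: str     # assumed labels


def stratify(samples, variable):
    """Group samples by the value of one metadata field."""
    strata = defaultdict(list)
    for sample in samples:
        strata[getattr(sample, variable)].append(sample)
    return dict(strata)


# Placeholder data, invented purely for the example.
samples = [
    Sample("placeholder comment A", "en", "news_comments", "18-29", "f"),
    Sample("placeholder comment B", "pt", "social_network", "30-49", "m"),
    Sample("placeholder comment C", "en", "social_network", "18-29", "m"),
]

by_language = stratify(samples, "language")
by_age = stratify(samples, "age_group")
```

Keeping the metadata separate from the text means the same corpus can be re-stratified by a different variable without re-annotating it.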
Structural taggers will then be applied, providing information on the morpho-syntactic organization of the texts, including segmentation, tokenization, part-of-speech tagging, and lemmatization. This is expected to enable the analysis of instances of pronominalization, passivization (e.g. auxiliary verb–participle constructions), nominalization (e.g. -ing and -tion suffixes), and modalization, which, among other structural mechanisms, convey discursive intent.
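The kind of surface evidence mentioned above can be sketched with a deliberately naive, English-only heuristic. It uses a regular-expression tokenizer instead of a real part-of-speech tagger, flags words ending in -ing or -tion as nominalization candidates, and flags a form of "be" followed by a word ending in -ed or -en as a passive candidate. All function names are hypothetical, and the heuristic over-generates (for instance, verbal -ing forms are also flagged), so its output would still need manual or tagger-based verification.

```python
import re

# Words (with optional apostrophe part), digit runs, or single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+|[^\w\s]")
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def nominalization_candidates(tokens):
    """Return tokens longer than five letters ending in -ing or -tion."""
    return [t for t in tokens
            if len(t) > 5 and t.lower().endswith(("ing", "tion"))]


def passive_candidates(tokens):
    """Return (auxiliary, participle-like word) pairs of adjacent tokens."""
    pairs = []
    for first, second in zip(tokens, tokens[1:]):
        if first.lower() in BE_FORMS and second.lower().endswith(("ed", "en")):
            pairs.append((first, second))
    return pairs


sentence = "The regulation was announced after the meeting."
tokens = tokenize(sentence)
nominals = nominalization_candidates(tokens)  # ["regulation", "meeting"]
passives = passive_candidates(tokens)         # [("was", "announced")]
```

A part-of-speech tagger would replace the suffix checks with tag checks, which is why tagging and lemmatization precede this kind of analysis in the pipeline.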
Finally, content taggers will be used to design a descriptive ontology of the corpora, lending itself to a range of interpretive tasks. This analytical treatment of language will address not only lexical-semantic recurrences and stylistic patterns in electronic polylogues, but also the discursive and pragmatic organization of the texts.
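One simple way to surface the lexical recurrences mentioned above is to count word sequences that repeat across texts. The sketch below is an assumption about how such a count might be done, not the project's content tagger: it splits on whitespace, counts bigrams, and keeps those that reach a minimum frequency. The function names, the threshold, and the placeholder texts are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def recurrent_ngrams(texts, n=2, min_count=2):
    """Count n-grams over all texts and keep those seen at least min_count times."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenization
        counts.update(ngrams(tokens, n))
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Placeholder texts, invented purely for the example.
texts = [
    "placeholder phrase one",
    "another placeholder phrase",
    "unrelated words here",
]

recurrent = recurrent_ngrams(texts)  # {("placeholder", "phrase"): 2}
```

Recurrent sequences found this way are only candidates; deciding which of them belong to a category in the descriptive ontology remains an interpretive step.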