In this interinstitutional project, we aim to explore rhetorical relations in Brazilian Portuguese (BP) through annotation based on Rhetorical Structure Theory (RST), a widely adopted discourse theory in Computational Linguistics that is used extensively in Natural Language Processing (NLP) research areas such as automatic summarization (AS), anaphora resolution, machine translation, and sentence polarity classification.
The project is structured into two main phases: (i) revisiting and analyzing the RST annotation of the CSTNews corpus and reviewing existing work in BP; and (ii) conducting a study of RST annotation to identify new linguistic cues for marking rhetorical discourse relations, inspired by Das and Taboada (2019) for English.
Through these activities, we aim to expand the project with initiatives that enhance RST linguistic resources for Portuguese, propose computational applications, and support comparative studies of rhetorical relations between BP and other natural languages.
This project is linked to the POrtuguese processing - Towards Syntactic Analysis and Parsing (POeTiSA) project, which aims to develop linguistic-computational tools and applications for Brazilian Portuguese and thereby advance the state of the art in Natural Language Processing.
The POeTiSA project is part of the Natural Language Processing initiative (NLP2) of the Center for Artificial Intelligence (C4AI) of the University of São Paulo, sponsored by IBM and FAPESP (grant #2019/07665-4). The center is part of the FAPESP Engineering Research Centers Program and is committed to state-of-the-art research in Artificial Intelligence, exploring both foundational issues and applied research. See the web portal of NLP2 at this link. The POeTiSA initiative is also supported by the Ministry of Science, Technology and Innovation, with resources from Law No. 8,248, of October 23, 1991, under the PPI-SOFTEX, coordinated by Softex and published as Residency in TIC 13, DOU 01245.010222/2022-44. The project is also supported by an additional research grant for a related project coordinated by Prof. Ivandré Paraboni (FAPESP #2021/08213-0).
Furthermore, our project benefits from Scientific Initiation scholarships granted by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), through calls for proposals from the Programa Institucional de Bolsas de Iniciação Científica (PIBIC), and by the Fundação de Amparo à Pesquisa do Estado da Bahia (FAPESB), through the public notice for the "First Projects Program".