POeTiSA

POrtuguese processing - Towards Syntactic Analysis and parsing

POeTiSA is a long-term project that aims to grow syntax-based resources and develop related tools and applications for Brazilian Portuguese, with the goal of achieving state-of-the-art results in this area. On the resource side, we focus on producing a large and comprehensive multi-genre corpus of texts annotated with part-of-speech tags and syntactic structure following Universal Dependencies, drawn mainly from news texts and user-generated content (tweets and online comments). On the tools side, we aim to investigate recent neural and distributional methods for training robust parsing models for Portuguese. The project also envisions applications in opinion mining and sentiment analysis tasks that may benefit from syntactic knowledge, such as opinion summarization, helpfulness prediction, aspect identification, deception detection and emotion classification.

This project is part of the Natural Language Processing initiative (NLP2) of the Center for Artificial Intelligence (C4AI) of the University of São Paulo, sponsored by IBM and FAPESP (grant #2019/07665-4). The center is part of the FAPESP Engineering Research Centers Program and is committed to state-of-the-art research in Artificial Intelligence, exploring both foundational issues and applied research. More information is available on the NLP2 web portal. The POeTiSA initiative is also supported by the Ministry of Science, Technology and Innovation, with resources from Law No. 8,248, of October 23, 1991, under the PPI-SOFTEX, coordinated by Softex and published as Residency in TIC 13, DOU 01245.010222/2022-44. The project also benefits from an additional research grant for a related project coordinated by Prof. Ivandré Paraboni (FAPESP #2021/08213-0).