Resources and tools

POeTiSA: POrtuguese processing - Towards Syntactic Analysis and parsing

Porttinari is intended to be a large multi-genre corpus of Brazilian Portuguese texts manually annotated according to the Universal Dependencies model. As reported in this paper, it is currently under construction and is composed of news texts, user-generated content, and transcribed speech selected from the following corpora:

The first version of the journalistic portion of Porttinari is already available at this link. It includes three partitions, namely Porttinari-base, Porttinari-check, and Porttinari-automatic, as detailed in this paper.

The latest annotated version of the DANTEStocks corpus (version 1.1, of May 3, 2024) is available at this link. Compared with the previous versions, it incorporates the following improvements: exclusion of sentences written entirely in English, and correction of issues in tokenization, decimal point representation, and CoNLL-U structuring. For interested users, the previous versions of this corpus are also available at the following links: versions of December 15, 2022 and November 16, 2022.
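Since these corpora are distributed in the CoNLL-U format, a minimal sketch of how such a file might be read is given below. It uses only the Python standard library and assumes the standard ten tab-separated CoNLL-U columns; the function name `read_conllu` and the sample sentence are illustrative and are not taken from the corpora themselves.

```python
from typing import Dict, Iterable, Iterator, List

# The ten columns defined by the CoNLL-U format, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment lines (e.g. "# text = ...") are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}: {line!r}")
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        # Handle a final sentence not followed by a blank line.
        yield sentence


# Illustrative sample only (made up for this sketch, not corpus data).
SAMPLE = (
    "# text = A bolsa subiu.\n"
    "1\tA\to\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tbolsa\tbolsa\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tsubiu\tsubir\tVERB\t_\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE.splitlines()))
for token in sentences[0]:
    print(token["id"], token["form"], token["upos"], token["head"], token["deprel"])
```

To read a downloaded file instead of the sample, an open file handle can be passed directly, for example `read_conllu(open(path, encoding="utf-8"))`, where `path` is wherever the corpus file was saved.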

Other corpora and lexical resources

Tools and applications

Related third-party products