Resources and tools

Below is a sample of the resources and tools produced by NILC. Many more research products are available on the project websites.

Lexical resources

PortiLexicon-UD: a lexicon for Brazilian Portuguese according to Universal Dependencies

Unitex-PB: a lexicon for Brazilian Portuguese

Word embeddings for Portuguese

Lexical resources for Portuguese: PropBank, VerbNet, LIWC and other resources

WordNetBr: a lexical database for Portuguese

Thesaurus for Portuguese

Aeiouadô: a machine-readable pronunciation dictionary for Brazilian Portuguese

Corpora and datasets

NILC corpus: a large corpus for Brazilian Portuguese

Lácio-Web: Brazilian Portuguese corpora and analysis tools

TweetSentBR: a large corpus of tweets in Brazilian Portuguese manually labeled according to their polarities

Mac-Morpho: a reference corpus for POS tagging in Portuguese

Datasets of neuropsychological language tests in Brazilian Portuguese

CSTNews: a corpus with several linguistic annotation layers

OpSums-PT: a corpus of opinion summaries

Text complexity for educational levels

Historical Portuguese corpora

Fake.Br corpus: a corpus of aligned true and fake news in Brazilian Portuguese

UTLCorpus: a corpus of online reviews in Brazilian Portuguese annotated with helpfulness classification

AMR-BP: semantically annotated corpora for Brazilian Portuguese (according to Abstract Meaning Representation)

PLN-BR corpus: a journalistic corpus for Brazilian Portuguese


Stemming for Portuguese

Lemmatization for Portuguese

A flexible normalizer for user-generated content in Portuguese

SENTER: sentence segmenter for Portuguese

Neologism detection tool for Portuguese

e-Termos: a system for terminology management

HABLA project

Automatic phonetic transcription for Brazilian Portuguese

Part-of-speech tagging for Portuguese

Curupira parser

NLP with neural networks: part-of-speech tagging and semantic role labeling

Semantic parser (following the Abstract Meaning Representation) for Portuguese

OPCluster-PT: automatic identification and clustering of opinion aspects in Portuguese

DiZer: DIscourse analyZER for Portuguese (according to the Rhetorical Structure Theory)

CST Parser: a multi-document discourse parser for Portuguese (according to Cross-document Structure Theory)

Topic segmentation for Portuguese (an adaptation of TextTiling)

RST Toolkit: a collection of software for dealing with RST-based discourse annotated texts

Evaluation tool for RST-based discourse trees

Text aligners

A tool for sentence ordering for texts in Portuguese

NILC-Wise: web interface for summary evaluation

UDConcord: a concordancer for Universal Dependencies-annotated data

RSumm: multi-document summarization for Portuguese

GistSumm: a classical text summarization system for Portuguese

An English pronunciation checker

Machine translation portal

Educational Facilita

SciPo: Scientific Portuguese

SciPo-Farmácia (for English)

CALeSE: Computer-Aided Learning Tool for Scientific Writing in English

Scientific writing portal

FakeCheck: fake news detection for Portuguese

Reference materials

ABNT rules

Compound words in Portuguese

Mini-grammar for Portuguese