PLN-BR is a journalistic corpus compiled to support NLP research on Brazilian Portuguese.
The corpus comprises more than 26 million tokens from more than 96 thousand texts from the newspaper Folha de São Paulo. The collection consists of news published during one month per year, from 1994 to 2005.
If you are interested in using the corpus for academic purposes only, send an email to sandra @ icmc . usp . br