Opinion Mining for Portuguese

Concept-based Approaches and Beyond

According to Liu (2012), "sentiment analysis, also called opinion mining, is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes". Cambria (2013) and Cambria et al. (2015) go on to define what they call concept-level sentiment analysis, which, according to the authors, seeks a deeper understanding of the texts of interest in order to produce better results. To do so, it draws on more sophisticated NLP tasks for extracting opinionated information from text, including microtext analysis, semantic parsing, subjectivity detection, anaphora resolution, sarcasm detection, topic spotting, aspect extraction, and polarity detection.

The OPINANDO project aimed to investigate concept-level analysis for Brazilian Portuguese. We were particularly interested in three main research fronts: (i) the identification of relevant texts to mine, which included tackling text importance and filtering deceptive content; (ii) the analysis of the selected texts, performing the necessary semantic and discourse analysis and identifying subjective content and the corresponding aspects and polarities; and (iii) the synthesis of the relevant information, using text summarization and generation strategies and dealing with the challenges these tasks raise.

The project was officially funded by the USP Research Office (PRP N. 668, from May 2019 to April 2020) and received student scholarships from the FAPESP, CAPES and CNPq agencies.

Related publications

  • Dias, M.S.; Di Felippo, A.; Rassi, A.P.; Cardoso, P.C.F.; Nóbrega, F.A.A.; Pardo, T.A.S. (2021). An investigation of linguistic problems in automatic multi-document summaries. Revista de Estudos da Linguagem, Vol. 29, N. 2, pp. 859-907. link to the paper

  • Sobrevilla Cabezudo, M.A. and Pardo, T.A.S. (2020). NILC at WebNLG+: Pretrained Sequence-to-Sequence Models on RDF-to-Text Generation. In the Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+), pp. 131-136. December, 18. pdf

  • Sobrevilla Cabezudo, M.A. and Pardo, T.A.S. (2020). NILC at SR’20: Exploring Pre-Trained Models in Surface Realisation. In the Proceedings of the Third Workshop on Multilingual Surface Realisation (MSR), pp. 50-56. December, 12. pdf

  • Costa, R.W.M. and Pardo, T.A.S. (2020). Métodos baseados em léxico para extração de aspectos de opiniões em português. In the Proceedings of the IX Brazilian Workshop on Social Network Analysis and Mining (BraSNAM), pp. 61-72. November, 16-20. Cuiabá/Brazil. pdf

  • Anchiêta, R.T. and Pardo, T.A.S. (2020). Semantically Inspired AMR Alignment for the Portuguese language. In the Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1595-1600. November, 16-20. pdf

  • Belisário, L.B.; Ferreira, L.G.; Pardo, T.A.S. (2020). Evaluating Richer Features and Varied Machine Learning Models for Subjectivity Classification of Book Review Sentences in Portuguese. Information, Vol. 11, N. 9, pp. 1-14. link to the paper

  • Anchiêta, R.T.; Sousa, R.F.; Pardo, T.A.S. (2020). Modeling the Paraphrase Detection Task over a Heterogeneous Graph Network with Data Augmentation. Information, Vol. 11, N. 9, pp. 1-12. link to the paper

  • Vargas, F.A. and Pardo, T.A.S. (2020). Studying Dishonest Intentions in Brazilian Portuguese Texts. In the Proceedings of the 1st International Workshop on Deceptive AI, pp. 1-13. August, 30. Santiago de Compostela/Spain. pdf

  • Anchiêta, R.T. (2020). Abstract Meaning Representation Parsing for the Brazilian Portuguese Language. PhD Thesis. Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo. 145p. pdf

  • Vargas, F.A. and Pardo, T.A.S. (2020). Linguistic Rules for Fine-Grained Opinion Extraction. In the Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, pp. 1-6. June, 8. pdf

  • Vargas, F.A. and Pardo, T.A.S. (2020). Aspect Clustering for Sentiment Analysis. In T.S. Clary (ed.), Horizons in Computer Science Research, Vol. 18, pp. 213-224. Nova Science Publishers Inc. link to the book

  • Santos, R.L.S.; Wick-Pedro, G.; Leal, S.; Vale, O.A.; Pardo, T.A.S.; Bontcheva, K.; Scarton, C. (2020). Measuring the Impact of Readability Features in Fake News Detection. In the Proceedings of the 12th Language Resources and Evaluation Conference (LREC), pp. 1404-1413. May, 13-15. Marseille/France. pdf

  • Vargas, F.A. and Pardo, T.A.S. (2020). An Automatic Explicit and Implicit Opinion Aspect Clustering Tool for Portuguese. In the Online Proceedings of PROPOR Demonstration Workshop, pp. 1-3. March, 2-4. Évora/Portugal. pdf

  • Wick-Pedro, G.; Santos, R.L.S.; Vale, O.A.; Pardo, T.A.S.; Bontcheva, K.; Scarton, C. (2020). Linguistic Analysis Model for Monitoring User Reaction on Satirical News for Brazilian Portuguese. In the Proceedings of the 14th International Conference on the Computational Processing of Portuguese (PROPOR) (LNAI 12037), pp. 313-320. March, 2-4. Évora/Portugal. link to the paper

  • Santos, R.L.S. and Pardo, T.A.S. (2020). Fact-Checking for Portuguese: Knowledge Graph and Google Search-Based Methods. In the Proceedings of the 14th International Conference on the Computational Processing of Portuguese (PROPOR) (LNAI 12037), pp. 195-205. March, 2-4. Évora/Portugal. link to the paper

  • Nóbrega, F.A.A.; Jorge, A.M.; Brazdil, P.; Pardo, T.A.S. (2020). Sentence Compression for Portuguese. In the Proceedings of the 14th International Conference on the Computational Processing of Portuguese (PROPOR) (LNAI 12037), pp. 270-280. March, 2-4. Évora/Portugal. link to the paper

  • Belisário, L.B.; Ferreira, L.G.; Pardo, T.A.S. (2020). Evaluating Methods of Different Paradigms for Subjectivity Classification in Portuguese. In the Proceedings of the 14th International Conference on the Computational Processing of Portuguese (PROPOR) (LNAI 12037), pp. 261-269. March, 2-4. Évora/Portugal. link to the paper

  • Anchiêta, R.T. and Pardo, T.A.S. (2020). Exploring the Potentiality of Semantic Features for Paraphrase Detection. In the Proceedings of the 14th International Conference on the Computational Processing of Portuguese (PROPOR) (LNAI 12037), pp. 228-238. March, 2-4. Évora/Portugal. link to the paper

  • Bertalan, V.G. and Ruiz, E.E.S. (2020). Predicting judicial outcomes in the Brazilian legal system using textual features. Workshop on Digital Humanities and Natural Language Processing. March, 2-4. Évora/Portugal.

  • Okano, E.Y.; Liu, Z.; Ji, D.; Ruiz, E.E.S. (2020). Fake news detection on Fake.Br using hierarchical attention networks. In the Proceedings of the 14th International Conference on the Computational Processing of Portuguese (PROPOR) (LNAI 12037), pp. 143-152. March, 2-4. Évora/Portugal. link to the paper

  • Silva, R.M.; Santos, R.L.S.; Almeida, T.A.; Pardo, T.A.S. (2020). Towards Automatically Filtering Fake News in Portuguese. Expert Systems with Applications (ESWA), Vol. 146, pp. 1-14. pdf

  • Bertalan, V.G. and Ruiz, E.E.S. (2019). Using topic modeling to find main discussion topics in brazilian political websites. In the Proceedings of the 25th Brazilian Symposium on Multimedia and the Web (WebMedia), pp. 245-248. October, 29 - November, 1. Rio de Janeiro/RJ. pdf

  • Sousa, R.F.; Anchiêta, R.T.; Nunes, M.G.V. (2019). Um método baseado em grafos para predição da utilidade de opiniões sobre produtos. In the Proceedings of the VIII Brazilian Workshop on Social Network Analysis and Mining (BraSNAM), pp. 95-106. Belém/PA. pdf

  • Sousa, R.F.; Brum, H.B.; Nunes, M.G.V. (2019). A bunch of helpfulness and sentiment corpora in brazilian portuguese. In the Proceedings of Symposium in Information and Human Language Technology (STIL), pp. 209-218. Salvador/BA. pdf

  • Silva, R.R. (2019). Sumarização contrastiva de opinião. MSc Dissertation. Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo. 152p. pdf

  • Anchiêta, R.T.; Sobrevilla Cabezudo, M.A.; Pardo, T.A.S. (2019). SEMA: an Extended Semantic Evaluation Metric for AMR. In the Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing). April, 7-13. La Rochelle, France. pdf (preprint version)

  • Belisário, L.B.; Ferreira, L.G.; Pardo, T.A.S. (2019). Classificação de subjetividade para a língua portuguesa. In Anais do VI Workshop de Iniciação Científica em Tecnologia da Informação e da Linguagem Humana (TILic), pp. 358-361. October, 15-18. Salvador/Bahia, Brazil. pdf

  • Silva, R.R. and Pardo, T.A.S. (2019). Córpus 4P: um córpus anotado de opiniões em português sobre produtos eletrônicos para fins de sumarização contrastiva de opinião. In Anais da 6a Jornada de Descrição do Português (JDP), pp. 330-338. October, 15-18. Salvador/Bahia, Brazil. pdf

  • Sobrevilla Cabezudo, M.A.; Mille, S.; Pardo, T.A.S. (2019). Back-Translation as Strategy to Tackle the Lack of Corpus in Natural Language Generation from Semantic Representations. In the Proceedings of the Second Workshop on Multilingual Surface Realization (MSR), pp. 94-103. November, 3. Hong Kong, China. pdf

  • Belisário, L.B.; Ferreira, L.G.; Pardo, T.A.S. (2019). Classificação de Subjetividade para o Português: Métodos Baseados em Aprendizado de Máquina e em Léxico. 27o Simpósio Internacional de Iniciação Científica e Tecnológica da USP (SIICUSP), pp. 1-1. September, 11. São Carlos/SP. Brazil. pdf

  • Sobrevilla Cabezudo, M.A. and Pardo, T.A.S. (2019). Natural Language Generation: Recently Learned Lessons, Directions for Semantic Representation-based Approaches, and the Case of Brazilian Portuguese Language. In the Proceedings of the ACL Student Research Workshop (SRW), pp. 81-88. July, 28 to August, 2. Florence/Italy. pdf

  • Sobrevilla Cabezudo, M.A. and Pardo, T.A.S. (2019). Towards a General Abstract Meaning Representation Corpus for Brazilian Portuguese. In the Proceedings of the 13th Linguistic Annotation Workshop (LAW), pp. 236-244. August, 1. Florence/Italy. pdf

  • Monteiro, R.A. (2018). Detecção Automática de Notícias Falsas. Trabalho de Conclusão de Curso. Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo. São Carlos-SP, November, 40p. pdf

  • Costa, R.W.M. (2018). Extração e qualificação de aspectos de opinião para o português. Trabalho de Conclusão de Curso. Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo. São Carlos-SP, November, 47p. pdf

  • Anchiêta, R.T. and Pardo, T.A.S. (2018). A Rule-Based AMR Parser for Portuguese. In the Proceedings of the 16th Ibero-American Conference on Artificial Intelligence (IBERAMIA) (LNCS 11238), pp. 341-353. November, 13-16. Trujillo/Peru. pdf (preprint version)

  • Nóbrega, F.A.A. and Pardo, T.A.S. (2018). Update Summarization: Building from Scratch for Portuguese and Comparing to English. Journal of the Brazilian Computer Society (JBCS), Vol. 24, N. 11, pp. 1-12. pdf

  • Monteiro, R.A.; Santos, R.L.S.; Pardo, T.A.S.; Almeida, T.A.; Ruiz, E.E.S.; Vale, O.A. (2018). Contributions to the Study of Fake News in Portuguese: New Corpus and Automatic Detection Results. In the Proceedings of the 13th International Conference on the Computational Processing of Portuguese (PROPOR) (LNAI 11122), pp. 324-334. September, 24-26. Canela-RS/Brazil. pdf (preprint version)

  • Machado, M.T.; Pardo, T.A.S.; Ruiz, E.E.S. (2018). Creating a Portuguese context sensitive lexicon for sentiment analysis. In the Proceedings of the 13th International Conference on the Computational Processing of Portuguese (PROPOR) (LNAI 11122), pp. 335-344. September, 24-26. Canela-RS/Brazil. pdf (preprint version)

  • Vargas, F.A. and Pardo, T.A.S. (2018). Aspect clustering methods for sentiment analysis. In the Proceedings of the 13th International Conference on the Computational Processing of Portuguese (PROPOR) (LNAI 11122), pp. 365-374. September, 24-26. Canela-RS/Brazil. pdf (preprint version)

  • Santos, R.L.S.; Monteiro, R.A.; Pardo, T.A.S. (2018). The Fake.Br corpus - a corpus of fake news for Brazilian Portuguese. Latin American and Iberian Languages Open Corpora Forum (OpenCor). September, 24. Canela-RS/Brazil. pdf

  • Sobrevilla Cabezudo, M.A. and Pardo, T.A.S. (2018). NILC-SWORNEMO at the Surface Realization Shared Task: Exploring Syntax-Based Word Ordering using Neural Models. In the Proceedings of the First Workshop on Multilingual Surface Realisation, pp. 1-7. July 19. Melbourne/Australia. pdf

  • Sousa, O.A.F. (2018). Sumarização contrastiva de opinião: uma abordagem com otimização. Trabalho de Conclusão de Curso. Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo. São Carlos-SP, June, 42p. pdf

  • Anchiêta, R.T. and Pardo, T.A.S. (2018). Towards AMR-BR: A SemBank for Brazilian Portuguese Language. In the Proceedings of the 11th edition of the Language Resources and Evaluation Conference (LREC), pp. 974-979. May 7-12. Miyazaki/Japan. pdf

  • Vargas, F.A. and Pardo, T.A.S. (2018). Hierarchical clustering of aspects for opinion mining: a corpus study. In M.J.B. Finatto, R.R. Rebechi, S. Sarmento and A.E.P. Bocorny (eds.), Linguística de Corpus: Perspectivas, pp. 69-91. Porto Alegre: Instituto de Letras da UFRGS. 580p. pdf

  • Anchiêta, R.T.; Sousa, R.F.; Moura, R.S.; Pardo, T.A.S. (2017). Improving Opinion Summarization by Assessing Sentence Importance in On-line Reviews. In the Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology (STIL), pp. 32-36. October 2-4. Uberlândia-MG/Brazil. pdf

  • Machado, M.T.; Ruiz, E.E.S.; Pardo, T.A.S. (2017). Analysis of unsupervised aspect term identification methods for Portuguese reviews. In the Proceedings of the 14o Encontro Nacional de Inteligência Artificial e Computacional (ENIAC), pp. 239-249. October 2-5. Uberlândia-MG/Brazil. pdf

  • Machado, M.T.; Temporal, J.C.A.N.; Pardo, T.A.S.; Ruiz, E.E.S. (2017). Mineração de tópicos e aspectos em microblogs sobre Dengue, Chikungunya, Zika e Microcefalia. In Anais do XVII Workshop de Informática Médica (WiM), pp. 265-274. July 3-5. São Paulo-SP/Brazil. pdf

  • López Condori, R.E. and Pardo, T.A.S. (2017). Opinion Summarization Methods: Comparing and Extending Extractive and Abstractive Approaches. Expert Systems with Applications (ESWA), Vol. 78, pp. 124-134. pdf

  • Vargas, F.A. and Pardo, T.A.S. (2017). Estudo Empírico sobre Agrupamento e Organização Hierárquica de Aspectos para Mineração de Opinião. Série de Relatórios Técnicos do Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, N. 418. São Carlos-SP, March, 48p. pdf

Applications


Tools


Resources -- lexicons and ontologies

  • LIWC lexicon - as described here, a Brazilian Portuguese version of the lexicon used by the Linguistic Inquiry and Word Count tool, a text analysis program that calculates the degree to which people use different categories of words across a wide array of texts

  • Aspect ontologies (also available here) - as described here and in the MSc Dissertation of Vargas (2017), groups of (hierarchically organized) opinion aspects for supporting opinion mining tasks in Brazilian Portuguese, including the domains of smartphones, digital cameras and books, in OWL format

  • Verbo-Brasil - as described here, a PropBank-like repository for Brazilian Portuguese (there is also a web interface for consulting the data)

  • VerbNet.Br - as described here, a class-based verb lexicon for Brazilian Portuguese (you may access the search tool here or directly download the database and the gold standard file)


Resources -- corpora

ICMC-USP/São Carlos

December 3, 2019