Research
Selected papers
Wilkens, R., Zilio, L., Villavicencio, A. (2023). Assessing Linguistic Generalisation in Language Models: A Dataset for Brazilian Portuguese. Language Resources and Evaluation
FABRA: French aggregator-based readability assessment toolkit. Wilkens, R., Alfter, D., Wang, X., Pintard, A., Tack, A., Yancey, K., & François, T. In Proceedings of the thirteenth International Conference on Language Resources and Evaluation, LREC 2022, pages 1217–1233, (2022).
Is Attention Explanation? An Introduction to the Debate. Bibal, A., Cardon, R., Alfter, D., Wilkens, R., Wang, X., François, T., & Watrin, P. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, ACL. Pages 3889–3900, (2022).
Challenging social media threats using collective well-being aware recommendation algorithms and an educational virtual companion. Ognibene, D., Taibi, D., Kruschwitz, U., Wilkens, R., Hernandez-Leo, D., Theophilou, E., Scifo, L., Alejandro Lobo, R., Lomonaco, F., Eimler, S., Hoppe, U. & Malzahn, N. arXiv preprint arXiv:2102.04211, pages 1-50, (2021).
French coreference for spoken and written language. Wilkens, R.; Oberle, B.; Landragin, F.; Todirascu, A. International Conference on Language Resources and Evaluation, LREC, Marseille, France, pages 80-89, (2020).
The brWaC corpus: a new open resource for Brazilian Portuguese. Wagner Filho, J. A., Wilkens, R., Idiart, M., & Villavicencio, A. In Proceedings of the eleventh international conference on language resources and evaluation, LREC 2018, pages 4339-4344, (2018).
Resources/software
FABRA is a readability toolkit based on the aggregation of a large number of readability predictor variables targeting French. The toolkit is implemented as a service-oriented architecture, which obviates the need for installation, and simplifies its integration into other projects.
Wilkens, R., Alfter, D., Wang, X., Pintard, A., Tack, A., Yancey, K., & François, T. (2022). Fabra: French aggregator-based readability assessment toolkit. In Proceedings of the thirteenth International Conference on Language Resources and Evaluation (LREC 2022).
ALGLM Assessing-Linguistic-Generalisation-in-Language-Models
Wilkens, R., Zilio, L., Villavicencio, A. (2023). Assessing Linguistic Generalisation in Language Models: A Dataset for Brazilian Portuguese. Language Resources and Evaluation
coFR COreference resolution tool For FRench
Wilkens, Rodrigo; Oberle, Bruno; Landragin, Frédéric; Todirascu, Amalia. French coreference for spoken and written language. International conference on Language Resources and Evaluation (LREC 2020), Marseille, France.
brWaC The Brazilian Portuguese Web as Corpus is a large corpus constructed in our lab following the WaCky framework and made public for research purposes.
Filho, Jorge A. Wagner; Wilkens, Rodrigo; Idiart, Marco; Villavicencio, Aline. The brWaC Corpus: A New Open Resource for Brazilian Portuguese. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC) 2018
SW4ALL
Wilkens, Rodrigo; Zilio, Leonardo; Fairon, Cédrick. SW4ALL: a CEFR Classified and Aligned Corpus for Language Learning In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC) 2018
Wilkens, Rodrigo; Zilio, Leonardo; Cordeiro, Silvio R.; Paula, F.; Ramisch, Carlos; Idiart, Marco; Villavicencio, Aline. LexSubNC: a Dataset of Lexical Substitution for Nominal Compounds In: 12th International Conference on Computational Semantics 2017
B2SG
Wilkens, Rodrigo; Zilio, Leonardo; Ferreira, Eduardo; Villavicencio, Aline. B2SG: a TOEFL-like task for Portuguese In: Proceedings of 10th edition of the Language Resources and Evaluation Conference (LREC) 2016
Wilkens, Rodrigo; Zilio, Leonardo; Ferreira, Eduardo; Villavicencio, Aline. The Portuguese B2SG: a semantic test for distributional thesaurus In: Proceedings of 12th International Conference on the Computational Processing of Portuguese (PROPOR) 2016
Courage@exist Participation of the COURAGE project team (from the University of Milano-Bicocca) in the EXIST shared task (nlp.uned.es/exist2021/).
Wilkens, R., Ognibene, D. MB-Courage @ EXIST: GCN Classification for Sexism Identification in Social Networks. In IberLEF@ EXIST. 2021.
Wilkens, Rodrigo & Todirascu, Amalia. Simplifying Coreference Chains for Dyslexic Children. International conference on Language Resources and Evaluation (LREC 2020), Marseille, France.
Wilkens, Rodrigo; Oberle, Bruno; Todirascu, Amalia. Coreference-Based Text Simplification. Workshop Tools and Resources to Empower People with Reading Difficulties (READI). International conference on Language Resources and Evaluation (LREC 2020), Marseille, France.
Publications (full list)
Cardon, R. and Bibal, A. and Wilkens, R. and Alfter, D. and Norré, M. and Müller, A. and Watrin, P. and François, T. Annotation Linguistique pour l'Évaluation de la Simplification Automatique de Textes. In Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles (TALN2023)
Cardon, R. and Bibal, A. and Wilkens, R. and Alfter, D. and Norré, M. and Müller, A. and Watrin, P. and François, T. Linguistic Corpus Annotation for Automatic Text Simplification Evaluation. In Proceedings of EMNLP2022.
Wilkens, R., Alfter, D., Cardon, R., Gribomont, I., Bibal, A., Watrin, P., de Marneffe, M.-C. and François, T. CENTAL at TSAR-2022 Shared Task: How Does Context Impact BERT-Generated Substitutions for Lexical Simplification? In Proceedings of TSAR-2022 Shared Task
Bibal, A., Cardon, R., Alfter, D., Wilkens, R., Wang, X., François, T., & Watrin, P. L'Attention est-elle de l'Explication ? Une Introduction au Débat. In Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles (TALN2022), pages 447-449
Wilkens, R., Zilio, L., Villavicencio, A. (2023). Assessing Linguistic Generalisation in Language Models: A Dataset for Brazilian Portuguese. Language Resources and Evaluation.
Wilkens, R., Alfter, D., Wang, X., Pintard, A., Tack, A., Yancey, K., & François, T. (2022). Fabra: French aggregator-based readability assessment toolkit. In Proceedings of the thirteenth International Conference on Language Resources and Evaluation (LREC2022).
Bibal, A., Cardon, R., Alfter, D., Wilkens, R., Wang, X., François, T., & Watrin, P. (2022). Is Attention Explanation? An Introduction to the Debate. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL2022).
Todirascu, A., Wilkens, R., Rolin, E., François, T., Bernhard, D., & Gala, N. (2022). HECTOR: A Hybrid TExt SimplifiCation TOol for Raw Texts in French. In the 12th International Conference on Language Resources and Evaluation (LREC2022).
Wilkens, R., Alfter, D., Cardon, R., & Gala, N. (2022). In the 2nd Workshop on Tools and Resources for REAding DIfficulties (READI2022).
Wilkens, R., Seibert, D., Wang, X., & François, T. (2022). MWE for Essay Scoring English as a Foreign Language. In 2nd Workshop on Tools and Resources for REAding DIfficulties (READI2022).
Ognibene, D., Taibi, D., Kruschwitz, U., Wilkens, R. S., Hernandez-Leo, D., Theophilou, E., ... & Malzahn, N. (2021). Challenging social media threats using collective well-being aware recommendation algorithms and an educational virtual companion. arXiv preprint arXiv:2102.04211.
R. Wilkens, D. Ognibene. biCourage: ngram and syntax GCNs for Hate Speech detection, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
Wilkens, R., Ognibene, D. MB-Courage @ EXIST: GCN Classification for Sexism Identification in Social Networks. In IberLEF@ EXIST. 2021.
Gala, Nuria & Wilkens, Rodrigo. Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI). 2020.
Gala, Nuria; Todirascu, Amalia; Bernhard, Delphine; Wilkens, Rodrigo; Meyer, Jean-Paul. Transformations syntaxiques pour une aide à l’apprentissage de la lecture : typologie, adéquation et corpus adaptés. Congrès Mondial de Linguistique Française (CMLF 2020). Montpellier, France.
Wilkens, Rodrigo; Oberle, Bruno; Todirascu, Amalia. Coreference-Based Text Simplification. Workshop Tools and Resources to Empower People with Reading Difficulties (READI). International conference on Language Resources and Evaluation (LREC 2020), Marseille, France.
Wilkens, Rodrigo & Todirascu, Amalia. Simplifying Coreference Chains for Dyslexic Children. International conference on Language Resources and Evaluation (LREC 2020), Marseille, France.
Wilkens, Rodrigo; Oberle, Bruno; Landragin, Frédéric; Todirascu, Amalia. French coreference for spoken and written language. International conference on Language Resources and Evaluation (LREC 2020), Marseille, France.
Paula, Felipe; Wilkens, Rodrigo; Idiart, Marco; Villavicencio, Aline. Similarity Measures for the Detection of Clinical Conditions with Verbal Fluency Tasks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), Volume 2 (Vol. 2, pp. 231-235).
Filho, Jorge A. Wagner; Wilkens, Rodrigo; Idiart, Marco; Villavicencio, Aline. The brWaC Corpus: A New Open Resource for Brazilian Portuguese. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC) 2018
Wilkens, Rodrigo; Zilio, Leonardo; Fairon, Cédrick. SW4ALL: a CEFR Classified and Aligned Corpus for Language Learning In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC) 2018
Zilio, Leonardo; Wilkens, Rodrigo; Fairon, Cédrick. An SLA Corpus Annotated with Pedagogically Relevant Grammatical Structures In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC) 2018
Zilio, Leonardo; Wilkens, Rodrigo; Fairon, Cédrick. Investigating Productive and Receptive Knowledge: A Profile for Second Language Learning In: Proceedings of the 27th International Conference on Computational Linguistics. 2018. p. 3467-3478.
Zilio, Leonardo; Wilkens, Rodrigo; Fairon, Cédrick. PassPort: A Dependency Parsing Model for Portuguese. Lecture Notes in Computer Science. 13ed.: International Conference on the Computational Processing of Portuguese (PROPOR), 2018, v. 11122, p. 479-489.
Zilio, Leonardo; Wilkens, Rodrigo; Fairon, Cédrick. SMILLE for Portuguese: Annotation and Analysis of Grammatical Structures in a Pedagogical Context. Lecture Notes in Computer Science. 13ed.: Springer International Publishing, 2018, v. 11122, p. 13-23.
Wilkens, Rodrigo; Zilio, Leonardo; Fairon, Cédrick. Document Ranking Applied to Second Language Learning. Lecture Notes in Computer Science. 1ed.: Springer International Publishing, 2018, v. , p. 618-624.
Wilkens, Rodrigo; Zilio, Leonardo; Cordeiro, Silvio R.; Paula, F.; Ramisch, Carlos; Idiart, Marco; Villavicencio, Aline. LexSubNC: a Dataset of Lexical Substitution for Nominal Compounds In: 12th International Conference on Computational Semantics 2017
Zilio, Leonardo; Wilkens, Rodrigo; Fairon, Cédrick. Using NLP for Enhancing Second Language Acquisition In: RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning 2017
Zilio, Leonardo; Wilkens, Rodrigo; Fairon, Cédrick. Enhancing grammatical structures in web-based texts. In: Kate Borthwick; Linda Bradley; Sylvie Thouësny. (Org.). CALL in a climate of change: adapting to turbulent global conditions short papers from EUROCALL 2017. 1ed.: Research-publishing.net, 2017, v. 1, p. 345-350.
Cordeiro, Silvio; Ramisch, Carlos; Zilio, Leonardo; Idiart, Marco; Villavicencio, Aline; Wilkens, Rodrigo. How Naked is the Naked Truth? A Multilingual Lexicon of Nominal Compound Compositionality In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL) 2016
Wagner Filho, Jorge; Wilkens, Rodrigo; Villavicencio, Aline. Automatic Construction of Large Readability Corpora In: Workshop on Computational Linguistics for Linguistic Complexity (CL4LC) 2016
Wagner Filho, Jorge; Wilkens, Rodrigo; Zilio, L.; Idiart, Marco; Villavicencio, Aline. Crawling by Readability Level In: Proceedings of 12th International Conference on the Computational Processing of Portuguese (PROPOR) 2016
Wilkens, Rodrigo; Idiart, Marco; Villavicencio, Aline. Multiword Expressions in Child Language In: Proceedings of 10th edition of the Language Resources and Evaluation Conference (LREC) 2016
Wilkens, Rodrigo; Zilio, Leonardo; Ferreira, Eduardo; Villavicencio, Aline. B2SG: a TOEFL-like task for Portuguese In: Proceedings of 10th edition of the Language Resources and Evaluation Conference (LREC) 2016
Wilkens, Rodrigo; Zilio, Leonardo; Ferreira, Eduardo; Villavicencio, Aline. The Portuguese B2SG: a semantic test for distributional thesaurus In: Proceedings of 12th International Conference on the Computational Processing of Portuguese (PROPOR) 2016
Wilkens, Rodrigo; Zilio, Leonardo; Idiart, Marco; Wagner Filho, Jorge; Ferreira, Eduardo; Mollmann, Luis; Pasqualini, Bianca; Villavicencio, Aline. Resources for Monolingual Translation: a case study of Text Simplification for Portuguese In: Proceedings of PROPOR 2016 Workshop on Corpora and Tools for Processing Corpora 2016
Zilio, Leonardo; Wilkens, Rodrigo; Mollmann, Luis; Idiart, Marco; Wehrli, Eric; Villavicencio, Aline. Joining Forces for Multiword Expression Identification In: Proceedings of 12th International Conference on the Computational Processing of Portuguese (PROPOR) 2016
Sakas, William; Berwick, Robert; Corver, A.; Wilkens, Rodrigo; Yang, Charles. Parameter Setting is Feasible In: 6th bi-annual Generative Approaches to Language Acquisition (GALANA 6) 2015
Wilkens, Rodrigo; Zilio, Leonardo; Ferreira, Eduardo; Goncalves, Gabriel; Villavicencio, Aline. Tesauros Distribucionais para o Português In: Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology 2015
Boito, Marceli Z.; Hagemann, Luiza; Wilkens, Rodrigo; Villavicencio, Aline. Uma análise do perfil de entropia das estruturas sintáticas do português In: Proceedings of the Workshop on Tools and Resources for Automatically Processing Portuguese and Spanish 2014
Vecchia, Alessandro D.; Wilkens, Rodrigo; Boito, Marceli Z.; Padró, Muntsa; Villavicencio, Aline. Size does not matter. Frequency does. A study of features for measuring lexical complexity In: Proceedings of the 14th Ibero-American Conference on Artificial Intelligence 2014
Evers, Aline; Wilkens, Rodrigo. Classificação de proficiência em língua adicional no português: um estudo para a determinação de índices diferenciadores In: IX Encontro Nacional de Inteligência Artificial (ENIA) 2012
Villavicencio, Aline; Yankama, Beracka; Wilkens, Rodrigo; Idiart, Marco; Berwick, Robert. An annotated English child language database In: 13th Conference of the European Chapter of the Association for Computational Linguistics 2012
Wilkens, Rodrigo. Searching the Annotated Portuguese Childes Corpora In: Proceedings of the Workshop on Computational Models of Language Acquisition and Loss. Association for Computational Linguistics 2012
Wilkens, Rodrigo; Proenca, Matheus; Villavicencio, Aline. An Environment for searching Portuguese child language corpora In: International Conference on Computational Processing of the Portuguese Language 2012
Wilkens, Rodrigo; Villavicencio, Aline. I say have you say tem: profiling verbs in children data in English and Portuguese In: EACL-2012 Workshop on Computational Models of Language Acquisition and Loss 2012
Bocorny, Ana; Villavicencio, Aline; Killian, Cristina; Wilkens, Rodrigo. Flexible Media Environments for Collaborative Lexicography In: Electronic lexicography in the 21st century: new applications for new users (eLEX2011) 2011
Bocorny, Ana; Villavicencio, Aline; Killian, Cristina; Wilkens, Rodrigo. The creation of a bilingual online multimedia learner's aviation glossary (Bomlag) based in corpus In: Electronic lexicography in the 21st century: new applications for new users (eLEX2011) 2011
Gonçalves, Gabriel; Wilkens, Rodrigo; Villavicencio, Aline. Sistema de Aquisição semi-automática de Ontologias. In: OntoBras 2011
Prestes, Kassius; Wilkens, Rodrigo; Zilio, Leonardo; Villavicencio, Aline. Extração e Validação de Ontologias a partir de Recursos Digitais In: OntoBras 2011
Wilkens, Rodrigo; Villavicencio, Aline. Question Answering for Portuguese: how much is needed? In: Brazilian Symposium on Artificial Intelligence (SBIA) 2010
Wilkens, Rodrigo; Villavicencio, Aline; Muller, Daniel; Wives, Leandro; Loh, Stanley. COMUNICA - A Question Answering System for Brazilian Portuguese In: Proceedings of the 23rd International Conference on Computational Linguistics (COLING) 2010
Research Projects
2021 to the present CEFR-FR Reitor: This project looks at the issue of automatic assessment of the written competence of learners of French as a foreign language (FFL) from a two-pronged perspective. First, by building on a collaboration with France Éducation International, we will compile the largest corpus of learner productions for FFL. This learner corpus will enable us to build up an inventory of linguistic phenomena and to estimate their distribution over the six levels of the Common European Framework of Reference for Languages (CEFR). In a second step, we will develop an artificial intelligence algorithm capable of assigning one of the six levels of the CEFR to a learner's written production. It will also be able to automatically identify the linguistic phenomena included in our inventory in a learner's production and link them to a CEFR level. This will enable us to provide a detailed diagnosis of the level of competence of this learner from different linguistic levels. An evaluation of the performance of this model and its usefulness for the training of future language assessors will be carried out.
2020 to 2021 COURAGE: COURAGE at the University of Milano-Bicocca is a collaboration funded by the Volkswagen Foundation as part of the Artificial Intelligence and the Society of the Future funding initiative, including as partners the Universitat Pompeu Fabra (Spain), the Istituto per le Tecnologie Didattiche of the National Council of Research ITD-CNR, the Hochschule Ruhr West (Germany) and the Rhine-Ruhr Institute for System Innovation (Germany). This project brings together a multi-disciplinary consortium to develop novel approaches aimed at addressing some of the major challenges posed by social media to society and to young members of society. In particular, it aims to develop a Virtual Social Media Companion that educates and supports teenage school students facing the threats of social media, such as discrimination and biases as well as hate speech, bullying, fake news and other toxic content. The University of Milano-Bicocca team drives two strands of this work: the machine-learning-based user modelling aspect and the process of analysing textual data, drawing on our expertise in natural language processing (NLP).
2019 - 2020 ALECTOR: This project addresses scientific issues including readability assessment, lexical simplification, syntactic simplification, and discourse transformations. Targeting dyslexic and poor readers, the project bases its text transformations on theoretical findings about the reading process and further refines them through specific adaptations that leverage feedback from the targeted audience. As one of the key innovative deliverables, ALECTOR will propose a web-based application where a simplified corpus will be available to teachers and speech therapists.
2017 - 2019 Smart and Adaptive Language Learning Applications (SMALLA): This project aims to allow second language learners to read texts that match their interests and reading skills, helping e-learning environments keep learners engaged in learning activities. The three related core research fields are user modeling, text profiling, and text classification. The user-modeling feature aims to measure the user's skills by tracking user interaction and e-learning tests. The text-profiling feature acts in the background, both searching for texts and building a profile from them. Finally, the text classification feature aims to combine the results of the other core features by selecting texts that fit the learner's interests and reading skills. This would allow learners to bring their own interests into the e-learning platform and use its resources in specific reading and learning tasks.
2014 - 2016 ExplainText Text simplification of complex expressions: The goal of this project was to investigate and develop techniques, resources and tools for automatic text simplification. The idea was to rewrite texts, making them more accessible and easier to understand for a larger audience. Our focus was on lexical simplification, where more difficult words, specifically multiword expressions, are replaced by more familiar synonyms. The project Simplification of Complex Expressions was funded by Samsung Research.
2013 - 2016 Computational cognitive language models in the Autism Spectrum Disorder: This project aims to develop computational resources and models to investigate factors related to children's language acquisition and to the use of language in clinical conditions, such as autism spectrum disorders and aphasia. It is mainly centred on the influence of language processes in clinical and non-clinical cases, dealing with linguistic and psycholinguistic information.
2012 - 2016 Cognitive Computational Models of Natural Languages for Assessing Language Competency: In this project, we investigated the influence of language factors in low literacy and pathologies, focusing on Alzheimer’s Disease. Although there seems to be a link between factors such as frequency and age of acquisition and the strategies employed in processing, there is still much to investigate, in particular for Brazilian Portuguese. The long-term goal of this investigation was an improved scientific understanding of human language processing and its impact on the development of educational technology, as well as on treatments and rehabilitation for various language disorders.
2008 - 2011 COMUNICA - Databases Access by Telephone: The project consisted of developing an automatic question answering system accessible by telephone, allowing the population to access public digital data. The project was financed by the public and private sectors, and it was developed by a group of companies in collaboration with the Institute of Informatics of UFRGS.