Ximena Gutierrez-Vasques

I am a computational linguist who takes an interdisciplinary approach to the study of human language. I recently joined an interdisciplinary research center in Mexico (CEIICH, UNAM), where I work at the interface between the humanities and the field of AI.

My lines of research cover Multilingual NLP, Computational Morphology, and NLP for less-resourced languages of the Americas.

Previously, I was a postdoctoral researcher at the University of Zürich, where I specialized in information-theoretic approaches to modeling linguistic complexity and typology using text corpora.


--------------------------------------------------------------------------------------------------------------------------------------------------------

I come from a chaotic city, on a volcanic plateau at more than 2000 m above sea level, in a country where 68 different languages are spoken. Perhaps that's part of the reason why I'm captivated by the chaos, predictability, and diversity inherent to natural languages, and by how we can measure them.


In my free time, I like to collaborate with initiatives that encourage NLP for under-represented languages of Mexico.

I also enjoy getting to know the history/languages/cultures around the world (and within Mexico), bikes 🚲, axolotls ≽(◕ ᴗ ◕)≼ and more...

NEWS

Keynote Speaker @ SIGTYP 2024, EACL, Malta 2024

Keynote: Text-based Typology for Modeling Linguistic Diversity in NLP [slides]

March, 2024

We're organizing an NLP Summer School!

Mexican NLP Summer School, co-located with #NAACL2024 #MexicoCity 

June 2024

Area chair

I'm an area chair for the Less-Resourced/Endangered/Less-studied Languages track at LREC-COLING 2024.


EMNLP 2023 presentation

 Ximena Gutierrez-Vasques, Christian Bentz, Tanja Samardžić. Languages through the Looking Glass of BPE Compression

New journal article

 Ximena Gutierrez-Vasques, Christian Bentz, Tanja Samardžić. Languages through the Looking Glass of BPE Compression

Computational Linguistics (2023)

Academic services, NAACL 2024

I am one of the Publicity Chairs for the upcoming NAACL 2024, which will take place in Mexico City!

Oral presentation @ QUALICO 2023 

Julia Lukasiewicz-Pater, Ximena Gutierrez-Vasques and Christian Bentz. Entropic analyses of the Voynich Manuscript using a diverse cross-linguistic corpus and neural networks 

Paper accepted @ CoNLL 2022

Tanja Samardžić, Ximena Gutierrez-Vasques, Rob van der Goot, Max Müller-Eberstein, Olga Pelloni and Barbara Plank. On Language Spaces, Scales and Cross-Lingual Transfer of UD Parsers.

New journal article 

Bentz, Christian, Gutierrez-Vasques, Ximena, Sozinova, Olga and Samardžić, Tanja. "Complexity trade-offs and equi-complexity in natural languages: a meta-analysis" 

Paper accepted @ LREC 2022

Moran, S., Bentz, C., Gutierrez-Vasques, X., Sozinova, O., & Samardžić, T. TeDDi Sample: Text Data Diversity Sample for Language Comparison and Multilingual NLP.

Corpora repo: https://github.com/MorphDiv/TeDDi_sample 

Book Chapter

“Relación tipo-token para contrastar la complejidad morfológica del español-náhuatl” (Type-token ratio for contrasting the morphological complexity of Spanish and Nahuatl).

Book: Ámbitos morfológicos: Descripciones y métodos. UNAM, May 2022


Authors: Haspelmath, Martín; Körtvélyessy, Lívia; Štekauer, Pavol; Orqueda, Verónica; Toro Varela, Francisca; Arriagada Anabalón, Silvana; Esquivel Brizuela, Shaila; Espinosa Ochoa, Mary Rosa; Velázquez Elizalde, Alejandro; Gallegos Shibya, Alfonso; Mijangos de la Cruz, Víctor; Hernández Quiroz, Anselmo; Zacarías Ponce de León, Ramón; Méndez Cruz, Carlos Francisco; Arroyo Fernández, Ignacio; Gutiérrez Vasques, Ximena

Workshop: Information-Theoretic Analyses of Natural Languages

Held at the DGfS Conference in Tübingen 2022 by Christian Bentz and Ximena Gutierrez-Vasques.

Course repo: https://github.com/christianbentz/Workshop_DGfS2022

Invited talk

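Consideraciones de NLP para lenguas minorizadas. El caso de México (NLP considerations for minoritized languages: the case of Mexico)

Somos NLP. 1er Hackathon de PLN en español (1st Spanish-language NLP hackathon)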