I am a computational linguist who takes an interdisciplinary approach to the study of human language. I recently joined an interdisciplinary research center in Mexico (CEIICH, UNAM), where I work at the interface between the humanities and the field of AI.
My lines of research cover Multilingual NLP, Quantitative Linguistics, NLP for less-resourced languages of the Americas, and the diversity and social impact of AI technologies.
Previously, I was a postdoctoral researcher at the University of Zürich, where I specialized in approaches inspired by information theory for modeling linguistic complexity and typology using text corpora.
--------------------------------------------------------------------------------------------------------------------------------------------------------
I come from a chaotic city, on a volcanic plateau at more than 2000 m above sea level, in a country where 68 different languages are spoken. Perhaps that's part of the reason why I'm captivated by the chaos, predictability and diversity inherent to natural languages, and by how we can measure them.
In my free time, I like to collaborate with initiatives that encourage NLP for under-represented languages of Mexico.
I also enjoy learning about the histories, languages and cultures of the world (and of Mexico), bikes 🚲, axolotls ≽(◕ ᴗ ◕)≼ and more...
Current location: Ciudad Universitaria, Koyowakan, Mexico City
NEWS
Article: Sesgos inductivos relacionales en mecanismos de atención [Relational inductive biases in attention mechanisms].
XVII Congreso Mexicano de Inteligencia Artificial (COMIA)
Py-Elotl: A Python NLP package for the languages of Mexico. In Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)
Github repo: https://github.com/ElotlMX/py-elotl
Pypi: https://pypi.org/project/elotl/
I'm the publicity chair for NAACL 2025 and an area chair for CoNLL 2025
May/July 2025
Keynote: Text-based Typology for Modeling Linguistic Diversity in NLP [slides]
March, 2024
Mexican NLP Summer School, co-located with #NAACL2024 #MexicoCity
June 2024
I'm an area chair for the Less-Resourced/Endangered/Less-studied Languages track at LREC-COLING 2024
Ximena Gutierrez-Vasques, Christian Bentz, Tanja Samardžić. Languages through the Looking Glass of BPE Compression