Publications and other activities

Publications

2026

Maria Irena Szawerna and Jacob Lee Suchardt. 2026. Fill-in-the-Blanks: Automatic Generation and Evaluation of Language Models' Pseudonyms for English and Swedish Texts. In Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC2026), pages 1155–1169, Palma de Mallorca, Spain. ELRA.
Maria Irena Szawerna and Simon Dobnik. 2026. Birds of a Feather: Do Embedding Representations of Personal Information Flock Together? In Proceedings of the Joint Workshop on Legal and Ethical Issues in Human Language Technologies (LEGAL2026) and Computational Approaches to Language Data Pseudonymization, Anonymization, De-identification, and Data Privacy (CALD-pseudo 2026), pages 62–72, Palma de Mallorca, Spain. ELRA.
Ingo Siegert, Maria Irena Szawerna, Khalid Choukri, Simon Dobnik, Paweł Kamocki, Therese Lindström Tiedemann, Pierre Lison, Ricardo Muñoz Sánchez, Ildikó Pilán, Lisa Södergård, Kossay Talmoudi, Elena Volodina, and Xuan-Son Vu. 2026. Proceedings of the Joint Workshop on Legal and Ethical Issues in Human Language Technologies and Computational Approaches to Language Data Pseudonymization, Anonymization, De-identification, and Data Privacy (LEGAL2026 and CALD-pseudo 2026) @ LREC 2026, Palma de Mallorca, Spain. ELRA.

2025

Maria Irena Szawerna, Simon Dobnik, Ricardo Muñoz Sánchez, and Elena Volodina. 2025. The Devil’s in the Details: the Detailedness of Classes Influences Personal Information Detection and Labeling. In Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), pages 697–708, Tallinn, Estonia. University of Tartu Library.
Nikolai Ilinykh and Maria Irena Szawerna. 2025. “I Need More Context and an English Translation”: Analysing How LLMs Identify Personal Information in Komi, Polish, and English. In Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025), pages 165–178, Tallinn, Estonia. University of Tartu Library, Estonia.
Arianna Masciolini, Aleksandrs Berdicevskis, Maria Irena Szawerna, and Elena Volodina. 2025. Annotating Second Language in Universal Dependencies: a Review of Current Practices and Directions for Harmonized Guidelines. In Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025), pages 153–163, Ljubljana, Slovenia. Association for Computational Linguistics.
Elena Volodina, Simon Dobnik, Therese Lindström Tiedemann, Ricardo Muñoz Sánchez, Maria Irena Szawerna, Lisa Södergård, and Xuan-Son Vu. 2025. Towards shared standards for pseudonymization of research data. In Proceedings of the Huminfra Conference (HiC 2025), Stockholm.
Maria Irena Szawerna, David Alfter, and Elena Volodina. 2025. Annotating Personal Information in Swedish Texts with SPARV. In Proceedings of the First Workshop on Natural Language Processing and Language Models for Digital Humanities, pages 155–163, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Therese Lindström Tiedemann, Lisa Södergård, Elena Volodina, Simon Dobnik, Maria Irena Szawerna, Ricardo Muñoz Sánchez, and Xuan-Son Vu. 2025. Om mormor Karl sägs vara 27 år gammal, vad säger det om skribenten? En presentation om att identifiera och ersätta identifierande element i språkvetenskapliga forskningsdata. In Abstractsamling Svenskans beskrivning 40 (Svebe40), Workshop om Pseudonymisering inom språkvetenskap.

2024

Ricardo Muñoz Sánchez, David Alfter, Simon Dobnik, Maria Irena Szawerna, and Elena Volodina. 2024. Jingle BERT, Jingle BERT, Frozen All the Way: Freezing Layers to Identify CEFR Levels of Second Language Learners Using BERT. In Proceedings of the 13th Workshop on Natural Language Processing for Computer Assisted Language Learning, pages 137–152, Rennes, France. LiU Electronic Press.
Maria Irena Szawerna, Simon Dobnik, Therese Lindström Tiedemann, Ricardo Muñoz Sánchez, Xuan-Son Vu, and Elena Volodina. 2024. Pseudonymization Categories across Domain Boundaries. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13303–13314, Torino, Italia. ELRA and ICCL.
Arianna Masciolini, Emilie Francis, and Maria Irena Szawerna. 2024. Synthetic-Error Augmented Parsing of Swedish as a Second Language: Experiments with Word Order. In Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, pages 43–49, Torino, Italia. ELRA and ICCL.
Maria Irena Szawerna. 2024. Can Stanza be Used for Part-of-Speech Tagging Historical Polish? In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 44–49, St. Julian’s, Malta. Association for Computational Linguistics.
Elena Volodina, David Alfter, Simon Dobnik, Therese Lindström Tiedemann, Ricardo Muñoz Sánchez, Maria Irena Szawerna, and Xuan-Son Vu. 2024. Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024). Association for Computational Linguistics, St. Julian’s, Malta.
Maria Irena Szawerna, Simon Dobnik, Ricardo Muñoz Sánchez, Therese Lindström Tiedemann, and Elena Volodina. 2024. Detecting Personal Identifiable Information in Swedish Learner Essays. In Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024), pages 54–63, St. Julian’s, Malta. Association for Computational Linguistics.
Ricardo Muñoz Sánchez, Simon Dobnik, Maria Irena Szawerna, Therese Lindström Tiedemann, and Elena Volodina. 2024. Did the Names I Used within My Essay Affect My Score? Diagnosing Name Biases in Automated Essay Scoring. In Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024), pages 81–91, St. Julian’s, Malta. Association for Computational Linguistics.

Other activities

2025

"The Devil’s in the Details: the Detailedness of Classes Influences Personal Information Detection and Labeling" - a presentation at NoDaLiDa/Baltic-HLT 2025, in relation to the paper above (3. March)
"“I Need More Context and an English Translation”: Analysing How LLMs Identify Personal Information in Komi, Polish, and English" - a poster presentation at the RESOURCEFUL 2025 workshop, in relation to the paper above (2. March)
"Annotating Personal Information in Swedish Texts with SPARV" - a presentation at the LM4DH 2025 workshop (10. September)
"A Construction Grammar perspective on null subjects in Polish" - a presentation at the Formal Description of Slavic Languages 18 conference (24. September)
"Annotating Personal Information in Swedish Texts with SPARV" - a poster presentation at the CLARIN Annual Conference (1. October)
"A Construction Grammar perspective on null subjects in Polish(es)" - a presentation at the CASA|Plus workshop (9. October)
"Putting the "Im" Back in Personal Information" - a presentation at the CLT workshop (22. October)

2024

"The most stupid baseline for generating pseudonyms and how it does not work" - a presentation at the Språkbanken Text End of the Year Workshop 2024 (17. December)
"Swedish Learner Essays Revisited: Further Insights into Detecting Personal Information" - a presentation of a nonarchival submission at the Tenth Swedish Language Technology Conference (SLTC) (27. November)
"AI for open research data with Grandma Karl" - a presentation at the Privacy and AI: Towards a trustworthy eco-system (AI trust) workshop, WASP HS conference (19. November)
"As words have power, names have power" - a presentation at the CLEANUP seminar in Oslo presented together with Ricardo Muñoz Sánchez, Norway (8. October)
"AI for open research data with Grandma Karl" - a presentation at the Beyond Words Theoretical, Experimental, and Computational Approaches to Language, Contexts, and Modalities workshop (3. October)
"AI for open research data with Grandma Karl" - a presentation at the Symposium on ‘Humanistic AI’ workshop (19. June)
"Detecting Personal Identifiable Information in Swedish Learner Essays" - a presentation at the CALD-Pseudo workshop hosted at EACL 2024, in relation to the paper above (21. March)
"Can Stanza be Used for Part-of-Speech Tagging Historical Polish?" - a poster presentation at the Student Research Workshop at EACL 2024, in relation to the paper above (19. March)

2023

"Detecting Personal Identifiable Information (PII)" - a presentation at the Språkbanken Text End of the Year Workshop 2023 (13. December)
"Sense and Sensitivity: what do we need to turn private information into pseudonyms?" - a presentation at Mormor Karl Open House (29. November)

Blog posts

Maria Irena Szawerna. Personal information detection in Sparv: towards a pseudonymization pipeline. In the Språkbanken Text blog (April 16th, 2025).
Maria Irena Szawerna and Ricardo Muñoz Sánchez. The Lions, the Words, and the Workshops: Språkbanken Text at EACL 2024. In the Språkbanken Text blog (April 11th, 2024).

Proceedings editorial team

Elena Volodina, David Alfter, Simon Dobnik, Therese Lindström Tiedemann, Ricardo Muñoz Sánchez, Maria Irena Szawerna, and Xuan-Son Vu. 2024. Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024). Association for Computational Linguistics, St. Julian’s, Malta.

Organizing

Joint Workshop on Legal and Ethical Issues in Human Language Technologies (LEGAL2026) and Computational Approaches to Language Data Pseudonymization, Anonymization, De-identification, and Data Privacy (CALD-pseudo 2026), co-located with LREC 2026, co-organizer (12. May, 2026)
Mind, Machine, Multimodality: Current Issues in Linguistics and Beyond 2025 Student Conference, co-organizer (24. June, 2025)
Privacy and AI: Towards a Trustworthy Ecosystem (AITrust) Workshop, co-located with WASP-HS 2024, organizing co-chair (19. November, 2024)
CALD-Pseudo Workshop, co-located with EACL 2024, organizing co-chair (21. March, 2024)
Workshop on ethics for research and teaching in natural language processing, local organizer (23. January, 2024)
Open House at Mormor Karl’s, co-organizer (29. November, 2023)

Reviewing

I have reviewed for the following venues:

NLP4CALL workshop: 2025
RESOURCEFUL workshop series: 2025, 2026
LEGAL+CALD-pseudo workshop: 2026

Thesis Supervision

2026

Cristina Matacuta, Gossiping Models: Understanding Unintentional Data Disclosure in LLMs (co-supervised with Simon Dobnik, FLoV, and Fazeleh Hoseini, AI Sweden). MA thesis.
Caroline Nathalie Jeanne Grand-Clement, Presenting ENBYS: The Europarl Non-BinarY Sentences: an English-Polish-French Dataset for Machine Translation Evaluation of Non-Binary Gender Inclusion (co-supervised with Sharid Loáiciga, FLoV). MA thesis.
