Photo: René Burri
The Workshop on Math Datasets: Alignments and Comparisons (Align 2025) will be held in association with CICM 2025, which will take place at the University of Brasilia (Brasilia, Brazil) from 6 to 10 October 2025.
Mathematical knowledge is publicly available in many different formats and languages on the internet, ranging from mathematical English (e.g. Wikipedia) to formal corpora (e.g. Mathlib).
We would like to establish machine-based connections between concepts as they appear in these different formats, which we call alignments of mathematical concepts. To this end, we are organizing a workshop at CICM to bring together people interested in solving this and related problems. Align 2025 will be a hybrid workshop on aligning mathematical concepts between formal and natural-language resources.
Creating and improving alignments would help make the mathematics literature more searchable and accessible, would better enable compatibility between different proof assistants/theorem provers, and would help simplify the teaching of mathematics in multiple languages. Some of the questions we hope to consider at this workshop include:
How can we be sure that two authors are talking about the same thing?
What important information is contained in the differences between framings or presentations of the same concept?
When are two theorems/proofs the same?
Our invited speaker is Katja Berčič from the University of Ljubljana, Slovenia.
2:00 PM - 3:00 PM: Katja Berčič, "Alignments for Data"
As data becomes increasingly central to mathematics, it is reshaping how mathematicians work, reason, and communicate. This talk sketches how data and mathematics have historically intersected, examines the current landscape, and considers how things might evolve—especially in the era of computer-assisted mathematics. We will pay particular attention to connections between concept alignments and data.
3:00 PM - 3:30 PM: Elif Uskuplu, "Enhancing Dependency Graphs for Research and Alignment in Math"
3:30 PM - 4:00 PM: Break
4:00 PM - 4:30 PM: Andrea Ferreira, "Compact Math Corpus"
4:30 PM - 5:00 PM: Jan Frederik Schaeffer, "Helping Humans Align Efficiently"
5:00 PM - 5:30 PM: Lucy Horowitz, "MathGloss Revisited"
Lucy Horowitz (UC Berkeley, USA)
Valeria de Paiva (Topos Institute, USA)
Florian Rabe (Erlangen, Germany)