EurIPS'25 Workshop on
AI for Tabular Data
Saturday 6 December 2025
EurIPS
Nanna Berg auditorium, University of Copenhagen (CPH)
The use of artificial intelligence (AI) is well established for modalities such as text, images, audio, and even video. An important yet understudied modality is structured table-like data, such as relational tables or spreadsheet tables, which dominates in many high-value applications in organizations, from enterprises to healthcare. Recent works attempt to use this modality as part of, or in combination with, AI models. This workshop hosts a program focused on AI for tabular data with key topics such as representation learning, generative AI, and foundation models. The workshop also targets researchers focusing on the intersection of learning over structured data and information retrieval, for example, in retrieval augmented generation (RAG) and question answering (QA) systems.
The goal of the workshop is to connect researchers working on tabular data and surface novel research ideas and collaboration opportunities by bringing views from the NLP, ML, DB, and IR disciplines together.
Submission deadline: 22 October, 2025 (11:59, AoE) (extended from 21 October due to the OpenReview outage).
Notifications: 3 November, 2025 (19:00, CET), due to the deadline extension.
Camera-ready deadline: 14 November, 2025 (11:59am, CET)
Workshop: Saturday 6 December 2025
The distinguishing focus of this workshop is tabular data. Breakthroughs from the "tabular" community are poised to impact applications in enterprises, governmental organizations, healthcare, and beyond.
We invite submissions related to AI for tabular data and any of the following topics.
Methods & Benchmarks: table representation learning; pretraining/foundation models for tables; generative modeling and synthesis; multimodal fusion (tables + text); evaluation protocols, datasets, and metrics.
Applications: table-centered question answering and retrieval; NL interfaces to databases (e.g. text-to-SQL); entity matching and record linkage; table understanding (type/semantic inference, schema matching); predictive machine learning (i.e. tabular classification and regression); time-series as tables; synthetic data generation; data wrangling; and new applications!
Systems & Data Management: table retrieval and indexing; open source software for the tabular community; scalable training/inference pipelines; cleaning, wrangling, and quality estimation; governance, privacy, and fairness for tabular ML.
Interdisciplinary Perspectives: bridges between ML, DB, IR, and NLP communities; lessons from practice and deployed systems; or any applications, methods, or benchmarks for multimodal data that joins tabular data with other modalities such as text, images, and code.
We invite submissions in the following forms:
1) Short papers: up to 4 pages excluding references and appendices, in NeurIPS format. A small group of reviewers will perform a single-round assessment based on relevance, clarity, and potential for discussion. Submission template: NeurIPS paper template; the checklist is not required for the submission to the workshop.
2) Recently published work: full papers peer-reviewed and accepted in 2025 at a premier ML, DB, IR, or NLP venue. A paper previously peer-reviewed and accepted at a workshop should be submitted as a short paper. To submit, provide a link to the paper plus a brief statement of its relevance to our workshop.
This workshop is double-blind and non-archival. Submissions are managed through OpenReview. All accepted papers will be presented as posters, and a few selected ones will be presented as spotlight talks.
Camera-ready instructions: a limitations and/or acknowledgement section does not count toward the 4-page limit. For the footer in the camera-ready version, please use: AI for Tabular Data workshop at EurIPS 2025.
Presentation instructions:
Posters (all papers!): A0 portrait (or A1 landscape)
Spotlight (selected): 7 min talk + 2 min Q&A. Please send your slides (PDF or PPTX) 2 days in advance to effy.li@cwi.nl
Schedule
The workshop will start at 9:00 am and last until 5:30 pm. Activities include:
Invited talks
Spotlight talks
Poster sessions (likely accompanied by a reception🍸)
(Detailed schedule will follow.)
Inria-Saclay
Title: Handling Missing Data in Tabular AI: What Really Matters for Prediction?
Abstract: Missing values are pervasive in real-world tabular data across high-value domains such as healthcare, finance, and the social sciences. They pose a fundamental challenge to predictive AI systems, which must seamlessly adapt to varying subsets of information at both training and inference time. In practice, the dominant strategy remains Impute-then-Predict, i.e. filling in missing values before training a model. However, this strategy has seldom been evaluated in the context of predictive tasks. In this talk, we will examine the theoretical foundations of the Impute-then-Predict approach, highlighting the inherent complexity of the learning problem under missing data. We will then address a key practical question: whether and when investing in advanced imputation methods yields a statistically significant improvement in predictive performance compared to simple baselines. Finally, we will open up the discussion on key open challenges in learning with missing values, including their interaction with emerging tabular foundation models.
Bio: Marine Le Morvan is a Research Scientist at INRIA (France). Her work lies at the intersection of statistical learning and trustworthy AI. Her contributions span methods for learning from incomplete data, as well as model auditing and the development of tabular foundation models, notably TabICL, which unlock new possibilities through large-scale pretraining. M. Le Morvan’s research focuses on defining the theoretical and practical guidelines necessary to ensure that machine learning systems operating on structured data are powerful, reliable, and governable for real-world deployment.
ETH Zurich
Title: From Table Construction to Dataset Understanding: TANQ and Croissant
Abstract: Across domains such as health, finance, and science, humans rely on tables as powerful tools to gather, organize, and communicate complex information. Yet today’s AI models still struggle to generate and reason with tables. In this talk, I present two complementary lines of work addressing these challenges: TANQ and Croissant. I will first introduce TANQ, an open-domain question answering benchmark where models must build answer tables from multiple sources. TANQ highlights key gaps in current models’ ability to retrieve, integrate, and structure information into coherent tabular outputs. I will then present Croissant, a metadata format for ML-ready datasets that creates a shared representation across ML tools, frameworks, and platforms. Croissant provides a standardized way to describe dataset characteristics and structure, enabling improved dataset discoverability and interoperability at scale. Together, these efforts help us develop more capable AI systems that better support real-world use cases in domains where tables remain the lingua franca of data reasoning and communication.
Bio: TBA
University of Antwerp
Title: Grables: Graphs and Tables
Abstract: Classical tabular learning treats data as independent rows and columns, while relational deep learning focuses on the connectivity between rows. In this talk, we study what happens when we endow tables with an explicit graph structure: how this affects the expressive power of learning methods, and when such structure helps models capture richer dependencies than purely tabular approaches. We also identify conditions under which suitably enriched tabular models can match the expressive power of their graph-based counterparts, clarifying when graphs are truly necessary and when tables are enough.
Bio: Floris Geerts is a professor at the University of Antwerp, Belgium. Previously, he was a senior research fellow at the University of Edinburgh and a postdoctoral researcher at the University of Helsinki. He received his PhD in 2001 from Hasselt University, Belgium. His research interests include the theory and practice of databases, relational deep learning, and graph learning. He has written a book on data quality and published over 130 technical papers. His awards include three Best Paper Awards, the PODS Alberto O. Mendelzon Test-of-Time Award, an ACM SIGMOD Research Highlight Award, and an ICLR Outstanding Paper Award. He is an ACM Distinguished Member, has served as program chair of PODS and ICDT, general chair of EDBT/ICDT, and is currently general chair of PODS. He has served on the editorial boards of ACM TODS and IEEE TKDE, and has edited several conference proceedings and special journal issues in database research.
Oral presentation:
Magnus Bühler, Lennart Purucker, Frank Hutter
Elias Dubbeldam, Reza Mohammadi, Marit Schoonhoven, Ilker Birbil
Florian D. van Leeuwen
Alexander Pfefferle, Johannes Hog, Lennart Purucker, Frank Hutter
Isaiah Onando Mulang', Felix Sasaki, Tassilo Klein, Jonas Kolk, Nikolay Grechanov, Johannes Hoffart
Daniel Gomm, Cornelius Wolff, Madelon Hulsebos
Leo Grinsztajn, Klemens Flöge, Oscar Key, Adrian Hayler, Mihir Manium, Anurag Garg, Jake Robertson, Shi Bin Hoo, Felix Birkel, Philipp Jund, Benjamin Jäger, Rosen Ting-Ying Yu, Bernhard Schölkopf, Noah Hollmann, Frank Hutter
Anshul Singh, Rohan Chaudhary, Gagneet Singh, Abhay Kumar
Rishabh Ranjan, Valter Hudovernik, Mark Znidar, Charilaos I. Kanatsoulis, Roshan Reddy Upendra, Mahmoud Mohammadi, Joe Meyer, Tom Palczewski, Carlos Guestrin, Jure Leskovec
Poster presentation:
Asher Labovich
Andrey Sidorenko, Ivona Krchova, Mariana Vargas Vieyra, Paul Tiwald, Mario Scriminaci, Michael Platzer
Daniel Beaglehole, David Holzmüller, Adityanarayanan Radhakrishnan, Mikhail Belkin
Ricardo Knauer, Erik Rodner
Amalia Stuger, Lucas Lageweg, Fina Polat
Parker Glenn, Alfy Samuel, Daben Liu
Jacob Feitelberg, Dwaipayan Saha, Kyuseong Choi, Zaid Ahmad, Anish Agarwal, Raaz Dwivedi
Omar Swelam, Lennart Purucker, Jake Robertson, Hanne Raum, Joschka Boedecker, Frank Hutter
João Machado de Freitas, Alexander Fuchs, Markus Feuerstein, Philipp Paller, Franz Pernkopf
Christopher Kolberg, Katharina Eggensperger, Nico Pfeifer
Günther Schindler, Maximilian Schambach, Michael Medek, Sam Thelin
Félix Lefebvre, Myung Jun Kim, Gaël Varoquaux
Suraj Neelakantan, Martin Längkvist, Amy Loutfi
Mikkel Werling, Nabeel Seedat, Jiashuo Liu, Lars Grønlykke, Carsten Utoft Niemann, Mihaela van der Schaar, Rudi Agius
Conor Hassan, Nasrulloh Ratu Bagus Satrio Loka, Cen-You Li, Daolang Huang, Paul Edmund Chang, Yang Yang, Francesco Silvestrin, Samuel Kaski, Luigi Acerbi
Jacob Si, Mike Qu, Michelle Lee, Yingzhen Li
Frederik Hoppe, Lars Kleinemeier, Astrid Franz, Udo Göbel
Sintija Stevanoska, Jurica Levatic, Saso Dzeroski
Amir Rezaei Balef, Mykhailo Koshil, Katharina Eggensperger
George Yakushev, Alina Shutova, Ivan Rubachev, Renat Sergazinov, Artem Babenko
Tamara Cucumides, Floris Geerts
Caleb Chin, Aashish Khubchandani, Harshvardhan Maskara, Kyuseong Choi, Jacob Feitelberg, Albert Gong, Manit Paul, Tathagata Sadhukhan, Anish Agarwal, Raaz Dwivedi
Kristýna Onderková, Ondrej Platek, Zdeněk Kasner, Ondrej Dusek
Gyu-Il Kim, Dae-Won Kim, Jaesung Lee
Laurence Liang, Veronika Pak, Zachary Yang
Susanna Di Vita
Kacper Jurek, Wojciech Batko, Marek Śmieja, Marcin Przewięźlikowski
Allaa Boutaleb, Bernd Amann, Rafael Angarita, Hubert Naacke
Daniel Gärber, Lea Demelius
David Otte, Jörg K.H. Franke, Frank Hutter
Joe Meyer, Divyansha Lachi, Mahmoud Mohammadi, Roshan Reddy Upendra, Eva L Dyer, Minghua Li, Tom Palczewski
Vengadesh Ravikumaran, Anand Krishnakumar
Divyansha Lachi, Mahmoud Mohammadi, Joe Meyer, Vinam Arora, Shivashriganesh P. Mahato, Tom Palczewski, Eva L Dyer
Vladyslav Moroshan, Julien Siems, Arber Zela, Timur Carstensen, Frank Hutter
Nikolaus Kopp, Alexander Fuchs, Markus Feuerstein, Phillip Paller, Franz Pernkopf
Mohamed Bouadi, Pratinav Seth, Aditya Tanna, Vinay Kumar Sankarapu
Cornelius Wolff, Daniel Gomm, Madelon Hulsebos
Boshko Koloski, Nada Lavrač, Blaž Škrlj
CWI
University of Freiburg
MIT
PriorLabs
We're grateful for the reviews conducted by the following researchers:
Magnus Bühler
Andrey Sidorenko
Mikkel Werling
Frederik Hoppe
Cornelius Wolff
Florian D. van Leeuwen
Suraj Neelakantan
George Yakushev
Erkan Karabulut
Allaa Boutaleb
Myung Jun Kim
Sara Pyykölä
Asher Labovich
Parker Glenn
Mustafa Tajjar
Jacob Feitelberg
Tianyi Yao
Jan-Micha Bodensohn
Liane Vogel
Olga Ovcharenko
Yannick Brunink
Conor Hassan
Amalia Stuger
Maximilian Schambach
Gerardo Vitagliano
Félix Lefebvre
Malina Molnar
Zeyu Zhang
Shi Bin Hoo
Günther Schindler
Anshul Gupta
Marco Spinaci
Daniel Gomm
Daniel Beaglehole
Elias Dubbeldam
Susanna Di Vita
Giulia Perciballi
Junwei Ma
Tom Zehle
Anurag Garg
Aécio Santos
Amir Rezaei Balef
Gyu-Il Kim
Rohith Saai Pemmasani Prabakaran
Vadim Borisov
Tianji Cong
Chorok Lee
Alexander Pfefferle
Simone Papicchio
Amine Mhedhbi