EurIPS'25 Workshop on

AI for Tabular Data

Saturday 6 December 2025

Nanna Berg auditorium, University of Copenhagen (CPH)

Njalsgade 76, 2300 Copenhagen

Abstract

The use of artificial intelligence (AI) is well established for modalities such as text, images, audio, and even video. An important yet understudied modality is structured table-like data, such as relational tables or spreadsheet tables, which dominates many high-value applications in organizations, from enterprises to healthcare. Recent works attempt to use this modality as part of, or in combination with, AI models. This workshop hosts a program focused on AI for tabular data with key topics such as representation learning, generative AI, and foundation models. The workshop also targets researchers focusing on the intersection of learning over structured data and information retrieval, for example, in retrieval-augmented generation (RAG) and question answering (QA) systems.

The goal of the workshop is to connect researchers working on tabular data and surface novel research ideas and collaboration opportunities by bringing views from the NLP, ML, DB, and IR disciplines together.

Important Dates

Submission deadline: 22 October, 2025 (11:59, AoE) (extended from 21 October due to the OpenReview outage).
Notifications: 3 November, 2025 (19:00, CET), moved due to the deadline extension.
Camera-ready deadline: 14 November, 2025 (11:59am, CET)
Workshop: Saturday 6 December 2025

Scope

The distinguishing focus of this workshop is tabular data. Breakthroughs from the "tabular" community are poised to impact applications in enterprises, governmental organizations, healthcare, and beyond.

We understand tabular data as any table-like (semi-)structured data, including but not limited to: spreadsheets, (multi-)tables, relational databases, or time-series data.

We invite submissions related to AI for tabular data and any of the following topics.

Methods & Benchmarks: table representation learning; pretraining/foundation models for tables; generative modeling and synthesis; multimodal fusion (tables + text); evaluation protocols, datasets, and metrics.
Applications: table-centered question answering and retrieval; NL interfaces to databases (e.g. text-to-SQL); entity matching and record linkage; table understanding (type/semantic inference, schema matching); predictive machine learning (i.e. tabular classification and regression); time-series as tables; synthetic data generation; data wrangling; and new applications!
Systems & Data Management: table retrieval and indexing; open source software for the tabular community; scalable training/inference pipelines; cleaning, wrangling, and quality estimation; governance, privacy, and fairness for tabular ML.
Interdisciplinary Perspectives: bridges between ML, DB, IR, and NLP communities; lessons from practice and deployed systems; or any applications, methods, or benchmarks for multimodal data that joins tabular data with other modalities such as text, images, and code.

Submission Guidelines

We invite submissions in the following forms:

1) Short papers: up to 4 pages excluding references and appendices, in NeurIPS format. A small group of reviewers will perform a single-round assessment based on relevance, clarity, and potential for discussion. Submission template: NeurIPS paper template; the checklist is not required for the submission to the workshop.

2) Recently published work: full papers peer-reviewed and accepted in 2025 at a premier ML, DB, IR, or NLP venue. A paper previously peer-reviewed and accepted at a workshop should be submitted as a short paper. To submit, provide a link to the paper plus a brief statement of its relevance to our workshop.

This workshop is double-blind and non-archival. Submissions are managed through OpenReview. All accepted papers will be presented as posters, and a few selected ones will be presented as spotlight talks.

Camera-ready instructions: a limitations and/or acknowledgement section does not count toward the 4-page limit. For the footer in the camera-ready version, please use: AI for Tabular Data workshop at EurIPS 2025.

Presentation instructions:

Posters (all papers!): A0 portrait (or A1 landscape)
Spotlight (selected): 7 min talk + 2 min Q&A, please send pdf/pptx 2 days in advance to effy.li@cwi.nl

Program

Schedule

The workshop will start at 9:00 and last until 17:00, with a reception from 16:00. Please find the detailed schedule below:

Invited Speakers

Marine Le Morvan

Inria-Saclay

Title: Handling Missing Data in Tabular AI: What Really Matters for Prediction?

Abstract: Missing values are pervasive in real-world tabular data across high-value domains such as healthcare, finance, and the social sciences. They pose a fundamental challenge to predictive AI systems, which must seamlessly adapt to varying subsets of information at both training and inference time. In practice, the dominant strategy remains Impute-then-Predict, i.e. filling in missing values before training a model. However, this strategy has seldom been evaluated in the context of predictive tasks. In this talk, we will examine the theoretical foundations of the Impute-then-Predict approach, highlighting the inherent complexity of the learning problem under missing data. We will then address a key practical question: whether and when investing in advanced imputation methods yields a statistically significant improvement in predictive performance compared to simple baselines. Finally, we will open up the discussion on key open challenges in learning with missing values, including their interaction with emerging tabular foundation models.

Bio: Marine Le Morvan is a Research Scientist at Inria (France). Her work lies at the intersection of statistical learning and trustworthy AI. Her contributions span methods for learning from incomplete data, as well as model auditing and the development of tabular foundation models, notably TabICL, which unlock new possibilities through large-scale pretraining. M. Le Morvan’s research focuses on defining the theoretical and practical guidelines necessary to ensure that machine learning systems operating on structured data are powerful, reliable, and governable for real-world deployment.

Madelon Hulsebos

CWI

Title: What are we asking from tabular data?

Abstract: Tabular data has emerged as “the new hot topic in AI”. For a reason: tables are the dominant modality in the organizational data landscape, power high-value decisions, and are challenging. In this talk, I will take a step back: I will first reflect on the inception of the tabular AI community, and the diversity of tasks and questions that bring this community together through some of our work on table semantics, retrieval, and querying. I will end with some questions that we are asking next.

Bio: Madelon Hulsebos is a researcher at CWI in Amsterdam and faculty at ELLIS Amsterdam. Prior to that, she was a postdoctoral fellow at UC Berkeley, and obtained her PhD from the University of Amsterdam, for which she did research at MIT and Sigma Computing. Her general research interest is in representation learning and generative models for tabular data to democratize insights from structured data. Madelon founded the Table Representation Learning workshop series starting at NeurIPS, and leads various other efforts in this space. She was awarded a BIDS-Accenture fellowship for her postdoctoral research at UC Berkeley as well as an AiNed fellowship grant funding her Table Representation Learning Lab.

Floris Geerts

University of Antwerp

Title: Grables: Graphs and Tables

Abstract: Classical tabular learning treats data as independent rows and columns, while relational deep learning focuses on the connectivity between rows. In this talk, we study what happens when we endow tables with an explicit graph structure: how this affects the expressive power of learning methods, and when such structure helps models capture richer dependencies than purely tabular approaches. We also identify conditions under which suitably enriched tabular models can match the expressive power of their graph-based counterparts, clarifying when graphs are truly necessary and when tables are enough.

Bio: Floris Geerts is a professor at the University of Antwerp, Belgium. Previously, he was a senior research fellow at the University of Edinburgh and a postdoctoral researcher at the University of Helsinki. He received his PhD in 2001 from Hasselt University, Belgium. His research interests include the theory and practice of databases, relational deep learning, and graph learning. He has written a book on data quality and published over 130 technical papers. His awards include three Best Paper Awards, the PODS Alberto O. Mendelzon Test-of-Time Award, an ACM SIGMOD Research Highlight Award, and an ICLR Outstanding Paper Award. He is an ACM Distinguished Member, has served as program chair of PODS and ICDT, general chair of EDBT/ICDT, and is currently general chair of PODS. He has served on the editorial boards of ACM TODS and IEEE TKDE, and has edited several conference proceedings and special journal issues in database research.

Accepted Papers

Oral presentation:

Generalization Can Emerge in Tabular Foundation Models From a Single Table

Junwei Ma, Nour Shaheen, Alex Labach, Amine Mhedhbi, Frank Hutter, Anthony L. Caterini, Valentin Thomas

Causal Data Augmentation for Robust Fine-Tuning of Tabular Foundation Models

Magnus Bühler, Lennart Purucker, Frank Hutter

Graph-based Tabular Deep Learning Should Learn Feature Interactions, Not Just Make Predictions

Elias Dubbeldam, Reza Mohammadi, Marit Schoonhoven, Ilker Birbil

Conformal Prediction for Tabular Prior-Data Fitted Networks with Missing Data

Florian D. van Leeuwen

nanoTabPFN: A Lightweight and Educational Reimplementation of TabPFN

Alexander Pfefferle, Johannes Hog, Lennart Purucker, Frank Hutter

SALT-KG: A Benchmark for Semantics-Aware Learning on Enterprise Tables

Isaiah Onando Mulang', Felix Sasaki, Tassilo Klein, Jonas Kolk, Nikolay Grechanov, Johannes Hoffart

Are We Asking the Right Questions? On Ambiguity in Natural Language Queries for Tabular Data Analysis

Daniel Gomm, Cornelius Wolff, Madelon Hulsebos

TabPFN-2.5: a Preview

Leo Grinsztajn, Klemens Flöge, Oscar Key, Adrian Hayler, Mihir Manium, Anurag Garg, Jake Robertson, Shi Bin Hoo, Felix Birkel, Philipp Jund, Benjamin Jäger, Rosen Ting-Ying Yu, Bernhard Schölkopf, Noah Hollmann, Frank Hutter

Lost in Translation and Noise: A Deep Dive into the Failure Modes of VLMs on Real-World Tables

Anshul Singh, Rohan Chaudhary, Gagneet Singh, Abhay Kumar

Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data

Rishabh Ranjan, Valter Hudovernik, Mark Znidar, Charilaos I. Kanatsoulis, Roshan Reddy Upendra, Mahmoud Mohammadi, Joe Meyer, Tom Palczewski, Carlos Guestrin, Jure Leskovec

Exploring Multi-Table Retrieval Through Iterative Search

Allaa Boutaleb, Bernd Amann, Rafael Angarita, Hubert Naacke

Poster presentation:

A Case for Library-Level k-Means Binning in Histogram Gradient-Boosted Trees

Asher Labovich

TabularARGN: An Auto-Regressive Generative Network for Tabular Data Generation

Andrey Sidorenko, Ivona Krchova, Mariana Vargas Vieyra, Paul Tiwald, Mario Scriminaci, Michael Platzer

xRFM: Accurate, scalable, and interpretable feature learning models for tabular data

Daniel Beaglehole, David Holzmüller, Adityanarayanan Radhakrishnan, Mikhail Belkin

In Search of Grandmother Cells: Tracing Interpretable Neurons in Tabular Representations

Ricardo Knauer, Erik Rodner

Explaining Financial LLMs: An Attribution-Based Interpretability Study in Multilingual Table QA in Dutch and English

Amalia Stuger, Lucas Lageweg, Fina Polat

Play by the Type Rules: Inferring Constraints for Small Language Models in Declarative Programs

Parker Glenn, Alfy Samuel, Daben Liu

TabImpute: Accurate and Fast Zero-Shot Missing-Data Imputation with a Pre-Trained Transformer

Jacob Feitelberg, Dwaipayan Saha, Kyuseong Choi, Zaid Ahmad, Anish Agarwal, Raaz Dwivedi

Does TabPFN Understand Causal Structures?

Omar Swelam, Lennart Purucker, Jake Robertson, Hanne Raum, Joschka Boedecker, Frank Hutter

TabPFN for Data-Scarce Industrial Settings

João Machado de Freitas, Alexander Fuchs, Markus Feuerstein, Philipp Paller, Franz Pernkopf

TabPFN-Wide: Continued Pre-Training for Extreme Feature Counts

Christopher Kolberg, Katharina Eggensperger, Nico Pfeifer

TabGemma: Text-Based Tabular ICL via LLM using Continued Pretraining and Retrieval

Günther Schindler, Maximilian Schambach, Michael Medek, Sam Thelin

Knowledge-Rich Embeddings for Tabular Learning

Félix Lefebvre, Myung Jun Kim, Gaël Varoquaux

Domain-Aware Tabular Data Augmentation Using Large Language Models

Suraj Neelakantan, Martin Längkvist, Amy Loutfi

Tables2Traces: Distilling Tabular Data to Improve LLM Reasoning in Healthcare

Mikkel Werling, Nabeel Seedat, Jiashuo Liu, Lars Grønlykke, Carsten Utoft Niemann, Mihaela van der Schaar, Rudi Agius

Efficient Autoregressive Inference for Tabular Foundation Models

Conor Hassan, Nasrulloh Ratu Bagus Satrio Loka, Cen-You Li, Daolang Huang, Paul Edmund Chang, Yang Yang, Francesco Silvestrin, Samuel Kaski, Luigi Acerbi

TabRAG: Tabular Document Retrieval via Structured Language Representations

Jacob Si, Mike Qu, Michelle Lee, Yingzhen Li

Comparing Task-Agnostic Embedding Models for Tabular Data

Frederik Hoppe, Lars Kleinemeier, Astrid Franz, Udo Göbel

Semi-supervised learning from tabular data with autoencoders: when does it work?

Sintija Stevanoska, Jurica Levatic, Saso Dzeroski

Towards Understanding Layer Contributions in Tabular In-Context Learning Models

Amir Rezaei Balef, Mykhailo Koshil, Katharina Eggensperger

Talking Trees: Reasoning-Assisted Induction of Decision Trees for Tabular Data

George Yakushev, Alina Shutova, Ivan Rubachev, Renat Sergazinov, Artem Babenko

The Missing Structure: When Graph Representations Outperform Tabular Models

Tamara Cucumides, Floris Geerts

$N^2$: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion

Caleb Chin, Aashish Khubchandani, Harshvardhan Maskara, Kyuseong Choi, Jacob Feitelberg, Albert Gong, Manit Paul, Tathagata Sadhukhan, Anish Agarwal, Raaz Dwivedi