Synthetic Tabular Data: Methods, Attacks and Defenses

VLDB 2025, London | KDD 2025, Toronto

Summary

Synthetic data is often positioned as a way to replace sensitive, fixed-size data sets with an unlimited source of matching data that is free of privacy concerns. Synthetic data generation has progressed considerably over the last decade, building on advances in machine learning and data analytics. In this tutorial, we survey the key developments and main concepts in tabular synthetic data generation, including paradigms based on probabilistic graphical models and on deep learning. We provide background and motivation before giving a technical deep dive into the methodologies. We also address the limitations of synthetic data by studying attacks that seek to retrieve information about the original sensitive data. Finally, we present extensions and open problems in this area.

The collapsible infographics below present an overview of the tutorial contents. Tutorial slides and references are appended at the end.

Synthetic Data: Introduction & Desiderata

Synthetic Data: Methods Landscape

Synthetic Data: Privacy Foundations & Differential Privacy

Synthetic Data: Privacy Attacks & Defenses

Synthetic Data: Advanced Topics & Open Problems

Slides

Reference Materials

This webpage is a companion to the tutorial presented at KDD 2025 and VLDB 2025. We intend to share the tutorial materials here, including slides, videos (if any), the survey paper, and links to references and libraries. More details will be posted in the coming weeks.

Survey Paper: link
Tutorial Proposal: link

Organizers