As the saying goes, “Data is the new oil.” Indeed, high-quality data is a treasure trove of information. Machine learning models trained on such data can transform our processes, from automating mundane tasks to improving precision in areas where human error is inevitable.
However, access to high-quality data is often limited due to privacy concerns. Even if the privacy issue is somehow resolved, bias and unfairness remain in real-world data. Because modern machine learning models reflect the data on which they are trained, the biases and unfair aspects inherent in the data can carry over into the trained models.
Synthetic data emerges as a powerful solution to these data limitations. Thanks to advances in deep generative modelling, we can now create synthetic data that not only provides formal privacy guarantees but also mitigates bias and unfairness. This allows us to work with more comprehensive and less biased datasets, fostering optimism for the future of machine learning.
In this graduate course, we delve into popular techniques in deep generative modelling. We aim to equip you with the knowledge to generate diverse kinds of synthetic data, including tabular data, images, text, and multi-modal data. We also focus on understanding and applying popular and promising evaluation metrics for judging the quality of a generated synthetic dataset. Importantly, we study different notions of privacy and fairness, instilling a sense of responsibility and ethics in our approach to machine learning. Finally, we explore generative transfer learning, which transfers knowledge from large foundation models to user-specific synthetic data generation.