5th ACM International Conference on AI in Finance (ICAIF-24)
6 MetroTech Center, Brooklyn, NY 11201
Synthetic data generation has emerged as a popular research area in both academic and industry research labs. The financial and healthcare industries in particular have shown strong interest, owing to the highly regulated nature of these businesses and the sensitivity of individual financial and medical information. The promise of synthetic data is to enable internal and external collaborations through the sharing of realistic but privacy-preserving synthetic data, collaborations that are currently impossible due to legal requirements and internal policies. These collaborations open up possibilities that could lead to improved healthcare outcomes for patients and to improved customer experiences and protections (e.g. against fraud) at financial institutions. Many questions surrounding synthetic data, however, remain: (i) privacy guarantees and their robustness to attacks, e.g. membership inference, (ii) the fairness implications of utilizing synthetic data, and (iii) how to assess the quality, utility and diversity of synthetic data. Each must be interpreted in light of the specific technical, legal and practical challenges of working with sensitive financial information about individuals. The goal of this workshop is to bring together academic researchers, industry practitioners and regulators to understand the evolving landscape, and to serve as a venue for cross-pollination between academic research and the practical experience of dealing with the challenges of using synthetic data in industry. Our main goals are to develop an understanding of the most important open problems and of current methods and their limitations, and to establish a series of cross-disciplinary good practices. Our particular emphasis is to bridge the needs and work of industrial practitioners and academic researchers, and to establish constructive collaborations so that state-of-the-art technologies can bring rapid growth and value to the industry.
Generative AI techniques such as LLMs have recently garnered considerable attention and are particularly relevant for the generation of synthetic data. For example, LLMs are relevant to data collaboration involving tabular data with mixed numerical and categorical features, where tabular data synthesis with LLMs has drawn increasing interest. Many questions surrounding LLM-based data synthesis remain, however: (i) privacy guarantees and their robustness to attacks during LLM fine-tuning, e.g. membership inference, (ii) fitting multi-table data structures into LLM frameworks for synthesis, (iii) evaluating multi-table synthetic data in terms of quality, utility and diversity, and (iv) scalable LLM adaptation for industrial-sized datasets. Each of these questions must be addressed to overcome the technical, legal and practical challenges of working with sensitive information in the financial sector. This workshop seeks to provide a platform for academic researchers, industrial practitioners and regulators to share their work and views on the data collaboration challenges of utilizing synthetic data in business.