Tabular data synthesis
Adaptation of Large Language Models for tabular data synthesis
Data collaboration with multi-tabular data
Hallucination detection
Published Work:
Tung Sum Thomas Kwok, Chi-Hua Wang, Guang Cheng. DEREC-SIMPRO: unlock Language Model benefits to advance Data Clean Room Synthesis. [ICAIF 2024 Workshop][arXiv]
Tung Sum Thomas Kwok, Chi-Hua Wang, Guang Cheng. GReaTER: Generate Realistic Tabular data after data Enhancement and Reduction. [Data Engineering Meets Large Language Models: Challenges and Opportunities @ ICDE 2025][arXiv]
Manuscripts patiently awaiting publication:
Tung Sum Thomas Kwok, Chi-Hua Wang, Guang Cheng. Towards High Supervised Learning Utility Training Data Generation: Data Pruning and Column Reordering.
Xinyu Wang, Jijun Chi, Tung Sum Thomas Kwok, Zhenghan Tai, Muzhi Li, Zhuhong Li, Hailin He, Yuchen Hua, Peng Lu, Suyuchen Wang, Yihong Wu, Jerry Huang, Ling Zhou. FinSage: A Multi-aspect RAG System for Financial Filings Question Answering.