The goal of the Data Intelligence Lab is to pioneer the inevitable trends of Responsible/Trustworthy/Safe AI, Data-centric AI, and Big Data – AI Integration across all of machine learning, including Large Language Models (LLMs). We are especially interested in solving fairness, robustness, privacy, and explainability challenges in machine learning from a data perspective.
Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective (VLDB Journal '23)
Machine Learning Robustness, Fairness, and their Convergence (ACM SIGKDD '21 Tutorial)
A Survey on Data Collection for Machine Learning: a Big Data - AI Integration Perspective (IEEE TKDE '21)
Responsible AI Challenges in End-to-end Machine Learning (IEEE Data Engineering Bulletin '21)
Data Collection and Quality Challenges for Deep Learning (VLDB '20 Tutorial)
Data Lifecycle Challenges in Production Machine Learning: A Survey (ACM SIGMOD Record '18)
PFGuard: A Generative Framework with Privacy and Fairness Safeguards (ICLR '25)
Falcon: Fair Active Learning using Multi-armed Bandits (VLDB '24)
iFlipper: Label Flipping for Individual Fairness (ACM SIGMOD '23)
Slice Tuner: A Selective Data Acquisition Framework for Accurate and Fair Machine Learning Models (ACM SIGMOD '21)
Inspector Gadget: A Data Programming-based Labeling System for Industrial Images (VLDB '21)
RC-Mixup: A Data Augmentation Strategy against Noisy Data for Regression Tasks (ACM SIGKDD '24)
Quilt: Robust Data Segment Selection against Concept Drifts (AAAI '24)
Redactor: A Data-centric and Individualized Defense Against Inference Attacks (AAAI '23)
Data Cleaning for Accurate, Fair, and Robust Models: A Big Data - AI Integration Approach (DEEM@ACM SIGMOD '19)
Data Validation for Machine Learning (MLSys '19)
LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views (ICML '24)
Dr-Fairness: Dynamic Data Ratio Adjustment for Fair Training on Real and Generated Data (TMLR '23)
Sample Selection for Fair and Robust Training (NeurIPS '21)
FR-Train: A Mutual Information-based Approach to Fair and Robust Training (ICML '20)
SHAP-based Explanations are Sensitive to Feature Representation (ACM FAccT '25)
ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models (NeurIPS '24)
XClusters: Explainability-first Clustering (AAAI '23)
Open-world COVID-19 Data Visualization (DMAH@VLDB '20)
Automated Data Slicing for Model Validation: A Big data - AI Integration Approach (IEEE TKDE '19)
Slice Finder: Automated Data Slicing for Model Validation (IEEE ICDE '19)