Projects
The Data Intelligence Lab aims to pioneer the inevitable trend toward Responsible/Trustworthy/Safe AI, Data-centric AI, and Big Data – AI Integration across all of machine learning, including Large Language Models (LLMs). We are especially interested in addressing the fairness, robustness, privacy, and explainability challenges of machine learning from a data perspective.
Surveys and Tutorials
Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective (VLDB Journal '23)
Machine Learning Robustness, Fairness, and their Convergence (ACM SIGKDD '21 Tutorial)
A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective (IEEE TKDE '21)
Responsible AI Challenges in End-to-end Machine Learning (IEEE Data Engineering Bulletin '21)
Data Collection and Quality Challenges for Deep Learning (VLDB '20 Tutorial)
Data Lifecycle Challenges in Production Machine Learning: A Survey (ACM SIGMOD Record '18)
Data Acquisition and Labeling
Falcon: Fair Active Learning using Multi-armed Bandits (VLDB '24)
iFlipper: Label Flipping for Individual Fairness (ACM SIGMOD '23)
Slice Tuner: A Selective Data Acquisition Framework for Accurate and Fair Machine Learning Models (ACM SIGMOD '21)
Inspector Gadget: A Data Programming-based Labeling System for Industrial Images (VLDB '21)
Data Cleaning, Validation, Augmentation, and Selection
RC-Mixup: A Data Augmentation Strategy against Noisy Data for Regression Tasks (ACM SIGKDD '24)
Quilt: Robust Data Segment Selection against Concept Drifts (AAAI '24)
Redactor: A Data-centric and Individualized Defense Against Inference Attacks (AAAI '23)
Data Cleaning for Accurate, Fair, and Robust Models: A Big Data - AI Integration Approach (DEEM@ACM SIGMOD '19)
Data Validation for Machine Learning (MLSys '19)
Model Explanation and Evaluation
XClusters: Explainability-first Clustering (AAAI '23)
Open-world COVID-19 Data Visualization (DMAH@VLDB '20)
Automated Data Slicing for Model Validation: A Big Data - AI Integration Approach (IEEE TKDE '19)
Slice Finder: Automated Data Slicing for Model Validation (IEEE ICDE '19)