Security and privacy issues in AI;
Data mining and scalable learning algorithms;
Social data analytics;
Learning to rank and domain adaptation;
Cloud computing and distributed systems
Our research projects have been supported by NSF, NIH, Northwestern Mutual Data Science Institute, Air Force Research Lab, AFOSR, Yahoo!, Amazon, and the Ohio Board of Regents.
Data and Model Privacy in AI:
Domain inference based on exposed deep learning models (AAAI 2023 paper), adaptive domain inference (ongoing)
Feature masking for better utility preservation in DP-SGD (ongoing)
Hardware-assisted trusted computing: SGX-MR for SGX access-pattern protection (PET21 paper and a poster), comparison of data-oblivious approaches for TEEs (CLOUD23 paper), TEE-based complex graph analytics (Frontiers in Big Data), GPU TEEs (ongoing)
Image disguising for confidential outsourced deep learning (CCS18 poster, CLOUD21 paper, TOIT23 paper)
Efficient crypto protocols for learning boosting models from user-generated data. arXiv, CCS18 poster
Privacy-preserving spectral analysis for outsourced data in the cloud. CLOUD13, ASIACCS16
RASP: random space perturbation for efficient outsourced query services and data mining. VLDB14 demo, TKDE14, CODASPY11, CCS12 poster
Geometric data perturbation for outsourced data mining. KAIS12
Social Data Analysis and Privacy Protection:
PUTS: Privacy-Utility Tradeoff in social network privacy settings. PASSAT12
Mining Regrettable Tweets to Proactively Prevent Privacy Loss. WWW15 poster, WWW16
Adaptive Training Example Selection for Multi-domain Emotion Analysis. WI17
Knowledge-enhanced social network analysis.
Crowdsourcing and wisdom-of-crowd.
Big Data in the Cloud:
CloudVista: Interactive visual data analytics for extreme scale data in the cloud. VLDB12 demo, SSDBM11
Scalable Euclidean Embedding for Big Data. CLOUD15
CRESP: Cloud resource provisioning for large-scale MapReduce programs. TPDS14, CLOUD11
CUTE: Instructional Laboratories for Cloud Computing Education
Applied Research:
Large models for immunological sequence data (collaborating with Jiang Lab at UPenn)
SPIN: Cleaning, Monitoring, and Querying Image Streams Generated by Ground-Based Telescopes (collaborating with AFRL)
Large-scale next-gen sequencing data analysis for immunology research (with Jiang Lab at UT Austin): IR-Seq Processing and Analysis Tool, BIBM16 paper for constrained lineage tree generation
Blockchain and privacy-enhancement techniques for smart and connected health.
Web Search and Ranking:
Gradient-boosting-tree-based domain adaptation, ACM TOIS paper, CIKM08 paper, patent
Domain similarity analysis for web search ranking, CIKM09 paper
Tradeoffs on user preferences and expert judgment for learning ranking functions, DBRANK08 paper, patent
GBRank: Learning to rank with pairwise preference data in the regression framework, patent
Clustering Analysis and Visualization:
Scalable sequence data clustering for antibody clonality analysis, part of the Cloud-IRseq project on IR-Seq techniques
BestK: Categorical data clustering and validation
CatStream: Change detection on categorical/transactional data streams
VISTA: visually rendering and validating clustering structures