HaoYu Wang
I am on the academic job market this year!
I am currently a final-year Ph.D. student advised by Prof. Jing Gao in the School of Electrical and Computer Engineering at Purdue University. I received my B.Eng. degree from the University of Electronic Science and Technology of China, where I was advised by Prof. Defu Lian, and my M.S. degree from SUNY Buffalo.
My research interests lie at the intersection of data mining, natural language processing, and machine learning, with a strong focus on democratizing AI for broader accessibility. In particular, my research covers the following directions:
Parameter-efficient Learning: Large-scale deep learning models have achieved success in numerous applications, but their computational complexity and extensive storage requirements hinder deployment, particularly on edge devices and in latency-sensitive applications. It is therefore crucial to learn parameter-efficient representations that reduce computational and storage costs and thereby facilitate model deployment.
LightLT: A Lightweight Representation Quantization Framework for Long-tail Data. ICDE'24
HadSkip: Homotopic and Adaptive Layer Skipping of Pre-trained Language Models for Efficient Inference. EMNLP'23
LightToken: A Task and Model-agnostic Lightweight Token Embedding Framework for Pre-trained Language Models. KDD'23
A Lightweight Knowledge Graph Embedding Framework for Efficient Inference and Storage. CIKM'21
xLightFM: Extremely Memory-Efficient Factorization Machine. SIGIR'21
LightRec: A Memory and Search-Efficient Recommender System. WWW'20
Binarized Collaborative Filtering with Distilling Graph Convolutional Network. IJCAI'19
Adversarial Binary Collaborative Filtering For Implicit Feedback. AAAI'19
Discrete Ranking-based Matrix Factorization with Self-Paced Learning. KDD'18
Data-efficient Deep Learning: Large-scale deep learning models have demonstrated exceptional performance across diverse tasks. However, many domains face data scarcity, such as multilingual and cross-lingual applications involving low-resource languages, and models trained on such limited data often perform suboptimally. It is therefore essential to develop strategies for training effective models with minimal data, a crucial requirement for real-world applications.
Macedon: Minimizing Representation Coding Rate Reduction for Cross-Lingual Natural Language Understanding. EMNLP'23
Macular: A Multi-Task Adversarial Framework for Cross-Lingual Natural Language Understanding. KDD'23
FedKC: Federated Knowledge Composition for Multilingual Natural Language Understanding. WWW'22
Multi-modal Emergent Fake News Detection via Meta Neural Process Networks. KDD'21
Miscellaneous: Beyond the directions above, I also explore interpretable models for medical data mining, model fairness, and knowledge-enhanced language models.
Towards Poisoning Fair Representations. ICLR'24
SimFair: A Unified Framework for Fairness-Aware Multi-Label Classification. AAAI'23
InterHG: An Interpretable and Accurate Model for Hypothesis Generation. BIBM'21
Knowledge-Guided Paraphrase Identification. EMNLP'21
Fair Classification Under Strict Unawareness. SDM'21