Yu Yang

About Me

Hello! I'm Yu Yang (杨雨), a Ph.D. candidate in Computer Science at the University of California, Los Angeles (UCLA), where I am fortunate to be advised by Baharan Mirzasoleiman. My research primarily focuses on understanding and improving large-scale training data for efficient and robust learning.

Prior to pursuing my Ph.D., I earned my B.Sc. degree in Mathematics of Computation and Statistics, also from UCLA. During that time, I collaborated closely with Quanshi Zhang and Jungseock Joo at VCLA@UCLA on interpretable and fair computer vision.

I was born and raised in Beijing, China, before coming to the US.

Email: yuyang AT cs.ucla.edu

LinkedIn | Link | Twitter | GitHub

Awards

Amazon Doctoral Student Fellowship, 2022

UCLA Computer Science Fellowship, 2021

News

Selected Publications [Full List]

2024


SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models
Yu Yang, Siddhartha Mishra, Jeffrey N Chiang, Baharan Mirzasoleiman
Preprint, 2024.
[Preprint]

Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias
Yu Yang, Eric Gan, Gintare Karolina Dziugaite, Baharan Mirzasoleiman
Accepted to Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS), 2024.
[Paper] [Code]

Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality
Xuxi Chen*, Yu Yang*, Zhangyang Wang, Baharan Mirzasoleiman (*Equal Contribution)
Accepted to Proceedings of the Twelfth International Conference on Learning Representations (ICLR), 2024.
[Preprint]

SIEVE: Multimodal Dataset Pruning Using Image Captioning Models
Anas Mahmoud, Mostafa Elhoushi, Amro Abbas, Yu Yang, Newsha Ardalani, Hugh Leather, Ari S Morcos
Accepted to Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper]

2023


Robust Learning with Progressive Data Expansion Against Spurious Correlation
Yihe Deng*, Yu Yang*, Baharan Mirzasoleiman, Quanquan Gu (*Equal Contribution)
In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS), 2023.
[Paper] [Code] [Project Page]

Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data
Yu Yang, Aaditya K Singh, Mostafa Elhoushi, Anas Mahmoud, Kushal Tirumala, Fabian Gloeckle, Baptiste Rozière, Carole-Jean Wu, Ari S Morcos, Newsha Ardalani
The 3rd Workshop on Efficient Natural Language and Speech Processing (ENLSP-III), NeurIPS 2023. (Oral)
[Paper]

CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
Hritik Bansal*, Nishad Singhi*, Yu Yang, Fan Yin, Aditya Grover, Kai-Wei Chang (*Equal Contribution)
In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023. (Oral: 1.8%)
[Paper] [Code]

Towards Sustainable Learning: Coresets for Data-efficient Deep Learning
Yu Yang, Hao Kang, Baharan Mirzasoleiman
In Proceedings of the 40th International Conference on Machine Learning (ICML), 2023.
[Paper] [Code]

Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning
Yu Yang, Besmira Nushi, Hamid Palangi, Baharan Mirzasoleiman
In Proceedings of the 40th International Conference on Machine Learning (ICML), 2023.
[Paper] [Code]

2022

Not All Poisons are Created Equal: Robust Training against Data Poisoning
Yu Yang, Tian Yu Liu, Baharan Mirzasoleiman
In Proceedings of the International Conference on Machine Learning (ICML), 2022. (Oral: 2.10%)
[Paper] [Code]

Enhancing Fairness in Face Detection in Computer Vision Systems by Demographic Bias Mitigation
Yu Yang, Aayush Gupta, Jianwei Feng, Yue Rex Wu, Vivek Yadav, Varsha Hedau, Prateek Singhal, Pradeep Natarajan, Jungseock Joo
In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2022.
[Paper] [Dataset]

Explaining Deep Convolutional Neural Networks via Unsupervised Visual-Semantic Filter Attention
Yu Yang, Seungbae Kim, Jungseock Joo
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022. (Oral: 4.22%)
[Paper] [Code]

Experience

2023

Research Scientist Intern, AI Systems Machine Learning @ FAIR at Meta

2022

Research Intern, Robustness of Platform Models in Language and Vision @ Microsoft Research

2021

Applied Scientist Intern, Computer Vision @ Amazon Alexa AI

Teaching

Academic Activities