Yu Yang

About Me

Hello! I'm Yu Yang (杨雨), a final-year Ph.D. student in Computer Science at the University of California, Los Angeles (UCLA), where I am fortunate to be advised by Baharan Mirzasoleiman. My research focuses on understanding and improving large-scale training data for efficient and robust learning.

I'm also a founding research scientist at Virtue AI, where I lead evaluation and red-teaming of code generation models and agents.

I used to live in Beijing and Los Angeles, and I'm currently based in San Francisco. 

Email: yuyang AT cs.ucla.edu

Awards

UCLA Dissertation Year Award, 2024

Amazon Doctoral Student Fellowship, 2022

UCLA Computer Science Fellowship, 2021

News

Experience

2024

Founding Research Scientist, Virtue AI

2023

Research Scientist Intern, AI Systems Machine Learning @ FAIR at Meta

2022

Research Intern, Robustness of Platform Models in Language and Vision @ Microsoft Research

2021

Applied Scientist Intern, Computer Vision @ Amazon Alexa AI

Selected Publications [Full List]

2024


SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models
Yu Yang, Siddhartha Mishra, Jeffrey N Chiang, Baharan Mirzasoleiman
In Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS), 2024.
[Preprint]

Few-shot Adaptation to Distribution Shifts By Mixing Source and Target Embeddings
Yihao Xue, Ali Payani, Yu Yang, Baharan Mirzasoleiman
In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024.
[Paper]

Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias
Yu Yang, Eric Gan, Gintare Karolina Dziugaite, Baharan Mirzasoleiman
In Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS), 2024.
[Paper] [Code]

Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality
Yu Yang*, Xuxi Chen*, Zhangyang Wang, Baharan Mirzasoleiman (*Equal Contribution)
In Proceedings of the Twelfth International Conference on Learning Representations (ICLR), 2024.
[Preprint]

SIEVE: Multimodal Dataset Pruning Using Image Captioning Models
Anas Mahmoud, Mostafa Elhoushi, Amro Abbas, Yu Yang, Newsha Ardalani, Hugh Leather, Ari S Morcos
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper]

2023


Robust Learning with Progressive Data Expansion Against Spurious Correlation
Yu Yang*, Yihe Deng*, Baharan Mirzasoleiman, Quanquan Gu (*Equal Contribution)
In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS), 2023.
[Paper] [Code] [Project Page]

Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data
Yu Yang, Aaditya K Singh, Mostafa Elhoushi, Anas Mahmoud, Kushal Tirumala, Fabian Gloeckle, Baptiste Rozière, Carole-Jean Wu, Ari S Morcos, Newsha Ardalani
The 3rd Workshop on Efficient Natural Language and Speech Processing (ENLSP-III), NeurIPS 2023. (Oral)
[Paper]

CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
Hritik Bansal*, Nishad Singhi*, Yu Yang, Fan Yin, Aditya Grover, Kai-Wei Chang (*Equal Contribution)
In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023. (Oral: 1.8%)
[Paper] [Code]

Towards Sustainable Learning: Coresets for Data-efficient Deep Learning
Yu Yang, Hao Kang, Baharan Mirzasoleiman
In Proceedings of the 40th International Conference on Machine Learning (ICML), 2023.
[Paper] [Code]

Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning
Yu Yang, Besmira Nushi, Hamid Palangi, Baharan Mirzasoleiman
In Proceedings of the 40th International Conference on Machine Learning (ICML), 2023.
[Paper] [Code]

2022

Not All Poisons are Created Equal: Robust Training against Data Poisoning
Yu Yang, Tian Yu Liu, Baharan Mirzasoleiman
In Proceedings of the 39th International Conference on Machine Learning (ICML), 2022. (Oral: 2.10%)
[Paper] [Code]

Enhancing Fairness in Face Detection in Computer Vision Systems by Demographic Bias Mitigation
Yu Yang, Aayush Gupta, Jianwei Feng, Yue Rex Wu, Vivek Yadav, Varsha Hedau, Prateek Singhal, Pradeep Natarajan, Jungseock Joo
In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2022.
[Paper] [Dataset]

Explaining Deep Convolutional Neural Networks via Unsupervised Visual-Semantic Filter Attention
Yu Yang, Seungbae Kim, Jungseock Joo
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022. (Oral: 4.22%)
[Paper] [Code]

Teaching

Academic Activities