Carole-Jean Wu

Carole-Jean Wu is a Director of AI Research at Meta, where she leads the Systems and Machine Learning Research team. She is a founding member and a Vice President of MLCommons, a non-profit organization that aims to accelerate machine learning innovations for the benefit of all. Dr. Wu also serves on the MLCommons Board as a Director, chaired the MLPerf Recommendation Benchmark Advisory Board, and co-chaired MLPerf Inference. Prior to Meta/Facebook, Dr. Wu was a tenured professor at ASU. She earned her M.A. and Ph.D. from Princeton University and her B.Sc. from Cornell University.

Dr. Wu’s expertise sits at the intersection of computer architecture and machine learning. Her work spans datacenter infrastructures and edge systems, with a focus on performance, energy efficiency, and sustainability. She is passionate about pathfinding and tackling system challenges to enable efficient, scalable, and environmentally sustainable AI technologies.

Dr. Wu's work has been recognized with several awards, including IEEE Micro Top Picks and ACM / IEEE Best Paper Awards. She is the recipient of the NSF CAREER Award, the CRA-WP Anita Borg Early Career Award Distinction of Honorable Mention, the IEEE Young Engineer of the Year Award, the Science Foundation Arizona Bisgrove Early Career Scholarship, and the Facebook AI Infrastructure Mentorship Award. She is in the Hall of Fame of ISCA, HPCA, and IISWC. Dr. Wu was the Program Co-Chair of the Conference on Machine Learning and Systems (MLSys 2022), the Program Chair of the IEEE International Symposium on Workload Characterization (IISWC 2018), and the Editor for the IEEE MICRO Special Issue on Environmentally Sustainable Computing. She currently serves on the ACM SIGARCH/SIGMICRO CARES committee, as well as on a workshop planning committee of the National Academies of Sciences, Engineering, and Medicine.

[Google Scholar] [dblp]

Research

My work sits at the intersection of computer architecture and machine learning, with the following emphases:

System design and optimization for deep learning
Learning-based approaches for system design and optimization
High-performance and energy-efficient heterogeneous CPU+GPU systems
Performance quality modeling and energy efficiency optimization for mobile systems
Memory system optimization
Sustainable computing via carbon-efficient system design and management, energy harvesting and temperature-aware management for portable electronics

My work has been featured in the National Academy of Engineering's The Bridge; the Computer Architecture Podcast, Ep16: Sustainability in a Post-AI World; Tech @ Meta, on Understanding computing's carbon footprint and Designing low-carbon computers; Nature Magazine: Light bulbs have energy ratings — so why can’t AI chatbots?; Bloomberg Green; the Atlantic; the HiPEAC blog: To minimize computing’s carbon footprint, the first step is to quantify lifecycle emissions; Stanford's MLSys seminar: Designing AI Systems for Recommender Systems and Beyond; and MLPerf: Inference v0.5 launch results and MaskRCNN2Go for MLPerf.

If you are interested in learning more about Designing Computer Systems for Sustainability, check out my course offered at HiPEAC's Summer School. The course includes:

Understanding the Lay of the Land: Computing's Environmental Footprint;
Carbon Impact of AI;
Carbon Modeling and Design Optimization;
(Carbon)-efficient Edge Computing;
Carbon Optimization At-Scale: Carbon-Aware Datacenter Computing.

Also check out Socio-Technological Challenges and Opportunities: Paths Forward from the ISCA-2021 panel The Microprocessor at 50: Societal Challenge; Sustainable AI: Environmental Implications, Challenges and Opportunities from MLSys-2022; Designing Computer Systems for Sustainability from the ISCA-2024 panel; and my thoughts on inclusive approaches to technological innovation: Think Globally, Design Deliberately: Taking an Inclusive Approach to Innovation.

Recent Publications

[National Academy of Engineering The Bridge Winter Edition] Scaling AI Sustainably

Carole-Jean Wu, Bilge Acun, Ramya Raghavendra, Kim Hazelwood.

[HPCA-2025] Revisiting Reliability in Large-Scale Machine Learning Research Clusters

Apostolos Kokolis, Michael Kuchnik, John Hoffman, Adithya Kumar, Parth Malani, Faye Ma, Zachary DeVito, Shubho Sengupta, Kalyan Saladi, Carole-Jean Wu.

[HPCA-2025] CORDOBA: Carbon-Efficient Optimization Framework for Computer Systems

Mariam Elgamal, Doug Carmean, Elnaz Ansari, Okay Zed, Ramesh Peri, Srilatha Manne, Udit Gupta, Gu-Yeon Wei, David Brooks, Gage Hills, Carole-Jean Wu.

Best Paper Award Honorable Mention

Early Version

[ACM SIGARCH Computer Architecture Today] Designing Computer Systems for Sustainability

Carole-Jean Wu, Tamar Eilam, Babak Falsafi, Gage Hills, Srilatha Manne.

[Nature Magazine] Light bulbs have energy ratings — so why can’t AI chatbots?

Sasha Luccioni, Boris Gamazaychikov, Sara Hooker, Regis Pierrard, Emma Strubell, Yacine Jernite, Carole-Jean Wu.

[IEEE Micro-2024] Beyond Efficiency: Scaling AI Sustainably

Carole-Jean Wu, Bilge Acun, Ramya Raghavendra, Kim Hazelwood.

Special Issue: The Past, Present, and Future of Warehouse-Scale Computing

Honors and Awards

2025 CORDOBA: Carbon-Efficient Optimization Framework for Computer Systems selected for the IEEE HPCA Best Paper Award Honorable Mention

2025 MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems selected for IEEE Micro Top Picks Honorable Mention

2024 Carbon Explorer: A Holistic Approach for Designing Carbon Aware Datacenters selected for IEEE Micro Top Picks Honorable Mention

2024 MP-Rec: Hardware-Software Co-Design to Enable Multi-Path Recommendation selected for IEEE Micro Top Picks Honorable Mention

2023 2020 ISCA Paper Selected for Inclusion in ISCA@50 25-Year Retrospective: 1996-2020: MLPerf Inference Benchmark (Retrospective: MLPerf)

One of 98 papers (out of 1077) selected as the most significant and exciting papers from the ACM/IEEE International Symposium on Computer Architecture from 1996 to 2020.

2023 ACT: Designing Sustainable Computer Systems with an Architectural Carbon Modeling Tool selected for IEEE Micro Top Picks

2022 Towards Fair Federated Recommendation Learning: Characterizing the Inter-Dependence of System and Data Heterogeneity selected for ACM Conference on Recommender Systems Best Paper Award Finalist

2022 Chasing Carbon: The Elusive Environmental Footprint of Computing selected for IEEE Micro Top Picks

2021 MLPerf Inference Benchmark selected for IEEE Micro Top Picks (Article: The Vision Behind MLPerf: Understanding AI Inference Performance)

2021 DeepRecSys: A System for Optimizing End-to-end At-scale Neural Recommendation Inference selected for IEEE Micro Top Picks Honorable Mention

2020 Distinction of Honorable Mention of the CRA Anita Borg Early Career Award

2019 Facebook AI Infrastructure Mentorship Award

2019 Genetic Improvement for GPU Code selected for ACM/IEEE ICSE Genetic Improvement on Software Best Paper Award

2018 Designing a Temperature Model to Understand the Thermal Challenges of Portable Computing Platforms selected for IEEE ITHERM Best Paper Award

2017 NSF CAREER Award

2017 IEEE Young Engineer of the Year Award

2015 Architectural Thermal Energy Harvesting Opportunities for Sustainable Computing selected for IEEE Best of Computer Architecture Letters

2013 SFAz Bisgrove CAREER Award

2011 Intel PhD Fellowship

2011 Characterization and Dynamic Mitigation of Intra-Application Cache Interference nominated for the IEEE ISPASS Best Paper Award

2009 Princeton Excellence in Leadership Award

Professional Service

ACM SIGARCH/SIGMICRO CARES Committee, 2024 - present.

Journal Editor

IEEE Micro Magazine Special Issue on Environmentally Sustainable Computing, 2022-23.

Executive Committee

IEEE Technical Committee on Computer Architecture (TCCA), 2017-18.

Steering Committee

HotCarbon: Workshop on Sustainable Computer Systems, 2023 - present.
IEEE Intl. Symp. on Performance Analysis of Systems and Software (ISPASS), 2018-23.
IEEE Intl. Symp. on Workload Characterization (IISWC), 2018-23.

Award Selection Committee

IEEE TCCA Young Computer Architect Award, 2021-23.

Technical Program Chair

Conference on Machine Learning and Systems (MLSys), 2022.
IEEE Intl. Symp. on Workload Characterization (IISWC), 2018.

Technical Program Committee

HPCA 2020; 2014-17; HPCA Industry Track 2025
HotCarbon 2022, 2023
ISCA 2014-21; ISCA Industry Track 2020
MLSys 2020-21
IISWC 2019; 2013-17
MICRO 2014; 2016-17

Journal Editorial Board

IEEE Micro 2019-23
IEEE Computer Architecture Letters 2019-22

CRA-Widening Participation (WP) Career Mentoring Workshop

Finding a Research Topic with Soha Hassoun, 2023.
Strategies for Your Career with Amber Settle, 2020.

Selected Publications

[NeurIPS-2024] Croissant: A Metadata Format for ML-Ready Datasets

M. Akhtar, O. Benjelloun, C. Conforti, L. Foschini, J. Giner-Miguelez, P. Gijsbers, et al.

Spotlight Poster

[NeurIPS-2024] Toward Efficient Inference for Mixtures of Experts

Haiyang Huang, Newsha Ardalani, Anna Sun, Liu Ke, Shruti Bhosale, Hsien-Hsin Lee, Carole-Jean Wu, Benjamin Lee.

[ACL-2024] Layer Skip: Enabling Early Exit Inference and Self Speculative Decoding

Mostafa Elhoushi, Akshat Shrivastava, Diana Liskovich, Basil Hosmer, Bram Wasti, Liangzhen Lai, Anas Mahmoud, Bilge Acun, Saurabh Agarwal, Ahmed Roman, Ahmed A Aly, Beidi Chen, Carole-Jean Wu.

[ICML-2024] CHAI: Clustered Head Attention for Efficient LLM Inference

Saurabh Agarwal, Bilge Acun, Basil Hosmer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu.

[ISPASS-2024] Generative AI Beyond LLMs: System Implications of Multi-Modal Generation

Alicia Golden, Samuel Hsia, Fei Sun, Bilge Acun, Basil Hosmer, Yejin Lee, Zachary DeVito, Jeff Johnson, Gu-Yeon Wei, David Brooks, Carole-Jean Wu.

[ISCA-2024] MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems

Samuel Hsia, Alicia Golden, Bilge Acun, Newsha Ardalani, Zachary DeVito, Gu-Yeon Wei, David Brooks, Carole-Jean Wu.

IEEE Micro Top Picks Honorable Mention

[MLSys-2024] HeteroSwitch: Characterizing and Taming System-Induced Data Heterogeneity in Federated Learning

Gyudong Kim, Mehdi Ghasemi, Soroush Heidari, Seungryong Kim, Young Geun Kim, Sarma Vrudhula, Carole-Jean Wu.

[NeurIPS-2023] DataPerf: Benchmarks for Data-Centric AI Development

M. Mazumder, C. Banbury, X. Yao, B. Karlas, W. Rojas, S. Diamos, et al.

[USENIX-ATC 2023] Tectonic-Shift: A Composite Storage Fabric for Large-Scale ML Training

Mark Zhao, Satadru Pan, Niket Agarwal, Zhaoduo Wen, David Xu, Anand Natarajan, Pavan Kumar, Shiva Shankar, Ritesh Tijoriwala, Karan Asher, Hao Wu, Aarti Basant, Daniel Ford, Delia David, Nezih Yigitbasi, Pratap Singh, Carole-Jean Wu, Christos Kozyrakis.

[MLSys-2023] RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure

Mark Zhao, Dhruv Choudhary, Devashish Tyagi, Ajay Somani, Max Kaplan, Sung-Han Lin, Sarunya Pumma, Jongsoo Park, Aarti Basant, Niket Agarwal, Carole-Jean Wu, Christos Kozyrakis.

[ASPLOS-2023] Carbon Explorer: A Holistic Approach for Designing Carbon Aware Datacenters

Bilge Acun, Benjamin C. Lee, Fiodar Kazhamiaka, Kiwan Maeng, Manoj Chakkaravarthy, Udit Gupta, David Brooks, Carole-Jean Wu. [code]

IEEE Micro Top Picks Honorable Mention

[ASPLOS-2023] MP-Rec: Hardware-Software Co-Design to Enable Multi-Path Recommendation

Samuel Hsia, Udit Gupta, Bilge Acun, Newsha Ardalani, Pan Zhong, Gu-Yeon Wei, David Brooks, Carole-Jean Wu.

IEEE Micro Top Picks Honorable Mention

[NeurIPS-2022] Infinite Recommendation Networks: A Data-Centric Approach

Noveen Sachdeva, Mehak Preet Dhaliwal, Carole-Jean Wu, Julian McAuley. [code: Infinite AE; Data-Distill]

[RecSys-2022] Towards Fair Federated Recommendation Learning: Characterizing the Inter-Dependence of System and Data Heterogeneity

Kiwan Maeng, Haiyu Lu, Luca Melis, John Nguyen, Mike Rabbat, Carole-Jean Wu. [code]

Best Paper Award Finalist

[ISCA-2022] ACT: Designing Sustainable Computer Systems with an Architectural Carbon Modeling Tool

Udit Gupta, Mariam Elgamal, Gage Hills, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks, Carole-Jean Wu. [code]

IEEE Micro Top Picks

[ISCA-2022] Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training

M. Zhao, N. Agarwal, A. Basant, B. Gedik, S. Pan, M. Ozdal, R. Komuravelli, J. Pan, T. Bao, H. Lu, S. Narayanan, J. Langman, K. Wilfong, H. Rastogi, C.-J. Wu, C. Kozyrakis, P. Pol.

[MLSys-2022] Sustainable AI: Environmental Implications, Challenges and Opportunities

C.-J. Wu, R. Raghavendra, U. Gupta, B. Acun, N. Ardalani, K. Maeng, F. A. Behram, J. Huang, C. Bai, M. Gschwind, A. Gupta, M. Ott, A. Melnikov, S. Candido, D. Brooks, G. Chauhan, B. Lee, H.-S. S. Lee, B. Akyildiz, M. Balandat, J. Spisak, R. Jain, M. Rabbat, K. Hazelwood.

[MLSys-2022] Papaya: Practical, Private, and Scalable Federated Learning

D. Huba, J. Nguyen, K. Malik, R. Zhu, M. Rabbat, A. Yousefpour, C.-J. Wu, G. Zhan, P. Ustinov, H. Srinivas, K. Wang, A. Shoumikhin, J. Min, M. Malek.

[ASPLOS-2022] RecShard: Statistical Feature-Based Memory Optimization for Industry-Scale Neural Recommendation

Geet Sethi, Bilge Acun, Niket Agarwal, Christos Kozyrakis, Caroline Trippel, Carole-Jean Wu.

[WSDM-2022] On Sampling Collaborative Filtering Datasets

Noveen Sachdeva, Carole-Jean Wu, and Julian McAuley. [code]

[MICRO-2021] AutoFL: Enabling Heterogeneity-Aware Energy Efficient Federated Learning

Young Geun Kim and Carole-Jean Wu.

[MICRO-2021] RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance

U. Gupta, S. Hsia, J. Zhang, M. Wilkening, J. Pombra, H.-S. Lee, G. Wei, C.-J. Wu, and D. Brooks.

[MLSys-2021] TT-Rec: Tensor Train Compression for Deep Learning Recommendation Model Embeddings

Chunxing Yin, Bilge Acun, Xing Liu, and Carole-Jean Wu. [code]

MLSys Outstanding Paper Award

[MLSys-2021] CPR: Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery

K. Maeng, S. Bharuka, I. Gao, M. Jeffrey, V. Saraph, B.-Y. Su, C. Trippel, J. Yang, M. Rabbat, B. Lucia, and C.-J. Wu.

[ASPLOS-2021] RecSSD: Near Data Processing for Solid State Drive Based Recommendation Inference

M. Wilkening, U. Gupta, S. Hsia, C. Trippel, C.-J. Wu, D. Brooks, G.-Y. Wei.

[HPCA-2021] Chasing Carbon: The Elusive Environmental Footprint of Computing

U. Gupta, Y. Kim, S. Lee, J. Tse, H.-H. Lee, G. Wei, D. Brooks, and C.-J. Wu.

IEEE Micro Top Picks

[HPCA-2021] Understanding Training Efficiency of Deep Learning Recommendation Models at Scale

Bilge Acun, Matthew Murphy, Xiaodong Wang, Jade Nie, Carole-Jean Wu, and Kim Hazelwood.

[MICRO-2020] AutoScale: Energy Efficiency Optimization for Stochastic Edge Inference Using Reinforcement Learning

Young Geun Kim and Carole-Jean Wu.

[ISCA-2020] DeepRecSys: A System for Optimizing End-to-end At-scale Neural Recommendation Inference

U. Gupta, S. Hsia, V. Saraph, X. Wang, B. Reagen, G.-Y. Wei, H.-S. Lee, D. Brooks, and C.-J. Wu. [code]

IEEE Micro Top Picks Honorable Mention

[ISCA-2020] RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

L. Ke, U. Gupta, B. Cho, D. Brooks, V. Chandra, U. Diril, A. Firoozshahian, K. Hazelwood, B. Jia, H.-S. Lee, M. Li, B. Maher, D. Mudigere, M. Naumov, M. Schatz, M. Smelyanskiy, X. Wang, B. Reagen, C.-J. Wu, M. Hempstead, X. Zhang.

[ISCA-2020] MLPerf Inference Benchmark

V. Reddi, C. Cheng, D. Kanter, P. Mattson, G. Schmuelling, C.-J. Wu, B. Anderson, M. Breughe, M. Charlebois, W. Chou, R. Chukka, C. Coleman, S. Davis, P. Deng, G. Diamos, J. Duke, D. Fick, J. Gardner, I. Hubara, S. Idgunji, T. Jablin, J. Jiao, T. St. John, P. Kanwar, D. Lee, J. Liao, A. Lokhmotov, F. Massa, P. Meng, P. Micikevicius, C. Osborne, G. Pekhimenko, A. Rajan, D. Sequeira, A. Sirasao, F. Sun, H. Tang, M. Thomson, F. Wei, E. Wu, L. Xu, K. Yamada, B. Yu, G. Yuan, A. Zhong, P. Zhang, Y. Zhou. [code]

IEEE Micro Top Picks -- The Vision Behind MLPerf: Understanding AI Inference Performance

[MLSys-2020] MLPerf Training Benchmark

P. Mattson, C. Cheng, C. Coleman, G. Diamos, P. Micikevicius, D. Patterson, H. Tang, G.-Y. Wei, P. Ballis, V. Bittorf, D. Brooks, D. Chen, D. Dutta, U. Gupta, K. Hazelwood, A. Hock, X. Huang, B. Jia, D. Kang, N. Kumar, J. Liao, G. Ma, D. Narayanan, T. Oguntebi, G. Pekhimenko, L. Pentecost, V. Reddi, T. Robie, T. St. John, C.-J. Wu, L. Xu, C. Young, M. Zaharia. [code]

[HPCA-2020] The Architectural Implications of Facebook’s DNN-based Personalized Recommendation

U. Gupta, C.-J. Wu, X. Wang, M. Naumov, B. Reagen, D. Brooks, B. Cottel, K. Hazelwood, M. Hempstead, B. Jia, H.-H. Lee, A. Malevich, D. Mudigere, M. Smelyanskiy, L. Xiong, X. Zhang.

[HPCA-2019] Machine Learning at Facebook: Understanding Inference at the Edge

Carole-Jean Wu, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan, Kim Hazelwood, Eldad Isaac, Yangqing Jia, Bill Jia, Tommer Leyvand, Hao Lu, Yang Lu, Lin Qiao, Brandon Reagen, Joe Spisak, Fei Sun, Andrew Tulloch, Peter Vajda, Xiaodong Wang, Yanghan Wang, Bram Wasti, Yiming Wu, Ran Xian, Sungjoo Yoo, Peizhao Zhang.

[HPCA-2019] Understanding the Future of Energy Efficiency in Multi-Module GPUs

Akhil Arunkumar, Evgeny Bolotin, David Nellans, and Carole-Jean Wu.

[HPCA-2018] LATTE-CC: Latency Tolerance Aware Adaptive Cache Compression Management for Energy Efficient GPUs

Akhil Arunkumar, Shin-Ying Lee, Vignesh Soundararajan, and Carole-Jean Wu. [paper]

[ISCA-2017] MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability

Akhil Arunkumar, Evgeny Bolotin, Benjamin Cho, Ugljesa Milic, Eiman Ebrahimi, Oreste Villa, Aamer Jaleel, Carole-Jean Wu, and David Nellans. [paper]

[HPCA-2016] Improving Smartphone/Mobile User Experience by Balancing Performance and Energy with Probabilistic QoS Guarantee

Benjamin Gaudette, Carole-Jean Wu, and Sarma Vrudhula. [paper]

[ISCA-2015] CAWA: Coordinated Warp Scheduling and Cache Prioritization for Critical Warp Acceleration for GPGPU Workloads

Shin-Ying Lee, Akhil Arunkumar, and Carole-Jean Wu. [paper]

[PACT-2014] CAWS: Criticality-Aware Warp Scheduling for GPGPU Workloads

Shin-Ying Lee and Carole-Jean Wu. [paper]

[MICRO-2011] PACMan: Prefetch-Aware Cache Management for High Performance Caching

Carole-Jean Wu, Aamer Jaleel, Will Hasenplaugh, Margaret Martonosi, Simon Steely Jr., and Joel Emer. [paper]

[MICRO-2011] SHiP: Signature-Based Hit Predictor for High Performance Caching

Carole-Jean Wu, Aamer Jaleel, Margaret Martonosi, Simon Steely Jr., and Joel Emer. [paper]

Industry Initiatives and Open Source Software

MLCommons/MLPerf/Croissant

Fair and useful benchmarks for measuring and improving the accuracy, safety, speed, and efficiency of AI technologies.

CVPR-LPCV [Slide Deck][Talk]

Embedded Vision Summit [Slide Deck][Talk]

ACT: Architectural Carbon Modeling Tool

Designing low-carbon computers with an architectural carbon modeling tool (Tech @ Meta article)

Carbon Explorer: Designing Sustainable Datacenter Computing

CLEAR: Computing Landscapes for Environmental Accountability and Responsibility

PERSONAL: Personalized Recommendation Systems and Algorithms

DORA: Optimizing Smartphone Energy Efficiency and Web Browser Performance under Interference

MobileBench: Performance, Energy Characterizations and Architectural Implications of an Emerging Mobile Platform Benchmark Suite

Mentorship

Research Interns and Post-Doctoral Researchers at FAIR

Michael Kuchnik (2023 — 2024 post-doctoral researcher; now at Meta FAIR)
Yejin Lee (2023 — 2024 post-doctoral researcher; now at Meta Infrastructure)
Alicia Golden (2023; PhD researcher, Harvard University)
Mariam Elgamal (2022; PhD researcher, Harvard University)
Mark Zhao (2021 — 2022; PhD researcher, Stanford University)
Samuel Hsia (2021 — 2024; PhD researcher, Harvard University; now at Meta FAIR)
Geet Sethi (2021 PhD researcher, Stanford University; now at Meta Infrastructure)
Kiwan Maeng (2020 PhD researcher, CMU; 2021-22 post-doctoral researcher; now Assistant Professor at the Pennsylvania State University)
Chunxing Yin [with Bilge Acun] (2020 PhD researcher, Georgia Tech; now at Facebook/Meta)
Mike Lui (2019-20 PhD researcher, Drexel University; now at Facebook/Meta)
Emma Yu Wang [with Xiaodong Wang] (2019 PhD researcher, Harvard University; now at Google Research)
Udit Gupta (2018 — 2022 PhD researcher, Harvard University; 2022-23 post-doctoral researcher; now Assistant Professor at Cornell Tech.)

Undergraduate/MS/PhD Advisees and Post-Doctoral Researchers at ASU

Jhe-Yu Liou (PhD 2023; co-advised with Prof. Stephanie Forrest)

Thesis: Automatic Program Optimization by Semantic Relaxation for Parallel Processing Accelerators

Young-Geun Kim (Post-doctoral Researcher 2019 — 2020) [First employment: Soongsil University; now Assistant Professor at Korea University]
Akhil Arunkumar (PhD 2018) [First employment: Samsung Austin R&D Center; now at AMD]

Thesis: Memory Subsystem Optimization Techniques for Modern High-Performance General-Purpose Processors

Viraj Wadhwa (High school intern from BASIS Chandler Primary, 2017-18. Now an undergraduate student at UT-Austin) [Improving Image Recognition with Tensor Flow API for Autonomous Driving]
Shin-Ying Lee (PhD 2017) [First employment: Samsung Austin R&D Center; AMD; now at Amazon]

Thesis: Intelligent Scheduling and Memory Management Techniques for Modern GPU Architectures

Outstanding Computer Engineering PhD Graduate Student Award

TJ Smith (Research Experience for Undergraduates (REU) from Princeton EE; 2017)
Katherine Hann (High school intern from Xavier College Preparatory High School, 2017. Now an undergraduate student at University of Pennsylvania) [Designing A Paired Robotic Car Indoor Navigation and Tracking System]
Rashmi Athavale (High school intern from Hamilton High School, 2017. Now an undergraduate student at Georgia Tech) [Designing A Paired Robotic Car Indoor Navigation and Tracking System]
Benjamin Gaudette (PhD 2017; co-advised with Prof. Sarma Vrudhula) [First employment: Benchmark Electronics; now at Intel]

Thesis: An Intelligent Framework for Energy-aware Mobile Computing Subject to Stochastic System Dynamics

Ying-Ju Yu (Post-doctoral Researcher, 2016-17) [First employment: Intel]

Kody Stribrny (BS 2017; co-advised with Prof. Sarma Vrudhula) [First employment: Amazon] [Honors Thesis: Mobile Waterway Monitor]
Davesh Shingari (MS 2016) [First employment: Marvell]

Thesis: Memory Interference Characterization and Mitigation for Heterogeneous Smartphones

Soochan Lee (PhD 2015; co-advised with Prof. Patrick E. Phelan) [First employment: LG Electronics]

Thesis: A Study of Latent Heat of Vaporization in Aqueous Nanofluids

Ryan Brazones (BS 2014) [First employment: Intel]
Dhinakaran Pandiyan (MS 2014) [First employment: Intel]

Thesis: Data Movement Energy Characterization of Emerging Smartphone Workloads for Mobile Platforms

Outstanding Computer Engineering MS Graduate Student Award

Amrit Panda (PhD 2014; co-advised with Prof. Karam S. Chatha) [First employment: Qualcomm Research; now at Microsoft]

Thesis: StreamWorks: An Energy-efficient Embedded Co-processor for Stream Computing
