[Preprint 2025] FG-Attn: Leveraging Fine-Grained Sparsity In Diffusion Transformers
Sankeerth Durvasula, Kavya Sreedhar, Zain Moustafa, Suraj Kothawade, Ashish Gondimalla, Suvinay Subramanian, Narges Shahidi, Nandita Vijaykumar
arXiv e-Print archive, September 2025
[ISCA MLArchSys 2025] Leveraging LLMs to Improve Hardware-Software Co-Design Workflow Productivity and Accessibility
Kavya Sreedhar, Josh Ogbonda, Pengqi Yin, Narges Shahidi, Kanthi Nagaraj, Zhijie Deng, Rami Cohen, Ton Kalker, Sameer Kumar, Amir Yazdanbakhsh, Suvinay Subramanian
ML for Computer Architecture and Systems (MLArchSys) Workshop co-located with the International Symposium on Computer Architecture (ISCA), June 2025
[NeurIPS MLNCP 2024] Enabling On-Device Large Language Models with 3D-Stacked Memory
Lita Yang, Kavya Sreedhar, Huichu Liu, Edith Beigne
Machine Learning with New Compute Paradigms (MLNCP) Workshop co-located with the Conference on Neural Information Processing Systems (NeurIPS), December 2024
[ESSERC 2024] A 3.25GHz Large-Integer Extended GCD Accelerator in 12nm
Kavya Sreedhar, Gedeon Nyengele, Mark Horowitz, Christopher Torng
European Solid-State Electronics Research Conference (ESSERC), September 2024
[ISCA OSCAR 2024] AHA: An Open-Source Framework for Co-design of Programmable Accelerators and Compilers
Kalhan Koul, Jackson Melchert, Keyi Zhang, Taeyoung Kong, Maxwell Strange, Olivia Hsu, Qiaoyi Liu, Jeff Setter, Ross Daly, Caleb Donovick, Alex Carsello, Leonard Truong, Po-Han Chen, Yuchen Mei, Zhouhua Xie, Kathleen Feng, Gedeon Nyengele, Dillon Huff, Kavya Sreedhar, Huifeng Ke, Ankita Nayak, Rajsekhar Setaluri, Stephen Richardson, Christopher Torng, Pat Hanrahan, Clark Barrett, Mark Horowitz, Fredrik Kjolstad, and Priyanka Raina
Open-Source Computer Architecture Research (OSCAR) Workshop co-located with the International Symposium on Computer Architecture (ISCA), June 2024
[ISPASS 2024] Vision Transformer Computation and Resilience for Dynamic Inference
Kavya Sreedhar, Jason Clemons, Rangharajan Venkatesan, Stephen W. Keckler, and Mark Horowitz
International Symposium on Performance Analysis of Systems and Software (ISPASS), May 2024
[ASPLOS LATTE 2024] Lake: An Agile Framework for Designing and Automatically Configuring Physical Unified Buffers
Maxwell Strange, Kavya Sreedhar, and Mark Horowitz
Languages, Tools, and Techniques for Accelerator Design (LATTE) Workshop co-located with the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2024
[ISSCC SRP 2024] A 3GHz Extended GCD Accelerator in 12nm
Kavya Sreedhar, Mark Horowitz, and Christopher Torng
International Solid-State Circuits Conference (ISSCC) Student Research Preview (SRP), February 2024
Poster | Lightning Talk Slides
[JSSC 2023] Amber: A 16nm System-on-Chip with a Coarse-Grained Reconfigurable Array for Flexible Acceleration of Dense Linear Algebra
Kathleen Feng, Taeyoung Kong, Kalhan Koul, Jackson Melchert, Alex Carsello, Qiaoyi Liu, Gedeon Nyengele, Maxwell Strange, Keyi Zhang, Ankita Nayak, Jeff Setter, Kavya Sreedhar, Po-Han Chen, Nikhil Bhagdikar, Zachary Myers, Brandon D'Agostino, Pranil Joshi, Stephen Richardson, Christopher Torng, Mark Horowitz, Priyanka Raina
IEEE Journal of Solid-State Circuits (JSSC), September 2023
[ISCA OSCAR 2023] A Fast Open-Source Extended GCD Accelerator
Kavya Sreedhar, Mark Horowitz, and Christopher Torng
Open-Source Computer Architecture Research (OSCAR) Workshop co-located with the International Symposium on Computer Architecture (ISCA), June 2023
[CHES 2022] [TCHES 2022] A Fast Large-Integer Extended GCD Algorithm and Hardware Design for Verifiable Delay Functions and Modular Inversion
Kavya Sreedhar, Mark Horowitz, and Christopher Torng
Conference on Cryptographic Hardware and Embedded Systems (CHES), September 2022
IACR Transactions on Cryptographic Hardware and Embedded Systems (TCHES), September 2022
Paper | Slides | Video | Code Artifact
[HotChips 2022] Amber: Coarse-Grained Reconfigurable Array-Based SoC for Dense Linear Algebra Acceleration
Kathleen Feng, Alex Carsello, Taeyoung Kong, Kalhan Koul, Qiaoyi Liu, Jackson Melchert, Gedeon Nyengele, Maxwell Strange, Keyi Zhang, Ankita Nayak, Jeff Setter, James Thomas, Kavya Sreedhar, Po-Han Chen, Nikhil Bhagdikar, Zachary Myers, Brandon D’Agostino, Pranil Joshi, Stephen Richardson, Rick Bahr, Christopher Torng, Mark Horowitz, Priyanka Raina
Hot Chips: A Symposium on High Performance Chips (Hot Chips), August 2022
[VLSI 2022] Amber: A 367 GOPS, 538 GOPS/W 16nm SoC with a Coarse-Grained Reconfigurable Array for Flexible Acceleration of Dense Linear Algebra
Alex Carsello, Kathleen Feng, Taeyoung Kong, Kalhan Koul, Qiaoyi Liu, Jackson Melchert, Gedeon Nyengele, Maxwell Strange, Keyi Zhang, Ankita Nayak, Jeff Setter, James Thomas, Kavya Sreedhar, Po-Han Chen, Nikhil Bhagdikar, Zachary Myers, Brandon D’Agostino, Pranil Joshi, Stephen Richardson, Rick Bahr, Christopher Torng, Mark Horowitz, Priyanka Raina
IEEE Symposium on VLSI Technology and Circuits (VLSI), June 2022
Best Demo Paper Award
[IAP 2022] A Fast Large-Integer Extended GCD Algorithm and Hardware Design for Verifiable Delay Functions and Modular Inversion
Kavya Sreedhar, Mark Horowitz, Christopher Torng
Industry-Academia Partnership (IAP) Berkeley/Stanford/UC Santa Cruz Cloud Workshop, May 2022
Best Poster Award
[TECS 2022] AHA: An Agile Approach to the Design of Coarse-Grained Reconfigurable Accelerators and Compilers
Kalhan Koul, Jackson Melchert, Kavya Sreedhar, Leonard Truong, Gedeon Nyengele, Keyi Zhang, Qiaoyi Liu, Jeff Setter, Po-Han Chen, Yuchen Mei, Maxwell Strange, Ross Daly, Caleb Donovick, Alex Carsello, Taeyoung Kong, Kathleen Feng, Dillon Huff, Ankita Nayak, Rajsekhar Setaluri, James Thomas, Nikhil Bhagdikar, David Durst, Zachary Myers, Nestan Tsiskaridze, Stephen Richardson, Rick Bahr, Kayvon Fatahalian, Pat Hanrahan, Clark Barrett, Mark Horowitz, Christopher Torng, Fredrik Kjolstad, Priyanka Raina
ACM Transactions on Embedded Computing Systems (TECS), April 2022
[GOMACTech 2022] An Agile Approach to the Design of Hardware Accelerators and Adaptable Compilers
Ross Daly, Jackson Melchert, Kalhan Koul, Raj Setaluri, Rick Bahr, Clark Barrett, Nikhil Bhagdikar, Alex Carsello, Caleb Donovick, David Durst, Kayvon Fatahalian, Kathleen Feng, Pat Hanrahan, Teguh Hofstee, Mark Horowitz, Dillon Huff, Fredrik Kjolstad, Taeyoung Kong, Qiaoyi Liu, Makai Mann, Ankita Nayak, Aina Niemetz, Gedeon Nyengele, Stephen Richardson, Jeff Setter, Kavya Sreedhar, Maxwell Strange, James Thomas, Christopher Torng, Leonard Truong, Nestan Tsiskaridze, Keyi Zhang
Government Microcircuit and Applications Conference (GOMACTech), March 2022
[FMCAD 2021] Automating System Configuration
Nestan Tsiskaridze, Maxwell Strange, Makai Mann, Kavya Sreedhar, Qiaoyi Liu, Mark Horowitz, and Clark Barrett
Formal Methods in Computer-Aided Design (FMCAD), October 2021
[Preprint 2021] Compiling Halide Programs to Push-Memory Accelerators
Qiaoyi Liu, Dillon Huff, Jeff Setter, Max Strange, Kathleen Feng, Kavya Sreedhar, Ziheng Wang, Keyi Zhang, Mark Horowitz, Priyanka Raina, Fredrik Kjolstad
arXiv e-Print archive, May 2021
[DAC 2020] Creating an Agile Hardware Design Flow
Rick Bahr, Clark Barrett, Nikhil Bhagdikar, Alex Carsello, Ross Daly, Caleb Donovick, David Durst, Kayvon Fatahalian, Kathleen Feng, Pat Hanrahan, Teguh Hofstee, Mark Horowitz, Dillon Huff, Fredrik Kjolstad, Taeyoung Kong, Qiaoyi Liu, Makai Mann, Jackson Melchert, Ankita Nayak, Aina Niemetz, Gedeon Nyengele, Priyanka Raina, Stephen Richardson, Raj Setaluri, Jeff Setter, Kavya Sreedhar, Maxwell Strange, James Thomas, Christopher Torng, Leonard Truong, Nestan Tsiskaridze, Keyi Zhang
ACM/IEEE Design Automation Conference (DAC), July 2020
Invited Paper
A Fast Large-Integer XGCD Accelerator
Kavya Sreedhar
Stanford University PhD Thesis, March 2025
Designing a fast large-integer XGCD algorithm and accelerator for cryptography applications
Kavya Sreedhar
Stanford University PhD Defense, August 2024
Next Generation Fast Shutter System for LIGO
Kavya Sreedhar
Caltech Senior Thesis, June 2019
Techniques for Balancing Dynamic Inference by Machine Learning Models
Jason Clemons, Kavya Sreedhar
Patent pending: filed January 2023
Augmenting and Dynamically Configuring a Neural Network Model for Real-Time Systems
Jason Clemons, Kavya Sreedhar, Stephen W. Keckler
Patent pending: filed April 2022
High-reliability ultra-fast mechanical shutter
Richard Abbott, Peter Fritschel, Kavya Sreedhar
US Patent No. 11467395
Granted October 2022
TPU Hardware/Software Co-Design at Google
Kavya Sreedhar
European Solid-State Electronics Research Conference (ESSERC), September 2025
Physical Design and LLM-integrated Co-design
Kavya Sreedhar
Quantum+Chips @ University of Minnesota, August 2025
Attention isn't all you need?
Kavya Sreedhar
Google Research Tech Talk, June 2024
Analyzing machine learning models for computer vision tasks for use in real-time systems
Kavya Sreedhar, Jason Clemons, Rangharajan Venkatesan, Stephen W. Keckler, and Mark Horowitz
Quad Fellowship Spring Symposium, May 2024
A Fast Large-Integer Extended GCD Algorithm and Hardware Design for Verifiable Delay Functions and Modular Inversion
Kavya Sreedhar, Mark Horowitz, and Christopher Torng
Silicon Salon 3: Investigating New Silicon-based Cryptographic Functionality, January 2023
A Fast Large-Integer Extended GCD Algorithm and Hardware Design for Verifiable Delay Functions and Modular Inversion
Kavya Sreedhar, Mark Horowitz, and Christopher Torng
Agile Hardware (AHA) Project Monthly Industrial Affiliates Meeting, May 2022
A Fast Large-Integer Extended GCD Algorithm and Accelerator
Kavya Sreedhar, Mark Horowitz, Christopher Torng
SystemX Fall Conference, November 2023
A Fast Large-Integer Extended GCD Algorithm and Accelerator
Kavya Sreedhar, Mark Horowitz, Christopher Torng
Agile Hardware (AHA) Retreat, August 2023
Accelerating Vision Transformer Applications
Kavya Sreedhar, Jason Clemons, Rangharajan Venkatesan, Stephen W. Keckler, and Mark Horowitz
Stanford Data Science Conference, May 2023
Accelerating Dynamic Real-Time Inference for Vision Transformers
Kavya Sreedhar, Jason Clemons, Mark Horowitz, Stephen W. Keckler
Agile Hardware (AHA) Retreat, August 2022
Fast Extended GCD for Large Integers for Verifiable Delay Functions
Kavya Sreedhar, Mark Horowitz, Christopher Torng
Agile Hardware (AHA) Retreat, August 2021