The CS293E shared Google Drive holds all lecture slides and homework assignments. Access has been granted to your @ucsb.edu account. If you are attending class but are not enrolled, please ask for access after class.
Languages, Tools, and Techniques for Accelerator Design
An Academic’s Attempt to Clear the Fog of the Machine Learning Accelerator War
Cheat Sheet for all things about Machine Learning (as it relates to accelerators)
PyRTL provides a collection of classes for Pythonic register-transfer level design, simulation, tracing, and testing suitable for teaching and research. Simplicity, usability, clarity, and extensibility, rather than performance or optimization, are the overarching goals.
Amaranth (previously nMigen) is another Python hardware project. It provides an open-source toolchain that is particularly well suited to working with FPGAs, with support for evaluation board definitions, a System-on-Chip toolkit, and more. It would be neat to see the power of these tools combined in some way.
Chisel is an elaborate-through-execution hardware design language. With support for signed types, named hierarchies of wires useful for hardware protocols, and a neat control structure called "when" that inspired our conditional contexts, Chisel is a powerful tool used in some great research projects including RISC-V.
SpinalHDL is a different approach to HDL in Scala and is very much aligned with the way PyRTL is built (the two were invented independently, so it is neat to see the convergent evolution, which, I think, points to something deeper about hardware design). It has a lot of support and really well-thought-out structures.
MyHDL is another neat Python hardware project built around generators and decorators. The semantics of this embedded language are close to Verilog, and unlike PyRTL, MyHDL allows asynchronous logic and higher-level modeling.
PyMTL3 (a.k.a. Mamba) is an "open-source Python-based hardware generation, simulation, and verification framework with multi-level hardware modeling support". One of the neat things about this project is that it aims to allow simulation, modeling, and verification at multiple levels of the design, from the functional level through the cycle-close level and down to the register-transfer level (where PyRTL really is built to play).
CλaSH is a hardware description embedded DSL in Haskell.
Yosys is an open source tool for Verilog RTL synthesis. It supports a huge subset of the Verilog-2005 semantics and provides a basic set of synthesis algorithms.
CoSA is a scheduling tool for spatial accelerators that works via constrained optimization.
Aladdin is a pre-RTL power and performance estimation tool for fixed-function accelerators.
MachSuite is a benchmark suite intended for use with accelerator-design research evaluation.
Pint A Python library for adding units to variables (helpful when doing computation with physical units like watts or micrometers)
TLA+ Tools, Video Tutorials, and papers
MiniSat SAT Solver
CVC4 SMT Solver and Theorem Prover
Z3 SMT Solver and Theorem Prover
Z3 - guide tutorial
Alloy Language and tool for Relational Models
Cryptol Domain specific language for specifying crypto algorithms
Coq Formal proof management system
Kami Coq framework for bluespec-style hardware
Lean Theorem Prover
Dafny Verification-aware programming language
Whiley Language with extended static checking
Design Methods (e.g. HW/SW co-design and new languages)
Compositional Dataflow Circuits Stephen Edwards, Richard Townsend, Martha Barker, and Martha A. Kim.
Green Droid Using Dark Silicon to Improve Smartphone Processors
Vectorization for Digital Signal Processors via Equality Saturation. Alexa VanHattum, Rachit Nigam, Vincent T. Lee, James Bornholt, and Adrian Sampson.
A Compiler Infrastructure for Accelerator Generators. Samuel Thomas, Rachit Nigam, Zhijing Li, and Adrian Sampson.
PDL: A High-Level Hardware Design Language for Pipelined Processors Drew Zagieboylo, Charles Sherk, G. Edward Suh, and Andrew C. Myers.
Composable Building Blocks to Open up Processor Design Sizhuo Zhang, Andrew Wright, Thomas Bourgeat, and Arvind
CoSA: Scheduling by Constrained Optimization for Spatial Accelerators Qijing Huang, Minwoo Kang, Grace Dinh, Thomas Norell, Aravind Kalaiah, James Demmel, John Wawrzynek, and Yakun Sophia Shao
Instruction-Level Abstraction (ILA): A Uniform Specification for System-on-Chip (SoC) Verification Bo-Yuan Huang, Hongce Zhang, Pramod Subramanyan, Yakir Vizel, Aarti Gupta, and Sharad Malik
Commercial Systems
Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads Dennis Abts, Jonathan Ross, Jonathan Sparling, Mark Wong-VanHaren, Max Baker, Tom Hawkins, Andrew Bell, John Thompson, Temesghen Kahsai, Garrin Kimmell, Jennifer Hwang, Rebekah Leslie-Hurd, Michael Bye, E.R. Creswick, Matthew Boyd, Mahitha Venigalla, Evan Laforge, Jon Purdy, Purushotham Kamath, Dinesh Maheshwari, Michael Beidler, Geert Rosseel, Omar Ahmad, Gleb Gagarin, Richard Czekalski, Ashay Rane, Sahil Parmar, Jeff Werner, Jim Sproch, Adrian Macias, and Brian Kurtz.
RaPiD: AI Accelerator for Ultra-Low Precision Training and Inference Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Wang, Sanchari Sen, Jintao Zhang, Ankur Agrawal, Monodeep Kar, Shubham Jain, Alberto Mannari, Hoang Tran, Yulong Li, Eri Ogawa, Kazuaki Ishizaki, Hiroshi Inoue, Marcel Schaal, Mauricio Serrano, Jungwook Choi, Xiao Sun, Naigang Wang, Chia-Yu Chen, Allison Allain, James Bonano, Nianzheng Cao, Robert Casatuta, Matthew Cohen, Bruce Fleischer, Michael Guillorn, Howard Haynie, Jinwook Jung, Mingu Kang, Kyu-hyoun Kim, Siyu Koswatta, Saekyu Lee, Martin Lutz, Silvia Mueller, Jinwook Oh, Ashish Ranjan, Zhibin Ren, Scot Rider, Kerstin Schelm, Michael Scheuermann, Joel Silberman, Jie Yang, Vidhi Zalani, Xin Zhang, Ching Zhou, Matt Ziegler, Vinay Shah, Moriyoshi Ohara, Pong-Fei Lu, Brian Curran, Sunil Shukla, Leland Chang, Kailash Gopalakrishnan
Ten Lessons From Three Generations Shaped Google's TPUv4i Norman P. Jouppi, Doe Hyun Yoon, Matthew Ashcraft, Mark Gottscho, Thomas B. Jablin, George Kurian, James Laudon, Sheng Li, Peter Ma, Xiaoyu Ma, Nishant Patil, Sushma Prasad, Clifford Young, Zongwei Zhou, David Patterson
Machine Learning and Tensors
Plasticine: A Reconfigurable Architecture For Parallel Patterns Raghu Prabhakar, Yaqi Zhang, David Koeplinger, Matt Feldman, Tian Zhao, Stefan Hadjis, Ardavan Pedram, Christos Kozyrakis, Kunle Olukotun
Tangram: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators Mingyu Gao, Xuan Yang, Jing Pu, Mark Horowitz, and Christos Kozyrakis
A Survey of Machine Learning for Computer Architecture and Systems Nan Wu, Yuan Xie
SIMD²: A Generalized Matrix Instruction Set for Accelerating Tensor Computation beyond GEMM Yunan Zhang, Po-An Tsai, Hung-Wei Tseng
Management and Security of Accelerators (e.g. interfaces, distributed systems)
Crossing Guard: Mediating Host-Accelerator Coherence Interactions Lena Olson, Mark Hill, and David Wood
System Design using Kahn Process Networks: The Compaan/Laura Approach Todor Stefanov, Claudiu Zissulescu, Alexandru Turjan, Bart Kienhuis, and Ed Deprettere
Reticle: A Virtual Machine for Programming Modern FPGAs. Luis Vega, Joseph McMahan, Adrian Sampson, Dan Grossman, and Luis Ceze. In PLDI 2021.
MgX: Near-Zero Overhead Memory Protection with an Application to Secure DNN Acceleration Weizhe Hua, Muhammad Umar, Zhiru Zhang, G. Edward Suh
BTS: An Accelerator for Bootstrappable Fully Homomorphic Encryption Sangpyo Kim, Jongmin Kim, Michael Jaemin Kim, Wonkyung Jung, Minsoo Rhu, John Kim, Jung Ho Ahn
Workloads (e.g. Applications, IOT, AR/VR, graph, etc.)
Understanding sources of inefficiency in general-purpose chips Rehan Hameed, Wajahat Qadeer, Megan Wachs, Omid Azizi, Alex Solomatnikov, Benjamin C. Lee, Stephen Richardson, Christos Kozyrakis, and Mark Horowitz
EyeCoD: Eye Tracking System Acceleration via FlatCam-based Algorithm & Accelerator Co-Design Haoran You, Cheng Wan, Yang Zhao, Zhongzhi Yu, Yonggan Fu, Jiayi Yuan, Shang Wu, Shunyao Zhang, Yongan Zhang, Chaojian Li, Vivek Boominathan, Ashok Veeraraghavan, Ziyun Li, Yingyan Lin
NDMiner: Accelerating Graph Pattern Mining Using Near Data Processing Nishil Talati, Haojie Ye, Yichen Yang, Leul Wuletaw Belayneh, Kuan-Yu Chen, David Blaauw, Trevor Mudge, Ronald Dreslinski
PipeZK: Accelerating Zero-Knowledge Proof with a Pipelined Architecture Ye Zhang, Shuo Wang, Xian Zhang, Jiangbin Dong, Xingzhong Mao, Fan Long, Cong Wang, Dong Zhou, Mingyu Gao, and Guangyu Sun
PolyGraph: Exposing the Value of Flexibility for Graph Processing Accelerators Vidushi Dadu, Sihao Liu, Tony Nowatzki
A RISC-V In-Network Accelerator for Flexible High-Performance Low-Power Packet Processing Salvatore Di Girolamo, Andreas Kurth, Alexandru Calotoiu, Thomas Benz, Timo Schneider, Jakub Beránek, Luca Benini, and Torsten Hoefler
Warehouse-Scale Video Acceleration: Co-design and Deployment in the Wild Parthasarathy Ranganathan, Daniel Stodolsky, Jeff Calow, Jeremy Dorfman, Marisabel Guevara, Clinton Wills Smullen IV, Aki Kuusela, Raghu Balasubramanian, Sandeep Bhatia, Prakash Chauhan, Anna Cheung, In Suk Chong, Niranjani Dasharathi, Jia Feng, Brian Fosco, Samuel Foss, Ben Gelb, Sarah J. Gwin, Yoshiaki Hase, Da-ke He, C. Richard Ho, Roy W. Huffman Jr., Elisha Indupalli, Indira Jayaram, Poonacha Kongetira, Cho Mon Kyaw, Aaron Laursen, Yuan Li, Fong Lou, Kyle A. Lucke, JP Maaninen, Ramon Macias, Maire Mahony, David Alexander Munday, Srikanth Muroor, Narayana Penukonda, Eric Perkins-Argueta, Devin Persaud, Alex Ramirez, Ville-Mikko Rautio, Yolanda Ripley, Amir Salek, Sathish Sekar, Sergey N. Sokolov, Rob Springer, Don Stark, Mercedes Tan, Mark S. Wachsler, Andrew C. Walton, David A. Wickeraad, Alvin Wijaya, and Hon Kwan Wu
Rhythmic Pixel Regions: Multi-resolution Visual Sensing System towards High-Precision Visual Computing at Low Power Venkatesh Kodukula, Alexander Shearer, Van Nguyen, Srinivas Lingutla, Yifei Liu, Robert LiKamWa.
Q-VR: System-Level Design for Future Mobile Collaborative Virtual Reality Chenhao Xie, Xie Li, Yang Hu, Huwan Peng, Michael Taylor, Shuaiwen Leon Song