Power: A First-Class Architectural Design Constraint Trevor Mudge
In-Datacenter Performance Analysis of a Tensor Processing Unit Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudeva, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon
Read through the design portions of the PyRTL documentation
Download, run, and know how to modify Examples 0 through 3 from the PyRTL
PyRTL design and prototyping
Dataflow Process Networks Edward Lee and Thomas Parks
PICO: automatically designing custom computers V. Kathail, S. Aditya, R. Schreiber, B. Ramakrishna Rau, D.C. Cronquist, M. Sivaraman
The Semantics of a Simple Language for Parallel Programming Gilles Kahn
The Warp Computer: Architecture, Implementation, and Performance Marco Annaratone, Emmanuel Arnould, Thomas Gross, H. T. Kung, Monica Lam, Onat Menzilcioglu, and Jon A. Webb
Q100: The Architecture and Design of a Database Processing Unit Lisa Wu, Andrea Lottarini, Timothy K. Paine, Martha A. Kim, Kenneth A. Ross
LogCA: A Performance Model for Hardware Accelerators Muhammad Shoaib Bin Altaf and David A. Wood
Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures Yakun Sophia Shao, Brandon Reagen, Gu-Yeon Wei, and David Brooks
Amdahl’s Law in the Multicore Era Mark Hill and Michael Marty
Gables: A Roofline Model for Mobile SoCs Mark Hill and Vijay Janapa Reddi
Accelerometer: Understanding Acceleration Opportunities for Data Center Overheads at Hyperscale Akshitha Sriraman and Abhishek Dhanotia
Continued discussion of Accelerometer
Parallel Programming for FPGAs Ryan Kastner, Janarbek Matai, and Stephen Neuendorffer (Chapter 1, pp. 11-30)
Fun with Semirings: A functional pearl on the abuse of linear algebra Stephen Dolan
Design of the GraphBLAS API for C Aydın Buluç, Tim Mattson, Scott McMillan, José Moreira, and Carl Yang
Genesis: A Hardware Acceleration Framework for Genomic Data Analysis Tae Jun Ham, David Bruns-Smith, Brendan Sweeney, Yejin Lee, Seong Hoon Seo, U Gyeong Song, Young H. Oh, Krste Asanovic, Jae W. Lee, Lisa Wu Wills
A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services Andrew Putnam, Adrian Caulfield, Eric Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, Eric Peterson, Aaron Smith, Jason Thong, Phillip Yi Xiao, Doug Burger, Jim Larus, Gopi Prashanth Gopal, and Simon Pope
Genesis: A Hardware Acceleration Framework for Genomic Data Analysis (continued)
A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services (continued)
Functional Collection Programming with Semi-ring Dictionaries
Vectorization for Digital Signal Processors via Equality Saturation Alexa VanHattum, Rachit Nigam, Vincent T. Lee, James Bornholt, Adrian Sampson
A Compiler Infrastructure for Accelerator Generators Rachit Nigam, Samuel Thomas, Zhijing Li, Adrian Sampson
BYOC: A "Bring Your Own Core" Framework for Heterogeneous-ISA Research Jonathan Balkind, Katie Lim, Michael Schaffner, Fei Gao, Grigory Chirkov, Ang Li, Alexey Lavrov, Tri M. Nguyen, Yaosheng Fu, Florian Zaruba, Kunal Gulati, Luca Benini, and David Wentzlaff
Crossing Guard: Mediating Host-Accelerator Coherence Interactions Lena E. Olson, Mark D. Hill, and David A. Wood
Ten Lessons From Three Generations Shaped Google's TPUv4i Norman P. Jouppi, Doe Hyun Yoon, Matthew Ashcraft, Mark Gottscho, Thomas B. Jablin, George Kurian, James Laudon, Sheng Li, Peter Ma, Xiaoyu Ma, Nishant Patil, Sushma Prasad, Clifford Young, Zongwei Zhou, David Patterson
A Survey of Machine Learning for Computer Architecture and Systems Nan Wu, Yuan Xie (for reference)
Plasticine: A Reconfigurable Architecture For Parallel Patterns Raghu Prabhakar, Yaqi Zhang, David Koeplinger, Matt Feldman, Tian Zhao, Stefan Hadjis, Ardavan Pedram, Christos Kozyrakis, Kunle Olukotun
Holiday
Final Project Presentations and Discussion