Paper Readings
These are the required readings for each course topic. You must submit a weekly paper review for a subset of these papers; that subset is listed here.
Intro:
[1] A. J. Smith, "The Task of the Referee," IEEE Computer, 1990.
[2] M. D. Hill, S. Adve, L. Ceze, M. J. Irwin, D. Kaeli, M. Martonosi, J. Torrellas, T. F. Wenisch, D. Wood and K. Yelick, "21st Century Computer Architecture," CCC Whitepaper, 2012.
Multicores and Multiprogramming:
[3] E. Fatehi and P. V. Gratz, "ILP and TLP in Shared Memory Applications: A Limit Study," PACT, 2014.
[4] C. Bienia, S. Kumar, J. P. Singh and K. Li, "The PARSEC Benchmark Suite: Characterization and Architectural Implications," PACT, 2008.
[5] K. Kennedy and K. S. McKinley, "Optimizing for Parallelism and Data Locality," ICS, 1992.
Synchronization:
[6] M. L. Scott, "Shared-Memory Synchronization," Synthesis Lectures on Computer Architecture, Chapters 1, 4.0-4.3.3 and 5.0-5.2.5.
[7] R. Rajwar and J. R. Goodman, "Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution," MICRO, 2001.
Cache and Memory Hierarchy:
[8] D. J. Sorin, M. D. Hill and D. A. Wood, "A Primer on Memory Consistency and Cache Coherence," Synthesis Lectures on Computer Architecture, Chapter 2.
[9] J. Kim, M. Sullivan, E. Choukse and M. Erez, "Bit-Plane Compression: Transforming Data for Better Compression in Many-Core Architectures," ISCA, 2016.
[10] J. San Miguel, J. Albericio, N. Enright Jerger and A. Jaleel, "The Bunker Cache for Spatio-Value Approximation," MICRO, 2016.
[11] M. A. Ogleari, E. L. Miller and J. Zhao, "Steal but No Force: Efficient Hardware Undo+Redo Logging for Persistent Memory Systems," HPCA, 2018.
Coherence:
[8] D. J. Sorin, M. D. Hill and D. A. Wood, "A Primer on Memory Consistency and Cache Coherence," Synthesis Lectures on Computer Architecture, Chapters 6-8.
[12] P. V. Rengasamy, A. Sivasubramaniam, M. T. Kandemir and C. R. Das, "Exploiting Staleness for Approximating Loads on CMPs," PACT, 2015.
[13] G. Zhang, W. Horn and D. Sanchez, "Exploiting Commutativity to Reduce the Cost of Updates to Shared Data in Cache-Coherent Systems," MICRO, 2015.
Transactional Memory:
[14] T. Harris, J. Larus and R. Rajwar, "Transactional Memory, 2nd Edition," Synthesis Lectures on Computer Architecture, Chapters 1 and 5.
[15] K. E. Moore, J. Bobba, M. J. Moravan, M. D. Hill and D. A. Wood, "LogTM: Log-Based Transactional Memory," HPCA, 2006.
Consistency:
[8] D. J. Sorin, M. D. Hill and D. A. Wood, "A Primer on Memory Consistency and Cache Coherence," Synthesis Lectures on Computer Architecture, Chapters 3-5.
[16] S. V. Adve and K. Gharachorloo, "Shared Memory Consistency Models: A Tutorial," IEEE Computer, 1996.
[17] M. D. Hill, "Multiprocessors Should Support Simple Memory Consistency Models," IEEE Computer, 1998.
[18] B. Choi, R. Komuravelli, H. Sung, R. Smolinski, N. Honarmand, S. V. Adve, V. S. Adve, N. P. Carter and C.-T. Chou, "DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism," PACT, 2011.
Interconnects:
[19] N. Enright Jerger, T. Krishna and L.-S. Peh, "On-Chip Networks, Second Edition," Synthesis Lectures on Computer Architecture, Chapters 3-6.
[20] T. Moscibroda and O. Mutlu, "A Case for Bufferless Routing in On-Chip Networks," ISCA, 2009.
[21] J. Kim, J. Balfour and W. Dally, "Flattened Butterfly Topology for On-Chip Networks," MICRO, 2007.
[22] Z. Li, J. San Miguel and N. Enright Jerger, "The Runahead Network-On-Chip," HPCA, 2016.
[23] M. Parasar, H. Farrokhbakht, N. Enright Jerger, P. Gratz, T. Krishna and J. San Miguel, "DRAIN: Deadlock Removal for Arbitrary Irregular Networks," HPCA, 2020.
GPUs:
[24] H. Kim, R. Vuduc, S. Baghsorkhi, J. Choi and W.-M. Hwu, "Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)," Synthesis Lectures on Computer Architecture, Chapter 1.
[25] D. Wong, N. S. Kim and M. Annavaram, "Approximating Warps with Intra-Warp Operand Value Similarity," HPCA, 2016.
Accelerators:
[26] M. S. B. Altaf and D. A. Wood, "LogCA: A High-Level Performance Model for Hardware Accelerators," ISCA, 2017.
[27] T. Nowatzki, V. Gangadhar, K. Sankaralingam and G. Wright, "Pushing the Limits of Accelerator Efficiency while Retaining Programmability," HPCA, 2016.
Unconventional Parallelism:
[28] M. C. Jeffrey, S. Subramanian, C. Yan, J. Emer and D. Sanchez, "A Scalable Architecture for Ordered Parallelism," MICRO, 2015.
[29] S. Aga, S. Jeloka, A. Subramaniyan, S. Narayanasamy, D. Blaauw and R. Das, "Compute Caches," HPCA, 2017.
[30] J. San Miguel and N. Enright Jerger, "The Anytime Automaton," ISCA, 2016.
[31] A. Madhavan, T. Sherwood and D. Strukov, "Race Logic: A Hardware Acceleration for Dynamic Programming Algorithms," ISCA, 2014.
[32] D. Wu, J. Li, R. Yin, H. Hsiao, Y. Kim and J. San Miguel, "uGEMM: Unary Computing Architecture for GEMM Applications," ISCA, 2020.