This is the list of papers you need to read for each course topic. Each week, you will also need to submit a review of a subset of the papers listed here.
Intro:
[1] A. J. Smith, "The Task of the Referee," IEEE Computer, 1990.
[2] M. D. Hill, S. Adve, L. Ceze, M. J. Irwin, D. Kaeli, M. Martonosi, J. Torrellas, T. F. Wenisch, D. Wood, K. Yelick, "21st Century Computer Architecture," CCC Whitepaper, 2012.
Multicores and Multiprogramming:
[3] E. Fatehi, P. V. Gratz, "ILP and TLP in Shared Memory Applications: A Limit Study," PACT, 2014.
[4] C. Bienia, S. Kumar, J. P. Singh, K. Li, "The PARSEC Benchmark Suite: Characterization and Architectural Implications," PACT, 2008.
Synchronization:
[5] M. L. Scott, "Shared-Memory Synchronization," Synthesis Lectures on Computer Architecture, Chapters 1, 4.0-4.3.3 and 5.0-5.2.5.
[6] R. Rajwar, J. R. Goodman, "Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution," MICRO, 2001.
Cache and Memory Hierarchy:
[7] D. J. Sorin, M. D. Hill, D. A. Wood, "A Primer on Memory Consistency and Cache Coherence," Synthesis Lectures on Computer Architecture, Chapter 2.
Coherence:
[7] D. J. Sorin, M. D. Hill, D. A. Wood, "A Primer on Memory Consistency and Cache Coherence," Synthesis Lectures on Computer Architecture, Chapters 6-8.
[8] G. Zhang, W. Horn, D. Sanchez, "Exploiting Commutativity to Reduce the Cost of Updates to Shared Data in Cache-Coherent Systems," MICRO, 2015.
Consistency:
[7] D. J. Sorin, M. D. Hill, D. A. Wood, "A Primer on Memory Consistency and Cache Coherence," Synthesis Lectures on Computer Architecture, Chapters 3-5.
[9] M. D. Hill, "Multiprocessors Should Support Simple Memory Consistency Models," IEEE Computer, 1998.
Transactional Memory:
[10] T. Harris, J. Larus, R. Rajwar, "Transactional Memory, 2nd Edition," Synthesis Lectures on Computer Architecture, Chapters 1 and 5.
[11] K. E. Moore, J. Bobba, M. J. Moravan, M. D. Hill, D. A. Wood, "LogTM: Log-Based Transactional Memory," HPCA, 2006.
Interconnects:
[12] N. Enright Jerger, L.-S. Peh, "On-Chip Networks," Synthesis Lectures on Computer Architecture, Chapters 3-6.
[13] T. Moscibroda, O. Mutlu, "A Case for Bufferless Routing in On-Chip Networks," ISCA, 2009.
[14] J. Kim, J. Balfour, W. Dally, "Flattened Butterfly Topology for On-Chip Networks," MICRO, 2007.
GPUs:
[15] H. Kim, R. Vuduc, S. Baghsorkhi, J. Choi, W.-M. Hwu, "Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)," Synthesis Lectures on Computer Architecture, Chapter 1.
[16] D. Wong, N. S. Kim, M. Annavaram, "Approximating Warps with Intra-Warp Operand Value Similarity," HPCA, 2016.
Accelerators:
[17] M. S. B. Altaf, D. A. Wood, "LogCA: A High-Level Performance Model for Hardware Accelerators," ISCA, 2017.
Unconventional Parallelism:
[18] M. C. Jeffrey, S. Subramanian, C. Yan, J. Emer, D. Sanchez, "A Scalable Architecture for Ordered Parallelism," MICRO, 2015.
[19] S. Aga, S. Jeloka, A. Subramaniyan, S. Narayanasamy, D. Blaauw, R. Das, "Compute Caches," HPCA, 2017.
[20] J. San Miguel, N. Enright Jerger, "The Anytime Automaton," ISCA, 2016.