Paper Readings

These are the papers you need to read for each course topic. Each week, you will submit a paper review for a subset of the papers, listed here.

Intro:

[1] A. J. Smith, "The Task of the Referee," IEEE Computer, 1990.

[2] G. E. Moore, "Cramming More Components onto Integrated Circuits," Electronics, 1965.

[3] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill and D. A. Wood, "The gem5 Simulator," ACM SIGARCH Computer Architecture News, 2011.

[4] T. Nowatzki, J. Menon, C.-H. Ho and K. Sankaralingam, "Architectural Simulators Considered Harmful," IEEE Micro, 2015.

Instruction Set Architecture:

[5] W. A. Wulf, "Compilers and Computer Architecture," IEEE Computer, 1981.

[6] E. Blem, J. Menon and K. Sankaralingam, "Power Struggles: Revisiting the RISC vs. CISC Debate on Contemporary ARM and x86 Architectures," HPCA, 2013.

Pipelining:

[7] A. Gonzalez, F. Latorre and G. Magklis, "Processor Microarchitecture: An Implementation Perspective," Synthesis Lectures on Computer Architecture, 2010, Chapter 1.

[8] V. Srinivasan, D. Brooks, M. Gschwind, P. Bose, V. Zyuban, P. N. Strenski and P. G. Emma, "Optimizing Pipelines for Power and Performance," MICRO, 2002.

Instruction Flow:

[9] A. Gonzalez, F. Latorre and G. Magklis, "Processor Microarchitecture: An Implementation Perspective," Synthesis Lectures on Computer Architecture, 2010, Chapters 3-4.

[10] J. E. Smith and A. R. Pleszkun, "Implementing Precise Interrupts in Pipelined Processors," IEEE Transactions on Computers, 1988.

[11] R. Sheikh, J. Tuck and E. Rotenberg, "Control-Flow Decoupling," MICRO, 2012.

Branch Prediction:

[12] T.-Y. Yeh and Y. Patt, "Two-level Adaptive Training Branch Prediction," MICRO, 1991.

[13] D. A. Jimenez and C. Lin, "Dynamic Branch Prediction with Perceptrons," HPCA, 2001.

[14] J. Albericio, J. San Miguel, N. Enright Jerger and A. Moshovos, "Wormhole: Wisely Predicting Multidimensional Branches," MICRO, 2014.

Register Data Flow:

[15] A. Gonzalez, F. Latorre and G. Magklis, "Processor Microarchitecture: An Implementation Perspective," Synthesis Lectures on Computer Architecture, 2010, Chapters 5, 6.1-6.3, 7, 8.

[16] G. S. Sohi and S. Vajapeyam, "Instruction Issue Logic for High-Performance, Interruptable Pipelined Processors," ISCA, 1987.

Memory Data Flow:

[17] A. Gonzalez, F. Latorre and G. Magklis, "Processor Microarchitecture: An Implementation Perspective," Synthesis Lectures on Computer Architecture, 2010, Chapters 2, 6.4-6.5.

[18] A. Moshovos, S. E. Breach, T. N. Vijaykumar and G. S. Sohi, "Dynamic Speculation and Synchronization of Data Dependences," ISCA, 1997.

[19] A. Sembrant, T. Carlson, E. Hagersten, D. Black-Schaffer, A. Perais, A. Seznec and P. Michaud, "Long Term Parking (LTP): Criticality-Aware Resource Allocation in OOO Processors," MICRO, 2015.

Caches and Memory:

[20] B. Jacob, "The Memory System: You Can't Avoid It, You Can't Ignore It, You Can't Fake It," Synthesis Lectures on Computer Architecture, 2009, Chapters 1-3.

[21] D. Sanchez and C. Kozyrakis, "The ZCache: Decoupling Ways and Associativity," MICRO, 2010.

Prefetching:

[22] B. Falsafi and T. F. Wenisch, "A Primer on Hardware Prefetching," Synthesis Lectures on Computer Architecture, 2014, Chapters 2-3.

[23] S. Somogyi, T. F. Wenisch, A. Ailamaki, B. Falsafi and A. Moshovos, "Spatial Memory Streaming," ISCA, 2006.

Compression:

[24] S. Sardashti, A. Arelakis, P. Stenstrom and D. A. Wood, "A Primer on Compression in the Memory Hierarchy," Synthesis Lectures on Computer Architecture, 2015, Chapters 2-4.

[25] G. Pekhimenko, V. Seshadri, O. Mutlu, P. B. Gibbons, M. A. Kozuch and T. C. Mowry, "Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches," PACT, 2012.

Value Prediction:

[26] M. H. Lipasti, C. B. Wilkerson and J. P. Shen, "Value Locality and Load Value Prediction," ASPLOS, 1996.

[27] J. San Miguel, M. Badr and N. Enright Jerger, "Load Value Approximation," MICRO, 2014.

Virtual Memory and TLBs:

[28] B. Pham, V. Vaidyanathan, A. Jaleel and A. Bhattacharjee, "CoLT: Coalesced Large-Reach TLBs," MICRO, 2012.

[29] A. Sembrant, E. Hagersten and D. Black-Schaffer, "TLC: A Tag-Less Cache for Reducing Dynamic First Level Cache Energy," MICRO, 2013.

Multiprocessors:

[30] D. M. Tullsen, S. J. Eggers, J. S. Emer, H. M. Levy, J. L. Lo and R. L. Stamm, "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor," ISCA, 1996.

[31] E. Fatehi and P. V. Gratz, "ILP and TLP in Shared Memory Applications: A Limit Study," PACT, 2014.

[32] J. Kim, M. Sullivan, E. Choukse and M. Erez, "Bit-Plane Compression: Transforming Data for Better Compression in Many-Core Architectures," ISCA, 2016.

[33] T. Nowatzki, V. Gangadhar, K. Sankaralingam and G. Wright, "Pushing the Limits of Accelerator Efficiency while Retaining Programmability," HPCA, 2016.

Intermittent Computing:

[34] M. Hicks, "Clank: Architectural Support for Intermittent Computation," ISCA, 2017.

[35] J. San Miguel, K. Ganesan, M. Badr, C. Xia, R. Li, H. Hsiao and N. Enright Jerger, "The EH Model: Early Design Space Exploration of Intermittent Processor Architectures," MICRO, 2018.

Approximate Computing:

[36] J. San Miguel, J. Albericio, A. Moshovos and N. Enright Jerger, "Doppelgänger: A Cache for Approximate Computing," MICRO, 2015.

[37] D. S. Khudia, B. Zamirai, M. Samadi and S. Mahlke, "Rumba: An Online Quality Management System for Approximate Computing," ISCA, 2015.

[38] K. Ganesan, J. San Miguel and N. Enright Jerger, "The What’s Next Intermittent Computing Architecture," HPCA, 2019.

Unconventional Architectures:

[39] I. Akturk and U. R. Karpuzcu, "AMNESIAC: Amnesic Automatic Computer," ASPLOS, 2017.

[40] A. Parashar, M. Pellauer, M. Adler, B. Ahsan, N. Crago, D. Lustig, V. Pavlov, A. Zhai, M. Gambhir, A. Jaleel, R. Allmon, R. Rayess, S. Maresh and J. Emer, "Triggered Instructions: A Control Paradigm for Spatially-Programmed Architectures," ISCA, 2013.

[41] A. Madhavan, T. Sherwood and D. Strukov, "Race Logic: A Hardware Acceleration for Dynamic Programming Algorithms," ISCA, 2014.

[42] Z. Deng, A. Feldman, S. A. Kurtz and F. T. Chong, "Lemonade from Lemons: Harnessing Device Wearout to Create Limited-Use Security Architectures," ISCA, 2017.
