Abstract: In recent decades, prefetchers have been extensively researched, engineered, re-engineered, and quite often over-engineered. Papers in this area have grown increasingly complex, and reviewers have become harder to convince. At the same time, however, the community is bigger and stronger than ever before, the simulation tools are simpler and more powerful than ever, and we are facing one of the greatest paradigm shifts in scientific history, one that can assist anyone who just happens to have an initial idea. Given all that, does it still make sense to research prefetchers in 2026? Have we reached the point of diminishing returns? Are there any frontiers left to explore and conquer? Will there be a DPC5?! This talk will discuss current research work and try to present a few potential directions for expansion.
Bio: Leeor Peled received his PhD at the Technion under Prof. Uri Weiser and Prof. Yoav Etsion, focusing on ML-based prefetchers with rich context. He spent over 22 years doing CPU core architecture, starting at Intel Haifa, where he worked on various core clusters from Sandy Bridge, Ivy Bridge, and Skylake through Golden Cove and Lion Cove. He then moved to Huawei, working on CPU cores for mobile phones, servers, and many other domains. He currently leads Huawei CPU Architecture Research with a great team that spans Israel, Zurich, Cambridge, Edinburgh, and China. His main focus is on predictors, parallelism, caches, SW/HW co-design, system/OS, dynamic optimization, and compilers, but he may be willing to talk about other things given a chance.
Abstract: For decades, prefetching research has chased the "perfect algorithm"—a universal oracle designed to master an ever-expanding set of workloads. In the modern datacenter, however, this one-size-fits-all philosophy is colliding with the reality of extreme heterogeneity as hardware platforms, workload mixes, and utilization rates are in constant flux. This mismatch is exacerbated by commercial prefetchers that have become increasingly aggressive to juice SPEC benchmarks, imposing an unsustainable "tax" on memory bandwidth. At scale, this translates into a lose-lose trade-off: over-provision memory bandwidth to handle speculative overhead, or disable prefetching entirely and sacrifice performance. This talk argues that the future of prefetching lies not in deeper algorithmic complexity, but in software-defined flexibility. To reclaim performance, we must move past the narrow, rigid interfaces of current ISAs that prevent the software stack from communicating intent or priority. It is time to open the hardware black box and transform the prefetcher into a manageable, programmable resource.
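To make the point about narrow ISA interfaces concrete, here is a minimal illustrative sketch in C, assuming GCC or Clang: the `__builtin_prefetch` intrinsic (which lowers to instructions such as PREFETCHT0 on x86) lets software express only an address, a read/write flag, and a coarse temporal-locality level. There is no channel for the intent, priority, or bandwidth sensitivity that the talk argues software should be able to communicate; the function and look-ahead distance below are purely for illustration.

    /* Minimal sketch (GCC/Clang) of today's ISA-level prefetch interface.
     * Per access, software can express only:
     *   - an address,
     *   - read (0) vs. write (1),
     *   - a temporal-locality level 0..3 (0 = non-temporal, 3 = keep in all caches).
     * There is no way to convey intent, priority, or a bandwidth budget. */
    #include <stddef.h>

    long sum_with_hints(const long *a, size_t n) {
        long sum = 0;
        for (size_t i = 0; i < n; i++) {
            if (i + 16 < n) {
                /* Read prefetch, highest locality: typically compiles to
                 * PREFETCHT0 on x86 or PRFM PLDL1KEEP on AArch64.
                 * The distance of 16 elements is an arbitrary example. */
                __builtin_prefetch(&a[i + 16], 0, 3);
            }
            sum += a[i];
        }
        return sum;
    }

Everything beyond these few bits, such as stream detection, aggressiveness, and throttling, sits inside the hardware black box that the talk proposes to open up as a manageable, programmable resource.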
Bio: Akanksha Jain is a staff software engineer at Google, where she works on cross-stack solutions to improve datacenter efficiency. Her work has led to substantial efficiency improvements to Google's existing fleet and has informed the design of future SoCs. Her research interests are in computer architecture, with a particular focus on the memory system and on using machine learning to improve system design. She received her Ph.D. in Computer Science from the University of Texas at Austin in 2016.