NAE Regional Meeting
April 5, 2017
Computer History Museum
Mountain View, CA
Thank you to all the attendees at the symposium. The slides used in select presentations are linked below.
With the ending of Moore's Law, many computer architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. The Tensor Processing Unit (TPU), deployed in Google datacenters since 2015, is a custom chip that accelerates deep neural networks (DNNs). We compare the TPU to contemporary server-class CPUs and GPUs deployed in the same datacenters. Our benchmark workload, written using the high-level TensorFlow framework, uses production DNN applications that represent 95% of our datacenters' DNN demand. The TPU is an order of magnitude faster than contemporary CPUs and GPUs, and its relative performance per Watt is even higher. The TPU's deterministic execution model turns out to be a better match to the response-time requirement of our DNN applications than the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, …), which help average throughput more than guaranteed latency. The lack of such features also helps explain why, despite having myriad arithmetic units and a big memory, the TPU is relatively small and low power.
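The arithmetic that dominates the DNN workloads described above is dense matrix multiply-accumulate followed by a simple nonlinearity, which is exactly what a matrix unit like the TPU's performs in hardware. As an illustrative sketch only (not the TPU's actual implementation, and the function name is ours), a single fully connected DNN layer reduces to:

```python
def dense_layer(x, weights, bias):
    """One fully connected layer: y = relu(x @ W + b).

    The inner multiply-accumulate loop is the work a matrix unit
    executes in hardware, thousands of MACs per cycle, with a fixed
    (deterministic) schedule rather than caches or speculation.
    """
    out = []
    for row in x:                       # each input vector in the batch
        out.append([
            # dot product of the input row with one weight column,
            # plus bias, clamped at zero (ReLU)
            max(sum(a * w for a, w in zip(row, col)) + b, 0.0)
            for col, b in zip(zip(*weights), bias)
        ])
    return out

# Tiny example: one input of 2 features, layer with 2 outputs.
print(dense_layer([[1.0, 2.0]],
                  [[1.0, -1.0],
                   [0.0, 3.0]],
                  [0.0, 0.0]))  # [[1.0, 5.0]]
```

Because every input follows the same fixed sequence of multiply-accumulates, the layer's latency is constant, which is the property the abstract contrasts with the time-varying optimizations of CPUs and GPUs.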
Networking ties together storage, distributed computing, and security in our global infrastructure. However, recent qualitative shifts in Moore's Law and breakthroughs in storage technology place significant new demands on networking technology. In this talk, we explore what next-generation large-scale computation infrastructure is likely to look like and the central role that networking must play in realizing the next wave of computational scale and efficiency.