Abstract: Deep Neural Networks (DNNs) are large artificial neural networks trained on very large datasets, typically using the supervised learning technique known as backpropagation. Currently, CPUs and GPUs are used for these computations. Over the next few years, we can expect special-purpose hardware accelerators, based on conventional digital-design techniques, to optimize the GPU framework for these DNN computations.
Even with the improved computational performance and efficiency expected from these special-purpose digital accelerators, there will remain an opportunity for still higher performance and better energy efficiency in DNN inference and training by using neuromorphic computation based on analog memory devices.
In this presentation, I discuss the origin of this opportunity as well as the challenges inherent in delivering on it. While I will briefly discuss materials and devices for analog volatile and non-volatile memory, along with circuit and architecture choices and challenges, I will focus on the challenges in computer-aided design as well as the current status and prospects of this approach.
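As a rough illustration of where the opportunity comes from (a minimal NumPy sketch under assumed device parameters, not the hardware discussed in the talk), a layer's weights can be stored as conductances in an analog memory crossbar so that a vector-matrix multiply is performed in a single analog step via Ohm's and Kirchhoff's laws:

```python
import numpy as np

# Conceptual sketch: DNN layer weights mapped onto a crossbar of analog memory devices.
rng = np.random.default_rng(0)
W = rng.normal(0, 0.5, size=(64, 32))          # layer weights
g_max = 1e-6                                   # assumed maximum device conductance (S)

# Signed weights are commonly encoded as a differential pair of devices (G+ and G-).
g_pos = np.clip(W, 0, None) / np.abs(W).max() * g_max
g_neg = np.clip(-W, 0, None) / np.abs(W).max() * g_max

x = rng.uniform(0, 0.2, size=64)               # input activations applied as row voltages (V)
i_out = x @ g_pos - x @ g_neg                  # column currents realize the multiply-accumulate
print(np.allclose(i_out, x @ W * (g_max / np.abs(W).max())))  # matches the digital result up to scaling
```

The appeal is that the multiply-accumulate, the dominant DNN operation, happens in place in the memory array rather than by shuttling weights to a digital datapath.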
Bio: Geoffrey W. Burr received his Ph.D. in Electrical Engineering from the California Institute of Technology in 1996. Since that time, Dr. Burr has worked at IBM Research - Almaden in San Jose, California, where he is currently a Distinguished Research Staff Member. He has worked in a number of diverse areas, including holographic data storage, photon echoes, computational electromagnetics, nanophotonics, computational lithography, phase-change memory, storage class memory, and novel access devices based on Mixed-Ionic-Electronic-Conduction (MIEC) materials. Dr. Burr's current research interests involve AI/ML acceleration using non-volatile memory. Geoff is an IEEE Fellow (2020), and is also a member of MRS, SPIE, OSA, Tau Beta Pi, Eta Kappa Nu, and the Institute of Physics (IOP).
Abstract: We will give an overview of some recent results on 3D integration of CMOS and memristive memory arrays and demonstrate its potential to offer very high memory density and bandwidth at manageable power dissipation, and to enable new memory-centric computing paradigms for AI applications.
Such resistive memory cells, however, still suffer from multiple limitations, including high programming energy, limited endurance, and large cycle-to-cycle and device-to-device variations. We will highlight research directions and some recent solutions addressing these limitations.
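To make the distinction between the two variation types concrete, here is a hedged toy model (the 5% and 3% spreads are assumed for illustration, not measured device data): device-to-device variation is a fixed per-cell offset, while cycle-to-cycle variation changes with every programming operation.

```python
import numpy as np

# Toy variation model for an array of resistive cells, not a calibrated device model.
rng = np.random.default_rng(1)
target = np.full((128,), 50e-6)                       # target conductance for 128 cells (S)
d2d = rng.normal(1.0, 0.05, size=target.shape)        # assumed 5% device-to-device spread, fixed per cell

def program(target_g, c2c_sigma=0.03):
    """One write cycle: the fixed D2D factor plus fresh cycle-to-cycle noise."""
    return target_g * d2d * rng.normal(1.0, c2c_sigma, size=target_g.shape)

writes = np.stack([program(target) for _ in range(100)])    # 100 programming cycles
print("D2D spread:", writes.mean(axis=0).std() / 50e-6)     # persists even after averaging over cycles
print("C2C spread:", writes.std(axis=0).mean() / 50e-6)     # per-device jitter across cycles
```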
Finally, we will discuss recent developments in, and EDA opportunities for, an application-specific co-design framework that closely integrates application-specific neural network search, hardware-friendly network compression, and NN-aware architecture design for iterative co-optimization.
Bio: K.-T. Tim Cheng received his Ph.D. in EECS from the University of California, Berkeley in 1988. He has been serving as Dean of Engineering and Chair Professor of ECE and CSE at the Hong Kong University of Science and Technology (HKUST) since May 2016. He worked at Bell Laboratories from 1988 to 1993 and joined the faculty at the University of California, Santa Barbara in 1993, where he was the founding director of UCSB's Computer Engineering Program (1999-2002), Chair of the ECE Department (2005-2008), and Associate Vice Chancellor for Research (2013-2016). His current research interests include AI accelerator design, EDA, computer vision, and medical image analysis. He recently led the founding of the AI Chip Center for Emerging Smart Systems (ACCESS), a multidisciplinary center that aims to advance IC design to help realize ubiquitous AI applications in society. Cheng, an IEEE Fellow and a Fellow of the Hong Kong Academy of Engineering Sciences, has received 12 Best Paper Awards from various IEEE and ACM conferences and journals. He has also received the UCSB College of Engineering Outstanding Teaching Faculty Award, the 2020 Pan Wen Yuan Outstanding Research Award, and an appointment as Fellow of the School of Engineering, The University of Tokyo. He served as Editor-in-Chief of IEEE Design and Test of Computers and was a member of the Board of Governors of the IEEE Council on Electronic Design Automation and of the IEEE Computer Society's Publication Board.
Abstract: In-memory computing, where certain data processing is performed directly in the memory array, can be an effective accelerator architecture for data-intensive applications. Associative memory (AM), a type of memory that can efficiently “associate” an input query with appropriate data words/locations in the memory, is a powerful in-memory computing core. Nonetheless, harnessing the benefits of AM requires extensive cross-layer design space exploration spanning from devices and circuits to architectures and systems. In this talk, I will use several representative AM accelerator designs to show the vast design space involved. In particular, I will highlight how different non-volatile memory technologies can be exploited to implement various types of AM for some popular machine learning applications. I will introduce a circuit/architecture evaluation tool, Eva-CAM, that supports design space exploration of AM-based in-memory computing (IMC) accelerators.
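For readers unfamiliar with AM, the following small sketch captures its functional behavior independent of any device technology (the 256x64 memory contents and 5% query noise are purely illustrative): every stored word is compared against the query, and the best-matching entry, or all sufficiently close ones, is returned. In a CAM/AM array this comparison happens in parallel inside the memory itself.

```python
import numpy as np

# Functional model of an associative memory: parallel match of a query against all stored words.
rng = np.random.default_rng(0)
memory = rng.integers(0, 2, size=(256, 64))        # 256 stored 64-bit words

def am_search(query, threshold=None):
    """Nearest-match search by Hamming distance; a ternary or analog CAM would
    evaluate all rows in parallel inside the array."""
    dist = np.count_nonzero(memory != query, axis=1)
    if threshold is not None:                      # threshold match (possibly multiple hits)
        return np.flatnonzero(dist <= threshold)
    return int(np.argmin(dist))                    # best match (single hit)

query = memory[42] ^ (rng.random(64) < 0.05)       # stored word corrupted by ~5% bit flips
print(am_search(query))                            # -> 42, despite the noisy query
```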
Bio: X. Sharon Hu is a professor in the Department of Computer Science and Engineering at the University of Notre Dame, USA. Her research interests include low-power system design, circuit and architecture design with emerging technologies, real-time embedded systems, and hardware-software co-design. She has published more than 360 papers in these areas. Some of her recognitions include Best Paper Awards from the Design Automation Conference and the International Symposium on Low Power Electronics and Design, and an NSF CAREER Award. She was the General Chair of the Design Automation Conference (DAC) in 2018 and the TPC Chair of DAC in 2015. She is the Editor-in-Chief of ACM Transactions on Design Automation of Electronic Systems and has also served as an Associate Editor for a number of ACM and IEEE journals. X. Sharon Hu is a Fellow of the IEEE.
Abstract: Edge artificial intelligence (AI) has been hailed as the next frontier of innovation in the Internet of Things (IoT), in which our everyday objects are connected and work together to improve our lives and transform industries. However, major challenges remain in achieving this potential because designing energy-efficient edge AI architectures is inherently difficult: convolutional neural networks (CNNs) are complex, while the underlying edge AI accelerators offer only limited processing capabilities. In this talk, Prof. Atienza will first discuss the benefits of operating edge AI architectures at sub-nominal conditions to reduce power and obtain ultra-low-power IoT systems, while highlighting the challenge of the errors that can appear in the memories of such systems when executing complex CNN designs. These errors can affect the stored values of CNN weights and activations, compromising accuracy. Then, a new architectural design methodology for edge AI systems, called Embedded Ensemble CNNs (E2CNNs), will be presented to conceive ensembles of convolutional neural networks with improved robustness against memory errors compared to a single-instance CNN. E2CNNs rely on compression methods and heuristics to produce an ensemble of CNNs for edge AI devices with the same memory requirements as the original architecture but improved error robustness. Overall, this new design methodology can efficiently explore the design space of voter-based ensemble architectures to trade off memory footprint, performance, and accuracy for different expected memory error rates (in different types of memories) under sub-threshold operation. As a result, the next generation of edge AI accelerators will be able to gracefully adapt their energy consumption and computation precision according to the target application at each moment in time.
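A toy sketch of the voter-based ensemble idea follows (hypothetical one-layer models and an assumed sign-flip error model, not the E2CNN toolflow itself): several compact model copies occupy the memory budget of the original network, memory errors hit each copy independently, and a majority vote over their outputs can mask many of the resulting mispredictions.

```python
import numpy as np

# Toy illustration of voter-based ensembles under memory errors (stand-in models only).
rng = np.random.default_rng(0)

def predict(weights, x):
    return int(np.argmax(x @ weights))                     # toy one-layer classifier

def inject_errors(weights, p):
    """Flip the sign of a fraction p of weights to mimic memory errors at sub-nominal voltage."""
    mask = rng.random(weights.shape) < p
    return np.where(mask, -weights, weights)

x = rng.normal(size=16)
clean = rng.normal(size=(16, 10))                          # original single-instance model
members = [clean + rng.normal(0, 0.01, clean.shape) for _ in range(3)]  # stand-ins for compressed members

single = predict(inject_errors(clean, 0.05), x)
votes = [predict(inject_errors(m, 0.05), x) for m in members]
ensemble = int(np.bincount(votes).argmax())                # majority vote over noisy members
print("single-instance:", single, " ensemble vote:", ensemble)
```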
Bio: David Atienza is a Professor of Electrical and Computer Engineering and heads the Embedded Systems Laboratory (ESL) at EPFL. He received his MSc and Ph.D. degrees in Computer Science and Engineering from UCM and IMEC. His research interests focus on system-level design methodologies for energy-efficient computing systems, particularly multi-processor system-on-chip architectures (MPSoC) for servers and next-generation edge AI architectures. He is a co-author of more than 350 publications, 12 patents, and has received several best paper awards in top conferences in these fields. Dr. Atienza received, among other recognitions, the ICCAD 2020 10-Year Retrospective Most Influential Paper Award, the DAC Under-40 Innovators Award in 2018, the IEEE CEDA Early Career Award in 2013, and the ACM SIGDA Outstanding New Faculty Award in 2012. He is an IEEE Fellow, an ACM Distinguished Member, and was the President (2018-2019) of IEEE CEDA.
Abstract: Stochastic computing (SC) has seen a renaissance in recent years as a means of machine learning acceleration, owing to its compact arithmetic and approximation properties. Still, SC accuracy remains an issue, with prior works either not fully utilizing the computational density or suffering significant accuracy losses. In this talk, I will discuss various optimizations we have developed over the past few years, spanning the generation of stochastic streams, effective pipelined architectures for SC execution, and efficient training and neural architecture search for SC-based accelerators, culminating in GEO, or the Generation and Execution Optimized Stochastic Computing Accelerator for Neural Networks. GEO bridges the accuracy gap between stochastic computing and fixed-point neural networks while delivering up to 5.6X higher throughput and 2.6X lower energy. I will end with a brief discussion of the two silicon prototypes of GEO that we have taped out and measured, including one in-memory implementation.
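The core SC primitive the talk builds on can be shown in a few lines (a minimal sketch with an assumed 1024-bit stream length): values in [0, 1] are encoded as random bitstreams whose density equals the value, so a single AND gate performs multiplication, with accuracy improving as the streams get longer.

```python
import numpy as np

# Unipolar stochastic-computing multiplication: P(bit = 1) encodes the value.
rng = np.random.default_rng(0)

def to_stream(value, length=1024):
    return (rng.random(length) < value).astype(np.uint8)   # bit density equals the encoded value

a, b = 0.75, 0.40
stream_a, stream_b = to_stream(a), to_stream(b)
product_stream = stream_a & stream_b                        # one AND gate acts as a multiplier
print("SC estimate :", product_stream.mean())               # ~0.30, with stream-length noise
print("exact value :", a * b)
```

Longer streams trade throughput for accuracy, which is precisely the gap that optimized stream generation and execution aim to narrow.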
Bio: Puneet Gupta received the B.Tech. degree in electrical engineering from IIT Delhi, New Delhi, India, in 2000 and the Ph.D. degree from the University of California at San Diego, La Jolla, CA, USA, in 2007. He is currently a Professor with the Electrical and Computer Engineering Department, University of California at Los Angeles, Los Angeles, CA, USA. He co-founded Blaze DFM Inc., Sunnyvale, CA, USA, in 2004, and served as its Product Architect until 2007. He has authored over 150 articles, a book, and a book chapter, and holds 17 U.S. patents. His current research interests include building high-value bridges across application-architecture-implementation-fabrication interfaces for lowered cost and power, increased yield, and improved predictability of integrated circuits and systems. Dr. Gupta was a recipient of the NSF CAREER Award, the ACM/SIGDA Outstanding New Faculty Award, the SRC Inventor Recognition Award, and the IBM Faculty Award. He led the multi-university IMPACT+ Center which focused on future semiconductor technologies.
Abstract: AI is having an increasingly large impact on our daily lives. However, current AI hardware and algorithms are still only partially inspired by the major blueprint for AI, i.e., the human brain. In particular, even the best AI hardware is still far from the 20 W power consumption, the low latency, and the unprecedented large-scale, high-throughput processing offered by the human brain. In this talk, I will describe our bio-inspired AI hardware, from our award-winning edge systems up to SpiNNaker2, the largest cloud system for real-time AI worldwide. Combined with our unique sparsity-optimized hybrid AI framework, this enables a real-time distributed AI system with unprecedented low latency, low energy, and high robustness. In this process, we also endeavor to marry three different AI concepts: (1) DNNs for handling real-world, noisy data; (2) bio-inspiration for efficiency and for making representations sparse, closing the gap to (3) the symbolic AI layer, which handles abstracted, robust interaction with the world. Thus, we aim to close the huge gap between today's purely compute-driven AI and true brain-like AI.
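As a small, generic illustration of why sparsity matters for efficiency (a NumPy sketch with an assumed 90%-sparse activation vector, not SpiNNaker2 code): event-driven processing visits only the nonzero activations, so the number of multiply-accumulates scales with activity rather than with layer size.

```python
import numpy as np

# Event-driven sparse vector-matrix multiply versus the dense equivalent.
rng = np.random.default_rng(0)
W = rng.normal(size=(512, 256))
x = rng.normal(size=512) * (rng.random(512) < 0.1)   # ~90% of activations are zero

events = np.flatnonzero(x)                           # visit only the active ("spiking") inputs
y_sparse = W[events].T @ x[events]
y_dense = W.T @ x

print(np.allclose(y_sparse, y_dense))                # same result
print("MACs:", events.size * 256, "vs dense:", 512 * 256)
```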
Bio: Christian Mayr is a Professor of Electrical Engineering at TU Dresden. He received the Dipl.-Ing. (M.Sc.) in Electrical Engineering in 2003, his PhD in 2008, and his Habilitation in 2012, all three from Technische Universität Dresden, Germany. From 2003 to 2013, he was with Technische Universität Dresden, with a secondment to Infineon (2004-2006). From 2013 to 2015, he was a group leader at the Institute of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland. Since 2015, he has headed the Chair of Highly-Parallel VLSI-Systems and Neuromorphic Circuits at Technische Universität Dresden. His research interests include computational neuroscience, bio-inspired artificial intelligence, brain-machine interfaces, AD converters, and general System-on-Chip design. He is author/co-author of over 100 publications and holds 4 patents. He has acted as editor/reviewer for various IEEE and Elsevier journals. He is a PI in two German clusters of excellence, in the national supercomputing center Scads.AI, and in the EU Flagship Human Brain Project. His work has received several national awards.
Abstract: Modern logic and physical synthesis tools provide numerous options and parameters that can drastically affect design quality; however, the large number of options leads to a complex design space that is difficult for human designers to navigate. Fortunately, machine learning approaches and cloud computing environments are well suited for tackling complex parameter tuning problems like those seen in VLSI design flows. This talk proposes a holistic approach where online and offline machine learning approaches work together to tune industrial design flows. We describe a system called SynTunSys (STS) that has been used to optimize multiple industrial high-performance processors. STS consists of an online system that optimizes designs and generates data for a recommender system that performs offline training and recommendation. Experimental results show that the collaboration between the STS online and offline machine learning systems, together with insight from human designers, provides best-of-breed results. Finally, we discuss potential new directions for research on design-flow tuning.
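The online portion of such a flow-tuning system can be sketched abstractly as follows (a toy loop with a hypothetical cost function, not SynTunSys itself): scenarios built from synthesis "primitives" are evaluated, the best survivors are combined into new scenarios, and every result is logged as training data for an offline recommender.

```python
import itertools

# Toy online tuning loop over sets of synthesis "primitives" (illustrative names and cost only).
PRIMITIVES = ["restructure", "remap_area", "remap_delay", "retime", "useful_skew"]

def run_flow(scenario):
    """Stand-in for a real synthesis run; returns a quality-of-results score (lower is better)."""
    return sum(sum(map(ord, p)) % 97 for p in sorted(scenario)) / (1 + len(scenario))

history = []                                                    # (scenario, score) data for offline learning
population = [frozenset(c) for c in itertools.combinations(PRIMITIVES, 2)]
for generation in range(3):                                     # the online optimization loop
    scored = sorted(population, key=run_flow)
    history += [(s, run_flow(s)) for s in population]
    survivors = scored[: max(2, len(scored) // 2)]              # keep the best scenarios
    population = [a | b for a, b in itertools.combinations(survivors, 2)]  # combine them

best = min(history, key=lambda t: t[1])
print("best scenario:", sorted(best[0]), "score:", round(best[1], 2))
```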
Bio: Matthew Ziegler is a Principal Research Staff Member at the IBM T. J. Watson Research Center, Yorktown Heights, NY. He received the Ph.D. degree in electrical engineering from the University of Virginia, Charlottesville, in 2004. Since joining IBM Research in 2004, he received several technical accomplishment awards in the areas of processor design, design automation, and low power design. Dr. Ziegler has directly participated in the design of IBM’s Power Systems, z Systems, and BlueGene families of products. His research has recently focused on AI accelerator design, machine learning for CAD, and VLSI design productivity. This work has led to design methodologies and design automation systems used by multiple IBM processor and ASIC design teams. He is a recipient of the 2018 Mehboob Khan Award from the Semiconductor Research Corporation and is a member of the IBM Academy of Technology. He has served on various conference committees, including General Chair for ISLPED 2019 and Program Chair for the 2018-2021 IEEE IBM AI Compute Symposia.
Abstract: In this talk, we will introduce the OpenFPGA framework, whose aim is to generate highly customizable Field Programmable Gate Array (FPGA) fabrics and their supporting EDA flows. Following in the footsteps of the RISC-V initiative, OpenFPGA brings reconfigurable logic into the open-source community and closes the performance gap with commercial products. OpenFPGA incorporates physical design automation at its core and enables the generation of FPGA fabrics with 100k+ look-up tables, from specification to layout, in less than 24 hours with a single engineer's effort.
Bio: Pierre-Emmanuel Gaillardon is an associate professor in the Electrical and Computer Engineering (ECE) department and an adjunct assistant professor in the School of Computing at The University of Utah, Salt Lake City, UT, where he leads the Laboratory for NanoIntegrated Systems (LNIS). He holds an Electrical Engineer degree from CPE-Lyon, France (2008), a M.Sc. degree in Electrical Engineering from INSA Lyon, France (2008) and a Ph.D. degree in Electrical Engineering from CEA-LETI, Grenoble, France and the University of Lyon, France (2011).
Prior to joining the University of Utah, he was a research associate at the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland, within the Laboratory of Integrated Systems (Prof. De Micheli) and a visiting research associate at Stanford University, Palo Alto, CA, USA. Previously, he was a research assistant at CEA-LETI, Grenoble, France. Prof. Gaillardon is the recipient of the C-Innov 2011 best thesis award, the Nanoarch 2012 best paper award, the BSF 2017 Prof. Pazy Memorial Research Award, the 2017 NSF CAREER award, the 2018 IEEE CEDA Pederson Award, the 2018 ChemE Education William H. Corcoran best paper award, the 2019 DARPA Young Faculty Award, the 2019 IEEE CEDA Ernest S. Kuh Early Career Award and the 2020 ACM SIGDA Outstanding New Faculty Award.
He has served as a TPC member for many conferences, including DATE, DAC, ICCAD, and Nanoarch. He is an associate editor of IEEE TNANO and a reviewer for several journals and funding agencies. He served as Topic Co-Chair for "Emerging Technologies for Future Memories" at DATE 2017-2019. He is a senior member of the IEEE.
The research activities and interests of Prof. Gaillardon are currently focused on the development of novel computing systems exploiting emerging device technologies and novel EDA techniques.
Abstract: Performing energy-efficient deep neural network training and inference at the edge is challenging with current memory technologies, and as neural networks grow in size and computation, this problem is getting worse. Emerging non-volatile memories may be the answer, with resistive RAM (RRAM) being one of the most promising candidates. To evaluate RRAM in the context of neural network training and inference at the edge, we design, fabricate, and test CHIMERA, the first non-volatile edge AI SoC using foundry-provided on-chip RRAM macros and no off-chip memory. CHIMERA achieves 0.92 TOPS peak performance and 2.2 TOPS/W. We scale inference to 6x larger DNNs by connecting six CHIMERAs with just 4% execution-time and 5% energy overheads, enabled by communication-sparse DNN mappings that exploit RRAM non-volatility through quick chip wakeup/shutdown. We demonstrate the first incremental edge AI training, which overcomes RRAM write energy, speed, and endurance challenges. Our training achieves the same accuracy as traditional algorithms with up to 283x fewer RRAM weight-update steps and 340x better energy-delay product. We thus demonstrate 10 years of 20-samples/minute incremental edge AI training on CHIMERA.
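One generic way to cut RRAM write counts during training can be sketched as follows (an illustrative thresholded-update loop with assumed learning rate and threshold, not CHIMERA's actual training algorithm): gradient contributions are accumulated digitally, and a device is reprogrammed only once its accumulated change crosses a threshold.

```python
import numpy as np

# Toy thresholded-update loop: most gradient steps never touch the non-volatile cells.
rng = np.random.default_rng(0)
weights = rng.normal(0, 0.1, size=(256,))
accum = np.zeros_like(weights)                    # digital gradient accumulator
threshold, lr = 0.05, 0.01
writes = 0

for step in range(1000):
    grad = rng.normal(0, 1.0, size=weights.shape) # stand-in for a real backprop gradient
    accum += lr * grad
    big = np.abs(accum) >= threshold              # only these cells get an RRAM write
    weights[big] -= accum[big]
    writes += int(big.sum())
    accum[big] = 0.0

print("RRAM writes:", writes, "vs dense updates:", 1000 * weights.size)
```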
Bio: Priyanka Raina is an Assistant Professor in Electrical Engineering at Stanford University. Previously, she was a Visiting Research Scientist in the Architecture Research Group at NVIDIA Corporation. She received her Ph.D. degree in 2018 and S.M. degree in 2013 in Electrical Engineering and Computer Science from MIT, and her B.Tech. degree in Electrical Engineering from the Indian Institute of Technology (IIT) Delhi in 2011. Priyanka's current research interests are designing energy-efficient and high-performance circuits and systems for image, vision, and machine learning applications on mobile devices, integrating emerging non-volatile memory technologies into accelerator architectures, and creating frameworks for improving hardware/software system design productivity.
Abstract: The prevalence of deep neural networks today is supported by a variety of powerful hardware platforms including GPUs, FPGAs, and ASICs. A fundamental question underlies almost every implementation of deep neural networks: given a specific task, what are the optimal neural architecture and the tailor-made hardware in terms of accuracy and efficiency? Earlier approaches attempted to address this question through hardware-aware neural architecture search (NAS), where features of a fixed hardware design are taken into consideration when designing neural architectures. However, we believe that the best practice is the simultaneous design of the neural architecture and the hardware, to identify the best pairs that maximize both test accuracy and hardware efficiency. In this talk, we will present novel co-exploration frameworks for neural architectures and various hardware platforms including FPGA, NoC, ASIC and Computing-in-Memory, all of which are firsts in the literature. We will demonstrate that our co-exploration concept greatly opens up the design freedom and pushes forward the Pareto frontier between hardware efficiency and test accuracy for better design tradeoffs.
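The co-exploration idea can be illustrated with a toy joint search (all knobs and proxy models below are illustrative, not the framework presented in the talk): architecture and hardware parameters are swept together, and only the Pareto-optimal (architecture, hardware) pairs are retained.

```python
import itertools

# Toy joint search over neural-architecture and hardware knobs with proxy accuracy/latency models.
DEPTHS, WIDTHS = [8, 12, 16], [32, 64, 128]            # neural-architecture knobs
PE_ARRAYS, BUFFERS_KB = [64, 256], [128, 512]          # hardware knobs

def accuracy_proxy(depth, width):
    return 0.70 + 0.01 * depth + 0.0005 * width        # bigger nets -> higher (proxy) accuracy

def latency_proxy(depth, width, pes, buf_kb):
    return depth * width / pes + depth * max(0, width - buf_kb) * 0.01

designs = []
for d, w, p, b in itertools.product(DEPTHS, WIDTHS, PE_ARRAYS, BUFFERS_KB):
    designs.append(((d, w, p, b), accuracy_proxy(d, w), latency_proxy(d, w, p, b)))

pareto = [x for x in designs
          if not any(a >= x[1] and l <= x[2] and (a, l) != (x[1], x[2])
                     for _, a, l in designs)]            # keep only non-dominated pairs
print(len(pareto), "Pareto-optimal (architecture, hardware) pairs out of", len(designs))
```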
Bio: Dr. Yiyu Shi is currently a professor in the Department of Computer Science and Engineering at the University of Notre Dame, the site director of the National Science Foundation I/UCRC Alternative and Sustainable Intelligent Computing, and the director of the Sustainable Computing Lab (SCL). He is also a visiting scientist at Boston Children's Hospital, the primary pediatric program of Harvard Medical School. He received his B.S. in Electronic Engineering from Tsinghua University, Beijing, China in 2005, and his M.S. and Ph.D. degrees in Electrical Engineering from the University of California, Los Angeles in 2007 and 2009, respectively. His current research interests focus on hardware intelligence and biomedical applications. In recognition of his research, more than a dozen of his papers have been nominated for, or received, best paper awards at top conferences. He is also the recipient of the IBM Invention Achievement Award, the Japan Society for the Promotion of Science (JSPS) Faculty Invitation Fellowship, the Humboldt Research Fellowship, the IEEE St. Louis Section Outstanding Educator Award, the Academy of Science (St. Louis) Innovation Award, the Missouri S&T Faculty Excellence Award, the NSF CAREER Award, the IEEE Region 5 Outstanding Individual Achievement Award, the Air Force Summer Faculty Fellowship, the IEEE Computer Society TCVLSI Mid-Career Research Achievement Award, and the Facebook Research Award, among others. He has served on the technical program committees of many international conferences. He is the deputy editor-in-chief of the IEEE VLSI CAS Newsletter and an associate editor of various IEEE and ACM journals.
Abstract: As the majority of design houses follow the globalized fabless semiconductor manufacturing business model to curtail cost, their products are exposed to a range of security and trustworthiness threats, such as Intellectual Property (IP) theft, unauthorized over-production, and counterfeiting. Therefore, the ability to hide sensitive or proprietary portions of a design from a potentially untrusted foundry is becoming paramount for IP protection. Design obfuscation solutions, such as logic locking, which embed the secret IP within broader functionality through the use of a secret key, offer a certain level of protection against brute-force attacks, yet have been shown vulnerable to intelligent attacks, such as the clever use of a Boolean Satisfiability (SAT) solver. To prevent an untrusted fab from obtaining sensitive IP, an alternative approach could rely, instead, on completely redacting parts of the design from the fabricated silicon and reinstating them through post-manufacturing programming. While this idea is, in itself, rather straightforward, realizing it in a cost-effective manner is quite challenging. Indeed, using embedded Field Programmable Gate Arrays (eFPGAs) to implement the omitted portion of the function provides high security levels but results in significant area, power, and performance overhead. Towards alleviating this overhead, we developed a Transistor-Level Programmable (TRAP) fabric and an extensive framework for designing and implementing cost-effective hybrid ASIC/Programmable Integrated Circuits (ICs), wherein sensitive IP can be protected through design redaction. In this presentation, we will discuss the design of the latest version of our TRAP fabric in GlobalFoundries' 12nm technology, the CAD tool-flow necessary for supporting such hybrid ASIC/Programmable IC designs, and the protection that TRAP-based design redaction offers against both brute-force and intelligent attacks seeking to recover the redacted IP.
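The redaction concept can be illustrated functionally in a few lines (a toy 4-input design with a hypothetical 3-input sensitive sub-function, not the TRAP fabric or its tool-flow): the sensitive logic is omitted from the fabricated netlist and replaced by a small programmable lookup table that the trusted design house configures after manufacturing.

```python
# Toy model of design redaction via a post-manufacturing-programmed lookup table.
def sensitive_ip(a, b, c):
    return (a & b) ^ c                               # the portion to hide from the foundry

def fabricated_design(a, b, c, d, lut):
    redacted = lut[(a << 2) | (b << 1) | c]          # programmable element replaces the hidden IP
    return redacted | d                              # the rest of the design is ordinary ASIC logic

# Post-manufacturing programming step performed by the trusted design house.
bitstream = [sensitive_ip(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

inputs = [(a, b, c, d) for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1)]
print(all(fabricated_design(*i, lut=bitstream) == (sensitive_ip(*i[:3]) | i[3]) for i in inputs))
```

Until the programming step, the fabricated silicon contains no information about the hidden function, which is the security argument behind redaction-based protection.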
Bio: Yiorgos Makris received the Diploma degree in computer engineering from the University of Patras, Greece, in 1995, and the M.S. and Ph.D. degrees in computer engineering from the University of California at San Diego, San Diego, in 1998 and 2001, respectively. After spending a decade on the faculty of Yale University, he joined UT Dallas, where he is currently a Professor of electrical and computer engineering, leading the Trusted and RELiable Architectures (TRELA) Research Laboratory, and the Safety, Security and Healthcare Thrust Leader for the Texas Analog Center of Excellence (TxACE). His research interests include applications of machine learning and statistical analysis in the development of trusted and reliable integrated circuits and systems, with particular emphasis on the analog/RF domain. He was a recipient of the 2006 Sheffield Distinguished Teaching Award, Best Paper Awards from the 2013 IEEE/ACM Design Automation and Test in Europe (DATE 2013) Conference and the 2015 IEEE VLSI Test Symposium (VTS 2015), as well as Best Hardware Demonstration Awards from the 2016 and the 2018 IEEE Hardware-Oriented Security and Trust Symposia (HOST 2016 and HOST 2018). He has served as an Associate Editor for the IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY and the IEEE Design and Test of Computers periodical, and a Guest Editor for the IEEE TRANSACTIONS ON COMPUTERS and the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS. He currently serves as an Associate Editor for the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS.