Speakers

https://whova.com/portal/webapp/iccad_202011/

Leon Stok

IBM

EDA and Quantum Computing: a symbiotic relationship?

Abstract: Quantum computing (QC) is fast emerging as a potentially disruptive technology that can upend some businesses in the short term and many businesses in the long run. Electronic Design Automation (EDA) is uniquely positioned not only to benefit from quantum computing technologies, but also to impact the pace of development of that technology. In short, we want to address the following: what can EDA offer quantum computing, and how can quantum computing potentially impact EDA? We will also provide a “sense” for what this technology is all about and what it portends.

Bio: Leon Stok is Vice President of IBM's Electronic Design Automation group. Prior to this he held positions as director of EDA, executive assistant to IBM's Senior Vice President of Technology and Intellectual Property, and executive assistant to IBM's Senior Vice President of the Technology group. Leon Stok studied electrical engineering at Eindhoven University of Technology, the Netherlands, from which he graduated with honors in 1986. He obtained a Ph.D. degree from Eindhoven University in 1991. Leon Stok worked at IBM's Thomas J. Watson Research Center as part of the team that developed BooleDozer, the IBM logic synthesis tool. Subsequently he managed IBM's logic synthesis group and drove the development of PDS, IBM’s Placement Driven Synthesis tool. From 1999 to 2004 he led all of IBM's design automation research as the Senior Manager Design Automation at IBM Research. Dr. Stok has presented over fifty keynotes, invited talks and tutorials at major IEEE and ACM conferences worldwide and at many leading universities. Dr. Stok has published over sixty papers on many aspects of high-level, architectural and logic synthesis, low power design, placement driven synthesis and the automatic placement and routing of schematic diagrams. He holds 13 patents in EDA. He was elected an IEEE fellow for the development and application of high-level and logic synthesis algorithms.

Frank Schirrmeister

Cadence

EDA Industry Perspective on AI/ML Hardware/Software

Abstract: This presentation will discuss requirements for the design of systems on chips (SoCs) and systems for artificial intelligence (AI) and machine learning (ML) and introduce solutions that the Electronic Design Automation (EDA) industry provides today, as well as trends to address future challenges. In addition, we will introduce AI/ML technologies that are used to increase productivity and optimize the EDA design process itself.

Specifically, we will discuss aspects of enabling processor and design IP and high-level synthesis to enable optimized circuitry for AI/ML algorithms. Furthermore, we will introduce the requirements for optimized verification of AI/ML designs and specific optimizations of verification tools for this category of designs. Advanced node and low power implementation will be key aspects linking verification to SoC implementation, and we will discuss specific optimizations, as well as 3DIC and chiplet-based integration and analysis aspects. For the utilization of AI/ML for EDA, we will introduce trends and experiences with using AI/ML for formal verification, simulation and implementation.

Bio: Frank Schirrmeister is a senior solutions group director at Cadence, where he leads a team translating customer challenges in the hyperscale, communications, consumer, automotive, aerospace/defense, industrial and healthcare vertical domains into specific requirements and solutions. His team focuses on cross-product technical solutions such as 5G, artificial intelligence, machine learning, safety, security and digital twins, as well as key partner collaborations. Frank holds a Dipl.-Ing. in electrical engineering from the Technical University of Berlin, Germany. Prior to joining Cadence, Frank held senior engineering and product management positions in embedded software, semiconductor and system development, both in Europe and the United States.

David Pan

University of Texas at Austin

AI for IC and IC for AI: Closed Loop Perspectives and Recent Results

Abstract: The recent artificial intelligence (AI) boom has been largely driven by three converging forces: algorithms, big data, and computing power enabled by modern integrated circuits (ICs), including specialized AI accelerators. In this talk, I will present a synergistic approach to AI and intelligent IC/accelerator designs with two main themes, AI for IC and IC for AI. As semiconductor technology enters the era of extreme scaling, IC design and manufacturing complexities are becoming extremely high. More intelligent and agile IC design technologies are needed more than ever to optimize performance, power, area, manufacturability, reliability, security, etc., and to deliver equivalent scaling to Moore’s Law. I will present some recent results leveraging modern AI and machine learning advancements with domain-specific customizations for agile IC design and manufacturing closure. Meanwhile, customized ICs can improve AI performance and energy efficiency by orders of magnitude. I will present hardware/software co-design for energy-efficient neural networks. The bidirectional reinforcement of AI and IC technologies holds great potential to significantly advance the state of the art of each.

Bio: David Z. Pan received his BS degree in Physics from Peking University and his MS/PhD degrees in Computer Science from UCLA. From 2000 to 2003, he was a Research Staff Member with the IBM T. J. Watson Research Center. He is currently Silicon Labs Endowed Chair Professor at the Department of Electrical and Computer Engineering, The University of Texas at Austin. His research interests include bidirectional AI and IC interaction, cross-layer design for manufacturability, reliability, security, CAD for analog/mixed-signal designs and emerging technologies. He has published over 380 refereed journal/conference papers and holds 8 US patents. He has served on many journal editorial boards and conference committees, including in various leadership roles. He is the ACM/SIGDA Award Chair. He has received 19 Best Paper Awards and 16 additional Best Paper Award nominations. He is a Fellow of IEEE and SPIE.

Deming Chen

University of Illinois at Urbana-Champaign

Effective Co-design of Deep Learning Algorithms and Hardware Accelerators

Abstract: In a conventional top-down design flow, deep-learning algorithms are first designed with a focus on model accuracy, and then accelerated through hardware accelerators that try to meet various system design targets on power, energy, speed, and cost. However, this approach often does not work well because it ignores the impacts and physical constraints that the hardware architectures themselves have on deep-learning algorithm design and deployment. Thus, an ideal scenario is that algorithms and their hardware accelerators are developed simultaneously. In this talk, we will present our DNN/Accelerator co-design and co-search methods. Our results have shown great promise for delivering high-performance hardware-tailored DNNs and DNN-tailored accelerators naturally and elegantly. One of the DNN models coming out of this co-design method, called SkyNet, won a double championship in the competitive DAC System Design Contest for both the GPU and the FPGA tracks for low-power object detection in 2019.

Bio: Dr. Deming Chen obtained his BS in computer science from University of Pittsburgh, Pennsylvania in 1995, and his MS and PhD in computer science from University of California at Los Angeles in 2001 and 2005 respectively. He joined the ECE department of University of Illinois at Urbana-Champaign in 2005. His current research interests include reconfigurable computing, machine learning and cognitive computing, system-level and high-level synthesis, and hardware security. He has given more than 110 invited talks sharing these research results worldwide. He is the Donald Willett Faculty Scholar and the Abel Bliss Professor of the Grainger College of Engineering, an IEEE Fellow, an ACM Distinguished Speaker, and the Editor-in-Chief of ACM Transactions on Reconfigurable Technology and Systems (TRETS).

Song Han

MIT

MCUNet: Tiny Deep Learning on Microcontrollers

Abstract: Machine learning on tiny IoT devices based on microcontroller units (MCUs) is appealing but challenging: the memory of microcontrollers is 3 orders of magnitude smaller than that of mobile phones, not to mention GPUs. We propose MCUNet, a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine). MCUNet provides an automated and holistic design methodology for a well-matched neural architecture and inference engine on MCUs. MCUNet enables ImageNet-scale inference on microcontrollers that have only 1MB of FLASH and 320KB of SRAM. It achieves significant speedup compared to popular MCU libraries: TF-Lite Micro, CMSIS-NN, and MicroTVM. Our study suggests that the era of tiny machine learning on IoT devices has arrived.

Bio: Song Han is an assistant professor in MIT’s Department of Electrical Engineering and Computer Science. He received his PhD degree from Stanford University. His research focuses on efficient deep learning computing. He proposed the “deep compression” technique that can reduce neural network size by an order of magnitude without losing accuracy, and the hardware implementation “efficient inference engine” that first exploited pruning and weight sparsity in deep learning accelerators, which impacted commercial AI chips designed by NVIDIA, Xilinx, Samsung, MediaTek, etc. His recent work on hardware-aware neural architecture search was highlighted by MIT News, Qualcomm News, VentureBeat, and IEEE Spectrum, integrated in PyTorch and AutoGluon, and received many low-power computer vision contest awards at flagship AI conferences (CVPR’19, ICCV’19 and NeurIPS’19). Song received Best Paper awards at ICLR’16 and FPGA’17, the Amazon Machine Learning Research Award, the SONY Faculty Award, and the Facebook Faculty Award. Song was named one of the “35 Innovators Under 35” by MIT Technology Review for his contribution to the “deep compression” technique that “lets powerful artificial intelligence (AI) programs run more efficiently on low-power mobile devices.” Song received the NSF CAREER Award for “efficient algorithms and hardware for accelerated machine learning.”

Kevin Cao

Arizona State University

Design Limits on In-Memory Computing: Beyond the Crossbar

Abstract: AI algorithms have achieved increasingly higher accuracy for a variety of applications, at the cost of deeper networks, larger model size, and higher connection density. RRAM-based in-memory computing (IMC) architecture offers a parallel and energy-efficient solution. Yet, its performance is limited by device non-idealities, circuit precision, on-chip interconnection, and algorithm properties. Based on statistical data from a fully-integrated 65nm CMOS/RRAM test chip and a cross-layer simulation framework, we illustrate that the real bottleneck of the IMC system is not the RRAM crossbar, but the analog-to-digital converter (ADC) precision and the stability of machine learning models. These factors interact with device variation and the Roff/Ron ratio, limiting the useful number of RRAM levels, inference accuracy, and system energy-delay product (EDP). In addition, the topology of the on-chip interconnect needs to be optimized in order to manage a large volume of data movement among the crossbars. The results are summarized into a roofline model and demonstrated on CIFAR-10, SVHN, CIFAR-100 and ImageNet, helping shed light on future IMC research focus.

Bio: Yu Cao received the B.S. degree in physics from Peking University in 1996. He received the M.A. degree in biophysics and the Ph.D. degree in electrical engineering from University of California, Berkeley, in 1999 and 2002, respectively. He is now a Professor of Electrical Engineering at Arizona State University, Tempe, Arizona. He has published numerous articles and two books on nanoelectronic modeling and design. His research interests include neural-inspired computing, hardware acceleration for on-chip learning, and reliable integration of nanoelectronic devices. He is a Fellow of the IEEE.

Yiran Chen

Duke University

Hardware/Software Co-design for AI Systems

Abstract: The rapid growth in the scale of modern neural network (NN) models generates ever-increasing demands for high computing power in artificial intelligence (AI) systems. Many specialized computing devices have been deployed in AI systems, forming a truly application-driven heterogeneous computing platform. In this talk, we discuss the importance of hardware/software co-design in AI system designs. We first use resistive memory based NN accelerators to illustrate the design philosophy of AI computing systems, and then present several hardware-friendly NN model compression techniques. We also extend our discussions to distributed AI systems and briefly introduce the automation of the NN co-design flow, e.g., neural architecture search.

Bio: Yiran Chen received his B.S. and M.S. degrees from Tsinghua University and his Ph.D. from Purdue University in 2005. After five years in industry, he joined the University of Pittsburgh in 2010 as an Assistant Professor, was promoted to Associate Professor with tenure in 2014, and held the title of Bicentennial Alumni Faculty Fellow. He is now a Professor in the Department of Electrical and Computer Engineering at Duke University and serves as the director of the NSF Industry–University Cooperative Research Center (IUCRC) for Alternative Sustainable and Intelligent Computing (ASIC) and co-director of the Duke University Center for Computational Evolutionary Intelligence (CEI), focusing on research into new memory and storage systems, machine learning and neuromorphic computing, and mobile computing systems. Dr. Chen has published one book and more than 400 technical publications and has been granted 96 US patents. He serves or has served as an associate editor of several IEEE and ACM transactions/journals and has served on the technical and organization committees of more than 50 international conferences. He is now serving as the Editor-in-Chief of IEEE Circuits and Systems Magazine. He has received 7 best paper awards, 1 best poster award, and 15 best paper nominations from international conferences and workshops. He is the recipient of the NSF CAREER award, the ACM SIGDA outstanding new faculty award, the Humboldt Research Fellowship for Experienced Researchers, and the IEEE SYSC/CEDA TCCPS Mid-Career Award. He is a Fellow of IEEE, a Distinguished Member of ACM, and a distinguished lecturer of IEEE CEDA, and is listed in the HPCA Hall of Fame.

Claudionor Coelho

Palo Alto Networks

AutoQKeras: An AutoML Library for QKeras

Abstract: In this talk, we will introduce AutoQKeras, a library for quantization selection and hyperparameter tuning for Deep Neural Networks. Using AutoQKeras, users can trade off accuracy by quantization. In order to perform the appropriate trade-offs, we introduce a "forgiving factor", which enables a user to answer the question "if I could reduce the model energy or number of bits by a factor, how much accuracy would I be willing to trade off?". With AutoQKeras, users can select among several optimization strategies, such as global optimization of network hyperparameters and quantizers, or splitting the optimization problem into smaller search problems to cope with search complexity. We show how AutoQKeras and QKeras can be used to create sub-microsecond inference engines.

Bio: Claudionor N. Coelho is a serial innovator with Palo Alto Networks. Prior to joining Palo Alto Networks, he worked on Machine Learning/Deep Learning hardware acceleration for video compression at Google. Previously, he was the VP of Software Engineering, Machine Learning and Deep Learning at NVXL Technology, where he was responsible for creating new hardware/software acceleration techniques that led to a USD 15 million investment from Alibaba. He did seminal work on AI at Synopsys Inc., was the GM for Brazil at Cadence Design Systems, and was previously the SVP of Engineering at Jasper Design Automation, leading the team that was awarded the Red Herring most innovative company in the US in 2013. He has more than 80 papers and patents, and he was an Associate Professor of Computer Science at UFMG, Brazil. He has a PhD in EE/CS from Stanford University, an MBA from IBMEC Business School, and an MSCS and BSEE (summa cum laude) from UFMG, Brazil.

Priyanka Raina

Stanford University

Creating an Agile Hardware Accelerator Design Flow

Abstract: Although an agile approach is standard for software design, how to properly adapt this method to hardware is still an open question. Stanford's AHA (Agile HArdware) project is working towards this goal while building a system on chip (SoC) with specialized accelerators. Rather than using a traditional waterfall design flow, which starts by studying the application to be accelerated, we begin by constructing a complete flow from an application expressed in a high-level domain specific language (DSL), in our case Halide, to a generic coarse-grained reconfigurable array (CGRA) that functions as our hardware accelerator. As our understanding of the application grows, the CGRA design evolves, and we have developed a suite of tools that allow this to happen. We fabricated our first CGRA, called Jade, last year. With that development experience in hand, we have worked on ways to tune the application code, the compiler, and the CGRA to increase the efficiency of the resulting implementation. To meet our need to continually update parts of the system while maintaining the end-to-end flow, we have created DSL-based hardware generators which not only provide the Verilog needed for the implementation of the CGRA, but also create the collateral that the compiler/mapper/place and route system needs to configure its operation. Moreover, these DSLs support staged generation of hardware, which allows for separation of concerns. This work provides a systematic approach to designing and evolving high-performance and energy-efficient hardware-software systems for application domains such as image and video processing and machine learning. More details about the Stanford AHA project can be found on our website: https://aha.stanford.edu. All the tools and hardware designed as a part of this project are open-source and can be found on our GitHub page: https://github.com/StanfordAHA.

Bio: Priyanka Raina is an Assistant Professor in Electrical Engineering at Stanford University. Previously, she was a Visiting Research Scientist in the Architecture Research Group at NVIDIA Corporation. She received her Ph.D. degree in 2018 and S.M. degree in 2013 in Electrical Engineering and Computer Science from MIT and her B.Tech. degree in Electrical Engineering from the Indian Institute of Technology (IIT) Delhi in 2011. Priyanka’s current research interests are designing energy-efficient and high-performance circuits and systems for image, vision and machine learning applications on mobile devices, integrating emerging non-volatile memory technologies in accelerator architectures, and creating frameworks for improving hardware/software system design productivity.

Jinjun Xiong

IBM Research

A Statistical Distribution-based Deep Neural Network Model – a new perspective on effective learning

Abstract: The impressive results achieved by deep neural networks (DNNs) in various tasks, computer vision in particular, such as image recognition, object detection and image segmentation, have sparked the recent surge of interest in artificial intelligence (AI) from both industry and academia. The wide adoption of DNN models in real-time applications has, however, brought up a need for more effective training of an easily parallelizable DNN model for low latency and high throughput. This is particularly challenging because of DNNs' deep structures. To address this challenge, we observe that most existing DNN models operate on deterministic numbers and process one single frame of an image at a time, and may not fully utilize the temporal and contextual correlation typically present in multiple channels of the same image or adjacent frames from a video. Based on well-established statistical timing analysis foundations from the EDA domain, we propose a novel statistical distribution-based DNN model that extends existing DNN architectures but operates directly on correlated distributions rather than deterministic numbers. This new perspective on training DNNs has resulted in surprising effects, achieving not only improved learning accuracy, but also reduced latency and increased throughput. Preliminary experimental results on various tasks, including 3D Cardiac Cine MRI segmentation, show great potential for this new type of statistical distribution-based DNN model, which warrants further investigation.

Bio: Dr. Jinjun Xiong is currently a Researcher and Program Director for AI and Hybrid Clouds Systems at the IBM Thomas J. Watson Research Center. He co-founded and co-directs the IBM-Illinois Center for Cognitive Computing Systems Research (C3SR.com) with Prof. Wen-mei Hwu at UIUC. His recent research interests are in cross-stack AI systems research, which includes AI solutions, algorithms, tooling and computer architectures. Many of his research results have been adopted in IBM’s products and tools. He has published more than 100 peer-reviewed papers in top AI conferences and systems conferences. His publications have won six Best Paper Awards and eight nominations for Best Paper Awards. He has also won top awards in various international competitions, including the recent championship in the IEEE GraphChallenge on accelerating sparse neural networks, and championships in the DAC'19 Systems Design Contest on designing an object detection neural network for edge FPGA and GPU devices.
