The HALO Workshop Program Agenda
Thursday, November 4, 2021
Welcome to the ICCAD Workshop on Hardware and Algorithms for Learning On-a-chip (HALO 2021)!
Zoom link: https://us02web.zoom.us/j/88279160214?pwd=bXV6bEJTeHg3ZXlvMEJlaXpRaTJUQT09
Opening and Introduction
11:00am - 11:05am CST
Keynote Talk 1
Talk title: "Light in AI: Toward Efficient and Robust Neurocomputing with Optical Neural Networks"
Deep neural networks have demonstrated superior performance in various artificial intelligence tasks. However, as Moore's law winds down, it becomes increasingly difficult for traditional electrical digital computers to support escalating computation demands under tight performance and energy constraints. As an emerging neurocomputing framework, the optical neural network (ONN) shows promising potential for next-generation AI acceleration due to its ps-level latency, TOPS-level throughput, and sub-fJ/MAC-level energy efficiency. In this talk, I will present recent progress in leveraging light for efficient AI computing and show how hardware-software co-design plays a key role in this synergistic exploration. In particular, I will show how to jointly facilitate the entire ONN design stack with circuit, architecture, and algorithm co-design for efficiency, robustness, and on-chip learnability.
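Purely as an illustration of the kind of algorithm-to-circuit mapping such co-design involves (a sketch assumed by this write-up, not the speaker's implementation), the snippet below factors a weight matrix via SVD into two unitary matrices and a diagonal: the standard decomposition by which a layer can be programmed onto Mach-Zehnder interferometer meshes. Noise and nonlinearities are omitted.

```python
import numpy as np

def decompose_for_photonics(W):
    """Factor W = U @ diag(S) @ Vh. U and Vh are unitary, so each can in
    principle be realized as a Mach-Zehnder interferometer (MZI) mesh;
    the diagonal S maps to per-channel attenuators/amplifiers."""
    U, S, Vh = np.linalg.svd(W)
    return U, S, Vh

def onn_layer(x, U, S, Vh):
    """Idealized (noise-free) optical matrix-vector product: light passes
    through the Vh mesh, the diagonal stage, then the U mesh."""
    return U @ (S * (Vh @ x))

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
x = rng.standard_normal(4)
U, S, Vh = decompose_for_photonics(W)
assert np.allclose(onn_layer(x, U, S, Vh), W @ x)  # matches the digital result
```

Robustness work in this area then asks how the output degrades when each MZI phase is perturbed, which is where the co-design for robustness mentioned above comes in.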
Session 1: Deep Learning Solutions Towards Real-world Applications
Talk title: "Efficient Audio-Visual Understanding on AR Devices"
Augmented reality (AR) is a set of technologies that will fundamentally change the way we interact with our environment. It represents a merging of the physical and the digital worlds into a rich, context-aware user interface delivered through a socially acceptable form factor such as eyeglasses. The majority of these novel experiences in AR systems will be powered by AI because of its superior ability to handle in-the-wild scenarios. A key AR use case is a personalized, proactive, and context-aware Assistant that can understand the user's activity and their environment using audio-visual understanding models. In this presentation, we will discuss the challenges and opportunities in both training and deployment of efficient audio-visual understanding on AR glasses. We will discuss enabling always-on experiences within a constrained power budget using cascaded multimodal models, and co-designing them with the target hardware platforms. We will present our early work to demonstrate the benefits and potential of such a co-design approach and discuss open research areas that are promising for the research community to explore.
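The cascade idea can be made concrete with a toy sketch; all function names, the energy-detector gate, and the threshold below are hypothetical stand-ins, not the production system. A cheap always-on stage runs on every frame, and the expensive audio-visual model is invoked only when that gate fires:

```python
import numpy as np

def tiny_gate(audio_frame, threshold=0.01):
    """Cheap always-on stage: a toy energy detector stands in for a small
    keyword-spotting/VAD model that fits the always-on power budget."""
    return float(np.mean(audio_frame ** 2)) > threshold

def heavy_av_model(audio_frame, video_frame):
    """Placeholder for the expensive audio-visual understanding model;
    invoked only when the gate fires."""
    return {"activity": "speech", "confidence": 0.9}  # dummy output

def cascaded_pipeline(audio_stream, video_stream):
    """Run the gate on every frame; escalate to the heavy model rarely."""
    outputs = []
    for audio, video in zip(audio_stream, video_stream):
        outputs.append(heavy_av_model(audio, video) if tiny_gate(audio) else None)
    return outputs

rng = np.random.default_rng(0)
audio = [rng.normal(0, s, 160) for s in (0.001, 0.3, 0.001)]  # quiet, loud, quiet
print(cascaded_pipeline(audio, [None] * 3))  # heavy model fires on the loud frame only
```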
Talk title: "Privacy in Federated Learning at Scale"
I will start this talk by overviewing Federated Learning (FL) and its core data minimization principles. I will then describe how privacy can be strengthened and made rigorous using complementary techniques such as differential privacy, secure multi-party computation, and privacy auditing methods. I will spend much of the talk describing how we can carefully combine technologies like differential privacy and secure aggregation to obtain formal distributed privacy guarantees without fully trusting the server to add noise. I will present a comprehensive end-to-end system, which appropriately discretizes the data and adds discrete Gaussian noise before performing secure aggregation. I will conclude by showing experimental results that demonstrate that our solution is able to achieve accuracy comparable to central differential privacy (which requires trusting the server to add noise) with just 16 bits of precision per value.
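A heavily simplified sketch of that end-to-end pipeline follows. The scale factor and noise level are assumed values; rounding a continuous Gaussian is used as a crude stand-in for a true discrete Gaussian sampler; and a plain modular sum emulates what secure aggregation would reveal to the server.

```python
import numpy as np

rng = np.random.default_rng(0)
BITS = 16                # 16 bits of precision per value, as in the talk
MOD = 2 ** BITS          # secure aggregation sums in this ring
SCALE = 256.0            # quantization scale (an assumed value)

def client_encode(x, sigma=2.0):
    """Discretize the update and add integer noise before secure aggregation.
    Rounding a continuous Gaussian here stands in for the true discrete
    Gaussian sampler a real system would use."""
    q = np.round(x * SCALE).astype(np.int64)
    noise = np.round(rng.normal(0.0, sigma, size=q.shape)).astype(np.int64)
    return (q + noise) % MOD

def secure_aggregate(encoded):
    """The server sees only this modular sum (what secure aggregation
    reveals), never any individual client's noisy update."""
    total = np.zeros_like(encoded[0])
    for e in encoded:
        total = (total + e) % MOD
    signed = np.where(total >= MOD // 2, total - MOD, total)  # recentre
    return signed / SCALE

updates = [rng.normal(0.0, 0.1, size=8) for _ in range(10)]
noisy_sum = secure_aggregate([client_encode(u) for u in updates])
print(noisy_sum)                  # close to the true sum, plus aggregate noise
print(np.sum(updates, axis=0))
```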
Talk title: "Co-Design for Low-Bitwidth Neural Networks with Dynamic Quantization"
Talk title: "Co-Design for Low-Bitwidth Neural Networks with Dynamic Quantization"
This talk presents our recent investigation into low-bitwidth quantization for deep neural networks (DNNs), using a co-design approach featuring contributions to both algorithms and hardware accelerators. We will introduce precision gating (PG), a dynamic, fine-grained, and trainable technique for DNN quantization. Unlike static approaches, PG exploits input-dependent dynamic sparsity at run time, resulting in a significant reduction in compute cost with a minimal impact on accuracy. We will also discuss FracBNN, which exploits PG to substantially improve the accuracy of binary neural networks (BNNs). Our experiments show that, for the first time, a BNN model can achieve MobileNetV2-level accuracy on ImageNet. On an embedded FPGA device, FracBNN demonstrates real-time image classification; it surpasses the best-known BNN design on FPGAs with an increase of 28.9% in top-1 accuracy and a 2.5x reduction in model size.
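In spirit, precision gating computes every output with cheap most-significant-bit (MSB) operands first and escalates only the outputs that clear a threshold to full precision. The quantizer, bit-widths, and threshold below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def quantize(x, bits):
    """Toy symmetric uniform quantizer."""
    scale = (2 ** (bits - 1) - 1) / (np.max(np.abs(x)) + 1e-8)
    return np.round(x * scale) / scale

def precision_gated_matmul(x, W, msb_bits=4, full_bits=8, delta=0.0):
    """Compute every output with cheap MSB-only operands first; recompute at
    full precision only the outputs whose MSB estimate clears the gate
    threshold `delta`. (In hardware the second stage merely adds the LSB
    partial products rather than redoing the whole product.)"""
    y_msb = quantize(x, msb_bits) @ quantize(W, msb_bits)     # cheap pass
    gate = y_msb > delta                                      # input-dependent
    y_full = quantize(x, full_bits) @ quantize(W, full_bits)  # precise pass
    return np.where(gate, y_full, y_msb), gate.mean()

rng = np.random.default_rng(0)
x, W = rng.standard_normal(64), rng.standard_normal((64, 32))
y, frac = precision_gated_matmul(x, W)
print(f"{frac:.0%} of outputs escalated to full precision")
```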
Break: 1:05pm - 1:25pm CST
Session 2: Hardware-aware Deep Learning Techniques
Talk title: "The Lottery Ticket Hypothesis: On Sparse, Trainable Neural Networks"
Talk title: "The Lottery Ticket Hypothesis: On Sparse, Trainable Neural Networks"
I recently proposed the lottery ticket hypothesis: that the dense neural networks we typically train have much smaller subnetworks capable of reaching full accuracy from early in training. This hypothesis raises (1) scientific questions about the nature of overparameterization in neural network optimization and (2) practical questions about our ability to accelerate training. In this talk, I will discuss established results and the latest developments in my line of work on the lottery ticket hypothesis, including the empirical evidence for these claims on small vision tasks, changes necessary to scale these ideas to practical settings, and the relationship between these subnetworks and their "stability" to the noise of stochastic gradient descent. I will also describe my vision for the future of research on this topic.
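The procedure behind these findings is iterative magnitude pruning (IMP) with rewinding. The toy loop below, with a placeholder train() function and assumed hyperparameters, sketches its structure: train to completion, prune the smallest-magnitude surviving weights, rewind the survivors to their early-training values, and repeat.

```python
import numpy as np

def train(weights, steps, seed=0):
    """Placeholder for SGD training: perturbs the weights deterministically."""
    rng = np.random.default_rng(seed + steps)
    return weights + 0.1 * rng.standard_normal(weights.shape)

def imp_with_rewinding(w_init, rounds=5, prune_frac=0.2, rewind_steps=1000):
    """Iterative magnitude pruning with rewinding. Rewinding to an
    early-training snapshot (rather than to initialization) is the change
    needed to make these ideas work at scale."""
    mask = np.ones_like(w_init)
    w_early = train(w_init, rewind_steps)            # snapshot early in training
    for _ in range(rounds):
        w_final = train(w_early * mask, steps=10_000)  # train to completion
        alive = np.abs(w_final[mask == 1])
        threshold = np.quantile(alive, prune_frac)     # cut the bottom 20%
        mask = mask * (np.abs(w_final) > threshold)
        # next round restarts the survivors from w_early (the rewind)
    return w_early * mask, mask

w0 = np.random.default_rng(1).standard_normal(1000)
ticket, mask = imp_with_rewinding(w0)
print(f"ticket sparsity: {1 - mask.mean():.1%}")
```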
Talk title: "Intelligent Visual Computing"
Talk title: "Intelligent Visual Computing"
Addressing the world's most pressing issues, such as environmental sustainability and cultural heritage preservation, increasingly relies on diverse visual applications running on emerging platforms such as AR/VR headsets, autonomous machines, and smart sensor nodes. Operating in real time and at low power, visual computing systems must either generate visual data for humans to consume immersively, or interpret visual data to provide personalized services intelligently.
In this talk, I will explain why today's computer systems and architectures are not ready for the visual computing on the horizon, and outline the road that might get us there. Fundamentally, we must innovate both horizontally, by co-designing across different domains that are conventionally designed and optimized in isolation, and vertically, by rethinking the systems stack for 3D perception.
Talk title: "Algorithm and Hardware Co-Design for Efficient Deep Learning: Sparse and Low-rank Perspectives"
Talk title: "Algorithm and Hardware Co-Design for Efficient Deep Learning: Sparse and Low-rank Perspectives"
Deep learning has served as the backbone technique in many intelligent systems. Given its large-scale and complicated nature, realizing real-time, energy-efficient deep learning calls for innovation in both algorithm design and hardware development. This talk will introduce our recent work toward efficient deep learning algorithms and hardware from sparse and low-rank perspectives.
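As a minimal sketch of the two structures (with assumed rank, sparsity level, and iteration count, not the speakers' actual algorithms), a weight matrix can be split into a low-rank term plus a sparse term by alternating a truncated SVD with magnitude thresholding:

```python
import numpy as np

def sparse_plus_lowrank(W, rank=4, sparsity=0.05, iters=20):
    """Toy alternating decomposition W ~ L + S, with L low-rank and S sparse.
    Both structures cut inference cost: multiplying by L costs O(r(m+n))
    instead of O(mn), and S touches only its few nonzeros."""
    S = np.zeros_like(W)
    for _ in range(iters):
        U, s, Vh = np.linalg.svd(W - S, full_matrices=False)  # low-rank step
        L = (U[:, :rank] * s[:rank]) @ Vh[:rank]
        R = W - L                                             # sparse step
        thresh = np.quantile(np.abs(R), 1 - sparsity)
        S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
L, S = sparse_plus_lowrank(W)
err = np.linalg.norm(W - L - S) / np.linalg.norm(W)
print(f"relative error: {err:.3f}, nnz(S): {np.count_nonzero(S)}")
```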
Keynote Talk 2
Talk title: "Democratizing TinyML: Generalization, Standardization and Automation"
Tiny machine learning (TinyML) is a fast-growing field at the intersection of ML algorithms and low-cost embedded systems. TinyML enables on-device analysis of sensor data (vision, audio, IMU, etc.) at ultra-low power consumption (<1 mW). Processing data close to the sensor allows for an expansive new variety of always-on ML use cases that conserve bandwidth and energy and reduce latency while improving responsiveness and maintaining privacy. This talk introduces the vision behind TinyML and showcases some of the interesting applications that TinyML is enabling in the field, from wildlife conservation to supporting public health initiatives. Yet there are still numerous technical hardware and software challenges to address: tight memory and storage constraints, MCU heterogeneity, software fragmentation, and a lack of relevant large-scale datasets pose a substantial barrier to developing TinyML applications. To this end, the talk touches upon some of the research opportunities for unlocking the full potential of TinyML.
Break: 3:25pm - 3:30pm CST
Session 3: Emerging Device and Neuromorphic Computing
Talk title: "NeuroSim Benchmark Framework"
Talk title: "NeuroSim Benchmark Framework"
DNN+NeuroSim is an integrated framework to benchmark compute-in-memory (CIM) accelerators for deep neural networks (DNNs), with hierarchical design options from the device level to the circuit level and up to the algorithm level. NeuroSim is a C++-based circuit-level macro model that enables fast, early-stage, pre-RTL design exploration (compared to a full SPICE simulation). It takes design parameters, including memory type (SRAM, RRAM, PCM, MRAM, or FeFET), non-ideal device parameters, transistor technology node (from 130 nm down to 7 nm), memory array size, and training dataset and traces, to estimate area, latency, dynamic energy, and leakage power. A Python wrapper interfaces NeuroSim with the deep learning platform PyTorch to support flexible network topologies, including VGG, DenseNet, and ResNet, for CIFAR/ImageNet classification. It supports weight/activation/gradient/error quantization at the algorithm level and accounts for the non-ideal properties of synaptic devices and peripheral circuits in order to estimate training/inference accuracy. The framework is open-sourced and publicly available on GitHub: https://github.com/neurosim/
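NeuroSim's actual configuration interface lives in the repository above; purely to illustrate the kind of device non-ideality it accounts for, the toy sketch below (all parameters and the single-layer "network" are assumptions of this sketch, not NeuroSim's API) quantizes weights to discrete conductance levels, injects an assumed amount of device variation, and reports how inference agreement with the ideal layer degrades:

```python
import numpy as np

def to_device_conductances(W, bits=4, sigma=0.05, rng=None):
    """Map weights onto a `bits`-bit synaptic device: quantize to discrete
    conductance levels, then add multiplicative device-to-device variation."""
    rng = rng if rng is not None else np.random.default_rng(0)
    levels = 2 ** bits - 1
    w_max = np.max(np.abs(W))
    q = np.round(W / w_max * levels) / levels * w_max   # quantize to levels
    return q * (1 + sigma * rng.standard_normal(W.shape))

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 10))          # toy single-layer "network"
X = rng.standard_normal((512, 16))
y_ideal = np.argmax(X @ W, axis=1)         # predictions of the ideal layer
for sigma in (0.0, 0.05, 0.2):
    W_cim = to_device_conductances(W, sigma=sigma, rng=rng)
    acc = np.mean(np.argmax(X @ W_cim, axis=1) == y_ideal)
    print(f"variation sigma={sigma}: agreement with ideal = {acc:.1%}")
```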
Talk title: "Memristive devices and arrays for computing"
CMOS technology has so far been the mainstream hardware technology enabling the development of ubiquitous information technology. In today's era of 'big data' and the 'Internet of Things', traditional computing architectures based on CMOS hardware have become increasingly inefficient at supporting Artificial Intelligence (AI) and Machine Learning (ML), which motivates emerging technologies such as memristive technology. Memristive devices have become one of the leading candidates for energy-efficient, high-throughput unconventional computing. I will first briefly introduce memristive devices and some recent progress. I will then present a few recent examples of using such devices experimentally for bio-inspired computing with different levels of bio-inspiration.
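The core computing primitive these arrays provide is an analog vector-matrix multiply: with weights stored as device conductances, Ohm's law and Kirchhoff's current law compute all the dot products in one step. A minimal sketch follows, with an assumed differential (two-column) encoding for signed weights; variation and wire resistance are omitted.

```python
import numpy as np

def weights_to_conductances(W, g_max=1e-4):
    """Split signed weights into two non-negative conductance arrays
    (differential encoding: one column pair per signed weight column)."""
    scale = g_max / np.max(np.abs(W))
    return np.maximum(W, 0) * scale, np.maximum(-W, 0) * scale, scale

def crossbar_vmm(V, G_pos, G_neg):
    """Analog vector-matrix multiply: row voltages drive the array, and by
    Ohm's and Kirchhoff's laws each column current is the dot product of
    the voltage vector with that column's conductances."""
    return V @ G_pos - V @ G_neg   # I = V . G, sign via the differential pair

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
V = 0.1 * rng.standard_normal(8)              # read voltages
G_pos, G_neg, scale = weights_to_conductances(W)
I = crossbar_vmm(V, G_pos, G_neg)             # column currents
assert np.allclose(I / scale, V @ W)          # proportional to V @ W
```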
Talk title: "Secure and Efficient Deep Learning Computing-in-Memory, A Software and Hardware Co-Design Perspective"
In-memory computing is a promising solution to the well-known ‘memory wall’ challenge: it processes data directly within the memory where the data is stored, reducing the massive, power-hungry data traffic between computing and memory units and thereby significantly improving overall system performance and energy efficiency. Many different memory technologies have been explored for processing-in-memory (PIM), also known as in-memory computing (IMC), designs, such as the emerging post-CMOS Magnetic Random Access Memory (MRAM), as well as Static Random Access Memory (SRAM) and Dynamic RAM (DRAM). In this talk, Prof. Deliang Fan, from Arizona State University (ASU), will present his recent research in energy-efficient and intelligent cross-layer processing-in-memory design for deep learning, spanning from MRAM memory devices and circuits to in-memory computing architecture and algorithm co-optimization, intrinsically integrating memory and processing units. He will present the software-hardware co-design of PIM for improving deep learning system efficiency, as well as vulnerability research on fault injection into memory hardware. Please refer to https://dfan.engineering.asu.edu/ for more details about Dr. Fan’s research.
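On the vulnerability side, a common threat model in this line of work is a small number of bit flips in the stored weights (e.g., RowHammer-style faults). The sketch below, with assumed int8 weights and a toy dot product, shows why a single most-significant-bit flip, which changes a weight's value by 128 in two's complement, can meaningfully shift a model's output:

```python
import numpy as np

def flip_bit(weights_int8, index, bit):
    """Inject a single bit-flip fault into one stored int8 weight."""
    corrupted = weights_int8.copy()
    corrupted.view(np.uint8)[index] ^= np.uint8(1 << bit)  # flip in place
    return corrupted

rng = np.random.default_rng(0)
W = rng.integers(-128, 128, size=100, dtype=np.int8)
x = rng.standard_normal(100)

y_clean = x @ W.astype(np.float64)
# flipping bit 7 (the sign bit) changes one weight's value by 128
y_faulty = x @ flip_bit(W, index=0, bit=7).astype(np.float64)
print(f"output shift from a single bit flip: {abs(y_faulty - y_clean):.2f}")
```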