The tutorial aims to educate the research community on emerging challenges and solutions in automating the complicated design process of deep learning algorithms and hardware, and to bring together researchers, educators, and practitioners who are interested in automating the co-design of DNNs and hardware for deployment on resource-constrained devices.
Recent breakthroughs of deep neural networks (DNNs) across various artificial intelligence (AI) applications have fueled a growing research interest in designing efficient DNNs, aiming to bring powerful yet power-hungry DNNs onto resource-constrained edge devices. Among the proposed techniques, hardware-aware Neural Architecture Search (HW-NAS) has emerged as one of the most promising, thanks to its strong performance and flexible design procedure. HW-NAS automates the design of efficient DNN structures for different applications and devices based on the feedback of specified hardware-cost metrics (e.g., latency or energy).

Despite its promising performance, developing optimal HW-NAS solutions is prohibitively challenging: the search requires computationally expensive training and often demands cross-disciplinary knowledge spanning algorithms, micro-architecture, and device-specific compilation, posing a barrier to entry for most developers. On the hardware side, designing DNN accelerators is also non-trivial: it can take a large team of hardware experts months or even years to design a single DNN accelerator because of the numerous design choices spanning dataflows, processing elements, memory hierarchy, etc.

Furthermore, it has recently been recognized that optimal DNN accelerators require joint consideration of three different yet coupled aspects: the network structure, the network precision, and the accelerator itself. Exploring only a subset of these aspects leads to suboptimal hardware efficiency or task accuracy, yet jointly designing or searching over all three has only been lightly explored. Moreover, there are few systematic studies of different hardware platforms' distinct performance/energy/area trade-offs on different applications that could provide useful insights.
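To make the hardware-cost feedback loop concrete, below is a minimal sketch of hardware-aware search. All function names, the accuracy proxy, and the latency model are illustrative placeholders rather than components of any specific work discussed here; a real HW-NAS system would plug in a trained predictor or supernet and on-device measurements.

```python
# Minimal sketch of hardware-aware NAS feedback (all names are placeholders):
# candidate architectures are scored by a task metric and a measured hardware
# cost (e.g., latency), and the search keeps the best trade-off.
import random

def accuracy_proxy(arch):
    # Placeholder for a trained supernet / predictor that estimates accuracy.
    return random.uniform(0.6, 0.8)

def measure_latency_ms(arch):
    # Placeholder for an on-device measurement or a latency lookup table.
    return 5.0 + 0.5 * arch["depth"] + 0.01 * arch["width"]

def hw_aware_score(arch, latency_budget_ms=20.0, penalty=0.05):
    # Soft-penalize candidates that exceed the latency budget.
    acc = accuracy_proxy(arch)
    lat = measure_latency_ms(arch)
    over = max(0.0, lat - latency_budget_ms)
    return acc - penalty * over, acc, lat

def random_search(num_samples=100):
    best = None
    for _ in range(num_samples):
        arch = {"depth": random.choice([8, 12, 16, 20]),
                "width": random.choice([64, 128, 256])}
        score, acc, lat = hw_aware_score(arch)
        if best is None or score > best[0]:
            best = (score, acc, lat, arch)
    return best

if __name__ == "__main__":
    score, acc, lat, arch = random_search()
    print(f"best arch={arch}, est. accuracy={acc:.3f}, latency={lat:.1f} ms")
```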
Model-Accelerator Co-design Series [ISCA 2022, HPCA 2022, HPCA 2023, GitHub: 1, 2, 3]
We innovate algorithm and hardware-accelerator co-design techniques to reduce the latency, energy, and chip area of deep neural network (DNN) inference.
On the algorithm side, we leverage model compression methods to trim down different levels of redundancy, e.g., in depth, layers, and MAC operations (a small structured-pruning sketch follows this list).
On the hardware side, we design dedicated accelerators that cooperate with the algorithms to reduce irregular sparse accesses and data movements.
Such algorithm-hardware co-design strategies push forward the frontier of accuracy-efficiency trade-offs across a wide range of applications, e.g., eye tracking in AR/VR and autonomous driving.
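As a small, hedged illustration of the compression side, the sketch below shows a generic structured channel-pruning heuristic (assumed for illustration, not taken from the papers in this series): channels with the smallest L1 norm are dropped so the remaining computation stays dense and accelerator-friendly, with fewer MACs and no irregular sparse accesses.

```python
# Generic structured channel pruning by L1 norm (illustrative only).
import numpy as np

def prune_channels(weight, keep_ratio=0.5):
    """weight: (out_channels, in_channels, kH, kW) convolution kernel."""
    out_ch = weight.shape[0]
    keep = max(1, int(out_ch * keep_ratio))
    # Rank output channels by their L1 norm and keep the largest ones.
    scores = np.abs(weight).reshape(out_ch, -1).sum(axis=1)
    kept_idx = np.sort(np.argsort(scores)[-keep:])
    return weight[kept_idx], kept_idx

if __name__ == "__main__":
    w = np.random.randn(64, 32, 3, 3)
    pruned, idx = prune_channels(w, keep_ratio=0.25)
    macs_before = 64 * 32 * 3 * 3
    macs_after = pruned.shape[0] * 32 * 3 * 3
    print(f"kept {pruned.shape[0]}/64 channels, MACs per output pixel: "
          f"{macs_before} -> {macs_after}")
```

Keeping the pruned computation dense in this way is one option; the dedicated accelerators in this series instead handle the remaining irregularity directly in hardware.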
Auto-NBA: An automated framework that jointly searches over the Networks, Bitwidths, and Accelerators (a toy sketch of this joint space follows this list).
It efficiently localizes the optimal design within the huge joint design space for each target dataset and accelerator specification.
Auto-NBA-generated networks and accelerators consistently outperform state-of-the-art designs in terms of search time, task accuracy, and accelerator efficiency.
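To illustrate why these three aspects are coupled, the toy enumeration below scores every combination of a small network, bitwidth, and accelerator configuration jointly, since the best accelerator setting depends on the chosen network and precision. The search spaces, proxies, and weighting here are purely illustrative placeholders, not Auto-NBA's actual search algorithm or cost models.

```python
# Toy joint exploration of networks x bitwidths x accelerators (illustrative).
import itertools

NETWORKS = [{"depth": d, "width": w} for d in (10, 20) for w in (64, 128)]
BITWIDTHS = [4, 8]
ACCELERATORS = [{"pe_array": pe, "buffer_kb": buf}
                for pe in (64, 256) for buf in (128, 512)]

def accuracy_proxy(net, bits):
    # Placeholder: deeper/wider networks and higher precision score higher.
    return 0.5 + 0.001 * net["depth"] + 0.0005 * net["width"] + 0.01 * bits

def efficiency_cost(net, bits, acc_cfg):
    # Placeholder efficiency estimate: more compute costs more; larger PE
    # arrays, bigger buffers, and lower bitwidths help.
    macs = net["depth"] * net["width"] ** 2
    return macs * bits / (acc_cfg["pe_array"] * acc_cfg["buffer_kb"])

best = None
for net, bits, acc_cfg in itertools.product(NETWORKS, BITWIDTHS, ACCELERATORS):
    score = accuracy_proxy(net, bits) - 1e-6 * efficiency_cost(net, bits, acc_cfg)
    if best is None or score > best[0]:
        best = (score, net, bits, acc_cfg)

print("best joint design:", best[1:])
```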
HW-NAS-Bench [ICLR 2021 Spotlight, GitHub]:
The first public dataset for HW-NAS research, aiming to democratize HW-NAS to non-hardware experts and to provide a unified benchmark, built on the SOTA NAS search spaces of NAS-Bench-201 and FBNet, that makes HW-NAS research more reproducible and accessible.
HW-NAS-Bench enhances these search spaces by providing the measured/estimated hardware cost of all the 46,875 (NAS-Bench-201) and 10^21 (FBNet) architectures on six hardware devices spanning the three categories primarily targeted by HW-NAS works (i.e., commercial edge devices, FPGA, and ASIC).
We demonstrate exemplary use cases showing that HW-NAS-Bench allows non-hardware experts to perform HW-NAS by simply querying our pre-measured dataset, and verify that dedicated device-specific HW-NAS can indeed lead to optimal accuracy-cost trade-offs.
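A typical query looks roughly like the sketch below. The data-file name and API calls follow the usage pattern documented in the HW-NAS-Bench repository and should be checked against the current release, as they may differ.

```python
# Rough sketch of querying HW-NAS-Bench; file and API names follow the
# repository's documented usage and may differ between releases.
from hw_nas_bench_api import HWNASBenchAPI as HWAPI

# Load the pre-measured/estimated hardware-cost records for one search space.
hw_api = HWAPI("HW-NAS-Bench-v1_0.pickle", search_space="nasbench201")

# Hardware costs (e.g., latency and energy on the six devices) of the
# architecture with index 0 when targeting CIFAR-10.
metrics = hw_api.query_by_index(0, "cifar10")
for metric_name, value in metrics.items():
    print(metric_name, value)
```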
DNN-Chip Predictor [ICASSP 2020, GitHub]:
An analytical tool that accurately predicts various performance metrics of DNN accelerators, such as energy, throughput, and latency, to facilitate fast design space exploration and optimization before actual ASIC/FPGA implementation (a simplified analytical sketch follows below).
DNN-Chip Predictor enables fast and effective DNN accelerator development and has been validated across different DNN models and accelerator designs (i.e., architectures, dataflows, etc.).
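To give a rough flavor of what such an analytical model computes, the sketch below uses a simplified roofline-style estimate with assumed constants; these are not the actual DNN-Chip Predictor equations. Per-layer latency is bounded by the slower of compute and off-chip memory traffic, and energy sums per-MAC and per-DRAM-byte costs.

```python
# Simplified roofline-style analytical model (illustrative constants only).
def predict_layer(macs, dram_bytes,
                  peak_macs_per_cycle=1024, dram_bytes_per_cycle=64,
                  energy_per_mac_pj=0.5, energy_per_dram_byte_pj=100.0,
                  clock_ghz=1.0):
    compute_cycles = macs / peak_macs_per_cycle
    memory_cycles = dram_bytes / dram_bytes_per_cycle
    # 1 GHz = 1e3 cycles per microsecond.
    latency_us = max(compute_cycles, memory_cycles) / (clock_ghz * 1e3)
    # 1 pJ = 1e-6 uJ.
    energy_uj = (macs * energy_per_mac_pj +
                 dram_bytes * energy_per_dram_byte_pj) * 1e-6
    return latency_us, energy_uj

# Example: a 3x3 conv layer on a 256x256 feature map, 64 -> 64 channels,
# assuming 1-byte activations/weights with no on-chip reuse modeled.
macs = 256 * 256 * 64 * 64 * 9
dram_bytes = 2 * 256 * 256 * 64 + 64 * 64 * 9
lat, en = predict_layer(macs, dram_bytes)
print(f"predicted latency ~{lat:.0f} us, energy ~{en:.0f} uJ")
```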
Video Recording: https://www.dropbox.com/s/3n83eyit4a660tj/GMT20230618-123348_Recording.cutfile.20230628180530729_1790x956.mp4?dl=0