In this lab, we study how to optimize advanced machine learning algorithms and bring them into real systems, so that their outstanding performance can actually be exploited in practice. A deeper understanding of both computing systems and emerging algorithms opens up new opportunities for high efficiency. Rather than relying on hand-crafted optimizations, we often use machine learning itself to search for the best solution among candidates. In short, our ultimate goal is the acceleration of machine learning algorithms, and we are happy to use machine learning algorithms as the tool for that acceleration.
LLMs · quantization · model compression · inference efficiency · parameter-efficient fine-tuning · AutoML
Large Language Model (LLM) Optimization research focuses on making enormous language models more efficient, adaptable, and practical. This includes techniques to compress models, merge knowledge from multiple specialized models, and extend their context lengths without sacrificing performance. By optimizing LLMs, we address the pressing real-world need to deploy advanced language AI on limited hardware and ensure they remain cost-effective and fast. This research is crucial as LLMs are increasingly used in applications from virtual assistants to scientific analysis, where improvements in speed or memory footprint can significantly broaden their impact.
GraLoRA: Granular Low-Rank Adaptation for Stable Fine-Tuning (NeurIPS 2025, Spotlight):
GraLoRA enhances the stability and precision of LLM fine-tuning through a granular low-rank adaptation strategy that learns separate adapters for smaller matrix partitions. This fine-grained structure prevents overfitting and abrupt performance drops, yielding smoother optimization. The result is full fine-tuning performance with far fewer trainable parameters and greater robustness to hyperparameters and noise.
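A minimal sketch of the underlying idea, assuming a PyTorch setting: the frozen weight is viewed as a grid of sub-blocks, and each block gets its own small low-rank adapter. The class and variable names below are illustrative, not the released GraLoRA code.

```python
# Illustrative block-partitioned low-rank adapter (not the authors' implementation).
import torch
import torch.nn as nn

class BlockwiseLoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=4, blocks=2):
        super().__init__()
        assert in_features % blocks == 0 and out_features % blocks == 0
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)          # frozen pretrained weight
        self.blocks = blocks
        bi, bo = in_features // blocks, out_features // blocks
        # One (A, B) pair per (output-block, input-block) partition of the weight.
        self.A = nn.ParameterList([nn.Parameter(torch.randn(rank, bi) * 0.01)
                                   for _ in range(blocks * blocks)])
        self.B = nn.ParameterList([nn.Parameter(torch.zeros(bo, rank))
                                   for _ in range(blocks * blocks)])

    def forward(self, x):
        bi = x.shape[-1] // self.blocks
        out_blocks = []
        for r in range(self.blocks):          # output block index
            acc = 0
            for c in range(self.blocks):      # input block index
                idx = r * self.blocks + c
                xc = x[..., c * bi:(c + 1) * bi]
                acc = acc + (xc @ self.A[idx].T) @ self.B[idx].T
            out_blocks.append(acc)
        return self.base(x) + torch.cat(out_blocks, dim=-1)

layer = BlockwiseLoRALinear(64, 64, rank=4, blocks=2)
print(layer(torch.randn(3, 64)).shape)  # torch.Size([3, 64])
```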
AMQ: Enabling AutoML for Mixed-Precision Weight-Only Quantization of Large Language Models (EMNLP 2025, Oral):
AMQ introduces an AutoML-based quantization framework that automatically allocates mixed-precision bit-widths across LLM layers for optimal efficiency. It integrates pruning, proxy modeling, quality prediction, and iterative refinement to identify Pareto-optimal configurations in a single search. The method surpasses previous mixed-precision quantization methods, achieving faster inference and better accuracy under strict memory budgets.
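As a rough illustration of the search problem AMQ addresses, the sketch below greedily assigns per-layer bit-widths under a memory budget using a stand-in quality predictor. The layer sizes, candidate bit-widths, and the proxy_quality function are invented for the example; the actual framework relies on much stronger proxies and search.

```python
# Hypothetical mixed-precision bit-width search under a memory budget.
layer_params = {"q_proj": 4e6, "k_proj": 4e6, "v_proj": 4e6, "mlp_up": 11e6, "mlp_down": 11e6}
candidate_bits = [2, 3, 4]

def proxy_quality(assignment):
    # Stand-in quality predictor: prefer more bits, weight MLP layers more heavily.
    return sum(bits * (2.0 if "mlp" in name else 1.0)
               for name, bits in assignment.items())

def memory_bytes(assignment):
    return sum(layer_params[name] * bits / 8 for name, bits in assignment.items())

budget = 16e6  # bytes
# Start from the cheapest configuration, then greedily upgrade the layer whose
# bit-width increase gives the best predicted quality gain per extra byte.
assignment = {name: min(candidate_bits) for name in layer_params}
improved = True
while improved:
    improved = False
    best = None
    for name, bits in assignment.items():
        higher = [b for b in candidate_bits if b > bits]
        if not higher:
            continue
        trial = dict(assignment, **{name: higher[0]})
        if memory_bytes(trial) > budget:
            continue
        gain = (proxy_quality(trial) - proxy_quality(assignment)) / \
               (memory_bytes(trial) - memory_bytes(assignment))
        if best is None or gain > best[0]:
            best = (gain, trial)
    if best is not None:
        assignment = best[1]
        improved = True

print(assignment, memory_bytes(assignment) / 1e6, "MB")
```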
HOT: Hadamard-based Optimized Training (CVPR 2025):
HOT introduces a Hadamard-based optimization technique that enhances the efficiency and stability of large-scale model training. By replacing conventional dense projection matrices with structured Hadamard transforms, it reduces memory consumption and computational cost while preserving representational power. This approach accelerates training convergence and improves parameter efficiency, providing a scalable and lightweight alternative for training large neural networks.
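The structured-transform idea can be illustrated with a fixed Hadamard matrix used as an orthogonal, weight-free projection; this is only a conceptual sketch, not the training pipeline described in the paper, and the fast O(n log n) Walsh-Hadamard routine is omitted for brevity.

```python
# Conceptual sketch: a Hadamard matrix as a structured projection with no stored weights.
import numpy as np
from scipy.linalg import hadamard

n = 256                                             # must be a power of two
H = hadamard(n).astype(np.float32) / np.sqrt(n)     # orthonormal Hadamard matrix

x = np.random.randn(8, n).astype(np.float32)        # a batch of activations
y = x @ H                                           # structured projection
x_rec = y @ H.T                                     # H is orthogonal, so it is invertible
print(np.allclose(x, x_rec, atol=1e-4))             # True
```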
QEFT: Quantization for Efficient Fine-Tuning of LLMs (Findings of ACL 2024):
QEFT enables low-precision fine-tuning of LLMs to reduce memory and compute demands while maintaining model quality. By analyzing how quantization affects training dynamics, QEFT shows that quantization-aware fine-tuning can retain task performance with minimal degradation. This approach makes model adaptation lightweight, practical, and resource-efficient for large-scale deployment.
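A generic fake-quantization sketch with a straight-through estimator gives a feel for quantization-aware fine-tuning; it is not the QEFT recipe itself, and the bit-width and scaling choices below are placeholders.

```python
# Generic fake-quantization with a straight-through estimator (illustrative only).
import torch

def fake_quantize(w, bits=4):
    # Symmetric per-tensor quantization; gradients pass straight through the rounding.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()   # forward uses w_q, backward sees the identity

w = torch.randn(16, 16, requires_grad=True)
x = torch.randn(4, 16)
loss = (x @ fake_quantize(w)).pow(2).mean()
loss.backward()
print(w.grad.shape)                 # gradients flow to the full-precision weights
```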
Outlier-Aware Weight Quantization for Efficient Fine-Tuning of LLMs (AAAI 2024, Oral):
This work proposes an outlier-aware quantization method that preserves the rare but important outlier weights during compression. By dynamically assigning higher precision to these outlier weights, it minimizes performance loss in low-bit settings. The technique significantly reduces LLM size while maintaining accuracy, enabling deployment on edge and memory-constrained systems.
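The core mechanism can be sketched in a few lines: pick the columns with the largest magnitudes as outliers, keep them in full precision, and quantize the rest to low bits. The ratio, grouping, and selection criterion below are illustrative simplifications of the paper's method.

```python
# Toy outlier-aware weight quantization: large-magnitude columns stay in full precision.
import numpy as np

def quantize_with_outliers(W, bits=4, outlier_ratio=0.01):
    qmax = 2 ** (bits - 1) - 1
    col_norm = np.abs(W).max(axis=0)
    k = max(1, int(outlier_ratio * W.shape[1]))
    outlier_cols = np.argsort(col_norm)[-k:]           # keep these in full precision
    mask = np.ones(W.shape[1], dtype=bool)
    mask[outlier_cols] = False
    W_hat = W.copy()
    scale = np.abs(W[:, mask]).max() / qmax
    W_hat[:, mask] = np.clip(np.round(W[:, mask] / scale), -qmax - 1, qmax) * scale
    return W_hat, outlier_cols

W = np.random.randn(128, 512).astype(np.float32)
W[:, 7] *= 50.0                                        # inject an outlier column
W_hat, outliers = quantize_with_outliers(W)
print(outliers, np.abs(W - W_hat).max())
```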
Diffusion models · autoregressive generation · visual synthesis · parallelization · latency reduction · efficiency
Visual Generative Model Acceleration research is about speeding up image and video generation without compromising output quality. Modern generative models like diffusion models and autoregressive image transformers can create stunning visuals but often require lengthy sequential computations. Our work in this category develops novel algorithms to make these models run faster and more efficiently, which is vital for real-world use cases (e.g. real-time graphics, interactive design tools) where slow generation is a bottleneck. We focus on clever reuse of computations and parallel generation techniques so that creative AI applications become more responsive and scalable.
Grouped Speculative Decoding for Autoregressive Image Generation (ICCV 2025):
GSD adapts speculative decoding, originally from text generation, to autoregressive image models by predicting and verifying groups of tokens at once. A lightweight draft model proposes multiple future tokens, which the main model validates in a single forward pass. This grouping strategy achieves up to 4× faster image generation while maintaining visual quality.
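A toy draft-and-verify loop makes the control flow concrete; the draft_model and target_model stand-ins below are placeholders rather than real networks, and the acceptance rule is the usual greedy prefix check rather than the paper's grouped variant.

```python
# Toy speculative-decoding loop: a cheap draft proposes a group of tokens, and the
# target checks the whole group in one pass, keeping the longest prefix it agrees with.
import random

def draft_model(prefix, k):
    return [random.randint(0, 9) for _ in range(k)]        # k proposed tokens

def target_model(prefix, proposed):
    # Stand-in verifier: the target's own choice at each proposed position.
    return [(sum(prefix) + i) % 10 for i in range(len(proposed))]

def speculative_generate(prefix, steps=4, group=4):
    tokens = list(prefix)
    for _ in range(steps):
        proposal = draft_model(tokens, group)
        verified = target_model(tokens, proposal)          # one "forward pass"
        accepted = []
        for p, v in zip(proposal, verified):
            if p != v:
                accepted.append(v)                         # correct the first mismatch and stop
                break
            accepted.append(p)
        tokens.extend(accepted)
    return tokens

print(speculative_generate([1, 2, 3]))
```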
Picard Consistency Model for Fast Parallel Sampling of Diffusion Models (CVPR 2025):
PCM introduces a parallelizable diffusion framework inspired by Picard iteration to overcome sequential sampling limitations. It predicts and refines denoised outputs in parallel, drastically cutting inference time while maintaining quality. Beyond image generation, PCM accelerates action policy generation in robotics, offering a pathway to real-time embodied intelligence.
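The Picard-iteration view can be demonstrated on a toy ODE: the whole trajectory is treated as the fixed point of an integral equation and refined in parallel rather than integrated step by step. This sketch only illustrates the numerical idea, not the consistency-model training used in PCM.

```python
# Picard iteration on x(t) = x(0) + \int_0^t f(x(s), s) ds: every timestep is
# updated at once per iteration, instead of sequential time stepping.
import numpy as np

def f(x, t):
    return -x                        # toy dynamics with known solution x(t) = x0 * exp(-t)

t = np.linspace(0.0, 1.0, 65)
dt = t[1] - t[0]
x0 = 1.0
x = np.full_like(t, x0)              # initial guess: a constant trajectory

for _ in range(20):                  # each Picard iteration refines all timesteps in parallel
    drift = f(x, t)
    integral = np.concatenate([[0.0], np.cumsum(0.5 * (drift[1:] + drift[:-1]) * dt)])
    x = x0 + integral

print(np.max(np.abs(x - x0 * np.exp(-t))))   # small residual vs. the exact solution
```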
PTQ4VM: Post-Training Quantization for Visual Mamba (WACV 2025):
PTQ4VM introduces the first post-training quantization framework designed specifically for Visual Mamba, addressing the unique challenges of state-space model quantization. By modeling the sensitivity of Mamba’s selective update and state-transition mechanisms, it maintains stability under low-bit settings. The method achieves significant memory and latency reductions while preserving visual accuracy, setting a foundation for efficient deployment of Mamba-based vision models.
FRDiff: Feature Reuse for Training-Free Acceleration of Diffusion Models (ECCV 2024):
FRDiff accelerates diffusion inference by exploiting feature redundancy across consecutive denoising steps. Instead of recomputing similar intermediate features, it reuses and refines them, reducing computation without retraining. This method delivers a Pareto-optimal trade-off between speed and quality, offering a simple, training-free path to faster diffusion generation.
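A schematic loop shows the reuse pattern: the expensive block is recomputed only every few denoising steps and its cached output is reused in between. The functions below are stand-ins, and the real method decides what to reuse based on measured feature similarity rather than a fixed interval.

```python
# Schematic feature reuse across denoising steps (toy stand-ins, not FRDiff itself).
import numpy as np

def expensive_block(x, t):
    return np.tanh(x + 0.01 * t)        # stand-in for a heavy UNet block

def cheap_head(feat, x, t):
    return x - 0.05 * feat              # stand-in for the lightweight remainder of a step

def denoise(x, steps=50, refresh_every=5):
    cached = None
    for t in reversed(range(steps)):
        if cached is None or t % refresh_every == 0:
            cached = expensive_block(x, t)   # recompute occasionally
        x = cheap_head(cached, x, t)         # reuse the cached feature otherwise
    return x

print(denoise(np.random.randn(4)))
```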
Temporal Dynamic Quantization for Diffusion Models (NeurIPS 2023):
TDQ introduces a step-aware quantization strategy for diffusion models, dynamically adjusting precision across denoising stages. Early, noise-heavy steps use higher precision for stability, while later steps adopt lower precision for efficiency. This temporal adaptation significantly reduces memory and compute costs without compromising image fidelity.
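A toy step-dependent schedule conveys the idea: noisier timesteps get more bits and later timesteps fewer. The thresholds and bit-widths below are made up, and TDQ derives its precision policy rather than hard-coding one.

```python
# Toy step-dependent quantization schedule for diffusion sampling (illustrative numbers).
import numpy as np

def bits_for_step(t, total_steps):
    frac = t / total_steps              # t counts down from total_steps to 0 during sampling
    return 8 if frac > 0.7 else 6 if frac > 0.3 else 4

def quantize(x, bits):
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax + 1e-8
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

total = 50
x = np.random.randn(1024)
for t in reversed(range(total)):
    x_q = quantize(x, bits_for_step(t, total))   # per-step precision for activations
print("final step used", bits_for_step(0, total), "bits")
```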
Robot learning · generative control · visuomotor policy · retrieval-based demonstration · embodied cognition · adaptive behavior
Efficient Embodied Intelligence research develops AI policies that let robots and embodied systems act naturally and reliably in complex real-world environments, while making the most of limited computational resources. The focus is on integrating powerful generative AI (like diffusion models) into robot control loops in a way that balances deliberation and reactivity. This is important because robots operating in homes, hospitals, or factories need to respond swiftly to the unexpected (robustness) yet also execute smooth, coherent actions (consistency). Our work in this area bridges cutting-edge AI planning algorithms with practical robotics, ensuring that intelligent agents can perform diverse tasks efficiently and adapt on the fly to new situations.
Retrieval-Based Demonstration Refinement for Robot Manipulation (Ongoing Work):
This work proposes a retrieval-driven learning framework that allows robots to learn continuously from an expanding repository of expert demonstrations without retraining. By embedding visual, linguistic, and motor information in a shared space, the robot retrieves the most relevant examples and refines its behavior through imitation. This approach enables scalable, adaptive robot learning that generalizes across tasks while eliminating the cost of repeated fine-tuning.
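The retrieval step itself is simple to sketch: embed the current observation, score it against a repository of demonstration embeddings, and return the nearest examples. The embeddings below are random placeholders, and the shared visual-language-motor encoder is not shown.

```python
# Toy nearest-neighbour lookup over a demonstration repository (made-up embeddings).
import numpy as np

demo_embeddings = np.random.randn(1000, 128)                    # repository of demonstrations
demo_embeddings /= np.linalg.norm(demo_embeddings, axis=1, keepdims=True)

def retrieve(query_embedding, k=5):
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = demo_embeddings @ q                                # cosine similarity
    return np.argsort(scores)[-k:][::-1]                        # indices of the k closest demos

print(retrieve(np.random.randn(128)))
```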
Improving Generative Behavior Cloning via Self-Guidance and Adaptive Chunking (NeurIPS 2025):
This research enhances diffusion-based robot control by integrating self-guidance and adaptive chunking into behavior cloning. Self-guidance incorporates recent observations to improve safety and accuracy, while adaptive chunking enables dynamic replanning when conditions change. Together, they significantly boost task success and computational efficiency, achieving robust, responsive control for complex manipulation tasks.
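A schematic control loop illustrates adaptive chunking: a chunk of actions is executed open-loop, but the policy replans early whenever the observation drifts from what was expected at planning time. The policy, dynamics, and threshold below are stand-ins, not the paper's models.

```python
# Schematic adaptive-chunking control loop with stand-in policy and dynamics.
import numpy as np

def plan_chunk(obs, horizon=8):
    actions = [0.5 * obs for _ in range(horizon)]                 # stand-in policy output
    predicted = [obs * 0.95 ** (i + 1) for i in range(horizon)]   # what we expect to observe
    return actions, predicted

def step_env(obs, action):
    return obs - 0.1 * action + np.random.randn() * 0.01          # stand-in dynamics

obs, t = 1.0, 0
while t < 40:
    actions, predicted = plan_chunk(obs)
    for action, expected in zip(actions, predicted):
        obs = step_env(obs, action)
        t += 1
        if abs(obs - expected) > 0.2:   # observation deviates from the plan: replan early
            break
print("final observation:", obs)
```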
Picard Consistency Model for Fast Parallel Sampling of Diffusion Models (CVPR 2025):
PCM introduces a parallelizable diffusion framework inspired by Picard iteration to overcome sequential sampling limitations. It predicts and refines denoised outputs in parallel, drastically cutting inference time while maintaining quality. Beyond image generation, PCM accelerates action policy generation in robotics, offering a pathway to real-time embodied intelligence.
System optimization · hardware–software co-design · resource scheduling · hybrid parallelism · processing-in-memory (PIM) · scalability
Performant System Design for AI Applications research focuses on the hardware-software co-design needed to run modern AI models at peak efficiency. As AI models grow in size and complexity, conventional computing architectures (CPUs, GPUs) often become the bottleneck – especially for memory-hungry operations. This research area addresses how to redesign systems and algorithms together: from leveraging novel hardware like Processing-in-Memory (PIM) to automating parallelization strategies for distributed AI training. By building systems that are tailored for AI workloads, we enable faster inference and training, lower energy consumption, and the ability to deploy advanced AI in a range of settings from cloud datacenters to mobile devices.
Automated Resource Allocation for Efficient Training and Inference (Ongoing Work):
This study presents a self-optimizing distributed AI system that automatically determines optimal parallelization strategies for large-scale model training and inference. Leveraging heuristic search and learned cost models, it dynamically balances data, tensor, and pipeline parallelism across heterogeneous hardware. The approach maximizes throughput and minimizes energy consumption, enabling scalable, adaptive resource management without manual tuning.
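A hypothetical cost-model-driven search gives a flavor of the problem: enumerate (data, tensor, pipeline) parallelism degrees that fit the GPU count and pick the one with the lowest predicted step time. All constants and cost formulas below are invented for illustration; the actual system uses learned cost models and a richer search.

```python
# Hypothetical parallelism search over a toy analytical cost model.
from itertools import product

GPUS = 16
MODEL_FLOPS, PARAMS, BANDWIDTH, PEAK = 6e12, 7e9, 100e9, 300e12   # made-up hardware/model numbers

def predicted_step_time(dp, tp, pp):
    compute = MODEL_FLOPS / (tp * pp) / PEAK                               # per-GPU compute time
    grad_sync = 2 * (PARAMS / (tp * pp)) * 4 / BANDWIDTH * (dp - 1) / dp   # data-parallel all-reduce
    bubble = (pp - 1) * compute * 0.1                                      # crude pipeline-bubble penalty
    return compute + grad_sync + bubble

configs = [(dp, tp, pp) for dp, tp, pp in product([1, 2, 4, 8, 16], repeat=3)
           if dp * tp * pp == GPUS]
best = min(configs, key=lambda c: predicted_step_time(*c))
print("best (dp, tp, pp):", best)
```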
Cost-Effective Extension of DRAM-PIM for Group-wise LLM Quantization (IEEE CAL 2025):
This work proposes a hardware–algorithm co-design approach that extends DRAM-based Processing-in-Memory (PIM) to efficiently handle group-wise quantized LLM operations. By integrating quantization-aware dataflow directly into memory, it reduces data movement bottlenecks and improves throughput. The result is a cost-effective system that accelerates inference for large-scale models on memory-bound platforms.
Fast Performance Prediction for Efficient Distributed DNN Training (IEEE CAL 2023):
This work presents a lightweight performance prediction model that enables efficient planning of distributed deep neural network (DNN) training. By accurately estimating computation and communication costs under various parallelism strategies, it eliminates the need for expensive full-scale profiling. The proposed method reduces exploration time by orders of magnitude, allowing practitioners to rapidly identify optimal configurations for large-scale training environments.
Reliability · robustness · long-context reasoning · inference consistency · contrastive decoding · model merging
Robust and Reliable AI research aims to ensure that AI systems not only perform well under ideal conditions but also maintain their integrity and trustworthiness in the wild. As AI is deployed in high-stakes domains (from medical diagnosis to autonomous driving), it must be resilient to noisy inputs, adversarial perturbations, or shifts in context, and it should produce outputs that users can trust. Our work in this category ranges from mitigating issues like hallucinations in language models to techniques for stable learning across domains. Ultimately, this research strives to make AI behavior more predictable, transparent, and aligned with human expectations, which is essential for broader adoption of AI technologies.
PruneCD: Contrasting Pruned Self Model to Improve Decoding Factuality (EMNLP 2025):
PruneCD tackles the issue of hallucinations in LLMs through a contrastive decoding framework that compares outputs from a full model and a pruned “self” model. The pruned model provides corrective feedback during decoding, helping the main model avoid overconfident or inaccurate generations. This design enhances factual accuracy while maintaining inference speed, offering a lightweight path to more trustworthy LLM outputs.
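A generic contrastive-decoding step shows the mechanics: the pruned model's logits act as a contrast signal that down-weights tokens the weaker model is also confident about, restricted to tokens the full model already finds plausible. The exact scoring used by PruneCD may differ from this sketch.

```python
# Generic contrastive-decoding step with a plausibility constraint (illustrative).
import numpy as np

def contrastive_next_token(logits_full, logits_pruned, alpha=1.0, plaus_cut=0.1):
    p_full = np.exp(logits_full - logits_full.max())
    p_full /= p_full.sum()
    score = logits_full - alpha * logits_pruned          # contrast against the pruned model
    score[p_full < plaus_cut * p_full.max()] = -np.inf   # keep only plausible tokens
    return int(np.argmax(score))

vocab = 10
logits_full = np.random.randn(vocab)
logits_pruned = np.random.randn(vocab)
print(contrastive_next_token(logits_full, logits_pruned))
```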
SEAL: Scaling to Emphasize Attention for Long-Context Retrieval (ACL 2025):
SEAL introduces per-head and per-channel scaling within the self-attention mechanism to improve long-context understanding in LLMs. By adaptively emphasizing key information over extended inputs, SEAL boosts retrieval accuracy on long-sequence benchmarks. This method enables stable and efficient reasoning even as the model’s context window grows.
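The scaling itself is easy to sketch: learned per-head and per-channel factors multiply the attention output. Where SEAL inserts its scales and how they are calibrated is not captured by this toy module.

```python
# Toy per-head and per-channel scaling applied to attention outputs (illustrative placement).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledHeads(nn.Module):
    def __init__(self, n_heads, head_dim):
        super().__init__()
        self.head_scale = nn.Parameter(torch.ones(n_heads, 1, 1))    # one factor per head
        self.chan_scale = nn.Parameter(torch.ones(1, 1, head_dim))   # one factor per channel

    def forward(self, q, k, v):
        # q, k, v: (batch, heads, seq, head_dim)
        out = F.scaled_dot_product_attention(q, k, v)
        return out * self.head_scale * self.chan_scale

q = k = v = torch.randn(2, 8, 128, 64)
print(ScaledHeads(8, 64)(q, k, v).shape)   # torch.Size([2, 8, 128, 64])
```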
Merge-Friendly Domain Adaptation via Model Merging (2024):
This research proposes a parameter-merging approach to build multi-domain models from specialized experts without retraining. It resolves conflicts between domain-specific knowledge, ensuring each task’s performance is preserved after merging. The result is a robust, unified model capable of adapting to diverse domains—advancing continual and lifelong learning for reliable AI systems.
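As background, the simplest form of parameter merging is a weighted average of expert state dicts, sketched below; the paper's merge-friendly adaptation goes well beyond this naive average to resolve conflicts between domains.

```python
# Naive parameter merging: a weighted average of matching tensors from expert models.
import torch

def merge_state_dicts(dicts, weights):
    merged = {}
    for key in dicts[0]:
        merged[key] = sum(w * d[key] for d, w in zip(dicts, weights))
    return merged

expert_a = {"layer.weight": torch.randn(4, 4)}
expert_b = {"layer.weight": torch.randn(4, 4)}
merged = merge_state_dicts([expert_a, expert_b], [0.5, 0.5])
print(merged["layer.weight"].shape)
```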