Research
Real-Time Deep Learning Framework
This initiative develops a real-time scheduling framework that manages multiple Deep Neural Network (DNN) inference tasks in AI systems, which is vital for areas such as autonomous driving and healthcare. It tackles the critical challenges of DNN inference: balancing execution speed against accuracy, meeting high computational and memory demands, and coping with limited computing power, memory, and energy. The framework is designed to accurately model DNN behavior across heterogeneous resources, incorporate comprehensive timing structures, and remain transparent to users, ensuring both reliable timing and high accuracy in DNN operations.
Leveraging heterogeneous computing resources (e.g., CPUs and GPUs) for real-time DNN execution in embedded systems is difficult due to coarse-grained resource allocation, asymmetric DNN execution performance between CPUs and GPUs, and the absence of a schedulability-aware CPU/GPU allocation strategy.
LaLaRAND is a real-time, layer-level DNN scheduling framework that enables flexible CPU/GPU scheduling for individual DNN layers. It combines CPU-friendly quantization with fine-grained CPU/GPU allocation (assigning resources per layer instead of per task), effectively mitigating accuracy loss while ensuring timing guarantees.
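The per-layer allocation idea can be sketched as a simple greedy assignment. This is an illustrative toy, not LaLaRAND's actual algorithm: the function name, the per-layer timing tuples, and the "GPU time budget" model of contention are all assumptions made for the example.

```python
def allocate_layers(layer_times, gpu_budget_ms):
    """Toy layer-level CPU/GPU allocator (illustrative sketch only).

    layer_times: list of (cpu_ms, gpu_ms) tuples, one per DNN layer,
    where cpu_ms assumes the CPU-friendly quantized version of the layer.
    Start with every layer on the GPU; while total GPU demand exceeds the
    budget (a stand-in for schedulability under contention), move the layer
    with the smallest CPU slowdown onto the CPU.
    Returns (per-layer assignment, total end-to-end latency in ms).
    """
    assignment = ["GPU"] * len(layer_times)
    gpu_demand = sum(g for _, g in layer_times)
    # Candidate moves, ordered by how little running on the CPU hurts latency.
    candidates = sorted(range(len(layer_times)),
                        key=lambda i: layer_times[i][0] - layer_times[i][1])
    for i in candidates:
        if gpu_demand <= gpu_budget_ms:
            break
        assignment[i] = "CPU"
        gpu_demand -= layer_times[i][1]
    total = sum(layer_times[i][0] if assignment[i] == "CPU"
                else layer_times[i][1]
                for i in range(len(layer_times)))
    return assignment, total
```

For example, with three layers timed at (4, 1), (10, 2), and (3, 2) ms and a 3 ms GPU budget, the allocator keeps the first two layers on the GPU and moves the third (the cheapest to offload) onto the CPU.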
Real-time object detection systems, such as those used in autonomous vehicles, struggle to process images from multiple cameras within tight deadlines, since every frame demands both timely and precise inference.
DNN-SAM is a framework that enhances real-time object detection by splitting DNN tasks into critical and non-critical sub-tasks, executing them independently, and then merging the results. It schedules sub-tasks by criticality and adjusts image scale to improve accuracy and reduce latency, achieving better performance without violating timing constraints.
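The split-and-merge step can be illustrated with plain coordinate bookkeeping. This is a hypothetical sketch, not DNN-SAM's implementation: the function names, the dict-based sub-task description, and the box format (x, y, w, h) are assumptions for the example.

```python
def split_task(image_w, image_h, roi, scale=0.5):
    """Split one frame into two sub-tasks (illustrative sketch only).

    roi: (x, y, w, h) critical region, kept at full resolution.
    The rest of the frame becomes a non-critical sub-task, downscaled by
    `scale` to trade accuracy for latency.
    """
    critical = {"region": roi, "scale": 1.0}
    noncritical = {"region": (0, 0, image_w, image_h), "scale": scale}
    return critical, noncritical

def merge_detections(critical_dets, noncritical_dets, roi, scale):
    """Merge per-sub-task detections back into full-frame coordinates.

    Critical boxes are offset by the ROI origin; non-critical boxes are
    rescaled from the downscaled image back to the original resolution.
    """
    rx, ry, _, _ = roi
    merged = [(x + rx, y + ry, w, h) for (x, y, w, h) in critical_dets]
    merged += [(x / scale, y / scale, w / scale, h / scale)
               for (x, y, w, h) in noncritical_dets]
    return merged
```

The key point the sketch captures is that merging is not free: boxes from each sub-task live in different coordinate systems, so the merge step must undo both the crop offset and the downscaling before results can be combined.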
RT-Swap: Addressing GPU Memory Bottlenecks for Real-Time Multi-DNN Inference (RTAS 2024)
The GPU memory capacity bottleneck poses significant challenges when executing complex DNN tasks on real-time systems, especially when multiple DNN tasks must run simultaneously with timing guarantees.
RT-Swap is a framework that extends effective GPU memory capacity by efficiently swapping memory objects to CPU memory, enabling the simultaneous execution of complex DNN tasks in real-time systems.
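The basic swap-in/swap-out mechanism can be modeled with a small LRU pool. This is a toy model for intuition only: the class name, the LRU eviction policy, and the size-counting scheme are assumptions, not RT-Swap's actual (schedulability-aware) design.

```python
from collections import OrderedDict

class SwapManager:
    """Toy model of transparent GPU<->CPU memory swapping (sketch only).

    Memory objects live in a fixed-capacity 'GPU' pool; when an access
    would exceed capacity, least-recently-used objects are swapped out
    to 'CPU' memory to make room.
    """
    def __init__(self, gpu_capacity):
        self.capacity = gpu_capacity
        self.gpu = OrderedDict()   # name -> size, kept in LRU order
        self.cpu = {}              # objects currently swapped out

    def access(self, name, size):
        if name in self.gpu:
            self.gpu.move_to_end(name)   # mark as most recently used
            return "hit"
        self.cpu.pop(name, None)         # swap back in if it was on the CPU
        # Evict LRU objects until the new object fits on the GPU.
        while sum(self.gpu.values()) + size > self.capacity and self.gpu:
            victim, vsize = self.gpu.popitem(last=False)
            self.cpu[victim] = vsize     # swap-out to CPU memory
        self.gpu[name] = size
        return "swap-in"
```

In a real-time setting the interesting part, which this sketch omits, is bounding the swap latency so that worst-case response times remain analyzable; the toy only shows the capacity-extension mechanism itself.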