This initiative develops a real-time scheduling framework that manages multiple concurrent DNN inference tasks in AI systems, which is crucial for domains such as autonomous driving and robotics. The goal is to address key challenges of DNN workloads: balancing execution speed and accuracy, meeting high computational and memory demands, and coping with the limited compute, memory, and energy of embedded platforms. The framework accurately models DNN behavior across heterogeneous resources, integrates comprehensive timing structures, and remains transparent to users, ensuring both reliable timing and high inference accuracy.
Ongoing Projects
Zero-Swap: Toward a Near Swapless System
My current project, Zero-Swap: Toward a Near Swapless System, focuses on analyzing the memory usage patterns of DNN tasks to optimize the swap procedure. By employing data compression techniques, we aim to minimize PCIe transfer overhead, further enhancing the efficiency of memory management in real-time AI systems.
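The core idea, compressing memory objects before they cross the PCIe bus, can be illustrated with a minimal sketch. This is not the project's implementation; the function names (`swap_out`, `swap_in`) and the use of `zlib` are illustrative assumptions, standing in for whatever compressor a real system would use:

```python
import zlib
import numpy as np

def swap_out(tensor: np.ndarray, level: int = 1) -> bytes:
    # Hypothetical swap-out path: compress the object on the host side,
    # trading a few CPU cycles for less data moved over PCIe.
    return zlib.compress(tensor.tobytes(), level)

def swap_in(blob: bytes, shape, dtype) -> np.ndarray:
    # Hypothetical swap-in path: decompress back into a usable array.
    return np.frombuffer(zlib.decompress(blob), dtype=dtype).reshape(shape)

# Post-ReLU activations are sparse (many exact zeros), so they compress well.
acts = np.maximum(np.random.randn(256, 256).astype(np.float32), 0.0)
blob = swap_out(acts)
restored = swap_in(blob, acts.shape, acts.dtype)
assert np.array_equal(acts, restored)
print(f"compressed to {len(blob) / acts.nbytes:.2%} of original size")
```

The payoff depends on the data: sparse activations shrink substantially, while dense weight tensors may not, which is one reason analyzing memory usage patterns per task matters.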
ASTRA: Neural Architecture Search for Time-Sensitive, Real-Time Applications
Another ongoing project, ASTRA: Neural Architecture Search for Time-Sensitive, Real-Time Applications, addresses the question of why we rely on pre-defined DNN models when we could design models specifically tailored for real-time systems. In this project, we leverage Neural Architecture Search (NAS) to create new DNN models optimized for the unique demands of real-time environments, ensuring both efficiency and timely performance.
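In spirit, a deadline-constrained search can be sketched as below. This is a toy random-search stand-in, not ASTRA's actual algorithm; the layer choices, latency estimates, and accuracy proxies are invented for illustration:

```python
import random

# Hypothetical per-layer operator choices: (name, est_latency_ms, accuracy_proxy)
LAYER_CHOICES = [
    ("conv3x3",   1.8, 1.00),
    ("conv5x5",   3.1, 1.15),
    ("depthwise", 0.9, 0.80),
    ("skip",      0.1, 0.10),
]

def sample_arch(depth: int):
    return [random.choice(LAYER_CHOICES) for _ in range(depth)]

def latency(arch):
    return sum(lat for _, lat, _ in arch)

def score(arch):
    return sum(acc for _, _, acc in arch)

def search(deadline_ms: float, depth: int = 8, trials: int = 500):
    # Keep only architectures whose estimated latency meets the deadline,
    # then return the highest-scoring survivor.
    best = None
    for _ in range(trials):
        arch = sample_arch(depth)
        if latency(arch) <= deadline_ms and (best is None or score(arch) > score(best)):
            best = arch
    return best

arch = search(deadline_ms=15.0)
```

The key difference from conventional NAS is that the deadline is a hard constraint of the search space, not a soft penalty: candidates that cannot meet timing are discarded outright.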
The GPU memory capacity bottleneck introduces significant challenges when executing complex DNN tasks on real-time systems, especially when multiple DNN tasks must run simultaneously with timing guarantees.
RT-Swap is a framework that extends GPU memory capacity by efficiently swapping memory objects with CPU memory, enabling the simultaneous execution of complex DNN tasks in real-time systems.
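The mechanism can be pictured with a small simulation: a fixed device-memory budget, with least-recently-used objects evicted to host memory so tasks whose footprints exceed the budget can still run. This is a toy model under assumed names (`SwapManager`, sizes in MB), not RT-Swap's implementation, which moves real memory objects over PCIe under timing constraints:

```python
from collections import OrderedDict

class SwapManager:
    """Toy model of GPU/CPU memory-object swapping (hypothetical API)."""

    def __init__(self, device_budget: int):
        self.budget = device_budget
        self.on_device = OrderedDict()  # name -> size (MB), in LRU order
        self.on_host = {}

    def access(self, name: str, size: int):
        if name in self.on_device:
            self.on_device.move_to_end(name)   # mark as recently used
            return
        self.on_host.pop(name, None)           # swap back in if it was evicted
        while sum(self.on_device.values()) + size > self.budget:
            victim, vsize = self.on_device.popitem(last=False)
            self.on_host[victim] = vsize       # swap out the LRU victim
        self.on_device[name] = size

mgr = SwapManager(device_budget=100)
for name, size in [("w1", 60), ("w2", 30), ("w3", 40)]:  # 130 MB > 100 MB budget
    mgr.access(name, size)
```

After these accesses, `w1` has been swapped to the host to make room for `w3`, so all three objects remain usable despite the footprint exceeding device memory.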
Real-time object detection systems, such as those used in autonomous vehicles, struggle with processing images from multiple cameras quickly and accurately due to the need for timely and precise inferences.
DNN-SAM is a framework that enhances real-time object detection by splitting DNN tasks into critical and non-critical sub-tasks, executing them independently, and then merging the results. It optimizes task scheduling based on criticality and adjusts image scale to improve accuracy and reduce latency, achieving better performance without violating timing constraints. [more details]
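The split-and-merge idea can be sketched as follows. The box format, the 2x downscale, and the detection tuples are illustrative assumptions, not DNN-SAM's actual interfaces:

```python
import numpy as np

def split(image: np.ndarray, critical_box):
    # Hypothetical split: a full-resolution crop of the critical region
    # (box = (y0, y1, x0, x1)) plus a cheap 2x-downscaled view of the rest.
    y0, y1, x0, x1 = critical_box
    critical = image[y0:y1, x0:x1]   # keep full detail where it matters
    non_critical = image[::2, ::2]   # coarse view for the non-critical sub-task
    return critical, non_critical

def merge(critical_dets, non_critical_dets, critical_box):
    # Map both sets of detections (y, x, label) back to frame coordinates.
    y0, _, x0, _ = critical_box
    shifted = [(y + y0, x + x0, label) for y, x, label in critical_dets]
    rescaled = [(2 * y, 2 * x, label) for y, x, label in non_critical_dets]
    return shifted + rescaled

frame = np.zeros((480, 640), dtype=np.uint8)
crit, rest = split(frame, (100, 300, 200, 500))
dets = merge([(10, 20, "car")], [(50, 60, "sign")], (100, 300, 200, 500))
```

Because the critical crop stays at full resolution while only the non-critical view is downscaled, accuracy is preserved where it matters most even as total inference latency drops.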
Leveraging heterogeneous computing resources (i.e., CPUs and GPUs) for the real-time execution of DNNs in embedded systems is difficult due to coarse-grained resource allocation, asymmetric DNN execution performance between CPUs and GPUs, and the absence of a schedulability-aware CPU/GPU allocation strategy.
LaLaRAND is a real-time, layer-level DNN scheduling framework that enables flexible CPU/GPU scheduling for individual DNN layers. It combines CPU-friendly quantization with fine-grained CPU/GPU allocation schemes (allocating resources per layer instead of per task), effectively mitigating accuracy loss while ensuring timing guarantees. [more details]
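The contrast with per-task allocation can be sketched with a greedy per-layer allocator. This is a deliberate simplification, not LaLaRAND's schedulability analysis; the per-layer latencies and the GPU time budget are invented for illustration:

```python
def allocate(cpu_ms, gpu_ms, gpu_budget_ms):
    # Greedy per-layer CPU/GPU allocation (illustrative only): start every
    # layer on the CPU, then migrate the layers with the largest latency
    # savings to the GPU while its time budget allows.
    savings = sorted(range(len(cpu_ms)),
                     key=lambda i: cpu_ms[i] - gpu_ms[i], reverse=True)
    alloc, gpu_used = ["CPU"] * len(cpu_ms), 0.0
    for i in savings:
        if cpu_ms[i] > gpu_ms[i] and gpu_used + gpu_ms[i] <= gpu_budget_ms:
            alloc[i] = "GPU"
            gpu_used += gpu_ms[i]
    total = sum(gpu_ms[i] if a == "GPU" else cpu_ms[i]
                for i, a in enumerate(alloc))
    return alloc, total

# Hypothetical per-layer latencies (ms) for a 4-layer model.
alloc, total = allocate(cpu_ms=[5.0, 8.0, 2.0, 9.0],
                        gpu_ms=[1.0, 2.0, 3.0, 2.5],
                        gpu_budget_ms=5.0)
```

A per-task scheme would pin all four layers to one resource; the per-layer view keeps the GPU for the layers that benefit most and leaves GPU-unfriendly layers on the CPU, which is the intuition behind allocating per layer instead of per task.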