In Defense of Knowledge Distillation for Task Incremental Learning
and its Application in 3D Object Detection
Peng YUN, Ming LIU
HKUST, RAM-LAB
Submitted to RAL-ICRA2021
[Supplementary Materials] [Abbreviation Table] [Code] [Data] [Weights]
Abstract
Making robots learn skills incrementally is an efficient way to design real intelligent agents. To achieve this, researchers adopt knowledge distillation to transfer old-task knowledge from old models to new ones.
However, when the length of the task sequence increases, the effectiveness of knowledge distillation to prevent models from forgetting old-task knowledge degrades, which we call the long-sequence effectiveness degradation (LED) problem.
In this paper, we analyze the LED problem in the task-incremental-learning setting, and attribute it to the inevitable data distribution differences among tasks.
To address this problem, we propose to correct the knowledge distillation for task incremental learning with a Bayesian approach. It additionally maximizes the posterior probability related to the data distributions of all seen tasks.
To demonstrate its effectiveness, we further apply our proposed corrected knowledge distillation to 3D object detection. A comparison between increment-at-once and increment-in-sequence experiments shows that our proposed method solves the LED problem. Moreover, it reaches upper-bound performance in the task-incremental-learning (TIL) experiments on the KITTI dataset.
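For context, the knowledge-distillation term that TIL methods use to preserve old-task outputs can be sketched as below. This is a generic illustration of standard distillation (KL divergence between the frozen old model's temperature-softened outputs and the new model's outputs), not the corrected Bayesian variant proposed here; the temperature value is an assumption.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over the last axis.
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(new_logits, old_logits, T=2.0):
    # KL(p_old || p_new): penalizes the new model for drifting away
    # from the frozen old model's predictions on old-task data.
    p_old = softmax(old_logits, T)
    p_new = softmax(new_logits, T)
    return float(np.sum(p_old * (np.log(p_old) - np.log(p_new))))
```

During incremental training, this term is typically added to the new-task loss with a trade-off weight; the paper's correction additionally accounts for the data-distribution differences among tasks that standard distillation ignores.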
Quantitative results (KITTI dataset)
Quantitative results (NuScenes dataset)