In Defense of Knowledge Distillation for Task Incremental Learning

and its Application in 3D Object Detection

Peng YUN, Ming LIU

HKUST, RAM-LAB

Submitted to RAL-ICRA2021

[Supplementary Materials] [Abbreviation Table] [Code] [Data] [Weights]

Abstract

Enabling robots to learn skills incrementally is an efficient way to build truly intelligent agents. To achieve this, researchers adopt knowledge distillation to transfer old-task knowledge from old models to new ones.

However, as the task sequence grows longer, knowledge distillation becomes less effective at preventing models from forgetting old-task knowledge, a phenomenon we call the long-sequence effectiveness degradation (LED) problem.
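For context, the knowledge distillation the abstract refers to is the classic formulation: the old model acts as a teacher, and the new model is penalized for diverging from the teacher's temperature-softened output distribution. The sketch below illustrates that standard loss only; it is not the paper's corrected variant, and all function names are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the
    # distribution, exposing the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the softened teacher and student
    # distributions. In incremental learning, the old model is the
    # teacher, so minimizing this keeps the new model's outputs on
    # old-task inputs close to the old model's.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

When the student matches the teacher exactly, the loss is zero; any divergence yields a positive penalty, which is what anchors the new model to old-task behavior.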

In this paper, we analyze the LED problem in the task-incremental-learning setting, and attribute it to the inevitable data distribution differences among tasks.

To address this problem, we propose to correct knowledge distillation for task incremental learning with a Bayesian approach, which additionally maximizes the posterior probability related to the data distributions of all seen tasks.

To demonstrate its effectiveness, we further apply the proposed corrected knowledge distillation to 3D object detection. Comparing the results of increment-at-once and increment-in-sequence experiments shows that our method solves the LED problem. Moreover, it reaches the upper-bound performance in the task-incremental-learning (TIL) experiments on the KITTI dataset.

Quantitative results (KITTI dataset)

Quantitative results (NuScenes dataset)