Lin Wang, Yujeong Chae*, Sung-Hoon Yoon*, Tae-Kyun Kim, Kuk-Jin Yoon

*Equal Contribution

Abstract

Event cameras sense per-pixel intensity changes and produce asynchronous event streams with high dynamic range and less motion blur, showing advantages over conventional cameras. A hurdle of training event-based models is the lack of large-scale, high-quality labeled data. Prior works learning end-tasks mostly rely on labeled or pseudo-labeled datasets obtained from the active pixel sensor (APS) frames; however, such datasets' quality is far from rivaling those based on canonical images. In this paper, we propose a novel approach, called EvDistill, to learn a student network on unlabeled and unpaired event data (target modality) via knowledge distillation (KD) from a teacher network trained with large-scale, labeled image data (source modality). To enable KD across the unpaired modalities, we first propose a bidirectional modality reconstruction (BMR) module to bridge both modalities and simultaneously exploit them to distill knowledge via the crafted pairs, incurring no extra computation at inference. The BMR module is jointly optimized with the end-task and KD losses in an end-to-end manner. Second, we leverage the structural similarities of the two modalities and adapt the knowledge by matching their distributions. Moreover, as most prior feature-level KD methods are uni-modal and less applicable to our problem, we propose an affinity graph KD loss to boost the distillation. Our extensive experiments on semantic segmentation and object recognition demonstrate that EvDistill achieves significantly better results than the prior works and KD with only events and APS frames.
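To make the affinity graph KD idea concrete, below is a minimal PyTorch sketch, assuming teacher and student feature maps of the same spatial size; the function name, cosine normalization, and MSE matching are our own illustrative choices, not the paper's exact formulation. Each spatial location is treated as a graph node, and the student is trained to mimic the teacher's pairwise node affinities:

import torch
import torch.nn.functional as F

def affinity_graph_kd_loss(feat_student, feat_teacher):
    # Treat each spatial location as a graph node and build a pairwise
    # cosine-affinity matrix over the N = H * W nodes per sample.
    fs = F.normalize(feat_student.flatten(2), dim=1)  # (B, C_s, N)
    ft = F.normalize(feat_teacher.flatten(2), dim=1)  # (B, C_t, N)
    aff_s = torch.bmm(fs.transpose(1, 2), fs)  # (B, N, N) student affinities
    aff_t = torch.bmm(ft.transpose(1, 2), ft)  # (B, N, N) teacher affinities
    # Push the student's affinity graph toward the teacher's.
    return F.mse_loss(aff_s, aff_t)

Because only pairwise similarities are matched, the two networks need not share channel dimensionality, which is convenient when the teacher and student backbones differ.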

Experimental results

The effectiveness of our method is validated on two tasks: semantic segmentation and object recognition.

Semantic segmentation

- DDD17 dataset: general scene
- DDD17 dataset: HDR scene
- MVSEC dataset: general scene
- MVSEC dataset: night scene
- E2VID dataset: general scene

Object classification

- N-Caltech dataset

Event-to-image translation

- N-Caltech dataset

Image-to-event translation

How does KD affect modality reconstruction?

The KD loss improves the semantic recovery from events.
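A minimal sketch of how such a joint objective could be wired, assuming precomputed network outputs and reusing affinity_graph_kd_loss from the sketch above; the soft-label KL term, temperature, and weights lambda_feat / lambda_rec are illustrative placeholders rather than the paper's exact terms:

import torch.nn.functional as F

def joint_loss(student_logits, teacher_logits, student_feats, teacher_feats,
               reconstructed_img, real_img,
               lambda_feat=1.0, lambda_rec=0.5, T=2.0):
    # Soft-label distillation on the end-task predictions; the teacher acts
    # as supervision because the event data are unlabeled.
    loss_logit = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                          F.softmax(teacher_logits / T, dim=1),
                          reduction='batchmean') * (T * T)
    # Feature-level distillation (e.g. the affinity graph loss above).
    loss_feat = affinity_graph_kd_loss(student_feats, teacher_feats)
    # Reconstruction term for the BMR module. Since it shares the graph with
    # the KD terms, distillation gradients also shape the event-to-image
    # reconstruction, improving semantic recovery from events.
    loss_rec = F.l1_loss(reconstructed_img, real_img)
    return loss_logit + lambda_feat * loss_feat + lambda_rec * loss_rec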

Paper: click

Validation code and trained models: click

If you find this research helpful, please cite the paper as follows:

@inproceedings{wang2021evdistill,
  title={EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross-modal Knowledge Distillation},
  author={Wang, Lin and Chae, Yujeong and Yoon, Sung-Hoon and Kim, Tae-Kyun and Yoon, Kuk-Jin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}