Event-based video deblurring based on image and event feature fusion
Motion blur is one of the major factors that degrade image quality. It occurs when the camera shakes or a subject moves quickly during the exposure time, or when camera settings such as shutter speed are chosen improperly. Event-based video deblurring performs deblurring by taking, along with the blurry frames, the event sequence obtained from an event camera built on bio-inspired sensors. It has gained attention as a way to overcome the limitations of conventional frame-based video deblurring. In this study, we propose a novel event-based video deblurring network based on convolutional neural networks (CNNs). Unlike existing event-based deblurring methods that use only event data, the proposed method fuses all the available information from the current blurry frame, the previously recovered sharp frame, and the event data to deblur a video. Specifically, we propose an image and event feature fusion (IEFF) module that fuses event data with the current intensity frame information. Additionally, we propose a current-frame reconstruction from previous-frame (CRP) module, which acquires a pseudo sharp frame from the previously recovered sharp frame, and a fusion-based residual estimation (FRE) module, which fuses the event features with the image features of the previous sharp frame extracted from the CRP module. Experiments on synthetic and real datasets demonstrate that the proposed method achieves superior quantitative and qualitative results compared with state-of-the-art methods.
This work has been published in Expert Systems with Applications, 2023.
Paper link: https://www.sciencedirect.com/science/article/pii/S0957417423004189?dgcid=author
Fig. 1. (a) Frame and event sequence. (b) Blurry frame. (c) Event input. (d) Frame-based video deblurring result (Pan et al., 2020). (e) Event-based video deblurring result (proposed). (f) Ground truth. Note that, in (c), the events are overlaid on the blurry frame for better visualization (red and blue represent positive and negative event polarities, respectively).
Event-based video deblurring is an image restoration method that utilizes the event sequence obtained from an event camera built on a bio-inspired sensor such as the dynamic vision sensor. Event data asynchronously record log-intensity changes as four-dimensional tuples (𝑥, 𝑦, 𝑡, 𝑝), where 𝑥 and 𝑦 denote the spatial position, 𝑡 denotes the timestamp, and 𝑝 denotes the polarity, i.e., the direction of the log-intensity change.
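Before being fed to a CNN, the sparse event stream is typically converted into a dense tensor. The snippet below is a minimal sketch of one common choice, a voxel-grid representation that bins events over time; the function name and the particular binning scheme are illustrative assumptions, not necessarily the representation used in this paper.

```python
import numpy as np
import torch

def events_to_voxel_grid(x, y, t, p, num_bins, height, width):
    """Accumulate (x, y, t, p) events into a (num_bins, H, W) voxel grid.

    x, y : integer pixel coordinates of each event
    t    : timestamps (any monotonically increasing unit)
    p    : polarities in {-1, +1}
    """
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(t) == 0:
        return torch.from_numpy(voxel)
    # Normalize timestamps to [0, num_bins - 1] and round to the nearest temporal bin.
    t_norm = (t - t[0]) / max(t[-1] - t[0], 1e-9) * (num_bins - 1)
    bins = np.round(t_norm).astype(np.int64)
    # Accumulate signed polarities at each (bin, y, x) location.
    np.add.at(voxel, (bins, y, x), p.astype(np.float32))
    return torch.from_numpy(voxel)
```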
In particular, event cameras have several advantages compared with conventional frame cameras: (1) high temporal resolution (up to one microsecond), (2) low motion blur, and (3) high dynamic range. Hence, the limitations of frame-based vision tasks can be overcome by using event data. Owing to these advantages, event cameras are suitable for applications under challenging lighting conditions, such as computational photography, robotics, autonomous driving, and video surveillance. Specifically, event cameras are used for moving object detection, object tracking, object recognition, and simultaneous localization and mapping (SLAM). They are also used for several imaging applications such as depth estimation, 3D scanning, optical flow estimation, HDR image reconstruction, and motion deblurring.
This study proposes a novel event-based video deblurring network that can be trained in an end-to-end fashion. Unlike conventional event-based deblurring methods, the proposed IEFF module enhances the event features for intensity residual estimation by fusing in the current intensity frame information through an adaptive instance normalization technique, as sketched below.
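As an illustration of this idea, the following PyTorch sketch shows an AdaIN-style fusion in which scale and shift statistics predicted from the image feature modulate the instance-normalized event feature. The class name, layer choices, and the way the statistics are predicted are assumptions for illustration, not the exact IEFF design.

```python
import torch
import torch.nn as nn

class AdaINFusion(nn.Module):
    """Hypothetical AdaIN-style fusion: the image feature supplies
    channel-wise scale/shift statistics that modulate the event feature."""

    def __init__(self, channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # Predict per-channel scale and shift from the image feature.
        self.to_scale = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                      nn.Conv2d(channels, channels, 1))
        self.to_shift = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                      nn.Conv2d(channels, channels, 1))

    def forward(self, event_feat, image_feat):
        scale = self.to_scale(image_feat)   # (B, C, 1, 1)
        shift = self.to_shift(image_feat)   # (B, C, 1, 1)
        # Normalize the event feature, then re-style it with image statistics.
        return self.norm(event_feat) * (1 + scale) + shift
```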
For video deblurring, we also propose an FRE module that fuses the event features with the image features of the previously recovered sharp frame, extracted by the CRP module, through a spatial attention block (see the sketch below).
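The following is a hedged sketch of how a spatial attention block could weight the previous-frame image features before merging them with the event features; the class name and layer configuration are illustrative assumptions rather than the exact FRE design.

```python
import torch
import torch.nn as nn

class SpatialAttentionFusion(nn.Module):
    """Hypothetical spatial-attention fusion: an attention map computed from
    the concatenated features weights the previous-frame image feature
    before it is merged with the event feature."""

    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
            nn.Sigmoid())
        self.merge = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, event_feat, prev_img_feat):
        # Single-channel attention map in [0, 1], shared across channels.
        a = self.attn(torch.cat([event_feat, prev_img_feat], dim=1))
        fused = torch.cat([event_feat, a * prev_img_feat], dim=1)
        return self.merge(fused)
```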
We evaluate the proposed method in experiments on synthetic and real datasets. Through quantitative and qualitative comparisons with conventional frame-based and event-based methods, we verify that the proposed method achieves outstanding deblurring performance.
Fig. 2. Overall architecture of the proposed event-based video deblurring network (EVDnet).
As shown in Fig. 2, the proposed deep learning network consists of four modules: (1) event feature extraction module, (2) fusion-based residual estimation (FRE) module, (3) current-frame reconstruction from previous-frame (CRP) module, and (4) gate fusion module.
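The sketch below illustrates how such a four-module network could be composed in a single forward pass. The module interfaces, names, and exact data flow are assumptions inferred from the description above, not the authors' implementation.

```python
import torch.nn as nn

class EVDNet(nn.Module):
    """Schematic composition of the four modules; the submodules are
    placeholders passed in at construction time."""

    def __init__(self, event_encoder, fre, crp, gate_fusion):
        super().__init__()
        self.event_encoder = event_encoder  # (1) event feature extraction
        self.fre = fre                      # (2) fusion-based residual estimation
        self.crp = crp                      # (3) current-frame reconstruction from previous frame
        self.gate_fusion = gate_fusion      # (4) gate fusion of the two estimates

    def forward(self, blurry_frame, event_tensor, prev_sharp_frame):
        # Extract features from the stacked event tensor.
        event_feat = self.event_encoder(event_tensor)
        # Propagate the previous sharp frame into a pseudo sharp frame
        # for the current time step, together with its image features.
        pseudo_sharp, prev_feat = self.crp(prev_sharp_frame)
        # Estimate the intensity residual from fused image/event features.
        residual = self.fre(event_feat, blurry_frame, prev_feat)
        candidate = blurry_frame + residual
        # Adaptively combine the residual-based estimate and the pseudo sharp frame.
        return self.gate_fusion(candidate, pseudo_sharp)
```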
Fig. 3. Structure of the fusion-based residual estimation (FRE) module.
Fig. 4. Structure of the image and event feature fusion (IEFF) module.
Fig. 8. Qualitative comparison results on GoPro synthetic dataset.
This research was funded by SK Hynix, Republic of Korea.