Calibration and Synchronization

Deep Learning-based Time Synchronization and Calibration between LiDAR and Camera

Abstract: This article discusses how to implement time synchronization and spatial calibration between LiDAR and camera within a deep learning framework.

1. Introduction

On a multi-sensor perception platform for autonomous driving, data-level or task-level fusion (e.g., for detection and tracking) is critical because the different sensors have complementary strengths. However, sensor fusion has two prerequisites: time synchronization and a unified coordinate system. The former means that the trigger instants at which the different sensors capture data must coincide; the latter means determining the transform between the sensors' coordinate systems, i.e., calibration.

Traditional time synchronization methods are mostly based on hardware clocks that control the trigger time and account for transmission delay [1]. Traditional calibration methods, in turn, capture sensor data in a 3-D control field with accurately measured structure and compute the relationship between sensors, such as LiDAR and camera, by matching data from the different sensors [2].

Recently, some researchers have started working on calibration with deep learning, perturbing aligned sensor data to generate mis-calibrated training data, as in RegNet [3] and CalibNet [4]. The main difference between RegNet and CalibNet lies in the loss function: RegNet's loss combines the mismatch between the RGB image feature map and the sparse depth feature map with a Euclidean loss on the de-calibration matrix, while CalibNet's loss includes a photometric loss term on the depth map and a 3-D point distance loss term computed between the target point cloud and the predicted point cloud, obtained by inversely projecting the predicted depth map with the estimated calibration parameters.
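For illustration, mis-calibrated training samples in such setups can be generated roughly as in the following Python sketch; the helper name and the perturbation magnitudes are illustrative assumptions, not the exact procedure of RegNet or CalibNet.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def perturb_extrinsics(T_gt, max_angle_deg=10.0, max_trans_m=0.2, rng=None):
    """Create a mis-calibrated extrinsic matrix from the ground-truth one by
    applying a random rotation/translation offset; the helper name and the
    perturbation magnitudes are illustrative, not the papers' settings.
    Returns the perturbed 4x4 matrix and the applied offset (the label a
    network would be trained to regress)."""
    rng = rng if rng is not None else np.random.default_rng()
    angles = rng.uniform(-max_angle_deg, max_angle_deg, size=3)
    trans = rng.uniform(-max_trans_m, max_trans_m, size=3)
    dT = np.eye(4)
    dT[:3, :3] = Rotation.from_euler("xyz", angles, degrees=True).as_matrix()
    dT[:3, 3] = trans
    return dT @ T_gt, dT
```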

Usami et al. proposed a time synchronization method between LiDAR and camera based on scene flow estimation [5]. It first computes the scene flow and then compensates the captured point cloud to generate a synchronized point cloud at the camera's capture time. Unfortunately, this method is too simplistic: it assumes a clean, homogeneous background with a few moving point targets, and it must separate the background from the moving targets before scene flow estimation.

So far, we have not found any work addressing time synchronization of LiDAR and camera with machine learning or deep learning methods.

In the following sections, we introduce a deep learning method that realizes joint time synchronization and spatial calibration between LiDAR and camera sensors.

2. Joint calibration and time synchronization by deep learning

In this framework, we use deep learning to realize time synchronization and calibration of the on-vehicle LiDAR and camera simultaneously. It runs in two stages: 1) when the vehicle is not moving and the scene is static, perform calibration only; 2) when the vehicle is moving and the scene is dynamic, perform joint calibration and synchronization (note that the time sync status cannot be identified when the scene is static).

First, note that the LiDAR frame rate is normally 10 fps, while the camera frame rate is usually 30 fps. To simplify the problem statement, the camera frame rate is assumed here to be 20 fps. As shown in Figure 1, given 4 consecutive images I_t-1, I_t, I_t+1, I_t+2 and 2 consecutive LiDAR point clouds L_s, L_s+1, with T_t-1 < T_s < T_t, we solve the LiDAR-camera capture sync problem by interpolating point cloud data at times T_t and T_t+1. The synthesized LiDAR data L_t at T_t acts as time "synchronization" with camera image I_t (depicted by the finer dashed arrow), while the synthesized LiDAR data L_t+1 at T_t+1 acts as LiDAR frame "interpolation" (depicted by the coarser dashed arrow), doubling the LiDAR frame rate so that it matches the camera frame rate.

Figure 1.
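As a small bookkeeping sketch of this timing setup, the following Python snippet finds the camera frame pair that brackets a LiDAR capture time and the normalized position of that time inside the interval. The helper name is hypothetical, and it assumes timestamps in seconds with the LiDAR time falling strictly inside the camera sequence.

```python
import bisect

def bracket_lidar_timestamp(t_lidar, camera_timestamps):
    """Hypothetical helper: find the index k of the camera frame pair
    (I_k, I_k+1) whose capture times bracket the LiDAR time T_s, and the
    normalized position tau of T_s inside that interval.
    Assumes camera_timestamps is sorted and t_lidar lies strictly between
    the first and last camera timestamps."""
    k = bisect.bisect_right(camera_timestamps, t_lidar) - 1
    t0, t1 = camera_timestamps[k], camera_timestamps[k + 1]
    tau = (t_lidar - t0) / (t1 - t0)
    return k, tau

# Example with a 20 fps camera (50 ms frame period) and a LiDAR scan at 35 ms:
# the scan falls between frames 0 and 1, at tau ~ 0.7.
print(bracket_lidar_timestamp(0.035, [0.0, 0.05, 0.10, 0.15]))  # -> (0, ~0.7)
```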

The following sections introduce our two-stage framework for calibration and time sync.

2.1 Calibration

In the first stage, we assume the scene is static and perform calibration only, as shown in Figure 2; the framework is similar to CalibNet [4].

Figure 2.

The "Perspective Projection" module projects the LiDAR point cloud onto the camera image plane using the initial calibration parameters, producing a sparse depth map. The depth map and the RGB image are then each passed through an "Encoder" to generate feature maps, which are concatenated and fed into the "PoseNet" module. The output of PoseNet is the calibration parameters. The "Depth Map Inverse Projection" module converts the depth map back into a point cloud, which is used to compute a loss term during model training.
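As a rough sketch of what the "Perspective Projection" module does, the following NumPy snippet projects a point cloud into a sparse depth map given a 4x4 extrinsic guess T_cam_lidar and a 3x3 intrinsic matrix K; the function name and interface are illustrative assumptions.

```python
import numpy as np

def project_to_sparse_depth(points_lidar, T_cam_lidar, K, image_size):
    """Illustrative stand-in for the "Perspective Projection" module:
    project an (N, 3) LiDAR point cloud into the camera image plane using
    the current extrinsic guess T_cam_lidar (4x4) and camera intrinsics K
    (3x3), producing a sparse depth map."""
    h, w = image_size
    # Transform the points into the camera frame (homogeneous coordinates).
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]
    # Pinhole projection onto the image plane.
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    # Scatter the depths into an initially empty (sparse) depth map.
    depth = np.zeros((h, w), dtype=np.float32)
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth[v[inside], u[inside]] = pts_cam[inside, 2]
    return depth
```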

2.2 Joint Calibration and Sync

In the second stage, the scene is required to be dynamic; the system diagram of joint time sync and calibration is shown in Figure 3.

Figure 3.

The online calibration module in the upper right corner (in the dashed rectangle) follows the first stage, except that its input depth map comes from the time-synchronized result rather than the original one. This calibration is an adjustment of the offline result from the first stage and is performed once time sync is finished.

The input LiDAR point cloud is again converted into a sparse depth map via perspective projection with the initial calibration parameters. The camera images are fed into FlowNet to estimate bidirectional optical flow (4 consecutive images yield 3 pairs of optical flow maps), and the "Flow Interp" module then interpolates the optical flow at the LiDAR capture times T_s and T_s+1. The resulting bidirectional optical flow between the LiDAR and camera capture times is used to warp the original depth maps into synthesized depth maps at the camera capture times. These interpolated depth maps are still rough, so they enter the "InterpolationNet" module together with the bidirectional optical flow, the original RGB images, and the original depth maps to obtain better synthesized ones. The time-synchronized depth maps are also transformed into time-synchronized point clouds in the "Calibration" module.
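As an illustration of how these modules could be wired together, the following PyTorch sketch strings placeholder modules (FlowNet-like flow estimation, flow interpolation plus warping, the interpolation network, and the calibration network) into one forward pass; the module names and interfaces are assumptions, not the exact design.

```python
import torch.nn as nn

class JointSyncCalib(nn.Module):
    """High-level sketch of the second-stage pipeline in Figure 3.
    flow_net, flow_interp, interp_net and calib_net are hypothetical
    placeholder modules; their interfaces are assumptions."""
    def __init__(self, flow_net, flow_interp, interp_net, calib_net):
        super().__init__()
        self.flow_net = flow_net        # bidirectional optical flow per image pair
        self.flow_interp = flow_interp  # flow at LiDAR capture times + depth warping
        self.interp_net = interp_net    # refines the rough warped depth maps
        self.calib_net = calib_net      # regresses calibration parameters

    def forward(self, images, sparse_depths, lidar_times, cam_times):
        # 1) Bidirectional optical flow for the 3 consecutive image pairs.
        flows = [self.flow_net(images[i], images[i + 1]) for i in range(3)]
        # 2) Interpolate flow at the LiDAR capture times T_s, T_s+1 and warp
        #    the original sparse depth maps to the camera capture times.
        rough_depths = self.flow_interp(flows, sparse_depths, lidar_times, cam_times)
        # 3) Refine the rough, time-synchronized depth maps with RGB guidance.
        synced_depths = self.interp_net(rough_depths, flows, images, sparse_depths)
        # 4) Online calibration on the synchronized depth map and the RGB image.
        calib_params = self.calib_net(images[1], synced_depths[0])
        return calib_params, synced_depths
```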

All of the deep learning network architectures used here follow the encoder-decoder style. Details for each module are given below. First, as shown in Figure 4, the three pairs of consecutive images are fed into the Flow Estimation module, which outputs three bidirectional flow maps. The network can use FlowNet [6] as a reference.

Figure 4.
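For illustration, a minimal encoder-decoder in this style could look like the following PyTorch sketch; TinyEncoderDecoder is an illustrative toy, and a real flow network such as FlowNet is much deeper and uses skip connections.

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Toy encoder-decoder in the style used by all modules here; a real
    flow network (e.g., FlowNet) is much deeper and uses skip connections."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, out_ch, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# For one image pair: the two RGB frames stacked along channels (3 + 3 = 6)
# go in, a bidirectional flow map (2 + 2 = 4 channels) comes out.
flow_net = TinyEncoderDecoder(in_ch=6, out_ch=4)
flow = flow_net(torch.randn(1, 6, 128, 256))  # -> shape (1, 4, 128, 256)
```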

The optical flow interpolation uses a linear method similar to Super SloMo [7]. Assume the estimated bidirectional optical flows between the camera frames I_t-1 and I_t are F_t-1→t and F_t→t-1; then we get the interpolated flows at the LiDAR capture time T_s as

F_s→t-1 = -(1-τ)·τ·F_t-1→t + τ^2·F_t→t-1

F_s→t = (1-τ)^2·F_t-1→t - τ·(1-τ)·F_t→t-1

with the normalized time

τ = (T_s - T_t-1) / (T_t - T_t-1), 0 < τ < 1.

Extended flow maps at T_s+1 are derived in the same way from the bidirectional flow of the image pair that brackets T_s+1.

Note: high accuracy is not required here; a rough estimate of the motion is sufficient for depth map interpolation, which is carried out as follows:

L̂_t = V(F_s→t) ⊙ g(F_s→t, L_s),  L̂_t+1 = V(F_s+1→t+1) ⊙ g(F_s+1→t+1, L_s+1)

where V() is the occlusion (visibility) function from optical flow estimation, g(M, L) is the warping function that warps L based on motion M, and ⊙ denotes element-wise multiplication.
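As a rough illustration of how this flow interpolation and warping step could be coded, the PyTorch sketch below implements the linear approximation above and a simple forward warp of the sparse depth map. interpolate_flow and forward_warp_sparse_depth are hypothetical helper names, and the nearest-pixel splatting is only a stand-in for the warping function g(M, L), without occlusion handling.

```python
import torch

def interpolate_flow(flow_fwd, flow_bwd, tau):
    """Linear (Super SloMo style) approximation of the flows from the
    LiDAR time T_s to the two bracketing camera frames.
    flow_fwd = F_t-1->t, flow_bwd = F_t->t-1, both of shape (B, 2, H, W)."""
    flow_s_to_prev = -(1.0 - tau) * tau * flow_fwd + tau ** 2 * flow_bwd
    flow_s_to_next = (1.0 - tau) ** 2 * flow_fwd - tau * (1.0 - tau) * flow_bwd
    return flow_s_to_prev, flow_s_to_next

def forward_warp_sparse_depth(depth_s, flow_s_to_t):
    """Nearest-pixel forward warp of a sparse depth map L_s to the camera
    time T_t along the interpolated flow; a simple stand-in for g(M, L).
    depth_s: (B, 1, H, W), flow_s_to_t: (B, 2, H, W) with (x, y) channels."""
    b, _, h, w = depth_s.shape
    warped = torch.zeros_like(depth_s)
    ys, xs = torch.meshgrid(
        torch.arange(h, device=depth_s.device),
        torch.arange(w, device=depth_s.device),
        indexing="ij",
    )
    for i in range(b):
        valid = depth_s[i, 0] > 0
        u = (xs[valid] + flow_s_to_t[i, 0][valid]).round().long()
        v = (ys[valid] + flow_s_to_t[i, 1][valid]).round().long()
        keep = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        warped[i, 0, v[keep], u[keep]] = depth_s[i, 0][valid][keep]
    return warped
```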

Figure 5 shows the diagram of the depth map interpolation network, which is divided into two steps, a rough interpolation network and a fine interpolation network; the idea is similar to PLIN [8].

Figure 5.

In the fine interpolation network, the RGB images at the interpolated time instants T_t and T_t+1 serve as guidance to refine the rough depth maps produced by the rough interpolation network. Since the RGB image is dense, it is also possible to turn the sparse depth map into a dense depth map; this is an optional feature of the system. Eventually the interpolated depth maps serve as the time-sync result for online calibration.
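A minimal sketch of this two-step idea is given below; TwoStageDepthInterp, its layer sizes, and its input channels are illustrative assumptions rather than the PLIN architecture.

```python
import torch
import torch.nn as nn

class TwoStageDepthInterp(nn.Module):
    """Sketch of the two-step interpolation idea: a rough network fuses the
    warped sparse depth map with the interpolated bidirectional flow, and a
    fine network refines the result under RGB guidance. Layer sizes and
    channel counts are illustrative only."""
    def __init__(self):
        super().__init__()
        # Rough step: warped depth (1 ch) + bidirectional flow (4 ch) -> depth (1 ch).
        self.rough = nn.Sequential(
            nn.Conv2d(5, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )
        # Fine step: rough depth (1 ch) + RGB guide at T_t (3 ch) -> refined depth (1 ch).
        self.fine = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, warped_depth, flows, rgb_guide):
        rough_depth = self.rough(torch.cat([warped_depth, flows], dim=1))
        refined_depth = self.fine(torch.cat([rough_depth, rgb_guide], dim=1))
        return rough_depth, refined_depth
```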

The loss function for the whole system combines loss terms defined in CalibNet [4], FlowNet [6], and PLIN [8].
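As a hedged illustration only, the overall training objective might be composed as a weighted sum of such terms; the dictionary keys, weights, and exact forms below are assumptions, not the losses defined in the cited papers.

```python
import torch.nn.functional as F

def joint_loss(pred, target, weights=(1.0, 0.5, 0.1, 1.0)):
    """Illustrative weighted sum of the loss families cited above; the keys,
    weights and exact terms are assumptions, not the papers' definitions."""
    w_photo, w_cloud, w_flow, w_interp = weights
    # CalibNet-style terms: photometric depth loss + 3-D point distance loss.
    loss = w_photo * F.smooth_l1_loss(pred["depth"], target["depth"])
    loss = loss + w_cloud * (pred["cloud"] - target["cloud"]).norm(dim=-1).mean()
    # FlowNet-style term: per-pixel endpoint error of the estimated flow.
    loss = loss + w_flow * (pred["flow"] - target["flow"]).norm(dim=1).mean()
    # PLIN-style term: reconstruction error of the interpolated depth map.
    loss = loss + w_interp * F.l1_loss(pred["interp_depth"], target["interp_depth"])
    return loss
```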

3. Summary

The proposed framework provides a software-based method for joint time synchronization and calibration of LiDAR and camera sensors. It relies on deep learning to realize optical flow estimation, depth map interpolation, and calibration parameter estimation.

References

1. S. Schneider et al., "Fusing Vision and LIDAR - Synchronization, Correction and Occlusion Reasoning", IEEE Intelligent Vehicles Symposium (IV), 2010.

2. R. Ishikawa et al., "LiDAR and Camera Calibration using Motion Estimated by Sensor Fusion Odometry", arXiv:1804.05178, 2018.

3. N. Schneider et al., "RegNet: Multimodal Sensor Registration Using Deep Neural Networks", arXiv:1707.03167, 2017.

4. G. Iyer et al., "CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial Transformer Networks", arXiv:1803.08181, 2018.

5. H. Usami et al., "Synchronizing 3D point cloud from 3D scene flow estimation with 3D Lidar and RGB camera", IS&T Electronic Imaging, 3D Image Processing, Measurement (3DIPM), and Applications, 2018.

6. P. Fischer et al., "FlowNet: Learning Optical Flow with Convolutional Networks", ICCV 2015.

7. H. Jiang et al., "Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation", arXiv:1712.00080, 2017.

8. H. Liu et al., "PLIN: A Network for Pseudo-LiDAR Point Cloud Interpolation", arXiv:1909.07137, 2019.