Low cost HD Map-based Localization

A Low Cost Vehicle Localization System based on HD Map

Abstract: This article proposes an HD map-based vehicle localization method for autonomous driving, in which low-cost sensors, i.e. cameras, consumer-level GPS and IMU, are used to match the online sensor data against an HD map consisting of a semantic layer and a visual layer. The visual layer serves unstructured roads, where visual key points are used for vehicle pose estimation. The semantic layer serves vehicle localization on structured roads, where landmarks such as lanes, road markings, traffic signs and traffic lights are extracted and matched with the same objects stored in the HD map.

1. Introduction

Localization in autonomous driving determines the vehicle's position, and HD map-based localization is one of the primary functions an HD map supports. Localization needs the perception module to provide data to match against the map, so the sensor fusion idea used in perception also applies to localization, i.e. combining GPS/IMU, camera, radar, LiDAR and the HD map [10].

GPS (Global Positioning System) is the traditional method for localization, but the propagation of electromagnetic waves through the atmosphere introduces timing errors due to factors such as weather, atmospheric conditions, tall buildings and hills, which in turn cause distance estimation errors. To compensate for this kind of error, differential GPS (such as RTK) has been proposed.

Two other localization devices are commonly used on vehicles: odometry and IMU. Odometry uses wheel encoders to record wheel motion and estimate the vehicle position, while the IMU uses accelerometers and gyroscopes. Odometry errors come from wheel slippage, and the IMU suffers from integration drift. A common approach is to combine GPS, odometry and IMU to realize dead reckoning.

LiDAR data can be used for localization, either as point clouds or as reflectivity maps. Direct point cloud matching with the classical ICP (iterative closest point) algorithm is computationally expensive, so faster algorithms have been proposed, such as NDT (normal distributions transform) [1]. Histogram filters [2] and particle filters [4] have been used for reflectivity map-based matching. Some companies apply data compression to reduce the storage cost of HD maps for LiDAR-based localization, such as TomTom RoadDNA and Civil Maps Fingerprint Map, at the cost of some performance.

Camera-based localization is a well-known low-cost solution, such as VO (visual odometry) [6, 9] or visual SLAM (simultaneous localization and mapping) [7]. Image-based localization methods [3] have also been proposed and are used for loop closure detection in SLAM and for recovery from tracking failure in VO. VO and SLAM methods can be sparse, dense or semi-dense, among which the sparse methods are the most suitable for vehicle localization [7]. If an IMU is involved, visual-inertial methods work more stably [12, 16]. With GPS installed, localization can obtain a good initialization value [5].

Visual localization based on feature matching alone is still fragile, so semantic maps are considered, where lanes, road markings, traffic signs and lights are stored as semantic objects in the low-cost HD map [8, 11]. To some extent, the drift errors of vision-only methods can be corrected by landmark-based matching.

Utilizing low-cost sensors, such as cameras, consumer-level GPS and IMU, for HD map-based vehicle localization will greatly facilitate the widespread use of low-cost HD maps in autonomous driving.

Popescu et al. proposed a localization method for vehicles at intersections [6]. An extended digital map (EDM) with accurate lane locations at intersections is used, where a data alignment algorithm overlays the landmarks stored in the EDM onto the detected landmarks. GPS assists the camera in determining the lane in which the ego vehicle is driving. A Bayes filter is formulated to carry out the whole localization process. However, no IMU is used, the EDM has no visual layer, and only intersections are considered for localization.

Uber's research work in [5] uses LiDAR, camera, GPS and IMU for low-cost HD map-based vehicle localization. The HD map stores semantic objects such as lanes and signs, and map matching is done in the bird's-eye view (BEV), with the LiDAR providing the spatial transform information. A Bayes filter fuses all the sensor information for vehicle localization. Clearly this localization system is not cheap, and transforming traffic signs from the image to the bird's-eye view would be difficult without the LiDAR's help. No visual layer is considered for unstructured roads.

In this article, we propose an HD map-based low-cost vehicle localization system. Both structured and unstructured roads are handled: the former uses the semantic layer of the HD map and the latter the visual layer. The sensors, i.e. a camera and consumer-level GPS and IMU, capture data in real time to run visual odometry/visual-inertial odometry, lane detection, road marking detection, traffic sign/light detection and HD map matching. The multi-sensor fusion for localization is formulated in a particle filter framework.

2. Low Cost Vehicle Localization based on HD map

As shown in Figure 1, we propose a localization system whose inputs are GPS, IMU and camera data, together with the given HD map. If the road is structured, lanes, road markings, traffic signs and lights are detected for HD map matching. Since landmarks such as lanes and road markings lie on the road surface, we apply inverse perspective mapping (IPM) to the detected results and match them with the HD map. For landmarks standing upright, such as traffic signs and traffic lights, we project the corresponding map landmarks onto the frontal camera's image plane for matching.

Figure 1.

If the road is unstructured and no semantic landmarks can be found in the image, we instead run visual odometry or visual-inertial odometry along with the GPS/IMU input for HD map matching. Finally, all of these matchings are fused in a Bayesian particle filter framework to generate continuous, optimized localization results.

2.1 Lane detection

Lane detection is a kind of pixel-level partial segmentation; here Spatial CNN (SCNN) [16] is adopted. The lanes can be dashed or solid, single or double, white or yellow. The lanes are represented by straight line segments or curve segments (formulated, for example, by B-splines).
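
To make the lane representation concrete, here is a minimal Python sketch that fits a detected lane, already projected onto the road plane, with a low-order polynomial and resamples it at fixed longitudinal steps for map matching. It assumes the SCNN output has already been grouped into per-lane point sets; the function names are illustrative only.

import numpy as np

def fit_lane_curve(points_xy, degree=3):
    # points_xy: (N, 2) lane points in road-plane coordinates after IPM,
    # x = lateral offset, y = longitudinal distance ahead of the vehicle.
    pts = np.asarray(points_xy, dtype=float)
    # Fit the lateral offset as a polynomial function of longitudinal distance;
    # a B-spline could be substituted for a smoother curve representation.
    return np.polyfit(pts[:, 1], pts[:, 0], degree)

def sample_lane(coeffs, y_max=50.0, step=1.0):
    # Resample the fitted curve at fixed longitudinal intervals, e.g. to
    # compare against the lane geometry stored in the HD map.
    ys = np.arange(0.0, y_max, step)
    xs = np.polyval(coeffs, ys)
    return np.stack([xs, ys], axis=1)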

2.2 Road markings detection and segmentation

Road markings such as turn arrows and characters (e.g. speed limits, 'ONLY', 'TURN', 'SCHOOL', 'KEEP CLEAR', 'EXIT', 'NO PARKING', etc.) can be extracted by segmentation; similar to Section 2.1, PSP-Net [15] is suggested. The shapes of these characters and arrows can likewise be represented by straight line and curve segments.
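
The sketch below, a rough Python illustration rather than the exact pipeline, shows one way to turn a road-marking segmentation mask (e.g. a PSP-Net class map thresholded to a binary image) into a small set of polygon control points; OpenCV 4.x is assumed and the thresholds are arbitrary.

import cv2
import numpy as np

def extract_marking_control_points(mask, min_area=200.0, eps_ratio=0.01):
    # mask: uint8 binary image, 255 where the road-marking class is predicted.
    # OpenCV 4.x returns (contours, hierarchy) from findContours.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    markings = []
    for cnt in contours:
        if cv2.contourArea(cnt) < min_area:
            continue  # discard small speckles from the segmentation
        # Approximate each blob with a polygon; its vertices serve as the
        # control points that are later mapped by IPM and matched to the HD map.
        eps = eps_ratio * cv2.arcLength(cnt, True)
        poly = cv2.approxPolyDP(cnt, eps, True)
        markings.append(poly.reshape(-1, 2).astype(np.float32))
    return markings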

Road marking detection and recognition is addressed in [21]. Utilizing geometric constraints such as the vanishing point, further extensive work is reported in [22].

2.3 Traffic sign and traffic light detection

As a 2D object detection task, we again refer to fast one-stage detection methods such as YOLOv3 [17] and SSD-based detectors [24]. The traffic signs' shapes can be rectangles, circles, triangles, polygons and diamonds. The corners and masks of those signs can serve as features for map-based localization. Joint detection and classification of traffic signs is presented in [23].
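
As a small illustration, and assuming the detector returns axis-aligned boxes with class labels and scores (the dictionary keys below are made up for this sketch), the bounding-box corners of confident detections can be collected as the 2D features matched against the projected map elements:

import numpy as np

def sign_corner_features(detections, min_score=0.5):
    # detections: list of dicts such as
    #   {'bbox': (x1, y1, x2, y2), 'cls': 'stop_sign', 'score': 0.9}
    features = []
    for det in detections:
        if det['score'] < min_score:
            continue
        x1, y1, x2, y2 = det['bbox']
        # Four box corners as 2D image features for matching against the
        # HD map elements projected into the frontal camera image.
        corners = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], dtype=np.float32)
        features.append((det['cls'], corners))
    return features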

2.4 Visual odometry/Visual inertial odometry

For unstructured roads, visual features are used for localization along with GPS and IMU. GPS can provide an initial guess of the vehicle location or predict a road loop closure ahead of time. Without the IMU, visual odometry (VO) [8, 11] is run alone. With the IMU's assistance, visual-inertial odometry (VIO) [14, 18] is used to obtain better results. Note that for structured roads, VO/VIO is not needed.
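
The sketch below shows a bare-bones two-frame monocular VO step with OpenCV (ORB matching, essential matrix, relative pose recovery); it is only an illustration under simplifying assumptions, not the full VO/VIO pipeline, and the recovered translation is up to an unknown scale that GPS or the map must fix.

import cv2
import numpy as np

def relative_pose(img_prev, img_curr, K):
    # img_prev, img_curr: grayscale frames; K: 3x3 camera intrinsic matrix.
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img_prev, None)
    kp2, des2 = orb.detectAndCompute(img_curr, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # Essential matrix with RANSAC, then decompose it into R, t.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # rotation and unit-length translation from previous to current frame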

2.5 HD map matching

The HD map includes two layers, a semantic one and a visual one. For the visual layer, we match the reconstructed 3D point cloud of key points with the key points detected from the camera image based on PnP (Perspective-n-Point). For the semantic layer, the detected landmarks are matched with the landmarks stored in the map.
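
For the visual layer, a minimal PnP sketch with OpenCV might look as follows; it assumes the 2D-3D correspondences between detected key points and the map's reconstructed 3D points have already been established by descriptor matching, which is the hard part in practice.

import cv2
import numpy as np

def localize_against_visual_layer(map_pts_3d, img_pts_2d, K, dist=None):
    # map_pts_3d: (N, 3) points from the HD map's visual layer;
    # img_pts_2d: (N, 2) corresponding key points detected in the current image.
    if dist is None:
        dist = np.zeros(4)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(map_pts_3d, dtype=np.float32),
        np.asarray(img_pts_2d, dtype=np.float32), K, dist)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)       # rotation: map frame -> camera frame
    cam_pos = -R.T @ tvec.ravel()    # camera (vehicle) position in the map frame
    return R, tvec, cam_pos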

Figure 2.

As shown in Figure 2, the landmarks on a structured road can be divided into two groups for map matching. The first group comprises lanes and road markings: based on camera calibration and a flat-road-plane assumption, we run IPM [19] to map the detected landmarks onto the road plane, where they are matched directly with the corresponding HD map elements (given the vehicle position and orientation on the road, which determine the bird's-eye view of the HD map). The second group comprises traffic signs and traffic lights, which usually stand upright, so using the calibration matrix and the camera pose, we project the corresponding HD map elements onto the frontal camera's image plane, where they are matched with the detected landmarks.
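
The core of the IPM step is simply back-projecting a pixel onto the plane z = 0 under the flat-road assumption; a minimal sketch, assuming known intrinsics and camera-to-road extrinsics, is:

import numpy as np

def pixel_to_road_plane(uv, K, R_cam_to_road, cam_pos_in_road):
    # uv: (u, v) pixel of a detected lane/road-marking point;
    # K: 3x3 intrinsics; R_cam_to_road: rotation from camera to road frame;
    # cam_pos_in_road: camera position in the road frame (z axis pointing up).
    ray_cam = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])  # viewing ray
    ray_road = R_cam_to_road @ ray_cam
    # Intersect the ray with the road plane z = 0 (flat-road assumption).
    s = -cam_pos_in_road[2] / ray_road[2]
    ground = cam_pos_in_road + s * ray_road
    return ground[:2]  # (x, y) on the road plane, ready for map matching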

Lane matching mainly serves lateral localization, while traffic signs/lights mostly contribute to longitudinal localization. Road markings contribute both laterally and longitudinally.

2.6 Sensor fusion in a particle filter

The particle filter [4, 20] is a sequential Monte Carlo (SMC) method under the Bayesian framework. Its idea is to use samples (particles) to approximate and propagate a nonlinear (non-Gaussian) distribution, rather than approximating it with a Gaussian as a linear Kalman filter or EKF does. The weight of each particle is computed from the likelihood, which in our case is the matching score against the HD map.

The initial guess of the particle filter state (the vehicle pose) is obtained from GPS and IMU; otherwise we can run camera-based motion and pose estimation. The state transition model can be a constant-velocity motion model, and the particles are generated around the predicted pose according to a simple distribution (such as a Gaussian or a uniform distribution). For each particle, the HD map matching errors are computed to obtain its weight. The final pose is then computed as the weighted average of the particles.

Assume the vehicle pose has 3 DOF (degrees of freedom), i.e. 2D on-road position and heading angle. The distribution used to generate particles around the predicted vehicle pose is a Gaussian with a spread of about ±5° in heading angle and ±10 meters in each of the two position directions.

The weight of particle x at time instant t can be formulated as

w_t(x) = c · p_GPS(x) · p_IMU(x) · p_map(x),

where c is the normalization constant, p_GPS is the GPS term (GPS error-based likelihood), p_IMU is the IMU term (IMU error-based likelihood) and p_map is the map term. The map term p_map is calculated for structured and unstructured roads respectively as

p_map(x) = p_lane(x) · p_roadsign(x) · p_trafficsign(x) · p_trafficlight(x)   (structured road)
p_map(x) = p_VO(x) or p_VIO(x)   (unstructured road)

with p_lane as the lane term (lateral lane displacement error-based likelihood), p_roadsign as the road markings term (road marking control point error-based likelihood), p_trafficsign as the traffic sign term (traffic sign control point error-based likelihood), p_trafficlight as the traffic light term (traffic light bounding rectangle control point error-based likelihood), and p_VO or p_VIO as the odometry term (PnP-based visual key point error likelihood).

Finally, the vehicle pose estimate is computed as the weighted average of the particles:

x̂_t = Σ_i w_t(x_i) · x_i

Before the particle propagation for the next time instant, sampling importance resampling (SIR) is performed.
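
Putting the pieces together, the following is a minimal sketch of the predict/weight/resample loop described above; the per-particle log-likelihood callables for the GPS, IMU and map terms are placeholders for the actual error models, and the noise parameters are arbitrary.

import numpy as np

rng = np.random.default_rng(0)

def predict(particles, v, yaw_rate, dt, sigma_xy=0.5, sigma_yaw=0.02):
    # Constant-velocity motion model plus Gaussian diffusion.
    # particles: (N, 3) array of [x, y, heading].
    n = len(particles)
    th = particles[:, 2] + yaw_rate * dt + rng.normal(0, sigma_yaw, n)
    x = particles[:, 0] + v * dt * np.cos(th) + rng.normal(0, sigma_xy, n)
    y = particles[:, 1] + v * dt * np.sin(th) + rng.normal(0, sigma_xy, n)
    return np.stack([x, y, th], axis=1)

def update(particles, gps_loglik, imu_loglik, map_loglik):
    # Weight = product of GPS, IMU and map-matching likelihoods (w_t(x) above),
    # computed here in log space for numerical stability.
    logw = gps_loglik(particles) + imu_loglik(particles) + map_loglik(particles)
    w = np.exp(logw - logw.max())
    w /= w.sum()                                  # normalization constant c
    pose = (w[:, None] * particles).sum(axis=0)   # weighted-average pose estimate
    # Note: averaging the heading this way ignores angle wrap-around.
    return w, pose

def resample_sir(particles, w):
    # Sampling importance resampling before the next propagation step.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]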

3. Summary

This article discusses HD map-based low-cost localization for different road types, structured and unstructured. For structured roads, the semantic layer of the map is used to match the landmarks detected in the image captured by the frontal camera. For unstructured roads, the visual layer of the map is queried to match the visual features extracted from the image via visual odometry or visual-inertial odometry. Together with the GPS and IMU data, the camera data are fused in a particle filter framework for localization.

References

1. P. Biber et al., "The normal distributions transform: A new approach to laser scan matching," IEEE/RSJ IROS, 2003.

2. J. Levinson, M. Montemerlo, and S. Thrun, "Map-based precision vehicle localization in urban environments," RSS, 2007.

3. M. Cummins and P. Newman, "FAB-MAP: Probabilistic localization and mapping in the space of appearance," IJRR, 2008.

4. J. Levinson and S. Thrun, "Robust vehicle localization in urban environments using probabilistic maps," IEEE ICRA, 2010.

5. W. Ma et al., "Exploiting Sparse Semantic HD Maps for Self-Driving Vehicle Localization," arXiv 1908.03274, 2019.

6. V. Popescu et al., "Lane Identification and Ego-Vehicle Accurate Global Positioning in Intersections," IEEE IV, 2011.

7. Z. Tao et al., "Mapping and localization using GPS, lane markings and proprioceptive sensors," IEEE IROS, 2013.

8. C. Forster, M. Pizzoli, and D. Scaramuzza, "SVO: Fast semi-direct monocular visual odometry," IEEE ICRA, 2014.

9. R. Mur-Artal, J. Montiel, and J. D. Tardos, "ORB-SLAM: A versatile and accurate monocular SLAM system," IEEE Trans. Robot., 2015.

10. X. Qu, B. Soheilian, and N. Paparoditis, "Vehicle localization using mono-camera and geo-referenced traffic signs," IEEE IV, 2015.

11. J. Engel, V. Koltun, and D. Cremers, "Direct sparse odometry," IEEE T-PAMI, 2017.

12. J. K. Suhr et al., "Sensor Fusion-Based Low-Cost Vehicle Localization System for Complex Urban Environments," IEEE T-ITS, 2017.

13. Y. Lu et al., "Monocular Localization in Urban Environments using Road Markings," IEEE IV, 2017.

14. T. Qin, P. Li, and S. Shen, "VINS-Mono: A robust and versatile monocular visual-inertial state estimator," IEEE Trans. Robot., 2018.

15. H. Zhao et al., "Pyramid scene parsing network," IEEE CVPR, 2017.

16. X. Pan, J. Shi, P. Luo, X. Wang, and X. Tang, "Spatial As Deep: Spatial CNN for Traffic Scene Understanding," AAAI, 2018.

17. J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv 1804.02767, 2018.

18. T. Schneider et al., "maplab: An Open Framework for Research in Visual-inertial Mapping and Localization," IEEE Robotics and Automation Letters, 2018.

19. A. Ranganathan, D. Ilstrup, and T. Wu, "Light-weight Localization for Vehicles using Road Markings," IEEE IROS, 2013.

20. W. Lu et al., "Lane Marking Based Vehicle Localization Using Particle Filter and Multi-Kernel Estimation," ICARCV, 2014.

21. O. Bailo et al., "Robust Road Marking Detection and Recognition Using Density-Based Grouping and Machine Learning Techniques," IEEE WACV, 2017.

22. S. Lee et al., "VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition," arXiv 1710.06288, 2017.

23. Z. Zhu et al., "Traffic-Sign Detection and Classification in the Wild," IEEE CVPR, 2016.

24. J. Mueller and K. Dietmayer, "Detecting Traffic Lights by Single Shot Detection," arXiv 1805.02523, 2018.

25. C. Yu et al., "DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments," arXiv 1809.08379, 2018.

26. M. Bloesch et al., "Robust Visual Inertial Odometry Using a Direct EKF-Based Approach," IEEE IROS, 2015.