Real-Time Camera Tracking Technology
Real-Time Camera Tracking Using a Voxel-Based Scene Representation
1. Research Objective
Real-time camera pose estimation (tracking) is actively studied in various fields such as augmented reality, autonomous driving, and robotics. This study proposes a camera pose estimation method based on 2D-3D matching with two convolutional neural networks (CNNs). The 3D space is represented with voxels, and RANSAC-based plane fitting is applied to the 3D points inside each voxel to determine the inlier points. An accurate camera pose is then obtained using deep networks that detect and describe the voxel regions and the corresponding 2D points in the image.
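The RANSAC-based plane fitting mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, iteration count, and distance threshold are assumptions, and the routine simply samples three points per iteration, fits a candidate plane, and keeps the hypothesis with the most inliers.

```python
import numpy as np

def ransac_plane(points, n_iters=200, thresh=0.02, seed=None):
    """Fit a dominant plane n.x + d = 0 to 3D points with RANSAC.

    points: (N, 3) array. Returns (normal, d, inlier_mask) for the
    plane hypothesis supported by the most inliers.
    """
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    best_plane = (np.array([0.0, 0.0, 1.0]), 0.0)
    for _ in range(n_iters):
        # Minimal sample: three non-collinear points define a plane.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:              # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n @ p0
        dist = np.abs(points @ n + d)  # point-to-plane distances
        mask = dist < thresh
        if mask.sum() > best_mask.sum():
            best_mask, best_plane = mask, (n, d)
    return best_plane[0], best_plane[1], best_mask
```

In the paper's setting, the inlier mask selects the interest points of the major plane within each voxel.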
2. Research Details
(1) Research Summary
The ability to estimate the pose of a six degree-of-freedom (6-DoF) camera is important in augmented reality, autonomous navigation, and robotics applications. We introduce a camera pose estimation method based on a 2D-3D matching scheme with two convolutional neural networks (CNNs). The scene is divided into voxels, whose size and number are computed according to the scene volume and the number of 3D points. We extract inlier points from the 3D point set in each voxel using random sample consensus (RANSAC)-based plane fitting to obtain a set of interest points lying on a major plane. These points are then reprojected onto the image using the ground-truth camera pose, after which a polygonal region is identified in each voxel using the convex hull. We designed a training dataset for 2D-3D matching, consisting of the inlier 3D points, the correspondences across image pairs, and the voxel regions in the image. We trained a hierarchical learning structure of two CNNs on this dataset to detect the voxel regions and obtain the locations and descriptors of the interest points. Following successful 2D-3D matching, the camera pose was estimated using an n-point pose solver within RANSAC. The experimental results show that our method estimates the camera pose more precisely than previous end-to-end estimators.
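The dataset-construction step described above (reproject inlier 3D points with the ground-truth pose, then take the convex hull to get a polygonal voxel region) can be sketched as below. This is a simplified illustration under an assumed pinhole model with intrinsics K and pose (R, t); the function names are hypothetical, and the hull uses Andrew's monotone chain rather than whatever routine the authors used.

```python
import numpy as np

def project_points(K, R, t, pts3d):
    """Pinhole projection x ~ K (R X + t); returns (N, 2) pixel coords."""
    cam = pts3d @ R.T + t            # camera-frame coordinates
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]    # perspective divide

def _cross2(o, a, b):
    """2D cross product of OA and OB (positive if left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(pts2d):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(map(tuple, pts2d))
    def half_hull(seq):
        out = []
        for p in seq:
            while len(out) >= 2 and _cross2(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out
    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    return np.array(lower[:-1] + upper[:-1])
```

Projecting a voxel's inlier points and hulling the result yields the polygonal image region used as a detection target for the first CNN.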
(2) Research Results
S. Lee, H. Hong, and C. Eem, "Voxel-Based Scene Representation for Camera Pose Estimation of a Single RGB Image," Applied Sciences, vol. 10, no. 24, 8866, 2020. (SCIE)