Post date: Mar 2, 2010 11:05:08 PM
This was the title of my master's thesis for the M.Sc. in Artificial Intelligence, Pattern Recognition and Digital Image. It was developed as part of a multi-camera 3D vision project at the Technological Institute of Computer Science, as a solution to the camera calibration problems we had been having until then with an implementation of Tsai's algorithm.
The thesis thoroughly reviews the mathematics of the camera calibration problem and proposes several calibration methods based on the Tsai camera model and Zhang's work, with one goal in mind: robustness. The idea behind the proposed methods is to compute an initial closed-form transformation assuming an undistorted environment, extract its camera parameters, and use them as the starting point for a non-linear optimization.
The first step can be done by means of the normalized Direct Linear Transformation (DLT) method. Given a set of n >= 6 correspondences between 2D image points and non-planar 3D points, a 3x4 projection matrix can be calculated. This requires finding the minimal vector of an overdetermined system built from the input points, where each point contributes 2 equations (so with 6 points, 2 equations x 6 points = 12 equations, one per entry of the 3x4 matrix). The Singular Value Decomposition provides this vector: the right singular vector associated with the smallest singular value (equivalently, the eigenvector of A^T * A with the smallest eigenvalue, A being the system matrix) is the desired solution. This is in fact equivalent to solving a total least-squares problem.
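As a rough sketch of this step (not the thesis code; Python with NumPy, function and variable names are mine, and Hartley's point normalization is omitted for brevity):

```python
import numpy as np

def dlt_projection(points_3d, points_2d):
    """Estimate a 3x4 projection matrix P from n >= 6 non-planar
    2D-3D correspondences via the DLT (sketch, normalization omitted).

    Each correspondence contributes two linear equations in the 12
    entries of P; the total least-squares solution is the right
    singular vector of the stacked system matrix A associated with
    its smallest singular value.
    """
    n = len(points_3d)
    A = np.zeros((2 * n, 12))
    for i, (X, x) in enumerate(zip(points_3d, points_2d)):
        Xh = np.append(X, 1.0)   # homogeneous 3D point
        u, v = x
        A[2 * i, 0:4] = Xh
        A[2 * i, 8:12] = -u * Xh
        A[2 * i + 1, 4:8] = Xh
        A[2 * i + 1, 8:12] = -v * Xh
    # Minimal right singular vector = eigenvector of A^T A with the
    # smallest eigenvalue; SVD gives it directly and stably.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```

The recovered matrix is defined only up to scale (and sign), as expected for a homogeneous solution.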
Once the undistorted 3x4 projection matrix has been calculated, the camera parameters are extracted by applying the RQ decomposition to its leftmost 3x3 submatrix, with some sign adjustments, yielding an upper triangular camera-to-image matrix and an orthogonal rotation matrix. The first matrix encodes all intrinsic camera parameters except radial distortion, which is left at zero; the second is the camera rotation from the extrinsic parameters. The camera position is then computed from the 4th column of the original matrix using the two matrices just calculated.
In the case of planar 3D points the above approach produces singularities during the Singular Value Decomposition step: the generated system has two fewer degrees of freedom (the out-of-plane rotations) and doesn't carry enough information to compute all the required variables. Here Zhang's approach must be used. Since one set of 2D to planar 3D correspondences doesn't provide enough information for a 3x4 matrix, the equations are simplified, without loss of generality, to the plane z = 0. This allows the Direct Linear Transformation method to build a 3x3 homography matrix. Multiple homographies from non-parallel views of the same 3D points can then be combined to compute the camera intrinsic parameters. This is done in a fashion very similar to the DLT: each homography (one per view) provides 2 new equations for an overdetermined system that is again solved with the Singular Value Decomposition.
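The planar DLT is the same SVD scheme with a 3x3 unknown instead of 3x4; as a sketch (my own naming, normalization again omitted):

```python
import numpy as np

def dlt_homography(points_plane, points_2d):
    """Estimate the 3x3 homography H mapping plane points (z = 0,
    given as (x, y)) to image points, by the same DLT/SVD scheme as
    the non-planar case (sketch).
    """
    n = len(points_plane)
    A = np.zeros((2 * n, 9))
    for i, ((X, Y), (u, v)) in enumerate(zip(points_plane, points_2d)):
        p = np.array([X, Y, 1.0])   # homogeneous plane point
        A[2 * i, 0:3] = p
        A[2 * i, 6:9] = -u * p
        A[2 * i + 1, 3:6] = p
        A[2 * i + 1, 6:9] = -v * p
    # Right singular vector of the smallest singular value, as before.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)
```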
At least 3 views (6 equations) are required to compute all 6 unknowns. When there is not enough data, constraints like forcing the skew factor to zero or fixing the optical center at the image center can be added to raise the rank of the system, at the cost of some precision. Once the intrinsic parameters have been calculated, the intrinsic camera-to-image matrix can be built, and then the extrinsic parameters can be computed from one of the 3x3 homography matrices.
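A sketch of the combination step, in Zhang's usual formulation: each homography H = [h1 h2 h3] constrains the symmetric matrix B = K^-T K^-1 (6 unknowns, hence the 3 views above). Zhang derives closed-form formulas for K from B; this sketch uses a Cholesky factorization instead, which is algebraically equivalent (names are mine):

```python
import numpy as np

def intrinsics_from_homographies(Hs):
    """Recover the intrinsic matrix K from >= 3 planar homographies
    (Zhang-style sketch).

    Each H = [h1 h2 h3] yields h1^T B h2 = 0 and
    h1^T B h1 = h2^T B h2 on B = K^-T K^-1; stacking these gives an
    overdetermined system in B's 6 entries, solved with the SVD.
    """
    def v(H, i, j):
        # Constraint row for hi^T B hj with b = (B11,B12,B22,B13,B23,B33).
        hi, hj = H[:, i], H[:, j]
        return np.array([
            hi[0] * hj[0],
            hi[0] * hj[1] + hi[1] * hj[0],
            hi[1] * hj[1],
            hi[2] * hj[0] + hi[0] * hj[2],
            hi[2] * hj[1] + hi[1] * hj[2],
            hi[2] * hj[2],
        ])

    V = []
    for H in Hs:
        V.append(v(H, 0, 1))               # orthogonality constraint
        V.append(v(H, 0, 0) - v(H, 1, 1))  # equal-norm constraint
    _, _, Vt = np.linalg.svd(np.asarray(V))
    b = Vt[-1]
    B = np.array([[b[0], b[1], b[3]],
                  [b[1], b[2], b[4]],
                  [b[3], b[4], b[5]]])
    if B[0, 0] < 0:                # b is defined only up to sign
        B = -B
    L = np.linalg.cholesky(B)      # B = L L^T with L proportional to K^-T
    K = np.linalg.inv(L).T
    return K / K[2, 2]
```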
In any case, these camera parameters are then fed to a Levenberg-Marquardt non-linear minimization that includes radial distortion factors (only one in this case, as experiments showed it was enough). Several error criteria based on the transformation errors are proposed for minimization. The thesis also introduces the concept of undistorted cameras, where the models are trained again with undistorted points (the input distortion is removed using the calibrated intrinsic parameters) assuming null radial distortion. A distortion map is also built for the input images, so that removing distortion from a given 2D image point requires only a table lookup. This simplifies and accelerates the world-to-image and image-to-world coordinate transformations to the linear case, based on the original 3x4 projection matrix.
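The lookup-table idea can be sketched like this, assuming a one-coefficient radial model around the principal point with zero skew (the exact model and inversion scheme in the thesis may differ; names are mine):

```python
import numpy as np

def build_undistortion_map(width, height, K, k1, iters=10):
    """Precompute, for every pixel of a width x height image, its
    undistorted pixel coordinates under x_d = x_u * (1 + k1 * r_u^2)
    in normalized camera coordinates (skew assumed zero).

    Inverting the model needs a few fixed-point iterations, which
    the table pays only once; lookups afterwards are O(1).
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    xd = (u - cx) / fx             # normalized distorted coordinates
    yd = (v - cy) / fy
    xu, yu = xd.copy(), yd.copy()
    for _ in range(iters):
        r2 = xu ** 2 + yu ** 2     # fixed point: x_u = x_d / (1 + k1 r_u^2)
        xu = xd / (1.0 + k1 * r2)
        yu = yd / (1.0 + k1 * r2)
    # Back to pixels: map[v, u] = (u_undistorted, v_undistorted).
    return np.stack([xu * fx + cx, yu * fy + cy], axis=-1)

def undistort_point(umap, u, v):
    """Removing distortion from an image point is now a table lookup."""
    return umap[int(round(v)), int(round(u))]
```

After this lookup, points live in the linear pinhole model, so the plain projection matrix applies in both directions.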
Another advantage of the proposed methods is that the intrinsic and extrinsic parameters can be calibrated separately. This was especially important in our project, as the cameras were installed in fixed positions and could not be moved later. All intrinsic parameter calibration can be performed before installation, leaving room to try many different approaches to extrinsic parameter calibration afterwards.
The results obtained with the different proposed algorithms were quite good. In fact, the separate calibration methods showed extraordinary robustness against input noise, which was one of the reasons this thesis was developed in the first place: our implementation of Tsai's algorithm was too sensitive to environmental noise in the images or the positions, leading too easily to numerical instabilities in the camera parameters. A simple comparison: with Gaussian noise of 1 ~ 2 pixels of standard deviation, the Tsai implementation became completely unstable, while the best proposed methods remained quite stable up to noise of 11 ~ 13 pixels of standard deviation. Detailed results can be found in the thesis (this time it's written in English).
There is however one big point missing from the thesis. Though the title clearly says multi-view 3D vision environment, no multi-view calibration method is mentioned: all the proposed ones are per-camera calibration methods. Does this mean that multi-view information is not being used at all? Yes and no. Further research was done on multi-view extrinsic parameter adjustment, with very good results. In fact the method is quite innovative, and we're planning to publish a paper about it as soon as we have enough time. Because of the thesis deadline, however, it could not be included in the final contents. It's a shame, but not so strange considering that everything was developed, implemented and written in only about a month and a half. Before that, I had no idea about the Tsai camera model or camera calibration at all, and my thesis subject was going to be my speaker recognition project on the iPhone 3G.
Now, thanks to this thesis and the later research we performed on multi-view extrinsic calibration, all the camera calibration problems in our project have been solved. Again, if you have any questions, please don't hesitate to ask.