The transformation required to rectify the video is given in the form of a matrix. To calculate this matrix we need the coordinates of at least eight correspondence pairs, from which the epipolar geometry is estimated.
The epipolar geometry, encoded by the fundamental matrix, is estimated using the eight-point algorithm. The result is fed back to the previous phase, where it is used to refine the correspondences, and the loop repeats until the correspondences remain unchanged for three iterations.
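The linear step of the eight-point algorithm described above can be sketched as follows. This is an illustrative implementation, not the project's code; it assumes the input coordinates have already been normalized (the N matrices discussed later), and the function name is ours.

```python
import numpy as np

def estimate_fundamental(pts1, pts2):
    """Estimate the fundamental matrix F (with x2^T . F . x1 = 0) from
    eight or more correspondences via the linear eight-point algorithm."""
    # One row of the homogeneous system A.f = 0 per correspondence.
    A = []
    for (x1, y1), (x2, y2) in zip(pts1, pts2):
        A.append([x2 * x1, x2 * y1, x2, y2 * x1, y2 * y1, y2, x1, y1, 1.0])
    A = np.asarray(A)
    # f is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value of F.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```

The rank-2 projection at the end is what guarantees the estimated F has a well-defined epipole (F.e = 0), which the later rectification steps rely on.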
After the final correspondences are obtained, the transformation matrices are calculated. These matrices represent the transformations needed to rectify the images.
In this project we want, for each coordinate of the final rectified image, to know the coordinate in the captured (unrectified) image to interpolate from. A different matrix has to be calculated for each image, independently, being:
H = D . [ C . T . G . R ]^(-1) . N (right image)
and
H' = D . [ C . G' . R' ]^(-1) . N (left image)
The matrices in these formulas are calculated in the following way:
N and D: the normalizing and denormalizing matrices. They map coordinates into the [-1,1] range, improving the numerical precision of the method, as described for the eight-point algorithm. In this project the cameras have a resolution of 640x480 pixels, so the matrices are:
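For reference, one common form of these matrices, mapping the 640x480 pixel range onto [-1,1] and back, can be written as below. The exact entries in the report are shown only in its figure, so this is an assumed convention, and the function name is ours.

```python
import numpy as np

def normalizer(width, height):
    """Matrix N: maps pixel coordinates [0,w]x[0,h] into [-1,1]^2."""
    return np.array([[2.0 / width, 0.0, -1.0],
                     [0.0, 2.0 / height, -1.0],
                     [0.0, 0.0, 1.0]])

N = normalizer(640, 480)
D = np.linalg.inv(N)   # denormalizer: back to pixel coordinates
```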
R and G: the matrices of the same name described by Hartley. They send the epipole of each image to the point at infinity on the horizontal axis, which makes the epipolar lines horizontal and parallel to each other. Their calculation requires the fundamental matrix, from which the epipole is then computed. Each camera's image has its own epipole, so the calculations are independent. The fundamental matrix is estimated as described for the eight-point algorithm, using SVD. The epipole is extracted from the fundamental matrix, since F.e = 0: applying SVD to the fundamental matrix yields the epipole of the right image, and applying it to the transpose of the fundamental matrix yields the epipole of the left image. With the coordinates of the epipole, the matrices R and G can be constructed as:
where f is the distance from the epipole to the origin, and theta is the angle between the horizontal axis and the line through the epipole and the origin.
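The epipole extraction and the construction of R and G can be sketched as follows. The G matrix here is the standard Hartley form that sends the point (f, 0, 1) to infinity; the function names are ours, and a finite epipole is assumed.

```python
import numpy as np

def epipole(F):
    """Right epipole e with F @ e = 0: the null vector of F, via SVD.
    (Use F.T to obtain the left image's epipole.)"""
    _, _, Vt = np.linalg.svd(F)
    e = Vt[-1]
    return e / e[2]              # assumes the epipole is finite

def rectifying_RG(e):
    """R rotates the epipole onto the positive x-axis; G then sends the
    rotated epipole (f, 0, 1) to infinity along that axis (Hartley)."""
    theta = np.arctan2(e[1], e[0])   # angle of the epipole from the origin
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[ c,  s, 0.0],
                  [-s,  c, 0.0],
                  [0.0, 0.0, 1.0]])  # rotation by -theta
    f = np.hypot(e[0], e[1])         # distance of the epipole to the origin
    G = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [-1.0 / f, 0.0, 1.0]])
    return R, G
```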
T: a matrix of scaling and vertical translation that makes the epipolar lines of the two images coincide. For its calculation we apply the previous matrices to the original coordinates, and then find k and d such that Y.k + d = Y'. This is equivalent to Y.k - Y' + d = 0, which is a homogeneous system, solvable by SVD. We can then construct the matrix as:
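The solve for k and d can be sketched as below: each correspondence contributes one row [y, 1, -y'] of a homogeneous system whose null vector, rescaled so its last component is 1, gives (k, d, 1). The function name and the exact layout of T are our assumptions, taking T as vertical-only scaling and translation as the text describes.

```python
import numpy as np

def fit_scale_offset(y, y_prime):
    """Find k, d such that y*k + d ~= y' for all rows, by solving the
    homogeneous system [y_i, 1, -y'_i] . (k, d, 1)^T = 0 via SVD."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([y, np.ones_like(y), -np.asarray(y_prime, dtype=float)])
    _, _, Vt = np.linalg.svd(A)
    v = Vt[-1]                       # null vector, up to scale
    k, d = v[0] / v[2], v[1] / v[2]
    # Assumed layout: T scales and translates the vertical axis only.
    T = np.array([[1.0, 0.0, 0.0],
                  [0.0, k,   d ],
                  [0.0, 0.0, 1.0]])
    return k, d, T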
C: a matrix that maximizes the visible portion of the area common to both images. This is very useful for stereoscopy, since only the common area can be analyzed. It performs the operation described in the next figure.
This is done by applying a matrix of scaling and translation. The scaling is the same in both directions, preserving the aspect ratio of the original images, while the translation is independent per axis. To find the scaling and translation values, the matrices calculated so far, L, are multiplied by the coordinates of the midpoints of the image edges, shown as red dots in the last figure:
This calculation is done with the matrices of each image, L and L', and the minimum values P1, P2, P3 and P4 are extracted. These values correspond to the inner red dots, one per direction, in the previous figure. With these points known, the matrix C is constructed as:
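Since the exact entries of C appear only in the report's figure, the sketch below is one plausible construction consistent with the text: a single uniform scale s (preserving the aspect ratio) mapping the common horizontal span [P1, P2] and vertical span [P3, P4] into a 640x480 output, with independent translations placing P1 and P3 at the origin. The function name and this exact recipe are assumptions.

```python
import numpy as np

def common_area_matrix(P1, P2, P3, P4, width=640, height=480):
    """Scale-and-translate matrix C: one uniform scale plus independent
    x/y translations, fitting the common area into the output frame."""
    s = min(width / (P2 - P1), height / (P4 - P3))  # uniform scale
    return np.array([[s,   0.0, -s * P1],
                     [0.0, s,   -s * P3],
                     [0.0, 0.0, 1.0]])
```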
With all the matrices known, we calculate H from the first formula and apply it to the videos of both cameras in real time. This implementation is described in the next phase.
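Putting it together, the composition of H and its use as an inverse mapping (each rectified pixel is mapped through H to find where to sample the original image) can be sketched as follows. Nearest-neighbour lookup stands in for whatever interpolation the project actually uses, and the function names are ours.

```python
import numpy as np

def compose_H(D, C, T, G, R, N):
    """H = D . [C . T . G . R]^(-1) . N  (right image; drop T for the
    left image). Maps rectified pixel coordinates back to the source
    coordinates to interpolate from."""
    return D @ np.linalg.inv(C @ T @ G @ R) @ N

def warp_nearest(img, H):
    """Inverse-map every rectified pixel through H and sample the
    source image with nearest-neighbour lookup."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    ys, xs = np.mgrid[0:h, 0:w]
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = H @ dst                      # homogeneous source coordinates
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    ok = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out[ys.ravel()[ok], xs.ravel()[ok]] = img[sy[ok], sx[ok]]
    return out
```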