Research in the field of image processing grew significantly in the 1990s, and with it came the need for stereo image rectification. Various authors, such as Richard Hartley and Andrea Fusiello, have proposed methods and provided solid mathematical foundations for image rectification.
What is Stereo Images Rectification?
It's a transformation of each image such that pairs of conjugate epipolar lines become collinear and parallel to the horizontal axis.
In most cases, two cameras pointing in the same direction - such as in a stereoscopic kit - are slightly misaligned, which means their CMOS (or other) sensors aren't coplanar. The next figure illustrates this situation.
The illustrated cameras are both pointing to object X, although from different positions. The blue shadow represents the camera sensor.
The rectification of these images consists of transforming the image coordinates so as to simulate the environment represented in the next figure.
In this figure, the "virtual" cameras are coplanar, and horizontally aligned.
What is it for?
Stereo vision uses triangulation based on epipolar geometry to determine distance to an object.
Between two cameras there is the problem of finding, in the image of one camera, the point or object seen by the other (the correspondence problem). In most camera configurations, finding correspondences requires a search in two dimensions. This search becomes much simpler for rectified images: to find the point corresponding to an object in the other image, we just need to look along a horizontal line, at the same Y coordinate as the object. This means that disparities between the images are in the x-direction only (no y disparity).
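Once a correspondence is found in a rectified pair, the triangulation mentioned above reduces to a one-line formula: with focal length f (in pixels), baseline B (the distance between the camera centres), and disparity d = xL - xR, the depth is Z = f * B / d. A minimal sketch (the focal length and baseline values in the usage comment are illustrative, not taken from this project):

```python
def depth_from_disparity(f_px, baseline_m, x_left, x_right):
    """Depth of a point from a rectified stereo pair.

    f_px       -- focal length in pixels
    baseline_m -- distance between the two camera centres, in metres
    x_left     -- x coordinate of the point in the left image
    x_right    -- x coordinate of the same point in the right image
    (y coordinates are equal in a rectified pair, so they don't appear)
    """
    d = x_left - x_right  # disparity, in the x-direction only
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return f_px * baseline_m / d

# Example with illustrative numbers: f = 700 px, B = 0.10 m, disparity = 10 px
# depth_from_disparity(700, 0.10, 350, 340) -> 7.0 metres
```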
The next figure, from Wikipedia, represents both cases (of unrectified and rectified images) and the search area required in each case.
As we can see, the search for a point, feature or object in a pair of rectified images is significantly simpler, because it is done in a narrow horizontal area.
As you will see in the following steps, the rectification process isn't always perfect, but even so it's very useful: the more precise the rectification is, the smaller the search area can be.
How is it done?
The process of rectification is normally divided into two main phases:
- Calculation of the required transformation;
- Application of the transformation to the images.
There are various methods to calculate the required transformation, and the choice between them depends on the camera setup. The most general case is the one implemented in this project, where every parameter is corrected except for lens distortion. This method is suited to arbitrary camera placement, with the objects in view relatively near the cameras, so that the images contain useful spatial (depth) information. This matters because the method is based on epipolar geometry, which can only be estimated if the filmed objects aren't coplanar. An example of a setup where this method shouldn't be applied is satellite photography (Google Maps, etc.): the objects (houses, etc.) are practically coplanar, forming a plane corresponding to the Earth's surface.
To calculate the required transformation based on epipolar geometry, we need to find a set of correspondences between the images, and then perform some matrix calculations.
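As a rough illustration of these matrix calculations, the fundamental matrix that encodes the epipolar geometry of the pair can be estimated from point correspondences with the normalized 8-point algorithm. The sketch below is a self-contained NumPy version for illustration, not this project's actual implementation (which runs on different hardware and tooling):

```python
import numpy as np

def eight_point(pts1, pts2):
    """Estimate the fundamental matrix F (x2^T F x1 = 0) from N >= 8
    matched points, using the normalized 8-point algorithm."""
    def normalize(p):
        # Translate points to their centroid and scale them, for numerical stability
        c = p.mean(axis=0)
        s = np.sqrt(2) / p.std()
        T = np.array([[s, 0, -s * c[0]],
                      [0, s, -s * c[1]],
                      [0, 0, 1.0]])
        ph = np.hstack([p, np.ones((len(p), 1))])
        return (T @ ph.T).T, T

    p1, T1 = normalize(np.asarray(pts1, float))
    p2, T2 = normalize(np.asarray(pts2, float))

    # Each correspondence contributes one row of the constraint matrix A f = 0
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1))])

    # Least-squares solution: right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)

    # Enforce rank 2 (a valid fundamental matrix is singular)
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0
    F = U @ np.diag(S) @ Vt

    # Undo the normalization
    return T2.T @ F @ T1
```

From F, rectifying transformations for the two images can then be derived (for instance as a pair of homographies); note the algorithm degenerates exactly when the scene points are coplanar, which is why this method needs non-coplanar objects in view.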
To apply the transformations to the video streams of both cameras, an FPGA-based bilinear interpolation was implemented.
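The second phase can be sketched in software as inverse warping with bilinear interpolation: for each output pixel, the inverse transformation gives a real-valued source coordinate, whose value is a weighted average of the four surrounding pixels. This is the same per-pixel arithmetic the FPGA performs; the homography-based warp below is an illustrative software analogue, not the FPGA design itself:

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a grayscale image at real-valued coordinates (x, y)."""
    h, w = img.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    # Weighted average of the four neighbouring pixels
    return (img[y0, x0] * (1 - dx) * (1 - dy) +
            img[y0, x1] * dx * (1 - dy) +
            img[y1, x0] * (1 - dx) * dy +
            img[y1, x1] * dx * dy)

def warp(img, H):
    """Apply a 3x3 homography H to img by inverse mapping + bilinear interpolation.
    Pixels that map outside the source image are left at 0."""
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    Hinv = np.linalg.inv(H)
    for yo in range(h):
        for xo in range(w):
            src = Hinv @ np.array([xo, yo, 1.0])
            xs, ys = src[0] / src[2], src[1] / src[2]
            if 0 <= xs <= w - 1 and 0 <= ys <= h - 1:
                out[yo, xo] = bilinear_sample(img, xs, ys)
    return out
```

In the actual system, H would be the rectifying transformation computed in the first phase, applied once per frame to each camera's video stream; a hardware implementation replaces the pixel loop with a streaming datapath but keeps the same four-tap weighted sum.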