Mathematical background

Fig. 5.1

In this section, we discuss the relevant mathematical background for our image processing goals.

Hilbert space and Euclidean distance

One of the foundations in this class is the concept of a Hilbert space. A Hilbert space is a vector space that is equipped with an inner product rule and the "completeness" property. Inside a Hilbert space, we can determine how similar two elements are by calculating the distance between them. Although the RGB color space is not a Hilbert space (it is not closed under addition, so it is not a vector space in the first place), the concept of the Euclidean distance in a Hilbert space can be carried over to the RGB color space. A slightly modified version of this concept (the "redmean" approximation) is used in the color detection algorithm.

Fig. 5.2

Change of basis

Information can be represented in different ways. A basis for a vector space is a set of linearly independent vectors in that space that span the entire space, which can be used to represent any arbitrary vector in the space. Changing the representation of a vector from one space to another does not modify the fundamental information represented by the vector. In the context of image processing, an image can be represented in different color spaces. Two of the commonly used color spaces are the RGB (red, green, and blue) space and the YIQ (luminance and chrominance) space. When converting a colored image to a grayscale image, as in the first step of the stage of chessboard segmentation in our chessboard coordinates determination algorithm, the function we use (rgb2gray()) essentially first converts the image from the RGB space to the YIQ space and then eliminates the chrominance components while retaining the luminance component.

Fig. 5.3

Commutativity of systems

One important property of linear, time-invariant (LTI) systems is commutativity. The commutativity property means that regardless of the order that we arrange two LTI systems, the net input-output relation is the same. This concept relates to our project because two of the shape-based filtering operations we have repeatedly used throughout the development of our algorithms are called "erosion" and "dilation." These two operations are most commonly used together as opposed to individually, but they turn out not to be commutative (see the diagrams below). Therefore, we always need to pay careful attention to the correct order that these two operations are applied to our images.

Fig. 5.4

Fig. 5.5

When we perform the erosion operation followed by the dilation operation, the overall operation is called "morphological opening," whose net effect is to remove the small foreground regions surrounded by the background region. See the stage of chessboard segmentation in our chessboard coordinates determination algorithm for a concrete example.

In contrast, when we perform the dilation operation followed by the erosion operation, the overall operation is called "morphological closing," whose net effect is to remove the small background regions surrounded by the foreground region. See the stage of segmentation of connected squares in our chessboard coordinates determination algorithm for a concrete example.

For a more detailed explanation of morphological operations, see Types of Morphological Operations - MATLAB & Simulink (mathworks.com).

Hough transform

One of the important image processing tools that are not covered in class is the Hough transform. The Hough transform is a feature extraction technique that can be used to detect arbitrary shapes (most commonly straight lines, circles, and ellipses) in an image using a voting procedure. The heart of our circle detection algorithm is the imfindcircles() function from MATLAB's Image Processing Toolbox, which uses the theories of circular Hough transforms to achieve consistently high performance even in the presence of noise, occlusion, and varying illumination. The basic procedures for estimating the centers of the circles in an image using the circular Hough transform are as follows:

1) we define the "parameter space" (also known as the "accumulator space") of our input image

2) we find the high-gradient foreground pixels (also known as the "edge pixels") of the input image

3) we have each of those edge pixels cast "votes" in the parameter space

4) the positions the local maxima (also known as "accumulator points") in the parameter space are the estimated centers of the circles in the input image

Fig. 5.6

Windowing/masking

We have seen in this class that window signals can be used to, for example, obtain a finite impulse response filter from an infinite impulse response filter. The basic idea is that a window signal has finite duration by definition, so when we multiply it by a signal of infinite duration, the resulting signal is also going to have finite duration. This concept can be generalized to the context of image processing. One of the most common techniques that we have employed in the project is to create a binary image (which only has values of zeros and ones) from the original input image and then multiply it (elementwise) by the original input image. The resulting image is a "masked version" of the input image that only has nonzero values where the binary image has nonzero values. The binary image in this case can thus be viewed as a generalized window signal. See the last step of the stage of chessboard segmentation in our chessboard coordinates determination algorithm for a concrete example.

Fig. 5.7

Edge detection

In one of the image-processing homework problems in this class, we had the opportunity to develop an edge detection algorithm using Sobel operators. The algorithm uses two 3 x 3 matrices and convolve them with the input image. In the first step of the stage of separation of connected squares in our chessboard coordinates determination algorithm, we use precisely this Sobel edge detector to obtain the boundary pixels of the inner portion of the chessboard.

In our own research, however, we have found that the Canny edge detector could be especially effective when working with chessboard images. Indeed, in the first step of the stage of segmentation of connected squares in our chessboard coordinates determination algorithm, we choose the Canny edge detector to obtain the edges of the segmented chessboard image.

Fig. 5.8