## Focus Measure Operators

This page presents additional information about the test sets, experimental results and source code of the paper "Analysis of focus measure operators for shape-from-focus" [Pertuz et al., 2013].

## Working principles of Focus measure operators

The aim of focus measure operators is to assess the degree of sharpness, or degree of focus, of an image or image pixel. As illustrated below, the image on the right was captured by focusing the camera on the background, whereas the image on the left was focused on the foreground object. This information is relevant for multiple applications, such as autofocusing, image enhancement, and focus-based depth estimation, among others.

A wide variety of algorithms and operators have been proposed in the literature to measure the degree of focus of either a whole image or an individual pixel, depending on the application. In order to facilitate the exposition of their working principles, the focus measure operators studied in this chapter have been grouped into six broad families: gradient-based, Laplacian-based, wavelet-based, statistics-based, DCT-based and miscellaneous operators. This section presents a brief description of each family. Notice that some of the operators studied were originally devised for autofocusing applications and were therefore tailored to measuring the focus level of a whole image region; they had to be extended and adapted in order to allow a pixel-wise focus measure. In addition, in order to keep a global perspective on the concepts behind the operators and to facilitate comprehension, the operators are not presented individually but within the scope of the corresponding family. A detailed description of each focus operator, as well as its parameters and implementation details, can be found in [Pertuz et al., 2013]. The implementation of all the focus measure operators described below can be found in our download page.

**Gradient-based operators:** This family groups the focus measure operators based on the gradient or approximations of the first derivatives of the image. These algorithms follow the assumption that focused images present sharper edges than blurred ones. Thus, the energy of the gradient can be exploited in order to estimate the degree of focus. This principle is exploited by the widely known Canny edge detector [Canny, 1986]. The operators based on this principle are expected to work properly as long as the imaged scene is highly-textured. However, it is important to remark that this is a common restriction for most focus measure operators.
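As a concrete illustration, one well-known member of this family is the Tenengrad operator, which sums the squared Sobel gradient magnitude over the evaluation window. The following is a minimal NumPy sketch (function names and the naive filtering helper are ours, not taken from the paper's implementation):

```python
import numpy as np

def filter2_same(img, kernel):
    """Naive same-size 2D filtering (cross-correlation) with edge padding."""
    kh, kw = kernel.shape
    pad = np.pad(img.astype(float), ((kh // 2, kh // 2), (kw // 2, kw // 2)),
                 mode='edge')
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * pad[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def tenengrad(img):
    """Focus measure: energy of the Sobel gradient of the window."""
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gx = filter2_same(img, sobel_x)       # horizontal derivative
    gy = filter2_same(img, sobel_x.T)     # vertical derivative
    return np.sum(gx**2 + gy**2)
```

A flat, textureless window yields zero response, while a sharp step edge scores highly, which is exactly the "sharper edges when in focus" assumption described above.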

In the frequency domain, the gradient operator can be interpreted as a high-pass filtering of the image. On the one hand, this provides a sensitive response to defocus, which in turn corresponds to a low-pass filtering. On the other hand, a well-known issue is the noise sensitivity of gradient-based schemes, especially at small scales [Bergholm, 1987].

**Laplacian-based operators:** Similarly to the previous family, the goal of these operators is to measure the amount of edges present in the images, although through the second derivative or Laplacian. The image Laplacian is also a widely-known basic image processing tool used for edge detection and image enhancement [Torre & Poggio, 1986; Gonzalez & Woods, 2008]. Its downside is its increased sensitivity to noise as compared to the image gradient [Haralick, 1984].
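For illustration, a common variant in this family is the energy of the Laplacian, which sums the squared response of a discrete Laplacian kernel; the sketch below uses the 4-neighbour kernel (one of several possible choices, and our own naming):

```python
import numpy as np

def energy_of_laplacian(img):
    """Focus measure: sum of the squared discrete Laplacian response."""
    pad = np.pad(img.astype(float), 1, mode='edge')
    # 4-neighbour Laplacian: I(x-1,y) + I(x+1,y) + I(x,y-1) + I(x,y+1) - 4*I(x,y)
    lap = (pad[:-2, 1:-1] + pad[2:, 1:-1] +
           pad[1:-1, :-2] + pad[1:-1, 2:] -
           4.0 * pad[1:-1, 1:-1])
    return np.sum(lap**2)
```

As with the gradient, a textureless window scores zero; because the second derivative amplifies high frequencies more strongly, this measure reacts sharply to edges but, as noted above, also to noise.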

**Wavelet-based operators:** The wavelet decomposition of an image can be interpreted as a simultaneous frequency and scale-space analysis [Mallat, 1989]. The figure below illustrates the computation of the 1st-level discrete wavelet transform (DWT) coefficients by means of the *two-channel filter bank* scheme [Strang & Nguyen, 1996]. In this scheme, the image is decomposed into 4 sub-images by means of a high-pass filter, G_H, and a low-pass filter, G_L, which operate on the source image row-wise and column-wise alternately. This yields three detail sub-bands that emphasize the horizontal variations (W_{LH}), the vertical variations (W_{HL}) and the diagonal variations (W_{HH}), and a coarse approximation image (W_{LL}). In order to keep the total number of pixels constant, the computation of the wavelet coefficients implies downsampling in order to halve the size of the coefficient sub-bands (not shown in the figure below). This process can be further repeated on the coarse approximation image in order to add more levels to the decomposition.

Notice that, by following a similar reasoning as in the previous families, the energy of the detail sub-bands can be used for estimating the degree of focus of an image, since they are related to the highest frequencies of the image [Gopinath et al., 1994]. From a spatial-domain perspective, the wavelet transform can be interpreted as a multi-resolution representation of an image. This fact makes it suitable for addressing the support window issue in the application of the focus measure operators (that is, the problem of selecting an appropriate support window size). This fact has been exploited not only for focus measurement but also for focus stacking [Forster et al., 2004; Wang et al., 2003] and image compression for the JPEG2000 standard [Taubman & Marcellin, 2002].
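As a sketch of this idea, the following computes a single-level 2D Haar DWT (the simplest wavelet basis; the operators studied use other bases as well) and takes the energy of the three detail sub-bands as the focus measure:

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2D Haar DWT of an image with even dimensions."""
    x = img.astype(float)
    s = np.sqrt(2.0)
    # Filter and downsample along columns: low-pass L, high-pass H
    L = (x[0::2, :] + x[1::2, :]) / s
    H = (x[0::2, :] - x[1::2, :]) / s
    # Filter and downsample along rows, yielding the four sub-bands
    LL = (L[:, 0::2] + L[:, 1::2]) / s   # coarse approximation W_LL
    LH = (L[:, 0::2] - L[:, 1::2]) / s   # detail sub-band
    HL = (H[:, 0::2] + H[:, 1::2]) / s   # detail sub-band
    HH = (H[:, 0::2] - H[:, 1::2]) / s   # detail sub-band
    return LL, LH, HL, HH

def wavelet_focus(img):
    """Focus measure: energy of the detail sub-bands."""
    _, LH, HL, HH = haar_dwt2(img)
    return np.sum(LH**2) + np.sum(HL**2) + np.sum(HH**2)
```

Since the Haar transform above is orthonormal, the total energy of the four sub-bands equals the energy of the input image; defocus shifts energy from the detail sub-bands into W_LL, lowering the measure.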

**Statistics-based operators:** In the spatial domain, the effect of defocus can be assessed from its effects on the textures of the imaged scene. In turn, statistical operators have proven quite successful as texture descriptors [Petrou & Sevilla, 2006]. Intuitively, a defocused image can be interpreted as a texture whose smoothness increases with the level of defocus. Under real imaging conditions, in the presence of different noise sources, statistical descriptors such as the variance, Chebyshev moments and the energy of the principal components are robust texture descriptors. In fact, interpreting the image as a noisy 2D statistical process has been exploited in image restoration through inverse filtering, Wiener restoration and image denoising, among others [Berriel et al., 1983; Pratt, 2007].
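A minimal example of a statistics-based measure is the gray-level variance of the image (or of a local window, in the pixel-wise case): blurring smooths the texture and therefore lowers the variance. The sketch below (our own helper names; the box blur merely simulates defocus) illustrates this:

```python
import numpy as np

def variance_focus(img):
    """Focus measure: gray-level variance of the window."""
    x = img.astype(float)
    return np.mean((x - x.mean())**2)

def box_blur(img, k=3):
    """Simple k x k box blur with edge padding, used to simulate defocus."""
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode='edge')
    out = np.zeros(img.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            out += p[i:i + img.shape[0], j:j + img.shape[1]]
    return out / (k * k)
```

For a binary checkerboard the variance is 0.25; after blurring, the gray levels cluster around the mean and the measure drops, mirroring the "smoothness increases with defocus" intuition.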

**DCT-based operators:** The discrete cosine transform (DCT) can be interpreted as an alternative to the Fourier transform for the representation of signals in the frequency domain. One of its main characteristics is its ability to pack most of the information of the input signal into the lowest coefficients of the transform. This characteristic is referred to as the energy compaction property. Indeed, Reininger & Gibson (1983) empirically showed that the distribution of the DCT coefficients follows the Laplace distribution, a fact that was subsequently demonstrated theoretically by Lam & Goodman (2000). The compaction property can be exploited for achieving lossy compression, for instance, of image and video signals [Wallace, 1992; Sikora, 1997]. In the spatial domain, the DCT coefficients can be interpreted as an estimator of the image sharpness. For instance, as noted by Baina & Dublet (1995), the sum of the AC components of the DCT is equal to the variance of the image intensity and can, therefore, be used as a focus measure. Since the DCT is part of many popular video and image formats, the main motivation for using it as a focus measure has been its reduced computational cost. Notwithstanding, in these formats, the DCT coefficients are typically computed in fixed support windows of 8×8 pixels. As a result, DCT-based operators have mostly been applied to autofocusing.
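The relation noted by Baina & Dublet (1995) can be checked numerically: for an orthonormal DCT, Parseval's relation implies that the energy of the AC coefficients of an N×N block equals N² times the intensity variance of the block. The sketch below uses a naive DCT built from its definition (not the optimized 8×8 transform of video codecs):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: C[k, m] = a_k * cos(pi*(2m+1)*k / (2n))."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.cos(np.pi * (2 * m + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    C[0, :] = np.sqrt(1.0 / n)   # DC row uses a_0 = sqrt(1/n)
    return C

def dct2(block):
    """2D orthonormal DCT of a square block (transform rows, then columns)."""
    C = dct_matrix(block.shape[0])
    return C @ block.astype(float) @ C.T

def ac_energy(block):
    """Sum of squared AC coefficients (all coefficients except the DC term)."""
    d = dct2(block)
    return np.sum(d**2) - d[0, 0]**2
```

For an 8×8 block `b`, `ac_energy(b)` equals `64 * b.var()`, so thresholding the AC energy of the blocks already present in a compressed stream gives a variance-based focus measure at essentially no extra cost.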

**Miscellaneous operators:** This family groups operators that do not belong to any of the previous five groups. The operators in this group are based on different concepts, such as the image contrast, local binary patterns and steerable filters, among others.

## Synthetic and real test datasets

**Synthetic datasets:** synthetic focus sequences were generated by mapping different textures onto synthetic surfaces located between 0.075 m and 0.125 m from the lens. The focus sequences were generated by moving the in-focus position between 0.05 m and 0.2 m from the lens. The parameters of the lens used for simulation were: focal length f = 3e-3 m, aperture number N = 1.6, and pixel density k = 0.17e6 pixels/m.
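With these parameters, the amount of defocus at each in-focus position can be related to distance through the standard thin-lens model. The helper below is our own illustrative sketch (not the paper's simulation code): it computes the geometric blur-circle diameter, in pixels, for an object at distance u when the lens is focused at distance u_f:

```python
def blur_diameter_pixels(u, u_f, f=3e-3, N=1.6, k=0.17e6):
    """Thin-lens blur-circle diameter in pixels for an object at u [m]
    when the camera is focused at u_f [m].

    Geometric blur circle: c = f^2 * |u - u_f| / (N * u * (u_f - f)),
    converted to pixels with the pixel density k [pixels/m]."""
    c = (f**2) * abs(u - u_f) / (N * u * (u_f - f))
    return k * c
```

For instance, an object at 0.1 m is rendered sharp when the in-focus position reaches 0.1 m, and becomes progressively more blurred as u_f sweeps away from it over the 0.05 m to 0.2 m range, which is what produces the focus variation in the synthetic sequences.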

**Real datasets:** real defocus sequences were captured using a Logitech Orbit AF webcam and a Sony RZ50P camera. The minimum and maximum distances for the Logitech camera are 14 mm and 63 mm, respectively. All the test objects were carefully placed within this distance range at known positions, and the camera performs a focus sweep between 11.9 mm and 81.0 mm, approximately. As for the Sony camera, the minimum and maximum distances are 3.84 m and 4.20 m, respectively, and the camera performs a focus sweep between 3.10 m and 5.81 m, approximately. The exact in-focus range varies for each scene in order to cover a distance range of approximately 1.88 m around the imaged objects. All image sequences are publicly available from our download page.

## Experimental remarks

- The relative performance of the focus measure operators depends on the capturing device, the imaging conditions and the captured scene. Therefore, an absolute ranking of focus measure operators is rather unfeasible.
- Focus measure operators respond to different image factors (such as noise, contrast, and saturation) according to their working principle.
- Image gradient-based and Laplacian-based operators are the most sensitive to a reduction of the size of the evaluation window, whereas wavelet-based operators are the most robust to this factor.
- Laplacian-based operators are the most sensitive to image noise, whereas image statistics-based operators are the most robust to this factor.
- Operators based on different principles respond similarly to image contrast and saturation.

## References

Baina, J. and Dublet, J. (1995). Automatic focus and iris control for video cameras. In Proc. International Conference on Image Processing and its Applications, pp. 232–235.

Bergholm, F. (1987). Edge focusing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(6):726–741.

Berriel, L. R., Bescos, J., and Santisteban, A. (1983). Image restoration for a defocused optical system. Applied Optics, 22(18):2772–2780.

Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698.

Forster, B., Van De Ville, D., Berent, J., Sage, D., and Unser, M. (2004). Complex wavelets for extended depth-of-field: A new method for the fusion of multichannel microscopy images. Microscopy Research and Technique, 65(1-2):33–42.

Gonzalez, R. C. and Woods, R. E. (2008). Digital Image Processing. Prentice Hall, 3rd edition.

Gopinath, R., Odegard, J., and Burrus, C. (1994). Optimal wavelet representation of signals and the wavelet sampling theorem. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 41(4):262–277.

Haralick, R. M. (1984). Digital step edges from zero crossing of second directional derivatives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(1):58–68.

Mallat, S. G. (1989). A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7):674–693.

Pertuz, S., Puig, D., and Garcia, M. A. (2013). Analysis of focus measure operators for shape-from-focus. Pattern Recognition, 46(5):1415–1432.

Petrou, M. and Sevilla, P. G. (2006). Image Processing: Dealing with Texture. John Wiley & Sons.

Pratt, W. K. (2007). Digital Image Processing: PIKS Scientific Inside. John Wiley & Sons, 4th edition.

Reininger, R. and Gibson, J. (1983). Distributions of the two-dimensional DCT coefficients for images. IEEE Transactions on Communications, 31(6):835–839.

Strang, G. and Nguyen, T. (1996). Wavelets and filter banks. Wellesley-Cambridge Press, 2nd edition.

Taubman, D. S. and Marcellin, M. W. (2002). JPEG 2000: image compression fundamentals, standards and practice. Kluwer Academic Publishers.

Torre, V. and Poggio, T. A. (1986). On edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(2):147–163.