We'll start by learning common pre-processing tasks such as changing the color representation of an image, transforming how an image looks geometrically, and filtering images to change their appearance. We'll also go through some simple object detection techniques, like selecting objects based on their unique shape or color. These are all skills that will help you extract desired and important information from any given image.
Pre-processing is about making an image or sets of images easier to analyze and process computationally.
Image Coordinate System: Digital images are stored as matrices or 2D arrays, and each index in the matrix corresponds to one pixel in the displayed image! The image coordinate system is similar to the Cartesian coordinate system you may have seen before: images are two dimensional and lie on the x-y plane. The main difference is that the origin (0, 0) is at the top left of the image. The x index increases as you move to the right, and y increases as you move downward in an image. This coordinate system is pictured below, and you can find more information in the Matlab documentation.
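As a quick illustration of this coordinate system, here is a minimal sketch in Python with OpenCV and NumPy indexing; the file name is a placeholder assumption.

```python
import cv2

# 'waymo_car.jpg' is a placeholder file name
image = cv2.imread('waymo_car.jpg')
print(image.shape)        # (height, width, channels): rows first, then columns

# The origin (0, 0) is the top-left pixel; NumPy indexing is image[y, x]
top_left_pixel = image[0, 0]
# A pixel 100 to the right of the origin and 50 down from it:
pixel = image[50, 100]
```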
Now that we know we can represent images as functions of space, we can also think about operating on these functions mathematically. Treating images as functions is the basis for many image processing techniques: it basically means transforming an image pixel by pixel. We'll get into the math behind these transformations in the next few sections.
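As a small example of a pixel-by-pixel operation, here is a sketch that scales the intensity of every pixel to darken an image; the file name and the scale factor are illustrative assumptions.

```python
import cv2
import numpy as np

# Treat the grayscale image as a function I(x, y) and apply
# I_out(x, y) = 0.5 * I_in(x, y) to every pixel at once
gray = cv2.cvtColor(cv2.imread('waymo_car.jpg'), cv2.COLOR_BGR2GRAY)
darkened = (0.5 * gray).astype(np.uint8)
```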
Now, let's get started with some of the most common image processing techniques:
Change how blurry or sharp an image appears by using filters to intensify or lessen the amount of contrast in an image. See the OpenCV implementation for different types of image filters used in blurring and sharpening.
The above techniques are used to push computer vision towards its end goal: capturing important patterns in image data and interpreting them.
Now that we know how to represent an image by its grid of pixels, let's see how to use the color information in an image to isolate a particular area. This technique is commonly used in the pipeline step "Selecting Areas of Interest". Color thresholding is used in a wide range of applications, like computer graphics and video. A common example is a blue screen that splits the image into background and objects.
That's it. This task is simple enough. Let's see how to do this in code.
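Here is a minimal sketch of blue-screen color thresholding with OpenCV; the file name and the exact threshold values are illustrative assumptions and will need tuning for your own image.

```python
import cv2
import numpy as np

# Load the blue-screen image; OpenCV reads images in BGR order
image = cv2.imread('pizza_bluescreen.jpg')
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Define lower and upper bounds for the "blue" of the background
lower_blue = np.array([0, 0, 200])
upper_blue = np.array([70, 100, 255])

# The mask is white wherever the pixel falls inside the blue range
mask = cv2.inRange(image_rgb, lower_blue, upper_blue)

# Black out the blue background, keeping only the foreground object
masked_image = np.copy(image_rgb)
masked_image[mask != 0] = [0, 0, 0]
```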
This is because, for many objects, color is not needed to recognize and interpret an image. Grayscale is generally more useful for recognizing objects, and because color images contain more information than black and white images, they can add unnecessary complexity and take up more space in memory.
In the example below, you can see how patterns in the lightness and darkness of an object (intensity) can be used to define its shape and characteristics. However, in other applications, color is important for defining certain objects, like skin cancer detection, which relies heavily on skin color (red rashes).
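Converting a color image to grayscale is a one-line operation in OpenCV; the sketch below assumes a placeholder file name.

```python
import cv2

# Convert a BGR color image to a single-channel grayscale image
image = cv2.imread('brain_MR.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
print(image.shape, gray.shape)   # (H, W, 3) vs. (H, W): one intensity channel
```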
This can mean scaling an image, rotating it, or even changing how far away an object appears. See the OpenCV implementation here.
To transform any 2D image, you can create a mathematical mapping that rotates, shifts, and stretches points into a desired shape. This mapping is calculated by OpenCV and applied using a 3x3 perspective transformation matrix, which you can learn more about here. This matrix transforms the appearance of points but preserves the straight lines that define an image plane.
To create and apply this transformation, we use OpenCV's getPerspectiveTransform and warpPerspective functions, which are documented here.
Let's see the first exercise to transform an image of a business card.
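Here is a minimal sketch of how the business card exercise might look, assuming a placeholder file name and made-up corner coordinates; the actual points would be chosen from the corners of the card in your image.

```python
import cv2
import numpy as np

image = cv2.imread('business_card.jpg')

# Four corners of the card in the original (skewed) image,
# ordered top-left, top-right, bottom-right, bottom-left
source_pts = np.float32([[120, 80], [540, 60], [580, 360], [140, 400]])

# Where those corners should land: a flat, front-on rectangle
width, height = 500, 300
dest_pts = np.float32([[0, 0], [width, 0], [width, height], [0, height]])

# Compute the 3x3 perspective matrix and warp the image
M = cv2.getPerspectiveTransform(source_pts, dest_pts)
warped = cv2.warpPerspective(image, M, (width, height))
```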
Now, it's your turn to create a geometric transform! Geometric transforms are often useful for aligning text for better readability, from scanning important documents to reading in information about a scene in the world.
In this example, you'll be using a geometric transform to warp the appearance of a license plate on a car so that it appears as if it's being viewed from the front. As shown below, aim to make the license plate a perfect rectangle in the x-y plane.
Here is an example that you can use as a code starter for this exercise:
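Below is one possible starter sketch; the file name and the placeholder corner points are assumptions you should replace with values from your own image.

```python
import cv2
import numpy as np

image = cv2.imread('car_license.jpg')

## TODO: Replace these placeholder points with the four corners of the
## license plate in your image (top-left, top-right, bottom-right, bottom-left)
source_pts = np.float32([[100, 200], [300, 220], [300, 320], [100, 300]])

# Size of the warped, front-on plate (width, height)
warped_size = (200, 100)
dest_pts = np.float32([[0, 0],
                       [warped_size[0], 0],
                       [warped_size[0], warped_size[1]],
                       [0, warped_size[1]]])

## Compute the perspective transform and warp the plate into a rectangle
M = cv2.getPerspectiveTransform(source_pts, dest_pts)
plate = cv2.warpPerspective(image, M, warped_size)
```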
Filters are commonly used in image processing to filter out unwanted information and amplify features of interest.
We have an intuition of what frequency means when it comes to sound. A high-frequency sound is a high-pitched noise, like a bird chirp or a violin. Low-frequency sounds are low-pitched, like a deep voice or a bass drum. For sound, frequency refers to how fast a sound wave oscillates to make a certain pitch; faster oscillations make higher pitches and are called high-frequency waves.
Similarly, frequency in images is a rate of change. But what does it mean for an image to change? Well, images change in space: a high-frequency image is one where the intensity changes a lot, and the level of brightness changes quickly from one pixel to the next. A low-frequency image may be one that is relatively uniform in brightness or changes very slowly. This is easiest to see in an example.
Most images have both high-frequency and low-frequency components. In the image above, on the scarf and striped shirt, we have a high-frequency image pattern; this part changes very rapidly from one brightness to another. Higher up in this same image, we see parts of the sky and background that change very gradually, which is considered a smooth, low-frequency pattern.
High-pass filters are used to sharpen an image and enhance its high-frequency parts: areas where the level of intensity in neighboring pixels changes rapidly, like from very dark to very light pixels.
When there is little or no change in intensity in an area, the high-pass filter will black those areas out, turning the pixels black. But in other areas (especially edges), the filter will enhance the change and create a line. You can see that this is great for emphasizing edges, since edges are just areas in an image where the intensity changes very quickly, and these edges often indicate object boundaries.
Here is an example of what an edge detection filter looks like (orange square):
These are known as convolution kernels, and their values are called weights because each weight tells you how important the corresponding pixel is.
If the filtered value at a pixel is significantly greater than zero, then that pixel lies on an edge and gets amplified (brightened).
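Here is a minimal sketch of applying a high-pass kernel with OpenCV's filter2D; the specific 3x3 kernel and the file name are illustrative assumptions.

```python
import cv2
import numpy as np

gray = cv2.cvtColor(cv2.imread('curved_lane.jpg'), cv2.COLOR_BGR2GRAY)

# A simple edge detection kernel: the weights sum to zero, so uniform
# regions map to 0 (black) and rapid intensity changes produce large values
kernel = np.array([[ 0, -1,  0],
                   [-1,  4, -1],
                   [ 0, -1,  0]], dtype=np.float32)

# Convolve the image with the kernel (-1 keeps the input image depth)
edges = cv2.filter2D(gray, -1, kernel)
```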
Low-pass filters are used to reduce noise in an image by blurring or smoothing its appearance, which reduces (or blocks) high-frequency noise. The technique these filters use is taking a kind of average of neighboring pixels so that there are no big jumps in intensity, especially in small areas.
An example of a low-pass filter is the averaging filter. It produces a smoothed-out image with fewer abrupt changes in intensity. In fact, this filter is often used in Photoshop to soften and blur parts of an image.
In this exercise we will use a Gaussian blur filter, the most frequently used low-pass filter in computer vision applications. Read about the OpenCV function GaussianBlur on this page.
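A minimal sketch of Gaussian blurring is below; the file name and kernel size are illustrative assumptions.

```python
import cv2

gray = cv2.cvtColor(cv2.imread('brain_MR.jpg'), cv2.COLOR_BGR2GRAY)

# 5x5 Gaussian kernel; a larger kernel gives a stronger blur.
# sigmaX=0 lets OpenCV derive the standard deviation from the kernel size.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
```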
As you can see from the exercise above, it is a very common technique to apply a low-pass filter (like an averaging or Gaussian filter) to smooth the image, and then apply a high-pass filter for edge detection.
The Canny operator is widely considered one of the best edge detection algorithms. It consists of 4 basic processes:
1. Smoothing the image with a Gaussian filter to reduce noise.
2. Computing the intensity gradients of the image (for example, with Sobel filters).
3. Applying non-maximum suppression to thin the detected edges.
4. Using hysteresis thresholding to keep strong edges and the weak edges connected to them.
In this exercise, we used the OpenCV function Canny, which is well documented here.
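Here is a minimal sketch of using Canny; the file name and the lower/upper thresholds are illustrative assumptions.

```python
import cv2

gray = cv2.cvtColor(cv2.imread('brain_MR.jpg'), cv2.COLOR_BGR2GRAY)

# Hysteresis thresholds: gradients above 240 are strong edges, gradients
# below 120 are discarded, and in-between pixels are kept only when
# connected to strong edges
edges = cv2.Canny(gray, 120, 240)
```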
Now that you have plenty of experience creating filters and using them to blur images and detect edges, let's see how this kind of image processing can be used for image segmentation.