Edges are enhanced with high-pass filters. The filtered edge image is added (entirely or as a fraction) to the original image. The output retains the low-frequency components of the original, but with edges enhanced and prominently visible.
Directional first differencing is adopted. Let DN_A be the pixel of interest and DN_H, DN_V, DN_D its horizontal, vertical and diagonal neighbours:
Horizontal 1st diff = DN_A – DN_H (vertical edges are pronounced)
Vertical 1st diff = DN_A – DN_V (horizontal edges are pronounced)
Diagonal 1st diff = DN_A – DN_D (orthogonal diagonal edges are pronounced)
The differenced image is scaled for display purposes; this is the edge image. Experimentally, adding two diagonally filtered images (perpendicular to each other) captures all the edges.
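The directional differencing above can be sketched with NumPy; the 4×4 array, the edge position and the right/below neighbour convention are illustrative assumptions:

```python
import numpy as np

# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
img = np.array([[10, 10, 50, 50],
                [10, 10, 50, 50],
                [10, 10, 50, 50],
                [10, 10, 50, 50]], dtype=float)

# Horizontal 1st diff: DN_A - DN_H (pixel minus its right neighbour);
# this pronounces vertical edges.
h_diff = img[:, :-1] - img[:, 1:]

# Vertical 1st diff: DN_A - DN_V (pixel minus the pixel below);
# this pronounces horizontal edges -- all zero here, as expected.
v_diff = img[:-1, :] - img[1:, :]

# Scale for display: take magnitudes so negative differences survive.
edge = np.abs(h_diff)
```

The vertical edge shows up as a column of nonzero values in `h_diff`, while `v_diff` stays zero because no horizontal edges exist in this toy image.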
The peaks and valleys along a row or column can be described mathematically by a combination of sine and cosine waves of varying amplitude, frequency and phase. A Fourier transform creates a frequency-domain image with the low-frequency components of the original image in the centre and the high-frequency components in the periphery.
An inverse Fourier transform brings back the original image. Much filtering and noise removal is performed in the frequency domain, and through the IFT the effect is seen in the spatial-domain image.
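A minimal sketch of frequency-domain filtering with NumPy's FFT; the 64×64 random image and the ideal circular low-pass mask of radius 10 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.normal(size=(64, 64))          # hypothetical noisy image

# Forward FFT; fftshift puts the low frequencies at the centre of the
# spectrum, matching the description above.
F = np.fft.fftshift(np.fft.fft2(img))

# Ideal circular low-pass mask of radius 10 around the centre.
yy, xx = np.mgrid[:64, :64]
mask = (yy - 32) ** 2 + (xx - 32) ** 2 <= 10 ** 2

# Zero out the high frequencies, then invert the transform to see the
# effect back in the spatial domain.
smoothed = np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```

Since the mask removes most of the noise energy, the filtered image has a visibly lower variance than the original.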
Spectral ratios (band ratios) are illumination invariant. Due to topography, features in shadow show a proportional decrease in brightness. When a ratio between bands is taken, this decrease cancels out; in the ratio image, the same features have the same values whether they are sunlit or in shadow. An example is given in the following table.
[Table: A, V, H, D — example values not recoverable]
Note: ratioing works only for effects that are multiplicative (such as illumination conditions). For effects that are additive (such as path radiance), the offset must be subtracted before ratioing.
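The multiplicative-versus-additive point can be checked numerically; the DN values and the path-radiance offset below are hypothetical:

```python
# Shadow halves both bands (multiplicative); path radiance adds a
# constant offset to each band (additive). All values are hypothetical.
offset = 5.0
band_a_sunlit, band_b_sunlit = 80.0 + offset, 40.0 + offset
band_a_shadow, band_b_shadow = 40.0 + offset, 20.0 + offset

# Naive ratio without removing the additive offset: values differ
# between the sunlit and shadowed pixel of the same feature.
naive_sun = band_a_sunlit / band_b_sunlit
naive_shade = band_a_shadow / band_b_shadow

# Subtract the additive offset first, then ratio: the multiplicative
# illumination term cancels and both pixels get the same value.
r_sun = (band_a_sunlit - offset) / (band_b_sunlit - offset)
r_shade = (band_a_shadow - offset) / (band_b_shadow - offset)
```

After offset removal both pixels ratio to the same value, illustrating why the ratio image is illumination invariant only once additive effects are removed.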
A scatter plot of two bands is made. The scatter takes the shape of an ellipse (in most cases, since bands are generally correlated). A set of new axes is created, one along the major axis of the ellipse and the other orthogonal to it. The origin is at the mean of the plot.
BV_PC1 = a0·BV_A + a1·BV_B
BV_PC2 = b0·BV_A + b1·BV_B
The coefficients a0, a1, b0, b1 are the elements of the eigenvectors defining the principal components. Some analysts perform classification on PC images.
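The rotation can be sketched with an eigen-decomposition of the band covariance matrix; the two synthetic correlated bands below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two correlated bands, 500 pixels (hypothetical data).
band_a = rng.normal(100, 10, 500)
band_b = 0.8 * band_a + rng.normal(0, 3, 500)
X = np.column_stack([band_a, band_b])

# Eigen-decomposition of the 2x2 covariance matrix gives the new axes.
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # PC1 = axis of largest variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project mean-centred data: the eigenvector columns hold the
# coefficients (a0, a1) and (b0, b1) of the equations above.
pcs = (X - X.mean(axis=0)) @ eigvecs
```

PC1 carries most of the variance, and the PC bands are uncorrelated, which is exactly the decorrelation the ellipse rotation achieves.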
Also called “multiple discriminant analysis”, this is used when some details about the image are known. Only the pixels of interest to the analyst are considered, and the axes are chosen to maximize separability between the classes while minimizing the variance within them.
The following techniques can be employed individually or together to classify images.
Minimum distance to mean
The distance of a pixel to the mean of each training class is computed. The pixel is assigned to the class whose mean is at the closest distance (distance is measured in feature space, i.e., the scatter plot). Disadvantage: a pixel at the periphery of a large, spread-out class may be closer to the mean of a different, smaller class than to its own, leading to misclassification.
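A minimal sketch of the minimum-distance rule; the two-band class means and test pixels are hypothetical:

```python
import numpy as np

# Hypothetical class means in a 2-band feature space.
means = np.array([[20.0, 30.0],    # class 0 (e.g. water)
                  [80.0, 90.0]])   # class 1 (e.g. vegetation)

def min_distance_classify(pixels, class_means):
    """Assign each pixel to the class whose mean is closest (Euclidean)."""
    # Distance of every pixel to every class mean: shape (n_pixels, n_classes).
    d = np.linalg.norm(pixels[:, None, :] - class_means[None, :, :], axis=2)
    return d.argmin(axis=1)

pixels = np.array([[25.0, 28.0], [75.0, 95.0]])
labels = min_distance_classify(pixels, means)
```

Note that the rule uses only the means, not the class spread, which is the source of the misclassification problem described above.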
Parallelepiped classification
To avoid misclassification due to large classes, rectangular bounds are drawn around each class. All pixels that fall within the bounds are assigned to that class. Errors arise when classes are elliptical and tilted: the rectangular bounds then occupy far too much feature space, leading to misclassification.
Gaussian Maximum Likelihood Classifier
It is assumed that the histogram of training-site pixels for a class follows a normal distribution. Each class can then be represented by a mean vector (1D matrix) and a variance–covariance matrix (the variance of each band and the covariances between bands). The statistical probability density function (PDF) for each class is then computed. Using this, ellipsoidal contour lines are drawn around each cluster mean; the closer a contour is to the mean, the greater the probability that a pixel on it belongs to that class.
For any given pixel, the probability of belonging to a particular class is read from the probability contour on which it lies. The pixel is assigned to the class whose highest-probability contour passes through it.
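A sketch of the MLC decision rule using per-class Gaussian densities; the two-band class statistics and the test pixel are hypothetical:

```python
import numpy as np

# Hypothetical per-class training statistics (two bands).
stats = {
    "water":      {"mean": np.array([20.0, 30.0]),
                   "cov":  np.array([[4.0, 1.0], [1.0, 3.0]])},
    "vegetation": {"mean": np.array([80.0, 90.0]),
                   "cov":  np.array([[9.0, 2.0], [2.0, 8.0]])},
}

def gaussian_pdf(x, mean, cov):
    """Bivariate normal density evaluated at pixel vector x."""
    d = x - mean
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)

def mlc_classify(pixel, class_stats):
    """Assign the pixel to the class with the highest Gaussian likelihood."""
    return max(class_stats,
               key=lambda c: gaussian_pdf(pixel, class_stats[c]["mean"],
                                          class_stats[c]["cov"]))

label = mlc_classify(np.array([22.0, 31.0]), stats)
```

Evaluating the density is equivalent to reading which probability contour the pixel lies on: the class whose contours place the pixel closest to the mean (in covariance-weighted terms) wins.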
Bayesian classifier
This is an extension of the Gaussian MLC, but it additionally uses the a priori probability of each class and the cost of misclassifying a class.
Disadvantage: as the number of bands grows, the dimensions of the covariance matrix and the cost of evaluating the PDF increase. For this reason, some analysts run MLC on PCA-transformed images.
All spectral classes constituting each information class must be adequately modeled. The theoretical lower limit on the number of training pixels per class is (n + 1), where ‘n’ is the number of bands. For any given class, it is better to have 20 locations with 40 pixels each than 1 location with 800 pixels.
In the case of MLC, the histograms of the training sets in all bands are checked; each should be a normal distribution. Coincident spectral plots for all classes together in each band are then drawn, with DN on the Y axis and class number on the X axis; the mean and ±2 SD are plotted for each class, and overlap between classes is observed. Since the previous step ensured that each class conforms to a normal distribution, the range mean ± 2 SD covers about 95% of a class.
Further measures such as Transformed Divergence (TD), a covariance-weighted distance between class means, and the Jeffries–Matusita (JM) distance are used to quantify the separability between classes. TD has a maximum value of 2000, meaning highly separable; a value below 1500 indicates poor separability. JM has a maximum value of 1414.
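For two Gaussian classes, JM can be computed from the Bhattacharyya distance B as JM = √(2(1 − e^−B)), scaled here by 1000 so the maximum is ≈1414; the class statistics below are hypothetical:

```python
import numpy as np

def jeffries_matusita(m1, c1, m2, c2):
    """JM distance between two Gaussian classes, scaled x1000 (max ~1414)."""
    cm = (c1 + c2) / 2.0
    dm = m1 - m2
    # Bhattacharyya distance: mean-difference term + covariance term.
    b = (dm @ np.linalg.inv(cm) @ dm) / 8.0 + 0.5 * np.log(
        np.linalg.det(cm) / np.sqrt(np.linalg.det(c1) * np.linalg.det(c2)))
    return 1000.0 * np.sqrt(2.0 * (1.0 - np.exp(-b)))

# Two well-separated hypothetical classes approach the maximum.
m1, m2 = np.array([20.0, 30.0]), np.array([200.0, 210.0])
c = np.eye(2) * 4.0
jm = jeffries_matusita(m1, c, m2, c)
```

Because JM saturates via the exponential, very distant class pairs all score near 1414, which is why it is read as a bounded separability index rather than a raw distance.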
Clustering algorithms determine the natural spectral grouping in a dataset.
K-means
A few arbitrary seed points are selected as the initial cluster means. Each pixel is allotted to the cluster whose mean value is closest. When all pixels are allotted, the mean of each cluster is recomputed, and the assignment process is repeated. This continues until there is no significant change in cluster means between successive iterations, or until a specified number of iterations is reached.
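The K-means loop can be sketched as follows; the two synthetic spectral clusters, the seed choice and the iteration cap are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two hypothetical spectral clusters in a 2-band feature space.
data = np.vstack([rng.normal(20, 2, (50, 2)),
                  rng.normal(80, 2, (50, 2))])

# Arbitrary seed points serve as the initial cluster means.
means = data[[0, 50]].copy()

for _ in range(20):                                   # iteration cap
    # Assign each pixel to the nearest cluster mean.
    d = np.linalg.norm(data[:, None] - means[None], axis=2)
    labels = d.argmin(axis=1)
    # Recompute each cluster mean from its current members.
    new_means = np.array([data[labels == k].mean(axis=0) for k in range(2)])
    if np.allclose(new_means, means):                 # no significant change
        break
    means = new_means
```

With well-separated clusters the loop converges in a few iterations, recovering the natural spectral grouping without any training data.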
ISODATA – Iterative Self Organizing Data Analysis.
This follows the general process of K-means, but after each iteration, if the distance between the means of two clusters is less than a threshold, they are merged. On the other hand, if a cluster has a large SD, it is split. Clusters with fewer than a specified number of pixels are deleted.
SMA – Spectral Mixture Analysis
This is a deterministic model rather than a statistical one. Many land cover types occur as heterogeneous mixtures even when viewed at fine spatial resolution (small IFOV), so SMA is a more realistic representation. The individual spectral signatures are assumed to mix linearly. The weight of an endmember is the proportion of the pixel area occupied by the class associated with that endmember, so the fractions of all endmembers in a pixel sum to 1:
F1 + F2 + … + Fn = 1
The DN of a pixel in band λ is
DN_λ = F1·DN_λ,1 + F2·DN_λ,2 + … + Fn·DN_λ,n + E_λ
i.e., the DN of each endmember weighted by its fractional abundance, plus an unknown error term E_λ. Each band gives one such equation, so for B bands we have (B + 1) equations, counting the sum-to-one constraint. If the number of endmembers n equals B + 1, all the fractions F1 to Fn are determined exactly, but the error terms cannot be estimated. If n < B + 1, the error can also be estimated. If n > B + 1, there is no unique solution. The result of SMA is a set of abundance images, one per endmember; brighter tones represent greater prevalence of that endmember in the pixel.
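A least-squares sketch of linear unmixing, appending the sum-to-one constraint as an extra equation; the endmember spectra and the true fractions are hypothetical:

```python
import numpy as np

# Hypothetical endmember spectra: rows are bands, columns are endmembers.
E = np.array([[0.10, 0.60],
              [0.20, 0.50],
              [0.30, 0.40]])       # B = 3 bands, n = 2 endmembers

# Observed pixel: 30% endmember 1 + 70% endmember 2 (noise-free here).
true_f = np.array([0.3, 0.7])
pixel = E @ true_f

# Append the sum-to-one constraint as an extra row, giving B + 1
# equations, then solve for the fractions by least squares.
A = np.vstack([E, np.ones(2)])
b = np.append(pixel, 1.0)
fractions, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Since n < B + 1 here, the system is overdetermined and the least-squares residual would estimate the error term when noise is present.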
Salt-and-pepper noise (isolated island polygons of other classes within a class) has to be smoothed; a ‘majority filter’ is used.
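A minimal 3×3 majority-filter sketch; the label image is hypothetical, and borders are left unchanged for simplicity:

```python
import numpy as np

def majority_filter(labels):
    """Replace each interior pixel by the most common label in its
    3x3 neighbourhood, smoothing salt-and-pepper class noise."""
    out = labels.copy()
    for i in range(1, labels.shape[0] - 1):
        for j in range(1, labels.shape[1] - 1):
            window = labels[i - 1:i + 2, j - 1:j + 2].ravel()
            out[i, j] = np.bincount(window).argmax()
    return out

# A single 'pepper' pixel of class 1 inside a field of class 0.
img = np.zeros((5, 5), dtype=int)
img[2, 2] = 1
smoothed = majority_filter(img)
```

The isolated pixel is outvoted by its eight neighbours and absorbed into the surrounding class, which is exactly the island-removal effect described above.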
Classification Accuracy Assessment.
An error matrix (also called a confusion or contingency matrix) is drawn. The classification is checked against a reference standard (field verification). The results of the classification algorithm are presented in columns, while the ground truth is in rows.
Values along the diagonal represent correctly classified pixels (where the classification result agrees with the ground truth). The off-diagonal cells along a class’s row are pixels belonging to that class in the ground truth but allotted to other classes by the algorithm; since these pixels are omitted from their true class, they represent omission error. Conversely, the off-diagonal cells along a class’s column are pixels wrongly allotted to that class by the algorithm although they belong to other classes in the ground truth; since these pixels are wrongly included, they represent commission error.
Test areas are similar to training sites, but larger in number. When every pixel in the image has ground truth, it is called the “wall-to-wall” approach, but this defeats the objective of remote sensing, so other techniques are adopted. Random sampling involves random selection of pixels and performing field verification on them (though in practice, access to certain pixels may not be possible).
Sometimes we may check whether the classification was no better than a random allotment of pixels using the Khat (kappa) statistic:
Khat = (N · Σ x_ii − Σ x_i+ · x_+i) / (N² − Σ x_i+ · x_+i)
where r is the number of rows/columns in the confusion matrix, x_ii are the diagonal elements, x_i+ is the sum of row i, x_+i is the sum of column i (each Σ runs over i = 1 … r), and N is the total number of pixels. Khat varies from 0 to 1, representing purely random to perfect classification.
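The Khat computation can be sketched as follows; the 3-class confusion matrix values are hypothetical:

```python
import numpy as np

# Hypothetical confusion matrix: rows = ground truth, columns = result.
x = np.array([[50,  3,  2],
              [ 4, 60,  1],
              [ 2,  5, 73]])

N = x.sum()                                       # total pixels checked
diag = np.trace(x)                                # correctly classified
chance = (x.sum(axis=1) * x.sum(axis=0)).sum()    # sum of row_i * col_i

# Khat = (N * sum(x_ii) - sum(row_i * col_i)) / (N^2 - sum(row_i * col_i))
khat = (N * diag - chance) / (N ** 2 - chance)
```

The `chance` term estimates the agreement a random allotment would achieve, so Khat measures how far the classification exceeds chance agreement.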
This involves the arrangement of data from different periods of acquisition. Generally, anniversary dates (the same date in successive years) are preferred, to avoid sun-angle and seasonal variations. Accurate coregistration (to within 1/4 to 1/2 of a pixel) is a requirement.
Hyperspectral bands can be used to compute total atmospheric column water vapour content.
Three approaches are followed: