Sparse samples in superpixels (3S) for GMM image segmentation: a very simple image segmentation scheme

Post date: Aug 26, 2011 3:30:22 AM

Empirically, I found that a pixel-level Gaussian mixture model (GMM) does a pretty good job at image segmentation (given that good features are extracted from the image) in terms of both quality and run-time. However, the main drawback of the method is that the resulting segmentation is noisy. By inspection, I found that the noisy segments usually consist of only one to a few pixels, so they can be eliminated simply by filtering with some moving-average-like filter.

One way to create such a filter is to use a majority-vote scheme within a homogeneous contiguous region, namely a superpixel. Every pixel in a superpixel is expected to share similar common properties [cite some superpixel papers here], which implies that the majority of pixels in a particular superpixel can represent the class label of the whole superpixel well. A good question to ask is then:

What if we compute a segmentation based on only some of the pixels in the image, then propagate the class labels to the others using the majority-vote scheme?

In this experiment, I provide some answers to the question above. The aims of this experiment are:

  1. Test the majority-vote scheme using all of the pixels in the image. In the end, we found that the resulting segmentation is very satisfactory.
  2. Randomly sample only some of the pixels from each superpixel, then compare the result with that of experiment (1).

The image features used in this experiment are 11-dimensional, composed of:

  • generalized RGB (gRGB)
  • standardized CIELuv (sLuv)
  • generalized and standardized CIELab (gsLab)
  • standardized location of each superpixel's centroid (sX and sY)
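As a concrete sketch of this 11-D feature vector: the post does not define the "generalized" and "standardized" operators exactly, so the snippet below assumes "generalized" means channel-sum normalization (chromaticity-style) and "standardized" means a per-feature z-score; the function names and the use of scikit-image's color conversions are my own assumptions.

```python
import numpy as np
from skimage import color  # scikit-image color-space conversions

def generalize(x, eps=1e-8):
    # "Generalized" is taken here to mean channel-sum normalization
    # (chromaticity-style); the operator is not defined in the post,
    # so this is an assumption.
    return x / (np.abs(x).sum(axis=-1, keepdims=True) + eps)

def standardize(x, eps=1e-8):
    # Z-score each feature column over all pixels.
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

def extract_features(img, labels):
    """img: (H, W, 3) float RGB in [0, 1]; labels: (H, W) superpixel map
    with contiguous ids 0..K-1. Returns an (H*W, 11) feature matrix:
    gRGB (3) + sLuv (3) + gsLab (3) + standardized centroid (sX, sY)."""
    h, w, _ = img.shape
    grgb = generalize(img.astype(np.float64)).reshape(-1, 3)
    sluv = standardize(color.rgb2luv(img).reshape(-1, 3))
    gslab = standardize(generalize(color.rgb2lab(img)).reshape(-1, 3))

    # Replace each pixel's location by the centroid of its superpixel.
    ys, xs = np.mgrid[0:h, 0:w]
    flat = labels.ravel()
    n_sp = flat.max() + 1
    cnt = np.bincount(flat, minlength=n_sp).astype(np.float64)
    cx = np.bincount(flat, weights=xs.ravel(), minlength=n_sp) / cnt
    cy = np.bincount(flat, weights=ys.ravel(), minlength=n_sp) / cnt
    loc = standardize(np.stack([cx[flat], cy[flat]], axis=1))

    return np.hstack([grgb, sluv, gslab, loc])
```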

The algorithm is described as follows:

  1. Apply a superpixel algorithm to the input image.
  2. Randomly pick some pixels from each superpixel; let's call them the candidate pixels.
  3. Extract a feature vector from each candidate pixel and collect all the feature vectors in the feature space.
  4. Apply a segmentation algorithm, e.g., GMM or GMM+BIC, to those candidates in the feature space.
  5. Assign the most likely class/label to each candidate.
  6. Back in the image plane, apply the majority-vote scheme to pick a winner class among the candidates' classes in each superpixel.
  7. Assign the winner class to all the pixels within the superpixel.
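The seven steps above can be sketched roughly as follows. SLIC superpixels, plain RGB-plus-location features, and the default sampling rate here are stand-ins (my assumptions), not necessarily what the original experiments used:

```python
import numpy as np
from skimage.segmentation import slic          # step 1: superpixels
from sklearn.mixture import GaussianMixture    # step 4: GMM clustering

def segment_3s(img, n_classes=7, n_segments=300, sample_frac=0.10, seed=0):
    """Superpixels -> sparse candidate sampling -> GMM -> majority vote.
    img: (H, W, 3) float RGB in [0, 1]. Returns an (H, W) label map."""
    rng = np.random.default_rng(seed)
    h, w, _ = img.shape
    sp = slic(img, n_segments=n_segments, start_label=0)   # step 1
    flat = sp.ravel()

    # Simple per-pixel features: RGB plus normalized (x, y) location
    # (a stand-in for the 11-D features described above).
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.column_stack([img.reshape(-1, 3),
                             xs.ravel() / w, ys.ravel() / h])

    # Step 2: randomly pick candidate pixels from each superpixel.
    cand = []
    for s in np.unique(flat):
        idx = np.flatnonzero(flat == s)
        k = max(1, int(sample_frac * idx.size))
        cand.append(rng.choice(idx, size=k, replace=False))
    cand = np.concatenate(cand)

    # Steps 3-5: cluster the candidates in feature space with a GMM.
    gmm = GaussianMixture(n_components=n_classes, random_state=seed)
    cls = gmm.fit_predict(feats[cand])

    # Steps 6-7: majority vote within each superpixel, then propagate.
    sp_of_cand = flat[cand]
    out = np.empty_like(flat)
    for s in np.unique(flat):
        winner = np.bincount(cls[sp_of_cand == s]).argmax()
        out[flat == s] = winner
    return out.reshape(h, w)
```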

Experiment#1: Testing the majority-vote scheme using all the pixels in the image

We overlay the superpixel boundaries on the result of GMM segmentation using all the pixels in the image, and see how well the resulting segmentation fits the natural boundaries of the image. Here are some results. Notice that the noise due to the pixel-level segmentation is substantially suppressed by the majority-vote scheme, and the results look promising.

The results also imply that the choice of the number of class labels matters to the segmentation result. It is not quite right to assume that having more classes would yield a better result; compare the results from the 6-class and 7-class segmentations in Table 1.

Table 1: Segmentation results using different numbers of classes (panels: 6-class, 7-class).

Experiment#2: Testing the majority-vote scheme by randomly picking only some of the pixels in each superpixel

In this experiment, we use only 10% of the pixels within each superpixel for segmentation in feature space, then propagate the winner-voted class to the other pixels in each superpixel. The number of classes is fixed at 7 for all images. The results are shown in Table 2. They suggest that using more pixels does not guarantee a better segmentation; on the contrary, the run using 10% of the pixels seems to slightly outperform the one using 100%. Also note the significant difference in run-time. Overall, the results show the idea is promising.

Table 2: Segmentation using different percentages of pixels in a superpixel.

The segmentation using more pixels does not necessarily perform better than the one using fewer. This is an example where experiment#2 outperforms experiment#1.

In this example, the pixel-level segmentation alone is very noisy, but looks much nicer after the majority-vote scheme is applied.

run-time:

  • Experiment#1 (100%): 180 - 270 sec per 3 reps/image
  • Experiment#2 (10%): 11 - 21 sec per 3 reps/image

Experiment#3: The choice of number of classes

So far, the user has had to determine the number of classes a priori. How can the algorithm pick a proper number of classes automatically? Here I propose two simple approaches:

  1. Adding BIC on top of GMM (GMM+BIC)
  2. Using Variational Bayes GMM

In general, I found that, given the same number of classes at the end of the process, GMM outperforms VBGMM in terms of segmentation accuracy. Furthermore, the number of classes learned by GMM+BIC is acceptable and can be tuned by weighting the penalty term in the BIC differently. The figures in Table 3 below show the result using VBGMM with the initial cluster count C = 5; in the end, the number of classes is reduced to 3.
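The component-pruning behavior described here (starting with C = 5 and ending with fewer effective classes) can be reproduced qualitatively with scikit-learn's BayesianGaussianMixture. The toy data and prior settings below are illustrative assumptions, not this post's actual setup:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Three well-separated 2-D blobs, but the model starts with 5 components.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(m, 0.3, size=(200, 2)) for m in (0.0, 4.0, 8.0)])

vb = BayesianGaussianMixture(
    n_components=5,                   # initial cluster count C = 5
    weight_concentration_prior=1e-2,  # small prior encourages pruning
    max_iter=500, random_state=0).fit(X)

# Components with near-zero posterior weight win no points at predict time,
# so the effective class count drops below the initial 5.
labels = vb.predict(X)
n_effective = np.unique(labels).size
print(n_effective)  # typically collapses toward the 3 true blobs
```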

Table 3: Comparing segmentation quality between VBGMM and GMM with the same number of classes (panels: 3-class VBGMM, 3-class GMM).

Therefore, from this point forward we prefer using GMM+BIC to VBGMM.

Experiment#4: GMM+BIC for image segmentation

The preliminary results show that the BIC objective function alone is biased toward over-segmentation; that is, the complexity penalty term is too small relative to the likelihood term. So, I added a coefficient alpha to the complexity penalty term. Setting # points = 10% and alpha = 10, GMM+BIC gives a sensible number of clusters, around 3-6, as depicted in Table 4. Note that the bigger alpha is, the fewer classes we get at the end.
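A minimal sketch of this alpha-weighted selection, assuming full-covariance GMMs and the standard BIC form -2 log L + p ln n (the helper names and candidate range below are mine):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def n_params_full(k, d):
    # Free parameters of a k-component full-covariance GMM in d dims:
    # means (k*d) + covariances (k*d*(d+1)/2) + mixture weights (k-1).
    return k * d + k * d * (d + 1) // 2 + (k - 1)

def weighted_bic(gmm, X, alpha=10.0):
    # Standard BIC is -2*logL + p*ln(n); here the complexity penalty
    # is up-weighted by alpha to discourage over-segmentation.
    n, d = X.shape
    log_l = gmm.score(X) * n  # score() is the mean per-sample log-likelihood
    p = n_params_full(gmm.n_components, d)
    return -2.0 * log_l + alpha * p * np.log(n)

def select_k(X, k_range=range(2, 8), alpha=10.0, seed=0):
    # Fit one GMM per candidate class count; keep the lowest weighted BIC.
    fits = [GaussianMixture(n_components=k, random_state=seed).fit(X)
            for k in k_range]
    scores = [weighted_bic(g, X, alpha) for g in fits]
    return fits[int(np.argmin(scores))]
```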

Table 4: Segmentation results in which the number of classes is obtained automatically using GMM+BIC, with the initial cluster number in the range 2-7.

I also ran GMM+BIC on the Berkeley BSDS500 dataset using initial class numbers from 4 to 7 and alpha = 7; some sample results are illustrated in Table 5.

Table 5: GMM+BIC segmentation. Note that the optimal class number differs from image to image.

Future work

Note that we only use the majority vote in each superpixel on top of a simple pixel-level image segmentation, and the results already look fine. However, there are a few open questions:

  1. What should the sample size be? I believe it depends on the superpixel size, but can we find such a relationship between the sample size and the superpixel size? [waiting for experiment]
  2. The sample size would, in fact, affect VBGMM performance: clustering with one tenth of the image's pixels would give a different result than using the whole image. Can we find which parts of the image are affected the most? My guess is the noisy parts, where many classes mix in one region. [waiting for experiment]
  3. [waiting for experiment] We should compare the segmentation results from majority_vote vs sampled_majority_vote with different reduction factors, say 1%, 5%, 10%, ..., 100%, against run-time and the Rand index, so that we can pick the best factor. For now, let's use 10%.
  4. The next step is to use this as the initial value for ITSBN. [done] The algorithm is already implemented in ITSBN version 4.5, and the results look good. Please refer to that post.
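For item 3, the comparison could, for instance, use scikit-learn's adjusted_rand_score on the flattened label maps. This is one possible way to compute it, not necessarily the evaluation that will be used:

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def compare_segmentations(seg_a, seg_b):
    """Adjusted Rand index between two label maps of the same shape;
    1.0 means identical partitions up to a relabeling."""
    return adjusted_rand_score(seg_a.ravel(), seg_b.ravel())

# Identical partitions with permuted label ids still score 1.0,
# so the sampled result needs no label alignment with the full one.
a = np.array([[0, 0, 1], [1, 2, 2]])
b = np.array([[5, 5, 3], [3, 9, 9]])
print(compare_segmentations(a, b))  # -> 1.0
```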