Using supervoxels in fMRI classification

Post date: May 19, 2012 1:16:42 AM

We can use PRI, spectral clustering, or any other method that gives good over-segmented clusters.

At first I planned to use the principle of relevant information (PRI) to generate the supervoxels in the brain. However, my implementation of PRI is too slow, and I need a rough result ASAP to check whether the idea of using supervoxels is any good. Since a supervoxel contains only a few voxels, any clustering algorithm should suffice: we just set it to over-segment the data so that each cluster (supervoxel) contains only 1-5 voxels with similar responses. So I use spectral clustering code (by Andrew Ng) as the supervoxel generator in this experiment.

The experiment is conducted on Haxby's data set, using 80 observations from 8 stimulus classes. The feature vector fed to spectral clustering comprises the 80 real-valued responses and the normalized (x,y,z) location of each feature (i.e., each voxel in this scenario). Each dimension is z-score normalized (i.e., zero mean and unit standard deviation). The objectives are:

  1. Group voxels with similar response together into a supervoxel.
  2. Voxels in a supervoxel should be in close vicinity, best when they are contiguous in 3D space.

I found that some voxels have very similar responses but are far from each other. In this case we put more weight on vicinity: even if two voxels are similar, they are put in different supervoxels when they are far apart. This focuses on capturing local rather than global similarity.
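One way to weight vicinity more heavily is to scale the normalized coordinates before clustering. The following is a minimal NumPy sketch of that idea, not the MATLAB code used for this post; the function name `build_features` and the parameters `w_resp` and `w_xyz` are hypothetical, and it assumes the weights are applied as simple multipliers after z-scoring.

```python
import numpy as np

def build_features(responses, coords, w_resp=1.0, w_xyz=5.0):
    """Build one feature vector per voxel (hypothetical helper).

    responses: (n_voxels, n_observations) array of voxel responses
    coords:    (n_voxels, 3) array of (x, y, z) voxel locations
    """
    def zscore(a):
        # Normalize each column to zero mean and unit standard deviation.
        std = a.std(axis=0)
        return (a - a.mean(axis=0)) / np.where(std > 0, std, 1.0)

    # Assumption: a larger w_xyz makes spatial distance dominate the
    # similarity, so far-apart voxels end up in different clusters.
    return np.hstack([w_resp * zscore(responses), w_xyz * zscore(coords)])
```

With 80 observations, each voxel gets an 83-dimensional vector: 80 response dimensions followed by 3 weighted location dimensions.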

Advantages of having supervoxels?

  • Reduce the number of voxels to use in MVPA
  • Capture more group information, e.g., the mean or variance of voxel values within a supervoxel (spin image? polynomial kernel?)
  • Supervoxels can be a good foundation for other machine learning algorithms
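The group statistics mentioned in the list above can be computed by pooling the voxels of each supervoxel. This is a small NumPy sketch under the assumption that a supervoxel feature is a per-observation summary (mean, variance, ...) of its member voxels; the helper name `supervoxel_features` is hypothetical and this post does not state which statistic its experiments used.

```python
import numpy as np

def supervoxel_features(X, labels, stat=np.mean):
    """Pool voxel columns into supervoxel columns (hypothetical helper).

    X:      (n_observations, n_voxels) data matrix
    labels: (n_voxels,) supervoxel id of each voxel
    stat:   reduction applied over the voxels of one supervoxel
    """
    labels = np.asarray(labels)
    ids = np.unique(labels)
    # One output column per supervoxel, in ascending id order.
    return np.column_stack([stat(X[:, labels == s], axis=1) for s in ids])
```

Passing `stat=np.var` instead of the default gives the within-supervoxel variance as the feature.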

Disadvantages?

  • Takes some time to compute the supervoxels

Experiment 1: Supervoxels produced from spectral clustering

In this experiment, we will show how to use spectral clustering to produce the supervoxels.

desired # of supervoxels = 20

set weight for (x,y,z) = 1, 5, 10

weight for each response = 1

The code can be downloaded from here.

We compare the clusters obtained with 3 different location weights: w=1, 5 and 10.

It takes only 5 sec to produce 20 clusters.
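For readers without the MATLAB code, the clustering step can be sketched in NumPy as below. This follows the normalized spectral clustering recipe of Ng, Jordan and Weiss; it is an assumption that the downloaded code performs the same steps. The function name, the Gaussian width `sigma`, and the farthest-point k-means initialization are illustrative choices, not taken from this post.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    """Cluster the rows of X into k groups (illustrative sketch)."""
    # Gaussian affinity between all pairs of rows, with a zero diagonal.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)

    # Symmetrically normalized affinity D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Embed each point using the k eigenvectors with the largest
    # eigenvalues (eigh returns them in ascending order), then
    # rescale every row to unit length.
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, -k:]
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)

    # Plain k-means on the embedded rows, started from far-apart points.
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels
```

Here `X` would hold one row per voxel (the responses plus the weighted location), and `k` is the desired number of supervoxels. The dense affinity matrix grows quadratically with the number of voxels, so this sketch only suits small problems.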

Setting                          w=1    w=5    w=10
# of clusters                    20     20     20
Weight for each response         1      1      1
Weight for location (x,y,z)      1      5      10

Description:

  • w=1: The algorithm groups voxels that are similar in both response and location. However, you can see that the supervoxel regions are NOT contiguous.
  • w=5: Adding more weight to the location (x,y,z) gives more contiguous supervoxel regions.
  • w=10: The clustering really focuses on the xyz location, as the points are less scattered than with w=1 and 5. I think the results for w=5 and 10 are only slightly different. Perhaps using w=5 is better than w=10.

Resulting clusters (figures):

  • spectral clustering: #cluster:20, w_xyz:1
  • spectral clustering: #cluster:20, w_xyz:5
  • spectral clustering: #cluster:20, w_xyz:10

Experiment 2: Results from using supervoxels vs voxels

Following the previous experiment, we use a weight of 5 for xyz and 300 clusters, which takes 5 minutes to run in MATLAB. The procedure is as follows:

  1. Calculate the mutual information (MI) between each feature and the class label vector. The features are then ranked by their MI in descending order.
  2. Feed the feature matrix to the SVM, adding the features (i.e., voxels or supervoxels) one at a time in ranked order until all the features are used.
  3. Report the accuracy curve against the number of features used.
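The procedure above can be sketched as follows. Two parts of this sketch are stand-ins and not the method used in this post: MI is estimated by binning each feature into a histogram (the post does not say how MI was computed), and a leave-one-out nearest-centroid classifier replaces the N-fold SVM so that the example needs only NumPy. All function names are hypothetical.

```python
import numpy as np

def mutual_information(x, y, n_bins=4):
    """MI (in nats) between a binned real-valued feature and class labels."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)[1:-1]
    xb = np.digitize(x, edges)
    mi = 0.0
    for a in np.unique(xb):
        for b in np.unique(y):
            p_ab = np.mean((xb == a) & (y == b))
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (np.mean(xb == a) * np.mean(y == b)))
    return mi

def rank_by_mi(X, y):
    """Column indices of X sorted by MI with y, most informative first."""
    mi = np.array([mutual_information(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(-mi)

def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in).

    Assumes every class has at least two observations.
    """
    classes = np.unique(y)
    correct = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        centroids = np.array([X[keep & (y == c)].mean(axis=0) for c in classes])
        pred = classes[np.argmin(((centroids - X[i]) ** 2).sum(axis=1))]
        correct += int(pred == y[i])
    return correct / len(y)

def accuracy_curve(X, y, order):
    """Accuracy after adding the ranked features one at a time."""
    return [loo_accuracy(X[:, order[:n]], y) for n in range(1, len(order) + 1)]
```

Calling `accuracy_curve(X, y, rank_by_mi(X, y))` on a voxel matrix and on a supervoxel matrix gives the two curves to compare.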

Accuracy of voxels vs supervoxels as features. This plot compares the accuracy when the same number of features (not voxels) is used in classification.

Blue: The accuracy curve when using supervoxels as features. The best accuracy for supervoxel-based feature is 95%.

Red: The accuracy curve when using voxels as features. The order of voxels fed to the classifier is based on the voxel-MI. The best accuracy for voxel-based feature is 95% too.

Accuracy of voxel vs supervoxel as features. The plot is made such that the number of voxels used for classification is comparable. For instance, 20 supervoxels might contain 60 voxels (i.e., approximately 3 voxels/supervoxel)

Blue: The accuracy curve when using supervoxels as features, as in the previous plot, except that the curve is extended so that the number of voxels used in the classifier equals that of the red curve. The best accuracy for supervoxel-based features is 95%.

Summary: Whether matched on the number of features or on the number of voxels, voxels and supervoxels perform comparably.

Blue: The accuracy curve when using supervoxels as features. The best accuracy for supervoxel-based feature is 95%.

Magenta: The features are fed into the classifier in supervoxel order, but the observations are classified using the member voxels (not the supervoxels) as features.

Summary: When using the same set of voxels, the accuracy of using supervoxels as features is better.
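To compare curves on the same set of voxels, each ranked supervoxel has to be expanded back into its member voxels. The helper below is a hypothetical NumPy sketch of that bookkeeping; it is not taken from the MATLAB scripts and assumes `labels` holds the supervoxel id of every voxel.

```python
import numpy as np

def voxels_in_supervoxel_order(labels, supervoxel_order):
    """Expand ranked supervoxels into voxel indices (hypothetical helper).

    Returns the voxel indices of each supervoxel, in ranked order, and
    the cumulative number of voxels used after each supervoxel is added.
    """
    labels = np.asarray(labels)
    groups = [np.flatnonzero(labels == s) for s in supervoxel_order]
    counts = np.cumsum([len(g) for g in groups])
    return groups, counts
```

The cumulative counts give the voxel-count axis used to align the curves, and concatenating the first n groups gives the voxel columns to feed to the classifier after n supervoxels.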

Clean up the code a bit. done

The MATLAB code is available:

runSVMNFoldAccumFeatures_voxel.m

runSVMNFoldAccumFeatures_supvoxel.m

runSVMNFoldAccumFeatures_voxel_correspond_supvoxel_allatonce.m

runSVMNFoldAccumFeatures_voxel_correspond_supvoxel_oneatatime.m

plotCompareAccuracyCurve.m

Questions to ask:

  1. What is the optimal number of clusters to use (N=50, 100, 200, 300), and how is the performance in each case?
  2. What would be a good criterion for ranking, given that MI is not always a good measure? We could use accuracy-based ranking instead. We should:
    1. Eliminate uniform-response voxels
    2. Pick all-different-response voxels (a classification-accuracy-based measure)
  3. How do we test the contiguity of a supervoxel?
  4. Can we make each supervoxel more interpretable by using anatomical-region-based supervoxels?
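For question 3, one possible check is to count the connected components of each supervoxel on the voxel grid: a supervoxel is contiguous when it forms a single component. The sketch below is only a suggestion, not something tested in this post; it assumes integer voxel coordinates and face adjacency (6-connectivity), and the function name is hypothetical.

```python
from collections import deque

def count_connected_components(voxels):
    """Number of face-connected pieces in a set of integer (x, y, z) voxels."""
    remaining = set(map(tuple, voxels))
    # Assumption: two voxels are neighbours only if they share a face.
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    components = 0
    while remaining:
        components += 1
        queue = deque([remaining.pop()])
        # Breadth-first search removes everything reachable from the seed.
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in neighbours:
                n = (x + dx, y + dy, z + dz)
                if n in remaining:
                    remaining.remove(n)
                    queue.append(n)
    return components
```

The fraction of supervoxels with exactly one component could then serve as a contiguity score for comparing different location weights.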