We show the experiments using SA for voxel selection. There are 3 classification algorithms: SVM, LR, and GNB. The strategies used to direct the SA are obtained from combinations of the following criteria:
Therefore, there will be 8 combinations in total for each choice of classifier. The SA parameters are set as follows:
% SA parameters
T = 1;                  % initial temperature
T_stop = 0.005;         % the stopping temperature
alpha = 0.9;            % geometric cooling rate
itt_max = 5000;         % max total number of iterations
Rep_T_max = 200;        % max number of repetitions at one temperature T
Rep_accept_max = 50;    % once this many solutions are accepted, update T
kc = 10;                % user-defined k-constant; smaller kc --> fewer solutions accepted
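For concreteness, a minimal sketch of how these parameters could drive the annealing loop is given below. The initializer InitialSolutionX and the acceptance rule exp(-delta/(kc*T)) are illustrative assumptions, not taken from the actual code.

% Minimal SA loop sketch (illustrative; InitialSolutionX and the
% acceptance rule are assumptions, not the real implementation)
x = InitialSolutionX();                      % assumed initializer for the voxel mask
score = ObjectiveFunctionX(x);
itt = 0;
while T > T_stop && itt < itt_max
    Rep_T = 0; Rep_accept = 0;
    while Rep_T < Rep_T_max && Rep_accept < Rep_accept_max && itt < itt_max
        x_new = NextSolutionX(x);
        score_new = ObjectiveFunctionX(x_new);
        delta = score - score_new;           % assuming a higher score is better
        if delta <= 0 || rand() < exp(-delta/(kc*T))   % smaller kc --> fewer acceptances
            x = x_new; score = score_new;    % accept the candidate solution
            Rep_accept = Rep_accept + 1;
        end
        Rep_T = Rep_T + 1;
        itt = itt + 1;
    end
    T = alpha * T;                           % cool down geometrically
end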
Mapping function for solution
NextSolutionX.m is the mapping function that generates the next candidate solution, that is,
x_new <-- NextSolutionX(x_current)
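As a sketch of what this mapping could look like, assuming the solution x is a binary voxel-inclusion mask (the actual NextSolutionX.m may use a different move):

function x_new = NextSolutionX(x_current)
% Sketch: generate a neighboring solution by flipping the inclusion
% bit of one randomly chosen voxel. Assumes x_current is a logical
% 1-by-N voxel mask; the real NextSolutionX.m may differ.
x_new = x_current;
i = randi(numel(x_current));    % pick a voxel at random
x_new(i) = ~x_new(i);           % toggle its inclusion
end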
Objective function
ObjectiveFunctionX.m is the objective function that takes a solution x as input and returns an objective value (score), that is,
score <-- ObjectiveFunctionX(x_current)
In this particular application, the objective function consists of 2 ingredients:
* the classification accuracy (averaged over 10 runs of 10-fold)
** the time per fold
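A hedged sketch of how these two ingredients might be combined is shown below; the helper crossvalAccuracy, the weight lambda, and the exact combination are assumptions for illustration, not the actual ObjectiveFunctionX.m.

function score = ObjectiveFunctionX(x_current)
% Sketch only: trade off CV accuracy against per-fold runtime.
% crossvalAccuracy and lambda are assumed names, not the real code.
lambda = 0.01;                          % assumed trade-off weight
t_start = tic;
acc = crossvalAccuracy(x_current);      % mean accuracy over 10 runs of 10-fold CV
t_per_fold = toc(t_start) / (10*10);    % rough time per fold across 10x10 folds
score = acc - lambda * t_per_fold;      % higher is better (assumed)
end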
My plan is to pick only a few criteria, perhaps {0, randomly shuffle, 50-50} and {0, MI-descend, 50-50}, to use with GNB and LR, just to save some time.
Another strategy to try is "aggregate and prune": in the first few rounds, we just keep collecting new voxels even if they do not increase the accuracy, and in the final rounds, we start pruning the voxels whose removal does not degrade the performance.
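A rough sketch of that acceptance schedule, meant to plug into the SA loop sketch above (the phase boundary grow_until and the add/remove test are illustrative assumptions):

% Aggregate-and-prune acceptance sketch (illustrative only)
grow_until = round(0.7 * itt_max);   % assumed phase boundary
added = nnz(x_new) > nnz(x);         % candidate move adds a voxel
if itt < grow_until
    % aggregate phase: keep collecting new voxels even when the
    % accuracy does not increase
    accept = added || score_new >= score;
else
    % prune phase: drop a voxel whenever removing it does not
    % degrade the performance
    accept = ~added && (score_new >= score);
end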
And once things are pretty clear, we can plan to work on the whole brain accordingly.
The table shows the experimental results using the whole brain as ROI.
* (averaged over 10 runs of 10-fold)
** time per fold
*** Use SA with 5000 iterations instead, and use the pre-calculated MI in order to save time. Ideally, MI should be calculated for every fold on the training data set only; however, there are >43,000 features and 100 experiments to run, which would take too long. So I calculate MI once and for all over all the data (both train and test). In fact, the MI calculated from the training set (9 folds) alone and from the whole set, (9-fold) training + (1-fold) test, would be pretty much the same, so this approximation should not change the feature ordering much, since only one fold is held out for testing.
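For reference, a sketch of the one-off MI pre-calculation; X (trials x features), y (class labels), and the helper mutualInformation are assumed names, not built-ins.

% One-off MI pre-calculation over all the data (train + test),
% used as an approximation to per-fold MI on training data only.
% X, y, and mutualInformation are assumed names.
nFeat = size(X, 2);                    % >43000 features
MI = zeros(1, nFeat);
for j = 1:nFeat
    MI(j) = mutualInformation(X(:, j), y);
end
[~, mi_order] = sort(MI, 'descend');   % MI-descend ordering, computed once
save('mi_order.mat', 'mi_order');      % reused by every fold and experiment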