Here are some results from using L1- and L2-norm regularization with logistic regression on lexical data.
code: demo3.m
function: The code applies ridge, lasso, and elastic-net regularization with LR. The optimal parameters lambda and alpha are selected once for the whole dataset; that is, the parameters are shared across all runs.
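demo3.m itself is not reproduced here; the following is a minimal sketch of the same idea using MATLAB's lassoglm, where X, y, X_test, and y_test are hypothetical stand-ins for the real data matrix and labels:

% Minimal sketch (not the original demo3.m): elastic-net-regularized
% logistic regression with one (lambda, alpha) pair shared across runs.
alpha  = 0.5;    % elastic-net mixing: 1 = lasso, small values ~ ridge
lambda = 0.01;   % regularization strength, selected once for all runs
[B, FitInfo] = lassoglm(X, y, 'binomial', 'Alpha', alpha, 'Lambda', lambda);

% Evaluate on held-out data; glmval expects the intercept first.
coef  = [FitInfo.Intercept; B];
p_hat = glmval(coef, X_test, 'logit');
acc   = mean((p_hat > 0.5) == y_test);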
The data matrix is stored in the directory:
/NAS_II/Projects/MVPA_Language/lexical/${subj}/stats_images_fmri
each with the name:
${data}_sexemplar_beta_tstat_matrix_${mask}.mat
where
${data} = 'animtool', 'mamnonmam', 'allmamnonmam'
${mask} = 'lh_mask_vtc2', 'mask_vtc2', 'gray_mask2'
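For illustration, one of these matrices could be located and loaded as below; this is a hypothetical sketch, and the variable names stored inside the .mat file are not recorded here.

% Hypothetical sketch: build the file name from the scheme above and load it.
subj = '3211'; data = 'animtool'; mask = 'lh_mask_vtc2';
d = fullfile('/NAS_II/Projects/MVPA_Language/lexical', subj, 'stats_images_fmri');
fname = sprintf('%s_sexemplar_beta_tstat_matrix_%s.mat', data, mask);
S = load(fullfile(d, fname));   % field names of S depend on how it was saved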
Raw data --> whole-scan data matrix --> task-specific data matrix --> sweep-parameter classification results --> best-parameter classification result for each subject
Raw data --->
some code, I don't remember
---> whole-scan data matrix --->
/mnt/home/kittipat/Dropbox/random_matlab_code/fMRI/lexical_project/rdm/
make_datamatrix_allmamnonmam.m
make_datamatrix_animtool.m
make_datamatrix_mamnonmam.m
---> task-specific data matrix --->
/mnt/home/kittipat/Dropbox/random_matlab_code/fMRI/lexical_project/lr_regularization/
experiments_animtool_4fold.m
experiments_mamnonmam_4fold.m
experiments_animtool_lou.m
experiments_mamnonmam_lou.m
---> sweep-parameter classification results --->
/mnt/home/kittipat/Dropbox/random_matlab_code/fMRI/lexical_project/lr_regularization/
summarize_out_4fold.m
summarize_out_lou.m (not completed yet)
---> the best results from a selected criterion
More details on the stored files:
raw data: the nifti file containing the beta coefficients for each observation.
dir: /NAS_II/Projects/MVPA_Language/lexical/${subject}/stats_images_fmri/{run,srun}${run_id}_${data_level}
file: ${exemplar_name}_{zstat,beta}.nii.gz
where
subject = {3211, 3402, 3424, ...}
run_id = {1, 2, 3, 4}
run = regular run
srun = smoothed run
data_level = {exemplar, item}
whole-scan data matrix: the n-by-m data matrix, where n is the number of observations and m is the number of features; m is the number of voxels in the whole scan. Each row of the data matrix represents the brain response to the particular stimulus presented to the subject.
dir: /NAS_II/Projects/MVPA_Language/lexical/${subject}/stats_images_fmri/
file: filtered_${data_level}_beta_tstat_matrix.mat
where
data_level = {exemplar, sexemplar, item, sitem}
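The script that produced these matrices is unrecorded above ("some code, I don't remember"); here is a hedged sketch of what it presumably does, assuming niftiread can read the beta images and that run_dir and the file pattern below are correct:

% Hedged sketch: stack each vectorized beta volume as one row of the
% n-by-m whole-scan data matrix (run_dir and file pattern are assumptions).
files = dir(fullfile(run_dir, '*_beta.nii.gz'));
n = numel(files);
for i = 1:n
    vol = niftiread(fullfile(files(i).folder, files(i).name));
    if i == 1
        X = zeros(n, numel(vol));   % n observations by m voxels
    end
    X(i, :) = vol(:)';              % one row per stimulus presentation
end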
task-mask-specific data matrix: this data matrix is trimmed by the choice of a specific task, e.g., "animals vs tools" or "mammals vs non-mammals", and the choice of mask, e.g., "lh_mask_vtc2", "mask_vtc2", "gray_mask2". So it is effectively a submatrix of the whole-scan data matrix.
dir: /NAS_II/Projects/MVPA_Language/lexical/${subject}/stats_images_fmri/
file: ${task}_${data_level}_beta_tstat_matrix_${mask}.mat
where
task = {animtool, mamnonmam, allmamnonmam}
data_level = {exemplar, sexemplar, item, sitem}
mask = {lh_mask_vtc2, mask_vtc2, gray_mask2}
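A sketch of the trimming step, assuming the mask is itself a nifti volume aligned with the scan (mask file name, X, and task_rows are hypothetical here):

% Keep only the voxels inside the mask (columns) and the rows of one task.
mask_vol = niftiread('lh_mask_vtc2.nii.gz') > 0;   % logical mask volume
X_task   = X(task_rows, mask_vol(:)');             % submatrix of whole-scan X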
sweep-parameter classification results: here we report the classification results, which include the train/validation/test accuracy for each (lambda, alpha) pair, the number of non-zero coefficients, etc. The optimal parameters and the best accuracy can be read off from here.
dir: /NAS_II/Projects/MVPA_Language/lexical/${subject}/sparse_LR
file: ${subject}_${mask}_combined_${task}_${cv_type}.mat
where
cv_type = {lou, 4fold}
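The stored sweep files contain more than this, but the sweep itself presumably looks something like the following sketch (X_train, y_train, X_val, y_val are hypothetical splits):

% Sweep a grid of (alpha, lambda) and record validation accuracy and
% the number of non-zero coefficients for each setting.
lambdas = logspace(-4, 1, 20);
alphas  = [0.1 0.5 1.0];
results = struct('alpha', {}, 'lambda', {}, 'acc_val', {}, 'nnz', {});
for a = alphas
    [B, Fit] = lassoglm(X_train, y_train, 'binomial', ...
                        'Alpha', a, 'Lambda', lambdas);
    for j = 1:numel(Fit.Lambda)
        p = glmval([Fit.Intercept(j); B(:, j)], X_val, 'logit');
        results(end+1) = struct('alpha', a, 'lambda', Fit.Lambda(j), ...
            'acc_val', mean((p > 0.5) == y_val), 'nnz', Fit.DF(j));
    end
end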
best-parameter classification result for each subject: we pick the optimal parameters from the previous step according to whatever criterion we choose; the result stored here uses the optimal parameters for each subject.
dir: /NAS_II/Projects/MVPA_Language/lexical/${subject}/sparse_LR
file: ${subject}_${mask}_${classifier_regu}_${criterion}_${task}_${cv_type}.mat
where
classifier_regu = {lr_lasso, lr_ridge, lr_none, lr_elnet}
criterion = {cri1, cri2} // criterion #1, #2
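The exact definitions of cri1 and cri2 are not given above; two plausible stand-ins, continuing the sweep sketch, would be picking by best validation accuracy and picking by sparsity among near-best settings:

% Stand-in for criterion #1: maximize validation accuracy.
[~, k1] = max([results.acc_val]);
best1   = results(k1);

% Stand-in for criterion #2: among settings within 1% of the best
% accuracy, prefer the fewest non-zero coefficients.
near    = [results.acc_val] >= max([results.acc_val]) - 0.01;
cand    = results(near);
[~, k2] = min([cand.nnz]);
best2   = cand(k2);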
Each dataset can be treated in 2 ways, corresponding to the two cross-validation schemes cv_type = {4fold, lou}; a sketch of both schemes is given at the end of this section.
4fold (using the code summarize_out_4fold.m)
There are 2 experiments: animtool and mamnonmam.
Each is reported on 3 masks: lh_mask_vtc2, mask_vtc2, gray_mask2.
There are 3 types of classifiers used in this experiment: LR with lasso, ridge, and elastic-net regularization.
For optimal parameter selection, I use 2 criteria: cri1 and cri2.
The reported results include:
The data log from post-processing each subject's data (run using the code summarize_out_4fold.m).
The summary table for the lexical data.
The summary bar plot averaged across subjects.
lou (using the code summarize_out_lou.m)
There are 2 experiments: animtool and mamnonmam.
Each is reported on 3 masks: lh_mask_vtc2, mask_vtc2, gray_mask2.
There are 3 types of classifiers used in this experiment: LR with lasso, ridge, and elastic-net regularization.
For optimal parameter selection, I use 2 criteria: cri1 and cri2.
The reported results include the same items as in the 4fold case above (summarize_out_lou.m is not completed yet).
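Finally, a minimal sketch of the two cross-validation schemes behind cv_type = {4fold, lou}, assuming "lou" means leaving one run out and that a per-observation run label is available (n_obs, run_label, and held_out are hypothetical names):

% 4fold: a random 4-fold partition of the n_obs observations.
cv4   = cvpartition(n_obs, 'KFold', 4);
test4 = test(cv4, 1);               % logical test mask for fold 1

% lou: hold out every observation belonging to one run.
held_out  = 2;                      % e.g., leave run 2 out
test_lou  = (run_label == held_out);
train_lou = ~test_lou;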