Grinband et al. 2008 (NeuroImage) present simulations advocating that response times (RTs), or the durations of study events, be modeled as epoch durations in event-related fMRI studies rather than ignored, as is often done in event-related designs.
If the durations are less than about 2 s, modeling them will probably have little effect on inferences: they do not change the shape of the predicted HRF, only the size of the parameter estimates, so statistical inferences are largely unaffected. Longer durations do change the predicted shape, so they might make a difference.
Why? Keep in mind the difference between study design and modeling design. Block designs can be modeled as event-related models or as blocks, and event-related designs can be modeled as mini epochs. In SPM, every condition in a model is specified by its onsets and durations. Conditions with zero duration give an event-related model (a series of events), while nonzero durations give an epoch ("sustained") model. In a sense, SPM treats all models as "event-related" -- either way, SPM creates delta (stick) functions that get convolved with the HRF. When the duration is zero, SPM creates a single delta function per onset, whereas when the duration is nonzero, SPM creates a train of closely spaced delta functions (rather than literally using a boxcar function). The impact of this? Inference on the parameter estimates is slightly different for the two models. With an epoch design, a parameter describes the sustained response over a block/condition; with an event model, parameters describe the responses to individual events, so inferences can focus on differences between event types.
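As a concrete illustration, here is a minimal sketch of the two cases in SPM's first-level batch syntax (the condition names, onsets, and RTs are made-up placeholders): a zero duration yields an event regressor, while passing trialwise RTs as durations yields the variable-epoch model advocated by Grinband et al.

% Sketch of condition specification in the first-level (fmri_spec) batch.
% 'eventCond', 'rtCond' and the onset/RT vectors are placeholders.
onsets1 = [12 45 78 110];    % onsets in seconds
onsets2 = [20 60 95 130];
rts2    = [0.8 1.2 0.6 1.5]; % trialwise response times in seconds

matlabbatch{1}.spm.stats.fmri_spec.sess(1).cond(1).name     = 'eventCond';
matlabbatch{1}.spm.stats.fmri_spec.sess(1).cond(1).onset    = onsets1;
matlabbatch{1}.spm.stats.fmri_spec.sess(1).cond(1).duration = 0;    % zero duration -> event model

matlabbatch{1}.spm.stats.fmri_spec.sess(1).cond(2).name     = 'rtCond';
matlabbatch{1}.spm.stats.fmri_spec.sess(1).cond(2).onset    = onsets2;
matlabbatch{1}.spm.stats.fmri_spec.sess(1).cond(2).duration = rts2; % RTs as durations -> variable epochs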
Remember that for a GLM with regressors [C1 C1xPM C2 C2xPM], the C1 parameter is the estimated effect of C1 not captured by its parametric modulator, and vice versa, as with all regressors in the model (SPM mean-centers each modulator within its own condition). This also means that this model setup is not appropriate if you are trying to ask whether someone is simply a faster responder. Because SPM mean-centers parametric modulators around each condition's main effect, the model does not capture overall RT (which is now spread across all the RT parametric modulators): if you have your conditions of interest as separate regressors with their associated trialwise RTs as parametric modulators, you are not modeling overall RT differences between conditions or subjects, only trial-to-trial RT variation within each condition.
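For reference, a minimal sketch of how such modulators are entered in the SPM batch (onsets and RT vectors are placeholders); each condition carries its own mean-centered RT modulator:

% Sketch: condition C1 with its trialwise RTs as a linear parametric modulator.
onsetsC1 = [10 40 70];      % placeholder onsets (seconds)
rtsC1    = [0.9 1.1 0.7];   % placeholder trialwise RTs for C1

matlabbatch{1}.spm.stats.fmri_spec.sess(1).cond(1).name          = 'C1';
matlabbatch{1}.spm.stats.fmri_spec.sess(1).cond(1).onset         = onsetsC1;
matlabbatch{1}.spm.stats.fmri_spec.sess(1).cond(1).duration      = 0;
matlabbatch{1}.spm.stats.fmri_spec.sess(1).cond(1).pmod(1).name  = 'RT';
matlabbatch{1}.spm.stats.fmri_spec.sess(1).cond(1).pmod(1).param = rtsC1;  % trialwise RTs
matlabbatch{1}.spm.stats.fmri_spec.sess(1).cond(1).pmod(1).poly  = 1;      % linear modulation
% ... and likewise for cond(2), giving the [C1 C1xRT C2 C2xRT] design above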
The problem with including RT (or any other measure) as a parametric modulator arises when it is highly correlated with the condition regressor. While you might hope to mitigate the influence of RT on the condition effect, the shared variance cannot be uniquely attributed to either regressor, so you risk seeing only the part of each effect that is not explained by the other.
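A quick way to check for this (a sketch, assuming an estimated model with an SPM.mat in the current directory) is to look at the correlations between the design-matrix columns:

% Sketch: inspect correlations between regressors of an estimated SPM model.
load('SPM.mat');             % loads the SPM design structure
X = SPM.xX.X;                % design matrix (scans x regressors)
R = corrcoef(X);             % pairwise correlations between regressors
disp(SPM.xX.name');          % regressor names, to locate the columns of interest
disp(R(1:4, 1:4));           % e.g. correlations among [C1 C1xRT C2 C2xRT], if these are columns 1-4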
If your continuous variable collapses into 2 categories, then the continuous and categorical models will converge. You make more assumptions (e.g. a linear relationship) by treating your factor as continuous.
SPM's summary statistic approach (two-level approach) to approximate the more computationally expensive random effects analysis (see Will Penny's chapter on random effects analysis) only works when using con images, not T-stat images.
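For example, a minimal sketch of the usual second-level setup (paths and the contrast number are placeholders): a one-sample t-test across each subject's con image, not their spmT image:

% Sketch: second-level one-sample t-test over first-level con images.
spm('defaults', 'fmri');
spm_jobman('initcfg');
conFiles = cellstr(spm_select('FPListRec', '/path/to/firstlevel', '^con_0001\.nii$'));

matlabbatch{1}.spm.stats.factorial_design.dir = {'/path/to/secondlevel'};
matlabbatch{1}.spm.stats.factorial_design.des.t1.scans = conFiles;  % con images, not spmT images
spm_jobman('run', matlabbatch);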
Alternatively, as Robert Viviani explains, the t-stat is not a consistent estimator of effect size: as sample size increases, df -> inf and var(t) -> 1, so t -> z (the t-statistic converges to a standard normal rather than to the true effect size), whereas a beta estimate does converge, b_hat -> b, i.e. var(b_hat) -> 0, or equivalently epsilon = b_hat - b -> 0.
The values are not truly infinite. It is a numerical problem: when the p-value is very small it underflows, and SPM goes through the p-value to convert t-stats to z-stats (according to John Ashburner). Mark Jenkinson and Mark Woolrich have discussed this problem and provided a workaround for FSL (FMRIB Technical Report: "Asymptotic T to Z and F to Z Statistical Transformations"), which Stephen Fromm at the NIH implemented as a Matlab function.
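A sketch of the underlying numerical issue (arbitrary t-values and df; not SPM's exact code path): converting t to z via the tail p-value works until the p-value underflows to zero, at which point the naive z becomes Inf.

% Sketch: naive t-to-z conversion via the p-value, showing the underflow.
df = 20;
t  = [3 6 9 12 40];            % increasingly extreme t-values
p  = 1 - spm_Tcdf(t, df);      % upper-tail p-values; underflow to 0 for large t
z  = spm_invNcdf(1 - p);       % naive conversion; becomes Inf once p underflows
disp([t; p; z]);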
Apparently this was common in early SPM versions (pre-early-2000s / SPM99) and didn't seem to bother the SPM community; it may also have been related to a memory problem when switching computer architectures with SPM2.
The smoothness needed to calculate an extent threshold (e.g. with AFNI's AlphaSim) describes the spatial interdependencies in your image (the smoothness of the data) -- not to be confused with the smoothing you apply to the data (usually by blurring with a Gaussian kernel of a given FWHM) to compensate for noise (stemming, for example, from spatial imprecision).
In SPM, spm_est_smoothness.m creates the resels-per-voxel image (RPV = #resels/#voxel; referenced in the SPM.mat), and you can calculate the smoothness from it with ImCalc:
FWHM (in voxels) = RPV^(-1/3)
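A sketch of that ImCalc step via the batch (paths are placeholders; this assumes the resels-per-voxel image written during estimation is named RPV.nii, as in recent SPM versions):

% Sketch: convert the RPV image to a voxel-wise FWHM (in voxels) image.
matlabbatch{1}.spm.util.imcalc.input      = {'/path/to/model/RPV.nii'};
matlabbatch{1}.spm.util.imcalc.output     = 'FWHM_voxels.nii';
matlabbatch{1}.spm.util.imcalc.outdir     = {'/path/to/model'};
matlabbatch{1}.spm.util.imcalc.expression = 'i1.^(-1/3)';   % FWHM (in voxels) = RPV^(-1/3)
spm_jobman('run', matlabbatch);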
Sometimes true effects are weak -- but how do you objectively distinguish true, weak effects from strong noise (strong spurious effects)? There is no one-size-fits-all approach -- thresholding is an art that depends on the strength of your effects and your data set -- see Geerligs & Maris (2020) for many of the pertinent points to consider. There are obviously more multiple-comparison correction approaches (e.g. LISA, proposed by Lohmann et al. 2018) than the common ones listed here.
Bonferroni-Holm
* threshold = chosen rate of allowed false positives divided by the number of tests
+ fast
- assumes that comparisons are independent -- but fMRI (or EEG) data are spatially dependent, so this correction is too conservative for MRI data
False discovery rate (FDR)
* if you choose a 5% threshold: on average, 5% of the voxels declared significant are false positives
* notice that since SPM8, what is reported in the results table is not voxel-wise FDR but peak-wise FDR -- to make SPM report voxel-wise FDR, you need to set that in spm_defaults.m or put this in your environment/script:
global defaults; defaults.stats.topoFDR=0;
+ fast
+ good specificity
- higher risk of false negatives
- depends on the distribution of p-values => the more significant effects there are, the more unreliable effects will survive thresholding, i.e. become "significant"
Familywise error rate (FWER)
* if you choose a 5% threshold: there is a 5% probability that any voxel surviving the threshold is a false positive under H0, i.e. a 5% probability of making one or more Type I errors in the family of tests -- so rejecting a single voxel's null amounts to rejecting the family-wise H0
Cluster-extent
* use when there is a risk of false negatives because effects are weak but spatially extended
* requires the user to set 2 thresholds: a voxel-level (height) threshold and a cluster-size (extent) threshold (see the sketch after this list)
+ based on prior knowledge and plausible assumptions (large clusters are less likely to be spurious even when voxel intensities are low)
+ detection of small/weak but "real" effects
+ can control as well as FWER when used carefully (for EEG data: Pernet et al. 2015)
- low spatial specificity: because the emphasis is on large clusters being plausible even when voxel values are weak, the approach has low spatial specificity -- if you choose a voxel threshold that is too liberal, you will get very large clusters
- if you choose too big a cluster-size threshold, however, you exclude real effects that are strong but small in extent AND end up with clusters extending beyond anatomical/functional regions
- suggestions to use a stricter voxel threshold, such as corrected voxel-wise thresholding (as suggested in Woo et al. 2014), to counter the previous pitfall thwart the original motivation of cluster-extent thresholding (see notes below on TFCE)
- unstable: small variations around the chosen thresholds can have a large effect on results
- computationally expensive (permutation tests or Monte Carlo simulation required)
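As mentioned above, a sketch of where the two thresholds go in SPM's results batch (the path, contrast index, and extent value are placeholders; in practice the extent should come from a permutation/Monte Carlo estimate rather than be picked by hand):

% Sketch: voxel-level (height) and cluster-size (extent) thresholds in the results batch.
matlabbatch{1}.spm.stats.results.spmmat = {'/path/to/model/SPM.mat'};
matlabbatch{1}.spm.stats.results.conspec.contrasts  = 1;       % contrast index (placeholder)
matlabbatch{1}.spm.stats.results.conspec.threshdesc = 'none';  % uncorrected voxel threshold
matlabbatch{1}.spm.stats.results.conspec.thresh     = 0.001;   % voxel-level threshold (p-value)
matlabbatch{1}.spm.stats.results.conspec.extent     = 50;      % cluster-size threshold in voxels (placeholder)
spm_jobman('run', matlabbatch);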
TFCE/pTFCE
* nonparametric voxelwise statistic: considers the cluster mass at each voxel by balancing cluster extent and height (i.e. weights each contribution when calculating the mass statistic -- the reasons for the default weights in the toolboxes are given in Smith & Nichols 2009)
* TFCE p-values are computed from the TFCE test statistic, not the t-statistic, and do not have an associated DOF (note: the TFCE p-values FSL reports are based on one-sided tests)
+ no initial thresholding
+ generally more sensitive than clusterwise but less specific
+ seems more stable than cluster-extent approaches (Li et al. 2016)
- like cluster-extent thresholding, low spatial specificity
-/+ TFCE is computationally expensive, but pTFCE is not
A few words about TFCE/pTFCE vs. cluster-extent thresholding:
Threshold Free Cluster Enhancement (TFCE, available as an SPM toolbox or in FSL via randomise; Smith & Nichols 2009) is a weighted combination of voxel intensity ("height") AND cluster size (extent/"width"). There is a probabilistic version (pTFCE, available as an SPM toolbox or in R; Spisák et al. 2019) which uses Bayesian statistics (a prior probability on voxel intensity and the likelihood of a given cluster size) to avoid the permutation testing used in vanilla TFCE.
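For reference, the TFCE score at a voxel p as defined in Smith & Nichols (2009) is
TFCE(p) = integral over h (from 0 up to h_p, the height of voxel p) of e(h)^E * h^H dh
where e(h) is the extent of the cluster containing p at threshold height h, with default weights E = 0.5 and H = 2 (the reasons for these defaults are discussed in the paper).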
Cluster-size threshold approaches require that you specify a voxel threshold and then, through permutation testing over the sizes (or masses) of the supra-threshold clusters, give you the cluster sizes that are "likely" (at another threshold that you choose) to be stable over repeated experiments. Because you have to specify a voxel threshold (before specifying a cluster threshold), it is conceivable that you miss clusters that are large but have lower voxel heights.