Construct validity indicates the extent to which a set of measured variables actually represents the theoretical latent construct they are designed to measure.
In SEM, the measurement model is validated through Confirmatory Factor Analysis (CFA). CFA assesses construct validity in terms of convergent validity, internal consistency, and discriminant validity.
The researcher MUST perform CFA for all latent constructs before modeling their interrelationships in a structural model. When assessing convergent validity, unidimensionality (factor loading) MUST be assessed before the AVE and Composite Reliability.
The following steps could be followed:
Run the Confirmatory Factor Analysis (CFA) for the pooled measurement model
Report the normality assessment for remaining items of a construct in the study.
Examine the Fitness Indexes obtained for the measurement model. If the indexes do not achieve the required level, examine the factor loading for every item. Identify the items with low factor loadings, since these items are considered problematic in the model.
Delete any item with a factor loading below 0.4. Delete one item at a time, selecting the item with the lowest factor loading first.
Run the new measurement model (the model after an item is deleted).
Examine the Fitness Indexes, and repeat steps 3-5 until they are achieved. If the Fitness Indexes are still not achieved after the low-loading items have been removed, examine the Modification Indices (MI). A high MI value (above 15) indicates that a pair of items in the model is redundant. To resolve redundant items, the researcher could choose one of the following:
Choice 1: Delete one item of the pair (choose the one with the lower factor loading), then run the measurement model again and repeat the above steps.
Choice 2: Set the pair of redundant items as a "free parameter estimate" (i.e., allow their error terms to covary), then run the measurement model again and repeat the above steps.
Obtain the CR, and AVE for every construct in the study.
Report discriminant validity assessment results.
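The CR and AVE mentioned in the steps above can be computed directly from the standardized factor loadings. Below is a minimal sketch in Python; the function names are mine and the loading values are illustrative only, not from any real dataset:

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    where each error variance is 1 - loading^2 (standardized loadings assumed)."""
    s = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + error)

def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Illustrative standardized loadings for one latent construct
loadings = [0.82, 0.76, 0.71, 0.68]
print(round(composite_reliability(loadings), 3))       # CR
print(round(average_variance_extracted(loadings), 3))  # AVE
```

For these illustrative loadings, CR comes out around 0.83 (above the 0.70 threshold) while AVE is about 0.55 (above the 0.50 threshold), so no item would need to be deleted on these criteria.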
Model fit examines how closely the data fit the model; that is, it compares theory (the hypothesized model) with reality (the data).
It measures overall GOF for both structural and measurement models collectively.
It does not make any comparison to a specified null model (incremental fit measure) or adjust for the number of parameters in the estimated model (parsimonious fit measure).
It measures GOF that compares the current model to a specified “null” (independent) model to determine the degree of improvement over the null model.
It measures GOF representing the degree of model fit per estimated coefficient.
It attempts to correct for any "overfitting" of the model and evaluates the parsimony of the model relative to its GOF.
RMSEA: Root Mean Square Error of Approximation
GFI: Goodness of Fit Index
SRMR: Standardized Root Mean Square Residual
RMR: Root Mean Square Residual
CFI: Comparative Fit Index
TLI: Tucker-Lewis Index (Non-Normed Fit Index)
AGFI: Adjusted Goodness of Fit Index
PNFI: Parsimony Normed Fit Index
We can use text macros to display goodness-of-fit values in AMOS Graphics: copy and paste the text macros onto the AMOS Graphics canvas, and the fit indices will be reported on the diagram.
When N = 100, researchers may choose a GFI cutoff value of .89 and an SRMR cutoff value of .09; that is, GFI ≥ .89 and SRMR ≤ .09 indicate an acceptable level of model fit. Although both indexes can be used to assess model fit, using the SRMR with this cutoff value (SRMR ≤ .09) may be better than using the GFI with the suggested cutoff value (GFI ≥ .89), because the former generally results in a smaller average of Type I and Type II error rates. In addition, if SRMR ≤ .09, then a GFI cutoff value of ≥ .85 may still be indicative of an acceptable fit.
When N > 100, researchers may choose a GFI cutoff value of .93 and an SRMR cutoff value of .08. In this case, there is no preference for one index over the other, or for using a combination of the indexes over using them separately; each index's suggested cutoff value may be used independently to assess model fit. That is, GFI ≥ .93 or SRMR ≤ .08 indicates an acceptable fit.
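The sample-size-dependent cutoff logic above can be sketched as a small decision function. This is an illustrative helper of my own, not part of any package:

```python
def acceptable_fit(gfi, srmr, n):
    """Sketch of the sample-size-dependent GFI/SRMR cutoff logic described above.
    The function name and interface are illustrative only."""
    if n > 100:
        # For N > 100, either index's cutoff may be used independently
        return gfi >= 0.93 or srmr <= 0.08
    # For N around 100, SRMR (<= .09) is generally preferred; when it holds,
    # a relaxed GFI cutoff of .85 may still indicate acceptable fit
    if srmr <= 0.09:
        return gfi >= 0.85
    return gfi >= 0.89

print(acceptable_fit(gfi=0.91, srmr=0.07, n=250))  # SRMR alone passes for N > 100
```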
Convergent validity indicates the extent to which the items of the specific construct converge together.
It reflects the correlation among items measuring the same construct.
High factor loadings of measurement items indicate that the items converge together on a common construct.
Factor loading is the indicator of unidimensionality. All items must have acceptable loadings on their respective latent construct. Therefore, any item with a low factor loading should be deleted from the construct.
All indicators' outer loadings should be statistically significant. Because a significant outer loading could still be fairly weak, a common rule of thumb is that the (standardized) outer loadings should be 0.708 or higher. The rationale is that the square of the outer loading (the indicator's R-square) should then exceed 0.50.
Low-loading items should be deleted one at a time, starting with the lowest loading. After an item is deleted, the researcher needs to estimate the model again, repeating until the unidimensionality requirement is achieved for all constructs.
All factor loadings MUST be positive.
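The one-at-a-time deletion order implied above (lowest loading first, re-estimating between deletions) can be sketched as follows. The function name, threshold default, and item labels are illustrative:

```python
def deletion_candidates(loadings, threshold=0.708):
    """Return item names loading below the threshold, lowest first --
    the order in which one-at-a-time deletion (with model re-estimation
    after each deletion) would proceed."""
    low = {item: value for item, value in loadings.items() if value < threshold}
    return sorted(low, key=low.get)

# Illustrative standardized loadings for one construct
loadings = {"Q1": 0.81, "Q2": 0.45, "Q3": 0.73, "Q4": 0.62}
print(deletion_candidates(loadings))  # → ['Q2', 'Q4']
```

Note that only the first candidate would actually be deleted before re-estimating; the remaining loadings change once the model is re-run, so the candidate list must be recomputed each round.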
Indicates how much variation in the multiple items is explained by the latent variable.
Is comparable to the proportion of variance explained in factor analysis.
Its value ranges from 0 to 1.
AVE should exceed 0.5 to suggest adequate convergent validity (Bagozzi & Yi, 1988; Fornell & Larcker, 1981).
AVE is equivalent to the communality of a construct. An AVE value of 0.50 or higher indicates that the construct explains more than half of the variance of its indicators.
AVE of less than 0.50 indicates that more error remains in the items than the variance explained by the construct.
If AVE < 0.50, then item with the lowest factor loading for that particular construct should be deleted.
Internal consistency reliability indicates consistency of measurement items to measure a common construct. It shows the extent to which a variable or set of variables is consistent in what it is intended to measure. If multiple measurements are taken, the reliable measures will all be consistent in their values. It differs from validity in that it relates not to what should be measured, but instead to how it is measured.
The traditional criterion for internal consistency is Cronbach's alpha, which provides an estimate of the reliability based on the inter-correlations of the observed indicator variables.
It assumes that all indicators are equally reliable (i.e., all the indicators have equal outer loadings on the construct).
It is sensitive to the number of items in the scale and generally tends to underestimate internal consistency reliability.
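Cronbach's alpha itself is straightforward to compute from raw item scores. A minimal sketch using only the standard library; the response data are illustrative:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: list of equal-length score lists, one list per indicator.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]       # total score per respondent
    item_variance = sum(pvariance(col) for col in items)   # sum of per-item variances
    return k / (k - 1) * (1 - item_variance / pvariance(totals))

# Illustrative 5-point Likert responses from six respondents, three indicators
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [5, 4, 2, 4, 3, 5],
]
print(round(cronbach_alpha(items), 3))
```

For this toy data alpha comes out around 0.87, comfortably above the conventional 0.70 benchmark; perfectly redundant indicators would push it to 1, illustrating the point below that very high values are a warning sign rather than a virtue.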
To overcome the limitations of Cronbach's Alpha, Composite Reliability (CR) is suggested as a replacement of the traditional criterion.
If CR < 0.70, then the item with the lowest factor loading for that particular construct should be considered for deletion.
CR values of 0.60 to 0.70 are acceptable in exploratory research, while in more advanced stages of research, values between 0.70 and 0.90 can be regarded as satisfactory (Nunnally & Bernstein, 1994).
Values above 0.90 (and definitely above 0.95) are not desirable because they indicate that all the indicator variables are measuring the same phenomenon and are therefore unlikely to be a valid measure of the construct.
Indicates the uniqueness of a construct from other constructs.
A latent variable should explain the variance of its own indicators better than the variance of other latent variables.
The square root of the AVE of a latent variable should be higher than its correlations with all other latent variables (Chin, 2010; Chin, 1998b; Fornell & Larcker, 1981).
The Fornell-Larcker criterion performs very poorly when the indicator loadings of a construct differ only slightly (e.g., between 0.6 and 0.8). If the loadings vary more strongly, its performance improves, but remains rather poor overall (Hair et al., 2017; Henseler et al., 2015; Voorhees et al., 2016).
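The Fornell-Larcker comparison is a simple pairwise check: each construct's square root of AVE must exceed its correlation with every other construct. A minimal sketch with illustrative values:

```python
from math import sqrt

def fornell_larcker_ok(ave, correlations):
    """ave: dict mapping construct -> AVE.
    correlations: dict mapping (construct_a, construct_b) -> latent correlation.
    Passes only if sqrt(AVE) exceeds the correlation for both constructs in every pair."""
    for (a, b), r in correlations.items():
        if sqrt(ave[a]) <= abs(r) or sqrt(ave[b]) <= abs(r):
            return False
    return True

ave = {"A": 0.62, "B": 0.55}           # illustrative AVEs
correlations = {("A", "B"): 0.48}      # illustrative latent correlation
print(fornell_larcker_ok(ave, correlations))  # sqrt(.62)=.79 and sqrt(.55)=.74 both exceed .48
```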
HTMT is the ratio of the average heterotrait-heteromethod correlations to the average monotrait-heteromethod correlations (Hair et al., 2017; Henseler et al., 2015).
HTMT values close to 1 indicate a lack of discriminant validity
Threshold value:
HTMT.85 (Kline, 2011): More than 0.85 indicates a lack of discriminant validity. This threshold is used when the variables are conceptually dissimilar.
HTMT.90 (Gold et al., 2001): More than 0.90 indicates a lack of discriminant validity. This threshold is used when the variables are conceptually similar.
Retain the constructs that cause discriminant validity problems in the model, and aim at increasing the average monotrait-heteromethod correlations and/or decreasing the average heterotrait-heteromethod correlations of the constructs' measures.
To increase the average monotrait-heteromethod correlations, one can eliminate items that have low correlations with other items measuring the same construct.
To decrease the average heterotrait-heteromethod correlations, one can...
eliminate items that are strongly correlated with items of the opposing construct, or
reassign these indicators to the other construct, if theoretically plausible.
Merge the constructs that cause the problems into a more general construct. Again, measurement theory must support this step.
Internal consistency reliability: composite reliability should be higher than 0.70 (in exploratory research, 0.60 to 0.70 is considered acceptable). Consider Cronbach’s alpha as the lower bound and composite reliability as the upper bound of internal consistency reliability.
Indicator reliability: the indicator’s outer loadings should be higher than 0.70. Indicators with outer loadings between 0.40 and 0.70 should be considered for removal only if the deletion leads to an increase in composite reliability and AVE above the suggested threshold value.
Convergent validity: the AVE should be higher than 0.50.
Discriminant validity:
Use the HTMT criterion to assess discriminant validity in PLS-SEM.
The confidence interval of the HTMT statistic should not include the value 1 for all combinations of constructs.
According to the traditional discriminant validity assessment methods, an indicator’s outer loadings on a construct should be higher than all its cross-loadings with other constructs. Furthermore, the square root of the AVE of each construct should be higher than its highest correlation with any other construct (Fornell-Larcker criterion).
If more than one item in a construct does not achieve the threshold value of outer loading, the researcher should delete the items one at a time for that particular construct, starting from the item with the lowest loading. After each deletion, the model should be re-estimated.
Items with loading lower than 0.708 can be kept when the AVE is more than 0.50.
If a negative outer loading is found in a construct, the researcher should check whether the reverse-coded items have been addressed. If the negative value still remains, the item should be deleted.
Caveat: the researcher should not delete more than 20% of the indicators in the model (Hair, Babin, & Krey, 2017; Hair et al., 2014). Otherwise, the research effectively moves into EFA rather than CFA, and the credibility of the research instrument becomes very questionable.
No: refine and improve the measures, and design a new study.
Yes: proceed to test the structural model.
Nawanir, G., Lim, K. T., Othman, S. N., & Adeleke, A. Q. (2018). Developing and validating lean manufacturing constructs: An SEM approach. Benchmarking: An International Journal, 25(5), 1382-1405.
Cho, G., Hwang, H., Sarstedt, M., & Ringle, C. M. (2020). Cutoff criteria for overall model fit indexes in generalized structured component analysis. Journal of Marketing Analytics. doi:10.1057/s41270-020-00089-1