It is a measurement model specification in which the indicators are assumed to be caused by the underlying construct.
Changes in the latent variable directly cause changes in the assigned indicators. When the construct score is high, the answers to all indicators will be high; conversely, when the construct score is low, the answers to all indicators will be low.
The indicators are interchangeable and expected to be highly correlated; deleting any single item will not change the meaning of the construct.
Recall... Measurement model assessment aims at ensuring construct validity & reliability.
For a reflective model, the assessment covers convergent validity, internal consistency reliability, and discriminant validity.
The extent to which the items of the specific construct converge together.
Reflects correlation between items measuring the same construct.
High outer loadings of measurement items indicate that the items converge together on a common construct.
All indicators' outer loadings should be statistically significant. Because a significant outer loading could still be fairly weak, a common rule of thumb is that the (standardized) outer loadings should be 0.708 or higher. The rationale behind this rule is that the square of a standardized outer loading (the indicator reliability) should be at least 0.50.
Source: Hair et al. (2022)
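The arithmetic behind the 0.708 cut-off is simply a restatement of the rationale above, namely that the indicator reliability (the squared standardized loading) should reach 0.50:

```latex
\text{indicator reliability} = \ell^{2} \geq 0.708^{2} \approx 0.50
```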
Indicates how much variation in the multiple items is explained by the latent variable.
Is comparable to the proportion of variance explained in factor analysis.
Values range from 0 to 1.
AVE should exceed 0.5 to suggest adequate convergent validity (Bagozzi & Yi, 1988; Fornell & Larcker, 1981).
AVE is equivalent to the communality of a construct. An AVE value of 0.50 or higher indicates that the construct explains more than half of the variance of its indicators.
An AVE of less than 0.50 indicates that, on average, more error remains in the items than variance explained by the construct.
If AVE < 0.50, the item with the lowest outer loading for that particular construct should be deleted.
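A minimal sketch, using hypothetical standardized outer loadings (not output from the exercise datasets), of how AVE follows from the loadings:

```python
# AVE = average of the squared standardized outer loadings of a construct.
def average_variance_extracted(loadings):
    return sum(l ** 2 for l in loadings) / len(loadings)

loadings = [0.82, 0.75, 0.71, 0.58]   # hypothetical 4-item construct
ave = average_variance_extracted(loadings)
print(f"AVE = {ave:.3f}")             # ~0.519, just above the 0.50 cut-off

if ave < 0.50:
    # per the guideline above: consider deleting the lowest-loading item,
    # then re-estimate the model
    print("Candidate for deletion: item with loading", min(loadings))
```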
Indicates consistency of measurement items to measure a common construct.
The traditional criterion for internal consistency is Cronbach's alpha, which provides an estimate of the reliability based on the inter-correlations of the observed indicator variables.
It assumes that all indicators are equally reliable (i.e., all the indicators have equal outer loadings on the construct).
It is sensitive to the number of items in the scale & generally tends to underestimate the internal consistency reliability.
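For reference, the standard formula for a scale of k items, where \(\sigma^2_i\) is the variance of item i and \(\sigma^2_t\) is the variance of the summed scale:

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{i}}{\sigma^{2}_{t}}\right)
```

Because every item enters the total score with equal weight, the formula cannot reflect differences in the indicators' outer loadings, which explains the limitation noted above.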
To overcome the limitations of Cronbach's alpha, Composite Reliability (CR) is suggested as a replacement for the traditional criterion.
If CR < 0.70, the item with the lowest outer loading for that particular construct should be considered for deletion.
CR values of 0.60 to 0.70 are acceptable in exploratory research, while in more advanced stages of research, values between 0.70 and 0.90 can be regarded as satisfactory (Nunnally & Bernstein, 1994).
Values above 0.90 (and definitely above 0.95) are not desirable because they indicate that all the indicator variables are measuring the same phenomenon and are therefore unlikely to be a valid measure of the construct.
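A minimal sketch with hypothetical standardized loadings, assuming uncorrelated measurement errors (so the error variance of a standardized item is 1 minus its squared loading):

```python
# rho_c = (sum of loadings)^2 / [(sum of loadings)^2 + sum of error variances]
def composite_reliability(loadings):
    sum_l = sum(loadings)
    error_var = sum(1 - l ** 2 for l in loadings)
    return sum_l ** 2 / (sum_l ** 2 + error_var)

loadings = [0.82, 0.75, 0.71, 0.58]                     # hypothetical construct
print(f"CR = {composite_reliability(loadings):.3f}")    # ~0.810, within 0.70-0.90
```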
It is helpful that these reliability measures are reported together: on the output report they appear side by side, with rho_A between Cronbach's alpha and CR, so it is easy to check whether rho_A falls between the two values, which is a good indication of reliability.
Cronbach’s alpha is the lower bound, and the composite reliability rho_c is the upper bound for internal consistency reliability.
The reliability coefficient rho_A usually lies between these bounds and may serve as a good representation of a construct’s internal consistency reliability.
Minimum of 0.70 (or 0.60 in exploratory research).
Maximum of 0.95 to avoid indicator redundancy, which would compromise content validity.
The recommended values for all measures of internal consistency reliability are 0.80 to 0.90.
Indicates the uniqueness of a construct from other constructs.
A latent variable should explain the variance of its own indicators better than it explains the variance of other latent variables' indicators.
For this purpose, a researcher can use the Heterotrait-Monotrait Ratio (HTMT), the Fornell-Larcker criterion, or cross-loadings.
The loadings of an item on its assigned latent variable should be higher than its loadings on all other latent variables.
Fails to indicate a lack of discriminant validity when 2 constructs are perfectly correlated, which renders this criterion ineffective for empirical research (Hair et al., 2017; Henseler et al., 2015).
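A minimal sketch of this check, using hypothetical construct names and loadings (in practice the cross-loadings table comes from the PLS software output):

```python
# Rows = items, columns = constructs; an item should load highest on its own construct.
cross_loadings = {
    "q1": {"QUAL": 0.81, "SAT": 0.42},   # hypothetical constructs QUAL and SAT
    "q2": {"QUAL": 0.77, "SAT": 0.39},
    "s1": {"QUAL": 0.45, "SAT": 0.84},
}
assigned = {"q1": "QUAL", "q2": "QUAL", "s1": "SAT"}

for item, loads in cross_loadings.items():
    own = loads[assigned[item]]
    highest_other = max(v for c, v in loads.items() if c != assigned[item])
    print(f"{item}: own loading = {own:.2f}, "
          f"highest cross-loading = {highest_other:.2f}, ok = {own > highest_other}")
```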
The square root of AVE of a latent variable should be higher than the correlations between the latent variable and all other variables (Chin, 2010; Chin, 1998b; Fornell & Larcker, 1981).
Performs very poorly when indicator loadings of the construct differ only slightly (e.g., between 0.6 & 0.8). If the loadings vary more strongly, its performance improves, but is still rather poor overall (Hair et al., 2017; Henseler et al., 2015; Voorhees et al., 2016).
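A minimal sketch of the Fornell-Larcker check with hypothetical AVE values and latent variable correlations:

```python
import math

ave = {"QUAL": 0.62, "SAT": 0.58, "LOY": 0.55}           # AVE per construct (hypothetical)
corr = {("QUAL", "SAT"): 0.61, ("QUAL", "LOY"): 0.47,    # latent variable correlations
        ("SAT", "LOY"): 0.70}

# sqrt(AVE) of each construct must exceed its correlation with every other construct
for (c1, c2), r in corr.items():
    ok = math.sqrt(ave[c1]) > abs(r) and math.sqrt(ave[c2]) > abs(r)
    print(f"{c1} vs {c2}: r = {r:.2f}, sqrt(AVE) = "
          f"{math.sqrt(ave[c1]):.2f}/{math.sqrt(ave[c2]):.2f}, ok = {ok}")
```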
The HTMT is the ratio of the average heterotrait-heteromethod correlations to the average monotrait-heteromethod correlations (Hair et al., 2017; Henseler et al., 2015).
HTMT values close to 1 indicate a lack of discriminant validity.
Threshold value:
HTMT.85 (Kline, 2011): More than 0.85 indicates a lack of discriminant validity. This threshold is used when the variables are conceptually dissimilar.
HTMT.90 (Gold et al., 2001): More than 0.90 indicates a lack of discriminant validity. This threshold is used when the variables are conceptually similar.
If HTMT is greater than 0.90, use bootstrapping to test whether HTMT is significantly different from 1 (HTMT_inference). Does the 90% bootstrap confidence interval of HTMT include 1? If yes, discriminant validity is not satisfactory. If no, discriminant validity is satisfactory, and the researcher can proceed with the analysis.
Source: Hair et al. (2017)
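A minimal sketch of the HTMT computation for two constructs, using a hypothetical item correlation matrix (in practice the PLS software reports HTMT directly):

```python
import numpy as np
from itertools import combinations

def htmt(R, items_a, items_b):
    """Average heterotrait-heteromethod correlation divided by the geometric mean
    of the two constructs' average monotrait-heteromethod correlations."""
    hetero = np.mean([abs(R[i, j]) for i in items_a for j in items_b])
    mono_a = np.mean([abs(R[i, j]) for i, j in combinations(items_a, 2)])
    mono_b = np.mean([abs(R[i, j]) for i, j in combinations(items_b, 2)])
    return hetero / np.sqrt(mono_a * mono_b)

# Hypothetical correlations: items 0-2 belong to construct A, items 3-5 to construct B.
R = np.array([
    [1.00, 0.62, 0.58, 0.41, 0.39, 0.44],
    [0.62, 1.00, 0.60, 0.38, 0.36, 0.40],
    [0.58, 0.60, 1.00, 0.42, 0.35, 0.37],
    [0.41, 0.38, 0.42, 1.00, 0.64, 0.59],
    [0.39, 0.36, 0.35, 0.64, 1.00, 0.61],
    [0.44, 0.40, 0.37, 0.59, 0.61, 1.00],
])
print(f"HTMT = {htmt(R, [0, 1, 2], [3, 4, 5]):.3f}")   # ~0.645, below the 0.85/0.90 thresholds
```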
Retain the constructs that cause the discriminant validity problems in the model and aim at increasing the average monotrait-heteromethod correlations and/or decreasing the average heterotrait-heteromethod correlations of the constructs' measures.
To increase the average monotrait-heteromethod correlations, one can eliminate items that have low correlations with other items measuring the same construct.
To decrease the average heterotrait-heteromethod correlations, one can...
eliminate items that are strongly correlated with items in the opposing construct, or
reassign these indicators to the other construct, if theoretically plausible.
Merge the constructs that cause the problems into a more general construct. Again, measurement theory must support this step.
Internal consistency reliability: composite reliability should be higher than 0.70 (in exploratory research, 0.60 to 0.70 is considered acceptable). Consider Cronbach’s alpha as the lower bound and composite reliability as the upper bound of internal consistency reliability.
Indicator reliability: the indicator’s outer loadings should be higher than 0.70. Indicators with outer loadings between 0.40 and 0.70 should be considered for removal only if the deletion leads to an increase in composite reliability and AVE above the suggested threshold value.
Convergent validity: the AVE should be higher than 0.50.
Discriminant validity:
Use the HTMT criterion to assess discriminant validity in PLS-SEM.
The confidence interval of the HTMT statistic should not include the value 1 for all combinations of constructs.
According to the traditional discriminant validity assessment methods, an indicator’s outer loadings on a construct should be higher than all its cross-loadings with other constructs. Furthermore, the square root of the AVE of each construct should be higher than its highest correlation with any other construct (Fornell-Larcker criterion).
If more than one item in a construct fails to achieve the threshold value of outer loading, the researcher should delete only one item at a time for that particular construct, starting from the item with the lowest loading. After each deletion, the model should be re-estimated.
Items with loadings lower than 0.708 can be kept when the AVE is above 0.50.
If a negative outer loading is found in a construct, the researcher should check whether the reverse-coded items have been addressed (recoded). If the negative value remains, the item should be deleted.
Caveat: the researcher should not delete more than 20% of the indicators in the model (Hair, Babin, & Krey, 2017; Hair et al., 2014). Otherwise, the research moves into EFA rather than CFA, and the credibility of the research instrument becomes very questionable.
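A procedural sketch of this purification loop. The `estimate_model` callback is hypothetical (not a real package function); it stands in for re-running the PLS estimation and is assumed to return the construct's outer loadings and AVE:

```python
def purify_construct(items, estimate_model, n_indicators_in_model,
                     loading_cut=0.708, ave_cut=0.50, max_drop_ratio=0.20):
    deleted = 0
    loadings, ave = estimate_model(items)           # {item: loading}, AVE
    # low-loading items are kept if AVE is already above 0.50 (see guideline above)
    while ave < ave_cut and min(loadings.values()) < loading_cut:
        if (deleted + 1) / n_indicators_in_model > max_drop_ratio:
            break                                   # never delete >20% of the model's indicators
        worst = min(loadings, key=loadings.get)     # delete one item at a time,
        items = [i for i in items if i != worst]    # starting with the lowest loading
        deleted += 1
        loadings, ave = estimate_model(items)       # re-estimate after each deletion
    return items
```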
If no: refine and improve the measures, and design a new study.
If yes: proceed to test the structural model.
Click here for convergent validity & internal consistency calculator.
Please download the data HERE, and draw the following model.
Financial Performance: Please download the data HERE, and draw the following model.
Manufacturing Strategy: Please download data-set here, and draw the model below.
Business Competitiveness: Please download data-set here, and draw the model below.
Smartphone Addiction: Please download data-set here, and draw the model below.
Nawanir, G., Fernando, Y., & Teong, L. K. (2018). A second-order model of lean manufacturing implementation to leverage production line productivity with the importance-performance map analysis. Global Business Review, 19(3_suppl), S114-S129. Click here.
Nawanir, G., Lim, K. T., Othman, S. N., & Adeleke, A. Q. (2018). Developing and validating Lean manufacturing constructs: An SEM approach. Benchmarking: An International Journal, 25(5), 1382-1405. Click here.