Stata 12 SEM package
22 Aug 2011
Stata 12's SEM option
Stata 12 can do structural equation modeling (SEM). One can reference factor loadings, means and intercepts, and residual variances and variances of both observed and latent variables. Parameters can be freed, constrained, etc. This report provides examples of a CFA, multitrait multimethod matrix, and a latent growth model. Example 1 illustrates some generic Stata SEM syntax useful to most projects. Results from Stata's SEM package are compared with those from MPLUS in Examples 2-3. The intent here is to explore the building blocks and not to put the building blocks together in unique combinations. That is left to the investigator.
Specific capabilities in Stata 12
--Constrain paths, coefficients.
--Get all the regular fit statistics.
--Request stdyx results.
--SEM works seamlessly with lincom, test, predict (to get factor scores), and other built-in Stata functions we know and love.
--Get modification indices (cutoff of your choosing, you can even select types of modification indices you wish to see)
--Request LISREL-style output with estat framework, which comes with friendlier return matrices but does not provide standard errors for those parameters. Example 3 uses this to pull values from matrices (Example 2 does it using normally returned matrices). A limitation is that standard errors of estimates are not returned with estat framework.
--Read data from a covariance/correlation matrix.
--Like its regression, Stata's default way of handling missing data in SEM is listwise complete deletion, but there is an extra option method(mlmv) that asks for MAR estimates.
--Estimate a latent growth model, multiple-group model, and other fancy things.
--Draw a diagram of your model, and even use the diagram to specify and run your model. Recall sweet memories of long nights with Amos.
Limitations in Stata 12 compared to MPLUS
--You can estimate a model by drawing a path diagram, and it'll put estimates on your picture for you. But I am not sure you can produce a path diagram using model syntax. That would be too fantastic.
--Stata drops any rows with missing data by default. They have options for assuming data are MAR, but it is different than MPLUS' MAR estimator and incompatible with certain other post-command options in Stata12.
--No mention of categorical dependent variables in the user's guide, or any mention of thresholds. If you live by categorical outcomes, stick to MPLUS.
--No mention of categorical latent variables.
In the following examples, I show syntax and emphasize comparisons of parameter estimates and standard errors between MPLUS and Stata12. Emphasis is not on substantive interpretation. MPLUS is run through Stata using Dr. Jones' RunMPLUS.ado, which is freely available with findit runmplus.
Example 1. CFA.
This example presents a single factor CFA. Here, we demonstrate some of the
cool features available in Stata 12 and benefits of the package being integrated with existing
features in Stata. We use baseline data from the ACTIVE study. Below is a basic diagram of the model:
RunMPLUS syntax (output omitted):
runmplus immrawb1 hvlttb1 avlttb1 ufov1b1 ws_corb1 ls_corb1 lt_corb1, ///
model(f BY immrawb1* hvlttb1 avlttb1 ufov1b1 ws_corb1 ls_corb1 lt_corb1; f@1; )
Stata syntax :
sem (f -> immrawb1 hvlttb1 avlttb1 ufov1b1 ws_corb1 ls_corb1 lt_corb1) ///
, latent(f) var(f@1) // method(mlmv)
estat gof, stats(all) // model fit.
capture drop fscore1
Commonly used options.
Predicting a factor score is as easy as:
predict fscore1, latent(f)
You can use method(mlmv) to account for missing data under MAR conditions, but this method strongly relies on joint normality of variables. Other options in the Stata manual you can read about are ML, QML, ADF, and MLMV.
The post-estimation option estat gof, stats(all) provides conventional fit statistics that makes the world go round.
If you want stdyx estimates, add the option standardized to the SEM command after the comma.
If you ever want to see syntax for how to refer to a given parameter, like a covariance or error or path loading, add coeflegend to the SEM command after the comma.
If you want to estimate a model in a multiple groups framework, for example by sex or intervention group, add group(groupvarname) to the SEM command after the comma.
Tip: To make sure Stata knows what is a latent variable in your model, make it explicit using the latent(lvname) command after the comma. But that is not required. By default, Stata recognizes any variable that starts with an uppercase letter as a latent variable. If you use a variable in SEM that starts with an uppercase letter, well that might be like crossing streams. We all know about crossing the streams.
To estimate modification indices, this command. It is possible to request only certain sets of modification indices, for example covariances of residual errors or factor loadings, and to request them at particular levels of significance.
estat mindices, min(6)
It is really easy to request residuals, although this post-estimation command is incompatible with method(mlmv) .
estat residuals, normalized
Request Lisrel-type output, which is closer to what MPLUS users are used to seeing. In addition to the nicer display, you get nicer matrices, which are demonstrated further in Example 3.
estat framework, standardized
If you have variables measured with error and want to incorporate their reliability in the model, add reliability() after the comma in the sem comma. Syntax is below using a freely available dataset which you will see again in Example 3.
use http://www.stata-press.com/data/r12/sem_lcm
local rel="0.6"
sem (lncrime0 <-I@1 S@0 _cons@0) ///
(lncrime1 <-I@1 S@1 _cons@0) ///
(lncrime2 <-I@1 S@2 _cons@0) ///
(lncrime3 <-I@1 S@3 _cons@0), ///
latent(I S) means(I S) ///
var(e.lncrime0@var e.lncrime1@var ///
e.lncrime2@var e.lncrime3@var) ///
standardized reliability(lncrime0 `rel' lncrime1 `rel' lncrime2 `rel' lncrime3 `rel' )
Example 2. Multi-trait multimethod matrix.
Data. Three functional measures (IADL, ADL, IQCODE) were measured for each individual using both self and proxy reports. Reporting type are the methods factors; traits are handled with correlations because there are only 2 indicators for each trait.
RunMPLUS syntax (output omitted):
use P:\PROJECTS\SQCODE\POSTED\DATA\Derived\CLEANS~1.DTA, clear
keep if inlist(study_status, 700)
tex {\bf RunMPLUS syntax (output omitted):}
quietly {
local methslf "piadlt piiadlt piiqc"
local methinf "fiadlt fiiadlt fiiqc"
runmplus studyid `methslf' `methinf', ///
variable( idvariable = studyid; ) ///
analysis(estimator=WLSMV; ) ///
model(methslf BY `methslf' *; ///
methinf BY `methinf' *; ///
methslf WITH methinf; ///
piadlt WITH fiadlt; ///
piiadlt WITH fiiadlt; ///
piiqc WITH fiiqc; )
mat fooestmplus=r(estimate)
mat foosemplus=r(se)
mat list fooestmplus
mat semplus = J(29,1,.)
forvalues i=1/24 {
mat semplus[`i',1] = foosemplus[`i',1]
}
mat estmplus = J(29,1,.)
forvalues i=1/24 {
mat estmplus[`i',1] = fooestmplus[`i',1]
}
mat estmplus[25,1]=`r(CFI)'
mat estmplus[26,1]=`r(RMSEA)'
mat estmplus[27,1]=`r(chisquare)'
mat estmplus[28,1]=`r(chisquare_df)'
mat estmplus[29,1]=`r(chisquare_P)'
}
Stata syntax with a diagram produced by Stata.
Please note in the diagram above that you may request no coefficients, raw coefficients, or standardized coefficients to be displayed on the picture.
sem (methslf -> piadlt@1) ///
(methslf -> piiadlt) ///
(methslf -> piiqc) ///
(methinf -> fiadlt@1) ///
(methinf -> fiiadlt) ///
(methinf -> fiiqc), ///
covstruct(_lexogenous, diagonal) ///
method(mlmv) ///
latent(methslf methinf ) ///
cov( methslf*methinf e.piadlt*e.fiadlt e.piiadlt*e.fiiadlt e.piiqc*e.fiiqc)
mat foo = r(table)
mat list foo
local test: rowfullnames foo
display "`test'"
local test: colfullnames foo
display "`test'"
estat gof, stats(all)
texdoc stlog close
capture mat drop eststata
matrix eststata = J(29,1,.)
matrix sestata = J(29,1,.)
estat gof, stats(all)
mat eststata[25,1]=`r(cfi)'
mat eststata[26,1]=`r(rmsea)'
mat eststata[27,1]=`r(chi2_ms)'
mat eststata[28,1]=`r(df_ms)'
mat eststata[29,1]=`r(p_ms)'
foreach m in 1 2 {
if `m'==1 {
local type "est"
}
if `m'==2 {
local type "se"
}
local count=0
foreach i in 1 3 5 7 9 11 {
local `++count'
matrix `type'stata[`count',1]=foo[`m',`i']
}
*mat list `type'stata
local count=10
foreach i in 2 4 6 8 10 12 {
local `++count'
matrix `type'stata[`count',1]=foo[`m',`i']
}
mat list `type'stata
matrix `type'stata[7,1] =foo[`m',24]
matrix `type'stata[8,1] =foo[`m',21]
matrix `type'stata[9,1] =foo[`m',22]
matrix `type'stata[10,1]=foo[`m',23]
matrix `type'stata[17,1]=foo[`m',19]
matrix `type'stata[18,1]=foo[`m',20]
mat list `type'stata
local count=18
foreach i in 13 14 15 16 17 18 {
local `++count'
matrix `type'stata[`count',1]=foo[`m',`i']
}
*mat list `type'stata
}
Comparison of MPLUS and Stata12 parameter estimates and standard errors from Example 2.
. mat combo = estmplus, eststata, semplus, sestata
. mat list combo
combo[29,4]
c1 c1 c1 c1
r1 1 1 0 .
r2 1.012 1.0119967 .145 .14460613
r3 -.037 -.03662236 .017 .0166771
r4 1 1 0 .
r5 1.07 1.0699935 .177 .17700328
r6 -.072 -.07214883 .021 .02116697
r7 .674 .6739953 .078 .07837412
r8 1.584 1.584216 .334 .3336187
r9 .092 .09209461 .189 .18908674
r10 .007 .00731822 .003 .00347975
r11 20.046 20.046243 .135 .13479361
r12 13.503 13.50289 .087 .08661336
r13 3.093 3.0929061 .015 .01500737
r14 19.807 19.806591 .186 .18646199
r15 13.397 13.396949 .093 .09320208
r16 3.119 3.1186013 .018 .01811102
r17 1 1 0 .
r18 1 1 0 .
r19 2.143 2.1432918 .309 .3094079
r20 .274 .27368695 .237 .23722743
r21 .038 .03762204 .004 .00410517
r22 4.913 4.9125067 .61 .61031889
r23 .335 .33484612 .322 .32197249
r24 .05 .05028367 .006 .00572992
r25 .989 .98936004 . .
r26 .043 .04266794 . .
r27 9.205 9.2046899 . .
r28 7 7 . .
r29 .2383 .23829239 . .
Columns are in this order: MPLUS estimates, Stata estimates, MPLUS standard errors, Stata standard errors.
Rows are particular parameter estimates. The last 5 rows are for the CFI, RMSEA, chi2, chi2 df, and chi2 p-value, respectively.
Example 3. Growth model.
RunMPLUS Syntax (output omitted):
use http://www.stata-press.com/data/r12/sem_lcm, clear
tex \newpage
tex {\bf Example 3. Growth model. }
tex
tex RunMPLUS Syntax (output omitted):
quietly {
* MPLUS
runmplus lncrime0 lncrime1 lncrime2 lncrime3, ///
model(i s | lncrime0@0 lncrime1@1 lncrime2@2 lncrime3@3; ///
lncrime0 - lncrime3 (a); ///
) output(!stdyx;)
mat fooestmplus=r(estimate)
mat foosemplus=r(se)
mat list fooestmplus
local test: rowfullnames fooestmplus
display "`test'"
capture mat drop semplus
mat semplus = J(26,1,.)
forvalues i=1/26 {
mat semplus[`i',1] = foosemplus[`i',1]
}
capture mat drop estmplus
mat estmplus = J(26,1,.)
forvalues i=1/21 {
mat estmplus[`i',1] = fooestmplus[`i',1]
}
mat estmplus[22,1]=`r(CFI)'
mat estmplus[23,1]=`r(RMSEA)'
mat estmplus[24,1]=`r(chisquare)'
mat estmplus[25,1]=`r(chisquare_df)'
mat estmplus[26,1]=`r(chisquare_P)'
}
Here is Stata code that also features estat framework, which returns matrices with like parameters nicely bundled. I preface SEM with quietly to ask it to use its inside voice.
quietly sem (lncrime0 <-I@1 S@0 _cons@0) ///
(lncrime1 <-I@1 S@1 _cons@0) ///
(lncrime2 <-I@1 S@2 _cons@0) ///
(lncrime3 <-I@1 S@3 _cons@0), ///
latent(I S) means(I S) ///
var(e.lncrime0@var e.lncrime1@var ///
e.lncrime2@var e.lncrime3@var)
capture mat drop eststata
estat gof, stats(all)
matrix eststata = J(26,1,.)
matrix sestata = J(26,1,.)
mat eststata[22,1]=`r(cfi)'
mat eststata[23,1]=`r(rmsea)'
mat eststata[24,1]=`r(chi2_ms)'
mat eststata[25,1]=`r(df_ms)'
mat eststata[26,1]=`r(p_ms)'
estat framework
foreach m in Gamma Phi kappa alpha Psi {
matrix `m' =r(`m')
}
local i=0
foreach c in 1 2 3 4 {
mat eststata[`++i',1]=Gamma[`c',1]
}
foreach c in 1 2 3 4 {
mat eststata[`++i',1]=Gamma[`c',2]
}
mat eststata[`++i',1]=Phi[1,2]
mat eststata[`++i',1]=kappa[1,1]
mat eststata[`++i',1]=kappa[1,2]
foreach c in 1 2 3 4 {
mat eststata[`++i',1]=alpha[1,`c']
}
mat eststata[`++i',1]=Phi[1,1]
mat eststata[`++i',1]=Phi[2,2]
foreach c in 1 2 3 4 {
mat eststata[`++i',1]=Psi[`c',`c']
}
Comparisons of MPLUS and Stata12 parameter estimates and standard errors in Example 3.
. mat combo = estmplus, eststata, semplus
. mat list combo
combo[26,3]
c1 c1 c1
r1 1 1 0
r2 1 1 0
r3 1 1 0
r4 1 1 0
r5 0 0 0
r6 1 1 0
r7 2 2 0
r8 3 3 0
r9 -.034 -.03431601 .009
r10 5.338 5.3379154 .041
r11 .143 .1426952 .01
r12 0 0 0
r13 0 0 0
r14 0 0 0
r15 0 0 0
r16 .527 .52740902 .045
r17 .02 .0196198 .003
r18 .098 .09819558 .005
r19 .098 .09819558 .005
r20 .098 .09819558 .005
r21 .098 .09819558 .005
r22 .993 .99345946 .
r23 .054 .05358317 .
r24 16.246 16.245962 .
r25 8 8 .
r26 .039 .03899193 .
Columns are in this order: MPLUS estimates, Stata estimates, MPLUS standard errors.
Rows are particular parameter estimates (which ones? please see earlier output). The last 5 rows show the CFI, RMSEA, chi2, chi2 df, and chi2 p-value, respectively.