Recent site activity

Stata 12 SEM package

22 Aug 2011
Stata 12's SEM option

Stata 12 can do structural equation modeling (SEM). One can reference factor loadings, means and intercepts, and residual variances and variances of both observed and latent variables. Parameters can be freed, constrained, etc. This report provides examples of a CFA, multitrait multimethod matrix, and a latent growth model. Example 1 illustrates some generic Stata SEM syntax useful to most projects. Results from Stata's SEM package are compared with those from MPLUS in Examples 2-3. The intent here is to explore the building blocks and not to put the building blocks together in unique combinations. That is left to the investigator.

Specific capabilities in Stata 12

--Constrain paths, coefficients.
--Get all the regular fit statistics.
--Request stdyx results.
--SEM works seamlessly with lincom, test, predict (to get factor scores), and other built-in Stata functions we know and love.
--Get modification indices (cutoff of your choosing, you can even select types of modification indices you wish to see)
--Request LISREL-style output with estat framework, which comes with friendlier return matrices but does not provide standard errors for those parameters. Example 3 uses this to pull values from matrices (Example 2 does it using normally returned matrices). A limitation is that standard errors of estimates are not returned with estat framework.
--Read data from a covariance/correlation matrix.
--Like its regression, Stata's default way of handling missing data in SEM is listwise complete deletion, but there is an extra option method(mlmv) that asks for MAR estimates.
--Estimate a latent growth model, multiple-group model, and other fancy things.
--Draw a diagram of your model, and even use the diagram to specify and run your model. Recall sweet memories of long nights with Amos.

Limitations in Stata 12 compared to MPLUS

--You can estimate a model by drawing a path diagram, and it'll put estimates on your picture for you. But I am not sure you can produce a path diagram using model syntax. That would be too fantastic.
--Stata drops any rows with missing data by default. They have options for assuming data are MAR, but it is different than MPLUS' MAR estimator and incompatible with certain other post-command options in Stata12.
--No mention of categorical dependent variables in the user's guide, or any mention of thresholds. If you live by categorical outcomes, stick to MPLUS.
--No mention of categorical latent variables.

In the following examples, I show syntax and emphasize comparisons of parameter estimates and standard errors between MPLUS and Stata12. Emphasis is not on substantive interpretation. MPLUS is run through Stata using Dr. Jones' RunMPLUS.ado, which is freely available with findit runmplus.

Example 1. CFA.

This example presents a single factor CFA. Here, we demonstrate some of the
cool features available in Stata 12 and benefits of the package being integrated with existing
features in Stata. We use baseline data from the ACTIVE study. Below is a basic diagram of the model:


RunMPLUS syntax (output omitted):

runmplus immrawb1 hvlttb1 avlttb1 ufov1b1 ws_corb1 ls_corb1 lt_corb1, ///

   model(f BY immrawb1* hvlttb1 avlttb1 ufov1b1 ws_corb1 ls_corb1 lt_corb1; f@1; )

Stata syntax :
 sem (f -> immrawb1 hvlttb1 avlttb1 ufov1b1 ws_corb1 ls_corb1 lt_corb1) ///
   , latent(f) var(
f@1) // method(mlmv)
estat gof, stats(all) // model fit.
capture drop fscore1
Commonly used options.
Predicting a factor score is as easy as:
predict fscore1, latent(f)
You can use method(mlmv)  to account for missing data under MAR conditions, but this method strongly relies on joint normality of variables. Other options in the Stata manual you can read about are ML, QML, ADF, and MLMV.
The post-estimation option estat gof, stats(all) provides conventional fit statistics that makes the world go round.

If you want stdyx estimates, add the option standardized to the SEM command after the comma.

If you ever want to see syntax for how to refer to a given parameter, like a covariance or error or path loading, add coeflegend to the SEM command after the comma.

If you want to estimate a model in a multiple groups framework, for example by sex or intervention group, add group(groupvarname)  to the SEM command after the comma.

Tip: To make sure Stata knows what is a latent variable in your model, make it explicit using the latent(lvname) command after the comma. But that is not required. By default, Stata recognizes any variable that starts with an uppercase letter as a latent variable. If you use a variable in SEM that starts with an uppercase letter, well that might be like crossing streams. We all know about crossing the streams.


To estimate modification indices, this command. It is possible to request only certain sets of modification indices, for example covariances of residual errors or factor loadings, and to request them at particular levels of significance.
estat mindices, min(6) 
It is really easy to request residuals, although this post-estimation command is incompatible with method(mlmv) .
estat residuals, normalized
Request Lisrel-type output, which is closer to what MPLUS users are used to seeing. In addition to the nicer display, you get nicer matrices, which are demonstrated further in Example 3.
 estat framework, standardized
If you have variables measured with error and want to incorporate their reliability in the model, add reliability() after the comma in the sem comma. Syntax is below using a freely available dataset which you will see again in Example 3.
use http://www.stata-press.com/data/r12/sem_lcm
local rel="0.6"
sem (lncrime0 <
-I@1 S@0 _cons@0) ///
   (lncrime1 <
-I@1 S@1 _cons@0) ///
   (lncrime2 <
-I@1 S@2 _cons@0) ///
   (lncrime3 <
-I@1 S@3 _cons@0), ///
   latent(I S) means(I S) ///
e.lncrime0@var e.lncrime1@var ///
e.lncrime2@var e.lncrime3@var) ///
    standardized reliability(lncrime0 `rel' lncrime1 `rel' lncrime2 `rel' lncrime3 `rel' )
Example 2. Multi-trait multimethod matrix.

Data. Three functional measures (IADL, ADL, IQCODE) were measured for each individual using both self and proxy reports. Reporting type are the methods factors; traits are handled with correlations because there are only 2 indicators for each trait.

RunMPLUS syntax (output omitted):
keep if inlist(study_status, 700)
tex {\bf RunMPLUS syntax (output omitted):}
quietly {
   local methslf "piadlt piiadlt piiqc"
   local methinf "fiadlt fiiadlt fiiqc"
   runmplus studyid `methslf' `methinf', ///
      variable( idvariable = studyid; ) ///
      analysis(estimator=WLSMV; ) ///
      model(methslf BY `methslf' *; ///
            methinf BY `methinf' *; ///
methslf@1; methinf@1; ///
         methslf WITH methinf; ///
         piadlt WITH fiadlt; ///
         piiadlt WITH fiiadlt; ///
         piiqc WITH fiiqc; )
   mat fooestmplus=r(estimate)
   mat foosemplus=r(se)
   mat list fooestmplus
   mat semplus  = J(29,1,.)
   forvalues i=1/24 {
      mat semplus[`i',1] = foosemplus[`i',1]
   mat estmplus = J(29,1,.)
   forvalues i=1/24 {
      mat estmplus[`i',1] = fooestmplus[`i',1]
   mat estmplus[25,1]=`r(CFI)'
   mat estmplus[26,1]=`r(RMSEA)'
   mat estmplus[27,1]=`r(chisquare)'
   mat estmplus[28,1]=`r(chisquare_df)'
   mat estmplus[29,1]=`r(chisquare_P)'
Stata syntax with a diagram produced by Stata.

Please note in the diagram above that you may request no coefficients, raw coefficients, or standardized coefficients to be displayed on the picture.
sem (methslf -> piadlt@1) ///
   (methslf -> piiadlt) ///
   (methslf -> piiqc) ///
   (methinf ->
fiadlt@1) ///
   (methinf -> fiiadlt) ///
   (methinf -> fiiqc), ///
   covstruct(_lexogenous, diagonal) ///
   method(mlmv) ///
   latent(methslf methinf ) ///
methslf@1  methinf@1) ///
   cov( methslf*methinf e.piadlt*e.fiadlt e.piiadlt*e.fiiadlt e.piiqc*e.fiiqc)
mat foo = r(table)
mat list foo
local test: rowfullnames foo
display "`test'"
local test: colfullnames foo
display "`test'"
estat gof, stats(all)
texdoc stlog close
capture mat drop eststata
matrix eststata = J(29,1,.)
matrix sestata = J(29,1,.)
estat gof, stats(all)
mat eststata[25,1]=`r(cfi)'
mat eststata[26,1]=`r(rmsea)'
mat eststata[27,1]=`r(chi2_ms)'
mat eststata[28,1]=`r(df_ms)'
mat eststata[29,1]=`r(p_ms)'
foreach m in 1 2 {
   if `m'==1 {
      local type "est"
   if `m'==2 {
      local type "se"
   local count=0
   foreach i in 1 3 5 7 9 11 {
      local `++count'
      matrix `type'stata[`count',1]=foo[`m',`i']
   *mat list `type'stata
   local count=10
   foreach i in 2 4 6 8 10 12 {
      local `++count'
      matrix `type'stata[`count',1]=foo[`m',`i']
   mat list `type'stata
   matrix `type'stata[7,1] =foo[`m',24]
   matrix `type'stata[8,1] =foo[`m',21]
   matrix `type'stata[9,1] =foo[`m',22]
   matrix `type'stata[10,1]=foo[`m',23]
   matrix `type'stata[17,1]=foo[`m',19]
   matrix `type'stata[18,1]=foo[`m',20]
   mat list `type'stata
   local count=18
   foreach i in 13 14 15 16 17 18 {
      local `++count'
      matrix `type'stata[`count',1]=foo[`m',`i']
   *mat list `type'stata
Comparison of MPLUS and Stata12 parameter estimates and standard errors from Example 2.

. mat combo = estmplus, eststata, semplus, sestata

. mat list combo


c1 c1 c1 c1

r1 1 1 0 .

r2 1.012 1.0119967 .145 .14460613

r3 -.037 -.03662236 .017 .0166771

r4 1 1 0 .

r5 1.07 1.0699935 .177 .17700328

r6 -.072 -.07214883 .021 .02116697

r7 .674 .6739953 .078 .07837412

r8 1.584 1.584216 .334 .3336187

r9 .092 .09209461 .189 .18908674

r10 .007 .00731822 .003 .00347975

r11 20.046 20.046243 .135 .13479361

r12 13.503 13.50289 .087 .08661336

r13 3.093 3.0929061 .015 .01500737

r14 19.807 19.806591 .186 .18646199

r15 13.397 13.396949 .093 .09320208

r16 3.119 3.1186013 .018 .01811102

r17 1 1 0 .

r18 1 1 0 .

r19 2.143 2.1432918 .309 .3094079

r20 .274 .27368695 .237 .23722743

r21 .038 .03762204 .004 .00410517

r22 4.913 4.9125067 .61 .61031889

r23 .335 .33484612 .322 .32197249

r24 .05 .05028367 .006 .00572992

r25 .989 .98936004 . .

r26 .043 .04266794 . .

r27 9.205 9.2046899 . .

r28 7 7 . .

r29 .2383 .23829239 . .

Columns are in this order: MPLUS estimates, Stata estimates, MPLUS standard errors, Stata standard errors.
Rows are particular parameter estimates. The last 5 rows are for the CFI, RMSEA, chi2, chi2 df, and chi2 p-value, respectively.
Example 3. Growth model.
RunMPLUS Syntax (output omitted):

use http://www.stata-press.com/data/r12/sem_lcm, clear
tex \newpage
tex {\bf Example 3. Growth model. }
tex RunMPLUS Syntax (output omitted):
quietly {
   * MPLUS
   runmplus lncrime0 lncrime1 lncrime2 lncrime3, ///
      model(i s |
lncrime0@0 lncrime1@1 lncrime2@2 lncrime3@3; ///
      lncrime0 - lncrime3 (a); ///
      ) output(!stdyx;)
   mat fooestmplus=r(estimate)
   mat foosemplus=r(se)
   mat list fooestmplus
   local test: rowfullnames fooestmplus
   display "`test'"
   capture mat drop semplus
   mat semplus  = J(26,1,.)
   forvalues i=1/26 {
      mat semplus[`i',1] = foosemplus[`i',1]
   capture mat drop estmplus
   mat estmplus = J(26,1,.)
   forvalues i=1/21 {
      mat estmplus[`i',1] = fooestmplus[`i',1]
   mat estmplus[22,1]=`r(CFI)'
   mat estmplus[23,1]=`r(RMSEA)'
   mat estmplus[24,1]=`r(chisquare)'
   mat estmplus[25,1]=`r(chisquare_df)'
   mat estmplus[26,1]=`r(chisquare_P)'


Here is Stata code that also features estat framework, which returns matrices with like parameters nicely bundled. I preface SEM with quietly to ask it to use its inside voice.
quietly sem (lncrime0 <-I@1 S@0 _cons@0) ///
   (lncrime1 <
-I@1 S@1 _cons@0) ///
   (lncrime2 <
-I@1 S@2 _cons@0) ///
   (lncrime3 <
-I@1 S@3 _cons@0), ///
   latent(I S) means(I S) ///
e.lncrime0@var e.lncrime1@var ///
e.lncrime2@var e.lncrime3@var)
capture mat drop eststata
estat gof, stats(all)
matrix eststata = J(26,1,.)
matrix sestata = J(26,1,.)
mat eststata[22,1]=`r(cfi)'
mat eststata[23,1]=`r(rmsea)'
mat eststata[24,1]=`r(chi2_ms)'
mat eststata[25,1]=`r(df_ms)'
mat eststata[26,1]=`r(p_ms)'
estat framework
foreach m in Gamma Phi kappa alpha Psi {
   matrix `m' =r(`m')
local i=0
foreach c in 1 2 3 4 {
   mat eststata[`++i',1]=Gamma[`c',1]
foreach c in 1 2 3 4 {
   mat eststata[`++i',1]=Gamma[`c',2]
mat eststata[`++i',1]=Phi[1,2]
mat eststata[`++i',1]=kappa[1,1]
mat eststata[`++i',1]=kappa[1,2]
foreach c in 1 2 3 4 {
   mat eststata[`++i',1]=alpha[1,`c']
mat eststata[`++i',1]=Phi[1,1]
mat eststata[`++i',1]=Phi[2,2]
foreach c in 1 2 3 4 {
   mat eststata[`++i',1]=Psi[`c',`c']
Comparisons of MPLUS and Stata12 parameter estimates and standard errors in Example 3.

. mat combo = estmplus, eststata, semplus

. mat list combo


c1 c1 c1

r1 1 1 0

r2 1 1 0

r3 1 1 0

r4 1 1 0

r5 0 0 0

r6 1 1 0

r7 2 2 0

r8 3 3 0

r9 -.034 -.03431601 .009

r10 5.338 5.3379154 .041

r11 .143 .1426952 .01

r12 0 0 0

r13 0 0 0

r14 0 0 0

r15 0 0 0

r16 .527 .52740902 .045

r17 .02 .0196198 .003

r18 .098 .09819558 .005

r19 .098 .09819558 .005

r20 .098 .09819558 .005

r21 .098 .09819558 .005

r22 .993 .99345946 .

r23 .054 .05358317 .

r24 16.246 16.245962 .

r25 8 8 .

r26 .039 .03899193 .

Columns are in this order: MPLUS estimates, Stata estimates, MPLUS standard errors.

Rows are particular parameter estimates (which ones? please see earlier output). The last 5 rows show the CFI, RMSEA, chi2, chi2 df, and chi2 p-value, respectively.