mixed.model.fnc

Copy, Paste and Adapts

model1= 'rt ~ cova1*facA*facB + (facA+facB | subject) + (1 | item)'

mod.alt= 're ~ cova1*facA*facB + (1 | subject) + (1 | item)'

mixed.model.fnc(stacked.data, model=model1, alternative=mod.alt,

poshoc='facA:facB')

mod.hit='hit ~ facA*facB + (facA+facB | subject) + (1 | item)'

mixed.model.fnc(stacked.data, model=mod.hit, poshoc='facA:facB',

family='logistic')

Objetive

It performs the estimation of mixed effects models (fixed and random effects) Mix model and Mix Logit model (for binomial variables such as success-error, etc.).

The Mix-Model model assumes the estimation of 2 types of effects: fixed and random effects. In the context of the paradigm of presenting a sample of items to a sample of subjects it means assuming that together with the effects expected from experimental manipulation there is a source of variance located in the random variation of the subjects and another belonging to the items. In addition, the model assumes the crossed existence of both sources of variance (the subjects that fluctuate randomly with respect to a common mean and the items that also do so but obviously in general the subjects pass through all the items). The mix-models substantially improve the power loss that occurs in the analysis of the minF to achieve the generalization of the fixed effects found simultaneously to the subjects and items).

mixed.model.fnc

As an example we will use the already known one used in the estimation of an Anova F1, F2 and minF model. Go to that page if you need to understand the nature of the data and the design.

Using the model argument, the user must explicitly include the model he wishes to estimate.

In the following example we request 2 models: 1 with only random intercept for subjects and items and one second to which we add random slope in subjects for all purposes (A, B and AxB). Since these factors are item intergroups we will only include random intercept in items.

We will also estimate the contrasts of simple effects of the interaction of both factors including the posthoc argument.

model1= 'vd ~ mrA*mrB + (1|subject) + (1|item)'

model2= 'vd ~ mrA*mrB + (mrA*mrB|subject) + (1|item)'

res1=mixed.model.fnc(stacked.data, model=model1, poshoc='mrA:mrB')

res2=mixed.model.fnc(stacked.data, model=model2, poshoc='mrA:mrB', graphic=T)

The user can also include a more parsimonious alternative model using the alternative argument. The function will compare both models and generate an anova table of the comparison.

Next, a model is proposed that only has as a second level random part the subject and item intercept (1 to the left of the vertical bar of the random factor).

mixed.model.fnc(stacked.data, model=model2, alternative=model1,

poshoc='mrA:mrB')

If you look at the last plot of the first figure, you can see that the residuals do not have the normal distribution expected by the model. In this case we should re-estimate the model in two phases: in the first phase we will include the boxcox=T argument to estimate the appropriate power (or logarithmic) parameter to transform the criterion variable and improve or solve the problem of incorrect distribution of ressiduals. Later we must create the new variable from the estimated power and redefine the model replacing the dependent variable with the newly one created.

res2=mixed.model.fnc(stacked.data, model=model2, boxcox=T)

We extract from the res2 list only the boxcox section of the output:

res2$boxcox

bcPower Transformation to Normality

Est.Power Std.Err. Wald Lower Bound Wald Upper Bound

vd -0.9482 0.0562 -1.0585 -0.838

Likelihood ratio tests about transformation parameters

LRT df pval

LR test, lambda = (0) 292.6936 1 0

LR test, lambda = (1) 1271.2201 1 0

As can be seen, we reject the null hypothesis of the need for a logarithmic transformation (lambda 0) such as the one that the appropriate exponent for the data is the constant 1 (lambda 1). So we must apply the transformation of the vd using the boxcox procedure using the estimated power constant -0.94822: vd.p=(vd^-0.9482 -1)/-0.9482

In relation to the two models previously estimated, we can compare them and evaluate whether or not a random slope is necessary for both factors per subject.

models.comparation.fnc(mod1=res1$estimated, res2$estimated)

#------------------------------------------------------------------

# MIXED MODELS COMPARATION MIXED

#------------------------------------------------------------------

refitting model(s) with ML (instead of REML)

Data: dat.st

Models:

object: vd ~ mrA * mrB + (1 | sujeto) + (1 | item)

..1: vd ~ mrA * mrB + (mrA * mrB | sujeto) + (1 | item)

Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)

object 7 58211 58256 -29098 58197

..1 16 58206 58309 -29087 58174 23.425 9 0.005309 **

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We can see that the second model with random intercept and slope for both factors, generates the lowest deviance, AIC and BIC so that despite consuming 9 more parameters and therefore less parsimonious we must select that model against the random intercept model as the only random effect to consider.

If you want to carry out the model estimation on the errors, you must include the argument family='logistic'. Of course you will have to modify the model with the name of the variable with the errors and successes (0,1). Suppose your variable is called hit the model would be:

model= 'hit ~ mrA*mrB + (mrA|subject) + (mrA|item)'

mixed.model.fnc(stacked.data, model=model, poshoc='mrA:mrB',

family='logistic')

Obviously, this last model of example proposed cannot be estimated since we do not have this variable of successes/errors in data base stacked.data.

res.mixed2.txt
res.mixed1.txt

MIX MODEL WITH NON EXPERIMENTAL OR CORRELACIONAL DATA

Starting from the known databases of students and schools from Brik and Raudenbush (1992) (Extracted from page appendix J. Fox R companion to Applied regression) The database contains the values students of 7165 children in 160 different schools . The database contains schools moreover six variables of each of the 160 schools. We must, therefore, merge the two databases, which we do with the merge.data.fnc function.

schools= read.file.fnc('herarquical.shools.Rdata')

students= read.file.fnc('herarquical.students.Rdata')

head(students)

School Minority Sex SES MathAch MEANSES

1 1224 No Female -1.528 5.876 -0.428

2 1224 No Female -0.588 19.708 -0.428

3 1224 No Male -0.528 20.349 -0.428

4 1224 No Male -0.668 8.781 -0.428

5 1224 No Male -0.158 17.898 -0.428

6 1224 No Male 0.022 4.583 -0.428

head(shools)

School Size Sector PRACAD DISCLIM HIMINTY

1224 1224 842 Public 0.35 1.597 0

1288 1288 1855 Public 0.27 0.174 0

1296 1296 1719 Public 0.32 -0.137 1

1308 1308 716 Catholic 0.96 -0.622 0

1317 1317 455 Catholic 0.95 -1.694 1

1358 1358 1430 Public 0.25 1.535 0

We combine both databases with School as pairing variable: (pairing.var='School')

students=merge.data.fnc(students,schools, add.var=T,

pairing.var='School')

#------------------------------------------------------------------

# MERGE FILES. ADD VARIABLES

#------------------------------------------------------------------

*** WARNING. In first database your register unit: School occupies more than one row.

*** Dimensions of input and output ***

dim.datos1 dim.datos2 dim.merged

rows 7185 160 7185

columns 6 6 11

One issue of particular relevance has to do with the nature of the variable that defines the second hierarchical level (School in this example). It is especially important, when applying exploratory graphics that this variable been categorical or factor. This will succeed easily transforming this variable with as.factor( )

students$School= as.factor(students$School)

We want to model the mathematics performance of students (MathAch) from variables of students and schools. We have two types of schools gathered in the sector (public and Catholic) variable and we want first assess the importance of socioeconomic status (SES) on mathematics performance for both types of schools.

First we will assess an assumption of extraordinary importance in linear models, especially in mixed ones. We discussed the necessary linear relationship that should exist between the criterion and predictors variables. To do, we will ask for one panel plot that allows us to see that relationship for both types of schools (including Sector variable in by.panel argument).

panelplot.fnc(students, dv='MathAch',

which.factor='SES',

by.panel='Sector',

regression=T)

We see that the regression slope is more pronounced in public sector schools than in Catholics. It is time to ask whether this slope may have a random character through schools (second-level), or what is the same whether the effect of socioeconomic status on performance in math is different in each school. Since we have a lot of schools, it makes no sense to request a graphic panel generates many subplots as schools there. It is highly desirable to extract a random sample of 20 schools of both types of Sector variable. Use for this purpose the random.sample.fnc function.

my.sample = random.sample.fnc(students, which.factor='Sector',

n=20, ID='School')

#------------------------------------------------------------------

# EXTRACCION DE MUESTRA ALEATORIA

#------------------------------------------------------------------

*** Se ha extraido una muestra de 20 registros por cada nivel de Sector ***

For convenience, then create a list where public and Catholic schools will separate using split.by.factor.fnc.

by.sector= split.by.factor.fnc(my.sample,

which.factor='Sector')

#------------------------------------------------------------------

# DIVIDE LA BASE DE DATOS POR FACTOR

#------------------------------------------------------------------

*** Se ha creado una lista con 2 elementos, correspondientes a los niveles del factor Sector

*** Estos son los elementos de esa lista: Public Catholic

*** Si deseas acceder a uno de ellos indicalo mediante: nombre.de.la.lista$que.lista

*** donde que.lista se refiere al elemento que deseas extraer.

We can now apply for two graphics panel (one for each type of school) where we can evaluate the regression slope of mathematics performance of students with the socioeconomic status thereof. Unlike the previous graph the argument by.panel now has School variable as value. Let it will subgraphs or panels 20 (one for school).

panelplot.fnc(by.sector$Public, dv='MathAch',

which.factor='SES',

by.panel='School',

regression=T,

title='Publicos')

panelplot.fnc(by.sector$Catholic,dv='MathAch',

which.factor='SES',

by.panel='School',

regression=T,

title='Catholic')

As we noted above, it appears that the regression slope is greater in public schools than the (Catholic) private. That seems that in the first, performance in mathematics is more linked to socioeconomic status of students than in the second.

Then we will raise a multilevel hierarchical model where the first hierarchical level would be students and second schools to which these students attend.

First we centering socieconomic level SES variable. Remember that in any regression model the intercept is the value that adopts the criterion variable and the predictor when x is 0. We must, therefore, make the regression model to estimate through zero introducing the variable socioeconomic centered (difference scores). The centering can be performed by the global average (high average) or the mean of each school.

Group mean centering

students= centering.variable.fnc(students, variable='SES', which.factor='School')

Grand mean centering

students= centering.variable.fnc(students, variable='SES')

We will use the grand mean centered because the intercept can then be interpreted as the performance value adjusted by the global average.

#------------------------------------------------------------------

# CENTRADO DE VARIABLE

#------------------------------------------------------------------

*** Se ha centrado con exito la variable SES .El centrado

*** se ha realizado sobre la gran media. Si deseas un centrado

*** en cada nivel de un determinado factor, incluye el argumento

*** que.factor. Ej: que.factor='School'

To estimate the model, we will use again mixed.model.fnc function with model argument

model1='MathAch ~ c.SES + (c.SES | School)'

mixed.model.fnc(students, model=model1)

Based on student data, we estimate a multilevel hierarchical model with MathAch variable as the dependent variable and independent variable as c.SES. This variable is assumed with a random slope in the second hierarchical level (a different slope per school).

An important assumption of multilevel models has to do with the normal distribution of random factors. In this case, we can see in the QQplot of the estimated random values for the intercept and slope of the regression are within the confidence interval for this normality.

Logistic Mix Model

In our experiment, we have accuracy variable that measure if that item was hit or error. We can modelize this variable through a logit mixed model including family argument with 'logistic' as the value.

mixed.model.fnc(dat.cl, dv='accuracy', within.factor=wf,

family='logistic')

#------------------------------------------------------------------

# LOGIT MIXED MODEL

#------------------------------------------------------------------

$proposed.model

[1] "accuracy ~ mrA + mrB + mrA:mrB + ( mrA+mrB |subject ) + (1 | item)"

$Result

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod

]

Family: binomial ( logit )

Formula: accuracy ~ mrA + mrB + mrA:mrB + (mrA + mrB | subject) + (1 | item)

Data: dat.st

AIC BIC logLik deviance df.resid

2632.2 2704.2 -1305.1 2610.2 5149

Scaled residuals:

Min 1Q Median 3Q Max

-4.1550 0.2493 0.2638 0.2822 0.3540

Random effects:

Groups Name Variance Std.Dev. Corr

item (Intercept) 0.1030702 0.32105

subject (Intercept) 0.0000000 0.00000

mrAhig 0.0201767 0.14204 NaN

mrBlong 0.0002556 0.01599 NaN -1.00

Number of obs: 5160, groups: item, 172; subject, 30

Fixed effects:

Estimate Std. Error z value Pr(>|z|)

(Intercept) 2.59734 0.12149 21.378 <0.0000000000000002 ***

mrAhig 0.08114 0.17356 0.468 0.640

mrBlong 0.17823 0.17345 1.028 0.304

mrAhig:mrBlong -0.35099 0.24169 -1.452 0.146

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:

(Intr) mrAhig mrBlng

mrAhig -0.665

mrBlong -0.664 0.466

mrAhg:mrBln 0.477 -0.695 -0.718

$ExpB

Exp(B) z value Pr(>|z|)

(Intercept) 13.428028 21.378357 0.000000

mrAhig 1.084528 0.467530 0.640120

mrBlong 1.195096 1.027535 0.304169

mrAhig:mrBlong 0.703990 -1.452250 0.146432

$Anova

Analysis of Deviance Table (Type II Wald chisquare tests)

Response: accuracy

Chisq Df Pr(>Chisq)

mrA 0.5691 1 0.4506

mrB 0.0005 1 0.9822

mrA:mrB 2.1090 1 0.1464

GoUp