One-way ANOVA (Module 08) is for analyzing data obtained from a study design with a single between-subjects factor with three or more levels.
There are other types of ANOVA for analyzing data from other study designs - Factorial ANOVA and repeated-measures ANOVA are two common types.
Generally, we use factorial ANOVA when our design has two or more factors, all between-subjects. For example, a 2 x 2 design has two between-subjects independent variables and each variable has two levels.
Repeated-measures ANOVA can be used for within-subjects designs. It can also be called within-subjects ANOVA.
Repeated-measures ANOVA is used to test a within-subjects design. Like one-way ANOVA (Module 08), it involves one independent variable (IV) and one dependent variable (DV); the only difference is that the IV is manipulated within-subjects.
The same measure is administered to the same participants repeatedly, across different time points or under different conditions. The factor that is repeatedly measured/observed is called a within-subjects factor.
One common use of repeated-measures ANOVA is for experimental studies with within-subjects designs. Typically, each participant goes through multiple (three or more) conditions, and the same DV is measured in each condition. The research question is usually whether the mean of the DV varies across the conditions. You will see an example below using a Stroop task (the delay in reading the ink color of a word when the word's text is incongruent with its ink color, e.g., the text "blue" printed in red ink).
Another common design in which repeated-measures ANOVA is used is the longitudinal study. In longitudinal studies, the same DV is measured at multiple (three or more) time points for each subject. Therefore, we can treat time as a within-subjects IV and use repeated-measures ANOVA to test for changes in the means over time. For example, suppose we want to investigate whether weekly training improves an athlete's performance. We set "training" as the independent variable with 3 levels: week 1, week 2, and week 3. For each subject in the sample, we collect their performance level at week 1, week 2, and week 3.
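If you ever need to run the same analysis outside jamovi, here is a minimal Python sketch of a one-way repeated-measures ANOVA using statsmodels. The data frame, column names (subject, week, performance), and values are hypothetical and only illustrate the required long format (one row per subject per time point).

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one row per subject per week
df = pd.DataFrame({
    "subject":     [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "week":        ["w1", "w2", "w3"] * 4,
    "performance": [10, 12, 15, 9, 12, 13, 11, 14, 16, 8, 9, 13],
})

# One within-subjects factor ("week") with three levels
res = AnovaRM(data=df, depvar="performance", subject="subject",
              within=["week"]).fit()
print(res)  # F, degrees of freedom, and p value for the week effect
```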
Stroop task (data file for this demonstration can be found under STROOP project on the Important files page)
In one version of the classic Stroop experiment, participants were asked to read the ink colors of some printed items. In the congruent condition, the items were color words that matched with the ink colors (e.g., RED BLUE GREEN PINK). In the incongruent condition, the items were color words that did NOT match with the ink colors with which they were printed (e.g., BLUE PINK RED GREEN). In the nontext condition, the items were not text (e.g., ⊠⥚ⷕ ▲☻∇⛃ ═⋹⠛ⶩ▄ ⴾ⫸⛥⚦). The time taken for each participant to read through the ink colors of all items printed on a sheet of paper was recorded.
Question: Did the Stroop conditions influence response time?
IV: Task (3-level): [congruent_1, incongruent_1, nontext_1]; DV: response time (in milliseconds)
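A practical note: in jamovi the repeated measures appear as separate (wide) columns such as those listed above, whereas most scripting tools expect long format (one row per participant per condition). Here is a minimal pandas sketch of that reshaping step, with made-up values and a hypothetical participant-ID column named subject:

```python
import pandas as pd

# Hypothetical wide-format data mirroring the column names above;
# in practice you would load the real STROOP file instead
wide = pd.DataFrame({
    "subject":       [1, 2, 3, 4],
    "congruent_1":   [480, 510, 495, 470],
    "incongruent_1": [660, 700, 640, 655],
    "nontext_1":     [540, 565, 530, 550],
})

# Reshape to long format: one row per participant per task condition
long = wide.melt(id_vars="subject",
                 value_vars=["congruent_1", "incongruent_1", "nontext_1"],
                 var_name="task", value_name="rt")

# The long data can then be analyzed as in the sketch above, e.g.
# AnovaRM(data=long, depvar="rt", subject="subject", within=["task"]).fit()
```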
Conclusion/ Interpretation (APA format):
A repeated-measures ANOVA showed that the main effect of task was statistically significant, F(2, 198) = 93.8, p < .001, η² = .223.
From the plot of the condition means, it seems that response time was longest for the Incongruent condition, followed by Non-text, and shortest for the Congruent condition.
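For reference, the effect sizes reported in this module follow the conventional definitions (SS denotes a sum of squares): η² = SS_effect / SS_total, and partial η² (reported in the later examples) = SS_effect / (SS_effect + SS_error).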
Question: Was the response time for the Incongruent condition longer than the other two conditions (Congruent and Non-text)?
To find out which pair(s) of means were significantly different from each other, we conduct all possible pairwise comparisons across the three conditions.
As discussed in Module 08, conducting multiple hypothesis tests at the same alpha level inflates the overall (family-wise) alpha level; that is, the overall probability of making at least one Type I error across all the tests increases. In other words, the more tests you conduct, the higher the chance of rejecting a null hypothesis that is actually true. For example, with three comparisons each tested at α = .05, the chance of at least one false rejection is about 1 − (1 − .05)³ ≈ .14 if the tests are independent.
Theoretically, we should lower the alpha level for each of the multiple comparisons in order to keep the family-wise alpha at the desired level. An equivalent and more practical approach, however, is to increase (adjust) the p values by the corresponding amount and compare the adjusted p values with the originally chosen alpha level.
There are several ways to correct the p value. Two commonly-used ones are:
Bonferroni
the most conservative method for multiple-comparison correction
suitable when variances are less equal across conditions
suitable when the number of comparisons is large
Tukey
less conservative than Bonferroni
suitable when variances are more equal across conditions
suitable when the number of comparisons is relatively small
some researchers argue that the Tukey correction is not appropriate for within-subjects designs
Holm
It is like Bonferroni, but the degree of correction depends on the rank order of each pairwise comparison's p value.
Specifically, the largest p value is not corrected at all (multiplied by 1), the second-largest is multiplied by 2, the third-largest by 3, and so on. The smallest (i.e., the m-th largest if there are m pairwise comparisons) is multiplied by m (i.e., the same as Bonferroni).
[Note: The problem of alpha inflation is present regardless of whether you are using a between- or within-subjects design. Therefore, correction for multiple comparisons is needed for both independent-samples and repeated-measures ANOVA, as long as you are conducting post-hoc analyses.]
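To make the adjustment concrete, here is a minimal Python sketch using statsmodels' multipletests function; the three uncorrected p values are made up purely for illustration.

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical uncorrected p values from three pairwise comparisons
raw_p = [0.004, 0.020, 0.045]

for method in ("bonferroni", "holm"):
    reject, p_adj, _, _ = multipletests(raw_p, alpha=0.05, method=method)
    print(method, p_adj)

# Bonferroni multiplies every p value by 3 (the number of comparisons);
# Holm multiplies the smallest by 3, the next by 2, and the largest by 1,
# capping the results at 1 and keeping them in order.
```

The adjusted p values are then compared against the original alpha level (.05), exactly as described above.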
In this example, we use Bonferroni post-hoc tests for the multiple comparisons. You can see the details in the table above.
Based on the results from jamovi, we can conclude that response time for the Incongruent condition was significantly longer than those for the Congruent and Non-text conditions.
Conclusion/ Interpretation (APA format):
A Bonferroni post-hoc analysis indicated that the response time for the Incongruent condition (M = 676.13, SD = 165.82) was significantly longer than that for the Congruent condition (M = 492.47, SD = 130.07), t(198) = 13.36, p < .001, and that for the Non-text condition (M = 548.25, SD = 133.93), t(198) = 9.30, p < .001. The response time for the Non-text condition was in turn significantly longer than that for the Congruent condition, t(198) = 4.06, p < .001. In other words, the response times for the Incongruent and Congruent conditions were, respectively, the longest and the shortest among the three conditions.
Let's use the Stroop experiment example to illustrate how to use a two-way repeated-measures ANOVA. We now have two factors, "Task" and "Session". "Task" refers to the congruent and incongruent tasks in the Stroop experiment, so "Task" has two levels: congruent and incongruent. "Session" refers to the time point at which the Stroop task was performed; the data were collected in two sessions, so "Session" also has two levels: first and second. Together these give a two-way, within-subjects design, and the dependent variable is still response time.
In short, here is the 2x2 factorial design:
Task (2-level): [Congruent, Incongruent]
Session (2-level): [First, Second]
Q: Do Task and Session have an effect on response time? (α = .05)
A: We used a two-way repeated-measures ANOVA to examine this question.
Step 1: Perform the statistical analysis in jamovi (please view the demonstration video in full-screen mode).
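For readers working outside jamovi, the two-way model only requires listing both within-subjects factors. Here is a minimal statsmodels sketch with a hypothetical long-format data frame (columns subject, task, session, rt) and made-up values:

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one row per subject per Task x Session cell
df = pd.DataFrame({
    "subject": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "task":    ["congruent", "congruent", "incongruent", "incongruent"] * 3,
    "session": ["first", "second"] * 6,
    "rt":      [470, 455, 650, 610, 505, 480, 700, 640, 490, 470, 660, 630],
})

# Two within-subjects factors: the output reports both main effects
# and the Task x Session interaction
res = AnovaRM(data=df, depvar="rt", subject="subject",
              within=["task", "session"]).fit()
print(res)
```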
Similarly, we can apply Bonferroni post-hoc tests for the multiple comparisons (not demonstrated in the example video) by checking the "Bonferroni" correction option under Post Hoc Tests.
Based on the results from jamovi, we can draw the following conclusion.
Conclusion/ Interpretation (APA format):
A repeated-measures ANOVA showed that:
the two-way interaction effect between Session and Task was statistically significant, F(1, 23) = 5.193, p = .032, partial η² = 0.184.
the main effect of Session was statistically significant, F(1, 23) = 7.857, p = .010, partial η² = 0.255.
the main effect of Task was statistically significant, F(1, 23) = 43.568, p < .001, partial η² = 0.654.
(and report the post hoc or planned contrast results according to your hypothesis)
A mixed factorial design has at least two factors, in which at least one is between-subjects and at least one is within-subjects.
In jamovi, the ANOVA for mixed factorial designs is conducted under "repeated-measures ANOVA".
Based on Section 2 above, we now add a new between-subjects factor, Gender, to the model. This gives a 2x2x2 (Session x Task x Gender) mixed factorial design:
Two within-subjects factors
Session (2-level): [First, Second]
Task (2-level): [Congruent, Incongruent]
One between-subjects factor
Gender (2-level): [Female, Male]
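Outside jamovi, mixed designs require a routine that accepts both within- and between-subjects factors; statsmodels' AnovaRM does not, but the pingouin package (if installed) does. Its mixed_anova function handles one within- and one between-subjects factor at a time, so the sketch below is a reduced Task x Gender illustration rather than the full 2x2x2 model, with hypothetical data and column names.

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format data: one row per subject per Task condition,
# with Gender as a between-subjects grouping column
df = pd.DataFrame({
    "subject": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "gender":  ["F"] * 6 + ["M"] * 6,
    "task":    ["congruent", "incongruent"] * 6,
    "rt":      [480, 640, 500, 690, 470, 650, 510, 700, 495, 660, 520, 710],
})

# Mixed ANOVA: "task" is within-subjects, "gender" is between-subjects
aov = pg.mixed_anova(data=df, dv="rt", within="task",
                     subject="subject", between="gender")
print(aov)  # main effects of task and gender, plus their interaction
```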
The three-way interaction effect between Session, Task, and Gender was not significant, F(1, 98) = 2.28, p = .135, partial η² = 0.02.
Neither the two-way interaction between Session and Gender, F(1, 98) = 0.03, p = .87, nor that between Task and Gender, F(1, 98) = 1.36, p = .25, was significant. The two-way interaction between Session and Task was significant, F(1, 98) = 10.81, p < .001, partial η² = 0.10.
The main effects of both within-subjects factors were significant: Session, F(1, 98) = 41.04, p < .001, partial η² = .30; Task, F(1, 98) = 220.33, p < .001, partial η² = .70.
The main effect of Gender was not significant, F(1, 98) = 2.25, p = .14.
You can conduct post-hoc analyses to further characterize the significant interaction effect(s) or main effect(s) [especially those with 3 or more levels], or run planned comparisons according to your original hypotheses. Results from these comparisons are usually reported after the standard reporting of the main and interaction effects shown above.
Now, if you think you're ready for the exercise, you can check your email for the link.
Remember to submit your answers before the deadline in order to earn the credits!