Research Question 2
What predicts how long a student will spend working on an individual test item?
Background:
Prior research has mainly looked at predictors of how long a student takes to complete an entire test. That research is useful, and we have done some of it ourselves. But Project ETUDE's process data allowed us to see what predicted students' use of time on individual test items. This is a real advance, because how long a student takes to complete a test depends on the particular items it contains, and most tests mix item types with different amounts of reading, different levels of difficulty, and so on. Understanding time use at the level of individual items allows us to offer test makers and test users more tailored guidance on time limits and extended time accommodations.
Methods:
In this project, we looked at three groups of predictors of item completion time:
Student-level predictors (features of a student that stay the same across the whole test)
(a) disability category
(b) race/ethnicity
(c) gender
(d) socioeconomic status, measured by whether the student qualified for free/reduced lunch
(e) English Learner status - whether the student was classified as needing services for students still becoming fluent in English
(f) NAEP proficiency level - the student's level of math proficiency, measured on the basis of all of the NAEP items they saw (this was only available for 8th graders)
(g) the student's self-reported beliefs about how important it was to do well on the test
(h) the student's self-reported level of effort put forth on the test
(i) the student's self-reported level of motivation to learn math/show high performance in math
Item-level predictors (features of a test item that will be the same for all students)
(a) item difficulty - each item was rated as easy, medium, or difficult by experts
(b) the word count for the text in the item (whether the item was long or short)
(c) the number of text-to-speech boxes that the item contained
(d) item location (how early or late a particular item appeared within the block of 15 items)
Student-item interaction predictors (features specific to a particular student's work on a particular item)
(a) whether a student used text-to-speech features on a particular item
(b) whether a student used the Zoom feature to enlarge text on a particular item
(c) whether a student used digital scratch paper ("Scratchwork") on a particular item
(d) whether a student used any other accessibility features
We used a statistical analysis that allowed us to see whether each predictor mattered while controlling for all of the other predictors. For instance, we could see whether gender predicted item completion time among students of the same ethnicity, taking items of the same difficulty level, and using the same accessibility features. This allowed us to isolate the possible effect of each predictor and to be confident that any association we found was not driven by the other predictors in the model. For example, if Asian students spent more time on items, this could be because Asian students were more likely to be English learners; our analyses, however, estimated the association between being Asian and item time with all other factors (including English learner status) controlled.
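The report does not name the exact statistical model, but analyses of this kind are commonly fit as linear mixed-effects regressions. As a minimal sketch of that approach in Python, assuming a log-transformed outcome, a random intercept per student to handle repeated measures, and hypothetical file and variable names (process_data.csv, item_seconds, frl_eligible, used_tts, and so on; none of these details come from the report itself):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataset: one row per student-item pair.
df = pd.read_csv("process_data.csv")

# Log-transform completion time, a common choice for right-skewed timing data.
df["log_seconds"] = np.log(df["item_seconds"])

model = smf.mixedlm(
    "log_seconds ~ C(gender) + C(frl_eligible) + effort"  # student-level predictors
    " + word_count + C(difficulty) + item_position"       # item-level predictors
    " + used_tts + used_zoom + used_scratchwork",         # student-item interactions
    data=df,
    groups=df["student_id"],  # random intercept per student (repeated measures)
)
result = model.fit()
print(result.summary())

In a model like this, each coefficient estimates one predictor's association with item time while holding the others constant, which matches the adjustment logic described above.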
Results:
In the 4th grade sample, we found that:
Some student-level predictors, specifically ethnicity and English learner status, were not associated with the time that students spent on test items. However:
Gender was a significant predictor, with girls spending more time on test items than boys.
Socio-economic status was a significant predictor, with students who were eligible for free or reduced lunch spending less time on test items.
Disability category was a significant predictor:
Students who were classified as speech/language impaired tended to take longer on items than other students. This may be because these students had milder disabilities and were more engaged during the test, or because these students needed more time to process the language in the test items.
Students who were classified as having an emotional disturbance tended to spend less time on items than other students. This may be due to students with emotional disturbances having more difficulty regulating negative emotions (making them more likely to give up), or it may signal more noncompliant behavior.
Students who perceived the test as important, those who reported putting forth more effort, and those who reported higher levels of math motivation all spent more time on items. These student groups were all likely to have been more engaged during the testing session.
None of the item-level predictors were significant.
All of the student-item interaction predictors were significant; use of any type of accessibility feature was associated with taking substantially more time (these were the largest significant effects that we observed).
In the 8th grade sample, we found that:
Most student-level predictors were not associated with the time that students spent on test items: gender, ethnicity, disability type, English learner status, perceived test importance, and self-reported math motivation were all nonsignificant predictors. However:
Socio-economic status was a significant predictor, with students who were eligible for free or reduced lunch spending less time on test items.
Self-reported effort was a significant predictor, with higher effort associated with taking longer on test items. This finding is consistent with other research showing that rapid responding is associated with random guessing (low effort); taking longer on test items may indicate being thoughtful and engaged.
Two item-level predictors were significant:
A higher word count per item (i.e., more text to read) was associated with significantly longer item times. This seems fairly intuitive: students took longer to read items with more words, and perhaps also took more time to think about them, if items with more text were more complex.
Items of medium difficulty were associated with longer item times than easy or difficult items. It may be that easy items are simple enough to take less time, whereas high-difficulty items lead many students to give up quickly.
Among the student-item interaction predictors, use of each accessibility feature was associated with significantly longer item times. This is, again, intuitive; using these features takes additional time.
We also used these data to explore students' "visits" to different test items after the standard time limit was up. Fewer than half of students used any of their extra time (i.e., most students finished within the standard time limit). Of those who did use extra time, most visited just a few of the items in their block. Moreover, most students who used their extended time to (re)visit items did not experience a score change; that is, if they had been cut off without receiving the extended time, their score would have been identical.
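The report does not describe exactly how this score-change comparison was computed, but as a minimal sketch of one way to run it from process data, where the file name (response_events.csv), column names, and 30-minute standard limit are all assumptions for illustration:

import pandas as pd

# Hypothetical event log: one row per recorded answer, with a timestamp.
events = pd.read_csv("response_events.csv")
STANDARD_LIMIT_SECONDS = 30 * 60  # assumed standard time limit

def block_score(frame):
    # Score each student's block using their last recorded answer per item.
    last = (frame.sort_values("elapsed_seconds")
                 .groupby(["student_id", "item_id"])
                 .tail(1))
    return last.groupby("student_id")["correct"].sum()

# Score as it stood at the standard cutoff vs. the final score after extended time.
score_at_cutoff = block_score(events[events["elapsed_seconds"] <= STANDARD_LIMIT_SECONDS])
final_score = block_score(events)

changed = final_score.subtract(score_at_cutoff, fill_value=0).ne(0)
print(f"{changed.mean():.1%} of students would score differently without extended time")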
Implications:
When accessibility features such as text-to-speech, scratchwork, and zoom are included as universal features within test forms, time expectations may need to be increased to allow sufficient time for students to use them.
If a test does not have the aforementioned accessibility features but a student has them as accommodations, extended time should be considered an appropriate accommodation.
It may be important, particularly on low-stakes tests like the NAEP, to monitor student effort and, if effort is low, to find ways to motivate the student to engage more with the test. It may also be helpful to review with students any items they did not attempt, or for which they rapidly guessed, to determine whether the student should have spent more time on the item to better show their underlying knowledge and skill.