(see my paper about this issue, recently accepted at IJAIED)
While I and others have shown that adaptively scheduling practice can provide significant learning benefits, there is an issue of systematic signed error that is inherent to model-based practice scheduling (and to model-based decision-making in general). This issue is present in every educational dataset in which practice was scheduled according to a model of practice (e.g., "practice a new item after X successes" or after reaching an X% predicted probability of a correct response). It isn't due to the parameter estimates being biased (although they frequently are). It isn't due to the model including the wrong features, or being trained on the wrong data.
The problem is simply that if the model parameters are population-level and represent the mean learning rate from some activity (e.g., the beta in Y ~ b * prior.practice), and students' true learning rates vary around that mean, there will be systematic error for most students. By systematic, I mean that the model error for a given student will be signed (tending to be either positive or negative). The model will be systematically under- or over-estimating that student's learning.
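To make this concrete, here is a minimal sketch. The logistic form and the parameter values are my own illustrative assumptions, not the exact model from the paper; the point is only that a population-mean learning rate produces residuals that keep the same sign for a faster or slower learner.

```python
import numpy as np

def p_correct(rate, t, b0=-1.0):
    """P(correct) after t prior practice opportunities (logistic learning curve)."""
    return 1.0 / (1.0 + np.exp(-(b0 + rate * t)))

trials = np.arange(100)
pop_pred = p_correct(0.08, trials)   # model: everyone gets the mean learning rate
fast_true = p_correct(0.12, trials)  # a faster-than-average learner
slow_true = p_correct(0.04, trials)  # a slower-than-average learner

# The signed error (true - predicted) never changes sign within a student:
print(np.all(fast_true - pop_pred >= 0))  # True: learning under-estimated
print(np.all(slow_true - pop_pred <= 0))  # True: learning over-estimated
```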
Why does this matter? It matters because the model predictions feed into one or more decision rules that make pedagogical decisions. For instance, systems like Cognitive Tutor and ASSISTments use "mastery criterion" rules to decide when to switch from one topic to another: if a student is predicted to be at 95% probability or higher of correctly answering a question for a particular knowledge component (KC, or concept), a new KC is chosen for practice. So we have a system consistently under- or over-predicting learning and using those faulty estimates to make pedagogical decisions. Some students will be moved on to new content before they have truly "mastered" the current content (whether we believe that metric is appropriate is another issue), and others will be held back too long because their learning was underestimated.
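In code, that kind of decision rule can be as simple as the sketch below. The 95% threshold is the one named above; the function names and stopping logic are mine, and real systems wrap this in more bookkeeping.

```python
def has_mastered(predicted_p_correct, threshold=0.95):
    """Stop practicing a KC once the model's predicted P(correct) reaches the threshold."""
    return predicted_p_correct >= threshold

def trial_of_mastery(pred_curve, threshold=0.95):
    """First trial at which the model declares mastery (None if it never does)."""
    for t, p in enumerate(pred_curve):
        if has_mastered(p, threshold):
            return t
    return None
```

With the curves from the previous sketch, the population-mean model declares mastery on the same trial for every student, even though the fast learner truly crosses 95% much earlier and the slow learner much later.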
How can we deal with this? The answer is not to improve the learner model, unless by "improve" we mean estimating individual learning rates, because that is the only way to engineer a suitably accurate model. This is typically hard to achieve, chiefly because there is usually too little data to estimate learning rates at the individual level with standard techniques. Below I show via simulation just how pervasive this issue is, even when the true model (the real model of learning we wish we knew) is very similar to the estimated learner model (the one we built). Then I'll show some examples of possible solutions that involve one or more of the following: including an additional parameter outside the learner model (a "meta feature") that adaptively adjusts predictions according to past prediction errors, continually re-estimating the learning rate itself, and repurposing an existing predictor (from R-PFA; Galyardt & Goldin, 2015) for use within the learner model.
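For the third option, the relevant R-PFA ingredient is a recency-weighted summary of recent performance. The sketch below is only in the spirit of that predictor: the decay value is arbitrary and the details of how the sequence is initialized are simplified, so treat it as an approximation rather than the published feature definition.

```python
def recency_weighted_proportion(outcomes, decay=0.7):
    """Exponentially-weighted proportion of correct responses (oldest first).

    Recent outcomes get more weight, so the feature tracks where the
    individual learner is now rather than averaging over their whole history.
    """
    if not outcomes:
        return 0.0
    weights = [decay ** (len(outcomes) - 1 - k) for k in range(len(outcomes))]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)

print(recency_weighted_proportion([0, 0, 1, 1, 1]))  # recent successes dominate
```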
Below is a simulation comparing the efficacy of typical approaches versus my adjustments. I had the simulated students vary in their learning rate, but gave the models only the average learning rate (which is basically a best-case scenario for them). So the question is: how well can the standard prediction (furthest left) do at predicting the (simulated) actual outcomes (furthest right)? Not well!
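For reference, here is a minimal reconstruction of that setup (not the original simulation code; the logistic form, sample sizes, and spread of learning rates are assumptions I'm making for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, n_trials = 200, 100
mean_rate, rate_sd, intercept = 0.08, 0.03, -1.0

def p_correct(rate, t, b0=intercept):
    return 1.0 / (1.0 + np.exp(-(b0 + rate * t)))

true_rates = rng.normal(mean_rate, rate_sd, n_students)   # students differ
trials = np.arange(n_trials)

true_p = p_correct(true_rates[:, None], trials[None, :])  # each student's real curve
outcomes = rng.binomial(1, true_p)                        # simulated right/wrong responses
standard_pred = p_correct(mean_rate, trials)              # one-size-fits-all prediction

# Average gap between the standard prediction and each student's true probability:
print(np.abs(true_p - standard_pred[None, :]).mean())
```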
The x axis is the trial number (up to 100). The y axis is the predicted probability. Clearly the leftmost model is overly optimistic about how similar all learners will be. So what did I do in the middle panels? I had those "learner models" adjust to their own error. One adjusted its prediction up or down based on its average error. Note that I averaged the signed error, so whenever it is not zero, the model adjusts up or down accordingly to reduce it. The other model does that and also re-evaluates the learning rate itself each trial, by trying slightly lower and higher learning rates and seeing how well they fit the data so far; if a lower or higher rate fit better, it used that rate going forward. In short, a crude gradient descent. A sketch of both adjustments follows below.
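This is roughly what those two mechanisms look like in code, continuing the assumptions of the earlier snippets (the step size and the squared-error fit criterion are placeholders, not the exact choices used in the simulation).

```python
import numpy as np

def p_correct(rate, t, b0=-1.0):
    return 1.0 / (1.0 + np.exp(-(b0 + rate * t)))

def run_adjusted(outcomes, mean_rate=0.08, adapt_rate=False, step=0.005):
    """Predict trial by trial, correcting with the running mean signed error
    and (optionally) hill-climbing on the learning rate itself."""
    rate, signed_errors, preds = mean_rate, [], []
    for t, y in enumerate(outcomes):
        offset = np.mean(signed_errors) if signed_errors else 0.0
        pred = float(np.clip(p_correct(rate, t) + offset, 0.0, 1.0))
        preds.append(pred)
        signed_errors.append(y - pred)  # signed, so the average keeps its direction
        if adapt_rate and t > 0:
            past_t = np.arange(t + 1)
            past_y = np.asarray(outcomes[: t + 1])
            # Crude gradient descent: try a slightly lower and a slightly higher
            # rate, keep whichever fits the data seen so far best.
            candidates = [rate - step, rate, rate + step]
            fits = [np.mean((past_y - p_correct(r, past_t)) ** 2) for r in candidates]
            rate = candidates[int(np.argmin(fits))]
    return np.array(preds)
```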
Important takeaway: adjusting to errors due to the model's construction (not errors made by the student) GREATLY improved model fit.
So what practical implications does this have? Well, the predicted learning is frequently used to decide when to start learning something else. So being accurate matters, lest learners waste their time on topics they already know (learning underestimated) or get moved on too quickly (learning overestimated).
Below I show how the adjustments above make a big difference in when the model thinks someone has "mastered" a topic (frequently defined as a 95% probability of correctness).
Perfectly calibrated (saying they've mastered a topic exactly when they have) would sit at the zero line above. Being above or below the line means over- or under-estimating the trial on which they mastered the content. Clearly, adjusting based on model error (not student error) is vital to avoid wasting students' time or overwhelming them with content they are not prepared for. A small sketch of this calibration measure follows.
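Concretely, the quantity plotted is the gap, in trials, between when a model first declares mastery and when the student's true probability actually crosses the threshold. A minimal version, with my own sign convention (positive = held back too long, negative = moved on too early):

```python
import numpy as np

def first_trial_at_threshold(p_curve, threshold=0.95):
    """First trial whose probability reaches the threshold (None if never)."""
    above = np.flatnonzero(np.asarray(p_curve) >= threshold)
    return int(above[0]) if above.size else None

def mastery_timing_error(pred_curve, true_curve, threshold=0.95):
    """Trials between declared mastery and true mastery (None if either never crosses)."""
    pred_t = first_trial_at_threshold(pred_curve, threshold)
    true_t = first_trial_at_threshold(true_curve, threshold)
    if pred_t is None or true_t is None:
        return None
    return pred_t - true_t
```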