Cognitive Load Theory

Bibliography

1. Grace Hudson (2021) How learning happens (Twitter thread)

2. Aygil Takir (2012) The Effect of an Instruction Designed by Cognitive Load Theory Principles on 7th Grade Students’ Achievement in Algebra Topics and Cognitive Load

3. Fred G (1993) The Efficiency of Instructional Conditions: An Approach to Combine Mental Effort and Performance Measures

4. Justin Sung (2022) Practical Considerations for Independent Student Learning: A Research Synthesis

5. Justin Sung (2023) I got called out by a best-selling author | Scott H. Young

6. Pollock (2002) Assimilating complex information. Learning and Instruction

7. Chi (1982) Expertise in problem solving

8. Paul A. Kirschner (2002) Cognitive load theory: Implications of cognitive load theory on the design of learning

9. John Sweller (2011) Cognitive Load Theory

10. Geary (2008) An evolutionarily informed education science

11. Schnotz (2007) A Reconsideration of Cognitive Load Theory

12. Paas (1994) Variability of worked examples and transfer of geometrical problem-solving skills: A cognitive-load approach

13. Geary (2005) The Origin of Mind: Evolution of Brain, Cognition, and General Intelligence

14. de Groot (2008) Thought and Choice in Chess

15. Bandura (1986) Social foundations of thought and action: A social cognitive theory

16. Xie (2000) Prediction of mental workload in single and multiple tasks environments

17. Clark (2011) Efficiency in learning: Evidence-based guidelines to manage cognitive load

18. Armstrong (2010) Natural learning in higher education

19. Ericsson (1995) Long-term working memory

20. McCurdy (2020) Theories of the generation effect and the impact of generation constraint: A meta-analytic review

21. Eysenck (2020) Cognitive Psychology: A Student's Handbook

22. Paas (2010) Cognitive load measurement as a means to advance cognitive load theory


Script & Further Reading

Introduction

The most hyped theory in education for the last few decades might be a myth.

This video will cover what cognitive load theory is, why I think the theory is overhyped, its issues, and why it matters to everyone, not just teachers, if it is actually a myth.

You might not have heard about the theory, but you have probably heard or used related terms.

Mental effort, mental load, brain power, brain capacity, brain storage, brain RAM, or loads of other terms.

They all point to the idea that we have a cognitive load limit.

That idea is not unique to cognitive load theory, but what you do to help with that limit probably comes from the theory's instructional procedures.

Things you could do to reduce the load.

These procedures and related effects are used everywhere, from formal education [1] and online courses [4]

to life advice for organizing your time, attention, and general experiences.

But the theory is not only unproven, it can't be proved right or wrong.

What CLT is

John Sweller, alongside many others, has developed and researched cognitive load theory since the late 1970s.

Looking at how students solve problems, they realised students performed better when there was less to think about.

After plenty of experiments, they noticed patterns in performance.

They suggested this was due to certain effects.

They then suggested instructional procedures to utilize the effects, emphasizing that the effects work when they are all considered together, not separately.

However, there needed to be some explanation for all of this.

By the end of the 1980s, cognitive load referred to demands on working memory storage and processing,

and that cognitive load was made up of 2 different types:

intrinsic and extraneous load.

People solving problems would use means-ends analysis,

holding the problem, the desired goal, sub-goals and other related information together in working memory.

That resulted in a cognitive load.

By removing the goal (goal-free problems)

or removing parts of the problem solving (worked-out examples),

the load would decrease because the extraneous load was reduced.

The split attention and modality effects were discussed early on.

Mentally integrating information is harder when it's split by space or time.

Decreasing information distance by putting text explanations next to images would make things easier.

It would lower the cognitive load.

The modality effect is similar, but the sources don't need to be the same type of information.

For example, someone speaking instructions while you look at an image.

Sweller found that reducing this extraneous load helped problem solving.

Up until the late 1990s, the focus was on reducing extraneous load through instructional procedures, because

intrinsic load refers to the inherent nature of the learning task and was therefore assumed to be fixed. [11]

However, studies looking at variability in tasks found that more variability leads to better learning transfer. [12]

But more variability would have meant more extraneous (bad) cognitive load, so it shouldn't have helped. But it did.

This is where germane load was introduced: a good cognitive load, as it were.

But that isn't the only issue the original theory has.

Reducing split attention doesn't always work.

“In fact, a split-attention effect only occurs when different sources of information are unintelligible in isolation and therefore need to be mentally integrated.” [11]

However, some information can be understood in isolation.

After more testing, they introduced the redundancy and expertise reversal effects.

If information says the same thing, or something already known, it is redundant, and therefore extraneous load that should be avoided.

But not always.

The instructional design should match an individual's level of expertise, with room for variation.

Then in the late 2000s, Sweller combined the theory with Geary's ideas around knowledge to create a broader evolutionary framework. [13]

That framework, built from cognitive load theory, is what people try to share and implement:

an instructional theory based on knowledge of human cognition.

Why it is overhyped

The theory now suggests we acquire biologically primary and secondary knowledge.

Biologically primary knowledge relates to biologically primary skills like listening or speaking. [10]

The theory is that we learn to speak unconsciously, without explicit instruction, and are internally motivated to do so.

Speech therapists were used as a caveat here.

But as I mentioned in the video about reading, everyone has educators to help them learn to speak, and lots need further instruction.

Deaf people are a pretty good example.

So speech does need explicit instruction sometimes.

Some deaf people don't learn to speak because they use sign language.

They are internally motivated to learn a way of communicating, not necessarily speaking.

So when Sweller says:

“We do not need to be motivated by others to acquire these language skills” [9]

I disagree. Other people are among the reasons why we learn language skills,

along with the environments we are in and the experiences we have.

In addition to this, unconscious or implicit learning happens in lots of experiences, including reading, which is categorized as a biologically secondary skill,

alongside writing, because

“Biologically secondary knowledge is knowledge that has become culturally important and needs to be acquired in order to function appropriately in a society” [9]

But I would argue speaking and listening are culturally important and are required to function in society.

Also, reading expertise is far from a simple yes-or-no skill.

The theory argues biologically secondary skills are unlikely to be learned without explicit instruction.

However, unlikely doesn't mean impossible,

and we learn from experiences (empiricism), so we can learn the skills without explicit instruction.

Effectiveness will differ, but it's still possible.

The reason given for separating these skills is ease of acquisition.

“We may not be motivated to learn to read and write and so learning reading and writing is likely to require considerable conscious effort over long periods of time.” [9]

Another quote that I think is useful context here:

“We do not need educational systems and procedures to teach people to listen and speak. In contrast, without schools, most people will not learn to read and write” [9]

However, homeschooling is a thing,

much like a parent teaching a child early-years skills before they go into formal education.

So the example skills could fit into both the biologically primary and secondary knowledge categories, depending on the individual and their situation.

The rules aren't always followed, only most of the time.

Meaning this categorization isn't lawful,

a term in science used to describe something that happens the same way regardless of circumstances or context.

My question:

Why are we separating these skills or types of knowledge?

This is important because the theory focuses on secondary knowledge and skills.

The argument is that the skills are acquired differently.

“we acquire biologically primary information in a manner that is very different from the manner in which we acquire biologically secondary information” [9]

So how do you apply the theory to practice for primary skills, if they are acquired so differently?

Cognitive load theory claims only validity for the acquisition of biologically secondary knowledge, because this is where working memory is needed. [11]

Yet the instructional design principles work for so-called primary skills like speaking.

If the theory should only be applied to secondary skills, but works for primary skills, and the effects are inconsistent, with each person ideally needing different learning experiences,

that sounds like a lot of limitations and challenges for putting the theory into practice.

I don't get what the hype is all about.

What are its issues

I am not a qualified teacher, but the foundations of this theory still apply to me.

And to every organism, although they are mostly intended for humans.

It states 5 principles that underpin all of this work.

We store information. - information store principle

We obtain information from others. - borrowing and reorganizing principle

We generate novel information. - randomness as genesis principle

We restrict generating novel information to protect stored information. - narrow limits of change principle

We use stored information to determine how we behave. - environmental organizing and linking principle

Store information

Long term memory is a well documented theory in cognitive psychology.

One idea is that genomes store information, but there is no agreed way to measure the size or information density of a genome.

de Groot's original research from 1946 was translated from Dutch to English around 20 years later, in 1965, and has since been re-published. [14]

It was about chess grandmasters,

showing that better chess players are better at remembering chess positions,

unless the positions are random piece placements.

This was put down to better players having lots of board positions in long term memory.

Thus long term memory must be used to help with performance.

“All expertise, on this view, is determined by what is stored in the long term memory” [9]

I am not sure how that applies to reading expertise, though.

Only memorizing sight words is potentially dangerous, as I mentioned in the last video.

Remembering things is obviously important, but what gets stored is another question that can't really be checked.

We can't take out a long term memory store to see what was stored. It's all in theory.

Borrowing & organizing

Borrowing or acquiring information from others is also a little unprovable.

"Borrow" suggests we can give it back, which obviously we can't.

But "acquire" suggests others have a store which we have gained access to somehow, to either copy or adjust.

The theory suggests we reorganize it.

But if we have access to their store, and they are an expert, why don't we just copy it?

In the case of chess that would be pretty handy.

Sweller references Bandura's social learning theory here. [15]

We learn from others, which is learning from experience: empiricism.

But observation isn't explicit instruction.

Again, this challenges ideas about biologically primary and secondary skills, because if we learn mostly through observation, why is instruction so important?

“the theory assumes that learners acquire domain-specific information that is best obtained from other people. All the cognitive load instructional effects depend on these assumptions.” [9]

But if we acquire without interacting with them, are we acquiring from their long term memory store?

Or are we developing our expertise from our experience of watching them?

That would suggest we are generating novel information, which cognitive load theory says we rarely do.

Sweller mentions that schemas, the stores of information in long term memory, are different for each person

because of the reorganization.

I am going to assume reorganization isn't a choice, because we can't copy and use schemas from other people.

But the theory suggests these schemas grow from acquired information, and that constructing knowledge just happens naturally.

“We have evolved to construct knowledge. It is a biologically primary skill.” [9]

The theory argues that discovery learning for knowledge construction is no better than acquiring knowledge through explicit instruction,

going on to say:

“We have neither theoretical reasons nor empirical evidence that withholding information from learners results in better learning.” [8]

However, desirable difficulty is exactly that:

experts and novices require different levels of problems, which are adjusted by withholding information, by adding constraints.

The worked example effect they started with is the same idea:

worked examples, or variations with partial completion.

Unless I am missing something, they have done the empirical research, with evidence, that they said no one has done.

With this framework being built from natural processes, most of the ideas come from things like evolution.

However, there isn't really a distinction between information and knowledge, or between different kinds of knowledge,

apart from the biologically primary and secondary categorizations.

Generating novel information

The third principle, randomness as genesis,

talks about generating novel information. If information is useful, it's stored.

Much like the borrowing and reorganizing principle.

“Dealing with familiar problems in this manner is critical to problem-solving skills but is unlikely to result in the generation of new knowledge. In contrast, dealing with novel, unfamiliar problems has the potential to create new knowledge. New knowledge can be generated when we discover a new procedure or concept during problem solving” [9]

A new procedure or concept.

But it doesn't say how different a procedure needs to be to count as new.

In maths, is a different number enough? Negative numbers, fractions, decimals?

If you add numbers together, procedures will differ with each number. When, if ever, is it new?

As borrowing information requires reorganising, isn't that new, because no one else has that reorganisation?

“We need to understand that teaching learners to be flexible and creative requires us to teach them to engage in random generate and test.” [9]

Flexible and creative at solving problems, yes.

But why separate learning experiences into borrowed and generated when information and knowledge are unique to each individual?

“Simply asserting that encouraging learners to engage in generative, constructivist, creative activities will be beneficial is inappropriate in the absence of data.” [9]

Again, proving whether information is new or novel is impossible, as we can't check what is stored in long term memory. It is a theory.

And what we test is practice related to performance.

And the generation effect shows, with data, that information is remembered better when we generate it, rather than it being given to us. [20]

Narrow limits

Moving to the narrow limits of change principle, this relies on the work around working memory's limits in capacity and duration:

limited to 7 items for around 20 seconds for storage, then 3-4 items when processing.

“Processing refers to combining, contrasting, or dealing in some manner with multiple elements.” [9]

Unless I am missing something, all we can do to test processing is look at brain activation, which isn't very specific.

I hope the brain is active when we are thinking.

As to the item limits, what counts as an item?

A number? A year, which is 4 digits? A phone number, which is 10-plus digits?

Schemas and chunks have been suggested as ways to group information together.

This is how the memory palace and other memory techniques have been suggested to work.

But this means that although the limit of 7 might be accurate, 7 whats will change for each person, often impacted by their level of expertise.
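
To make the item question concrete, here is a minimal Python sketch. The `chunk` helper, the phone number, and the grouping sizes are all hypothetical, made up for illustration, and are not taken from the cognitive load literature. It only shows that the same 10 digits can count as 10 items or 3 items, depending on how they are grouped.

```python
def chunk(sequence, sizes):
    """Split a sequence into consecutive groups of the given sizes.

    Hypothetical helper: each group stands in for one "item".
    """
    if sum(sizes) != len(sequence):
        raise ValueError("chunk sizes must cover the whole sequence")
    chunks, start = [], 0
    for size in sizes:
        chunks.append(sequence[start:start + size])
        start += size
    return chunks


phone_number = "5550142367"  # made-up 10-digit number

# Someone reading digit by digit holds 10 separate items.
digit_by_digit = chunk(phone_number, [1] * 10)

# Someone used to the 3-3-4 grouping holds 3 items.
grouped = chunk(phone_number, [3, 3, 4])

print(len(digit_by_digit), len(grouped))  # prints: 10 3
```

The content never changes, only the assumed grouping, which is why a fixed limit of 7 says little without knowing what a person treats as one item.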

Some items go through working memory straight into long term memory because of their significance. [21]

Environment organizing and linking

When considering the relationship between working memory and long term memory, there are again lots of theories.

Long term working memory has been suggested, alongside other ideas about how these work together. [19]

For those unaware, short term memory theories have been rejected by most cognitive psychologists, as theories have evolved into working memory.

But these can't be proven either.

The fifth principle is that

“Working memory uses signals from the environment to determine which aspects of long-term memory are relevant to current processing.” [9]

This means linking and organizing information perceived from the environment to current thinking.

Whether that is working memory, long term working memory or something else is up in the air.

Whatever it is, it must be different from working memory because

“there are no known limits to the amount of organized information held in long-term memory that can be cued by appropriate environmental signals” [9]

So assuming information is stored, we don't know what, when, or how specific the information is.

We can't measure specifics about where it comes from, only that we learn from our experiences,

and item limits vary with each individual case.

Those limits also don't apply to cued information from prior experiences, which would therefore impact our experienced cognitive load, which is what the theory is built from.

Why it matters to everyone

This is where another term is brought into the theory: element interactivity.

Again, unless I am missing something, there are no specifics on what an element is, just that lots of them interacting means high cognitive load,

and less interaction means lower cognitive load.

Intrinsic load is fixed by the complexity of the knowledge being acquired, so that element interactivity can't be changed.

But extraneous load comes from how knowledge is acquired, which is altered by experience, so that element interactivity can be changed.

If a problem has elements that need to interact, like math symbols and numbers, then it has an intrinsic load.

That could be high or low, depending on a person's level of expertise.

But it is a fixed load.

“This heavy working memory load is not caused by the need to process many elements, but rather by the need to process many elements simultaneously” [9]

However, I don't see any mention of some element interactions requiring more or less processing than others.

That is, not the number of elements interacting, but the severity or difficulty of the interaction.

Again, as this is all in theory, we can only measure brain activation, which doesn't tell us anything specific.

But these interactions are what is used to explain understanding.

“the difference between knowing a correct symbol and knowing how to deal with an equation can be expressed entirely in element interactivity terms” [9]

Epistemology, the philosophical study of knowledge, offers a few different views on what it means to understand something,

but taking this view of element interactivity:

“further understanding consists of more information stored in long-term memory.” [9]

Thus understanding something requires more element interactivity.

3 x 4 = 12, but 12 is also 4 + 4 + 4 and 6 x 2, etc.

I would personally say depth of understanding rather than levels of element interactivity.

But essentially, different practice has a different intrinsic cognitive load.

Emphasis on practice.

Practice is over time, not a snapshot.

We all practice, and what that looks like is what the procedures are all about.

Variable practice was mentioned earlier as a reason for germane load to be introduced to the theory.

“Rather than just learning how to use an equation, a task that is relatively low in element interactivity, learners also had to learn which equations were appropriate at which time, a task that requires the processing of many more interacting elements.” [9]

But the intrinsic load doesn't change, it is fixed.

The difference is the grouped intrinsic load for the tasks of a certain period in time.

A practice session having higher intrinsic load relative to another.

The isolated elements effect is also attributed to intrinsic load.

Not by task intrinsic load, but by the order in which tasks were done.

“The students who were presented with the elements in isolated form first performed better on subsequent test problems, providing an example of the isolated elements effect.” [9]

So making a task simpler results in better performance.

An observation made many years before this theory.

“extraneous load is under the control of instructors and so the interacting elements due to extraneous cognitive load can be reduced or eliminated by changing instructional procedures.” [9]

Goal-free problems, or removing the goal, was an early effect.

But as most problems don't meet the requirements for it to be effective, they recommend the worked example effect instead.

“It is demonstrated when students learn more by studying a problem and its solution rather than solving the problem themselves.” [9]

However, from what I have read, it's not that students learn more or better, but rather quicker.

This emphasizes a point that Sweller has made:

“None of the effects should be considered in isolation from the theoretical constructs that gave rise to them.” [9]

That means combining worked examples with other principles of practice.

To me, this effect could be explained by experience of effective problem solving.

That covers the borrowing and generating principles, or just learning from experience: empiricism.

It also serves as a form of corrective feedback, which you may not get when problem solving alone.

So seeing solutions and feedback results in better performance.

Another observation made years before this theory: demonstration.

The other mentioned effects also relate to observations seen and expressed before this theory.

Element interactivity is what cognitive load is.

It's what the mental effort, mental load, or brain power we refer to, is.

But how do we measure it?

If we can't measure it, there is a lot of guess work putting these procedures into practice.

Well, we can't.

This graph is the best estimate of the different loads I have found:

peak, average, total over a time limit, and the different types of cognitive load.

But this doesn't consider residual load, meaning where people are when they start a task.

And more importantly, it's based on subjective questionnaires:

Likert scales of 1-9 for how much effort this was, [2]

or 1-9 for how difficult that task was.

Some have looked for substitutes we could use instead of cognitive load, like mental efficiency. [17]

[Image: Pasted image 20220131145942.png]

Higher perceived mental effort and lower performance is lower efficiency.

Lower perceived mental effort and higher performance is higher efficiency.

But the effort is still subjective, and performance is specific to the metrics measured.
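
For reference, the efficiency measure described in [3] is usually written as E = (z_performance - z_effort) / √2, where both scores are standardized before being combined. Below is a minimal Python sketch of that calculation. The function names and the sample scores are made up for illustration, and using the population standard deviation for the z-scores is an assumption, so treat this as a sketch rather than the exact procedure from the paper.

```python
from math import sqrt
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1.

    Assumption: population standard deviation is used here.
    """
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]


def mental_efficiency(performance, effort):
    """Return E = (z_performance - z_effort) / sqrt(2) per learner."""
    zp = z_scores(performance)
    ze = z_scores(effort)
    return [(p - e) / sqrt(2) for p, e in zip(zp, ze)]


# Hypothetical data for four learners.
performance = [40, 55, 70, 85]  # test scores
effort = [8, 6, 5, 3]           # self-rated effort on a 1-9 scale

efficiency = mental_efficiency(performance, effort)
for score, rating, e in zip(performance, effort, efficiency):
    print(f"performance={score} effort={rating} efficiency={e:+.2f}")
```

The learner with low performance and high rated effort comes out negative, and the learner with high performance and low rated effort comes out positive. The effort input is still a self-report, so the number is only as objective as the questionnaire behind it.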

Practice is what we all do to learn, but these instructional procedures only work in very specific cases,

and the cognitive architecture used as evidence is, at the moment, unprovable.

If a myth is a misrepresentation of the truth, or a widely held false belief or idea,

I think cognitive load theory is full of myths.

Natural learning

With clear and well stated objectives, learners can use natural learning to develop skills. [18]

Creating clear, well-stated objectives is a skill learners should develop in formal education.

But if people set a goal, manage available resources, and seek help and feedback while doing related tasks, i.e. practice,

they will develop the skills.

Yes, explicit instruction can help, but when that isn't available, like after school, how do people learn?

If they have always learned by following instructions, or using worked-out examples built for specific performance metrics like standardized tests,

their learning abilities will be less effective in other situations, which is adult life, most of people's lifetimes.

When you get better at doing the wrong thing, it makes things worse.

Being better at following instructions can make you worse at figuring out complex problems.

For me, this means practice design is more important than instructional procedures.

I take an ecological approach to learning science, so watch this video if that interests you, and subscribe to hear my thoughts on other related topics.