Data structures are used to combine related data in a computer program. The structure you need for IGCSE is the array, which is one of the most fundamental data structures in computing.
The key attributes of arrays are:
The latter works because the single data type and contiguous memory allows the calculation of the memory address of any element in the array
(nth element address) = (start address) + n * (element size)
There are a range of standard algorithms you need to know about arrays.
An array consists of a fixed number of elements of the same data type. E.g., [1, 4, 5, 6, 2]
or ['a', 'd', 'e']
or [1.1, 2.3, 3.5, 4.7]
. Elements are indexed from 1
to length(array)
The official IGCSE way to declare an array and then populate it is like:
DECLARE marks: ARRAY[1:4] OF Integers
marks[1] ← 60
marks[2] ← 78
...
You could also declare and initialize an array using an array literal (the DECLARE
might be optional)
DECLARE marks ← [60, 78, 67, 92]
Indexing and reassigning elements:
OUTPUT marks[1] // prints 60
marks[1] ← 62 // [62, 78, 67, 92]
In Python, the similar structure that is most commonly used is a list
, which is sequence of data, but not necessarily of the same type and the list is allowed to change length. Elements are indexed from 0
to length(array)-1
In pseudocode, stick to using arrays of a fixed length and type.
You can then initialise lists to be "blank" such as
marks = [0]*4 # [0, 0, 0 ,0]
Or just initialize with an list literal
marks = [60, 78, 67, 92]
Indexing and reassigning elements:
print(marks[0]) # prints 60
marks[0] = 62 # [62, 78, 67, 92]
Appending, inserting, deleting and slicing elements from an list
Don't do this in IGCSE pseudocode!
marks.append(81) # [62, 78, 67, 92, 81]
marks.remove(78) # [62, 67, 92, 81]
print(marks[1:3])# is [67, 92]
A key skill with arrays is looping through them. Given their size is known, they are a perfect example of where to use a FOR loop. For example, looping through a linked pair of arrays
DECLARE names ← ["anika", "barb", "cat"]
DECLARE marks ← [60, 78, 92]
FOR idx ← 1 TO length(marks) STEP 1
OUTPUT names[idx], " scored ", marks[idx]
NEXT idx
Python can loop through arrays using similar for loops
names = ["anika", "bob", "cat"]
marks = [60, 78, 92]
for idx in range(len(marks)):
print(names[idx], "scored", marks[idx])
However, it can also use nicer methods like enumerate and zip for such loops
names = ["anika", "bob", "cat"]
marks = [60, 78, 92]
for name, mark in zip(names, marks):
print(name, "scored", mark)
Many of the algorithms we've seen before can be altered to work with arrays.
Total and/or Average:
DECLARE marks ← [60, 81, 78, 92, 65]
total ← 0
FOR idx ← 1 to length(marks)
total ← total + marks[idx]
NEXT idx
average ← total / length(marks)
OUTPUT total, average
Count, e.g., marks above 70
DECLARE marks ← [60, 81, 78, 92, 65]
count ← 0
FOR idx ← 1 to length(marks)
IF marks[idx] > 70 THEN
count ← count + 1
END IF
NEXT idx
OUTPUT count
Max (Exercise: modify to Min)
DECLARE marks ← [60, 81, 78, 92, 65]
max ← 0 // or use value of 1st element
FOR idx ← 1 to length(marks)
IF marks[idx] > max THEN
max ← marks[idx]
END IF
NEXT idx
OUTPUT max
New things we can do is say where in an array an item is found.
For example, modify the Max/Min algorithm to also return the index of the max/min element:
DECLARE marks ← [60, 81, 78, 92, 65]
max ← marks[1]
maxPos ← 1
FOR idx ← 2 to length(marks)
IF marks[idx] > max THEN
max ← marks[idx]
maxPos ← idx
END IF
NEXT idx
OUTPUT max, " at index ", maxPos
The linear search walks through an array looking for a particular element
DECLARE marks ← [60, 81, 78, 92, 65]
item = 78 // what we're looking for
FOR idx ← 1 to length(marks)
IF marks[idx] > item THEN
OUTPUT "Found at index ", idx
END IF
NEXT idx
OUTPUT count
This can be optimised if you only care about the first occurrence of item (a common use case) by breaking out of the loop early using BREAK, or a flag variable and a WHILE loop, or by returning the index from a function/subprocedure.
Linear search can be very slow for large data sets. Provided your data is sorted then binary search is the much better choice! It exploits a sorted list by checking the middle element and then discarding the side that does not contain the element being searched for. This halves the list at each step and makes the algorithm really fast.
NOTE: Although this is not in the curriculum, it did turn up in October 2019 Paper 22 as an algorithm to analyze with a trace table, so it is worth seeing.
left ← 1
right ← length(array)
found ← false
WHILE left < right AND NOT found DO
mid ← (left + right) DIV 2 // integer division
IF item ← array(mid) THEN
found ← true
ELSE IF item < array(mid) THEN
right ← mid - 1
ELSE
left ← mid + 1
END IF
END WHILE
IF found THEN
OUTPUT "Found at", mid
ELSE
OUTPUT "Item not in array"
END IF
Binary Search (and lots of other things) require the data is sorted.
A bubble sort is not the best choice in almost any case, but it is simple! The idea of a bubble sort is that
The following algorithm is not a bad bubble sort but can be optimised by using fact that top of array becomes sorted so the internal FOR loop does not need to go to the end of the array each time.
DECLARE data ← [60, 81, 78, 92, 65]
REPEAT
swapped ← false // flag to stop the sort
FOR idx ← 1 TO length(data)-1
IF data[idx] > data[idx+1] THEN
temp ← data[idx]
data[idx] ← data[idx+1]
data[idx+1] ← temp
END IF
NEXT idx
UNTIL NOT swapped
OUTPUT data
Selection sort works by searching the array for the biggest element and moving it to the top, then the next biggest to the second top and so on.
TODO