Learning Outcomes
Students should be able to:
Demonstrate understanding of and use the functionality of the following constructs in a programming language:
• simple sorting techniques such as the bubble sort and the insertion sort;
• simple searching techniques such as linear and binary searching; and
• string manipulating functions, including splitting, concatenating, character searching and substring searching.
Imagine that you have three, different sized blocks and that you are required to arrange them in order of size – small to large. You could probably do this without much thought, and with no particular method. If you had five or ten blocks instead of three it would still be a relatively straightforward task. But what if you had fifty, or a hundred, or five hundred blocks? With more blocks you would need to be more systematic in how you approach the task – you would need a sorting algorithm.
Sorting Algorithms “[Sorting algorithms] are the processes implemented in a computer program to arrange the data in order.” BCS Glossary of Computing, 13th edition, p 346.
In computer systems we are concerned with sorting data rather than blocks. We might, for example, want to sort a collection of customer records by surname.
Data may be sorted in ascending order (low values to high) or descending order (high values to low). Fortunately, an algorithm for an ascending sort is easily adapted to perform a descending sort (and vice-versa).
Sorting and searching are very common and important operations in computing. Efficient database retrieval relies upon efficient searching algorithms, which, in turn, rely upon efficient sorting algorithms.
Imagine, for example, searching for a telephone number in a telephone directory that was not in alphabetical order (i.e. had not been sorted). Computer scientists have developed many different algorithms for sorting, some of which are explained below. In each case we assume that the data to be sorted is stored in an input array and the sorted data is returned in an output array.
The insertion sort algorithm works in the following way.
• Take the first item from the input array, and put it in the output array. At this stage it will be the only item in the output.
• Take the second item from the input array, and insert it into the output array. It must be inserted in the correct position to ensure that the output array remains sorted.
• Take the third item from the input array, and insert it into the output array.
As before, the insertion operation must respect the sorted order of the output array.
• etc…until the input array is empty...
The example opposite shows the steps taken during the execution of an insertion sort. The input array – [ 1 3 4 2 5 ] – is to be sorted in ascending order.
The bubble sort algorithm works in the following way.
• Scan the input array by comparing each pair of adjacent items.
– Compare the first two items in the input array
– swap them if they are not in the required order.
– Compare the next pair of items in the input array
– swap them if they are not in the required order.
– etc…until you reach the end of the input array...
• Repeat this until a complete scan of the input array is completed, during which no items need to be swapped. The example below shows the steps taken during the execution of an insertion sort.
The input array – [ 1 3 4 2 5 ] – is to be sorted in ascending order.
The purpose of a search algorithm is to locate a specified item, which we call the search term, in a list or array. A search algorithm might be used, to find, for example, the telephone number for a given individual in a telephone directory. In this case we search for a name and then return the corresponding number. In the examples below we assume that we are searching an array of characters and return the corresponding array index – i.e. the position in the array where the search term is found. If the term is not in the array then the value –1 is returned.
One version of a linear search works in the following way:
• Compare the first item in the array with the search term. If they match, terminate and return the position in the array.
• Compare the second item in the array with the search term. If they match, terminate and return the position in the array.
• Compare the third item in the array with the search term. If they match, terminate and return the position in the array. etc…until you reach the end of the array...
• If you have reached the end of the array then return –1. It is possible, however, to improve upon this algorithm providing we know that the array has been sorted.
In the version above, which we will call version 1, we must continue to search until either the search term has been matched or else we have reached the end of the array. If we know, however, that the array is sorted we may be able to stop sooner than this.
• Let’s say we are searching for the character d in the array [ a, b, c, e, g, k, m ].
• Linear search will compare the search term (d) with the array item a, then b, then c, then e.
• At this stage we know that there is no point in searching further because if d was in the array it would have been before e.
• Consequently we do not need to continue to the end of the list. A revised version of a linear search works like this.
• Compare the first item in the array with the search term. – If they match, terminate and return the position in the array. – If search item > array item, terminate and return –1
• Compare the second item in the array with the search term. – If they match, terminate and return the position in the array. – If search item > array item, terminate and return –1
• Compare the third item in the array with the search term. – If they match, terminate and return the position in the array. – If search item > array item, terminate and return –1
• etc...until you reach the end of the array... Remember that this version of linear search, which we will call version 2, only works if the array is sorted. We say that version 2 of the algorithm is more efficient than the first, because it has the potential to solve the search problem in fewer steps.
Here is an algorithm for a linear search (version 1).
begin linearSearch (array, term)
set i = 0
set found = False
while i < length of array AND not found
if ith array item = term
then set found = True
i = i + 1
end while
if not found i = –1
return = i
end linearSearch
Binary search also requires the array to be sorted, but it is more efficient than either version of the linear search above. Instead of starting at the beginning of the array and working sequentially through it abinary search starts in the middle of the array.
To see how this enables a binary search to be more efficient, imagine that we are searching an array of length 100, and consider what has been achieved after the first comparison has been made.
• Linear search starts by examining the first item in the array. If this doesn’t match the search term then the algorithm proceeds to the next item. At this stage the algorithm has, at most, a further 99 items left to process.
• Binary search starts by examining the 50th item in the array. This effectively divides the array into two halves: items 1–49 and items 51–100. If the selected item does not match the search term then, because the array is sorted, it is possible to identify one half of the array and say that it definitely does not contain the search term. At this stage the algorithm has, at most, a further 49 items left to process. Binary search works like this.
• Compare the middle item in the array with the search term. This divides the array into two smaller arrays, which we will call left and right. – If the middle item matches the search term, terminate and return the position in the array. – If middle item < search term, do a binary search on the right array. – If middle term > search term, do a binary search on the left array.
You can see that at each step the binary search algorithm reduces the effective size of the array to be processed by a factor of 0.5. This is why it is much more efficient than linear search. These last two processes are repeated until the search term can be found. For a binary search to work, it is mandatory for the target array to be sorted. We shall learn the process of binary search with a pictorial example. The following is our sorted array and let us assume that we need to search the location of value 31 using binary search.
Some more explanation will be helpful here.
• The variables hi and lo are used to store the upper and lower bounds of the array.
As the algorithm converges on the solution the values are adjusted to focus on ever smaller sections of the array.
• The variable mid is the index to the middle item in the array. As the algorithm converges on the solution, its value is adjusted to correspond to smaller sections of the array.
• The variable found is a flag to indicate whether or not the search term has been found. It is initially set to FALSE, and then updated if and when the search term is found. • The repeat loop terminates when either the search term is found or the condition hi
Possible Exam Questions
Perform Insertion sort and bubble sort on the following arrays:[8 Marks]
a.[1,4,7,5,6,10,9,8]
b.[dog, ant, elephant, zebra, fish,cat, monkey,giraffe]
Evaluate the effectiveness of each sort algorithms in finding the soilution[4 Marks]
Explain which search algorithm would be best for the array [2,1,4,3,9,10,7,6] [2 Marks]
Show how Binary search would find the value 17 in the array [1,2,4,6,8,9,11,15,16,17,18] [4 Marks]