Binary Search
Linear Search reminder
A reminder that a linear search will iterate through an array, checking if each element matches a goal. an example algorithm is shown below.
What is a Binary Search?
A Binary Search algorithm works by splitting the list in two and comparing the mid point. Similar to using a dictionary:
You find a page - if the current word is smaller (earlier in the alphabet) you pick another page after your current point
if the current word is smaller (later in the alphabet) you pick another page before your current point
This is a Divide and Conquer algorithm. It is generally faster than linear search but the main drawback is the list has to be sorted first (more info on sorting Bubble Sorts and Insertion Sorts).
A Binary Search requires the array of data to be sorted in order before it can be executed.
Binary Search Walkthrough
Binary Search Pseudocode
found = FALSE
goal = Input from User
REPEAT
mid = INTEGER((startpos + endpos)/2)
IF array(mid) = goal
Found = TRUE
ELSE IF array(mid) < goal
startpos = mid + 1
ELSE
endpos = mid - 1
UNTIL found = TRUE OR (startpos>endpos)
This example uses a REPEAT UNTIL loop so the condition isn't checked until there is one iteration through the array.
found = False
goal = Input from User
WHILE NOT FOUND and startpos <= endpos
mid = INTEGER ((startpos+ endpos)/2)
IF array(mid) = goal
found = True
position TO mid
ELSE IF array(mid) < goal
startpos = mid + 1
ELSE
endpos = mid - 1
END IF
END WHILE
This version uses a WHILE loop so the condition is checked before the array is traversed.
It is important to note that we cannot rely on just found = FALSE as an exit condition as it is a possibility that the goal value is not in the list. If the start pointer ever reaches the end pointer then we know we have searched the entire list.
Binary Search - Python Implementation
Note: The // operator in Python is integer division so 5//2 would equal 2. The int() function could easily have been used.
As the main function of a Binary search is usually to return the position it is important to consider what value can be interpreted as a no result
Operation of a Binary Search
For a list of N data items the average search length would be Log2N. So, for a list of 64 items, the average search length would be 6.
Further Reading
There can be a potential problem when using the binary search on extremely large lists
When the low + high value exceeds the storage limits for an integer
https://research.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
To correct a possible overflow you can change the midpoint calculation to:
middle = low + ((high - low) / 2)
This will avoid any overflow.