Binary Search
You’ve seen how linear search can be made a bit more efficient if you know you are searching a sorted list. In this step, you’ll look at a much more efficient method for locating items in a sorted list.
The algorithm is called binary search, or sometimes binary chop. To understand how it works, I’ll take you through a fictitious example.
Once again, you’ll have a go at finding a specific sports ball. Imagine the programmed computer is in control of a robotic arm. The computer can’t visually recognise different sports balls, but the arm can measure the diameter of a ball.
The balls are lined up on a long shelf, sorted from smallest to largest.
0. table tennis - 40
1. golf - 43
2. pool - 56
3. tennis - 67
4. cricket - 72
5. baseball - 76
6. volleyball - 210
7. bowling - 218
8. soccer - 223
9. basketball - 239
Now imagine that the computer is told to fetch a bowling ball, which it knows has a diameter of 218 mm.
What’s your strategy for getting the robot arm to find the bowling ball in as few tries as possible?
The binary search algorithm is perfect for this type of problem. It looks at a ball in the middle, and if that ball is too large (or too small), it knows that it now only needs to look at balls further left (or right). You need to track two numbers: the leftmost position the computer still needs to search, and the rightmost position it still needs to search. As it hasn’t started searching yet, the left is initially zero and the right is nine (the number of balls).
To start, the computer needs to choose its first ball to check. To do this, it takes the average of the left and right values, by adding them and dividing by two: 0+9 is nine, and dividing by two gives four and a half. You can round in either direction. Let’s round downwards, so the first position to check will be position four.
Sliding right to the ball at position four, the arm drops and measures the ball’s diameter. It finds that the ball there has a diameter of 72 mm, making it a cricket ball, so the arm leaves the ball where it is.
Now the computer knows that the bowling ball has a diameter greater than 72 mm, so it must lie further to the right. That means that the new leftmost position is five, and the rightmost position is still nine.
Another calculation needs to be performed to find a new search position of the arm. Again, you take the average of left and right: half of (5+9) is seven, so the new search position is seven.
The arm moves to position seven, drops, and measures the ball which has a diameter of 218 mm. This matches the bowling ball, so the algorithm is complete and the ball has been found. The bowling ball can be sent to the user.
A search for the tennis ball, which is 67 mm, works in the same way:
The left position is 0 and the right is 9. Averaging, (0 + 9) / 2 = 4.5.
The arm moves to position 4 and checks the ball, which is 72 mm. This is too large, so the tennis ball must be to the left.
The left position is still 0 and the right is now 3. Averaging, (0 + 3) / 2 = 1.5.
The arm moves to position 1 and checks the ball, which is 43 mm. This is too small, so the tennis ball must be to the right.
The left position is now 2 and the right is still 3. Averaging, (2 + 3) / 2 = 2.5.
The arm moves to position 2 and checks the ball, which is 56 mm. This is too small, so the tennis ball must be to the right.
The left and right position are now both 3, so this is the new search position.
The arm moves to position 3, and checks this ball which is 67 mm. The tennis ball has been found and can be returned.
How efficient do you think this algorithm is? How many moves does it take if the ball that the computer is looking for isn’t in the collection of balls? How does the algorithm compare to linear search when the target ball is the basketball or the table tennis ball? Share your thoughts in the comments.