Algorithmic Efficiency (Big O)

This page is taken from a previously written Google Site for AQA Computing. There is some detail on this page that does not form part of the CIE specification and will not be examined in any way. However, it is useful background for more general computer science understanding and covers the expectation that you are able to understand that different algorithms can be compared (as required in 19.1). Any references made are to the AQA textbook, not your CIE textbook.

In terms of the CIE specification, the video below should give sufficient depth. This video is the final part of the series, so if you want further background, read the page below or follow the YouTube video link and look for the earlier videos.

9618 specification regarding algorithmic efficiency:

  • Show understanding that different algorithms which perform the same task can be compared by using criteria (e.g. time taken to complete the task and memory used)

    • Including use of Big O notation to specify time and space complexity

Computational Complexity

It is important to be able to compare algorithms as there may be multiple algorithms for achieving a solution to a problem and they may vary considerably in their speed and use of memory. The relative speed of an algorithm is referred to as its time complexity. The relative amount of memory an algorithm uses is referred to as its space complexity. Combining these two forms of complexity gives us the algorithm’s computational complexity.

Most algorithms transform input objects into output objects. The running time of an algorithm or a data structure method typically grows with the input size, although it may also vary for different inputs of the same size. The running time is also affected by many factors, such as the hardware and software environments. In spite of the possible variations that come from different environmental factors, we would like to focus on the relationship between the running time of an algorithm and the size of its input. In the end, we would like to characterize an algorithm's running time as a function f(n) of the input size n.

Why Do We Bother?

Analysis of algorithms is the branch of computer science that studies the performance of algorithms, especially their run time and space requirements. See Wikipedia. The practical goal of algorithm analysis is to predict the performance of different algorithms in order to guide design decisions.

During the 2008 United States Presidential Campaign, candidate Barack Obama was asked to perform an impromptu analysis when he visited Google. Chief executive Eric Schmidt jokingly asked him for “the most efficient way to sort a million 32-bit integers.” Obama had apparently been tipped off, because he quickly replied, “I think the bubble sort would be the wrong way to go.” Here's the clip.

This is true: bubble sort is conceptually simple but slow for large datasets. The answer Schmidt was probably looking for is “radix sort” (see radix sort).

So the goal of algorithm analysis is to make meaningful comparisons between algorithms, but there are some problems:

    • The relative performance of the algorithms might depend on characteristics of the hardware, so one algorithm might be faster on Machine A, another on Machine B. The general solution to this problem is to specify a machine model and analyze the number of steps, or operations, an algorithm requires under a given model.

    • Relative performance might depend on the details of the dataset. For example, some sorting algorithms run faster if the data are already partially sorted; other algorithms run slower in this case. A common way to avoid this problem is to analyze the worst case scenario. It is also sometimes useful to analyze average case performance, but it is usually harder, and sometimes it is not clear what set of cases to average over.

    • Relative performance also depends on the size of the problem. A sorting algorithm that is fast for small lists might be slow for long lists. The usual solution to this problem is to express run time (or number of operations) as a function of problem size, and to compare the functions asymptotically as the problem size increases.

The good thing about this kind of comparison is that it lends itself to a simple classification of algorithms. For example, if you know that the run time of Algorithm A tends to be proportional to the size of the input, n, and Algorithm B tends to be proportional to n², then you would expect A to be faster than B for large values of n.

Extract from greenteapress

Expressing Complexity

Measuring the complexity of an algorithm is not as straightforward as it may first appear. One possible way of measuring it is to write the algorithm in a programming language, run it on a computer and time it. However, the timings generated by this method depend on the speed of the computer and the efficiency of the programming language, so this is a crude way to measure an algorithm's time complexity. It is better to measure the speed of the algorithm by the number of operations it requires to be carried out.

A useful way of demonstrating different methods of solving a problem is to sum all the natural numbers between 1 and n, e.g. 1 + 2 + 3 + ... + 49 + 50 where n = 50.
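One of the two approaches shown below relies on the closed form usually credited to Gauss, which gives the answer in a single step:

    \sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \qquad \text{e.g. } \frac{50 \times 51}{2} = 1275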

The A2 textbook shows two algorithms (page 14/15) for doing this. In VB.NET we could use the following program:

Module Module1

    Dim SWatch As New System.Diagnostics.Stopwatch ' use this to time the algorithm's execution time

    Sub Main()
        ' There is no validation in this code
        Console.WriteLine("Please enter a positive, whole number")
        Dim num As ULong = Console.ReadLine()
        Console.WriteLine("Answer: " & SumNaturalNumbersLoop(num) & " in time: " & SWatch.Elapsed.ToString)
        SWatch.Reset() ' clear the stopwatch
        Console.WriteLine("Answer: " & SumNaturalNumbersGauss(num) & " in time: " & SWatch.Elapsed.ToString)
        Console.ReadKey() ' pause the program before termination
    End Sub

    Function SumNaturalNumbersLoop(n As ULong) As ULong
        ' This method uses a loop to add the numbers together
        SWatch.Start()
        Dim count As ULong
        For a As ULong = 1 To n
            count += a
        Next
        SWatch.Stop()
        Return count
    End Function

    Function SumNaturalNumbersGauss(n As ULong) As ULong
        ' This method uses Gauss' method of adding sequential integers together
        SWatch.Start()
        Dim count As ULong = (n * (n + 1)) \ 2 ' \ is integer division
        SWatch.Stop()
        Return count
    End Function

End Module

Which method is more efficient? Run it and see. Is there a difference for larger values of n?

Another example is a VB implementation of the prime factor algorithm from page 17 (the book incorrectly states that it returns factors, when it really returns prime factors).

Module Module1

    Sub Main()
        For j As Integer = 2 To 100
            Console.Write(j & "=")
            FindFactors(j)
            Console.WriteLine()
        Next
        Console.ReadLine()
    End Sub

    Sub FindFactors(n As Integer)
        ' Repeatedly divide out the smallest factor, printing each prime factor found
        Dim Factor As Integer
        While n > 1
            Factor = LeastFactor(n)
            Console.Write(Factor & " ")
            n = n \ Factor ' \ is an integer division
        End While
    End Sub

    Function LeastFactor(n As Integer) As Integer
        ' Trial division: return the smallest integer greater than 1 that divides n
        Dim i As Integer = 1
        Do
            i += 1
        Loop Until n Mod i = 0
        Return i
    End Function

End Module

Order of Growth

Suppose you have analyzed two algorithms and expressed their run times in terms of the size of the input: Algorithm A takes 100n + 1 steps to solve a problem with size n; Algorithm B takes n² + n + 1 steps.

The following table shows the run time (number of steps) of these algorithms for different problem sizes:

Input size (n)    Run time of A (100n + 1)    Run time of B (n² + n + 1)
10                1,001                       111
100               10,001                      10,101
1,000             100,001                     1,001,001
10,000            1,000,001                   100,010,001

At n=10, Algorithm A looks pretty bad; it takes almost 10 times longer than Algorithm B. But for n=100 they are about the same, and for larger values A is much better.

The fundamental reason is that for large values of n, any function that contains an n² term will grow faster than a function whose leading term is n. The leading term is the term with the highest exponent.

For Algorithm A, the leading term has a large coefficient, 100, which is why B does better than A for small n. But regardless of the coefficients, there will always be some value of n where a·n² > b·n.

The same argument applies to the non-leading terms. Even if the run time of Algorithm A were n + 1000000, it would still be better than Algorithm B for sufficiently large n.

In general, we expect an algorithm with a smaller leading term to be a better algorithm for large problems, but for smaller problems, there may be a crossover point where another algorithm is better. The location of the crossover point depends on the details of the algorithms, the inputs, and the hardware, so it is usually ignored for purposes of algorithmic analysis. But that doesn’t mean you can forget about it.
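As a rough illustration (not from the textbook), a short VB.NET sketch can locate that crossover point for the two example algorithms above by counting steps rather than timing them:

    Sub FindCrossover()
        ' Step counts from the example above: A takes 100n + 1 steps, B takes n^2 + n + 1 steps
        Dim n As Long = 1
        While n * n + n + 1 <= 100 * n + 1
            n += 1
        End While
        Console.WriteLine("B first takes more steps than A at n = " & n) ' prints n = 100
    End Sub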

If two algorithms have the same leading order term, it is hard to say which is better; again, the answer depends on the details. So for algorithmic analysis, functions with the same leading term are considered equivalent, even if they have different coefficients.

An order of growth is a set of functions whose asymptotic growth behavior is considered equivalent. For example, 2n, 100n and n + 1 belong to the same order of growth, which is written O(n) in Big-Oh notation and often called linear because every function in the set grows linearly with n.

All functions with the leading term n² belong to O(n²); they are quadratic, which is a fancy word for functions with the leading term n².

The following table shows some of the orders of growth that appear most commonly in algorithmic analysis, in increasing order of badness:

Order of growth    Name
O(1)               constant
O(log n)           logarithmic
O(n)               linear
O(n log n)         linearithmic ("n log n")
O(n²)              quadratic
O(n³)              cubic
O(cⁿ)              exponential

For the logarithmic terms, the base of the logarithm doesn’t matter; changing bases is the equivalent of multiplying by a constant, which doesn’t change the order of growth. Similarly, all exponential functions belong to the same order of growth regardless of the base of the exponent. Exponential functions grow very quickly, so exponential algorithms are only useful for small problems.
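For reference, the base-change identity behind that first claim is:

    \log_b n = \frac{\log_c n}{\log_c b}

so switching from base c to base b multiplies every value by the fixed constant 1/\log_c b, which does not change the order of growth.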

Below is a VB.NET program to rearrange a sequence of input discs. The program is based on the A2 book's algorithm on page 21. However, while the book uses two inputs, the second is unnecessary because n can be determined from the length of the input.

Module Module1

    Dim swatch As New System.Diagnostics.Stopwatch

    Sub Main()
        Console.WriteLine("Please enter sequence of discs (@ as black, O as white)")
        Dim sDiscs As String = Console.ReadLine()
        ' create a new character array and split the input into characters
        Dim chDiscs As Char() = sDiscs.ToCharArray()
        ' reform the character array elements back into one string
        Dim sNewDiscs As String = New String(RearrangeDiscs(chDiscs))
        Console.WriteLine(sNewDiscs & " in a time of " & swatch.Elapsed.ToString)
        Console.ReadLine()
    End Sub

    Function RearrangeDiscs(LineOfDiscs As Char()) As Char()
        swatch.Start()
        Dim n As Integer = UBound(LineOfDiscs)
        ' The algorithm on page 21 is modified to account for array elements starting at 0, not 1
        For NumberOfPairs As Integer = n To 1 Step -1
            For j As Integer = 0 To NumberOfPairs - 1
                If LineOfDiscs(j) = "@"c And LineOfDiscs(j + 1) = "O"c Then
                    ' because of the scenario, we do NOT have to physically swap them over
                    LineOfDiscs(j) = "O"c
                    LineOfDiscs(j + 1) = "@"c
                End If
            Next
        Next
        swatch.Stop()
        Return LineOfDiscs
    End Function

End Module

Big-O Notation (AKA Big-Oh)

The following extract is taken from an excellent StackOverflow response.

Big-O shows how an algorithm scales.

O(n^2):

    • 1 item: 1 second

    • 10 items: 100 seconds

    • 100 items: 10000 seconds

Notice that the number of items increases by a factor of 10, but the time increases by a factor of 10^2. Basically, n=10 and so O(n^2) gives us the scaling factor n^2 which is 10^2.

O(n):

    • 1 item: 1 second

    • 10 items: 10 seconds

    • 100 items: 100 seconds

This time the number of items increases by a factor of 10, and so does the time. n=10 and so O(n)'s scaling factor is 10.

O(1):

    • 1 item: 1 second

    • 10 items: 1 second

    • 100 items: 1 second

The number of items is still increasing by a factor of 10, but the scaling factor of O(1) is always 1.
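A minimal VB.NET sketch (illustrative only, not from the StackOverflow answer) that makes these three growth rates concrete by counting loop iterations instead of measuring seconds:

    Sub CountSteps(n As Long)
        Dim constantSteps As Long = 1          ' O(1): work does not depend on n
        Dim linearSteps As Long = 0            ' O(n): one pass over the items
        For i As Long = 1 To n
            linearSteps += 1
        Next
        Dim quadraticSteps As Long = 0         ' O(n^2): every item against every item
        For i As Long = 1 To n
            For j As Long = 1 To n
                quadraticSteps += 1
            Next
        Next
        Console.WriteLine(n & " items -> " & constantSteps & ", " & linearSteps & ", " & quadraticSteps & " steps")
    End Sub

Calling it with n = 1, 10 and 100 reproduces the patterns in the three lists above, with seconds replaced by step counts.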

Big-O notation is a relative representation of the complexity of an algorithm.

There are some important and deliberately chosen words in that sentence:

    • relative: you can only compare apples to apples. You can't compare an algorithm to do arithmetic multiplication to an algorithm that sorts a list of integers. But two algorithms that do arithmetic operations (one multiplication, one addition) will tell you something meaningful;

    • representation: Big-O (in its simplest form) reduces the comparison between algorithms to a single variable. That variable is chosen based on observations or assumptions. For example, sorting algorithms are typically compared based on comparison operations (comparing two nodes to determine their relative ordering). This assumes that comparison is expensive. But what if comparison is cheap but swapping is expensive? It changes the comparison; and

    • complexity: if it takes me one second to sort 10,000 elements how long will it take me to sort one million? Complexity in this instance is a relative measure to something else.

Come back and reread the above when you've read the rest.

The best example of Big-O I can think of is doing arithmetic. Take two numbers (123456 and 789012). The basic arithmetic operations we learnt in school were:

    • addition;

    • subtraction;

    • multiplication; and

    • division.

Each of these is an operation or a problem. A method of solving these is called an algorithm.

Addition is the simplest. You line the numbers up (to the right) and add the digits in a column writing the last number of that addition in the result. The 'tens' part of that number is carried over to the next column.

Let's assume that the addition of these numbers is the most expensive operation in this algorithm. It stands to reason that to add these two numbers together we have to add together 6 digits (and possibly carry a 7th). If we add two 100 digit numbers together we have to do 100 additions. If we add two 10,000 digit numbers we have to do 10,000 additions.

See the pattern? The complexity (being the number of operations) is directly proportional to the number of digits n in the larger number. We call this O(n) or linear complexity.
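A minimal sketch of that column-by-column addition in VB.NET, treating each number as a string of digits so that the "one addition per digit" behaviour is explicit (the function name is illustrative, not from the original answer):

    Function AddDigitStrings(a As String, b As String) As String
        ' One loop iteration per digit column, so the work grows linearly with the number of digits
        Dim result As New System.Text.StringBuilder
        Dim carry As Integer = 0
        Dim i As Integer = a.Length - 1
        Dim j As Integer = b.Length - 1
        While i >= 0 OrElse j >= 0 OrElse carry > 0
            Dim columnSum As Integer = carry
            If i >= 0 Then
                columnSum += AscW(a(i)) - AscW("0"c)
                i -= 1
            End If
            If j >= 0 Then
                columnSum += AscW(b(j)) - AscW("0"c)
                j -= 1
            End If
            result.Append(ChrW(AscW("0"c) + (columnSum Mod 10))) ' write the units digit
            carry = columnSum \ 10                               ' carry the tens into the next column
        End While
        Dim digits As Char() = result.ToString().ToCharArray()
        Array.Reverse(digits)                                    ' digits were produced right-to-left
        Return New String(digits)
    End Function

For example, AddDigitStrings("123456", "789012") returns "912468" after six column additions (plus the carries).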

Subtraction is similar (except you may need to borrow instead of carry).

Multiplication is different. You line the numbers up, take the first digit in the bottom number and multiply it in turn against each digit in the top number and so on through each digit. So to multiply our two 6 digit numbers we must do 36 multiplications. We may need to do as many as 10 or 11 column adds to get the end result too.

If we have two 100-digit numbers we need to do 10,000 multiplications and 200 adds. For two one-million-digit numbers we need to do one trillion (10¹²) multiplications and two million adds.

As the algorithm scales with n-squared, this is O(n²) or quadratic complexity. This is a good time to introduce another important concept:

We only care about the most significant portion of complexity.

The astute may have realized that we could express the number of operations as n² + 2n. But as you saw from our example with two numbers of a million digits apiece, the second term (2n) becomes insignificant (accounting for 0.0002% of the total operations by that stage).

Note that we have assumed the worst-case scenario here. When multiplying, if one number has 4 digits and the other has 6, we only need 24 multiplications; we still base the calculation on the worst case for that n, i.e. when both are 6-digit numbers. Hence Big-O notation is usually about the worst-case scenario of an algorithm.

The Telephone Book

The next best example I can think of is the telephone book, normally called the White Pages or similar but it'll vary from country to country. But I'm talking about the one that lists people by surname and then initials or first name, possibly address and then telephone numbers.

Now if you were instructing a computer to look up the phone number for "John Smith" in a telephone book that contains 1,000,000 names, what would you do? Ignoring the fact that you could guess how far in the S's started (let's assume you can't), what would you do?

A typical implementation might be to open up to the middle, take the 500,000th entry and compare it to "Smith". If it happens to be "Smith, John", we just got real lucky. Far more likely is that "John Smith" will be before or after that name. If it's after, we then divide the last half of the phone book in half and repeat. If it's before, we divide the first half of the phone book in half and repeat. And so on.

This is called a binary search and is used every day in programming whether you realize it or not.

So if you want to find a name in a phone book of a million names you can actually find any name by doing this at most 20 times. In comparing search algorithms we decide that this comparison is our 'n'.

    • For a phone book of 3 names it takes 2 comparisons (at most).

    • For 7 it takes at most 3.

    • For 15 it takes 4.

    • For 1,000,000 it takes 20.

That is staggeringly good isn't it?

In Big-O terms this is O(log n) or logarithmic complexity. Now the logarithm in question could be ln (base e), log₁₀, log₂ or some other base. It doesn't matter; it's still O(log n), just as O(2n²) and O(100n²) are still both O(n²).
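A minimal VB.NET sketch of that phone-book lookup, assuming the names are already held in a sorted array (the comparison counter is illustrative, not part of the original answer):

    Function CountedBinarySearch(names As String(), target As String) As Integer
        Dim low As Integer = 0
        Dim high As Integer = names.Length - 1
        Dim comparisons As Integer = 0
        While low <= high
            Dim middle As Integer = (low + high) \ 2
            comparisons += 1
            Dim result As Integer = String.Compare(names(middle), target)
            If result = 0 Then
                Console.WriteLine("Found after " & comparisons & " comparisons")
                Return middle
            ElseIf result < 0 Then
                low = middle + 1        ' target is in the later half
            Else
                high = middle - 1       ' target is in the earlier half
            End If
        End While
        Console.WriteLine("Not found after " & comparisons & " comparisons")
        Return -1
    End Function

For an array of 1,000,000 sorted names the loop runs at most 20 times, matching the figures above.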

It's worthwhile at this point to explain that Big O can be used to determine three cases with an algorithm:

    • Best Case: In the telephone book search, the best case is that we find the name in one comparison. This is O(1) or constant complexity;

    • Expected Case: As discussed above this is O(log n); and

    • Worst Case: This is also O(log n).

Normally we don't care about the best case. We're interested in the expected and worst case. Sometimes one or the other of these will be more important.

Back to the telephone book.

What if you have a phone number and want to find a name? The police have a reverse phone book but such lookups are denied to the general public. Or are they? Technically you can reverse lookup a number in an ordinary phone book. How?

You start at the first name and compare the number. If it's a match, great, if not, you move on to the next. You have to do it this way because the phone book is unordered (by phone number anyway).

So to find a name:

    • Best Case: O(1);

    • Expected Case: O(n) (for 500,000); and

    • Worst Case: O(n) (for 1,000,000).
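A minimal sketch of that reverse lookup in VB.NET; the Entry structure and its field names are illustrative assumptions, not from the original answer:

    Structure Entry
        Public Name As String
        Public Number As String
    End Structure

    Function ReverseLookup(phoneBook As Entry(), targetNumber As String) As String
        ' The book is not ordered by number, so every entry may have to be examined
        For i As Integer = 0 To phoneBook.Length - 1
            If phoneBook(i).Number = targetNumber Then
                Return phoneBook(i).Name   ' best case: a match on the first entry, O(1)
            End If
        Next
        Return ""                          ' worst case: all n entries checked, O(n)
    End Function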

The Travelling Salesman

This is quite a famous problem in computer science and deserves a mention. In this problem you have N towns. Each of those towns is linked to 1 or more other towns by a road of a certain distance. The Travelling Salesman problem is to find the shortest tour that visits every town.

Sounds simple? Think again.

If you have 3 towns A, B and C with roads between all pairs then you could go:

    • A → B → C

    • A → C → B

    • B → C → A

    • B → A → C

    • C → A → B

    • C → B → A

Well, actually there are fewer than that, because some of these are equivalent (A → B → C and C → B → A are equivalent, for example, because they use the same roads, just in reverse).

In actuality there are 3 possibilities.

    • Take this to 4 towns and you have (iirc) 12 possibilities.

    • With 5 it's 60.

    • 6 becomes 360.

This is a function of a mathematical operation called a factorial. Basically:

    • 5! = 5 × 4 × 3 × 2 × 1 = 120

    • 6! = 6 × 5 × 4 × 3 × 2 × 1 = 720

    • 7! = 7 × 6 × 5 × 4 × 3 × 2 × 1 = 5040

    • 25! = 25 × 24 × … × 2 × 1 = 15,511,210,043,330,985,984,000,000

    • 50! = 50 × 49 × … × 2 × 1 = 3.04140932 × 10⁶⁴

So the Big-O of the Travelling Salesman problem is O(n!) or factorial or combinatorial complexity.

By the time you get to 200 towns there isn't enough time left in the universe to solve the problem with traditional computers.

Something to think about.

Polynomial Time

Another point I wanted to make quick mention of is that any algorithm that has a complexity of O(nᵃ), for some constant a, is said to have polynomial complexity or is solvable in polynomial time.

Traditional computers can solve polynomial-time problems. Certain things are used in the world because of this. Public Key Cryptography is a prime example. It is computationally hard to find two prime factors of a very large number. If it wasn't, we couldn't use the public key systems we use.

Anyway, that's it for my (hopefully plain English) explanation of Big O (revised).

How to Determine Complexity

Taken from MIT lecture (link below)

In general, how can you determine the running time of a piece of code? The answer is that it depends on what kinds of statements are used.

Sequence of statements

statement 1;
statement 2;
...
statement k;

The total time is found by adding the times for all statements:

total time = time(statement 1) + time(statement 2) + ... + time(statement k)

If each statement is "simple" (only involves basic operations) then the time for each statement is constant and the total time is also constant: O(1).

If-Then-Else

if (cond) then
    block 1 (sequence of statements)
else
    block 2 (sequence of statements)
end if;

Here, either block 1 will execute, or block 2 will execute. Therefore, the worst-case time is the slower of the two possibilities:

max(time(block 1), time(block 2))

If block 1 takes O(1) and block 2 takes O(N), the if-then-else statement would be O(N).

Loops

for I in 1 .. N loop
    sequence of statements
end loop;

The loop executes N times, so the sequence of statements also executes N times. If we assume the statements are O(1), the total time for the for loop is N * O(1), which is O(N) overall.

Nested loops

for I in 1 .. N loop
    for J in 1 .. M loop
        sequence of statements
    end loop;
end loop;

The outer loop executes N times. Every time the outer loop executes, the inner loop executes M times. As a result, the statements in the inner loop execute a total of N * M times. Thus, the complexity is O(N * M).

In a common special case where the stopping condition of the inner loop is J < N instead of J < M (i.e., the inner loop also executes N times), the total complexity for the two loops is O(N²). This is also true for bubble sorts, where the inner loop decreases by 1 each pass of the outer loop.
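A minimal VB.NET sketch of that bubble-sort case (illustrative, not the textbook's code): the inner loop shrinks by one on each pass, yet the total comparison count is N(N - 1)/2, which is still O(N²):

    Sub BubbleSortWithCount(values As Integer())
        Dim comparisons As Long = 0
        For i As Integer = values.Length - 1 To 1 Step -1   ' outer loop: N - 1 passes
            For j As Integer = 0 To i - 1                   ' inner loop shrinks by one each pass
                comparisons += 1
                If values(j) > values(j + 1) Then
                    Dim temp As Integer = values(j)         ' swap adjacent elements
                    values(j) = values(j + 1)
                    values(j + 1) = temp
                End If
            Next
        Next
        Console.WriteLine("Comparisons: " & comparisons)    ' N(N - 1)/2, e.g. 4950 for N = 100
    End Sub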

Statements with function/procedure calls

When a statement involves a function/procedure call, the complexity of the statement includes the complexity of the function/procedure. Assume that you know that function/procedure f takes constant time, and that function/procedure g takes time proportional to (linear in) the value of its parameter k. Then the statements below have the time complexities indicated.

f(k) has O(1)

g(k) has O(k)

When a loop is involved, the same rule applies. For example:

for J in 1 .. N loop
    g(J);
end loop;

has complexity O(N²). The loop executes N times and each call to g(J) has complexity O(J), which is at most O(N).
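A minimal VB.NET sketch of this last case (illustrative, not from the MIT notes), with the call to g inlined as a loop whose length is proportional to its argument:

    Sub LoopWithLinearCall(n As Integer)
        Dim operations As Long = 0
        For j As Integer = 1 To n
            ' "g(j)": work proportional to j
            For k As Integer = 1 To j
                operations += 1
            Next
        Next
        ' Total work is 1 + 2 + ... + N = N(N + 1)/2, which is O(N^2)
        Console.WriteLine("n = " & n & ", operations = " & operations)
    End Sub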