Big-O, Little-O, Theta, Omega

Big-O, Little-o, Omega, and Theta are formal notations for describing the growth of an algorithm's resource needs (running time and storage). There are four basic notations used when describing resource needs: O(f(n)), o(f(n)), Ω(f(n)), and Θ(f(n)) (pronounced Big-O, Little-o, Omega, and Theta, respectively).

Formally:

"T(n) is O(f(n))" iff for some constants c and n0, T(n)<=cf(n) for all n>=n0

"T(n) is Ω(f(n))" iff for some constants c and n0, T(n)>=cf(n) for all n>=n0

"T(n) is Θ(f(n))" iff T(n) is O(f(n)) AND T(n) is Ω(f(n))

"T(n) is o(f(n))" iff T(n) is O(f(n)) AND T(n) is NOT Θ(f(n))

Informally:

"T(n) is O(f(n))" basically means that f(n) describes the upper bound for T(n)

"T(n) is Ω(f(n))" basically means that f(n) describes the lower bound for T(n)
"T(n) is Θ(f(n))" basically means that f(n)f(n)f(n) describes the exact bound for T(n)
"T(n) is o(f(n))" basically means that f(n)f(n)f(n) is the upper bound for T(n) but that T(n)T(n)T(n) can never be equal to f(n)

Another way of saying this:

"T(n) is O(f(n))" growth rate of T(n) <= growth rate of f(n)

"T(n) is Ω(f(n))" growth rate of T(n) >= growth rate of f(n)
"T(n) is Θ(f(n))" growth rate of T(n) == growth rate of f(n)
"T(n) is o(f(n))" growth rate of T(n) < growth rate of f(n)

An easy way to think about big-O

The math in big-O analysis can often intimidate students. One of the simplest ways to think about big-O analysis is that it is basically a rating system for your algorithms (like movie ratings). It tells you the kind of resource needs you can expect the algorithm to exhibit as your data gets bigger and bigger. From best (least resource requirements) to worst, the rankings are: O(1), O(log n), O(n), O(n log n), O(n^2), O(n^3), O(2^n). Think about the graphs in the growth rate section and the way each curve looks; that is the most important thing to understand about algorithm analysis.
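
To make this ranking concrete, here is a small Python sketch (an added illustration, not from the original text) that prints each growth function for a few input sizes; the exact numbers matter less than how quickly the columns separate.

import math

# Print sample values for each common growth rate so the separation
# between the curves is visible. The numbers are illustrative only.
growth_rates = [
    ("O(1)",       lambda n: 1),
    ("O(log n)",   lambda n: math.log2(n)),
    ("O(n)",       lambda n: n),
    ("O(n log n)", lambda n: n * math.log2(n)),
    ("O(n^2)",     lambda n: n ** 2),
    ("O(n^3)",     lambda n: n ** 3),
    ("O(2^n)",     lambda n: 2 ** n),
]

for n in (10, 20, 40):
    print("n =", n)
    for name, f in growth_rates:
        print("  %-10s ~ %.0f" % (name, f(n)))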

What all this means

Let's take a closer look a the formal definition for big-O analysis

"T(n) is O(f(n))" if for some constants c and n0, T(n)<=cf(n) for all n>=n0
The way to read the above statement is as follows.
  • n is the size of the data set.
  • f(n) is a function that is calculated using n as the parameter.
  • O(f(n)) means that the curve described by f(n) is an upper bound for the resource needs of a function.

This means that if we were to draw a graph of the resource needs of a particular algorithm, it would fall under the curve described by f(n). What's more, it doesn't need to be under the exact curve described by f(n). It could be under a constant-scaled curve for f(n)... so instead of having to be under the n^2 curve, it can be under the 10n^2 curve or the 200n^2 curve. In fact it can be scaled by any constant. A constant is simply a number that does not change with n, so as n gets bigger, the constant stays the same. The actual value of the constant does not matter, though.

The other portion of the statement, n>=n0, means that T(n)<=cf(n) does not need to be true for all values of n. It means that as long as you can find a value n0 for which T(n)<=cf(n) is true, and it never becomes untrue for any n larger than n0, then you have met the criteria for the statement "T(n) is O(f(n))".
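
As a concrete illustration (an added example with a hypothetical T(n), not from the original text): suppose T(n) = 3n^2 + 10n + 5 and f(n) = n^2. Choosing c = 4 and n0 = 11 makes T(n) <= c*f(n) for every n >= n0, so T(n) is O(n^2). The small Python check below only verifies the inequality numerically over a finite range; it is a sanity check, not a proof.

# Hypothetical running-time function for this example: T(n) = 3n^2 + 10n + 5.
def T(n):
    return 3 * n**2 + 10 * n + 5

def f(n):
    return n**2

c, n0 = 4, 11  # witnesses chosen by hand for this particular T and f

# Check T(n) <= c * f(n) for every n from n0 up to 9999.
assert all(T(n) <= c * f(n) for n in range(n0, 10000))
print("T(n) <= 4 * f(n) held for every n checked from", n0, "to 9999")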

In summary, when we are looking at big-O, we are in essence looking for a description of the growth rate of the resource increase. The exact numbers do not actually matter in the end.

Algorithmic complexity is a very important topic in computer science. Knowing the complexity of algorithms allows you to answer questions such as
  • How long will a program run on an input?
  • How much space will it take?
  • Is the problem solvable?

These are important bases of comparison between different algorithms. An understanding of algorithmic complexity provides programmers with insight into the efficiency of their code. Complexity is also important to several theoretical areas in computer science, including algorithms, data structures, and complexity theory.

Asymptotic Analysis

When analyzing the running time or space usage of programs, we usually try to estimate the time or space as a function of the input size. For example, when analyzing the worst-case running time of a function that sorts a list of numbers, we will be concerned with how long it takes as a function of the length of the input list. For example, we say the standard insertion sort takes time T(n), where T(n) = c*n^2 + k for some constants c and k. In contrast, merge sort takes time T'(n) = c'*n*log_2(n) + k'.
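
For reference, here is a minimal insertion sort in Python (an added sketch, not the exact code those constants were measured from); the nested loops are what produce the roughly c*n^2 + k worst-case step count.

def insertion_sort(a):
    # Sort the list a in place. A reversed input forces the inner
    # loop to run about i times for each i, giving ~n^2/2 shifts.
    for i in range(1, len(a)):          # outer loop: n - 1 passes
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:    # inner loop: up to i shifts
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]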

The asymptotic behavior of a function f(n) (such as f(n)=c*n or f(n)=c*n^2, etc.) refers to the growth of f(n) as n gets large. We typically ignore small values of n, since we are usually interested in estimating how slow the program will be on large inputs. A good rule of thumb is: the slower the asymptotic growth rate, the better the algorithm (although this is often not the whole story).

By this measure, a linear algorithm (i.e., f(n)=d*n+k) is always asymptotically better than a quadratic one (e.g., f(n)=c*n^2+q). That is because for any given (positive) c, k, d, and q there is always some n at which the magnitude of c*n^2+q overtakes d*n+k. For moderate values of n, the quadratic algorithm could very well take less time than the linear one, for example if c is significantly smaller than d and/or k is significantly smaller than q. However, the linear algorithm will always be better for sufficiently large inputs. Remember to THINK BIG when working with asymptotic rates of growth.

Worst-Case and Average-Case Analysis

When we say that an algorithm runs in time T(n), we mean that T(n) is an upper bound on the running time that holds for all inputs of size n. This is called worst-case analysis. The algorithm may very well take less time on some inputs of size n, but it doesn't matter. If an algorithm takes T(n)=c*n^2+k steps on only a single input of each size n and only n steps on the rest, we still say that it is a quadratic algorithm.

A popular alternative to worst-case analysis is average-case analysis. Here we do not bound the worst case running time, but try to calculate the expected time spent on a randomly chosen input. This kind of analysis is generally harder, since it involves probabilistic arguments and often requires assumptions about the distribution of inputs that may be difficult to justify. On the other hand, it can be more useful because sometimes the worst-case behavior of an algorithm is misleadingly bad. A good example of this is the popular quicksort algorithm, whose worst-case running time on an input sequence of length n is proportional to n^2 but whose expected running time is proportional to n log n.
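
A minimal quicksort sketch in Python (an added illustration, not the textbook's own code) makes the gap concrete: with the first element as the pivot, an already-sorted input triggers the quadratic worst case, while a typical random input exhibits the expected n log n behaviour.

def quicksort(a):
    # First element as pivot: on an already-sorted input one side of
    # every partition is empty, so the work sums to ~n^2/2 comparisons.
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]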

Order of Growth and Big-O Notation

In estimating the running time of insert_sort (or any other program) we don't know what the constants c or k are. We know that they are constants of moderate size, but other than that they are not important; we have enough evidence from the asymptotic analysis to know that merge_sort is faster than the quadratic insert_sort, even though the constants may differ somewhat. (This does not always hold; the constants can sometimes make a difference, but in general it is a very good rule of thumb.)

We may not even be able to measure the constant c directly. For example, we may know that a given expression of the language, such as if, takes a constant number of machine instructions, but we may not know exactly how many. Moreover, the same sequence of instructions executed on a Pentium IV will take less time than on a Pentium II (although the difference will be roughly a constant factor). So these estimates are usually only accurate up to a constant factor anyway. For these reasons, we usually ignore constant factors in comparing asymptotic running times.

Computer scientists have developed a convenient notation for hiding the constant factor. We write O(n) (read: ''order n'') instead of ''cn for some constant c.'' Thus an algorithm is said to be O(n) or linear time if there is a fixed constant c such that for all sufficiently large n, the algorithm takes time at most cn on inputs of size n. An algorithm is said to be O(n^2) or quadratic time if there is a fixed constant c such that for all sufficiently large n, the algorithm takes time at most cn^2 on inputs of size n. O(1) means constant time.

Polynomial time means n^O(1), or n^c for some constant c. Thus any constant, linear, quadratic, or cubic (O(n^3)) time algorithm is a polynomial-time algorithm.

This is called big-O notation. It concisely captures the important differences in the asymptotic growth rates of functions.

One important advantage of big-O notation is that it makes algorithms much easier to analyze, since we can conveniently ignore low-order terms. For example, an algorithm that runs in time

10n^3 + 24n^2 + 3n log n + 144

is still a cubic algorithm, since

10n^3 + 24n^2 + 3n log n + 144
<= 10n^3 + 24n^3 + 3n^3 + 144n^3
<= (10 + 24 + 3 + 144)n^3
= O(n^3).

Of course, since we are ignoring constant factors, any two linear algorithms will be considered equally good by this measure. There may even be some situations in which the constant is so huge in a linear algorithm that even an exponential algorithm with a small constant may be preferable in practice. This is a valid criticism of asymptotic analysis and big-O notation. However, as a rule of thumb it has served us well. Just be aware that it is only a rule of thumb--the asymptotically optimal algorithm is not necessarily the best one.

Some common orders of growth seen often in complexity analysis are

O(1) constant
O(log n) logarithmic
O(n) linear
O(n log n) "n log n"
O(n^2) quadratic
O(n^3) cubic
n^O(1) polynomial
2^O(n) exponential

Here log means log_2, the logarithm base 2, although the logarithm base doesn't really matter since logarithms with different bases differ by a constant factor. Note also that 2^O(n) and O(2^n) are not the same!
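
To see why the base does not matter (a standard identity, added here for completeness):

    log_b(n) = log_2(n) / log_2(b) = (1 / log_2(b)) * log_2(n)

and 1/log_2(b) is just a constant, so log_b(n) is Θ(log_2(n)) for any fixed base b > 1. By contrast, 4^n = 2^(2n) is 2^O(n) but is not O(2^n), since 4^n / 2^n = 2^n is unbounded; that is why 2^O(n) and O(2^n) differ.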

Comparing Orders of Growth

O
Let f and g be functions from positive integers to positive integers. We say f is O(g(n)) (read: ''f is order g'') if g is an upper bound on f:  there exists a fixed constant c and a fixed n0 such that for all n≥n0,

f(n) ≤ cg(n).

Equivalently, f is O(g(n)) if the function f(n)/g(n) is bounded above by some constant.

o
We say f is o(g(n)) (read: "f is little-o of g'') if for all arbitrarily small real c > 0, for all but perhaps finitely many n,

f(n) ≤ cg(n).

Equivalently, f is o(g) if the function f(n)/g(n) tends to 0 as n tends to infinity. That is, f is small compared to g. If f is o(g) then f is also O(g).

Ω
We say that f is Ω(g(n)) (read: "f is omega of g") if g is a lower bound on f for large n. Formally, f is Ω(g) if there is a fixed constant c and a fixed n0 such that for all n>n0,

cg(n) ≤ f(n).

For example, any polynomial whose highest-order term is n^k is Ω(n^k). If f(n) is Ω(g(n)) then g(n) is O(f(n)). If f(n) is o(g(n)) then f(n) is not Ω(g(n)).

Θ
We say that f is Θ(g(n)) (read: "f is theta of g") if g is an accurate characterization of f for large n: it can be scaled so it is both an upper and a lower bound of f. That is, f is both O(g(n)) and Ω(g(n)). Expanding out the definitions of  Ω and O, f is Θ(g(n)) if there are fixed constants c1 and c2 and a fixed n0 such that for all n>n0,

c1·g(n) ≤ f(n) ≤ c2·g(n).

For example, any polynomial whose highest-order term is n^k is Θ(n^k). If f is Θ(g), then it is O(g) but not o(g). Sometimes people use O(g(n)) a bit informally to mean the stronger property Θ(g(n)); however, the two are different.

Here are some examples:

  • n + log n is O(n) and Θ(n), because for all n > 1, n < n + log n < 2n.
  • n^1000 is o(2^n), because n^1000/2^n tends to 0 as n tends to infinity.
  • For any fixed but arbitrarily small real number c, n log n is o(n^(1+c)), since n log n / n^(1+c) tends to 0. To see this, take the logarithm

    log(n log n / n^(1+c))
    = log(n log n) - log(n^(1+c))
    = log n + log log n - (1+c)log n
    = log log n - c log n

    and observe that it tends to negative infinity.

The meaning of an expression like O(n2) is really a set of functions: all the functions that are O(n2). When we say that f(n) is O(n2), we mean that f(n) is a member of this set. It is also common to write this as f(n) = O(g(n)) although it is not really an equality.

We now introduce some convenient rules for manipulating expressions involving order notation. These rules, which we state without proof, are useful for working with orders of growth. They are really statements about sets of functions. For example, we can read #2 as saying that the product of any two functions in O(f(n)) and O(g(n)) is in O(f(n)g(n)).

  1. c·n^m = O(n^k) for any constant c and any m ≤ k.
  2. O(f(n)) + O(g(n)) = O(f(n) + g(n)).
  3. O(f(n))O(g(n)) = O(f(n)g(n)).
  4. O(cf(n)) = O(f(n)) for any constant c.
  5. c is O(1) for any constant c.
  6. log_b(n) = O(log n) for any base b.

All of these rules (except #1) also hold for Θ.
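
As an added worked example of applying these rules (not from the original text), consider 3n^2 + 10n log n + 7:

    3n^2 + 10n log n + 7
    = O(n^2) + O(n log n) + O(1)    (rules 1, 4, and 5)
    = O(n^2 + n log n + 1)          (rule 2, applied twice)
    = O(3n^2)                       (n log n <= n^2 and 1 <= n^2 for n >= 2)
    = O(n^2)                        (rule 4)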

Shortcomings of asymptotic analysis

In practice, other considerations beside asymptotic analysis are important when choosing between algorithms. Sometimes, an algorithm with worse asymptotic behavior is preferable. For the sake of this discussion, let algorithm A be asymptotically better than algorithm B. Here are some common issues with algorithms that have better asymptotic behavior:

  • Implementation complexity
    Algorithms with better complexity are often (much) more complicated. This can increase coding time and the constants.
  • Small input sizes
    Asymptotic analysis ignores small input sizes. At small input sizes, constant factors or low order terms could dominate running time, causing B to outperform A.
  • Worst case versus average performance
    If A has better worst case performance than B, but the average performance of B given the expected input is better, then B could be a better choice than A. Conversely, if the worst case performance of B is unacceptable (say for life-threatening or mission-critical reasons), A must still be used.

Complexity of algorithms from class

Structure | Average | Worst case
Binary search tree | -- | O(max height of tree)
Red-black tree | -- | O(log n)
Splay tree | O(log n) | O(n)
DFS, BFS | -- | O(m + n)

Heap | Insert | Minimum | Extract-Min | Union | Decrease-Key | Delete
Binary heap | O(log n) | O(1) | O(log n) | O(n) | O(log n) | O(log n)
Binomial heap | O(log n) | O(log n) | O(log n) | O(log n) | O(log n) | O(log n)

Hi there!  This webpage covers the space and time Big-O complexities of common algorithms used in Computer Science.  When preparing for technical interviews in the past, I found myself spending hours crawling the internet putting together the best, average, and worst case complexities for search and sorting algorithms so that I wouldn't be stumped when asked about them.  Over the last few years, I've interviewed at several Silicon Valley startups, and also some bigger companies, like Google, Facebook, Yahoo, LinkedIn, and eBay, and each time that I prepared for an interview, I thought to myself "Why hasn't someone created a nice Big-O cheat sheet?".  So, to save all of you fine folks a ton of time, I went ahead and created one.  Enjoy! - Eric

Big-O Complexity Chart

[Chart: number of operations vs. number of elements for O(1), O(log n), O(n), O(n log n), O(n^2), O(2^n), and O(n!).]

Common Data Structure Operations

Data Structure | Average: Access / Search / Insertion / Deletion | Worst: Access / Search / Insertion / Deletion | Space (Worst)
Array | Θ(1) / Θ(n) / Θ(n) / Θ(n) | O(1) / O(n) / O(n) / O(n) | O(n)
Stack | Θ(n) / Θ(n) / Θ(1) / Θ(1) | O(n) / O(n) / O(1) / O(1) | O(n)
Queue | Θ(n) / Θ(n) / Θ(1) / Θ(1) | O(n) / O(n) / O(1) / O(1) | O(n)
Singly-Linked List | Θ(n) / Θ(n) / Θ(1) / Θ(1) | O(n) / O(n) / O(1) / O(1) | O(n)
Doubly-Linked List | Θ(n) / Θ(n) / Θ(1) / Θ(1) | O(n) / O(n) / O(1) / O(1) | O(n)
Skip List | Θ(log(n)) / Θ(log(n)) / Θ(log(n)) / Θ(log(n)) | O(n) / O(n) / O(n) / O(n) | O(n log(n))
Hash Table | N/A / Θ(1) / Θ(1) / Θ(1) | N/A / O(n) / O(n) / O(n) | O(n)
Binary Search Tree | Θ(log(n)) / Θ(log(n)) / Θ(log(n)) / Θ(log(n)) | O(n) / O(n) / O(n) / O(n) | O(n)
Cartesian Tree | N/A / Θ(log(n)) / Θ(log(n)) / Θ(log(n)) | N/A / O(n) / O(n) / O(n) | O(n)
B-Tree | Θ(log(n)) / Θ(log(n)) / Θ(log(n)) / Θ(log(n)) | O(log(n)) / O(log(n)) / O(log(n)) / O(log(n)) | O(n)
Red-Black Tree | Θ(log(n)) / Θ(log(n)) / Θ(log(n)) / Θ(log(n)) | O(log(n)) / O(log(n)) / O(log(n)) / O(log(n)) | O(n)
Splay Tree | N/A / Θ(log(n)) / Θ(log(n)) / Θ(log(n)) | N/A / O(log(n)) / O(log(n)) / O(log(n)) | O(n)
AVL Tree | Θ(log(n)) / Θ(log(n)) / Θ(log(n)) / Θ(log(n)) | O(log(n)) / O(log(n)) / O(log(n)) / O(log(n)) | O(n)
KD Tree | Θ(log(n)) / Θ(log(n)) / Θ(log(n)) / Θ(log(n)) | O(n) / O(n) / O(n) / O(n) | O(n)

Array Sorting Algorithms

Algorithm | Best | Average | Worst | Space (Worst)
Quicksort | Ω(n log(n)) | Θ(n log(n)) | O(n^2) | O(log(n))
Mergesort | Ω(n log(n)) | Θ(n log(n)) | O(n log(n)) | O(n)
Timsort | Ω(n) | Θ(n log(n)) | O(n log(n)) | O(n)
Heapsort | Ω(n log(n)) | Θ(n log(n)) | O(n log(n)) | O(1)
Bubble Sort | Ω(n) | Θ(n^2) | O(n^2) | O(1)
Insertion Sort | Ω(n) | Θ(n^2) | O(n^2) | O(1)
Selection Sort | Ω(n^2) | Θ(n^2) | O(n^2) | O(1)
Tree Sort | Ω(n log(n)) | Θ(n log(n)) | O(n^2) | O(n)
Shell Sort | Ω(n log(n)) | Θ(n(log(n))^2) | O(n(log(n))^2) | O(1)
Bucket Sort | Ω(n+k) | Θ(n+k) | O(n^2) | O(n)
Radix Sort | Ω(nk) | Θ(nk) | O(nk) | O(n+k)
Counting Sort | Ω(n+k) | Θ(n+k) | O(n+k) | O(k)
Cubesort | Ω(n) | Θ(n log(n)) | O(n log(n)) | O(n)

In our previous articles on Analysis of Algorithms, we briefly discussed asymptotic notations and their worst- and best-case performance. In this article, we discuss the analysis of algorithms using Big-O asymptotic notation in detail.

Big-O Analysis of Algorithms

The Big-O notation defines an upper bound of an algorithm; it bounds a function only from above. For example, consider the case of insertion sort: it takes linear time in the best case and quadratic time in the worst case. We can safely say that the time complexity of insertion sort is O(n^2). Note that O(n^2) also covers linear time.

The Big-O Asymptotic Notation gives us the Upper Bound Idea, mathematically described below:

f(n) = O(g(n)) if there exist a positive integer n0 and a positive constant c such that f(n) ≤ c·g(n) ∀ n ≥ n0

The general step wise procedure for Big-O runtime analysis is as follows:

  1. Figure out what the input is and what n represents.
  2. Express the maximum number of operations the algorithm performs in terms of n.
  3. Eliminate all but the highest-order terms.
  4. Remove all constant factors.
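
For instance (an added, hypothetical illustration), applying the four steps to a small function that counts duplicate pairs:

def count_duplicate_pairs(items):
    # Step 1: the input is the list `items`, and n = len(items).
    n = len(items)
    count = 0
    for i in range(n):               # outer loop: n iterations
        for j in range(i + 1, n):    # inner loop: up to n - 1 iterations
            if items[i] == items[j]:
                count += 1
    # Step 2: about n*(n-1)/2 comparisons = 0.5*n^2 - 0.5*n operations.
    # Step 3: keep only the highest-order term: 0.5*n^2.
    # Step 4: drop the constant factor: O(n^2).
    return count

print(count_duplicate_pairs([1, 2, 1, 3, 2]))  # 2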

Some useful properties of Big-O notation analysis are as follows:

Constant Multiplication:
If f(n) = c·g(n), then O(f(n)) = O(g(n)); where c is a nonzero constant.
Polynomial Function:
If f(n) = a0 + a1·n + a2·n^2 + ... + am·n^m, then O(f(n)) = O(n^m).
Summation Function:
If f(n) = f1(n) + f2(n) + ... + fm(n) and fi(n) ≤ fi+1(n) for all i = 1, 2, ..., m,
then O(f(n)) = O(max(f1(n), f2(n), ..., fm(n))).
Logarithmic Function:
If f(n) = log_a(n) and g(n) = log_b(n), then O(f(n)) = O(g(n));
all log functions grow in the same manner in terms of Big-O.
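
As an added worked illustration of two of these properties (not from the original text): if f(n) = 5 + 3n + 7n^2, the Polynomial Function property gives O(f(n)) = O(n^2). And if f(n) = n + n log n + n^2, then since n ≤ n log n ≤ n^2 for n ≥ 2, the Summation Function property gives O(f(n)) = O(max(n, n log n, n^2)) = O(n^2).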

Basically, this asymptotic notation is used to measure and compare the worst-case scenarios of algorithms theoretically. For any algorithm, the Big-O analysis should be straightforward as long as we correctly identify the operations that are dependent on n, the input size.

Runtime Analysis of Algorithms

In general, we measure and compare the worst-case theoretical running-time complexities of algorithms for performance analysis.
The fastest possible running time for any algorithm is O(1), commonly referred to as constant running time. In this case, the algorithm always takes the same amount of time to execute, regardless of the input size. This is the ideal runtime for an algorithm, but it is rarely achievable.
In practice, the performance (runtime) of an algorithm depends on n, that is, the size of the input or the number of operations required for each input item.
Algorithms can be classified as follows, from best to worst performance (running-time complexity):

A logarithmic algorithm – O(log n)
Runtime grows logarithmically in proportion to n.
A linear algorithm – O(n)
Runtime grows directly in proportion to n.
A superlinear algorithm – O(n log n)
Runtime grows in proportion to n log n, slightly faster than linear.
A polynomial algorithm – O(n^c)
Runtime grows faster than all of the above as a power of n.
An exponential algorithm – O(c^n)
Runtime grows even faster than any polynomial algorithm in n.
A factorial algorithm – O(n!)
Runtime grows the fastest and quickly becomes unusable for even small values of n.

Here, n is the input size and c is a positive constant.


Algorithmic Examples of Runtime Analysis:
Some of the examples of all those types of algorithms (in worst-case scenarios) are mentioned below:

Logarithmic algorithm – O(log n) – Binary Search.
Linear algorithm – O(n) – Linear Search.
Superlinear algorithm – O(n log n) – Heap Sort, Merge Sort.
Polynomial algorithm – O(n^c) – Strassen's Matrix Multiplication, Bubble Sort, Selection Sort, Insertion Sort, Bucket Sort.
Exponential algorithm – O(c^n) – Tower of Hanoi.
Factorial algorithm – O(n!) – Determinant Expansion by Minors, Brute-force Search algorithm for the Traveling Salesman Problem.

Mathematical Examples of Runtime Analysis:
The performances (Runtimes) of different orders of algorithms separate rapidly as n (the input size) gets larger. Let’s consider the mathematical example:

If n = 10,                  If n = 20,
    log(10) = 1;                log(20) = 1.30;
    10 = 10;                    20 = 20;
    10*log(10) = 10;            20*log(20) = 26.0;
    10^2 = 100;                 20^2 = 400;
    2^10 = 1024;                2^20 = 1048576;
    10! = 3628800;              20! = 2.432902e+18;

Memory Footprint Analysis of Algorithms

For the performance analysis of an algorithm, runtime measurement is not the only relevant metric; we also need to consider the memory usage of the program. This is referred to as the memory footprint of the algorithm, also known as space complexity.
Here too, we measure and compare the worst-case theoretical space complexities of algorithms for performance analysis.

It basically depends on two major aspects described below:

  • Firstly, the implementation of the program is responsible for memory usage. For example, a recursive implementation typically reserves more memory (for the call stack) than the corresponding iterative implementation of the same problem; see the sketch after this list.
  • The other is n, the input size, or the amount of storage required for each input item. For example, a simple algorithm on a very large input can consume more memory than a complex algorithm on a small input.
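
Here is the sketch mentioned above (an added Python illustration; the recursion-vs-iteration claim is a rule of thumb, not a guarantee): the recursive sum keeps O(n) stack frames alive at once, while the iterative version needs only O(1) extra space.

def sum_recursive(n):
    # Each call adds a stack frame, so extra memory grows with n: O(n) space.
    if n == 0:
        return 0
    return n + sum_recursive(n - 1)

def sum_iterative(n):
    # A single accumulator, reused on every step: O(1) extra space.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

print(sum_recursive(100), sum_iterative(100))  # 5050 5050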

Algorithmic Examples of Memory Footprint Analysis: The following algorithms are classified from best to worst performance (space complexity), based on their worst-case scenarios:

 Ideal algorithm - O(1) - Linear Search, Binary Search,
    Bubble Sort, Selection Sort, Insertion Sort, Heap Sort, Shell Sort.
 Logarithmic algorithm - O(log n) - Quick Sort (recursion stack).
 Linear algorithm - O(n) - Merge Sort.
 Linear algorithm - O(n + k) - Radix Sort.

Space-Time Tradeoff and Efficiency

There is usually a trade-off between optimal memory use and runtime performance.
In general, space efficiency and time efficiency sit at two opposite ends, and each point between them trades some of one for the other: often, the more time efficiency you want, the more space you must spend, and vice versa.
For example, the Mergesort algorithm is exceedingly fast but requires a lot of space for its operations. At the other extreme, Bubble Sort is exceedingly slow but requires only the minimum space.
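
A classic small example of this trade-off (an added sketch, not part of the original text): caching previously computed Fibonacci numbers spends O(n) extra memory to cut the roughly exponential running time of the naive recursion down to O(n).

from functools import lru_cache

def fib_naive(n):
    # No extra memory beyond the call stack, but roughly O(2^n) time.
    if n <= 1:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_cached(n):
    # O(n) extra memory for the cache, but only O(n) time.
    if n <= 1:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)

print(fib_naive(25), fib_cached(25))  # both print 75025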

In conclusion, finding an algorithm that runs in less time and also requires less memory can make a huge difference in how well it performs.


=============================================================================

Big-O notation explained by a self-taught programmer

This is the first in a three post series. The second post talks about how to calculate Big-O. The third article talks about understanding the formal definition of Big-O.

Big-O notation used to be a really scary concept for me. I thought this was how "real" programmers talked about their code. It was all the more scary because the academic descriptions (such as Wikipedia) made very little sense to me. This is frustrating because the underlying concepts aren't actually that hard.

Simply put, Big-O notation is how programmers talk about algorithms. Algorithms are another scary topic which I'll cover in another post, but for our purposes, let's say that "algorithm" means a function in your program (which isn't too far off). A function's Big-O notation is determined by how it responds to different inputs. How much slower is it if we give it a list of 1000 things to work on instead of a list of 1 thing?

Consider this code:

def item_in_list(to_check, the_list):
    for item in the_list:
        if to_check == item:
            return True
    return False

So if we call this function like item_in_list(2, [1,2,3]), it should be pretty quick. We loop over each thing in the list and if we find the first argument to our function, return True. If we get to the end and we didn't find it, return False.

The "complexity" of this function is O(n). I'll explain what this means in just a second, but let's break down this mathematical syntax. O(n) is read "Order of N" because the O function is also known as the Order function. I think this is because we're doing approximation, which deals in "orders of magnitude".

"Orders of magnitude" is YET ANOTHER mathematical term which basically tells the difference between classes of numbers. Think the difference between 10 and 100. If you picture 10 of your closest friends and 100 people, that's a really big difference. Similarly, the difference between 1,000 and 10,000 is pretty big (in fact, its the difference between a junker car and a lightly used one). It turns out that in approximation, as long as you're within an order of magnitude, you're pretty close. If you were to guess the number of gumballs in a machine, you'd be within an order of magnitude if you said 200 gumballs. 10,000 gumballs would be WAY off.

Figure 1: A gumball machine whose number of gumballs is probably within an order of magnitude of 200.

Back to dissecting O(n), this says that if we were to graph the time it takes to run this function with different sized inputs (e.g. an array of 1 item, 2 items, 3 items, etc), we'd see that it approximately corresponds to the number of items in the array. This is called a linear graph. This means that the line is basically straight if you were to graph it.

Some of you may have caught that if, in the code sample above, our item was always the first item in the list, our code would be really fast! This is true, but Big-O is all about the approximate worst-case performance of doing something. The worst case for the code above is that the thing we're searching for isn't in the list at all. (Note: The math term for this is "upper bound", which means its talking about the mathematic limit of awfulness).

If you wanted to see a graph for these functions, you ignore the O() function and change the variable n for x. You can then type that into Wolfram Alpha as "plot x" which will show you a diagonal line. The reason you swap out n for x is that their graphing program wants x as its variable name because it corresponds to the x axis. The x-axis getting bigger from left to right corresponds to giving bigger and bigger arrays to your function. The y-axis represents time, so the higher the line, the slower it is.

Figure 2: Runtime characteristics of an O(n) function

So what are some other examples of this?

Consider this function:

def is_none(item):
    return item is None

This is a bit of a silly example, but bear with me. This function is called O(1) which is called "constant time". What this means is no matter how big our input is, it always takes the same amount of time to compute things. If you go back to Wolfram and plot 1, you'll see that it always stays the same, no matter how far right we go. If you pass in a list of 1 million integers, it'll take about the same time as if you were going to pass in a list of 1 integer. Constant time is considered the best case scenario for a function.

Figure 3: Runtime characteristics of an O(1) function

Consider this function:

def all_combinations(the_list):
    results = []
    for item in the_list:
        for inner_item in the_list:
            results.append((item, inner_item))
    return results

This matches every item in the list with every other item in the list. If we gave it an array [1,2,3], we'd get back [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]. This is part of the field of combinatorics (warning: scary math terms!), which is the mathematical field which studies combinations of things. This function (or algorithm, if you want to sound fancy) is considered O(n^2). This is because for every item in the list (aka n for the input size), we have to do n more operations. So n * n == n^2.

Below is a comparison of each of these graphs, for reference. You can see that an O(n^2) function will get slow very quickly where as something that operates in constant time will be much better. This is particularly useful when it comes to data structures, which I'll post about soon.

Figure 4: Comparison of O(n^2) vs O(n) vs O(1) functions

This is a pretty high-level overview of Big-O notation, but hopefully it gets you acquainted with the topic. There's a Coursera course which can give you a more in-depth view of this topic, but be warned that it will hop into mathematical notation very quickly. If anything here doesn't make sense, send me an email.

Update: I've also written about how to calculate Big-O.


A beginner's guide to Big O notation

Big O notation is used in Computer Science to describe the performance or complexity of an algorithm. Big O specifically describes the worst-case scenario, and can be used to describe the execution time required or the space used (e.g. in memory or on disk) by an algorithm.

Anyone who's read Programming Pearls or any other Computer Science books and doesn’t have a grounding in Mathematics will have hit a wall when they reached chapters that mention O(N log N) or other seemingly crazy syntax. Hopefully this article will help you gain an understanding of the basics of Big O and Logarithms.

As a programmer first and a mathematician second (or maybe third or fourth) I found the best way to understand Big O thoroughly was to produce some examples in code. So, below are some common orders of growth along with descriptions and examples where possible.

O(1)

O(1) describes an algorithm that will always execute in the same time (or space) regardless of the size of the input data set.

bool IsFirstElementNull(IList<string> elements)
{
    return elements[0] == null;
}

O(N)

O(N) describes an algorithm whose performance will grow linearly and in direct proportion to the size of the input data set. The example below also demonstrates how Big O favours the worst-case performance scenario; a matching string could be found during any iteration of the for loop and the function would return early, but Big O notation will always assume the upper limit where the algorithm will perform the maximum number of iterations.

bool ContainsValue(IList<string> elements, string value)
{
    foreach (var element in elements)
    {
        if (element == value) return true;
    }

    return false;
}

O(N^2)

O(N^2) represents an algorithm whose performance is directly proportional to the square of the size of the input data set. This is common with algorithms that involve nested iterations over the data set. Deeper nested iterations will result in O(N^3), O(N^4), etc.

bool ContainsDuplicates(IList<string> elements)
{
    for (var outer = 0; outer < elements.Count; outer++)
    {
        for (var inner = 0; inner < elements.Count; inner++)
        {
            // Don't compare with self
            if (outer == inner) continue;

            if (elements[outer] == elements[inner]) return true;
        }
    }

    return false;
}

O(2^N)

O(2^N) denotes an algorithm whose growth doubles with each addition to the input data set. The growth curve of an O(2^N) function is exponential - starting off very shallow, then rising meteorically. An example of an O(2^N) function is the recursive calculation of Fibonacci numbers:

int Fibonacci(int number)
{
    if (number <= 1) return number;

    return Fibonacci(number - 2) + Fibonacci(number - 1);
}

Logarithms

Logarithms are slightly trickier to explain so I'll use a common example:

Binary search is a technique used to search sorted data sets. It works by selecting the middle element of the data set, essentially the median, and compares it against a target value. If the values match it will return success. If the target value is higher than the value of the probe element it will take the upper half of the data set and perform the same operation against it. Likewise, if the target value is lower than the value of the probe element it will perform the operation against the lower half. It will continue to halve the data set with each iteration until the value has been found or until it can no longer split the data set.

This type of algorithm is described as O(log N). The iterative halving of data sets described in the binary search example produces a growth curve that rises steeply at the beginning and gradually flattens out as the size of the data set increases, e.g. an input data set containing 10 items takes one second to complete, a data set containing 100 items takes two seconds, and a data set containing 1,000 items will take three seconds. Doubling the size of the input data set has little effect on its growth, as after a single iteration of the algorithm the data set will be halved and therefore on a par with an input data set half the size. This makes algorithms like binary search extremely efficient when dealing with large data sets.
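
A minimal binary search in Python (an added sketch; this article's other examples use C#, but the idea is identical): each iteration halves the remaining range, so at most about log2(N) + 1 probes are needed.

def binary_search(sorted_items, target):
    # Return an index of target in sorted_items, or -1 if it is absent.
    # Each pass halves the search range, hence O(log N) iterations.
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

print(binary_search([1, 3, 5, 7, 9, 11], 9))  # 4
print(binary_search([1, 3, 5, 7, 9, 11], 4))  # -1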

This article only covers the very basics of Big O and logarithms. For a more in-depth explanation take a look at their respective Wikipedia entries: Big O Notation, Logarithms.