Big-O, Little-o, Omega, and Theta

Big-O, Little-o, Omega, and Theta are formal notations for stating the growth of the resource needs (time and storage) of an algorithm. There are four basic notations used when describing resource needs: O(f(n)), o(f(n)), Ω(f(n)), and Θ(f(n)), pronounced Big-O, Little-o, Omega, and Theta respectively. Formally:
"T(n) is O(f(n))" iff for some constants c and n_{0}, T(n) <= c·f(n) for all n >= n_{0}
"T(n) is Ω(f(n))" iff for some constants c and n_{0}, T(n) >= c·f(n) for all n >= n_{0}
"T(n) is Θ(f(n))" iff T(n) is O(f(n)) AND T(n) is Ω(f(n))
"T(n) is o(f(n))" iff T(n) is O(f(n)) AND T(n) is NOT Θ(f(n))

Informally:

"T(n) is O(f(n))" basically means that f(n) describes the upper bound for T(n)
"T(n) is Ω(f(n))" basically means that f(n) describes the lower bound for T(n)
"T(n) is Θ(f(n))" basically means that f(n) describes the exact bound for T(n)
"T(n) is o(f(n))" basically means that f(n) is the upper bound for T(n), but that T(n) never grows as fast as f(n)
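These definitions can be checked numerically over a finite range of n. The sketch below is illustrative (the functions T and f are invented for the example, and scanning a finite range is evidence for a Big-O claim, not a proof): it tests whether given witnesses c and n_{0} satisfy the condition.

```python
# Numerically check "T(n) is O(f(n))" for candidate witnesses c and n0
# by scanning a finite range of n. Passing the scan supports the claim;
# failing it disproves those particular witnesses.

def holds_big_o(T, f, c, n0, n_max=10_000):
    """Return True if T(n) <= c*f(n) for every n0 <= n <= n_max."""
    return all(T(n) <= c * f(n) for n in range(n0, n_max + 1))

T = lambda n: 3 * n * n + 5 * n   # an example "running time"
f = lambda n: n * n               # the candidate bounding function

# T(n) is O(n^2): c=4 and n0=5 work, since 5n <= n^2 once n >= 5.
print(holds_big_o(T, f, c=4, n0=5))   # True
# c=3 fails: 3n^2 + 5n > 3n^2 for every n >= 1.
print(holds_big_o(T, f, c=3, n0=1))   # False
```

The same helper with the inequality reversed would check the Ω definition, and checking both would check Θ.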
Another way of saying this:

"T(n) is O(f(n))": growth rate of T(n) <= growth rate of f(n)
"T(n) is Ω(f(n))": growth rate of T(n) >= growth rate of f(n)
"T(n) is Θ(f(n))": growth rate of T(n) == growth rate of f(n)
"T(n) is o(f(n))": growth rate of T(n) < growth rate of f(n)
An easy way to think about big-O

The math in big-O analysis can often intimidate students. One of the simplest ways to think about big-O analysis is that it is basically a rating system for your algorithms (like movie ratings). It tells you the kind of resource needs you can expect the algorithm to exhibit as your data gets bigger and bigger. From best (least resource requirements) to worst, the rankings are: O(1), O(logn), O(n), O(nlogn), O(n^{2}), O(n^{3}), O(2^{n}). Think about the graphs in the growth rate section and the way each curve looks. That is the most important thing to understand about algorithm analysis.

What all this means

Let's take a closer look at the formal definition for big-O analysis:

"T(n) is O(f(n))" if for some constants c and n_{0}, T(n) <= c·f(n) for all n >= n_{0}

The way to read the above statement is as follows.
This means that if we were to draw a graph of the resource needs of a particular algorithm, it would fall under the curve described by f(n). What's more, it doesn't need to be under the exact curve described by f(n). It could be under a constant scaled curve for f(n)... so instead of having to be under the n^{2} curve, it can be under the 10n^{2} curve or the 200n^{2} curve. In fact it can be any constant, as long as it is a constant. A constant is simply a number that does not change with n. So as n gets bigger, you cannot change what the constant is. The actual value of the constant does not matter though. The other portion of the statement, n >= n_{0}, means that T(n) <= c·f(n) does not need to be true for all values of n. It means that as long as you can find a value n_{0} for which T(n) <= c·f(n) is true, and it never becomes untrue for all n larger than n_{0}, then you have met the criteria for the statement "T(n) is O(f(n))".
In summary, when we are looking at big-O, we are in essence looking for a description of the growth rate of the resource increase. The exact numbers do not actually matter in the end. Algorithmic complexity is a very important topic in computer science. Knowing the complexity of algorithms allows you to answer questions such as how long a program will run on a large input, and which of several algorithms will scale best as the input grows.
These are important bases of comparison between different algorithms. An understanding of algorithmic complexity provides programmers with insight into the efficiency of their code. Complexity is also important to several theoretical areas in computer science, including algorithms, data structures, and complexity theory.

Asymptotic Analysis
When analyzing the running time or space usage of programs, we usually try to estimate the time or space as a function of the input size. For example, when analyzing the worst case running time of a function that sorts a list of numbers, we will be concerned with how long it takes as a function of the length of the input list. For example, we say the standard insertion sort takes time T(n) where T(n) = c·n^{2} + k for some constants c and k. In contrast, merge sort takes time T'(n) = c'·n·log_{2}(n) + k'. The asymptotic behavior of a function f(n) (such as f(n) = c·n or f(n) = c·n^{2}, etc.) refers to the growth of f(n) as n gets large. We typically ignore small values of n, since we are usually interested in estimating how slow the program will be on large inputs. A good rule of thumb is: the slower the asymptotic growth rate, the better the algorithm (although this is often not the whole story). By this measure, a linear algorithm (i.e., f(n) = d·n + k) is always asymptotically better than a quadratic one (e.g., f(n) = c·n^{2} + q). That is because for any given (positive) c, k, d, and q there is always some n at which the magnitude of c·n^{2} + q overtakes d·n + k. For moderate values of n, the quadratic algorithm could very well take less time than the linear one, for example if c is significantly smaller than d and/or k is significantly smaller than q. However, the linear algorithm will always be better for sufficiently large inputs. Remember to THINK BIG when working with asymptotic rates of growth.

Worst-Case and Average-Case Analysis
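As a concrete preview of this distinction, the sketch below counts the comparisons made by a simplified quicksort with a first-element pivot (my own illustrative code, not from the text). An already-sorted input triggers the quadratic worst case, while a shuffled input shows the much cheaper typical behaviour:

```python
import random

def quicksort_comparisons(xs):
    """Count comparisons made by a quicksort that always picks xs[0] as pivot."""
    if len(xs) <= 1:
        return 0
    pivot, rest = xs[0], xs[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return len(rest) + quicksort_comparisons(left) + quicksort_comparisons(right)

n = 300
worst = quicksort_comparisons(list(range(n)))                # sorted input: n(n-1)/2
typical = quicksort_comparisons(random.sample(range(n), n))  # shuffled input: ~n log n
print(worst, typical)
```

The sorted input forces every recursive call to peel off a single element, giving the quadratic comparison count; a random permutation splits the list roughly in half at each step.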
When we say that an algorithm runs in time T(n), we mean that T(n) is an upper bound on the running time that holds for all inputs of size n. This is called worst-case analysis. The algorithm may very well take less time on some inputs of size n, but it doesn't matter. If an algorithm takes T(n) = c·n^{2} + k steps on only a single input of each size n and only n steps on the rest, we still say that it is a quadratic algorithm. A popular alternative to worst-case analysis is average-case analysis. Here we do not bound the worst case running time, but try to calculate the expected time spent on a randomly chosen input. This kind of analysis is generally harder, since it involves probabilistic arguments and often requires assumptions about the distribution of inputs that may be difficult to justify. On the other hand, it can be more useful because sometimes the worst-case behavior of an algorithm is misleadingly bad. A good example of this is the popular quicksort algorithm, whose worst-case running time on an input sequence of length n is proportional to n^{2} but whose expected running time is proportional to n log n.

Order of Growth and Big-O Notation
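Before formalizing order of growth, it is worth seeing numerically why constant factors do not change which algorithm eventually wins. The constants below are invented purely for illustration:

```python
# A quadratic cost c*n^2 + q overtakes a linear cost d*n + k eventually,
# even when the quadratic algorithm's constants are far more favourable.
c, q = 0.001, 1       # quadratic algorithm: tiny constant factor
d, k = 10, 1000       # linear algorithm: large constant factor

quadratic = lambda n: c * n * n + q
linear = lambda n: d * n + k

n = 1
while quadratic(n) <= linear(n):   # find the crossover point
    n += 1
print(n)  # past this input size, the linear algorithm is always cheaper
```

For small n the quadratic algorithm is much cheaper here, yet there is still a finite crossover point beyond which the linear one wins forever.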
When estimating the running time of a program, we may not even be able to measure the constant c directly. For example, we may know that a given expression of the language, such as if, takes a constant number of machine instructions, but we may not know exactly how many. Moreover, the same sequence of instructions executed on a Pentium IV will take less time than on a Pentium II (although the difference will be roughly a constant factor). So these estimates are usually only accurate up to a constant factor anyway. For these reasons, we usually ignore constant factors in comparing asymptotic running times. Computer scientists have developed a convenient notation for hiding the constant factor. We write O(n) (read: ''order n'') instead of ''cn for some constant c.'' Thus an algorithm is said to be O(n) or linear time if there is a fixed constant c such that for all sufficiently large n, the algorithm takes time at most cn on inputs of size n. An algorithm is said to be O(n^{2}) or quadratic time if there is a fixed constant c such that for all sufficiently large n, the algorithm takes time at most cn^{2} on inputs of size n. O(1) means constant time. Polynomial time means n^{O(1)}, or n^{c} for some constant c. Thus any constant, linear, quadratic, or cubic (O(n^{3})) time algorithm is a polynomial-time algorithm. This is called big-O notation. It concisely captures the important differences in the asymptotic growth rates of functions. One important advantage of big-O notation is that it makes algorithms much easier to analyze, since we can conveniently ignore low-order terms. For example, an algorithm that runs in time 10n^{3} + 24n^{2} + 3n log n + 144 is still a cubic algorithm, since 10n^{3} + 24n^{2} + 3n log n + 144 <= c·n^{3} for a suitable constant c once n is large enough. Of course, since we are ignoring constant factors, any two linear algorithms will be considered equally good by this measure.
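The claim that low-order terms stop mattering can be checked numerically. For the cubic running time quoted above, the ratio T(n)/n^{3} settles toward the leading constant 10 as n grows (a quick sketch, taking log as base 2):

```python
import math

def T(n):
    # The example running time from the text: 10n^3 + 24n^2 + 3n log n + 144.
    return 10 * n**3 + 24 * n**2 + 3 * n * math.log2(n) + 144

for n in (10, 100, 1000, 10000):
    print(n, T(n) / n**3)   # the ratio approaches 10 as n grows
```

At n = 10 the low-order terms still contribute noticeably, but by n = 10000 the ratio is within a fraction of a percent of 10.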
There may even be some situations in which the constant is so huge in a linear algorithm that even an exponential algorithm with a small constant may be preferable in practice. This is a valid criticism of asymptotic analysis and big-O notation. However, as a rule of thumb it has served us well. Just be aware that it is only a rule of thumb--the asymptotically optimal algorithm is not necessarily the best one. Some common orders of growth seen often in complexity analysis are O(1), O(log n), O(n), O(n log n), O(n^{2}), O(n^{3}), and O(2^{n}).
Here log means log_{2} or the logarithm base 2, although the logarithm base doesn't really matter since logarithms with different bases differ by a constant factor. Note also that 2^{O(n)} and O(2^{n}) are not the same!

Comparing Orders of Growth
Here are some examples: n is O(n^{2}), and 3n^{2} + 2n is O(n^{2}); in fact 3n^{2} + 2n is Θ(n^{2}). On the other hand, n^{2} is not O(n), and log n is o(n), since log n grows strictly more slowly than n.
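These comparisons can be illustrated numerically (the specific functions are my own picks): a function that is little-o of another has a ratio tending to 0, a Θ pair keeps a bounded nonzero ratio, and logarithms in different bases differ by an exact constant factor.

```python
import math

for n in (10, 1000, 100_000):
    print(n / n**2)                      # n is o(n^2): ratio shrinks toward 0
    print((3 * n**2 + 2 * n) / n**2)     # 3n^2 + 2n is Theta(n^2): ratio -> 3
    print(math.log2(n) / math.log10(n))  # constant ratio log2(10) for every n
```

The last line is why the base of the logarithm is irrelevant to the order of growth: the ratio is the same constant at every n.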
The meaning of an expression like O(n^{2}) is really a set of functions: all the functions that are O(n^{2}). When we say that f(n) is O(n^{2}), we mean that f(n) is a member of this set. It is also common to write this as f(n) = O(g(n)), although it is not really an equality. We now introduce some convenient rules for manipulating expressions involving order notation. These rules, which we state without proof, are useful for working with orders of growth. They are really statements about sets of functions:

1. n^{a} is O(n^{b}) whenever a <= b.
2. The product of a function in O(f(n)) and a function in O(g(n)) is in O(f(n)g(n)).
3. The sum of a function in O(f(n)) and a function in O(g(n)) is in O(max(f(n), g(n))).
4. If c is a positive constant, then c·f(n) is O(f(n)).

For example, we can read #2 as saying that the product of any two functions in O(f(n)) and O(g(n)) is in O(f(n)g(n)).
All of these rules (except #1) also hold for Θ as well.

Shortcomings of asymptotic analysis

In practice, other considerations besides asymptotic analysis are important when choosing between algorithms. Sometimes, an algorithm with worse asymptotic behavior is preferable. For the sake of this discussion, let algorithm A be asymptotically better than algorithm B. Even so, B may be the better choice in practice: A may carry enormous constant factors hidden by the notation, A may be much harder to implement and verify correctly, or A's advantage may only appear at input sizes far larger than the ones that actually occur.
Complexity of algorithms from class
Hi there! This webpage covers the space and time Big-O complexities of common algorithms used in Computer Science. When preparing for technical interviews in the past, I found myself spending hours crawling the internet putting together the best, average, and worst case complexities for search and sorting algorithms so that I wouldn't be stumped when asked about them. Over the last few years, I've interviewed at several Silicon Valley startups, and also some bigger companies, like Google, Facebook, Yahoo, LinkedIn, and eBay, and each time that I prepared for an interview, I thought to myself "Why hasn't someone created a nice Big-O cheat sheet?". So, to save all of you fine folks a ton of time, I went ahead and created one. Enjoy! - Eric

Big-O Complexity Chart
[Chart: number of Operations vs. number of Elements for the common complexity classes, from best to worst: O(1), O(log n), O(n), O(n log n), O(n^2), O(2^n), O(n!)]
Common Data Structure Operations
Array Sorting Algorithms
In our previous articles on Analysis of Algorithms, we discussed asymptotic notations and their worst and best case performance in brief. In this article, we discuss the analysis of algorithms using Big-O asymptotic notation in complete detail.
Big-O Analysis of Algorithms

The Big-O notation defines an upper bound of an algorithm; it bounds a function only from above. For example, consider the case of Insertion Sort. It takes linear time in the best case and quadratic time in the worst case. We can safely say that the time complexity of Insertion Sort is O(n^2); note that O(n^2) also covers linear time. The Big-O asymptotic notation gives us the upper-bound idea, mathematically described below:

O(g(n)) = { f(n) : there exist positive constants c and n_{0} such that 0 <= f(n) <= c·g(n) for all n >= n_{0} }
The general step-wise procedure for Big-O runtime analysis is as follows:

1. Figure out what the input is and what n represents.
2. Express the maximum number of operations the algorithm performs in terms of n.
3. Eliminate all but the highest-order terms.
4. Remove all constant factors.
Some useful properties of Big-O notation analysis are as follows:

- Constant multiplication: if f(n) = c·g(n) for a constant c > 0, then O(f(n)) = O(g(n)).
- Summation: if f(n) = O(g(n)) and d(n) = O(e(n)), then f(n) + d(n) = O(max(g(n), e(n))).
- Multiplication: if f(n) = O(g(n)) and d(n) = O(e(n)), then f(n)·d(n) = O(g(n)·e(n)).
Basically, this asymptotic notation is used to measure and compare the worst-case scenarios of algorithms theoretically. For any algorithm, the Big-O analysis should be straightforward as long as we correctly identify the operations that are dependent on n, the input size.

Runtime Analysis of Algorithms

In general, we measure and compare the worst-case theoretical running time complexities of algorithms for performance analysis.
Here, n is the input size and c is a positive constant.

Algorithmic Examples of Runtime Analysis:

Logarithmic algorithm - O(log n) - Binary Search.
Linear algorithm - O(n) - Linear Search.
Superlinear algorithm - O(n log n) - Heap Sort, Merge Sort.
Polynomial algorithm - O(n^{2}) - Bubble Sort, Selection Sort, Insertion Sort.
Exponential algorithm - O(2^{n}) - Tower of Hanoi.
Factorial algorithm - O(n!) - Generating all permutations of a list.
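To make one of these examples concrete, here is a sketch (my own simplified code) that counts element comparisons in insertion sort. The best case (already sorted) needs only n - 1 comparisons, while the worst case (reverse sorted) needs n(n-1)/2, which is where the O(n^2) classification comes from:

```python
def insertion_sort_comparisons(xs):
    """Sort a copy of xs with insertion sort and return the comparison count."""
    xs = list(xs)
    comparisons = 0
    for i in range(1, len(xs)):
        j = i
        while j > 0:
            comparisons += 1                      # compare xs[j-1] with xs[j]
            if xs[j - 1] <= xs[j]:
                break                             # already in order: stop early
            xs[j - 1], xs[j] = xs[j], xs[j - 1]   # otherwise swap and keep going
            j -= 1
    return comparisons

n = 100
best = insertion_sort_comparisons(range(n))          # sorted input: n - 1
worst = insertion_sort_comparisons(range(n, 0, -1))  # reversed input: n(n-1)/2
print(best, worst)
```

Both counts sit below c·n^2 for a suitable constant, which is exactly what the O(n^2) upper bound asserts.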
Mathematical Examples of Runtime Analysis (log here means log base 2):

n = 10: log(n) ≈ 3.32; n = 10; n·log(n) ≈ 33.2; n^{2} = 100; 2^{n} = 1024; n! = 3628800
n = 20: log(n) ≈ 4.32; n = 20; n·log(n) ≈ 86.4; n^{2} = 400; 2^{n} = 1048576; n! ≈ 2.43 × 10^{18}

Memory Footprint Analysis of Algorithms

For the performance analysis of an algorithm, runtime measurement is not the only relevant metric; we also need to consider the memory usage of the program. This is referred to as the memory footprint of the algorithm, also known as space complexity. It basically depends on two major aspects: the memory needed for the program's data and variables, and the memory consumed by the call stack (for example, by recursive function calls).
Algorithmic Examples of Memory Footprint Analysis: The algorithms below are classified from best to worst space complexity, based on worst-case auxiliary space:

Ideal algorithm - O(1) - Linear Search, Binary Search, Bubble Sort, Selection Sort, Insertion Sort, Heap Sort, Shell Sort.
Logarithmic algorithm - O(log n) - Quick Sort (recursion stack).
Linear algorithm - O(n) - Merge Sort.
Linear algorithm - O(n+k) - Radix Sort.

Space-Time Tradeoff and Efficiency

There is usually a trade-off between optimal memory use and runtime performance. In conclusion, finding an algorithm that runs in less time while also requiring less memory can make a huge difference in how well it performs.

=============================================================================

Big-O notation explained by a self-taught programmer

This is the first in a three post series. The second post talks about how to calculate Big-O. The third article talks about understanding the formal definition of Big-O. Big-O notation used to be a really scary concept for me. I thought this was how "real" programmers talked about their code. It was all the more scary because the academic descriptions (such as Wikipedia) made very little sense to me. This is frustrating because the underlying concepts aren't actually that hard. Simply put, Big-O notation is how programmers talk about algorithms. Algorithms are another scary topic which I'll cover in another post, but for our purposes, let's say that "algorithm" means a function in your program (which isn't too far off). A function's Big-O notation is determined by how it responds to different inputs. How much slower is it if we give it a list of 1000 things to work on instead of a list of 1 thing? Consider this code:

    def item_in_list(to_check, the_list):
        for item in the_list:
            if to_check == item:
                return True
        return False
So if we call this function like item_in_list(3, [1, 2, 3]), it should be pretty quick: we loop over each item in the list, and if we find the first argument, we return True.
The "complexity" of this function is O(n): in the worst case we have to look at every item in the list once, so the work grows in step with the size of the input. "Orders of magnitude" is YET ANOTHER mathematical term which basically tells the difference between classes of numbers. Think the difference between 10 and 100. If you picture 10 of your closest friends and 100 people, that's a really big difference. Similarly, the difference between 1,000 and 10,000 is pretty big (in fact, it's the difference between a junker car and a lightly used one). It turns out that in approximation, as long as you're within an order of magnitude, you're pretty close. If you were to guess the number of gumballs in a machine, you'd be within an order of magnitude if you said 200 gumballs. 10,000 gumballs would be WAY off.
Figure 1: A gumball machine whose number of gumballs is probably within an order of magnitude of 200.
Back to dissecting item_in_list. Some of you may have caught that if, in the code sample above, our item was always the first item in the list, our code would be really fast! This is true, but Big-O is all about the approximate worst-case performance of doing something. The worst case for the code above is that the thing we're searching for isn't in the list at all. (Note: The math term for this is "upper bound", which means it's talking about the mathematical limit of awfulness).
If you wanted to see a graph for these functions, you would ignore the constants and plot only the dominant term, since that is what matters as the input grows. So what are some other examples of this? Consider this function:

    def is_none(item):
        return item is None
This is a bit of a silly example, but bear with me. This function is called O(1), or constant time: no matter how big the input is, it always does the same amount of work. Consider this function:

    def all_combinations(the_list):
        results = []
        for item in the_list:
            for inner_item in the_list:
                results.append((item, inner_item))
        return results
This matches every item in the list with every other item in the list. If we gave it an array of n items, we would get back n^2 pairs, so the work grows with the square of the input size; this function is O(n^2).
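A quick check of that claim (repeating the function here so the snippet runs on its own): a list of n items always yields exactly n * n pairs, so doubling the input quadruples the output.

```python
def all_combinations(the_list):
    results = []
    for item in the_list:
        for inner_item in the_list:
            results.append((item, inner_item))
    return results

for n in (3, 10, 100):
    pairs = all_combinations(list(range(n)))
    print(n, len(pairs))   # 9, 100, 10000: the count is always n * n
```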
Below is a comparison of each of these graphs, for reference. You can see that an O(n^2) function gets slow much faster than an O(n) one, which in turn grows faster than O(1). This is a pretty high level overview of Big-O notation, but hopefully gets you acquainted with the topic. There's a coursera course which can give you a more in depth view into this topic, but be warned that it will hop into mathematic notation very quickly. If anything here doesn't make sense, send me an email. Update: I've also written about how to calculate Big-O. I'm thinking of writing a book on this topic. If this is something you'd like to see, please express your interest in it here.

A beginner's guide to Big O notation

Big O notation is used in Computer Science to describe the performance or complexity of an algorithm. Big O specifically describes the worst-case scenario, and can be used to describe the execution time required or the space used (e.g. in memory or on disk) by an algorithm. Anyone who's read Programming Pearls or any other Computer Science books and doesn't have a grounding in Mathematics will have hit a wall when they reached chapters that mention O(N log N) or other seemingly crazy syntax. Hopefully this article will help you gain an understanding of the basics of Big O and Logarithms. As a programmer first and a mathematician second (or maybe third or fourth) I found the best way to understand Big O thoroughly was to produce some examples in code. So, below are some common orders of growth along with descriptions and examples where possible.

O(1)

O(1) describes an algorithm that will always execute in the same time (or space) regardless of the size of the input data set.

    bool IsFirstElementNull(IList<string> elements)
    {
        return elements[0] == null;
    }

O(N)

O(N) describes an algorithm whose performance will grow linearly and in direct proportion to the size of the input data set.
The example below also demonstrates how Big O favours the worst-case performance scenario; a matching string could be found during any iteration of the loop and the function would return early, but Big O notation will always assume the upper limit where the algorithm performs the maximum number of iterations.

    bool ContainsValue(IList<string> elements, string value)
    {
        foreach (var element in elements)
        {
            if (element == value) return true;
        }
        return false;
    }

O(N^{2})

O(N^{2}) represents an algorithm whose performance is directly proportional to the square of the size of the input data set. This is common with algorithms that involve nested iterations over the data set. Deeper nested iterations will result in O(N^{3}), O(N^{4}) etc.

    bool ContainsDuplicates(IList<string> elements)
    {
        for (var outer = 0; outer < elements.Count; outer++)
        {
            for (var inner = 0; inner < elements.Count; inner++)
            {
                // Don't compare with self
                if (outer == inner) continue;
                if (elements[outer] == elements[inner]) return true;
            }
        }
        return false;
    }

O(2^{N})

O(2^{N}) denotes an algorithm whose growth doubles with each addition to the input data set. The growth curve of an O(2^{N}) function is exponential - starting off very shallow, then rising meteorically. An example of an O(2^{N}) function is the recursive calculation of Fibonacci numbers:

    int Fibonacci(int number)
    {
        if (number <= 1) return number;
        return Fibonacci(number - 2) + Fibonacci(number - 1);
    }

Logarithms

Logarithms are slightly trickier to explain so I'll use a common example: Binary search is a technique used to search sorted data sets. It works by selecting the middle element of the data set, essentially the median, and compares it against a target value. If the values match it will return success. If the target value is higher than the value of the probe element it will take the upper half of the data set and perform the same operation against it. Likewise, if the target value is lower than the value of the probe element it will perform the operation against the lower half.
It will continue to halve the data set with each iteration until the value has been found or until it can no longer split the data set. This type of algorithm is described as O(log N). The iterative halving of data sets described in the binary search example produces a growth curve that peaks at the beginning and slowly flattens out as the size of the data sets increase, e.g. an input data set containing 10 items takes one second to complete, a data set containing 100 items takes two seconds, and a data set containing 1000 items will take three seconds. Doubling the size of the input data set has little effect on its growth, as after a single iteration of the algorithm the data set will be halved and therefore on a par with an input data set half the size. This makes algorithms like binary search extremely efficient when dealing with large data sets. This article only covers the very basics of Big O and logarithms. For a more in-depth explanation take a look at their respective Wikipedia entries: Big O Notation, Logarithms.
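The halving behaviour described above can be made concrete with a short sketch (my own code, counting loop iterations): multiplying the input size by a factor of 1000 adds only about ten steps, because the count grows like log_{2}(n).

```python
import math

def binary_search_steps(sorted_list, target):
    """Binary search that returns how many loop iterations it performed."""
    lo, hi, steps = 0, len(sorted_list) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return steps
        elif sorted_list[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps   # target absent: still only ~log2(n) iterations

for n in (10, 1000, 1_000_000):
    print(n, binary_search_steps(list(range(n)), -1))  # worst case: target missing
```

Searching for a value that is not present forces the maximum number of halvings, yet even for a million elements the loop runs only around twenty times.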