Coming into this course, you should already be familiar with Abstract Data Types and with the way that C++ classes are used to implement them.
To be certain that everyone is on the same page, however, this module examines some critical information about data and functions in C++.
Most of this material should be review for you, though it may be presented from a different perspective than you are used to. If you find yourself completely lost in these readings and/or unable to complete the first assignment, you may need to plan on doing some very rapid but intense review of the prerequisite material (the Resources page has useful links) or even reconsider whether you are prepared to take this course.
Templates are a mechanism for writing algorithmic patterns that can be applied to a wide variety of different data types.
Iterators are a data abstraction for the notion of a position within a container of data. Iterators allow us to express many simple algorithms in a simple form, regardless of whether the underlying container is an array, a linked list, a tree, or some other data structure.
Templates and iterators are often used together to provide patterns for code that can be applied to a wide range of underlying data structures.
An important theme throughout this semester will be viewing the process of developing software as an engineering process. Engineers in traditional engineering disciplines (civil engineers, electrical engineers, and the like) face trade-offs in developing a new product: trade-offs in cost, performance, and quality.
Software developers face the same kinds of choices. Early on, you may have several alternative designs and need to decide which of those designs to actually pursue. It’s no good waiting until the program has already been implemented, written down in code; by then you’ve already committed to one design and invested significant resources in it.
In this module, we’ll look at mathematical techniques for analyzing algorithms to determine what their speed will be, or, more precisely, how badly their speed will degrade as we apply them to larger and larger amounts of data. The key to doing this will be to analyze the code for its worst case complexity.
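To make the idea concrete, here is a sketch of a worst-case argument for the simplest search algorithm (the name linearSearch is ours, chosen for illustration):

```cpp
#include <vector>

// Linear search: returns the index of target, or -1 if absent.
// Worst case: target is missing (or is the last element), so the
// loop runs all n iterations -- running time grows linearly with
// the input size, i.e., the algorithm is O(n) in the worst case.
int linearSearch(const std::vector<int>& data, int target) {
    for (int i = 0; i < static_cast<int>(data.size()); ++i)
        if (data[i] == target)
            return i;
    return -1;
}
```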
A substantial amount of the data that we work with is arranged into a simple linear ordering, one thing after another. Of course, you are already quite familiar with one way of doing this, by putting the data into arrays.
In this module we explore the two most common variations on ADTs for maintaining data in a sequence: vectors and lists.
Vectors provide a mechanism for array-like sequences that can expand to accommodate the amount of data to be stored.
Lists allow for efficient insertion and removal of data from any location in the sequence, at the cost of limiting access to moving sequentially from one end of the list to the other.
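A minimal sketch of that trade-off, using the standard std::vector and std::list (the helper names insertSecondVec and insertSecondList are invented for illustration):

```cpp
#include <iterator>
#include <list>
#include <vector>

// Insert a value just after the first element of each container.
// In the vector, insert must shift every later element down: O(n).
// In the list, insert just links in a new node at the given
// iterator position: O(1) -- but finding that position in the
// first place may require walking the list from one end.
std::vector<int> insertSecondVec(std::vector<int> v, int value) {
    v.insert(v.begin() + 1, value);
    return v;
}

std::list<int> insertSecondList(std::list<int> l, int value) {
    l.insert(std::next(l.begin()), value);
    return l;
}
```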
Sometimes one can achieve more readable, expressive algorithms by using ADTs that limit one’s choices.
Stacks and queues do not do anything that a vector or list cannot, but they limit us to access and modify their contents only at the ends of the sequence, never in the interior. There are a number of useful algorithms that work perfectly within these limitations.
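One classic example is checking for balanced parentheses, a problem that fits the stack’s restricted push/pop-at-one-end interface perfectly. This is only a sketch, with balanced as our own helper name:

```cpp
#include <stack>
#include <string>

// Checks whether every '(' in s has a matching ')'.
// Push each opening parenthesis; pop when a closing one arrives.
// The string is balanced exactly when no pop finds the stack
// empty and the stack ends up empty at the end.
bool balanced(const std::string& s) {
    std::stack<char> open;
    for (char c : s) {
        if (c == '(') {
            open.push(c);
        } else if (c == ')') {
            if (open.empty()) return false;
            open.pop();
        }
    }
    return open.empty();
}
```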
In Part I, we analyzed the speed of algorithms exclusively from the point of view of the worst case. One might argue that this is unnecessarily pessimistic on our part. For some algorithms, the worst-case input is rare enough that we might not be worried about it, particularly if we believe that typical inputs can be handled much more quickly.
We therefore next turn to the idea of average case complexity: a measure of how the average behavior of a program degrades as the input sets get larger and larger.
Sorting algorithms arrange data stored in a sequence into a new desired order.
Because the data structures involved are elementary (arrays, vectors, and, occasionally, linked lists) and because the need for sorted data arises in so many practical applications, you probably learned one or more sorting algorithms in your earliest programming classes.
But sorting is actually a fairly subtle problem, and the sorting algorithms taught to beginning programmers are chosen for simplicity, not performance. They are often slow and rather clumsy.
In this section we’ll look at more sophisticated sorting algorithms. We’ll also consider the fundamental limits on just how fast a sorting algorithm can get, and we’ll see that some practical algorithms actually approach that upper speed limit.
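As a point of reference, the standard library’s std::sort already delivers that kind of performance: it is guaranteed O(N log N) comparisons, close to the theoretical limit for comparison-based sorting. The wrapper below is only a usage sketch (the name sorted is ours):

```cpp
#include <algorithm>
#include <vector>

// std::sort is typically implemented as introsort (quicksort with
// a heapsort fallback), giving guaranteed O(N log N) behavior.
std::vector<int> sorted(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}
```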
Most of the data structures we have looked at so far have been devoted to keeping a collection of elements in some linear order.
Trees are the most common non-linear data structure in computer science. Trees are useful in representing things that naturally occur in hierarchies (e.g., many company organization charts are trees) and for things that are related in an “is-composed-of” or “contains” manner (e.g., this country is composed of states, each state is composed of counties, each county contains cities, each city contains streets, etc.).
Trees also turn out to be exceedingly useful in searching. Properly implemented, a tree can be both searched and inserted into in O(log N) time. Compare this to the data structures we’ve seen so far, which may allow us to search in O(log N) time but insert in O(N), or insert in O(1) but search in O(N).
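A sketch of why search can be O(log N): in a binary search tree, each comparison discards an entire subtree, so a search through a balanced tree touches only about log N nodes. (Node and contains are our own minimal names, not a full implementation.)

```cpp
// A minimal binary search tree node: everything in the left
// subtree is smaller than value, everything in the right is larger.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Each comparison moves down one level, discarding the other
// half of a balanced tree -- hence O(log N) nodes visited.
bool contains(const Node* root, int target) {
    while (root != nullptr) {
        if (target == root->value) return true;
        root = (target < root->value) ? root->left : root->right;
    }
    return false;
}
```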
We have seen that trees are an efficient data structure for both searching and updating collections of data.
These can serve as the underlying data structure to implement associative containers like sets (collections of items with no duplicates) and maps (lookup “tables” that can search for data associated with keys).
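For instance, the standard std::map (typically a red-black tree under the hood) reduces a word-frequency table to a few lines; countWords below is our own sketch, not a course-provided function:

```cpp
#include <map>
#include <string>
#include <vector>

// Count how many times each word occurs, keyed by the word itself.
// map::operator[] inserts a zero-initialized count on first use,
// and both insertion and lookup are O(log N) in the tree.
std::map<std::string, int> countWords(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (const auto& w : words)
        ++counts[w];
    return counts;
}
```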
Hashing is an alternative to trees for providing fast associative containers (sets and maps).
Hashing stores data in arrays (primarily), but does not store it in any predictable order, or even contiguously. Instead, hashing uses a special “hash function” to compute a desired location for any key we want to insert. If you don’t actually know the internal details of the hash function, its choices of locations would seem arbitrary, almost random.
Nonetheless, it works, and in many cases works well. Hash tables can often store and search for data in O(1) average time.
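Here is a toy sketch of the mechanism, with a deliberately simple hash function of our own invention (real hash functions mix bits far more thoroughly to spread keys evenly):

```cpp
#include <string>

// A simple (and weak) hash function: combine the character codes.
// Its output looks arbitrary unless you know this formula.
std::size_t toyHash(const std::string& key) {
    std::size_t h = 0;
    for (unsigned char c : key)
        h = h * 31 + c;
    return h;
}

// The table slot is the hash value reduced modulo the table size.
// Different keys may collide on the same slot, which every hash
// table must handle (e.g., by chaining or probing).
std::size_t slotFor(const std::string& key, std::size_t tableSize) {
    return toyHash(key) % tableSize;
}
```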
By this point in the semester, you’ve learned a lot of algorithms. Many practical problems can be solved by direct application of these. But what do you do when faced with an unfamiliar problem, one for which none of the “canned” algorithms in your personal toolbox are suitable?
When you have to design your own algorithms, you should consider some of the common patterns or styles that are available to you. This lesson looks at these styles, many of which we have seen before, and a few new ones as well.
A priority queue is an ADT that allows us to repeatedly find and remove the largest (or smallest) item from a collection of data. Priority queues take their name from the idea that they implement a “queue” of items awaiting processing, but one in which some items have higher priority than others and so get to jump to the head of the line if nothing ahead has even higher priority.
Priority queues are generally implemented using heaps: trees with very special ordering properties.
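The standard std::priority_queue wraps exactly this heap behavior; draining one (drainDescending is our own helper name) yields the elements largest-first:

```cpp
#include <queue>
#include <vector>

// Repeatedly removing the largest element of a priority queue
// produces the data in descending order.  Each pop is O(log N)
// because the underlying heap must re-establish its ordering.
std::vector<int> drainDescending(std::priority_queue<int> pq) {
    std::vector<int> out;
    while (!pq.empty()) {
        out.push_back(pq.top());
        pq.pop();
    }
    return out;
}
```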
A graph is a collection of vertices (nodes) connected by edges in arbitrary fashion. Graphs are used to represent data relationships that are far more complicated than could be represented using trees or lists.
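One common concrete representation is an adjacency list, in which each vertex keeps a list of its neighbors; this is a minimal sketch (Graph and addEdge are our names, not a standard API):

```cpp
#include <vector>

// Adjacency-list representation: adj[u] holds the vertices
// directly reachable from vertex u.  Unlike a tree, a vertex
// may have any number of neighbors, including cycles back to
// previously visited vertices.
using Graph = std::vector<std::vector<int>>;

// Adds an undirected edge between vertices u and v.
void addEdge(Graph& g, int u, int v) {
    g[u].push_back(v);
    g[v].push_back(u);
}
```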