Data Structures and Algorithms

Revision

✓ Data structures: conceptual and concrete ways to organize data for efficient storage and efficient manipulation

✓ Use of these data structures in the design of efficient algorithms.

✓ Describe an algorithm: pseudo-code

✓ Describe a data structure: ADT (Abstract Data Type)

✓ Fundamental Data Structures:

- list
  - array (2D or more, row-major or column-major)
  - linked list
  - string
- stack
- queue
- priority queue (heaps)

✓ More data structures:

- hashing
- graph
- tree
- set and dictionary

✓ Important Problems:

- sorting
- searching
- string processing
- graph problems
- combinatorial problems
- geometric problems
- numerical problems

✓ Types of algorithms:

- Greedy algorithms
- Divide and Conquer
- Dynamic programming
- Randomized algorithms
- Backtracking

✓ Growth rates:

- Linear growth: T(n) = n
- Quadratic growth: T(n) = n^2
- Cubic growth: T(n) = n^3
- Exponential growth: T(n) = 2^n
- Logarithmic growth: T(n) = log n
- Linearithmic growth: T(n) = n log n

✓ Asymptotic Notations:

- O notation: asymptotic “less than”: f(n) “≤” g(n)
- Ω notation: asymptotic “greater than”: f(n) “≥” g(n)
- Θ notation: asymptotic “equality”: f(n) “=” g(n)

We write f(n) is O(g(n)), f(n) is Ω(g(n)), or f(n) is Θ(g(n)).
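For example, f(n) = 3n^2 + 2n + 5 is O(n^2) (it is bounded above by 10·n^2 for all n ≥ 1), it is Ω(n^2) (it is bounded below by 1·n^2), and therefore it is Θ(n^2). It is also O(n^3), but that upper bound is not tight.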

✓ Example Recurrences:

- T(n) = T(n-1) + n
  Recursive algorithm that loops through the input to eliminate one item
- T(n) = T(n/2) + c
  Recursive algorithm that halves the input in one step
- T(n) = T(n/2) + n
  Recursive algorithm that halves the input but must examine every item in the input
- T(n) = 2T(n/2) + 1
  Recursive algorithm that splits the input into two halves and does a constant amount of other work
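Their closed-form solutions, for reference: T(n) = T(n-1) + n is Θ(n^2); T(n) = T(n/2) + c is Θ(log n); T(n) = T(n/2) + n is Θ(n); and T(n) = 2T(n/2) + 1 is Θ(n).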

✓ Methods for Solving Recurrences:

- Iteration method
- Substitution method
- Recursion tree method
- Master method

✓ Iteration Method – Example

T(n) = n + 2T(n/2)
     = n + 2(n/2 + 2T(n/4))
     = n + n + 4T(n/4)
     = n + n + 4(n/4 + 2T(n/8))
     = n + n + n + 8T(n/8)
     ...
     = i·n + 2^i T(n/2^i)     (after i expansions)
     = kn + 2^k T(1)          (stop when n/2^k = 1, i.e. k = lg n and 2^k = n)
     = n lg n + n T(1)
     = Θ(n lg n)

✓ Insertion Sort:

- Insertion sort is similar to selection sort in that after the kth iteration, the first k elements in the array are in sorted order. Insertion sort's advantage is that it only scans as many elements as it needs in order to place the (k+1)st element, while selection sort must scan all remaining elements to find the (k+1)st element.

- It can be seen as an advantage for some real-time applications that selection sort performs identically regardless of the order of the array, while insertion sort's running time can vary considerably. However, this is more often an advantage for insertion sort, in that it runs much more efficiently if the array is already sorted or "close to sorted."

InsertionSort(A)
  for i = 2 to length(A)
    key = A[i]
    j = i - 1
    while (j > 0) and (A[j] > key)
      A[j+1] = A[j]    // shift larger elements right; O(n^2) shifts in the worst case
      j = j - 1
    A[j+1] = key
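A runnable Java version of the same routine (a minimal sketch; it uses 0-based indexing in place of the pseudocode's 1-based indexing):

public static void insertionSort(int[] arr) {
    for (int i = 1; i < arr.length; i++) {
        int key = arr[i];          // element to insert into the sorted prefix
        int j = i - 1;
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];   // shift larger elements one slot right
            j--;
        }
        arr[j + 1] = key;          // drop the key into its final position
    }
}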

✓ Selection Sort:

- Selection sort fixes the ith element in its final sorted position during the ith iteration.

- Selection sort is preferable to insertion sort in terms of the number of writes (Θ(n) swaps versus O(n^2) shifts). This can be important if writes are significantly more expensive than reads, as with EEPROM or flash memory, where every write shortens the lifespan of the memory.

- Finally, selection sort is greatly outperformed on larger arrays by Θ(n log n) divide-and-conquer algorithms such as mergesort. However, insertion sort and selection sort are both typically faster for small arrays (i.e., fewer than 10-20 elements).

public static void selectionSort(int[] arr) {
    for (int i = 0; i < arr.length - 1; i++) {
        int minIndex = i;                      // index of the smallest remaining element
        for (int j = i + 1; j < arr.length; j++) {
            if (arr[j] < arr[minIndex]) {
                minIndex = j;
            }
        }
        int tmp = arr[i];                      // one swap per outer iteration: Θ(n) swaps total
        arr[i] = arr[minIndex];
        arr[minIndex] = tmp;
    }
}

✓ Bubble Sort:

bubbleSort(A)
  for j = 1 to length(A)
    for i = 1 to length(A) - 1
      if A[i-1] > A[i]
        swap(A[i-1], A[i])
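A Java sketch with the common early-exit optimization (the swapped flag and the shrinking inner bound are additions beyond the pseudocode above):

public static void bubbleSort(int[] arr) {
    for (int pass = 0; pass < arr.length - 1; pass++) {
        boolean swapped = false;                // did this pass change anything?
        for (int i = 1; i < arr.length - pass; i++) {
            if (arr[i - 1] > arr[i]) {
                int tmp = arr[i - 1];           // swap the adjacent out-of-order pair
                arr[i - 1] = arr[i];
                arr[i] = tmp;
                swapped = true;
            }
        }
        if (!swapped) break;                    // no swaps: array already sorted
    }
}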

✓ Merge Sort:

MERGE-SORT(A, p, r)
  if p < r                      // check for base case
    then q ← ⌊(p + r)/2⌋        // Divide
      MERGE-SORT(A, p, q)       // Conquer
      MERGE-SORT(A, q + 1, r)   // Conquer
      MERGE(A, p, q, r)         // Combine

MERGE(A, p, q, r)
  // copy A[p..q] into L and A[q+1..r] into R, appending an ∞ sentinel to each
  i ← 1; j ← 1
  for k ← p to r
    do if L[i] ≤ R[j]
         then A[k] ← L[i]
              i ← i + 1
         else A[k] ← R[j]
              j ← j + 1

- Divide: compute q as the average of p and r: D(n) = Θ(1)

- Conquer: recursively solve 2 subproblems, each of size n/2 ⇒ 2T(n/2)

- Combine: MERGE on an n-element subarray takes Θ(n) time ⇒ C(n) = Θ(n)

T(n) = Θ(1)             if n = 1
T(n) = 2T(n/2) + Θ(n)   if n > 1
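A runnable Java adaptation (a sketch: it copies the two halves into temporary arrays and merges by checking run exhaustion, instead of the ∞ sentinels the pseudocode implies):

public static void mergeSort(int[] a, int p, int r) {
    if (p < r) {
        int q = (p + r) / 2;        // divide
        mergeSort(a, p, q);         // conquer left half
        mergeSort(a, q + 1, r);     // conquer right half
        merge(a, p, q, r);          // combine
    }
}

private static void merge(int[] a, int p, int q, int r) {
    int[] left = java.util.Arrays.copyOfRange(a, p, q + 1);
    int[] right = java.util.Arrays.copyOfRange(a, q + 1, r + 1);
    int i = 0, j = 0;
    for (int k = p; k <= r; k++) {
        if (j >= right.length || (i < left.length && left[i] <= right[j])) {
            a[k] = left[i++];       // take the next element from the left run
        } else {
            a[k] = right[j++];      // take the next element from the right run
        }
    }
}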

✓ Quick Sort:

- Rearrange the list so that all the elements in the first s positions are smaller than or equal to the pivot and all the elements in the remaining n-s positions are larger than or equal to the pivot (see the algorithm below).

- Choosing the pivot is an essential step. Depending on the pivot, the algorithm may run very fast or in quadratic time.

- Example: partitioning around pivot 8 yields [5, 3, 2, 6, 1] [8] [10, 25, 18, 17]

- Use a two-pointer ("two-finger") scan: one pointer moves from the left searching for an element greater than the pivot, the other moves from the right searching for an element smaller than the pivot. When both are found, swap them. Repeat until all elements to the left of the pivot are smaller and all elements to its right are greater than the pivot.

quicksort(A, lo, hi)
  if lo < hi
    p = pivot(A, lo, hi)
    left, right = partition(A, p, lo, hi)   // note: multiple return values
    quicksort(A, lo, left)
    quicksort(A, right, hi)
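A concrete Java sketch using the classic Hoare two-pointer partition, which matches the two-finger description above (one of several valid partition schemes; here the pivot is simply the first element of the range):

public static void quicksort(int[] a, int lo, int hi) {
    if (lo < hi) {
        int p = partition(a, lo, hi);   // a[lo..p] <= pivot <= a[p+1..hi]
        quicksort(a, lo, p);
        quicksort(a, p + 1, hi);
    }
}

private static int partition(int[] a, int lo, int hi) {
    int pivot = a[lo];
    int i = lo - 1, j = hi + 1;
    while (true) {
        do { i++; } while (a[i] < pivot);          // left finger: find element >= pivot
        do { j--; } while (a[j] > pivot);          // right finger: find element <= pivot
        if (i >= j) return j;                      // fingers crossed: partition done
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;   // swap the out-of-place pair
    }
}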

✓ Bucket Sort:

- Suppose we need to sort an array of positive integers {3, 11, 2, 9, 1, 5}. A bucket sort works as follows: create a second array indexed 0 to 11. Then go through the input array and place integer 3 into the second array at index 3, integer 11 at index 11, and so on. We end up with a sorted list in the second array.

- Suppose we are sorting a large number of local phone numbers, for example, all residential phone numbers in the 412 area code region (about 1 million). We sort the numbers without the use of comparisons in the following way. Create a bit array of size 10^7 (about 1.25 MB). Set all bits to 0. For each phone number, turn on the bit indexed by that phone number. Finally, walk through the array, and for each 1 bit record its index, which is a phone number.

- We immediately see three drawbacks to this sorting algorithm:

- Firstly, we must know how to handle duplicates.

- Secondly, we must know the maximum value in the unsorted array.

- Thirdly, we must have enough memory; it may be impossible to declare an array large enough on some systems.

- Not based on comparison.
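A sketch of the phone-number idea in Java using java.util.BitSet (note it collapses duplicates into a single occurrence, which is the first drawback above):

import java.util.BitSet;

public static int[] bitSort(int[] phoneNumbers) {
    BitSet bits = new BitSet(10_000_000);     // one bit per possible 7-digit number
    for (int num : phoneNumbers) {
        bits.set(num);                        // turn on the bit indexed by the number
    }
    int[] sorted = new int[bits.cardinality()];
    int k = 0;
    for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
        sorted[k++] = i;                      // walk the set bits in increasing order
    }
    return sorted;
}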

✓ Tree Traversals:
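As a refresher, the three depth-first traversal orders in Java (a minimal sketch; the Node class with key, left, and right fields is scaffolding of my own):

static class Node {
    int key;
    Node left, right;
}

static void inorder(Node x) {          // left, root, right: visits a BST in sorted order
    if (x == null) return;
    inorder(x.left);
    System.out.println(x.key);
    inorder(x.right);
}

static void preorder(Node x) {         // root, left, right
    if (x == null) return;
    System.out.println(x.key);
    preorder(x.left);
    preorder(x.right);
}

static void postorder(Node x) {        // left, right, root
    if (x == null) return;
    postorder(x.left);
    postorder(x.right);
    System.out.println(x.key);
}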

✓ Height and the number of nodes in a Binary Tree:

- If the height of a binary tree is h, then the maximum number of nodes in the tree is 2^(h+1) - 1.

- If the number of nodes in a complete (perfectly full) binary tree is n, then

  2^(h+1) - 1 = n
  2^(h+1) = n + 1
  h + 1 = log2(n + 1)
  h = log2(n + 1) - 1, which is O(log n)
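For example, a perfectly full binary tree of height h = 3 has 2^4 - 1 = 15 nodes; conversely, n = 15 nodes gives h = log2(16) - 1 = 3.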

✓ Heap sort:

- Four basic procedures on a heap:
  - Heapify, which runs in O(lg n) time.
  - Build-Heap, which runs in linear time.
  - Heap Sort, which runs in O(n lg n) time.
  - Extract-Max, which runs in O(lg n) time.

HEAPSORT(A)
  BUILD_HEAP(A)
  for i ← length(A) down to 2 do
    exchange A[1] ↔ A[i]
    heap-size[A] ← heap-size[A] - 1
    Heapify(A, 1)

BUILD_HEAP(A)
  heap-size[A] ← length[A]
  for i ← ⌊length[A]/2⌋ down to 1 do
    Heapify(A, i)

Heapify(A, i)
  l ← left(i)
  r ← right(i)
  if l ≤ heap-size[A] and A[l] > A[i]
    then largest ← l
    else largest ← i
  if r ≤ heap-size[A] and A[r] > A[largest]
    then largest ← r
  if largest ≠ i
    then exchange A[i] ↔ A[largest]
         Heapify(A, largest)
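The same procedures in Java with 0-based indexing (a sketch; note the children of node i become 2i+1 and 2i+2):

public static void heapSort(int[] a) {
    int n = a.length;
    for (int i = n / 2 - 1; i >= 0; i--) {         // BUILD_HEAP: bottom-up heapify
        heapify(a, n, i);
    }
    for (int end = n - 1; end >= 1; end--) {
        int tmp = a[0]; a[0] = a[end]; a[end] = tmp;   // move max to its final slot
        heapify(a, end, 0);                        // restore heap on the shrunken prefix
    }
}

private static void heapify(int[] a, int heapSize, int i) {
    int l = 2 * i + 1, r = 2 * i + 2;              // 0-based children of node i
    int largest = i;
    if (l < heapSize && a[l] > a[largest]) largest = l;
    if (r < heapSize && a[r] > a[largest]) largest = r;
    if (largest != i) {
        int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
        heapify(a, heapSize, largest);             // sift the displaced element down
    }
}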

✓ Hashing:

1. Separate chaining

2. Open addressing (a linear-probing sketch follows below)

   a) Linear probing: H_i(x) = (H(x) + i) mod B

   b) Quadratic probing: H_i(x) = (H(x) + i^2) mod B

   c) Double hashing: H_i(x) = (H(x) + i · H2(x)) mod B
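A minimal linear-probing insert in Java (a sketch assuming integer keys, a fixed table size B, and null marking an empty slot; all names here are illustrative):

private static final int B = 11;                  // table size (a prime, for illustration)
private static final Integer[] table = new Integer[B];

public static boolean insert(int x) {
    int h = Math.floorMod(x, B);                  // H(x) = x mod B
    for (int i = 0; i < B; i++) {
        int slot = (h + i) % B;                   // H_i(x) = (H(x) + i) mod B
        if (table[slot] == null) {
            table[slot] = x;
            return true;
        }
    }
    return false;                                 // table full after B probes
}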

✓ Binary Search Tree:

- successor(x): (e.g., successor(15) = 17; see the Java sketch below)
  - Case 1: right(x) is non-empty: successor(x) is the minimum in right(x)
  - Case 2: right(x) is empty: go up the tree until the current node is a left child; successor(x) is the parent of the current node. If you cannot go further (you reached the root), x is the largest element.

- predecessor(x): (e.g., predecessor(15) = 13)
  - Case 1: left(x) is non-empty: predecessor(x) is the maximum in left(x)
  - Case 2: left(x) is empty: go up the tree until the current node is a right child; predecessor(x) is the parent of the current node. If you cannot go further (you reached the root), x is the smallest element.

- Deletion:

  - Case 1: z has no children.
    Delete z by making the parent of z point to NIL instead of to z.

  - Case 2: z has one child.
    Delete z by making the parent of z point to z's child instead of to z, and update the parent of z's child to be z's parent.

  - Case 3: z has two children.
    z's successor y is the minimum node in z's right subtree. y has either no children or one right child (but no left child). Delete y from the tree (via Case 1 or 2), then replace z's key and satellite data with y's.
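The successor cases above in Java, assuming nodes carry a parent pointer in addition to left and right children (the field names are my own):

static Node successor(Node x) {
    if (x.right != null) {                // Case 1: minimum of the right subtree
        Node y = x.right;
        while (y.left != null) y = y.left;
        return y;
    }
    Node y = x.parent;                    // Case 2: climb until we leave a left child
    while (y != null && x == y.right) {
        x = y;
        y = y.parent;
    }
    return y;                             // null means x was the largest element
}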

✓ AVL Tree:

- An AVL tree is a binary search tree in which, for every node in the tree, the heights of the left and right subtrees differ by at most 1.

- Height of a subtree: maximum number of edges to a leaf

- Height of an empty subtree: -1

- Balance property:
  - b = height(left subtree) - height(right subtree)
  - the balance of every node satisfies -1 ≤ b ≤ 1
  - depth is Θ(log n)
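A straightforward recursive check of the balance property in Java (a sketch reusing the hypothetical Node class from the traversal example; it recomputes heights on every call, so it is quadratic and meant only to state the invariant):

static int height(Node x) {
    if (x == null) return -1;                      // empty subtree has height -1
    return 1 + Math.max(height(x.left), height(x.right));
}

static boolean isAVL(Node x) {
    if (x == null) return true;
    int b = height(x.left) - height(x.right);      // balance factor of this node
    return b >= -1 && b <= 1 && isAVL(x.left) && isAVL(x.right);
}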