Merge Sort

What is Merge Sort?

The merge sort algorithm is one of the fastest sorting techniques available, with a time complexity of O(n log n), and space complexity of O(n). It achieves this by using the 'divide-and-conquer' strategy, breaking a list into the smallest units possible before sorting and 'merging' them into a single sorted list.

Code Implementation

Multiple functions are used to help implement this algorithm:

merge(): merges two halves of an array.
1. Compares elements of both halves. The smaller element is copied into an auxiliary array. Continue doing so until one of the halves is exhausted.
2. Copy any remaining elements from the other half to the auxiliary array, which is now sorted.
3. Copy the sorted elements back from to the original array.
mSort(): divides an array into smaller subarrays, then merges and sorts them.
1. Recursively divide the array into halves until the subarray is a single element, which is the base case and returns the function.
2. Use merge() on the two halves to merge them into a single sorted subarray. Once all the recursive calls are resolved you will end up with one sorted array.
mergeSort(): entry point for the merge sort algorithm.
1. Creates an auxiliary array to temporarily hold elements to assist in swapping.
2. Calls mSort() to start the algorithm.

Big O Complexity

Best Case O(n log n)

Average Case O(n log n)

Worst Case O(n log n)

Regardless of how sorted the input array is, merge sort still divides the list into two halves n times. So the time complexity across all scenarios is the same: O(n log n)

Applications

Software Libraries and APIs

Merge sort offers a stable and efficient way of sorting for standard libraries of many high-level programming languages and APIs.

Big Data and Data Science

Since the time complexity for merge sort is O(n log n) regardless of the scenario, it is best used for large data sets.

Parallel Computing and Distributed Systems

Due to merge sort's divide-and-conquer strategy, different subarrays of data can be sorted independently in parallel. This is very useful in environments with multicore processors or distributed systems, where large data sets can be divided and processed simultaneously.

Page updated

Google Sites

Report abuse