Sorting
9/6/2023 10:27 AM Copyright @ gdeepak.com® 1
Deliverables
Reasons and parameters of sorting
Bubble Sort
Selection Sort
Insertion Sort
Shell Sort
Merge Sort
Counting Sort
Bucket Sort
Radix Sort
Sorting
Given n numbers, rearrange them so that
a1 ≤ a2 ≤ a3 ≤ … ≤ an-1 ≤ an
There are n! possible permutations of n distinct numbers, and
only one of them is the sorted sequence
Why Sorting
Sorting solves the togetherness problem:
it brings similar elements next to each other
Matching items in two or more files
Searching by key values
Comparison is the key
Date comparison
CompareDate(A,B){
if (A.Year < B.Year) return -1;
if (A.Year > B.Year) return +1;
if (A.Month < B.Month) return -1;
if (A.Month > B.Month) return +1;
if (A.Day < B.Day) return -1;
if (A.Day > B.Day) return +1;
return 0; // all three fields are equal
}
A single date comparison can take up to 6 integer comparisons. Comparing
file names or domain names is similarly made up of several elementary comparisons
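The date comparison above can be sketched in Python as follows. This is a minimal illustration, not the author's code: the `Date` record type and the sample dates are assumptions made for the example, and `functools.cmp_to_key` is the standard-library adapter that lets `sorted` use a three-way comparison function.

```python
from collections import namedtuple
from functools import cmp_to_key

# Hypothetical record type, introduced only for this illustration.
Date = namedtuple("Date", ["year", "month", "day"])

def compare_date(a, b):
    """Return -1, 0 or +1, comparing year, then month, then day."""
    for x, y in ((a.year, b.year), (a.month, b.month), (a.day, b.day)):
        if x < y:
            return -1
        if x > y:
            return +1
    return 0  # all three fields are equal

dates = [Date(2023, 9, 6), Date(2021, 12, 31), Date(2023, 1, 15)]
sorted_dates = sorted(dates, key=cmp_to_key(compare_date))
```

Every sorting algorithm in these slides only needs such a comparison function, so the cost of one comparison multiplies the comparison count of the algorithm.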
Inversions and Runs
Inversions measure how many pairs of numbers are in reverse order
Inversions of a permutation: if i < j and ai > aj, the pair (ai,aj) is
called an inversion of the permutation.
e.g. 3 1 4 2 has three inversions, namely (3,1), (3,2) and (4,2). Each
inversion is a pair of elements that is out of order.
Runs are the maximal segments of a list that are already in
sorted order
The sequence 3 5 7 | 1 6 8 9 | 4 | 2 has four runs
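Both definitions can be checked with a short Python sketch. The function names are chosen for this example only, and the inversion count uses the brute-force O(n²) definition rather than a faster method.

```python
def count_inversions(a):
    """Count pairs (i, j) with i < j and a[i] > a[j], straight from the definition."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])

def split_runs(a):
    """Split a list into its maximal ascending runs."""
    runs = []
    for x in a:
        if runs and runs[-1][-1] <= x:
            runs[-1].append(x)   # x extends the current run
        else:
            runs.append([x])     # x starts a new run
    return runs

inversions = count_inversions([3, 1, 4, 2])       # 3
runs = split_runs([3, 5, 7, 1, 6, 8, 9, 4, 2])    # four runs
```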
Tableaux
• A tableau is an array of integers with left-justified rows
in which the entries in each row increase from left to
right and the entries in each column increase from top
to bottom.
1 2 5 9 10 15
3 6 7 13
4 8 12 14
11
Factors that may affect sorting
Range of numbers
Count of numbers
Uniqueness of numbers
Whether to use extra memory or sort in place
Whether the stability of equal numbers must be retained
Type of data to be sorted
Whether the data is already sorted or not
Stable and unstable sort
Original list   Stable sort   Unstable sort
Ram 3           Ram 3         Ram 3
Krishan 4       Krishan 4     Budha 4
Shyam 5         Vaman 4       Krishan 4
Narhari 8       Budha 4       Vaman 4
Vaman 4         Shyam 5       Shyam 5
Kapil 5         Kapil 5       Kapil 5
Budha 4         Narhari 8     Balram 8
Balram 8        Balram 8      Narhari 8
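A stable sort keeps records with equal keys in their original relative order. As a small illustration, Python's built-in `sorted` is documented to be stable, so sorting the list from the first column of the table by number alone reproduces the stable ordering shown in the middle column. The variable names are chosen for this example.

```python
records = [("Ram", 3), ("Krishan", 4), ("Shyam", 5), ("Narhari", 8),
           ("Vaman", 4), ("Kapil", 5), ("Budha", 4), ("Balram", 8)]

# Sort on the number only; names with equal numbers keep their input order.
stable = sorted(records, key=lambda record: record[1])
stable_names = [name for name, _ in stable]
```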
Bubble Sort
It works by comparing adjacent elements of an array and exchanging
them if they are out of order.
After the first pass the largest element has sunk to the last
position of the array. After the next pass the second largest element has
sunk to the second last position, and so on
for i = 1 to n-1 do
for j = 1 to n-i do
If (a[j+1] < a[j]) then swap a[j] and a[j+1]
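The pseudocode above translates almost line for line into Python. This sketch uses 0-based indexing and returns a sorted copy instead of sorting in place, which are choices made for the example.

```python
def bubble_sort(values):
    """Plain bubble sort: n-1 passes, each comparing adjacent pairs."""
    a = list(values)                 # work on a copy
    n = len(a)
    for i in range(1, n):            # passes 1 .. n-1
        for j in range(n - i):       # the last i-1 positions are already final
            if a[j + 1] < a[j]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a
```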
6 5 3 1 1 1 1 1
5 3 1 3 3 2 2 2
3 1 5 5 2 3 3 3
1 6 6 2 4 4 4 4
8 7 2 4 5 5 5 5
7 2 4 6 6 6 6 6
2 4 7 7 7 7 7 7
4 8 8 8 8 8 8 8
Bubble Sort
Takes relatively few swaps to sort nearly sorted data
Among the slowest algorithms for nearly reverse-sorted data
A large element sinks quickly: the largest one reaches its proper
place in a single pass
A small element bubbles up slowly: it moves only one position
per pass
Large elements are the rabbits and small elements the tortoises
It is stable, in place, and popular because the code is tiny
Bubble sort-improvements
Worst case and average case complexity is O(n²)
Best case complexity is O(n), provided the sort stops after a pass with no exchanges
Traversing in the opposite direction on alternate passes gives
bidirectional bubble sort, also called cocktail or shaker sort.
If a pass makes no exchanges, the sorting is
complete
Each pass can end at the position of the last swap of the previous pass
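Two of the improvements listed above, stopping when a pass makes no exchanges and ending each pass at the previous pass's last swap, can be combined in one loop. The sketch below is one possible way to do it; the returned pass count is added only to make the O(n) best case visible.

```python
def bubble_sort_optimized(values):
    """Bubble sort that shrinks each pass to the position of the last swap."""
    a = list(values)
    end = len(a) - 1        # a[end+1:] is known to be in its final place
    passes = 0
    while end > 0:
        last_swap = 0
        for j in range(end):
            if a[j + 1] < a[j]:
                a[j], a[j + 1] = a[j + 1], a[j]
                last_swap = j
        passes += 1
        end = last_swap     # no swap at all leaves end == 0 and stops the loop
    return a, passes
```

On already sorted input the first pass makes no swap, so the function returns after a single pass of n-1 comparisons.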
Recurrence
T(n) = T(n-1) + n, and T(n-1) = T(n-2) + (n-1), so
T(n) = T(n-2) + n + (n-1)
T(n) = T(n-3) + n + (n-1) + (n-2)
.
.
.
T(n) = T(1) + n + (n-1) + (n-2) + … + 3 + 2, and T(1) = 0
= n(n+1)/2 - 1 = O(n²)
Selection Sort
Selection sort works by selecting the maximum of the
given sequence and swapping it with the last element of the
list, then selecting the maximum of the first n-1 elements
and swapping it with the second last element, and so on.
The pseudocode below is the mirror image: it selects the
minimum and swaps it to the front.
For i = 1 to n-1 do
set min = i
For j = i+1 to n do
If (a[j] < a[min]) then set min = j
If (i < min) then swap a[i] and a[min]
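A Python version of the minimum-selecting pseudocode above might look like this. It uses 0-based indexing, works on a copy, and also returns the number of swaps; those details are choices made for the example.

```python
def selection_sort(values):
    """Selection sort: move the minimum of the unsorted part to the front."""
    a = list(values)
    n = len(a)
    swaps = 0
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):
            if a[j] < a[smallest]:
                smallest = j
        if smallest != i:            # swap only when something smaller was found
            a[i], a[smallest] = a[smallest], a[i]
            swaps += 1
    return a, swaps
```

The inner loop always runs to the end of the list, so the comparison count is the same for every input, while the swap count never exceeds n-1.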
Selection Sort
Complexity is O(n²) for the best, average and worst case.
It is an in-place sorting algorithm. It is stable if the minimum is moved
to the front by shifting the elements before it down one position. If the
minimum is swapped with the element in the first position, it is not stable.
A min-max variant can select both the minimum and the maximum in each pass.
It usually outperforms bubble sort.
The number of swaps is small: at most n-1, which is linear.
The number of comparisons is independent of the initial ordering of the elements.
As n increases, the running time grows quadratically.
Key points
It moves few records. So when large records have to be
sorted on a small key, this type of sort may be useful,
e.g. to reorganize video or music files, because the cost
of the comparisons is negligible next to the cost of
moving the data.
Insertion Sort
It considers the numbers one by one from the start. A single
number is sorted by default.
Then we take the next number and put it in its proper position, so that
the list of two is sorted. Then we take the third number, and so
on.
Before examining record Rj, we assume that the preceding records R1 to
Rj-1 have already been sorted, and we insert Rj into its proper place
among them
6 5 3 1 1 1 1 1
5 6 5 3 3 3 2 2
3 3 6 5 5 5 3 3
1 1 1 6 6 6 5 4
8 8 8 8 8 7 6 5
7 7 7 7 7 8 7 6
2 2 2 2 2 2 8 7
4 4 4 4 4 4 4 8
P1 P2 P3 P4 P5 P6 P7
Insertion Sort Algorithm
Insertion-Sort(A)
1. for j ← 2 to length[A]
2. do key ← A[j] //insert A[j] into the sorted sequence A[1..j-1]
3. i ← j-1
4. while i>0 and A[i]>key
5. do A[i+1] ← A[i]
6. i ← i-1
7. A[i+1] ← key
Insertion-Sort invariant: At the start of each iteration of the for
loop of lines 1-7, the subarray A[1..j-1] consists of the elements
originally in A[1..j-1], in sorted order.
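The algorithm above can be written in Python as follows. The sketch uses 0-based indexing, so the loop starts at index 1 and the inner test is `i >= 0`; it also returns a sorted copy rather than modifying its argument.

```python
def insertion_sort(values):
    """Insertion sort: grow a sorted prefix one element at a time."""
    a = list(values)
    for j in range(1, len(a)):
        key = a[j]                   # element to insert into sorted a[0:j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]          # shift larger elements one place right
            i -= 1
        a[i + 1] = key
    return a
```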
Insertion Sort-improvements
Binary search within the sorted part reduces the comparisons to O(n log n), but the element moves remain O(n²)
O(n²) in the worst and average case. Inserting in the middle is costly because
the elements after the insertion point have to be moved.
Good algorithm for small lists of up to 25-30 numbers. It is stable and in
place. It usually outperforms bubble and selection sort.
Best case O(n). Works well for nearly sorted lists.
Good for cases where the whole data is not available initially and new
values keep being added to the list
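The binary search improvement can be sketched with the standard-library `bisect` module. This is an illustration under the assumption that a sorted copy is acceptable; `bisect_right` is used so that equal keys stay in their original order, which keeps the sort stable.

```python
from bisect import bisect_right

def binary_insertion_sort(values):
    """Insertion sort that finds each insertion point by binary search."""
    a = list(values)
    for j in range(1, len(a)):
        key = a[j]
        pos = bisect_right(a, key, 0, j)   # search only the sorted prefix a[0:j]
        a[pos + 1:j + 1] = a[pos:j]        # shift the tail of the prefix right
        a[pos] = key
    return a
```

The search costs O(log n) comparisons per element, but the slice shift still moves up to j elements, so the overall running time stays quadratic in the worst case.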
Key points
The number of shifts equals the number of inversions,
and the number of comparisons is at most (number of inversions + n-1)
An array is partially sorted if the number of inversions
is O(n). Examples:
A small array appended to a large sorted array.
An array with only a few elements out of place.
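The relation between shifts and inversions can be observed by instrumenting insertion sort. The counters in this sketch are additions for illustration; it counts one comparison for every test of an element against the key.

```python
def insertion_sort_counts(values):
    """Insertion sort that also reports its shifts and key comparisons."""
    a = list(values)
    shifts = comparisons = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0:
            comparisons += 1
            if a[i] <= key:
                break                # key has found its place
            a[i + 1] = a[i]          # each shift removes exactly one inversion
            shifts += 1
            i -= 1
        a[i + 1] = key
    return a, shifts, comparisons

result, shifts, comparisons = insertion_sort_counts([3, 1, 4, 2])
```

For 3 1 4 2, which has three inversions, this performs 3 shifts and 5 comparisons, within the bound of 3 + (4 - 1) = 6.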
Shell Sort
If a sorting algorithm moves items only one position at a
time, its average running time will be at best proportional to n².
For substantial improvement over straight insertion, we need a
mechanism by which records can take long leaps instead of short steps
One way to picture shell sort is to arrange the data sequence in a
two-dimensional array and then sort the columns of the array using
insertion sort.
Shell Sort
The gap sequence can be 16-sort, 8-sort, 4-sort, 2-sort, 1-sort.
This does not let keys in even and odd positions interact until the final
1-sort, so we can try 7-sort, 5-sort, 3-sort and 1-sort instead
Shell sort performs better than bubble, selection or insertion sort.
The worst case complexity is O(n log² n), O(n^(3/2)) or another bound, depending
on the gap sequence.
Shell Sort Example
32 95 16 82 24 66 35 19 75 54 40 43 93 68 Original data
32 95 16 82 24 66 35 19 75 54 40 43 93 68 Apply 5 sort
32 35 16 68 24 40 43 19 75 54 66 95 93 82 After 5 sort
32 35 16 68 24 40 43 19 75 54 66 95 93 82 Apply 3 sort
32 19 16 43 24 40 54 35 75 68 66 95 93 82 After 3-sort
16 19 24 32 35 40 43 54 66 68 75 82 93 95 After 1-sort
Insertion sort takes 38 swaps on this data, while shell sort takes 26 swaps (6 + 5 + 15)
32 95 16 82 24 66 35 19 75 54 40 43 93 68
32 95 16 82 24 66 35 19 75 54 40 43 93 68
32 35 16 68 24 40 43 19 75 54 66 95 93 82 6 swaps
32 35 16 68 24 40 43 19 75 54 66 95 93 82
32 19 16 43 24 40 54 35 75 68 66 95 93 82 5 swaps
16 19 24 32 35 40 43 54 66 68 75 82 93 95 15 swaps
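The example can be reproduced with a short shell sort sketch that takes the gap sequence as a parameter. It counts element shifts, which correspond to the swaps counted above; the function name and the counter are choices made for this illustration.

```python
def shell_sort(values, gaps):
    """Gapped insertion sort for each gap in turn; returns (sorted list, shifts)."""
    a = list(values)
    shifts = 0
    for gap in gaps:
        for j in range(gap, len(a)):
            key = a[j]
            i = j - gap
            while i >= 0 and a[i] > key:
                a[i + gap] = a[i]    # move the larger element one gap to the right
                shifts += 1
                i -= gap
            a[i + gap] = key
    return a, shifts

data = [32, 95, 16, 82, 24, 66, 35, 19, 75, 54, 40, 43, 93, 68]
shell_result, shell_shifts = shell_sort(data, (5, 3, 1))   # 26 shifts
plain_result, plain_shifts = shell_sort(data, (1,))        # 38 shifts
```

Calling it with the single gap 1 is ordinary insertion sort, which makes the two counts directly comparable.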
Merge Sort
Merge sort takes advantage of the ease of merging already sorted lists
into a new sorted list.
Its worst and average case running time is O(n log n), which is better than
the previously discussed algorithms.
It can be implemented in place or out of place.
It works by sorting smaller lists and merging them instead of working on the full list at once
Only O(m+n) comparisons are required to merge two sorted lists of lengths m and n,
because each list is traversed just once
Merge Sort Example
6 5 1 1
5 6 3 2
3 1 5 3
1 3 6 4
8 7 2 5
7 8 4 6
2 2 7 7
4 4 8 8
P1 P2 P3
Merge Sort
Merge-Sort(A)
1. If length[A] ≤ 1
2. Return A
3. Else
4. q ← ⌊length[A]/2⌋
5. create arrays L[1..q] and R[1..length[A]-q]
6. copy A[1..q] to L
7. copy A[q+1..length[A]] to R
8. LS ← Merge-Sort(L)
9. RS ← Merge-Sort(R)
10. return Merge(LS,RS)
Merge Sort
Merge (L,R)
1. Create array B of length[L] + length [R]
2. i ←1
3. j ←1
4. For k ← 1 to length[B]
5. if j>length[R] or (i ≤ length [L] and L[i] ≤ R[j])
6. B[k] ← L[i]
7. i ← i+1
8. else
9. B[k] ← R[j]
10. j ← j+1
11. Return B
At the end of each iteration of the for loop of lines 4-10, the subarray B[1..k]
contains the k smallest elements of L and R in sorted order.
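The two routines can be sketched together in Python. This is an out-of-place version using list slices, one of several possible ways to implement the pseudocode; taking from the left list on ties keeps the sort stable.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # ties go to the left list (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])          # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged

def merge_sort(values):
    """Top-down merge sort; returns a new sorted list."""
    if len(values) <= 1:
        return list(values)
    q = len(values) // 2
    return merge(merge_sort(values[:q]), merge_sort(values[q:]))
```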
Merge Sort
It is a divide and conquer algorithm. It can be easily parallelized.
Merging needs fewer comparisons when the numbers are already sorted. It uses
O(n) extra memory.
If the list has length 0 or 1, it is already sorted. Otherwise we divide the
unsorted list into two sublists of about half the size and sort each
sublist recursively by re-applying merge sort.
Then we merge the two sublists back into one sorted list.
Key points
It is too heavy for small arrays, so another sort (such as
insertion sort) can take over when the sublists shrink below a threshold.
A typical threshold is ≈ 10
It is optimal compared to the previous algorithms, but
only in time. In space, the in-place sorts are
better.
Duplicate keys do not degrade the performance.
Recurrence Tree for Merge Sort
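The tree can be summarized by unrolling the merge sort recurrence. The derivation below is a sketch under the assumptions that n is a power of 2, that merging n elements costs cn for some constant c, and that T(1) = c; these constants are introduced here for illustration.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + cn, \qquad T(1) = c \\
     &= 4\,T(n/4) + 2cn \\
     &= 2^k\,T(n/2^k) + k\,cn \\
     &= n\,T(1) + cn \lg n \qquad (k = \lg n) \\
     &= cn \lg n + cn = \Theta(n \lg n)
\end{align*}
```

In tree terms, each of the lg n levels above the leaves contributes cn, and the n leaves contribute cn in total.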
Lower Bound of nlgn for comparison sort
Comparison Sort Lower Bound Example
Nlogn bound
Any decision tree that can sort n elements must have height Ω(n lg n)
Pf: The tree must contain ≥ n! leaves because there are n! possible
permutations.
A binary tree of height h has ≤ 2^h leaves
Thus 2^h ≥ n!
h ≥ lg(n!)
≥ lg((n/e)^n) (by Stirling's formula)
= n lg n - n lg e ≈ n lg n - 1.44n
= Ω(n lg n)
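The bound can be checked numerically for small n. The sketch below computes ⌈lg(n!)⌉, the minimum worst-case number of comparisons implied by the argument, and the Stirling-based estimate n lg n - n lg e; both function names are chosen for this example.

```python
import math

def min_comparisons(n):
    """Ceiling of lg(n!): lower bound on worst-case comparisons for n elements."""
    return math.ceil(math.log2(math.factorial(n)))

def stirling_bound(n):
    """The estimate n lg n - n lg e, which never exceeds lg(n!)."""
    return n * math.log2(n) - n * math.log2(math.e)

bounds = {n: min_comparisons(n) for n in (3, 4, 5)}
```

For example, sorting 3 elements needs at least 3 comparisons in the worst case, and sorting 5 elements needs at least 7.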
Counting Sort
The idea is to count, for each element, the number of elements in the list
that are less than or equal to it
Works in situations where the keys are not significantly larger than the
number of items in the list.
It is an integer sorting algorithm and not a comparison sort.
Often used as a subroutine in other sorting algorithms, such as radix sort
Complexity is O(n+k), where k is the range of the elements; this is O(n) if k = O(n)
Counting Sort Algorithm
Counting-Sort(A, B, k)
1 for i ← 0 to k
2 do C[i] ← 0
3 for j ← 1 to length[A]
4 do C[A[j]] ← C[A[j]] + 1 //C[i] contains no of elements = i
5 for i ← 1 to k
6 do C[i] ← C[i] + C[i - 1] //C[i] contains no of elements ≤ i
7 for j ← length[A] down to 1
8 do B[C[A[j]]] ← A[j]
9 C[A[j]] ← C[A[j]] - 1
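A Python sketch of the same three phases follows. It assumes the keys are integers in the range 0..k and uses 0-based output positions, so each count is decremented before the element is placed rather than after.

```python
def counting_sort(values, k):
    """Stable counting sort for integers in the range 0..k."""
    count = [0] * (k + 1)
    for x in values:                 # count[i] = number of elements equal to i
        count[x] += 1
    for i in range(1, k + 1):        # count[i] = number of elements <= i
        count[i] += count[i - 1]
    out = [None] * len(values)
    for x in reversed(values):       # right-to-left scan keeps equal keys in order
        count[x] -= 1
        out[count[x]] = x
    return out
```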
List    Initial count   Frequency of occurrence   Sum of preceding values   Sorted list
A1 6    C1 0            C1 1                      C1 1                      B1 1
A2 5    C2 0            C2 1                      C2 2                      B2 2
A3 6    C3 0            C3 0                      C3 2                      B3 5
A4 1    C4 0            C4 0                      C4 2                      B4 6
A5 8    C5 0            C5 1                      C5 3                      B5 6
A6 6    C6 0            C6 3                      C6 6                      B6 6
A7 2    C7 0            C7 0                      C7 6                      B7 8
A8 8    C8 0            C8 2                      C8 8                      B8 8
Counting Sort (Special case of Bucket Sort)
Think of each array index as a bucket: we throw each
number into the bucket for its index and increase that
bucket's count by one.
Bucket Sort
Also known as bin sort; postman's sort is a closely related variant
Works by partitioning the data into a number of buckets.
Each bucket is then sorted individually, either using a
different sorting algorithm or by recursively applying bucket sort
Average complexity is O(n+k) and worst case is O(n²),
where n is the number of elements and k is the number of
buckets.
Bucket Sort-approach
Set up an array of initially empty buckets.
Scatter: Go over the original array, putting each object in its bucket.
Sort each non-empty bucket.
Gather: Visit the buckets in order and put all elements back into the
original array
Bucket Sort Example
32 95 16 82 24 66 35 19 75 54 40 43 93 68
a1  a2  a3  a4  a5  a6  a7  a8  a9
16  24  32  40  54  66  75  82  95
19      35  43      68          93
16 19 24 32 35 40 43 54 66 68 75 82 93 95
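The scatter, sort and gather steps can be sketched as below. The sketch assumes non-negative integers and, like the example, uses one bucket per group of ten values (the tens digit for two-digit numbers); the `bucket_width` parameter and the use of the built-in `sorted` inside each bucket are choices made for this illustration.

```python
def bucket_sort(values, bucket_width=10):
    """Bucket sort for non-negative integers, one bucket per bucket_width values."""
    if not values:
        return []
    buckets = [[] for _ in range(max(values) // bucket_width + 1)]
    for x in values:                         # scatter
        buckets[x // bucket_width].append(x)
    result = []
    for bucket in buckets:                   # sort each bucket, then gather in order
        result.extend(sorted(bucket))
    return result

data = [32, 95, 16, 82, 24, 66, 35, 19, 75, 54, 40, 43, 93, 68]
sorted_data = bucket_sort(data)
```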
Radix Sort
It sorts using the positional value of the data: digits or characters at the
same significant position are compared.
LSD (least significant digit) radix sort is stable
MSD (most significant digit) radix sort is not stable by default, but can be made
stable using additional memory
MSD sort involves fewer movements than LSD sort because it keeps fixing
positions as it goes, but it has to keep track of groups of keys that share the same leading digits
For each round use counting sort, which is O(k+n), or bucket sort or another
stable sort. It is a non-comparative sort and similar to bucket sort
Radix Sort- LSD Example
Original 1st LSD sorted 2nd LSD sorted 3rd LSD sorted
329 720 720 329
457 355 329 355
657 436 436 436
839 457 839 457
436 657 355 657
720 329 457 720
355 839 657 839
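The LSD passes can be traced with a small sketch that distributes the numbers into ten buckets per decimal digit. The function name and the choice to return the list after every pass are made for this illustration; appending to buckets in input order is what makes each pass stable.

```python
def radix_sort_lsd_passes(values, digits=3, base=10):
    """LSD radix sort; returns the list as it stands after each digit pass."""
    a = list(values)
    passes = []
    for d in range(digits):
        buckets = [[] for _ in range(base)]
        for x in a:
            buckets[(x // base ** d) % base].append(x)   # stable distribution
        a = [x for bucket in buckets for x in bucket]
        passes.append(a)
    return passes

passes = radix_sort_lsd_passes([329, 457, 657, 839, 436, 720, 355])
```

The last entry of `passes` is the fully sorted list.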
Radix Sort- MSD Example
Original 1st MSD sorted 2nd MSD sorted 3rd MSD sorted
329 329 329 329
457 355 355 355
657 457 436 436
839 436 457 457
436 657 657 657
720 720 720 720
355 839 839 839
Radix Sort complexity
Suppose we have n integers of b bits each, ranging from 0 to 2^b - 1
e.g. we have 256 numbers in the range 0 to 2^16 - 1; then
split each number into b/r digits, each r bits long
0100 1000 0101 0100
1010 1010 0101 0101
0011 1100 0101 1010 e.g. we split into 16/4 = 4 digits, each 4 bits long
Counting sort takes Θ(n + k) time to sort n numbers in the range from
0 to k - 1. If each b-bit word is broken into r-bit pieces, each pass of
counting sort takes Θ(n + 2^r) time. Since there are b/r passes, the total is Θ((b/r)(n + 2^r))
Radix Sort-Complexity
Complexity will be O((b/r)(n + k)) = O((b/r)(n + 2^r))
The (b/r)·n term wants r to be big for the overall term to be small; the (b/r)·2^r term
wants r to be small. We want the n term to dominate 2^r, so choose the maximum r subject to
n ≥ 2^r
r = lg n, so O((b/r)(n + 2^r)) becomes O((b/lg n)(n + 2^(lg n)))
which is 2bn/lg n = O(bn/lg n)
If the numbers are in the range 0 … n^d - 1 instead of 0 … 2^b - 1, then b = d lg n
e.g. for 256 numbers with d = 5 digits: b = 5·lg 256 = 5·8 = 40, and 2^40 = (2^8)^5
With b = d lg n the complexity is O(dn); if d = O(1) then
complexity = O(n)
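The trade-off in r can be made concrete with a radix sort that works on r-bit digits and runs one counting sort pass per digit. This is a sketch assuming non-negative integers below 2^b; the function name and parameters are chosen for this example.

```python
def radix_sort_bits(values, b, r):
    """LSD radix sort of non-negative integers below 2**b, using r-bit digits."""
    mask = (1 << r) - 1
    a = list(values)
    for shift in range(0, b, r):             # about b/r passes
        count = [0] * (1 << r)               # 2**r counters per pass
        for x in a:
            count[(x >> shift) & mask] += 1
        total = 0
        for d in range(1 << r):              # turn counts into start positions
            count[d], total = total, total + count[d]
        out = [0] * len(a)
        for x in a:                          # stable placement, left to right
            d = (x >> shift) & mask
            out[count[d]] = x
            count[d] += 1
        a = out
    return a

words = [0b0100100001010100, 0b1010101001010101, 0b0011110001011010]
sorted_words = radix_sort_bits(words, b=16, r=4)
```

A larger r means fewer passes but a counter array of size 2^r in every pass, which is the tension the formula above resolves by choosing r = lg n.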
Questions, Feedback and Suggestions
Question 1
Which category does the bubble sort algorithm belong to?
A) Sorting by distribution
B) Sorting by exchange
C) Sorting by insertion
D) Sorting by partition
Question 2
If insertion sort runs in 8n² steps and merge sort runs in
64 n lg n steps, which is the smallest of these values of n for
which insertion sort is slower than merge sort?
A) 8
B) 32
C) 64
D) 128
Question 3
Heap sort, quicksort, and merge sort are all asymptotically
optimal, stable, comparison-based sorting algorithms.
A) True
B) False
C) true only for quicksort & merge sort
D) true only for heap sort & quicksort