Algorithms and Designs
Chapter 1
Introduction to algorithms
Algorithms
An algorithm is a tool for solving a given problem. An algorithm is a finite sequence of instructions with the following properties:
Input: The algorithm must take input values from a specified set.
Output: Each algorithm is written to produce output. So, the algorithm must give output on a given set of input values.
Definiteness: Every instruction of the algorithm must be well defined and have a clear meaning.
Effectiveness: Every instruction must be effective, i.e. it performs some specific action on the given data.
Efficiency: An algorithm should be efficient. It should not use unnecessary memory locations.
Several factors affect the speed of an algorithm:
Number of inputs:
The number of input values can affect the speed of an algorithm. For example, an input of size 100 takes a lot more time to process than an input of size 10. So, the speed differs for different input values.
Processor speed:
The speed of an algorithm is also affected by the processor speed. If the processor is slow then the algorithm also runs slowly, and a fast processor can increase the speed of the algorithm.
Simultaneous programs:
The number of other programs running at the same time also affects the speed, since they compete for the processor and memory.
Location of items:
The location of items in memory affects the speed. For example, an item located at position 5 in memory takes less time to access than an item located at the 1000th position.
Different implementations:
Different software environments also affect the speed. For example, the speed of a program differs between the Windows XP and Windows 8 operating systems.
Optimality of Algorithms
1) Environment specific
2) General techniques
• Use of indexing (e.g. in searching, prefer binary search due to the use of indexing)
• Spreadsheets are a 'special case' of algorithm that self-optimize, by virtue of the dependency trees inherent in their design, to reduce re-calculations when a cell changes.
4) Searching strings
• A compiled algorithm will, in general, execute faster than the equivalent interpreted algorithm.
8) Just-in-time compilers
Algorithm analysis:
Algorithm analysis studies the efficiency of algorithms as the input size grows, based on the number of steps and the amount of computer time and space. We can measure the performance of an algorithm by computing the following factors:
1. Time complexity
2. Space complexity
3. Measuring an input size
4. Measuring running time
5. Order of growth
6. Best case, worst case and average case analysis
Space complexity:
To compute the space complexity we use two factors: a constant and the instance characteristics. The space requirement S(p) can be given as:
S(p) = C + Sp
where C is the constant, i.e. the fixed part; it denotes the space taken by the instructions, variables, identifiers, inputs and outputs. Sp is the space dependent upon the instance characteristics: the variable part, whose space requirement depends on the particular problem instance.
Time complexity:
The exact running time of a program depends upon factors such as:
1. System load
2. Number of other programs running
3. Instruction set used
4. Speed of underlying hardware
The time complexity is therefore given in terms of the frequency count. A frequency count is a count denoting the number of times a statement is executed.
It is observed that for longer inputs the algorithm runs for a longer time. Hence we can express the efficiency of an algorithm as a function to which the input size is passed as a parameter. Sometimes, to implement an algorithm, we require prior knowledge of the input size.
We have already seen that the time complexity is measured in terms of a unit called the frequency count. The time which is measured for analyzing an algorithm is generally the running time.
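To make the frequency count concrete, here is a small Python sketch (the function and names are ours, not from the notes) that counts executions of the basic operation in a summation loop; the count grows linearly with the input size n.

    def sum_with_count(a):
        """Sum a list while counting executions of the basic operation."""
        total = 0
        count = 0          # frequency count of the basic operation
        for x in a:
            total += x     # basic operation: one addition per element
            count += 1
        return total, count

    for n in (10, 100, 1000):
        _, c = sum_with_count(list(range(n)))
        print(n, c)        # the frequency count equals n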
Order of growth:
Measuring the performance of an algorithm in relation to the input size n is called the order of growth. For example, the order of growth for varying input sizes n is given below:
n     log₂n   n·log₂n   n²      2ⁿ
1     0       0         1       2
2     1       2         4       4
4     2       8         16      16
8     3       24        64      256
16    4       64        256     65,536
32    5       160       1024    4,294,967,296
If an algorithm takes the minimum amount of time to run to completion for a specific set of inputs, then this is called the best case time complexity; if it takes the maximum amount of time for some set of inputs, this is called the worst case time complexity. The time complexity obtained over all sets of inputs, taken as an average, is called the average case time complexity.
Growth of Functions:
To choose the best algorithm, we need to check the efficiency of each algorithm.
The efficiency can be measured by computing the time complexity of each algorithm.
Asymptotic notation is a shorthand way to represent the time complexity.
Big oh notation:
The big oh notation is denoted by "O". It is the method of representing the upper bound of an algorithm's running time. Using big oh notation we can give the longest amount of time taken by the algorithm to complete.
Let f(n) and g(n) be non-negative functions. Let n₀ and c be two constants such that n₀ denotes some value of the input size and, for all n > n₀,
f(n) ≤ c · g(n)
Then f(n) is big oh of g(n). It is also denoted as f(n) ∈ O(g(n)). In other words, f(n) is bounded above by a multiple of g(n) by some constant c.
Example:
Consider the functions f(n) = 2n + 2 and g(n) = n². We have to find some constant c so that f(n) ≤ c · g(n).
If n = 1 then,
f(1) = 2(1) + 2 = 4 and g(1) = 1² = 1
If n = 2 then,
f(2) = 2(2) + 2 = 6 and g(2) = 2² = 4
If n = 3 then,
f(3) = 2(3) + 2 = 8 and g(3) = 3² = 9
For n ≥ 3 we have f(n) ≤ g(n), so with c = 1 and n₀ = 3 the definition is satisfied and f(n) ∈ O(n²).
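As a quick numeric sanity check of such bounds, a small Python sketch (the helper name is ours):

    def is_big_o_witness(f, g, c, n0, n_max=1000):
        """Check f(n) <= c*g(n) for all n0 < n <= n_max (numeric evidence only)."""
        return all(f(n) <= c * g(n) for n in range(n0 + 1, n_max + 1))

    f = lambda n: 2 * n + 2
    g = lambda n: n * n
    print(is_big_o_witness(f, g, c=1, n0=2))   # True: f(n) <= n^2 for all n >= 3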
Omega notation:
The omega notation is denoted by "Ω". It is the method of representing the lower bound of an algorithm's running time. Using omega notation we can give the shortest amount of time taken by the algorithm to complete.
The function f(n) is said to be in Ω(g(n)) if f(n) is bounded below by some positive constant multiple of g(n), i.e. there exist a constant c > 0 and an integer n₀ such that, for all n ≥ n₀,
f(n) ≥ c · g(n)
Then f(n) is omega of g(n). It is also denoted as f(n) ∈ Ω(g(n)).
Example:
Consider the functions f(n) = 2n² + 2 and g(n) = 7n. We have to find some constant c so that f(n) ≥ c · g(n).
If n = 1 then,
f(1) = 2(1)² + 2 = 4 and g(1) = 7(1) = 7
If n = 2 then,
f(2) = 2(2)² + 2 = 10 and g(2) = 7(2) = 14
If n = 3 then,
f(3) = 2(3)² + 2 = 20 and g(3) = 7(3) = 21
If n = 4 then,
f(4) = 2(4)² + 2 = 34 and g(4) = 7(4) = 28
For n ≥ 4 we have f(n) ≥ g(n), so with c = 1 and n₀ = 4 the definition is satisfied and f(n) ∈ Ω(7n).
Theta notation:
The theta notation is denoted by "θ". In this method the running time is bounded between a lower and an upper bound.
Let f(n) and g(n) be non-negative functions. There are two positive constants c₁ and c₂ such that, for all n ≥ n₀,
c₂ · g(n) ≤ f(n) ≤ c₁ · g(n)
Then f(n) is theta of g(n), denoted f(n) ∈ θ(g(n)).
General plan for analyzing recursive algorithms:
• Decide on a parameter indicating the input size.
• Identify the algorithm's basic operation.
• Check whether the number of times the basic operation is executed can vary on different inputs of the same size; if it depends upon the worst case, average case and best case then that has to be analyzed separately.
• Set up a recurrence relation, with some initial condition, expressing the number of times the basic operation is executed.
• Solve the recurrence or at least determine the order of growth. While solving the recurrence we will use the forward and backward substitution method, and then the correctness of the formula can be proved with the help of the mathematical induction method.
Example:
Consider the recurrence x(n) = x(n − 1) + 5 with x(1) = 0. By backward substitution:
x(n) = x(n − 1) + 5
x(n) = [x(n − 2) + 5] + 5
x(n) = [x(n − 3) + 5] + 5 + 5
x(n) = x(n − 3) + 5 · 3
…
x(n) = x(n − i) + 5 · i
If i = n − 1 then
x(n) = x(n − (n − 1)) + 5 · (n − 1)
x(n) = x(1) + 5 · (n − 1)
x(n) = 0 + 5 · (n − 1)
x(n) = 5 · (n − 1)
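A small Python sketch comparing the recurrence against the closed form just derived (the function names are ours):

    def x_recursive(n):
        """x(n) = x(n-1) + 5 with x(1) = 0, computed directly."""
        return 0 if n == 1 else x_recursive(n - 1) + 5

    def x_closed(n):
        """Closed form obtained by backward substitution."""
        return 5 * (n - 1)

    assert all(x_recursive(n) == x_closed(n) for n in range(1, 50))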
Now consider x(n) = x(n/2) + n with x(1) = 1. Substituting n = 2^k:
x(2^k) = x(2^(k−1)) + 2^k
…
x(2^k) = x(2^(k−i)) + 2^(k−i+1) + 2^(k−i+2) + … + 2^k
If i = k then
x(2^k) = 1 + 2^1 + 2^2 + … + 2^k
x(2^k) = 2^(k+1) − 1
x(2^k) = 2 · 2^k − 1
x(n) = 2n − 1
The efficiency of an algorithm can be studied in two ways:
• Mathematical analysis
• Empirical analysis
In empirical analysis the algorithm is run on a suitable input set and its observed performance is analyzed.
Chapter 2
Divide and conquer
Introduction:
In the divide and conquer strategy the problem is broken down into two sub problems and solutions to these sub problems are obtained.
General method:
In the divide and conquer method, a given problem is:
• Divided into smaller sub problems.
• These sub problems are solved independently.
• All the solutions of the sub problems are combined into the solution of the whole.
If the sub problems are still large enough then divide and conquer is reapplied. The generated sub problems are usually of the same type as the original problem; hence recursive algorithms arise naturally in the divide and conquer strategy.
Binary search:
Binary search is an efficient searching method. While searching for an element with this method, the most essential requirement is that the elements in the array are sorted.
The element which is to be searched for in the list of elements stored in the array A[0…n−1] is called the KEY element.
Let A[m] be the middle element of array A. Then there are three conditions that need to be tested while searching the array using this method:
1. If KEY = A[m], the search is successful.
2. If KEY < A[m], search the left sublist A[0…m−1].
3. If KEY > A[m], search the right sublist A[m+1…n−1].
Example:
Index: 0   1   2   3   4   5   6
Value: 10  20  30  40  50  60  70
       low                     high
The KEY element (the element to be searched) is 60. Now, to obtain the middle element we apply the formula:
m = (low + high)/2
m = (0 + 6)/2
m = 3
Then check whether A[m] = KEY; but A[3] = 40, which is less than 60, so the search takes place in the right side of the mid.
Now only the right sublist, indices 4 to 6, remains:
Index: 4   5   6
Value: 50  60  70
We again divide this list and check the mid element:
m = (low + high)/2
m = (4 + 6)/2
m = 5
A[m] = KEY
A[5] = 60
The number is present in the list. Thus, we can search for the desired number in the list of elements.
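The procedure above translates directly into code. A minimal iterative Python sketch, with a comparison counter added for the analysis that follows:

    def binary_search(a, key):
        """Return (index of key in sorted list a, number of three-way comparisons)."""
        low, high = 0, len(a) - 1
        comparisons = 0
        while low <= high:
            m = (low + high) // 2
            comparisons += 1               # one three-way comparison with A[m]
            if a[m] == key:
                return m, comparisons
            elif key < a[m]:
                high = m - 1               # search left sublist
            else:
                low = m + 1                # search right sublist
        return -1, comparisons             # key not present

    print(binary_search([10, 20, 30, 40, 50, 60, 70], 60))   # (5, 2)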
Analysis:
The basic operation in binary search is comparison of search key with the array
elements. To analyze efficiency of binary search we count the number of times the search
key gets compared with the array elements. The comparison is also called a three way
comparison because algorithm makes the comparison to determine whether KEY is
smaller, equal to or greater than a[m].
In the algorithm, after one comparison the list of n elements is divided into two sublists of n/2 elements. The worst case occurs when the algorithm keeps comparing until only one element remains, i.e. the maximum number of comparisons is made. In this method one comparison is made and, based on this
comparison, the array is divided each time into sublists of size n/2. Hence the worst case time complexity is given by:
Cworst(n) = Cworst(n/2) + 1 for n > 1
where Cworst(n/2) is the time required to search the left or right sublist and 1 is the one comparison made with the middle element.
Also, Cworst(1) = 1.
But as we take the rounded-down value when the array gets divided, the above equation can be written as:
Cworst(n) = Cworst(⌊n/2⌋) + 1 …(1)
Cworst(1) = 1 …(2)
The above recurrence relation can be solved further. Assume n = 2^k; equation (1) becomes:
Cworst(2^k) = Cworst(2^(k−1)) + 1
Since Cworst(2^(k−1)) = Cworst(2^(k−2)) + 1, we get
Cworst(2^k) = [Cworst(2^(k−2)) + 1] + 1
Cworst(2^k) = Cworst(2^(k−2)) + 2
Then
Cworst(2^k) = [Cworst(2^(k−3)) + 1] + 2
Cworst(2^k) = Cworst(2^(k−3)) + 3
…
Cworst(2^k) = Cworst(2^(k−k)) + k
Cworst(2^k) = Cworst(2^0) + k
Since Cworst(1) = 1,
Cworst(2^k) = 1 + k
As we have assumed n = 2^k:
log₂ n = log₂ 2^k = k log₂ 2 = k
Therefore Cworst(n) = 1 + log₂ n.
Average case:
Consider searching for KEY = 44 in the sorted array:
Index: 0   1   2   3
Value: 11  22  33  44
In this case three comparisons are required to search for KEY, because
m = (low + high)/2
m = (0 + 3)/2
m=1
m = (low + high)/2
m = (2 + 3)/2
m=2
m = (low + high)/2
m = (3 + 3)/2
m=3
If n = 2, then
log₂ n + 1 = c
log₂ 2 + 1 = c
1 + 1 = c
2 = c
Similarly, if n = 8, then
log₂ n + 1 = c
log₂ 8 + 1 = c
3 + 1 = c
4 = c
Thus we can write that the average case time complexity of binary search is θ(log₂ n).
Merge sort:
The merge sort is a sorting algorithm that uses the divide and conquer strategy, and in this method the division is carried out dynamically:
• Divide: partition the array into two sublists s1 and s2 with n/2 elements each.
• Conquer: sort sublists s1 and s2 recursively.
• Combine: merge s1 and s2 into a single sorted list.
Analysis:
In the merge sort algorithm two recursive calls are made, and each recursive call focuses on n/2 elements of the list. After the two recursive calls, one call is made to combine the two sublists, i.e. to merge all n elements. Hence we can write the recurrence relation as:
T(n) = 2T(n/2) + cn …(1)
T(1) = 0 …(2)
where T(n/2) is the time taken by the left or right sublist to get sorted and cn is the time taken for combining the two sublists.
We can obtain the time complexity of merge sort using two methods:
• Master theorem
• Substitution method
Applying the master theorem to equation (1): a = 2, b = 2 and d = 1, so
a = b^d
2 = 2¹
and therefore T(n) ∈ θ(n^d log n) = θ(n log n).
Hence the average and worst case time complexity of merge sort is θ(n log₂ n).
Using the substitution method instead, with T(1) = 0 and n = 2^k:
T(2^k) = 2T(2^(k−1)) + c·2^k
T(2^k) = 2[2T(2^(k−2)) + c·2^(k−1)] + c·2^k = 2²T(2^(k−2)) + 2c·2^k
…
T(2^k) = 2^i T(2^(k−i)) + i·c·2^k
If i = k then, since T(1) = 0,
T(2^k) = 2^k T(1) + k·c·2^k = k·c·2^k
With k = log₂ n this gives T(n) = c·n·log₂ n.
Hence the average and worst case time complexity of merge sort is θ(n log₂ n).
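A compact Python sketch of merge sort along these lines (a straightforward top-down version; the notes do not prescribe a particular implementation):

    def merge_sort(a):
        """Sort list a using divide and conquer."""
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        left = merge_sort(a[:mid])       # conquer the left sublist
        right = merge_sort(a[mid:])      # conquer the right sublist
        # combine: merge the two sorted sublists
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged

    print(merge_sort([30, 70, 20, 50, 40, 10, 60]))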
Quick sort:
Quick sort is a sorting algorithm that uses the divide and conquer strategy, and in this method the division is carried out dynamically. The three steps of quick sort are as follows (see the sketch after this list):
• Divide: split the array into two sub arrays such that each element in the left sub array is less than or equal to the middle element and each element in the right sub array is greater than the middle element. The splitting of the array into two sub arrays is based on the pivot element: all the elements that are less than the pivot should be in the left sub array and all the elements that are greater than the pivot should be in the right sub array.
• Conquer: recursively sort the two sub arrays.
• Combine: combine all the sorted elements in a group to form a list of sorted
elements.
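As an illustration of these three steps, a minimal Python sketch using the last element as the pivot (one common choice; the notes do not fix a particular partition scheme):

    def quick_sort(a, low=0, high=None):
        """In-place quick sort with Lomuto partitioning, pivot = last element."""
        if high is None:
            high = len(a) - 1
        if low < high:
            p = partition(a, low, high)
            quick_sort(a, low, p - 1)    # conquer the left sub array
            quick_sort(a, p + 1, high)   # conquer the right sub array
        return a

    def partition(a, low, high):
        pivot = a[high]
        i = low
        for j in range(low, high):
            if a[j] <= pivot:            # smaller elements go to the left side
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[high] = a[high], a[i]    # place the pivot between the two sides
        return i

    print(quick_sort([30, 70, 20, 50, 40, 10, 60]))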
Analysis:
Best case:
If the array is always partitioned at the mid, then this brings out the best case efficiency of the algorithm. The recurrence relation of quick sort for obtaining the best case time complexity is:
C(n) = 2C(n/2) + n
and C(1) = 0,
where C(n/2) is the time required to sort the left or right sub array and n is the time required for partitioning the sub array.
Applying the master theorem with a = 2, b = 2 and d = 1 gives a = b^d, so C(n) ∈ θ(n^d log n).
Thus, the best case time complexity of quick sort is θ(n log₂ n).
The same result follows by substitution. Starting from
C(n) = 2C(n/2) + n
we assume n = 2^k, since each time the list is divided into two equal halves. Then the equation becomes:
C(2^k) = 2C(2^(k−1)) + 2^k
C(2^k) = 2[2C(2^(k−2)) + 2^(k−1)] + 2^k = 2²C(2^(k−2)) + 2·2^k
…
C(2^k) = 2^i C(2^(k−i)) + i·2^k
As C(1) = 0, putting i = k gives
C(2^k) = k·2^k
With k = log₂ n this is C(n) = n log₂ n.
Thus it is proved that the best case time complexity of quick sort is θ(n log₂ n).
Worst case:
The worst case for quick sort occurs when the pivot is the minimum or maximum of all the elements in the list, so that one sub array is always empty. In that case we can write:
C(n) = C(n − 1) + n
or C(n) = n + (n − 1) + (n − 2) + … + 2 + 1
But as we know, this sum equals n(n + 1)/2, so
C(n) = θ(n²)
Sorting Techniques
Selection Sort:
This method is used to sort an array in ascending or descending order. If an array has n elements, n − 1 iterations are required to sort it.
Scan the array to find its smallest element and swap it with the first element. Then, starting with the second element, scan the remaining list to find the smallest element and swap it with the second element. Then, starting from the third element, the remaining list is scanned to find the next smallest element. Continuing in this fashion we can sort the entire list.
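A short Python sketch of this scan-and-swap procedure:

    def selection_sort(a):
        """Sort list a in ascending order by repeated selection of the minimum."""
        n = len(a)
        for i in range(n - 1):           # n-1 iterations
            min_idx = i
            for j in range(i + 1, n):    # scan the unsorted part
                if a[j] < a[min_idx]:    # basic operation: key comparison
                    min_idx = j
            a[i], a[min_idx] = a[min_idx], a[i]   # one swap per iteration
        return a

    print(selection_sort([4, 19, 1, 13]))   # [1, 4, 13, 19]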
Example:
4 19 1 13
Iteration-1:
4 19 1 13
4 19 1 13
1 19 4 13
In iteration 1 the first element 4 is compared with 19; 19 is greater than 4, so the elements are not swapped. Then 4 is compared with 1; 1 is less than 4, so the elements are swapped.
Iteration-2:
1 19 4 13
1 4 19 13
In iteration 2 the second element 19 is compared with 4; 4 is less than 19, so the elements are swapped.
Iteration-3:
1 4 19 13
1 4 13 19
In iteration 3 the third element 19 is compared with 13; 13 is less than 19, so the elements are swapped.
Sorted array:
1 4 13 19
Analysis:
• Step 1: The input size is n, i.e. the total number of elements in the list.
• Step 2: The basic operation is the key comparison
If A[ j ] < A[ min ]
• Step 3: This basic operation depends only on the array size. Hence we can find the sum as
C(n) = outer loop on variable i × inner loop on variable j × basic operation
C(n) = ∑_{i=0}^{n−2} ∑_{j=i+1}^{n−1} 1
As we know that ∑_{i=l}^{u} 1 = u − l + 1, the inner sum is
∑_{j=i+1}^{n−1} 1 = (n − 1) − (i + 1) + 1 = n − 1 − i
so
C(n) = ∑_{i=0}^{n−2} (n − 1 − i)
C(n) = ∑_{i=0}^{n−2} (n − 1) − ∑_{i=0}^{n−2} i
We know that
∑_{i=1}^{n} i = n(n + 1)/2
so
∑_{i=0}^{n−2} i = (n − 2)(n − 2 + 1)/2 = (n − 2)(n − 1)/2
Therefore,
C(n) = (n − 1)((n − 2) − 0 + 1) − (n − 2)(n − 1)/2
C(n) = (n − 1)(n − 1) − (n − 2)(n − 1)/2
C(n) = (n − 1)² − (n − 2)(n − 1)/2
C(n) = (n − 1)[2(n − 1) − (n − 2)]/2 = n(n − 1)/2
Thus the time complexity of selection sort is θ(n²) for all inputs. But the total number of key swaps is only θ(n).
Bubble Sort:
The bubble sort method is used to arrange the values of an array in ascending or descending order. It works by repeatedly comparing adjacent elements and exchanging them if they are out of order.
Bubble sort is a slow process. It is used for sorting only small amounts of data.
Example:
4 19 1 3
Iteration-1:
4 19 1 3
4 19 1 3
4 1 19 3
4 1 3 19
Iteration-2:
4 1 3 19
1 4 3 19
1 3 4 19
In iteration 2, 4 is compared with 1, and since 1 is less than 4 the elements are exchanged. Then 4 is compared with 3, and since 3 is also less than 4 the elements are exchanged.
Iteration-3:
1 3 4 19
1 3 4 19
Sorted Array:
1 3 4 19
Analysis:
• Step 1: The input size is n, i.e. the total number of elements in the list.
• Step 2: The basic operation is the key comparison
If A[ j ] > A[ j+1 ]
• Step 3: This basic operation depends only on the array size. Hence we can find the sum as
C(n) = outer loop on variable i × inner loop on variable j × basic operation
C(n) = ∑_{i=0}^{n−2} ∑_{j=0}^{n−2−i} 1
As we know that ∑_{j=l}^{u} 1 = u − l + 1, the inner sum is
∑_{j=0}^{n−2−i} 1 = (n − 2 − i) − 0 + 1 = n − 1 − i
so
C(n) = ∑_{i=0}^{n−2} (n − 1 − i)
C(n) = ∑_{i=0}^{n−2} (n − 1) − ∑_{i=0}^{n−2} i
C(n) = (n − 1) ∑_{i=0}^{n−2} 1 − (n − 2)(n − 1)/2
C(n) = (n − 1)((n − 2) − 0 + 1) − (n − 2)(n − 1)/2
C(n) = (n − 1)(n − 1) − (n − 2)(n − 1)/2
C(n) = (n − 1)² − (n − 2)(n − 1)/2
Solving this equation as before, we get C(n) = n(n − 1)/2, so the time complexity of bubble sort is θ(n²).
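A minimal Python sketch whose two nested loops match the sum just computed:

    def bubble_sort(a):
        """Sort list a in ascending order by exchanging adjacent elements."""
        n = len(a)
        for i in range(n - 1):               # i = 0 .. n-2
            for j in range(n - 1 - i):       # j = 0 .. n-2-i
                if a[j] > a[j + 1]:          # basic operation: key comparison
                    a[j], a[j + 1] = a[j + 1], a[j]
        return a

    print(bubble_sort([4, 19, 1, 3]))   # [1, 3, 4, 19]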
Insertion Sort:
Insertion sort is a typical example of the decrease-by-one and conquer method. In this method each element is inserted at its appropriate position. Generally, insertion sort maintains a zone of sorted elements. When an unsorted element is encountered, it is compared with the elements in the sorted zone and then inserted at the proper position in the sorted zone. This process continues until we get the completely sorted list; hence the name insertion sort.
There are three ways by which insertion sort is implemented.
1. In this method of implementation the sorted sub array is scanned from left to right
until the first element greater than or equal to A[n -1] is encountered and then insert
A[n -1] right before that element.
10 20 30 50 40
As we get element 40 which is lesser than 50 we will insert 40 before 50. Hence the
list will be
10 20 30 40 50
2. In this method of implementation the sorted sub array is scanned from right to left until the first element smaller than or equal to A[n − 1] is encountered, and then A[n − 1] is inserted right after that element.
10 20 30 50 40
As we get element 50 which is greater than 40 we will insert 50 after 40. Hence the
list will be
10 20 30 40 50
This method is also called straight insertion sort, or simply insertion sort.
3. This third implementation method is called binary insertion sort because in this
method the smaller element is searched using binary search method.
For example:
Consider the list of elements to be
10 20 30 50 40 60
Thus, after the binary insertion sorting method, we get a sorted list as,
10 20 30 40 50 60
Example: sorting the list 30 70 20 50 40 10 60 step by step:
30 70 20 50 40 10 60
30 70 20 50 40 10 60
Compare 20 with the elements in the sorted zone and insert it at the appropriate position in that zone:
20 30 70 50 40 10 60
20 30 50 70 40 10 60
20 30 40 50 70 10 60
10 20 30 40 50 70 60
10 20 30 40 50 60 70
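A Python sketch of straight insertion sort, using the right-to-left scan described above:

    def insertion_sort(a):
        """Sort list a by growing a sorted zone at the front."""
        for i in range(1, len(a)):
            key = a[i]                    # next unsorted element
            j = i - 1
            while j >= 0 and a[j] > key:  # scan the sorted zone right to left
                a[j + 1] = a[j]           # shift larger elements right
                j -= 1
            a[j + 1] = key                # insert at the proper position
        return a

    print(insertion_sort([30, 70, 20, 50, 40, 10, 60]))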
Analysis:
Best case:
When the basic operation is executed only once in each iteration, it results in the best case time complexity. This happens for an almost sorted array. The number of key comparisons is
C_best(n) = ∑_{i=1}^{n−1} 1
C_best(n) = (n − 1) − 1 + 1
C_best(n) = n − 1
C_best(n) = θ(n)
Average case:
On average the scan stops about halfway through the sorted zone, so
C_avg(n) = ((n − 1)n / 2) × (1/2)
C_avg(n) = (n² − n) / 4
C_avg(n) ≈ n² / 4
C_avg(n) = θ(n²)
Worst case:
When the list is sorted in descending order and if we want to sort such a list in
ascending order then it results in worst case behavior of algorithm.
C_worst(n) = ∑_{i=1}^{n−1} ∑_{j=0}^{i−1} 1
Using ∑_{j=l}^{u} 1 = u − l + 1,
∑_{j=0}^{i−1} 1 = (i − 1) − 0 + 1 = i
so
C_worst(n) = ∑_{i=1}^{n−1} i
Since ∑_{i=1}^{n} i = n(n + 1)/2,
C_worst(n) = n(n − 1)/2
C_worst(n) = θ(n²)
That means the worst case time complexity of insertion sort is θ(n²).
Count Sort:
The counting sort is not based on comparisons as most sorting methods are. This method of sorting is used when all the elements to be sorted fall in a known, finite and reasonably small range.
In counting sort we are given an array A[1…n] of length n. We require two more arrays: the array B[1…n] holds the sorted output and the array C[1…k] provides temporary working storage.
Example:
1 4 5 7 2 9
Here the maximum number is 9, so the count array C is created with size 9. In C we write the frequency of each element; C' then holds the running (prefix) sums of C, which give the final positions of the elements.
1 2 3 4 5 6 7 8 9
C 1 1 0 1 1 0 1 0 1
1 2 3 4 5 6 7 8 9
C’ 1 2 2 3 4 4 5 5 6
1st calculation:
1 2 3 4 5 6
B 1
1 2 3 4 5 6 7 8 9
C’ 0 2 2 3 4 4 5 5 6
2nd calculation:
1 2 3 4 5 6
B 1 4
1 2 3 4 5 6 7 8 9
C’ 0 2 2 2 4 4 5 5 6
3rd calculation:
1 2 3 4 5 6
B 1 4 5
1 2 3 4 5 6 7 8 9
C’ 0 2 2 2 3 4 5 5 6
4th calculation:
1 2 3 4 5 6
B 1 4 5 7
1 2 3 4 5 6 7 8 9
C’ 0 2 2 2 3 4 4 5 6
5th calculation:
1 2 3 4 5 6
B 1 2 4 5 7
1 2 3 4 5 6 7 8 9
C’ 0 1 2 2 3 4 4 5 6
6th calculation:
1 2 3 4 5 6
B 1 2 4 5 7 9
1 2 3 4 5 6 7 8 9
C’ 0 1 2 2 3 4 4 5 5
Sorted array:
1 2 4 5 7 9
Analysis:
The time to read in the numbers and increment the appropriate elements of the count array is O(n); forming the running sums takes O(k), so the total running time is O(n + k).
In practice we usually use the counting sort algorithm when k = O(n), in which case the running time is O(n).
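A Python sketch of the procedure traced above (variable names follow the notes' A, B and C; 0-based Python indexing replaces the 1-based arrays):

    def counting_sort(A, k):
        """Sort A whose elements lie in 1..k, using counts and running sums."""
        n = len(A)
        C = [0] * (k + 1)
        for x in A:                 # frequencies
            C[x] += 1
        for i in range(2, k + 1):   # running sums: C[i] = final position of value i
            C[i] += C[i - 1]
        B = [0] * n
        for x in reversed(A):       # place each element at its final position
            B[C[x] - 1] = x
            C[x] -= 1
        return B

    print(counting_sort([1, 4, 5, 7, 2, 9], 9))   # [1, 2, 4, 5, 7, 9]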
Min Heap:
A min heap is a tree in which the value of each node is less than or equal to the values of its children nodes.
Max Heap:
A max heap is a tree in which the value of each node is greater than or equal to the values of its children nodes.
The parent being greater (or lesser) than its children is called the parental property. Thus a heap has two important properties, listed under "Properties of heap" below.
Before understanding the construction of heap let us learn few basics that are required
while constructing heap.
Level of binary tree: the root of the tree is always at level 0, and any node is always at a level one more than its parent node's level. E.g.:
         10           ← level 0
       20    30       ← level 1
     20  20    20     ← level 2
Height of the tree: the maximum level is the height of the tree; the height of the tree is also called the depth of the tree. E.g. in the above diagram the maximum level is 2, hence the height of this tree is 2.
Complete binary tree: a complete binary tree is a binary tree in which all leaves are at the same depth and the total number of nodes at each level i is 2^i.
Almost complete binary tree: an almost complete binary tree is a tree in which:
• Each node has a left child whenever it has a right child. That means there is always a left child for a right child, but for a left child there may not be a right child.
• Every leaf must be present at height h or h − 1. That means all the leaves are on two adjacent levels.
Properties of heap:
1. Shape property: the heap is an almost complete binary tree.
2. Parental property: every node satisfies the parental property with respect to its children (greater for a max heap, lesser for a min heap).
Heap Sort:
Heap sort works in two stages:
• Heap construction
• Deletion of maximum key
Deletion of maximum key: delete the root key, (n − 1) times in all, from the remaining heap; hence we get the elements in decreasing order. For an array implementation of the heap, delete the element from the heap and put the deleted element in the last position of the array. Thus, after deleting all the elements one by one and collecting them in the array starting from its last index, the array is sorted in ascending order.
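A Python sketch of the two stages, using an explicit sift-down on a max heap rather than any library helper:

    def heap_sort(a):
        """Sort a in place: build a max heap, then repeatedly delete the root."""
        n = len(a)

        def sift_down(i, size):
            while 2 * i + 1 < size:
                child = 2 * i + 1
                if child + 1 < size and a[child + 1] > a[child]:
                    child += 1                 # pick the larger child
                if a[i] >= a[child]:
                    break                      # parental property holds
                a[i], a[child] = a[child], a[i]
                i = child

        for i in range(n // 2 - 1, -1, -1):    # stage 1: heap construction
            sift_down(i, n)
        for end in range(n - 1, 0, -1):        # stage 2: deletion of maximum key
            a[0], a[end] = a[end], a[0]        # move root to the last position
            sift_down(0, end)
        return a

    print(heap_sort([30, 70, 20, 50, 40, 10, 60]))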
Analysis:
Running time = time required by the 1st stage (heap construction) + time required by the 2nd stage (deletions).
Let C₂(n) be the number of key comparisons needed for eliminating the root keys from heaps of sizes n down to 2. Then we can establish the following relation:
C₂(n) ≤ 2 ∑_{i=1}^{n−1} log₂ i
But, by a standard bound,
∑_{i=1}^{n−1} log₂ i ≤ (n − 1) log₂ (n − 1)
so C₂(n) ∈ O(n log n). Combined with the heap construction stage, the time complexity of heap sort is θ(n log n) in both the worst and the average case.
Radix Sort:
This is a sorting method in which data is sorted by scanning each digit of every element. We sort the list on the least significant digit (LSD) in the first pass and then move towards the most significant digit (MSD), which results in sorted data.
Example:
Phase 1 (after sorting on the least significant digit):
82 64 25 55 37 18 78
Phase 2 (after sorting on the most significant digit):
18 25 37 55 64 78 82
Sorted array:
18 25 37 55 64 78 82
Analysis
The running time depends on the stable sort used as the intermediate sorting algorithm. When each digit is in the range 1 to k, and k is not too large, counting sort is the obvious choice. In the case of counting sort, each pass over n d-digit numbers takes θ(n + k) time. There are d passes, so the total time for radix sort is θ(dn + dk). When d is constant and k = O(n), the best and worst case time complexity of radix sort is θ(n).
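A Python sketch of least-significant-digit radix sort for non-negative integers; a stable per-digit bucket pass plays the role of the intermediate sort (base 10 is an assumption):

    def radix_sort(a, base=10):
        """Sort non-negative integers digit by digit, least significant first."""
        if not a:
            return a
        exp = 1
        while max(a) // exp > 0:            # one stable pass per digit
            buckets = [[] for _ in range(base)]
            for x in a:
                buckets[(x // exp) % base].append(x)
            a = [x for b in buckets for x in b]   # concatenation keeps stability
            exp *= base
        return a

    print(radix_sort([82, 64, 25, 55, 37, 18, 78]))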
Bucket Sort
Bucket sort is a technique of sorting the elements with the help of buckets. The buckets hold their elements in a special data structure, a linked list. Like counting sort, bucket sort uses knowledge of the input data.
Example
Input: .78, .17, .39, .26, .72, .94, .21, .12, .23, .68
Each element x is placed into bucket B[⌊10x⌋]:
B[0]: /
B[1]: .17 → .12 → /
B[2]: .26 → .21 → .23 → /
B[3]: .39 → /
B[4]: /
B[5]: /
B[6]: .68 → /
B[7]: .78 → .72 → /
B[8]: /
B[9]: .94 → /
After sorting each bucket and concatenating them, we get the sorted list: .12, .17, .21, .23, .26, .39, .68, .72, .78, .94.
Analysis
If we are using an array of buckets, each item gets mapped to the right bucket in O(1) time. With uniformly distributed keys, the expected number of items per bucket is 1, so sorting each bucket takes O(1) expected time.
The total effort of bucketing, sorting the buckets and concatenating the sorted buckets together is O(n).
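A Python sketch for keys uniformly distributed in [0, 1), mirroring the example above:

    def bucket_sort(a):
        """Sort values in [0, 1) by distributing them into n buckets."""
        n = len(a)
        buckets = [[] for _ in range(n)]
        for x in a:
            buckets[int(n * x)].append(x)   # map each key to its bucket in O(1)
        for b in buckets:
            b.sort()                        # expected O(1) items per bucket
        return [x for b in buckets for x in b]

    print(bucket_sort([.78, .17, .39, .26, .72, .94, .21, .12, .23, .68]))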
Chapter 3
Greedy Algorithms
Basics of graphs:
A graph is a collection of vertices or nodes, connected by a collection of edges. Formally, a graph G is a finite pair (V, E) where V is a set of vertices and E is a set of edges.
Directed graph
A directed graph G consists of a finite set V, called the vertices or nodes, and a set E of ordered pairs of vertices, called the edges of G.
Undirected graph
An undirected graph G consists of a finite set V of vertices and a set E of unordered pairs of vertices, the edges of G.
In a graph, the number of edges coming out of a vertex is called the out-degree of
that vertex and the number of edges coming in is called the in-degree. In an undirected
graph we just talk about the degree of a vertex as the number of incident edges. By the
degree of a graph, we usually mean the maximum degree of its vertices.
Representation of graphs
1. Adjacency matrix
2. Adjacency list
Adjacency matrix:
In this method the graph is represented by an n×n matrix A such that A[i][j] = 1 if there is an edge from vertex i to vertex j, and 0 otherwise. If the graph has weights we can store the weights in the matrix.
Adjacency list:
In this method the graph is represented using linked lists: there is an array containing one linked list for each vertex V of the graph, which holds the vertices adjacent to V. If the edges have weights then these weights can be stored in the linked list elements. For example:
1: 2 → 3
2: 4
3: 3 → 4
4: 2
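A small Python sketch showing both representations for the adjacency list just given (treating the graph as directed and unweighted is our assumption; the original figure is not available):

    edges = [(1, 2), (1, 3), (2, 4), (3, 3), (3, 4), (4, 2)]
    n = 4

    # adjacency matrix: A[i][j] = 1 if there is an edge from i to j
    A = [[0] * (n + 1) for _ in range(n + 1)]
    for u, v in edges:
        A[u][v] = 1

    # adjacency list: one list of neighbours per vertex
    adj = {u: [] for u in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)

    print(A[1][3], adj[3])   # 1 [3, 4]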
Greedy method
In a greedy algorithmic strategy, the decision at each step is taken based on the information available so far. The greedy method is a straightforward method, and it is popular for obtaining optimized solutions. In the greedy technique the solution is constructed through a sequence of steps, each expanding a partially constructed solution obtained so far, until a complete solution to the problem is reached. At each step the choice made should be:
Locally optimal: Among all feasible solutions the best choice is to be made.
Irrevocable: Once the particular choice is made then it should not get changed on
subsequent steps.
A spanning tree S of a graph G is a subgraph in which all the vertices of G are present but which may not contain all the edges, and which forms a tree. Two greedy algorithms for constructing a minimum spanning tree are:
1. Prim’s algorithm
2. Kruskal’s algorithm
Prim’s algorithm
In Prim’s algorithm the edge with the minimum weight is chosen first. Then, among the edges adjacent to the vertices already chosen, whichever edge has the minimum weight is selected. This process is continued until all the vertices are covered. The necessary condition in this case is that no circuit may be formed; an edge is skipped if it would close a circuit (as in the sketch below).
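A Python sketch of Prim’s algorithm using a priority queue; the sample graph is a made-up illustration, not the figure from the notes:

    import heapq

    def prim(adj, start):
        """adj: {u: [(weight, v), ...]}. Returns a list of MST edges (w, u, v)."""
        visited = {start}
        heap = [(w, start, v) for w, v in adj[start]]
        heapq.heapify(heap)
        mst = []
        while heap and len(visited) < len(adj):
            w, u, v = heapq.heappop(heap)     # minimum-weight edge out of the tree
            if v in visited:
                continue                      # would form a circuit; skip
            visited.add(v)
            mst.append((w, u, v))
            for w2, x in adj[v]:
                if x not in visited:
                    heapq.heappush(heap, (w2, v, x))
        return mst

    graph = {'a': [(4, 'b'), (2, 'c')], 'b': [(4, 'a'), (1, 'c'), (5, 'd')],
             'c': [(2, 'a'), (1, 'b'), (8, 'd')], 'd': [(5, 'b'), (8, 'c')]}
    print(prim(graph, 'a'))   # [(2, 'a', 'c'), (1, 'c', 'b'), (5, 'b', 'd')]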
Kruskal’s algorithm
In this algorithm the edge of minimum weight is selected each time from the graph, and again no circuit may be formed. It is not necessary in this algorithm for the selected minimum-weight edges to be adjacent.
Example:
Step 1:
{a b}{c}{d}{e}{f}{g}{h}{i}
Step 2:
{a b c}{d}{e}{f}{g}{h}{i}
Step 3:
{a b c d e}{f}{g}{h}{i}
Step 4:
{a b c d e f i}{g}{h}
Step 5:
{a b c d e f g h i}
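The sets shown in these steps are exactly the components maintained by a union-find structure. A Python sketch (the edge data is illustrative, not the notes' graph):

    def kruskal(n_vertices, edges):
        """edges: list of (weight, u, v) with vertices 0..n_vertices-1."""
        parent = list(range(n_vertices))

        def find(x):                      # find the set (component) of x
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        mst = []
        for w, u, v in sorted(edges):     # edges in increasing order of weight
            ru, rv = find(u), find(v)
            if ru != rv:                  # different sets: no circuit formed
                parent[ru] = rv           # union the two sets
                mst.append((w, u, v))
        return mst

    edges = [(4, 0, 1), (2, 0, 2), (1, 1, 2), (5, 1, 3), (8, 2, 3)]
    print(kruskal(4, edges))   # [(1, 1, 2), (2, 0, 2), (5, 1, 3)]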
Huffman’s Codes
Example:
Consider the string “duke blue devils”.
Step 1: count the character frequencies:
e:3, d:2, u:2, l:2, space:2, k:1, b:1, v:1, i:1, s:1
Step 2:
Next we use a greedy algorithm to build up a Huffman tree. We start with a node for each character.
We then repeatedly pick the two nodes with the smallest frequencies and combine them together to form a new node.
Now we assign codes to the tree by placing a 0 on every left branch and a 1 on every right
branch.
e 00
d 010
u 011
l 100
Sp 101
i 1100
s 1101
k 1110
b 11110
v 11111
These codes are then used to encode the string. Thus, “duke blue devils” turns into:
010 011 1110 00 101 11110 100 011 00 101 010 00 11111 1100 100 1101
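A Python sketch of this greedy construction using heapq. Note that different tie-breaking among equal frequencies can produce a different, but equally optimal, code table than the one above:

    import heapq
    from collections import Counter

    def huffman_codes(text):
        """Build a Huffman code table for text; returns {char: bitstring}."""
        heap = [(freq, i, {ch: ''})
                for i, (ch, freq) in enumerate(Counter(text).items())]
        heapq.heapify(heap)
        i = len(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)      # two smallest-frequency nodes
            f2, _, c2 = heapq.heappop(heap)
            merged = {ch: '0' + code for ch, code in c1.items()}       # left: 0
            merged.update({ch: '1' + code for ch, code in c2.items()}) # right: 1
            heapq.heappush(heap, (f1 + f2, i, merged))
            i += 1
        return heap[0][2]

    codes = huffman_codes("duke blue devils")
    print(' '.join(codes[ch] for ch in "duke blue devils"))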
Dijkstra’s algorithm
Dijkstra’s algorithm finds the shortest path from some source node to some other destination node. The source node, i.e. the node from which we start measuring the distance, is called the start node, and the destination node is called the end node. In this algorithm we start from the start node and find the distances of all the paths from it to the neighbouring nodes. Among those, the nearest node is selected. This process of finding the nearest node is repeated till the end node. Whichever path results is called the shortest path.
Since all the candidate paths are tried and the shortest among them is chosen at each step, this is a greedy algorithm. Note also that the shortest path need not pass through all the vertices, so the algorithm does not in general give a spanning tree of the graph.
Example
T = set of remaining nodes
Step 1:
Dist (b) = min {old dist (b), dist (a) + w (a, b)}
Step 2:
Dist (b) = min {old dist (b), dist (d) + w (b, d)}
Step 3:
Dist (b) = min {old dist (b), dist (f) + w (f, b)}
Now the target vertex for finding the shortest path is z. Hence the length of the shortest path from vertex a to z is 23.
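A Python sketch of the relaxation rule Dist(v) = min{old dist(v), dist(u) + w(u, v)} driven by a priority queue (the graph below is illustrative, not the notes' figure):

    import heapq

    def dijkstra(adj, start):
        """adj: {u: [(v, w), ...]} with non-negative weights. Returns dist map."""
        dist = {u: float('inf') for u in adj}
        dist[start] = 0
        heap = [(0, start)]
        while heap:
            d, u = heapq.heappop(heap)        # nearest remaining node
            if d > dist[u]:
                continue                      # stale queue entry
            for v, w in adj[u]:
                if d + w < dist[v]:           # relaxation step
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        return dist

    graph = {'a': [('b', 5), ('c', 2)], 'b': [('z', 9)],
             'c': [('b', 1), ('z', 30)], 'z': []}
    print(dijkstra(graph, 'a'))   # {'a': 0, 'b': 3, 'c': 2, 'z': 12}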
Chapter 4
Dynamic Programming
Dynamic Programming
Dynamic programming is typically applied to optimization problems. The technique was invented by the U.S. mathematician Richard Bellman in the 1950s. In the term dynamic programming, the word programming stands for planning; it does not mean computer programming.
Warshall Algorithm
The Warshall algorithm, more precisely the Floyd-Warshall algorithm, finds the shortest path between every pair of vertices in a graph. It works for both directed and undirected graphs. The algorithm was invented by Robert Floyd and Stephen Warshall, hence it is often called the Floyd-Warshall algorithm.
The algorithm maintains a distance matrix D and a route matrix R. D(k) and R(k) denote these matrices after the first k vertices (here A, M, L, B, X, numbered 1 to 5) have been allowed as intermediate vertices; ∞ means no path has been found yet. The snapshots below show how D and R evolve.

D(0) (direct distances):
   A   M   L   B   X
A  ∞   23  10  ∞   18
M  23  ∞   10  48  ∞
L  10  10  ∞   19  7
B  ∞   48  19  ∞   20
X  18  ∞   7   20  ∞
R(0):
   A  M  L  B  X
A  1  2  3  4  5
M  1  2  3  4  5
L  1  2  3  4  5
B  1  2  3  4  5
X  1  2  3  4  5

D(1) (paths may pass through A):
   A   M   L   B   X
A  ∞   23  10  ∞   18
M  23  46  10  48  41
L  10  10  20  19  7
B  ∞   48  19  ∞   20
X  18  41  7   20  36
R(1):
   A  M  L  B  X
A  1  2  3  4  5
M  1  1  3  4  1
L  1  2  1  4  5
B  1  2  3  4  5
X  1  1  3  4  1

D(2) (paths may pass through A, M):
   A   M   L   B   X
A  46  23  10  71  18
M  23  46  10  48  41
L  10  10  20  19  7
B  71  48  19  96  20
X  18  41  7   20  36
R(2):
   A  M  L  B  X
A  2  2  3  2  5
M  1  1  3  4  1
L  1  2  1  4  5
B  2  2  3  2  5
X  1  1  3  4  1

D(3) (paths may pass through A, M, L):
   A   M   L   B   X
A  20  20  10  29  17
M  20  20  10  29  17
L  10  10  20  19  7
B  29  29  19  38  20
X  17  17  7   20  14
R(3):
   A  M  L  B  X
A  3  3  3  3  3
M  3  3  3  3  3
L  1  2  1  4  5
B  3  3  3  3  5
X  3  3  3  4  3

D(4) and R(4) (paths through A, M, L, B) are unchanged from D(3) and R(3), since going through B improves no distance.

D(5) (all vertices allowed as intermediates):
   A   M   L   B   X
A  20  20  10  29  17
M  20  20  10  29  17
L  10  10  14  19  7
B  29  29  19  38  20
X  17  17  7   20  14
R(5):
   A  M  L  B  X
A  3  3  3  3  3
M  3  3  3  3  3
L  1  2  5  4  5
B  3  3  3  3  5
X  3  3  3  4  3
Chapter 5
String Matching
Introduction
String matching is a very important subject in the wider domain of text processing. String matching algorithms are basic components used in the implementation of practical software existing under most operating systems. Moreover, they emphasize programming methods that serve as paradigms in other fields of computer science, such as systems programming or software design. String matching algorithms organize the data in various ways.
Pattern matching
A string is a sequence of characters. Let text[0…n−1] be a string of length n and pattern[0…m−1] be a string of length m; then pattern matching is the technique of finding a substring of text which is equal to pattern. The pattern matching problem is also called PMP.
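For reference, the brute-force matcher that the algorithms below improve upon; a minimal Python sketch:

    def brute_force_match(text, pattern):
        """Return the first index where pattern occurs in text, or -1."""
        n, m = len(text), len(pattern)
        for i in range(n - m + 1):            # try every alignment
            j = 0
            while j < m and text[i + j] == pattern[j]:
                j += 1
            if j == m:
                return i                      # full match found
        return -1                             # worst case O(n*m) comparisons

    print(brute_force_match("abcabcabd", "abcabd"))   # 3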
Knuth-Morris-Pratt’s algorithm compares the text with the pattern from left to right, but the shift is made more intelligently when a mismatch occurs: the shift distance is determined by the widest border of the matching prefix of the pattern P.
For instance, suppose the symbols at positions 0 to 4 have matched but the symbols at position 5 do not match. The shift distance is then determined by the widest border of the matching prefix, here abcab.
The KMP algorithm makes use of the information gained from previous symbol comparisons: it never re-compares a text symbol that has already matched a pattern symbol. Hence the time complexity of the KMP algorithm is O(n).
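A Python sketch of KMP; the border table holds, for each prefix of the pattern, the width of its widest border (the longest proper prefix that is also a suffix):

    def kmp_search(text, pattern):
        """Return the first index where pattern occurs in text, or -1."""
        m = len(pattern)
        border = [0] * m                      # border[i]: widest border of pattern[:i+1]
        k = 0
        for i in range(1, m):
            while k > 0 and pattern[i] != pattern[k]:
                k = border[k - 1]
            if pattern[i] == pattern[k]:
                k += 1
            border[i] = k

        j = 0                                 # matched pattern symbols so far
        for i, ch in enumerate(text):
            while j > 0 and ch != pattern[j]:
                j = border[j - 1]             # shift by the widest border
            if ch == pattern[j]:
                j += 1
            if j == m:
                return i - m + 1
        return -1

    print(kmp_search("abcabdabcabc", "abcabc"))   # 6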
Boyer-Moore algorithm
The Boyer-Moore algorithm uses this idea when a mismatch occurs. How much the pattern is to be shifted is decided by the bad symbol shift, which is based on the text character at which the mismatch occurs:
• If the rightmost character of the pattern does not match, shift the pattern to the right by the pattern’s length.
• If the rightmost character of the pattern matches the text, then move from right to left and compare each character with the text. If at some point a mismatching text character T occurs after k matches (k > 0), then first build the shift table and then compute the bad symbol shift.
Let us first understand the procedure for building the shift table. The following two steps are used to compute it:
1. Fill every entry of the table with the pattern’s length m.
2. For each character that occurs among the first m − 1 characters of the pattern, overwrite its entry with the distance from its rightmost such occurrence to the last character of the pattern.
For example
For the word RADAR the length of the pattern is 5. Hence the entries of all alphabets except R, A and D will be filled with 5.
Now, scanning from right to left, character A appears at distance 1 from the last character, hence the entry for alphabet A is 1. Similarly, the entry for D is 2 and for R is 4.
After building the shift table, the bad symbol shift can be computed. It is denoted by d1.
For example
If we want to search for the pattern EVENING in a given text, we construct the shift table first. Scanning the first six characters E V E N I N from the right, we get the entries N = 1, I = 2, E = 4, V = 5, and 7 for every other character (including G, which occurs only in the last position).
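A Python sketch of this table construction together with the resulting bad-symbol search (this is the Horspool simplification, which uses only d1; full Boyer-Moore also applies the good suffix shift d2 described next):

    def shift_table(pattern):
        """Bad-symbol shift table: distance from a char's rightmost occurrence
        (excluding the last position) to the last character; default is m."""
        m = len(pattern)
        table = {}
        for i, ch in enumerate(pattern[:-1]):
            table[ch] = m - 1 - i
        return table

    def horspool_search(text, pattern):
        """Return the first index of pattern in text using bad-symbol shifts, or -1."""
        m, n = len(pattern), len(text)
        table = shift_table(pattern)
        i = m - 1                             # index in text aligned with pattern's last char
        while i < n:
            k = 0
            while k < m and pattern[m - 1 - k] == text[i - k]:
                k += 1                        # compare right to left
            if k == m:
                return i - m + 1
            i += table.get(text[i], m)        # shift by the table entry
        return -1

    print(shift_table("EVENING"))   # {'E': 4, 'V': 5, 'N': 1, 'I': 2}
    print(horspool_search("A GOOD EVENING", "EVENING"))   # 7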
Good suffix shift:
This shift helps in shifting safely when a matched part of the pattern is obtained. It is denoted by d2 and is applied after 0 < k < m last characters were matched. The good suffix shift d2 can be computed as:
d2 = distance between the matched suffix of size k and its rightmost occurrence in the pattern that is not preceded by the same character as the suffix
Example:
d2 (2) = 4
If there is no such occurrence of the suffix in the pattern, then match the longest part of the k-character suffix with a corresponding prefix of the pattern; and if such a prefix-suffix match does not exist in the pattern either, then d2 = m.
Example:
d2 (1) = 5
d2 (3) = 3
d2 (4) = 3
k   Pattern   d2
1   TOMATO    6
2   TOMATO    4
3   TOMATO    4
4   TOMATO    4
5   TOMATO    4
The worst case efficiency of Boyer-Moore algorithm for searching the first occurrence of
the pattern is θ (n) where n is the length of the text.