Ada Notes
INTRODUCTION
1.1 Notion of algorithm
The reference to instructions in the definition implies that there is something or someone
capable of understanding and following the instructions given. We call this a "computer",
keeping in mind that before the electronic computer was invented, the word "computer" meant
a human being employed to perform numeric calculations.
As examples illustrating the notion of an algorithm, we consider in this section three methods
for solving the same problem: computing the greatest common divisor of two integers.
These examples help us to illustrate several important points
1. The non-ambiguity requirement for each step of an algorithm can’t be compromised
2. The range of inputs for which an algorithm works has to be specified carefully
3. The same algorithm can be represented in several different ways
4. Several algorithms for solving the same problem may exist
5. Algorithms for the same problem can be based on very different ideas and can solve the
problem with dramatically different speeds
Euclid's algorithm
Euclid(m, n)
//Computes gcd(m, n) by Euclid's algorithm
//Input: two non-negative integers m and n, not both 0
//Output: the greatest common divisor of m and n
while n != 0 do
    r <- m mod n
    m <- n
    n <- r
return m
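As a concrete illustration, here is a small C sketch of the pseudocode above (the function and driver names are mine, not the notes'):

#include <stdio.h>

/* Euclid's algorithm: gcd(m, n) for non-negative m and n, not both zero. */
unsigned int gcd(unsigned int m, unsigned int n)
{
    while (n != 0) {
        unsigned int r = m % n;   /* r <- m mod n */
        m = n;                    /* m <- n       */
        n = r;                    /* n <- r       */
    }
    return m;
}

int main(void)
{
    printf("%u\n", gcd(60, 24));  /* prints 12 */
    return 0;
}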
Understanding the problem: the first thing you need to do before designing an algorithm is
to understand the given problem completely. Read the problem carefully and ask questions if
you have any doubts. Work through some small examples, think about special cases, and ask
questions again if needed.
The use of well-chosen data structures is often a crucial factor in the design of
efficient algorithms.
Many years ago, Niklaus Wirth summarized the fundamental importance of algorithms and data
structures for computer programming in the formula: algorithms + data structures = programs.
In the newer world of object-oriented programming, data structures remain of crucial
importance for both the design and the analysis of algorithms.
Once you have designed an algorithm, you need to specify it in some fashion. Euclid's
algorithm above is specified both in words and in pseudocode.
Pseudocode is a mixture of a natural language and programming-language-like
constructs.
Once an algorithm has been specified, we have to prove its correctness; that is, we have to
prove that the algorithm yields the required result for every legitimate input in a finite
amount of time.
For some algorithms the proof of correctness is quite easy; for others it can be quite
complex. A common technique for proving correctness is mathematical induction, because an
algorithm's iterations provide a natural sequence of steps needed for such proofs.
Analyzing an algorithm:
After correctness, the most important property of an algorithm is efficiency. There are two
kinds of algorithm efficiency:
1. Time efficiency: indicates how fast the algorithm runs.
2. Space efficiency: indicates how much extra memory the algorithm needs.
Important problem types include:
1. Sorting.
2. Searching.
Sorting:
The sorting problem asks us to rearrange the items of a given list in ascending order. Sorting
is used in many places: for example, a library arranges its books in some order, and companies
keep sorted records about their employees.
Sorting also makes many questions about a list easier to answer.
String processing:
A string is a sequence of characters from an alphabet. Strings of particular interest are
text strings, which comprise letters, numbers and special characters, and bit strings, which
comprise zeros and ones. One particular problem, that of searching for a given word in a text,
has attracted special attention from researchers; it is called string matching.
Graph problems:
One of the oldest and most interesting areas in algorithmics is graph algorithms. A graph
can be thought of as a collection of points called vertices, some of which are connected by
line segments called edges. Graphs are used for modeling a wide variety of real-life
applications, including transportation and communication networks, project scheduling,
and games. One interesting recent application is the estimation of the Web's diameter,
which is the maximum number of links one needs to follow to reach one Web page from another
by the most direct route between them.
Basic graph algorithms include graph traversal algorithms, shortest-path algorithms,
and topological sorting for graphs with directed edges.
Some graph problems are computationally very hard; there is little chance of solving such
problems exactly in an acceptable amount of time. Two of them are the traveling salesman
problem and the graph-coloring problem. The traveling salesman problem asks for the shortest
tour through n cities that visits every city exactly once. The graph-coloring problem asks us
to assign the smallest number of colors to the vertices of a graph so that no two adjacent
vertices are the same color.
Combinatorial problems:
These are among the most difficult problems, for two reasons:
1. The number of combinatorial objects typically grows extremely fast with the problem size.
2. There are no known algorithms that solve such problems exactly in an acceptable amount of time.
Geometric problems:
Geometric problems deal with geometric objects such as points, lines, and polygons.
Analysis framework:
A general framework is used for analyzing the efficiency of algorithms. There are two kinds of
efficiency: time efficiency and space efficiency.
Time efficiency: indicates how fast the algorithm runs.
Space efficiency: indicates how much extra memory the algorithm needs.
Orders of Growth
When a problem's input size is small, most algorithms will exhibit similar efficiencies. For
large input size, different algorithms will exhibit significantly different behavior. Therefore,
for large values of n, the function's order of growth becomes important. Values of several
functions important for analysis of algorithms
n        log2 n   n        n log2 n    n^2      n^3      2^n        n!
10       3.3      10       3.3*10^1    10^2     10^3     10^3       3.6*10^6
10^2     6.6      10^2     6.6*10^2    10^4     10^6     1.3*10^30  9.3*10^157
10^3     10       10^3     1.0*10^4    10^6     10^9
10^4     13       10^4     1.3*10^5    10^8     10^12
10^5     17       10^5     1.7*10^6    10^10    10^15
10^6     20       10^6     2.0*10^7    10^12    10^18
Problem: Given a list of n elements and a search key K, find an element equal to K, if any.
Algorithm: Scan the list and compare its successive elements with K until either a matching
element is found (successful search) or the list is exhausted (unsuccessful search).
Worst case: Cworst(n) = n
Best case: Cbest(n) = 1
Average case:
Let p be the probability of a successful search (with each position equally likely). Then
Cavg(n) = p(n + 1)/2 + n(1 - p)
For p = 1 (the search must succeed), Cavg(n) = (n + 1)/2.
For p = 0 (the search must fail), Cavg(n) = n.
Exact formula
e.g., C(n) = n(n-1)/2
Formula indicating order of growth with specific multiplicative constant
e.g., C(n) ≈ 0.5 n^2
Formula indicating order of growth with unknown multiplicative constant
e.g., C(n) ≈ c n^2
Big-Theta Notation
t(n) ∈ Θ(g(n)): a function t(n) is said to be in Θ(g(n)) if it is bounded both above and
below by constant multiples of g(n) for all large n, i.e., if
there exist positive constants c1 and c2 and a non-negative integer n0 such that
c2·g(n) <= t(n) <= c1·g(n) for every n >= n0
Example: (1/2)n(n-1) is Θ(n^2)
Computing the limit of the ratio of the two functions under consideration (the limit-based
approach) is often more convenient than working directly from the definition. Calculus
techniques such as L'Hôpital's rule and Stirling's formula can be used in computing the limits.
n! factorial
2n exponential
n3 cubic
n2 quadratic
n log n n log n
n linear
log n logarithmic
1 constant
Caution: In defining asymptotic efficiency classes, the values of multiplicative constants are
usually left unspecified.
Empirical analysis of algorithms:
Collect and analyze the empirical data (basic-operation counts or timings), and present the
data in tabular or graphical form.
Compute the ratios M(n)/g(n), where g(n) is a candidate to represent the efficiency of the
algorithm in question.
Compute the ratios M(2n)/M(n) to see how the running time reacts to a doubling of the input
size.
Examine the shape of the plot:
A concave shape for the logarithmic algorithm
A straight line for a linear algorithm
Convex shapes for quadratic and cubic algorithms
An exponential algorithm requires a logarithmic scale for the vertical axis.
Recursive algorithm: an algorithm that invokes (makes reference to) itself repeatedly until a
certain termination condition is met.
Computing factorial function
Tower of Hanoi puzzle
Digits in binary representation
Example 1: MaxElement
The basic operation is the comparison inside the loop; it is executed once on each repetition
of the loop, i.e., n - 1 times, so C(n) = n - 1 ∈ Θ(n).
Example 2: Matrix multiplication
One multiplication is executed on each repetition of the innermost loop (on k), and we
need to find the total number of multiplications made for all pairs of i and j.
The algorithm computes n^2 elements of the product matrix. Each element is
computed as the scalar (dot) product of an n-element row of A and an n-element
column of B, which takes n multiplications, so the total is M(n) = n^3.
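A brief C sketch of the triple-loop algorithm this count refers to (not a listing from the notes; N is fixed here only to keep the sketch self-contained):

#define N 3

/* Brute-force matrix multiplication: C[i][j] is the dot product of row i of A
 * and column j of B, so the innermost line runs N*N*N times in total.       */
void matrix_multiply(double A[N][N], double B[N][N], double C[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];   /* one multiplication per pass */
        }
}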
Constraint: the entries of the sparse matrix SM must be sorted in ascending order, using the
row index and the col index as the primary and secondary keys, respectively.
Transpose of M is M':
M'[i].row = M[i].col
M'[i].col = M[i].row
Transposing a sparse matrix therefore means swapping the row and col indices of all terms, in
combination with sorting of the terms.
Given a matrix M, find its transpose matrix M' (simple algorithm):
Outer loop over the col index of the original matrix (n values), used as the primary key
(one pass per row i of M').
Inner loop over the number of non-zero elements (t): for each element, pick up its col number,
and if it matches i, exchange its row and col indices (i.e., emit the transposed term).
Basic operation: check whether a term belongs to the new row i of M', i.e., M[j].col = i.
The runtime complexity of the algorithm is O(nt).
For a dense matrix, t ≈ n^2, so the complexity is of the order O(n^3).
Given a matrix M, find its transpose matrix M' (fast algorithm): the process involves four
isolated loops:
1. Initialize the number of terms per row of M'.
2. Count the number of terms per row of M': scan all the non-zero elements and put them into
   bins, one bin per row of M'.
3. Determine the starting position of each new row in M': scan the counts and decide where
   each bin (row) begins.
4. Create the transpose matrix M': for each term, find the correct position from its old col
   number and swap the row and col indices.
The runtime complexity of the algorithm is O(n + t): the four loops listed above are O(n),
O(t), O(n) and O(t), respectively, so putting them together and ignoring constant factors
gives O(n + t).
For a dense matrix, t ≈ n^2, so the complexity is of the order O(n^2).
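A C sketch of this four-loop fast transpose; the triple representation used here (element 0 holding the number of rows, columns and non-zero terms) is an assumption for the sketch, not a quoted listing from the notes:

#define MAX_TERMS 100

typedef struct { int row, col, value; } term;

/* O(n + t) fast transpose: count the terms per column of M (= per row of M'),
 * turn the counts into the starting index of each row of M', then drop every
 * term of M straight into its slot in the transpose T.                       */
void fast_transpose(const term M[], term T[])
{
    int cols = M[0].col, t = M[0].value;
    int row_terms[MAX_TERMS], start[MAX_TERMS];

    T[0].row = cols; T[0].col = M[0].row; T[0].value = t;
    if (t == 0) return;

    for (int i = 0; i < cols; i++) row_terms[i] = 0;
    for (int i = 1; i <= t; i++) row_terms[M[i].col]++;   /* terms per row of M' */

    start[0] = 1;
    for (int i = 1; i < cols; i++)                        /* where each row of   */
        start[i] = start[i - 1] + row_terms[i - 1];       /* M' begins           */

    for (int i = 1; i <= t; i++) {                        /* place each term     */
        int j = start[M[i].col]++;
        T[j].row = M[i].col;
        T[j].col = M[i].row;
        T[j].value = M[i].value;
    }
}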
n ! = 1 • 2 • … • (n-1) • n
0! = 1
Recurrence relation for the number of multiplications: M(n) = M(n - 1) + 1 for n > 0, with
M(0) = 0, which solves to M(n) = n.
Algorithm F(n)
//Compute n! recursively
//Input: A nonnegative integer n
//Output: The value of n!
if n = 0 return 1
else return F(n-1) * n
Given: n disks of different sizes and three pegs. Initially all disks are on the first peg in
order of size, the largest being on the bottom
Problem: move all the disks to the third peg, using the second one as an auxiliary.
One can move only one disk at a time, and it is forbidden to place a larger disk
on top of a smaller one.
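A recursive C sketch of the standard solution (not a listing from the notes). The number of moves satisfies M(n) = 2M(n - 1) + 1 with M(1) = 1, i.e. M(n) = 2^n - 1:

#include <stdio.h>

/* Move n disks from 'from' to 'to': move n-1 disks onto the auxiliary peg,
 * move the largest disk, then move the n-1 disks back on top of it.        */
void hanoi(int n, char from, char to, char aux)
{
    if (n == 0) return;
    hanoi(n - 1, from, aux, to);
    printf("move disk %d from %c to %c\n", n, from, to);
    hanoi(n - 1, aux, to, from);
}

int main(void)
{
    hanoi(3, 'A', 'C', 'B');   /* 2^3 - 1 = 7 moves */
    return 0;
}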
Problem: Investigate a recursive algorithm that finds the number of binary digits in the
binary representation of a positive decimal integer n.
Algorithm BinRec(n)
//Input: A positive decimal integer n
//Output: The number of binary digits in n's binary representation
if n = 1 return 1
else return BinRec(floor(n/2)) + 1
The number of additions A(n) satisfies A(n) = A(floor(n/2)) + 1 for n > 1, with A(1) = 0.
Use the backward substitution method to solve this recurrence for n = 2^k.
The smoothness rule implies that the order of growth observed for n = 2^k is valid for all
values of n.
A(2^k) = A(2^(k-1)) + 1
       = [A(2^(k-2)) + 1] + 1 = A(2^(k-2)) + 2
       = [A(2^(k-3)) + 1] + 2 = A(2^(k-3)) + 3
       ...
       = A(2^(k-i)) + i
       ...
       = A(2^(k-k)) + k = A(1) + k = 0 + k = k
Thus A(n) = log2 n, since n = 2^k.
3.BRUTE FORCE
Brute force is a straightforward approach to solving a problem, based on the problem’s
statement and definition of concepts, that does not include any shortcuts to improve
performance, but instead relies on sheer computing power to try all possibilities until the
solution to a problem is found. A classic example is the traveling salesman problem (TSP).
Suppose a salesman needs to visit 10 cities across the country. How does one determine the
order in which cities should be visited such that the total distance traveled is minimized? The
brute force solution is simply to calculate the total distance for every possible route and then
select the shortest one. This is not particularly efficient because it is possible to eliminate
many possible routes through clever algorithms.
Although brute force programming is not particularly elegant, it does have a legitimate place
in software engineering. Since brute force methods always return the correct result -- albeit
slowly -- they are useful for testing the accuracy of faster algorithms. In addition, sometimes
a particular problem can be solved so quickly with a brute force method that it doesn't make
sense to waste time devising a more elegant solution.
A brute force attack by a cracker may systematically try all possible password combinations
until the unknown code is found and authentication is approved. It is often used in order to
decrypt a message without knowing the right key. A non-brute force method, in this case,
would be to simply find a mathematical weakness in the system used to hide the information.
Example: Imagine your Bank ATM allowed thieves to try each and every
combination of your PIN, instead of locking your account after three tries. With
10,000 possible pins, it would take a long time to punch in each and every one. And it
is more likely that they will accidentally happen upon the correct pin before trying all
10,000. A computer, meanwhile, can run through 10,000 possible keys in a matter of
minutes. If you were allowed to use 8 character pins, the difficulty would increase to
100,000,000 keys which would take a computer many hours, even days to try them
all. Using letters and special characters (#,$,&,+) increases this number significantly
In this section, we consider the application of the brute force approach to the problem of
sorting.
This type of sorting is called "selection sort" because it works by repeatedly selecting the
smallest remaining element. It works as follows: first find the smallest element in the array
and exchange it with the element in the first position, then find the second smallest element
and exchange it with the element in the second position, and continue in this way until the
entire array is sorted.
Because a selection sort looks at progressively smaller parts of the array each time (as it
knows to ignore the front of the array because it is already in order), a selection sort is
slightly faster than bubble sort, and can be better than a modified bubble sort.
It yields a 60% performance improvement over the bubble sort, but the insertion sort is over
twice as fast as the bubble sort and is just as easy to implement as the selection sort. In short,
there really isn't any reason to use the selection sort - use the insertion sort instead.
If you really want to use the selection sort for some reason, try to avoid sorting lists of
more than about 1,000 items with it, or repetitively sorting lists of more than a couple
hundred items.
void selection_sort(int array[], int n)
{
    for (int x = 0; x < n - 1; x++) {
        int index_of_min = x;
        for (int y = x + 1; y < n; y++)
            if (array[y] < array[index_of_min])
                index_of_min = y;
        int temp = array[x]; array[x] = array[index_of_min]; array[index_of_min] = temp;
    }
}
The worst case occurs if the array is already sorted in descending order. Nonetheless, the
time required by the selection sort algorithm is not very sensitive to the original order of
the array to be sorted: the test "if A[j] < min x" is executed exactly the same number of
times in every case. The variation in time is only due to the number of times the "then" part
(i.e., min j <- j; min x <- A[j]) of this test is executed.
The Selection sort spends most of its time trying to find the minimum element in the
"unsorted" part of the array. It clearly shows the similarity between Selection sort and Bubble
sort. Bubble sort "selects" the maximum remaining elements at each stage, but wastes some
effort imparting some order to "unsorted" part of the array. Selection sort is quadratic in both
the worst and the average case, and requires no extra memory.
For each i from 1 to n - 1, there is one exchange and n - i comparisons, so there are a total
of n - 1 exchanges and (n - 1) + (n - 2) + . . . + 2 + 1 = n(n - 1)/2 comparisons. These
observations hold no matter what the input data is. The number of times the minimum is updated
could be quadratic in the worst case, but in the average case it is only O(n log n). This
implies that the running time of selection sort is quite insensitive to the input.
Bubble Sort
The bubble sort is the oldest and simplest sort in use. Unfortunately, it's also the slowest.
The bubble sort works by comparing each item in the list with the item next to it, and
swapping them if required. The algorithm repeats this process until it makes a pass all the
way through the list without swapping any items (in other words, all items are in the correct
order). This causes larger values to "bubble" to the end of the list while smaller values "sink"
towards the beginning of the list.
The bubble sort is generally considered to be the most inefficient sorting algorithm in
common usage. Under best-case conditions (the list is already sorted), the bubble sort can
approach linear O(n) complexity; the general case is an abysmal O(n^2).
While the insertion, selection, and shell sorts also have O(n2) complexities, they are
significantly more efficient than the bubble sort.
Source code
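The original listing is not reproduced at this point in the notes; the following is a sketch of the bubble sort just described, with the early-exit check on a pass with no swaps:

void bubble_sort(int a[], int n)
{
    int swapped = 1;
    while (swapped) {
        swapped = 0;
        for (int i = 0; i < n - 1; i++)
            if (a[i] > a[i + 1]) {          /* adjacent pair out of order */
                int temp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = temp;
                swapped = 1;
            }
        n--;                                /* the largest item has bubbled to the end */
    }
}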
Sequential Search
Sequential search is a straightforward algorithm that searches for a given item in a list of n
elements by checking successive elements of the list until either a match with the search key
is found or the list is exhausted.
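A short C sketch of this brute-force search (not from the notes):

/* Returns the index of the first element equal to K, or -1 if there is none. */
int sequential_search(const int A[], int n, int K)
{
    for (int i = 0; i < n; i++)
        if (A[i] == K)
            return i;      /* successful search   */
    return -1;             /* unsuccessful search */
}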
Brute-force string matching examples:
1. Pattern: 001011, Text: 10010101101001100101111010
2. Pattern: happy
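A C sketch of the brute-force string matcher used on examples like these (not a listing from the notes); in the worst case it makes about m(n - m + 1) character comparisons:

#include <string.h>

/* Align the pattern with every starting position in the text and compare
 * character by character; return the first match position, or -1.          */
int brute_force_match(const char *text, const char *pattern)
{
    int n = (int)strlen(text), m = (int)strlen(pattern);
    for (int i = 0; i <= n - m; i++) {
        int j = 0;
        while (j < m && text[i + j] == pattern[j])
            j++;
        if (j == m)
            return i;      /* full match found at position i */
    }
    return -1;
}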
Other brute-force applications: the closest-pair and convex-hull problems.
Strengths of brute force:
• Wide applicability
• Simplicity
• Yields reasonable algorithms for some important problems
  – Searching
Exhaustive search is simply trying every possible candidate in the search for a solution. It
is usually combined with pruning to reduce the number of candidates to examine; in that form
it is also known as backtracking.
Traveling salesman problem
Method:
• Given n cities with known distances between each pair, find the shortest tour that
  passes through all the cities exactly once before returning to the starting city.
• Alternatively: find the shortest Hamiltonian circuit in a weighted connected graph.
• Example:
[Figure: a weighted graph on vertices a, b, c, d with edge weights ab = 2, ac = 8, ad = 5, bc = 3, bd = 4, cd = 7]
Tour               Cost
a→b→c→d→a          2+3+7+5 = 17
a→b→d→c→a          2+4+7+8 = 21
a→c→b→d→a          8+3+4+5 = 20
a→c→d→b→a          8+7+4+2 = 21
a→d→b→c→a          5+4+3+8 = 20
a→d→c→b→a          5+7+3+2 = 17
Knapsack problem
Given n items of known weights and values and a knapsack of capacity W, find the most valuable
subset of the items that fits into the knapsack.
Example:
Item   Weight   Value
1      2        $20
2      5        $30
3      10       $50
4      5        $10
Subset     Total weight   Total value
{1}        2              $20
{2}        5              $30
{3}        10             $50
{4}        5              $10
{1,2}      7              $50
{1,3}      12             $70
{1,4}      7              $30
{2,3}      15             $80
{2,4}      10             $40
{1,2,4}    12             $60
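A C sketch of the exhaustive-search approach: every subset of the n items is encoded as a bit mask from 0 to 2^n - 1 and the most valuable feasible subset is kept. The item data mirrors the table above; the capacity W = 16 used in the driver is an assumption, since the notes do not restate it.

#include <stdio.h>

int knapsack_exhaustive(const int weight[], const int value[], int n, int W)
{
    int best = 0;
    for (unsigned mask = 0; mask < (1u << n); mask++) {
        int w = 0, v = 0;
        for (int i = 0; i < n; i++)
            if (mask & (1u << i)) {        /* item i is in this subset */
                w += weight[i];
                v += value[i];
            }
        if (w <= W && v > best)
            best = v;                      /* feasible and more valuable */
    }
    return best;
}

int main(void)
{
    int weight[] = {2, 5, 10, 5};          /* items 1..4 from the table        */
    int value[]  = {20, 30, 50, 10};
    printf("%d\n", knapsack_exhaustive(weight, value, 4, 16)); /* W=16 assumed */
    return 0;
}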
Final comments
a. Exhaustive-search algorithms run in a realistic amount of time only on very small instances.
b. In many cases there are much better alternatives, for example:
• Euler circuits
• Shortest paths
• Minimum spanning tree
• Assignment problem
c. In some cases exhaustive search (or a variation of it) is the only known way to obtain
   exact solutions.
Assignment Problem
A special case of the transportation problem is the assignment problem which occurs when
each supply is 1 and each demand is 1. In this case, the integrality implies that every supplier
will be assigned one destination and every destination will have one supplier. The costs give
the charge for assigning a supplier and destination to each other.
Example. A company has three new machines of different types. There are four different
plants that could receive the machines. Each plant can only receive one machine, and each
machine can only be given to one plant. The expected profit that can be made by each plant if
assigned a machine is as follows:
Note that a balanced problem must have the same number of supplies and demands, so we
must add a dummy machine (corresponding to receiving no machine) and assign a zero cost
for assigning the dummy machine to a plant.
A typical assignment problem, presented in the classic manner, is shown in Fig. 12. Here
there are five machines to be assigned to five jobs. The numbers in the matrix indicate the
cost of doing each job with each machine. Jobs with costs of M are disallowed assignments.
The problem is to find the minimum cost matching of machines to jobs.
The network model is shown in the following figure. It is very similar to the transportation
model except that the external flows are all +1 or -1. The only relevant parameter for the
assignment model is arc cost (not shown in the figure for clarity); all other parameters
should be set to default values. The assignment network also has the bipartite structure.
The solution to the assignment problem, as shown in the following figure, has a total flow of
1 in every column and row, and is the assignment that minimizes total cost.
4. DIVIDE AND CONQUER
The divide-and-conquer strategy solves a problem by:
• Breaking the problem into several sub-problems that are similar to the original
  problem but smaller in size,
• Solving the sub-problems recursively (successively and independently), and then
• Combining these solutions to the sub-problems to create a solution to the original problem.
4.1 Merge Sort
1. Divide Step
If the given array A has zero or one element, return A; it is already sorted. Otherwise,
divide A into two arrays, A1 and A2, each containing about half of the elements of A.
2. Recursion Step
Recursively sort array A1 and A2.
3. Conquer Step
Combine the elements back in A by merging the sorted arrays A1 and A2 into a sorted
sequence.
We can visualize merge sort by means of a binary tree where each internal node of the tree
represents a recursive call and each external node represents an individual element of the
given array A. Such a tree is called a merge-sort tree. The heart of the merge-sort algorithm
is the conquer step, which merges two sorted sequences into a single sorted sequence.
Analysis
Let T(n) be the time taken by this algorithm to sort an array of n elements. Dividing A into
the subarrays A1 and A2 takes linear time, and it is easy to see that Merge(A1, A2, A) also
takes linear time. Consequently, assuming for simplicity that n is a power of 2,
T(n) = 2T(n/2) + O(n), with T(1) = O(1).
The total running time of the merge sort algorithm is O(n lg n), which is asymptotically
optimal for comparison-based sorting. Like heap sort, merge sort has a guaranteed n lg n
running time; unlike heap sort, merge sort requires Θ(n) extra space for the temporary array.
Implementation
void merge(int numbers[], int temp[], int left, int mid, int right)
{
    int left_end = mid - 1;                 /* end of the left run; 'mid' starts the right run */
    int tmp_pos = left;
    int num_elements = right - left + 1;

    while ((left <= left_end) && (mid <= right)) {
        if (numbers[left] <= numbers[mid])
            temp[tmp_pos++] = numbers[left++];
        else
            temp[tmp_pos++] = numbers[mid++];
    }
    while (left <= left_end)                /* copy any remaining left-run items  */
        temp[tmp_pos++] = numbers[left++];
    while (mid <= right)                    /* copy any remaining right-run items */
        temp[tmp_pos++] = numbers[mid++];
    for (int i = 0; i < num_elements; i++, right--)
        numbers[right] = temp[right];       /* copy the merged run back           */
}

void m_sort(int numbers[], int temp[], int left, int right)
{
    if (right > left) {
        int mid = (right + left) / 2;
        m_sort(numbers, temp, left, mid);          /* sort left half  */
        m_sort(numbers, temp, mid + 1, right);     /* sort right half */
        merge(numbers, temp, left, mid + 1, right);
    }
}

void mergeSort(int numbers[], int temp[], int array_size)
{
    m_sort(numbers, temp, 0, array_size - 1);
}
4.2Quick Sort
Algorithm Analysis
The recursive algorithm consists of four steps (which closely resemble the merge sort):
1. If there are one or less elements in the array to be sorted, return immediately.
2. Pick an element in the array to serve as a "pivot" point. (Usually the left-most element
in the array is used.)
3. Split the array into two parts - one with elements larger than the pivot and the other
with elements smaller than the pivot.
4. Recursively repeat the algorithm for both halves of the original array.
Empirical Analysis
In empirical comparisons, quick sort is usually the fastest of the sorting algorithms
described here on average. As soon as students figure this out, their immediate impulse is to use the quick sort for
everything - after all, faster is better, right? It's important to resist this urge - the quick sort
isn't always the best choice. As mentioned earlier, it's massively recursive (which means that
for very large sorts, you can run the system out of stack space pretty easily). It's also a
complex algorithm - a little too complex to make it practical for a one-time sort of 25 items,
for example.
With that said, in most cases the quick sort is the best choice if speed is important (and it
almost always is). Use it for repetitive sorting, sorting of medium to large lists, and as a
default choice when you're not really sure which sorting algorithm to use. Ironically, the
Quick sort has horrible efficiency when operating on lists that are mostly sorted in either
forward or reverse order - avoid it in those situations.
Sourcecode
Below is the basic quick sort algorithm.
void q_sort(int numbers[], int left, int right)
{
    int l_hold = left, r_hold = right;
    int pivot = numbers[left];
    while (left < right) {
        while ((numbers[right] >= pivot) && (left < right))
            right--;
        if (left != right) {                 /* move smaller element to the left side  */
            numbers[left] = numbers[right];
            left++;
        }
        while ((numbers[left] <= pivot) && (left < right))
            left++;
        if (left != right) {                 /* move larger element to the right side  */
            numbers[right] = numbers[left];
            right--;
        }
    }
    numbers[left] = pivot;                   /* pivot lands in its final position      */
    if (l_hold < left)  q_sort(numbers, l_hold, left - 1);
    if (r_hold > left)  q_sort(numbers, left + 1, r_hold);
}
Binary Search
Algorithm BinarySearch(A[0..n-1], K)
//Input: an array A[0..n-1] sorted in ascending order and a search key K
//Output: an index of the array's element that is equal to K, or -1 if there is no such element
l <- 0; r <- n - 1
while l <= r do
    m <- floor((l + r)/2)
    if K = A[m] return m
    else if K < A[m] r <- m - 1
    else l <- m + 1
return -1
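A C sketch of the same algorithm (not from the notes); A must be sorted in ascending order:

int binary_search(const int A[], int n, int K)
{
    int l = 0, r = n - 1;
    while (l <= r) {
        int m = (l + r) / 2;      /* middle index */
        if (K == A[m])
            return m;
        else if (K < A[m])
            r = m - 1;            /* continue in the left half  */
        else
            l = m + 1;            /* continue in the right half */
    }
    return -1;
}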
In computer science, a binary tree is a tree data structure in which each node has at most
two children. Typically the child nodes are called left and right. One common use of binary
trees is binary search trees; another is binary heaps.
A simple binary tree of size 9 and depth 3, with a root node whose value is 2
A binary tree is a rooted tree in which every node has at most two children.
A full binary tree is a tree in which every node has either zero or two
children.
Sometimes the perfect binary tree is called the complete binary tree. Others define a
complete binary tree to be a full binary tree in which all leaves are at depth n or n-1 for
some n. For a tree to be a complete binary tree, all the children on the last level must
occupy the leftmost spots consecutively, with no spot left unoccupied between any two of them.
For example, if two nodes on the bottommost level each occupy a spot with an empty spot
between the two of them, while the rest of the children nodes are tightly wedged together with
no spots in between, then the whole tree cannot be a complete binary tree because of the empty
spot.
An almost complete binary tree is a tree in which every node that has a right child also has a
left child (but a node with a left child need not have a right child).
The mathematician Carl Friedrich Gauss (1777–1855) once noticed that although the product
of two complex numbers
(a + bi)(c + di) = (ac - bd) + (bc + ad)i
seems to involve four real-number multiplications, it can in fact be done with just three:
ac, bd, and (a + b)(c + d), since bc + ad = (a + b)(c + d) − ac − bd.
In our big-O way of thinking, reducing the number of multiplications from 4 to 3 seems
wasted ingenuity. But this modest improvement becomes very significant when applied
recursively.
Let's move away from complex numbers and see how this helps with regular multiplication.
Suppose x and y are two n-bit integers, and assume for convenience that n is a power of
two (the more general case is hardly any different). As a first step towards multiplying x and
y, split each of them into their left and right halves, which are n/2 bits long:
x = 2^(n/2) xL + xR,  y = 2^(n/2) yL + yR,
so that  xy = 2^n xL yL + 2^(n/2) (xL yR + xR yL) + xR yR.
We will compute xy via the expression on the right. The additions take linear time, as do the
multiplications by powers of two (which are merely left-shifts). The significant operations
are the four n/2-bit multiplications xLyL, xLyR, xRyL, xRyR; these we can handle by four
recursive calls. Thus our method for multiplying n-bit numbers starts by making recursive
calls to multiply these four pairs of n/2-bit numbers (four sub-problems of half the size),
and then combines the results in linear time, giving T(n) = 4T(n/2) + O(n) = O(n^2).
This is where Gauss's trick comes to mind. Although the expression for xy seems to demand
four n/2-bit multiplications, as before just three will do:
xLyL, xRyR, and (xL + xR)(yL + yR), since xLyR + xRyL = (xL + xR)(yL + yR) − xLyL − xRyR.
The resulting algorithm, shown in Figure 1.1, has an improved running time of T(n) =
3T(n/2) + O(n).
The point is that now the constant-factor improvement, from 4 to 3, occurs at every level of
the recursion, and this compounding effect leads to a dramatically lower time bound of
O(n^1.59).
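As an illustrative sketch (not code from the text), here is the three-multiplication recursion in C, splitting ordinary integers by decimal digits rather than n-bit halves; karatsuba() is a name chosen here, and the technique is exactly the Gauss trick above:

#include <stdio.h>

long long karatsuba(long long x, long long y)
{
    if (x < 10 || y < 10)                       /* base case: one-digit operand */
        return x * y;

    long long m = x > y ? x : y;                /* count digits of the larger   */
    int digits = 0;
    while (m > 0) { digits++; m /= 10; }
    int half = digits / 2;

    long long pow10 = 1;
    for (int i = 0; i < half; i++) pow10 *= 10;

    long long xL = x / pow10, xR = x % pow10;   /* x = xL*10^half + xR */
    long long yL = y / pow10, yR = y % pow10;   /* y = yL*10^half + yR */

    long long p1 = karatsuba(xL, yL);
    long long p2 = karatsuba(xR, yR);
    long long p3 = karatsuba(xL + xR, yL + yR); /* Gauss's trick: only 3 products */

    return p1 * pow10 * pow10 + (p3 - p1 - p2) * pow10 + p2;
}

int main(void)
{
    printf("%lld\n", karatsuba(1234, 5678));    /* prints 7006652 */
    return 0;
}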
This running time can be derived by looking at the algorithm's pattern of recursive calls,
which form a tree structure, as in Figure 1.2. Let's try to understand the shape of this tree.
At each successive level of recursion the sub-problems get halved in size. At the (log2 n)th
level, the sub-problems get down to size one, and so the recursion ends. Therefore, the height
of the tree is log2 n. The branching factor is 3 - each sub-problem recursively produces three
smaller ones - so the kth level of the tree is made up of 3^k sub-problems, each of size n/2^k.
For each sub-problem, a linear amount of work is done in identifying further sub-problems
and combining their answers. Therefore the total time spent at depth k in the tree is
3^k × O(n/2^k) = (3/2)^k × O(n).
At the very top level, when k = 0, this works out to O(n). At the bottom, when k = log2 n, it
is O(3^(log2 n)), which can be rewritten as O(n^(log2 3)) (do you see why?). Between these two
endpoints, the work done increases geometrically from O(n) to O(n^(log2 3)), by a factor of
3/2 per level. The sum of any increasing geometric series is, within a constant factor, simply
the last term of the series: such is the rapidity of the increase (Exercise). Therefore the
overall running time is O(n^(log2 3)), which is about O(n^1.59).
In the absence of Gauss's trick, the recursion tree would have the same height, but the
branching factor would be 4, and the running time would be O(n^2).
Time Analysis
Each entry of the product is computed as
C[i,j] = sum over k = 1..N of a[i,k] * b[k,j],
so the total running time is
T(N) = sum over i = 1..N, j = 1..N, k = 1..N of c = c * N^3 = O(N^3).
Strassen's Matrix Multiplication
Divide A, B and the result R into four quadrants each:
A = | A0 A1 |   B = | B0 B1 |   A × B = R = | A0×B0 + A1×B2   A0×B1 + A1×B3 |
    | A2 A3 |       | B2 B3 |               | A2×B0 + A3×B2   A2×B1 + A3×B3 |
The recursion bottoms out at 1×1 blocks, where a0 × b0 is computed by one ordinary scalar
multiplication.
Strassen computes seven products of the quadrants:
P1 = (A11 + A22)(B11 + B22)
P2 = (A21 + A22) B11
P3 = A11 (B12 - B22)
P4 = A22 (B21 - B11)
P5 = (A11 + A12) B22
P6 = (A21 - A11)(B11 + B12)
P7 = (A12 - A22)(B21 + B22)
and combines them as follows:
C11 = P1 + P4 - P5 + P7
C12 = P3 + P5
C21 = P2 + P4
C22 = P1 + P3 - P2 + P6
Comparison
C11 = P1 + P4 - P5 + P7
    = (A11 + A22)(B11 + B22) + A22(B21 - B11) - (A11 + A12)B22 + (A12 - A22)(B21 + B22)
    = A11 B11 + A11 B22 + A22 B11 + A22 B22 + A22 B21 - A22 B11 - A11 B22 - A12 B22
      + A12 B21 + A12 B22 - A22 B21 - A22 B22
    = A11 B11 + A12 B21,
which is exactly the (1,1) entry of the standard product.
Strassen Algorithm
if (n == 1)
    R = A * B;               /* 1×1 base case: one scalar multiplication */
else
    /* split A and B into quadrants, compute P1..P7 recursively, combine into R */
Time Analysis
Strassen's method performs 7 multiplications of half-size matrices plus a constant number of
matrix additions, giving T(n) = 7T(n/2) + O(n^2) = O(n^(log2 7)) ≈ O(n^2.81), versus O(n^3)
for the standard algorithm.
Insertion Sort
Insertion sort is a simple sort algorithm, a comparison sort in which the sorted array (or list)
is built one entry at a time. It is much less efficient on large lists than the more advanced
algorithms such as quicksort, heapsort, or merge sort, but it has various advantages:
• Simple to implement
• Efficient on (quite) small data sets
• Efficient on data sets which are already substantially sorted
• More efficient in practice than most other simple O(n2) algorithms such as selection
sort or bubble sort
• Stable (does not change the relative order of elements with equal keys)
• In-place (only requires a constant amount O(1) of extra memory space)
• It is an online algorithm, in that it can sort a list as it receives it.
In abstract terms, each iteration of an insertion sort removes an element from the input data,
inserting it at the correct position in the already sorted list, until no elements are left in the
input. The choice of which element to remove from the input is arbitrary and can be made
using almost any choice algorithm.
Sorting is typically done in-place. The result array after k iterations contains the first k entries
of the input array and is sorted. In each step, the first remaining entry of the input is removed,
inserted into the result at the right position, thus extending the result:
The most common variant, which operates on arrays, can be described as:
1. Suppose we have a method called insert designed to insert a value into a sorted
sequence at the beginning of an array. It operates by starting at the end of the
sequence and shifting each element one place to the right until a suitable position is
found for the new element. It has the side effect of overwriting the value stored
immediately after the sorted sequence in the array.
2. To perform insertion sort, start at the left end of the array and invoke insert to insert
each element encountered into its correct position. The ordered sequence into which
we insert it is stored at the beginning of the array in the set of indexes already
examined. Each insertion overwrites a single value, but this is okay because it's the
value we're inserting.
A simple pseudocode version of the complete algorithm follows, where the arrays are zero-
based:
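The notes' own listing is not reproduced here; the following C sketch follows the description above for zero-based arrays:

void insertion_sort(int a[], int n)
{
    for (int i = 1; i < n; i++) {
        int value = a[i];       /* next element to insert into the sorted prefix */
        int j = i - 1;
        while (j >= 0 && a[j] > value) {
            a[j + 1] = a[j];    /* shift larger elements one place to the right */
            j--;
        }
        a[j + 1] = value;       /* insert into its correct position */
    }
}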
Variants
D.L. Shell made substantial improvements to the algorithm, and the modified version is
called Shell sort. It compares elements separated by a distance that decreases on each pass.
Shellsort has distinctly improved running times in practical work, with two simple variants
requiring O(n3/2) and O(n4/3) time.
If comparisons are very costly compared to swaps, as is the case for example with string keys
stored by reference, then using binary insertion sort can be a good strategy. Binary insertion
sort employs binary search to find the right place to insert new elements, and therefore
performs only Θ(n log n) comparisons in the worst case. The algorithm as a whole still takes
Θ(n^2) time on average because of the series of swaps required for each insertion, and since
it always uses binary search, the best case is no longer O(n) but O(n log n).
To avoid having to make a series of swaps for each insertion, we could instead store the input
in a linked list, which allows us to insert and delete elements in constant time. Unfortunately,
binary search on a linked list is impossible, so we still spend Ω(n^2) time searching. If we
instead replace it by a more sophisticated data structure such as a heap or binary tree, we can
significantly decrease both search and insert time. This is the essence of heap sort and binary
tree sort.
In 2004, Bender, Farach-Colton, and Mosteiro published a new variant of insertion sort
called library sort or gapped insertion sort that leaves a small number of unused spaces
("gaps") spread throughout the array. The benefit is that insertions need only shift elements
over until a gap is reached. Surprising in its simplicity, they show that this sorting algorithm
runs with high probability in O(n log n) time
Insertion sort is very similar to bubble sort. In bubble sort, after k passes through the array,
the k largest elements have bubbled to the top. (Or the k smallest elements have bubbled to
the bottom, depending on which way you do it.) In insertion sort, after k passes through the
array, you have a run of k sorted elements at the bottom of the array. Each pass inserts
another element into the sorted run. So with bubble sort, each pass takes less time than the
previous one, but with insertion sort, each pass may take more time than the previous one.
Breadth-First Search (BFS)
Like depth-first search, BFS traverses a connected component of a given graph and defines a
spanning tree.
BFS starts at a given vertex, which is at level 0. In the first stage, we visit all vertices
at level 1. In the second stage, we visit all vertices at level 2 - the new vertices, which
are adjacent to level 1 vertices - and so on. The BFS traversal terminates when every vertex
has been visited.
Create a queue Q.
ENQUEUE(Q, s)                      // insert the start vertex s into Q
while Q is not empty do
    for each vertex v in Q do
        for all edges e incident on v do
            if edge e is unexplored then
                let w be the other endpoint of e
                if vertex w is unexplored then
                    - mark e as a discovery edge
                    - insert w into Q
                else
                    - mark e as a cross edge
BFS labels each vertex by the length of a shortest path (in terms of number of edges) from the
start vertex.
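A C sketch of BFS using an adjacency matrix (an assumed representation, not the notes' own listing); dist[v] ends up holding the shortest-path level of v, or -1 if v is unreachable:

#define MAXV 100

void bfs(int adj[MAXV][MAXV], int n, int s, int dist[MAXV])
{
    int queue[MAXV], head = 0, tail = 0;

    for (int v = 0; v < n; v++) dist[v] = -1;   /* -1 = not yet visited */
    dist[s] = 0;
    queue[tail++] = s;                          /* ENQUEUE(Q, s)        */

    while (head < tail) {                       /* while Q is not empty */
        int u = queue[head++];                  /* DEQUEUE              */
        for (int w = 0; w < n; w++)
            if (adj[u][w] && dist[w] == -1) {   /* unexplored discovery edge */
                dist[w] = dist[u] + 1;          /* level = parent level + 1  */
                queue[tail++] = w;
            }
    }
}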
Example (CLR): a step-by-step BFS trace (Steps 1-9) is shown in the accompanying figures.
Dashed edge = cross edge (since none of them connects a vertex to one of its ancestors).
As with depth-first search (DFS), the discovery edges form a spanning tree, which in this
case we call the BFS tree.
BFS can be used for computing, for every vertex in the graph, a path with the minimum number
of edges between the start vertex and that vertex, or reporting that no such path exists.
Depth-First Search (DFS)
Analysis
The algorithm starts at a specific vertex S in G, which becomes the current vertex. Then the
algorithm traverses the graph by any edge (u, v) incident to the current vertex u. If the edge
(u, v) leads to an already visited vertex v, then we backtrack to the current vertex u. If,
on the other hand, edge (u, v) leads to an unvisited vertex v, then we go to v and v becomes
our current vertex. We proceed in this manner until we reach a "dead end". At this point we
start backtracking. The process terminates when backtracking leads back to the start vertex.
Edges that lead to a new vertex are called discovery or tree edges, and edges that lead to an
already visited vertex are called back edges.
Output: edges labeled as discovery edges and back edges in the connected component. (In the
DFS pseudocode, when an edge e leads to an already visited vertex, the else branch marks e as
a back edge.)
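A recursive C sketch of DFS on an adjacency matrix (an assumed representation, not a listing from the notes):

#define MAXV 100

static int visited[MAXV];   /* assumed cleared to 0 before the first call */

void dfs(int adj[MAXV][MAXV], int n, int u)
{
    visited[u] = 1;
    for (int v = 0; v < n; v++)
        if (adj[u][v] && !visited[v])   /* (u, v) is a discovery (tree) edge */
            dfs(adj, n, v);             /* v becomes the current vertex      */
    /* all edges at u explored: backtrack (return) to the previous vertex    */
}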
Example (CLR)
Each vertex has two time stamps: the first time stamp records when the vertex is first
discovered, and the second time stamp records when the search finishes examining the adjacency
list of the vertex.
DFS can also be used for computing a cycle in a graph, or equivalently reporting that no such
cycle exists; for testing whether a graph is connected; and for computing a path between two
vertices of a graph, or equivalently reporting that no such path exists.
Analysis
Topological Sorting
In simple words, a topological ordering of a directed acyclic graph (DAG) G is an ordering of
its vertices such that any directed path in G
traverses vertices in increasing order.
It is important to note that if the graph is not acyclic, then no linear ordering is possible. That
is, we must not have circularities in the directed graph. For example, in order to get a job you
need to have work experience, but in order to get work experience you need to have a job
(sounds familiar?).
Proof:
Let G be acyclic. Since G is acyclic, it must have a vertex with no incoming edges. Let v1 be
such a vertex. If we remove v1 from the graph, together with its outgoing edges, the resulting
digraph is still acyclic. Hence the resulting digraph also has a vertex with no incoming
edges, and we let v2 be such a vertex. By repeating this process until the digraph G becomes
empty, we obtain an ordering v1 < v2 < . . . < vn of the vertices of digraph G. Because of the
construction, if (vi, vj) is an edge of digraph G, then vi must be deleted before vj can be
deleted, and thus i < j. Thus v1, . . . , vn is a topological sorting.
TOPOLOGICAL_SORT(G)
Example
Diagram
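The TOPOLOGICAL_SORT(G) listing and its example diagram are not reproduced above; the following C sketch implements the source-removal construction used in the proof (adjacency matrix assumed):

#define MAXV 100

/* Repeatedly delete a vertex with no incoming edges and append it to the
 * ordering.  Returns 1 on success, or 0 if the digraph contains a cycle.  */
int topological_sort(int adj[MAXV][MAXV], int n, int order[MAXV])
{
    int indegree[MAXV] = {0}, removed[MAXV] = {0};

    for (int u = 0; u < n; u++)
        for (int v = 0; v < n; v++)
            if (adj[u][v]) indegree[v]++;

    for (int k = 0; k < n; k++) {
        int s = -1;
        for (int v = 0; v < n; v++)            /* find a remaining source  */
            if (!removed[v] && indegree[v] == 0) { s = v; break; }
        if (s == -1) return 0;                 /* no source left: cycle    */

        order[k] = s;                          /* s is the next vertex     */
        removed[s] = 1;
        for (int v = 0; v < n; v++)            /* delete s's outgoing edges */
            if (adj[s][v]) indegree[v]--;
    }
    return 1;
}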
Generating permutations
There are n! permutations of n items, which grows so quickly that you can't expect to
generate all permutations for n > 11, since 11! = 39,916,800. Numbers like these should
cool the ardor of anyone interested in exhaustive search and help explain the importance of
generating random permutations.
There are two different paradigms for constructing permutations: ranking/unranking and
incremental change methods. Although the latter are more efficient, ranking and unranking
can be applied to solve a much wider class of problems, including the other combinatorial
generation problems in this book. The key is to define functions rank and unrank on all
permutations p and integers n, m, where |p| = n and .
• Rank(p) - What is the position of p in the given generation order? A typical ranking
function is recursive,
• Unrank(m,n) - Which permutation is in position m of the n! permutations of n items?
A typical unranking function finds the number of times (n-1)! goes into m and
proceeds recursively. For example, Unrank(2,3) tells us that the first element of the
permutation must be `2', which leaves the smaller problem Unrank(0,2); the rank of 0
corresponds to the total order on the two remaining elements (since `2' has already been
used).
What the actual rank and unrank functions are does not matter as much as the fact that they
must be inverses. In other words, p = Unrank(Rank(p), n) for all permutations p. Once you
define ranking and unranking functions for permutations, you can solve a host of related
problems:
The rank/unrank method is best suited for small values of n, since n! quickly exceeds the
capacity of machine integers, unless arbitrary-precision arithmetic is available (see Section ).
The incremental change methods work by defining the next and previous operations to
transform one permutation into another, typically by swapping two elements. The tricky part
is to schedule the swaps so that permutations do not repeat until all of them have been
generated. See the output picture above for an ordering of the six permutations of a
three-element set with a single swap between successive permutations.
Incremental change algorithms for sequencing permutations are tricky, but they are concise
enough that they can be expressed in a dozen lines of code. See the implementation section
for pointers to code. Because the incremental change is a single swap, these algorithms can
be extremely fast - on average, constant time - which is independent of the size of the
permutation! The secret is to represent the permutation using an n-element array to facilitate
the swap. In certain applications, only the change between permutations is important. For
example, in a brute-force program to search for the optimal tour, the cost of the tour
associated with the new permutation will be that of the previous permutation, with the
addition and deletion of four edges.
Throughout this discussion, we have assumed that the items we are permuting are all
distinguishable. However, if there are duplicates (meaning our set is a multiset), you can
save considerable time and effort by avoiding identical permutations. For example, there are
only ten distinct permutations of a multiset such as {1,1,2,2,2} (5!/(2!·3!) = 10), instead of
5! = 120. To avoid duplicates, use backtracking and generate the permutations in
lexicographic order.
Generating random permutations is an important little problem that people stumble across
often, and often botch up. The right way is the following two-line, linear-time algorithm. We
assume that Random[i,n] generates a random integer between i and n, inclusive.
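The two-line algorithm itself is not printed in these notes; the following sketch shows the standard version, with rand_range(i, n) standing in for Random[i,n] (a hypothetical helper, and rand() % k is slightly biased for large ranges):

#include <stdlib.h>

static int rand_range(int i, int n)          /* random integer in [i, n]  */
{
    return i + rand() % (n - i + 1);
}

/* Initialize p[0..n-1] to the identity permutation, then swap each position
 * with a uniformly chosen later (or equal) position.                       */
void random_permutation(int p[], int n)
{
    for (int i = 0; i < n; i++) p[i] = i + 1;
    for (int i = 0; i < n - 1; i++) {
        int j = rand_range(i, n - 1);
        int tmp = p[i]; p[i] = p[j]; p[j] = tmp;
    }
}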
Such subtleties demonstrate why you must be very careful with random generation
algorithms. Indeed, we recommend that you try some reasonably extensive experiments with
any random generator before really believing it. For example, generate 10,000 random
permutations of length 4 and see whether all 24 of them occur approximately the same
number of times. If you understand how to measure statistical significance, you are in even
better shape.
Generating Subsets
Problem description: Generate (1) all, or (2) a random, or (3) the next subset of the integers {1, . . . , n}.
Discussion: A subset describes a selection of objects, where the order among them does not
matter. Many of the algorithmic problems in this catalog seek the best subset of a group of
things: vertex cover seeks the smallest subset of vertices to touch each edge in a graph;
knapsack seeks the most profitable subset of items of bounded total size; and set packing
seeks the smallest subset of subsets that together cover each item exactly once.
There are 2^n distinct subsets of an n-element set, including the empty set as well as the set
itself. This grows exponentially, but at a considerably smaller rate than the n! permutations
of n items. Indeed, since 2^20 = 1,048,576, a brute-force search through all subsets of 20
elements is easily manageable, although by n = 30, 2^30 = 1,073,741,824, so you will certainly
be pushing things.
By definition, the relative order among the elements does not distinguish different subsets:
a subset is the same no matter in which order its elements are listed. However, it is a very
good idea to maintain your subsets in a sorted or canonical order, in order to speed up such
operations as testing whether two subsets are identical or making them look right when printed.
• Lexicographic order - Lexicographic order is sorted order, and often the most
natural order for generating combinatorial objects; for a three-element set it lists the
eight subsets in sorted order. Unfortunately, it is surprisingly difficult to generate
subsets directly in lexicographic order.
• Gray code order - Subset generation in Gray code order can be very fast, because there is
a nice recursive construction to sequence subsets. Further, since only one element changes
between successive subsets, exhaustive search algorithms built on Gray codes can be quite
efficient. A set cover program would only have to update the change in coverage by the
addition or deletion of one subset. See the implementation section below for Gray
code subset generation programs.
• Binary counting - Each subset can be represented by an n-bit binary string in which bit i
is 1 exactly when element i is in the subset.
This binary representation is the key to solving all subset generation problems. To
generate all subsets in order, simply count from 0 to 2^n - 1. For each integer, successively
mask off each of the bits and compose a subset of exactly the items corresponding to
`1' bits. To generate the next or previous subset, increment or decrement the integer
by one. Unranking a subset is exactly the masking procedure, while ranking
constructs a binary number with 1's corresponding to items in S and then converts this
binary number to an integer.
To generate a random subset, you could generate a random integer from 0 to 2^n - 1 and
unrank, although you are probably asking for trouble, because any flakiness with how
your random number generator rounds things off means that certain subsets can never
occur. Therefore, a better approach is simply to flip a coin n times, with the ith flip
deciding whether to include element i in the subset.
simulated by generating a random real or large integer and testing whether it is bigger
or smaller than half the range. A Boolean array of n items can thus be used to
represent subsets as a sort of premasked integer. The only complication is that you
must explicitly handle the carry if you seek to generate all subsets.
Generation problems for two closely related problems arise often in practice:
Our goal is to presort the mutual information values for all variables that do not co-occur
with the variable in question. A theorem (whose proof is given in the Appendix) shows that
this can be done exactly as before. The implication of this theorem is that the ACCL algorithm
can be extended to variables taking more than two values by making only one (minor)
modification: the replacement of the tables.
[Figure: running time of the ACCL (full line) and CHOWLIU (dotted line) algorithms versus the
number of vertices, for different values of the sparseness.]
[Figure: number of steps of the Kruskal algorithm versus domain size, measured for the ACCL
algorithm for different values of the sparseness.]
The method consists of combining the coefficient matrix A with the right-hand side b into the
``augmented'' (n, n + 1) matrix [A | b].
A sequence of elementary row operations is then applied to this matrix so as to transform the
coefficient part to upper triangular form.
Assume we have transformed the first column, and we want to continue the elimination with the
remaining submatrix. To zero out an element below the diagonal, we divide the second row by
the ``pivot'' element, multiply it by the element to be eliminated, and subtract it from the
third row. If the pivot is zero we have to swap two rows. This procedure frequently breaks
down, not only for ill-conditioned matrices. Therefore, most programs perform ``partial
pivoting'', i.e. they swap with the row that has the maximum absolute value in that column.
``Complete pivoting'', always putting the absolutely largest element of the whole matrix into
the pivot position (implying reordering of both rows and columns), is normally not necessary.
A variant, Gauss-Jordan elimination, continues the row operations until the coefficient part
is reduced to the identity matrix. Then back substitution is not necessary and the values of
the unknowns can be read off directly. Not surprisingly, Gauss-Jordan elimination is slower
than Gaussian elimination.
Other applications
Suppose A is a square n × n matrix and you need to calculate its inverse. The n × n identity
matrix is augmented to the right of A, which produces an n × 2n matrix. Then you perform
the Gaussian elimination algorithm on that matrix. When the algorithm finishes, the identity
matrix will appear on the left; the inverse of A can then be found to the right of the identity
matrix.
In practice, inverting a matrix is rarely required. Most of the time, one is really after the
solution of a particular system of linear equations.
The Gaussian elimination algorithm can be applied to any m × n matrix A. If we get "stuck"
in a given column, we move to the next column. In this way, for example, any 6 by 9 matrix
can be transformed to a matrix that has a reduced row echelon form like
(the *'s are arbitrary entries). This echelon matrix T contains a wealth of information about A:
the rank of A is 5 since there are 5 non-zero rows in T; the vector space spanned by the
columns of A has as basis the first, third, forth, seventh and ninth column of A (the columns
of the ones in T), and the *'s tell you how the other columns of A can be written as linear
combinations of the basis columns.
The Gaussian elimination can be performed over any field. The three elementary operations
used in the Gaussian elimination (multiplying rows, switching rows, and adding multiples of
rows to other rows) amount to multiplying the original matrix A with invertible m × m
matrices from the left. In general, we can say:
To every m × n matrix A over the field K there exists a uniquely determined invertible
m × m matrix S and a uniquely determined reduced row-echelon matrix T such that A
= ST.
The formal algorithm to compute T from A follows. We write A[i,j] for the entry in row i,
column j in matrix A. The transformation is performed "in place", meaning that the original
matrix A is lost and successively replaced by T.
i=1
j=1
while (i ≤ m and j ≤ n) do
# Find pivot in column j, starting in row i:
max_val = A[i,j]
max_ind = i
for k=i+1 to m do
val = A[k,j]
if abs(val) > abs(max_val) then
max_val = val
max_ind = k
end_if
end_for
if max_val ≠ 0 then
switch rows i and max_ind
divide row i by max_val
for u = 1 to m do
if u ≠ i then
subtract A[u,j] * row i from row u
end_if
end_for
i=i+1
end_if
j=j+1
end_while
This algorithm differs slightly from the one discussed earlier, because before eliminating a
variable, it first exchanges rows to move the entry with the largest absolute value to the
"pivot position". Such a pivoting procedure improves the numerical stability of the
algorithm; some variants are also in use.
Note that if the field is the real or complex numbers and floating point arithmetic is in use,
the comparison max_val ≠ 0 should be replaced by abs(max_val) > epsilon for some
small, machine-dependent constant epsilon, since it is rarely correct to compare floating
point numbers to zero.
Balanced Search Trees
Binary search trees (see Figure 1) work well for many applications, but they are limited by
their bad worst-case performance (height = O(n)). A binary search tree with this worst-case
structure is no more efficient than a regular linked list.
There exist over one hundred types of balanced search trees. Some of the more common types
are: AVL trees, B-trees, and red-black trees. B-trees are a category of tree designed to work
well on secondary (disk) storage. The examples discussed below are red-black trees, AVL trees,
and 2-3 trees.
AVL tree
AVL trees were invented by two computer chess specialists in Russia, Adelson-Velskii and
Landis in 1962 (hence the acronym AVL). See next section for detailed information
regarding AVL trees.
B-tree
B-trees of order 4 are known as 2-3-4 trees, and a B-tree of order 3 is known as a 2-3 tree.
The accompanying Java applet demonstrates insert operations on a 2-3 tree. In fact, 2-3 trees
were "invented" by J.E. Hopcroft in 1970 (Cormen et al., 1996, p:399), prior to B-trees in
1972 by Bayer and McCreight.
Red-black trees
Red-black trees are standard binary search trees that satisfy the following conditions: every
node is colored either red or black; the root is black; a red node cannot have a red child;
and every path from the root down to an empty subtree contains the same number of black nodes.
AVL trees are identical to standard binary search trees except that for every node in an AVL
tree, the heights of the left and right subtrees can differ by at most 1 (Weiss, 1993, p:108).
AVL trees are HB-k trees (height-balanced trees of order k) of order HB-1. The height
differential formula is:
| height(left subtree) - height(right subtree) | <= 1
When storing an AVL tree, a field must be added to each node with one of three values: 1, 0,
or -1. A value of 1 in this field means that the left subtree has a height one more than the
right subtree. A value of -1 denotes the opposite. A value of 0 indicates that the heights of
both subtrees are the same.
Updates of AVL trees may require up to O(log n) rotations, whereas updating red-black trees
can be done using only one or two rotations (plus up to O(log n) color changes). For this
reason, AVL trees are considered a bit obsolete by some.
Sparse AVL trees are defined as AVL trees of height h with the fewest possible nodes.
Figure 3 shows sparse AVL trees of heights 0, 1, 2, and 3.
By inspecting the Fibonacci-like sequence of node counts, one suspects that it grows
"exponentially", i.e. that it is of the form A·c^h for some suitable constants A and c that we
have to find. The theory of linear homogeneous recurrence relations (cf. Rosen, K., "Discrete
Mathematics and its Applications", third edition) says that the general solution of such a
recurrence relation is a linear combination of powers of the roots of its characteristic
equation. We still have to find the two unknown coefficients; using the initial conditions of
the recurrence relation, the constants are extracted from the first two values of the sequence.
In the worst case, a 2-3 tree is a complete binary tree. A complete binary tree of height h has
1 + 2 + 4 + ... + 2^h = (2^(h+1) - 1)/(2 - 1) = 2^(h+1) - 1 nodes.
Therefore, the height of such a tree with n nodes can be computed as:
n = 2^(h+1) - 1
2^(h+1) = n + 1
h = log2(n+1) - 1 ~= log2 n   (for large n)
The best case of a 2-3 tree is a complete ternary tree. A complete ternary tree of height h has
1 + 3 + 9 + ... + 3^h = (3^(h+1) - 1)/2 nodes.
Thus, the height of such a tree with n nodes can be calculated in the same way:
n = (3^(h+1) - 1)/2
3^(h+1) = 2n + 1
h = log3(2n+1) - 1 ~= log3 n   (for large n)
Therefore, the height of a 2-3 tree is sandwiched between log3 n and log2 n; thus it never
does worse than a complete binary tree, whose height is ~= log2 n.
Although searching operations on a 2-3 balanced search tree are guaranteed a very good
worst case, this property comes at the expense of sometimes-costly insertions. Figure 6
illustrates such a situation and the Java applet below performs the insertion.
Sorting Revisited
One of the typical tasks in a Data Structures course is to compare the running times of
various sort algorithms. This can be done using analysis of algorithms techniques. For
example, we already know that bubble sort is Theta(n^2), but quick sort is Theta(n * log(n)),
both in the average case.
Another way to compare sort routines is to actually run several of them on the same set of
test data. Rather than have to write all of the code for this ourselves, an easy way to do this is
to visit Jason Harrison's sorting animation Web site. Once at this site, click on the picture for
each sort routine that you wish to run. This will download a Java routine that will perform the
sort and display the sorting progress visually. You will, of course, need to use a browser that
is Java-enabled. Which sort performs best? Do you see an obvious speed difference between
some of the sorts?
Definitions
Recall that in a binary tree each node can have a left child node and/or a right child node. A
leaf is a node with no children.
An almost complete binary tree is a binary tree in which the following 3 conditions hold: all
the leaves are at the bottom level or the bottom 2 levels, all the leaves are in the leftmost
possible positions, and (except possibly for the bottom level) all levels are completely filled
with nodes.
Here are some examples of almost complete binary trees. Note that there is no particular
ordering of the letters.
[Diagrams: several example almost complete binary trees, drawn with letters at the nodes.]
The following example is a complete binary tree. (Of course, it also fits the definition of
almost complete; any complete binary tree is automatically almost complete.)
[Diagrams: a complete binary tree, followed by further example trees, drawn with letters at
the nodes.]
What Is a Heap?
Definition: A minimal heap (descending heap) is an almost complete binary tree in which
the value at each parent node is less than or equal to the values in its child nodes.
Obviously, the minimum value is in the root node. Note, too, that any path from a leaf to the
root passes through the data in descending order.
        C
      /   \
     H     K
    / \   /
   L   I M
The typical storage method for a heap, or any almost complete binary tree, works as follows.
Begin by numbering the nodes level by level from the top down, left to right. For example,
consider the following heap. The numbering has been added below the nodes.
        C
        0
      /   \
     H     K
     1     2
    / \   /
   L   I M
   3   4 5
The advantage of this method over using the usual pointers and nodes is that there is no
wasting of space due to storing two pointer fields in each node. Instead, starting with the
current index, CI, one calculates the index to use as follows:
Parent(CI) = (CI - 1) / 2
RightChild(CI) = 2 * (CI + 1)
LeftChild(CI) = 2 * CI + 1
For example, if we start at node H (with index 1), the right child is at index 2 * (1 + 1) = 4,
that is, node I.
This is done by temporarily placing the new item at the end of the heap (array) and then
calling a FilterUp routine to make any needed adjustments on the path from this leaf to the
root. For example, let's insert E into the following heap:
        C
      /   \
     H     K
    / \   /
   L   I M
First, E is placed temporarily in the next available position:
        C
      /   \
     H     K
    / \   / \
   L   I M   E
Of course, the new tree might not be a heap. The FilterUp routine now checks the parent, K,
and sees that things would be out of order as they are. So K is moved down to where E was.
Then the parent above that, C, is checked. It is in order relative to the target item E, so the C
is not moved down. The hole left behind is filled with E, then, as this is the correct position
for it.
        C
      /   \
     H     E
    / \   / \
   L   I M   K
For practice, let's take the above heap and insert another item, D. First, place D temporarily
in the next available position:
        C
      /   \
     H     E
    / \   / \
   L   I M   K
  /
 D
Then the FilterUp routine checks the parent, L, and discovers that L must be moved down.
Then the parent above that, H, is checked. It too must be moved down. Finally C is checked,
but it is OK where it is. The hole left where the H had been is where the target D is then
inserted.
        C
      /   \
     D     E
    / \   / \
   H   I M   K
  /
 L
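A C sketch of insertion with FilterUp, using the array layout and index formulas above and a char array to match the letter examples (not a listing from the notes):

/* Place the new item at the end of the heap, then move parents larger than
 * it down along the path to the root until the item's correct spot is found. */
void heap_insert(char heap[], int *size, char item)
{
    int ci = (*size)++;                     /* index of the new leaf        */
    while (ci > 0 && heap[(ci - 1) / 2] > item) {
        heap[ci] = heap[(ci - 1) / 2];      /* move the parent down         */
        ci = (ci - 1) / 2;
    }
    heap[ci] = item;                        /* the hole is item's position  */
}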
We always remove the item from the root. That way we always get the smallest item. The
problem is then how to adjust the binary tree so that we again have a heap (with one less
item).
The algorithm works like this: First, remove the root item and replace it temporarily with the
item in the last position. Call this replacement the target. A FilterDown routine is then used
to check the path from the root to a leaf for the correct position for this target. The path is
chosen by always selecting the smaller child at each node. For example, let's remove the C
from this heap:
        C
      /   \
     D     E
    / \   / \
   H   I M   K
  /
 L
First we remove the C and replace it with the last item (the target), L:
      L
    /   \
   D     E
  / \   / \
 H   I M   K
The smaller child of L is D. Since D is out of order compared to the target L, we move D up.
The smaller child under where D had been is H. When H is compared to L we see that the H
too needs to be moved up. Since we are now at a leaf, this empty leaf is where the target L is
put.
      D
    /   \
   H     E
  / \   / \
 L   I M   K
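The removal side can be sketched the same way. The FilterDown and remove-min routines below are an added illustration of the idea just described, not the notes' own code; they reproduce the example above.

def filter_down(heap, i, size):
    # Walk from position i toward the leaves along the smaller children,
    # moving them up until the target item fits.
    target = heap[i]
    while 2 * i + 1 < size:
        child = 2 * i + 1
        if child + 1 < size and heap[child + 1] < heap[child]:
            child += 1                 # pick the smaller of the two children
        if heap[child] >= target:
            break
        heap[i] = heap[child]          # move the smaller child up
        i = child
    heap[i] = target

def remove_min(heap):
    smallest = heap[0]
    heap[0] = heap[-1]                 # replace the root with the last item (the target)
    heap.pop()
    if heap:
        filter_down(heap, 0, len(heap))
    return smallest

h = ['C', 'D', 'E', 'H', 'I', 'M', 'K', 'L']
print(remove_min(h))   # C
print(h)               # ['D', 'H', 'E', 'L', 'I', 'M', 'K'], as in the diagram above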
For another example, let's remove the E from the following heap:
         E
       /   \
      G     K
     / \   / \
    J   N K   X
   / \   /
  X   Y P
First remove the E and replace it with the target P (the last item):
         P
       /   \
      G     K
     / \   / \
    J   N K   X
   / \
  X   Y
Now use the Filter Down routine to filter the P down to its correct position by checking the
smaller child, G, which should be moved up, and then the smaller child below that, J, which
should also be moved up. Finally, the smaller child, X, under where J had been is checked,
but it does not get moved since it is OK relative to the target P. The P is then placed in the
empty node above X. We then have the following heap:
         G
       /   \
      J     K
     / \   / \
    P   N K   X
   / \
  X   Y
Heap sort
Heap sort is performed by creating a heap and then removing the data items one at a time.
The heap could start as an empty heap, with items inserted one by one. However, there is a
relatively easy routine to convert an array of items into a heap, so that method is often used.
This routine is described below. Once the array is converted into a heap, we remove the root
item (the smallest), readjust the remaining items into a heap, and place the removed item at
the end of the heap (array). Then we remove the new item in the root (the second smallest),
readjust the heap, and place the removed item in the next to the last position, etc.
Heap sort is Theta (n * log (n)), either average case or worst case. This is great for a sorting
algorithm! No appreciable extra storage space is needed either. On average, quick sort
(which is also Theta (n * log (n)) for the average case) is faster than heap sort. However,
quick sort has that bad Theta (n^2) worst case running time.
To convert an array into a heap, first go to the index of the last parent node. This is given by
(Heap Size - 2) / 2. As an example, take the nine-item array P, S, C, K, M, L, A, X, E. In this
case, (9 - 2) / 2 = 3, so K (at index 3) is the last parent in the tree. We then apply the Filter
Down routine to each node from this index down to index 0. (Note that this is each node from
3 down to 0, not just the nodes along the path from index 3 to index 0.)
In our example, the array corresponds directly to the following binary tree. Note that this is
not yet a heap.
         P
       /   \
      S     C
     / \   / \
    K   M L   A
   / \
  X   E
Applying Filter Down at K gives the following. (Note that E is the smaller child under K.)
         P
       /   \
      S     C
     / \   / \
    E   M L   A
   / \
  X   K
Now apply Filter Down at index 2, that is, at node C. (Under C, A is the smaller child.)
         P
       /   \
      S     A
     / \   / \
    E   M L   C
   / \
  X   K
Next, apply Filter Down at index 1, that is, at node S. Check the smaller child, E, and then
the smaller child under that, namely K. Both E and K get moved up.
         P
       /   \
      E     A
     / \   / \
    K   M L   C
   / \
  X   S
Finally, apply Filter Down at index 0, that is, at the root node. We check the smaller child,
A, and then the smaller child, C, relative to the target P. Both A and C get moved up.
         A
       /   \
      E     C
     / \   / \
    K   M L   P
   / \
  X   S
Now we have a heap! The first main step of heap sort has been completed. The other main
component of heap sort was described earlier: to repeatedly remove the root item, adjust the
heap, and put the removed item in the empty slot toward the end of the array (heap).
First we remove the A, adjust the heap by using Filter Down at the root node, and place the
A at the end of the heap (where it is not really part of the heap at all and so is not drawn
below as connected to the tree).
C (the target is S)
/ \
E L
/ \ / \
K M S P
/ .
X A
Of course, all of this is really taking place in the array that holds the heap. At this point it
looks like the following. Note that the heap is stored from index 0 to index 7. The A is after
the end of the heap.
C E L K M S P X | A
Next we remove the C, adjust the heap by using Filter Down at the root node, and place the
C at the end of the heap:
E (the target is X)
/ \
K L
/ \ / \
X M S P
. .
C A
Next we remove the E, adjust the heap by using Filter Down at the root node, and place the
E at the end of the heap:
K (the target is P)
/ \
M L
/ \ / .
X P S E
. .
C A
Next we remove the K, adjust the heap by using Filter Down at the root node, and place the
K at the end of the heap:
L (the target is S)
/ \
M S
X P K E
. .
C A
Next we remove the L, adjust the heap by using Filter Down at the root node, and place the
L at the end of the heap:
M (the target is P)
/ \
P S
/ . . .
X L K E
. .
C A
Next we remove the M, adjust the heap by using Filter Down at the root node, and place the
M at the end of the heap:
P (the target is X)
/ \
X S
. . . .
M L K E
. .
C A
Next we remove the P, adjust the heap by using Filter Down at the root node, and place the
P at the end of the heap:
S (the target is S)
/ .
X P
. . . .
M L K E
. .
C A
Next we remove the S, adjust the heap (now a trivial operation), and place the S at the end of
the heap:
X (the target is X)
. .
S P
. . . .
M L K E
. .
C A
Since only the item X remains in the heap, and since we have removed the smallest item,
then the second smallest, etc., the X must be the largest item and should be left where it is. If
you now look at the array that holds the above items you will see that we have sorted the
array in descending order:
X S P M L K E C A
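Putting the two phases together, the following Python sketch (an added illustration, not from the notes) performs heap sort in place. Because a minimal heap is used and each removed item is parked just past the end of the shrinking heap, the array ends up in descending order, matching the walk-through above.

def heap_sort_descending(a):
    n = len(a)

    def filter_down(i, size):
        target = a[i]
        while 2 * i + 1 < size:
            child = 2 * i + 1
            if child + 1 < size and a[child + 1] < a[child]:
                child += 1
            if a[child] >= target:
                break
            a[i] = a[child]
            i = child
        a[i] = target

    # Phase 1: convert the array into a minimal heap, applying FilterDown
    # from the last parent, (n - 2) // 2, down to index 0.
    for i in range((n - 2) // 2, -1, -1):
        filter_down(i, n)

    # Phase 2: repeatedly remove the root (the smallest remaining item),
    # shrink the heap, and store the removed item just past the heap's end.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        filter_down(0, end)

a = list("PSCKMLAXE")
heap_sort_descending(a)
print("".join(a))   # XSPMLKECA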
Linear programming
In mathematics, linear programming (LP) problems are optimization problems in which the
objective function and the constraints are all linear.
Linear programming is an important field of optimization for several reasons. Many practical
problems in operations research can be expressed as linear programming problems. Certain
special cases of linear programming, such as network flow problems and multicommodity
flow problems are considered important enough to have generated much research on
specialized algorithms for their solution. A number of algorithms for other types of
optimization problems work by solving LP problems as sub-problems. Historically, ideas
from linear programming have inspired many of the central concepts of optimization theory,
such as duality, decomposition, and the importance of convexity and its generalizations.
Standard form
Standard form is the usual and most intuitive form of describing a linear programming
problem. It consists of the following three parts:
• A linear function to be maximized, e.g. maximize c1x1 + c2x2
• Problem constraints, e.g.
a11x1 + a12x2 ≤ b1
a21x1 + a22x2 ≤ b2
a31x1 + a32x2 ≤ b3
• Non-negative variables, e.g. x1 ≥ 0, x2 ≥ 0
The problem is usually expressed in matrix form, and then becomes:
maximize c^T x
subject to Ax ≤ b, x ≥ 0
Other forms, such as minimization problems, problems with constraints on alternative forms,
as well as problems involving negative variables can always be rewritten into an equivalent
problem in standard form.
Example
Suppose that a farmer has a piece of farm land, say A square kilometres large, to be planted
with either wheat or barley or some combination of the two. The farmer has a limited
permissible amount F of fertilizer and P of insecticide which can be used, each of which is
required in different amounts per unit area for wheat (F1, P1) and barley (F2, P2). Let S1 be the
selling price of wheat, and S2 the price of barley. If we denote the area planted with wheat
and barley with x1 and x2 respectively, then the optimal number of square kilometres to plant
with wheat vs barley can be expressed as a linear programming problem:
maximize S1x1 + S2x2 (maximize the revenue - this is the "objective function")
subject to x1 + x2 ≤ A (limit on total area)
F1x1 + F2x2 ≤ F (limit on fertilizer)
P1x1 + P2x2 ≤ P (limit on insecticide)
x1 ≥ 0, x2 ≥ 0 (cannot plant a negative area)
In matrix form this becomes:
maximize c^T x with c = (S1, S2)
subject to Ax ≤ b, x ≥ 0, where A is the 3-by-2 coefficient matrix of the constraints above and b = (A, F, P)
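As an added, hedged illustration, such a problem can be handed to an off-the-shelf LP solver. The sketch below uses scipy.optimize.linprog (which minimizes, so the objective coefficients are negated); all of the numeric values for A, F, P, F1, F2, P1, P2, S1 and S2 are made up for the example.

from scipy.optimize import linprog

# Made-up data: 10 km^2 of land, fertilizer/insecticide budgets, and
# per-km^2 requirements for wheat (x1) and barley (x2).
A, F, P = 10.0, 300.0, 80.0
F1, F2 = 40.0, 25.0
P1, P2 = 10.0, 5.0
S1, S2 = 7.0, 5.0                            # selling prices per km^2

res = linprog(
    c=[-S1, -S2],                            # maximize S1*x1 + S2*x2
    A_ub=[[1, 1], [F1, F2], [P1, P2]],       # area, fertilizer, insecticide limits
    b_ub=[A, F, P],
    bounds=[(0, None), (0, None)],           # x1, x2 >= 0
)
print(res.x, -res.fun)                       # optimal areas and the revenue they give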
Augmented form (slack form)
Linear programming problems must be converted into augmented form before being solved
by the simplex algorithm. This form introduces non-negative slack variables to replace
inequalities with equalities in the constraints. The problem can then be written in the following
form:
Maximize Z in:
Z - c^T x = 0
Ax + xs = b
x ≥ 0, xs ≥ 0
where xs are the newly introduced slack variables, and Z is the variable to be maximized.
Example
The example above becomes as follows when converted into augmented form:
Maximize Z in:
Z - S1x1 - S2x2 = 0 (objective function)
x1 + x2 + x3 = A (augmented first constraint)
F1x1 + F2x2 + x4 = F (augmented second constraint)
P1x1 + P2x2 + x5 = P (augmented third constraint)
x1, x2, x3, x4, x5 ≥ 0
where x3, x4, x5 are the (non-negative) slack variables.
Every linear programming problem, referred to as a primal problem, can be converted into an
equivalent dual problem. In matrix form, we can express the primal problem as:
maximize c^T x
subject to Ax ≤ b, x ≥ 0
and the corresponding dual problem as:
minimize b^T y
subject to A^T y ≥ c, y ≥ 0
Theory
Geometrically, the linear constraints define a convex polyhedron, which is called the feasible
region. Since the objective function is also linear, all local optima are automatically global
optima. The linear objective function also implies that an optimal solution can only occur at a
boundary point of the feasible region.
There are two situations in which no optimal solution can be found. First, if the constraints
contradict each other (for instance, x ≥ 2 and x ≤ 1) then the feasible region is empty and
there can be no optimal solution, since there are no solutions at all. In this case, the LP is said
to be infeasible.
Alternatively, the polyhedron can be unbounded in the direction of the objective function (for
example: maximize x1 + 3 x2 subject to x1 ≥ 0, x2 ≥ 0, x1 + x2 ≥ 10), in which case there is no
optimal solution since solutions with arbitrarily high values of the objective function can be
constructed.
Barring these two pathological conditions (which are often ruled out by resource constraints
integral to the problem being represented, as above), the optimum is always attained at a
vertex of the polyhedron. However, the optimum is not necessarily unique: it is possible to
have a set of optimal solutions covering an edge or face of the polyhedron, or even the entire
polyhedron (This last situation would occur if the objective function were constant).
Algorithms
Basics
Trading space for time is much more prevalent than trading time for space.
Two variations:
Input enhancement: preprocess the problem's input and store the additional
information obtained in order to accelerate solving the problem afterward.
Prestructuring: preprocess the input to store it in a data structure that speeds up access
(hashing, covered in Section 7.3, is an example).
Sorting by counting
String matching
Counting sort is a linear time sorting algorithm used to sort items when they belong to a fixed
and finite set. Integers, which lie in a fixed interval, say k1 to k2, are examples of such items.
The algorithm proceeds by defining an ordering relation between the items from which the
set to be sorted is derived (for a set of integers, this relation is trivial). Let the set to be sorted
be held in an array A of size n, with the sorted output produced in an array B.
The algorithm makes two passes over A and one pass over B. If the size of the range k is smaller
than the size of the input n, then the time complexity is O(n). Also, note that it is a stable algorithm,
meaning that ties are resolved by reporting those elements first which occur first.
This visual demonstration takes 8 randomly generated single digit numbers as input and sorts
them. The range of the inputs is from 0 to 9.
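A minimal counting sort sketch in Python (an added illustration, not the demonstration referred to above), written to be stable for inputs in the range 0 to 9:

def counting_sort(a, k1=0, k2=9):
    # One pass over a to count how many times each key occurs.
    count = [0] * (k2 - k1 + 1)
    for x in a:
        count[x - k1] += 1
    # Turn the counts into the positions where each key's run of items ends.
    for i in range(1, len(count)):
        count[i] += count[i - 1]
    # One backward pass over a fills the output array b stably.
    b = [None] * len(a)
    for x in reversed(a):
        count[x - k1] -= 1
        b[count[x - k1]] = x
    return b

print(counting_sort([3, 7, 1, 7, 0, 9, 3, 2]))   # [0, 1, 2, 3, 3, 7, 7, 9]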
Horspool’s Algorithm
Shift Table
Precompute shift sizes and store them in a table. For every character c, the shift's value t(c)
is determined as follows:
t(c) = the pattern's length m, if c is not among the first m - 1 characters of the pattern;
otherwise, t(c) = the distance from the rightmost occurrence of c among the first m - 1
characters of the pattern to the pattern's last character.
For the pattern BARBER, all table entries will be equal to 6, except for the
entries for E, B, R and A, which will be 1, 2, 3 and 4, respectively.
Horspool Matching
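The matching routine itself is not reproduced in these notes. The following Python sketch is an added illustration of Horspool's algorithm built on the shift table described above; the function names are mine.

def shift_table(pattern):
    m = len(pattern)
    table = {}
    # Characters among the first m - 1 pattern characters get the distance from
    # their rightmost occurrence (within those m - 1 characters) to the last
    # character; every other character gets the default shift m.
    for i in range(m - 1):
        table[pattern[i]] = m - 1 - i
    return table

def horspool(pattern, text):
    m, n = len(pattern), len(text)
    table = shift_table(pattern)
    i = m - 1                          # text index aligned with the pattern's last character
    while i <= n - 1:
        k = 0
        while k <= m - 1 and pattern[m - 1 - k] == text[i - k]:
            k += 1                     # compare right to left
        if k == m:
            return i - m + 1           # full match found at this position
        i += table.get(text[i], m)     # shift by t(c) for the aligned text character
    return -1

print(shift_table("BARBER"))           # {'B': 2, 'A': 4, 'R': 3, 'E': 1}
print(horspool("BARBER", "JIM_SAW_ME_IN_A_BARBERSHOP"))   # 16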
Boyer-Moore Algorithm
The Boyer-Moore and Horspool algorithms act differently after some positive number k (0 < k < m)
of the pattern's characters have been matched successfully before a mismatch is encountered.
Algorithmic Steps
The Boyer-Moore shift size d is computed as
d = d1 if k = 0
d = max {d1, d2} if k > 0
where d1 is the bad-symbol shift and d2 is the good-suffix shift.
7.3 Hashing
The hash table is a well-known data structure used to maintain dynamic dictionaries. A
dynamic dictionary is defined as a collection of data items that can be accessed according to
the following operations: Search, Insert, and Delete.
Dynamic dictionaries are ubiquitous in computing applications; they are widely used in
databases, operating systems, compilers, and a range of business and scientific applications.
The hash table data structure consists of an array T whose N slots are used to store the
collection of data items. When implementing the above operations, an index is computed
from the key value using an ordinary hash function h, which performs the mapping
h : U → {0, 1, ..., N - 1},
where U denotes the set of all possible key values (i.e., the universe of keys). Thus, h(k)
denotes the index, or hash value, computed by h when it is supplied with key k.
Furthermore, one says that k hashes to slot T[h(k)] in hash table T.
Since |U| is generally much larger than N, h is unlikely to be a one-to-one mapping. In other
words, it is very probable that for two keys k1 and k2 with k1 ≠ k2, we get h(k1) = h(k2). This
situation, where two different keys hash to the same slot, is referred to as a collision. Since
two items cannot be stored at the same slot in a hash table, the Insert operation must resolve
collisions by relocating an item in such a way that it can be found by subsequent Search and
Delete operations.
Much of the existing research on hash table implementations of dynamic dictionaries is based
on statistical analyses, typically focusing on average-case performance and uniformly
distributed data. The work presented here is distinguished from much of the previous
research on hashing in that we treat open address hash functions as iterators; which allows us
to employ tools from the field of nonlinear dynamical systems.
The remainder of this paper is organized as follows. In Section 2, a basic theoretical analysis
is given for two of the most popular open address hashing algorithms, linear probing and
double hashing. In the next section we introduce the Lyapunov exponent, a method used to
detect chaos. We then discuss the meaning of the Lyapunov exponent in the integer domain,
and its importance in analyzing probing behavior. In particular, after pointing out the
relationship between good hash functions and chaotic iterators, we develop a technique for
measuring chaos in hash functions. In this section we also consider the evaluation of hash
functions using the concept of entropy. The analysis of hashing from the dynamical systems
perspective motivated the development of a new hash function called exponential hashing,
which we present in Section 4, along with theoretical and empirical comparative analyses
with double hashing. Our experimentation, presented in Section 5, shows that exponential
hashing performs nearly as well as double hashing for uniformly distributed data, and
performs substantially better than double hashing on nonuniformly distributed data.
Open hashing
The basic idea of a hash table is that each item that might be stored in the table has a natural
location in the table. This natural location can be computed as a function of the item itself.
When searching for an item, it is not necessary to go through all the possible locations in the
table; instead, you can go directly to the natural location of the item. The function that
computes the natural location of an item in a hash table is called a hash function. A hash
table is in fact an array, so a location in a hash table is specified by an integer index. When a
hash table is used to implement sets, the hash function takes a possible element of the set and
returns an integer value that specifies the index in the array where that element should be
stored. When a hash table is used to implement maps, the hash function is applied to a key
and it returns the expected position of the pair with that key. (In many implementations, it is
not an error to call get(k) when k has no associated value. Instead, a special value, such as
NULL, is returned to represent "no value.") The integer returned by a hash function is
referred to as the hash code of the function's input value. That is, the hash code of a value is
the natural location for that value in the hash table.
In some situations, we might want to avoid the use of dynamic memory, or we might want to
avoid using the extra memory that is required for storing the linked list pointers. In such
cases, we can use a closed hash table. In closed hashing, all the items are stored in the array
itself rather than in linked lists attached to the array. Since each location of the array can
accommodate only one item, we need a new strategy for handling collisions.
The simplest strategy is called linear probing. Suppose that we want to store an item in
the table. Let h be the hash code for the item. Suppose that location h in the array is already
occupied. With linear probing, we simply look at the following locations h + 1, h + 2, and so on,
until we find an empty location. (If we reach the end of the array without finding an empty
spot, we loop back to the beginning of the array.) When we find an empty spot, we store the
item there.
When we search for the item, we just follow the same strategy, looking first in the location
given by its hash code and then in the following locations. Either we find the item, or we come to
an empty spot, which tells us that the item is not in the table.
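A closed hash table with linear probing can be sketched as follows (an added Python illustration that stores keys only, ignores deletion, and assumes the table never fills up):

class LinearProbingTable:
    def __init__(self, capacity=11):
        self.slots = [None] * capacity

    def _probe(self, key):
        # Start at the key's natural location and walk forward, wrapping around
        # at the end of the array, until the key or an empty slot is found.
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i] != key:
            i = (i + 1) % len(self.slots)
        return i

    def insert(self, key):
        self.slots[self._probe(key)] = key

    def contains(self, key):
        return self.slots[self._probe(key)] == key

t = LinearProbingTable()
for word in ["fox", "dog", "cat", "owl"]:
    t.insert(word)
print(t.contains("cat"), t.contains("emu"))   # True False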
A problem with linear probing is that it leads to excessive clustering. When an item is
bumped from location x, it goes into location x + 1. Now, any item that is bumped from
either location x or location x + 1 is bumped into location x + 2. Then, any item that belongs
in any of these three locations is bumped into location x + 3. All these items in consecutive
locations form a cluster.
The larger a cluster grows, the more likely it is that more items will join the cluster. And the
more items there are in the cluster, the less efficient searching for those items will be. Note
that not all items in the cluster have the same hash code; when an item with any hash code
hits the cluster, it will join the cluster.
There are other probing strategies that are less prone to clustering. In quadratic probing,
the sequence of locations that we probe for hash code h is h, h+1^2, h+2^2, h+3^2, h+4^2, . . .
(with wrap-around if we get to the end of the array). The point is not that these locations are
not physically next to each other; all items with hash code h will follow the same probe
sequence and will form a sort of physically spread-out cluster. The point is that when another
item with a different hash code hits this cluster, it will not join the cluster.
8. Dynamic programming
One general strategy is to discard decision solutions that cannot possibly lead to an optimal
solution. In many practical situations, this strategy hits the optimal solution in a polynomial
number of decision steps. However, in the worst case, such a strategy may end up performing
full enumeration.
Dynamic programming takes advantage of the duplication and arranges to solve each sub
problem only once, saving the solution (in a table or something similar) for later use. The
underlying idea of dynamic programming is: avoid calculating the same thing twice, usually
by keeping a table of known results of sub problems. Unlike divide-and-conquer, which solves
the sub problems top-down, dynamic programming is a bottom-up technique.
Bottom-up means we start with the smallest sub problems, solve them first, and combine their
solutions to obtain solutions to progressively larger sub problems.
Dynamic programming relies on a principle of optimality. This principle states that in an
optimal sequence of decisions or choices, each subsequence must also be optimal. For
example, in the matrix chain multiplication problem, not only is the value we are interested in
optimal, but all the other entries in the table also represent optimal values for their sub problems.
The principle can be restated as follows: the optimal solution to a problem is a combination of
optimal solutions to some of its sub problems.
The difficulty in turning the principle of optimality into an algorithm is that it is not usually
obvious which sub problems are relevant to the problem under consideration.
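A classic small illustration of this bottom-up, table-filling style (an added sketch, not from the notes) is computing a binomial coefficient C(n, k), which is also the subject of the next passage:

def binomial(n, k):
    # Bottom-up dynamic programming over Pascal's triangle:
    # C(i, j) = C(i-1, j-1) + C(i-1, j), with C(i, 0) = C(i, i) = 1.
    table = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(min(i, k) + 1):
            if j == 0 or j == i:
                table[i][j] = 1
            else:
                table[i][j] = table[i - 1][j - 1] + table[i - 1][j]
    return table[n][k]

print(binomial(4, 2))   # 6, the number of 2-subsets of a 4-element set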
The symbols C(n, k) are used to denote a binomial coefficient, and are sometimes read as "n choose k."
C(n, k) therefore gives the number of k-subsets possible out of a set of n distinct items. For example,
the 2-subsets of {1, 2, 3, 4} are the six pairs {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, and {3,4}, so C(4, 2) = 6.
C(n, k) = n! / (k! (n - k)!)   (1)
where n! denotes a factorial. Writing the factorial as a gamma function allows the
binomial coefficient to be generalized to non-integer arguments, including complex n and k.
The binomial coefficient is implemented in Mathematica as Binomial[n, k].
The binomial coefficients form the rows of Pascal's triangle, and the number of lattice paths
from the origin to a point (a, b) is the binomial coefficient C(a + b, a) (Hilton and Pedersen
1991).
Several standard identities hold for binomial coefficients (equations (2)-(10) of the original source);
the finite difference analog of one of these identities is known as the Chu-Vandermonde identity,
and a similar formula holds for negative integers.
where {x} denotes the fractional part of x. This inequality may be reduced to the study of
exponential sums involving the Mangoldt function. Estimates of these
sums are given by Jutila (1974, 1975), but recent improvements have been made by Granville
and Ramare (1996).
A related congruence (equation (11) of the source) holds
for all primes, and was conjectured to hold only for primes. This was disproved when Skiena
(1990) found that it also holds for a composite number. Vardi (1991, p. 63)
subsequently showed that the square of a Wieferich prime is always a solution; this allowed
him to show that the only composite solutions are 5907, 1093^2, and 3511^2, where 1093 and 3511
are Wieferich primes.
Consider the binomial coefficients C(2n+1, n+1), the first few of which are 1, 3, 10, 35,
126, ... (Sloane's A001700). The generating function is
(1/(2x)) (1/sqrt(1 - 4x) - 1) = 1 + 3x + 10x^2 + 35x^3 + ...   (12)
These numbers are squarefree only for , 3, 4, 6, 9, 10, 12, 36, ... (Sloane's A046097),
with no others known. It turns out that is divisible by 4 unless belongs to a 2-automatic
set , which happens to be the set of numbers whose binary representations contain at most
two 1s: 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 16, 17, 18, ... (Sloane's A048645). Similarly, is
divisible by 9 unless belongs to a 3-automatic set , consisting of numbers for which the
representation of in ternary consists entirely of 0s and 2s (except possibly for a pair of
adjacent 1s; D. Wilson, A. Karttunen). The initial elements of are 1, 2, 3, 4, 6, 7, 9, 10, 11,
12, 13, 18, 19, 21, 22, 27, ... (Sloane's A051382). If is squarefree, then must belong to
. It is very probable that is finite, but no proof is known. Now, squares larger than
4 and 9 might also divide , but by eliminating these two alone, the only possible for
are 1, 2, 3, 4, 6, 9, 10, 12, 18, 33, 34, 36, 40, 64, 66, 192, 256, 264, 272, 513, 514, 516,
576 768, 1026, 1056, 2304, 16392, 65664, 81920, 532480, and 545259520. All of these but
the last have been checked (D. Wilson), establishing that there are no other such that is
squarefree for .
Erdos showed that the binomial coefficient with is a power of an integer for
the single case (Le Lionnais 1983, p. 48). Binomial coefficients are
squares when is a triangular number, which occur for , 6, 35, 204, 1189, 6930, ...
(Sloane's A001109). These values of have the corresponding values , 9, 50, 289, 1682,
9801, ... (Sloane's A052436).
floor function, although the subset of coefficients is sometimes also given this name.
Erdos and Graham (1980, p. 71) conjectured that the central binomial coefficient is
never squarefree for , and this is sometimes known as the Erdos squarefree conjecture.
Sárkozy's theorem (Sárkozy 1985) provides a partial solution which states that the binomial
coefficient is never squarefree for all sufficiently large (Vardi 1991). Granville
and Ramare (1996) proved that the only squarefree values are and 4. Sander (1992)
subsequently showed that are also never squarefree for sufficiently large as long as
is not "too big."
(13)
such binomial coefficient has least prime factor or with the exceptions ,
The binomial coefficient C(m, n) (mod 2) can be computed using the XOR operation n XOR m,
making Pascal's triangle mod 2 very easy to construct.
The Floyd-Warshall algorithm computes the shortest distances between every pair of vertices.
It can also be used to detect the presence of negative cycles—the graph has one
if, at the end of the algorithm, the distance from a vertex v to itself is negative.
Let dist(k,i,j) be the length of the shortest path from i to j that uses only the vertices 1, ..., k
as intermediate vertices. The following recurrence holds:
• k = 0 is our base case - dist(0,i,j) is the length of the edge from vertex i to vertex j if it
exists, and infinity otherwise.
• dist(k,i,j) = min(dist(k - 1,i,k) + dist(k - 1,k,j), dist(k - 1,i,j)): for any vertex i and
vertex j, the shortest path from i to j with all intermediate vertices drawn from 1, ..., k either
simply does not involve the vertex k at all (in which case it is the same as dist(k -
1,i,j)), or the shorter path goes through vertex k, so the shortest path between
vertex i and vertex j is the combination of the path from vertex i to k, and from vertex
k to j.
After N iterations, there is no need anymore to go through any more intermediate vertices, so
the distance dist(N,i,j) represents the shortest distance between i and j.
Pseudocode
for i = 1 to N
for j = 1 to N
if there is an edge from i to j
dist[0][i][j] = the length of the edge from i to j
else
dist[0][i][j] = infinity
for k = 1 to N
for i = 1 to N
for j = 1 to N
dist[k][i][j] = min(dist[k-1][i][j], dist[k-1][i][k] + dist[k-1][k][j])
This will give the shortest distances between any two nodes, from which shortest paths may
be constructed.
This algorithm takes Θ(N^3) time and Θ(N^3) space, and has the distinct advantage of hiding a
small constant in its behavior, since very little work is done in the innermost loop.
Furthermore, the space bound can be reduced to Θ(N^2) by noticing that dist(k,i,k) and
dist(k,k,j) are unchanged from iteration k - 1, so a single two-dimensional matrix can be
updated in place.
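A Python sketch of the Θ(N^2)-space version (an added illustration), in which a single two-dimensional dist matrix is updated in place:

INF = float("inf")

def floyd_warshall(n, edges):
    # edges: dict mapping (i, j) -> edge length, vertices numbered 0 .. n-1
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for (i, j), w in edges.items():
        dist[i][j] = w
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    # A negative value on the diagonal would signal a negative cycle.
    return dist

d = floyd_warshall(4, {(0, 1): 5, (1, 2): 3, (2, 3): 1, (0, 3): 10})
print(d[0][3])   # 9, via 0 -> 1 -> 2 -> 3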
Given a digraph G = (V, E), determine for each i, j in V whether or not there exists a path of
length one or more from vertex i to vertex j.
Def: Given the adjacency matrix C of any digraph G = (V, E), the matrix A is called the
transitive closure of C if, for all i, j in V, A[i, j] = 1 exactly when there is a path of length
one or more from vertex i to vertex j, and A[i, j] = 0 otherwise.
Abstract
The purpose of this paper is to analyze several algorithm design paradigms applied to a
single problem – the 0/1 Knapsack Problem. The Knapsack problem is a combinatorial
optimization problem where one has to maximize the benefit of objects in a knapsack
without exceeding its capacity. It is an NP-complete problem and as such an exact
solution for a large input is practically impossible to obtain.
The main goal of the paper is to present a comparative study of the brute force, dynamic
programming, memory functions, branch and bound, greedy, and genetic algorithms. The
paper discusses the complexity of each algorithm in terms of time and memory
requirements, and in terms of required programming efforts. Our experimental results
show that the most promising approaches are dynamic programming and genetic
algorithms. The paper examines in more detail the specifics and the limitations of these
two paradigms.
Introduction
Different Approaches
Brute Force
Brute force is a straightforward approach to solving a problem, usually directly based on
the problem's statement and definitions of the concepts involved. If there are n items to
choose from, then there will be 2^n possible combinations of items for the knapsack. An
item is either chosen or not chosen. A bit string of 0's and 1's is generated which is of
length n. If the ith symbol of a bit string is 0, then the ith item is not chosen and if it is 1,
the ith item is chosen.
Dynamic Programming
If j < wi then
Table[i, j] ← Table[i-1, j] (cannot fit the ith item)
Else
Table[i, j] ← maximum of
Table[i-1, j] (do not use the ith item)
and
vi + Table[i-1, j - wi] (use the ith item)
The goal is to find Table [N, Capacity] the maximal value of a subset of the knapsack.
The two boundary conditions for the KP are:
- The knapsack has no value when there are no items included in it (i.e. i = 0):
Table[0, j] = 0 for j ≥ 0
- The knapsack has no value when its capacity is zero (i.e. j = 0), because no items
can be included in it:
Table[i, 0] = 0 for i ≥ 0
In the implementation of the algorithm, instead of using two separate arrays for the
weights and the values of the items, we used one array Items of type item, where item is a
structure with two fields: weight and value.
To find which items are included in the optimal solution, we use the following algorithm:
n ← N, c ← Capacity
Start at position Table[n, c]
While the remaining capacity is greater than 0 do
If Table[n, c] = Table[n-1, c] then
Item n has not been included in the optimal solution
Move one row up to n-1
Else
Item n has been included in the optimal solution
Process Item n
Move one row up to n-1
Move to column c – weight(n)
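The table computation and the traceback can be sketched together in Python (an added illustration; here the items are simply (weight, value) pairs, and the example data are made up):

def knapsack(items, capacity):
    # items: list of (weight, value) pairs; table[i][j] is the best value
    # achievable using the first i items with remaining capacity j.
    n = len(items)
    table = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = items[i - 1]
        for j in range(capacity + 1):
            if j < w:
                table[i][j] = table[i - 1][j]               # cannot fit item i
            else:
                table[i][j] = max(table[i - 1][j],          # do not use item i
                                  v + table[i - 1][j - w])  # use item i
    # Traceback: walk from table[n][capacity] to recover which items were chosen.
    chosen, j = [], capacity
    for i in range(n, 0, -1):
        if table[i][j] != table[i - 1][j]:
            chosen.append(i - 1)
            j -= items[i - 1][0]
    return table[n][capacity], chosen

value, chosen = knapsack([(2, 12), (1, 10), (3, 20), (2, 15)], 5)
print(value, chosen)   # 37 [3, 1, 0]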
Memory Functions
Greedy Approach
Greedy Algorithm works by making the decision that seems most promising at any moment;
it never reconsiders this decision, whatever situation may arise later.
Problem: Make change for a given amount using the smallest possible number of coins.
Informal Algorithm
Make change for n units using the least possible number of coins.
MAKE-CHANGE (n)
S ← {}; sum ← 0
while sum ≠ n do
x ← the largest coin value with sum + x ≤ n
S ← S ∪ {value of x}; sum ← sum + x
return S
Example: Make change for $2.89 (289 cents); here n = 289 and the solution contains 2
dollars, 3 quarters, 1 dime and 4 pennies. The algorithm is greedy because at every stage it
chooses the largest coin without worrying about the consequences. Moreover, it never
changes its mind in the sense that once a coin has been included in the solution set, it remains
there.
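A Python sketch of the greedy change-maker (an added illustration, with the usual US coin denominations assumed), run on the 289-cent example:

def make_change(n, coins=(100, 25, 10, 5, 1)):
    solution = []
    total = 0
    while total != n:
        # Greedy choice: take the largest coin that does not overshoot n.
        x = max((c for c in coins if total + c <= n), default=None)
        if x is None:
            return None          # no coin fits; the greedy method gets stuck here
        solution.append(x)
        total += x
    return solution

print(make_change(289))   # [100, 100, 25, 25, 25, 10, 1, 1, 1, 1]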
To construct the solution in an optimal way, the algorithm maintains two sets: one contains
chosen items and the other contains rejected items.
A feasible set (of candidates) is promising if it can be extended to produce not merely a
solution, but an optimal solution to the problem. In particular, the empty set is
always promising. Why? Because an optimal solution always exists.
Unlike Dynamic Programming, which solves the sub problems bottom-up, a greedy strategy
usually progresses in a top-down fashion, making one greedy choice after another, reducing
each problem to a smaller one.
Greedy-Choice Property
The "greedy-choice property" and "optimal substructure" are two ingredients in the problem
that lend to a greedy strategy.
Greedy-Choice Property
It says that making a locally optimal choice can arrive at a globally optimal solution.
Like Kruskal's algorithm, Prim's algorithm is based on a generic MST algorithm. The main
idea of Prim's algorithm is similar to that of Dijkstra's algorithm for finding shortest path in a
given graph. Prim's algorithm has the property that the edges in the set A always form a
single tree. We begin with some vertex v in a given graph G =(V, E), defining the initial set
of vertices A. Then, in each iteration, we choose a minimum-weight edge (u, v), connecting a
vertex v in the set A to the vertex u outside of set A. Then vertex u is brought in to A. This
process is repeated until a spanning tree is formed. Like Kruskal's algorithm, here too, the
important fact about MSTs is we always choose the smallest-weight edge joining a vertex
inside set A to the one outside the set A. The implication of this fact is that it adds only edges
that are safe for A; therefore when the algorithm terminates, the edges in set A form a MST.
Choose a node and build a tree from there selecting at every stage the shortest available edge
that can extend the tree to an additional node.
Algorithm
MST_PRIM (G, w, v)
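The body of the MST_PRIM pseudocode is not reproduced in these notes. The following Python sketch is an added illustration of the idea described above, using a priority queue over an adjacency-list graph; the graph data are made up.

import heapq

def prim(graph, start):
    # graph: dict mapping vertex -> list of (weight, neighbour) pairs
    in_tree = {start}
    mst = []                               # (weight, new vertex) pairs added to the tree
    frontier = list(graph[start])          # candidate edges leaving the tree
    heapq.heapify(frontier)
    while frontier and len(in_tree) < len(graph):
        w, v = heapq.heappop(frontier)     # minimum-weight edge crossing the cut
        if v in in_tree:
            continue
        in_tree.add(v)
        mst.append((w, v))
        for edge in graph[v]:
            heapq.heappush(frontier, edge)
    return mst

g = {
    'a': [(1, 'b'), (4, 'c')],
    'b': [(1, 'a'), (2, 'c'), (6, 'd')],
    'c': [(4, 'a'), (2, 'b'), (3, 'd')],
    'd': [(6, 'b'), (3, 'c')],
}
print(prim(g, 'a'))   # [(1, 'b'), (2, 'c'), (3, 'd')]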
Like Prim's algorithm, Kruskal's algorithm also constructs the minimum spanning tree of a
graph by adding edges to the spanning tree one-by-one. At all points during its execution the
set of edges selected by Prim's algorithm forms exactly one tree. On the other hand, the set of
edges selected by Kruskal's algorithm forms a forest of trees.
Kruskal's algorithm is conceptually quite simple. The edges are selected and added to the
spanning tree in increasing order of their weights. An edge is added to the tree only if it does
not create a cycle.
The beauty of Kruskal's algorithm is the way that potential cycles are detected. Consider an
undirected graph G = (V, E). We can view the set of vertices, V, as a universal set and the
set of edges, E, as the definition of an equivalence relation over the universe V. In general,
an equivalence relation partitions a universal set into a set of
equivalence classes. If the graph is connected, there is only one equivalence class--all the
elements of the universal set are equivalent. Therefore, a spanning tree is a minimal set of
equivalences that result in a single equivalence class.
Kruskal's algorithm computes a sequence of partitions of the vertex set. Each subsequent
element of the sequence is obtained from its predecessor by joining two of
the elements of the partition.
To construct the sequence, the edges in E are considered one-by-one in increasing order of
their weights. Suppose we have computed the sequence up to some partition P and the next
edge to be considered is (v, w). If v and w are both members of the same element of partition P,
then the edge forms a cycle, and is not part of the minimum-cost spanning tree.
On the other hand, suppose v and w are members of two different elements of partition P, say
Vv and Vw (respectively). Then (v, w) must be an edge in the minimum-cost spanning tree. In
this case, we compute the next partition by joining Vv and Vw, i.e., we replace Vv and Vw in P
by the union Vv ∪ Vw.
[Figure: an example run of Kruskal's algorithm on a sample graph, showing the sequence of partitions it computes.]
Dijkstra's algorithm solves the single-source shortest-path problem when all edges have non-
negative weights. It is a greedy algorithm and similar to Prim's algorithm. The algorithm starts at
the source vertex, s, and grows a tree, T, that ultimately spans all vertices reachable from s.
Vertices are added to T in order of distance, i.e., first s, then the vertex closest to s, then the
next closest, and so on. The following implementation assumes that graph G is represented by
adjacency lists.
DIJKSTRA (G, w, s)
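The body of the DIJKSTRA pseudocode is likewise not reproduced here; below is an added Python sketch of the same idea using a priority queue over adjacency lists, with made-up example data.

import heapq

def dijkstra(graph, s):
    # graph: dict mapping vertex -> list of (weight, neighbour) pairs,
    # with all edge weights non-negative.
    dist = {v: float("inf") for v in graph}
    dist[s] = 0
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                       # stale queue entry; skip it
        for w, v in graph[u]:
            if d + w < dist[v]:            # relax edge (u, v)
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

g = {
    's': [(3, 'u'), (5, 'y')],
    'u': [(1, 'v')],
    'y': [(2, 'u'), (6, 'v')],
    'v': [],
}
print(dijkstra(g, 's'))   # {'s': 0, 'u': 3, 'y': 5, 'v': 4}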
Analysis
Step 1. Given the initial graph G = (V, E), all nodes have infinite cost except the source
node, s, which has 0 cost.
Step 2. First we choose the node, which is closest to the source node, s. We initialize d[s] to
0. Add it to S. Relax all nodes adjacent to source, s. Update predecessor (see red arrow in
diagram below) for all nodes updated.
Step 4. Now, node y is the closest node, so add it to S. Relax node v and adjust its
predecessor (red arrows remember!).
Step 5. Now we have node u that is closest. Choose this node and adjust its neighbor node v.
We discuss here an example in which the binary tree structure is of value. Consider the
problem of coding (in binary) a message consisting of a string of characters. This is routinely
done in a computer system; the code used almost universally at present is known as ASCII,
and allocates 8 bits to store each character. Thus A is represented using decimal 65, or
01000001 in binary etc. A more modern one, which allows a much wider range of languages
to be represented, is Unicode, which allocates 16 bits to each character. This is used for
example by the language Java, and is an extension of ASCII in that any ASCII character can
be converted to Unicode by prefixing it with the zero byte. Although these codes are simple,
there are obvious inefficiencies; clearly Unicode wastes at least half of the available space
when storing plain ASCII.
In Following Table we give an example of such a prefix code for a small alphabet, and
contrast it with a simple fixed length code. It is clear that there are savings in this case which
make it worth going further. We will see shortly why the example has the prefix property; in
the meantime check that the string ``0000100111'' in Code 2 decodes uniquely as ``acbd''.
Code 1 has fixed length code; Code 2 has the prefix property.
Symbol Code 1 Code 2
a 000 000
b 001 001
c 010 01
d 011 11
e 100 10
Consider now a binary tree, in which each leaf node is labeled with a symbol. We can assign
a binary code to each symbol as follows: associate ``0'' with the path from a node to its left
child, and ``1'' with the corresponding path to the right child. The code for a symbol is
obtained by following the path from the root to the leaf node containing that symbol. The
code necessarily has the prefix property; the tree property means that a leaf node cannot
appear on a path to another leaf. Conversely it is clear how to associate a binary tree with a
binary code having the prefix property; the code describes the shape of the tree down to the
leaf associated with each symbol.
Of course a fixed length code necessarily has the prefix property. We show in Fig. 3.6 the
binary trees corresponding to the two codes given in Table 3.1, thus incidentally
demonstrating that the variable length code in the example does have the prefix property.
We now describe how to build the binary Huffman code for a given message. This code has
the prefix property, and in a fairly useful sense turns out to be the best such code. We
describe the code by building the corresponding binary tree. We start by analyzing the
message to find the frequencies of each symbol that occurs in it. Our basic strategy will be to
assign short codes to symbols that occur frequently, while still insisting that the code has the
prefix property. Our example will be built around a message whose symbol frequencies are given
in the following table; note that in this case, we choose to include the space symbol, written in
the table as (space).
Symbol frequencies used to build a Huffman Code.
Symbol:    I A B D M E O C F G S T L R N P U (space)
Frequency: 6 3 3 2 4 5 3 1 1 2 4 3 2 2 5 1 2 11
Now begin with a collection (a forest) of very simple trees, one for each symbol to be coded,
with each consisting of a single node, labeled by that symbol, and the frequency with which
it occurs in the string. The construction is recursive: at each stage the two trees which
account for the least total frequency in their root nodes are selected, and used to produce a
new binary tree. This has, as its children the two trees just chosen: the root is then labeled
with the total frequency accounted for by both sub trees, and the original sub trees are
removed from the forest. The construction continues in this way until only one-tree remains;
that is then the Huffman encoding tree.
The resulting Huffman encoding tree for our example string is shown in the figure above. By
construction, the symbols only occur at leaf nodes, and so the corresponding code has the
prefix property. In the diagram, these leaf nodes still carry the frequencies used in their
construction; formally, once the tree has been built, the symbols, which are shown below the
leaves, should replace the frequencies at the nodes. The right-most node is the space symbol. As
already described, the character encoding is then read by traversing from the root to each leaf,
recording ``0'' if the left hand branch is traversed, and ``1'' if the right hand one is taken. Thus
``S'' is encoded as ``0100'', while the space is ``11'' and ``C'' is ``000110''.
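The construction can be sketched in Python using a priority queue as the forest of trees (an added illustration, not the notes' own code); the frequencies are the ones from the table above, and the exact codes produced depend on how ties are broken, so they may differ from the codes quoted in the text.

import heapq
from itertools import count

def huffman_codes(freqs):
    # Forest of single-node trees: (frequency, tie-breaker, tree).
    tie = count()
    forest = [(f, next(tie), (sym,)) for sym, f in freqs.items()]
    heapq.heapify(forest)
    while len(forest) > 1:
        # Take the two trees with the least total frequency and join them.
        f1, _, t1 = heapq.heappop(forest)
        f2, _, t2 = heapq.heappop(forest)
        heapq.heappush(forest, (f1 + f2, next(tie), (t1, t2)))
    codes = {}
    def walk(tree, code):
        if len(tree) == 1:               # a leaf: record its code
            codes[tree[0]] = code or "0"
        else:                            # internal node: 0 = left branch, 1 = right branch
            walk(tree[0], code + "0")
            walk(tree[1], code + "1")
    walk(forest[0][2], "")
    return codes

freqs = {'I': 6, 'A': 3, 'B': 3, 'D': 2, 'M': 4, 'E': 5, 'O': 3, 'C': 1,
         'F': 1, 'G': 2, 'S': 4, 'T': 3, 'L': 2, 'R': 2, 'N': 5, 'P': 1,
         'U': 2, ' ': 11}
codes = huffman_codes(freqs)
print(codes[' '], codes['I'])   # the most frequent symbols receive the shortest codes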
Definition: Let T be a tree with weights w1, ..., wn at its leaf nodes. The weighted leaf path
length L(T) of T is
L(T) = sum over i of li wi   (3.2)
where li is the path length: the length of the path from the root to node i.
We are interested in the case when the tree is an encoding tree and the weights are the
frequency of occurrence of the symbols associated with the leaf nodes. In that case L(T) is
the length of the message after encoding, since at node i, the character occurs a total of wi
times, and requires li bits to encode it. We now show that a Huffman encoding tree gives the
best encoding. Say that a binary tree T is optimal if L(T) has its minimum value over all
possible trees with the same set of leaf nodes.
Suppose that T is optimal but that for some pair of leaf nodes we have wi < wj while li < lj.
Consider now the effect on the weighted leaf path length L(T) of interchanging the weights
on nodes i and j. The new weighted leaf path length is
L(T) - (wjlj + wili) + (wjli + wilj) < L(T).
Thus T was not optimal, since the new tree has a smaller weighted leaf path length. Hence in
an optimal tree a heavier weight can never lie deeper than a lighter one (Lemma 3.8).
Lemma: Suppose that the leaf nodes in an optimal tree have weights wi labeled so that w1 ≤ w2 ≤ ...
≤ wn. Then by relabelling if necessary subject to this constraint, we can also have l1 ≥ l2 ≥ ...
≥ ln.
Proof. Suppose conversely that i < j but li < lj. Since i < j, we have wi ≤ wj. However if wi <
wj, then by Lemma 3.8 we have lj ≤ li, since we are assuming the tree is optimal. But lj > li,
showing that we must have wi = wj. There is thus no loss if we interchange the labels i and j.
We can continue to do this until we achieve the required consistent labeling of the
corresponding node lengths.
We can now show that a Huffman tree is optimal. This argument was adapted from Gersting
(1993, Page 402). We establish the result by induction on the number n of leaf nodes in the
tree. The result is clear for n = 1.
Next note that in any optimal binary tree, there are no nodes with single children -- replacing
the child by the parent produces a shorter weighted external path length.
Thus L(Tn+1) ≥ L(Hn+1). But Tn+1 was optimal, so this inequality is an equality and Hn+1 is
optimal, as required.
Remark 3.10: The above proof has to be done a little carefully. The (complete) binary tree
having nodes with weights 1, 3, 2 and 2 is not a Huffman tree, but is optimal; however
interchanging the second and third nodes does not affect the weighted leaf path length and
does give a Huffman tree. In the proof, this interchange is the step of creating T'n+1 from Tn+1.
A little more can be said, centered round the need to have the coding tree available when
decoding. Of course, for ``general purpose'' language, the letter frequencies are well known,
and could be assumed. In general, the need to transmit the coding tree as well as the message
reduces the effectiveness of the method a little. And it can be impractical to preprocess a
message to get the exact frequencies of the symbols before any of the message is transmitted.
There is a variant however, called adaptive Huffman coding, in which the frequencies are
assumed initially to be all the same, and then adjusted in the light of the message being coded
to reflect the actual frequencies. Since the decoder has access to the same information as the
encoder, it can be arranged that the decoder changes coding trees at the same point as the
encoder does; thus they keep in step. One way to do that is to introduce extra ``control''
symbols to the alphabet, and use these to pass such information.
Modern coding schemes such as zip (or gzip or pkzip) are based on these ideas.
Decision trees
Decision trees allow us to find lower bounds for comparison-based algorithms for sorting and
for searching in sorted arrays.
Is it possible to invent a sorting algorithm faster than merge sort?
Is binary search the fastest algorithm for searching in sorted arrays?
“Complexity theory”
Lower-Bound
The efficiency of an algorithm can be seen in two ways. One is to find its asymptotic efficiency
class and compare it with the efficiency classes of other algorithms (sequential search is more
efficient than selection sort, because it runs in O(n) time, while selection sort needs O(n^2)) –
but this is comparing apples and oranges.
The other is to ask how efficient a particular algorithm is compared with other algorithms that
solve the same problem – for example, sequential search is slow compared with binary search
on a sorted array, which runs in O(log n) time.
To compare an algorithm with other algorithms for the same problem, we wish to know the
best possible efficiency of any algorithm for that problem.
Information-Theoretic Arguments
This type of argument is based on the amount of information the algorithm has to produce.
Example: think of a number from 1 to n.
Find the answer by asking questions whose answer can only be yes or no.
We can encode the number with about log2 n bits.
Each answer gives us at most one bit of information.
So the algorithm will need at least log2 n steps in the worst case before finding the output.
Problem Reduction
To show that problem P is at least as hard as problem Q, we need to reduce Q to P:
show that an arbitrary instance of Q can be transformed into an instance of P,
so that any algorithm solving P would also solve Q.
A lower bound for Q will then be a lower bound for P too.
Decision Trees
Many algorithms (sorting, ...) work by comparing their input items.
Each internal node in a decision tree represents a key comparison k < k'.
The left sub tree covers the case k < k', the right sub tree the opposite case.
Each leaf represents an output of the algorithm.
The number of comparisons in the worst case is the same as the height h of the tree.
So for a tree with i leaves, h >= log2 i.
P and NP
Problems that can be solved in polynomial time make up a set called P (informally).
DEFINITION: Class P is a class of decision problems that can be solved in polynomial time
by (deterministic) algorithms. This class of problems is called polynomial.
Many problems can be reduced to a set of decision problems (example: what is the smallest
number of colors needed to color the vertices of a graph so that no two adjacent vertices have
the same color? – an optimization problem. We ask instead whether there exists such a coloring
with no more than m colors, m = 1, 2, 3, ...).
While creating solutions may be computationally difficult, verifying that a solution is correct
is often rather trivial (it can be done in polynomial time). For example, it is easy to check
whether a proposed list of vertices is a Hamiltonian circuit.
DEFINITION: A nondeterministic algorithm is a two-stage procedure that takes as input an
instance I of a decision problem and does the following:
Nondeterministic stage (guessing): an arbitrary string S is created that is a candidate solution.
Deterministic stage (verification): a deterministic algorithm takes S and I and answers yes if S
is a solution to I.
A nondeterministic algorithm solves a decision problem if for every yes instance of the problem
it returns yes on some execution (it is able to guess a correct answer at least once).
A nondeterministic algorithm is nondeterministic polynomial if its verification stage is
polynomial.
P is subset of NP
NP also contain Hamiltonian circuit problem, traveling salesman, graph coloring,...
The open question in computer science: is P == NP???
NP-complete problems are as difficult as any other in NP class, because any other problem in
NP can be reduced to NP-complete
NP-Complete problems
The Idea
• As we already know, there are undecidable problems. We also need to detect, and then cope
with, problems that are difficult
to solve - algorithms take too much time for arbitrary
instances of the problem
• Both strategies are improvements over exhaustive search - the brute-force method for
combinatorial problems
Branch-and-bound
Backtracking
• We look for an element with particular properties in a domain that grows exponentially
with the size of the input (traveling salesman)
• We generate one component of the solution at a time
• Evaluate the partial solution:
If the partial solution can be developed further without violating the problem's constraints,
the first remaining legitimate option for the next component is taken
If there is no legitimate option for the next component, no alternatives for any remaining
components need to be considered
The algorithm backtracks and replaces the last component of the partial solution with an alternative
• We construct a state-space tree
The root represents the initial state
Nodes at the first level represent the first components of a solution
A node in the tree is called promising if it can lead to a solution,
otherwise it is nonpromising
Leaves are either nonpromising dead ends or solutions
• If the current node is promising, its child is generated and processing moves to the child
• If the current node is nonpromising, the algorithm backtracks to the parent node and considers
the next possible option
n-Queens problem
In general
• Subset-Sum problem
Find a subset of a given set S={s1,s2,...,sn} of n positive integers whose sum
is equal to a given positive integer d.
Example: S={1,2,5,6,8}, d=9.
Solutions are: {1,2,6},{1,8}
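A Python sketch of backtracking for the subset-sum problem (an added illustration), run on the example above; a node is pruned as nonpromising once its running sum exceeds d, which is valid because all the numbers are positive:

def subset_sums(s, d):
    solutions = []

    def extend(i, chosen, total):
        if total == d:
            solutions.append(list(chosen))   # a leaf that is a solution
            return
        # Prune: the node is nonpromising if the sum already exceeds d
        # or if no elements are left to add.
        if total > d or i == len(s):
            return
        chosen.append(s[i])                  # try including s[i]
        extend(i + 1, chosen, total + s[i])
        chosen.pop()                         # backtrack: exclude s[i]
        extend(i + 1, chosen, total)

    extend(0, [], 0)
    return solutions

print(subset_sums([1, 2, 5, 6, 8], 9))   # [[1, 2, 6], [1, 8]]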
Branch-and-bound
• The idea of backtracking was to cut off a branch as soon as we know it does not lead to a
solution
• A feasible solution in optimization problems is a point in the search space that satisfies all
the constraints
• For each node of the state-space tree we compute a bound on the best value of the objective
function obtainable by extending the partial solution at that node
• With such information we can compare a node's bound value with the value of the best
solution seen so far
Assignment problem
• Assigning n people to n jobs so that the total cost of the assignment is as small as possible
- An instance of the problem is defined by an n-by-n cost matrix:
select one element in each row so that no two selected elements are in the same
column and their sum is the smallest possible
- The cost of any solution cannot be smaller than the sum of the smallest elements in each
row of the matrix – this is not the cost of a valid solution, just a lower bound; in the example, 2 + 3 + 1 + 4 = 10
- For any legitimate selection that selects 9 from the first row, the lower bound will
be 9 + 3 + 1 + 4 = 17
• We may apply branch-and-bound to the traveling salesman problem if we can come up with a
lower bound on tour lengths
- a trivial one may be to find the smallest distance between two cities and multiply it by the
number of cities n
- or, for each city, we take the sum of the distances to its two closest cities; we compute the
sum of those quantities over all n cities and divide the result by two – this bound is more informative
Comments
• Many of these approaches are greedy and based on some problem-specific heuristics
- A heuristic is a common-sense rule based on experience rather than a mathematical
assertion