Topic Notes: Introduction and Overview: What Is An Algorithm?
Analysis of Algorithms
Siena College
Spring 2011
What is an Algorithm?
A possible definition: a step-by-step method for solving a problem.
An algorithm does not need to be something we run on a computer in the modern sense. The notion
of an algorithm is much older than that. But it does need to be a formal and unambiguous set of
instructions.
The good news: if we can express it as a computer program, it's going to be pretty formal and
unambiguous.
The algorithm implemented by this function or method has inputs (the three numbers) and one
output (the largest of those numbers).
The algorithm is defined precisely and is deterministic.
This notion of determinism is a key feature: if we present the algorithm multiple times with the
same inputs, it follows the same steps, and obtains the same outcome.
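As a minimal sketch of a function of this kind (written in Python rather than the course's pseudocode; the name largest_of_three is an assumption made for illustration), the three inputs and single output might look like this:

```python
def largest_of_three(a, b, c):
    """Return the largest of three numbers (hypothetical example function)."""
    largest = a
    if b > largest:
        largest = b
    if c > largest:
        largest = c
    return largest


# Determinism: the same inputs always follow the same steps
# and produce the same output.
print(largest_of_three(3, 9, 4))  # 9
print(largest_of_three(3, 9, 4))  # 9
```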
CS 385 Analysis of Algorithms Spring 2011
This gives the right answer when it gives any answer. But it does not compute any answer for many
perfectly valid inputs.
We will also be concerned with the efficiency in both time (number of instructions) and space
(amount of memory needed).
Searching
Primality testing
Knapsack problem
Chess
Towers of Hanoi
Sorting
Program termination
Brute force
Greedy approach
Dynamic programming
The study of algorithms often extends to the study of advanced data structures. Some should be
familiar; others likely will be new to you:
stacks/queues
priority queues
graph structures
tree structures
Finally, the course will often require you to write formal analyses and, often, proofs. You will
practice your technical writing. As part of this, you will gain experience with the mathematical
typesetting software LaTeX.
Pseudocode
We will spend a lot of time looking at algorithms expressed as pseudocode.
Unlike a real programming language, there is no formal definition of pseudocode. In fact, any
given textbook is likely to have its own style for pseudocode.
Our text has a specific pseudocode style. My own style looks more like Java or C++ code. I will
not be picky about the pseudocode style you use as long as it's clear what you mean.
A big advantage of using pseudocode is that we do not need to define the types of all variables or
complex structures.
until the second number becomes 0, which makes the problem trivial.
Example: gcd(60,24) = gcd(24,12) = gcd(12,0) = 12
More precisely, application of Euclid's Algorithm follows these steps:
Euclid(m, n) {
  while (n != 0) {
    r = m mod n
    m = n
    n = r
  }
  return m
}
the second number (n) gets smaller with each iteration and can never become negative
so the second number in the pair eventually becomes 0, at which point the algorithm stops.
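The loop above can be sketched as runnable Python. This is one possible rendering, assuming non-negative integer inputs; the name gcd_euclid is chosen here for illustration.

```python
def gcd_euclid(m, n):
    """Greatest common divisor by Euclid's algorithm.

    Assumes m and n are non-negative integers (an assumption of this sketch).
    """
    while n != 0:
        # Replace the pair (m, n) with (n, m mod n); n strictly decreases.
        m, n = n, m % n
    return m


print(gcd_euclid(60, 24))  # 12
```

The pairs visited for the inputs 60 and 24 are (60, 24), (24, 12), and (12, 0), matching the example gcd(60,24) = gcd(24,12) = gcd(12,0) = 12.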
Euclid's Algorithm is just one way to compute a GCD. Let's look at a few others:
Consecutive integer checking algorithm: check all of the integers, in decreasing order, starting
with the smaller of the two input numbers, for common divisibility.
This algorithm will work. It always stops because every time around, Step 4 is performed, which
decreases t. At worst, t reaches 1, which is always a common divisor.
Let's run through the computation of gcd(60,24):
Step 2 Divide m=60 by t=24 and check the remainder. It is not 0, so we proceed to Step 4
Step 2 Divide m=60 by t=23 and check the remainder. It is not 0, so we proceed to Step 4
Step 2 Divide m=60 by t=22 and check the remainder. It is not 0, so we proceed to Step 4
Step 2 Divide m=60 by t=21 and check the remainder. It is not 0, so we proceed to Step 4
Step 2 Divide m=60 by t=20 and check the remainder. It is 0, so we proceed to Step 3
Step 3 Divide n=24 by t=20 and check the remainder. It is not 0, so we proceed to Step 4
Step 2 Divide m=60 by t=19 and check the remainder. It is not 0, so we proceed to Step 4
Step 2 Divide m=60 by t=18 and check the remainder. It is not 0, so we proceed to Step 4
Step 2 Divide m=60 by t=17 and check the remainder. It is not 0, so we proceed to Step 4
Step 2 Divide m=60 by t=16 and check the remainder. It is not 0, so we proceed to Step 4
Step 2 Divide m=60 by t=15 and check the remainder. It is 0, so we proceed to Step 3
Step 3 Divide n=24 by t=15 and check the remainder. It is not 0, so we proceed to Step 4
Step 2 Divide m=60 by t=14 and check the remainder. It is not 0, so we proceed to Step 4
Step 2 Divide m=60 by t=13 and check the remainder. It is not 0, so we proceed to Step 4
Step 2 Divide m=60 by t=12 and check the remainder. It is 0, so we proceed to Step 3
Step 3 Divide n=24 by t=12 and check the remainder. It is 0, so we return t=12 as our gcd
However, it does not work if one of our input numbers is 0 (unlike Euclid's Algorithm), since t
would start at 0 and we cannot divide by 0. This is a good example of why we need to be careful
to specify valid inputs to our algorithms.
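A sketch of consecutive integer checking in Python follows. The step numbers in the comments follow the walkthrough above (Step 2 tests m, Step 3 tests n, Step 4 decreases t); the name gcd_consecutive and the choice to raise ValueError on non-positive input are assumptions of this sketch.

```python
def gcd_consecutive(m, n):
    """GCD by consecutive integer checking; requires m > 0 and n > 0."""
    if m <= 0 or n <= 0:
        # With an input of 0, t would start at 0 and m % t would fail.
        raise ValueError("both inputs must be positive integers")
    t = min(m, n)                  # Step 1: start at the smaller input
    while True:
        if m % t == 0:             # Step 2: does t divide m?
            if n % t == 0:         # Step 3: does t also divide n?
                return t
        t -= 1                     # Step 4: try the next smaller candidate


print(gcd_consecutive(60, 24))  # 12
```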
Another method is one you probably learned around 7th grade.
While this took only a total of 4 steps, the first two steps are quite complex. Even the third is
not completely obvious. The description lacks an important characteristic of a good algorithm:
precision.
We could not easily write a program for this without doing more work. Once we work through
these, it seems that this is going to be a more complicated method.
We can accomplish the prime factorization in a number of ways. We will consider one known as
the sieve of Eratosthenes:
Sieve(n) {
  for p = 2 to n { // set array values to their index
    A[p] = p
  }
  for p = 2 to floor(sqrt(n)) {
    if A[p] != 0 { // p hasn't been previously eliminated from the list
      j = p * p
      while j <= n {
        A[j] = 0 // mark element as eliminated
        j = j + p
      }
    }
  }
  // nonzero entries of A are the primes
}
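The sieve can be sketched in Python as follows. This is a direct translation of the pseudocode under the assumption that n is a non-negative integer; returning the primes as a list is a convenience added here.

```python
import math


def sieve(n):
    """Return a list of all primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    a = list(range(n + 1))      # a[p] == p means "p not yet eliminated"
    a[0] = a[1] = 0             # 0 and 1 are not prime
    for p in range(2, math.isqrt(n) + 1):
        if a[p] != 0:           # p has not been eliminated, so it is prime
            for j in range(p * p, n + 1, p):
                a[j] = 0        # eliminate each multiple of p
    return [x for x in a if x != 0]


print(sieve(25))  # [2, 3, 5, 7, 11, 13, 17, 19, 23]
```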
Given this procedure to determine the primes up to a given value, we can use those as our candidate
prime factors in steps 1 and 2 of the middle school gcd algorithm. Note that each prime may be
used multiple times.
So in this case, the seemingly simple middle school procedure ends up being quite complex, since
we need to fill in the vague portions.
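To make the vague portions concrete, here is one possible sketch of the prime-factorization approach to the GCD. It assumes the procedure is: factor both numbers, keep the prime factors they share (counting repeats), and multiply those together. For brevity it finds factors by simple trial division rather than by the sieve, and the function names are hypothetical.

```python
from collections import Counter


def prime_factors(x):
    """Return a Counter mapping each prime factor of x to its multiplicity.

    Uses trial division; assumes x is a positive integer.
    """
    factors = Counter()
    p = 2
    while p * p <= x:
        while x % p == 0:       # a prime may be used multiple times
            factors[p] += 1
            x //= p
        p += 1
    if x > 1:                   # whatever remains is itself prime
        factors[x] += 1
    return factors


def gcd_by_factoring(m, n):
    """GCD as the product of the common prime factors of m and n."""
    common = prime_factors(m) & prime_factors(n)   # minimum multiplicities
    result = 1
    for prime, count in common.items():
        result *= prime ** count
    return result


print(gcd_by_factoring(60, 24))  # 12
```

Here 60 = 2·2·3·5 and 24 = 2·2·2·3, so the common factors are 2, 2, and 3, whose product is 12.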
Linear Structures
The basic linear structures are your standard one-dimensional list structures: arrays, linked lists,
and strings.
Some characteristics of arrays:
Strings are usually built using arrays, and normally consist of bits or characters.
Important operations on strings include finding the length (whose efficiency depends on whether
the string is counted or null-terminated), comparing, and concatenating.
Some characteristics of linked lists:
data stored in a list node along with a reference to the next list node (and to the previous one
for a doubly linked list)
cost of access/add/remove depends on position within the list
lends itself to an efficient traversal
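A minimal singly linked list sketch in Python illustrates these characteristics. The class and method names are assumptions for illustration: adding at the front is a constant-time operation, access by position must walk the links, and traversal simply follows the next references.

```python
class Node:
    """A list node: a value plus a reference to the next node."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class SinglyLinkedList:
    def __init__(self):
        self.head = None

    def add_first(self, value):
        # Constant time: only the head reference changes.
        self.head = Node(value, self.head)

    def get(self, index):
        # Cost grows with index: we must follow `index` links.
        node = self.head
        for _ in range(index):
            if node is None:
                raise IndexError("index out of range")
            node = node.next
        if node is None:
            raise IndexError("index out of range")
        return node.value

    def __iter__(self):
        # Efficient traversal from the start to the end.
        node = self.head
        while node is not None:
            yield node.value
            node = node.next


items = SinglyLinkedList()
for v in (3, 2, 1):
    items.add_first(v)
print(list(items))  # [1, 2, 3]
```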
These basic structures are used for many purposes, including as building blocks for more restrictive
linear structures: stacks and queues.
For a stack, additions (pushes) and removals (pops) are allowed only at one end (the top), meaning
those operations can be made to be very efficient. A stack is a last-in first-out (LIFO) structure.
For a queue, additions (enqueues) are made to one end (the rear of the queue) and removals
(dequeues) are made to the other end (the front of the queue). Again, this allows those operations to
be made efficient. A queue is a first-in first-out (FIFO) structure.
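The LIFO and FIFO behaviors can be seen in a short Python sketch. It uses a built-in list as the stack and collections.deque as the queue; these are just one convenient choice of underlying structure.

```python
from collections import deque

# Stack: push and pop both happen at the same end (the top).
stack = []
for item in ("a", "b", "c"):
    stack.append(item)                    # push
popped = [stack.pop() for _ in range(3)]  # pop
print(popped)  # ['c', 'b', 'a']  (last in, first out)

# Queue: enqueue at the rear, dequeue from the front.
queue = deque()
for item in ("a", "b", "c"):
    queue.append(item)                          # enqueue
dequeued = [queue.popleft() for _ in range(3)]  # dequeue
print(dequeued)  # ['a', 'b', 'c']  (first in, first out)
```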
A variation on a queue is that of a priority queue, where each element is given a ranking and the
highest-ranked item is the only one allowed to be removed, regardless of the order of insertion. A
clever implementation using another structure called a heap can make both the insert and remove
operations on a priority queue efficient.
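As an illustration, Python's heapq module provides a heap-based priority queue. In this sketch a smaller number is treated as a higher rank (heapq is a min-heap), and the task names are made up for the example.

```python
import heapq

pq = []
# Insert (rank, item) pairs in arbitrary order.
heapq.heappush(pq, (3, "low priority task"))
heapq.heappush(pq, (1, "urgent task"))
heapq.heappush(pq, (2, "normal task"))

# Removal always yields the highest-ranked (smallest rank number) item,
# regardless of the order of insertion.
removed = [heapq.heappop(pq)[1] for _ in range(3)]
print(removed)  # ['urgent task', 'normal task', 'low priority task']
```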
Graphs
A graph G is a collection of nodes or vertices, in a set V, joined by edges in a set E. Vertices have
labels. Edges can also have labels (which often represent weights). Such a graph would be called
a weighted graph.
The graph structure represents relationships (the edges) among the objects stored (the vertices).
[Figure: an example weighted graph on vertices A through H, with edge weights 1, 2, 3, 4, 5, 7, 8, and 11.]
Two vertices are adjacent if there exists an edge between them.
e.g., A is adjacent to B, G is adjacent to E, but A is not adjacent to C.
A simple path has no vertices repeated (except that the first and last may be the same).
e.g., A-B-C-E is a simple path.
A simple path is a cycle if the first and last vertex in the path are the same.
e.g., B-C-F-B is a cycle.
Directed graphs (or digraphs) differ from undirected graphs in that each edge is given a
direction.
Two vertices u and v are connected if a simple path exists between them.
A subgraph S is a connected component iff there exists a path between every pair of vertices
in S and no vertex outside S is connected to a vertex in S.
e.g., {A,B,C,D,E,F,G} and {H} are the connected components of our example.
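One way to find connected components is a breadth-first search from each unvisited vertex. The sketch below uses a small hypothetical graph (not the graph from the figure) stored as a dictionary from each vertex to its list of neighbors.

```python
from collections import deque


def connected_components(adj):
    """Return a list of vertex sets, one per connected component.

    `adj` maps each vertex to a list of its neighbors (undirected graph).
    """
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        frontier = deque([start])
        while frontier:
            u = frontier.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    frontier.append(v)
        components.append(component)
    return components


# Hypothetical example: A-B-C form one component, H is isolated.
example = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "H": []}
print(connected_components(example))
```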
1. an adjacency matrix, or
2. adjacency lists.
As a running example, we will consider an undirected graph where the vertices represent the states
in the northeastern U.S.: NY, VT, NH, ME, MA, CT, and RI. An edge exists between two states if
they share a common border, and we assign edge weights to represent the length of their border.
We will represent this graph as both an adjacency matrix and an adjacency list.
In an adjacency matrix, we have a two-dimensional array, indexed by the graph vertices. Entries
in this array give information about the existence or non-existence of edges.
We represent a missing edge with null and an existing edge with its label (often a positive
number representing a weight).
If the graph is undirected, then we could store only the lower (or upper) triangular part, since the
matrix is symmetric.
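A sketch of the adjacency matrix for the running example, in Python, might look like the following. None plays the role of null. The weights are the ones that appear in the adjacency list example in these notes; any border not listed there is left out of this sketch rather than guessed. The names STATES, INDEX, EDGES, and edge_weight are chosen for illustration.

```python
STATES = ["NY", "VT", "NH", "ME", "MA", "CT", "RI"]
INDEX = {name: i for i, name in enumerate(STATES)}

# (state, state, border length) triples; only edges given in these notes.
EDGES = [
    ("NY", "VT", 150), ("NY", "MA", 54), ("NY", "CT", 70),
    ("VT", "NH", 172), ("VT", "MA", 36), ("ME", "NH", 160),
    ("RI", "MA", 58), ("RI", "CT", 42),
]

# Start with every entry None (no edge), then fill in both directions,
# since the graph is undirected and the matrix is symmetric.
matrix = [[None] * len(STATES) for _ in STATES]
for u, v, w in EDGES:
    matrix[INDEX[u]][INDEX[v]] = w
    matrix[INDEX[v]][INDEX[u]] = w


def edge_weight(u, v):
    """Look up an edge with a single array access; None means no edge."""
    return matrix[INDEX[u]][INDEX[v]]


print(edge_weight("NY", "VT"))  # 150
print(edge_weight("ME", "RI"))  # None
```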
An adjacency list is composed of a list of vertices. Associated with each vertex is a linked list
of the edges incident to that vertex.
Vertices  Edges
NY        VT/150  MA/54   CT/70
VT        NY/150  NH/172  MA/36
ME        NH/160
RI        MA/58   CT/42
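The rows shown above can be sketched in Python as a dictionary from each vertex to a list of (neighbor, weight) pairs. Python lists stand in for the linked lists here, and only the rows given above are included; the helper names are hypothetical.

```python
adjacency = {
    "NY": [("VT", 150), ("MA", 54), ("CT", 70)],
    "VT": [("NY", 150), ("NH", 172), ("MA", 36)],
    "ME": [("NH", 160)],
    "RI": [("MA", 58), ("CT", 42)],
}


def neighbors(vertex):
    """Vertices adjacent to `vertex`, in stored order."""
    return [name for name, _ in adjacency.get(vertex, [])]


def weight(u, v):
    """Weight of edge u-v, found by scanning u's list; None if absent."""
    for name, w in adjacency.get(u, []):
        if name == v:
            return w
    return None


print(neighbors("NY"))     # ['VT', 'MA', 'CT']
print(weight("VT", "NH"))  # 172
```

Unlike the matrix, finding a particular edge requires scanning a vertex's list, but the structure stores only the edges that exist.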
Trees
In a linear structure, every element has a unique successor.
In a tree, an element may have many successors.
We usually draw trees upside-down in computer science.
You won't see trees in nature that grow with their roots at the top (but you can see some at Mass
MoCA over in North Adams).
One example of a tree is an expression tree:
The expression
(2*(4-1))+((2+7)/3)
can be represented as
          +
         / \
      ---   ---
     /         \
    *           /
   / \         / \
  2   -       +   3
     / \     / \
    4   1   2   7
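An expression tree like this one can be evaluated recursively: a leaf evaluates to its number, and an interior node applies its operator to the values of its two subtrees. The following Python sketch builds the tree for (2*(4-1))+((2+7)/3); the class name ExprNode and the function evaluate are assumptions for illustration.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


class ExprNode:
    """A leaf holds a number; an interior node holds an operator symbol."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def evaluate(node):
    if node.left is None and node.right is None:
        return node.value                       # leaf: a number
    return OPS[node.value](evaluate(node.left), evaluate(node.right))


tree = ExprNode(
    "+",
    ExprNode("*", ExprNode(2), ExprNode("-", ExprNode(4), ExprNode(1))),
    ExprNode("/", ExprNode("+", ExprNode(2), ExprNode(7)), ExprNode(3)),
)
print(evaluate(tree))  # 9.0
```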
[Figure: two example trees with numbered nodes. The first is a complete binary tree with levels of 1, 2, 4, 8, and 16 nodes; in the second, the bottom level holds only 8 nodes.]
The roots of the subtrees of a node are said to be the children of the node.
There may be many nodes without any successors: these are called leaves or leaf nodes.
The others are called interior nodes.
A simple path is a series of distinct nodes such that there is an edge between each pair of
successive nodes.
The path length is the number of edges traversed in a path (equal to the number of nodes on
the path minus 1).
The height of a node is the length of the longest path from that node to a leaf.
The depth of a node is the length of the path from the root to that node.
Equivalently, the level of a node is the length of the path from the root to that node.
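These definitions translate directly into short recursive functions. The Python sketch below uses a small hypothetical tree (a root r with children a and b, where a has one child c); the names TreeNode, height, and depth are chosen for illustration.

```python
class TreeNode:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []


def height(node):
    """Length (in edges) of the longest path from node down to a leaf."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)


def depth(root, target, current=0):
    """Length (in edges) of the path from root to target; None if absent."""
    if root is target:
        return current
    for child in root.children:
        found = depth(child, target, current + 1)
        if found is not None:
            return found
    return None


c = TreeNode("c")
a = TreeNode("a", [c])
b = TreeNode("b")
r = TreeNode("r", [a, b])

print(height(r))    # 2
print(depth(r, c))  # 2
print(depth(r, b))  # 1
```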
We often encounter binary trees: trees whose nodes all have degree at most 2.
We will also orient the trees: each subtree of a node is defined as being either the left or right.
Iterating over all values in linear structures is usually fairly easy. Moreover, one or two orderings
of the elements are the obvious choices for our iterations. Some structures, like an array, allow us
to traverse from the start to the end or from the end back to the start very easily. A singly linked
list, however, is most efficiently traversed only from the start to the end.
For trees, there is no single obvious ordering. Do we visit the root first, then go down through the
subtrees to the leaves? Do we visit one or both subtrees before visiting the root?
There are four standard tree traversals, considered here in terms of binary trees (though most can
be generalized):
1. preorder: visit the root, then visit the left subtree, then visit the right subtree.
2. in-order: visit the left subtree, then visit the root, then visit the right subtree.
3. postorder: visit the left subtree, then visit the right subtree, then visit the root.
4. level-order: visit the node at level 0 (the root), then visit all nodes at level 1, then all nodes
at level 2, etc.
For example, consider the preorder, in-order, and postorder traversals of the expression tree:
        /
       / \
      *   2
     / \
    /   \
   +     -
  / \   / \
 4   3 10  5
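The four traversals can be sketched in Python and applied to this tree, reading it as the expression ((4+3)*(10-5))/2. The class name BinaryNode and the function names are assumptions for illustration; level-order uses a queue, while the other three are naturally recursive.

```python
from collections import deque


class BinaryNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def preorder(node):
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)


def postorder(node):
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.value]


def level_order(root):
    order = []
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        order.append(node.value)
        for child in (node.left, node.right):
            if child is not None:
                queue.append(child)
    return order


tree = BinaryNode(
    "/",
    BinaryNode(
        "*",
        BinaryNode("+", BinaryNode(4), BinaryNode(3)),
        BinaryNode("-", BinaryNode(10), BinaryNode(5)),
    ),
    BinaryNode(2),
)

print(preorder(tree))     # ['/', '*', '+', 4, 3, '-', 10, 5, 2]
print(inorder(tree))      # [4, '+', 3, '*', 10, '-', 5, '/', 2]
print(postorder(tree))    # [4, 3, '+', 10, 5, '-', '*', 2, '/']
print(level_order(tree))  # ['/', '*', 2, '+', '-', 4, 3, 10, 5]
```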