Graphs and Algorithms
Scribes: Romil Verma, Juliana Cook (2015), Seth Hildick-Smith (2016), G. Valiant (2017), M. Wootters (2017)
Date: Oct 22, 2019
Adapted from Virginia Williams’ lecture notes
1 Graphs
A graph is a set of vertices and edges connecting those vertices. Formally, we define a graph G as
G = (V, E) where E ⊆ V × V . For ease of analysis, the variables n and m typically stand for the number
of vertices and edges, respectively. Graphs can come in two flavors, directed or undirected. If a graph
is undirected, it must satisfy the property that (i, j) ∈ E iff (j, i) ∈ E (i.e., all edges are bidirectional). In
undirected graphs, m ≤ n(n − 1)/2. In directed graphs, m ≤ n(n − 1). Thus, m = O(n²) and log m = O(log n).
A connected graph is a graph in which for any two nodes u and v there exists a path from u to v. For an
undirected connected graph, m ≥ n − 1. A sparse graph is a graph with few edges (for example, Θ(n) edges)
while a dense graph is a graph with many edges (for example, m = Θ(n²)).
1.1 Representation
A common issue is the topic of how to represent a graph’s edges in memory. There are two standard methods
for this task.
An adjacency matrix uses an arbitrary ordering of the vertices from 1 to |V|. It is an n × n binary
matrix in which the (i, j)th entry is 1 if (i, j) is an edge in the graph and 0 otherwise.
An adjacency list consists of an array A of |V | lists, such that A[u] contains a linked list of vertices v
such that (u, v) ∈ E (the neighbors of u). In the case of a directed graph, it’s also helpful to distinguish
between outgoing and ingoing edges by storing two different lists at A[u]: a list of v such that (u, v) ∈ E
(the out-neighbors of u) as well as a list of v such that (v, u) ∈ E (the in-neighbors of u).
What are the tradeoffs between these two methods? To help our analysis, let deg(v) denote the degree
of v, or the number of vertices connected to v. In a directed graph, we can distinguish between out-degree
and in-degree, which respectively count the number of outgoing and incoming edges.
• The adjacency matrix can check if (i, j) is an edge in G in constant time, whereas the adjacency list
representation must iterate through up to deg(i) list entries.
• The adjacency matrix takes Θ(n²) space, whereas the adjacency list takes Θ(m + n) space.
• The adjacency matrix takes Θ(n) operations to enumerate the neighbors of a vertex v, since it must
iterate across an entire row of the matrix. The adjacency list takes only Θ(deg(v)) time.
What’s a good rule of thumb for picking the implementation? One useful property is the sparsity of the
graph’s edges. If the graph is sparse, and the number of edges is considerably less than the maximum (m ≪ n²),
then the adjacency list is a good idea. If the graph is dense and the number of edges is nearly n², then
the matrix representation makes sense, because it speeds up lookups without too much space overhead. Of
course, some applications will have lots of space to spare, making the matrix feasible no matter the structure
of the graphs. Other applications may prefer adjacency lists even for dense graphs. Choosing the appropriate
structure is a balancing act of requirements and priorities.
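As a concrete illustration, here is a minimal Python sketch of both representations for a small, hypothetical undirected graph; the vertex count and edge list are arbitrary choices for the example:

```python
# Hypothetical example graph: 4 vertices, 4 undirected edges.
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: Theta(n^2) space, O(1) edge lookup.
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = 1
    matrix[v][u] = 1  # undirected: store both directions

# Adjacency list: Theta(n + m) space, Theta(deg(v)) neighbor enumeration.
adj = [[] for _ in range(n)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

print(matrix[1][2])  # edge lookup in O(1): prints 1
print(adj[2])        # neighbors of vertex 2: [0, 1, 3]
```

Note that for the directed case one would drop the symmetric inserts, and could keep two lists per vertex (out-neighbors and in-neighbors) as described above.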
2 Depth-First Search (DFS)
Depth-first search explores the graph by following paths as deeply as it can, backtracking only when it hits a dead end or an already-explored section of the graph. DFS by itself
is fairly simple, so we introduce some augmentations to the basic algorithm.
• To prevent loops, DFS keeps track of a “color” attribute for each vertex. Unvisited vertices are white
by default. Vertices that have been visited but still may be backtracked to are colored gray. Vertices
which are completely processed are colored black. The algorithm can then prevent loops by skipping
non-white vertices.
• Instead of just marking visited vertices, the algorithm also keeps track of the tree generated by the
depth-first traversal. It does so by recording the “parent” of each visited vertex, i.e., the vertex
from which DFS first discovered it.
• The augmented DFS also marks two auto-incrementing timestamps d and f to indicate when a node
was first discovered and finished.
The algorithm takes as input a start vertex s and a starting timestamp t, and returns the timestamp at
which the algorithm finishes. Let N(s) denote the neighbors of s; for a directed graph, let Nout(s) denote
the out-neighbors of s.
Algorithm 1: init(G)
  foreach v ∈ G do
    color(v) ← white
    d(v), f(v) ← ∞
    p(v) ← nil
Algorithm 4: DFS(G) \\ DFS on an entire graph G
  init(G);
  t ← 1;
  foreach v ∈ G do
    if color(v) = white then
      t ← DFS(v, t);
      t++
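The pseudocode above can be rendered in Python roughly as follows. This is a sketch, assuming the graph is given as a dict mapping each vertex to a list of its (out-)neighbors; it also fills in the per-vertex routine DFS(v, t) that Algorithm 4 invokes:

```python
WHITE, GRAY, BLACK = 0, 1, 2

def dfs_visit(G, s, t, color, d, f, p):
    """Visit s with discovery timestamp t; return s's finish timestamp."""
    color[s] = GRAY
    d[s] = t                      # discovery time
    for v in G[s]:
        if color[v] == WHITE:     # skip non-white vertices to avoid loops
            p[v] = s              # s is v's parent in the DFS tree
            t = dfs_visit(G, v, t + 1, color, d, f, p)
    color[s] = BLACK
    f[s] = t + 1                  # finish time
    return f[s]

def dfs(G):
    """Run DFS on the whole graph, restarting from each unvisited vertex."""
    color = {v: WHITE for v in G}
    d = {v: None for v in G}      # discovery timestamps
    f = {v: None for v in G}      # finish timestamps
    p = {v: None for v in G}      # DFS-tree parents
    t = 1
    for v in G:
        if color[v] == WHITE:
            t = dfs_visit(G, v, t, color, d, f, p) + 1
    return d, f, p

# Example on a small hypothetical graph with two components.
d, f, p = dfs({'a': ['b'], 'b': [], 'c': []})
```

Note the recursion depth equals the length of the longest DFS path, so for very deep graphs an explicit stack (or a raised recursion limit) would be needed in practice.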
We mark all of the nodes as unvisited and start at a white node, in our case node a.
From node a we will visit all of a’s children, namely node b.
Node c has two children that we must visit. When we try to visit node a we find that node a has already
been visited (and would be colored gray, as we are in the process of searching a’s children), so we do not
continue searching down that path. We will next search c’s second child, node d.
Since node d has no children, we return back to its parent node, c, and continue to go back up the path
we took, marking nodes with a finish time when we have searched all of their children.
Once we reach our first source node a we find that we have searched all of its children, so we look in the
graph to see if there are any unvisited nodes remaining. For our example, we start with a new source node
e and run DFS to completion.
3 Breadth-First Search (BFS)
BFS(s) computes sets Li, where Li is the set of nodes at distance i from s, as seen in the diagram below.
Algorithm 5: BFS(s)
  Set vis[v] ← false for all v;
  Set Li ← ∅ for i = 1 to n − 1;
  L0 ← {s};
  vis[s] ← true;
  for i = 0 to n − 1 do
    if Li = ∅ then
      exit
    while Li ≠ ∅ do
      u ← Li.pop();
      \\ In the loop below, replace N with Nout for a directed graph.
      foreach x ∈ N(u) do
        if vis[x] is false then
          vis[x] ← true;
          Li+1.insert(x);
          p(x) ← u
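A Python sketch of Algorithm 5, under the assumption that the graph is a dict of adjacency lists, might look like this:

```python
def bfs(G, s):
    """BFS from s. Returns levels, where levels[i] lists the vertices
    at distance i from s, and the BFS-tree parent p of each vertex."""
    n = len(G)
    vis = {v: False for v in G}
    p = {v: None for v in G}
    levels = [[] for _ in range(n)]        # L_0, ..., L_{n-1}
    levels[0] = [s]
    vis[s] = True
    for i in range(n - 1):
        if not levels[i]:                  # no vertices at distance i: done
            break
        for u in levels[i]:
            # For a directed graph, iterate over out-neighbors here.
            for x in G[u]:
                if not vis[x]:             # test the neighbor x, not u
                    vis[x] = True
                    levels[i + 1].append(x)
                    p[x] = u
    return levels, p

# Example on a hypothetical 4-cycle: distances from 0 are 0, 1, 1, 2.
levels, p = bfs({0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}, 0)
```

The sketch iterates over each level in place rather than popping from it, but visits vertices in the same level-by-level order as the pseudocode.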
3.2 Correctness
We will now show that BFS correctly computes the shortest path between the source node and all other
nodes in the graph. Recall that Li is the set of nodes that BFS calculates to be distance i from the source
node.
Claim 1. For all i, Li = {x|d(s, x) = i}.
Proof of Claim 1. We will prove this by (strong) induction on i. Base case (i = 0): L0 = {s} = {x | d(s, x) = 0}.
Suppose that Lj = {x|d(s, x) = j} for every j ≤ i (induction hypothesis for i).
We will show two things: (1) if y was added to Li+1 , then d(s, y) = i + 1, and (2) if d(s, y) = i + 1, then
y is added to Li+1 . After proving (1) and (2) we can conclude that Li+1 = {y|d(s, y) = i + 1} and complete
the induction.
Let’s prove (1). First, if y is added to Li+1 , it was added by traversing an edge (x, y) where x ∈ Li , so
that there is a path from s to y taking the shortest path from s to x followed by the edge (x, y), and so
d(s, y) ≤ d(s, x) + 1. Since x ∈ Li, by the induction hypothesis, d(s, x) = i, so that d(s, y) ≤ i + 1. However,
since y ∉ Lj for any j ≤ i, by the induction hypothesis, d(s, y) > i, and so d(s, y) = i + 1.
Let’s prove (2). If d(s, y) = i + 1, then by the inductive hypothesis y ∉ Lj for j ≤ i. Let x be the node
before y on a shortest s → y path P. Since d(s, y) = i + 1, the portion of P from s to x is a shortest
path of length exactly i. Thus, by the induction hypothesis, x ∈ Li. Thus, when x was scanned, edge
(x, y) was scanned as well. If y had not been visited when (x, y) was scanned, then y will be added to Li+1 .
Hence assume that y was visited before (x, y) was scanned. However, since y ∉ Lj for any j ≤ i, y must
have been visited by scanning another edge out of a node in Li, and hence again y is added to Li+1.
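Claim 1 can also be sanity-checked on a concrete (hypothetical) graph: compute the distances d(s, x) independently by Bellman-Ford-style edge relaxation, and verify that each BFS level Li equals {x | d(s, x) = i}:

```python
# Hypothetical test graph (undirected, given as adjacency lists).
G = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
s = 0

# Independent distance computation: relax every edge n - 1 times.
INF = float('inf')
dist = {v: INF for v in G}
dist[s] = 0
for _ in range(len(G) - 1):
    for u in G:
        for x in G[u]:
            dist[x] = min(dist[x], dist[u] + 1)

# BFS levels, as computed by Algorithm 5.
vis = {v: v == s for v in G}
levels = [{s}]
while levels[-1]:
    nxt = {x for u in levels[-1] for x in G[u] if not vis[x]}
    for x in nxt:
        vis[x] = True
    levels.append(nxt)

# Claim 1: level i is exactly the set of vertices at distance i.
for i, Li in enumerate(levels[:-1]):
    assert Li == {x for x in G if dist[x] == i}
```

On this graph the levels come out as {0}, {1, 2}, {3}, {4}, matching distances 0, 1, 2, 3.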