Graph Course Notes yx
Plaintext
DFS(G):
  input: G=(V,E) in adjacency list representation
  output: vertices labeled by connected components

  cc = 0
  for all v in V: visited(v) = False, prev(v) = NULL
  for all v in V:
    if not visited(v):
      cc++
      Explore(v)
Plaintext
Explore(z):
  ccnum(z) = cc
  visited(z) = True
  for all (z,w) in E:
    if not visited(w):
      Explore(w)
      prev(w) = z
The overall running time is O(n + m), where n = |V| and m = |E|. This is because each vertex is visited exactly once, and at each vertex we try its incident edges, so the total work is proportional to the total number of vertices plus the total number of edges.
What if our graph is now directed? We can still use DFS, but we add pre- and postorder numbers and drop the component counter.
Plaintext
DFS(G):
  clock = 1
  for all v in V: visited(v) = False, prev(v) = NULL
  for all v in V:
    if not visited(v):
      Explore(v)
Plaintext
Explore(z):
  pre(z) = clock; clock++
  visited(z) = True
  for all (z,w) in E:
    if not visited(w):
      Explore(w)
      prev(w) = z
  post(z) = clock; clock++
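The directed version with pre/post numbers, sketched in Python (same dict-of-lists representation; names are mine):

```python
def dfs_pre_post(adj):
    """adj: dict vertex -> list of out-neighbors (directed graph).
    Returns (pre, post) clock dictionaries."""
    visited = {v: False for v in adj}
    pre, post = {}, {}
    clock = 1

    def explore(z):
        nonlocal clock
        pre[z] = clock; clock += 1   # clock ticks on entry...
        visited[z] = True
        for w in adj[z]:
            if not visited[w]:
                explore(w)
        post[z] = clock; clock += 1  # ...and again on exit

    for v in adj:
        if not visited[v]:
            explore(v)
    return pre, post
```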
Here is an example (figure: a directed graph on vertices A through H). Assuming we start at B, DFS assigns the following pre/post numbers, written as Node(pre, post):

B(1,16), A(2,11), D(3,10), E(4,7), G(5,6), H(8,9), C(12,15), F(13,14)

A blue edge marks an edge that is examined during DFS but not traversed, because its endpoint has already been visited.
• Tree edges such as A → B, B → D
  ◦ for a tree edge z → w, post(z) > post(w), because z only finishes after recursing back up from its descendants
• Back edges: B → A, F → B
• Forward edges: D → G, B → E
• Cross edges: F → H, H → G
Notice that only for back edges z → w is the post order different: post(z) < post(w), while every other edge type satisfies post(z) > post(w).
Cycles
A directed graph G has a cycle if and only if its DFS tree has a back edge.
Proof:
(⇒) Given a cycle a → b → c → … → j → a, let i be the first vertex of the cycle that DFS visits. Then every other vertex of the cycle is a descendant of i in the DFS tree; in particular, the vertex just before i on the cycle is a descendant, and its edge back to i is a back edge.
(⇐) The other direction is immediate: a back edge such as B → A, together with the tree path from A down to B, forms a cycle.
Topological sorting
Topologically sorting a DAG (directed acyclic graph): order the vertices so that all edges go from lower → higher. Recall that since a DAG has no cycles, it has no back edges, so post(z) > post(w) for every edge z → w.
So, to topologically sort, we can order vertices by decreasing post order number. Note that no comparison sort is needed here: since there are n vertices, the post order numbers lie in {1, …, 2n}, so we can create an array of size 2n and insert each vertex at the slot given by its post order number. Hence sorting by post order numbers takes O(n) time.
(figure: a small DAG whose vertices include X, Y, W, Z)
Note, for instance, that XYUWZ is not a valid topological order: because there is a directed edge from Z to W, Z must come before W in any valid topological ordering of this graph.
Analogy: Imagine youʼre assembling a toy. Piece Y is needed for both piece Z and piece W. But, piece Z
is also needed for piece W. You must attach Y to Z first, then Z to W. Even though Y is needed for W, you
donʼt attach it directly to W.
DAG Structure
By now, you are probably thinking, what does all this has to do with strongly connected component
(SCC)? We will show that it is possible to do so with two DFS search.
Connectivity in DAG
An SCC (strongly connected component) is a maximal set of strongly connected vertices, i.e., vertices that can all reach each other.
Example:
(figure: a directed graph on vertices A through L)
How many strongly connected components (SCCs) does the graph have? 5
• A is an SCC by itself, since it can reach many other nodes but no other node can reach A
• {H,I,J,K,L} Can reach each other
• {C,F,G}
• {B,E}
• {D}
(figure: the metagraph, with one node for each SCC: {A}, {B,E}, {C,F,G}, {D}, {H,I,J,K,L})
Notice that this metagraph is a DAG, and that is always the case. This should be obvious: if two strongly connected components were involved in a cycle, they would be combined into one bigger SCC.
So, every directed graph is a DAG of its strongly connected components. You can take any graph, break it up into SCCs, and then topologically sort this DAG of SCCs so that all edges go left to right.
Motivation
There are several ways we could do this, such as starting with sink vertices or source vertices. Instead, we work with whole components: find a sink SCC S, output it, remove it, and repeat. It turns out sink SCCs are easier to work with!
Recall we take any v ∈ S, where S is a sink SCC. For example, in the earlier graph, if we run Explore(v) from any one of the vertices in {H,I,J,K,L}, we will explore exactly these vertices and no others, because it is a sink SCC.
What if we instead find a vertex in the source component? We end up exploring the whole graph; too bad! So, how can we be smart about this and find a vertex that lies in a sink component? Because if we can do so (somehow magically select a vertex in a sink SCC), then we are guaranteed to find exactly the nodes of that sink SCC.
In a general directed graph G, can we use an analogous property? Is it true that the vertex v with the lowest post order number always lies in a sink SCC? Of course not; did you think it would be that easy? Here is a counterexample:
(figure: a three-vertex graph with pre/post numbers A(1,6), B(2,3), C(4,5))
Notice that B has the lowest post order number (3), but it belongs to the SCC {A,B}, which is not a sink SCC.
What about the other way around? Does the vertex v with the highest post order number always lie in a source SCC? Turns out, this is true! How can we make use of this? Simple: just reverse the graph! The source SCC of the reverse graph is a sink SCC of the original.
So, for a directed graph G = (V, E), look at the reverse graph G^R = (V, E^R), in which every edge of E is flipped. A source SCC in G^R is a sink SCC in G. So we flip the graph, run DFS on G^R, and take the vertex with the highest post order number: it lies in a source SCC of G^R, which is a sink SCC of G.
Example of SCC
Let's consider the same graph, but now we reverse it, and we start the DFS at node C.
(figure: the reverse graph G^R with pre/post numbers C(1,12), G(2,5), F(3,4), B(6,11), A(7,8), E(9,10), D(13,14), L(15,24), K(16,23), J(17,22), H(18,19), I(20,21))
We start from C in the reverse graph G^R and find the SCC {C, G, F, B, A, E}, then we proceed to {D} before going to {L, K, J, I, H}. You may notice that the choice of starting vertex can matter: had we instead first picked any vertex in {H,I,J,K,L}, it would be able to reach all vertices except {D}, and that starting vertex would still end up with the highest post number.
• In the first step of the second DFS (on G, by decreasing post order), we start at L, reach {L,K,J,I,H}, label them 1, and strike them out.
• We then visit D, reach {D}, label it 2, and strike it out.
• We do the same for C: reach {C,F,G}, label them 3, and strike them out.
(figure: the metagraph with its components numbered 1 through 5)
Notice that in this metagraph the edges go from 5 → … → 1. So the vertex labels {1,1,1,1,1,2,3,3,3,4,4,5} also output the topological order of the components in reverse! So we can take any graph, run two iterations of DFS, find its SCCs, and structure those SCCs in topological order.
SCC algorithm
Plaintext
SCC(G):
  input: directed G=(V,E) in adjacency list representation
  1. Construct G^R
  2. Run DFS on G^R
  3. Order V by decreasing post order number
  4. Run the undirected connected components algorithm on G,
     processing vertices in the order from step 3
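The four steps above can be sketched in Python (dict-of-lists adjacency; this is a Kosaraju-style two-pass DFS, and all names are mine):

```python
def scc(adj):
    """adj: dict vertex -> list of out-neighbors.
    Returns comp: vertex -> SCC label, labels assigned in topological order."""
    # 1. Construct the reverse graph G^R
    radj = {v: [] for v in adj}
    for v in adj:
        for w in adj[v]:
            radj[w].append(v)

    # 2. Run DFS on G^R, recording vertices in finishing (post) order
    visited, finish = set(), []
    def explore_r(z):
        visited.add(z)
        for w in radj[z]:
            if w not in visited:
                explore_r(w)
        finish.append(z)
    for v in radj:
        if v not in visited:
            explore_r(v)

    # 3-4. Run DFS on G, processing vertices by decreasing post order in G^R
    visited, comp, c = set(), {}, 0
    def explore(z):
        visited.add(z)
        comp[z] = c
        for w in adj[z]:
            if w not in visited:
                explore(w)
    for v in reversed(finish):
        if v not in visited:
            c += 1
            explore(v)
    return comp
```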
Proof of claim:
Given two SCCs S and S′ with an edge from some v ∈ S to some w ∈ S′, the claim is that the maximum post number in S is always greater than the maximum post number in S′.
The first case is if DFS starts from some z ∈ S′: then DFS finishes exploring all of S′ before it can reach S (there is no path from S′ back to S, or the two would form one SCC), so the post numbers in S will be bigger.
The second case is if DFS starts from some z ∈ S: then z will be the root of the DFS subtree containing both components, since we can travel from z to S′ via the edge v → w. Since z is the root, it has the highest post order number.
DFS: connectivity
BFS:
input: G = (V, E) and s ∈ V
output: for all v ∈ V, dist(v) = minimum number of edges from s to v, and prev(v)
Dijkstra's:
Note, Dijkstra's uses a min-heap (also known as a priority queue), which takes O(log n) time per operation. So the overall runtime for Dijkstra's is O((n + m) log n).
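A sketch of Dijkstra's with Python's heapq as the min-heap (stale heap entries are skipped rather than decreased in place, a common variant; names are mine):

```python
import heapq

def dijkstra(adj, s):
    """adj: dict v -> list of (neighbor, weight) pairs; s: source vertex.
    Returns dist: shortest-path distances from s to every reachable vertex."""
    dist = {s: 0}
    pq = [(0, s)]  # min-heap of (tentative distance, vertex)
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist.get(v, float('inf')):
            continue  # stale entry: v was already settled with a smaller distance
        for w, wt in adj[v]:
            nd = d + wt
            if nd < dist.get(w, float('inf')):
                dist[w] = nd
                heapq.heappush(pq, (nd, w))
    return dist
```

Each push/pop costs O(log n), and there are O(n + m) of them, giving the O((n + m) log n) bound above.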
2-Satisfiability (GR2)
Boolean formula:
CNF
Now, we define CNF (conjunctive normal form):
Clause: an OR of several literals, e.g. (x3 ∨ x̄5 ∨ x̄1 ∨ x2)
A formula f in CNF is an AND of m clauses:
(x2) ∧ (x̄3 ∨ x4) ∧ (x3 ∨ x̄5 ∨ x̄1 ∨ x2) ∧ (x̄2 ∨ x̄1)
Notice that for f to be true, we need at least one literal in each clause to be true.
SAT
input: formula f in CNF
output: an assignment (assign T or F to each variable) satisfying f if one exists, NO if none exists.
K-SAT
For k-SAT, the input is a formula f in CNF with n variables and m clauses, each of size ≤ k. The formula f above is an example. In general:
• SAT is NP-complete
• k-SAT is NP-complete ∀k ≥ 3
• Poly-time algorithm using SCC for 2-SAT
We first simplify away unit clauses, i.e., clauses with a single literal such as (x̄1). This is because to satisfy (x̄1) there is only one way: set x1 = F. After fixing that variable, we drop every clause the assignment satisfies and remove the falsified literal from the remaining clauses, obtaining a smaller formula f′.
The original f is satisfiable if and only if f′ is. Notice that in the example there is a unit clause (x4), and we can remove it in this way. Eventually we are left with either an empty formula (trivially satisfiable) or a formula in which all clauses have size exactly 2.
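A sketch of this unit-clause simplification in Python. Literals are encoded as signed integers, i for x_i and -i for x̄_i; this encoding is my choice, not the lecture's (note that producing an empty clause along the way would mean the formula is unsatisfiable):

```python
def propagate_unit_clauses(clauses):
    """clauses: list of tuples of signed-integer literals.
    Repeatedly satisfies unit clauses; returns the simplified clause list."""
    clauses = [tuple(c) for c in clauses]
    while True:
        # find any unit clause; its single literal is forced to be true
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            return clauses
        new = []
        for c in clauses:
            if unit in c:
                continue  # clause already satisfied: drop it
            # the complementary literal is now false: remove it from the clause
            new.append(tuple(l for l in c if l != -unit))
        clauses = new
```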
SAT-graph
Take f with all clauses of size = 2, n variables and m clauses. We create a directed graph with 2n vertices, one per literal (xi and x̄i), and 2m edges: each clause (α ∨ β) contributes the implication edges ᾱ → β and β̄ → α.
Consider the following example: f = (x̄1 ∨ x̄2) ∧ (x2 ∨ x3) ∧ (x̄3 ∨ x̄1)
(figure: the implication graph, with edges x1 → x̄2, x2 → x̄1, x̄2 → x3, x̄3 → x2, x3 → x̄1, x1 → x̄3)
If we observe the graph, we notice that there is a path from x1 to x̄1: if x1 = T we reach a contradiction, while x1 = F might still be fine. In general, if there are paths both x1 → x̄1 and x̄1 → x1, then f is not satisfiable, because x1 and x̄1 are in the same SCC.
In general:
• If for some i, xi and x̄i are in the same SCC, then f is not satisfiable.
• If for every i, xi and x̄i are in different SCCs, then f is satisfiable.
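Putting the implication-graph construction and this SCC test together, a self-contained satisfiability check (signed-integer literals as before; Kosaraju-style two-pass DFS; all names are mine):

```python
def two_sat_satisfiable(n, clauses):
    """n variables (1..n); clauses: list of literal pairs, i for x_i, -i for NOT x_i.
    Returns True iff no x_i shares an SCC with its negation."""
    verts = [l for v in range(1, n + 1) for l in (v, -v)]
    adj = {v: [] for v in verts}
    radj = {v: [] for v in verts}
    for a, b in clauses:  # clause (a OR b): edges NOT a -> b and NOT b -> a
        adj[-a].append(b); radj[b].append(-a)
        adj[-b].append(a); radj[a].append(-b)

    visited, finish = set(), []
    def explore(graph, z, out):
        visited.add(z)
        for w in graph[z]:
            if w not in visited:
                explore(graph, w, out)
        out.append(z)

    # pass 1: finishing order on the reverse graph
    for v in verts:
        if v not in visited:
            explore(radj, v, finish)

    # pass 2: DFS on G in decreasing post order; each tree is one SCC
    visited, comp, num = set(), {}, 0
    for v in reversed(finish):
        if v not in visited:
            num += 1
            members = []
            explore(adj, v, members)
            for u in members:
                comp[u] = num
    return all(comp[v] != comp[-v] for v in range(1, n + 1))
```

On the example above, f = (x̄1 ∨ x̄2) ∧ (x2 ∨ x3) ∧ (x̄3 ∨ x̄1) comes out satisfiable.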
2-SAT Algo
This works because of a key fact: if for every i, xi and x̄i are in different SCCs, then S is a sink SCC if and only if S̄ (the component containing the complements of S's literals) is a source SCC.
Plaintext
2SAT(f):
  1. Construct the implication graph G for f
  2. Take a sink SCC S
     - set every literal in S to T (and every literal in bar(S) to F)
     - remove S and bar(S)
     - repeat until the graph is empty
Proof:
Consider a path α = γ0 → γ1 → … → γl = β. Recall that the edge (γ1 → γ2) represents the clause (γ̄1 ∨ γ2), since if γ1 = T then γ2 must also be T. But the clause (γ̄1 ∨ γ2) is also represented by the edge (γ̄2 → γ̄1). Applying this to every edge of the path gives the reversed path γ̄0 ← γ̄1 ← … ← γ̄l, i.e., a path β̄ → ᾱ, since γ0 = α and γl = β.
If α, β ∈ S, then ᾱ, β̄ ∈ S̄. This is true because α and β are in the same SCC, so there are paths α ↔ β; by the claim above there are then paths β̄ ↔ ᾱ, so ᾱ and β̄ belong to the same SCC, which we call S̄.
Now take a sink SCC S. For α ∈ S, there are no edges α → β leaving S, which by the claim means there are no edges β̄ → ᾱ entering S̄. In other words, no outgoing edges from α means no incoming edges to ᾱ. This shows that S̄ is a source SCC!
MST (GR3)
For the minimum spanning tree, we are going to go through Kruskal's algorithm, focusing mainly on its correctness. Note that Kruskal's is a greedy algorithm which makes use of the cut property. This property is also useful in proving Prim's algorithm.
MST algorithm
Goal: find a connected subgraph of minimum total weight. This connected subgraph is a spanning tree (refer to it as T). So we want T ⊆ E minimizing w(T) = ∑_{e∈T} w(e).
Plaintext
Kruskals(G):
  input: undirected G = (V,E) with weights w(e)
  1. Sort E by increasing weight
  2. Set X = {} (the empty set)
  3. For e = (v,w) in E (in increasing order of weight):
       if X U {e} does not have a cycle:
         X = X U {e}   (U here denotes union)
  4. Return X
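A sketch of Kruskal's in Python, using a union-find structure for the cycle test in step 3 (the union-find details are my addition, not from these notes):

```python
def kruskal(n, edges):
    """n vertices labeled 0..n-1; edges: list of (weight, u, v) tuples.
    Returns the list of MST edges."""
    parent = list(range(n))  # union-find forest

    def find(x):
        # follow parents to the root, halving paths along the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    X = []
    for w, u, v in sorted(edges):    # step 1: increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                 # adding (u,v) creates no cycle
            parent[ru] = rv          # union the two trees
            X.append((w, u, v))
    return X
```

The sort dominates the running time, giving the O(m log n) bound discussed below.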
Runtime analysis:
1. It may be confusing at first why the lecture says O(m log m) = O(m log n). It holds because m ≤ n² (a fully connected graph has at most n² edges), so
   O(m log m) = O(m log n²) = O(2m log n) = O(m log n).
Cut property
To prove the correctness, we first need to define the cut property:
Cut property: let X ⊆ E be part of some MST T, let (S, S̄) be a cut of the vertices such that no edge of X crosses it, and let e* be a minimum-weight edge crossing the cut. Then X ∪ {e*} is part of some (possibly different) MST T′.
In other words, a cut of a graph partitions the vertices into two sets, and the cut edges are those with one endpoint in each set. In a later part, we will look at problems such as minimum/maximum cut, which partition graphs into two components.
• The proof uses induction: assume X ⊆ E with X ⊆ T for an MST T. The claim is that when we add a minimum-weight edge crossing (S, S̄), we form another MST T′.
Proof outline
So, we need to consider two cases: e* ∈ T or e* ∉ T. If e* ∈ T, we are done: take T′ = T. If e* ∉ T, then adding e* to T creates a cycle; this cycle must cross the cut (S, S̄) at some other edge e with w(e) ≥ w(e*), and T′ = T ∪ {e*} − {e} is a spanning tree with w(T′) ≤ w(T), hence another MST containing X ∪ {e*}.
Prim’s algorithm
Prim's algorithm is akin to Dijkstra's algorithm, and we use the cut property to prove the correctness of Prim's algorithm.
Prim's algorithm selects a root vertex at the beginning and then grows the tree by moving from vertex to adjacent vertex. Kruskal's algorithm, on the other hand, generates the minimum spanning tree starting from the smallest-weight edge.