CSCI 3313-10: Foundation of Computing: 1.1 Mathematical Notations and Terminologies
1 Overview
Foundation of Computing
- Theory of Computing
• Automata theory
• Computability
- solvable vs. unsolvable problems
• Complexity
- computationally easy vs. hard problems
Chomsky Hierarchy
RL ⊂ CFL ⊂ CSL ⊂ REL
• sets: element, member, subset, proper subset, finite set, infinite set, empty set, union, intersection, complement, power set, Cartesian product (cross product)
• functions: mapping, domain, co-domain, range, one-to-one function, onto function, one-to-one correspondence
• relation:
- reflexive: xRx
- symmetric: xRy ⇒ yRx
- transitive: xRy ∧ yRz ⇒ xRz
- equivalence relation
• graphs:
• strings, languages:
- alphabet: any non-empty finite set
- string over an alphabet: a finite sequence of symbols from the alphabet
- |w|: length of a string w (w = w1 w2 · · · wn , where each wi ∈ Σ, for an alphabet Σ)
- empty string: ε, the string of length 0
- reverse of w: wR
- substring
- concatenation
• logic
• theorem, proof
- by construction, by induction, by contradiction
2 Regular Languages
1. r0 = q0 ,
2. δ(ri , wi+1 ) = ri+1 for i = 0, 1, · · · , n − 1, and
3. rn ∈ F .
2.2 Designing FSA
union: A ∪ B = {x | x ∈ A or x ∈ B}
Example:
A = {0, 1}, B = {a, b}:
A ∪ B = {0, 1, a, b}
A ◦ B = {0a, 0b, 1a, 1b}
A∗ = {ε, 0, 1, 00, 01, 10, 11, 000, 001, · · · , 111, 0000, · · · }
Theorem 2.1 The class of regular languages is closed under the union operation, i.e., if A1 and
A2 are regular languages, so is A1 ∪ A2 .
Proof: Let A1 and A2 be regular languages. By definition, A1 and A2 are recognized by FSA
M1 and M2 , resp. Let M1 = (Q1 , Σ1 , δ1 , q1 , F1 ) and M2 = (Q2 , Σ2 , δ2 , q2 , F2 ). We construct
M = (Q, Σ, δ, q0 , F ) from M1 and M2 such that
1. Q = Q1 × Q2 ,
i.e., Q = {(r1 , r2 ) | r1 ∈ Q1 , r2 ∈ Q2 }
2. Σ = Σ1 ∪ Σ2
3. δ((r1 , r2 ), a) = (δ1 (r1 , a), δ2 (r2 , a)) for each (r1 , r2 ) ∈ Q and a ∈ Σ
4. q0 = (q1 , q2 )
5. F = {(r1 , r2 ) | r1 ∈ F1 or r2 ∈ F2 },
i.e., F = (F1 × Q2 ) ∪ (Q1 × F2 ). (Note that F ≠ F1 × F2 .)
Example:
Let L1 = {w | w has an even number of 1’s} and L2 = {w | w contains 001 as a substring}. Construct
an FSA M for L1 ∪ L2 .
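The construction in Theorem 2.1 can be sketched in code. The Python sketch below builds the product machine; the two component DFAs (for the example languages L1 and L2) and their state names are illustrative assumptions, not taken from the notes.

```python
def union_dfa(M1, M2, alphabet="01"):
    # product construction for A1 ∪ A2: states are pairs (r1, r2),
    # and a pair accepts when either component accepts
    Q1, d1, q1, F1 = M1
    Q2, d2, q2, F2 = M2
    Q = [(r1, r2) for r1 in Q1 for r2 in Q2]
    d = {((r1, r2), a): (d1[r1, a], d2[r2, a]) for (r1, r2) in Q for a in alphabet}
    F = [(r1, r2) for (r1, r2) in Q if r1 in F1 or r2 in F2]
    return Q, d, (q1, q2), F

def run(M, w):
    Q, d, q, F = M
    for a in w:
        q = d[q, a]
    return q in F

# illustrative component DFAs: M1 accepts "even number of 1's",
# M2 accepts "contains 001 as a substring"
M1 = (["e", "o"],
      {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"},
      "e", ["e"])
M2 = (["s0", "s1", "s2", "s3"],
      {("s0", "0"): "s1", ("s0", "1"): "s0", ("s1", "0"): "s2", ("s1", "1"): "s0",
       ("s2", "0"): "s2", ("s2", "1"): "s3", ("s3", "0"): "s3", ("s3", "1"): "s3"},
      "s0", ["s3"])
M = union_dfa(M1, M2)
```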
Theorem 2.2 The class of regular languages is closed under the intersection operation.
Example:
Let L1 = {w | w has an odd number of a’s} and L2 = {w | w has exactly one b}. Construct an FSA M for
L = L1 ∩ L2 , i.e., L = {w | w has an odd number of a’s and exactly one b}.
2.4 Nondeterminism
An NFA is a 5-tuple N = (Q, Σ, δ, q0 , F ), where
1. Q is a finite set of states.
2. Σ is an alphabet.
3. δ : Q × Σε → P (Q) is the transition function, where Σε = Σ ∪ {ε}.
4. q0 ∈ Q is the start state.
5. F ⊆ Q is the set of accept states.
1. Q′ = P (Q).
2. For R ∈ P (Q), let δ ′ (R, a) = {q ∈ Q | q ∈ δ(r, a) for some r ∈ R}
(or, let δ ′ (R, a) = ∪{δ(r, a) | r ∈ R}.)
3. q0′ = {q0 }.
4. F ′ = {R ∈ Q′ | R contains an accept state of N }.
(ii) Next, assume that N contains ε-transitions. For any R ∈ P (Q), let
E(R) = {q | q can be reached from R by traveling along 0 or more ε-arrows}.
Let δ ′ (R, a) = {q ∈ Q | q ∈ E(δ(r, a)) for some r ∈ R}. The rest is the same as in case (i), except that the start state becomes q0′ = E({q0 }).
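A Python sketch of the subset construction with ε-closure (case (ii)); the NFA used at the end (strings over {0, 1} ending in 01) is an illustrative assumption, since the δ of the notes' example is not reproduced here.

```python
def nfa_to_dfa(alphabet, delta, start, accept):
    # subset construction with eps-closure E(R); delta maps
    # (state, symbol) -> set of states, using symbol "" for eps-arrows
    def E(R):
        R, frontier = set(R), list(R)
        while frontier:
            q = frontier.pop()
            for r in delta.get((q, ""), ()):
                if r not in R:
                    R.add(r)
                    frontier.append(r)
        return frozenset(R)

    q0 = E({start})
    d, seen, todo = {}, {q0}, [q0]
    while todo:
        R = todo.pop()
        for a in alphabet:
            S = set()
            for r in R:                      # move on symbol a ...
                S |= delta.get((r, a), set())
            S = E(S)                         # ... then close under eps
            d[R, a] = S
            if S not in seen:
                seen.add(S)
                todo.append(S)
    F = {R for R in seen if R & set(accept)}
    return seen, d, q0, F

# illustrative NFA: strings over {0,1} ending in 01
delta = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
states, d, q0, F = nfa_to_dfa("01", delta, "q0", {"q2"})

def run(w):
    q = q0
    for a in w:
        q = d[q, a]
    return q in F
```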
1. Q = {q0 , q1 }
2. Σ = {0, 1}
4. initial state = q0
5. F = {q1 }
A DFA M = (Q′ , Σ, δ ′ , q0′ , F ′ ) that is equivalent to N is then constructed as:
1. Q′ = {{q0 }, {q0 , q1 }}
2. Σ = {0, 1}
5. F ′ = {{q0 , q1 }}
1. Q = {q0 , q1 , q2 , q3 , q4 }
2. Σ = {a, b}
3. δ(q0 , ε) = {q1 }; δ(q0 , b) = {q2 }; δ(q1 , ε) = {q2 , q3 }; δ(q1 , a) = {q0 , q4 }; δ(q2 , b) = {q4 }; δ(q3 , a) = {q4 }; δ(q4 , ε) = {q3 }
4. initial state = q0
5. F = {q4 }
Note that E(q0 ) = {q0 , q1 , q2 , q3 }, E(q1 ) = {q1 , q2 , q3 }, E(q2 ) = {q2 }, E(q3 ) = {q3 }, and E(q4 ) =
{q3 , q4 }. We then construct a DFA M = (Q′ , Σ, δ ′ , q0′ , F ′ ) by following the algorithm in (ii) as
follows:
2. Σ = {a, b}
4. initial state = p0
5. F = {p1 , p2 , p3 }.
2.6 Closure Properties of Regular Languages
Theorem 2.4 Regular languages are closed under the following operations:
(1) union
(2) intersection
(3) concatenation
Note: We can construct an NFA N for each case and find a DFA M equivalent to N .
2.7 Regular Expressions
R is a regular expression if R is
(1) a, for some a ∈ Σ,
(2) ε,
(3) ∅,
(4) R1 ∪ R2 , (5) R1 ◦ R2 , or (6) R1∗ , where R1 and R2 are regular expressions.
- Parentheses may be omitted; ∗ has the highest precedence, then ◦, then ∪.
- R+ = RR∗ = R∗ R
- R+ ∪ ε = R∗
- L(R) denotes the language described by R.
2.8 Equivalence of Regular Expression and DFA
Theorem 2.5 A language is regular if and only if some regular expression can describe it.
(3) R = ∅ ⇒ L(R) = ∅
(4) R = R1 ∪ R2 ⇒ L(R) = L(R1 ) ∪ L(R2 ), regular since regular languages are closed under union
(5) R = R1 ◦ R2 ⇒ L(R) = L(R1 ) ◦ L(R2 ), regular by closure under concatenation
(6) R = R1∗ ⇒ L(R) = L(R1 )∗ , regular by closure under star
Since L is a regular language, there must be a DFA that recognizes L. We then apply the following
result.
Lemma: Let M = (Q, Σ, δ, q0 , F ) be a DFA. Then there exists a regular expression E
such that L(E) = L(M ), where L(E) denotes the language represented by E.
Proof: Let Q = {q1 , · · · , qm } such that q1 is the start state of M . For 1 ≤ i, j ≤ m and 1 ≤ k ≤
m + 1, we let R(i, j, k) denote the set of all strings in Σ∗ that drive M from qi to qj without passing
through any state numbered k or greater.
The crucial point is that each set R(i, j, k) is regular, and hence so is L(M ). The proof is by
induction on k. For k = 1, we have the following.
R(i, j, 1) = {a ∈ Σ | δ(qi , a) = qj } if i ≠ j, and
R(i, j, 1) = {ε} ∪ {a ∈ Σ | δ(qi , a) = qj } if i = j.
Each of these sets is finite, and therefore regular. For k = 1, · · · , m, provided that all the sets
R(i, j, k) have been defined, each set R(i, j, k + 1) can be defined in terms of previously defined
languages as
R(i, j, k + 1) = R(i, j, k) ∪ R(i, k, k)R(k, k, k)∗ R(k, j, k).
This equation states that to get from qi to qj without passing through a state numbered greater
than k, M may either
(i) go from qi to qj without passing through a state numbered greater than k − 1, or
(ii) go from qi to qk ; then from qk to qk repeatedly; and then from qk to qj , in each case without
passing through a state numbered greater than k − 1.
Therefore, if each language R(i, j, k) is regular, so is each language R(i, j, k + 1). This completes
the induction.
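The R(i, j, k) recurrence is effectively Kleene's algorithm, and it can be sketched directly in code. The Python sketch below builds a regular expression from a DFA by applying the recurrence, emitting Python re syntax; symbols are assumed to be single regex-safe characters, and the two-state DFA for "even number of 1's" is an illustrative assumption.

```python
import re

def dfa_to_regex(states, alphabet, delta, start, accept):
    # Kleene's construction: eliminate states one at a time using
    # R(i, j, k+1) = R(i, j, k) ∪ R(i, k, k) R(k, k, k)* R(k, j, k);
    # None stands for the empty set, "" for epsilon
    n = len(states)
    idx = {s: i for i, s in enumerate(states)}
    R = [[None] * n for _ in range(n)]
    for s in states:                         # base case: one-symbol paths
        for a in alphabet:
            i, j = idx[s], idx[delta[s, a]]
            R[i][j] = a if R[i][j] is None else R[i][j] + "|" + a

    def union(x, y):
        return y if x is None else x if y is None else f"{x}|{y}"

    def cat(x, y):
        return None if x is None or y is None else f"(?:{x})(?:{y})"

    def star(x):
        return "" if x is None else f"(?:{x})*"

    for k in range(n):                       # now allow state k in the middle
        old = [row[:] for row in R]
        loop = star(old[k][k])
        for i in range(n):
            for j in range(n):
                R[i][j] = union(old[i][j], cat(cat(old[i][k], loop), old[k][j]))

    parts = []
    for f in accept:
        r = R[idx[start]][idx[f]]
        if f == start:
            r = union(r, "")                 # the empty path also reaches f
        if r is not None:
            parts.append(f"(?:{r})")
    return "|".join(parts)

# illustrative DFA: even number of 1's over {0,1}
delta = {("E", "0"): "E", ("E", "1"): "O", ("O", "0"): "O", ("O", "1"): "E"}
rx = dfa_to_regex(["E", "O"], "01", delta, "E", ["E"])
```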
2.9 Non-regular Languages (Pumping Lemma)
Review ...
• Give a regular expression for the set L2 of non-negative integers that are divisible by 2.
Then, L2 = L1 ∩ Σ∗ ◦ {0, 2, 4, 6, 8}
• Give a regular expression for the set L3 of integers that are divisible by 3.
Then, L3 = L1 ∩ L(M ), where
M is defined as:
• Let Σ = {a, b}, and let L4 ⊆ Σ∗ be the set of strings of odd length containing an even # of a’s.
Then, L4 = L5 ∩ L6 , where L5 is the set of all strings of odd length, i.e., L5 = Σ(ΣΣ)∗ , and
L6 is the set of all strings with an even # of a’s, i.e., L6 = b∗ (ab∗ ab∗ )∗ .
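These two regular expressions can be sanity-checked with Python's re module (a quick illustrative check, not part of the notes):

```python
import re

L5 = re.compile(r"[ab]([ab][ab])*")   # odd length: Sigma (Sigma Sigma)*
L6 = re.compile(r"b*(ab*ab*)*")       # even number of a's

def in_L4(w):
    # L4 = L5 intersect L6: odd length AND an even number of a's
    return bool(L5.fullmatch(w)) and bool(L6.fullmatch(w))
```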
• A1 = {0n 1n | n ≥ 1}
Pumping Lemma: Let A be a regular language. Then there exists a number p, called the pumping
length, where for any string w ∈ A with |w| ≥ p, w may be divided into three substrings w = xyz
such that
(i) |y| > 0,
(ii) |xy| ≤ p, and
(iii) for each i ≥ 0, xy i z ∈ A.
• {an bam ban+m | n, m ≥ 1}
• {ww̄ | w ∈ {a, b}∗ }, where w̄ stands for w with each occurrence of a replaced by b, and vice
versa.
• L = {am bn | m ≠ n}
(d) Determine whether L = {w | w = wR } is regular. (It is not, for |Σ| ≥ 2.)
4. L = {an! | n ≥ 1}
5. L = {am bn | m > n}
6. L = {am bn | m < n}
11. L = {w ∈ {a, b}∗ | na (w) and nb (w) both are prime numbers}
2.9.3 Additional Properties of Regular Languages
• There exists an algorithm to determine whether a regular language is empty, finite, or infinite.
• membership: there exists an algorithm to decide whether a given string w belongs to a given regular language.
3 Context Free Languages and Context Free Grammars
4. S ∈ V is a start symbol.
Examples of Context-Free Grammars
G0 : E → E + E | E ∗ E | id
G1 : E → T E′
E′ → +T E′ | ε
T → F T′
T′ → ∗F T′ | ε
F → (E) | id
G2 : E → E + T | T
T →T ∗F | F
F → (E) | id
G3 : E′ → E
E → E+T | T
T →T ∗F | F
F → (E) | id
G4 : S′ → S
S→L=R
S→R
L → ∗R
L → id
R→L
G5 : S′ → S
S → aAd | bBd | aBe | bAe
A→c
B→c
3.1 Context Free Grammar
1. L = {an bn | n ≥ 0}
S → aSb | ε
2. L = {am bn | m > n}
S → AC
C → aCb | ε
A → aA | a
3. L = {am bn | m < n}
S → CB
C → aCb | ε
B → bB | b
4. L = {am bn | m ≠ n}
S → AC | CB
C → aCb | ε
A → aA | a
B → bB | b
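For grammar 1, the derivation S ⇒ aSb ⇒ · · · ⇒ an S bn ⇒ an bn peels one a and one b per step, which suggests a direct membership check (a minimal sketch):

```python
def in_anbn(w):
    # peel one a from the front and one b from the back per step,
    # mirroring S => aSb => ... => a^n b^n; the empty string is S => eps
    while w.startswith("a") and w.endswith("b"):
        w = w[1:-1]
    return w == ""
```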
5. L = {w ∈ {a, b}∗ | na (w) = nb (w)}
S → SS | aSb | bSa | ε
6. L = {w ∈ {a, b}∗ | na (w) > nb (w)}
S′ → AS | SAS | SA
S → SS | SAS | aSb | bSa | ε
A → aA | a
Proof: Note that any string generated by the above rules has more a’s than b’s. We next
proceed to show that any string w ∈ L can be generated by these rules. We first note that
any string z such that na (z) = nb (z) must be split into substrings such that z = z1 z2 · · · zl
where (i) each zj has an equal number of a’s and b’s, (ii) the first and the last symbols of zj are
different, and (iii) no such zj contains a substring that has the same number of a’s
and b’s but whose first and last symbols are the same. For example, aabbab cannot be such a zj
since it contains abba, but aababb can be such a zj . It is then noted that for any w ∈ L, w
can be denoted as:
w = al0 z1 al1 z2 al2 · · · zk alk ,
where (1) each zi satisfies the above three conditions (i) - (iii); (2) for each i, 0 ≤ i ≤ k,
li ≥ 0; and (3) l0 + l1 + · · · + lk > 0. For example, w = aaababbaaaabbaaa may be decomposed
into w = aa · ab · ab · ba · a · aabb · aaa, where l0 = 2, z1 = ab, l1 = 0, z2 = ab, l2 = 0, z3 = ba,
l3 = 1, z4 = aabb, and l4 = 3.
From the start symbol S′ , one of the following three cases occurs: If l0 > 0, S′ ⇒ AS; else if
lk > 0, S′ ⇒ SA; otherwise, S′ ⇒ SAS. We then recursively apply S → SS or S → SAS
such that a single S generates a substring zj satisfying conditions (i)-(iii) above.
Consider the example above: w = aaababbaaaabbaaa. w is then split into a2 z1 z2 z3 a1 z4 a3 ,
and is generated as follows.
10. L = {w ∈ {a, b, c}∗ | na (w) + nb (w) > nc (w)}
S′ → T S | ST S | ST
S → SS | ST S | aSc | cSa | bSc | cSb | ε
T → aT | bT | a | b
11. L = {w ∈ {a, b, c}∗ | na (w) + nb (w) > 2nc (w)}.
S′ → T S | ST S | ST
S → SS | ST S | ε
S → SDDC | DSDC | DDSC | DDCS |
SDCD | DSCD | DCSD | DCDS |
SCDD | CSDD | CDSD | CDDS
D→a|b
C→c
T → aT | bT | a | b
12. L = {w ∈ {a, b, c}∗ | na (w) + nb (w) < 2nc (w)}
S′ → T S | ST S | ST
S → SS | ST S | ε
S → SDDC | DSDC | DDSC | DDCS |
SDCD | DSCD | DCSD | DCDS |
SCDD | CSDD | CDSD | CDDS
D→a|b
C→c
T → cT | c
3.2 Chomsky Normal Form
A context-free grammar is in Chomsky normal form if every rule is of the form
A → BC
A→a
where a is any terminal and A, B, and C are any non-terminals (i.e., variables) except that B and C
may not be the start symbol. In addition, we permit the rule S → ε, where S is the start symbol.
Theorem 2.9 (pp. 107). Any context-free language is generated by a context-free grammar in
Chomsky normal form.
3.3 CYK Membership Algorithm for Context-Free Grammars
Let w = a1 a2 · · · an , let wij denote the substring ai · · · aj , and let Vij = {A | A ⇒∗ wij }.
Clearly, w ∈ L(G) if and only if S ∈ V1n . To compute Vij , we observe that A ∈ Vii if and only
if R contains a production A → ai . Therefore, Vii can be computed for all 1 ≤ i ≤ n by inspection
of w and the production rules of G. To continue, notice that for j > i, A derives wij if and only if
there is a production A → BC with B ⇒∗ wik and C ⇒∗ wk+1,j for some k with i ≤ k < j. In other
words,
Vij = ∪k∈{i,i+1,··· ,j−1} {A | A → BC, with B ∈ Vik , C ∈ Vk+1,j }.
The above equation can be used to compute all the Vij if we proceed in order of increasing substring
length: first V11 , V22 , · · · , Vnn ; then V12 , V23 , · · · , Vn−1,n ; then V13 , V24 , · · · ;
and so on.
Example: Consider a string w = aabbb and a CFG G with the following production rules:
S → AB
A → BB | a
B → AB | b
Vij , for 1 ≤ i ≤ j ≤ 5 (rows i, columns j):
        j=1    j=2    j=3    j=4    j=5
i=1      A      ∅     S, B    A     S, B
i=2             A     S, B    A     S, B
i=3                    B      A     S, B
i=4                           B      A
i=5                                  B
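A short Python sketch of the CYK algorithm; run on the grammar and string above, it agrees with the table (in particular S ∈ V1,5 , so aabbb ∈ L(G)):

```python
def cyk(w, rules, start="S"):
    # rules: list of (head, body); body is a terminal string or a
    # pair of variables -- the grammar must be in Chomsky normal form
    n = len(w)
    V = {}
    for i in range(1, n + 1):                # diagonal: V_ii from A -> a_i
        V[i, i] = {A for A, body in rules if body == w[i - 1]}
    for span in range(2, n + 1):             # then increasing substring length
        for i in range(1, n - span + 2):
            j = i + span - 1
            V[i, j] = set()
            for k in range(i, j):
                for A, body in rules:
                    if (isinstance(body, tuple)
                            and body[0] in V[i, k] and body[1] in V[k + 1, j]):
                        V[i, j].add(A)
    return start in V[1, n] if n else False

rules = [("S", ("A", "B")), ("A", ("B", "B")), ("A", "a"),
         ("B", ("A", "B")), ("B", "b")]
```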
3.4 Pushdown Automata
Note: An input is accepted only if (i) the input is entirely read and (ii) the stack is empty.
• L = {0n 1n | n ≥ 0}
• L = {ai bj ck | i = j or i = k, where i, j, k ≥ 0}
• L = {an b2n | n ≥ 0}
• L = {an bm cn+m | n, m ≥ 0}
• L = {an bm | n ≤ m ≤ 3n}
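The first example, L = {0n 1n | n ≥ 0}, can be simulated directly with an explicit stack; a minimal sketch of the PDA's behavior, pushing on 0's and popping on 1's:

```python
def accepts_0n1n(w):
    # PDA sketch for {0^n 1^n | n >= 0}: push a marker for every 0,
    # pop one for every 1; accept iff the input is read and the stack is empty
    stack = []
    seen_one = False
    for c in w:
        if c == "0":
            if seen_one:           # a 0 after a 1 can never be accepted
                return False
            stack.append("0")
        elif c == "1":
            seen_one = True
            if not stack:          # more 1's than 0's
                return False
            stack.pop()
        else:
            return False
    return not stack
```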
Let L be a CFL. Then, there exists a number p, called the pumping length, where for any string
w ∈ L with |w| ≥ p, w may be divided into five substrings w = uvxyz such that
1) |vy| > 0
2) |vxy| ≤ p, and
3) for each i ≥ 0, uv i xy i z ∈ L.
3.6.1 Non-Context Free Languages
L = {an bn cn | n ≥ 0}
Let w = ap bp cp and apply the Pumping lemma.
L = {ww | w ∈ {0, 1}∗ }
(Try w = 0p 10p 1: here the pumping lemma yields no contradiction!)
Let w = 0p 1p 0p 1p and apply the Pumping lemma.
L = {ai bj ck | 0 ≤ i ≤ j ≤ k}
Let w = ap bp cp and apply Pumping lemma.
L = {an! | n ≥ 0}
(Recall: L is not regular.)
Let w = ap! and apply Pumping lemma.
L = {an bj | n = j 2 }.
Let w = ap² bp and apply the Pumping lemma. We then have w = uvxyz and three cases to
consider.
• CFL’s are closed under the Kleene star operation.
3.8 Top-Down Parsing
A grammar is called ambiguous if there is some sentence in its language for which there is more
than one parse tree.
Example: E → E + E | E ∗ E | id;
w = id + id ∗ id.
In general, we may not be able to determine which tree to use. In fact, determining whether a
given arbitrary CFG is ambiguous or not is undecidable.
Solution:
(a) Rewrite the grammar into an equivalent unambiguous grammar.
(b) Use disambiguating rules with the ambiguous grammar to specify, for ambiguous cases, which
parse tree to use.
For an input “if E1 then if E2 then S1 else S2 ,” two parse trees can be constructed; hence, G1 is
ambiguous. An unambiguous grammar G2 which is equivalent to G1 can be constructed as follows:
3.8.2 Left-factoring and Removing left recursions
G1 : S → ee | bAc | bAe
A → d | eA
Since the initial b is in two production rules, S → bAc and S → bAe, the parser cannot make a
correct decision without backtracking. This problem may be solved by redesigning the grammar as
shown in G2 .
G2 : S → ee | bAQ
Q→c|e
A → d | eA
In G2 , we have factored out the common prefix bA and used another non-terminal symbol Q to
permit the choice between the final c and e. Such a transformation is called left factorization or
left factoring.
Now, consider the following grammar G3 and consider a token string w = id + id + id.
G3 : E →E+T | T
T →T ∗F | F
F → id | (E)
A top-down parser for this grammar will start by expanding E with the production E → E + T .
It will then expand E in the same way. In the next step, the parser should expand E by E → T
instead of E → E + T . But there is no way for the parser to know which choice it should make. In
general, there is no solution to this problem as long as the grammar has productions of the form
A → Aα, called left-recursive productions. The solution to this problem is to rewrite the grammar
in such a way to eliminate the left recursions. There are two types of left recursions: immediate
left recursions, where the productions are of the form A → Aα, and non-immediate left recursions,
where the productions are of the form A → Bα; B → Aβ. In the latter case, A will use Bα, and
B will use Aβ, resulting in the same problem as the immediate left recursions have.
We now have the following formal definition: “A grammar is left-recursive if it has a nonterminal
A such that there is a derivation A ⇒+ Aα for some string α.”
Input: A → Aα1 | Aα2 | · · · | Aαm | β1 | β2 | · · · | βn
Output: A → β1 A′ | β2 A′ | · · · | βn A′
A′ → α1 A′ | α2 A′ | · · · | αm A′ | ε
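The rewrite above, as a small Python sketch; production bodies are tuples of symbols, the empty tuple stands for ε, and the primed-name convention A′ is an assumption:

```python
def remove_immediate_left_recursion(A, productions):
    # A -> A a1 | ... | A am | b1 | ... | bn   is rewritten as
    # A  -> b1 A' | ... | bn A'
    # A' -> a1 A' | ... | am A' | eps
    Ap = A + "'"                                   # primed-name convention
    alphas = [p[1:] for p in productions if p and p[0] == A]
    betas = [p for p in productions if not p or p[0] != A]
    if not alphas:                                 # nothing to do
        return {A: productions}
    return {A: [tuple(b) + (Ap,) for b in betas],
            Ap: [tuple(a) + (Ap,) for a in alphas] + [()]}

g = remove_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
```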
Consider the above example G3 in which two productions have left recursions. Applying the
above algorithm to remove immediate left recursions, we have
(i) E → E + T | T
⇒ E → T E′
E′ → +T E′ | ε
(ii) T → T ∗ F | F
⇒ T → F T′
T′ → ∗F T′ | ε
G4 : E → T E′
E′ → +T E′ | ε
T → F T′
T′ → ∗F T′ | ε
F → (E) | id
The following is an algorithm for eliminating all left recursions including non-immediate left
recursions.
Algorithm: Eliminating left recursion.
Input: Grammar G with no cycles or ε-productions.
Output: An equivalent grammar with no left recursion.
Examples
G: S → Ba | b
B → Bc | Sd | e
G: A1 → A2 a | b
A2 → A2 c | A1 d | e
(i) i=1:
A1 → A2 a | b, OK
(ii) i=2:
A2 → A1 d is replaced by A2 → A2 ad | bd
Now, G becomes
G: A1 → A2 a | b
A2 → A2 c | A2 ad | bd | e
By eliminating immediate recursions in A2 -productions, we have
A2 → bdA3 | eA3
A3 → cA3 | adA3 | ε
Therefore, we have
S → Ba | b
B → bdD | eD
D → cD | adD | ε
Consider every string derivable from some sentential form α by a leftmost derivation. If α ⇒∗ β,
where β begins with some terminal a, then a is in F IRST (α). If α ⇒∗ ε, then ε ∈ F IRST (α).
Algorithm: Computing F IRST (A).
Now, we define F OLLOW (A) as the set of terminals that can come right after A in any
sentential form of L(G). If A comes at the end, then F OLLOW (A) includes the end marker $.
1. $ is in F OLLOW (S).
2. if A → αBβ, then F IRST (β) − {ε} ⊆ F OLLOW (B).
3. if A → αB, or A → αBβ where ε ∈ F IRST (β) (i.e., β ⇒∗ ε), then
F OLLOW (A) ⊆ F OLLOW (B)
end.
Note: In Step 3, the containment is one-way: F OLLOW (B) ⊆ F OLLOW (A) need not hold. To see this, consider the following example:
S → Ab | Bc; A → aB; B → c. Clearly, c ∈ F OLLOW (B) but c ∉ F OLLOW (A).
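The FIRST/FOLLOW rules above can be computed by a standard fixed-point iteration. A Python sketch follows (the grammar encoding is an assumption; ε is represented by the empty production body); it can be checked against the FIRST/FOLLOW sets for grammar G4.

```python
EPS = "eps"

def first_follow(grammar, start):
    # grammar: dict nonterminal -> list of bodies (tuples of symbols);
    # anything not a key of the dict is a terminal; () is the eps body
    nts = set(grammar)
    first = {X: set() for X in nts}

    def first_of(seq):              # FIRST of a sentential form
        out = set()
        for s in seq:
            if s not in nts:
                out.add(s)
                return out
            out |= first[s] - {EPS}
            if EPS not in first[s]:
                return out
        out.add(EPS)                # every symbol of seq can vanish
        return out

    changed = True
    while changed:                  # fixed-point iteration for FIRST
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                f = first_of(body)
                if not f <= first[A]:
                    first[A] |= f
                    changed = True

    follow = {X: set() for X in nts}
    follow[start].add("$")          # rule 1
    changed = True
    while changed:                  # fixed-point iteration for FOLLOW
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in nts:
                        continue
                    f = first_of(body[i + 1:])
                    add = f - {EPS}                 # rule 2
                    if EPS in f:
                        add = add | follow[A]       # rule 3
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return first, follow

G4 = {"E": [("T", "E'")],
      "E'": [("+", "T", "E'"), ()],
      "T": [("F", "T'")],
      "T'": [("*", "F", "T'"), ()],
      "F": [("(", "E", ")"), ("id",)]}
first, follow = first_follow(G4, "E")
```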
EXAMPLE:
G4 : E → T E′
E′ → +T E′ | ε
T → F T′
T′ → ∗F T′ | ε
F → (E) | id
F IRST (T′ ) = {∗, ε}.
F OLLOW (E) = F OLLOW (E′ ) = {), $}.
F OLLOW (T ) = F OLLOW (T′ ) = {+, ), $}.
F OLLOW (F ) = {+, ∗, ), $}.
EXAMPLE:
G4 : E → T E′
E′ → +T E′ | ε
T → F T′
T′ → ∗F T′ | ε
F → (E) | id
Input symbol:
          id          +             *             (            )           $
E      E → TE′                                E → TE′
E′                E′ → +TE′                                 E′ → ε      E′ → ε
T      T → FT′                                T → FT′
T′                T′ → ε       T′ → ∗FT′                    T′ → ε      T′ → ε
F      F → id                                 F → (E)
Stack Operation
Stack          Input            Action
$E             id + id ∗ id$    E → TE′
$E′ T          id + id ∗ id$    T → FT′
$E′ T′ F       id + id ∗ id$    F → id
$E′ T′ id      id + id ∗ id$    match
$E′ T′         +id ∗ id$        T′ → ε
$E′            +id ∗ id$        E′ → +TE′
$E′ T +        +id ∗ id$        match
$E′ T          id ∗ id$         T → FT′
$E′ T′ F       id ∗ id$         F → id
$E′ T′ id      id ∗ id$         match
$E′ T′         ∗id$             T′ → ∗FT′
$E′ T′ F ∗     ∗id$             match
$E′ T′ F       id$              F → id
$E′ T′ id      id$              match
$E′ T′         $                T′ → ε
$E′            $                E′ → ε
$              $                accept
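The trace above is produced by the standard table-driven LL(1) driver. A minimal Python sketch, with the parsing table transcribed by hand and token handling simplified:

```python
# table-driven LL(1) parser for G4; an eps-entry pushes nothing onto the stack
TABLE = {
    ("E", "id"): ["T", "E'"],  ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],  ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],       ("F", "("): ["(", "E", ")"],
}

def ll1_parse(tokens):
    tokens = tokens + ["$"]
    stack = ["$", "E"]                 # top of stack = end of the list
    i = 0
    while stack:
        top = stack.pop()
        if top == tokens[i]:           # terminal on top: match and advance
            i += 1
        elif (top, tokens[i]) in TABLE:
            stack.extend(reversed(TABLE[top, tokens[i]]))
        else:
            return False               # no table entry: syntax error
    return i == len(tokens)
```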
Properties:
2. A grammar G is LL(1) if and only if whenever A → α | β are two distinct productions, the
following conditions hold:
2.1 For any terminal a, there exist no derivations α ⇒∗ aα′ and β ⇒∗ aβ′ .
2.2 At most one of α and β can derive the empty string ε.
2.3 If β ⇒∗ ε, then α does not derive any string beginning with a terminal in F OLLOW (A).
Proof of Condition 2.2: Suppose α ⇒∗ ε and β ⇒∗ ε. Consider S ⇒∗ γ1 Aγ2 . Then, two possibilities
exist: S ⇒∗ γ1 Aγ2 ⇒ γ1 αγ2 ⇒∗ γ1 γ2 and S ⇒∗ γ1 Aγ2 ⇒ γ1 βγ2 ⇒∗ γ1 γ2 . G must then be ambiguous.
Proof of Condition 2.3: Suppose β ⇒∗ ε and α ⇒∗ aα′ , where a ∈ F OLLOW (A). Also, assume
that γ2 ⇒∗ aγ2′ . We then have two possibilities: (i) S ⇒∗ γ1 Aγ2 ⇒ γ1 αγ2 ⇒∗ γ1 aα′ γ2 , and (ii)
S ⇒∗ γ1 Aγ2 ⇒ γ1 βγ2 ⇒∗ γ1 γ2 ⇒∗ γ1 aγ2′ . Hence, after taking care of the input tokens corresponding
to γ1 , the parser cannot make a clear choice between the two productions A → α and A → β.
3.9 Bottom-Up Parsing
Computation of Closure
If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I
by the following two rules: (1) every item of I is in closure(I); (2) if A → α · Bβ is in closure(I)
and B → γ is a production of G, then B → ·γ is in closure(I).
function closure(I):
begin
J = I;
repeat
for each item A → α · Bβ in J and each production
B → γ of G such that B → ·γ is not in J do
add B → ·γ to J
until no more items can be added to J
return J
end
We are now ready to give the algorithm to construct C, the canonical collection of sets of LR(0)
items for an augmented grammar G′ .
procedure items(G′ ):
begin
C = {closure({[S′ → ·S]})};
repeat
for each set of items I in C and each grammar symbol X
such that goto(I, X) is not empty and not in C do
add goto(I, X) to C
until no more sets of items can be added to C
end
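closure, goto, and the item-set construction can be sketched in a few lines of Python (the item encoding as a (head, body, dot-position) triple is an assumption):

```python
def closure(items, grammar):
    # items: set of (head, body, dot); grammar: dict head -> list of bodies
    J = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(J):
            if dot < len(body) and body[dot] in grammar:  # dot before nonterminal B
                for gamma in grammar[body[dot]]:
                    item = (body[dot], gamma, 0)          # add B -> .gamma
                    if item not in J:
                        J.add(item)
                        changed = True
    return frozenset(J)

def goto(items, X, grammar):
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == X}
    return closure(moved, grammar)

G = {"E'": [("E",)],
     "E": [("E", "+", "T"), ("T",)],
     "T": [("T", "*", "F"), ("F",)],
     "F": [("(", "E", ")"), ("id",)]}
I0 = closure({("E'", ("E",), 0)}, G)
I1 = goto(I0, "E", G)
```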
Algorithm: Constructing an SLR parsing table.
Input: An augmented grammar G′ .
Output: The SLR parsing table functions action and goto for G′ .
Example
(0) E′ → E
(1) E →E+T
(2) E→T
(3) T →T ∗F
(4) T →F
(5) F → (E)
(6) F → id
The canonical LR(0) collection for G is:
I0 : E′ → ·E; E → ·E + T ; E → ·T ; T → ·T ∗ F ; T → ·F ; F → ·(E); F → ·id
I1 : E′ → E·; E → E · +T
I2 : E → T ·; T → T · ∗F
I3 : T → F ·
I4 : F → (·E); E → ·E + T ; E → ·T ; T → ·T ∗ F ; T → ·F ; F → ·(E); F → ·id
I5 : F → id·
I6 : E → E + ·T ; T → ·T ∗ F ; T → ·F ; F → ·(E); F → ·id
I7 : T → T ∗ ·F ; F → ·(E); F → ·id
I8 : F → (E·); E → E · +T
I9 : E → E + T ·; T → T · ∗F
I10 : T → T ∗ F ·
I11 : F → (E)·
I1 : goto(I1 , +) = I6 ;
I2 : goto(I2 , ∗) = I7 ;
I9 : goto(I9 , ∗) = I7 ;
The FOLLOW sets are: F OLLOW (E′ ) = {$}; F OLLOW (E) = {+, ), $}; F OLLOW (T ) = F OLLOW (F ) =
{+, ), $, ∗}.
3.9.2 Canonical LR(1) Parser
S′ → S
S→L=R
S→R
L → ∗R
L → id
R→L
I0 : S′ → ·S; S → ·L = R; S → ·R; L → · ∗ R; L → ·id; R → ·L
I1 : S′ → S·
I2 : S → L· = R; R → L·
I3 : S → R·
I4 : L → ∗ · R; R → ·L; L → · ∗ R; L → ·id
I5 : L → id·
I6 : S → L = ·R; R → ·L; L → · ∗ R; L → ·id
I7 : L → ∗R·
I8 : R → L·
I9 : S → L = R·
Note that = ∈ F OLLOW (R) since S ⇒ L = R ⇒ ∗R = R. Consider the state I2 when the input
symbol is “=.” From [R → L·], the parser will reduce by R → L since = ∈ F OLLOW (R). But due
to [S → L· = R], it will try to shift the input as well, a conflict. Therefore, this grammar G cannot
be handled by the SLR parser. In fact, G can be parsed using the canonical-LR(1) parser that
will be discussed next.
Construction of LR(1) Items
function closure(I):
begin
repeat
for each item [A → α · Bβ, a] in I,
each production B → γ in G0 ,
and each terminal b in F IRST (βa)
such that [B → ·γ, b] is not in I do
add [B → ·γ, b] to I;
until no more items can be added to I
return I
end
procedure items(G0 ):
begin
C = {closure({[S 0 → ·S, $]})};
repeat
for each set of items I in C and each grammar symbol X
such that goto(I, X) is not empty and not in C do
add goto(I, X) to C
until no more sets of items can be added to C
end
Construction of canonical-LR(1) parser
Example 1: Consider the following grammar G′ .
(0) S′ → S
(1) S→L=R
(2) S→R
(3) L → ∗R
(4) L → id
(5) R→L
I0 : S′ → ·S, $
S → ·L = R, $
S → ·R, $
L → · ∗ R, =/$
L → ·id, =/$
R → ·L, $
I1 : S′ → S·, $
I2 : S → L· = R, $
R → L·, $
I3 : S → R·, $
I4 : L → ∗ · R, =/$
R → ·L, =/$
L → · ∗ R, =/$
L → ·id, =/$
I5 : L → id·, =/$
I6 : S → L = ·R, $
R → ·L, $
L → · ∗ R, $
L → ·id, $
I7 : L → ∗R·, =/$
I8 : R → L·, = /$
I9 : S → L = R·, $
I10 : R → L·, $
I11 : L → ∗ · R, $
R → ·L, $
L → · ∗ R, $
L → ·id, $
I12 : L → id·, $
I13 : L → ∗R·, $
Example 2:
(0) S′ → S
(1) S → CC
(2) C → cC
(3) C→d
I0 : S′ → ·S, $
S → ·CC, $
C → ·cC, c/d
C → ·d, c/d
I1 : S′ → S·, $
I2 : S → C · C, $
C → ·cC, $
C → ·d, $
I3 : C → c · C, c/d
C → ·cC, c/d
C → ·d, c/d
I4 : C → d·, c/d
I5 : S → CC·, $
I6 : C → c · C, $
C → ·cC, $
C → ·d, $
I7 : C → d·, $
I8 : C → cC·, c/d
I9 : C → cC·, $
The transition for viable prefixes is:
I6 : goto(I6 , C) = I9 ;
Suppose we have an LR(1) grammar, that is, one whose sets of LR(1) items produce no parsing
action conflicts. If we replace all states having the same core with their union, it is possible in
principle that the resulting union will have a conflict, but no new shift/reduce conflict can arise, for the following reason.
Suppose in the union there is a conflict on lookahead a because there is an item [A → α·, a]
calling for a reduction by A → α, and there is another item [B → β · aγ, b] calling for a shift. Then,
some set of items from which the union was formed has item [A → α·, a], and since the cores of
all these states are the same, it must have an item [B → β · aγ, c] for some c. But then this state
has the same shift/reduce conflict on a, and the grammar was not LR(1) as we assumed. Thus,
the merging of states with common cores can never produce a shift/reduce conflict that was not
present in one of the original states, because shift actions depend only on the core, not the lookahead.
It is possible, however, that a merger will produce a reduce/reduce conflict as the following
example shows.
Example:
S′ → S
S → aAd | bBd | aBe | bAe
A→c
B→c
which generates the four strings acd, ace, bcd, bce. This grammar can be checked to be LR(1) by
constructing the sets of items. Upon doing so, we find the set of items {[A → c·, d], [B → c·, e]}
valid for viable prefix ac and {[A → c·, e], [B → c·, d]} valid for bc. Neither of these sets generates
a conflict, and their cores are the same. However, their union, which is
A → c·, d/e
B → c·, d/e
generates a reduce/reduce conflict, since reductions by both A → c and B → c are called for on
inputs d and e.
4 Turing Machine
Σ⊆Γ
δ : Q × Γ → Q × Γ × {L, R}
qaccept ≠ qreject
2. L = {an bm | n, m ≥ 1, n 6= m}
3. L = {an bn cn | n ≥ 1}
7. L = {ai bj ck | i · j = k}
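A small single-tape TM simulator makes these machines concrete. The transition table below is one possible decider for L = {an bn | n ≥ 0} (mark an a with X, the matching b with Y, repeat); the machine itself is an illustrative assumption, not the one from the notes.

```python
def run_tm(delta, tape, start, accept, reject, blank="_", limit=10000):
    # delta: (state, symbol) -> (state, write, move), move in {"L", "R"}
    tape = list(tape) or [blank]
    state, head = start, 0
    for _ in range(limit):
        if state == accept:
            return True
        if state == reject:
            return False
        if head == len(tape):
            tape.append(blank)               # grow the tape on demand
        state, write, move = delta[state, tape[head]]
        tape[head] = write
        head = head + 1 if move == "R" else max(0, head - 1)
    raise RuntimeError("step limit exceeded")

A, REJ = "accept", "reject"
# q0 marks an a as X; q1 finds the matching b and marks it Y;
# q2 returns to the leftmost unmarked a; q3 checks that only Y's remain
delta = {
    ("q0", "a"): ("q1", "X", "R"), ("q0", "Y"): ("q3", "Y", "R"),
    ("q0", "_"): (A, "_", "R"),    ("q0", "b"): (REJ, "b", "R"),
    ("q1", "a"): ("q1", "a", "R"), ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "b"): ("q2", "Y", "L"), ("q1", "_"): (REJ, "_", "R"),
    ("q2", "a"): ("q2", "a", "L"), ("q2", "Y"): ("q2", "Y", "L"),
    ("q2", "X"): ("q0", "X", "R"),
    ("q3", "Y"): ("q3", "Y", "R"), ("q3", "_"): (A, "_", "R"),
    ("q3", "a"): (REJ, "a", "R"),  ("q3", "b"): (REJ, "b", "R"),
}
```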
• D is not Turing-decidable.
• D is Turing-recognizable.
Church’s Thesis: The Turing machine is equivalent in computing power to the digital computer.
3. AREX = {< R, w > | R is a regular expression that generates w}. (Theorem 4.3, TEXT)
4. EDFA = {< A > | A is a DFA such that L(A) = ∅}. (Theorem 4.4, TEXT)
5. EQDFA = {< A, B > | A and B are DFAs and L(A) = L(B)}. (Theorem 4.5, TEXT)
8. EQCFG = {< G, H > | G and H are CFGs and L(G) = L(H)}. (Not decidable)
Definition: A set A is countable if and only if either A is finite or A has the same size as N . That
is, there exists a bijection f : N → A.
3. The set of all strings over Σ is countable. (Proof: Corollary 4.18, TEXT)
5. The set of all binary sequences of infinite length is uncountable. (Proof: Corollary 4.18, TEXT)
6. The set of all languages over Σ is uncountable. (Proof: Corollary 4.18, TEXT)
Theorem 4.1 There exists a language that is not Turing-recognizable. (Corollary 4.18, TEXT)
5.1 ATM
Proof: Suppose ATM is decidable, and let H be a decider (i.e., H is a TM that decides ATM ). Thus,
H(< M, w >) = accept if M accepts w, and reject if M does not accept w.
Now, we construct a new TM D with H as a subroutine:
Given a TM M , D takes < M > as an input, and (1) runs H on input < M, < M >>, and (2) outputs
the opposite of what H outputs, i.e., if H accepts, then “reject” and if H rejects, then “accept.”
In summary,
D(< M >) = accept if M does not accept < M >, and reject if M accepts < M >.
What happens when we run D with its own description < D > as input? In that case, we get
D(< D >) = accept if D does not accept < D >, and reject if D accepts < D >.
That is, no matter what D does, it is forced to do the opposite, a contradiction. Thus, neither TM
D nor TM H can exist. Therefore, ATM is not Turing-decidable.
However, ATM is Turing-recognizable.
Theorem 5.2 A language is Turing-decidable if and only if it is Turing-recognizable and also
co-Turing-recognizable.
2. If R rejects, reject.
Theorem 5.4 (Theorem 5.2, TEXT) ETM is Turing-undecidable.
Proof: Suppose ETM is decidable. Let R be a decider. We then construct two TMs M1 and S that
take < M, w >, an input to ATM , and run as follows.
1. If x ≠ w, reject.
6 NP-Completeness
Let A and B be two decision problems. We say problem A is transformed to B using a transformation
algorithm f that takes IA (an arbitrary input to A) and computes f (IA ) (an input to B)
such that problem A with input IA is YES if and only if problem B with input f (IA ) is YES.
EXAMPLES:
• 3COLORABILITY to 4COLORABILITY
• SAT to 3SAT
• ···
Suppose A is a new problem for which we are interested in computing an upper bound, i.e., finding
an algorithm to solve A. Assume we have an algorithm ALGOB to solve B in O(nB ) time where
nB is the size of an input to B. We can then solve A using the following steps: (i) for an arbitrary
instance IA to A, transform IA to f (IA ) where f (IA ) is an instance to B; (ii) solve f (IA ) to B using
ALGOB ; (iii) if ALGOB taking f (IA ) as an input reports YES, we report IA is YES; otherwise,
NO.
Example:
U = {x1 , x2 , x3 , x4 }
C = {{x1 , x2 , x3 }, {x1 , x3 , x4 }, {x2 , x3 , x4 }, {x1 , x2 , x4 }}.
The input to SAT is also given as a well-formed formula w in conjunctive normal form (i.e.,
product-of-sums form):
Let x1 = T , x2 = F , x3 = F , x4 = T . Then, w = T .
Ans: yes
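Checking a truth assignment against a CNF formula is straightforward; a minimal sketch (the formula below is a hypothetical stand-in, not the instance above, since the complemented literals there are typeset with overbars):

```python
def satisfies(assignment, cnf):
    # cnf: list of clauses; a literal is (variable, is_positive);
    # a clause is satisfied when some literal evaluates to true
    return all(any(assignment[v] == pos for v, pos in clause)
               for clause in cnf)

# hypothetical instance: (x1 + ~x2 + x3)(~x1 + x3 + x4)
cnf = [[("x1", True), ("x2", False), ("x3", True)],
       [("x1", False), ("x3", True), ("x4", True)]]
```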
• 3SAT
• Not-All-Equal 3SAT: Each clause has at least one true literal and one false literal, i.e., not
all three literals can be true.
• One-In-Three 3SAT: Each clause has exactly one true literal and two false literals.
Definition:
Cook’s Theorem: Every problem in NP can be transformed to the Satisfiability problem deterministically
in polynomial time.
Note:
(ii) To prove that a new problem, say B, is NPC, we need to show (1) B is in NP and (2) some
known NPC problem, say B′ , can be transformed to B deterministically in polynomial time.
(By definition of B′ ∈ N P C, every problem in NP can be transformed to B′ in polynomial
time. As polynomial time transformation is transitive, it implies that every problem in NP
can be transformed to B in polynomial time.)
Theorem: P = N P if and only if some NP-complete problem can be solved in polynomial time deterministically.
Proof: If P = N P , it is clear that every problem in N P C belongs to P . Now assume that there is
a problem B ∈ N P C that can be solved in polynomial time deterministically. Then by definition
of B ∈ N P C, any problem in N P can be transformed to B in polynomial time deterministically,
which can then be solved in polynomial time deterministically using the algorithm for B. Hence,
N P ⊆ P . Since P ⊆ N P , we conclude that P = N P , which completes the proof of the theorem.
Problem Transformations:
Node Cover (Vertex Cover): Given a graph G = (V, E) and an integer k.
Objective: to find a subset S ⊆ V such that (i) for each (u, v) ∈ E, either u or v (or both) is in S,
and (ii) |S| ≤ k.
Hamiltonian Cycle: Given a graph G.
Objective: to find a simple cycle of G that goes through every vertex exactly once.
Hamiltonian Path: Given a graph G.
Objective: to find a simple path of G that goes through every vertex exactly once.
k-Colorability: Given a graph G = (V, E) and an integer k.
Objective: to decide if there exists a proper coloring of V (i.e., a coloring of vertices in V such that
no two adjacent vertices receive the same color) using k colors.
For example, let W = (x1 + x2 + x3 )(x1 + x2 + x3 )(x1 + x2 + x3 ). Then G is defined such that
V (G) = {x1 , x̄1 , x2 , x̄2 , x3 , x̄3 , p1 , q1 , r1 , p2 , q2 , r2 , p3 , q3 , r3 } and E(G) = {(x1 , x̄1 ), (x2 , x̄2 ), (x3 , x̄3 ),
(p1 , q1 ), (q1 , r1 ), (r1 , p1 ), (p2 , q2 ), (q2 , r2 ), (r2 , p2 ), (p3 , q3 ), (q3 , r3 ), (r3 , p3 ), (p1 , x1 ), (q1 , x2 ), (r1 , x3 ),
(p2 , x1 ), (q2 , x2 ), (r2 , x3 ), (p3 , x1 ), (q3 , x2 ), (r3 , x3 )}.
We now claim that there exists a truth assignment making W = T if and only if G has a node
cover of size k = n + 2m (n variables, m clauses).
To prove this claim, suppose there exists a satisfying truth assignment. We then construct a node cover S
such that xi ∈ S if xi = T and x̄i ∈ S if xi = F . Since at least one literal in each clause Cj must be
true, for each triangle pj , qj , rj we include in S the two nodes other than one attached to a true literal. Conversely, assume that
there exists a node cover of size n + 2m. We then note that exactly one of xi , x̄i for each 1 ≤ i ≤ n
must be in S, and exactly two of the nodes pj , qj , rj for each 1 ≤ j ≤ m must be in S. It is then easy
to see that S must be such that at least one node in each triangle pj , qj , rj for 1 ≤ j ≤ m is connected
to a node xi or x̄i in S, for some 1 ≤ i ≤ n. Hence we can find a truth assignment for W by assigning xi true
if xi ∈ S and false if x̄i ∈ S.