CFG Properties

properties

Uploaded by

2022pgcsca042

Pumping Lemma for Context-Free Language

Let L be any CFL. Then there is a constant n, depending only on L, such that if z is in L and |z| >= n,
then we may write z = uvwxy such that

(1) |vx| >= 1,
(2) |vwx| <= n, and
(3) for all i >= 0, $uv^iwx^iy$ is in L.

Example 1: As an example, let us prove that L = $\{a^i b^i c^i \mid i \ge 1\}$ is not a CFL. Such proofs are
mostly by contradiction. Let us assume that L is a CFL, and let n be the constant associated with L, as in
the pumping lemma. Consider z = $a^n b^n c^n$. Its length is 3n, which is at least n. So, we must be able to
write z = uvwxy such that the properties of the pumping lemma hold. Now, vx cannot contain instances of
both a and c, as |vwx| <= n: if vx contained instances of both a and c, vwx would have to contain the
complete block $b^n$ as well, making the length of vwx greater than n. So, vx contains either instances
of only one symbol (a, b or c), or instances of only the pair (a,b) or only the pair (b,c). In the first
scenario, pumping v and x changes the count of only one symbol, whereas in the second scenario, pumping
v and x changes the counts of the symbols in the pair while keeping the count of the third symbol
unchanged. In either case, the pumped string (for example $uv^2wx^2y$) cannot be in L, a contradiction.
Hence, L is not a CFL.
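
The case analysis of Example 1 can be checked by brute force for small n. The following is only a sketch,
not a proof (the lemma's constant n is unknown, and only the exponents 0 and 2 are tried); the function
names in_L and every_split_fails are made up for this illustration. For z = $a^n b^n c^n$ it tries every
way of writing z = uvwxy with |vx| >= 1 and |vwx| <= n and checks that pumping breaks membership.

```python
def in_L(s):
    """Membership test for L = { a^i b^i c^i | i >= 1 }."""
    i = len(s) // 3
    return i >= 1 and s == "a" * i + "b" * i + "c" * i


def every_split_fails(z, n, in_lang, exponents=(0, 2)):
    """True if every split z = uvwxy with |vx| >= 1 and |vwx| <= n
    leaves the language for at least one of the given exponents."""
    m = len(z)
    for start in range(m + 1):                                # vwx = z[start:end]
        for end in range(start + 1, min(start + n, m) + 1):   # so 1 <= |vwx| <= n
            for a in range(start, end + 1):                   # v = z[start:a]
                for b in range(a, end + 1):                   # w = z[a:b], x = z[b:end]
                    u, v, w, x, y = z[:start], z[start:a], z[a:b], z[b:end], z[end:]
                    if not v and not x:                       # condition (1): |vx| >= 1
                        continue
                    if all(in_lang(u + v * i + w + x * i + y) for i in exponents):
                        return False                          # this split survives pumping
    return True


for n in range(1, 5):
    z = "a" * n + "b" * n + "c" * n
    print(n, every_split_fails(z, n, in_L))
```

For contrast, the same check on the context-free language $\{a^i b^i \mid i \ge 1\}$ with z = aabb and n = 2
finds a surviving split (u = a, v = a, w empty, x = b, y = b), as the lemma requires.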

Example 2: L = $\{a^i b^j \mid j = i^2\}$. Let us assume that L is a CFL, and let n be the constant of the
pumping lemma associated with L. Consider the string z = $a^n b^{n^2}$. Its length is greater than n. So,
it is possible to write z = uvwxy as in the lemma. Now, vx will contain either only a's, or only b's, or
both a's and b's.

In the first scenario, let vx = $a^m$ for some 0 < m <= n. Then for any k > 1, $uv^kwx^ky$ will be
$a^{mk+n-m} b^{n^2}$ (think here: vx has m a's, the rest of the string has n-m a's, and vx is being
pumped k times). For any 0 < m <= n, $mk + n - m \ne n$ as k > 1, so the number of b's is no longer the
square of the number of a's.

Similarly, if vx covers only b's, let vx = $b^m$ for some 0 < m <= n. Then $uv^kwx^ky$ will be
$a^n b^{mk+n^2-m}$, and the number of b's cannot be the square of the number of a's.

Now, let vx contain p a's and q b's for some p, q >= 1 (v must consist of a's only and x of b's only,
otherwise pumping puts a b before an a). Then the rest of the string has (n-p) a's and ($n^2$ - q) b's.
Pumping v and x by the same amount, say by k > 1, will make $uv^kwx^ky$ to be $a^{pk+n-p} b^{qk+n^2-q}$.
For this string to be in L, we must have

$(pk+n-p)^2 = p^2(k-1)^2 + n^2 + 2np(k-1) = qk + n^2 - q$, i.e., $p^2(k-1) + 2np = q$,

thus making q dependent on k, so the pumped string will not be in L for other values of k. (Think here:
suppose $uv^2wx^2y$ is in L; then by the above derivation $q = p^2 + 2np$. Then $uv^3wx^3y$ will have
3p+n-p = 2p+n a's and $3q+n^2-q = 2q+n^2 = 2p^2+4np+n^2$ b's, whereas $(2p+n)^2 = 4p^2+4np+n^2$, a
contradiction.) Hence, $uv^kwx^ky$ cannot be in L for all k > 1, a contradiction. Hence, L is not a CFL.
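
The arithmetic of the last case of Example 2 can be sanity-checked numerically. The sketch below only
counts symbols: it assumes vx holds p a's and q b's of z = $a^n b^{n^2}$, and the helper names
counts_after_pumping and survives are invented for this illustration. It confirms that a pair (p, q) may
survive pumping with k = 2, but never with both k = 2 and k = 3.

```python
def counts_after_pumping(n, p, q, k):
    """Numbers of a's and b's in u v^k w x^k y when vx holds p a's and q b's
    of z = a^n b^(n^2)."""
    return p * k + n - p, q * k + n * n - q


def survives(n, p, q, ks):
    """True if the pumped string has (#b) == (#a)^2 for every exponent k in ks."""
    return all(b == a * a
               for a, b in (counts_after_pumping(n, p, q, k) for k in ks))


# With n = 3 and p = 1, k = 2 forces q = p^2 + 2np = 7 ...
print(survives(3, 1, 7, ks=(2,)))
# ... but the same q no longer works once k = 3 is required as well.
print(survives(3, 1, 7, ks=(2, 3)))
```

Note that the check ignores the constraint |vwx| <= n; the argument in the text does not need it for this
case either.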

But there are certain languages whose non-context-freeness cannot be proved using the pumping lemma.
That is where Ogden’s lemma comes in.

Ogden’s Lemma:

Let L be a CFL. Then there is a constant n such that if z is any string in L, and we mark any n or
more positions of z, then we can write z = uvwxy, such that

(1) v and x together have at least one marked position,
(2) vwx has at most n marked positions, and
(3) for all i >= 0, $uv^iwx^iy$ is in L.

Observe that the pumping lemma is a special case of Ogden’s lemma, where we mark all positions of z.
Thus, if we can prove a certain language to be non-CFL using the pumping lemma, we can prove the same
using Ogden’s lemma, but the converse is not true.

As an example, L = $\{a^i b^j c^k \mid i \ne j, j \ne k, i \ne k\}$ is not a CFL, but this cannot be proved
using the pumping lemma; we need Ogden’s lemma for it. [Proving non-context-freeness using Ogden’s lemma
is not in the syllabus and is a bit advanced, but if you are interested, let me know.]

Theorem: Context-free languages are closed under union, concatenation and Kleene Closure.

Let $G_1 = (V_1, T_1, P_1, S_1)$ and $G_2 = (V_2, T_2, P_2, S_2)$ be two CFGs generating $L_1$ and $L_2$
respectively. We can assume $V_1 \cap V_2 = \emptyset$, because if they share some variable, we can rename
it, together with its rules, in either grammar. Now, the grammar
$G = (V_1 \cup V_2 \cup \{S\}, T_1 \cup T_2, P_1 \cup P_2 \cup \{S \to S_1 \mid S_2\}, S)$, where S belongs to
neither $V_1$ nor $V_2$, generates $L_1 \cup L_2$. The logic is: if x is some string in $L_1$, it is generated
by $G_1$, and G can first apply $S \to S_1$ followed by the exact sequence of rules used in $G_1$ for the
derivation of x. Thus x is in L(G) as well. Similarly, each string in $L_2$ can also be generated by G. The
reverse can be proved similarly, i.e., any derivation in G begins with either $S \to S_1$ or $S \to S_2$
(there are no other rules for S in G, as S is in neither $V_1$ nor $V_2$). In the first case, the string will
be in $L_1$ (the variable sets are disjoint, so only rules of $P_1$ can be used afterwards), and in the second
case, the string will be in $L_2$.

Similar logic shows that the grammar
$G = (V_1 \cup V_2 \cup \{S\}, T_1 \cup T_2, P_1 \cup P_2 \cup \{S \to S_1 S_2\}, S)$, where S belongs to
neither $V_1$ nor $V_2$, generates $L_1 L_2$.

The grammar $G = (V_1 \cup \{S\}, T_1, P_1 \cup \{S \to S_1 S \mid \epsilon\}, S)$, where S is not in $V_1$,
generates $L_1^*$.
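
The three constructions above can be written down directly. The following is a minimal sketch under an
assumed representation: the CFG class and the function names union, concatenation and kleene_star are
made up for this illustration, a production is a pair (head, body), a body is a tuple of symbols, and the
empty tuple stands for ε.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    variables: frozenset
    terminals: frozenset
    productions: tuple   # pairs (head, body); body is a tuple of symbols, () is epsilon
    start: str


def _check_fresh(s, *grammars):
    """The new start symbol must not clash, and variable sets must be disjoint."""
    seen = set()
    for g in grammars:
        if seen & g.variables:
            raise ValueError("rename variables first: variable sets must be disjoint")
        seen |= g.variables
    if s in seen:
        raise ValueError("new start symbol already in use")


def union(g1, g2, s="S"):
    _check_fresh(s, g1, g2)
    new = ((s, (g1.start,)), (s, (g2.start,)))            # S -> S1 | S2
    return CFG(g1.variables | g2.variables | {s}, g1.terminals | g2.terminals,
               g1.productions + g2.productions + new, s)


def concatenation(g1, g2, s="S"):
    _check_fresh(s, g1, g2)
    new = ((s, (g1.start, g2.start)),)                    # S -> S1 S2
    return CFG(g1.variables | g2.variables | {s}, g1.terminals | g2.terminals,
               g1.productions + g2.productions + new, s)


def kleene_star(g1, s="S"):
    _check_fresh(s, g1)
    new = ((s, (g1.start, s)), (s, ()))                   # S -> S1 S | epsilon
    return CFG(g1.variables | {s}, g1.terminals, g1.productions + new, s)


# Two small grammars with disjoint variables: a^i b^i (i >= 1) and c^j (j >= 1).
G1 = CFG(frozenset({"S1"}), frozenset("ab"),
         (("S1", ("a", "S1", "b")), ("S1", ("a", "b"))), "S1")
G2 = CFG(frozenset({"S2"}), frozenset("c"),
         (("S2", ("c", "S2")), ("S2", ("c",))), "S2")
print(union(G1, G2).productions)
```

The explicit disjointness check mirrors the assumption $V_1 \cap V_2 = \emptyset$ made in the proof; the
sketch raises an error instead of renaming variables automatically.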

Consider the following two grammars:

S → AB              S → CD
A → aAb | ab        C → aC | a
B → cB | c          D → bDc | bc

The grammar on the left generates $L_1 = \{a^i b^i c^j \mid i \ge 1, j \ge 1\}$ and the grammar on the right
generates $L_2 = \{a^i b^j c^j \mid i \ge 1, j \ge 1\}$. Their union, the language
L = $\{a^i b^j c^k \mid i = j \text{ or } j = k\}$ (with i, j, k >= 1), is generated by the following
grammar:

S → S1 | S2
S1 → AB
A → aAb | ab
B → cB | c
S2 → CD
C → aC | a
D → bDc | bc

Observe that the intersection of $L_1$ and $L_2$ is $\{a^i b^i c^i \mid i \ge 1\}$, which is not a CFL (we
proved this in Example 1). Thus,

Theorem: Context-free languages are not closed under intersection.


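Theorem: If L is a CFL and R is a Regular Set, then L ∩ R is a CFL.

Let the PDA $M_1 = (Q_1, \Sigma, \Gamma, \delta_1, q_0, Z_0, F_1)$ accept L by final state, and let the DFA
$M_2 = (Q_2, \Sigma, \delta_2, q_1, F_2)$ accept R. We can construct
$M' = (Q_1 \times Q_2, \Sigma, \Gamma, \delta, [q_0, q_1], Z_0, F_1 \times F_2)$ where for any $p \in Q_1$,
$q \in Q_2$, $a \in \Sigma$, $X \in \Gamma$, $\delta([p, q], a, X)$ contains $([p', q'], \gamma)$ if and only
if $\delta_1(p, a, X)$ contains $(p', \gamma)$ and $\delta_2(q, a) = q'$. If $M_1$ has ε-moves, they are
carried over with the DFA state unchanged: $\delta([p, q], \epsilon, X)$ contains $([p', q], \gamma)$ if and
only if $\delta_1(p, \epsilon, X)$ contains $(p', \gamma)$. Then M’ accepts the intersection of the languages
accepted by $M_1$ and $M_2$. Observe that M’ simulates both $M_1$ and $M_2$ simultaneously in each move.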
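
The transition function of M’ can be built mechanically from $\delta_1$ and $\delta_2$. Below is a minimal
sketch under an assumed encoding: the function name product_delta and the dictionary layout are made up
for this illustration, stack strings are plain Python strings, and the empty string "" as input symbol
denotes an ε-move of the PDA (during which the DFA component does not change).

```python
def product_delta(delta1, delta2, dfa_states):
    """Transition function of the product PDA.

    delta1: dict (p, a, X) -> set of (p2, gamma); a == "" denotes an epsilon move
    delta2: dict (q, a) -> q2 (DFA transitions)
    Returns a dict ((p, q), a, X) -> set of ((p2, q2), gamma).
    """
    delta = {}
    for (p, a, X), moves in delta1.items():
        for q in dfa_states:
            if a == "":
                q2 = q                      # epsilon move: the DFA stays where it is
            elif (q, a) in delta2:
                q2 = delta2[(q, a)]         # both machines read the same symbol a
            else:
                continue                    # no DFA transition recorded for (q, a)
            delta[((p, q), a, X)] = {((p2, q2), gamma) for p2, gamma in moves}
    return delta


# Toy fragments (assumptions for the demo): the PDA pushes an A per 'a' and may
# jump to state f on epsilon; the DFA tracks the parity of the number of a's.
delta1 = {("p", "a", "Z"): {("p", "AZ")}, ("p", "", "Z"): {("f", "Z")}}
delta2 = {("even", "a"): "odd", ("odd", "a"): "even"}
for key, value in product_delta(delta1, delta2, ["even", "odd"]).items():
    print(key, "->", value)
```

The start state and final states of M’ are then the pair of start states and the pairs of final states,
as in the construction above.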

Cocke-Younger-Kasami Algorithm (CYK Algorithm)

Given a grammar G=(V,T,P,S) in Chomsky Normal Form, and a string x of length n, this algorithm determines
whether x is in L(G). This is a dynamic programming algorithm, where the set $V_{ij}$ contains the
variables which can generate the substring of x starting at index i and of length j.

Consider the grammar

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

And let us consider the string x = baaba. We start with $V_{i1}$ for each i, i.e., for a given i,
which variables can generate the i-th symbol of x (the substring of x starting at index i and of length 1).
We can see $V_{11} = V_{41} = \{B\}$ and $V_{21} = V_{31} = V_{51} = \{A, C\}$.
Then, take $V_{ik}$, the variables which generate the substring starting at index i of length k (from i to
i+k-1), and $V_{i+k,j-k}$, the variables which generate the substring starting at index i+k of length j-k
(from i+k to i+k+(j-k)-1 = i+j-1). Concatenating the two substrings gives the substring from index i to
i+j-1, i.e., of length j.
Thus, to obtain $V_{ij}$, go through all values of k from 1 to j-1 and collect every variable A that has a
rule A → XY with $X \in V_{ik}$ and $Y \in V_{i+k,j-k}$.
So, as an example, to determine $V_{12}$, combine $V_{11}$ and $V_{21}$ (with i=1 and j=2 there is no other
possible value of k than 1). The possible right-hand sides are BA and BC. A generates the first one and S
generates the second one. So, $V_{12} = \{S, A\}$. Similarly,
$V_{22} = \{A \mid A \to XY, X \in V_{21} \text{ and } Y \in V_{31}\} = \{B\}$, as the possibilities are
{AA, AC, CA, CC} and only B generates CC. Similarly, $V_{32} = \{S, C\}$ and $V_{42} = \{S, A\}$.
Now, to determine $V_{13}$, we are searching for variables which can generate baa (the substring starting at
index 1 and of length 3). With i=1 and j=3, k can be 1 or 2. With k=1, check the concatenation of $V_{11}$
and $V_{22}$, which is {BB} (no variable generates BB), and with k=2, check the concatenation of $V_{12}$
and $V_{31}$, which is {SA, SC, AA, AC}, and no variable generates any of them either. So, $V_{13} = \emptyset$.
Similarly, we can find the other sets.

The trick is to arrange the sets in a triangular table. The row j=1 can be obtained trivially.

       i=1      i=2      i=3      i=4      i=5
j=1    {B}      {A,C}    {A,C}    {B}      {A,C}

Then fill up the next rows of the table as follows. Suppose we need to fill the i-th column of the j-th
row. Come down the i-th column from the 1st row towards the j-th row. At each step, when we are at row k
of the column, go k steps diagonally up and to the right from the target square (to column i+k, row j-k),
and check the concatenation of the two sets. Look at the next example.

       i=1      i=2      i=3      i=4      i=5
j=1    {B}      {A,C}    {A,C}    {B}      {A,C}
j=2    {S,A}    {B}      {S,C}    {S,A}
j=3    ∅        V23

Now, we are going to determine $V_{23}$ (column i=2, row j=3) using the trick. Take the first row of that
column, {A,C}, and go one step diagonally up and right from the target square, to {S,C}. Their
concatenation is {AS, AC, CS, CC}, and only B generates CC. Now, come down one step further along the
column, to {B}, and go one more step diagonally up and right, to {B}. No variable generates BB. So,
eventually, $V_{23} = \{B\}$.

We can fill up the rest of the table similarly.

       i=1      i=2      i=3      i=4      i=5
j=1    {B}      {A,C}    {A,C}    {B}      {A,C}
j=2    {S,A}    {B}      {S,C}    {S,A}
j=3    ∅        {B}      {B}
j=4    ∅        {S,A,C}
j=5    {S,A,C}

Now check if the start symbol is in $V_{1n}$, i.e., if the start symbol can generate the substring starting
at the first index and of length equal to the length of the string, i.e., the complete string. If it can,
the string is in the language, otherwise not. In this example, S is in $V_{15}$, and hence baaba is in the
language of the grammar.

begin
  for i = 1 to n do
    V_{i,1} = { A | A → a is a rule and a is the i-th symbol of x }
  for j = 2 to n do
    for i = 1 to n-j+1 do
      begin
        V_{i,j} = ∅
        for k = 1 to j-1 do
          V_{i,j} = V_{i,j} ∪ { A | A → BC is a rule, B ∈ V_{i,k} and C ∈ V_{i+k,j-k} }
      end
  if S ∈ V_{1,n} output “x is a member of the language”, otherwise output “x is not a member of the
  language”.
end
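
The pseudocode above translates almost line by line into Python. The sketch below is one possible
rendering, not a tuned implementation: the names cyk_table, cyk_accepts and RULES are made up here, and it
assumes every variable and terminal is a single character, so that a rule body is either a one-character
terminal string or a two-character string of variables. The table is a dictionary keyed by (i, j) with the
same 1-based meaning as $V_{ij}$ in the text.

```python
def cyk_table(rules, x):
    """Return V with V[(i, j)] = set of variables deriving the substring of x
    that starts at 1-based index i and has length j.

    rules: list of (head, body) for a grammar in Chomsky Normal Form, where body
    is a one-character terminal ("a") or two one-character variables ("BC").
    """
    n = len(x)
    V = {}
    for i in range(1, n + 1):                       # substrings of length 1
        V[(i, 1)] = {head for head, body in rules if body == x[i - 1]}
    for j in range(2, n + 1):                       # substring length
        for i in range(1, n - j + 2):               # start index
            V[(i, j)] = set()
            for k in range(1, j):                   # split into lengths k and j - k
                V[(i, j)] |= {head for head, body in rules
                              if len(body) == 2
                              and body[0] in V[(i, k)]
                              and body[1] in V[(i + k, j - k)]}
    return V


def cyk_accepts(rules, start, x):
    """True if the start symbol derives the whole (non-empty) string x."""
    return bool(x) and start in cyk_table(rules, x)[(1, len(x))]


# The grammar of the worked example.
RULES = [("S", "AB"), ("S", "BC"), ("A", "BA"), ("A", "a"),
         ("B", "CC"), ("B", "b"), ("C", "AB"), ("C", "a")]

table = cyk_table(RULES, "baaba")
for j in range(1, 6):
    print("j=%d" % j, [sorted(table[(i, j)]) for i in range(1, 6 - j + 1)])
print(cyk_accepts(RULES, "S", "baaba"))
```

The three nested loops give a running time proportional to $n^3$ times the number of rules. The empty
string is rejected outright, since a grammar in Chomsky Normal Form as used here has no ε-rule.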
