Linear languages

A CFG is linear if no right side of a production has more than one instance of a variable.
Thus, all regular grammars are linear grammars. A linear language is a language generated
by a linear grammar. To show where these fit with the other languages we’ve looked at, we'll
prove two things:
1) There is a linear language that is not regular
2) There is a CFL that is not linear.
We can simplify linear grammars by removing ε-productions, unit productions and useless
symbols in exactly the same way as done for context-free languages.
To show 1, consider the following grammar:
S → aB | ε
B → Sb
This grammar generates {a^n b^n | n ≥ 0}, which is not regular.
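The grammar above can be explored mechanically. The following Python sketch is only an illustration: it assumes a hypothetical encoding in which uppercase letters are variables, lowercase letters are terminals and the empty string stands for an ε right side. The names GRAMMAR, is_linear and generate are not from these notes. It checks the linearity condition and lists the short words the grammar derives.

```python
from collections import deque

# Assumed encoding: uppercase = variable, lowercase = terminal, "" = epsilon.
GRAMMAR = {"S": ["aB", ""], "B": ["Sb"]}

def is_linear(grammar):
    """True if no right side contains more than one variable."""
    return all(sum(ch.isupper() for ch in rhs) <= 1
               for bodies in grammar.values() for rhs in bodies)

def generate(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start.

    Intended for linear grammars, where each sentential form holds at most
    one variable, so the search space below is finite.
    """
    words, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        pos = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if pos is None:          # no variable left: a word of the language
            words.add(form)
            continue
        for rhs in grammar[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Terminals are never erased, so forms with too many can be pruned.
            if sum(ch.islower() for ch in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(words, key=lambda w: (len(w), w))

print(is_linear(GRAMMAR))          # True
print(generate(GRAMMAR, "S", 6))   # ['', 'ab', 'aabb', 'aaabbb']
```

The listing only samples the language up to a length bound; it does not replace the argument that the grammar generates exactly {a^n b^n}.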
We'll hold off on the second proof temporarily until after we talk about the pumping lemma for
linear languages.
Pumping lemma: Let L be a linear language. Then there exists an integer m such that if w ∈ L
and |w| ≥ m, then w can be written as w = uvxyz in such a way that |uvyz| ≤ m, |vy| ≥ 1 and for
every i ≥ 0, u v^i x y^i z ∈ L.
Notice that this differs from the other pumping lemmas in that there is no bound on the
length of x; instead, the total length of uvyz is at most m, so v lies near the beginning of w
and y near the end.
Now, consider L = {w | n_a(w) = n_b(w)}. Assume L is linear and let m be the integer from the
pumping lemma. Choose w = a^m b^(2m) a^m. Clearly, w ∈ L. Then w can be written w = uvxyz
such that |uvyz| ≤ m and |vy| ≥ 1. Since |uv| ≤ m and |yz| ≤ m, v lies in the leading a's and y
in the trailing a's, so we can only pump a's. Let i = 2. Then u v^2 x y^2 z has the form
a^(m+k) b^(2m) a^(m+j) with k = |v| and j = |y|, which is not in L: at least one of j, k must be
greater than or equal to 1, giving us more a's than b's.
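For small m the argument above can be checked by brute force. The sketch below is an illustration under assumptions, not part of the proof: the function names are hypothetical, and only the exponents 0 through max_i are tested. It enumerates every split w = uvxyz with |uvyz| ≤ m and |vy| ≥ 1 and keeps those for which all tested pumped strings stay in the language. One failing exponent is enough to rule a split out, so an empty result means no split survives.

```python
def in_L(w):
    """Membership in L = { w : n_a(w) = n_b(w) }."""
    return w.count("a") == w.count("b")

def surviving_splits(w, m, member, max_i=3):
    """Splits (u, v, x, y, z) of w with |uvyz| <= m and |vy| >= 1 for which
    u v^i x y^i z passes `member` for every i in 0..max_i."""
    n = len(w)
    found = []
    for p in range(min(m, n) + 1):                # p = |uv|
        for s in range(min(m - p, n - p) + 1):    # s = |yz|
            x = w[p:n - s]
            for lu in range(p + 1):               # lu = |u|
                for lz in range(s + 1):           # lz = |z|
                    u, v = w[:lu], w[lu:p]
                    y, z = w[n - s:n - lz], w[n - lz:]
                    if not (v or y):              # need |vy| >= 1
                        continue
                    if all(member(u + v * i + x + y * i + z)
                           for i in range(max_i + 1)):
                        found.append((u, v, x, y, z))
    return found

m = 3
w = "a" * m + "b" * (2 * m) + "a" * m
print(surviving_splits(w, m, in_L))   # [] : every admissible split fails
```

Running the same function on a word of a linear language, for example "aabb" with a membership test for {a^n b^n} and m = 4, does return splits such as ("a", "a", "", "b", "b"), as the lemma predicts.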
Closure Properties of Linear Languages
Let L1 and L2 be linear languages generated by grammars G1 and G2, respectively with start
symbols S1 and S2.
Closure under union: This is basically the same proof as for CFLs. Let S3 be a new variable
which is the starting symbol of grammar G3, which contains all productions of both G1 and G2
plus the additional production S3 → S1 | S2. The new production has a single variable on each
right side, so G3 is still linear.
Closure under reversal: Again this is shown in the same way as for CFLs, i.e. reverse the
right-hand side of every production in the grammar.
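Both constructions are simple transformations on the production set. The following Python sketch shows them under an assumed encoding (uppercase letters are variables, lowercase letters are terminals, "" is ε). The single-letter variable names and the function names are hypothetical stand-ins for S1, S2 and S3 in the text.

```python
def is_linear(grammar):
    """True if no right side contains more than one variable."""
    return all(sum(ch.isupper() for ch in rhs) <= 1
               for bodies in grammar.values() for rhs in bodies)

def union_grammar(g1, s1, g2, s2, new_start):
    """Add new_start -> s1 | s2 on top of all productions of g1 and g2.

    Assumes the two grammars use disjoint variable names and that
    new_start is unused; rename variables first if that is not the case.
    """
    if set(g1) & set(g2) or new_start in g1 or new_start in g2:
        raise ValueError("variable names must be disjoint")
    g3 = {new_start: [s1, s2]}
    g3.update(g1)
    g3.update(g2)
    return g3

def reverse_grammar(grammar):
    """Reverse the right side of every production."""
    return {var: [rhs[::-1] for rhs in bodies]
            for var, bodies in grammar.items()}

G1 = {"S": ["aSb", ""]}   # a^i b^i
G2 = {"T": ["cTd", ""]}   # c^j d^j
G3 = union_grammar(G1, "S", G2, "T", "U")
print(G3, is_linear(G3))
print(reverse_grammar(G1), is_linear(reverse_grammar(G1)))
```

Neither transformation can introduce a second variable into a right side, which is why linearity is preserved.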
Concatenation: Linear languages are not closed under concatenation.
Let L1 = {a^i b^i | i ≥ 0} and L2 = {c^j d^j | j ≥ 0}. Then L1L2 = {a^i b^i c^j d^j | i, j ≥ 0}.
To show this language is not linear, use the pumping lemma on the string w = a^m b^m c^m d^m.
Since |uvyz| ≤ m, v lies within the a's and y within the d's, so we must pump either a's or d's.
Pumping with exponent 0 gives w0 = a^(m-k) b^m c^m d^(m-j), where k = |v| and j = |y|. At least
one of k, j must be greater than 0, so the number of a's is less than the number of b's or the
number of d's is less than the number of c's, and w0 is not in L1L2. Linear languages are not
closed under Kleene closure either: the same argument applied to a^m b^m a^m b^m shows that
L1* is not linear.
Since the context-free languages are closed under concatenation, L1L2 above is a CFL.
Thus, we have the example for 2) above: L1L2 is a CFL that is not linear.
Substitution: Linear languages are not closed under substitution.
If we replace an alphabet symbol by a whole language and look at a grammar for the new
language, we will have grammar productions of the wrong type, i.e. with more than one variable
on the right-hand side of the rule. Looking at a specific example, suppose L = {ab}. If we
replace a by {a^n b^n | n ≥ 0} and b by {c^i d^i | i ≥ 0}, we get {a^n b^n c^i d^i | n, i ≥ 0},
which by the pumping lemma argument above is not linear.
The natural grammar for the new language is something like this:
S → AB
A → aAb | ε
B → cBd | ε
Closure under homomorphism: If we replace each alphabet symbol by a string we will get
another linear language. Let L = L(G) where the alphabet Σ = {a, b}. If, in each production in
G, we replace a and b by strings of terminals, then each of the rules in the resulting grammar
will still have at most one variable on its right-hand side.
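The construction can be written as a short Python sketch. It assumes an encoding in which uppercase letters are variables and lowercase letters are terminals; the function name and the particular homomorphism h are hypothetical examples. Images must be strings of lowercase letters so that the encoding stays valid.

```python
def apply_homomorphism(grammar, h):
    """Replace every terminal c in every right side by the string h[c].

    Variables (uppercase letters) are copied unchanged, so the number of
    variables per right side does not change.
    """
    return {
        var: ["".join(ch if ch.isupper() else h[ch] for ch in rhs)
              for rhs in bodies]
        for var, bodies in grammar.items()
    }

G = {"S": ["aB", ""], "B": ["Sb"]}   # generates a^n b^n
h = {"a": "xy", "b": ""}             # example homomorphism: a -> xy, b -> epsilon
print(apply_homomorphism(G, h))      # {'S': ['xyB', ''], 'B': ['S']}
```

The image grammar may contain new unit productions or ε-productions (here B → S), but these do not affect linearity.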
Intersection: The linear languages are not closed under intersection.
To see this, consider the languages L1 = {a^i b^j c^n d^n | i, j, n ≥ 0} and
L2 = {a^k b^k c^i d^j | i, j, k ≥ 0}. Both are linear, and it should be clear that
L1 ∩ L2 = {a^k b^k c^n d^n | k, n ≥ 0}, which we showed above is not linear.
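The claimed intersection can be sanity-checked on all short strings. The sketch below uses Python's re module to express membership in each language; the helper names are hypothetical, and the check only covers strings up to a chosen length.

```python
import re
from itertools import product

def in_L1(w):
    """a^i b^j c^n d^n: any a's and b's, then equally many c's and d's."""
    m = re.fullmatch(r"a*b*(c*)(d*)", w)
    return bool(m) and len(m.group(1)) == len(m.group(2))

def in_L2(w):
    """a^k b^k c^i d^j: equally many a's and b's, then any c's and d's."""
    m = re.fullmatch(r"(a*)(b*)c*d*", w)
    return bool(m) and len(m.group(1)) == len(m.group(2))

def in_target(w):
    """a^k b^k c^n d^n."""
    m = re.fullmatch(r"(a*)(b*)(c*)(d*)", w)
    return (bool(m) and len(m.group(1)) == len(m.group(2))
            and len(m.group(3)) == len(m.group(4)))

def agree_up_to(max_len):
    """True if L1 ∩ L2 and the target agree on every string of length <= max_len."""
    for n in range(max_len + 1):
        for letters in product("abcd", repeat=n):
            w = "".join(letters)
            if (in_L1(w) and in_L2(w)) != in_target(w):
                return False
    return True

print(agree_up_to(6))   # True
```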
Before finishing the closure properties, we look at PDAs that accept linear languages. A PDA
is said to make a turn if it enters a sequence of IDs (q1, w1, γ1) |-- (q2, w2, γ2) |-- (q3, w3, γ3)
where |γ2| > |γ1| and |γ2| > |γ3|. That is, when looking at the stack, its height increases to a
certain point and then begins to decrease again. A PDA M is said to be a k-turn PDA
if every word w ∈ L(M) is accepted by a sequence of IDs making no more than k turns.
If L is accepted by a finite-turn PDA it is said to be ultralinear. A language is linear if and
only if it is accepted by a one-turn PDA.
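As an illustration of a one-turn computation, here is a minimal Python simulation of a deterministic PDA for {a^n b^n}. It is a sketch under assumptions: the function name is hypothetical and the machine is hard-coded rather than given by a transition table. It reports whether the input is accepted and how many times the stack switched from growing to shrinking.

```python
def accepts_anbn(w):
    """Simulate a PDA for {a^n b^n}; return (accepted, number_of_turns)."""
    stack, turns, popping = [], 0, False
    for ch in w:
        if ch == "a" and not popping:
            stack.append("A")              # stack height increases
        elif ch == "b" and stack:
            if not popping:                # first pop after pushing: a turn
                popping, turns = True, turns + 1
            stack.pop()                    # stack height decreases
        else:
            return (False, turns)          # no move possible: reject
    return (not stack, turns)

print(accepts_anbn("aaabbb"))   # (True, 1)
print(accepts_anbn("abab"))     # (False, 1)
```

Once the machine has started popping it never pushes again, so no computation makes more than one turn.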
We now return to other operations and ask whether the linear languages are closed under
those operations. Rather than doing a formal proof we use the automaton characterization as
the justification for the result.
Closure under intersection with a regular set: If we look at the construction used in proving
the CFLs are closed under intersection with a regular set, we basically ran the PDA and the
DFA in parallel since only the PDA required the use of a stack. For a linear language L there
is a one-turn PDA that will recognize it. Thus, using this type of PDA to construct a PDA for
the intersection of L and a regular language R, the resulting machine will also be a one-turn
PDA and thus the intersection is a linear language.
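The parallel construction described above can be sketched in Python for one concrete case. Everything specific here is an assumption for illustration: the PDA is a hard-coded one-turn machine for {a^n b^n}, and the DFA (DELTA, with states "even" and "odd") accepts strings with an even number of a's. The point is that the DFA component only updates a state and never touches the stack, so the number of turns is unchanged.

```python
def run_product(w, delta, start, accepting):
    """Run a one-turn PDA for {a^n b^n} in lockstep with a DFA.

    Returns (accepted, number_of_turns) for the combined machine.
    """
    stack, turns, popping, q = [], 0, False, start
    for ch in w:
        if (q, ch) not in delta:
            return (False, turns)          # symbol outside the DFA alphabet
        q = delta[(q, ch)]                 # DFA step: finite control only
        if ch == "a" and not popping:
            stack.append("A")
        elif ch == "b" and stack:
            if not popping:
                popping, turns = True, turns + 1
            stack.pop()
        else:
            return (False, turns)
    # Accept only if both components accept.
    return (not stack and q in accepting, turns)

# Hypothetical DFA tracking the parity of the number of a's.
DELTA = {("even", "a"): "odd", ("odd", "a"): "even",
         ("even", "b"): "even", ("odd", "b"): "odd"}

print(run_product("aabb", DELTA, "even", {"even"}))   # (True, 1)
print(run_product("ab", DELTA, "even", {"even"}))     # (False, 1)
```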
Complement: The linear languages are not closed under complement. To see this, suppose they
were. Since the linear languages are closed under union, De Morgan's law (L1 ∩ L2 is the
complement of the union of the complements of L1 and L2) would make them closed under
intersection as well, contradicting the result above.
Thus, we know that the set of regular languages is a proper subset of the linear languages, and
that the set of linear languages is a proper subset of the CFLs. A natural question to ask is
where the DCFLs fit into this Venn diagram.
Consider two languages we've looked at before:
L1 = {w | n_a(w) = n_b(w)} is a DCFL but not linear.
L2 = {a^n b^n | n ≥ 0} and L3 = {a^n b^(2n) | n ≥ 0} are both linear languages and both
deterministic.
However, the linear languages are closed under union and the DCFLs are not: L2 ∪ L3 =
{a^n b^k | k = n or k = 2n} is a linear language that is not a DCFL. Thus, neither class is a
subset of the other.