Linear languages

A CFG is linear if no right side of a production has more than one instance of a variable. Thus, all regular grammars are linear grammars. A linear language is a language generated by a linear grammar. To show where these fit with the other languages we've looked at, we'll prove two things:

1) There is a linear language that is not regular.
2) There is a CFL that is not linear.

We can simplify linear grammars by removing $\varepsilon$-productions, unit productions and useless symbols in exactly the same way as done for context-free languages.

To show 1), consider the following grammar:

$S \to aB \mid \varepsilon$
$B \to Sb$

This grammar generates $\{a^n b^n \mid n \ge 0\}$, which is not regular. We'll hold off on the second proof until after we talk about the pumping lemma for linear languages.

Pumping lemma: Let $L$ be a linear language. Then there exists an integer $m$ such that if $w \in L$ and $|w| \ge m$, then $w$ can be written as $w = uvxyz$ in such a way that $|uvyz| \le m$, $|vy| \ge 1$, and for every $i \ge 0$, $uv^i x y^i z \in L$. Notice that this differs from the other pumping lemmas in that there is no bound on the length of $x$; instead, the total length of $uvyz$ is at most $m$.

Now, consider $L = \{w \mid n_a(w) = n_b(w)\}$. Assume $L$ is linear and let $m$ be the integer from the pumping lemma. Choose $w = a^m b^{2m} a^m$. Clearly, $w \in L$. Then $w$ can be written $w = uvxyz$ such that $|uvyz| \le m$ and $|vy| \ge 1$. Since $|uvyz| \le m$, the strings $u$ and $v$ lie within the leading $a^m$ and the strings $y$ and $z$ lie within the trailing $a^m$, so we must pump a's: $v = a^k$ and $y = a^j$ with $k + j \ge 1$. Let $i = 2$. Then $uv^2xy^2z$ has the form $a^{m+k} b^{2m} a^{m+j}$, which is not in $L$ since at least one of $j$, $k$ must be greater than or equal to 1, giving us more a's than b's.

Closure Properties of Linear Languages

Let $L_1$ and $L_2$ be linear languages generated by linear grammars $G_1$ and $G_2$, respectively, with start symbols $S_1$ and $S_2$.

Closure under union: This is basically the same proof as for CFLs. Rename variables if necessary so that $G_1$ and $G_2$ have no variables in common. Let $S_3$ be a new variable which is the start symbol of grammar $G_3$, which contains all productions of both $G_1$ and $G_2$ plus the additional productions $S_3 \to S_1 \mid S_2$. Each new production has a single variable on its right side, so $G_3$ is linear.
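The definition of a linear grammar and the union construction above can be tried out on small examples. The following is a minimal sketch, assuming a grammar is stored as a Python dict from variables to lists of right-hand sides (tuples of symbols, with the empty tuple standing for $\varepsilon$). The names `is_linear`, `generate`, and `union` are illustrative, not taken from any library, and the second grammar `G2` (for $\{a^n b^{2n}\}$) is a hypothetical example chosen only to have something to union with.

```python
from collections import deque

def is_linear(grammar):
    """True if no right-hand side contains more than one variable."""
    return all(
        sum(sym in grammar for sym in rhs) <= 1
        for rhss in grammar.values()
        for rhs in rhss
    )

def generate(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`.

    Breadth-first search over sentential forms; forms with more than
    max_len terminals are pruned, and a `seen` set prevents revisiting.
    """
    seen = {(start,)}
    queue = deque(seen)
    words = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, if any.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            words.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            terminals = sum(s not in grammar for s in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

def union(g1, s1, g2, s2, new_start="S3"):
    """Union construction; assumes g1 and g2 share no variable names
    and that new_start is not already used."""
    g3 = {**g1, **g2}
    g3[new_start] = [(s1,), (s2,)]
    return g3, new_start

# The grammar from the text: S -> aB | epsilon, B -> Sb
G1 = {"S": [("a", "B"), ()], "B": [("S", "b")]}
# Hypothetical second linear grammar: T -> aTbb | epsilon
G2 = {"T": [("a", "T", "b", "b"), ()]}
G3, S3 = union(G1, "S", G2, "T")

print(is_linear(G1), is_linear(G3))
print(sorted(generate(G1, "S", 6), key=len))
```

Enumerating strings up to a length bound does not prove anything about the full language, but it is a quick sanity check that a grammar generates what it is claimed to generate.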

Closure under reversal: Again, this is shown in the same way as for CFLs, i.e. reverse the right-hand side of every production in the grammar. Reversing a right-hand side does not change the number of variables in it, so the resulting grammar is still linear.

Concatenation: Linear languages are not closed under concatenation. Let $L_1 = \{a^i b^i \mid i \ge 0\}$ and $L_2 = \{c^j d^j \mid j \ge 0\}$. Then $L_1L_2 = \{a^i b^i c^j d^j \mid i, j \ge 0\}$. To show this language is not linear, use the pumping lemma on the string $a^m b^m c^m d^m$. Since $|uvyz| \le m$, $v$ lies within the a's and $y$ lies within the d's, so we must pump either a's or d's. Write $v = a^k$ and $y = d^j$ with $k + j \ge 1$. Taking $i = 0$ we get $w_0 = a^{m-k} b^m c^m d^{m-j}$, that is, the number of a's is less than the number of b's or the number of d's is less than the number of c's, since at least one of $k$, $j$ must be greater than 0. Hence $w_0 \notin L_1L_2$.

The linear languages are not closed under Kleene closure either. Observe that $(L_1 \cup L_2)^* \cap a^*b^*c^*d^* = L_1L_2$, so closure under Kleene closure, together with closure under union and under intersection with a regular set (discussed below), would make $L_1L_2$ linear.

Since the context-free languages are closed under concatenation, $L_1L_2$ above is a CFL. Thus, we have the example for 2) above: $L_1L_2$ is a CFL that is not linear.

Substitution: Linear languages are not closed under substitution. If we replace an alphabet symbol by a whole language and look at a grammar for the new language, we will have grammar productions of the wrong type, i.e. with more than one variable on the right-hand side of the rule. Looking at a specific example, suppose $L = \{ab\}$. If we replace $a$ by $\{a^n b^n \mid n \ge 0\}$ and $b$ by $\{c^i d^i \mid i \ge 0\}$, we get exactly $L_1L_2$, which by the pumping lemma argument above is not linear. The natural grammar for the new language looks like this:

$S \to AB$
$A \to aAb \mid \varepsilon$
$B \to cBd \mid \varepsilon$

The production $S \to AB$ has two variables on its right-hand side.

Closure under homomorphism: If we replace each alphabet symbol by a string we will get another linear language. Let $L = L(G)$ where the alphabet $\Sigma = \{a, b\}$. If, in each production in $G$, we replace $a$ and $b$ by strings of terminals, then each production in the resulting grammar will still have at most one variable on its right-hand side.

Intersection: The linear languages are not closed under intersection. To see this, consider the languages $L_1 = \{a^i b^j c^n d^n \mid i, j, n \ge 0\}$ and $L_2 = \{a^k b^k c^i d^j \mid i, j, k \ge 0\}$, both of which are linear.
It should be clear that $L_1 \cap L_2 = \{a^k b^k c^n d^n \mid k, n \ge 0\}$, which we showed above is not linear.

Before finishing the closure properties, we look at PDAs that accept linear languages. A PDA is said to make a turn if it enters a sequence of IDs $(q_1, w_1, \gamma_1) \vdash (q_2, w_2, \gamma_2) \vdash (q_3, w_3, \gamma_3)$ where $|\gamma_2| > |\gamma_1|$ and $|\gamma_2| > |\gamma_3|$. That is, when looking at the stack, its height increases to a certain point and then the height begins to decrease again. A PDA $M$ is said to be a $k$-turn PDA if every word $w \in L(M)$ is accepted by a sequence of IDs making no more than $k$ turns. If $L$ is accepted by a finite-turn PDA it is said to be metalinear. A language is linear if and only if it is accepted by a one-turn PDA.

We now return to other operations and ask whether the linear languages are closed under those operations. Rather than doing a formal proof, we use the automaton characterization as the justification for the result.

Closure under intersection with a regular set: In the construction used to prove that the CFLs are closed under intersection with a regular set, we basically ran the PDA and the DFA in parallel, since only the PDA required the use of a stack. For a linear language $L$ there is a one-turn PDA that recognizes it. If we use this PDA in the construction for the intersection of $L$ and a regular language $R$, the stack behaves exactly as before, so the resulting machine is also a one-turn PDA and thus the intersection is a linear language.

Complement: The linear languages are not closed under complement. To see this, recall that they are closed under union but not under intersection. By De Morgan's law, $L_1 \cap L_2 = \overline{\overline{L_1} \cup \overline{L_2}}$, so closure under complement together with closure under union would give closure under intersection, which we showed above does not hold.

Thus, we know that the set of regular languages is a proper subset of the linear languages, and that the set of linear languages is a proper subset of the CFLs. A natural question to ask is where the DCFLs fit into this Venn diagram.
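Before turning to that question, the notion of a turn defined above can be made concrete by recording the stack height after every move and counting how often it switches from growing to shrinking. This is a rough sketch under stated assumptions: `count_turns` and `run_anbn` are hypothetical helper names, the simulated machine is the obvious push-then-pop PDA for $\{a^n b^n\}$ (the bottom-of-stack marker is ignored), and a plateau in height between a rise and a fall is treated as part of the same turn.

```python
def count_turns(heights):
    """Count switches from a growing stack to a shrinking stack."""
    turns = 0
    rising = False
    for prev, cur in zip(heights, heights[1:]):
        if cur > prev:
            rising = True
        elif cur < prev and rising:
            turns += 1
            rising = False
    return turns

def run_anbn(w):
    """Simulate a push-then-pop PDA for {a^n b^n}.

    Returns (accepted, heights), where heights[i] is the stack height
    after reading i input symbols.
    """
    stack = []
    heights = [0]
    popping = False
    for ch in w:
        if ch == "a" and not popping:
            stack.append("A")          # push one symbol per a
        elif ch == "b" and stack:
            popping = True             # once we pop, no more pushes
            stack.pop()
        else:
            return False, heights      # no move available: reject
        heights.append(len(stack))
    return not stack, heights

accepted, heights = run_anbn("aaabbb")
print(accepted, heights, count_turns(heights))
```

On every accepted nonempty input this machine makes exactly one turn, which matches the claim that $\{a^n b^n\}$ is linear. A height profile such as `[0, 1, 0, 1, 0]` would count as two turns.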

Consider languages we've looked at before. $L_1 = \{w \mid n_a(w) = n_b(w)\}$ is a DCFL but, as shown above with the pumping lemma, it is not linear. $L_2 = \{a^n b^n \mid n \ge 0\}$ and $L_3 = \{a^n b^{2n} \mid n \ge 0\}$ are both linear languages and both deterministic. However, the linear languages are closed under union and the DCFLs are not: $L_2 \cup L_3 = \{a^n b^k \mid k = n \text{ or } k = 2n\}$ is a linear language that is not a DCFL. Thus, neither class is a subset of the other.
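To see where the nondeterminism in $L_2 \cup L_3$ comes from, it can help to write the recognizer as two one-turn branches and accept if either succeeds. The sketch below is only an illustration of that guess, not a proof that no deterministic PDA exists. The function names are hypothetical, and the stack is modeled by its height alone since only one stack symbol is needed.

```python
def one_turn_branch(w, pushes_per_a):
    """Push `pushes_per_a` symbols for each a, then pop one per b.

    Accepts if the input has the form a...ab...b and the stack is
    empty at the end. The stack height rises, then falls: one turn.
    """
    height = 0
    popping = False
    for ch in w:
        if ch == "a" and not popping:
            height += pushes_per_a
        elif ch == "b" and height > 0:
            popping = True
            height -= 1
        else:
            return False
    return height == 0

def in_union(w):
    """Membership in {a^n b^n} U {a^n b^(2n)}: the machine must guess
    before reading any b whether to push one or two symbols per a."""
    return any(one_turn_branch(w, k) for k in (1, 2))

for w in ["aabb", "aabbbb", "aabbb"]:
    print(w, in_union(w))
```

Each branch on its own is deterministic and makes one turn. The choice between the branches has to be made while reading a's, before the number of b's is known, which is the intuition behind the union not being a DCFL.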