Chapter 4
Context-Free Languages

Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Using Grammar Rules to Define a Language
• Regular languages and FAs are too simple for many purposes
  – Using context-free grammars allows us to describe more interesting languages
  – Much high-level programming language syntax can be expressed with context-free grammars
  – Context-free grammars with a very simple form provide another way to describe the regular languages
• Grammars can be ambiguous
• We will study how derivations can be related to the structure of the string being derived

Introduction to Computation

• A grammar is a set of rules, usually simpler than those of English, by which strings in a language can be generated
• Consider the language AnBn = {a^n b^n | n ≥ 0}, defined using the recursive definition:
  – Λ ∈ AnBn
  – For every S ∈ AnBn, aSb ∈ AnBn
• Think of S as a variable representing an arbitrary element, and write these rules as
    S → Λ
    S → aSb
  (In the process of obtaining an element of AnBn, S can be replaced by either string)
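The recursive definition of AnBn translates almost literally into a membership test. The Python sketch below is only an illustration: the function name in_anbn is an assumption, and the empty Python string stands for the null string Λ.

```python
def in_anbn(x):
    """Return True if x is in AnBn = {a^n b^n | n >= 0}."""
    # Base case of the recursive definition: the null string is in AnBn.
    if x == "":
        return True
    # Recursive case: x must have the form aSb with S in AnBn.
    return len(x) >= 2 and x[0] == "a" and x[-1] == "b" and in_anbn(x[1:-1])


print(in_anbn("aaabbb"))
print(in_anbn("aab"))
```

Each recursive call strips one a from the front and one b from the back, mirroring one use of the rule that builds aSb from S.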
• If α and β are strings, and α contains at least one occurrence of S, then α ⇒ β means that β is obtained from α in one step, by using one of the two rules to replace a single occurrence of S by either Λ or aSb
• For example, we could write
    S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
  to describe a derivation of the string aaabbb
• We can simplify the rules by using the | symbol to mean “or”, so that the rules become
    S → Λ | aSb

Context-Free Grammars: Definitions and More Examples
• Definition: A context-free grammar (CFG) is a 4-tuple G = (V, Σ, S, P), where V and Σ are disjoint finite sets, S ∈ V, and P is a finite set of formulas of the form A → α, where A ∈ V and α ∈ (V ∪ Σ)*
  – Elements of Σ are terminal symbols, or terminals, and elements of V are variables, or nonterminals
  – S is the start variable, and elements of P are grammar rules, or productions
  – We use → for productions in a grammar and ⇒ for a step in a derivation
  – The notations α ⇒^n β and α ⇒* β refer to n steps and zero or more steps, respectively
• We will sometimes write ⇒_G to indicate a derivation in a particular grammar G
• α ⇒ β means that there are strings α1, α2, and γ in (V ∪ Σ)* and a production A → γ in P such that α = α1Aα2 and β = α1γα2
  – This is a single step in a derivation
• What makes the grammar context-free is that the production above, with left side A, can be applied wherever A occurs in the string (irrespective of the context; i.e., regardless of what α1 and α2 are)
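The one-step relation ⇒ can be computed directly from its definition. The sketch below is a minimal illustration under stated assumptions: every symbol is a single character, the productions are stored in a Python dict from each variable to its right sides, and the empty string stands for Λ. The names one_step, is_derivation, and ANBN are hypothetical.

```python
def one_step(alpha, productions):
    """Return every string beta with alpha => beta (one derivation step)."""
    results = set()
    for i, symbol in enumerate(alpha):
        # productions maps a variable A to the right sides gamma of A -> gamma.
        for gamma in productions.get(symbol, ()):
            # alpha = alpha1 A alpha2 becomes beta = alpha1 gamma alpha2.
            results.add(alpha[:i] + gamma + alpha[i + 1:])
    return results


def is_derivation(forms, productions):
    """Check that each string in forms follows from the previous one in one step."""
    return all(b in one_step(a, productions) for a, b in zip(forms, forms[1:]))


# Hypothetical encoding of S -> Λ | aSb, with "" standing for Λ.
ANBN = {"S": ["", "aSb"]}
print(sorted(one_step("aSb", ANBN)))
print(is_derivation(["S", "aSb", "aaSbb", "aaaSbbb", "aaabbb"], ANBN))
```

Because one_step never looks at the symbols surrounding the variable it replaces, the code also shows what "context-free" means in practice.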
• Definition: If G = (V, Σ, S, P) is a CFG, the language generated by G is
    L(G) = {x ∈ Σ* | S ⇒_G* x}
  (S is the start variable, and x is a string of terminals)
• A language L is a context-free language (CFL) if there is a CFG G with L = L(G)

• Consider AEqB = {x ∈ {a,b}* | n_a(x) = n_b(x)}
• Let’s develop a CFG for AEqB
• If x is a non-null string in AEqB then either x = ay, where y ∈ Lb = {z | n_b(z) = n_a(z) + 1}, or x = by, where y ∈ La = {z | n_a(z) = n_b(z) + 1}
  – We represent Lb by the variable B and La by the variable A
  – The productions so far are S → Λ | aB | bA
  – All we need now are productions for A and B

• If a string x ∈ La starts with a, then the remainder is a member of AEqB
• If it starts with b, the rest has two more a’s than b’s
• Observation: a string containing two more a’s than b’s must be the concatenation of two strings, each with one more a; similarly with a and b reversed
• The grammar resulting from these observations is
    S → Λ | aB | bA
    A → aS | bAA
    B → bS | aBB
  (Note: if A were the start variable, it would generate La)

• Theorem 4.9: If L1 and L2 are CFLs over Σ, then so are L1 ∪ L2, L1L2, and L1*
• Suppose G1 and G2 are CFGs that generate L1 and L2 respectively, and assume that they have no variables in common
• Suppose that S1 and S2 are the start variables. Su, Sc and Sk, the start variables of the new grammars, will be new variables.
  – Gu just adds the rules Su → S1 | S2 to G1 and G2
  – Gc just adds the rule Sc → S1S2 to G1 and G2
  – Gk just adds the rules Sk → Λ | SkS1 to G1

Regular Languages and Regular Grammars
• The three operations in Theorem 4.9 are the ones involved in the recursive definition of regular languages
• The “basic” regular languages over Σ, ∅ and {σ}, are easily seen to be CFLs
• Now we can prove by structural induction that every regular language over Σ is a CFL
• In fact, however, the CFG can be of a simpler form. Definition 4.13: A context-free grammar is regular if every production is of the form A → σB or A → Λ

• Theorem 4.14: For every language L ⊆ Σ*, L is regular if and only if L = L(G) for some regular grammar G
• Proof:
  – If L is a regular language, then there is an FA M = (Q, Σ, q0, A, δ) that accepts it
  – Define G = (V, Σ, S, P) by letting V be Q, S the initial state q0, and P the set containing the production T → aU for every transition δ(T, a) = U in M and the production T → Λ for every accepting state T of M
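The construction in this proof can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the slides: the FA is given as a dict mapping (state, symbol) pairs to states, a right side is a tuple of symbols, the empty tuple stands for Λ, and the names fa_to_regular_grammar, generates, and the two-state example machine are all hypothetical.

```python
def fa_to_regular_grammar(transitions, accepting):
    """Build the productions T -> aU and T -> Λ described in the proof."""
    productions = {}
    for (state, symbol), target in transitions.items():
        # One production T -> aU for every transition delta(T, a) = U.
        productions.setdefault(state, set()).add((symbol, target))
    for state in accepting:
        # One production T -> Λ (the empty tuple) for every accepting state T.
        productions.setdefault(state, set()).add(())
    return productions


def generates(productions, start, x):
    """Check whether the regular grammar derives x from the start variable."""
    current = {start}  # variables that can end the sentential form so far
    for symbol in x:
        current = {body[1]
                   for var in current
                   for body in productions.get(var, ())
                   if body and body[0] == symbol}
    # The derivation finishes by applying T -> Λ to the last variable.
    return any(() in productions.get(var, ()) for var in current)


# Hypothetical FA over {a, b} accepting the strings that end in b.
DELTA = {("q0", "a"): "q0", ("q0", "b"): "q1",
         ("q1", "a"): "q0", ("q1", "b"): "q1"}
GRAMMAR = fa_to_regular_grammar(DELTA, accepting={"q1"})
print(generates(GRAMMAR, "q0", "aab"))
print(generates(GRAMMAR, "q0", "aba"))
```

Tracking a set of variables in generates also covers grammars obtained from an NFA, where one variable may have several productions starting with the same terminal.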
• G is a regular grammar, and G generates the language that M accepts
  – For every x = a1a2…an, the transitions on these symbols that start at q0 end at an accepting state if and only if there is a derivation of x in G
• To prove the other direction we can start with a regular grammar G and reverse the construction to produce M
  – M may be an NFA, but it still accepts L(G), and it follows that L(G) is regular

Derivation Trees and Ambiguity
• So far we’ve been interested in what strings a CFG generates
• It is also useful to consider how a string is generated by a CFG
• A derivation may provide information about the structure of a string, and if a string has several possible derivations, one may be more appropriate than another
• We can draw trees to represent derivations

• The root node represents the start variable S
• Any interior node and its children represent a production A → α used in the derivation; the node represents A, and the children, from left to right, represent the symbols in α
• Each leaf node represents a symbol or Λ
• The string derived is read off from left to right, ignoring Λ’s
• Every derivation has exactly one derivation tree, but a tree can represent more than one derivation

• In a derivation, at each step some production is applied to some occurrence of a variable
• Consider a derivation that starts S ⇒ S + S. We could apply a production to either the first or second of the S’s, but the resulting trees would be the same
• When we talk about a string having several possible derivations, one being more appropriate, we are talking about derivations corresponding to different trees
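A derivation tree and the rule for reading off the derived string can be modelled with a small data structure. The sketch below is an assumption-laden illustration: the class name Node, the function yield_of, and the choice of the empty string as the label of a Λ leaf are not from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                   # variable, terminal, or "" for Λ
    children: list = field(default_factory=list)  # empty for leaf nodes


def yield_of(node):
    """Read the leaves from left to right; Λ leaves contribute nothing."""
    if not node.children:
        return node.symbol
    return "".join(yield_of(child) for child in node.children)


# Tree for the derivation S => aSb => aaSbb => aabb in the grammar S -> Λ | aSb.
TREE = Node("S", [
    Node("a"),
    Node("S", [Node("a"), Node("S", [Node("")]), Node("b")]),
    Node("b"),
])
print(yield_of(TREE))
```

The tree records which production was applied at each variable but not the order of the steps, which is why one tree can stand for several derivations.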
• We can distinguish between trivially different derivations and essentially different ones by specifying that in a derivation, we always choose the leftmost variable to expand
• Definition 4.16: A derivation in a CFG is a leftmost derivation (LMD) if, at each step, a production is applied to the leftmost variable-occurrence in the current string
  – A rightmost derivation (RMD) is defined similarly

• Theorem 4.17: If G is a CFG, then for any x ∈ L(G) these three statements are equivalent:
  – x has more than one derivation tree
  – x has more than one LMD
  – x has more than one RMD
• Proof: see book
• Definition 4.18: A CFG G is ambiguous if, for at least one x ∈ L(G), x has more than one derivation tree (or equivalently, according to Theorem 4.17, more than one LMD)

• A classic example of ambiguity is the dangling else
• In C, an if-statement can be defined by
    S → if ( E ) S | if ( E ) S else S | OS
  (where OS stands for “other statement”)
• Consider the statement
    if (e1) if (e2) f(); else g();
  – In C, the else should belong to the second if, but this grammar does not rule out the other interpretation
• The two derivation trees shown on the next slide show the two interpretations of a dangling else

• Clearly the grammar given is ambiguous, but there are equivalent grammars that allow only the correct interpretation
• Example:
    S → S1 | S2
    S1 → if ( E ) S1 else S1 | OS
    S2 → if ( E ) S | if ( E ) S1 else S2
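One way to compare the two if-statement grammars is to count the derivation trees each allows for the dangling-else statement. The Python sketch below is a hypothetical helper, not an algorithm from the slides. It assumes the input is already split into tokens, that E and OS are treated as single tokens, and that the grammar has no Λ-productions and no cycles of unit productions (otherwise the recursion would not terminate).

```python
from functools import lru_cache


def count_trees(grammar, start, tokens):
    """Count the derivation trees of the token sequence from the start variable."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        if symbol not in grammar:  # terminal: must match exactly one token
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(count_seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        first, rest = rhs[0], rhs[1:]
        if not rest:
            return count_symbol(first, i, j)
        total = 0
        # first covers tokens[i:k]; each remaining symbol needs at least one token.
        for k in range(i + 1, j - len(rest) + 1):
            left = count_symbol(first, i, k)
            if left:
                total += left * count_seq(rest, k, j)
        return total

    return count_symbol(start, 0, len(tokens))


IF = ("if", "(", "E", ")")
AMBIGUOUS = {"S": [IF + ("S",), IF + ("S", "else", "S"), ("OS",)]}
UNAMBIGUOUS = {
    "S": [("S1",), ("S2",)],
    "S1": [IF + ("S1", "else", "S1"), ("OS",)],
    "S2": [IF + ("S",), IF + ("S1", "else", "S2")],
}
# Token form of: if (e1) if (e2) f(); else g();
STATEMENT = IF + IF + ("OS", "else", "OS")
print(count_trees(AMBIGUOUS, "S", STATEMENT))
print(count_trees(UNAMBIGUOUS, "S", STATEMENT))
```

For this statement the first grammar yields two trees and the second yields one, matching the two interpretations of the dangling else and the single interpretation that C intends.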
• Consider the CFG G:
    S → S + S | S * S | (S) | a
• G generates simple algebraic expressions
• One reason for ambiguity is that the relative precedence of + and * hasn’t been specified: a+a*a could be interpreted as (a+a)*a or as a+(a*a)
• In fact, S → S + S causes ambiguity by itself, because a+a+a could be interpreted as either (a+a)+a or a+(a+a). Similarly for S → S * S
• We might try to correct both problems by using the productions
    S → S + T | T
    T → T * F | F
  (think of T as “term” and F as “factor”)

• * now has higher precedence than + (all the multiplications are performed within a term)
• By making the production S → S + T, not S → T + S, we make + associate to the left. Similarly for *
• We want parenthetical expressions to be evaluated first; this means we should consider such an expression to be part of a factor. The resulting unambiguous CFG generating L(G) is
    S → S + T | T
    T → T * F | F
    F → (S) | a
  (proofs of unambiguity and equivalence are both somewhat complicated)

Simplified Forms and Normal Forms
• Questions about the strings generated by a CFG are sometimes easier to answer if we know something about the form of the productions
  – For example, if we know that a grammar has no Λ-productions and no unit productions (A → B) we can deduce that no derivation of a string x can take more than 2|x| − 1 steps (see book for details). We could then, in principle, determine whether x can be derived by considering derivations no longer than this
• We show how to modify an arbitrary CFG to have no productions of either of these types

• Suppose we have the production A → BCDCB, and Λ can be derived from either B or C.
  If we get rid of Λ-productions, then the steps that replace B and C by Λ will no longer be possible, but we must still be able to get all the same non-null strings from A
• We must retain the production A → BCDCB but we should add A → CDCB, A → DCB, A → BDCB, and so on
• We will need to know what variables can derive Λ (we will call such a variable a nullable variable)

• Definition 4.26: A recursive definition of the set of nullable variables of G
  – If there is a production A → Λ then A is nullable
  – If A1, A2, …, Ak are nullable variables and there is a production B → A1A2…Ak, then B is nullable
• This leads immediately to an algorithm for identifying the nullable variables

• Theorem 4.27: For every CFG G = (V, Σ, S, P) the following algorithm produces a CFG G1 = (V, Σ, S, P1) having no Λ-productions for which L(G1) = L(G) – {Λ}
  – Identify the nullable variables in V and initialize P1 to P
  – For every production A → α in P, add to P1 every production obtained by deleting from α one or more variable-occurrences involving a nullable variable
  – Delete every Λ-production from P1, as well as every production of the form A → A

• The procedure we use to eliminate unit productions is similar
• We first identify pairs of variables (A, B) for which A ⇒* B (in this case we call B A-derivable); then for each such pair (A, B) and each nonunit production B → α, we add the production A → α
• Such pairs can be found as follows:
  – If A → B is a production, then B is A-derivable
  – If C is A-derivable and C → B is a production, then B is A-derivable
  – No other variables are A-derivable
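Definition 4.26 and the algorithm of Theorem 4.27 can be sketched as a fixed-point computation followed by a rewrite of the productions. The Python below is a minimal illustration under stated assumptions: productions are a dict from each variable to a set of right sides, each right side is a tuple of symbols, and the empty tuple stands for Λ. The function names and the sample grammar are hypothetical.

```python
from itertools import product


def nullable_variables(productions):
    """Compute the nullable variables by repeating the rules of Definition 4.26."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in productions.items():
            # all() is True for the empty tuple, which covers A -> Λ.
            if var not in nullable and any(
                    all(sym in nullable for sym in body) for body in bodies):
                nullable.add(var)
                changed = True
    return nullable


def remove_null_productions(productions):
    """Return productions without Λ-productions, following Theorem 4.27."""
    nullable = nullable_variables(productions)
    result = {}
    for var, bodies in productions.items():
        new_bodies = set()
        for body in bodies:
            # Each nullable occurrence is either kept or deleted.
            choices = [((sym,), ()) if sym in nullable else ((sym,),)
                       for sym in body]
            for pick in product(*choices):
                candidate = tuple(sym for part in pick for sym in part)
                # Drop Λ-productions and productions of the form A -> A.
                if candidate and candidate != (var,):
                    new_bodies.add(candidate)
        result[var] = new_bodies
    return result


# Hypothetical grammar built around the production A -> BCDCB from the text.
P = {"A": {("B", "C", "D", "C", "B")},
     "B": {("b",), ()},
     "C": {("c",), ()},
     "D": {("d",)}}
P1 = remove_null_productions(P)
print(sorted(nullable_variables(P)))
print(len(P1["A"]))
```

In the sample grammar B and C are nullable, so each of the four nullable occurrences in BCDCB can be kept or dropped independently, giving 16 productions for A that include the original one.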
• Theorem 4.28: For every CFG G = (V, Σ, S, P) without Λ-productions, the CFG G1 = (V, Σ, S, P1) produced by the following algorithm generates the same language as G and has no unit productions:
  – Initialize P1 to P, and for each A ∈ V, identify the A-derivable variables
  – For every such pair (A, B) and every nonunit production B → α, add the production A → α to P1
  – Delete all unit productions from P1

• Definition 4.29: A CFG is said to be in Chomsky normal form if every production is of one of these two types:
    A → BC (where B and C are variables)
    A → σ (where σ is a terminal)
• Theorem 4.30: For every context-free grammar G, there is another CFG G1 in Chomsky normal form such that L(G1) = L(G) – {Λ}
• The algorithm on the next slide shows how to generate G1

• The first step is to eliminate Λ-productions and unit productions
• The second step is to introduce for every terminal symbol σ a new variable X_σ and production X_σ → σ
• In every production whose right side has at least two symbols, replace every terminal by its new variable
• Replace a production like A → BACB by the productions A → BY1, Y1 → AY2, Y2 → CB, where Y1 and Y2 are new variables
• The resulting CFG is in Chomsky normal form
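The last two steps of the conversion can be sketched in Python. This is an illustration under stated assumptions rather than a definitive implementation: the input grammar is assumed to have no Λ-productions or unit productions already, right sides are tuples of symbols, any symbol that is not a key of the dict is a terminal, and the generated names X_σ and Y1, Y2, … are assumed not to clash with existing variables. The function name to_chomsky_normal_form is hypothetical.

```python
def to_chomsky_normal_form(productions):
    """Replace terminals in long right sides and split right sides longer than 2."""
    variables = set(productions)
    result = {}
    terminal_var = {}  # terminal sigma -> its new variable X_sigma

    def var_for(terminal):
        if terminal not in terminal_var:
            name = f"X_{terminal}"
            terminal_var[terminal] = name
            result[name] = {(terminal,)}  # the new production X_sigma -> sigma
        return terminal_var[terminal]

    counter = 0
    for var, bodies in productions.items():
        result.setdefault(var, set())
        for body in bodies:
            if len(body) == 1:
                result[var].add(body)  # A -> sigma is already in normal form
                continue
            symbols = [s if s in variables else var_for(s) for s in body]
            head = var
            while len(symbols) > 2:
                # A -> B1 B2 ... Bk becomes A -> B1 Y and Y -> B2 ... Bk.
                counter += 1
                fresh = f"Y{counter}"
                result.setdefault(head, set()).add((symbols[0], fresh))
                head, symbols = fresh, symbols[1:]
            result.setdefault(head, set()).add(tuple(symbols))
    return result


# The production A -> BACB from the text, plus hypothetical terminal productions.
G = {"A": {("B", "A", "C", "B"), ("a",)}, "B": {("b",)}, "C": {("c",)}}
CNF = to_chomsky_normal_form(G)
for head in sorted(CNF):
    print(head, sorted(CNF[head]))
```

The sketch creates a variable X_σ only for terminals that actually occur in a right side of length two or more; the slides introduce one for every terminal, which generates the same language.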
