MODELING COMPUTATION

10.1 Languages and Grammars

English grammar gives us the rules for how words may be combined into a valid sentence.

Syntax is the form of a sentence: the order in which words and phrases may appear and how they relate to one another.
Semantics is the underlying meaning of a sentence or construct.

Natural languages are spoken human languages such as English, Swedish, and Spanish. The syntax of natural languages is extremely complicated.
Formal languages are specified by a well-defined set of rules of syntax. Every programming language is an example.

Sec 10.1-2 Page 1 of 10

A subset of English can be defined using the following list of rules that describe how a valid sentence can be produced:

A sentence is composed of a noun phrase followed by a verb phrase
A noun phrase is an article - adjective - noun or it is an article - noun
A verb phrase is a verb - adverb or it is a verb
An article is a or the
An adjective is large or hungry
A noun is rabbit or mathematician
A verb is eats or hops
An adverb is quickly or wildly

CREATING A VALID SENTENCE:

sentence
noun phrase verb phrase
article adjective noun verb phrase
article adjective noun verb adverb
the adjective noun verb adverb
the large noun verb adverb
the large rabbit verb adverb
the large rabbit hops adverb
the large rabbit hops quickly

Valid: a hungry mathematician eats wildly
vs.
Not valid: the wildly mathematician eats hungry

PHRASE-STRUCTURE GRAMMARS

Definitions:
An alphabet (or vocabulary) V is a finite, nonempty set of elements called symbols.
A word (or sentence) over V is a string of finite length of elements of V.
The empty (or null) string, denoted by λ, is the string containing no symbols.
V* is the set of all words over V.
A language over V is a subset of V*.
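The definitions of V* and of a language can be tried out in a few lines of Python. This is only a sketch: the function names words_over and language, the alphabet {0, 1}, and the membership test "exactly one 1" are illustrative choices, and each symbol is assumed to be a single character.

```python
from itertools import count, islice, product

def words_over(alphabet):
    """Yield every word over `alphabet`, shortest first (the set V*)."""
    symbols = sorted(alphabet)
    for length in count(0):  # length 0 yields the empty string
        for letters in product(symbols, repeat=length):
            yield "".join(letters)

def language(alphabet, belongs):
    """Yield the words of V* accepted by `belongs`, i.e. a subset of V*."""
    return (w for w in words_over(alphabet) if belongs(w))

# V* is infinite, so only a finite prefix of the enumeration is taken.
first_words = list(islice(words_over({"0", "1"}), 7))
# ['', '0', '1', '00', '01', '10', '11']

# Hypothetical language: words containing exactly one 1.
single_one = list(islice(language({"0", "1"}, lambda w: w.count("1") == 1), 6))
# ['1', '01', '10', '001', '010', '100']
```

The empty string shows up as the first word because V* always contains it, whatever the alphabet.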
Examples:

Alphabets:  A1 = {a, b, c}        A2 = {0, 1}
symbols:    a, b, c               0, 1
words:      abba, bcbaa, aaa      0, 101, 11011

A1* = {λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, ...}
A2* = {λ, 0, 1, 00, 01, 10, 11, 000, 001, 010, ...}

A language over A1 =   // begin & end with a
{a, aa, aaa, aba, aca, aaaa, aaba, abaa, ...}
A language over A2 =   // a single 1 in them
{1, 01, 10, 001, 010, 100, 0001, 0010, ...}

As we noticed when we looked at the subset of English, the alphabet was divided into two groups: terminals (T) and nonterminals (N).
Terminals cannot be replaced by other symbols - examples were a, the, large, rabbit.
Nonterminals can be replaced by other symbols (and must be before the sentence is valid) - examples were article, noun, adverb.

There is a special member of the alphabet called the start symbol (S) with which we always begin. In the subset of English, it was the symbol sentence.

Productions are the rules that specify when we can replace a string from V* with another string. These rules are denoted by:
w0 → w1

Definition. A phrase-structure grammar G = (V, T, S, P) consists of an alphabet V, the set of terminals T ⊆ V, the start symbol S ∈ V, and a set of productions P. N, the nonterminals, is denoted by V - T. Every production in P must contain at least one nonterminal on its left side.

Let G = (V, T, S, P) where
V = {a, b, A, B, S}
T = {a, b}
S = the start symbol
P = {S → ABa, A → BB, B → ab, AB → b}
Then G is a phrase-structure grammar.

What words can we generate by the productions of a phrase-structure grammar? Of what words does the language consist?

Definition. Let G = (V, T, S, P) be a phrase-structure grammar. Let w0 = l z0 r ∈ V* and let w1 = l z1 r ∈ V*. If z0 → z1 ∈ P, w1 is directly derivable from w0, which is written w0 ⇒ w1.
If w0 ⇒ w1, w1 ⇒ w2, ..., wn-1 ⇒ wn, n ≥ 0, then wn is derivable from w0, denoted by w0 ⇒* wn.
The sequence of steps used to obtain wn from w0 is called a derivation.
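The two notions "directly derivable" and "derivable" can be sketched in Python for the grammar G above. The function names and the max_len cut-off are assumptions made for this illustration, and every symbol is taken to be a single character. Because G contains the shortening production AB → b, a length cut-off could in general miss words; here no string derivable from S is longer than 7 symbols, so a cut-off of 10 loses nothing.

```python
from collections import deque

def direct_derivations(word, productions):
    """Return every string directly derivable from `word`:
    one occurrence of a left side z0 is replaced by its right side z1."""
    results = set()
    for left, right in productions:
        start = word.find(left)
        while start != -1:
            results.add(word[:start] + right + word[start + len(left):])
            start = word.find(left, start + 1)
    return results

def derivable_terminal_words(start, productions, terminals, max_len):
    """Breadth-first search over strings derivable from `start`.
    Strings longer than max_len are dropped (an assumed cut-off)."""
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if all(symbol in terminals for symbol in form):
            words.add(form)  # only terminals are left
        for nxt in direct_derivations(form, productions):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words

# Productions of G, written as (left side, right side) pairs.
P = [("S", "ABa"), ("A", "BB"), ("B", "ab"), ("AB", "b")]

one_step = direct_derivations("ABa", P)
# {'BBBa', 'Aaba', 'ba'}

words = derivable_terminal_words("S", P, {"a", "b"}, max_len=10)
# {'ba', 'abababa'}
```

Each pass through the loop applies a single production at a single position, which is exactly one step of a derivation.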
Example: Aaba is directly derivable from ABa since B → ab ∈ P.

abababa is derivable from ABa since
ABa ⇒ Aaba       since B → ab
    ⇒ BBaba      since A → BB
    ⇒ Bababa     since B → ab
    ⇒ abababa    since B → ab
Thus, we would say that ABa ⇒* abababa

Definition. Let G = (V, T, S, P) be a phrase-structure grammar. The language generated by G (or the language of G), L(G), is the set of all strings of terminals that are derivable from the start symbol S. In other words:
L(G) = { w ∈ T* | S ⇒* w }

Example. Let G = (V, T, S, P) where V = {S, A, a, b}, T = {a, b}, Start symbol is S, and P = {S → aA, S → b, A → aa}.
What is L(G)?
S ⇒ aA or S ⇒ b
from aA, we can derive aaa.
There are no other words, so L(G) = {b, aaa}

Example. Let G = (V, T, S, P) where V = {S, 0, 1}, T = {0, 1}, Start symbol is S, and P = {S → 11S, S → 0}.
What is L(G)?
S ⇒ 11S or S ⇒ 0
from 11S we can derive 110 and 1111S
from 1111S we can derive 11110 and 111111S
from 111111S we can derive 1111110 and 11111111S
Hence, L(G) = {0, 110, 11110, 1111110, ...}
or L(G) = { w | w consists of an even number of 1s followed by a single 0 }

Example. Give the phrase-structure grammar that generates {0^n 1^n | n = 0, 1, 2, ...}
When n = 0, we get the empty string, so we need the production S → λ.
When n is not 0, we must "pump out" an equal number of 0's at the beginning and 1's at the end of the string, so we need the production S → 0S1.
Thus, we have G = ({S, 0, 1}, {0, 1}, S, {S → λ, S → 0S1}).

Example. Find the phrase-structure grammar that generates {0^m 1^n | m and n are nonnegative integers}.
Notice how this differs from the previous example: m and n do not have to be equal. We must be careful not to allow a 1 before a 0, nor a 0 after a 1 has been generated.
Two solutions:
Solution 1: V1 = {S, 0, 1}, T1 = {0, 1}, S1 = S, P1 = {S → λ, S → 0S, S → S1}
Solution 2: V2 = {S, A, 0, 1}, T2 = {0, 1}, S2 = S, P2 = {S → λ, S → 0S, S → 1A, S → 1, A → 1A, A → 1}

Types of Phrase-Structure Grammars

Phrase-structure grammars (PSGs) are classified according to the types of productions that are allowed.
Type 0: no restrictions on its productions
Type 1: [aka context-sensitive] only productions of the form w1 → w2 where the length of w2 ≥ the length of w1, or of the form w1 → λ
Type 2: [aka context-free] only productions of the form w1 → w2 where w1 is a single symbol that is not a terminal symbol
Type 3: [aka regular] only productions of the form w1 → w2 with w1 = A and either w2 = aB or w2 = a, where A and B are nonterminal symbols and a is a terminal symbol, or with w1 = S and w2 = λ

Type 3 ⊆ Type 2 ⊆ Type 1 ⊆ Type 0

{0^m 1^n} is regular; {0^n 1^n} is context-free; {0^n 1^n 2^n} is context-sensitive.

A finite state machine M = (S, I, O, f, g, s0) consists of:
a finite set of states S
a finite input alphabet I
a finite output alphabet O
a transition function f that assigns to each state and input pair a new state
an output function g that assigns to each state and input pair an output
an initial state s0

A finite state machine can be described by a state table or a state diagram.
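As a rough illustration of the six components, the sketch below runs a small machine on an input string. The machine itself is hypothetical and does not come from these notes: it has two states, s0 and s1, and outputs 1 exactly when the current input and the previous input are both 1. The dictionaries TRANSITIONS and OUTPUTS play the roles of f and g and amount to a state table written out as pairs.

```python
# Hypothetical machine with S = {s0, s1}, I = {0, 1}, O = {0, 1}.
# TRANSITIONS is f: (state, input) -> new state.
TRANSITIONS = {
    ("s0", "0"): "s0", ("s0", "1"): "s1",
    ("s1", "0"): "s0", ("s1", "1"): "s1",
}
# OUTPUTS is g: (state, input) -> output.
OUTPUTS = {
    ("s0", "0"): "0", ("s0", "1"): "0",
    ("s1", "0"): "0", ("s1", "1"): "1",
}

def run_machine(f, g, s0, inputs):
    """Feed `inputs` to the machine one symbol at a time.
    Returns the final state and the string of outputs produced."""
    state, outputs = s0, []
    for symbol in inputs:
        outputs.append(g[(state, symbol)])  # output depends on state and input
        state = f[(state, symbol)]          # then move to the new state
    return state, "".join(outputs)

final_state, output = run_machine(TRANSITIONS, OUTPUTS, "s0", "0110111")
# final_state == 's1', output == '0010011'
```

Storing f and g as lookup tables keeps the code close to the definition: each table entry corresponds to one cell of a state table or one labeled arrow of a state diagram.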