
MODELING COMPUTATION
10.1 Languages and Grammars
English grammar gives us the rules of how
words may be combined into a valid sentence.
Syntax is the form of a sentence: the order in
which words and phrases may appear and how
they relate to one another.
Semantics is the underlying meaning of a
sentence or construct.
Natural languages are spoken human
languages such as English, Swedish, Spanish,
etc.
The syntax of natural languages is
extremely complicated.
Formal languages are specified by a well-defined
set of rules of syntax. Every programming
language is an example.
Sec 10.1-2
Page 1 of 10
A subset of English can be defined using the
following list of rules that describe how a valid
sentence can be produced:
A sentence is composed of a noun phrase
followed by a verb phrase
A noun phrase is an article - adjective - noun or it is
an article - noun
A verb phrase is a verb - adverb or it is a verb
An article is a or the
An adjective is large or hungry
A noun is rabbit or mathematician
A verb is eats or hops
An adverb is quickly or wildly
CREATING A VALID SENTENCE:
sentence
noun phrase verb phrase
article adjective noun verb phrase
article adjective noun verb adverb
the adjective noun verb adverb
the large noun verb adverb
the large rabbit verb adverb
the large rabbit hops adverb
the large rabbit hops quickly
Valid: a hungry mathematician eats wildly
vs.
Not valid: the wildly mathematician eats hungry
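The rules above can be checked mechanically. The following is a minimal Python sketch, not part of the original notes; the names `matches` and `is_valid_sentence` are hypothetical, and it assumes words are separated by single spaces.

```python
from itertools import product

# Word classes taken from the rules above.
ARTICLES = {"a", "the"}
ADJECTIVES = {"large", "hungry"}
NOUNS = {"rabbit", "mathematician"}
VERBS = {"eats", "hops"}
ADVERBS = {"quickly", "wildly"}

# Allowed shapes of each phrase, as sequences of word classes.
NOUN_PHRASES = [(ARTICLES, ADJECTIVES, NOUNS), (ARTICLES, NOUNS)]
VERB_PHRASES = [(VERBS, ADVERBS), (VERBS,)]


def matches(words, pattern):
    """True if each word belongs to the word class at the same position."""
    return len(words) == len(pattern) and all(
        w in cls for w, cls in zip(words, pattern)
    )


def is_valid_sentence(text):
    """True if text is a noun phrase followed by a verb phrase."""
    words = text.split()
    return any(
        matches(words, np + vp)
        for np, vp in product(NOUN_PHRASES, VERB_PHRASES)
    )


print(is_valid_sentence("the large rabbit hops quickly"))         # True
print(is_valid_sentence("a hungry mathematician eats wildly"))    # True
print(is_valid_sentence("the wildly mathematician eats hungry"))  # False
```

The last sentence is rejected because "wildly" is not an adjective and "hungry" is not an adverb, so no combination of phrase shapes fits.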
PHRASE-STRUCTURE GRAMMARS
Definitions:
An alphabet (or vocabulary) V is a finite,
nonempty set of elements called symbols.
A word (or sentence) over V is a string of
finite length of elements of V.
The empty (or null) string, denoted by λ,
is the string containing no symbols.
V* is the set of all words over V.
A language over V is a subset of V*.
Examples:
Alphabets: A1 = {a, b, c}       A2 = {0, 1}
Symbols:   a, b, c              0, 1
Words:     abba, bcbaa, aaa     0, 101, 11011
A1* = {λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, ...}
A2* = {λ, 0, 1, 00, 01, 10, 11, 000, 001, 010, ...}
A language over A1 = // begin & end with a
{a, aa, aaa, aba, aca, aaaa, aaba, abaa, ...}
A language over A2 = // a single 1 in them
{1, 01, 10, 001, 010, 100, 0001, 0010, ...}
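As an illustration, the two example languages can be produced by listing the words of V* up to a fixed length and keeping those that satisfy the stated condition. This is only a sketch: the helper name `words_up_to` and the length bound 3 are assumptions made for the example.

```python
from itertools import product


def words_up_to(alphabet, max_len):
    """All words over alphabet with length 0..max_len, shortest first."""
    result = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            result.append("".join(letters))
    return result


A1 = ["a", "b", "c"]
A2 = ["0", "1"]

# Language over A1: words that begin and end with a.
begin_end_a = [
    w for w in words_up_to(A1, 3) if w.startswith("a") and w.endswith("a")
]

# Language over A2: words containing exactly one 1.
single_one = [w for w in words_up_to(A2, 3) if w.count("1") == 1]

print(begin_end_a)  # ['a', 'aa', 'aaa', 'aba', 'aca']
print(single_one)   # ['1', '01', '10', '001', '010', '100']
```

Note that `words_up_to` also returns the empty string (the word of length 0), which belongs to V* but to neither of these two languages.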
As we noticed when we looked at the subset
of English, the alphabet was divided into two
groups: terminals (T) and nonterminals (N).
Terminals cannot be replaced by other
symbols - examples were a, the, large, rabbit.
Nonterminals can be replaced by other
symbols (and must all be replaced before the
sentence is valid) - examples were article,
noun, adverb.
There is a special member of the alphabet
called the start symbol (S) with which we
always begin. In the subset of English, it was
the symbol sentence.
Productions are the rules that specify when
we can replace a string from V* with another
string. These rules are denoted by: w0 → w1
Definition. A phrase-structure grammar G =
(V, T, S, P) consists of an alphabet V, the set of
terminals T ⊂ V, the start symbol S ∈ V, and a
set of productions P. N, the set of nonterminals,
is V - T. Every production in P must
contain at least one nonterminal on its left side.
Let G = (V, T, S, P) where
V = {a, b, A, B, S}
T = {a, b}
S = the start symbol
P = {S → ABa, A → BB, B → ab, AB → b}
Then G is a phrase-structure grammar.
What words can we generate by the
productions of a phrase-structure grammar?
Of what words does the language consist?
Definition. Let G = (V, T, S, P) be a phrase-structure
grammar. Let w0 = lz0r ∈ V* and let
w1 = lz1r ∈ V*.
• If z0 → z1 ∈ P, w1 is directly derivable
from w0, which is written w0 ⇒ w1.
• If w0 ⇒ w1, w1 ⇒ w2, ..., w(n-1) ⇒ wn, n ≥ 0,
then wn is derivable from w0, denoted by
w0 ⇒* wn.
• The sequence of steps used to obtain wn
from w0 is called a derivation.
Example:
Aaba is directly derivable from ABa since
B → ab ∈ P.
abababa is derivable from ABa since
ABa ⇒ Aaba       since B → ab
    ⇒ BBaba      since A → BB
    ⇒ Bababa     since B → ab
    ⇒ abababa    since B → ab
Thus, we would say that ABa ⇒* abababa.
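The derivation above can be checked step by step. The sketch below is illustrative only: it assumes every symbol is a single character and represents each production w0 → w1 as a pair of strings; the function names are hypothetical.

```python
def direct_derivations(word, productions):
    """All words directly derivable from word: replace one occurrence
    of a left side z0 by the matching right side z1."""
    results = set()
    for left, right in productions:
        start = word.find(left)
        while start != -1:
            results.add(word[:start] + right + word[start + len(left):])
            start = word.find(left, start + 1)
    return results


def is_derivation(steps, productions):
    """True if each word in steps is directly derivable from the one before."""
    return all(
        b in direct_derivations(a, productions)
        for a, b in zip(steps, steps[1:])
    )


# Productions of the example grammar: S -> ABa, A -> BB, B -> ab, AB -> b.
P = [("S", "ABa"), ("A", "BB"), ("B", "ab"), ("AB", "b")]

print(direct_derivations("ABa", P) == {"BBBa", "Aaba", "ba"})           # True
print(is_derivation(["ABa", "Aaba", "BBaba", "Bababa", "abababa"], P))  # True
```

The first call shows that ABa has three directly derivable words, one for each production whose left side occurs in it.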
Definition. Let G = (V, T, S, P) be a phrase-structure
grammar. The language generated
by G (or the language of G), L(G), is the set of
all strings of terminals that are derivable from
the start symbol S. In other words:
L(G) = { w ∈ T* | S ⇒* w }
Example. Let G = (V, T, S, P) where
V = {S, A, a, b}, T = {a, b}, the start symbol is S,
and P = {S → aA, S → b, A → aa}. What is L(G)?
S ⇒ aA
or S ⇒ b
From aA, we can derive aaa. No other words
can be derived, so L(G) = {b, aaa}.
Example. Let G = (V, T, S, P) where
V = {S, 0, 1}, T = {0, 1}, the start symbol is S, and
P = {S → 11S, S → 0}. What is L(G)?
S ⇒ 11S
or S ⇒ 0
From 11S we can derive:
110 and 1111S
11110 and 111111S
1111110 and 11111111S
Hence, L(G) = {0, 110, 11110, 1111110, ...} or
L(G) = { w | w consists of an even number of
1s followed by a single 0}
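For small grammars such as the two above, L(G) can be explored by a breadth-first search over derivations. This is a sketch under a stated assumption: no production shortens a string, so anything longer than the chosen bound can be discarded. The function name `language_up_to` and the bounds 6 and 7 are hypothetical choices for the example.

```python
from collections import deque


def language_up_to(productions, terminals, start, max_len):
    """Terminal words of length <= max_len derivable from start.

    Assumes no production shortens a string, so any string longer
    than max_len can never lead back to a short enough word."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            words.add(form)  # only terminals left: a word of L(G)
            continue
        for left, right in productions:
            i = form.find(left)
            while i != -1:
                new = form[:i] + right + form[i + len(left):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(left, i + 1)
    return words


G1 = [("S", "aA"), ("S", "b"), ("A", "aa")]  # first example
G2 = [("S", "11S"), ("S", "0")]              # second example

print(sorted(language_up_to(G1, "ab", "S", 6), key=len))
# ['b', 'aaa']
print(sorted(language_up_to(G2, "01", "S", 7), key=len))
# ['0', '110', '11110', '1111110']
```

The search is only a bounded check: it lists the words up to the chosen length and does not by itself prove the general description of L(G).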
Example. Give a phrase-structure grammar
that generates {0^n 1^n | n = 0, 1, 2, ...}.
When n = 0, we get the empty string, so we
need the production S → λ.
When n is not 0, we must "pump out" an equal
number of 0's at the beginning and 1's at the
end of the string, so we need the production
S → 0S1.
Thus, we have G = ({S, 0, 1}, {0, 1}, S, {S → λ,
S → 0S1}).
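To see the "pumping" at work, the sketch below applies S → 0S1 exactly n times and then S → λ, recording each step of the derivation. The function name `derive_0n1n` is hypothetical, and λ is represented by Python's empty string.

```python
def derive_0n1n(n):
    """Apply S -> 0S1 n times, then S -> λ, recording every step."""
    form = "S"
    steps = [form]
    for _ in range(n):
        form = form.replace("S", "0S1")  # pump out one 0 and one 1
        steps.append(form)
    form = form.replace("S", "")         # S -> λ (the empty string)
    steps.append(form)
    return steps


print(derive_0n1n(3))
# ['S', '0S1', '00S11', '000S111', '000111']
```

Each application of S → 0S1 adds one 0 on the left and one 1 on the right, which is why the two counts always agree.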
Example. Find a phrase-structure grammar
that generates {0^m 1^n | m and n are
nonnegative integers}. Notice how this differs
from the previous example: m and n do not
have to be equal. We must be careful not to
allow a 0 after a 1 has been generated.
Two solutions:
V1 = {S, 0, 1}
T1 = {0, 1}
S1 = S
P1 = {S → λ, S → 0S, S → S1}
V2 = {S, A, 0, 1}
T2 = {0, 1}
S2 = S
P2 = {S → λ, S → 0S, S → 1A, S → 1, A → 1A,
A → 1}
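One way to gain confidence that both solutions generate the same language is to enumerate their words up to a small length and compare them with the words matching the pattern 0*1*. The sketch below does this for lengths up to 4. It relies on an assumption that holds for both grammars here: every intermediate string contains at most one nonterminal (written as an uppercase letter), so strings longer than the bound plus one can be dropped. The name `generated_words` is hypothetical.

```python
import re
from itertools import product


def generated_words(productions, start, max_len):
    """Terminal words of length <= max_len derivable from start.

    Assumes every intermediate string holds at most one nonterminal
    (an uppercase letter), so strings longer than max_len + 1 can be
    dropped without losing any word."""
    seen, stack, words = {start}, [start], set()
    while stack:
        form = stack.pop()
        if not any(ch.isupper() for ch in form):
            if len(form) <= max_len:
                words.add(form)
            continue
        for left, right in productions:
            i = form.find(left)
            while i != -1:
                new = form[:i] + right + form[i + len(left):]
                if len(new) <= max_len + 1 and new not in seen:
                    seen.add(new)
                    stack.append(new)
                i = form.find(left, i + 1)
    return words


# The empty string "" stands for λ.
P1 = [("S", ""), ("S", "0S"), ("S", "S1")]
P2 = [("S", ""), ("S", "0S"), ("S", "1A"), ("S", "1"), ("A", "1A"), ("A", "1")]

# All words over {0, 1} of length 0..4 with no 0 after a 1.
expected = {
    "".join(t)
    for n in range(5)
    for t in product("01", repeat=n)
    if re.fullmatch(r"0*1*", "".join(t))
}

print(generated_words(P1, "S", 4) == expected)  # True
print(generated_words(P2, "S", 4) == expected)  # True
```

Agreement up to length 4 is evidence, not a proof, that both grammars generate {0^m 1^n}.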
Types of Phrase-Structure Grammars
Phrase-structure grammars (PSGs) are classified
according to the types of productions that are
allowed.
Type 0: no restrictions on its productions
Type 1: [aka context-sensitive] only
productions of the form w1 → w2 where the
length of w2 ≥ length of w1, or of the form
w1 → λ
Type 2: [aka context-free] only productions
of the form w1 → w2 where w1 is a single
symbol that is not a terminal symbol.
Type 3: [aka regular] only productions of the
form w1 → w2 with w1 = A and either w2 = aB
or w2 = a, where A and B are nonterminal
symbols, and a is a terminal symbol, or
with w1 = S and w2 = λ
Type 3 ⊂ Type 2 ⊂ Type 1 ⊂ Type 0
{0^m 1^n} is regular;
{0^n 1^n} is context-free;
{0^n 1^n 2^n} is context-sensitive.
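The definitions above translate directly into a check on a grammar's productions. The following sketch returns the most restrictive type whose rule every production satisfies. The function name `grammar_type`, the single-character symbols, and the use of the empty string for λ are all assumptions of the example.

```python
def grammar_type(productions, nonterminals, start="S"):
    """Most restrictive type (3, 2, 1 or 0) that every production meets,
    following the definitions above. An empty right side stands for λ."""

    def is_type3(left, right):
        if left == start and right == "":
            return True                      # S -> λ
        if len(left) != 1 or left not in nonterminals:
            return False
        if len(right) == 1:
            return right not in nonterminals  # A -> a
        return (len(right) == 2 and right[0] not in nonterminals
                and right[1] in nonterminals)  # A -> aB

    def is_type2(left, right):
        return len(left) == 1 and left in nonterminals

    def is_type1(left, right):
        return right == "" or len(right) >= len(left)

    for number, rule in ((3, is_type3), (2, is_type2), (1, is_type1)):
        if all(rule(left, right) for left, right in productions):
            return number
    return 0


regular = [("S", ""), ("S", "0S"), ("S", "1A"), ("S", "1"), ("A", "1A"), ("A", "1")]
context_free = [("S", ""), ("S", "0S1")]
unrestricted = [("S", "ABa"), ("A", "BB"), ("B", "ab"), ("AB", "b")]

print(grammar_type(regular, {"S", "A"}))            # 3
print(grammar_type(context_free, {"S"}))            # 2
print(grammar_type(unrestricted, {"S", "A", "B"}))  # 0
```

This classifies a grammar, not a language. For instance, the productions S → λ, S → 0S, S → S1 form a type 2 grammar, yet the language they generate, {0^m 1^n}, is regular because a type 3 grammar for it also exists.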
A finite-state machine M = (S, I, O, f, g, s0)
consists of:
a finite set of states S
a finite input alphabet I
a finite output alphabet O
a transition function f that assigns to each
state and input pair a new state
an output function g that assigns to each state
and input pair an output
an initial state s0
A finite-state machine can be represented by a
state table or by a state diagram.
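A state table maps naturally onto two lookup tables, one for f and one for g. The sketch below simulates a small machine that is not taken from these notes: it is a hypothetical two-state example whose output is 1 exactly when the number of 1s read so far is odd. The names `run_machine`, `f`, and `g` are assumptions of the example.

```python
def run_machine(f, g, s0, inputs):
    """Feed inputs to the machine one symbol at a time.

    f and g are dictionaries keyed by (state, input), i.e. the two
    halves of a state table. Returns the output string and final state."""
    state = s0
    outputs = []
    for x in inputs:
        outputs.append(g[(state, x)])  # output for this state/input pair
        state = f[(state, x)]          # then move to the new state
    return "".join(outputs), state


# Hypothetical machine: s0 = even number of 1s so far, s1 = odd.
f = {("s0", "0"): "s0", ("s0", "1"): "s1",
     ("s1", "0"): "s1", ("s1", "1"): "s0"}
g = {("s0", "0"): "0", ("s0", "1"): "1",
     ("s1", "0"): "1", ("s1", "1"): "0"}

print(run_machine(f, g, "s0", "1101"))  # ('1001', 's1')
```

Here S = {s0, s1}, I = O = {0, 1}, and the dictionaries `f` and `g` together play the role of the state table.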