September 1

Notes on regular languages
The simplest language class is the set of regular languages, so let's begin with finite
automata, the machines that recognize languages in this class.
What is a finite automaton? Pictorially, the states of the machine are circles, and they
are generally labeled by subscripted q’s such as q0, q1, … . The start state is usually
indicated by an unlabeled arrow or a triangle pointing to it. By convention, the start
state is usually q0. The final or accepting states are indicated by concentric circles. We
accept a string if the machine ends up in an accepting state after examining all elements of the
string being tested. Transitions from one state to another are indicated by arrows
labeled by an alphabet symbol.
Suppose we want to recognize bit strings of even parity. Since parity is either even or
odd we will need two states—one corresponding to each possibility. Thus we begin
like this:
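[Figure: state q0 (start state, accepting) on the left and state q1 on the right, joined by an arrow labeled 1 in each direction.]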
The start state on the left corresponds to strings with even parity and thus is indicated
as an accepting state, while the one on the right is where odd parity strings would end
up. However, our alphabet contains two symbols, 0 and 1, and thus our machine
should also have transitions corresponding to an input of 0. Since a 0 doesn’t change
parity, we’ll use loops on both states so strings with 0’s can be handled. This gives us
the machine below:
[Figure: the same two states, now with a loop labeled 0 on each of q0 and q1.]
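The even-parity machine can also be written down as a transition table and run by a short program. This is only a sketch: the state names q0 and q1 follow the notes, while the dictionary encoding and the function name accepts are assumptions made for illustration.

```python
# Transition table for the two-state even-parity machine.
# q0 = even parity (start state, accepting), q1 = odd parity.
# The dictionary encoding is an assumption made for this sketch.
DELTA = {
    ("q0", "0"): "q0",  # a 0 does not change parity
    ("q0", "1"): "q1",  # a 1 flips even to odd
    ("q1", "0"): "q1",
    ("q1", "1"): "q0",  # a 1 flips odd back to even
}
START = "q0"
ACCEPTING = {"q0"}


def accepts(bits):
    """Run the machine over bits; accept if it stops in an accepting state."""
    state = START
    for symbol in bits:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


if __name__ == "__main__":
    for w in ["", "0", "1", "0110", "01010", "111"]:
        print(repr(w), "accepted" if accepts(w) else "rejected")
```

Note that the empty string is accepted, since the machine starts in the accepting state q0.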
Another example of the use of a finite automaton is in a lexical analyzer to recognize
key words such as “if”, “then”, and “while” when compiling a program.
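To make the key word example concrete, here is a minimal sketch of a recognizer for the three key words mentioned above. It builds the automaton as a tree of states, one per prefix of a key word. The function names build_automaton and is_keyword are hypothetical, and any missing transition is treated as a move to an implicit non-accepting dead state. A real lexical analyzer does considerably more than this.

```python
KEYWORDS = ["if", "then", "while"]


def build_automaton(words):
    """Build a transition table with one state per distinct prefix.

    State 0 is the start state. Returns (delta, accepting), where delta
    maps (state, character) to the next state and accepting is the set
    of states reached at the end of a key word.
    """
    delta = {}
    accepting = set()
    next_state = 1
    for word in words:
        state = 0
        for ch in word:
            if (state, ch) not in delta:
                delta[(state, ch)] = next_state
                next_state += 1
            state = delta[(state, ch)]
        accepting.add(state)
    return delta, accepting


def is_keyword(token, delta, accepting):
    """Run the automaton on token; a missing transition means rejection."""
    state = 0
    for ch in token:
        state = delta.get((state, ch))
        if state is None:  # implicit dead state
            return False
    return state in accepting


if __name__ == "__main__":
    delta, accepting = build_automaton(KEYWORDS)
    for token in ["if", "then", "while", "whil", "iff", ""]:
        print(repr(token), is_keyword(token, delta, accepting))
```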
Another way that regular languages are characterized is by grammars. A finite
automaton recognizes strings in a language: we run the string through the machine and
see what kind of state it stops in. Grammars generate strings in a language. That is, if
you follow the rules in the grammar you will produce a string that is in the language.
We’ll give the formal definition of a grammar later, but let’s construct one for the even
parity language. We will need two variables corresponding to the two states in the
machine above. We’ll use S and A corresponding to q0 and q1, respectively. The
grammar below will generate even parity strings. The generation stops when the only
symbols remaining in the string are alphabet symbols (terminals).
S → 0S
S → 1A
S → λ
A → 0A
A → 1S
Using the S → λ rule stops the derivation because all variables have been removed.
Here’s a sample derivation of an even parity string:
S ⇒ 0S ⇒ 01A ⇒ 010A ⇒ 0101S ⇒ 01010S ⇒ 01010
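A derivation like this one can be replayed mechanically. The sketch below assumes the five productions of the even-parity grammar (with the empty Python string standing for λ) and chooses, at each step, the production of the current variable that produces the next symbol of the target string. The names PRODUCTIONS and derive are hypothetical.

```python
# Productions of the even-parity grammar; "" stands for lambda (S -> λ).
PRODUCTIONS = {
    "S": ["0S", "1A", ""],
    "A": ["0A", "1S"],
}


def derive(target):
    """Return the list of sentential forms in a derivation of target.

    Raises ValueError if the grammar cannot generate target.
    """
    prefix, variable = "", "S"
    steps = ["S"]
    for symbol in target:
        for rhs in PRODUCTIONS[variable]:
            # Every non-lambda rule has the shape terminal + variable.
            if rhs and rhs[0] == symbol:
                prefix += symbol
                variable = rhs[1]
                steps.append(prefix + variable)
                break
        else:
            raise ValueError("no production yields symbol %r" % symbol)
    # The derivation can only stop by erasing the variable with a lambda rule.
    if "" not in PRODUCTIONS[variable]:
        raise ValueError("%r is not generated by the grammar" % target)
    steps.append(prefix)
    return steps


if __name__ == "__main__":
    print(" ⇒ ".join(step or "λ" for step in derive("01010")))
```

Because only S has a λ rule, a string with an odd number of 1’s leaves the derivation stuck at the variable A, which is how the grammar mirrors the non-accepting state q1.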
Note that a grammar is not unique and there may be other grammars that generate the
same language.
Now, for the formal definition of a grammar:
A grammar G is defined as a quadruple G = (V, T, S, P) where
V is a finite set of objects called variables,
T is a finite set of objects called terminal symbols,
S ∈ V is a special symbol called the start variable,
P is a finite set of productions or rules to generate strings in the language. The
rules in P have the form α → β where α and β are strings of symbols from V and T.
For regular languages, the left hand side of a production is always a single variable. A
common convention is to use lower case letters near the front of the (English) alphabet
and digits as the terminal symbols. Lower case letters near the end of the alphabet are
usually used to denote strings in a language. For example, the string we derived above
could be “named” w, i.e. w = 01010. V and T are disjoint, and both sets are nonempty.
T is really the same thing as the alphabet Σ. We use a different notation because we are discussing
them in different ways. S is always an element of V. Other commonly used notation
includes T*, the set of all strings of (terminal) symbols from T. A string in
(V ∪ T)* may contain both variables and terminal symbols. This may also be referred to as
a sentential form.
If string w can be obtained by starting with S and applying the productions of the
grammar then we say that S derives w, denoted by S ⇒* w. (The * indicates this can be
done in 0 or more steps. We may also put a number above the double arrow indicating
the actual number of steps used in the derivation.) The language generated by the
grammar G is defined as L(G) = {w ∈ T* | S ⇒* w}.
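The definition of L(G) can be explored by brute force: start from S, apply productions in every possible way, and collect the strings that contain no variables. The sketch below does this for the even-parity grammar up to a length bound. It assumes that every symbol is a single character, that the left hand side of each production is a single variable, and that each right hand side holds at most one variable (this keeps the search finite). The function name language is hypothetical.

```python
from collections import deque
from itertools import product


def language(productions, start, max_len):
    """Return every terminal string of length <= max_len derivable from start.

    productions maps each variable to a list of right hand sides;
    "" stands for lambda. Assumes at most one variable per right hand side.
    """
    variables = set(productions)
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Find the leftmost variable in the sentential form, if any.
        index = next((i for i, s in enumerate(form) if s in variables), None)
        if index is None:
            words.add(form)  # no variables left: form is in L(G)
            continue
        for rhs in productions[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            terminals = sum(1 for s in new_form if s not in variables)
            # Prune forms that already hold too many terminal symbols.
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return words


EVEN_PARITY = {"S": ["0S", "1A", ""], "A": ["0A", "1S"]}

if __name__ == "__main__":
    words = language(EVEN_PARITY, "S", 3)
    print(sorted(words, key=lambda w: (len(w), w)))
```

Comparing the result with a direct test (an even number of 1’s) for all bit strings up to the same length is a quick sanity check that the grammar generates the intended language, at least for short strings.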
For more examples of grammars see examples 1.11, 1.12 and 1.13 in the text.
Let’s look briefly at example 1.12 and discuss why certain variations will not work.
The language is L = {a^n b^(n+1) | n ≥ 0}.
The grammar from the book is
S → Ab
A → aAb | λ
Observe that the following grammar also works: S → aSb | b
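To see why both grammars give the same strings, we can carry out the derivations by string replacement. The sketch below reads the book’s productions as S → Ab, A → aAb | λ and the alternative as S → aSb | b; in each case the recursive rule is applied n times before the derivation is stopped. The function names are hypothetical.

```python
def derive_book(n):
    """S => Ab, then A -> aAb applied n times, then A -> lambda."""
    form = "Ab"
    for _ in range(n):
        form = form.replace("A", "aAb")
    return form.replace("A", "")


def derive_alternative(n):
    """S -> aSb applied n times, then S -> b."""
    form = "S"
    for _ in range(n):
        form = form.replace("S", "aSb")
    return form.replace("S", "b")


if __name__ == "__main__":
    for n in range(5):
        expected = "a" * n + "b" * (n + 1)
        assert derive_book(n) == expected
        assert derive_alternative(n) == expected
        print(n, expected)
```

In both grammars each use of the recursive rule adds one a and one b, and the extra b comes from a rule that is used exactly once.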
If we wanted all strings with an equal number of a’s and b’s with all a’s preceding the b’s,
we could replace the production S → b in the second grammar by S → λ. What does
the following grammar generate? S → aSb | Sb | λ
Finally, let’s look at exercise 12 in section 1.2. What does the following grammar
generate?
S → aA | λ
A → bS
(We’ll discuss the answers to these in class.)
Think about how you could construct a grammar that generates all strings with an equal
number of a’s and b’s. There is no restriction on the order of the a’s and b’s.
Here’s a table that shows grammar characteristics and machines that accept the
different language classes.
language class          grammar characteristics                           machine

regular                 A → wB | w where w ∈ T* (i.e. a string of         finite automaton
                        terminal symbols).
                        The lhs is a single variable. If there is a
                        variable on the rhs, it is at the extreme right.
                        e.g. S → aA

context-free            A → α where α ∈ (V ∪ T)*. That is, the rhs is     pushdown automaton
                        a string of variables and terminals.
                        e.g. S → aSb

context-sensitive       α → β where α, β ∈ (V ∪ T)*. That is, both the    linear-bounded automaton (LBA)
                        lhs and rhs are strings of variables and
                        terminals, and 1 ≤ |α| ≤ |β|.
                        e.g. aA → bBS

recursively enumerable  unrestricted or phrase-structured                 Turing machine
                        α → β where α, β ∈ (V ∪ T)*
                        No restrictions on α or β.
In addition to finite automata and grammars, regular expressions are also used to
represent regular languages. For example, for the language discussed in class, the
set of odd length strings over {a, b} ending in b, one possible regular expression is
[(a+b)(a+b)]*b. A * means to use the expression 0 or more times. We’ll look at regular
expressions in more detail later.
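The regular expression can be checked against the description of the language with Python’s re module. Note the change of notation, which is an artifact of the tool and not of the notes: the union written + above is written | in Python, and fullmatch is used so that the whole string must match. The function names matches and by_definition are hypothetical.

```python
import re
from itertools import product

# [(a+b)(a+b)]*b in the notation of the notes.
ODD_ENDING_IN_B = re.compile(r"((a|b)(a|b))*b")


def matches(w):
    """True if the whole string w matches the regular expression."""
    return ODD_ENDING_IN_B.fullmatch(w) is not None


def by_definition(w):
    """True if w has odd length and ends in b."""
    return len(w) % 2 == 1 and w.endswith("b")


if __name__ == "__main__":
    # Compare the two tests on every string over {a, b} up to length 7.
    for n in range(8):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            assert matches(w) == by_definition(w)
    print("expression and definition agree on all strings up to length 7")
```

The starred part matches any even length string, so following it with a single b gives exactly the odd length strings that end in b.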
Some simple examples of context-free languages that are not regular are the set of
strings with an equal number of a’s and b’s or strings of the form a^i b^j c^(i+j).