Formal Languages Part 1 Including Regular Expressions Basic Concepts for

advertisement
Basic Concepts for
Symbols, Strings, and Languages
TDDD16 Compilers and interpreters
TDDB44 Compiler Construction
„ Alphabet
A finite set of symbols.
„ Example:
Formal Languages Part 1
Including Regular Expressions
Sb = { 0,1 }
Ss = { A,B,C,...,Z,Å,Ä,Ö }
Sr = { WHILE,IF,BEGIN,... }
binary alphabet
Swedish characters
reserved words
„ String
A finite sequence of symbols from an alphabet.
„ Example:
10011
KALLE
WHILE DO BEGIN
Peter Fritzson
IDA, Linköpings universitet, 2009.
Properties of Strings in Formal Languages
String Length, Empty String
Number of symbols in the string.
„ Example:
z
z
z
z
x arbitrary string, |x| length of the string x
|10011| = 5 according to Sb
|WHILE| = 5 according to Ss
|WHILE| = 1 according to Sr
„ Empty string
z
The empty string is denoted e, |e| = 0
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.2
Properties of Strings in Formal Languages
Concatenation, Exponentiation
„ Concatenation
„ Length of a string
z
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
from Sb
from Ss
from Sr
2.3
Substrings: Prefix, Suffix
z
Two strings x and y are joined together x•y = xy
„ Example:
z
x = AB, y = CDE produce x•y = ABCDE
z
|xy| = |x| + |y|
z
xy ≠ yx (not commutative)
zex=xe= x
„ String exponentiation
z
x0 = e
z
x1 = x
z
x2 = xx
z
xn = x•xn-1, n >= 1
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.4
Languages
„ A Language = A finite or infinite set of strings which can be
constructed from a special alphabet.
„ Example:
z
„ Alternatively: a subset of all the strings which can be
x = abc
constructed from an alphabet.
„ Prefix: Substring at the beginning.
z
Prefix of x: abc (improper as the prefix equals x), ab, a, e
„ Suffix: Substring at the end.
z
Suffix of x: abc (improper as the suffix equals x), bc, c, e
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.5
z
∅ = the empty language.
NB! {e} ≠ ∅.
„ Example: S = {0,1}
z
L1 = {00,01,10,11}
z
L2 = {1,01,11,001,...,111, ...} all strings which finish on 1
z
L3 = ∅
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
all strings of length 2
all strings of length 1 which finish on 01
2.6
1
Closure
Operations on Languages
Concatenation
„ S* denotes the set of all strings which can be constructed from
„ L, M are languages.
the alphabet
„ Concatenation operation • (or nothing) between languages
„ Closure types:
z
* = closure, Kleene closure
z
+ = positive closure
„ Example: S = {0,1}
z
S* = { e, 0,1,00,01,...,111,101,...}
z
S+ = S* – {e} = {0,1,00,01,...}
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.7
z
L•M = LM = {xy|x ∈ L and y ∈ M}
z
L{e} = {e}L = L
z
L∅ = ∅L = ∅
„ Example:
z
L ={ab,cd} M={uv,yz}
z
gives us: LM ={abuv,abyz,cduv,cdyz}
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.8
Exponents and Union of Languages
Closure of Languages
„ Exponents of languages
„ Closure
z L0
= {e}
z L1
=L
z
L* = L0 ∪ L1 ∪ ... ∪ L∞
„ Positive closure
z
L2 = L•L
z
Ln = L•Ln-1, n >= 1
„ Union of languages
L+ = L1 ∪ L2 ∪ ... ∪ L∞
z
L* = {e} ∪
LL* = L* – {e} , if e not in L
L+
„ Example: A = {a,b}
z
L, M are languages.
z
L ∪ M = {x| x ∈ L or x ∈ M}
z
Example:
L = {ab,cd} , M = {uv,yz}
z
gives us:
L ∪ M = {ab,cd,uv,yz}
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
z
2.9
z
A* = {e,a,b,aa,ab,ba,bb,...}
= All possible sequences of a and b.
„ A language over A is always a subset of A*.
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.10
Regular expressions
„ Regular expressions are used to describe simple languages,
Small Language Exercise
e.g. basic symbols, tokens.
z
Example: identifier = letter • (letter | digit)*
„ Regular expressions over an alphabet S denote a language
(regular set).
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.11
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.12
2
Rules for constructing regular expressions
„ S is an alphabet,
z
z
the regular expression r
describes the language Lr,
„ Examples: S = {a,b}
Regular expression r
Language Lr
z
1. r=a
e
{e}
Lr={a}
z
2. r=a*
Lr={e,a,aa,aaa, ...} = {a}*
{a}
z
3. r=a|b
Lr={a,b}={a} ∪ {b}
z
4. r=(a|b)*
Lr={a,b}*={e,a,b,aa,ab,ba,bb,aaa,aab,...}
a
a∈S
the regular expression s
corresponds to the language union: (s) | (t)
Ls, etc.
concatenation: (s).(t)
„ Each symbol in the alphabet S is
Regular Expression Language Examples
Ls ∪ Lt
Ls.Lt
repetition: (s)*
Ls *
z
5. r=(a*b*)*
Lr={a,b}*={e,a,b,aa,ab,ba,bb,aaa,aab,...}
repetition: (s)+
Ls +
z
6. r=a|ba*
Lr={a,b,ba,baa,baaa,...}={a or bai | i≥0}
a regular expression which
denotes {a}.
z * = repetition, zero or more
times.
Priorities
Highest
* +
.
Lowest
|
„ NB! {anbn | n>=0} cannot be described with regular expressions.
z r=a*b* gives us Lr={ai bj | i,j>=0} does not work.
z
z
+ = repetition, one or more
times.
z . concatenation can be left out
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.13
r=(ab)* gives us Lr={(ab)i | i>=0}={e,ab,abab, ... } does not work.
„ Regular expressions cannot ’’count’’ (have no memory).
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.14
Finite state Automata and Diagrams
State Transition Diagram
„ (Finite automaton)
„ state diagram (DFA) for banbm
„ Assume:
z
regular expression RU = ba+b+ = baa ... abb ... b
z
L(RU) = { banbm | n, m ≥ 1 }
start
A program which takes a string x and answers yes/no
depending on whether x is included in the language.
z
The first step in constructing a recognizer for the language
L(RU) is to draw a state diagram (transition diagram).
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.15
b
b
0
a
9
3
a
accepting state
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.16
Input and State Transitions
„ Start in the starting node 0.
„ Example of input: baab
„ Repeat until there is no more input:
„ Then accept when there is no
Read input.
z
Follow a suitable edge.
more input and state 3 is an
accepting state.
z
Check whether we are in a final state. In this case accept
the string.
„ There is an error in the input if there is no suitable edge to
follow.
z Add one or several error nodes.
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.17
Step
Current
State
Input
1
0
baab
2
1
aab
3
2
ab
4
2
b
5
3
e
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
a
1
a
2
b
start
„ When there is no more input:
b
a, b
error state
Interpret a State Transition Diagram
z
a
2
b
„ Recognizer
z
a
1
b
b
0
a
9
a
3
b
a, b
error state
accepting state
2.18
3
Representation of State Diagrams by
Transition Tables
NFA and Transition Tables
„ The previous graph is a DFA
Example: NFA for (b|a)* ab
(Deterministic Finite Automaton).
State
„ It is deterministic because at each
step there is exactly one state to
go to and there is no transition
marked ‘‘e’’.
Accept
regular set and corresponds to an
NFA (Nondeterministic Finite
Automaton).
e
no
0
„ A regular expression denotes a
Found
Next
state
Next
state
a
b
9
1
1
no
b
2
9
2
no
ba+
2
3
3
yes
ba+b+
9
3
9
no
a
start
0
a
1
state
a
b
Accept
0
{0,1}
{0}
no
{2}
no
b
2
1
b
2
9
state diagram for (b|a)*ab
yes
Transition table for (b|a)*ab
Transition Table
(Suitable for computer representation).
It requires more calculations to simulate an NFA with a computer program,
e.g. for input ab, compared to a DFA.
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.19
2.20
Transforming NFA to DFA
„ Theorem
z
Any NFA can be transformed to a corresponding
DFA.
Small Regular Expression and
Transition Diagram/Table
Exercise
„ When generating a recognizer automatically, the
following is done:
regular expression → NFA.
NFA → DFA.
z DFA → minimal DFA.
z DFA → corresponding program code or table.
z
z
b
a
DFA for (b|a)*ab
start
0
b
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.21
a
1
b
2
a
TDDD16/B44, P Fritzson, IDA, LIU, 2009.
2.22
4
Download