expressions

advertisement
Regular Expressions,
Backus-Naur Form and
Reverse Polish Notation
Starter
What does the expression 3+4*5 evaluate to?
The answer is 23
Why is the answer not 35?
Objectives
• To form simple regular expressions for string
manipulation and matching
• To be able to check language syntax by
referring to BNF or syntax diagrams
• To be able to convert simple infix notation to
reverse Polish notation and vice versa
Natural Language
• A natural language comprises a set of words written and spoken by
humans
• Governed by syntax rules that define the order in which words may
be put together
– “Work students young hard clever”  is an invalid construct
– “Clever young students work hard” 
• Governed by semantic rules which define the actual meaning when
applied to real world concepts
– “The peanut ate the monkey” is a valid construct but lacks meaning
– “The monkey ate the peanut” 
• Constructing rules for a natural language is difficult and this is why
we use formal languages to communicate with computer systems
Formal Language
• Used for the precise definition of syntactically
correct programs for a programming language
• Defined using an alphabet and syntax rules
• The alphabet is defined as a finite set of
symbols
• The rules of syntax define how to construct
strings of the language out of symbols
• It is not feasible to list all the strings. Instead
Regular expressions are used.
Regular Expressions
• Is a way of expressing the language that a
machine can use in a formal manner
• They provide a means of identifying if a string of
characters is allowed in a particular language
• They can be represented using a finite state
machine
• Used extensively in operating systems for pattern
matching in commands, searching for information
using Google and search and replace in word
processors
Regular Expressions
If the alphabet of some formal language is {a, b} then here are
some regular expressions:
a is a regular expression that matches a string consisting of the
symbol a
ab matches a string consisting of the symbol a followed by b
a* matches a string consisting of zero or more a’s
Regular Expression Example
Define the syntax of a formal language with alphabet
{a, b, c} in which valid strings consist of at least one a
or at least one b, followed by two c’s.
Q1
A language L is defined by the alphabet {a, b, c}
together with the regular expression (a|c)+bb.
(a) Explain what represents a valid string in L.
A non-empty string consisting of any combination
of as and cs, terminated by two bs.
(b) Give two examples of valid strings in L.
aaccbb, cacccbb
Metacharacters
• | separates alternatives e.g. a |b can match a or b
• ? indicates there is zero or one of the preceding
element
• * indicates there is zero or more of the preceding
element
• + indicates there is one or more of the preceding
element
• [ab] means a or b
• [a-z] matches all 26 lower case letters
• n[^t] means an n followed by a character that is not a t
Worked Examples
1. Write down the strings defined by the regular
expression b[ea]d?
be, ba, bed, bad
2. Write down the strings defined by the regular
expression 10*1
11, 101, 1001, 10001, ...
3. Write down the strings defined by the regular
expression 10+1
101, 1001, 10001, ...
Backus-Naur Form (BNF)
• Some languages cannot be defined by regular
expressions
• BNF allows representations of a wider range of
languages
• Expresses the rules for constructing valid
strings in a regular language
• Defines the terms of the language, characters,
words and symbols
BNF Notation
• ::= means ‘is defined as’
• | separates alternatives on the right-hand side of
the rule
• <digit> ::= 0|1|2|3|4|5|6|7|8|9
• Recursive definition
<unsigned integer> ::= <digit>|<digit><unsigned integer>
• Syntax of a programming language
<expression> ::= <term><arithmetic_operator><term>
<term> ::= <identifier>|<constant>|<expression>
<arithmetic_operator> ::= <add_op>|etc
<add_op> ::= + | -
Syntax Diagrams
• Alternative way of defining the syntax rules of
a language
• See examples in book pg 65-66
Syntax Diagrams
BNF of Select
command
Syntax diagram
showing Select
command
Reverse Polish Notation (RPN)
• We evaluate arithmetic expressions using infix
notation e.g. 3 + 4
• An alternative is prefix notation where the
operator occurs before the operands e.g. + 3 4
• Postfix is where the operator occurs after the
operands e.g. 3 4 +
• Postfix notation is also called RPN
• RPN is used in scientific calculators
Examples of Infix and Postfix
Expressions
Infix
Postfix
x–y
xy-
(a+b)/(a-b)
ab+ab-/
x+(y^2)
xy2^+
5+((1+2)*4)-3
512+4*+3-
• Infix means normal mathematical expressions and Postfix means RPN
expressions
• Stacks are used in the evaluation of RPN expressions
Advantages of RPN
• RPN expressions do not need brackets to show
the order of evaluation. This makes them simpler
to evaluate by a machine
• If you try to use an infix (normal) calculator, there
will be a limit on the length of expression that can
be entered
• Postfix expressions can be evaluated as they are
entered, they are not limited to a certain number
of operators in an expression
Evaluating a Postfix Expression
The infix expression 5+((1+2)*4)-3 can be written like this in RPN: 5 1 2 + 4 * + 3 Input
Operation
Stack
Comment
5
Push operand
5
1
Push operand
5, 1
2
Push operand
5, 1, 2
+
Add
5, 3
4
Push operand
5, 3, 4
*
Multiply
5, 12
Pop two values (3, 4) and push result (12)
+
Add
17
Pop two values (5, 12) and push result (17)
3
Push operand
17, 3
-
Subtract
14
Pop two values (1, 2) and push result (3)
Pop two values (17, 3) and push result (14)
BNF is used by compiler writers to express the syntax of a
programming language. The syntax for part of one such
language is written in BNF as follows:
<expression> ::= <integer> | <integer> <operator> <expression>
<integer> ::= 0|1|2|3|4|5|6|7|8|9
<operator> ::= +|-|*|/
(a) Do the following expressions conform to this grammar?
Expression
1
4*9
2
8+6/2
3
-6*2
4
(4+5)*5
Yes/No
(b) (i) Express the infix expression 5+6*2 in RPN.
(ii) Give one advantage of RPN
Regular Expressions Class work
1. Complete Q3 Regular expressions in the worksheet.
2. Download and install the regular expression tool
Regex Coach http://www.weitz.de/regex-coach/#install
and try the regular expressions:
a+
a*
(ac)*
a(a|b)*
1(1│0)*0(1│0)*1
Class work and Homework
1. June 2010 COMP3 Q4
2. June 2011 COMP3 Q9
3. Watch this You Tube video which
demonstrates how to convert the infix
expression to postfix expression using a Stack
4. June 2011 COMP3 Q5
Hand-in Monday 23rd April 2012
Download