Regular Expressions, Backus-Naur Form and Reverse Polish Notation

Starter
What does the expression 3+4*5 evaluate to?
The answer is 23
Why is the answer not 35?

Objectives
• To form simple regular expressions for string manipulation and matching
• To be able to check language syntax by referring to BNF or syntax diagrams
• To be able to convert simple infix notation to reverse Polish notation and vice versa

Natural Language
• A natural language comprises a set of words written and spoken by humans
• Governed by syntax rules that define the order in which words may be put together
– “Work students young hard clever” is an invalid construct
– “Clever young students work hard” is a valid construct
• Governed by semantic rules which define the actual meaning when applied to real world concepts
– “The peanut ate the monkey” is a valid construct but lacks meaning
– “The monkey ate the peanut” is a valid construct and has meaning
• Constructing rules for a natural language is difficult, which is why we use formal languages to communicate with computer systems

Formal Language
• Used for the precise definition of syntactically correct programs for a programming language
• Defined using an alphabet and syntax rules
• The alphabet is defined as a finite set of symbols
• The rules of syntax define how to construct strings of the language out of symbols
• It is not feasible to list all the strings of the language. Instead, regular expressions are used.

Regular Expressions
• A regular expression is a formal way of describing the strings of a language so that a machine can use it
• They provide a means of identifying whether a string of characters is allowed in a particular language
• They can be represented using a finite state machine
• Used extensively in operating systems for pattern matching in commands, searching for information using Google, and search and replace in word processors

Regular Expressions
If the alphabet of some formal language is {a, b} then here are some regular expressions:
– a matches a string consisting of the symbol a
– ab matches a string consisting of the symbol a followed by b
– a* matches a string consisting of zero or more a’s

Regular Expression Example
Define the syntax of a formal language with alphabet {a, b, c} in which valid strings consist of at least one a or at least one b, followed by two c’s.

Q1
A language L is defined by the alphabet {a, b, c} together with the regular expression (a|c)+bb.
(a) Explain what represents a valid string in L.
    A non-empty string consisting of any combination of a’s and c’s, terminated by two b’s.
(b) Give two examples of valid strings in L.
    aaccbb, cacccbb

Metacharacters
• | separates alternatives e.g. a|b can match a or b
• ? indicates there is zero or one of the preceding element
• * indicates there is zero or more of the preceding element
• + indicates there is one or more of the preceding element
• [ab] means a or b
• [a-z] matches any one of the 26 lower case letters
• n[^t] means an n followed by a character that is not a t

Worked Examples
1. Write down the strings defined by the regular expression b[ea]d? (the ? is part of the expression)
   be, ba, bed, bad
2. Write down the strings defined by the regular expression 10*1
   11, 101, 1001, 10001, ...
3. Write down the strings defined by the regular expression 10+1
   101, 1001, 10001, ...
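
The worked examples above can be checked by machine. The following is a minimal sketch in Python, assuming the standard `re` module (its syntax includes all the metacharacters listed above). The helper name `matches` is made up for illustration. `re.fullmatch` is used so that the whole string must fit the expression, not just part of it.

```python
import re


def matches(pattern, text):
    """Return True if the whole of text is a valid string for pattern."""
    return re.fullmatch(pattern, text) is not None


# Q1: a non-empty mix of a's and c's, terminated by two b's
print(matches(r"(a|c)+bb", "aaccbb"))   # True
print(matches(r"(a|c)+bb", "cacccbb"))  # True
print(matches(r"(a|c)+bb", "bb"))       # False: + needs at least one a or c

# Worked example 1: b, then e or a, then an optional d
for candidate in ["be", "ba", "bed", "bad", "bd"]:
    print(candidate, matches(r"b[ea]d?", candidate))

# Worked examples 2 and 3: * allows zero 0s, + needs at least one
print(matches(r"10*1", "11"))  # True
print(matches(r"10+1", "11"))  # False
```

The only difference between the last two calls is the metacharacter: 10*1 accepts 11 because * permits zero 0s, while 10+1 rejects it.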

Backus-Naur Form (BNF)
• Some languages cannot be defined by regular expressions
• BNF allows representations of a wider range of languages
• Expresses the rules for constructing valid strings in a language, including languages that are not regular
• Defines the terms of the language: characters, words and symbols

BNF Notation
• ::= means ‘is defined as’
• | separates alternatives on the right-hand side of the rule
• Example:
  <digit> ::= 0|1|2|3|4|5|6|7|8|9
• Recursive definition:
  <unsigned integer> ::= <digit>|<digit><unsigned integer>
• Syntax of a programming language:
  <expression> ::= <term><arithmetic_operator><term>
  <term> ::= <identifier>|<constant>|<expression>
  <arithmetic_operator> ::= <add_op>|etc
  <add_op> ::= + | -

Syntax Diagrams
• Alternative way of defining the syntax rules of a language
• See examples in book pg 65-66

Syntax Diagrams
Figure: BNF of Select command
Figure: Syntax diagram showing Select command

Reverse Polish Notation (RPN)
• We normally write arithmetic expressions using infix notation, where the operator occurs between the operands e.g. 3 + 4
• An alternative is prefix notation where the operator occurs before the operands e.g. + 3 4
• Postfix is where the operator occurs after the operands e.g. 3 4 +
• Postfix notation is also called RPN
• RPN is used in some scientific calculators

Examples of Infix and Postfix Expressions
Infix            Postfix
x-y              xy-
(a+b)/(a-b)      ab+ab-/
x+(y^2)          xy2^+
5+((1+2)*4)-3    512+4*+3-
• Infix means normal mathematical expressions and Postfix means RPN expressions
• Stacks are used in the evaluation of RPN expressions

Advantages of RPN
• RPN expressions do not need brackets to show the order of evaluation.
  This makes them simpler to evaluate by a machine
• If you try to use an infix (normal) calculator, there will be a limit on the length of expression that can be entered
• Postfix expressions can be evaluated as they are entered; they are not limited to a certain number of operators in an expression

Evaluating a Postfix Expression
The infix expression 5+((1+2)*4)-3 can be written like this in RPN: 5 1 2 + 4 * + 3 -

Input   Operation      Stack      Comment
5       Push operand   5
1       Push operand   5, 1
2       Push operand   5, 1, 2
+       Add            5, 3       Pop two values (1, 2) and push result (3)
4       Push operand   5, 3, 4
*       Multiply       5, 12      Pop two values (3, 4) and push result (12)
+       Add            17         Pop two values (5, 12) and push result (17)
3       Push operand   17, 3
-       Subtract       14         Pop two values (17, 3) and push result (14)

BNF is used by compiler writers to express the syntax of a programming language. The syntax for part of one such language is written in BNF as follows:
<expression> ::= <integer> | <integer> <operator> <expression>
<integer> ::= 0|1|2|3|4|5|6|7|8|9
<operator> ::= +|-|*|/
(a) Do the following expressions conform to this grammar?
    Expression      Yes/No
    1  4*9
    2  8+6/2
    3  -6*2
    4  (4+5)*5
(b) (i) Express the infix expression 5+6*2 in RPN.
    (ii) Give one advantage of RPN.

Regular Expressions Class work
1. Complete Q3 Regular expressions in the worksheet.
2. Download and install the regular expression tool Regex Coach http://www.weitz.de/regex-coach/#install and try the regular expressions:
   a+
   a*
   (ac)*
   a(a|b)*
   1(1|0)*0(1|0)*1

Class work and Homework
1. June 2010 COMP3 Q4
2. June 2011 COMP3 Q9
3. Watch this YouTube video which demonstrates how to convert an infix expression to a postfix expression using a stack
4. June 2011 COMP3 Q5
Hand-in Monday 23rd April 2012
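
The stack method from the Evaluating a Postfix Expression slide can be sketched in Python. This is a minimal illustration, not a definitive implementation. It assumes tokens are separated by spaces and that operands are plain numbers. The function name `evaluate_rpn` is made up for this example.

```python
def evaluate_rpn(expression):
    """Evaluate a space-separated postfix (RPN) expression using a stack."""
    operations = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
        "^": lambda a, b: a ** b,
    }
    stack = []
    for token in expression.split():
        if token in operations:
            # Pop two values; the first one popped is the right-hand operand
            right = stack.pop()
            left = stack.pop()
            stack.append(operations[token](left, right))
        else:
            # Anything that is not an operator is assumed to be a number
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression: operands left on the stack")
    return stack[0]


print(evaluate_rpn("5 1 2 + 4 * + 3 -"))  # 14.0, as in the trace table
print(evaluate_rpn("3 4 5 * +"))          # 23.0, the starter 3+4*5
```

No brackets or precedence rules are needed: each operator is applied as soon as it is read, which is the advantage of RPN described on the earlier slide. If an operator has fewer than two operands available, `stack.pop()` raises an `IndexError`.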