BNF
Dr. Milica Barjaktarovic

Computers are stupid… aka very literal
• If I say rm * .out in Unix, Unix will remove ALL my files in the current directory, although I meant to say rm *.out
• The computer understands ONLY the rules it knows about
• The computer reads code and translates verbatim EVERY letter, every comma, every dot, … ***everything!*** and checks whether it matches a rule the computer knows
• If the code looks ok according to such a rule, the computer assigns it the meaning that the rule prescribes
• How do we deal with that?
  – By following the rules precisely

Syntax and Semantics
• Syntax: what it must look like
• Semantics: what it means
• For example, both of these commands have proper syntax, but very different semantics:
  – rm * .out
  – rm *.out
• For example: a sentence in very simple English has the syntax: noun verb object period
  – “Alice studies the book.” is a sentence that has the proper syntax. The semantics is…
  – “Bob studies.” is a sentence that doesn’t have proper syntax according to this simple English (it has no object).

Ok then what?
• If the computer is so particular and we have to be so clear and so precise in how we give it orders, we need some means to communicate clearly, both with each other and with the computer
• Is English clear and unambiguous enough for communicating with a computer?
  – Hm….
• Solution: formalize the communication (i.e. express things more mathematically)
  – BNF, EBNF, …

What is BNF for
• A way to formally (i.e. unambiguously) express the syntax and/or behavior of anything: programming languages, natural languages, procedures, serial numbers, equipment behavior, etc.
• Roughly speaking:
  – A BNF description is a bunch of rules (called productions)
  – Each rule has a symbol called a non-terminal on the left side, which is defined as a combination of terminals and other non-terminals on the right side
  – So, start from the top (i.e.
the start symbol) and keep expanding the non-terminals by substituting the right side of their rules
  – This is somewhat similar to deriving equations in regular math. The difference: in BNF there are many possibilities, and each non-terminal can be expanded differently

BNF definition
• Roughly speaking, a BNF grammar is a bunch of rules (i.e. productions) looking like this:
  – non-terminal_1 = some combo of terminals and non-terminals
  – non-terminal_2 = some combo of terminals and non-terminals
  – …
  – There is one rule, i.e. one production, for each non-terminal; rules can be recursive and can have choices, optional parts, repeated parts, etc.
  – Terminals consist of lexemes and tokens, i.e. small lexical units
• Formal syntax and postal address example:
  – http://en.wikipedia.org/wiki/Backus-Naur_form#Introduction

BNF vs EBNF
• EBNF has shortcuts to define repeats, etc. Notice that there are alternative ways to represent those features, for example {} or +.

Symbol   Meaning
( )      Parentheses. Used to group several elements, so they are treated as one single token
?        Any token followed by ? occurs 0 or 1 times
*        Any token followed by * can occur 0 or more times
+        Any token followed by + can occur 1 or more times
.        Any character/token can occur one time
~        Any character/token following the ~ may not occur at the current place
..       Between two characters, .. spans a range which accepts every character between both boundaries, inclusive

Usage                     Notation
definition                =
concatenation             ,
termination               ;
separation                |
option                    [ ... ]
repetition                { ... }
grouping                  ( ... )
double quotation marks    " ... "
single quotation marks    ' ... '
comment                   (* ... *)
special sequence          ? ... ?
exception                 -

  – http://www.cs.umd.edu/class/spring2002/cmsc214/Tutorial/ebnf.html {} [] etc. notation
  – http://www.antlr.org/wiki/display/ANTLR3/Quick+Starter+on+Parser+Grammars+-+No+Past+Experience+Required + * ?
etc. notations
  – http://odin.himinbi.org/bytewise_ebnf/ebnf_spec.html Alternative ways comparison
  – Sandwich example: “A sandwich consists of a lower slice of bread, mustard or mayonnaise; optional lettuce, an optional slice of tomato; two to four slices of either bologna, salami, or ham (in any combination); one or more slices of cheese, and a top slice of bread.”

    This translates to:

    sandwich ::= lower_slice
                 [ mustard | mayonnaise ]    -- should this be here?
                 lettuce? tomato?
                 [ bologna | salami | ham ] {2,4}
                 cheese+
                 top_slice

    Also, it can be written as shown below (also, for a slightly better sandwich):

    <sandwich> ::= <lowerbreadslice>
                   ( <mustard> | <mayo> )
                   [lettuce] [tomato]
                   ( <bologna> | <salami> | <ham> ) {2-4}
                   ( <cheese> ) {1-4}
                   <topbreadslice>

Examples
The following links have BNF specs of real-life and real-life-like applications:
  – http://www.garshol.priv.no/download/text/bnf.html#id2.2. simple parsers for numerals
  – http://www.ugrad.cs.ubc.ca/~cs126/Homepage/tutorials/tutOne.html simple calculators, serial numbers spec, etc.
  – http://www.csm.astate.edu/~rossa/cs3543/bnf.html http, calculator, etc.
  – http://courses.cs.vt.edu/~cs1104/BNF/BNF.samples.html “specs on the fly” applet and demo
  – http://www.w3.org/Addressing/URL/5_BNF.html URL

BNF in Real Life
• Most (if not all) programming languages are specified via BNF, because of standardization
  – http://cui.unige.ch/isi/bnf/
  – Java http://cui.unige.ch/isi/bnf/JAVA/AJAVA.html
  – Java http://cui.unige.ch/isi/bnf/JAVA/BNFindex.html
  – http://lists.canonical.org/pipermail/kragen-hacks/1999October/000201.html
  – C http://www.cs.man.ac.uk/~pjj/bnf/c_syntax.bnf
  – Algol http://www.lrz-muenchen.de/~bernhard/Algol-BNF.html
  – 3APL http://www.cs.uu.nl/3apl/bnf.pdf
• Most documents that are standardized, or aspire to be, specify their syntax in BNF
  – Google “BNF for TCP or RFC”
  – e.g. OPM http://sdm.lbl.gov/OPM/DM_TOOLS/OPM/OPM_4.1/OPM_ST/node8.html
  – Internet protocols use Augmented BNF (ABNF), i.e.
a short-hand version of BNF: http://www.unix.com.ua/rfc/rfc2234.html http://xml.resource.org/public/rfc/html/rfc2234.html
  – SQL functions are described in BNF: http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0003509.htm
  – http://www.w3.org/TR/REC-PICS-services-961031 Rating service
  – http://www.computer.org/portal/cms_docs_ieeecs/ieeecs/education/csidc/2001ProjectReports/Karlsruhe.pdf Remote control system
  – AMQP bug discussion: https://jira.amqp.org/jira/browse/AMQP63
  – ZOE language spec: http://radio.weblogs.com/0101039/stories/misc/bnfGrammarForZoeSpecification.html

PS – [RFC 2119] IETF (Internet Engineering Task Force). RFC 2119: Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. 1997.

Prof. Nancy Reed’s notes with examples from the Sebesta book for ICS313
  – http://www2.hawaii.edu/~nreed/ics313/lectures/03syntax.pdf
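
Sandwich example, checked by a program
To see what "the computer follows the rules literally" means for the sandwich grammar, here is a minimal Python sketch of a recognizer for it. It is only an illustration, not part of the original notes. Assumptions: the input is already split into tokens, the token names (lower_slice, mustard, …) follow the first sandwich rule, the spread is treated as required (as the English description says), and the function name is_sandwich and its helpers are made up for this example.

```python
from typing import List, Set

# Terminals of the sandwich grammar (names assumed from the first rule).
SPREADS: Set[str] = {"mustard", "mayonnaise"}
MEATS: Set[str] = {"bologna", "salami", "ham"}


def is_sandwich(tokens: List[str]) -> bool:
    """Return True if the token list matches the rule
    lower_slice (mustard | mayonnaise) lettuce? tomato?
    (bologna | salami | ham){2,4} cheese+ top_slice
    """
    pos = 0  # index of the next unread token

    def peek() -> str:
        # Current token, or "" once the input is exhausted.
        return tokens[pos] if pos < len(tokens) else ""

    def expect(options: Set[str]) -> bool:
        # Consume one token if it is one of the allowed terminals.
        nonlocal pos
        if peek() in options:
            pos += 1
            return True
        return False

    def repeat(options: Set[str], low: int, high: float) -> bool:
        # Consume up to `high` tokens from `options`; succeed if at least `low`.
        count = 0
        while count < high and expect(options):
            count += 1
        return count >= low

    return (
        expect({"lower_slice"})
        and expect(SPREADS)                       # exactly one spread
        and repeat({"lettuce"}, 0, 1)             # lettuce?
        and repeat({"tomato"}, 0, 1)              # tomato?
        and repeat(MEATS, 2, 4)                   # {2,4}
        and repeat({"cheese"}, 1, float("inf"))   # cheese+
        and expect({"top_slice"})
        and pos == len(tokens)                    # nothing may follow the top slice
    )


print(is_sandwich(["lower_slice", "mustard", "ham", "salami", "cheese", "top_slice"]))  # True
print(is_sandwich(["lower_slice", "mustard", "ham", "cheese", "top_slice"]))  # False: only one slice of meat
```

Each line of the return expression corresponds to one piece of the rule, read left to right. Greedy matching is enough here because no terminal appears in two neighboring parts of the rule; a grammar with overlapping choices would need backtracking or a parser generator.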