LANGUAGES & COMPILER DESIGN    CS 321    HM    PSU    HW 5

HomeWork 5, Structured, 100 Points (2/15/2005)

Due Date: Monday February 28th 2005, at start of class
Subject:  Implement a parser for the simple structured language, named Structured

General Rules:
Implement Homework in C or C++. Any flavor of C that runs on PSU's computers will do: K&R C, ANSI C, or C++. Hand in the complete listing of all C/C++ source files plus include files, if any, plus all inputs and generated outputs. Write your name, the HW number, completion date, and the current PSU term into the header of each source file. Designs and discussions are to be provided in electronic form, not in long-hand, and not using hand-drawn pictures.

Description:
Design (20), implement (50), test (20), and debug a complete parser for a simple language; the name of the language and program is Structured. The language recognized by your parser is defined in EBNF below. For the necessary scanner use a variation of a scanner implemented for an earlier HW. For a correct source program the parser proper emits no output other than an "approved" message; for a wrong source program it emits error messages. At the end of a parse, Structured emits the "approved" message only if appropriate, and statistics, including the number of tokens, If Statements, Null Statements, Assignment Statements etc.

Discuss (10/100) in a brief essay (less than one page) how you designed the parser for stmt_list such that any number of legal statements will be parsed, yet the parse of statements (for the current statement list) stops as soon as an unsuitable (correct or incorrect) token is found.

EBNF of structured:
A source program may not be empty but instead consists of 1 or more statements. The statements include conventional Assignment Statements, plus various loop and conditional statements. A sample is included below.
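To make the "1 or more statements" rule concrete, here is a minimal sketch in C of one possible way to drive a statement list: keep parsing while the current token can begin a statement, and stop without consuming anything otherwise. Everything named here is an assumption made for illustration: the Token enum, the Parser struct, and the helpers peek, expect, and starts_statement are hypothetical and stand in for whatever your own scanner provides. Only the null and loop statements are handled, and the input is a prepared token array rather than scanned text. It is not a prescribed design.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; a real scanner defines its own set. */
typedef enum { IDENT, NULL_SYM, IF_SYM, WHILE_SYM, LOOP_SYM, FOR_SYM,
               END_SYM, SEMI, EOF_TOK } Token;

/* Hypothetical parser state: token array ending in EOF_TOK, plus counters. */
typedef struct { const Token *toks; size_t pos; int errors; int stmts; } Parser;

static Token peek(const Parser *p) { return p->toks[p->pos]; }

/* Consume the wanted token, or count an error and leave the input alone. */
static void expect(Parser *p, Token want) {
    if (peek(p) == want) p->pos++;
    else p->errors++;
}

/* True if tok can begin a statement according to the EBNF. */
static int starts_statement(Token tok) {
    switch (tok) {
    case IDENT: case NULL_SYM: case IF_SYM:
    case WHILE_SYM: case LOOP_SYM: case FOR_SYM:
        return 1;
    default:
        return 0;
    }
}

static void stmt_list(Parser *p);

/* Sketch only: null_stmt and loop_stmt are parsed, other kinds are skipped. */
static void statement(Parser *p) {
    assert(starts_statement(peek(p)));
    p->stmts++;
    switch (peek(p)) {
    case NULL_SYM:                       /* NULL_SYM ; */
        p->pos++;
        expect(p, SEMI);
        break;
    case LOOP_SYM:                       /* LOOP_SYM stmt_list END_SYM LOOP_SYM ; */
        p->pos++;
        stmt_list(p);
        expect(p, END_SYM);
        expect(p, LOOP_SYM);
        expect(p, SEMI);
        break;
    default:                             /* not implemented in this sketch */
        p->errors++;
        p->pos++;
        break;
    }
}

/* statement { statement }: at least one, then stop at the first token
   that cannot begin a statement, leaving that token for the caller. */
static void stmt_list(Parser *p) {
    if (!starts_statement(peek(p))) {    /* an empty list is an error */
        p->errors++;
        return;
    }
    do {
        statement(p);
    } while (starts_statement(peek(p)));
}
```

In this sketch the inner list of a loop stops at END_SYM, which is a correct token for the enclosing loop_stmt; the caller, not stmt_list, decides whether the stopping token is acceptable.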
Of particular interest in this language are the reserved keywords: these are single-letter identifiers, reserved as key tokens. For example, the input string e is not an identifier, but the reserved keyword e, standing for END_SYM. Similarly, i stands for the reserved keyword IF_SYM.

By convention in the EBNF below, terminals are spelled in all uppercase, or are enclosed in a pair of ' characters, or are special symbols such as = > < ; etc. Nonterminals are spelled in lowercase. Meta symbols (here separated by the meta-meta-symbol comma) are ::=, [, ], {, }, and |. The start symbol is structured.

The lexical rules for identifiers and literals, including string literals, char literals etc. are similar to the ones of HW2. Strings are delimited by the " character and char literals by the ' character. Define in your design how the ' itself is represented as a char literal. Define also how the " is represented as part of a string literal.

structured         ::= stmt_list
stmt_list          ::= statement { statement }
statement          ::= null_stmt
                     | assignment_stmt
                     | if_stmt
                     | while_stmt
                     | loop_stmt
                     | for_stmt
null_stmt          ::= NULL_SYM ;
assignment_stmt    ::= IDENT = expression ;
if_stmt            ::= IF_SYM expression THEN_SYM stmt_list
                       { OTHERWISE_SYM expression THEN_SYM stmt_list }
                       [ ELSE_SYM stmt_list ]
                       END_SYM IF_SYM ;
while_stmt         ::= WHILE_SYM expression LOOP_SYM stmt_list END_SYM LOOP_SYM ;
loop_stmt          ::= LOOP_SYM stmt_list END_SYM LOOP_SYM ;
for_stmt           ::= FOR_SYM IDENT = expression UPTO_SYM expression
                       LOOP_SYM stmt_list END_SYM LOOP_SYM ;
expression         ::= relation { bool_op relation }
bool_op            ::= AND_SYM | OR_SYM | XOR_SYM
relation           ::= simple_expression { relop simple_expression }
relop              ::= = | > | <
simple_expression  ::= term { add_op term }
add_op             ::= + | -
term               ::= factor { mult_op factor }
mult_op            ::= * | /
factor             ::= primary [ ^ primary ]
primary            ::= ( expression ) | literal | IDENT
literal            ::= STRING_LIT | CHAR_LIT | NUM_LIT | NULL_SYM

Reserved Keyword List:
Each reserved keyword is spelled as a single-letter identifier. Both lowercase and uppercase versions are allowed. The complete list of reserved keywords follows:

AND_SYM        = a | A
IF_SYM         = i | I
END_SYM        = e | E
THEN_SYM       = t | T
OTHERWISE_SYM  = o | O
OR_SYM         = r | R
WHILE_SYM      = w | W
XOR_SYM        = x | X
FOR_SYM        = f | F
UPTO_SYM       = u | U
LOOP_SYM       = l | L
ELSE_SYM       = s | S
NULL_SYM       = n | N

A Sample Source Program:
Demonstrate that your parser handles the source program below correctly.

// PSU CS 321
// statement parser
// name: ... date: ...
h_1 = 12;                                  // assign
i h_1 = mary_2 t                           // if condition then
   xx = 12 ^ ( max ^ 2 );                  // assign
e i;                                       // end if;
n;                                         // null statement
f ident = 12 u 13 l
   ex = "hello";
e l;
n;
f index = 0 u 100 l
   f j_dex = max u max + 109 l
      j_dex = j_dex + 1;
   e l;
   expr = 12 > od r "hello" = str;
   foo = 12 * 8 + ( 12 ^ 8 / 3 - 12 );
   w j_dex > min l
      n;
      j_dex = j_dex - this;
   e l;
e l;
i aa > bb t aa = bb;
o aa > cc t aa = cc;
o aa > dd t aa = dd;                       // OTHERWISE == ELSIF
s aa = 0;
e i;

Minimal Required Testing:
Supply test cases of your own construction. In addition, test the Sample Source Program above. Show with other, separate tests that each of the following error cases is handled correctly via a suitable error message.
Each message includes the line number, column number, what is expected, and what was found instead:

THEN_SYM expected but not found in if_stmt
Expression malformed
Program empty --may have nothing but comments
Semicolon missing at end of statement
END_SYM expected but not found in if_stmt
IF_SYM expected but not found at end in if_stmt
LOOP_SYM expected but not found in for_stmt
END_SYM expected but not found in for_stmt
LOOP_SYM expected but not found at end in for_stmt
= expected but not found in for_stmt
UPTO_SYM expected but not found in for_stmt
LOOP_SYM expected but not found in while_stmt
END_SYM expected but not found in while_stmt
LOOP_SYM expected but not found at end in while_stmt
Empty stmt_list in while_stmt
LOOP_SYM expected but not found in loop_stmt
END_SYM expected but not found in loop_stmt
LOOP_SYM expected but not found at end in loop_stmt
Illegal use of multiple exponentiation operators ^ in expression
Missing ) in expression
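
The four required parts of a message (line, column, expected, found) could be assembled as in the sketch below. The function name format_expected_error, its parameter order, and the exact message layout are assumptions chosen for illustration; the assignment fixes only which pieces of information must appear, not their format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes one "expected but not found" message into buf.
   line/col locate the offending token, expected names the missing token,
   where names the enclosing construct, found names the token actually seen.
   Returns the value of snprintf (length the full message would need). */
int format_expected_error(char *buf, size_t size, int line, int col,
                          const char *expected, const char *where,
                          const char *found) {
    return snprintf(buf, size,
                    "line %d, col %d: %s expected but not found in %s (found %s)",
                    line, col, expected, where, found);
}
```

A caller might pass a fixed-size buffer, for example format_expected_error(buf, sizeof buf, 3, 12, "THEN_SYM", "if_stmt", "IDENT"), and then print buf. Using snprintf rather than sprintf keeps a long token name from overrunning the buffer.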