CH/S7CS/Nov., 2002
PROGRAMMING LANGUAGE

Language Evaluation Criteria (For Reference Only)

Readability
The ease with which programs can be read and understood. A number of characteristics of programming languages contribute to their readability:

i. Overall simplicity
a) A language that has a large number of elementary components is usually more difficult to learn than one with a small number of elementary components.
b) Another problem is feature multiplicity, i.e. having more than one way to accomplish a particular operation. E.g. in C, a user can increment a simple integer variable in four different ways:
    count = count + 1
    count++
    ++count
    count += 1
c) A third problem is operator overloading, in which a single operator symbol has more than one meaning. E.g. overloading the operator '+' to mean simple integer or floating-point addition, a unary operation, the sum of all elements of two single-dimensional arrays, and even vector addition.
Rem.: Language statements can also be simplified too much, which reduces readability, e.g. assembly language.

ii. Orthogonality
a) In a programming language, it means that there is a relatively small set of primitive constructs that can be combined in a relatively small number of ways to build the control and data structures of the language.
b) Furthermore, every possible combination is legal and meaningful -- a symmetry of relationship among primitives.
c) Example 1: Addition in the assembly languages of the IBM mainframe computers and the VAX series of super-minicomputers.
In the IBM mainframe,
    A   Reg, memory_cell
    AR  Reg1, Reg2
where Reg, Reg1 and Reg2 represent registers. The semantics of these are
    Reg  <--- contents(Reg) + contents(memory_cell)
    Reg1 <--- contents(Reg1) + contents(Reg2)
In the VAX machine,
    ADDL operand_1, operand_2
whose semantics is
    operand_2 <--- contents(operand_1) + contents(operand_2)
In this case, either operand can be a register or a memory cell.
The VAX instruction design is orthogonal: there are two ways to specify operands (register or memory cell), which can be combined in any way. The IBM design is not orthogonal: only two of the four operand combinations are legal, and each needs its own instruction (A and AR).
d) Example 2: Pascal
Procedures can have both variable and value parameters.
Functions can return only unstructured types.
Formal parameter types must be named; they cannot be complete type descriptions.
Files and structured data cannot be passed by value.
Thus the type rules of Pascal are not orthogonal.
** Rem.: The extreme form of orthogonality leads to unnecessary complexity.

iii. Control Statements
a) The structured programming revolution of the 1970s was a reaction to the poor readability caused by the limited control structures of some of the languages of the 1950s and 1960s, e.g. BASIC and FORTRAN.
b) Early languages lacked control statements that allow strong restrictions on the use of GOTO, so writing highly readable programs in those languages was difficult.

iv. Data Structures
a) Boolean variable. Using an integer for a flag, Error = 1, is ambiguous. Using a Boolean, Error = true, is better.
b) A record data type provides a more readable way to represent employee records than a parallel-array scheme (data abstraction).

v. Syntax Consideration
a) Identifier forms
Restricting identifiers to very short forms detracts from readability, e.g. BASIC and FORTRAN. The availability of connector characters, such as the underscore, in identifiers is a great aid to readability.
b) Special words
Especially important is the method of forming compound statements, or statement groups, primarily in control constructs. E.g. Pascal uses begin-end pairs and C uses braces ({}) for the same purpose. Both of these languages suffer because groups are always terminated in the same way, which makes it difficult to determine which group is being ended when an 'end' or '}' is found. FORTRAN 77 and Ada make this clearer by using distinct closing syntax for each type of statement group, e.g. end if and end loop in Ada.
If the special words of a language can be used as names for program variables, the resulting programs can be very confusing.
c) Form and meaning
Designing statements so that their appearance at least partially indicates their action is an obvious aid to readability. E.g. in FORTRAN,
    go to (10, 20, 30), I
means that the variable I is used to store a numeric value, while
    go to I, (10, 20, 30)
means that it stores a label value.

Writability
The ease with which a language can be used to create programs for a chosen problem area. Most of the language characteristics that affect readability also affect writability. Writability must be considered in the context of the target problem domain of a language. The most important factors influencing the writability of a language:

i. Simplicity and Orthogonality
a) A large number of different constructs may lead to a misuse of some features and a disuse of others that may be either more elegant or more efficient, or both, than those that are used.
b) A smaller number of primitive constructs and a consistent set of rules for combining them (orthogonality) is much better than simply having a large number of primitives.

ii. Support for Abstraction
a) Abstraction means that complicated structures or operations can be stated in simple ways by ignoring many of the details.
b) Example 1: the use of a subprogram to implement a sort algorithm that is required several times in a program.
c) Example 2: data abstraction, e.g. a binary tree. In FORTRAN, three parallel integer arrays are used. In Pascal, a tree node is abstracted as a single record with two pointers and an integer.

Reliability
A desirable goal of programming language design is to allow and encourage reliable programs, which perform to their specifications under all conditions. Several language features affect reliability:
i. Type checking
a) Type checking is the testing for type compatibility between two variables, or a variable and a constant, that are somehow involved with one another.
b) E.g. the two sides of an operator, parameter correspondence.

ii. Exception Handling
The ability of programs to intercept run-time errors and other unusual conditions, to take corrective measures, and to continue is also a great aid to reliability, e.g. ON ERROR in BASIC.

iii. Aliasing
a) It is having two distinct referencing methods, or names, for the same memory cell.
b) It is now widely accepted that aliasing, without restriction, is too dangerous to justify its advantages.

iv. Readability and Writability
A program written in a language that does not support a natural way to express the required algorithms will necessarily use unnatural methods.

Cost
Types of costs:
i. Cost of training programmers to use the language.
ii. Cost of writing programs --> use a high-level language.
iii. Cost of compiling programs.
iv. Cost of executing programs. A language that requires many run-time type checks, such as PL/1, will prohibit fast code execution.
v. Cost of maintaining programs <-- readability.
A simple trade-off can be made between compilation cost and execution speed of the compiled code: extra compilation effort results in much faster code execution.
** A final note on evaluation criteria: most criteria, particularly readability and writability, are neither measurable nor scientifically defined.

Factors Influencing the Language Design (Reference only)

Computer Architecture
The most popular languages have all been designed around the prevalent architecture, called the von Neumann architecture.
von Neumann architecture
Both data and programs are stored in the same memory. The processor is a unit separate from the memory.
i. Instructions and data must be piped, or transmitted, from memory to the processor.
ii. Results of operations in the processor must be moved back to memory.
The von Neumann architecture causes the central features of the imperative languages to be
i. variables, which model the memory cells.
ii. assignment statements, which are based on the piping operation (store and load).
iii. the iterative form of repetition (the instructions in a von Neumann computer are stored in adjacent cells of memory).

Programming Methodologies
Software engineering:
i. the analysis of both the programming process and programming language design.
ii. under intense study since the 1970s.
An important reason for the research in software engineering was the shift in the major cost of computing from hardware to software (from 80% hardware, 20% software to 20% hardware, 80% software).
The primary programming language deficiencies that were discovered in the 1970s were incompleteness of type checking, inadequacy of control statements, and lack of facilities for exception handling.
E.g. process-oriented design and the extensive efforts in the area of concurrency that took place in the 1980s brought with them the need for complete language facilities for creating and controlling concurrent program units. Another example is object-oriented design (OOD).

Object-oriented (OO) approach
It emphasizes data design, concentrating on the use of logical, or abstract, data types to solve problems.
For data abstraction to be used effectively in software system design, it should be supported by the languages used to write the system.
A database application example:
[Figure: the objects of a database application, grouped as User Interface Objects, Application Objects and Database Objects. The objects shown are Button, Dialog Box, Scrollbar, ListBox, Form, Printer, Employee, People, Product, Directors, Liabilities, Databases, Tables, Fields and Records.]

OO Terminology
Class (noun)
i. A class is a type of entity whose members have common attributes and behaviour.
ii. Examples: Employee, Printer, etc.
Object or Instance (noun)
i. An object is a particular entity which has attributes and behaviours as defined by a class.
ii. Examples: John, Peter, Printer-1, Printer-2, etc.
Attribute (noun or adj.)
i. An attribute is a property of an object or class.
ii. Examples: Employee::Name, Employee::HKID, Printer::ModelNumber, Printer::Weight, etc.
iii. An attribute can itself be an object or class.
iv. Examples: Employee::Product, etc.
Message, Event
i. Objects communicate with each other through message or event passing.
ii. Examples: Print, Save, Load, etc.
Method
i. Methods are the functions defined by the class or object.
ii. Some methods are used to handle events.
iii. Examples: Print, Save, Load, etc.
iv. Some methods are used to process information.
v. Examples: CalculateAsset, FormatPage, etc.
[Figure: message passing between objects: Employee, FormatPage, Form, PrintPage, Printer.]
Constructor vs Destructor
i. A constructor is a special method of a class which is executed once when the object is first instantiated (created). It is a good place to put initialisation and setup code for the object.
ii. A destructor is a special method of a class which is executed once when the object is destroyed. It is a good place to release resources held by the object.

OO methods
The THREE OO programming methods: Inheritance, Encapsulation and Polymorphism.

Encapsulation
What?
i. It separates the interface from the implementation.
ii. It provides a well-defined public interface to the user while hiding all internal complexities from them.
Why?
A well-designed interface saves the user from having to know the complexities of the implementation in order to use your class/object.
How?
In OO terms, an interface is composed of the methods and attributes visible to the user of your class. For an Employee class:
i. Identify the methods of an Employee from the context of the user, e.g. SignIn, SignOut, etc.
ii. Identify the attributes of an Employee from the context of the user, e.g. StaffID, Position, Department, Salary, etc.

Inheritance
What?
Inheritance specifies the relations between classes having similar properties.
[Figure: Employee is-a-kind-of People. People is the base class for class Employee; Employee is the derived class from class People. FullTime Employee and PartTime Employee are each is-a-kind-of Employee.]
Why?
It improves software reuse, eases software maintenance and eases software integration.
OOD provides two class types, i.e. Base Class (Super-class) and Derived Class (Sub-class).
Class Inheritance
i. A derived class inherits all methods and attributes of the base class.
ii. A derived class conceptually forms an "is-a-kind-of" relationship with the base class.

Polymorphism
What?
A particular function (e.g. Area()) behaves differently according to the class that the object belongs to. This is true even when the object is accessed indirectly through a reference of a base class type.
Why?
i. It simplifies programming by treating derived classes as base classes (e.g. for pass-by-reference parameter passing in function calls).
ii. It simplifies maintenance by not having to know the exact identity of the objects, making the code more general and extensible when new derived classes are created.

Abstraction
It allows a designer to ignore details and remain focused on the big picture. Start with a general system outline and progressively add more detail (top-down approach).

Object Relationships
Is-a-kind-of Relationship
i. Represented in OOD as inheritance: a derived class is-a-kind-of base class.
ii. E.g. Employee is-a-kind-of People, having all the methods and attributes of People.
Consists-of Relationship
i. Occurs when an object is composed of other objects.
ii. Represented in OOD as attributes of a class.
iii. E.g. People consists-of a Name.
Uses Relationship
i. An object (client) uses another object (server) to accomplish some task.
ii. An object can be both a client and a server.
iii. E.g. a Form object uses a Printer object to print the form.
Contains Relationship
i. Occurs when an object acts as a container for other objects.
ii. The containment is dynamic and usually transient: objects can be added to or removed from the container object.
iii. Represented in OOD as a container class.
iv. E.g. Lists, Queues, Forms, etc.

Program Translation

Implementation methods
The software that provides the high-level language interface to a computer can take several different forms: compilers, pure interpreters and impure interpreters.
[Figure 1: The layered interfaces, or virtual computers, provided by a typical computer system. From top to bottom: high-level language program; high-level language implementation; operating system; bare machine (machine language interface).]
The software depends not only on the computer's machine language, but also on a large collection of programs called the operating system, which supplies higher-level primitives than those of the machine language. Sample primitives: system resource management, input and output operations, a file management system, program editors, etc.

Compiler
It goes through all the stages of translation and translates the whole of the user's source program into machine code before the program is executed.
Linking may be necessary to connect the user code to the system programs. The user and system code together is sometimes called a load module.

Pure Interpreter
It allows easy implementation of many source-level debugging operations, because all run-time error messages can refer to source-level units.
[Figure 2: The compilation process. Source program -> lexical analyzer -> lexical units -> syntax analyzer -> parse trees -> intermediate code generator -> intermediate code -> code generator -> machine code -> computer (with input data) -> results.]
[Figure 3: Impure interpretation. Source program -> lexical analyzer -> lexical units -> syntax analyzer -> parse trees -> intermediate code generator -> intermediate code -> interpreter (with input data) -> results.]

** von Neumann bottleneck
i. On a von Neumann architecture computer, programs reside in memory but are executed in the processor.
ii. Here is the fetch-decode-execute cycle:
    repeat forever
        fetch the next instruction
        decode the instruction
        execute the instruction
iii. The speed of the connection between a computer's memory and its processor usually determines the speed of the computer, because instructions often can be executed faster than they can be moved to the processor for execution. This connection is called the von Neumann bottleneck.

Impure interpretation
Impure interpreters translate high-level language programs to an intermediate language designed to allow easy interpretation. This is faster than pure interpretation because the source language statements are decoded only once.

From Figure 2, there are three stages of compilation: lexical analysis, syntax analysis and code generation.

Lexical analysis
It breaks up the source code given to the compiler into chunks that are in a form suitable to be analysed by the next stage of the compilation process. The strings of characters representing the source program are broken up into small chunks, called tokens. It is usual to remove all redundant parts of the source code (such as spaces and comments) during this tokenisation phase. It is also likely in many systems that keywords such as END or PROCEDURE will be replaced by a more efficient, shorter token.
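The tokenisation phase described above can be sketched in Python as follows. This is only a minimal illustration, not the method of any particular compiler: the keyword set, the short codes K1 to K3, the Pascal-style { } comment form and the function name tokenise are all assumptions made for the example.

```python
import re

# Hypothetical keywords and their shorter token codes (illustration only).
KEYWORDS = {"BEGIN": "K1", "END": "K2", "PROCEDURE": "K3"}

TOKEN_PATTERN = re.compile(
    r"\s+"                                  # white space: discarded
    r"|\{[^}]*\}"                           # assumed { ... } comment: discarded
    r"|(?P<number>\d+(?:\.\d+)?)"           # integer or real constant
    r"|(?P<name>[A-Za-z_][A-Za-z_0-9]*)"    # identifier or keyword
    r"|(?P<symbol>:=|[-+*/();])"            # operators and punctuation
)

def tokenise(source):
    """Break source text into (kind, text) tokens, dropping spaces and comments."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_PATTERN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind is None:
            continue  # white space or a comment: redundant, so removed
        text = match.group(kind)
        if kind == "name" and text.upper() in KEYWORDS:
            # Replace the keyword by its shorter token code.
            tokens.append(("keyword", KEYWORDS[text.upper()]))
        else:
            tokens.append((kind, text))
    return tokens
```

For example, tokenise("begin A := B + 12 { add } end") yields a keyword token for begin, the tokens for A, :=, B, + and 12, and a keyword token for end; the spaces and the comment do not appear in the result.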
[Figure 4: Pure interpretation. The source program and the input data are passed to the interpreter, which runs on the computer and produces the results.]

It is the job of the lexical analyser to check that all the keywords used are valid and to group certain symbols with their neighbours so that they can form larger units to be presented to the next stage of the compilation process.
A symbol table for programmer-defined identifiers is created during lexical analysis and contains details of attributes such as data types. As part of this standardized format, the tokens may be replaced by pointers to the symbol table.
Typically, entries in the symbol table will show
i. the identifier or keyword;
ii. the kind of item (variable, array, procedure, keyword, etc.);
iii. the type of item (integer, real, char, etc.);
iv. the run-time address of the item, or its value if it is a constant; and
v. a pointer to accessing information (e.g. for an array, the bounds of the array, or for a procedure, information about each of the parameters).
Since the lexical analyser spends a great proportion of its time looking up the symbol table, the symbol table must be organised in such a way that entries can be found as quickly as possible. Thus, a binary search tree may be used.
Sample symbol table:

        item name   kind of item   type of item   run-time address or value   pointer
    1   read        keyword
    2   pi          constant       real           3.14159
    3   radius      variable       real           (?)
    4   begin       keyword
    5   writeln     keyword
    6   no_sides    array          integer        (?)                         (?)
    .
    .

Syntax Analysis
It determines whether the string of input tokens forms valid sentences. At this stage the structure of the source program is analysed to see if it conforms to the context-free grammar of the particular language being compiled. This stage includes
i. finding out if the number of brackets is correct (a stack may be used, why?).
ii. determining the arithmetical operators used within an expression.
Complex forms may be broken down into simpler, more manageable equivalents.
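As a hint for the question in item i, the sketch below checks brackets with a stack in Python. A stack fits because the most recently opened bracket must be the first one closed (last in, first out). The function name brackets_balanced and the choice of three bracket kinds are assumptions made for this illustration.

```python
def brackets_balanced(symbols):
    """Return True if every bracket in symbols is closed in the right order."""
    closing_to_opening = {")": "(", "]": "[", "}": "{"}
    stack = []
    for symbol in symbols:
        if symbol in closing_to_opening.values():
            stack.append(symbol)              # remember the open bracket
        elif symbol in closing_to_opening:
            # A closing bracket must match the most recently opened one.
            if not stack or stack.pop() != closing_to_opening[symbol]:
                return False
        # Any other symbol (operand, operator) is ignored here.
    return not stack                          # leftovers mean unclosed brackets
```

Merely counting brackets would accept a string such as ")(", whereas the stack version rejects it because the closing bracket arrives when the stack is empty.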
The primary formal methods of describing the syntax of programming languages are context-free grammars, a formalism that is also known as Backus-Naur form, and syntax diagrams.
The syntax of a programming language is the form of its expressions, statements and program units.
The semantics of a programming language is the meaning of those expressions, statements and program units.

Backus-Naur Form
i. It was presented by Backus in 1959 and Naur in 1960.
ii. BNF is a metalanguage for programming languages. A metalanguage is a language that is used to describe another language.
iii. It uses abstractions for syntactic structures. A Pascal assignment statement, for example, might be represented by the abstraction <assign>. The actual definition of <assign> may be given by
    <assign> ::= <var> := <expr>
iv. The text to the right of '::=' is the definition of the symbol on the left side. The whole definition is called a rule, or production.
v. BNF is a generative tool for defining languages. The sentences of the language are generated through repeated application of the rules, and such a generation is called a derivation.

BNF example 1: A grammar for a small language
    <program>    ::= begin <stmt_list> end
    <stmt_list>  ::= <stmt> | <stmt> ; <stmt_list>
    <stmt>       ::= <var> := <expression>
    <var>        ::= A | B | C
    <expression> ::= <var> + <var> | <var> - <var> | <var>
The above small language has only one statement form, assignment, of which the right-hand side allows either a single variable, or two variables and either a + or - operator. The only allowable variable names are A, B and C.
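To make the grammar of example 1 concrete, here is a minimal Python sketch of a recognizer with one function per BNF rule (a recursive-descent style, which is one technique among several and is not prescribed by these notes). The function name parse_program is hypothetical, and the sketch assumes that tokens are separated by spaces, except that ';' may directly follow a token.

```python
def parse_program(text):
    """Return True if text is a sentence of the small language of example 1."""
    tokens = text.replace(";", " ; ").split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(token):
        nonlocal pos
        if peek() != token:
            raise SyntaxError(f"expected {token!r}, found {peek()!r}")
        pos += 1

    def var():                      # <var> ::= A | B | C
        nonlocal pos
        if peek() not in ("A", "B", "C"):
            raise SyntaxError(f"expected a variable, found {peek()!r}")
        pos += 1

    def expression():               # <var> + <var> | <var> - <var> | <var>
        var()
        if peek() in ("+", "-"):
            expect(peek())
            var()

    def stmt():                     # <stmt> ::= <var> := <expression>
        var()
        expect(":=")
        expression()

    def stmt_list():                # <stmt> | <stmt> ; <stmt_list>
        stmt()
        if peek() == ";":
            expect(";")
            stmt_list()

    try:                            # <program> ::= begin <stmt_list> end
        expect("begin")
        stmt_list()
        expect("end")
        return pos == len(tokens)
    except SyntaxError:
        return False
```

For instance, parse_program("begin A := B + C; B := C end") accepts, while "begin A := B + C + A end" is rejected because the grammar allows at most one operator on the right-hand side.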
Here is a sample program:
    begin A := B + C; B := C end
A derivation of this program in this language follows:
    <program> => begin <stmt_list> end
              => begin <stmt> ; <stmt_list> end
              => begin <var> := <expression> ; <stmt_list> end
              => begin A := <expression> ; <stmt_list> end
              => begin A := <var> + <var> ; <stmt_list> end
              => begin A := B + <var> ; <stmt_list> end
              => begin A := B + C ; <stmt_list> end
              => begin A := B + C ; <stmt> end
              => begin A := B + C ; <var> := <expression> end
              => begin A := B + C ; B := <expression> end
              => begin A := B + C ; B := <var> end
              => begin A := B + C ; B := C end

vi. Example 2: A grammar for simple assignment statements.
    <assign> ::= <id> := <expr>
    <id>     ::= A | B | C
    <expr>   ::= <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
One of the most attractive features of grammars is that they naturally describe the hierarchical syntactic structure of the sentences of the languages they define. Such hierarchical structures are called parse trees. Thus the statement
    A := B * (A + C)
can be generated by the following derivation, which forms the corresponding parse tree:
    <assign> => <id> := <expr>
             => A := <expr>
             => A := <id> * <expr>
             => A := B * <expr>
             => A := B * ( <expr> )
             => A := B * ( <id> + <expr> )
             => A := B * ( A + <expr> )
             => A := B * ( A + <id> )
             => A := B * ( A + C )

    <assign>
        <id>  A
        :=
        <expr>
            <id>  B
            *
            <expr>
                (
                <expr>
                    <id>  A
                    +
                    <expr>
                        <id>  C
                )
Figure 5. A parse tree for a simple assignment.

However, some grammars are ambiguous. E.g. with the grammar
    <assign> ::= <id> := <expr>
    <id>     ::= A | B | C
    <expr>   ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>
the sentence A := B + C * A has two distinct parse trees, as shown in Figure 6.

    First tree:                      Second tree:
    <assign>                         <assign>
        <id>  A                          <id>  A
        :=                               :=
        <expr>                           <expr>
            <expr>                           <expr>
                <id>  B                          <expr>
            +                                        <id>  B
            <expr>                               +
                <expr>                           <expr>
                    <id>  C                          <id>  C
                *                            *
                <expr>                       <expr>
                    <id>  A                      <id>  A
Figure 6. Two distinct parse trees for the same sentence under this grammar.
viii. Example 3: An unambiguous grammar for expressions
    <assign> ::= <id> := <expr>
    <id>     ::= A | B | C
    <expr>   ::= <expr> + <term> | <term>
    <term>   ::= <term> * <factor> | <factor>
    <factor> ::= ( <expr> ) | <id>

    <assign>
        <id>  A
        :=
        <expr>
            <expr>
                <term>
                    <factor>
                        <id>  B
            +
            <term>
                <term>
                    <factor>
                        <id>  C
                *
                <factor>
                    <id>  A
Figure 7. The unique parse tree using an unambiguous grammar.
This grammar generates the same language as BNF example 2, but it indicates the proper precedence order of the multiply and add operators. A derivation of the sentence A := B + C * A forms a unique parse tree.

Associativity of Operators
a) The assignment A := B + C + A should form a parse tree as follows:
    <assign>
        <id>  A
        :=
        <expr>
            <expr>
                <expr>
                    <term>
                        <factor>
                            <id>  B
                +
                <term>
                    <factor>
                        <id>  C
            +
            <term>
                <factor>
                    <id>  A
b) Thus B + C is calculated first rather than C + A. This is called left associativity. When a BNF rule has its LHS also appearing at the beginning of its RHS, the rule is said to be left recursive, which specifies left associativity.
c) When a BNF rule has its LHS also appearing at the end of its RHS, the rule is said to be right recursive, which specifies right associativity.
d) Rules such as
    <factor> ::= <exp> ** <factor> | <exp>
    <exp>    ::= ( <expr> ) | <id>
could be used to describe exponentiation as a right-associative operator.

Extended Backus-Naur Form (EBNF)
i. Three extensions to BNF:
a) [ ] An optional part of an RHS, e.g. if...then...else in Pascal:
    <if_stmt> ::= if <logic_expr> then <stmt> [else <stmt>]
b) { } A part which can be repeated indefinitely, e.g. a list of identifiers:
    <ident_list> ::= <identifier> { , <identifier> }
c) ( ) A group from which a single element must be chosen, e.g. the for...do loop, where exactly one of to and downto is chosen:
    <for_stmt> ::= for <var> := <expr> ( to | downto ) <expr> do

Syntax Graph (Syntax Diagram)
i. E.g. the syntax diagram describing the Ada if statement:
[Figure 10: The syntax graph description of the Ada if statement. An if-stmt is: if, condition, then, stmts, followed by zero or more else-if parts (elsif, condition, then, stmts), an optional else part (else, stmts), and finally end if ;.]
ii. Two kinds of nodes:
a) Terminal symbols. Circles and ellipses contain terminal symbols, which are lexemes in the language whose syntax is being described.
b) Non-terminal symbols. Rectangles, each containing the name of a syntactic unit, or abstraction.
iii. Advantage: easier to understand, because the syntax can be visualised.

Semantic Analysis (reference only)
There is no universal method of describing semantics. Three methods: Operational, Axiomatic and Denotational.

Operational semantics
i. To use operational semantics to describe the semantics of a programming language requires the construction of two components:
a) a translator to convert each statement to a chosen low-level language for a virtual machine.
b) the virtual machine itself.
ii. E.g. describing the Pascal for...do loop:
    Pascal statement                 Operational semantics
    for I := first to last do        I := first
    begin                            loop: if I > last goto out
        . . .                            . . .
    end                              I := I + 1
                                     goto loop
                                     out: ...
iii. Evaluation. It provides an effective means of describing semantics for language users and language implementers, as long as the descriptions are kept as simple and informal as possible.

Axiomatic Semantics
i. It is based on mathematical logic.
ii. Precondition. A predicate, or assertion, immediately before a statement describes the constraints on the program variables at that point.
iii. Postcondition. An assertion immediately following a statement describes the new constraints on those variables.
iv. E.g. if the postcondition { sum > 11 } follows the statement sum := 2 * x + 1, then one possible precondition is { x > 10 }, i.e.
    { x > 10 } sum := 2 * x + 1 { sum > 11 }
v. The weakest precondition is the least restrictive precondition that will guarantee the validity of the associated postcondition.
For the above example, the preconditions {x > 10}, {x > 2000} and {x > 15.5} are all valid, but the weakest one is {x > 5}.
vi. E.g.
a) The postcondition of the statement a := b/2 - 1 is {a < 10}. The weakest precondition is {b < 22}. Thus
    {b < 22} a := b/2 - 1 {a < 10}
b) In general,
    {P[x -> E]} x := E {P}
where P[x -> E] means substituting E for every occurrence of x in the postcondition P.
c) There is a wp (weakest precondition) transformer function, used as follows:
    wp(x := E, P) = P[x -> E]
vii. Sequence
If {P1} S1 {P2} and {P2} S2 {P3}, we get {P1} S1; S2 {P3}.
If S1 is x1 := E1 and S2 is x2 := E2, then we get
    {P3[x2 -> E2]} x2 := E2 {P3}
    {(P3[x2 -> E2])[x1 -> E1]} x1 := E1 {P3[x2 -> E2]}
viii. For the while loop
    while y <> x do y := y + 1 {y = x}
For 0 iterations, the weakest precondition is {y = x}.
For 1 iteration, wp(y := y + 1, {y = x}) = {y = x - 1}.
For 2 iterations (net effect y := y + 2), wp(y := y + 2, {y = x}) = {y = x - 2}.
For 3 iterations (net effect y := y + 3), wp(y := y + 3, {y = x}) = {y = x - 3}.
Combining these cases, the weakest precondition that guarantees loop termination with {y = x} is {y <= x}.
ix. Evaluation
a) A powerful tool for research into program correctness proofs.
b) There is no general method of creating the predicate transformer functions, thus its usefulness is limited.

Denotational Semantics
i. It defines both a mathematical object for each language entity and a function that maps instances of that entity onto instances of the mathematical object.
ii. E.g. the BNF of binary numbers:
    <bin_num> ::= 0 | 1 | <bin_num> 0 | <bin_num> 1
iii. The semantic function N, which maps the abstract syntax to the non-negative integers, is as follows:
    N[[ 0 ]] = 0
    N[[ 1 ]] = 1
    N[[ <bin_num> 0 ]] = 2 * N[[ <bin_num> ]]
    N[[ <bin_num> 1 ]] = 2 * N[[ <bin_num> ]] + 1
iv. Evaluation.
a) In a similar but more complex way, objects and functions can be defined for the other syntactic entities of programming languages. This provides a framework for thinking in a highly rigorous way about programming, as well as a method of proving the correctness of programs.
b) It can be used as an aid to language design.

Attribute Grammars
An attribute grammar is a grammar with the following additions:
i. Associated with each grammar symbol X is a set of attributes A(X). The set consists of two disjoint sets, synthesized attributes and inherited attributes.
ii. Associated with each grammar rule is a set of semantic functions and a possibly empty set of predicate functions over the attributes of the symbols in the grammar rule.
iii. For a rule X0 -> X1 ... Xn,
a) The synthesized attributes of X0 are computed with a semantic function of the form S(X0) = f(A(X1), ..., A(Xn)), meaning that their values depend only on the attribute values of their child nodes.
b) The inherited attributes of Xj, 1 <= j <= n, are computed with a semantic function of the form I(Xj) = f(A(X0)), meaning that their values depend only on the attribute values of their parent nodes.
Intrinsic Attributes. They are synthesized attributes of leaf nodes, whose values are determined outside the parse tree. E.g. the data type of a variable in a program could come from a table, the symbol table.

Example: An attribute grammar for simple assignment statements.
1. Syntax rule:    <assign> -> <var> := <expr>
   Semantic rules: <var>.env <- <assign>.env
                   <expr>.env <- <assign>.env
                   <assign>.lhs_type <- <var>.actual_type
                   <expr>.expected_type <- <assign>.lhs_type
2. Syntax rule:    <expr> -> <var>[2] + <var>[3]
   Semantic rules: <var>[2].env <- <expr>.env
                   <var>[3].env <- <expr>.env
                   <expr>.actual_type <-
                       if (<var>[2].actual_type = int_type) and (<var>[3].actual_type = int_type)
                       then int_type
                       else real_type
                       end if
   Predicate:      <expr>.actual_type = <expr>.expected_type
3. Syntax rule:    <expr> -> <var>
   Semantic rules: <expr>.actual_type <- <var>.actual_type
                   <var>.env <- <expr>.env
   Predicate:      <expr>.actual_type = <expr>.expected_type
4. Syntax rule:    <var> -> A | B | C
   Semantic rule:  <var>.actual_type <- look-up(RHS, <var>.env)

actual_type. A synthesized attribute associated with the non-terminals <var> and <expr>.
It is used to store either int_type or real_type. In the case of a variable, the actual type is intrinsic.
expected_type.
i. An inherited attribute associated with the non-terminal <expr>. It is used to store either int_type or real_type.
ii. It is determined by the type of the variable on the left side of the assignment statement.
lhs_type. A synthesized attribute associated with <assign>. It is used to move the value of the synthesized actual_type of the LHS of an assignment statement to the inherited attribute expected_type of the <expr>.
env. An inherited attribute associated with the non-terminals <assign>, <expr> and <var>. It carries the reference to the correct symbol table entries to the instances of variables.

[Figure 11: The flow of attributes in the parse tree of A := A + B. env is passed down from <assign> to <expr> and to each <var>; actual_type is passed up from each <var>; lhs_type at <assign> supplies expected_type at <expr>.]

    <assign>                env = table_1, lhs_type = real_type
        <var>  A            env = table_1, actual_type = real_type
        :=
        <expr>              env = table_1, expected_type = real_type, actual_type = real_type
            <var>  A        env = table_1, actual_type = real_type
            +
            <var>  B        env = table_1, actual_type = int_type
Figure 12. A fully attributed parse tree.

Evaluation.
i. Attribute grammars provide a complete description of the syntax and static semantics of programming languages; they have been used as the formal definition of a language that can be input to a compiler generation system.
ii. Difficulties: their size and complexity, and a large parse tree which is costly to evaluate.

Code Generation
The code specific to the target machine is generated. As the code is machine code, it is usual for several machine code instructions to be generated for each high-level language instruction, e.g. LET A = B + C in BASIC.
In code generation,
i. remove the redundant word LET.
ii. search the symbol table for the locations of A, B and C.
iii. generate the necessary machine code.
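The three steps above can be pictured with a toy Python sketch. Everything machine-specific here is made up for illustration: the addresses in SYMBOL_TABLE, the LOAD/ADD/SUB/STORE mnemonics of an imaginary single-accumulator machine, and the function name generate. It only shows how one high-level statement expands into several low-level instructions.

```python
# Hypothetical symbol table: variable name -> run-time address.
SYMBOL_TABLE = {"A": 100, "B": 101, "C": 102}

# Hypothetical mnemonics for an imaginary single-accumulator machine.
OPCODES = {"+": "ADD", "-": "SUB"}

def generate(statement):
    """Translate a statement of the form 'LET A = B + C' into toy instructions."""
    words = statement.split()
    if words and words[0] == "LET":
        words = words[1:]                      # step i: drop the redundant LET
    if len(words) != 5:
        raise ValueError("expected the form: target = left op right")
    target, equals, left, op, right = words
    if equals != "=" or op not in OPCODES:
        raise ValueError("unsupported statement")
    # Step ii: look up the location of each variable in the symbol table.
    target_addr = SYMBOL_TABLE[target]
    left_addr = SYMBOL_TABLE[left]
    right_addr = SYMBOL_TABLE[right]
    # Step iii: emit several instructions for the single statement.
    return [
        f"LOAD {left_addr}",
        f"{OPCODES[op]} {right_addr}",
        f"STORE {target_addr}",
    ]
```

With these assumed addresses, generate("LET A = B + C") gives the three instructions LOAD 101, ADD 102 and STORE 100.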
It should be remembered that parse trees may often be built before this phase, and they can be used in the generation.
Routines from the system library may often have to be called up, e.g. the write procedure of Pascal.
Optimisation. Often the code produced by such methods is not the best that could be obtained. It is possible to make more efficient machine code by carrying out a process called optimisation.

Reverse Polish notation (Postfix)
** Reverse Polish notation is used to parse and represent arithmetic expressions in a compiler.
Polish notation is also known as prefix notation because each operator precedes its operands.
A 'normal' arithmetic expression is as follows:
    (3+5)x(9-7)
This is called infix notation because all the operators are inside the expression. Its Polish notation is as follows:
    x+35-97
Polish notation has the advantage that there can be no ambiguity in the way that an arithmetic expression is worked out. It also needs no parentheses to separate the different parts.
Another notation is reverse Polish (or postfix) notation, which is very similar in principle and is also parentheses-free. Reverse Polish notation is particularly suited to computerised methods because such expressions can be dealt with easily by using a stack. The reverse Polish notation of the above expression is as follows:
    35+97-x
This leads to the following very simple rules for evaluating such expressions:
i. If the next symbol encountered is an operand, i.e. a number or variable which is to be operated upon, it must be loaded on to the stack.
ii. If the next symbol encountered is an operator, i.e. +, /, -, etc., then carry out the required operation on the top two items in the stack. The result of this operation must be left on the top of the stack.
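The two evaluation rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the symbols arrive as a list of strings, operands are numeric constants, both 'x' and '*' are accepted for multiplication, and the function name evaluate_postfix is hypothetical.

```python
import operator

# Assumed operator set; 'x' follows the notation used in the notes.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "x": operator.mul,
    "*": operator.mul,
    "/": operator.truediv,
}

def evaluate_postfix(symbols):
    """Evaluate a reverse Polish expression given as a list of symbols."""
    stack = []
    for symbol in symbols:
        if symbol in OPERATORS:
            # Rule ii: operate on the top two items, leave the result on top.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[symbol](left, right))
        else:
            # Rule i: an operand is loaded on to the stack.
            stack.append(float(symbol))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack.pop()
```

For the expression 35+97-x from the notes, evaluate_postfix("3 5 + 9 7 - x".split()) leaves 16.0 on the stack, matching (3+5)x(9-7). Note that the item popped first is the right-hand operand, which matters for - and /.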
To convert an infix arithmetic expression to a postfix one, a stack and a table of order of precedence should be used. Assume the following rules of precedence are used:

    Operator    Precedence
    ( )         3
    ^           2
    * & /       1
    + & -       0

The algorithm of the conversion is shown in the following flowchart:
[Figure 13: Flowchart of the infix-to-postfix conversion. Its steps are:
1. Start; read the next symbol and test it.
2. Operand: output it to the postfix string; read the next symbol.
3. "(": put it on the stack; read the next symbol.
4. Operation: if the stack is empty, or the operation is of higher precedence than that on top of the stack, put it on the stack; otherwise output the top of the stack to the postfix string and test again.
5. ")": look at the top of the stack. If "(" is on top, remove the top of the stack and read the next symbol; otherwise output the top of the stack to the postfix string and look again.
6. Space (end of string): if the stack is empty, stop; otherwise output the top of the stack to the postfix string and test again.
7. Other: report an error and stop.]

To convert the expression V+W^X*Y/(Z-1):

    Symbol being considered    Output postfix string    Stack (top at left, bottom at right)
    V                          V
    +                          V                        +
    W                          VW                       +
    ^                          VW                       ^+
    X                          VWX                      ^+
    *                          VWX^                     *+
    Y                          VWX^Y                    *+
    /                          VWX^Y*                   /+
    (                          VWX^Y*                   (/+
    Z                          VWX^Y*Z                  (/+
    -                          VWX^Y*Z                  -(/+
    1                          VWX^Y*Z1                 -(/+
    )                          VWX^Y*Z1-                /+
    end of string              VWX^Y*Z1-/+              stack empty
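The conversion traced above can be sketched in Python as follows. The sketch assumes single-character operands and takes its precedence values from the table in these notes; "(" is handled as a marker on the stack rather than through its precedence value. As in the flowchart, an incoming operator is stacked only if it is of strictly higher precedence than the one on top, so every operator (including ^) is treated as left associative. The function name infix_to_postfix is hypothetical.

```python
# Precedence values taken from the table in the notes.
PRECEDENCE = {"^": 2, "*": 1, "/": 1, "+": 0, "-": 0}

def infix_to_postfix(expression):
    """Convert an infix string with single-character operands to postfix."""
    output = []
    stack = []
    for symbol in expression:
        if symbol == " ":
            continue
        if symbol.isalnum():
            output.append(symbol)                 # operand: straight to output
        elif symbol == "(":
            stack.append(symbol)
        elif symbol == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())        # unload back to the "("
            if not stack:
                raise ValueError("unmatched ')'")
            stack.pop()                           # remove the "(" itself
        elif symbol in PRECEDENCE:
            # Output stacked operators until the new one is of higher precedence.
            while (stack and stack[-1] != "("
                   and PRECEDENCE[symbol] <= PRECEDENCE[stack[-1]]):
                output.append(stack.pop())
            stack.append(symbol)
        else:
            raise ValueError(f"unexpected symbol {symbol!r}")
    while stack:                                  # end of string: empty the stack
        top = stack.pop()
        if top == "(":
            raise ValueError("unmatched '('")
        output.append(top)
    return "".join(output)
```

Calling infix_to_postfix("V+W^X*Y/(Z-1)") reproduces the final row of the table, VWX^Y*Z1-/+.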