COMPILER
Presented to: Sir Naeem
Presented by: Sahil (BSIT-2024-015)

COMPILER
• A compiler is a computer program that transforms source code written in a programming language (the source language) into another computer language (the target language), the latter often taking a binary form known as object code.
• It is typically used to create an executable program.

Cause
• Software for early computers was written in assembly language.
• The benefits of reusing software on different CPUs came to outweigh the cost of writing a compiler.
• The first real compilers were the FORTRAN compilers of the late 1950s, which took 18 person-years to build.

Structure of Compiler
Any compiler must perform two major tasks:
• Analysis of the source program
• Synthesis of a machine-language program

THE STRUCTURE OF A COMPILER (2)
Source Program (Character Stream)
-> Scanner -> Tokens
-> Parser -> Syntactic Structure
-> Semantic Routines -> Intermediate Representation
-> Optimizer
-> Code Generator -> Target machine code
Symbol and Attribute Tables (used by all phases of the compiler)

THE STRUCTURE OF A COMPILER (3)
Scanner
• The scanner begins the analysis of the source program by reading the input, character by character, and grouping characters into individual words and symbols (tokens).
Related topics:
• RE (Regular Expression)
• NFA (Non-deterministic Finite Automata)
• DFA (Deterministic Finite Automata)
• LEX

THE STRUCTURE OF A COMPILER (4)
Parser
• Given a formal syntax specification (typically a context-free grammar [CFG]), the parser reads tokens and groups them into units as specified by the productions of the CFG being used.
• As syntactic structure is recognized, the parser either calls the corresponding semantic routines directly or builds a syntax tree.
Related topics:
• CFG (Context-Free Grammar)
• BNF (Backus-Naur Form)
• GAA (Grammar Analysis Algorithms)
• LL, LR, SLR, LALR Parsers
• YACC

THE STRUCTURE OF A COMPILER (5)
Semantic Routines
• Perform two functions:
  • Check the static semantics of each construct
  • Do the actual translation
• The heart of a compiler
Related topics:
• Syntax Directed Translation
• Semantic Processing Techniques
• IR (Intermediate Representation)

THE STRUCTURE OF A COMPILER (6)
Optimizer
• The IR code generated by the semantic routines is analyzed and transformed into functionally equivalent but improved IR code.
• This phase can be very complex and slow.
• Examples: peephole optimization, loop optimization, register allocation, code scheduling
Related topics:
• Register and Temporary Management
• Peephole Optimization

THE STRUCTURE OF A COMPILER (7)
Code Generator
Related topics:
• Interpretive Code Generation
• Generating Code from Tree/DAG
• Grammar-Based Code Generator

THE STRUCTURE OF A COMPILER (8)
Scanner [Lexical Analyzer] -> Tokens
-> Parser [Syntax Analyzer] -> Parse tree
-> Semantic Process [Semantic Analyzer] -> Abstract Syntax Tree w/ Attributes
-> Code Generator [Intermediate Code Generator] -> Non-optimized Intermediate Code
-> Code Optimizer -> Optimized Intermediate Code
-> Code Generator -> Target machine code

Language Description
Identifier Rules
• An identifier can have a maximum length of 6.
• Identifiers are not case sensitive.
• An identifier can only contain alphanumeric characters (a-z, A-Z, 0-9) and the underscore (_).
• The first character of an identifier must be a letter (a-z, A-Z).
• Keywords cannot be used as identifiers.
• No special characters, such as semicolon, period, whitespace, slash or comma, are permitted in or as an identifier.

Data Types
Our language supports only 3 data types:
• Integer
• String
• Character

Expressions
1. Arithmetic operators (+, -, *, /, %)
2. Unary operator
3. Parentheses
4. Only integers supported
5. Relational expressions (>, <, >=, <=, ==, !=)
6. Character, string and integer constants

Statements
• Declaration statement: int a;
• Declaration and initialisation: int a=5;
• Assignment statement: a=6;

Conditional statement
Simple if (nesting not allowed)
  if then
  Endif
Switch statement (nesting not allowed)
  Switch()
  Cases
  Value 1: Break;
  Value n: break;
  Endcase

Repetition statement (nesting not allowed)
a. Repeat
   Until ()
b. While (relational expression)
   Endwhile
c. For = start value, end value, inc/dec
   ………
   Endfor

I/O statement
• Input ;
• Output ;

Program structure
  Declaration:
  Start
  End

Sample Program I
  #mode 10
  declaration
  int r
  int c
  int in
  int flg
  start
  r=0
  flg = 1
  while( flg == 1 )
    if( c == 0) then
      flg = 0
    endif
    c = c-1
  endwhile
  end

OUTPUT 1
  START:
  MOV AX, @DATA
  MOV DS, AX
  MOV AX,
  MOV r, AX
  MOV AX,
  MOV flg, AX
  LB01:
  MOV AX,
  CMP AX,
  JNE LB01
  MOV AX,
  CMP AX,
  JNE LB01
  MOV AX,
  MOV flg, AX
  LB02:
  MOV AX,
  SUB AX,
  MOV c, AX
  JMP LB01
  LB03:
  MOV AX, 4C00H
  INT 21H
  END START

Sample Program II
  #mode 10
  declaration
  int a ; b
  int i
  int k
  string mes1
  end
  start
  k=k*1
  if(i<9 )then
    i=i+9
    k=k*1
  endif
  i=i-45
  repeat
    i=i+9*k+b
    k=k*1
    output "Hello World"
    input k
  until(i<2 )
  while(k>3 )
    i=i+9
    k=k*1
  endwhile

OUTPUT
  START:
  MOV AX, @DATA
  MOV DS, AX
  MOV AX, k
  MUL 1
  MOV k, AX
  MOV AX, i
  CMP AX, 9
  JGE LB01
  MOV AX, i
  ADD AX, 9
  MOV i, AX
  MOV AX, k
  MUL 1
  MOV k, AX
  LB01:
  MOV AX, i
  SUB AX, 45
  MOV i, AX
  LB02:
  MOV AX, i
  ADD AX, 9
  MUL k
  ADD AX, b
  MOV i, AX
  MOV AX, k
  MUL 1
  MOV k, AX
  LEA DX, "Hello World"
  CALL MESSAGE
  CALL INDEC
  MOV k, AX
  MOV AX, i
  CMP AX, 2
  JGE LB01
  LB03:
  MOV AX,
  CMP AX, 3
  JLE LB01
  MOV AX, i
  ADD AX, 9
  MOV i, AX
  MOV AX, k
  MUL 1
  MOV k, AX
  JMP LB01
  MOV AX, i
  ADD AX, 9
  MOV i, AX
  MOV AX, k
  MUL 1
  MOV k, AX
  JMP LB01
  LB04:
  MOV AX, 4C00H
  INT 21H
  END START

SCREENSHOTS
[Figure: screenshots]

Feasibility and future scope
• With the growth of technology, ease of working is given priority.
• We have moved from C and C++ to Python, Ruby, etc., which require fewer lines of code.
• Our project can be extended into a new language that is easy to learn, faster, has more built-in features, and has many more qualities of a good programming language.

Conclusion
• In a compiler, intermediate code generation is independent of the target machine, and the conversion of intermediate code to target code is independent of the source language.
• Thus we have implemented the front end of the compilation process. It includes 3 phases of compilation:
  • lexical analysis
  • syntax analysis
  • semantic analysis
  followed by intermediate code generation.
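
As a rough illustration of the lexical-analysis phase, the sketch below scans one statement of the language described in these slides and applies the identifier rules (maximum length 6, not case sensitive, letter first, keywords reserved). It is written in Python for brevity and is not the project's actual scanner: the keyword set, the token names and the `scan` function are assumptions inferred from the statements shown in the slides.

```python
import re

# Keyword set ASSUMED from the statements shown in the slides;
# the project's real keyword table may differ.
KEYWORDS = {
    "declaration", "start", "end", "int", "string",
    "if", "then", "endif", "switch", "cases", "break", "endcase",
    "repeat", "until", "while", "endwhile", "for", "endfor",
    "input", "output",
}
MAX_IDENT_LEN = 6  # "An identifier can have a maximum length of 6."

# Token classes (names are hypothetical). Order matters: "==" must be
# tried before "=", and ERROR is the catch-all at the end.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("STRING", r'"[^"\n]*"'),
    ("WORD",   r"[A-Za-z][A-Za-z0-9_]*"),   # letter first, then a-z, 0-9, _
    ("RELOP",  r"==|!=|>=|<=|>|<"),
    ("ASSIGN", r"="),
    ("ARITH",  r"[+\-*/%]"),
    ("PUNCT",  r"[();:,]"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source):
    """Group the characters of `source` into (kind, text) tokens."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "WORD":
            text = text.lower()  # identifiers are not case sensitive
            if text in KEYWORDS:
                kind = "KEYWORD"
            elif len(text) > MAX_IDENT_LEN:
                raise SyntaxError(f"identifier {text!r} is longer than {MAX_IDENT_LEN}")
            else:
                kind = "IDENT"
        tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    # One line from Sample Program I.
    for token in scan("while( flg == 1 )"):
        print(token)
```

A single combined regular expression keeps the sketch short; a table-driven DFA, as generated by a tool such as LEX, would be the more conventional choice for a real scanner.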