Muhammad Abdullah Falak
BSCS 5TH C 13961
COMPILER CONSTRUCTION
MA’AM FARAH NAZ
ASSIGNMENT # 01

PROBLEM STATEMENT:
Explain the following:
• Language Translator
• Types of Language Translators
• Cousins of Compiler
• Phases of Compiler
• Overview of Compiler

SOLUTION:

1. Language Translator:
Computers are electronic devices that can only understand machine-level binary code (0/1 or on/off). Because it is extremely difficult to read and write programs in machine language, developers use human-readable high-level and assembly languages instead. To bridge that gap, a translator is used.
A translator is a programming language processor that converts a program from one language to another. It takes a program written in the source language (high-level or assembly language) and produces an equivalent program in low-level machine language, without changing what the code does. It can also detect and report errors during translation.

2. Types of Language Translators:
There are various types of translators, which are as follows −
• Compiler − A compiler is a program that translates a high-level language (for example, C, C++, and Java) into a low-level language (object program or machine program). The compiler performs this conversion in several phases: the character stream entered by the programmer passes through multiple stages of compilation, which finally produce the target language.
• Pre-Processor − A pre-processor is a program that processes the source code before it passes through the compiler. It operates under the control of what are referred to as pre-processor command lines or directives.
• Assembler − An assembler is a translator that translates an assembly language program into an equivalent machine language program for the computer.
Assembly language provides a friendlier representation than the computer's 0s and 1s, which simplifies writing and reading programs. An assembler reads a single assembly language source file and creates an object file containing machine instructions and the bookkeeping information needed to merge several object files into a program.
• Interpreter − An interpreter is a program that executes the programming code directly rather than only translating it into another format. It translates and executes programming language statements one by one.
• Macros − Many assembly languages support a “macro” facility whereby a macro statement is translated into a sequence of assembly language statements, and possibly other macro statements, before being translated into machine code. A macro facility is therefore a text-replacement facility.
• Linker − A linker is a computer program that connects and combines multiple object files to create an executable file. These files might have been compiled or assembled separately. The function of a linker is to find the modules and routines referenced in a program and to decide the memory locations where this code will be loaded, so that the program's instructions contain absolute references.
• Loader − The loader is a part of the operating system and is responsible for loading executable files into memory and executing them. It computes the size of a program (instructions and data) and allocates memory space for it. It creates a new address space for the program, large enough to hold the text and data segments along with a stack segment, and copies the instructions and data from the executable file into this address space. It also initializes several registers to start execution.

Cousins of Compiler:
1. Preprocessor
2. Assembler
3. Loader and Link-editor

Preprocessor
A preprocessor is a program that processes its input data to produce output that is used as input to another program.
The output is said to be a preprocessed form of the input data, which is often used by subsequent programs such as compilers. Preprocessors may perform the following functions:
1. Macro processing
2. File Inclusion
3. Rational Preprocessors
4. Language extension

1. Macro processing: A macro is a rule or pattern that specifies how a certain input sequence should be mapped to an output sequence according to a defined procedure. The mapping process that instantiates a macro into a specific output sequence is known as macro expansion.
2. File Inclusion: The preprocessor includes header files in the program text. When the preprocessor finds an #include directive, it replaces it with the entire content of the specified file.
3. Rational Preprocessors: These processors augment older languages with more modern flow-of-control and data-structuring facilities.
4. Language extension: These processors attempt to add capabilities to the language by what amounts to built-in macros. For example, the language Equel is a database query language embedded in C.

Assembler
The assembler creates object code by translating assembly instruction mnemonics into machine code. There are two types of assemblers:
· One-pass assemblers go through the source code once and assume that all symbols will be defined before any instruction that references them.
· Two-pass assemblers create a table of all symbols and their values in the first pass, and then use the table in a second pass to generate code.

Fig. 1.7 Translation of a statement

Linker and Loader
A linker or link editor is a program that takes one or more object files generated by a compiler and combines them into a single executable program. The three tasks of the linker are:
1. It searches the program to find the library routines used by the program, e.g. printf() and math routines.
2. It determines the memory locations that the code from each module will occupy and relocates its instructions by adjusting absolute references.
3. It resolves references among files.
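The three linker tasks above can be illustrated with a minimal sketch. The dictionary-based object-file format, the function name link, the module names, and the load address 0x1000 are all assumptions made for this illustration; real linkers work on binary formats such as ELF and are far more involved.

```python
# Toy linker. Each "object file" is a plain dict (a hypothetical format):
#   name    - module name
#   size    - number of bytes of code in the module
#   defines - symbol -> offset inside the module
#   refs    - list of (offset of the reference, symbol being referenced)

def link(modules, library, load_address=0x1000):
    modules = list(modules)

    # Task 1: search the library for routines that are used but not defined.
    defined = {sym for m in modules for sym in m["defines"]}
    i = 0
    while i < len(modules):
        for _, sym in modules[i]["refs"]:
            if sym not in defined and sym in library:
                modules.append(library[sym])
                defined.update(library[sym]["defines"])
        i += 1

    # Task 2: decide where each module will live in memory (relocation).
    base, address = {}, load_address
    for m in modules:
        base[m["name"]] = address
        address += m["size"]
    symbols = {sym: base[m["name"]] + offset
               for m in modules for sym, offset in m["defines"].items()}

    # Task 3: resolve references among files by recording, for every
    # reference site, the absolute address it must be patched with.
    patches = {}
    for m in modules:
        for offset, sym in m["refs"]:
            if sym not in symbols:
                raise KeyError(f"undefined symbol: {sym}")
            patches[base[m["name"]] + offset] = symbols[sym]
    return symbols, patches


main_o = {"name": "main.o", "size": 16, "defines": {"main": 0},
          "refs": [(4, "helper"), (8, "printf")]}
util_o = {"name": "util.o", "size": 8, "defines": {"helper": 0}, "refs": []}
library = {"printf": {"name": "libc_printf.o", "size": 32,
                      "defines": {"printf": 0}, "refs": []}}

symbols, patches = link([main_o, util_o], library)
print({name: hex(addr) for name, addr in symbols.items()})
print({hex(site): hex(target) for site, target in patches.items()})
```

In this sketch printf is pulled in from the library because main.o references it but no input module defines it, and every reference site ends up holding an absolute address.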
A loader is the part of an operating system that is responsible for loading programs into memory, one of the essential stages in the process of starting a program.

Phases of Compiler:
The 6 phases of a compiler are:
1. Lexical Analysis
2. Syntactic Analysis or Parsing
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation

1. Lexical Analysis:
Lexical analysis is the first phase of the compiler. This phase scans the source code and transforms the input program into a sequence of tokens. A token is a sequence of characters that forms a single unit of information in the source code, such as a keyword, an identifier, or an operator.
NOTE: In computer science, a program that performs lexical analysis is called a scanner, tokenizer, or lexer.
Roles and Responsibilities of Lexical Analyzer
• It is responsible for removing comments and white space from the source program.
• It identifies the tokens.
• It categorizes the lexical units.

2. Syntax Analysis:
Syntax analysis, also known as parsing, is the second phase of compilation. Here the input is checked against the grammar of the language: the phase analyses the syntactical structure and decides whether the given input is correct in terms of programming syntax. It accepts tokens as input and provides a parse tree as output.
Roles and Responsibilities of Syntax Analyzer
• Acquires tokens from the lexical analyzer.
• Checks the token stream for syntax errors, if any.
• Reports syntax errors.
• Builds the parse tree.

3. Semantic Analysis:
Semantic analysis is the third phase of compilation. It checks whether the parse tree follows the semantic rules of the language. It also keeps track of identifiers and expressions.
In simple words, the semantic analyzer determines the validity of the parse tree and produces an annotated syntax tree as output.
Roles and Responsibilities of Semantic Analyzer:
• Saves the collected information in symbol tables or syntax trees.
• Checks for semantic errors.
• Reports semantic errors.

4. Intermediate Code Generation:
Once the parse tree has been semantically verified, the intermediate code generator produces three-address code. The middle-level code that a compiler generates while translating a source program into object code is known as intermediate code.
Few Important Pointers:
• Intermediate code is neither high-level code nor machine code, but a middle-level code.
• This code can be translated to machine code later.
• This stage serves as a bridge from analysis to synthesis.
Roles and Responsibilities:
• Preserves the precedence ordering of the source language.
• Produces code that can later be translated into machine code.
• Holds the operands of each instruction.

5. Code Optimizer:
Code optimization is an optional phase. It is used to improve the intermediate code so that the resulting program runs faster and consumes less space. To improve the speed of the program, it eliminates unnecessary lines of code and rearranges the sequence of statements.
Roles and Responsibilities:
• Removes unused variables and unreachable code.
• Improves the run time and execution of the program.
• Produces streamlined code from the intermediate representation.

6. Code Generator:
Code generation is the final phase of the compilation process. It takes the optimized intermediate code as input and maps it to the target machine language.
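The intermediate code generation and optimization phases described above can be illustrated with a minimal sketch. The tuple-based expression tree, the function names to_three_address and optimize, and the temporary names t1, t2 are assumptions made for this illustration; a real compiler works on a full syntax tree and applies many more optimizations.

```python
import itertools
import re

def to_three_address(expr, target):
    """Flatten a nested (op, left, right) tuple into three-address code."""
    code = []
    temps = itertools.count(1)

    def emit(node):
        if not isinstance(node, tuple):       # leaf: variable name or constant
            return str(node)
        op, left, right = node
        left_name, right_name = emit(left), emit(right)
        temp = f"t{next(temps)}"              # one new temporary per operation
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(expr)
    code.append(f"{target} = {result}")
    return code

def optimize(code):
    """Tiny peephole pass: fold 'tN = a op b' followed by 'x = tN' into 'x = a op b'.

    Assumes the code came from to_three_address, so every temporary is used
    exactly once and no user variable is named like a temporary.
    """
    out = []
    for line in code:
        dest, rhs = line.split(" = ", 1)
        if out:
            prev_dest, prev_rhs = out[-1].split(" = ", 1)
            if rhs == prev_dest and re.fullmatch(r"t\d+", prev_dest):
                out[-1] = f"{dest} = {prev_rhs}"
                continue
        out.append(line)
    return out

# position = initial + rate * 60
tac = to_three_address(("+", "initial", ("*", "rate", 60)), "position")
print("\n".join(tac))
print("--- after optimization ---")
print("\n".join(optimize(tac)))
```

Each generated instruction has at most one operator and three addresses, and the optimizer removes the redundant copy through the last temporary.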
Roles and Responsibilities:
• Translates the intermediate code to the target machine code.
• Selects and allocates memory locations and registers.

Overview of Compiler
Computers are a balanced mix of software and hardware. Hardware is just a physical device whose functions are controlled by compatible software. Hardware understands instructions in the form of electronic charge, which is the counterpart of binary language in software programming. Binary language has only two symbols, 0 and 1. To instruct the hardware, code must be written in binary format, which is simply a series of 1s and 0s. Writing such code by hand would be a difficult and cumbersome task for programmers, which is why we have compilers to generate it.

Language Processing System
We have learnt that any computer system is made of hardware and software. The hardware understands a language that humans cannot understand, so we write programs in a high-level language, which is easier for us to understand and remember. These programs are then fed into a series of tools and OS components to get the desired code that can be used by the machine. This is known as a Language Processing System.
The high-level language is converted into binary language in various phases. A compiler is a program that converts high-level language to assembly language. Similarly, an assembler is a program that converts assembly language to machine-level language.
Let us first understand how a program, using a C compiler, is executed on a host machine.
• The user writes a program in the C language (a high-level language).
• The C compiler compiles the program and translates it into an assembly program (a low-level language).
• An assembler then translates the assembly program into machine code (an object file).
• A linker is used to link all the parts of the program together for execution (executable machine code).
• A loader loads all of them into memory, and then the program is executed.
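The steps above can be tried one stage at a time with a compiler driver such as GCC. The sketch below only builds the command lines. It assumes a source file named hello.c (a hypothetical name) and the helper name stage_commands is likewise an illustration. The options -E, -S, -c and -o are standard GCC options; the separate preprocessing stage is how GCC splits the work and is an addition to the list above.

```python
def stage_commands(stem, cc="gcc"):
    """Return one compiler-driver command per stage for the file '<stem>.c'."""
    return [
        [cc, "-E", f"{stem}.c", "-o", f"{stem}.i"],  # preprocess only
        [cc, "-S", f"{stem}.i", "-o", f"{stem}.s"],  # compile to assembly
        [cc, "-c", f"{stem}.s", "-o", f"{stem}.o"],  # assemble to an object file
        [cc, f"{stem}.o", "-o", stem],               # link into an executable
    ]

commands = stage_commands("hello")
for command in commands:
    print(" ".join(command))
```

If GCC is installed and hello.c exists, each list can be passed to subprocess.run(command, check=True) to run that stage. The last step in the list above, loading, is performed by the operating system when the resulting executable is started.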
Before diving straight into the concepts of compilers, we should understand a few other tools that work closely with compilers.

Preprocessor
A preprocessor, generally considered a part of the compiler, is a tool that produces input for compilers. It deals with macro-processing, augmentation, file inclusion, language extension, etc.

Interpreter
An interpreter, like a compiler, translates high-level language into low-level machine language. The difference lies in the way they read the source code or input. A compiler reads the whole source code at once, creates tokens, checks semantics, generates intermediate code, and translates the whole program, which may involve many passes. In contrast, an interpreter reads a statement from the input, converts it to an intermediate code, executes it, and then takes the next statement in sequence. If an error occurs, an interpreter stops execution and reports it, whereas a compiler reads the whole program even if it encounters several errors.

Assembler
An assembler translates assembly language programs into machine code. The output of an assembler is called an object file, which contains a combination of machine instructions as well as the data required to place these instructions in memory.

Linker
The linker is a computer program that links and merges various object files together in order to make an executable file. These files might have been compiled or assembled separately. The major task of a linker is to search for and locate the referenced modules and routines in a program and to determine the memory locations where this code will be loaded, so that the program's instructions have absolute references.

Loader
The loader is a part of the operating system and is responsible for loading executable files into memory and executing them. It calculates the size of a program (instructions and data) and creates memory space for it. It initializes various registers to initiate execution.
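The loader steps described above can be illustrated with a minimal sketch. The dictionary-based executable format, the function name load, the register names pc and sp, and the stack size of 64 bytes are all assumptions made for this illustration; a real loader handles binary file formats, virtual memory, and much more.

```python
def load(executable, stack_size=64):
    """Toy loader: size the program, allocate memory, copy it in, set registers."""
    text, data = executable["text"], executable["data"]

    # Calculate the size of the program (instructions and data).
    program_size = len(text) + len(data)

    # Create memory space for text, data, and a stack segment.
    memory = bytearray(program_size + stack_size)

    # Copy instructions and data from the executable into memory.
    memory[:len(text)] = text
    memory[len(text):program_size] = data

    # Initialize registers: program counter at the entry point,
    # stack pointer at the top of the allocated space.
    registers = {"pc": executable["entry"], "sp": len(memory)}
    return memory, registers


executable = {"text": bytes([0x01, 0x02, 0x03, 0x04]), "data": b"hi", "entry": 0}
memory, registers = load(executable)
print(len(memory), registers)
```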

Cross-compiler
A compiler that runs on one platform (A) and generates executable code for another platform (B) is called a cross-compiler.

Source-to-source Compiler
A compiler that takes the source code of one programming language and translates it into the source code of another programming language is called a source-to-source compiler.
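The idea of a source-to-source compiler can be illustrated with a minimal sketch that translates an infix arithmetic expression into Lisp-style prefix notation. It uses Python's standard ast module as the parser. The function name to_prefix and the choice of target notation are assumptions made for this illustration, and only the four basic arithmetic operators are supported.

```python
import ast

# Operators this toy translator understands.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_prefix(source):
    """Translate an infix arithmetic expression into prefix notation."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return f"({OPS[type(node.op)]} {walk(node.left)} {walk(node.right)})"
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return repr(node.value)
        raise ValueError(f"unsupported construct: {ast.dump(node)}")

    # Parse the source text into a syntax tree, then emit the target text.
    return walk(ast.parse(source, mode="eval").body)

print(to_prefix("initial + rate * 60"))   # prints (+ initial (* rate 60))
```

Both the input and the output are source text; no machine code is produced at any point, which is what distinguishes this kind of translator from an ordinary compiler.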