Chapter II: Assembler

Chapter goal:
- Introduce the fundamental functions that any assembler must perform:
  - Assign machine addresses
  - Translate mnemonic operation codes to machine language equivalents

Overview:
- Basic Assembler Functions
- Machine-Dependent Assembler Features
- Machine-Independent Assembler Features
- Assembler Design Options

Basic Assembler Functions (Using SIC as an Example)

Assembler directives:
- START: Specify name and starting address for the program
- END: …
- BYTE: Generate character or hexadecimal constant
- WORD: Generate one-word integer constant
- RESB: Reserve the indicated number of bytes for a data area
- RESW: …

[Figure: Example of a SIC Assembler Language Program]

A Simple SIC Assembler

The translation steps:
1. Convert mnemonic operation codes to their machine language equivalents.
2. Convert symbolic operands to their equivalent machine addresses.
3. Build the machine instructions in the proper format.
4. Convert the data constants specified in the source program into their internal machine representations.
5. Write the object program and the assembly listing.

Output: the object program

[Figure: The object code for the above program]

The Format for Object Program

The object program will later be loaded into memory for execution.

Three types of records make up the object program format:
- Header: contains the program name, starting address, and length.
- Text: contains the translated instructions and data of the program.
- End: marks the end of the object program and specifies the address in the program where execution is to begin.

[Figure: The Format for Object Program (cont.)]
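As a rough illustration of the three record types listed above, the sketch below formats Header, Text, and End records as strings. The column layout (a 6-character program name, 6-hex-digit addresses, a 2-hex-digit byte count, and at most 30 bytes per Text record) is an assumption modeled on the usual SIC textbook convention, because the format figure itself is not reproduced here. The function names and the sample values are hypothetical.

```python
from typing import List


def header_record(name: str, start: int, length: int) -> str:
    """Header: program name (padded or cut to 6 chars), start address, length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start: int, object_codes: List[str]) -> str:
    """Text: start address of this record, byte count, then the object code."""
    body = "".join(object_codes)
    nbytes = len(body) // 2  # two hex digits per byte
    if nbytes > 30:  # assumed per-record limit
        raise ValueError("a Text record holds at most 30 bytes in this sketch")
    return f"T{start:06X}{nbytes:02X}{body}"


def end_record(first_exec: int) -> str:
    """End: address of the first instruction to execute."""
    return f"E{first_exec:06X}"


if __name__ == "__main__":
    # Illustrative values only.
    print(header_record("COPY", 0x1000, 0x107A))
    print(text_record(0x1000, ["141033", "482039", "001036"]))
    print(end_record(0x1000))
```

Keeping each record type in its own small function mirrors the way Pass 2 would emit them: one Header first, Text records as object code accumulates, and a single End record last.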
[Figure: The object program]

[Figure: Two Passes of our Simple Assembler]

The Data Structures

Two major data structures:
- Operation code table (OPTAB)
- Symbol table (SYMTAB)

Note: SYMTAB is usually organized as a hash table for efficiency of insertion and retrieval.

Also needed: Location counter (LOCCTR)

[Figure: The Algorithm (Pass 1)]

[Figure: The Algorithm (Pass 2)]

Machine-Dependent Assembler Features (using SIC/XE as an example)

Addressing modes:
- Immediate addressing: COMP #0
- Indirect addressing: J @RETADR
- The extended instruction format: +LDT #4096

Most of the register-to-memory instructions are assembled using either program-counter relative or base relative addressing. If neither program-counter relative nor base relative addressing can be used, then the 4-byte extended format (Format 4) must be used.

[Figure: Example of a SIC/XE Assembler Language Program]

Output: the object program

[Figure: The object code for the above program]

Program Relocation

An object program that contains the information necessary to perform this kind of modification is called a relocatable program.

Program Relocation (cont.)

We can solve the relocation problem in the following way:
1. When the assembler generates the object code for the JSUB instruction we are considering, it inserts the address of RDREC relative to the start of the program. (This is the reason we initialized the location counter to 0 for the assembly.)
2. The assembler also produces a command for the loader, instructing it to add the beginning address of the program to the address field in the JSUB instruction at load time.
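The two relocation steps above can be sketched from the loader's side. This is a minimal sketch under stated assumptions: each relocation command is represented as a hypothetical pair (byte offset of the address field, field length in half-bytes), loosely modeled on SIC/XE Modification records, and the instruction bytes and load address are illustrative values rather than output taken from the figures.

```python
from typing import List, Tuple


def relocate(memory: bytearray, mods: List[Tuple[int, int]], load_addr: int) -> None:
    """Add load_addr to every address field named in mods, in place.

    Each entry is (offset, half_bytes): the field ends on a byte boundary
    and covers the low `half_bytes` hex digits of the bytes that start at
    `offset`. This layout is an assumption made for the sketch.
    """
    for offset, half_bytes in mods:
        nbytes = (half_bytes + 1) // 2
        raw = int.from_bytes(memory[offset:offset + nbytes], "big")
        mask = (1 << (4 * half_bytes)) - 1
        field = (raw & mask) + load_addr        # relative address + program start
        raw = (raw & ~mask) | (field & mask)    # keep any flag bits above the field
        memory[offset:offset + nbytes] = raw.to_bytes(nbytes, "big")


if __name__ == "__main__":
    # A 4-byte Format 4 instruction assembled as if the program started at 0.
    program = bytearray.fromhex("4B101036")
    # The 20-bit address field is the low 5 half-bytes starting at byte 1.
    relocate(program, [(1, 5)], load_addr=0x5000)
    print(program.hex().upper())
```

Because the assembler stored an address relative to 0, the loader only has to perform an addition; it does not need to know anything about the instruction itself.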
[Figure: Program Relocation (cont.)]

Machine-Independent Assembler Features
- Literals
- Symbol-Defining Statements
- Expressions
- Program Blocks
- Control Sections and Program Linking

Literal

It is often convenient for the programmer to be able to write the value of a constant operand as a part of the instruction that uses it. Such an operand is called a literal.

E.g., (In Fig 2.9)

Line  Loc   Label   Operation  Operand   Object code
45    001A  ENDFIL  LDA        =C'EOF'   032010
215   1062  WLOOP   TD         =X'05'    E32011

The difference between a literal and an immediate operand:
- With immediate addressing, the operand value is assembled as part of the machine instruction.
- With a literal, the assembler generates the specified value as a constant at some other memory location.

Literal (cont.)

Literal pools:
- Normally literals are placed into a pool at the end of the program.
- The assembly listing of a program containing literals usually includes a listing of this literal pool, which shows the assigned addresses and the generated data values.
- The assembler directive LTORG is used to create a literal pool at the point where the directive appears.

[Figure: Program demonstrating additional assembler features]

[Figure: The above program with object code]

Symbol-Defining Statements

Most assemblers provide an assembler directive that allows the programmer to define symbols and specify their values.
The assembler directive: EQU

E.g., symbol EQU value

Usage sample: the statement

        +LDT    #4096

can be written as

MAXLEN  EQU     4096
        +LDT    #MAXLEN

Symbol-Defining Statements (An example…)

STAB    RESB    1100
SYMBOL  EQU     STAB
VALUE   EQU     STAB+6
FLAGS   EQU     STAB+9

        LDA     VALUE,X

Expressions

Assemblers generally allow arithmetic expressions formed according to the normal rules using the operators +, -, *, and /.

E.g., MAXLEN EQU BUFEND-BUFFER

Program Blocks

The source programs so far logically contained subroutines, data areas, etc. However, they were handled by the assembler as one entity, resulting in a single block of object code.

Note: The term program blocks refers to segments of code that are rearranged within a single object program unit, and the term control sections refers to segments that are translated into independent object program units.

The assembler directive USE indicates which portions of the source program belong to the various blocks.

[Figure: Example of a program with multiple program blocks]

[Figure: The above program with object code]

Program Blocks
- Pass 1: Use a separate location counter for each program block.
- Pass 2: The assembler needs the address of each symbol relative to the start of the object program.

[Figure: The object program]

[Figure: The loading processes]

Control sections and program linking

A control section is a part of the program that maintains its identity after assembly; each such control section can be loaded and relocated independently of the others.

Note:
1. The assembler has no idea where any other control section will be loaded at execution time.
2. References between control sections are called external references.
Two assembler directives:
1. EXTDEF: defines the external symbols that may be used by other sections.
2. EXTREF: names the symbols that are used in this control section and defined elsewhere.

[Figure: Illustration of control sections and program linking]

[Figure: The above program with object code]

Control sections and program linking (cont.)

The two new record types are Define and Refer.

[Figure: The object program]

Assembler Design Options – One-pass Assembler

Main problem: forward references must be resolved.

Solution: Require that all data areas be defined in the source program before they are referenced. To reduce the size of the problem, many one-pass assemblers prohibit forward references to data items.

One-pass assemblers usually generate object code in memory for immediate execution. No object program is written out, and no loader is needed. Such an assembler is called a load-and-go assembler.

Assembler Design Options – One-pass Assembler (cont.)

- If an instruction operand is a symbol that has not yet been defined, the operand address is omitted when the instruction is assembled.
- The address of the operand field of the instruction that refers to the undefined symbol is added to a list of forward references associated with the symbol table entry.
- When the definition for a symbol is encountered, the forward reference list for that symbol is scanned, and the proper address is inserted into any instructions previously generated.

[Figure: Sample program for a one-pass assembler]
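The forward-reference handling described for the one-pass assembler can be sketched as follows. This is only an illustration under simplifying assumptions: memory is modeled as a dictionary from operand-field location to operand address, `None` stands for an omitted address, and the class name, method names, and sample addresses are all hypothetical.

```python
from typing import Dict, List, Optional


class OnePassSymtab:
    """Symbol table that keeps a forward-reference list per undefined symbol."""

    def __init__(self) -> None:
        self.defined: Dict[str, int] = {}        # symbol -> address
        self.pending: Dict[str, List[int]] = {}  # symbol -> operand-field locations

    def reference(self, memory: Dict[int, Optional[int]],
                  symbol: str, operand_loc: int) -> None:
        """Assemble an operand that names `symbol` at `operand_loc`."""
        if symbol in self.defined:
            memory[operand_loc] = self.defined[symbol]
        else:
            memory[operand_loc] = None  # address omitted for now
            self.pending.setdefault(symbol, []).append(operand_loc)

    def define(self, memory: Dict[int, Optional[int]],
               symbol: str, address: int) -> None:
        """Record the definition and patch every earlier reference."""
        self.defined[symbol] = address
        for loc in self.pending.pop(symbol, []):
            memory[loc] = address

    def unresolved(self) -> List[str]:
        """Symbols still undefined at the end of the pass."""
        return sorted(self.pending)


if __name__ == "__main__":
    memory: Dict[int, Optional[int]] = {}
    symtab = OnePassSymtab()
    symtab.reference(memory, "RDREC", 0x2013)  # forward reference
    symtab.reference(memory, "RDREC", 0x201F)  # another one
    symtab.define(memory, "RDREC", 0x203D)     # definition patches both
    print({hex(k): hex(v) for k, v in memory.items() if v is not None})
    print(symtab.unresolved())
```

Any symbol still reported by `unresolved()` when the source ends was referenced but never defined, which a real assembler would flag as an error.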
[Figure: Object code in memory and symbol table entries for above program (after scanning line 40)]

[Figure: Object code in memory and symbol table entries for above program (after scanning line 160)]

[Figure: Object program from one-pass assembler for above program]

Assembler Design Options – Multi-pass Assembler

[Figure: Example of multi-pass assembler operation]