Assembly Language
Part 2

Professor Jennifer Rexford
COS 217

Goals of Today’s Lecture
• Machine language
    Encoding the operation and the operands
    Simpler MIPS instruction set as an example
• More on IA32 assembly language
    Different sizes of data
    Example instructions
    Addressing modes
• Layout of assembly language program

Machine Language
Using MIPS Architecture as an Example
(since it has a simpler instruction set than IA32)

Three Levels of Languages
• High-level languages (e.g., Java and C)
    Easier programming by describing operations in a more natural, machine-independent notation
    Increased portability of the code
• Assembly language (e.g., IA32 and MIPS)
    Tied to the specifics of the underlying machine
    Mnemonic instructions and symbolic names make the code human readable
• Machine language
    Also tied to the specifics of the underlying machine
    In binary format the computer can read and execute
    Every instruction is a sequence of one or more numbers

Machine-Language Instructions
An ADD instruction (assembly):
    add r1 = r2 + r3
    “add” is the opcode; r1, r2, and r3 are the operands
Parts of the Instruction:
• Opcode (verb) – what operation to perform
• Operands (noun) – what to operate upon
• Source Operands – where values come from
• Destination Operand – where to deposit data values

Machine-Language Instruction
• Opcode
    What to do
• Source operand(s)
    Immediate (in the instruction itself)
    Register
    Memory location
    I/O port
• Destination operand
    Register
    Memory location
    I/O port
• Assembly syntax
    Opcode source1, [source2,] destination

MIPS Has Three Kinds of 32-bit Instructions
• R: Registers
    Two source registers (rs and rt)
    One destination register (rd)
    E.g., “rd = rs + rt” or “rd = rs & rt” or “rd = rs xor rt”

    op | rs | rt | rd | shamt | funct    (6, 5, 5, 5, 5, and 6 bits)
    op and funct: operation and specific variant
    shamt: shift amount

MIPS Has Three Kinds of 32-bit Instructions
• I: Immediate, transfer, branch
    One source register (rs) and one 16-bit constant (imm)
    One destination register (rd on these slides; the MIPS manuals call this field rt)
    E.g., “rd = rs + imm” or “rd = rs & imm”
    E.g., “rd = MEM[rs + imm]” (treating rs+imm as address)
    E.g., “jump
to address contained in rs” (rs as address)
    E.g., “jump to word imm if rs is 0” (i.e., change instruction pointer)

    op | rs | rd | address/immediate

MIPS Has Three Kinds of 32-bit Instructions
• J: Jump
    One 26-bit constant (imm), counted in 32-bit words, so it spans a 28-bit range of byte addresses
    E.g., “jump to word imm” (i.e., change the instruction pointer)

    op | target address

MIPS “Add” Instruction Encoding
Add registers 18 and 19, and store result in register 17.
add is an R inst

    op   rs   rt   rd   shamt   funct
    0    18   19   17   0       32

MIPS “Subtract” Instruction Encoding
Subtract register 19 from register 18 and store in register 17
sub is an R inst

    op   rs   rt   rd   shamt   funct
    0    18   19   17   0       34

Greater Detail on IA32 Assembly:
Instruction Set and Data Sizes

Earlier Example

    count=0;
    while (n>1) {
        count++;
        if (n&1)
            n = n*3+1;
        else
            n = n/2;
    }

    n is in %edx, count is in %ecx

            movl  $0, %ecx
    .loop:
            cmpl  $1, %edx
            jle   .endloop
            addl  $1, %ecx
            movl  %edx, %eax
            andl  $1, %eax
            je    .else
            movl  %edx, %eax
            addl  %eax, %edx
            addl  %eax, %edx
            addl  $1, %edx
            jmp   .endif
    .else:
            sarl  $1, %edx
    .endif:
            jmp   .loop
    .endloop:

Size of Variables
• Data types in high-level languages vary in size
    Character: 1 byte
    Short, int, and long: varies, depending on the computer
    Pointers: typically 4 bytes (on IA32)
    Struct: arbitrary size, depending on the elements
• Implications
    Need to be able to store and manipulate in multiple sizes
    Byte (1 byte), word (2 bytes), and long, or “extended” (4 bytes)
    Separate assembly-language instructions
    – e.g., addb, addw, addl
    Separate ways to access (parts of) a 4-byte register

Four-Byte Memory Words
[Figure: memory drawn as a column of 32-bit words, with bit positions 31-24, 23-16, 15-8, and 7-0 across the top and addresses running from 0 up to 2^32 - 1]
Bytes are numbered Byte 0 (at address 0) through Byte 7 across the first two words
Byte order is little endian

IA32 General Purpose Registers

    32-bit   16-bit   bits 15-8   bits 7-0
    EAX      AX       AH          AL
    EBX      BX       BH          BL
    ECX      CX       CH          CL
    EDX      DX       DH          DL
    ESI      SI
    EDI      DI

Arithmetic Instructions
• Simple instructions
    add{b,w,l} source, dest         dest = source + dest
    sub{b,w,l} source, dest         dest = dest - source
    inc{b,w,l} dest                 dest = dest + 1
    dec{b,w,l} dest                 dest = dest - 1
    neg{b,w,l} dest                 dest = -dest
    cmp{b,w,l} source1, source2     source2 - source1 (sets flags only)
• Multiply
    mul (unsigned) or imul (signed)
    mull %ebx     # edx:eax = eax * ebx
• Divide
    div (unsigned) or idiv (signed)
    idivl %ebx    # eax = edx:eax / ebx, edx = remainder
• Many more in Intel manual (volume 2)
    adc, sbb, decimal arithmetic instructions

Bitwise Logic Instructions
• Simple instructions
    and{b,w,l} source, dest     dest = source & dest
    or{b,w,l} source, dest      dest = source | dest
    xor{b,w,l} source, dest     dest = source ^ dest
    not{b,w,l} dest             dest = ~dest
    sal{b,w,l} source, dest     dest = dest << source (arithmetic)
    sar{b,w,l} source, dest     dest = dest >> source (arithmetic)
• Many more in Intel Manual (volume 2)
    Logic shift
    Rotation shift
    Bit scan
    Bit test
    Byte set on conditions

Branch Instructions
• Conditional jump
    j{l,g,e,ne,...} target      if (condition) {eip = target}

    Comparison        Signed   Unsigned
    =                 e        e          “equal”
    ≠                 ne       ne         “not equal”
    >                 g        a          “greater, above”
    ≥                 ge       ae         “...-or-equal”
    <                 l        b          “less, below”
    ≤                 le       be         “...-or-equal”
    overflow/carry    o        c
    no ovf/carry      no       nc

• Unconditional jump
    jmp target
    jmp *register

Setting the EFLAGS Register
• Comparison cmpl compares two integers
    Done by subtracting the first number from the second
    – Discarding the results, but setting the eflags register
    Example:
    – cmpl $1, %edx     (computes %edx - 1)
    – jle .endloop      (looks at the sign, overflow, and zero flags)
• Logical operation andl computes a bitwise AND and sets the flags
    Example:
    – andl $1, %eax     (bit-wise AND of %eax with 1)
    – je .else          (looks at the zero flag)
• Unconditional branch jmp
    Example:
    – jmp .endif and jmp .loop
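
The flag checks behind cmpl/jle and andl/je can be sketched in C. This is only an illustrative model, not how the hardware is built: the struct and the function names (cmpl_model, andl_model, jle_taken, je_taken, flags_demo) are hypothetical, and only the zero, sign, and overflow flags are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the three EFLAGS bits that signed jumps consult. */
struct flags {
    bool zf;   /* zero flag: the result was 0 */
    bool sf;   /* sign flag: copy of the result's top bit */
    bool of;   /* overflow flag: the signed result did not fit in 32 bits */
};

/* Models "cmpl src, dst": computes dst - src and keeps only the flags. */
static struct flags cmpl_model(int32_t src, int32_t dst)
{
    uint32_t a = (uint32_t)dst;
    uint32_t b = (uint32_t)src;
    uint32_t r = a - b;                 /* wraps modulo 2^32 */
    struct flags f;
    f.zf = (r == 0);
    f.sf = (r >> 31) != 0;
    /* Signed overflow on a - b: a and b differ in sign, and r differs from a. */
    f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}

/* Models "andl src, dst": sets ZF and SF from the result and clears OF. */
static struct flags andl_model(int32_t src, int32_t dst)
{
    uint32_t r = (uint32_t)dst & (uint32_t)src;
    struct flags f = { r == 0, (r >> 31) != 0, false };
    return f;
}

/* jle: taken when the result is zero or "less" in the signed sense. */
static bool jle_taken(struct flags f) { return f.zf || (f.sf != f.of); }

/* je: taken when the zero flag is set. */
static bool je_taken(struct flags f)  { return f.zf; }

/* Checks the two examples from the slide with sample values of n. */
static void flags_demo(void)
{
    assert(jle_taken(cmpl_model(1, 1)));     /* n == 1: leave the loop */
    assert(!jle_taken(cmpl_model(1, 6)));    /* n == 6: stay in the loop */
    assert(je_taken(andl_model(1, 6)));      /* 6 is even: jump to .else */
    assert(!je_taken(andl_model(1, 7)));     /* 7 is odd: fall through */
}
```

The overflow flag matters only in corner cases: comparing the most negative 32-bit value against 1 leaves a positive-looking result, and jle is still taken because the sign and overflow flags disagree.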
EFLAGS Register & Condition Codes

    Bit(s)   Name   Meaning
    31-22           Reserved (set to 0)
    21       ID     Identification flag
    20       VIP    Virtual interrupt pending
    19       VIF    Virtual interrupt flag
    18       AC     Alignment check
    17       VM     Virtual 8086 mode
    16       RF     Resume flag
    15              0
    14       NT     Nested task flag
    13-12    IOPL   I/O privilege level
    11       OF     Overflow flag
    10       DF     Direction flag
    9        IF     Interrupt enable flag
    8        TF     Trap flag
    7        SF     Sign flag
    6        ZF     Zero flag
    5               0
    4        AF     Auxiliary carry flag or adjust flag
    3               0
    2        PF     Parity flag
    1               1
    0        CF     Carry flag

Data Transfer Instructions
• mov{b,w,l} source, dest
    General move instruction
• push{w,l} source
    pushl %ebx      # equivalent instructions
        subl $4, %esp
        movl %ebx, (%esp)
    [Figure: stack before and after the push, with esp moved 4 bytes lower]
• pop{w,l} dest
    popl %ebx       # equivalent instructions
        movl (%esp), %ebx
        addl $4, %esp
    [Figure: stack before and after the pop, with esp moved 4 bytes higher]
• Many more in Intel manual (volume 2)
    Type conversion, conditional move, exchange, compare and exchange, I/O port, string move, etc.

Greater Detail on IA32 Assembly:
Addressing Modes

Ways to Read and Write Data
• Processors have many ways to access data
    Known as “addressing modes”
• Two simplest ways (used in earlier example)
    Immediate addressing: movl $0, %ecx
    – Data embedded in the instruction
    – Initialize register ECX with zero
    Register addressing: movl %edx, %ecx
    – Data stored in a register
    – Copy value in register EDX into register ECX
• The others all deal with memory addresses
    To read and write data from main memory
    E.g., to get data from memory into a register
    E.g., to write data from a register back into memory

Direct vs.
Indirect Addressing
• Read or write from a particular memory location
    Essentially dereferencing a pointer
• Direct addressing: movl 2000, %ecx
    Address embedded in the instruction
    E.g., address 2000 corresponds to a global variable
    Load ECX register with the long located at address 2000
• Indirect addressing: movl (%eax), %ebx
    Address stored in a register
    E.g., EAX register is a pointer
    Load EBX register with long located at address in EAX

More Complex Addressing Modes
• Base pointer addressing: movl 4(%eax), %ebx
    Extends indirect addressing by allowing an offset
    E.g., add “4” to the register EAX to get the address
    Allows access to a particular field in a structure
    E.g., if “age” starts 4 bytes into a record
• Indexed addressing: movl 2000(,%ecx,1), %ebx
    Starts from a base address (e.g., 2000)
    Adds an offset from a register (e.g., ECX)
    With a multiplier of 1, 2, 4, or 8 (e.g., 1 to multiply by 1)
    Allows register to be index for byte, word, or long array

Effective Address

    Offset = Base + (Index * Scale) + Displacement

    Base:           eax, ebx, ecx, edx, esp, ebp, esi, edi
    Index:          eax, ebx, ecx, edx, ebp, esi, edi (esp cannot be an index)
    Scale:          1, 2, 4, 8
    Displacement:   none, 8-bit, 16-bit, 32-bit

• Displacement                              movl foo, %ebx
• Base                                      movl (%eax), %ebx
• Base + displacement                       movl foo(%eax), %ebx
                                            movl 1(%eax), %ebx
• (Index * scale) + displacement            movl (,%eax,4), %ebx
• Base + (index * scale) + displacement     movl foo(%edx,%eax,4),%ebx

Data Access Methods: Summary
• Immediate addressing: data stored in the instruction itself
    movl $10, %ecx
• Register addressing: data stored in a register
    movl %eax, %ecx
• Direct addressing: address stored in instruction
    movl 2000, %ecx
• Indirect addressing: address stored in a register
    movl (%eax), %ebx
• Base pointer addressing: includes an offset as well
    movl 4(%eax), %ebx
• Indexed addressing: instruction contains base address, and specifies an index register and a multiplier (1, 2, 4, or 8)
    movl 2000(,%ecx,1), %ebx

Layout of an Assembly Language Program

A Simple Assembly Program

        .section .data
        # pre-initialized
        # variables go here

        .section .bss
        # zero-initialized
        # variables go here

        .section .rodata
        # pre-initialized
        # constants go here

        .section .text
        .globl _start
_start:
        # Program starts executing
        # here

        # Body of the program goes
        # here

        # Program ends with an
        # “exit()” system call
        # to the operating system
        movl $1, %eax
        movl $0, %ebx
        int $0x80

Main Parts of the Program
• Break program into sections (.section)
    Data, BSS, RoData, and Text
• Starting the program
    Making _start a global (.globl _start)
    – Tells the assembler to export the symbol _start
    – … because the linker will need it
    Identifying the start of the program (_start)
    – Defines the value of the label _start

Main Parts of the Program
• Exiting the program
    Specifying the exit() system call (movl $1, %eax)
    – Linux expects the system call number in EAX register
    Specifying the status code (movl $0, %ebx)
    – Linux expects the status code in EBX register
    Trapping into the operating system (int $0x80)
    – Software interrupt 0x80 transfers control to the Linux kernel

Conclusions
• Machine code
    Binary representation of instructions
    What operation to do, and on what data
• IA32 instructions
    Manipulate bytes, words, or longs
    Numerous kinds of operations
    Wide variety of addressing modes
• Next time
    Calling functions, using the stack
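
As a recap of the addressing modes, the effective-address formula (base + index * scale + displacement) can be written as a small C function. This is a sketch under stated assumptions: the names effective_address and addressing_demo are hypothetical, an absent base or index is passed as 0, and the register values and the address 2000 for foo are made-up sample numbers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: computes a 32-bit effective address as
 * base + index * scale + displacement, wrapping modulo 2^32.
 * Pass 0 for a base or index register that the instruction omits.
 */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  uint32_t scale, int32_t disp)
{
    /* IA32 only allows these four scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;
}

/* Works through three addressing-mode examples with sample register values. */
static void addressing_demo(void)
{
    /* movl 4(%eax), %ebx with eax = 1000: base + displacement */
    assert(effective_address(1000, 0, 1, 4) == 1004);

    /* movl 2000(,%ecx,1), %ebx with ecx = 3: index * scale + displacement */
    assert(effective_address(0, 3, 1, 2000) == 2003);

    /* movl foo(%edx,%eax,4), %ebx with foo = 2000, edx = 16, eax = 3 */
    assert(effective_address(16, 3, 4, 2000) == 2028);
}
```

With a scale of 4, the index register counts array elements rather than bytes, which is why the last case suits an array of longs.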