CprE 381 Computer Organization and Assembly Level Programming, Fall 2013

Chapter 2
Instructions: Language of the Computer

Zhao Zhang
Iowa State University
Revised from original slides provided by MKP

§2.8 Supporting Procedures in Computer Hardware

Procedure/Function Calling

Steps required
1. Place parameters in registers
2. Transfer control to procedure
3. Acquire storage for procedure
4. Perform procedure's operations
5. Place result in register for caller
6. Return to place of call

Chapter 2 — Instructions: Language of the Computer — 2

Register Usage Review

- $a0 – $a3: arguments (registers 4 – 7)
- $v0, $v1: result values (registers 2 and 3)
- $t0 – $t9: temporaries
    - Can be overwritten by callee
- $s0 – $s7: saved
    - Must be saved/restored by callee
- $gp: global pointer for static data (register 28)
- $sp: stack pointer (register 29)
- $fp: frame pointer (register 30)
- $ra: return address (register 31)

Note: There are additional rules for floating point registers

Procedure Call Instructions

- Procedure call: jump and link
      jal ProcedureLabel
    - Address of following instruction put in $ra
    - Jumps to target address
- Procedure return: jump register
      jr $ra
    - Copies $ra to program counter
    - Can also be used for computed jumps, e.g., for case/switch statements

Leaf Procedure Example

C code:

    int leaf_example(int g, int h, int i, int j)
    {
        int f;
        f = (g + h) - (i + j);
        return f;
    }

- Arguments g, …, j in $a0, …, $a3
- f in $s0 (hence, need to save $s0 on stack)
- Result in $v0

Leaf Procedure Example

MIPS code:

    leaf_example:
        addi $sp, $sp, -4       # save $s0 on stack
        sw   $s0, 0($sp)
        add  $t0, $a0, $a1      # procedure body
        add  $t1, $a2, $a3
        sub  $s0, $t0, $t1
        add  $v0, $s0, $zero    # result
        lw   $s0, 0($sp)        # restore $s0
        addi $sp, $sp, 4
        jr   $ra                # return

Exercise

Write MIPS code for

    int add2(int x, int y)
    {
        return x + y;
    }


Exercise

First version, with stack frame

    # x in $a0, y in $a1, return in $v0
    add2:
        addi $sp, $sp, -4       # alloc frame
        sw   $s0, 0($sp)        # save $s0
        add  $s0, $a0, $a1      # tmp = x + y
        add  $v0, $s0, $zero    # $v0 = tmp
        lw   $s0, 0($sp)        # restore $s0
        addi $sp, $sp, 4        # release frame
        jr   $ra                # return

Exercise

Optimized version, w/o stack frame

    # x in $a0, y in $a1, return in $v0
    add2:
        add  $v0, $a0, $a1      # $v0 = x + y
        jr   $ra                # return

In this case, we have nothing to store in the stack frame.

Exercise

Write MIPS code for

    int max(int x, int y)
    {
        if (x > y)
            return x;
        else
            return y;
    }

Exercise

    # x in $a0, y in $a1, return in $v0
    max:
        slt  $t0, $a1, $a0      # y < x?
        beq  $t0, $zero, else   # no, do else
        add  $v0, $a0, $zero    # to return x
        jr   $ra                # return
    else:
        add  $v0, $a1, $zero    # to return y
        jr   $ra                # return

Non-Leaf Procedures

- Procedures that call other procedures
- For nested call, caller needs to save on the stack:
    - Its return address
    - Any arguments and temporaries needed after the call
- Restore from the stack after the call

Stack Frame Contents

- A complete stack frame may hold
    - Extra arguments exceeding $a0-$a3
    - Saved registers ($s0-$s7) that will be overwritten
    - Return address ($ra)
    - Local, automatic variables
- A non-leaf function must have a stack frame, because $ra has to be saved

Local Data on the Stack

- Local data allocated by callee
    - e.g., C automatic variables
- Procedure frame (activation record)
    - Used by some compilers to manage stack storage
- Our examples do not use $fp

Non-Leaf Procedure Example

Write MIPS code for

    int max3(int x, int y, int z)
    {
        return max(max(x, y), z);
    }

We have to use a procedure frame on the stack (stack frame).


Non-Leaf Procedure Example

    # x in $a0, y in $a1, z in $a2, ret in $v0
    max3:
        addi $sp, $sp, -8       # alloc stack frame
        sw   $ra, 4($sp)        # preserve $ra
        sw   $a2, 0($sp)        # preserve z
        jal  max                # call max(x, y)
        add  $a0, $v0, $zero    # $a0 = max(x, y)
        lw   $a1, 0($sp)        # $a1 = z
        jal  max                # 2nd call max(…)
        lw   $ra, 4($sp)        # restore $ra
        addi $sp, $sp, 8        # free stack frame
        jr   $ra                # return

Non-Leaf Procedure Example

Write MIPS code for

    int add3(int x, int y, int z)
    {
        return add2(add2(x, y), z);
    }

Non-Leaf Procedure Example

C code:

    int fact(int n)
    {
        if (n < 1)
            return 1;
        else
            return n * fact(n - 1);
    }

- Argument n in $a0
- Result in $v0

Non-Leaf Procedure Example

MIPS code:

    fact:
        addi $sp, $sp, -8       # adjust stack for 2 items
        sw   $ra, 4($sp)        # save return address
        sw   $a0, 0($sp)        # save argument
        slti $t0, $a0, 1        # test for n < 1
        beq  $t0, $zero, L1
        addi $v0, $zero, 1      # if so, result is 1
        addi $sp, $sp, 8        #   pop 2 items from stack
        jr   $ra                #   and return
    L1:
        addi $a0, $a0, -1       # else decrement n
        jal  fact               # recursive call
        lw   $a0, 0($sp)        # restore original n
        lw   $ra, 4($sp)        #   and return address
        addi $sp, $sp, 8        # pop 2 items from stack
        mul  $v0, $a0, $v0      # multiply to get result
        jr   $ra                # and return

Memory Layout

- Text: program code
- Static data: global variables
    - e.g., static variables in C, constant arrays and strings
    - $gp initialized to address allowing ±offsets into this segment
- Dynamic data: heap
    - e.g., malloc in C, new in Java
- Stack: automatic storage

§2.9 Communicating with People

Character Data

- Byte-encoded character sets
    - ASCII: 128 characters
        - 95 graphic, 33 control
    - Latin-1: 256 characters
        - ASCII, +96 more graphic characters
- Unicode: 32-bit character set
    - Used in Java, C++ wide characters, …
    - Most of the
world's alphabets, plus symbols
    - UTF-8, UTF-16: variable-length encodings

Byte/Halfword Operations

- Could use bitwise operations
- MIPS byte/halfword load/store
    - String processing is a common case

      lb  rt, offset(rs)
      lh  rt, offset(rs)
    - Sign extend to 32 bits in rt

      lbu rt, offset(rs)
      lhu rt, offset(rs)
    - Zero extend to 32 bits in rt

      sb  rt, offset(rs)
      sh  rt, offset(rs)
    - Store just rightmost byte/halfword

String Copy Example

C code (array-based version)
- Null-terminated string

    void strcpy(char x[], char y[])
    {
        int i = 0;
        while ((x[i] = y[i]) != '\0')
            i++;
    }

- Addresses of x, y in $a0, $a1
- i in $s0

String Copy Example

MIPS code:

    strcpy:
        addi $sp, $sp, -4       # adjust stack for 1 item
        sw   $s0, 0($sp)        # save $s0
        add  $s0, $zero, $zero  # i = 0
    L1:
        add  $t1, $s0, $a1      # addr of y[i] in $t1
        lbu  $t2, 0($t1)        # $t2 = y[i]
        add  $t3, $s0, $a0      # addr of x[i] in $t3
        sb   $t2, 0($t3)        # x[i] = y[i]
        beq  $t2, $zero, L2     # exit loop if y[i] == 0
        addi $s0, $s0, 1        # i = i + 1
        j    L1                 # next iteration of loop
    L2:
        lw   $s0, 0($sp)        # restore saved $s0
        addi $sp, $sp, 4        # pop 1 item from stack
        jr   $ra                # and return

String Copy Example

C code, pointer-based version

    void strcpy(char *x, char *y)
    {
        while ((*x++ = *y++) != '\0') { }
    }

A good optimizing compiler may generate the same, efficient code for both versions (see next).

Strcpy: Optimized Version

    # reg: x in $a0, y in $a1, *y in $t0
    strcpy:
    Loop:
        lbu  $t0, 0($a1)        # load *y
        sb   $t0, 0($a0)        # store to *x
        addi $a0, $a0, 1        # x++
        addi $a1, $a1, 1        # y++
        bne  $t0, $zero, Loop   # *y != 0?
        jr   $ra                # return

- 5 vs. 7 instructions in the loop
- 6 vs.
13 instructions in the function

§2.14 Arrays versus Pointers

Arrays vs. Pointers

- Array indexing involves
    - Multiplying index by element size
    - Adding to array base address
- Pointers correspond directly to memory addresses
    - Can avoid indexing complexity

Another Example

Clear an array, array access

    void clear1(int array[], int size)
    {
        int i;
        for (i = 0; i < size; i++) {
            array[i] = 0;
        }
    }

Array Access MIPS Code

    # array in $a0, size in $a1
    # The test sits at the bottom of the loop, so this assumes size > 0
    clear1:
        move $t0, $zero         # i = 0
    loop1:
        sll  $t1, $t0, 2        # $t1 = i * 4
        add  $t2, $a0, $t1      # $t2 = &array[i]
        sw   $zero, 0($t2)      # array[i] = 0
        addi $t0, $t0, 1        # i = i + 1
        slt  $t3, $t0, $a1      # $t3 = (i < size)
        bne  $t3, $zero, loop1  # if true, repeat
        jr   $ra                # return

Pointer Access

Clear an array, pointer access

    void clear2(int *array, int size)
    {
        int *p;
        for (p = array; p < array + size; p++) {
            *p = 0;
        }
    }

Pointer Access MIPS Code

    # array in $a0, size in $a1
    clear2:
        move $t0, $a0           # p = array
        sll  $t1, $a1, 2        # $t1 = size * 4
        add  $t2, $a0, $t1      # $t2 = &array[size]
        j    loop2_cond
    loop2:
        sw   $zero, 0($t0)      # *p = 0
        addi $t0, $t0, 4        # p++
    loop2_cond:
        slt  $t3, $t0, $t2      # p < &array[size]?
        bne  $t3, $zero, loop2  # if true, repeat
        jr   $ra                # return

Comparison of Array vs. Ptr

- Multiply "strength reduced" to shift
- Array version requires shift to be inside loop
    - Part of index calculation for incremented i
    - cf.
incrementing pointer
- Compiler can achieve same effect as manual use of pointers
    - Induction variable elimination
    - Better to make program clearer and safer

For-Loop Example

Calculate the sum of array

    int array_sum(int X[], int size)
    {
        int sum = 0;
        for (int i = 0; i < size; i++)
            sum += X[i];
        return sum;
    }

FOR Loop

Control and Data Flow Graph (figure): nodes Init-expr, For-body, Incr-expr, Cond; the edges out of Cond are labeled T and F.

Linear Code Layout:

    Init-expr
    Jump (to Test cond)
    For-body
    Incr-expr
    Test cond
    Branch if true (back to For-body)

(Optional: prologue and epilogue)

For-Loop MIPS Code

    # X in $a0, size in $a1, return in $v0
    array_sum:
        add  $v0, $zero, $zero      # sum = 0
        add  $t0, $zero, $zero      # i = 0
        j    for_cond
    for_loop:
        sll  $t1, $t0, 2            # $t1 = i*4
        add  $t1, $a0, $t1          # $t1 = &X[i]
        lw   $t1, 0($t1)            # $t1 = X[i]
        add  $v0, $v0, $t1          # sum += X[i]
        addi $t0, $t0, 1            # i++
    for_cond:
        slt  $t1, $t0, $a1          # i < size?
        bne  $t1, $zero, for_loop   # if true, repeat
        jr   $ra                    # return

For-Loop: Pointer Version

Calculate the sum of array

    int array_sum(int X[], int size)
    {
        int *p, sum = 0;
        for (p = X; p < &X[size]; p++)
            sum += *p;
        return sum;
    }

Again, do not write the pointer version for performance: a good compiler will take care of it.

Optimized MIPS Code

    # X in $a0, size in $a1, return in $v0
    array_sum:
        add  $v0, $zero, $zero      # sum = 0
        add  $t0, $a0, $zero        # p = X
        sll  $a1, $a1, 2            # $a1 = 4*size
        add  $a1, $a0, $a1          # $a1 = &X[size]
        j    for_cond
    for_loop:
        lw   $t1, 0($t0)            # $t1 = *p
        add  $v0, $v0, $t1          # sum += *p
        addi $t0, $t0, 4            # p++
    for_cond:
        slt  $t1, $t0, $a1          # p < &X[size]?
        bne  $t1, $zero, for_loop   # if true, repeat
        jr   $ra                    # return

§2.10 MIPS Addressing for 32-Bit Immediates and Addresses

32-bit Constants

- Most constants are small
    - 16-bit immediate is sufficient
- For the occasional 32-bit constant
      lui rt, constant
    - Copies 16-bit constant to left 16 bits of rt
    - Clears right 16 bits of rt to 0

    lui $s0, 61           0000 0000 0011 1101 0000 0000 0000 0000
    ori $s0, $s0, 2304    0000 0000 0011 1101 0000 1001 0000 0000

32-bit Constants

Translate C to MIPS

    f = 0x10203040;

    # Assume f in $s0
    lui  $s0, 0x1020
    ori  $s0, $s0, 0x3040

32-bit Constants

Load a big value in MIPS

    int *p = array;

    # assume p in $s0
    la   $s0, array

MIPS assembly supports the pseudo-instruction "la", equivalent to

    lui  $s0, upper_of_array
    ori  $s0, $s0, lower_of_array

The assembler decides the values of upper_of_array and lower_of_array.

Shift Instructions

    sll  rd, rt, shamt          # shift left logical
    Ex: sll $s0, $s0, 4         # sll by 4 bits

    0        --       rt       rd       shamt    0
    6 bits   5 bits   5 bits   5 bits   5 bits   6 bits

    sllv rd, rt, rs             # SLL variable
    Ex: sllv $s0, $s0, $t0      # sll by $t0 bits

    0        rs       rt       rd       0        4
    6 bits   5 bits   5 bits   5 bits   5 bits   6 bits

Source: textbook B-55, B-56

Shift Instructions

Other shift instructions

    srl  rd, rt, shamt          # shift right logical
    srlv rd, rt, rs             # SRL variable
    sra  rd, rt, shamt          # shift right arithmetic
    srav rd, rt, rs             # SRA variable

Branch Addressing

- Branch instructions specify
    - Opcode, two registers, target address
- Most branch targets are near branch
    - Forward or backward

    op       rs       rt       constant or address
    6 bits   5 bits   5 bits   16 bits

- PC-relative addressing
    - Target address = PC + offset × 4
    - PC already incremented by 4 by this time

Jump Addressing

- Jump
(j and jal) targets could be anywhere in text segment
    - Encode full address in instruction

    op       address
    6 bits   26 bits

- (Pseudo)Direct jump addressing
    - Target address = PC31…28 : (address × 4)

Target Addressing Example

- Loop code from earlier example
    - Assume Loop at location 80000

    Loop: sll  $t1, $s3, 2        80000    0    0   19    9    2    0
          add  $t1, $t1, $s6      80004    0    9   22    9    0   32
          lw   $t0, 0($t1)        80008   35    9    8         0
          bne  $t0, $s5, Exit     80012    5    8   21         2
          addi $s3, $s3, 1        80016    8   19   19         1
          j    Loop               80020    2          20000
    Exit: …                       80024

Branching Far Away

- If branch target is too far to encode with 16-bit offset, assembler rewrites the code
- Example

          beq $s0, $s1, L1
              ↓
          bne $s0, $s1, L2
          j   L1
      L2: …

Addressing Mode Summary
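
The two target-address formulas from the Branch Addressing and Jump Addressing slides can be checked against the Target Addressing Example with a few lines of C. This is a minimal sketch, not part of the original slides: the function names `branch_target` and `jump_target` are hypothetical, and the code assumes "PC" in both formulas means the address of the instruction after the branch or jump (the instruction's own address plus 4), as the Branch Addressing slide states.

```c
#include <assert.h>
#include <stdint.h>

/* PC-relative addressing (beq/bne):
 * target = (address of branch + 4) + sign-extended 16-bit offset * 4.
 * branch_target is a hypothetical helper name used for illustration. */
uint32_t branch_target(uint32_t branch_pc, int16_t offset)
{
    assert(branch_pc % 4 == 0);          /* MIPS instructions are word aligned */
    return branch_pc + 4 + (uint32_t)((int32_t)offset * 4);
}

/* (Pseudo)direct addressing (j/jal):
 * target = upper 4 bits of (address of jump + 4), followed by address * 4.
 * jump_target is a hypothetical helper name used for illustration. */
uint32_t jump_target(uint32_t jump_pc, uint32_t address26)
{
    assert(jump_pc % 4 == 0);            /* word aligned */
    assert(address26 < (1u << 26));      /* the address field is 26 bits wide */
    return ((jump_pc + 4) & 0xF0000000u) | (address26 << 2);
}
```

With the numbers from the example, `branch_target(80012, 2)` gives 80024 (the `Exit` label) and `jump_target(80020, 20000)` gives 80000 (the `Loop` label). A negative offset works the same way for backward branches.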