CprE 381 Computer Organization and Assembly Level Programming, Fall 2013
Exam 1 Review
Dr. Zhao Zhang
Iowa State University

What We Have Learned

Ch. 1: Computer Abstraction and Technology
- Technology trends
- CPU performance
  - Instruction count, CPI, and cycle time
- Processor power efficiency
- Processor manufacturing and cost

Chapter 1 — Computer Abstractions and Technology — 2

Question Styles and Coverage

- Short conceptual questions
- Calculation questions
  - Performance improvement (speedup)
  - Power rate and energy saving
  - CPU time, CPI, instruction count (IC), cycle time (CT)
      CPU time = # Cycles × CT = IC × CPI × CT
      Speedup = Old Time / New Time
- The coverage excludes manufacturing and cost

Question 1

A MIPS processor runs at 1.0 GHz, and for a given benchmark program its CPI is 1.5. A design optimization will improve the clock rate to 1.5 GHz and increase the CPI to 1.8. What is the speedup from the optimization?

- Instruction count remains the same
- Clock rate change: 1.5/1.0 = 1.5x
  - Cycle time improvement factor is 1.50x
- CPI change: 1.8/1.5 = 1.2x
  - Improvement factor is 1/1.2 = 0.83x (degradation)
- Overall performance improvement is 1.50 × 0.83 = 1.25x

Question 2

A processor spends 60% of its time on load/store instructions. A new design improves load/store performance by 2.0 times. What is the overall performance improvement?

- Amdahl's Law: Speedup = 1/((1-f) + f/s)
  - f: fraction of time that the optimization applies to
  - s: the improvement factor of the optimization
- Speedup = 1/(0.4 + 0.6/2.0) = 1/0.7 = 1.43

What We Have Learned

Ch. 2, Instructions: Language of the Computer
- Instruction set architecture
- MIPS binary instruction format
- Plus floating-point instructions

Question 3

Translate the following C statement into MIPS.
Variables f, g, h are global and located at 100($gp), 104($gp), and 108($gp), respectively.

    extern int f, g, h;
    f = g + 4 * h;

Try to predict how many instructions you will need.

Question 3

    # Load g, load h, multiply, add, store
    lw   $t0, 104($gp)    # load g
    lw   $t1, 108($gp)    # load h
    sll  $t1, $t1, 2      # 4*h
    add  $t0, $t0, $t1    # g+4*h
    sw   $t0, 100($gp)    # store f

Exam Strategy

- In your exam, write comments with the MIPS code
  - It helps you write the code
  - It helps the grader understand your code
- You may get more partial credit
  - In case your code is not 100% correct

Load and Store

- Three factors: address, size, and extension
  - Load/store word: lw, sw
  - Half word: lh, lhu, sh
  - Byte: lb, lbu, sb
  - Choose sign extension or zero extension when loading a half word or a byte
- Floating-point load and store
  - Single precision: lwc1, swc1
  - Double precision: ldc1, sdc1

Array Access

Load from an array element

    extern unsigned short X[];
    h = X[i];

Assume h in $s2, X in $s0, i in $s1.

    sll  $t0, $s1, 1      # $t0=i*2
    add  $t0, $s0, $t0    # $t0=&X[i]
    lhu  $s2, 0($t0)      # h=X[i]

Array Access

Store to an array element

    extern int Y[];
    Y[j] = g;

Assume g in $s2, Y in $s0, j in $s1.
    sll  $t0, $s1, 2      # $t0=j*4
    add  $t0, $s0, $t0    # $t0=&Y[j]
    sw   $s2, 0($t0)      # Y[j]=g

Array Access

Load and store floating-point numbers

    extern double X[], Y[];
    Y[i] = X[i];

Assume i in $s0, X in $a0, Y in $a1.

    sll  $t0, $s0, 3      # $t0=8*i
    add  $t1, $a0, $t0    # $t1=&X[i]
    ldc1 $f0, 0($t1)      # $f0:$f1=X[i]
    add  $t1, $a1, $t0    # $t1=&Y[i]
    sdc1 $f0, 0($t1)      # Y[i]=$f0:$f1

The byte offset 8*i stays in $t0 so that it can be added to both array bases.

16-bit and 32-bit Constants

Load a 16-bit immediate

    f = 0x1000;           // f in $s0

    addi $s0, $zero, 0x1000

Load a 32-bit immediate

    f = 0xFFFF1000;

    lui  $s0, 0xFFFF
    ori  $s0, $s0, 0x1000

Pointer Access

Pointer access

    int h, *p;

Assume h in $t0, p in $s0.

- h = *p;
      lw   $t0, 0($s0)    # h = *p
- *p = h;
      sw   $t0, 0($s0)    # *p = h

Branches

- Only two branches in the original MIPS
      beq rs, rt, label
      bne rs, rt, label
- Branch if true/non-zero
      bne rs, $zero, label
- Branch if false/zero
      beq rs, $zero, label

If-else Statement

Evaluate condition, branch if false

    if (a < 0)
        a = -a;

Assume a in $s0.

            slt  $t0, $s0, $zero      # a < 0?
            beq  $t0, $zero, endif    # false? skip
            sub  $s0, $zero, $s0      # a = -a
    endif:

If-else Structure

Evaluate condition, branch if false

    if (a > b)
        max = a;
    else
        max = b;

Assume max in $s2, a in $s0, b in $s1.

            slt  $t0, $s1, $s0        # b < a
            beq  $t0, $zero, else     # false?
            add  $s2, $s0, $zero      # max = a
            j    endif
    else:   add  $s2, $s1, $zero      # max = b
    endif:

FOR Loop

[Figure: control and data flow graph beside the linear code layout]

- Control and data flow graph nodes: init-expr, for-body, incr-expr, cond (T/F)
- Linear code layout, in order: init-expr, jump (to the test), for-body, incr-expr, test cond, branch if true (back to for-body)
- (Optional: prologue and epilogue)

Function with For-loop

Translate the following C function into MIPS

    short checksum(short X[], int N)
    {
        int i;
        short checksum = 0;

        for (i = 0; i < N; i++)
            checksum = checksum ^ X[i];
        return checksum;
    }

Function with For-loop

    # X=>$a0, N=>$a1, i=>$t0, checksum=>$v0
    checksum:
            addi $v0, $zero, 0        # checksum = 0
            addi $t0, $zero, 0        # i = 0
            j    loop_cond
    loop:
            sll  $t1, $t0, 1          # i*2
            add  $t1, $a0, $t1        # &X[i]
            lh   $t1, 0($t1)          # load X[i]
            xor  $v0, $v0, $t1        # checksum ^= X[i]
            addi $t0, $t0, 1          # i++
    loop_cond:
            slt  $t1, $t0, $a1        # i < N
            bne  $t1, $zero, loop     # loop
            jr   $ra

Leaf and Non-Leaf Functions

- A leaf function doesn't call another function
  - A stack frame is not necessary
  - Prefer to use temp registers (t-registers)
- A non-leaf function calls some other function(s)
  - Must use a stack frame; has to save $ra
  - Usually has to use saved registers (s-registers)

Non-Leaf Function

What is the size of the frame?
    extern short xor(short, short);

    short checksum(short X[], int N)
    {
        int i;
        short checksum = 0;

        for (i = 0; i < N; i++)
            checksum = xor(checksum, X[i]);
        return checksum;
    }

Non-Leaf Function

- X, N, i, and $ra must be preserved
- Need a stack frame of 16 bytes

    addi $sp, $sp, -16
    sw   $ra, 12($sp)         # for return address
    sw   $s2, 8($sp)
    sw   $s1, 4($sp)
    sw   $s0, 0($sp)

    add  $s0, $a0, $zero      # $s0 = X
    add  $s1, $a1, $zero      # $s1 = N
    addi $s2, $zero, 0        # i = 0

Non-Leaf Function

    …                         # function body

    lw   $s0, 0($sp)
    lw   $s1, 4($sp)
    lw   $s2, 8($sp)
    lw   $ra, 12($sp)
    addi $sp, $sp, 16
    jr   $ra

Register Name and Call Convention

    Name       Number   Use                                                     Preserved across call?
    $zero      0        Constant value 0                                        N/A
    $at        1        Assembler temporary                                     No
    $v0-$v1    2-3      Values for function results and expression evaluation   No
    $a0-$a3    4-7      Arguments                                               No
    $t0-$t7    8-15     Temporaries                                             No
    $s0-$s7    16-23    Saved temporaries                                       Yes
    $t8-$t9    24-25    Temporaries                                             No
    $k0-$k1    26-27    Reserved for OS kernel                                  No
    $gp        28       Global pointer                                          Yes
    $sp        29       Stack pointer                                           Yes
    $fp        30       Frame pointer                                           Yes
    $ra        31       Return address                                          Yes

MIPS Call Convention: FP

- The first two FP parameters are passed in registers
  - 1st parameter in $f12 or $f12:$f13
    - A double-precision parameter takes two registers
  - 2nd FP parameter in $f14 or $f14:$f15
  - Extra parameters go on the stack
- $f0 stores a single-precision FP return value
- $f0:$f1 stores a double-precision FP return value
- $f0-$f19 are FP temporary registers
- $f20-$f31 are FP saved temporary registers

FP Example: Call a Function

    extern double a, b, c;
    extern double max(double, double);
    c = max(a, b);

Assume a, b, c are assigned to 100($gp), 108($gp), and 116($gp).

    ldc1 $f12, 100($gp)       # $f12:$f13 = a
    ldc1 $f14, 108($gp)       # $f14:$f15 = b
    jal  max
    sdc1 $f0, 116($gp)        # c = $f0:$f1
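Returning to the two checksum functions translated above, a plain C reference can help when checking a hand-written MIPS translation against expected results. The sketch below is only an illustration: `xor16` is a hypothetical stand-in for the slide's external `xor(short, short)`, and the names `checksum_leaf` and `checksum_nonleaf` are chosen here to keep the two variants apart.

```c
/* Hypothetical stand-in for the slide's extern xor(short, short). */
short xor16(short a, short b)
{
    return (short)(a ^ b);
}

/* Leaf version: no calls inside the loop, so the MIPS code can keep
   i and checksum in $t0 and $v0 and needs no stack frame. */
short checksum_leaf(const short X[], int N)
{
    short checksum = 0;
    for (int i = 0; i < N; i++)
        checksum = (short)(checksum ^ X[i]);
    return checksum;
}

/* Non-leaf version: every iteration makes a call, so X, N, i, and the
   return address must survive it (the 16-byte frame on the slide). */
short checksum_nonleaf(const short X[], int N)
{
    short checksum = 0;
    for (int i = 0; i < N; i++)
        checksum = xor16(checksum, X[i]);
    return checksum;
}
```

Both versions should return the same value for any input; they differ only in what the MIPS translation has to save across the call.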
FP Instructions in MIPS

- Single-precision arithmetic
  - add.s, sub.s, mul.s, div.s
  - e.g., add.s $f0, $f1, $f6
- Double-precision arithmetic
  - add.d, sub.d, mul.d, div.d
  - e.g., mul.d $f4, $f4, $f6

Chapter 3 — Arithmetic for Computers — 29

FP Instructions in MIPS

- Single- and double-precision comparison
  - c.xx.s, c.xx.d (xx is eq, lt, le, …)
  - Sets or clears the FP condition-code bit
  - e.g., c.lt.s $f3, $f4
- Branch on FP condition code true or false
  - bc1t, bc1f
  - e.g., bc1t TargetLabel
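The compare-then-branch pattern above can be modeled in C to see how the single condition-code bit links the two instructions. This is a rough sketch under stated assumptions: `fp_cond`, `c_lt_d`, `bc1t`, `bc1f`, and `fp_max` are hypothetical helper names that mimic the instructions, not a real API, and `fp_max` is only one possible way a function like the earlier `max(a, b)` might be written.

```c
#include <stdbool.h>

/* Models the FP condition-code bit written by c.xx.s / c.xx.d. */
static bool fp_cond;

/* c.lt.d fs, ft: set the bit if fs < ft, clear it otherwise. */
void c_lt_d(double fs, double ft)
{
    fp_cond = (fs < ft);
}

/* bc1t branches when the bit is set; bc1f branches when it is clear. */
bool bc1t(void) { return fp_cond; }
bool bc1f(void) { return !fp_cond; }

/* max in compare-then-branch style:
       c.lt.d  a, b
       bc1t    return_b
   Falls through to return a when the branch is not taken. */
double fp_max(double a, double b)
{
    c_lt_d(a, b);
    if (bc1t())
        return b;
    return a;
}
```

The comparison itself produces no register result; the branch that follows reads only the condition bit, which is why the two instructions always appear as a pair.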
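As a closing check on the Chapter 1 calculation questions (Questions 1 and 2), the formulas CPU time = IC × CPI × CT and Amdahl's Law can be written as small C helpers. The function names are chosen here for illustration, and the instruction count passed in is arbitrary because it cancels in the speedup ratio.

```c
/* CPU time = IC * CPI * CT, where CT = 1 / clock rate (Hz). */
double cpu_time(double ic, double cpi, double clock_hz)
{
    return ic * cpi / clock_hz;
}

/* Speedup = old time / new time. */
double speedup(double old_time, double new_time)
{
    return old_time / new_time;
}

/* Amdahl's Law: f is the fraction of time the optimization applies to,
   s is the improvement factor of the optimization. */
double amdahl(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}
```

For Question 1, `speedup(cpu_time(ic, 1.5, 1.0e9), cpu_time(ic, 1.8, 1.5e9))` gives 1.25 for any positive `ic`; for Question 2, `amdahl(0.6, 2.0)` gives about 1.43.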