Load and Store - Iowa State University

CprE 381 Computer Organization and Assembly Level Programming, Fall 2013
Exam 1 Review
Dr. Zhao Zhang
Iowa State University
What We Have Learned

Ch. 1: Computer Abstractions and Technology
- Technology Trends
- CPU Performance
- Instruction count, CPI, and cycle time
- Processor power efficiency
- Processor manufacturing and cost
Chapter 1 — Computer Abstractions and Technology — 2
Question Styles and Coverage
- Short conceptual questions
- Calculation questions
  - Performance improvement (speedup)
  - Power rate and energy saving
  - CPU time, CPI, Instruction Count, Cycle Time
    CPU time = # Cycles × CT = IC × CPI × CT
    Speedup = Old Time / New Time
- The coverage excludes
  - Manufacturing and cost
Question 1

A MIPS processor runs at 1.0 GHz, and for a given benchmark program its CPI is 1.5. A design optimization will improve the clock rate to 1.5 GHz and increase the CPI to 1.8. What is the speedup from the optimization?
- Instruction count remains the same
- Clock rate change: 1.5/1.0 = 1.5x, so the cycle time improvement factor is 1.50x
- CPI change: 1.8/1.5 = 1.2x, so the improvement factor is 1/1.2 ≈ 0.83x (a degradation)
- Overall performance improvement is 1.50 × (1/1.2) = 1.25x
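As a quick cross-check of the arithmetic in Question 1, the following C sketch evaluates CPU time = IC × CPI × CT for both designs and takes the ratio. The function names and the instruction count of 1e9 are assumptions made for illustration only; the instruction count cancels out of the speedup, so any positive value gives the same result.

```c
#include <assert.h>

/* CPU time in seconds: IC x CPI x cycle time, where cycle time = 1 / clock rate. */
double cpu_time(double instr_count, double cpi, double clock_hz)
{
    assert(clock_hz > 0.0);
    return instr_count * cpi / clock_hz;
}

/* Speedup = old time / new time. */
double speedup(double old_time, double new_time)
{
    assert(new_time > 0.0);
    return old_time / new_time;
}

/* Question 1: 1.0 GHz at CPI 1.5 versus 1.5 GHz at CPI 1.8. */
double question1_speedup(void)
{
    const double ic = 1e9;  /* arbitrary placeholder; cancels in the ratio */
    double old_t = cpu_time(ic, 1.5, 1.0e9);
    double new_t = cpu_time(ic, 1.8, 1.5e9);
    return speedup(old_t, new_t);
}
```

Calling question1_speedup() should return a value very close to 1.25, matching the hand calculation.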
Question 2
A processor spends 60% of its time on load/store instructions. A new design improves load/store performance by 2.0 times. What is the overall performance improvement?
Amdahl’s Law: Speedup = 1/((1-f)+f/s)
- f: Fraction of time that the optimization applies to
- s: The improvement factor of the optimization
Speedup = 1/(0.4 + 0.6/2.0) = 1/0.7 ≈ 1.43
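Amdahl's Law as used in Question 2 fits in a one-line C function. This is a minimal sketch; the name amdahl_speedup is an assumption made for illustration, and the parameters f and s follow the definitions given with the formula.

```c
#include <assert.h>

/* Amdahl's Law: overall speedup when a fraction f of the original
 * execution time is improved by a factor s. */
double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

For Question 2, amdahl_speedup(0.6, 2.0) evaluates 1/0.7, about 1.43. Making s very large shows the ceiling: with f = 0.6 the speedup can never exceed 1/(1-0.6) = 2.5, no matter how fast loads and stores become.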
What We Have Learned

Ch. 2, Instructions: Language of the Computer
- Instruction set architecture
- MIPS binary instruction format
- Plus floating-point instructions
Question 3
Translate the following C statement into MIPS.
Variables f, g, h are global and located at
100($gp), 104($gp) and 108($gp), respectively.
extern int f, g, h;
f = g + 4 * h;
Try to predict how many instructions you will need.
Question 3
    # Load g, load h, multiply, add, store
    lw    $t0, 104($gp)      # load g
    lw    $t1, 108($gp)      # load h
    sll   $t1, $t1, 2        # 4*h
    add   $t0, $t0, $t1      # g+4*h
    sw    $t0, 100($gp)      # store f
Exam Strategy
In your exam, write comments with the MIPS code
- It helps you write the code
- It helps the grader understand your code
- You may get more partial credit in case your code is not 100% correct
Load and Store

Three factors: address, size and extension
- Load/store word: lw, sw
- Half word: lh, lhu, sh
- Byte: lb, lbu, sb
- Choose sign extension or zero extension when loading a half word or a byte
Floating-point load and store
- Single precision: lwc1, swc1
- Double precision: ldc1, sdc1
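The difference between sign extension (lb, lh) and zero extension (lbu, lhu) can be illustrated in C. The sketch below is only a model of what the loaded 32-bit register would hold; the function names are assumptions made for illustration, and the sign extension is written arithmetically so that it does not rely on implementation-defined integer conversions.

```c
#include <stdint.h>

/* Model of lb: widen a byte to 32 bits, copying bit 7 into the upper bits. */
int32_t sign_extend_byte(uint8_t b)
{
    return (int32_t)(b ^ 0x80) - 0x80;
}

/* Model of lbu: widen a byte to 32 bits, filling the upper bits with zeros. */
uint32_t zero_extend_byte(uint8_t b)
{
    return (uint32_t)b;
}

/* Model of lh: widen a half word to 32 bits, copying bit 15 upward. */
int32_t sign_extend_half(uint16_t h)
{
    return (int32_t)(h ^ 0x8000) - 0x8000;
}

/* Model of lhu: widen a half word to 32 bits, filling with zeros. */
uint32_t zero_extend_half(uint16_t h)
{
    return (uint32_t)h;
}
```

For example, the byte 0xFF becomes -1 under sign extension but 255 under zero extension, which is why a C `char`/`short` that is signed pairs with lb/lh and an unsigned one pairs with lbu/lhu.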
Array access

Load from an array element
extern unsigned short X[];
h = X[i];
Assume h in $s2, X in $s0, i in $s1.
    sll   $t0, $s1, 1        # $t0=i*2
    add   $t0, $s0, $t0      # $t0=&X[i]
    lhu   $s2, 0($t0)        # h=X[i]
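The address arithmetic behind the sll/add/lhu sequence can be spelled out in C using byte pointers. This is a sketch under the assumption that unsigned short is 2 bytes, as on MIPS; the helper names element_offset and load_ushort are hypothetical and only mirror the three instructions.

```c
#include <stddef.h>

/* Byte offset of element `index` when each element is 2^log2_size bytes:
 * the same computation as sll by log2_size. */
size_t element_offset(size_t index, unsigned log2_size)
{
    return index << log2_size;
}

/* Mirrors: sll (offset = i*2), add (address = X + offset), lhu (load). */
unsigned int load_ushort(const unsigned short *X, size_t i)
{
    const unsigned char *base = (const unsigned char *)X;   /* X as a byte address */
    size_t offset = element_offset(i, 1);                   /* i * 2 */
    const unsigned short *addr =
        (const unsigned short *)(base + offset);            /* &X[i] */
    return *addr;                                           /* zero-extended value */
}
```

The shift amount is log2 of the element size: 1 for short, 2 for int, and 3 for double.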
Array Access

Store to an array element
extern int Y[];
Y[j] = g;
Assume g in $s2, Y in $s0, j in $s1.
    sll   $t0, $s1, 2        # $t0=j*4
    add   $t0, $s0, $t0      # $t0=&Y[j]
    sw    $s2, 0($t0)        # Y[j]=g
Array Access

Load and store floating point numbers
extern double X[], Y[];
Y[i] = X[i];
Assume i in $s0, X in $a0, Y in $a1
    sll   $t0, $s0, 3        # $t0=8*i
    add   $t1, $a0, $t0      # $t1=&X[i]
    ldc1  $f0, 0($t1)        # $f0:$f1=X[i]
    add   $t1, $a1, $t0      # $t1=&Y[i]
    sdc1  $f0, 0($t1)        # Y[i]=$f0:$f1
16-bit and 32-bit Constants


Load a 16-bit immediate
f = 0x1000; // f in $s0
    addi  $s0, $zero, 0x1000
Load a 32-bit immediate
f = 0xFFFF1000;
    lui   $s0, 0xFFFF
    ori   $s0, $s0, 0x1000
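The lui/ori pair builds a 32-bit constant from two 16-bit halves. The C sketch below models that with shifts and bitwise OR; the function names are assumptions chosen to echo the instruction names, not part of any real API.

```c
#include <stdint.h>

/* Model of lui: put a 16-bit immediate in the upper half, zero the lower half. */
uint32_t lui(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* Model of ori: OR a register value with a zero-extended 16-bit immediate. */
uint32_t ori(uint32_t rs, uint16_t imm)
{
    return rs | (uint32_t)imm;
}

/* Split a 32-bit constant into halves and rebuild it with lui followed by ori. */
uint32_t load_const32(uint32_t value)
{
    uint16_t upper = (uint16_t)(value >> 16);
    uint16_t lower = (uint16_t)(value & 0xFFFFu);
    return ori(lui(upper), lower);
}
```

Because ori zero-extends its immediate, it leaves the upper half set by lui untouched. addi sign-extends its immediate, so a lower half with bit 15 set would disturb the upper half if addi were used in place of ori.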
Pointer Access

Pointer access
int h, *p;
Assume h in $t0, p in $s0.
h = *p;
    lw    $t0, 0($s0)        # h = *p
*p = h;
    sw    $t0, 0($s0)        # *p = h
Branches



Only two branches in the original MIPS
    beq   rs, rt, label
    bne   rs, rt, label
Branch if true/non-zero
    bne   rs, $zero, label
Branch if false/zero
    beq   rs, $zero, label
If-else Statement
Evaluate condition, branch if false
if (a < 0)
a = -a;
Assume a in $s0

    slt   $t0, $s0, $zero    # a < 0?
    beq   $t0, $zero, endif  # false? skip
    sub   $s0, $zero, $s0    # a = -a
endif:
If-else Structure

Evaluate condition, branch if false
if (a > b) max = a; else max = b;
Assume max in $s2, a in $s0, b in $s1
      slt   $t0, $s1, $s0      # b < a
      beq   $t0, $zero, else   # false?
      add   $s2, $s0, $zero    # max = a
      j     endif
else: add   $s2, $s1, $zero    # max = b
endif:
FOR Loop
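Control and Data Flow Graph: Init-expr, then Test cond; if true (T), run For-body and Incr-expr and go back to Test cond; if false (F), leave the loop.
Linear Code Layout:
    Init-expr
    Jump (to Cond)
    For-body
    Incr-expr
    Cond
    Branch if true (to For-body)
(Optional: prologue and epilogue)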
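The linear code layout for a for-loop (init, jump to the test, body, increment, test, branch back if true) can be mimicked in C with goto, which makes the mapping to assembly labels easier to see. This is an illustrative sketch only; the function sum_below and its labels are hypothetical and simply sum the integers 0 through n-1.

```c
/* Equivalent to: for (i = 0; i < n; i++) sum += i; */
int sum_below(int n)
{
    int i;
    int sum = 0;

    i = 0;              /* Init-expr */
    goto cond;          /* Jump to the test so a zero-trip loop skips the body */
body:
    sum += i;           /* For-body */
    i++;                /* Incr-expr */
cond:
    if (i < n)          /* Test cond */
        goto body;      /* Branch if true */
    return sum;
}
```

Placing the test at the bottom means each iteration executes a single conditional branch, at the cost of one extra jump before the first iteration.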
Function with For-loop
Translate the following C function into MIPS
short checksum(short X[], int N)
{
    int i;
    short checksum = 0;
    for (i = 0; i < N; i++)
        checksum = checksum ^ X[i];
    return checksum;
}
Function with For-loop
# X=>$a0, N=>$a1, i=>$t0,
# checksum=>$v0
checksum:
    addi  $v0, $zero, 0      # checksum = 0
    addi  $t0, $zero, 0      # i = 0
    j     loop_cond
loop:
    sll   $t1, $t0, 1        # i*2
    add   $t1, $a0, $t1      # &X[i]
    lh    $t1, 0($t1)        # load X[i]
    xor   $v0, $v0, $t1      # checksum ^= X[i]
    addi  $t0, $t0, 1        # i++
loop_cond:
    slt   $t1, $t0, $a1      # i < N
    bne   $t1, $zero, loop   # loop
    jr    $ra
Leaf and Non-Leaf Functions

A leaf function doesn’t call another function
- A stack frame is not necessary
- Prefer to use temporary registers (t-registers)
A non-leaf function calls some other function(s)
- Must use a stack frame and has to save $ra
- Usually has to use saved registers (s-registers)
Non-Leaf Function
What is the size of the frame?
extern short xor(short, short);
short checksum(short X[], int N)
{
    int i;
    short checksum = 0;
    for (i = 0; i < N; i++)
        checksum = xor(checksum, X[i]);
    return checksum;
}
Non-Leaf Function

X, N, i, and $ra must be preserved
Need a stack frame of 16 bytes
    addi  $sp, $sp, -16
    sw    $ra, 12($sp)       # for return address
    sw    $s2, 8($sp)
    sw    $s1, 4($sp)
    sw    $s0, 0($sp)
    add   $s0, $a0, $zero    # $s0 = X
    add   $s1, $a1, $zero    # $s1 = N
    addi  $s2, $zero, 0      # i = 0
Non-Leaf Function
    …                        # function body
    lw    $s0, 0($sp)
    lw    $s1, 4($sp)
    lw    $s2, 8($sp)
    lw    $ra, 12($sp)
    addi  $sp, $sp, 16
    jr    $ra
Register Name and Call Convention
Name       Number   Use                                                     Preserved?
$zero      0        Constant value 0                                        N/A
$at        1        Assembler temporary                                     No
$v0-$v1    2-3      Values for function results and expression evaluation   No
$a0-$a3    4-7      Arguments                                               No
$t0-$t7    8-15     Temporaries                                             No
$s0-$s7    16-23    Saved temporaries                                       Yes
$t8-$t9    24-25    Temporaries                                             No
$k0-$k1    26-27    Saved for OS kernel                                     No
$gp        28       Global pointer                                          Yes
$sp        29       Stack pointer                                           Yes
$fp        30       Frame pointer                                           Yes
$ra        31       Return address                                          Yes
MIPS Call Convention: FP

The first two FP parameters are passed in registers
- 1st FP parameter in $f12 or $f12:$f13
- A double-precision parameter takes two registers
- 2nd FP parameter in $f14 or $f14:$f15
- Extra parameters go on the stack
$f0 stores a single-precision FP return value
$f0:$f1 stores a double-precision FP return value
$f0-$f19 are FP temporary registers
$f20-$f31 are FP saved temporary registers
FP Example: Call a Function
extern double a, b, c;
extern double max(double, double);
c = max(a, b);
    ldc1  $f12, 100($gp)     # $f12:$f13 = a
    ldc1  $f14, 108($gp)     # $f14:$f15 = b
    jal   max
    sdc1  $f0, 116($gp)      # c = $f0:$f1
Assume a, b, c assigned to 100($gp), 108($gp), and 116($gp)
FP Instructions in MIPS


Single-precision arithmetic
- add.s, sub.s, mul.s, div.s
- e.g., add.s $f0, $f1, $f6
Double-precision arithmetic
- add.d, sub.d, mul.d, div.d
- e.g., mul.d $f4, $f4, $f6
Chapter 3 — Arithmetic for Computers — 29
FP Instructions in MIPS


Single- and double-precision comparison
- c.xx.s, c.xx.d (xx is eq, lt, le, …)
- Sets or clears FP condition-code bit
- e.g., c.lt.s $f3, $f4
Branch on FP condition code true or false
- bc1t, bc1f
- e.g., bc1t TargetLabel