
EEL 5708
High Performance Computer Architecture
Lecture 3
Review: Instruction Sets
Sept 1, 2004
Lotzi Bölöni
Fall 2004
Fall 2004
EEL5708/Bölöni
Lec 3.1
Acknowledgements
• All the lecture slides were adapted from the slides of David Patterson (1998, 2001) and David E. Culler (2001), Copyright 1998-2002, University of California Berkeley
Review: Instruction sets
The Instruction Set: a Critical Interface
[Figure: the instruction set as the interface layer between software (above) and hardware (below)]
Levels of Representation
High Level Language Program
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
        ↓ Compiler
Assembly Language Program
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
        ↓ Assembler
Machine Language Program
    0000 1010 1100 0101 1001 1111 0110 1000
    1100 0101 1010 0000 0110 1000 1111 1001
    1010 0000 0101 1100 1111 1001 1000 0110
    0101 1100 0000 1010 1000 0110 1001 1111
        ↓ Machine Interpretation
Control Signal Specification
    ALUOP[0:3] <= InstReg[9:11] & MASK
    …
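To make the compiler step concrete, here is a minimal sketch that runs the slide's three C statements and a toy model of the four compiled `lw`/`sw` instructions side by side. The function names (`swap_hll`, `swap_asm`, `lw`, `sw`) and the word-array memory model are hypothetical illustrations, and the sketch assumes `$2` already holds the byte address of `v[k]`, as the assembly implies.

```c
#include <stdint.h>

/* The slide's three high-level statements, wrapped in a function.
   The name swap_hll is illustrative, not from the slide. */
void swap_hll(int32_t v[], int k) {
    int32_t temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* Toy byte-addressed memory built from 32-bit words: byte address a lives
   in mem[a / 4]. Assumes non-negative, 4-byte-aligned addresses. */
static int32_t lw(const int32_t mem[], int32_t base, int32_t offset) {
    return mem[(base + offset) / 4];
}

static void sw(int32_t mem[], int32_t value, int32_t base, int32_t offset) {
    mem[(base + offset) / 4] = value;
}

/* The four compiled instructions; reg[] stands in for the register file.
   Assumes reg[2] already holds the byte address of v[k]. */
void swap_asm(int32_t mem[], int32_t reg[32]) {
    reg[15] = lw(mem, reg[2], 0); /* lw $15, 0($2): $15 = v[k]   */
    reg[16] = lw(mem, reg[2], 4); /* lw $16, 4($2): $16 = v[k+1] */
    sw(mem, reg[16], reg[2], 0);  /* sw $16, 0($2): v[k] = $16   */
    sw(mem, reg[15], reg[2], 4);  /* sw $15, 4($2): v[k+1] = $15 */
}
```

With the array at byte address 0 and k = 1, setting `reg[2] = 4` before calling `swap_asm` leaves the array in the same state as `swap_hll(v, 1)`. The offsets 0 and 4 reflect 4-byte words in a byte-addressed memory.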
Instruction Set Architecture
... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.
– Amdahl, Blaauw, and Brooks, 1964
Aspects of the ISA visible to software:
-- Organization of Programmable Storage
-- Data Types & Data Structures: Encodings & Representations
-- Instruction Formats
-- Instruction (or Operation Code) Set
-- Modes of Addressing and Accessing Data Items and Instructions
-- Exceptional Conditions
Review: MIPS R3000 (core)
Programmable storage
    Registers: r0, r1, …, r31, PC, lo, hi
    2^32 bytes of memory
    31 x 32-bit GPRs (R0 = 0)
    32 x 32-bit FP regs (paired for double precision)
    HI, LO, PC
Data types? Format? Addressing Modes?
Arithmetic logical
    Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU,
    AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI
    SLL, SRL, SRA, SLLV, SRLV, SRAV
Memory Access
    LB, LBU, LH, LHU, LW, LWL, LWR
    SB, SH, SW, SWL, SWR
Control
    J, JAL, JR, JALR
    BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL
32-bit instructions on word boundary
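A minimal sketch of the integer programmable storage listed above, assuming a plain C struct as the model (the type and function names are hypothetical; FP registers are omitted). It shows two details the slide only hints at: r0 always reads as 0, and HI/LO exist to hold multiply and divide results. MULT is not in the slide's list, but on MIPS it writes the 64-bit signed product to HI (upper half) and LO (lower half).

```c
#include <stdint.h>

/* Hypothetical model of the R3000 integer state (FP registers omitted). */
typedef struct {
    uint32_t gpr[32];   /* r0..r31; r0 always reads as 0 */
    uint32_t hi, lo;    /* multiply/divide results */
    uint32_t pc;
} MipsState;

uint32_t reg_read(const MipsState *s, unsigned r) {
    return r == 0 ? 0u : s->gpr[r & 31u];
}

void reg_write(MipsState *s, unsigned r, uint32_t value) {
    if (r != 0) s->gpr[r & 31u] = value;   /* writes to r0 are discarded */
}

/* MULT rs, rt: signed 32 x 32 -> 64-bit product; upper half to HI,
   lower half to LO. */
void mult(MipsState *s, unsigned rs, unsigned rt) {
    int64_t a = (int32_t)reg_read(s, rs);
    int64_t b = (int32_t)reg_read(s, rt);
    uint64_t p = (uint64_t)(a * b);
    s->hi = (uint32_t)(p >> 32);
    s->lo = (uint32_t)p;
}
```

Hard-wiring r0 to zero is why the slide counts 31 usable general-purpose registers: a constant zero is always available as an operand without spending an instruction to create it.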
Review: Basic ISA Classes
Accumulator:
    1 address      add A          acc ← acc + mem[A]
    1+x address    addx A         acc ← acc + mem[A + x]
Stack:
    0 address      add            tos ← tos + next
General Purpose Register:
    2 address      add A B        EA(A) ← EA(A) + EA(B)
    3 address      add A B C      EA(A) ← EA(B) + EA(C)
Load/Store:
    3 address      add Ra Rb Rc   Ra ← Rb + Rc
                   load Ra Rb     Ra ← mem[Rb]
                   store Ra Rb    mem[Rb] ← Ra
Instruction Formats
Formats (figure): Variable …, Fixed, Hybrid
• Addressing modes
  – each operand requires an address specifier => variable format
• Code size => variable-length instructions
• Performance => fixed-length instructions
  – simple decoding, predictable operations
• With a load/store instruction architecture, only one memory address and few addressing modes
• => simple format, addressing mode given by opcode
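The decoding cost behind "performance => fixed length instructions" can be illustrated with a small sketch. It uses a hypothetical variable-length encoding that belongs to no real ISA: the low two bits of an instruction's first byte give its number of operand specifiers, and each specifier takes 2 bytes. Under that encoding, locating instruction n requires decoding every earlier length. With fixed 4-byte instructions, as in MIPS, the location is a single multiply.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variable-length encoding (not any real ISA): the low two
   bits of the first byte give the number of operand specifiers (0-3),
   and each specifier takes 2 bytes. */
size_t var_length(uint8_t first_byte) {
    return 1 + 2 * (size_t)(first_byte & 0x3u);
}

/* Finding where instruction n starts means decoding the length of every
   earlier instruction, one after another. */
size_t var_offset_of(const uint8_t *code, size_t n) {
    size_t off = 0;
    for (size_t i = 0; i < n; i++)
        off += var_length(code[off]);
    return off;
}

/* With fixed 4-byte instructions (as in MIPS), the offset is a multiply. */
size_t fixed_offset_of(size_t n) {
    return 4 * n;
}
```

In this encoding, an instruction with no operands fits in one byte, which is where the code-size advantage comes from. The sequential loop in `var_offset_of` is the price paid in decoding.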
MIPS Addressing Modes & Formats
• Simple addressing modes
• All instructions 32 bits wide
Register (direct):   op | rs | rt | rd      operands in registers
Immediate:           op | rs | rt | immed   operand in the instruction
Base+index:          op | rs | rt | immed   Memory[register + immed]
PC-relative:         op | rs | rt | immed   Memory[PC + immed]
• Register Indirect?
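A minimal sketch of how a decoder extracts these fields from a 32-bit word, using the standard MIPS bit positions: op 31-26, rs 25-21, rt 20-16, rd 15-11, shamt 10-6, funct 5-0, and immed 15-0, sign-extended. The struct and function names are hypothetical. One detail the slide simplifies: real MIPS branches scale the immediate by 4 and add it to PC + 4, not to PC itself.

```c
#include <stdint.h>

typedef struct {
    uint32_t op, rs, rt, rd, shamt, funct;
    int32_t imm;   /* sign-extended 16-bit immediate */
} MipsFields;

/* Every field sits at a fixed position, so decoding is shifts and masks. */
MipsFields decode(uint32_t insn) {
    MipsFields f;
    f.op    = (insn >> 26) & 0x3Fu;
    f.rs    = (insn >> 21) & 0x1Fu;
    f.rt    = (insn >> 16) & 0x1Fu;
    f.rd    = (insn >> 11) & 0x1Fu;
    f.shamt = (insn >> 6)  & 0x1Fu;
    f.funct = insn & 0x3Fu;
    f.imm   = (int32_t)(int16_t)(insn & 0xFFFFu);
    return f;
}

/* Base + displacement (the slide's "Base+index"): register + immed. */
uint32_t effective_address(const uint32_t reg[32], MipsFields f) {
    return reg[f.rs] + (uint32_t)f.imm;
}

/* PC-relative branch target: MIPS adds the word offset (immed x 4)
   to the address of the next instruction (PC + 4). */
uint32_t branch_target(uint32_t pc, MipsFields f) {
    return pc + 4 + ((uint32_t)f.imm << 2);
}
```

For example, `lw $16, 4($2)` encodes as 0x8C500004: op = 35, rs = 2, rt = 16, immed = 4. With $2 = 0x1000, the effective address is 0x1004.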
Execution Cycle
Instruction Fetch: Obtain instruction from program storage
Instruction Decode: Determine required actions and instruction size
Operand Fetch: Locate and obtain operand data
Execute: Compute result value or status
Result Store: Deposit results in storage for later use
Next Instruction: Determine successor instruction
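The execution cycle maps naturally onto an interpreter loop. Below is a minimal sketch for a hypothetical accumulator ISA, not MIPS. It assumes 16-bit instructions with a 4-bit opcode and a 12-bit address field, plus separate program and data arrays for simplicity. Comments mark where each step of the cycle happens.

```c
#include <stdint.h>

/* Hypothetical accumulator ISA: top 4 bits opcode, low 12 bits address. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3, OP_BNZ = 4 };

typedef struct {
    const uint16_t *prog;  /* program storage */
    int32_t *data;         /* data storage */
    uint32_t pc;
    int32_t acc;
} Cpu;

/* Runs until HALT (or an unknown opcode); returns instructions executed. */
unsigned run_cpu(Cpu *c) {
    unsigned executed = 0;
    for (;;) {
        uint16_t insn = c->prog[c->pc];     /* instruction fetch */
        unsigned op = insn >> 12;           /* instruction decode */
        unsigned addr = insn & 0x0FFFu;
        uint32_t next = c->pc + 1;          /* default successor: every instruction is one word */
        executed++;
        switch (op) {
        case OP_LOAD:  c->acc = c->data[addr]; break;           /* operand fetch */
        case OP_ADD:   c->acc = c->acc + c->data[addr]; break;  /* operand fetch + execute */
        case OP_STORE: c->data[addr] = c->acc; break;           /* result store */
        case OP_BNZ:   if (c->acc != 0) next = addr; break;     /* choose successor */
        default:       return executed;                         /* HALT */
        }
        c->pc = next;                       /* next instruction */
    }
}
```

A program such as {0x1000, 0x2001, 0x3002, 0x0000} (LOAD 0; ADD 1; STORE 2; HALT) computes data[2] = data[0] + data[1]. In this fixed-length ISA, decode never has to work out the instruction size; a variable-length ISA would have to compute `next` from the decoded length.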
Review: Measuring performance
Which is faster?
Plane               DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747          6.5 hours     610 mph    470          286,700
BAC/Sud Concorde    3 hours       1350 mph   132          178,200
(pmph = passenger-miles per hour = speed x passengers)
• Time to run the task (ExTime)
  – Execution time, response time, latency
• Tasks per day, hour, week, sec, ns … (Performance)
  – Throughput, bandwidth
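The throughput column is speed multiplied by passengers. The sketch below recomputes it from the table's figures and compares the two planes on that metric. The `Plane` struct and function names are illustrative.

```c
/* Figures come from the table above; the struct itself is illustrative. */
typedef struct {
    const char *name;
    double hours_dc_to_paris;   /* latency of one trip */
    double mph;
    int passengers;
} Plane;

/* Throughput in passenger-miles per hour: speed x passengers. */
double throughput_pmph(Plane p) {
    return p.mph * p.passengers;
}

/* How many times higher a's throughput is than b's. */
double throughput_ratio(Plane a, Plane b) {
    return throughput_pmph(a) / throughput_pmph(b);
}
```

For the 747 and the Concorde, the ratio is 286,700 / 178,200 ≈ 1.61. The 747 moves more passenger-miles per hour even though the Concorde finishes each trip in less than half the time, so the answer to "which is faster?" depends on whether you measure latency or throughput.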
Definitions
• Performance is in units of things per second
  – bigger is better
• If we are primarily concerned with response time:
$$\text{performance}(x) = \frac{1}{\text{execution\_time}(x)}$$
"X is n times faster than Y" means
$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution\_time}(Y)}{\text{Execution\_time}(X)}$$
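A minimal sketch of the two definitions, with hypothetical function names. Because performance is the reciprocal of execution time, the ratio of performances equals the inverted ratio of execution times, so `times_faster` can divide the times directly.

```c
/* Response-time view: performance(x) = 1 / execution_time(x). */
double performance(double execution_time) {
    return 1.0 / execution_time;
}

/* "X is n times faster than Y":
   n = Performance(X) / Performance(Y) = Execution_time(Y) / Execution_time(X). */
double times_faster(double execution_time_x, double execution_time_y) {
    return execution_time_y / execution_time_x;
}
```

Applied to the trip times on the previous slide, `times_faster(3.0, 6.5)` gives about 2.17: the Concorde is roughly 2.2 times faster than the 747 in response time.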
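Computer Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{inst count}} \times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Seconds}}{\text{Cycle}}}_{\text{cycle time}}$$
                Inst Count   CPI   Clock Rate
Program         X
Compiler        X            (X)
Inst. Set.      X            X
Organization                 X     X
Technology                         X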
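The CPU time equation is the product of its three factors. Here is a minimal sketch; the function names and the numbers in the note below are hypothetical. Seconds per cycle is the reciprocal of the clock rate.

```c
/* CPU time = (instructions / program) x (cycles / instruction)
            x (seconds / cycle). */
double cpu_time_seconds(double instruction_count, double cpi, double cycle_time_s) {
    return instruction_count * cpi * cycle_time_s;
}

/* Seconds per cycle is the reciprocal of the clock rate. */
double cycle_time_from_clock_rate(double clock_rate_hz) {
    return 1.0 / clock_rate_hz;
}
```

For instance, a hypothetical program of 10^6 instructions at CPI 1.5 on a 500 MHz clock (2 ns cycle) takes 10^6 x 1.5 x 2 ns = 3 ms. The table above lists which design levels can move each of the three inputs.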
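Cycles Per Instruction (Throughput)
"Average Cycles per Instruction"
$$\text{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Cycles}}{\text{Instruction Count}}$$
$$\text{CPU time} = \text{Cycle Time} \times \sum_{j=1}^{n} \text{CPI}_j \times I_j$$
$$\text{CPI} = \sum_{j=1}^{n} \text{CPI}_j \times F_j \qquad \text{where } F_j = \frac{I_j}{\text{Instruction Count}} \text{ (``Instruction Frequency'')}$$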
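The two summations can be computed directly from per-class data. The sketch below assumes each instruction class j is described by its CPI_j and its executed count I_j; the `InstrClass` type and the function names are hypothetical.

```c
#include <stddef.h>

/* One instruction class j: cycles per instruction (CPI_j) and the number
   of instructions of that class the program executes (I_j). */
typedef struct { double cpi; double count; } InstrClass;

/* CPU time = Cycle Time x sum_j CPI_j x I_j */
double cpu_time_from_classes(const InstrClass *c, size_t n, double cycle_time_s) {
    double cycles = 0.0;
    for (size_t j = 0; j < n; j++) cycles += c[j].cpi * c[j].count;
    return cycle_time_s * cycles;
}

/* CPI = sum_j CPI_j x F_j, where F_j = I_j / Instruction Count */
double average_cpi(const InstrClass *c, size_t n) {
    double total = 0.0;
    for (size_t j = 0; j < n; j++) total += c[j].count;
    double cpi = 0.0;
    for (size_t j = 0; j < n; j++) cpi += c[j].cpi * (c[j].count / total);
    return cpi;
}
```

With two hypothetical classes, 600 instructions at CPI 1 and 400 at CPI 2, the program takes 1,400 cycles for 1,000 instructions. The weighted sum 1 x 0.6 + 2 x 0.4 gives CPI = 1.4, matching Cycles / Instruction Count.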
Example: Calculating CPI bottom up
Base Machine (Reg / Reg)
Op       Freq   Cycles   CPI(i)   (% Time)
ALU      50%    1        .5       (33%)
Load     20%    2        .4       (27%)
Store    10%    2        .2       (13%)
Branch   20%    2        .4       (27%)
Total CPI                1.5
Typical mix of instruction types in program
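The bottom-up calculation multiplies each frequency by its cycle count to get CPI(i), sums those to get CPI, and divides each CPI(i) by the total to get its share of time. Here is a minimal sketch with hypothetical names; `print_mix` lays the results out like the table above.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct {
    const char *op;
    double freq;     /* fraction of executed instructions */
    double cycles;   /* cycles per instruction of this type */
} MixEntry;

/* CPI = sum over types of freq x cycles (the CPI(i) column, summed). */
double mix_cpi(const MixEntry *mix, size_t n) {
    double cpi = 0.0;
    for (size_t i = 0; i < n; i++) cpi += mix[i].freq * mix[i].cycles;
    return cpi;
}

/* Fraction of execution time spent on type i: CPI(i) / total CPI. */
double time_share(const MixEntry *mix, size_t n, size_t i) {
    return mix[i].freq * mix[i].cycles / mix_cpi(mix, n);
}

void print_mix(const MixEntry *mix, size_t n) {
    printf("%-7s %5s %7s %7s %7s\n", "Op", "Freq", "Cycles", "CPI(i)", "%Time");
    for (size_t i = 0; i < n; i++)
        printf("%-7s %4.0f%% %7.0f %7.2f %6.0f%%\n", mix[i].op, 100 * mix[i].freq,
               mix[i].cycles, mix[i].freq * mix[i].cycles, 100 * time_share(mix, n, i));
    printf("CPI = %.2f\n", mix_cpi(mix, n));
}
```

Feeding in the base machine's mix ({"ALU", 0.5, 1}, {"Load", 0.2, 2}, {"Store", 0.1, 2}, {"Branch", 0.2, 2}) reproduces CPI = 1.5. ALU operations are half the instructions but take only a third of the time.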
Example: Branch Stall Impact
• Assume CPI = 1.0 ignoring branches (ideal)
• Assume the solution is to stall for 3 cycles on every branch
• If 30% of instructions are branches, the 3-cycle stall applies to that 30%
Op       Freq   Cycles   CPI(i)   (% Time)
Other    70%    1        .7       (37%)
Branch   30%    4        1.2      (63%)
• => new CPI = 1.9
• The new machine is 1/1.9 ≈ 0.53 times as fast as the ideal one (i.e., slower!)
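Because every instruction type has an ideal CPI of 1 here, the table collapses to a single formula: new CPI = ideal CPI + branch frequency x stall cycles = 1.0 + 0.3 x 3 = 1.9. The sketch below uses hypothetical function names. Its speed comparison assumes instruction count and clock rate stay the same, so relative speed is the ratio of CPIs.

```c
/* CPI with stalls = ideal CPI + (branch fraction x stall cycles per branch).
   Assumes branches would otherwise cost the ideal CPI, as in the table. */
double cpi_with_branch_stalls(double ideal_cpi, double branch_freq, double stall_cycles) {
    return ideal_cpi + branch_freq * stall_cycles;
}

/* Speed of the stalled machine relative to the ideal one, assuming the
   same instruction count and clock rate (below 1 means slower). */
double relative_speed(double ideal_cpi, double new_cpi) {
    return ideal_cpi / new_cpi;
}
```

With the slide's numbers, `relative_speed(1.0, 1.9)` is about 0.53, so branch stalls nearly halve performance.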