CS 152 Computer Architecture and Engineering Lecture 4 - Pipelining Krste Asanovic

Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152
Last time in Lecture 3
• Microcoding became less attractive as gap between
RAM and ROM speeds reduced
• Complex instruction sets difficult to pipeline, so
difficult to increase performance as gate count grew
• Iron-law explains architecture design space
– Trade instruction/program, cycles/instruction, and time/cycle
• Load-Store RISC ISAs designed for efficient
pipelined implementations
– Very similar to vertical microcode
– Inspired by earlier Cray machines
2/5/2008
CS152-Spring’08
2
“Iron Law” of Processor Performance
Time/Program = (Instructions/Program) * (Cycles/Instruction) * (Time/Cycle)
– Instructions per program depends on source code, compiler
technology, and ISA
– Cycles per instruction (CPI) depends upon the ISA and the
microarchitecture
– Time per cycle depends upon the microarchitecture and the
base technology
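A minimal sketch of the Iron Law as arithmetic. The instruction count, CPI values, and cycle times below are made-up numbers chosen only to show how the three factors trade off; they are not measurements from the course.

```python
def execution_time(instructions, cpi, cycle_time_ns):
    """Iron Law: time/program = instructions/program * cycles/instruction * time/cycle."""
    return instructions * cpi * cycle_time_ns

# Hypothetical machines running the same program (same instruction count).
single_cycle = execution_time(1_000_000, cpi=1.0, cycle_time_ns=10.0)
pipelined = execution_time(1_000_000, cpi=1.2, cycle_time_ns=2.5)

# The pipelined machine pays some CPI (bubbles) but wins on cycle time.
speedup = single_cycle / pipelined
print(f"single-cycle: {single_cycle:.0f} ns, pipelined: {pipelined:.0f} ns, speedup: {speedup:.2f}x")
```

With these assumed numbers the shorter cycle time outweighs the higher CPI.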
Microarchitecture           CPI   cycle time
Microcoded                  >1    short
Single-cycle unpipelined    1     long
Pipelined                   1     short
Covered in Lecture 3 and in this lecture.
An Ideal Pipeline
stage 1 -> stage 2 -> stage 3 -> stage 4
• All objects go through the same stages
• No sharing of resources between any two stages
• Propagation delay through all pipeline stages is equal
• The scheduling of an object entering the pipeline
is not affected by the objects in other stages
These conditions generally hold for industrial
assembly lines.
But can an instruction pipeline satisfy the last
condition?
Pipelined MIPS
To pipeline MIPS:
• First build MIPS without pipelining with CPI=1
• Next, add pipeline registers to reduce cycle time while
maintaining CPI=1
Pipelined Datapath
[Figure: datapath with PC and +4 adder, instruction memory, IR, register file (GPRs), immediate extender, ALU, and data memory]
Phases: fetch, decode & reg-fetch, execute, memory, write-back
Clock period can be reduced by dividing the execution of an
instruction into multiple cycles
tC > max {tIM, tRF, tALU, tDM, tRW} ( = tDM probably)
However, CPI will increase unless instructions are pipelined
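A small sketch of the clock-period bound above. The stage delays are hypothetical values in nanoseconds, and `register_overhead` is an assumed parameter for pipeline-register delay that the slide does not model.

```python
# Hypothetical stage delays in ns (assumptions, not from the slides).
stage_delay = {"tIM": 2.0, "tRF": 1.0, "tALU": 2.0, "tDM": 2.5, "tRW": 1.0}

def unpipelined_cycle(delays):
    # Single-cycle datapath: one clock must cover every phase in sequence.
    return sum(delays.values())

def pipelined_cycle(delays, register_overhead=0.0):
    # Multi-cycle or pipelined datapath: the clock must cover the slowest phase.
    return max(delays.values()) + register_overhead

print(unpipelined_cycle(stage_delay))     # sum of all phases
print(pipelined_cycle(stage_delay))       # slowest phase (tDM here)
print(pipelined_cycle(stage_delay, 0.5))  # with assumed register overhead
```

With these numbers the bound is set by tDM, matching the slide's guess.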
Technology Assumptions
• A small amount of very fast memory (caches)
backed up by a large, slower memory
• Fast ALU (at least for integers)
• Multiported Register files (slower!)
Thus, the following timing assumption is reasonable
tIM ≈ tRF ≈ tALU ≈ tDM ≈ tRW
A 5-stage pipelined Harvard architecture will be
the focus of our detailed design
5-Stage Pipelined Execution
[Figure: 5-stage pipelined datapath with stages I-Fetch (IF), Decode/Reg. Fetch (ID), Execute (EX), Memory (MA), Write-Back (WB)]
time          t0   t1   t2   t3   t4   t5   t6   t7   . . . .
instruction1  IF1  ID1  EX1  MA1  WB1
instruction2       IF2  ID2  EX2  MA2  WB2
instruction3            IF3  ID3  EX3  MA3  WB3
instruction4                 IF4  ID4  EX4  MA4  WB4
instruction5                      IF5  ID5  EX5  MA5  WB5
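The diagram above can be generated mechanically. This is a sketch for the ideal case only (no stalls or kills); `ideal_schedule` and `render` are illustrative helper names, not part of the course material.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]

def ideal_schedule(num_instructions):
    """Return {instruction number: {cycle: stage}} for a stall-free 5-stage pipeline."""
    schedule = {}
    for i in range(1, num_instructions + 1):
        # Instruction i enters IF at cycle i-1 and advances one stage per cycle.
        schedule[i] = {(i - 1) + s: STAGES[s] for s in range(len(STAGES))}
    return schedule

def render(schedule):
    """Format the schedule as one text row per instruction, one column per cycle."""
    last_cycle = max(max(row) for row in schedule.values())
    lines = []
    for i, row in schedule.items():
        cells = [f"{row[t]}{i}" if t in row else "" for t in range(last_cycle + 1)]
        lines.append(f"instruction{i}  " + " ".join(f"{c:<4}" for c in cells).rstrip())
    return "\n".join(lines)

print(render(ideal_schedule(5)))
```

In this ideal case N instructions finish in N + 4 cycles, so CPI approaches 1 for long programs.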
5-Stage Pipelined Execution
Resource Usage Diagram
[Figure: same 5-stage pipelined datapath, stages IF, ID, EX, MA, WB]
Resources  t0   t1   t2   t3   t4   t5   t6   t7   . . . .
IF         I1   I2   I3   I4   I5
ID              I1   I2   I3   I4   I5
EX                   I1   I2   I3   I4   I5
MA                        I1   I2   I3   I4   I5
WB                             I1   I2   I3   I4   I5
Pipelined Execution:
ALU Instructions
[Figure: pipelined datapath with pipeline registers IR, A, B, Y, R, MD1, MD2]
Not quite correct!
We need an Instruction Reg (IR) for each stage
Pipelined MIPS Datapath
without jumps
[Figure: pipelined MIPS datapath with stages F, D, E, M, W, an IR per stage, and control points RegDst, RegWrite, OpSel, MemWrite, ExtSel, WBSrc, BSrc]
Control Points Need to Be Connected
How Instructions can Interact with
each other in a pipeline
• An instruction in the pipeline may need a resource
being used by another instruction in the pipeline
→ structural hazard
• An instruction may depend on something produced
by an earlier instruction
– Dependence may be for a data value
→ data hazard
– Dependence may be for the next instruction’s address
→ control hazard (branches, exceptions)
Data Hazards
[Figure: pipelined datapath in which the second instruction reads r1 before the first instruction has written it back]
...
r1 ← r0 + 10
r4 ← r1 + 17
...
r1 is stale. Oops!
Resolving Data Hazards (1)
Strategy 1:
Wait for the result to be available by freezing
earlier pipeline stages → interlocks
Feedback to Resolve Hazards
[Figure: four-stage pipeline (stage 1 to stage 4) with feedback paths FB1 to FB4]
• Later stages provide dependence information to earlier stages,
which can stall (or kill) instructions
• Controlling a pipeline in this manner works provided
the instruction at stage i+1 can complete without
any interference from instructions in stages 1 to i
(otherwise deadlocks may occur)
Interlocks to resolve Data Hazards
Stall Condition
[Figure: pipelined datapath with a stall signal and a nop input to the execute-stage IR]
...
r1 ← r0 + 10
r4 ← r1 + 17
...
Stalled Stages and Pipeline Bubbles
time                 t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) r1 ← (r0) + 10  IF1  ID1  EX1  MA1  WB1
(I2) r4 ← (r1) + 17       IF2  ID2  ID2  ID2  ID2  EX2  MA2  WB2
(I3)                           IF3  IF3  IF3  IF3  ID3  EX3  MA3  WB3
(I4)                                               IF4  ID4  EX4  MA4  WB4
(I5)                                                    IF5  ID5  EX5  MA5  WB5
The repeated ID2 and IF3 entries are stalled stages.

Resource Usage
time  t0   t1   t2   t3   t4   t5   t6   t7   ....
IF    I1   I2   I3   I3   I3   I3   I4   I5
ID         I1   I2   I2   I2   I2   I3   I4   I5
EX              I1   nop  nop  nop  I2   I3   I4   I5
MA                   I1   nop  nop  nop  I2   I3   I4   I5
WB                        I1   nop  nop  nop  I2   I3   I4   I5
nop → pipeline bubble
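The stall pattern above can be reproduced with a tiny cycle-by-cycle model. This is a sketch under simplifying assumptions: no bypassing, no branches, instructions described only by (destination, sources) register numbers, and r0 never treated as a hazard. I3 to I5 in the example are hypothetical independent instructions.

```python
def simulate(program, num_cycles):
    """program: list of (dest, sources). Returns per-cycle stage occupancy (None = bubble)."""
    pc = 0                               # index of the instruction currently in IF
    id_, ex, ma, wb = None, None, None, None
    trace = []
    for _ in range(num_cycles):
        fetch = pc if pc < len(program) else None
        trace.append({"IF": fetch, "ID": id_, "EX": ex, "MA": ma, "WB": wb})
        # Interlock: stall if the decode-stage instruction reads a register
        # that an instruction in EX, MA, or WB has not yet written back.
        stall = False
        if id_ is not None:
            sources = program[id_][1]
            for older in (ex, ma, wb):
                if older is not None:
                    dest = program[older][0]
                    if dest != 0 and dest in sources:
                        stall = True
        # Advance the pipeline; on a stall IF and ID hold and EX receives a bubble.
        wb = ma
        ma = ex
        if stall:
            ex = None
        else:
            ex = id_
            id_ = fetch
            if fetch is not None:
                pc += 1
    return trace

# I1: r1 <- r0 + 10, I2: r4 <- r1 + 17, then three independent instructions.
program = [(1, (0,)), (4, (1,)), (5, (0,)), (6, (0,)), (7, (0,))]
trace = simulate(program, 8)
for t, row in enumerate(trace):
    print(f"t{t}", row)
```

Indices are zero-based, so instruction index 1 is I2; it sits in ID from t2 through t5 and reaches EX at t6, as in the diagram.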
Interlock Control Logic
[Figure: pipelined datapath with stall logic Cstall comparing rs and rt of the decode-stage instruction against ws of later stages]
Compare the source registers of the instruction in the decode
stage with the destination register of the uncommitted
instructions.
Interlock Control Logic
ignoring jumps & branches
[Figure: pipelined datapath with Cdest blocks producing ws and we for later stages, Cre producing re1 and re2 in decode, and Cstall combining them into stall]
Should we always stall if the rs field matches some rd?
not every instruction writes a register → we
not every instruction reads a register → re
Source & Destination Registers
R-type:  op  rs  rt  rd  func
I-type:  op  rs  rt  immediate16
J-type:  op  immediate26

instruction  semantics                          source(s)  destination
ALU          rd ← (rs) func (rt)                rs, rt     rd
ALUi         rt ← (rs) op imm                   rs         rt
LW           rt ← M[(rs) + imm]                 rs         rt
SW           M[(rs) + imm] ← (rt)               rs, rt
BZ           cond (rs)                          rs
             true: PC ← (PC) + imm
             false: PC ← (PC) + 4
J            PC ← (PC) + imm
JAL          r31 ← (PC), PC ← (PC) + imm                   31
JR           PC ← (rs)                          rs
JALR         r31 ← (PC), PC ← (rs)              rs         31
Deriving the Stall Signal
Cdest
ws = Case opcode
    ALU            → rd
    ALUi, LW       → rt
    JAL, JALR      → R31
we = Case opcode
    ALU, ALUi, LW  → (ws ≠ 0)
    JAL, JALR      → on
    ...            → off
Cre
re1 = Case opcode
    ALU, ALUi, LW, SW, BZ, JR, JALR  → on
    J, JAL                           → off
re2 = Case opcode
    ALU, SW  → on
    ...      → off
Cstall
stall = ((rsD = wsE).weE + (rsD = wsM).weM + (rsD = wsW).weW).re1D
      + ((rtD = wsE).weE + (rtD = wsM).weM + (rtD = wsW).weW).re2D
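The Cdest, Cre, and Cstall equations translate directly into boolean code. This is a sketch; the function names and the string opcode classes ("ALU", "ALUi", ...) are illustrative stand-ins for the slide's instruction classes, not a real decoder.

```python
def dest_and_we(opcode, rd=0, rt=0):
    """Cdest: return (ws, we) for an instruction of the given class."""
    if opcode == "ALU":
        return rd, rd != 0
    if opcode in ("ALUi", "LW"):
        return rt, rt != 0
    if opcode in ("JAL", "JALR"):
        return 31, True
    return 0, False

def read_enables(opcode):
    """Cre: return (re1, re2), i.e. whether rs and rt are actually read."""
    re1 = opcode in ("ALU", "ALUi", "LW", "SW", "BZ", "JR", "JALR")
    re2 = opcode in ("ALU", "SW")
    return re1, re2

def stall(rs_d, rt_d, opcode_d, older):
    """Cstall: older is [(ws, we)] for the instructions in E, M, and W."""
    re1, re2 = read_enables(opcode_d)
    rs_hit = any(we and rs_d == ws for ws, we in older)
    rt_hit = any(we and rt_d == ws for ws, we in older)
    return (rs_hit and re1) or (rt_hit and re2)

# r1 <- r0 + 10 is in E; M and W hold instructions that write nothing.
older = [dest_and_we("ALUi", rt=1), (0, False), (0, False)]
print(stall(1, 0, "ALUi", older))  # r4 <- r1 + 17 in decode reads r1
print(stall(2, 1, "ALUi", older))  # rt field is a destination here, not a source
```

The second call shows why re2 matters: an ALUi instruction names rt only as its destination, so a match on rt alone does not force a stall.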
Hazards due to Loads & Stores
Stall Condition
[Figure: pipelined datapath with a store followed by a load]
...
M[(r1)+7] ← (r2)
r4 ← M[(r3)+5]
...
What if (r1)+7 = (r3)+5 ?
Is there any possible data hazard
in this instruction sequence?
Load & Store Hazards
...
M[(r1)+7] ← (r2)
r4 ← M[(r3)+5]
...
(r1)+7 = (r3)+5 → data hazard
However, the hazard is avoided because our
memory system completes writes in one cycle!
Load/Store hazards are sometimes resolved in
the pipeline and sometimes in the memory
system itself.
More on this later in the course.
CS152 Administrivia
• Krste, office hours, Monday 1-3pm, 645 Soda
– email for alternate time
• Henry office hours, 511 Soda
– 9:30-10:30AM Mondays
– 2:00-3:00PM Fridays
• First lab and practice problems this evening
• In-class quiz dates:
– Q1: Tuesday February 19 (ISAs, microcode, simple pipelining)
– Q2: Tuesday March 4 (memory hierarchies)
Course Schedule
• Check website for schedule
• Generally, a lab, practice problem set, plus quiz
every two weeks
• Lab and practice problems out on Tuesday (this
evening for first)
• Lab overview in section next day (Wednesday)
• Practice problems due following Tuesday in class,
solutions available on line
• Problem review in section next day (Wednesday)
• Lab due in class Thursday
• Quiz following Tuesday, with new PS and lab.
Resolving Data Hazards (2)
Strategy 2:
Route data as soon as possible after it is
calculated to the earlier pipeline stage → bypass
Bypassing
time                 t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) r1 ← r0 + 10    IF1  ID1  EX1  MA1  WB1
(I2) r4 ← r1 + 17         IF2  ID2  ID2  ID2  ID2  EX2  MA2  WB2
(I3)                           IF3  IF3  IF3  IF3  ID3  EX3  MA3
(I4)                                               IF4  ID4  EX4
(I5)                                                    IF5  ID5
The repeated ID2 and IF3 entries are stalled stages.
Each stall or kill introduces a bubble in the pipeline
→ CPI > 1
A new datapath, i.e., a bypass, can get the data from
the output of the ALU to its input
time                 t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) r1 ← r0 + 10    IF1  ID1  EX1  MA1  WB1
(I2) r4 ← r1 + 17         IF2  ID2  EX2  MA2  WB2
(I3)                           IF3  ID3  EX3  MA3  WB3
(I4)                                IF4  ID4  EX4  MA4  WB4
(I5)                                     IF5  ID5  EX5  MA5  WB5
Adding a Bypass
[Figure: pipelined datapath with stages D, E, M, W, a stall signal, and an ASrc mux that feeds the ALU output back to ALU input A; r4 ← r1... is in decode while r1 ← ... is in execute]
When does this bypass help?
(I1)  r1 ← r0 + 10    r1 ← M[r0 + 10]    JAL 500
(I2)  r4 ← r1 + 17    r4 ← r1 + 17       r4 ← r31 + 17
      yes             no                 no
The Bypass Signal
Deriving it from the Stall Signal
stall = ( ((rsD =wsE).weE + (rsD =wsM).weM + (rsD =wsW).weW).re1D
+((rtD =wsE).weE + (rtD =wsM).weM + (rtD =wsW).weW).re2D )
ws = Case opcode
    ALU            → rd
    ALUi, LW       → rt
    JAL, JALR      → R31
we = Case opcode
    ALU, ALUi, LW  → (ws ≠ 0)
    JAL, JALR      → on
    ...            → off
ASrc = (rsD=wsE).weE.re1D
Is this correct?
No because only ALU and ALUi instructions can benefit
from this bypass
Split weE into two components: we-bypass, we-stall
Bypass and Stall Signals
Split weE into two components: we-bypass, we-stall
we-bypassE = Case opcodeE
    ALU, ALUi  → (ws ≠ 0)
    ...        → off
we-stallE = Case opcodeE
    LW         → (ws ≠ 0)
    JAL, JALR  → on
    ...        → off
ASrc = (rsD = wsE).we-bypassE.re1D
stall = ((rsD = wsE).we-stallE + (rsD = wsM).weM + (rsD = wsW).weW).re1D
      + ((rtD = wsE).weE + (rtD = wsM).weM + (rtD = wsW).weW).re2D
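A sketch of the split write-enable, checked against the three "When does this bypass help?" cases. Function and parameter names are illustrative; opcode classes are plain strings as an assumption.

```python
def we_bypass(opcode_e, ws_e):
    # Only ALU and ALUi results are available at the ALU output in E.
    return opcode_e in ("ALU", "ALUi") and ws_e != 0

def we_stall(opcode_e, ws_e):
    # LW (data not yet loaded) and JAL/JALR still require a stall.
    if opcode_e == "LW":
        return ws_e != 0
    return opcode_e in ("JAL", "JALR")

def asrc(rs_d, re1_d, opcode_e, ws_e):
    """ASrc = (rsD = wsE).we-bypassE.re1D"""
    return rs_d == ws_e and we_bypass(opcode_e, ws_e) and re1_d

def stall(rs_d, rt_d, re1_d, re2_d, opcode_e, ws_e, we_e, ws_m, we_m, ws_w, we_w):
    """Stall equation with a bypass on the rs (A) input only."""
    rs_term = ((rs_d == ws_e and we_stall(opcode_e, ws_e))
               or (rs_d == ws_m and we_m)
               or (rs_d == ws_w and we_w)) and re1_d
    rt_term = ((rt_d == ws_e and we_e)
               or (rt_d == ws_m and we_m)
               or (rt_d == ws_w and we_w)) and re2_d
    return rs_term or rt_term

# I2 is an ALUi reading rs in decode; M and W are empty (we off).
def case(opcode_e, ws_e, rs_d):
    bypass = asrc(rs_d, True, opcode_e, ws_e)
    stalled = stall(rs_d, 4, True, False, opcode_e, ws_e, True, 0, False, 0, False)
    return bypass, stalled

print(case("ALUi", 1, 1))   # r1 <- r0 + 10 ; r4 <- r1 + 17
print(case("LW", 1, 1))     # r1 <- M[r0 + 10] ; r4 <- r1 + 17
print(case("JAL", 31, 31))  # JAL 500 ; r4 <- r31 + 17
```

Only the first case is covered by the bypass; the other two still stall, matching the yes/no/no answers on the slide.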
Fully Bypassed Datapath
[Figure: fully bypassed datapath with stages D, E, M, W, ASrc and BSrc bypass muxes, a stall signal, and a path carrying the PC for JAL, ...]
Is there still a need for the stall signal?
stall = (rsD = wsE).(opcodeE = LWE).(wsE ≠ 0).re1D
      + (rtD = wsE).(opcodeE = LWE).(wsE ≠ 0).re2D
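With full bypassing the only remaining interlock is the load-use case. A sketch of that reduced equation follows; `load_use_stall` and its parameter names are illustrative.

```python
def load_use_stall(rs_d, rt_d, re1_d, re2_d, opcode_e, ws_e):
    """stall = (rsD=wsE).(opcodeE=LW).(wsE!=0).re1D + (rtD=wsE).(opcodeE=LW).(wsE!=0).re2D"""
    load_in_e = opcode_e == "LW" and ws_e != 0
    rs_term = rs_d == ws_e and load_in_e and re1_d
    rt_term = rt_d == ws_e and load_in_e and re2_d
    return rs_term or rt_term

# LW writing r1 in E, followed immediately by a reader of r1 in D: one bubble.
print(load_use_stall(1, 0, True, False, "LW", 1))
# ALUi writing r1 in E: the bypass supplies the value, so no stall.
print(load_use_stall(1, 0, True, False, "ALUi", 1))
```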
Resolving Data Hazards (3)
Strategy 3:
Speculate on the dependence. Two cases:
Guessed correctly → do nothing
Guessed incorrectly → kill and restart
Instruction to Instruction Dependence
• What do we need to calculate next PC:
– For Jumps
» Opcode, offset and PC
– For Jump Register
» Opcode and Register value
– For Conditional Branches
» Opcode, PC, Register (for condition), and
offset
– For all others
» Opcode and PC
PC Calculation Bubbles
time                 t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) r1 ← (r0) + 10  IF1  ID1  EX1  MA1  WB1
(I2) r3 ← (r2) + 17       IF2  IF2  ID2  EX2  MA2  WB2
(I3)                                IF3  IF3  ID3  EX3  MA3  WB3
(I4)                                          IF4  IF4  ID4  EX4  MA4  WB4

Resource Usage
time  t0   t1   t2   t3   t4   t5   t6   t7   ....
IF    I1   nop  I2   nop  I3   nop  I4
ID         I1   nop  I2   nop  I3   nop  I4
EX              I1   nop  I2   nop  I3   nop  I4
MA                   I1   nop  I2   nop  I3   nop  I4
WB                        I1   nop  I2   nop  I3   nop  I4
nop → pipeline bubble
Speculate next address is PC+4
[Figure: fetch and decode stages with PCSrc mux (pc+4 / jabs / rind / br), stall signal, and Jump? detection]
I1  096  ADD
I2  100  J 304
I3  104  ADD    (kill)
I4  304  ADD
A jump instruction kills (not stalls)
the following instruction
How?
Pipelining Jumps
[Figure: fetch and decode stages with PCSrc mux (pc+4 / jabs / rind / br), stall signal, Jump? detection, and an IRSrcD mux in front of the decode-stage IR]
To kill a fetched instruction: insert a mux before IR
I1  096  ADD
I2  100  J 304
I3  104  ADD    (kill)
I4  304  ADD
Any interaction between stall and jump?
IRSrcD = Case opcodeD
    J, JAL  → nop
    ...     → IM
Jump Pipeline Diagrams
time                 t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) 096: ADD        IF1  ID1  EX1  MA1  WB1
(I2) 100: J 304           IF2  ID2  EX2  MA2  WB2
(I3) 104: ADD                  IF3  nop  nop  nop  nop
(I4) 304: ADD                       IF4  ID4  EX4  MA4  WB4

Resource Usage
time  t0   t1   t2   t3   t4   t5   t6   t7   ....
IF    I1   I2   I3   I4   I5
ID         I1   I2   nop  I4   I5
EX              I1   I2   nop  I4   I5
MA                   I1   I2   nop  I4   I5
WB                        I1   I2   nop  I4   I5
nop → pipeline bubble
Pipelining Conditional Branches
[Figure: fetch, decode, and execute stages with PCSrc mux (pc+4 / jabs / rind / br), stall signal, IRSrcD mux, and BEQZ? / zero? detection at the ALU in execute]
I1  096  ADD
I2  100  BEQZ r1 +200
I3  104  ADD
I4  304  ADD
Branch condition is not known until
the execute stage
what action should be taken in the
decode stage?
Pipelining Conditional Branches
[Figure: same datapath one cycle later, with the branch in execute and two younger instructions already fetched]
I1  096  ADD
I2  100  BEQZ r1 +200
I3  104  ADD
I4  304  ADD
If the branch is taken
- kill the two following instructions
- the instruction at the decode stage
is not valid
→ stall signal is not valid
Pipelining Conditional Branches
[Figure: same datapath with an added IRSrcE mux in front of the execute-stage IR, alongside IRSrcD, PCSrc (pc+4 / jabs / rind / br), Jump?, BEQZ?, and zero? logic]
I1  096  ADD
I2  100  BEQZ r1 200
I3  104  ADD
I4  304  ADD
If the branch is taken
- kill the two following instructions
- the instruction at the decode stage
is not valid
→ stall signal is not valid
New Stall Signal
stall = (
((rsD =wsE).weE + (rsD =wsM).weM + (rsD =wsW).weW).re1D
+ ((rtD =wsE).weE + (rtD =wsM).weM + (rtD =wsW).weW).re2D
) . !((opcodeE=BEQZ).z + (opcodeE=BNEZ).!z)
Don’t stall if the branch is taken. Why?
Instruction at the decode stage is invalid
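A sketch of the masking term: the data-hazard stall is suppressed whenever the execute-stage branch is taken, because the decode-stage instruction is about to be killed anyway. `data_hazard` stands for the parenthesized hazard expression above, and the function names are illustrative.

```python
def branch_taken(opcode_e, z):
    """(opcodeE=BEQZ).z + (opcodeE=BNEZ).!z"""
    return (opcode_e == "BEQZ" and z) or (opcode_e == "BNEZ" and not z)

def new_stall(data_hazard, opcode_e, z):
    """stall = data_hazard . !(branch taken in E)"""
    return data_hazard and not branch_taken(opcode_e, z)

print(new_stall(True, "BEQZ", True))   # taken branch: decode instruction is invalid
print(new_stall(True, "BEQZ", False))  # not taken: the hazard still stalls
print(new_stall(True, "ALU", False))   # no branch in E: the hazard stalls
```

Without the mask, a wrong-path instruction in decode could freeze the pipeline on a dependence that will never matter.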
Control Equations for PC and IR Muxes
PCSrc = Case opcodeE
    BEQZ.z, BNEZ.!z  → br
    ...              → Case opcodeD
                           J, JAL    → jabs
                           JR, JALR  → rind
                           ...       → pc+4
IRSrcD = Case opcodeE
    BEQZ.z, BNEZ.!z  → nop
    ...              → Case opcodeD
                           J, JAL, JR, JALR  → nop
                           ...               → IM
IRSrcE = Case opcodeE
    BEQZ.z, BNEZ.!z  → nop
    ...              → stall.nop + !stall.IRD
Give priority to the older instruction, i.e., execute stage instruction
over decode stage instruction
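The nested Case expressions map naturally onto if-chains, where checking the execute stage first encodes the "older instruction wins" priority. A sketch with illustrative function names and string-valued mux selects:

```python
def branch_taken(opcode_e, z):
    return (opcode_e == "BEQZ" and z) or (opcode_e == "BNEZ" and not z)

def pc_src(opcode_e, z, opcode_d):
    if branch_taken(opcode_e, z):      # execute stage has priority
        return "br"
    if opcode_d in ("J", "JAL"):
        return "jabs"
    if opcode_d in ("JR", "JALR"):
        return "rind"
    return "pc+4"

def ir_src_d(opcode_e, z, opcode_d):
    if branch_taken(opcode_e, z):
        return "nop"                   # kill the instruction being fetched
    if opcode_d in ("J", "JAL", "JR", "JALR"):
        return "nop"
    return "IM"

def ir_src_e(opcode_e, z, stall):
    if branch_taken(opcode_e, z):
        return "nop"                   # kill the instruction in decode
    return "nop" if stall else "IRD"

# Taken branch in E while a jump sits in D: the branch wins.
print(pc_src("BEQZ", True, "J"), ir_src_d("BEQZ", True, "J"), ir_src_e("BEQZ", True, False))
```

In the example the jump in decode is on the wrong path, so its target must not be used.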
Branch Pipeline Diagrams
(resolved in execute stage)
time                 t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) 096: ADD        IF1  ID1  EX1  MA1  WB1
(I2) 100: BEQZ 200        IF2  ID2  EX2  MA2  WB2
(I3) 104: ADD                  IF3  ID3  nop  nop  nop
(I4) 108:                           IF4  nop  nop  nop  nop
(I5) 304: ADD                            IF5  ID5  EX5  MA5  WB5

Resource Usage
time  t0   t1   t2   t3   t4   t5   t6   t7   ....
IF    I1   I2   I3   I4   I5
ID         I1   I2   I3   nop  I5
EX              I1   I2   nop  nop  I5
MA                   I1   I2   nop  nop  I5
WB                        I1   I2   nop  nop  I5
nop → pipeline bubble
Reducing Branch Penalty
(resolve in decode stage)
• One pipeline bubble can be removed if an extra
comparator is used in the Decode stage
[Figure: fetch and decode stages with PCSrc mux (pc+4 / jabs / rind / br) driven by a zero detect on the register file output in decode]
Pipeline diagram now same as for jumps
Branch Delay Slots
(expose control hazard to software)
• Change the ISA semantics so that the instruction that follows a jump or
branch is always executed
– gives compiler the flexibility to put in a useful instruction where normally a pipeline bubble
would have resulted.
I1  096  ADD
I2  100  BEQZ r1 200
I3  104  ADD    (delay slot instruction, executed regardless of branch outcome)
I4  304  ADD
• Other techniques include branch prediction, which can
dramatically reduce the branch penalty... to come later
Why an Instruction may not be
dispatched every cycle (CPI>1)
• Full bypassing may be too expensive to implement
– typically all frequently used paths are provided
– some infrequently used bypass paths may increase cycle time and
counteract the benefit of reducing CPI
• Loads have two-cycle latency
– Instruction after load cannot use load result
– MIPS-I ISA defined load delay slots, a software-visible pipeline hazard
(compiler schedules independent instruction or inserts NOP to avoid
hazard). Removed in MIPS-II.
• Conditional branches may cause bubbles
– kill following instruction(s) if no delay slots
Machines with software-visible delay slots may execute significant
number of NOP instructions inserted by the compiler.
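A rough sketch of how these effects add up in CPI. The event frequencies are made-up illustrative values; the penalties (one bubble for a load-use stall, two for a taken branch resolved in execute) follow the pipeline diagrams in this lecture.

```python
def effective_cpi(base_cpi, events):
    """events: iterable of (occurrences per instruction, bubble cycles per occurrence)."""
    return base_cpi + sum(freq * penalty for freq, penalty in events)

# Hypothetical mix: 10% of instructions hit a load-use stall (1 bubble),
# 10% are taken branches resolved in execute (2 bubbles).
events = [(0.10, 1), (0.10, 2)]
cpi = effective_cpi(1.0, events)
print(f"effective CPI = {cpi:.2f}")
```

Moving branch resolution to decode would cut the second penalty to one bubble under the same assumed mix.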
Acknowledgements
• These slides contain material developed and
copyright by:
– Arvind (MIT)
– Krste Asanovic (MIT/UCB)
– Joel Emer (Intel/MIT)
– James Hoe (CMU)
– John Kubiatowicz (UCB)
– David Patterson (UCB)
• MIT material derived from course 6.823
• UCB material derived from course CS252