CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instructor:

advertisement
CS 61C: Great Ideas in Computer
Architecture (Machine Structures)
MIPS Control, MIPS Pipelining Intro
Instructor:
Michael Greenbaum
6/27/2016
Summer 2011 -- Lecture #21
1
Levels of
Representation/Interpretation
High Level Language
Program (e.g., C)
Compiler
Assembly Language
Program (e.g., MIPS)
Assembler
Machine Language
Program (MIPS)
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
lw
lw
sw
sw
0000
1010
1100
0101
$t0, 0($2)
$t1, 4($2)
$t1, 0($2)
$t0, 4($2)
1001
1111
0110
1000
1100
0101
1010
0000
Anything can be represented
as a number,
i.e., data or instructions
0110
1000
1111
1001
1010
0000
0101
1100
1111
1001
1000
0110
0101
1100
0000
1010
1000
0110
1001
1111
Machine
Interpretation
Hardware Architecture Description
(e.g., block diagrams)
Architecture
Implementation
Logic Circuit Description
(Circuit Schematic Diagrams)Spring 2011 -- Lecture #18
6/27/2016
2
Review: A Single Cycle Datapath
RegDst
32
0
5
5
5
Rw Ra Rb
RegFile
busA
busB
32
16
Extender
imm16
MemtoReg
MemWr
Rs Rt
clk
clk
ALUctr
Zero
32
z
ALU
busW
PC
PC Ext
Adder
Mux
00
RegWr
Adder
4
Rt Rd Imm16
Rd Rt
1
Instruction<31:0>
<0:15>
nPC_sel
Rs
<11:15>
Adr
<16:20>
<21:25>
Inst
Memory
0
32
1
32
Data In
clk
imm16
ExtOp
ALUSrc
32
0
WrEn Adr
Data
Memory
1
Agenda
•
•
•
•
•
Overview of Control Signals
Administrivia
Control Implementation
Break
Pipelining Intro
6/27/2016
Summer 2011 -- Lecture #21
4
Processor Design Process
• Five steps to design a processor:
Step 1: Analyze instruction set to determine datapath
requirements
Step 2: Select set of datapath components & establish
clocking methodology
Step 3: Assemble datapath components to meet the
requirements
Step 4: Analyze implementation of each instruction to
determine setting of control signals that implements
the register transfer
Step 5: Assemble the control logic
6/27/2016
Spring 2011 -- Lecture #18
5
A Single Cycle Datapath
Instruction<31:0>
Rs Rt Rd Imm16
ALUctr
MemtoReg
Rd Rt
1
RegWr
0
5
Rs Rt
5
busB
32
16
Extender
clk
imm16
busA
ExtOp
32
MemWr
Z
ALU
RegFile
32
zero
5
Rw Ra Rb
busW
<0:15>
RegDst
<11:15>
clk
<16:20>
instr
fetch
unit
nPC_sel
<21:25>
• We have everything
but the values of
control signals
0
32
1
32
Data In
clk
ALUSrc
32
0
WrEn Adr
Data
Memory
1
Values of the Control Signals
“n”=next
“+4” 0  PC <– PC + 4
“br” 1  PC <– PC + 4 + {SignExt(Im16) , 00 }
Inst Address
0
PC
Mux
Adder
PC Ext
imm16
nPC_sel
Adder
4
00
• nPC_sel:
1
clk
Values of the Control Signals
• ExtOp:
• ALUsrc:
• ALUctr:
°
°
°
°
zero, sign
0  reg;
1  imm
ADD, SUB, OR
MemWr: 1  write memory
MemtoReg: 0  alu; 1  mem
RegDst:
0  rt; 1  rd
RegWr:
1  write register
ALUctr
RegDst Rd Rt
1
RegWr
0
Rs Rt
5
5
5
Rw Ra Rb
RegFile
32
busA
busB
32
imm16
16
ExtOp
Extender
clk
32
0
ALU
busW
MemtoReg
MemWr
32
0
32 WrEn Adr
1
32
Data In
ALUSrc clk
Data
Memory
1
RTL: The Add Instruction
31
26
op
6 bits
21
rs
5 bits
16
rt
5 bits
11
6
0
rd
shamt
funct
5 bits
5 bits
6 bits
add rd, rs, rt
– MEM[PC]
Fetch the instruction from memory
– R[rd] = R[rs] + R[rt] The actual operation
– PC = PC + 4 Calculate the next instruction’s address
6/27/2016
Spring 2011 -- Lecture #18
9
Single Cycle Datapath during Add
31
26
op
21
16
rs
11
rt
rd
6
0
shamt
funct
R[rd] = R[rs] + R[rt]
RegWr=1
rs
5
5
Rw
busW
rt
5
Ra Rb
busB
32
imm16
16
ExtOp=x
Extender
clk
Rs Rt Rd Imm16
zero ALUctr=ADD
MemtoReg=0
MemWr=0
32
=
ALU
RegFile
32
6/27/2016
busA
32
0
0
32
1
Data In
32
ALUSrc=0
Spring 2011 -- Lecture #18
<0:15>
0
<11:15>
1
<16:20>
rt
<21:25>
rd
Instruction<31:0>
instr
fetch
unit
nPC_sel=+4
RegDst=1
clk
clk
WrEn Adr
Data
Memory
1
10
Instruction Fetch Unit for Add
• PC = PC + 4
– Same for all
instructions except:
Branch and Jump
Inst
Memory
nPC_sel=+4
Inst Address
Adder
4
00
PC
Mux
Adder
PC Ext
clk
imm16
6/27/2016
Spring 2011 -- Lecture #18
11
Single Cycle Datapath during Or Immediate
31
26
21
op
16
rs
0
rt
immediate
• R[rt] = R[rs] OR ZeroExt[Imm16]
Rs Rt
5
5
Rw
busW
5
Ra Rb
busA
busB
32
imm16
16
ExtOp=
Extender
clk
32
=
ALU
RegFile
32
6/27/2016
Rs Rt Rd
zero ALUctr=
0
<0:15>
RegWr=
<11:15>
1
clk
Instruction<31:0>
<16:20>
Rd Rt
instr
fetch
unit
<21:25>
nPC_sel=
RegDst=
Imm16
MemtoReg=
MemWr=
32
0
0
32
1
Data In
32
ALUSrc=
Spring 2011 -- Lecture #18
clk
WrEn Adr
Data
Memory
1
12
Single Cycle Datapath during Or Immediate
31
26
21
op
16
rs
0
rt
immediate
• R[rt] = R[rs] OR ZeroExt[Imm16]
5
Rw
busW
Rs Rt
5
5
Ra Rb
busA
busB
32
imm16
16
ExtOp=zero
Extender
clk
32
=
ALU
RegFile
32
6/27/2016
Rs Rt Rd
zero ALUctr=OR
0
<0:15>
RegWr=1
<11:15>
clk
Rd Rt
1
instr
fetch
unit
<21:25>
RegDst=0
Instruction<31:0>
<16:20>
nPC_sel=+4
Imm16
MemtoReg=0
MemWr=0
32
0
0
32
1
Data In
32
ALUSrc=1
Spring 2011 -- Lecture #18
clk
WrEn Adr
Data
Memory
1
13
Single Cycle Datapath during Load
31
26
21
op
16
rs
0
rt
immediate
• R[rt] = Data Memory {R[rs] + SignExt[imm16]}
Rs Rt
5
5
Rw
busW
5
Ra Rb
busA
busB
32
imm16
16
ExtOp=
Extender
clk
32
=
ALU
RegFile
32
6/27/2016
Rs Rt Rd
zero ALUctr=
0
<0:15>
RegWr=
<11:15>
1
clk
Instruction<31:0>
<16:20>
Rd Rt
instr
fetch
unit
<21:25>
nPC_sel=
RegDst=
Imm16
MemtoReg=
MemWr=
32
0
0
32
1
Data In
32
ALUSrc=
clk
Spring 2011 -- Lecture #18
WrEn Adr
Data
Memory
1
14
Single Cycle Datapath during Load
31
26
21
op
16
rs
0
rt
immediate
• R[rt] = Data Memory {R[rs] + SignExt[imm16]}
5
Ra Rb
busB
32
imm16
16
ExtOp=sign
Extender
clk
Rs Rt Rd Imm16
zero ALUctr=ADD
MemtoReg=1
MemWr=0
32
=
ALU
RegFile
32
6/27/2016
busA
<0:15>
Rw
busW
5
<11:15>
5
Rs Rt
<16:20>
RegWr=1
0
<21:25>
Rd Rt
1
Instruction<31:0>
instr
fetch
unit
nPC_sel=+4
RegDst=0
clk
32
0
0
32
1
Data In
32
ALUSrc=1
clk
Spring 2011 -- Lecture #18
WrEn Adr
Data
Memory
1
15
Single Cycle Datapath during Branch
31
26
21
op
•
16
rs
0
rt
immediate
if (R[rs] - R[rt] == 0) then Zero = 1 ; else Zero = 0
Rs Rt
5
5
Rw
busW
5
Ra Rb
busA
busB
32
imm16
16
ExtOp=
Extender
clk
32
=
ALU
RegFile
32
6/27/2016
Rs Rt Rd
zero ALUctr=
0
<0:15>
RegWr=
<11:15>
1
clk
<16:20>
Rd Rt
Instruction<31:0>
<21:25>
nPC_sel=
RegDst=
instr
fetch
unit
Imm16
MemtoReg=
MemWr=
32
0
0
32
1
Data In
32
ALUSrc=
clk
Spring 2011 -- Lecture #18
WrEn Adr
Data
Memory
1
16
Single Cycle Datapath during Branch
31
26
21
op
•
16
rs
0
rt
immediate
if (R[rs] - R[rt] == 0) then Zero = 1 ; else Zero = 0
5
Rw
busW
5
Ra Rb
busB
32
imm16
16
ExtOp=x
Extender
clk
Rs Rt Rd Imm16
zero ALUctr=SUB
MemtoReg=x
MemWr=0
32
=
ALU
RegFile
32
6/27/2016
busA
<0:15>
5
<11:15>
Rs Rt
<16:20>
RegWr=0
0
<21:25>
Rd Rt
1
Instruction<31:0>
instr
fetch
unit
nPC_sel=br
RegDst=x
clk
32
0
0
32
1
Data In
32
ALUSrc=0
clk
Spring 2011 -- Lecture #18
WrEn Adr
Data
Memory
1
17
Instruction Fetch Unit at the End of Branch
31
26
op
21
16
rs
0
rt
immediate
• if (Zero == 1) then PC = PC + 4 + SignExt[imm16]*4 ; else PC = PC + 4
Inst
Memory
Adr
nPC_sel
Instruction<31:0>
Zero
0
Mux
PC
Adder
6/27/2016
PC Ext
imm16
Adder
4
00
MUX
ctrl
nPC_sel
1
clk
(nPC_sel = 1 if branch instr.)
nPC_sel
0
1
1
zero?
x
0
1
Spring 2011 -- Lecture #18
MUX
0
0
1
Q: What logic
gate?
18
Summary: Datapath’s Control Signals
• ExtOp:
• ALUsrc:
• ALUctr:
•
•
•
•
“zero”, “sign”
0  regB;
1  immed
“ADD”, “SUB”, “OR”
MemWr:
MemtoReg:
RegDst:
RegWr:
ALUctr
MemtoReg
MemWr
RegDst Rd Rt
1
Inst Address
RegWr
4
0
Rs Rt
5
5
Rw
busW
5
Ra Rb
busA
RegFile
busB
PC
Mux
32
clk
imm16
16
Extender
PC Ext
Adder
1
imm16
0
32 WrEn Adr
1
Data In
ALUSrc
clk
32
ExtOp
6/27/2016
32
0
32
clk
32
ALU
Adder
0
00
nPC_sel
1  write memory
0  ALU; 1  Mem
0  “rt”; 1  “rd”
1  write register
Spring 2011 -- Lecture #18
1
Data
Memory
19
Administrivia
• HW3 Due Wednesday at midnight
• Project 2 Part 2 due Sunday.
– Slides at end of July 12 lecture contain useful info.
– Project 2 Extra Credit: If you have time after finishing
Part 2, try to make your program as fast as possible!
Top submissions in the class get lots of extra credit,
substantial improvement will get some.
• Lab 12 cancelled!
– Replaced with free study session where you can catch
up on labs / work on project 2.
– The TA’s will still be there.
6/27/2016
Spring 2011 -- Lecture #18
20
cs61c in the News
• Use power dissipation from servers as source of heating?
• So-called “Data Furnaces.” Sell the heat to consumers.
• “Additionally, such a setup would
also provide lower network
latency as the storage and
computation systems can be
located closer to areas of high
population density and therefore
those using them.”
http://www.gizmag.com/data-furnaces-to-heat-the-home/19314/
http://research.microsoft.com/pubs/150265/heating.pdf
6/27/2016
Summer 2011 -- Lecture #21
21
Processor Design Process
• Five steps to design a processor:
Step 1: Analyze instruction set to determine datapath
requirements
Step 2: Select set of datapath components & establish
clocking methodology
Step 3: Assemble datapath components to meet the
requirements
Step 4: Analyze implementation of each instruction to
determine setting of control signals that implements
the register transfer
Step 5: Assemble the control logic
6/27/2016
Spring 2011 -- Lecture #18
22
Given Datapath: RTL  Control
Instruction<31:0>
Rd
<0:15>
Rs
<11:15>
Rt
<16:20>
Op Fun
<21:25>
<0:5>
<26:31>
Inst
Memory
Adr
Imm16
Control
nPC_sel RegWr RegDst ExtOp ALUSrc ALUctr
MemWr MemtoReg
DATA PATH
6/27/2016
Spring 2011 -- Lecture #18
23
Summary of the Control Signals (1/2)
inst
Register Transfer
add
R[rd]  R[rs] + R[rt]; PC  PC + 4
ALUsrc=RegB, ALUctr=“ADD”, RegDst=rd, RegWr, nPC_sel=“+4”
sub
R[rd]  R[rs] – R[rt]; PC  PC + 4
ALUsrc=RegB, ALUctr=“SUB”, RegDst=rd, RegWr, nPC_sel=“+4”
ori
R[rt]  R[rs] + zero_ext(Imm16); PC  PC + 4
ALUsrc=Im, Extop=“Z”, ALUctr=“OR”, RegDst=rt,RegWr, nPC_sel=“+4”
R[rt]  MEM[ R[rs] + sign_ext(Imm16)]; PC  PC + 4
lw
ALUsrc=Im, Extop=“sn”, ALUctr=“ADD”, MemtoReg, RegDst=rt, RegWr,
nPC_sel = “+4”
MEM[ R[rs] + sign_ext(Imm16)]  R[rs]; PC  PC + 4
sw
ALUsrc=Im, Extop=“sn”, ALUctr = “ADD”, MemWr, nPC_sel = “+4”
Beq
if (R[rs] == R[rt]) then PC  PC + sign_ext(Imm16)] || 00
else PC  PC + 4
nPC_sel = “br”,
6/27/2016
ALUctr = “SUB”
Spring 2011 -- Lecture #18
24
Summary of the Control Signals (2/2)
See
Appendix A
func 10 0000 10 0010
op 00 0000 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010
add
sub
ori
lw
sw
beq
jump
RegDst
1
1
0
0
x
x
x
ALUSrc
0
0
1
1
1
0
x
MemtoReg
0
0
0
1
x
x
x
RegWrite
1
1
1
1
0
0
0
MemWrite
0
0
0
0
1
0
0
nPCsel
0
0
0
0
0
1
?
Jump
0
0
0
0
0
0
1
ExtOp
x
x
0
1
1
x
Add
Subtract
Or
Add
Add
x
Subtract
ALUctr<2:0>
31
26
21
16
R-type
op
rs
rt
I-type
op
rs
rt
J-type
op
6/27/2016
We Don’t Care :-)
11
rd
6
shamt
immediate
target address
Spring 2011 -- Lecture #18
x
0
funct
add, sub
ori, lw, sw, beq
jump
25
Boolean Expressions for Controller
RegDst
ALUSrc
MemtoReg
RegWrite
MemWrite
nPCsel
Jump
ExtOp
ALUctr[0]
ALUctr[1]
=
=
=
=
=
=
=
=
=
=
add + sub
ori + lw + sw
lw
add + sub + ori + lw
sw
beq
jump
lw + sw
sub + beq
(assume ALUctr is
or
00 ADD,
01: SUB,
10: OR)
Where:
rtype
ori
lw
sw
beq
jump
=
=
=
=
=
=
~op5
~op5
op5
op5
~op5
~op5






~op4
~op4
~op4
~op4
~op4
~op4






~op3
op3
~op3
op3
~op3
~op3






~op2
op2
~op2
~op2
op2
~op2






~op1
~op1
op1
op1
~op1
op1
 ~op0,
 op0
 op0
 op0
 ~op0
 ~op0
How do we
implement this in
gates?
add = rtype  func5  ~func4  ~func3  ~func2  ~func1  ~func0
sub = rtype  func5  ~func4  ~func3  ~func2  func1  ~func0
6/27/2016
Spring 2011 -- Lecture #18
26
Controller Implementation
opcode
func
“AND” logic
6/27/2016
add
sub
ori
lw
sw
beq
jump
“OR” logic
Spring 2011 -- Lecture #18
RegDst
ALUSrc
MemtoReg
RegWrite
MemWrite
nPCsel
Jump
ExtOp
ALUctr[0]
ALUctr[1]
27
An Abstract View of the Implementation
Ideal
Instruction
Memory
PC
clk
32
Instruction
Rd Rs Rt
5 5
5
Rw Ra Rb
Register
File
clk
Control Signals Conditions
A
32
ALU
Next Address
Instruction
Address
Control
B
32
32
Datapath
Data
Addr
Data
Ideal
Out
Data
Memory
Data
In clk
Call home, we’ve made HW/SW contact!
High Level Language
Program (e.g., C)
Compiler
Assembly Language
Program (e.g., MIPS)
Assembler
Machine Language
Program (MIPS)
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
lw
lw
sw
sw
0000
1010
1100
0101
$t0, 0($2)
$t1, 4($2)
$t1, 0($2)
$t0, 4($2)
1001
1111
0110
1000
1100
0101
1010
0000
Anything can be represented
as a number,
i.e., data or instructions
0110
1000
1111
1001
1010
0000
0101
1100
1111
1001
1000
0110
0101
1100
0000
1010
1000
0110
1001
1111
Machine
Interpretation
Hardware Architecture Description
(e.g., block diagrams)
Architecture
Implementation
Logic Circuit Description
(Circuit Schematic Diagrams)Spring 2011 -- Lecture #18
6/27/2016
29
Agenda
•
•
•
•
•
Overview of Control Signals
Administrivia
Control Implementation
Break
Pipelining Intro
6/27/2016
Summer 2011 -- Lecture #21
30
Agenda
•
•
•
•
•
Overview of Control Signals
Administrivia
Control Implementation
Break
Pipelining Intro
6/27/2016
Summer 2011 -- Lecture #21
31
Single Cycle Performance
• Assume time for actions are
– 100ps for register read or write; 200ps for other events
• Clock rate is?
Instr
Instr
fetch
Register
read
ALU op
Memory
access
Register
write
Total
time
lw
200ps
100 ps
200ps
200ps
100 ps
800ps
sw
200ps
100 ps
200ps
200ps
R-format
200ps
100 ps
200ps
beq
200ps
100 ps
200ps
700ps
100 ps
600ps
500ps
• What can we do to improve clock rate?
• Will this improve performance as well?
Want increased clock rate to mean faster programs
6/27/2016
Summer 2011 -- Lecture #21
32
Pipeline Analogy: Doing Laundry
• Ann, Brian, Cathy, Dave
each have one load of clothes to
wash, dry, fold, and put away
A B C D
– Washer takes 30 minutes
– Dryer takes 30 minutes
– “Folder” takes 30 minutes
– “Stasher” takes 30 minutes to put
clothes into drawers
6/27/2016
Summer 2011 -- Lecture #21
33
Sequential Laundry
6 PM 7
T
a
s
k
O
r
d
e
r
A
8
9
10
11
12
1
2 AM
30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
Time
B
C
D
6/27/2016
• Sequential laundry takes
8 hours for 4 loads
Summer 2011 -- Lecture #21
34
Pipelined Laundry
6 PM 7
T
a
s
k
8
9
11
10
3030 30 30 30 30 30
12
1
2 AM
Time
A
B
C
O
D
r
d
e
r •
6/27/2016
Pipelined laundry takes
3.5 hours for 4 loads!
Summer 2011 -- Lecture #21
35
Pipelining Lessons (1/2)
6 PM
T
a
s
k
8
9
Time
30 30 30 30 30 30 30
A
B
O
r
d
e
r
7
C
D
6/27/2016
• Pipelining doesn’t help latency
of single task, it helps
throughput of entire workload
• Multiple tasks operating
simultaneously using different
resources
• Potential speedup = Number
pipe stages
• Time to fill pipeline and time
to drain it reduces speedup:
2.3X v. 4X in this example
Summer 2011 -- Lecture #21
36
Pipelining Lessons (2/2)
6 PM
T
a
s
k
8
9
Time
30 30 30 30 30 30 30
A
B
O
r
d
e
r
7
C
D
6/27/2016
• Suppose new Washer
takes 20 minutes, new
Stasher takes 20
minutes. How much
faster is pipeline?
• Pipeline rate limited by
slowest pipeline stage
• Unbalanced lengths of
pipe stages reduces
speedup
Summer 2011 -- Lecture #21
37
Recall: Steps in Executing MIPS
1) IF: Instruction Fetch, Increment PC
2) ID: Instruction Decode, Read Registers
3) EX:
Mem-ref: Calculate Address
Arith-log: Perform Operation
4) Mem:
Load: Read Data from Memory
Store: Write Data to Memory
5) WB: Write Data Back to Register
6/27/2016
Summer 2011 -- Lecture #21
38
+4
1. Instruction
Fetch
6/27/2016
rd
rs
rt
ALU
Data
memory
registers
PC
instruction
memory
Redrawn Single-Cycle Datapath
imm
2. Decode/
3. Execute 4. Memory
Register Read
Summer 2011 -- Lecture #21
5. Write
Back
39
+4
1. Instruction
Fetch
rd
rs
rt
ALU
Data
memory
registers
PC
instruction
memory
Pipelined Datapath
imm
2. Decode/
3. Execute 4. Memory
Register Read
5. Write
Back
• Add registers between stages
– Hold information produced in previous cycle
6/27/2016
Summer 2011 -- Lecture #21
40
More Detailed Pipeline
6/27/2016
Summer 2011 -- Lecture #21
41
IF for Load, Store, …
6/27/2016
Summer 2011 -- Lecture #21
42
ID for Load, Store, …
6/27/2016
Summer 2011 -- Lecture #21
43
EX for Load
6/27/2016
Summer 2011 -- Lecture #21
44
MEM for Load
6/27/2016
Summer 2011 -- Lecture #21
45
WB for Load
Wrong
register
number
6/27/2016
Summer 2011 -- Lecture #21
46
Corrected Datapath for Load
6/27/2016
Summer 2011 -- Lecture #21
47
Pipelined Execution Representation
Time
IF
ID
EX
Mem WB
IF
ID
EX
Mem WB
IF
ID
EX
Mem WB
IF
ID
EX
Mem WB
IF
ID
EX
Mem WB
IF
ID
EX
Mem WB
• Every instruction must take same number of steps, also
called pipeline stages, so some will go idle sometimes
6/27/2016
Summer 2011 -- Lecture #21
48
rd
rs
rt
ALU
Data
memory
registers
PC
instruction
memory
Graphical Pipeline Diagrams
imm
+4
1. Instruction
Fetch
2. Decode/
3. Execute 4. Memory
Register Read
5. Write
Back
• Use datapath figure below to represent pipeline
IF
6/27/2016
Reg
EX
ALU
I$
ID
Mem WB
D$
Reg
Summer 2011 -- Lecture #21
49
Graphical Pipeline Representation
(In Reg, right half highlight read, left half write)
Time (clock cycles)
Reg
Reg
D$
Reg
I$
Reg
D$
Reg
I$
Reg
ALU
D$
Reg
I$
Reg
ALU
I$
D$
ALU
Reg
ALU
I$
ALU
I
n
Load
s
t Add
r.
Store
O
Sub
r
d Or
e
r6/27/2016
D$
Summer 2011 -- Lecture #21
Reg
50
Pipeline Performance
• Assume time for stages is
– 100ps for register read or write
– 200ps for other stages
• What is pipelined clock rate?
– Compare pipelined datapath with single-cycle datapath
Instr
Instr fetch Register
read
ALU op
Memory
access
Register
write
Total time
lw
200ps
100 ps
200ps
200ps
100 ps
800ps
sw
200ps
100 ps
200ps
200ps
R-format
200ps
100 ps
200ps
beq
200ps
100 ps
200ps
6/27/2016
700ps
100 ps
600ps
500ps
Summer 2011 -- Lecture #21
51
Student Roulette?
Pipeline Performance
Single-cycle (Tc= 800ps)
Pipelined (Tc= 200ps)
6/27/2016
Summer 2011 -- Lecture #21
52
Pipeline Speedup
• If all stages are balanced
– i.e., all take the same time
– Time between instructionspipelined
= Time between instructionsnonpipelined
Number of stages
• If not balanced, speedup is less
• Speedup due to increased throughput
– Latency (time for each instruction) does not
decrease
6/27/2016
Summer 2011 -- Lecture #21
53
Instruction Level Parallelism (ILP)
• ILP – e.g., Pipelined Instruction Execution
– 5 stage pipeline => 5 instructions executing
simultaneously, one at each pipeline stage
6/27/2016
Summer 2011 -- Lecture #21
54
And in Conclusion, …
• Can determine control signal values for each
instruction.
• Control implementation: Just combinational
logic (at least for a single cycle CPU…)
• Pipelining
– Reduce critical path by inserting registers between
each stage
– N-stage pipeline => N operations done in parallel!
6/27/2016
Summer 2011 -- Lecture #21
55
Bonus slides
• Note: You are still responsible for knowing the
material covered in these slides.
• The slides will appear in the order they would
have in the normal presentation
Single Cycle Datapath during Store
31
26
21
op
16
rs
0
rt
immediate
• Data Memory {R[rs] + SignExt[imm16]} = R[rt]
Rs Rt
5
5
Rw
busW
5
Ra Rb
busA
busB
32
imm16
16
ExtOp=
Extender
clk
32
=
ALU
RegFile
32
6/27/2016
Rs Rt Rd
zero ALUctr=
0
<0:15>
RegWr=
<11:15>
1
clk
Instruction<31:0>
<16:20>
Rd Rt
instr
fetch
unit
<21:25>
nPC_sel=
RegDst=
Imm16
MemtoReg=
MemWr=
32
0
0
32
1
Data In
32
ALUSrc=
clk
Spring 2011 -- Lecture #18
WrEn Adr
Data
Memory
1
57
Single Cycle Datapath during Store
31
26
21
op
16
rs
0
rt
immediate
• Data Memory {R[rs] + SignExt[imm16]} = R[rt]
Rw
busW
5
5
Ra Rb
busB
32
imm16
16
ExtOp=sign
Extender
clk
Rs Rt Rd Imm16
zero ALUctr=ADD
MemtoReg=x
MemWr=1
32
=
ALU
RegFile
32
6/27/2016
busA
<0:15>
5
Rs Rt
<11:15>
RegWr=0
0
<16:20>
Rd Rt
<21:25>
nPC_sel=+4
RegDst=x
clk
1
Instruction<31:0>
instr
fetch
unit
32
0
0
32
1
Data In
32
ALUSrc=1
clk
Spring 2011 -- Lecture #18
WrEn Adr
Data
Memory
1
58
Single Cycle Datapath during Jump
31
J-type
26 25
0
op
target address
jump
• New PC = { PC[31..28], target address, 00 }
Instruction<31:0>
Jump=
<0:25>
Data In32
ALUSrc =
Spring 2011 -- Lecture #18
0
32
Clk
WrEn Adr
32
Mux
32
<0:15>
ExtOp =
1
<11:15>
6/27/2016
16
Extender
imm16
Rs Rd Imm16 TA26
MemtoReg =
Zero MemWr =
ALU
busA
Rw Ra Rb
32
32 32-bit
Registers busB
0
32
<16:20>
5
Rt
ALUctr =
Rs Rt
5
5
Mux
32
Clk
Clk
1 Mux 0
RegWr =
busW
Rt
<21:25>
RegDst =
Rd
Instruction
Fetch Unit
nPC_sel=
1
Data
Memory
59
Single Cycle Datapath during Jump
31
J-type
26 25
0
op
target address
jump
• New PC = { PC[31..28], target address, 00 }
Instruction<31:0>
Jump=1
<0:25>
Data In32
ALUSrc = x
Spring 2011 -- Lecture #18
0
32
Clk
WrEn Adr
32
Mux
32
<0:15>
ExtOp = x
1
<11:15>
6/27/2016
16
Extender
imm16
Rs Rd Imm16 TA26
MemtoReg = x
Zero MemWr = 0
ALU
busA
Rw Ra Rb
32
32 32-bit
Registers busB
0
32
<16:20>
5
Rt
ALUctr =x
Rs Rt
5
5
Mux
32
Clk
Clk
1 Mux 0
RegWr = 0
busW
Rt
<21:25>
RegDst = x
Rd
Instruction
Fetch Unit
nPC_sel=?
1
Data
Memory
60
Instruction Fetch Unit at the End of Jump
31
26 25
J-type
0
op
target address
• New PC = { PC[31..28], target address, 00 }
Jump
Inst
Memory
nPC_sel
jump
Instruction<31:0>
Adr
Zero
nPC_MUX_sel
Adder
0
imm16
PC
Mux
Adder
6/27/2016
00
4
How do we modify this
to account for jumps?
1
Clk
Spring 2011 -- Lecture #18
61
Instruction Fetch Unit at the End of Jump
31
26 25
J-type
0
op
target address
• New PC = { PC[31..28], target address, 00 }
Jump
Inst
Memory
nPC_sel
jump
Instruction<31:0>
Adr
Zero
imm16
00
TA
Mux
Adder
6/27/2016
1
4 (MSBs)
1
PC
Adder
0
26
Mux
4
00
nPC_MUX_sel
0
Clk
Spring 2011 -- Lecture #18
Query
• Can Zero still
get asserted?
• Does nPC_sel
need to be 0?
• If not, what?
62
Download