The Control - Iowa State University

CprE 381 Computer Organization and Assembly Level Programming, Fall 2013
Midterm Review 2
Dr. Zhao Zhang
Iowa State University
Announcement
• No quiz today
• No homework this Friday
• Exam on Monday 9:00-9:50
• HW9 deadline extended to next Friday
• HW8 solutions will be posted today
Chapter 1 — Computer Abstractions and Technology — 2
Exam 2 Coverage
• Coverage: Ch. 4, The Processor
  • Datapath and control
  • Simple MIPS pipeline
  • Data hazards and forwarding
  • Load-use hazard and pipeline stall
  • Control hazards
• Arithmetic will NOT be covered
  • Will be covered in the final exam
  • Final exam is comprehensive
Question Styles and Coverage
• Short answer
• True/False or multi-choice
• Design and Analysis
  • Signal values in the datapath and control
  • Identify critical path
  • Support a new MIPS instruction
• Performance analysis and optimization
  • Identify pipeline bubbles in program execution
  • Reorder instructions to improve performance
• And others
Nine-Instruction MIPS
• These nine instructions are enough to illustrate most aspects of CPU design, particularly datapath and control design
• Some questions will use it as the baseline design
  • Memory reference: LW and SW
  • Arithmetic/logic: ADD, SUB, AND, OR, SLT
  • Branch and jump: BEQ, J
Datapath With Jumps Added
Chapter 4 — The Processor — 6
The Control
• Control signals for the nine-instruction implementation
Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump
R-    1       0       0         1         0        0         0       1       0       0
lw    0       1       1         1         1        0         0       0       0       0
sw    X       1       X         0         0        1         0       0       0       0
beq   X       0       X         0         0        0         1       0       1       0
j     X       X       X         0         0        0         0       X       X       1
Note: “R-” means R-format
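The truth table above can be read as a lookup keyed by opcode. The following is a minimal Python sketch, not the course's implementation: the names `CONTROL_TABLE`, `OPCODES` and `main_control` are made up for illustration, and the opcode values are the standard MIPS encodings, which the slide does not list.

```python
# Order of the control signals, as in the table header.
SIGNALS = ["RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0", "Jump"]

# One string per table row; 'X' marks a don't-care value.
CONTROL_TABLE = {
    "R-format": "1001000100",
    "lw":       "0111100000",
    "sw":       "X1X0010000",
    "beq":      "X0X0001010",
    "j":        "XXX0000XX1",
}

# Assumption: standard MIPS opcodes (instruction bits 31:26).
OPCODES = {
    0b000000: "R-format",
    0b100011: "lw",
    0b101011: "sw",
    0b000100: "beq",
    0b000010: "j",
}


def main_control(instruction):
    """Decode a 32-bit instruction word into a dict of control signals."""
    opcode = (instruction >> 26) & 0x3F
    row = CONTROL_TABLE[OPCODES[opcode]]
    return dict(zip(SIGNALS, row))


# lw $t0, 4($t1) encodes as 0x8D280004.
signals = main_control(0x8D280004)
print(signals["MemRead"], signals["RegWrite"], signals["ALUSrc"])
```

Keeping the don't-cares as 'X' mirrors the table; a hardware description would be free to pick 0 or 1 for them.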
ALU Control
• Truth table for ALU Control
• Extend it as a secondary control unit in projects B & C, with more control signal outputs
opcode  ALUOp  Operation         funct   ALU function      ALU control
lw      00     load word         XXXXXX  add               0010
sw      00     store word        XXXXXX  add               0010
beq     01     branch equal      XXXXXX  subtract          0110
R-type  10     add               100000  add               0010
R-type  10     subtract          100010  subtract          0110
R-type  10     AND               100100  AND               0000
R-type  10     OR                100101  OR                0001
R-type  10     set-on-less-than  101010  set-on-less-than  0111
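The ALU control truth table maps (ALUOp, funct) to a 4-bit ALU control code. A small Python sketch of that mapping follows; the function name `alu_control` and the use of bit strings for the output are illustrative choices, not part of the slides.

```python
# funct field -> ALU control code for R-type instructions (ALUOp = 10).
R_TYPE_CODES = {
    0b100000: "0010",  # add
    0b100010: "0110",  # subtract
    0b100100: "0000",  # AND
    0b100101: "0001",  # OR
    0b101010: "0111",  # set-on-less-than
}


def alu_control(alu_op, funct=0):
    """Return the 4-bit ALU control code for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:   # lw / sw: add to form the address, funct ignored
        return "0010"
    if alu_op == 0b01:   # beq: subtract to compare, funct ignored
        return "0110"
    if alu_op == 0b10:   # R-type: decided by the funct field
        return R_TYPE_CODES[funct]
    raise ValueError("ALUOp value not used in the table")


print(alu_control(0b10, 0b101010))  # slt row of the table
```

Supporting a new instruction in this style means adding a row to the dictionary or a new ALUOp case, which is the "extend the truth table" step.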
Extend the Single-Cycle Processor
For each instruction, do we need
1. Any new or revised datapath element(s)?
2. Any new control signal(s)?
Then revise, if necessary,
1. Datapath: Add new elements or revise existing ones, add new connections
2. Control Unit: Add/extend control signals, extend the truth table
3. ALU Control: Extend the truth table
Support JAL
jal target
Instruction format: opcode 000011 in bits 31:26, address in bits 25:0
PC_plus_4 = PC+4
JumpAddr = PC_plus_4[31:28] & Inst[25:0] & “00”
R[31] = PC_plus_4
PC = JumpAddr
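The register-transfer description of jal can be traced in software. This is a minimal Python sketch under a few assumptions: the `&` in the slide is read as bit concatenation, the register file is a plain list, and the name `execute_jal` is hypothetical.

```python
def execute_jal(pc, instruction, regs):
    """Apply jal: write PC+4 to $31 and return the new PC (the jump address)."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    target = instruction & 0x03FFFFFF            # Inst[25:0]
    # Concatenate PC_plus_4[31:28], the 26-bit target, and "00".
    jump_addr = (pc_plus_4 & 0xF0000000) | (target << 2)
    regs[31] = pc_plus_4                         # link: return address in $ra
    return jump_addr


regs = [0] * 32
# Opcode 000011 with a 26-bit target field of 0x0100008.
new_pc = execute_jal(0x00400000, 0x0C100008, regs)
print(hex(new_pc), hex(regs[31]))
```

The only parts that differ from J are the write to register 31 and the PC+4 value routed to the write-data port, which is what the datapath changes have to provide.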
Support JAL
What changes must be made to the datapath?
Support JAL
• Analyze the instruction execution
  • Writes register $ra ($31)
  • Updates PC with the jump target
• Analyze datapath
  • The PC update is already in place from supporting J
  • Needs another input, fixed at 31, to the “Write register” port of the register file
  • Needs another input, PC+4, to the “Write data” port of the register file
• Revise control
  • Add a “link” signal
  • The (main) control unit can set it by reading the opcode
SCPv1 + JAL
Revise the two muxes
• Add another input
• Extend the select signals
Alternatively, use an extra mux
Control Signals
• Control signals for the nine-instruction implementation
Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump  Link
R-    1       0       0         1         0        0         0       1       0       0
lw    0       1       1         1         1        0         0       0       0       0
sw    X       1       X         0         0        1         0       0       0       0
beq   X       0       X         0         0        0         1       0       1       0
j     X       X       X         0         0        0         0       X       X       1
jal
• Add a new row for jal
• Extend RegDst
• Add a control line link
Control Signals
• Control signals for the nine-instruction implementation
Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump  Link
R-    1       0       0         1         0        0         0       1       0       0     0
lw    0       1       1         1         1        0         0       0       0       0     0
sw    X       1       X         0         0        1         0       0       0       0     0
beq   X       0       X         0         0        0         1       0       1       0     0
j     X       X       X         0         0        0         0       X       X       1     0
jal   0       X       0         1         0        0         X       X       X       1     1
• Extend control input to RegDst Mux: RegDst & Link
• Extend control input to MemtoReg Mux: MemtoReg & Link
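One way to picture the two extended muxes is as three-way selections driven by the original select bit together with Link. The Python sketch below assumes that `&` in the bullets means the two bits are concatenated into a 2-bit select, and that Link = 1 picks the new input; the function names and this input ordering are illustrative, not taken from the slides.

```python
def write_register(reg_dst, link, rt, rd):
    """Extended RegDst mux: choose the register number to write."""
    if link == 1:
        return 31            # jal always writes $ra
    return rd if reg_dst == 1 else rt


def write_data(mem_to_reg, link, alu_result, mem_data, pc_plus_4):
    """Extended MemtoReg mux: choose the value written to the register file."""
    if link == 1:
        return pc_plus_4     # jal writes the return address
    return mem_data if mem_to_reg == 1 else alu_result


# jal row of the table: RegDst = 0, MemtoReg = 0, Link = 1.
print(write_register(0, 1, rt=8, rd=9))
print(hex(write_data(0, 1, alu_result=0, mem_data=0, pc_plus_4=0x00400004)))
```

With Link = 0 both functions reduce to the original two-input muxes, so the rows for the other instructions keep their old behavior.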
Simple Pipeline
• Add pipeline registers to hold information produced in each cycle
Pipelined Control
Hazards
• Situations that prevent starting the next instruction safely in the next cycle
• Structural hazard
  • A required resource is busy
• Data hazard
  • The simple pipeline won’t work correctly
  • Need to wait for previous instruction to complete its data read/write
• Control hazard
  • Deciding on control action depends on previous instruction
Data Hazards
Program with data dependence
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Program with control dependence
beq $1, $3, +4
addi $2, $2, 1
addi $4, $4, 1
Data Forwarding
sub $2, $1, $3
and $12, $2, $5    # MEM=>EX forwarding
or $13, $6, $2     # WB=>EX forwarding
add $14, $2, $2
sw $15, 100($2)
Pipeline contents in three consecutive cycles:
IF   ID   EX   MEM  WB
or   and  sub  …    …
add  or   and  sub  …     and (in EX) gets the new $2 value forwarded from sub (in MEM)
sw   add  or   and  sub   or (in EX) gets the new $2 value forwarded from sub (in WB)
Data Forwarding Paths
Detecting the Need to Forward
• Input
  • rs and rt from EX
  • rd and RegWrite from MEM
  • rd and RegWrite from WB
• Output
  • FwdA, FwdB
• Caveats
  • Check RegWrite
  • Check whether rd = 0 (never forward a write to register $zero)
  • Forwarding from MEM wins over WB
• Review slides and textbook for details
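The inputs, outputs and caveats listed above fit in a few lines of logic. The following Python sketch is one possible rendering: the function name `forwarding_unit` is hypothetical, and the select encoding (10 for the MEM stage, 01 for the WB stage, 00 for the register file) is an assumption that follows the textbook's convention rather than anything stated on this slide.

```python
def forwarding_unit(ex_rs, ex_rt, mem_rd, mem_reg_write, wb_rd, wb_reg_write):
    """Return (FwdA, FwdB) select values for the two ALU operands."""

    def select(src):
        # MEM is checked first, so the most recent result wins over WB.
        if mem_reg_write and mem_rd != 0 and mem_rd == src:
            return 0b10
        if wb_reg_write and wb_rd != 0 and wb_rd == src:
            return 0b01
        return 0b00          # no hazard: use the register file value

    return select(ex_rs), select(ex_rt)


# and $12,$2,$5 in EX while sub $2,$1,$3 is in MEM.
print(forwarding_unit(2, 5, mem_rd=2, mem_reg_write=True,
                      wb_rd=0, wb_reg_write=False))
# or $13,$6,$2 in EX while and is in MEM and sub is in WB.
print(forwarding_unit(6, 2, mem_rd=12, mem_reg_write=True,
                      wb_rd=2, wb_reg_write=True))
```

Each caveat corresponds to one condition: the RegWrite test, the rd != 0 test, and the order of the two if statements.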
Load-Use Data Hazard
lw $s0, 20($t1)
sub $t2, $s0, $t3
Can’t always avoid stalls by forwarding
Must stall pipeline by one cycle

Datapath with Hazard Detection
Hazard Detection Unit
• Input
  • rs and rt from ID
  • rt and MemRead from EX
• Output
  • PCWrite, IF/IDWrite (0 for holding instructions)
  • Select signal to a MUX to insert bubble in EX
• Read slides/textbook for details
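The load-use check behind these inputs and outputs can be sketched as follows. This is an illustrative Python version, not the course's design: the function name `hazard_detection` and the output key names (`IFIDWrite`, `InsertBubble`) are made up, with 0 on the write enables meaning "hold".

```python
def hazard_detection(id_rs, id_rt, ex_rt, ex_mem_read):
    """Detect a load in EX whose result the instruction in ID needs."""
    stall = bool(ex_mem_read) and ex_rt in (id_rs, id_rt)
    return {
        "PCWrite": 0 if stall else 1,       # 0 holds the PC
        "IFIDWrite": 0 if stall else 1,     # 0 holds the IF/ID register
        "InsertBubble": 1 if stall else 0,  # 1 zeroes the control signals
    }


# lw $s0, 20($t1) in EX (rt = $s0 = 16); sub $t2, $s0, $t3 in ID.
print(hazard_detection(id_rs=16, id_rt=11, ex_rt=16, ex_mem_read=1))
```

When no load is in EX, or the load's destination is not a source of the instruction in ID, all three outputs let the pipeline advance normally.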
Pipeline Stall
• The nop has all control signals set to zero
  • It does nothing at EX, MEM and WB
• Prevent update of PC and IF/ID register
  • Using instruction is decoded again (OK)
  • Following instruction is fetched again (OK)
  • 1-cycle stall allows MEM to read data for lw
    • Can subsequently forward from WB to EX
Code Scheduling to Avoid Stalls
• Reorder code to avoid use of load result in the next instruction
• C code for A = B + E; C = B + F;
Original order (13 cycles):
lw $t1, 0($t0)
lw $t2, 4($t0)
add $t3, $t1, $t2    # stall: uses $t2 right after its load
sw $t3, 12($t0)
lw $t4, 8($t0)
add $t5, $t1, $t4    # stall: uses $t4 right after its load
sw $t5, 16($t0)
Reordered (11 cycles):
lw $t1, 0($t0)
lw $t2, 4($t0)
lw $t4, 8($t0)
add $t3, $t1, $t2
sw $t3, 12($t0)
add $t5, $t1, $t4
sw $t5, 16($t0)
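The two cycle counts can be checked with a small model. The Python sketch below assumes a five-stage pipeline with full forwarding, where the only stalls are one-cycle load-use stalls between adjacent instructions; the tuple format and the names `count_cycles`, `ORIGINAL` and `SCHEDULED` are made up for illustration.

```python
# Each instruction: (opcode, destination register or None, source registers).
ORIGINAL = [
    ("lw",  "$t1", ("$t0",)),
    ("lw",  "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),
    ("sw",  None,  ("$t3", "$t0")),
    ("lw",  "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw",  None,  ("$t5", "$t0")),
]

SCHEDULED = [
    ("lw",  "$t1", ("$t0",)),
    ("lw",  "$t2", ("$t0",)),
    ("lw",  "$t4", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),
    ("sw",  None,  ("$t3", "$t0")),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw",  None,  ("$t5", "$t0")),
]


def count_cycles(program, stages=5):
    """Cycles = instructions + pipeline fill + load-use stalls."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in curr[2]:
            stalls += 1      # the very next instruction needs the loaded value
    return len(program) + (stages - 1) + stalls


print(count_cycles(ORIGINAL), count_cycles(SCHEDULED))
```

Both programs need 7 + 4 cycles to flow through the pipeline; the original order adds two load-use stalls, which accounts for the difference between 13 and 11.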
Control Hazards
• Branch determines flow of control
  • Two branch outcomes: Taken or Not-Taken
  • The CPU doesn’t recognize a branch until it reaches the end of the ID stage
  • Every cycle, the CPU has to fetch one instruction
Control Hazards
• The MIPS pipeline in the textbook always predicts “not-taken”
  • Pipeline flush on every taken branch
  • OK to flush because mis-fetched instructions don’t write to register/memory
  • But this incurs pipeline bubbles (performance penalty)
• The revised MIPS pipeline moves the branch comparison to the ID stage
  • Doable for BEQ and BNE
  • Reduces pipeline bubbles from 3 to 1 per taken branch
  • Complicates data forwarding and hazard detection
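The effect of cutting the penalty from 3 bubbles to 1 per taken branch can be estimated with a simple CPI formula. The Python sketch below is only a back-of-the-envelope model: it assumes a base CPI of 1 and ignores all other stalls, and the workload numbers (20% branches, half of them taken) are hypothetical, not from the slides.

```python
def cpi_with_branches(branch_fraction, taken_fraction, bubbles_per_taken):
    """Base CPI of 1 plus the average number of branch bubbles per instruction."""
    return 1.0 + branch_fraction * taken_fraction * bubbles_per_taken


# Hypothetical workload: 20% of instructions are branches, 50% of those are taken.
original = cpi_with_branches(0.20, 0.50, 3)  # 3 bubbles per taken branch
revised = cpi_with_branches(0.20, 0.50, 1)   # comparison moved to ID: 1 bubble
print(original, revised)
```

Under these assumptions only taken branches cost extra cycles, because the pipeline predicts not-taken and keeps the fall-through instructions otherwise.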
Revised MIPS Pipeline
Revised MIPS Pipeline
Note: Branch does nothing in EX, MEM and WB
Performance Penalty
• Any pipeline bubbles?
Example 1:
addi $1, $1, -1
add $4, $5, $6
beq $1, $4, target
Example 2:
loop:
lw $1, addr
add $4, $5, $6
beq $1, $zero, loop
Delayed Branch
• Delayed branch may remove the one-cycle stall
• The instruction right after the beq is executed whether or not the branch is taken (the sub instruction in the example)
  • Put another way, the effect of beq is delayed by one cycle
Original order           With delayed branch
sub $10, $4, $8          beq $1, $3, 7
beq $1, $3, 7      =>    sub $10, $4, $8
and $12, $2, $5          and $12, $2, $5
• Must find an independent instruction; otherwise
  • May have to fill in a nop instruction, or
  • Need two variants of beq, delayed and not delayed
Other Topics
• Exception handling
• Multi-issue pipeline
• Those topics will be covered in the final exam
  • Exam 2 will NOT cover them