Lecture 11 - Pipelining Quiz

Lecture 11: Pipelining and Branch Prediction
EEN 312: Processors: Hardware, Software, and Interfacing
Department of Electrical and Computer Engineering
Spring 2014, Dr. Rozier (UM)
THE QUIZ SHOW!
Today’s class will be a quiz show.
• We will be solving puzzles involving pipelining, branch prediction, and the stack.
• Form groups of 8.
• Teams earn points for correct solutions, and extra credit points are awarded to the top teams:
– 4 pts for 1st place
– 3 pts for 2nd place
– 2 pts for 3rd place
– 1 pt for 4th place
The Rules!
• Each group will elect a “buzzer.” When the buzzer raises a hand, your group will be called on to solve the puzzle.
• One representative will be sent up per group. They will give their answer and explain it.
• Once the buzzer has raised a hand, your group must stop discussing the answer!
PIPELINING
Pipelining
• Assume r5 != r4.
• Assume there is one memory for instructions and data.
• During a cycle, either data can be loaded or stored for an instruction OR an instruction can be fetched, not both.
(100) A structural hazard exists. What is it?
    str  r0, [r1, #16]
    ldr  r0, [r1, #8]
    cmp  r5, r4
    beq  label
    add  r5, r2, r4
    add  r5, r5, r0
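The conflict in (100) can be checked mechanically. The sketch below assumes an ideal 5-stage pipeline with no stalls, in which instruction i (counting from 0) is fetched in cycle i + 1 and reaches MEM in cycle i + 4, and it flags every cycle in which a data access and an instruction fetch would both need the single shared memory. The `PROGRAM` table and the helper names are illustrative, not part of the slides.

```python
# Sketch: find single-memory structural conflicts in an ideal 5-stage pipeline.
# Assumptions (not from the slides): no stalls, in-order issue, the branch is
# not taken (r5 != r4), and only str/ldr access data memory in MEM.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# (mnemonic, accesses data memory in MEM?)
PROGRAM = [
    ("str", True),
    ("ldr", True),
    ("cmp", False),
    ("beq", False),
    ("add", False),
    ("add", False),
]


def stage_cycle(index, stage):
    """1-based cycle in which instruction `index` occupies `stage` (no stalls)."""
    return index + 1 + STAGES.index(stage)


def memory_conflicts(program):
    """Return (cycle, data_instr, fetched_instr) for each shared-memory clash."""
    conflicts = []
    for j, (op, uses_mem) in enumerate(program):
        if not uses_mem:
            continue
        mem_cycle = stage_cycle(j, "MEM")
        for i, (fetched, _) in enumerate(program):
            if stage_cycle(i, "IF") == mem_cycle:
                conflicts.append((mem_cycle, op, fetched))
    return conflicts


if __name__ == "__main__":
    for cycle, op, fetched in memory_conflicts(PROGRAM):
        print(f"cycle {cycle}: {op} needs data memory while {fetched} is fetched")
```

Only the six listed instructions are modeled, so fetches of anything after the last add are not checked.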
Pipelining
• Assume r5 != r4.
• Assume there is one memory for instructions and data.
• During a cycle, either data can be loaded or stored for an instruction OR an instruction can be fetched, not both.
(200) Can this structural hazard be eliminated by adding “bubbles” to the pipeline in the form of NOP instructions?
    str  r0, [r1, #16]
    ldr  r0, [r1, #8]
    cmp  r5, r4
    beq  label
    add  r5, r2, r4
    add  r5, r5, r0
Pipelining
• Assume r5 != r4.
• Assume there is one memory for instructions and data.
• During a cycle, either data can be loaded or stored for an instruction OR an instruction can be fetched, not both.
(300) To guarantee forward progress, how must this hazard be resolved: in favor of data access or instruction fetching? Why?
    str  r0, [r1, #16]
    ldr  r0, [r1, #8]
    cmp  r5, r4
    beq  label
    add  r5, r2, r4
    add  r5, r5, r0
Pipelining
• Assume r5 != r4.
• Assume there is one memory for instructions and data.
• During a cycle, either data can be loaded or stored for an instruction OR an instruction can be fetched, not both.
(400) Draw the 5-stage pipeline diagram for this code, assuming the stages are Fetch, Decode, Execute, Memory, and Writeback. What is the total execution time?
    str  r0, [r1, #16]
    ldr  r0, [r1, #8]
    cmp  r5, r4
    beq  label
    add  r5, r2, r4
    add  r5, r5, r0
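With no hazards, n instructions on a k-stage pipeline finish in k + n - 1 cycles. The sketch below draws the diagram and counts cycles under that model. The structural hazard is represented by the hypothetical `stalls` parameter, the number of extra cycles each instruction waits before Fetch, which you would fill in from your reasoning about the shared memory. It assumes in-order issue, so delaying one fetch delays every instruction behind it.

```python
# Sketch: print a pipeline diagram and total cycle count for an in-order
# pipeline. Assumption (not from the slides): an instruction delayed at fetch
# also delays every instruction behind it; `stalls` is a hypothetical input.
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def pipeline_diagram(instrs, stalls=None, stages=STAGES):
    """Return (total_cycles, diagram_text).

    stalls[i] is the number of extra cycles instruction i waits before IF.
    """
    stalls = stalls if stalls is not None else [0] * len(instrs)
    starts = []
    for i in range(len(instrs)):
        earliest = starts[-1] + 1 if starts else 0  # one fetch per cycle
        starts.append(earliest + stalls[i])
    total = starts[-1] + len(stages)  # cycles until the last WB completes
    rows = ["     " + " ".join(f"{c + 1:<3}" for c in range(total))]
    for name, start in zip(instrs, starts):
        cells = ["   "] * total
        for k, stage in enumerate(stages):
            cells[start + k] = f"{stage:<3}"
        rows.append(f"{name:<5}" + " ".join(cells))
    return total, "\n".join(rows)


if __name__ == "__main__":
    code = ["str", "ldr", "cmp", "beq", "add", "add"]
    total, text = pipeline_diagram(code)
    print(text)
    print(f"total: {total} cycles")  # ideal case: 5 + 6 - 1 = 10
```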
Pipelining
• Assume r5 != r4.
• Assume there is one memory for instructions and data.
• During a cycle, either data can be loaded or stored for an instruction OR an instruction can be fetched, not both.
(500) Assume a new processor in which, when the offset of a memory operation is zero, the Execute stage (ALU) can be skipped, so the MEM and Execute stages can be overlapped in the pipeline. What speedup does this new architecture achieve?
    str  r0, [r1, #0]
    ldr  r0, [r10, #0]
    cmp  r5, r4
    beq  label
    add  r5, r2, r4
    add  r5, r5, r0
DATA DEPENDENCIES
Data Dependencies
(100) Find all data dependencies in this sequence.
    ldr  r1, [r1, #0]
    and  r1, r1, r2
    ldr  r2, [r1, #0]
    ldr  r1, [r3, #0]
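A small sketch that lists RAW (true), WAR (anti), and WAW (output) register dependencies for the sequence above. Each instruction is hand-encoded as a destination register and its source registers; memory dependencies are not modeled (there are no stores here). The `Instr` type and `dependencies` helper are illustrative names.

```python
# Sketch: list register dependencies between instruction pairs (i < j).
# Each instruction is hand-encoded as (text, destination, sources); memory
# dependencies are not modeled. The Instr type and helper are illustrative.
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    text: str
    dest: str
    srcs: Tuple[str, ...]


PROGRAM = [
    Instr("ldr r1, [r1, #0]", "r1", ("r1",)),
    Instr("and r1, r1, r2", "r1", ("r1", "r2")),
    Instr("ldr r2, [r1, #0]", "r2", ("r1",)),
    Instr("ldr r1, [r3, #0]", "r1", ("r3",)),
]


def dependencies(prog):
    """Return (kind, earlier_index, later_index, register) tuples."""
    deps = []
    for j, later in enumerate(prog):
        # RAW (true dependence): pair each source with its most recent writer.
        for reg in later.srcs:
            writers = [i for i in range(j) if prog[i].dest == reg]
            if writers:
                deps.append(("RAW", writers[-1], j, reg))
        # WAW (output dependence): `later` overwrites the most recent earlier write.
        writers = [i for i in range(j) if prog[i].dest == later.dest]
        if writers:
            deps.append(("WAW", writers[-1], j, later.dest))
        # WAR (anti-dependence): `later` overwrites a register an earlier
        # instruction reads, with no other write to it in between.
        for i in range(j):
            if later.dest in prog[i].srcs and all(
                prog[k].dest != later.dest for k in range(i + 1, j)
            ):
                deps.append(("WAR", i, j, later.dest))
    return deps


if __name__ == "__main__":
    for kind, i, j, reg in dependencies(PROGRAM):
        print(f"{kind} on {reg}: {PROGRAM[i].text}  ->  {PROGRAM[j].text}")
```

In a simple in-order 5-stage pipeline only the RAW entries can cause hazards; WAR and WAW matter once instructions can complete out of order.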
Data Dependencies
(200) Find all hazards in this sequence, with and without forwarding, for a 5-stage pipeline. Assume the stages are Fetch, Decode, Execute, Memory, and Writeback.
    ldr  r1, [r1, #0]
    and  r1, r1, r2
    ldr  r2, [r1, #0]
    ldr  r1, [r3, #0]
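The stall count for a RAW hazard depends only on how far apart the producer and consumer are and on when the value becomes available. The sketch below assumes the common textbook conventions: the register file is written in the first half of WB and read in the second half of ID, and with forwarding a result can feed the start of the consumer's EX as soon as the producing stage ends. The `raw_stalls` function and its parameters are illustrative.

```python
# Sketch: stall cycles needed for a RAW hazard in an in-order pipeline.
# Assumptions (not from the slides): the register file is written in the
# first half of WB and read in the second half of ID, and, with forwarding,
# results are forwarded to the start of EX as soon as the producing stage ends.
FIVE_STAGE = ["IF", "ID", "EX", "MEM", "WB"]


def raw_stalls(distance, ready_stage, forwarding, stages=FIVE_STAGE):
    """Stall cycles for a consumer `distance` instructions after its producer.

    ready_stage: stage at whose end the producer's result exists
    ("EX" for ALU instructions, the last memory stage for loads).
    """
    decode, execute = stages.index("ID"), stages.index("EX")
    if forwarding:
        # Consumer's EX (cycle distance + execute) must come after ready_stage ends.
        return max(0, stages.index(ready_stage) - execute + 1 - distance)
    # Without forwarding the consumer's ID must not precede the producer's WB.
    return max(0, stages.index("WB") - decode - distance)
```

Under these assumptions, `ldr r1` feeding the `and` immediately after it gives `raw_stalls(1, "MEM", forwarding=False) == 2` and `raw_stalls(1, "MEM", forwarding=True) == 1`. Passing a six-stage list such as `["IF", "ID", "EX", "MEM1", "MEM2", "WB"]` with `ready_stage="MEM2"` models a split MEM stage; those stage names are hypothetical.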
Data Dependencies
(300) To reduce the clock cycle time, we are considering splitting the MEM stage into two stages. Find all hazards in this sequence for the resulting 6-stage pipeline, with and without forwarding, assuming the stages are Fetch, Decode, Execute, Memory 1, Memory 2, and Writeback.
    add  r1, r2, r1
    ldr  r2, [r1, #0]
    ldr  r1, [r1, #4]
    orr  r3, r1, r2
Data Dependencies
• Assume all data memory values are 0.
• Assume:
– r0 = 0
– r1 = -1
– r2 = 31
– r3 = 1500
• Assume the processor has forwarding logic for hazards.
(400) What is the first value to be forwarded, and what value does it override?
    add  r1, r2, r1
    ldr  r2, [r1, #0]
    ldr  r1, [r1, #4]
    orr  r3, r1, r2
Data Dependencies
• Assume all data memory values are 0.
• Assume:
– r0 = 0
– r1 = -1
– r2 = 31
– r3 = 1500
(500) The hazard detection unit assumes forwarding was implemented, but the processor designers (UF students) forgot to implement it! What are the final register values? What should they be? Add NOPs to this sequence to ensure correct execution despite UF’s screw-up!
    add  r1, r2, r1
    ldr  r2, [r1, #0]
    ldr  r1, [r1, #4]
    orr  r3, r1, r2
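One way to check an answer to (500) is to simulate when each instruction reads and writes registers. This sketch assumes no stalls or NOPs are inserted, instruction i reads its sources in ID (cycle i + 1) and writes its result in WB (cycle i + 4), a WB write is visible to an ID read in the same cycle, and every data memory read returns 0. With `forwarding=True` it uses the most recent older result, which is the correct sequential behavior. The OR instruction is written with its ARM mnemonic, `orr`, and modeled with Python's `|`. All names here are illustrative.

```python
# Sketch: final register values with and without forwarding for the sequence above.
# Assumptions (not from the slides): no stalls or NOPs are inserted;
# instruction i reads its sources in ID (cycle i + 1) and writes in WB
# (cycle i + 4); a WB write is visible to an ID read in the same cycle;
# every data memory read returns 0. Names here are illustrative.
from typing import Callable, NamedTuple, Tuple


class Op(NamedTuple):
    text: str
    dest: str
    srcs: Tuple[str, ...]
    compute: Callable  # (source_values, memory) -> result


def load(offset):
    """ldr rd, [rn, #offset]: read memory at rn + offset (unset addresses read 0)."""
    return lambda vals, mem: mem.get(vals[0] + offset, 0)


PROGRAM = [
    Op("add r1, r2, r1", "r1", ("r2", "r1"), lambda vals, mem: vals[0] + vals[1]),
    Op("ldr r2, [r1, #0]", "r2", ("r1",), load(0)),
    Op("ldr r1, [r1, #4]", "r1", ("r1",), load(4)),
    Op("orr r3, r1, r2", "r3", ("r1", "r2"), lambda vals, mem: vals[0] | vals[1]),
]
INITIAL = {"r0": 0, "r1": -1, "r2": 31, "r3": 1500}
MEM = {}  # all data memory values are 0


def simulate(program, initial, mem, forwarding):
    writes = []  # (wb_cycle, reg, value), in program order
    for i, op in enumerate(program):
        id_cycle = i + 1
        vals = []
        for reg in op.srcs:
            value = initial[reg]
            for wb_cycle, wreg, wval in writes:
                # With forwarding (plus the load-use stall) every older result
                # arrives in time; without it, only written-back values are seen.
                if wreg == reg and (forwarding or wb_cycle <= id_cycle):
                    value = wval
            vals.append(value)
        writes.append((i + 4, op.dest, op.compute(vals, mem)))
    final = dict(initial)
    for _, reg, value in writes:
        final[reg] = value
    return final


if __name__ == "__main__":
    print("with forwarding:   ", simulate(PROGRAM, INITIAL, MEM, forwarding=True))
    print("without forwarding:", simulate(PROGRAM, INITIAL, MEM, forwarding=False))
```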
BRANCH PREDICTION
Branch Prediction
(100) When building a branch prediction unit, state for each of the following cases whether “branch taken” or “branch not taken” is the better prediction:
1. Branches associated with “If” statements
2. Branches associated with “Else if” statements
3. Branches associated with “Else” statements
4. Branches associated with “For” statements
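One widely used static heuristic, not necessarily the answer intended for this question, is backward taken / forward not taken (BTFN): loop back-edges jump to lower addresses and usually repeat, while forward branches that skip code are guessed not taken. A minimal sketch, assuming the predictor can see both the branch address and its target:

```python
# Sketch of the static "backward taken, forward not taken" (BTFN) heuristic.
# Assumption: the predictor sees the branch's own address and its target.
def btfn_predict(branch_pc: int, target_pc: int) -> bool:
    """Predict taken for backward branches (typical loop back-edges),
    not taken for forward branches (typical if/else skips)."""
    return target_pc < branch_pc


if __name__ == "__main__":
    # Hypothetical addresses: a loop branch at 0x20 jumping back to 0x08,
    # and an if-statement branch at 0x10 skipping forward to 0x18.
    print(btfn_predict(0x20, 0x08))  # True  (predict taken)
    print(btfn_predict(0x10, 0x18))  # False (predict not taken)
```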
Branch Prediction
(200) Design a dynamic branch predictor for if
statements and loops. Describe how to implement it
in hardware. What new hardware might it require?
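One classic design is a branch history table of 2-bit saturating counters indexed by low-order PC bits. In hardware it needs a small table of counters (registers or SRAM), index logic driven by the PC in Fetch, saturating increment/decrement logic, and an update path from the stage where the branch resolves. The sketch below models that table in software; the table size, indexing, and initial state are illustrative assumptions.

```python
# Sketch: a dynamic predictor built from 2-bit saturating counters.
# Assumptions: a 16-entry table indexed by PC bits [5:2] (4-byte instructions),
# counters start at 1 ("weakly not taken"); sizes are illustrative.
class TwoBitPredictor:
    def __init__(self, index_bits: int = 4, initial: int = 1):
        self.mask = (1 << index_bits) - 1
        self.table = [initial] * (1 << index_bits)

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask  # drop the 2 byte-offset bits

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2  # 2, 3 = taken; 0, 1 = not taken

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_misses(predictor, pc, outcomes):
    """Replay a branch's outcomes, updating the predictor after each one."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    # Hypothetical loop branch at 0x20 taken 4 times, then not taken; run twice.
    outcomes = ([True] * 4 + [False]) * 2
    print(count_misses(TwoBitPredictor(), 0x20, outcomes), "mispredictions")
```

The 2-bit hysteresis means the single not-taken exit at the end of each loop run costs one misprediction but does not flip the prediction for the next run.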
Branch Prediction
• Assume branches are predicted not taken.
• Assume one element of the array at r2 is equal to 100.
(300) How many times are the branches predicted correctly versus incorrectly?
            mov  r0, #0
            ldr  r2, =0xDEADBEEF
    LOOP:   ldr  r3, [r2, r0, lsl #2]
            cmp  r3, #100
            beq  LABEL
            mov  r4, r3
    LABEL:  add  r0, r0, #1
            cmp  r0, #5
            bne  LOOP
            mov  r0, r4
            add  r0, r0, #1
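The branch outcomes can be traced in a few lines. This sketch assumes the loop is meant to run once for each of five array elements (r0 = 0 to 4, repeating while r0 != 5); the array contents are made up, with exactly one element equal to 100. Under predict-not-taken, a prediction is correct exactly when the branch falls through.

```python
# Sketch: count predict-not-taken hits and misses for the loop above.
# Assumptions (not from the slides): the loop runs once per element for five
# elements (r0 = 0..4) and repeats while r0 != 5; the array contents are
# made up, with exactly one element equal to 100.
def trace_branches(array):
    """Return (branch_name, taken) for every conditional branch executed."""
    outcomes = []
    for r0, r3 in enumerate(array):
        outcomes.append(("beq LABEL", r3 == 100))  # taken skips mov r4, r3
        outcomes.append(("loop back-edge", r0 + 1 != len(array)))
    return outcomes


def score_not_taken(outcomes):
    """Predict-not-taken is correct exactly when the branch is not taken."""
    correct = sum(1 for _, taken in outcomes if not taken)
    return correct, len(outcomes) - correct


if __name__ == "__main__":
    array = [7, 100, 3, 9, 12]  # hypothetical contents
    correct, wrong = score_not_taken(trace_branches(array))
    print(f"correct: {correct}, incorrect: {wrong}")
```

The position of the 100 does not change the totals, because the forward branch is taken exactly once either way.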
Branch Prediction
• Assume branches are predicted not taken.
• Assume one element of the array at r2 is equal to 100.
• Assume the PC pipeline is three instructions deep.
• Assume the PC pipeline can be flushed in one cycle and must be fully flushed on a misprediction.
• Assume a pipeline with the phases Fetch, Decode, Issue, Execute, Memory, and Writeback.
• Assume branches are evaluated in the Issue stage and the pipeline is flushed during Execute.
(400) How many cycles does the loop take?
            mov  r0, #0
            ldr  r2, =0xDEADBEEF
    LOOP:   ldr  r3, [r2, r0, lsl #2]
            cmp  r3, #100
            beq  LABEL
            mov  r4, r3
    LABEL:  add  r0, r0, #1
            cmp  r0, #5
            bne  LOOP
            mov  r0, r4
            add  r0, r0, #1
Branch Prediction
• Assume branches are predicted not taken.
• Assume the PC pipeline is three instructions deep.
• Assume the PC pipeline can be flushed in one cycle and must be fully flushed on a misprediction.
• Assume a pipeline with the phases Fetch, Decode, Issue, Execute, Memory, and Writeback.
• Assume branches are evaluated in the Issue stage and the pipeline is flushed during Execute.
(500) Act as the compiler. Optimize the code for a branch-not-taken predictor. How many cycles does it take?
            mov  r0, #0
            ldr  r2, =0xDEADBEEF
    LOOP:   ldr  r3, [r2, r0, lsl #2]
            cmp  r3, #100
            beq  LABEL
            mov  r4, r3
    LABEL:  add  r0, r0, #1
            cmp  r0, #5
            bne  LOOP
            mov  r0, r4
            add  r0, r0, #1
PROCESSOR ARCHITECTURE
Processor Architecture
(100) For a five-stage pipeline with the stages Fetch, Decode, Execute, Memory, and Writeback, describe what happens in each stage.
Processor Architecture
(200) Describe the purpose of a clock signal in a
processor. Why do processors need clock
signals?
Processor Architecture
(300) Describe how registers are selected from the register file during the Decode phase. How is this accomplished in hardware?
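A software model can make the hardware picture concrete: the register-number fields of the instruction drive the select lines of read-port multiplexers. This sketch assumes the ARM data-processing encoding, where Rn is bits 19-16, Rd is bits 15-12, and an unshifted register operand Rm is bits 3-0. The register contents and helper names are hypothetical.

```python
# Sketch: register selection during Decode, modeled in software.
# Assumption: ARM data-processing encoding, where Rn is bits 19-16,
# Rd is bits 15-12, and Rm (register operand 2, no shift) is bits 3-0.
def decode_fields(instr: int):
    rn = (instr >> 16) & 0xF
    rd = (instr >> 12) & 0xF
    rm = instr & 0xF
    return rn, rd, rm


def mux(inputs, select):
    """A read port as a multiplexer: a decoder turns the 4-bit `select` into a
    one-hot enable, each register is ANDed with its enable, and the results are ORed."""
    out = 0
    for index, value in enumerate(inputs):
        enable = 0xFFFFFFFF if index == select else 0
        out |= value & enable
    return out


if __name__ == "__main__":
    regs = [10 * i for i in range(16)]  # hypothetical register contents
    add_r5_r2_r4 = 0xE0825004           # ARM encoding of add r5, r2, r4
    rn, rd, rm = decode_fields(add_r5_r2_r4)
    # Two read ports operate in parallel, one per source field.
    print(rn, rd, rm, mux(regs, rn), mux(regs, rm))  # 2 5 4 20 40
```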
Processor Architecture
(400) Why must we add new pipeline registers in the datapath to carry the writeback register number forward, instead of reading it from the Decode phase?
Processor Architecture
(500) Design a one-bit full adder.
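One standard gate-level design chains two half adders and ORs their carries: Sum = A XOR B XOR Cin and Cout = (A AND B) OR (Cin AND (A XOR B)). The sketch below evaluates those gates and prints the truth table.

```python
# Sketch: a one-bit full adder built from XOR, AND, and OR gates.
def full_adder(a: int, b: int, cin: int):
    """Return (sum, carry_out) for single-bit inputs."""
    half_sum = a ^ b                   # first half adder
    s = half_sum ^ cin                 # second half adder: sum bit
    cout = (a & b) | (half_sum & cin)  # carry from either half adder
    return s, cout


if __name__ == "__main__":
    print(" a b cin | sum cout")
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                s, cout = full_adder(a, b, cin)
                print(f" {a} {b}  {cin}  |  {s}    {cout}")
```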
REPRESENTATION OF DATA
Representation of Data
(100) Describe the difference between big-endian and little-endian representations.
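The difference shows up in which byte lands at the lowest address. This sketch uses Python's `int.to_bytes` to lay out a 32-bit word in each order; the value `0x12345678` and base address `0x1000` are arbitrary examples.

```python
# Sketch: how a 32-bit value is laid out in memory in each byte order.
# The value 0x12345678 and base address 0x1000 are arbitrary examples.
def layout(value: int, byteorder: str, base: int = 0x1000):
    """Return [(address, byte)] for a 4-byte word stored at `base`."""
    return [(base + i, b) for i, b in enumerate(value.to_bytes(4, byteorder))]


if __name__ == "__main__":
    for order in ("big", "little"):
        cells = ", ".join(f"{addr:#x}: {byte:02x}" for addr, byte in layout(0x12345678, order))
        print(f"{order:>6}-endian  {cells}")
```

Big-endian stores the most significant byte (`12`) at the lowest address; little-endian stores the least significant byte (`78`) there.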
Representation of Data
(200) Represent the following data in big-endian and little-endian formats:
1. 00ac8eff
2. 54897743
3. be88fac8
Representation of Data
(300) Represent the following data as hexadecimal numbers in big-endian and little-endian formats. Assume unsigned integers.
1. 128
2. 976
Representation of Data
(400) Represent the following data as hexadecimal numbers in big-endian and little-endian formats. Assume signed integers.
1. -55
2. 99
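Both conversions reduce to choosing a width and a byte order. The slides do not state a width, so this sketch assumes 32-bit words with two's complement for signed values, as on ARM; the example values 300 and -2 are not from the quiz.

```python
# Sketch: decimal -> 32-bit hex in both byte orders.
# Assumption (not stated on the slides): values are 32-bit words, and signed
# values use two's complement, as on ARM.
def to_hex_words(value: int, signed: bool, width: int = 4):
    """Return (big_endian_hex, little_endian_hex) for a `width`-byte integer."""
    big = value.to_bytes(width, "big", signed=signed).hex()
    little = value.to_bytes(width, "little", signed=signed).hex()
    return big, little


if __name__ == "__main__":
    print(to_hex_words(300, signed=False))  # ('0000012c', '2c010000')
    print(to_hex_words(-2, signed=True))    # ('fffffffe', 'feffffff')
```

`int.to_bytes` raises `OverflowError` when a value does not fit in the chosen width (or is negative with `signed=False`), which doubles as a check on the assumed word size.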
Representation of Data
(500) Write assembly code that takes data from one register in big-endian format and stores it in a new register in little-endian format.
You may use temporary registers.
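The usual approach isolates each byte with a mask, shifts it to its mirrored position, and ORs the pieces together; an assembly version would follow the same steps with AND, shifts, and ORR into temporary registers. (ARMv6 and later also provide a single REV instruction that reverses the bytes of a 32-bit register.) The sketch below shows the mask-and-shift steps in Python.

```python
# Sketch: reverse the byte order of a 32-bit word with masks and shifts,
# the same steps an assembly version would take using temporary registers.
def bswap32(x: int) -> int:
    x &= 0xFFFFFFFF                      # keep to one 32-bit register
    return (((x & 0x000000FF) << 24) |   # byte 0 -> byte 3
            ((x & 0x0000FF00) << 8) |    # byte 1 -> byte 2
            ((x & 0x00FF0000) >> 8) |    # byte 2 -> byte 1
            ((x & 0xFF000000) >> 24))    # byte 3 -> byte 0


if __name__ == "__main__":
    print(hex(bswap32(0x12345678)))  # 0x78563412
```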
FINAL QUESTION
Final Question
• Each team should decide how many points to bid.
• Write down your bids on a sheet of paper and hand them in.
• You will have only 60 seconds to answer the next question as a team; write your answers down before time runs out.
– Answer correctly and you will add your bid to your score.
– Answer incorrectly and you will lose those points.
Final Question
In order to detect data hazards, new hardware must be added. Assuming that the register IDs involved in an instruction are available during the Decode stage, what hardware would be necessary to check for data hazards?
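A common answer centers on equality comparators between the source-register fields in Decode and the destination-register fields held in later pipeline registers, gated by each stage's register-write signal. This sketch models that logic; it assumes each older instruction in EX or MEM is described by a hypothetical `(dest_reg, writes_reg, is_load)` tuple, and that the register file writes in the first half of WB and reads in the second half of ID, so WB needs no comparison.

```python
# Sketch: comparator-based data hazard detection at Decode.
# Assumptions (not from the slides): each older instruction in EX or MEM is
# described by (dest_reg, writes_reg, is_load), or None for a bubble; the
# register file is written in the first half of WB and read in the second
# half of ID, so WB needs no comparison.
from typing import List, Optional, Sequence, Tuple

Older = Optional[Tuple[int, bool, bool]]


def hazards(srcs: Sequence[int], in_ex: Older, in_mem: Older) -> List[Tuple[int, str]]:
    """Return (src_reg, stage) pairs where an equality comparator fires.

    Hardware: one 4-bit comparator per (source field, pipeline register) pair,
    each ANDed with that pipeline register's register-write bit.
    """
    found = []
    for stage, older in (("EX", in_ex), ("MEM", in_mem)):
        if older is None:
            continue
        dest, writes_reg, _ = older
        for src in srcs:
            if writes_reg and dest == src:
                found.append((src, stage))
    return found


def load_use_stall(srcs: Sequence[int], in_ex: Older) -> bool:
    """With forwarding, only a load in EX whose result Decode needs forces a stall."""
    if in_ex is None:
        return False
    dest, writes_reg, is_load = in_ex
    return is_load and writes_reg and dest in srcs


if __name__ == "__main__":
    # and r1, r1, r2 in Decode, right behind ldr r1, [r1, #0] in EX.
    print(hazards([1, 2], in_ex=(1, True, True), in_mem=None))  # [(1, 'EX')]
    print(load_use_stall([1, 2], in_ex=(1, True, True)))        # True
```

With two source fields and two downstream pipeline registers, this amounts to four 4-bit comparators plus a little AND logic for the load-use check.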
WRAP UP
For next time
• Enjoy your spring break!
• Read Chapter 5, sections 5.1 – 5.3