15-740/18-740 Computer Architecture Homework 4 Due Monday, October 17, at 12:00 PM

advertisement
15-740/18-740 Computer Architecture
Homework 4
Due Monday, October 17, at 12:00 PM
1. The Reorder Buffer
The reorder buffer (ROB) plays a key role in ensuring in-order execution semantics are exposed to the programmer despite the underlying hardware possibly executing instructions out of program order. In this problem, you
will perform the tasks of the ROB.
(a) Say a program consists of the following instruction stream:
Assume:
EY
0x0040: LD r0 ← [0xABCD]
0x0044: ADD r1 ← r2, r3
• A LD takes 3 cycles to execute and an ADD takes 1 cycle to execute.
• The LD and ADD can write their results to the ROB on the same cycle they are ready (i.e., a bypass
path for both tag and data values are used in this system).
• It takes one cycle for the ROB to write data ready to be committed to architectural state to the architectural register file.
• The initial contents of the architectural register file is r0 = 5, r1 = 4, r2 = 3, and r3 = 2.
• The value at address 0xABCD is 1.
K
Fill in the contents of the architectural register file and a ROB over the next six cycles for a processor
which can issue 1 instruction per cycle. Here is the initial state to get you started:
Cycle 0:
Architectural register file
r0 5
r1 4
r2 3
r3 2
Valid
0
0
ROB
Done
-
1
InstructionAddr
-
ResultRegister
-
ResultValue
-
Cycle 1:
Architectural register file
r0 5
r1 4
r2 3
r3 2
Cycle 2:
Architectural register file
r0 5
r1 4
r2 3
r3 2
Cycle 3:
Architectural register file
r0 5
r1 4
r2 3
r3 2
Valid
1
0
Done
0
-
InstructionAddr
0x0040
-
ResultRegister
r0
-
ResultValue
-
ResultRegister
r0
r1
ResultValue
-
ResultRegister
r0
r1
ResultValue
5
ROB
Valid
1
1
Done
0
0
InstructionAddr
0x0040
0x0044
ROB
Valid
1
1
Done
0
1
InstructionAddr
0x0040
0x0044
EY
Cycle 4:
Architectural register file
r0 5
r1 4
r2 3
r3 2
ROB
Valid
1
1
Cycle 6:
Architectural register file
r0 1
r1 5
r2 3
r3 2
Done
1
1
InstructionAddr
0x0040
0x0044
ResultRegister
r0
r1
ResultValue
1
5
ResultRegister
r1
-
ResultValue
5
-
ResultRegister
-
ResultValue
-
ROB
Valid
1
0
K
Cycle 5:
Architectural register file
r0 1
r1 4
r2 3
r3 2
ROB
Valid
0
0
Done
1
-
InstructionAddr
0x0044
-
ROB
Done
-
InstructionAddr
-
(b) Now assume that on a system without an ROB an exception occurs when the ADD is executed. What will
the contents of the architectural register file be at the time of the exception (when it is exposed to the
programmer)? Why is this bad?
The contents will be the same as the initial state. This is bad because the programmer will not know if
the LD executed and stored its value in architectural register r0, making programs harder to debug.
2. Out-of-Order Execution
Out-of-order processors make efficient use of their functional units by executing instructions according to the
flow of data between them.
(a) In class, we learned about a graphical way of showing the data dependencies (edges) of instructions (vertices) using a data flow graph. For the instruction stream below, draw the corresponding data flow graph.
2
ADD
MUL
ADD
DIV
ADD
r3
r4
r5
r6
r7
←
←
←
←
←
r1,
r1,
r8,
r1,
r6,
r2
r3
r9
r3
r5
r1
r2
r8
r9
r3
r4
r6
r5
r7
EY
(b) Of course, while a data flow graph is a helpful way of visualizing what goes on in an out-of-order processor,
real machines execute instructions using multiple pipeline stages. Take the code from above and find the
number of cycles it would take to execute on a processor with in-order fetch and out-of-order dispatch.
Assume the following:
K
• It takes one cycle to insert an instruction into a reservation station and another cycle to schedule the
instruction from the reservation station to an execution unit and these stages can be pipelined.
• Tags are broadcast one cycle before their corresponding operation finishes executing.
• A single tag broadcast bus exists and if more than one reservation station entry to the same execution
unit is ready to be dispatched, the older reservation station entry will use the bus and the younger
ones will have to stall. Similarly, if more than one instruction contends for writing to the ROB or
architectural register file during a single cycle, the oldest one receives access and the younger ones
must stall.
• ADDs take 3 cycles and are pipelined, MULs take 5 cycles and are pipelined, and DIVs take 7 cycles
and are pipelined.
18 cycles:
ADD
MUL
ADD
DIV
ADD
1 2 3 4 5 6 7 8 9 1
0
F D D E E E R W
F D D - - E E E E
F D D E E E R F D D E E E E
F D D - - -
1 1 1 1 1 1 1 1
1 2 3 4 5 6 7 8
E
E
-
R
E
-
W
- W
E R W
- E E E R W
(c) How many cycles will the program take to execute on an in-order fetch and in-order dispatch machine,
under the same assumptions as above?
20 cycles:
ADD
MUL
ADD
DIV
ADD
1 2 3 4 5 6 7 8 9 1
0
F D D E E E R W
F D D - - E E E E
F D - - D E E E
F - - D D E E
F D D -
1 1 1 1 1 1 1 1 1 2
1 2 3 4 5 6 7 8 9 0
E
R
E
-
R
E
-
W
- W
E E E R W
- - - E E E R W
3
Download