Spring 2015 – Exam 2

CPSC 3300 – Spring 2015 – Exam 2
Name: ______________________
1. Matching. Write the correct term from the list into each blank. (2 pts. each)
control signal
control store
hardwired
microprogrammed
structural hazard
data hazard
control hazard
forwarding
(a) _____________________ when hardware cannot support the combination of instructions we want to
execute in the same clock cycle
(b) _____________________ providing a data value to any unit where it is needed after the data value has
been produced but before it is available in the register file
(c) _____________________ value used for selecting a mux input or selecting the operation of a functional
unit
True/False. Circle T or F. (2 pts. each)
2. T / F A microprogrammed control unit is typically faster than a hardwired control unit.
3. T / F Predict-untaken is easier to implement than predict-taken since a branch target address is not needed.
4. T / F Branch prediction combined with speculative execution requires some form of branch misprediction
recovery hardware, such as a branch history shift register (BHSR).
5. T / F In a VLIW computer system, dependency checking between instructions is done by the hardware.
[Figure: single-cycle datapath (Figure 4.2 from textbook); the labels ❶, ❷, and ❸ appear in the figure]
6. Consider the MIPS “subtract” instruction as implemented on the single-cycle datapath above (Figure 4.2 from
textbook):
subtract R3, R1, R2 // Reg[3] <- Reg[1] - Reg[2]
Circle the correct value 0 or 1 for the control signals (a-d) and circle whether each of the three muxes (e-g)
selects its upper input, lower input, or don't care. For the ALU operation (h) circle one of the function names.
(The Zero condition signal will be assumed to be 0.) (8 pts.)
(a) Branch   = 0  1
(b) MemRead  = 0  1
(c) MemWrite = 0  1
(d) RegWrite = 0  1
(e) Mux1 (upper left; output to PC) = upper, lower, don't care
(f) Mux2 (upper middle; output to Data port of Regs) = upper, lower, don't care
(g) Mux3 (lower middle; output to bottom leg of ALU) = upper, lower, don't care
(h) ALU operation = and, or, add, subtract, set-on-less-than, nor
7. Briefly explain what each stage does when the “subtract R3,R1,R2” instruction is executed by the five-stage pipeline.
(15 pts.)
8. The branch CPI penalty is calculated as extra CPI = (branch freq.)*(misprediction freq.)*(mispredict penalty).
Consider a five stage pipeline with static predict-untaken and where the branch target address and the branch
direction are resolved at the end of the EX stage. What is the mispredict penalty? (5 pts.)
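As an illustration of the extra-CPI formula above, here is a minimal Python sketch. The function name and every numeric input are hypothetical placeholders chosen only to show the arithmetic; they are not the answer to this question.

```python
def extra_cpi(branch_freq, mispredict_freq, mispredict_penalty):
    """Extra CPI = (branch freq.) * (misprediction freq.) * (mispredict penalty)."""
    return branch_freq * mispredict_freq * mispredict_penalty

# Hypothetical inputs: 20% of instructions are branches, 40% of those are
# mispredicted, and each misprediction costs 3 cycles.
penalty_cpi = extra_cpi(0.20, 0.40, 3)
print(f"extra CPI = {penalty_cpi:.2f}")
```

The penalty is kept as a parameter because it depends on where in the pipeline the branch is resolved.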
9. Consider using a two-bit saturating counter for branch prediction. Assume the state is initialized to binary 00.
Each taken branch (T) increments the counter unless the state is already binary 11. Each untaken branch (U)
decrements the counter unless the state is already binary 00. What is the state of the predictor after the
branch sequence “T T T T U”? (4 pts.)
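The update rule described above can be sketched in a few lines of Python. The function names are hypothetical, and the traced sequence is deliberately different from the one in the question.

```python
def step(state, outcome):
    """Update a two-bit saturating counter (0..3) for one branch outcome.

    outcome is "T" (taken) or "U" (untaken).
    """
    if outcome == "T":
        return min(state + 1, 3)   # increment, saturating at binary 11
    return max(state - 1, 0)       # decrement, saturating at binary 00

def trace(outcomes, state=0):
    """Return every state visited, starting with the initial state."""
    states = [state]
    for outcome in outcomes:
        state = step(state, outcome)
        states.append(state)
    return states

# Hypothetical sequence, not the one asked about in the question.
print([format(s, "02b") for s in trace("UTTU")])
```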
10. For a loop branch with the sequence of “T T T T U T T T T U”, give the accuracy of the following predictors.
(a) static predict taken (2 pts.)
(b) static predict untaken (2 pts.)
(c) dynamic predict using a one-bit history initialized to 0 (untaken = 0, taken = 1) (4 pts.)
(d) dynamic predict using a 2-bit saturating counter (see 9 above) initialized to 00. States 00 and 01 predict
untaken, and states 10 and 11 predict taken. (6 pts.)
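One way to check predictor accuracies by machine is to model each predictor as a predict function, an update function, and an initial state. The sketch below does that in Python; all names are hypothetical, and the branch history used is an illustrative one rather than the sequence in the question.

```python
def accuracy(predict, update, state, outcomes):
    """Fraction of outcomes predicted correctly.

    predict(state) returns "T" or "U"; update(state, outcome) returns the next state.
    """
    correct = 0
    for outcome in outcomes:
        if predict(state) == outcome:
            correct += 1
        state = update(state, outcome)
    return correct / len(outcomes)

# Each entry is (predict, update, initial state).
predictors = {
    # Static predictors ignore their state.
    "static taken": (lambda s: "T", lambda s, o: s, None),
    "static untaken": (lambda s: "U", lambda s, o: s, None),
    # One-bit history: predict whatever the branch did last time (0 = untaken).
    "one-bit history": (lambda s: "T" if s == 1 else "U",
                        lambda s, o: 1 if o == "T" else 0,
                        0),
    # Two-bit saturating counter: states 10 and 11 predict taken.
    "two-bit counter": (lambda s: "T" if s >= 2 else "U",
                        lambda s, o: min(s + 1, 3) if o == "T" else max(s - 1, 0),
                        0),
}

history = "TTUTTU"  # hypothetical branch history
results = {name: accuracy(p, u, init, history)
           for name, (p, u, init) in predictors.items()}
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```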
11. Draw the data dependency diagram for the register data flow in the following MIPS code. Destination registers
are listed first. (16 pts.)
add r3, r1, r2   // r3 <- r1 + r2
lw  r4, 8(r3)    // r4 <- memory[r3+8]
sub r6, r4, r5   // r6 <- r4 - r5
xor r7, r4, r6   // r7 <- r4 ^ r6
12. For the MIPS instruction sequence given in question 11, show the pipeline cycle (“staircase”) diagram
for the standard 5-stage pipeline with forwarding. (9 pts.)
add r3, r1, r2
lw  r4, 8(r3)
sub r6, r4, r5
xor r7, r4, r6
13. Consider the following datapath. (Assume all registers are edge-triggered and thus immune from races.)
Control signal identifiers are given for the in and out control points of the registers. Additional control signals
include memory signals Mem, R (read), W (write), and 3-bit ALU function field F.
ALU functions (three-bit F field)
---------------------------------
000: C = A + B      100: C = A - B
001: C = A          101: C = not A
010: C = A + 1      110: C = A - 1
011: C = A << 1     111: C = A >> 1
Complete the step-by-step RTL and the control signal sequence to fetch and execute an add instruction
“add X”. Assume that the instruction is composed of two memory words: a one-word opcode followed
by a one-word address. Assume also that the address of the instruction is in the PC, and that the
memory is word-addressable. The actions of the instruction are ACC <- ACC + memory[X], for the memory
address X given in the second word of the instruction. (15 pts.)
// fetch opcode and place in IR     // control signals
MAR <- PC                           5 (A=PC), F=001 (C=A), 10 (MAR=C)
PC <- PC + 1                        5 (A=PC), F=010 (C=A+1), 11 (PC=C)
MBR <- memory[MAR]                  Mem, R
IR <- MBR                           1 (A=MBR), F=001 (C=A), 13 (IR=C)
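The four fetch steps given above can be traced with a small register-transfer simulation. This is only a sketch under stated assumptions: it models just the registers and ALU functions named in the text, ignores word width, and uses a hypothetical memory image and opcode value. It covers the opcode fetch only, not the rest of the instruction.

```python
# ALU functions selected by the three-bit F field (from the table above).
# Word width is not modeled, so "not A" is Python's unbounded ~a.
ALU = {
    0b000: lambda a, b: a + b,
    0b100: lambda a, b: a - b,
    0b001: lambda a, b: a,
    0b101: lambda a, b: ~a,
    0b010: lambda a, b: a + 1,
    0b110: lambda a, b: a - 1,
    0b011: lambda a, b: a << 1,
    0b111: lambda a, b: a >> 1,
}

def fetch_opcode(regs, memory):
    """Apply the four opcode-fetch steps from the RTL listing, in order."""
    regs["MAR"] = ALU[0b001](regs["PC"], 0)    # MAR <- PC            (F=001, C=A)
    regs["PC"] = ALU[0b010](regs["PC"], 0)     # PC  <- PC + 1        (F=010, C=A+1)
    regs["MBR"] = memory[regs["MAR"]]          # MBR <- memory[MAR]   (Mem, R)
    regs["IR"] = ALU[0b001](regs["MBR"], 0)    # IR  <- MBR           (F=001, C=A)
    return regs

# Hypothetical memory image: word 4 holds an opcode, word 5 holds the address X.
memory = [0] * 8
memory[4], memory[5] = 0x2A, 7
regs = fetch_opcode({"PC": 4, "MAR": 0, "MBR": 0, "IR": 0, "ACC": 0}, memory)
print(regs)
```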
XC. For the datapath depicted below, assume the following latencies (delays):
1 nsec to select and transfer a register value across the bus
2 nsec setup and hold time for temporary register W, Y, or Z (i.e., write into)
3 nsec setup and hold time for register file R0-R3 (i.e., write into)
4 nsec incrementer
6 nsec ALU operation
+------+    .-.   +-------------+
|  R0  |<-->| |-->| incrementer |--.
+------+    | |   +-------------+  |
|  R1  |<-->| |                    v
+------+    | |                +-------+
|  R2  |<-->| |<---------------|   W   |
+------+    | |                +-------+
|  R3  |<-->| |
+------+    | |--------------------.
            | |                    |
            | |   +-------+        |
            | |-->|   Y   |        |
            | |   +-------+        |
            | |       v            v
            | |   --------      --------
            | |   \      \______/      /
            | |    \                  /
            | |     \       ALU      /
            | |      \______________/
        bus | |              v
            | |          +-------+
            | |<---------|   Z   |
            `-'          +-------+
There is one datapath action per clock cycle, according to these rules:
(1) a datapath action starts with a register value being placed on the bus and ends with a register being written
(2) the action may be merely transferring data from one register to another, or there may be a computation
performed in between the register accesses (e.g., increment or addition)
(3) the Y register cannot be used to pass through a value from the bus to the ALU in the same cycle (i.e., it cannot be
written and re-read in the same cycle)
(4) only one value may be placed on the bus in a given cycle
For example, one datapath action is W<-R0+1 and the path is “R0 to bus to incrementer to W”. There is no extra
cost to read from a register so the path delay is 1 + 4 + 2 = 7 nsec, where the bus select and transfer takes 1 nsec,
the incrementer takes 4 nsec, and writing into temporary register W takes 2 nsec.
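The delay arithmetic in this example can be written as a small Python sketch. The dictionary keys and the function name are hypothetical labels for the latencies listed above; only the example path from the text is evaluated.

```python
# Latencies in nsec, taken from the list above.
LATENCY = {
    "bus": 1,            # select and transfer a register value across the bus
    "temp_write": 2,     # setup and hold time for temporary register W, Y, or Z
    "regfile_write": 3,  # setup and hold time for register file R0-R3
    "incrementer": 4,
    "alu": 6,
}

def path_delay(stages):
    """Sum the latencies of the components along one datapath action."""
    return sum(LATENCY[stage] for stage in stages)

# W <- R0 + 1: R0 to bus to incrementer to W
print(path_delay(["bus", "incrementer", "temp_write"]))  # 7
```

Other datapath actions can be costed the same way by listing the components on their path.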
Consider the paths and path delays across the range of possible datapath actions. What is the critical path that
limits the clock frequency? (5 pts.)