CPSC 330 Spring 2015 -- Final Exam -- No calculators.
Name: ____________________
1. For a processor with a 100 MHz clock frequency, what is the clock cycle time? (2 pts.)
2. Find the execution time for a program that executes 10 billion instructions on a processor with an average
CPI of 2.5 and a clock frequency of 2 GHz. (3 pts.)
3. What is the MIPS rate for the processor in question 2? (3 pts.)
4. Consider improvements to the processor in question 2. If the clock frequency can be increased to 3
GHz and the average CPI can be reduced to 1.25, what is the overall speedup? (3 pts.)
5. Using Amdahl's law, what is the overall speedup if a performance enhancement with a four times
speedup can be used 50% of the time? Give your answer as a fraction. (3 pts.)
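The relationships behind questions 1 through 5 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical and not part of the exam, and the formulas are the standard textbook ones (cycle time = 1/f, execution time = IC x CPI / f, MIPS = f / (CPI x 10^6), and Amdahl's law).

```python
from fractions import Fraction


def cycle_time(clock_hz):
    """Clock cycle time in seconds for a clock frequency in Hz."""
    return 1 / clock_hz


def execution_time(instruction_count, cpi, clock_hz):
    """Execution time in seconds: instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi / clock_hz


def mips_rate(cpi, clock_hz):
    """Millions of instructions per second for a given average CPI and clock."""
    return clock_hz / (cpi * 1e6)


def amdahl_speedup(fraction_enhanced, enhancement_speedup):
    """Overall speedup when only part of the execution time is enhanced."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / enhancement_speedup)


if __name__ == "__main__":
    # Hypothetical usage with the numbers from questions 1-5.
    print(cycle_time(100e6))
    print(execution_time(10e9, 2.5, 2e9))
    print(mips_rate(2.5, 2e9))
    # With the same instruction count, speedup is the ratio of execution times.
    print(execution_time(10e9, 2.5, 2e9) / execution_time(10e9, 1.25, 3e9))
    # Passing a Fraction keeps the Amdahl result as an exact fraction.
    print(amdahl_speedup(Fraction(1, 2), 4))
```

Using `Fraction` for the Amdahl's law input is a design choice made here so that the result stays exact, which fits a question that asks for a fractional answer.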
6. Determine the simplest logic expression for the values in this Kmap. (x is don’t care.) (3 pts.)
     \ BC
    A \  00   01   11   10
      +----+----+----+----+
    0 |  0 |  0 |  1 |  x |
      +----+----+----+----+
    1 |  1 |  x |  0 |  1 |
      +----+----+----+----+
F(A,B,C) = _____________
7. Show the simplified logic expressions for D and E. (6 pts.)
A B C | D E
------+----
0 0 0 | 0 1
0 0 1 | 1 0
0 1 0 | 1 1
0 1 1 | 0 0
1 0 0 | 0 0
1 0 1 | 1 1
1 1 0 | 1 0
1 1 1 | 0 1
8. Consider this circuit. (image credit: gabrieljcs at stackexchange)
Give the logic expression for Y in terms of I0, I1, and S. If this is a standard component, identify it. (5 pts.)
9. Consider the following two-bit branch predictor state diagram. (image credit: Diana Franklin, UCSB)
Give the state transition table where T=1 and NT=0. The output function should be 1 for Predict Taken
and 0 for Predict Not Taken. (6 pts.)
10. Analyze the accuracy of the predictor of question 9 for a loop branch with the following actual
behavior. Assume that the predictor starts in state 0. U = untaken and is equivalent to NT = not taken
above. Show the predictions, and indicate which ones are mispredictions. (5 pts.)
T  T  T  T  U  T  T  T  T  U
XC. A friend suggests you instead use a two-level adaptive branch predictor with a two-bit BHSR and a
four-entry PHT with each entry having one bit of history. Your friend says it can learn the alternating
TUTU… branching pattern, and once learned can predict with complete accuracy.
BHSR  4-entry PHT (1)  prediction  actual  consequence  update PHT (2)  update BHSR (3)
UU    U/U/U/U          U           T       mispredict   T/U/U/U         TU
TU    T/U/U/U          U           U                    T/U/U/U         UT
UT    T/U/U/U          U           T       mispredict   T/T/U/U         TU
TU    T/T/U/U          U           U                    T/T/U/U         UT
UT    T/T/U/U          T           T                    T/T/U/U         TU
TU    T/T/U/U          U           U                    T/T/U/U         UT
UT    T/T/U/U          T           T                    T/T/U/U         TU
…
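The example trace above can be reproduced with a short simulation. The sketch below is illustrative only: the function name and the row layout are hypothetical, the PHT is assumed to start as U/U/U/U, and the index for TT (3) is an assumption extending the UU = 0, UT = 1, TU = 2 mapping given in the exam's notes.

```python
def simulate_two_level(actual, bhsr="UU"):
    """Simulate a 2-bit BHSR indexing a 4-entry PHT with 1 bit of history per entry.

    Returns one tuple per branch:
    (BHSR, PHT before, prediction, actual, mispredicted, PHT after, BHSR after).
    """
    # Index mapping from the exam's notes; TT = 3 is assumed.
    index_of = {"UU": 0, "UT": 1, "TU": 2, "TT": 3}
    pht = ["U"] * 4  # assumed initial state: every entry predicts untaken
    rows = []
    for outcome in actual:
        i = index_of[bhsr]
        prediction = pht[i]
        pht_before = "/".join(pht)
        pht[i] = outcome              # 1-bit entry takes the actual direction
        new_bhsr = outcome + bhsr[0]  # right shift; actual direction enters on the left
        rows.append((bhsr, pht_before, prediction, outcome,
                     prediction != outcome, "/".join(pht), new_bhsr))
        bhsr = new_bhsr
    return rows


if __name__ == "__main__":
    for row in simulate_two_level("TUTUTUT"):
        print(row)
```

Feeding the function a different outcome string, such as a loop branch pattern, produces the corresponding trace row by row in the same column order as the table.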
Analyze this predictor’s performance for a loop branch. (up to 10 pts.)
BHSR  4-entry PHT (1)  prediction  actual  consequence  update PHT  update BHSR
UU    U/U/U/U          U           T       mispredict   T/U/U/U     TU
                                   T
                                   T
                                   T
                                   U
                                   T
                                   T
                                   T
                                   T
                                   U
                                   T
1 selected PHT entry is underlined, e.g. BHSR value of UU = index 0 into PHT, UT = index 1, TU = index 2, …
2 updated PHT entry is underlined; new value is actual branch direction
3 new bit in BHSR is underlined; BHSR is being right-shifted and incoming bit value is actual branch direction
11. Consider the following datapath. (image credit: Brian Shelburne) Assume all registers are edge-triggered and thus immune from races. Control signal identifiers are given for the in and out control
points of the registers. Additional control signals include memory signals Mem, R (read), W (write), and
3-bit ALU function field F.
ALU functions (three-bit F field)
---------------------------------
000: C = A + B      100: C = A - B
001: C = A          101: C = not A
010: C = A + 1      110: C = A - 1
011: C = A << 1     111: C = A >> 1
Complete the step-by-step RTL and the control signal sequence to fetch and execute the instruction
“store X”. Assume that the instruction is composed of two memory words: a one-word opcode followed
by a one-word address. Assume also that the address of the instruction is in the PC, and that the
memory is word-addressable. The actions of the instruction are memory[X] <- ACC, for the memory
address X given in the second word of the instruction. (10 pts.)
// fetch opcode and place in IR      // control signals
MAR <- PC                            5 (A=PC),  F=001 (C=A),   10 (MAR=C)
PC <- PC + 1                         5 (A=PC),  F=010 (C=A+1), 11 (PC=C)
MBR <- memory[MAR]                   Mem, R
IR <- MBR                            1 (A=MBR), F=001 (C=A),   13 (IR=C)
12. Why are superscalar processors more popular than VLIW processors? (3 pts.)
13. For the MIPS instruction sequence below, show the data dependency diagram. (8 pts.)
ld r2,0(r1) // r2<-memory[r1+0]
ld r3,0(r2) // r3<-memory[r2+0]
ld r2,0(r4) // r2<-memory[r4+0]
14. Show the stairstep cycle diagram for the three instructions in question 13 on the five-stage scalar
pipeline we studied in class. Assume the use of forwarding. (6 pts.)
15. Define temporal locality. (4 pts.)
16. Define spatial locality. (4 pts.)
17. Explain why a set-associative cache is typically preferred to a direct-mapped cache, even though a
direct-mapped cache has a slightly faster hit time. (3 pts.)
18. What is a write-back operation for a cache? When does it happen? (4 pts.)
19. What is a “burst transfer” over the memory bus, and why is it useful for performing cache refills and
write-backs? (4 pts.)
20. Consider a word-addressed computer system with one-word instructions. If a program has an initial
sequential execution section and then a loop with instruction addresses as follows (in decimal):
        .-----.
        | 0-10|
        `-----'
           |
     .---->|
     |  .-----.
     |  |11-30|   two iterations
     |  `-----'
     `-----'
the instruction fetch address stream is:
0,1,...,9,10, 11,12,...,29,30, 11,12,...,29,30
              \iteration 1/    \iteration 2/
Consider a 16-word direct-mapped instruction cache with 4 words per line. Determine the number of
cache hits and misses when executing the program above. Show also the final contents of the cache.
(15 pts.)
index    _________________        memory block mapping
  0     |v|tag|__/__/__/__|        0- 3, 16-19, ...
  1     |v|tag|__/__/__/__|        4- 7, 20-23, ...
  2     |v|tag|__/__/__/__|        8-11, 24-27, ...
  3     |v|tag|__/__/__/__|       12-15, 28-31, ...
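One way to check a hand-worked hit/miss count for question 20 is a small simulation of the direct-mapped cache. This is a sketch under stated assumptions: the cache is assumed to start empty (all valid bits clear), and the function name and return format are hypothetical rather than part of the exam.

```python
def simulate_direct_mapped(addresses, num_lines=4, words_per_line=4):
    """Count hits and misses for a word-addressed direct-mapped cache.

    Returns (hits, misses, tags), where tags[i] is the tag held by line i,
    or None if the line was never filled (valid bit still clear).
    """
    tags = [None] * num_lines  # assumed empty at start
    hits = misses = 0
    for addr in addresses:
        block = addr // words_per_line  # memory block number
        index = block % num_lines       # cache line the block maps to
        tag = block // num_lines        # remaining high-order part of the address
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag           # refill the line with the new block
    return hits, misses, tags


if __name__ == "__main__":
    # Address stream from the question: 0..10 once, then 11..30 twice.
    stream = list(range(0, 11)) + 2 * list(range(11, 31))
    hits, misses, tags = simulate_direct_mapped(stream)
    print("hits:", hits, "misses:", misses, "final tags:", tags)
```

The defaults correspond to the 16-word cache with 4 words per line; changing `num_lines` or `words_per_line` allows the same stream to be tried against other geometries.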