CPSC 3300 Fall 2014 -- Final Exam
Name: ____________________
1. Find the execution time for a program that executes 4 billion instructions on
a processor with an avg. CPI of 1.5 and a clock frequency of 2 GHz. (3 pts.)
2. What is the MIPS rate for the processor in question (1)? (3 pts.)
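Study note for questions 1 and 2: the usual relations are execution time = instruction count x CPI / clock rate, and MIPS = instruction count / (execution time x 10^6). The following is a minimal C sketch of those two formulas; the function names are illustrative assumptions and are not part of the exam.

```c
/* Execution time in seconds: instruction count x average CPI / clock rate (Hz). */
double exec_time_seconds(double instr_count, double cpi, double clock_hz)
{
    return instr_count * cpi / clock_hz;
}

/* Native MIPS rating: millions of instructions executed per second. */
double mips_rate(double instr_count, double seconds)
{
    return instr_count / (seconds * 1e6);
}
```

For a hypothetical program of 1 billion instructions with a CPI of 2.0 on a 1 GHz clock, exec_time_seconds(1e9, 2.0, 1e9) gives 2.0 seconds and mips_rate(1e9, 2.0) gives 500.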
3. For the following instruction set workload and cycle values, find the average
CPI. (1 pt.)
type   | freq  cycles
-------+-------------
alu    | 0.1     1
branch | 0.24    2
ld/st  | 0.66    3
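Study note: average CPI is the frequency-weighted sum of the per-class cycle counts. A minimal C sketch, assuming the frequencies are given as fractions that sum to 1 (the function name is an illustrative assumption):

```c
#include <stddef.h>

/* Average CPI as the sum over instruction classes of freq[i] * cycles[i].
   freq[i] is the fraction of executed instructions in class i. */
double average_cpi(const double *freq, const double *cycles, size_t n)
{
    double cpi = 0.0;
    for (size_t i = 0; i < n; i++) {
        cpi += freq[i] * cycles[i];
    }
    return cpi;
}
```

For a hypothetical two-class mix with frequencies {0.5, 0.5} and cycle counts {1, 3}, the function returns 2.0.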
4. The compiler group proposes new optimizations that use only 80% of the
instruction count of the compiler used for question 3 and also produce a
workload distribution as follows:
type   | freq  cycles
-------+-------------
alu    | 0.6     1
branch | 0.2     2
ld/st  | 0.2     3
Would the computer running programs compiled with the optimized compiler be
faster, and if so, what would be the speedup? (4 pts.)
5. What is the overall speedup if an enhancement with speedup of 8 can be used
4/5ths of the time? Express your answer as a fraction. (4 pts.)
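Study note: question 5 is an application of Amdahl's law, overall speedup = 1 / ((1 - f) + f / s), where f is the fraction of the original time that can use the enhancement and s is the speedup of the enhancement. A minimal C sketch (the function name is an illustrative assumption):

```c
/* Amdahl's law: overall speedup when a fraction of the original execution
   time is accelerated by the given factor. */
double amdahl_speedup(double fraction_enhanced, double enhancement_speedup)
{
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / enhancement_speedup);
}
```

For a hypothetical enhancement with speedup 2 usable half of the time, amdahl_speedup(0.5, 2.0) gives 1 / 0.75, or about 1.33.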
Arithmetic/Harmonic/Geometric. Circle one of A, H, or G, as applies. (1 pt. each)
6.   A   H   G   Used for averaging execution rates.
7.   A   H   G   Used for averaging execution times.
8.   A   H   G   Used for averaging by SPEC when reporting scores for benchmark suites.
9. Draw a circuit diagram with AND, OR, and NOT logic gates that implements
a 2-to-1 multiplexer. (4 pts.)
10. Does a 2-to-1 multiplexer have an associated state diagram? If so, draw it. If
not, explain why not. (4 pts.)
11. Fill in each blank with either “hardwired” or “microprogramming”. (1 pt. each)
a) ______________________ uses a control store
b) ______________________ typically produces a faster implementation
c) ______________________ is more flexible and easier to change
Multiple choice. Circle one response. (1 pt. each)
12. A modern microprocessor found in laptops, such as the Intel Core i5, is:
a) totally hardwired
b) totally microprogrammed
c) a combination of hardwired and microprogrammed
d) neither hardwired nor microprogrammed
13. A modern microprocessor found in laptops, such as the Intel Core i5, is:
a) superscalar
b) VLIW
c) neither superscalar nor VLIW
14. What is the relationship between a superscalar processor and the compiler?
a) Correct execution requires the compiler to carefully order the instructions.
b) Performance can increase when the compiler carefully orders the instructions.
c) Neither (a) nor (b).
15. What are the five stages in the standard pipeline we studied, and what action
does each perform for the store instruction sw r1,4(r2), which implements the
action memory[ reg[2] + 4 ] <- reg[1]? (8 pts.)
16. For the MIPS instruction sequence below, identify the dependencies in a data
dependency diagram. (8 pts.)
i1:  lw   r4, 0( r1 )     // reg[4] <- memory[ reg[1] + 0 ]
i2:  lw   r5, 4( r1 )     // reg[5] <- memory[ reg[1] + 4 ]
i3:  mul  r6, r4, r5      // reg[6] <- reg[4] * reg[5]
i4:  sub  r8, r6, r7      // reg[8] <- reg[6] - reg[7]
i5:  sw   r8, 8( r1 )     // memory[ reg[1] + 8 ] <- reg[8]
i6:  add  r1, r1, r2      // reg[1] <- reg[1] + reg[2]
17. Explain why gshare can be better than a Branch History Table of two-bit
saturating counters for branch prediction. (3 pts.)
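Study note: the two schemes in question 17 differ only in how the counter table is indexed. The C sketch below contrasts the two index functions and shows the two-bit saturating counter update. The table size, the word-aligned shift of the branch address, and all names are illustrative assumptions.

```c
#include <stdint.h>

#define HISTORY_BITS 10u                    /* assumed: 2^10 counters in the table */
#define TABLE_SIZE   (1u << HISTORY_BITS)

/* Two-bit saturating counter: values 0..3, predict taken when the value >= 2. */
uint8_t update_counter(uint8_t counter, int taken)
{
    if (taken) {
        return (uint8_t)(counter < 3 ? counter + 1 : 3);
    }
    return (uint8_t)(counter > 0 ? counter - 1 : 0);
}

/* Plain BHT index: low-order bits of the word-aligned branch address only. */
uint32_t bht_index(uint32_t pc)
{
    return (pc >> 2) & (TABLE_SIZE - 1u);
}

/* gshare index: the same address bits XORed with the global branch history. */
uint32_t gshare_index(uint32_t pc, uint32_t global_history)
{
    return ((pc >> 2) ^ global_history) & (TABLE_SIZE - 1u);
}
```

With these assumptions, a branch at address 0x40 always selects counter 0x10 in the plain table, while gshare_index(0x40, 0x3) selects counter 0x13, so the counter consulted depends on the recent outcomes recorded in the global history.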
18. Consider a block diagram (high-level circuit) showing the two-dimensional
organization of a RAM, and identify the components and signals required to
access the RAM. Place the appropriate letter, a-j, of the correct component
or signal in the blanks numbered 1-10. (0.5 pts. each)
a) address          e) column decoder               i) column address strobe (CAS)
b) data bits        f) read/write control signal    j) row address strobe (RAS)
c) row decoder      g) memory cell array
d) row buffer       h) sense/write circuitry
                             1) ______       2) ______
                                +-+       +-------------+
3) ______ --------------------->| |------>|             |
                .--------/----->| | ...   |   4K x 4K   |
                |   high bits   | |------>|             |
4) ______ --/--<                +-+       +-------------+
                |                           | | ... | |
                |                         +-------------+
                |                         |  5) ______  |<-- 6) ______
                |                         +-------------+
                |                   +-+     | | ... | |
                |     low bits      | |-->+-------------+
                `-------/---------->| |...|  8) ______  |
7) ______ ------------------------->| |-->+-------------+
                                    +-+       ^   ^
                                              |...|
                               9) ______      v   v
                                           10) ______
19. Identify at least three differences between DRAM and SRAM. (6 pts.)
20. Define temporal locality. (3 pts.)
21. Define spatial locality. (3 pts.)
22. C stores matrices in row-major order. Which of these two program segments in C
will be faster? Explain your choice. (4 pts.)
    sum = 0;
    for(i=0; i<N; i++){
      for(j=0; j<N; j++){
        sum = sum + c[i][j];
      }
    }

    sum = 0;
    for(j=0; j<N; j++){
      for(i=0; i<N; i++){
        sum = sum + c[i][j];
      }
    }
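Study note: in row-major order, element c[i][j] of an N x N array sits i*N + j elements from the start of the array. A minimal C sketch of that mapping (the function name is an illustrative assumption):

```c
#include <stddef.h>

/* Element offset of c[i][j] from the start of an n x n row-major array:
   consecutive j values are adjacent in memory, consecutive i values are
   n elements apart. */
size_t row_major_offset(size_t i, size_t j, size_t n)
{
    return i * n + j;
}
```

For a hypothetical 4 x 4 array, stepping j from 0 to 1 moves the offset by 1 element, while stepping i from 0 to 1 moves it by 4 elements.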
23. Explain why a set-associative cache is typically preferred to a direct-mapped
cache, even though a direct-mapped cache has a slightly faster hit time.
(3 pts.)
24. Consider a 4 GB byte-addressable main memory with a level 1 data cache that is
three-way set-associative, 96 KB in size, and has a 32-byte line size.
a) How many total lines are there in the data cache? (not just per bank) (1 pt.)
b) How many lines are there in a bank? (1 pt.)
c) Show how the main memory address is partitioned into fields for the cache
access, and give the bit lengths of those fields. (6 pts.)
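Study note: for a set-associative cache, the address splits into tag, set index, and byte offset fields. The offset width is log2(line size), the index width is log2(number of sets), and the tag takes the remaining bits. A minimal C sketch, assuming the line size and the number of sets are powers of two (all names are illustrative assumptions):

```c
/* log2 of a power of two, computed by counting right shifts. */
unsigned log2_exact(unsigned long long value)
{
    unsigned bits = 0;
    while (value > 1) {
        value >>= 1;
        bits++;
    }
    return bits;
}

struct cache_fields {
    unsigned tag_bits;
    unsigned index_bits;
    unsigned offset_bits;
};

/* Splits an address of address_bits bits into tag / set index / byte offset. */
struct cache_fields partition_address(unsigned address_bits,
                                      unsigned long long cache_bytes,
                                      unsigned ways,
                                      unsigned line_bytes)
{
    struct cache_fields f;
    unsigned long long total_lines = cache_bytes / line_bytes;
    unsigned long long sets = total_lines / ways;   /* lines per way */

    f.offset_bits = log2_exact(line_bytes);
    f.index_bits = log2_exact(sets);
    f.tag_bits = address_bits - f.index_bits - f.offset_bits;
    return f;
}
```

For a hypothetical 64 KB, two-way cache with 16-byte lines and 32-bit addresses, the sketch gives 4096 lines in 2048 sets, so the fields are 17 tag bits, 11 index bits, and 4 offset bits.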
25. Assume a 256-byte main memory and a four-line cache with two bytes per line.
The cache is initially empty. For the byte address reference stream (reads)
given below circle which of the references are hits for an 8-byte
direct-mapped cache. Also, show the final contents of the cache. (The byte
addresses are in decimal.) (6 pts.)
0, 9, 1, 2, 10, 3, 4, 11, 5, 6, 12, 7
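Study note: an answer to question 25 can be checked mechanically. The C sketch below simulates reads on a direct-mapped cache with the geometry stated in the question (four lines, two bytes per line); the function name and interface are illustrative assumptions.

```c
#include <stddef.h>

#define NUM_LINES  4   /* four lines, as stated in the question */
#define LINE_BYTES 2   /* two bytes per line */

/* Simulates reads on a direct-mapped cache. hit[k] is set to 1 when
   reference k hits and 0 otherwise. Returns the total number of hits. */
size_t simulate_direct_mapped(const unsigned *addr, size_t n, int *hit)
{
    int valid[NUM_LINES] = {0};
    unsigned tag[NUM_LINES] = {0};
    size_t hits = 0;

    for (size_t k = 0; k < n; k++) {
        unsigned block = addr[k] / LINE_BYTES;  /* memory block number */
        unsigned line = block % NUM_LINES;      /* cache line it maps to */
        unsigned t = block / NUM_LINES;         /* remaining high-order bits */

        hit[k] = valid[line] && tag[line] == t;
        if (hit[k]) {
            hits++;
        } else {
            valid[line] = 1;                    /* refill the line on a miss */
            tag[line] = t;
        }
    }
    return hits;
}
```

For the hypothetical stream 0, 1, 8, 0, only the second reference hits: address 1 shares a line with address 0, address 8 maps to the same cache line and evicts it, and the final reference to 0 therefore misses.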
26. Assume a 256-byte main memory and a four-line cache with two bytes per line.
The cache is initially empty. For the byte address reference stream (reads)
given below circle which of the references are hits for an 8-byte
fully-associative cache with FIFO replacement. Also, show the final contents
of the cache. (The byte addresses are in decimal.) (6 pts.)
0, 9, 1, 2, 10, 3, 4, 11, 5, 6, 12, 7
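Study note: an answer to question 26 can be checked the same way. The C sketch below simulates reads on a fully-associative cache with FIFO replacement, using the geometry stated in the question (four lines, two bytes per line); the function name and interface are illustrative assumptions.

```c
#include <stddef.h>

#define NUM_LINES  4   /* four lines, as stated in the question */
#define LINE_BYTES 2   /* two bytes per line */

/* Simulates reads on a fully-associative cache with FIFO replacement.
   hit[k] is set to 1 when reference k hits and 0 otherwise.
   Returns the total number of hits. */
size_t simulate_fifo(const unsigned *addr, size_t n, int *hit)
{
    unsigned block_in[NUM_LINES];
    size_t filled = 0;   /* number of valid lines so far */
    size_t next = 0;     /* oldest line, the next one to be replaced */
    size_t hits = 0;

    for (size_t k = 0; k < n; k++) {
        unsigned block = addr[k] / LINE_BYTES;  /* memory block number */

        hit[k] = 0;
        for (size_t i = 0; i < filled; i++) {
            if (block_in[i] == block) {
                hit[k] = 1;
                break;
            }
        }
        if (hit[k]) {
            hits++;               /* FIFO order is not changed by a hit */
        } else if (filled < NUM_LINES) {
            block_in[filled++] = block;         /* fill an empty line */
        } else {
            block_in[next] = block;             /* replace the oldest line */
            next = (next + 1) % NUM_LINES;
        }
    }
    return hits;
}
```

For the hypothetical stream 0, 2, 4, 6, 1, 8, 0, 3, only the reference to address 1 hits. The later reference to 8 still evicts the block holding addresses 0 and 1, because FIFO replaces the oldest line regardless of how recently it was used.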
27. What is a “burst transfer” over the memory bus, and why is it useful for
performing cache refills and write-backs? (5 pts.)
XC-1. What is “false sharing”? (up to 7 pts.)
True/False. (1 pt. each)
XC-2.  T / F  The MESI cache coherency protocol requires four separate control bits
              for each cache line.
XC-3.  T / F  Write update is a more popular cache coherency write policy than write
              invalidate.
XC-4.  T / F  Load/store instructions can be used to access the memories in remote
              nodes in a cluster system.