Spring 2015 – Exam 3

CPSC 3300 – Spring 2015 – Exam 3 – No Calculators
Name: ___________________
1. Consider the block diagram (high-level circuit) below showing the two-dimensional organization of a DRAM.
(credit: nl.wikipedia.org)
If the same address lines (A) are supplied to both the row decoder and the column decoder, what is the purpose
of the RAS and CAS control signals? (6 pts.)
2. Identify at least three differences between DRAM and SRAM. (15 pts.)
3. Consider a computer system that contains a single-level cache with 1-nsec access time and a main memory
with 50-nsec access time. If the hit rate is 80% and a cache miss requires both the initial cache access that
determines the hit/miss and the subsequent memory access, what is the average memory access time (AMAT)?
(6 pts.)
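The miss model described in this question can be written as a one-line formula: every access pays the cache access time, and a miss additionally pays the memory access time. The sketch below is only an illustration under that assumption; the function name `amat_ns` and its parameter names are hypothetical and not part of the exam.

```c
#include <assert.h>

/* Average memory access time, in nanoseconds, for a single-level cache.
 * Assumption (taken from the question's wording): a miss costs the
 * initial cache access plus the subsequent memory access, so the cache
 * time is paid on every access and the memory time only on misses. */
double amat_ns(double cache_ns, double memory_ns, double hit_rate)
{
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    return cache_ns + (1.0 - hit_rate) * memory_ns;
}
```

Calling `amat_ns` with the cache time, memory time, and hit rate given in the question evaluates this model for the stated system.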
4. Consider the memory access pattern graph below. Each triangle represents a data access and each oval
represents an instruction fetch. (6 pts. each)
[Figure: memory access pattern graph, with axes labeled "address" and "time"]
(a) What is the name of the memory access behavior that the depicted data accesses (triangles) represent?
(b) What is the name of the memory access behavior that the depicted instruction fetches (ovals) represent?
(c) Would data accesses to array elements in sequence (i.e., a[0], a[1], …) look more like (a) or (b)?
(d) Which access pattern, (a) or (b), is the reason why a cache line should contain multiple memory words?
5. C stores matrices in row-major order. Which of these two program segments in C will be faster for large values
of N? Explain your choice. (10 pts.)
sum = 0;
for(i=0; i<N; i++){
    for(j=0; j<N; j++){
        sum = sum + c[i][j];
    }
}

sum = 0;
for(j=0; j<N; j++){
    for(i=0; i<N; i++){
        sum = sum + c[i][j];
    }
}
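To compare the two segments empirically, they can be wrapped as functions and timed, for example with `clock()` from `<time.h>`. The sketch below assumes the matrix is stored as one contiguous row-major array of `int`, so that `c[i][j]` becomes `c[i * n + j]`. The function names are hypothetical, and the element type is an assumption because the exam does not specify it.

```c
#include <stddef.h>

/* First segment: i in the outer loop, j in the inner loop. */
long sum_i_outer(const int *c, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            sum = sum + c[i * n + j];   /* c[i][j] in row-major layout */
        }
    }
    return sum;
}

/* Second segment: j in the outer loop, i in the inner loop. */
long sum_j_outer(const int *c, size_t n)
{
    long sum = 0;
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < n; i++) {
            sum = sum + c[i * n + j];   /* same element, different visit order */
        }
    }
    return sum;
}
```

Both functions return the same sum. Only the order in which memory is touched differs, which is what a timing run on a large `n` would expose.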
6. Consider a 4 GB byte-addressable main memory with a level-1 data cache that is two-way set-associative,
32 KB in size, and has a 16-byte line size.
(a) How many total lines are there in the cache (not just per bank)? (3 pts.)
(b) How many lines are there in each bank? (3 pts.)
(c) Show how the main memory address is partitioned into fields for the cache access and give the bit lengths
of those fields. (9 pts.)
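The field widths for a set-associative cache follow mechanically from the cache size, associativity, and line size. The sketch below shows that arithmetic under the usual tag / index / offset partition. The struct and function names are hypothetical, all sizes are assumed to be powers of two, and the address width in bits is passed in by the caller.

```c
#include <assert.h>

typedef struct {
    unsigned tag_bits;
    unsigned index_bits;   /* selects the set */
    unsigned offset_bits;  /* selects the byte within a line */
} addr_fields;

/* Base-2 logarithm of a power of two. */
static unsigned log2_exact(unsigned long long x)
{
    unsigned bits = 0;
    assert(x != 0 && (x & (x - 1)) == 0);
    while (x > 1) {
        x >>= 1;
        bits++;
    }
    return bits;
}

/* Field widths for a byte-addressable memory with addr_bits-bit addresses. */
addr_fields split_address(unsigned addr_bits, unsigned long long cache_bytes,
                          unsigned ways, unsigned line_bytes)
{
    unsigned long long total_lines = cache_bytes / line_bytes;
    unsigned long long sets = total_lines / ways;
    addr_fields f;

    f.offset_bits = log2_exact(line_bytes);
    f.index_bits = log2_exact(sets);
    f.tag_bits = addr_bits - f.index_bits - f.offset_bits;
    return f;
}
```

For a byte-addressable memory of 2^k bytes, `addr_bits` is k.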
7. Assume a 256-byte main memory and a four-line cache with two bytes per line. The cache is initially empty.
For the byte address reference stream given below, circle which of the references are hits for the different
cache placement schemes. Also, show the final contents of the cache. (The byte addresses are in decimal.)
(a) 8-byte cache, direct-mapped, two bytes per line (12 pts.)
0, 12, 1, 2, 12, 3, 4, 12, 5, 6, 12, 7
(b) 8-byte cache, two-way set associative with LRU replacement, two bytes per line (12 pts.)
0, 12, 1, 2, 12, 3, 4, 12, 5, 6, 12, 7
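A hand trace of a reference stream like the ones in this question can be checked with a small simulator. The sketch below is one possible way to do so, not a prescribed solution. It assumes the set index is the block number modulo the number of sets, and it implements LRU with per-line timestamps. All names, including `simulate_cache` and `MAX_LINES`, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LINES 64   /* arbitrary upper bound for this sketch */

typedef struct {
    int valid;
    unsigned block;      /* block number = byte address / line size */
    unsigned last_used;  /* time of most recent access; 0 = never used */
} cache_line;

/* Simulates a cache of n_lines lines with `ways` lines per set
 * (ways == 1 is direct-mapped). Returns the number of hits; if
 * hit_flags is non-NULL, hit_flags[t] records whether reference t hit. */
int simulate_cache(const unsigned *addrs, size_t n, unsigned n_lines,
                   unsigned ways, unsigned line_bytes, int *hit_flags)
{
    cache_line lines[MAX_LINES] = {0};
    unsigned n_sets;
    int hits = 0;

    assert(n_lines <= MAX_LINES && ways > 0 && line_bytes > 0);
    assert(n_lines % ways == 0);
    n_sets = n_lines / ways;

    for (size_t t = 0; t < n; t++) {
        unsigned block = addrs[t] / line_bytes;
        cache_line *set = &lines[(block % n_sets) * ways];
        cache_line *slot = NULL;

        /* Look for the block in its set. */
        for (unsigned w = 0; w < ways; w++) {
            if (set[w].valid && set[w].block == block) {
                slot = &set[w];
            }
        }
        int hit = (slot != NULL);

        if (!hit) {
            /* Choose the way with the oldest timestamp; empty ways
             * still have timestamp 0, so they are filled first. */
            slot = &set[0];
            for (unsigned w = 1; w < ways; w++) {
                if (set[w].last_used < slot->last_used) {
                    slot = &set[w];
                }
            }
            slot->valid = 1;
            slot->block = block;
        }
        slot->last_used = (unsigned)t + 1;  /* timestamps start at 1 */

        if (hit_flags != NULL) {
            hit_flags[t] = hit;
        }
        hits += hit;
    }
    return hits;
}
```

With the stream stored in an `unsigned` array, part (a) corresponds to `n_lines = 4, ways = 1, line_bytes = 2` and part (b) to `n_lines = 4, ways = 2, line_bytes = 2`.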
XC. The Intel Sandy Bridge microarchitecture contains a level-1 data cache that has two hardware prefetchers.
(Descriptions taken from Intel® 64 and IA-32 Architectures Optimization Reference Manual, April 2012.)

Data cache unit (DCU) prefetcher. This prefetcher, also known as the streaming prefetcher, is triggered by
an ascending access to very recently loaded data. The processor assumes that this access is part of a
streaming algorithm and automatically fetches the next line.

Instruction pointer (IP)-based stride prefetcher. This prefetcher keeps track of individual load instructions.
If a load instruction is detected to have a regular stride, then a prefetch is sent to the next address which
is the sum of the current address and the stride. This prefetcher can prefetch forward or backward and
can detect strides of up to 2K bytes.
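The stride prefetcher description above can be illustrated with a toy model of a single tracking entry for one load instruction. This is a sketch only and makes no claim about how the hardware is actually built. It assumes a prefetch is issued once the same nonzero stride has been seen on two consecutive accesses, and it treats the 2K-byte limit as a bound on the stride magnitude. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_STRIDE 2048  /* 2K bytes, per the description above */

/* Per-load-instruction tracking state (toy model). */
typedef struct {
    uint64_t last_addr;
    int64_t last_stride;
    bool has_addr;
    bool has_stride;
} stride_entry;

/* Records one access by the tracked load. Returns true and writes the
 * predicted address (current address + stride) when the stride matches
 * the previous one; works for forward and backward strides. */
bool stride_observe(stride_entry *e, uint64_t addr, uint64_t *prefetch_addr)
{
    bool predict = false;

    if (e->has_addr) {
        int64_t stride = (int64_t)(addr - e->last_addr);
        if (e->has_stride && stride == e->last_stride &&
            stride != 0 && llabs(stride) <= MAX_STRIDE) {
            *prefetch_addr = addr + (uint64_t)stride;
            predict = true;
        }
        e->last_stride = stride;
        e->has_stride = true;
    }
    e->last_addr = addr;
    e->has_addr = true;
    return predict;
}
```

A zero-initialized `stride_entry` represents a load instruction that has not been seen yet.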
(a) What type of data memory access patterns would benefit from the first (streaming) prefetcher? Give a
specific example in C (or C-like pseudocode). (up to 5 pts.)
(b) What type of data memory access patterns would benefit from the second (stride) prefetcher? Give a
specific example in C (or C-like pseudocode). (up to 5 pts.)