CS3350B Computer Architecture
Quiz 1
January 21, 2016
Student ID number:
Student Last Name:
Exercise 1. [10 points] The following statements are either [T]rue or [F]alse.
Indicate T in [ ] if the statement is true, otherwise F in [ ].
1.1 [T] Design for Moore’s law, using abstraction, parallelism and memory hierarchy are great ideas in pursuing performance.
1.2 [F] Clock rate does not affect CPU execution time.
1.3 [F] Power wall is not one of the reasons why we cannot improve the performance of uniprocessors further.
1.4 [T] When we use perf to profile our programs, it can calculate cycles and
cache-misses.
1.5 [T] We use memory hierarchy because we want fast memory access.
1.6 [F] When a cache miss happens, the access to a lower level of memory
hierarchy for fetching the data is faster compared to the access to the current
level.
1.7 [T] We consider memory stall cycles in the CPU execution time because the
CPU spends cycles waiting for the memory system.
1.8 [F] If a processor has an L1 cache, attaching an L2 cache always gives a
better AMAT (average memory access time) than having no L2 cache.
1.9 [F] Cold misses can be completely avoided if we increase block size.
1.10 [F] Conflict misses can be avoided by increasing associativity but not cache
size.
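Statement 1.8 can be checked numerically. The following is a minimal sketch in C, assuming the usual model AMAT = hit time + miss rate × miss penalty, with the L2 miss rate taken as a local miss rate. The function names and every number used with them are hypothetical and are not part of the quiz.

```c
#include <assert.h>

/* AMAT (in cycles) with only an L1 cache in front of main memory. */
double amat_l1_only(double l1_hit, double l1_miss_rate, double mem_penalty)
{
    assert(l1_miss_rate >= 0.0 && l1_miss_rate <= 1.0);
    return l1_hit + l1_miss_rate * mem_penalty;
}

/* AMAT (in cycles) with an L2 cache between L1 and main memory.
   l2_local_miss_rate is the fraction of L2 accesses that miss in L2. */
double amat_with_l2(double l1_hit, double l1_miss_rate,
                    double l2_hit, double l2_local_miss_rate,
                    double mem_penalty)
{
    assert(l1_miss_rate >= 0.0 && l1_miss_rate <= 1.0);
    assert(l2_local_miss_rate >= 0.0 && l2_local_miss_rate <= 1.0);
    return l1_hit
         + l1_miss_rate * (l2_hit + l2_local_miss_rate * mem_penalty);
}
```

With a 1-cycle L1 hit, a 5% L1 miss rate and a 100-cycle memory penalty, amat_l1_only gives about 6 cycles. A fast L2 (10-cycle hit, 20% local miss rate) brings this down to about 2.5 cycles, but a slow L2 that misses often (20-cycle hit, 90% local miss rate) raises it to about 6.5 cycles, which is why the statement is false.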
Exercise 2. [10 points] Consider the following C code which computes the sum
of the elements of a three-dimensional array a[].
int sumarray3d(int a[M][N][N]) {
    int i, j, k, sum = 0;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < M; k++)
                sum += a[k][i][j];
    return sum;
}
2.1 Does this function in C have good locality (temporal or spatial)? If so,
identify which variable has what kind of locality. Otherwise, explain why
this code doesn’t have good locality.
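sum has temporal locality: it is read and updated on every iteration. [3]
a[] doesn't have good locality: the innermost loop varies k, the first index,
so consecutive accesses are N*N elements apart rather than adjacent. [2]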
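The access stride behind this answer can be measured directly. The sketch below assumes hypothetical sizes M = 4 and N = 8 (the quiz leaves M and N unspecified) and uses hypothetical helper names. It reports how far apart, in int elements, two successive accesses are when only k or only j advances by one.

```c
#include <assert.h>
#include <stddef.h>

enum { M = 4, N = 8 };          /* hypothetical sizes, not from the quiz */

static int a[M][N][N];          /* row-major, as for every C array */

/* Distance in int elements between a[k][i][j] and a[k+1][i][j]:
   the step taken by the innermost loop of sumarray3d. */
ptrdiff_t stride_of_k(void)
{
    assert(M >= 2);
    return ((char *)&a[1][0][0] - (char *)&a[0][0][0])
           / (ptrdiff_t)sizeof(int);
}

/* Distance in int elements between a[k][i][j] and a[k][i][j+1]:
   the step taken when the last index advances instead. */
ptrdiff_t stride_of_j(void)
{
    assert(N >= 2);
    return ((char *)&a[0][0][1] - (char *)&a[0][0][0])
           / (ptrdiff_t)sizeof(int);
}
```

With these sizes, stride_of_k() is N*N = 64 elements while stride_of_j() is 1, so the original loop order jumps over a whole N-by-N plane on every iteration.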
2.2 Can you permute the loops so that the function scans the 3D array a[] with
a stride-1 reference pattern (and thus has good spatial locality)?
for (k = 0; k < M; k++) [3]
    for (i = 0; i < N; i++) [1]
        for (j = 0; j < N; j++) [1]
            sum += a[k][i][j];
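Permuting the loops changes only the order of the additions, not the result. As a quick sanity check, the sketch below writes out both loop orders and compares them on a small array. The sizes M = 3 and N = 4, the two function names and the helper orders_agree are hypothetical choices made for illustration.

```c
#include <assert.h>

enum { M = 3, N = 4 };          /* hypothetical sizes, not from the quiz */

/* Original order from the exercise: k is innermost, stride N*N. */
int sumarray3d_original(int a[M][N][N])
{
    int i, j, k, sum = 0;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < M; k++)
                sum += a[k][i][j];
    return sum;
}

/* Permuted order: j is innermost, so memory is scanned with stride 1. */
int sumarray3d_stride1(int a[M][N][N])
{
    int i, j, k, sum = 0;
    for (k = 0; k < M; k++)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                sum += a[k][i][j];
    return sum;
}

/* Fills a test array with 0, 1, 2, ... and returns 1 if both orders agree. */
int orders_agree(void)
{
    static int a[M][N][N];
    int i, j, k, v = 0;
    for (k = 0; k < M; k++)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                a[k][i][j] = v++;
    assert(v == M * N * N);
    return sumarray3d_original(a) == sumarray3d_stride1(a);
}
```

Both functions visit the same M*N*N elements exactly once; only the permuted version touches them in the order they are laid out in memory.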
Exercise 3. [10 points] Consider a cache with a size of 1K (= 2^10) words.
Recall that a memory address has t bits for the tag, s bits for the set index and
b bits for the block offset. We assume that memory addresses are 32-bit and that
the last 2 bits are used for the byte offset.
3.1 We implement this cache as a direct-mapped, one-word cache. Given a
memory address, how many bits are for tag, set and block offset, respectively?
20 bits for tag [1]
10 bits for set [2]
0 bits for block offset [2]
3.2 We implement this cache as a direct-mapped, four-word cache. Given a
memory address, how many bits are for tag, set and block offset, respectively?
20 bits for tag [1]
8 bits for set [2]
2 bits for block offset [2]
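The bit counts in 3.1 and 3.2 follow from the same arithmetic: b = log2(words per block), s = log2(number of blocks), and t is whatever remains of the 32 bits after the 2 byte-offset bits. A minimal sketch for a direct-mapped cache follows; the struct and function names are hypothetical.

```c
#include <assert.h>

struct addr_bits { int tag, set, block; };

/* log2 of a power of two; aborts on any other input. */
static int log2_exact(unsigned x)
{
    int n = 0;
    assert(x != 0 && (x & (x - 1)) == 0);
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Field widths for a direct-mapped cache holding cache_words words in
   blocks of block_words words, with 32-bit addresses whose last 2 bits
   are the byte offset (the assumptions stated in Exercise 3). */
struct addr_bits split_address(unsigned cache_words, unsigned block_words)
{
    struct addr_bits r;
    assert(block_words != 0 && cache_words % block_words == 0);
    r.block = log2_exact(block_words);
    r.set   = log2_exact(cache_words / block_words);
    r.tag   = 32 - 2 - r.set - r.block;
    return r;
}
```

Calling split_address(1024, 1) reproduces the answer to 3.1 (20, 10, 0) and split_address(1024, 4) reproduces the answer to 3.2 (20, 8, 2).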
Exercise 4. [20 points] In this exercise, we consider a direct-mapped cache memory where each cache block holds two words. We assume that each word is one
byte and that each memory address is a 4-bit number where
• the first 2 bits (from left to right) are the tag bits;
• the third bit is the set address (index), and
• the last bit is the offset from the beginning of the block.
We assume that the following words are accessed in sequence, according to
the following access pattern (from left to right):
Table 1: Sequence of accessed words
Word number:    0    1    3    2    4    3    5    15
Memory address: 0000 0001 0011 0010 0100 0011 0101 1111
We start with an empty cache and all blocks initially marked as not valid.
(Valid bits are not shown on the pictures below.)
Table 2: Initially, the cache is empty.
set  tag  block
0
1
4.1 Use Table 3 to Table 10 to depict the contents of the cache when the processor requests the 8 words, in sequence, as specified in Table 1. For each
request of the processor to the cache, indicate whether this is a cache miss
or cache hit.
4.2 Calculate the cache miss rate.
miss rate = 4 / 8 = 50% [2]
Table 3: Accessing word number 0
set  tag  block [1]             Hit / Miss [1]
0    00   Memory(1) Memory(0)   Miss
1

Table 4: Accessing word number 1
set  tag  block [1]             Hit / Miss [1]
0    00   Memory(1) Memory(0)   Hit
1

Table 5: Accessing word number 3
set  tag  block [1]             Hit / Miss [1]
0    00   Memory(1) Memory(0)
1    00   Memory(3) Memory(2)   Miss

Table 6: Accessing word number 2
set  tag  block [1]             Hit / Miss [1]
0    00   Memory(1) Memory(0)
1    00   Memory(3) Memory(2)   Hit

Table 7: Accessing word number 4
set  tag [1]  block [1]             Hit / Miss [1]
0    01       Memory(5) Memory(4)   Miss
1    00       Memory(3) Memory(2)

Table 8: Accessing word number 3
set  tag  block [1]             Hit / Miss [1]
0    01   Memory(5) Memory(4)
1    00   Memory(3) Memory(2)   Hit

Table 9: Accessing word number 5
set  tag  block [1]             Hit / Miss [1]
0    01   Memory(5) Memory(4)   Hit
1    00   Memory(3) Memory(2)

Table 10: Accessing word number 15
set  tag [1]  block [1]               Hit / Miss [1]
0    01       Memory(5) Memory(4)
1    11       Memory(15) Memory(14)   Miss
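
The hit/miss sequence in Tables 3 to 10 and the miss rate in 4.2 can be cross-checked with a small simulation. The sketch below models only the tag and valid bit of each of the two sets, using the address layout given in Exercise 4 (2 tag bits, 1 set bit, 1 offset bit). The names simulate and quiz_misses are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_SETS = 2 };

struct line { int valid; unsigned tag; };

/* Replays n 4-bit addresses through an initially empty direct-mapped
   cache with two-word blocks. Returns the number of misses; if miss is
   not NULL, miss[i] is set to 1 for a miss and 0 for a hit. */
int simulate(const unsigned *addr, int n, int *miss)
{
    struct line cache[NUM_SETS] = { {0, 0}, {0, 0} };
    int i, misses = 0;

    for (i = 0; i < n; i++) {
        unsigned tag, set;
        int is_miss;

        assert(addr[i] < 16);            /* addresses are 4-bit numbers */
        tag = addr[i] >> 2;              /* first two bits */
        set = (addr[i] >> 1) & 1u;       /* third bit */
        is_miss = !cache[set].valid || cache[set].tag != tag;
        if (is_miss) {                   /* fetch the block, replacing the old one */
            cache[set].valid = 1;
            cache[set].tag = tag;
            misses++;
        }
        if (miss != NULL)
            miss[i] = is_miss;
    }
    return misses;
}

/* Miss count for the access sequence of Table 1. */
int quiz_misses(void)
{
    static const unsigned seq[8] = { 0, 1, 3, 2, 4, 3, 5, 15 };
    return simulate(seq, 8, NULL);
}
```

For the sequence of Table 1 the simulation reports misses on words 0, 3, 4 and 15 and hits on the other four accesses, which gives the 4 / 8 = 50% miss rate stated in 4.2.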