CS3350B Computer Architecture
Quiz 1, January 21, 2016

Student ID number:
Student Last Name:

Exercise 1. [10 points]
The following statements are either [T]rue or [F]alse. Write T in [ ] if the statement is true; otherwise, write F in [ ].

1.1 [T] Designing for Moore's law, using abstraction, parallelism, and a memory hierarchy are great ideas in pursuing performance.
1.2 [F] Clock rate does not affect CPU execution time.
1.3 [F] The power wall is not one of the reasons why we cannot further improve the performance of uniprocessors.
1.4 [T] When we use perf to profile our programs, it can count cycles and cache misses.
1.5 [T] We use a memory hierarchy because we want fast memory access.
1.6 [F] When a cache miss happens, accessing the lower level of the memory hierarchy to fetch the data is faster than accessing the current level.
1.7 [T] We include memory stall cycles in the CPU execution time because the CPU spends cycles waiting for the memory system.
1.8 [F] If a processor has an L1 cache, adding an L2 cache is always better, in terms of AMAT (average memory access time), than having no L2 cache.
1.9 [F] Cold misses can be completely avoided by increasing the block size.
1.10 [F] Conflict misses can be avoided by increasing associativity, but not by increasing cache size.

Exercise 2. [10 points]
Consider the following C code, which computes the sum of the elements of a three-dimensional array a[].

    int sumarray3d(int a[M][N][N])
    {
        int i, j, k, sum = 0;

        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                for (k = 0; k < M; k++)
                    sum += a[k][i][j];
        return sum;
    }

2.1 Does this C function have good locality (temporal or spatial)? If so, identify which variable has which kind of locality. Otherwise, explain why this code does not have good locality.

sum has temporal locality: it is read and updated on every iteration. [3]
a[] does not have good locality, since it is not accessed consecutively: the innermost loop varies k, the leftmost index, so under C's row-major layout successive accesses are N*N elements apart. [2]
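The 2.1 answer can be checked mechanically. The sketch below is illustrative, not part of the quiz: M = 4 and N = 8 are assumed sizes (the quiz leaves them unspecified), and sumarray3d_stride1, offset_of, unit_strides_ijk, and unit_strides_kij are hypothetical helper names. It relies on C's row-major layout, in which a[k][i][j] lies k*N*N + i*N + j ints past a[0][0][0], and counts how many consecutive accesses in each loop order are exactly one int apart.

```c
#include <assert.h>

/* Illustrative sizes only (assumption): the quiz does not fix M and N. */
#define M 4
#define N 8

/* The quiz's original loop order: the innermost loop varies k, the
   leftmost index, so successive accesses are N*N ints apart. */
int sumarray3d(int a[M][N][N])
{
    int i, j, k, sum = 0;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < M; k++)
                sum += a[k][i][j];
    return sum;
}

/* Hypothetical name for the permuted (k, i, j) order: the innermost
   loop varies j, the rightmost index, so accesses are adjacent ints. */
int sumarray3d_stride1(int a[M][N][N])
{
    int i, j, k, sum = 0;

    for (k = 0; k < M; k++)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                sum += a[k][i][j];
    return sum;
}

/* Offset (in ints) of a[k][i][j] from a[0][0][0] in row-major order. */
static long offset_of(int k, int i, int j)
{
    assert(0 <= k && k < M && 0 <= i && i < N && 0 <= j && j < N);
    return (long)k * N * N + (long)i * N + j;
}

/* Count consecutive accesses exactly one int apart, (i, j, k) order. */
long unit_strides_ijk(void)
{
    long prev = -1, count = 0;

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < M; k++) {
                long cur = offset_of(k, i, j);
                if (prev >= 0 && cur - prev == 1)
                    count++;
                prev = cur;
            }
    return count;
}

/* The same count for the permuted (k, i, j) order. */
long unit_strides_kij(void)
{
    long prev = -1, count = 0;

    for (int k = 0; k < M; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                long cur = offset_of(k, i, j);
                if (prev >= 0 && cur - prev == 1)
                    count++;
                prev = cur;
            }
    return count;
}
```

With these sizes both functions return the same sum, unit_strides_kij() reports all M*N*N - 1 = 255 steps as stride-1, and unit_strides_ijk() reports none, because every step of the innermost k loop jumps N*N = 64 ints and every wrap of k jumps backwards.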
2.2 Can you permute the loops so that the function scans the 3D array a[] with a stride-1 reference pattern (and thus has good spatial locality)?

    for (k = 0; k < M; k++)                 [3]
        for (i = 0; i < N; i++)             [1]
            for (j = 0; j < N; j++)         [1]
                sum += a[k][i][j];

Exercise 3. [10 points]
Consider a cache with a size of 1K (= 2^10) words. Recall that a memory address has t bits for the tag, s bits for the set index, and b bits for the block offset. We assume that memory addresses are 32 bits and that the last 2 bits are used for the byte offset.

3.1 We implement this cache as a direct-mapped cache with one-word blocks. Given a memory address, how many bits are used for the tag, set, and block offset, respectively?

20 bits for tag [1]
10 bits for set [2]
0 bits for block offset [2]
(2^10 one-word blocks give 2^10 sets, so s = 10, b = 0, and t = 32 - 2 - 10 - 0 = 20.)

3.2 We implement this cache as a direct-mapped cache with four-word blocks. Given a memory address, how many bits are used for the tag, set, and block offset, respectively?

20 bits for tag [1]
8 bits for set [2]
2 bits for block offset [2]
(2^10 words / 4 words per block = 2^8 sets, so s = 8, b = 2, and t = 32 - 2 - 8 - 2 = 20.)

Exercise 4. [20 points]
In this exercise, we consider a direct-mapped cache in which each cache block holds two words. We assume that each word is one byte and that each memory address is a 4-bit number in which:
• the first 2 bits (from left to right) are the tag bits;
• the third bit is the set address (index); and
• the last bit is the offset from the beginning of the block.

The following words are accessed in sequence, according to the access pattern below (from left to right):

Table 1: Sequence of accessed words

    Word number:      0     1     3     2     4     3     5     15
    Memory address:   0000  0001  0011  0010  0100  0011  0101  1111

We start with an empty cache, with all blocks initially marked as not valid. (Valid bits are not shown in the tables below.)

Table 2: Initially, the cache is empty.

    set   tag   block
    0
    1

4.1 Use Table 3 to Table 10 to depict the contents of the cache when the processor requests the 8 words, in sequence, as specified in Table 1.
For each request from the processor to the cache, indicate whether it is a cache hit or a cache miss.

Table 3: Accessing word number 0
    set   tag [1]   block [1]
    0     00        Memory(1) Memory(0)
    1
    Hit / Miss [1]: Miss

Table 4: Accessing word number 1
    set   tag [1]   block [1]
    0     00        Memory(1) Memory(0)
    1
    Hit / Miss [1]: Hit

Table 5: Accessing word number 3
    set   tag [1]   block [1]
    0     00        Memory(1) Memory(0)
    1     00        Memory(3) Memory(2)
    Hit / Miss [1]: Miss

Table 6: Accessing word number 2
    set   tag [1]   block [1]
    0     00        Memory(1) Memory(0)
    1     00        Memory(3) Memory(2)
    Hit / Miss [1]: Hit

Table 7: Accessing word number 4
    set   tag [1]   block [1]
    0     01        Memory(5) Memory(4)
    1     00        Memory(3) Memory(2)
    Hit / Miss [1]: Miss

Table 8: Accessing word number 3
    set   tag [1]   block [1]
    0     01        Memory(5) Memory(4)
    1     00        Memory(3) Memory(2)
    Hit / Miss [1]: Hit

Table 9: Accessing word number 5
    set   tag [1]   block [1]
    0     01        Memory(5) Memory(4)
    1     00        Memory(3) Memory(2)
    Hit / Miss [1]: Hit

Table 10: Accessing word number 15
    set   tag [1]   block [1]
    0     01        Memory(5) Memory(4)
    1     11        Memory(15) Memory(14)
    Hit / Miss [1]: Miss

4.2 Calculate the cache miss rate.

miss rate = 4 / 8 = 50% [2]
(The misses are the accesses to words 0, 3, 4, and 15.)
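Exercises 3 and 4 both come down to splitting an address into tag, set, and offset fields. The C sketch below is illustrative only: field_widths, log2_exact, simulate_direct_mapped, struct fields, and the MAX_SETS limit are hypothetical names chosen for this example. It assumes power-of-two sizes and, for the simulator, word addresses with no byte-offset bits (as in Exercise 4, where each word is one byte). The simulator records only whether each access hits or misses, not the block contents drawn in Tables 3 to 10.

```c
#include <assert.h>

#define MAX_SETS 16  /* hypothetical upper bound for this small simulator */

/* Bit widths of the address fields of a direct-mapped cache. */
struct fields {
    int tag;    /* t */
    int set;    /* s */
    int block;  /* b: word offset within a block */
    int byte;   /* byte offset within a word */
};

/* log2 of an exact power of two (asserts otherwise). */
static int log2_exact(unsigned x)
{
    int n = 0;

    assert(x != 0 && (x & (x - 1)) == 0);
    while (x >>= 1)
        n++;
    return n;
}

/* Field widths for a direct-mapped cache of cache_words words with
   block_words words per block. Each set holds exactly one block, so
   the number of sets equals the number of blocks. */
struct fields field_widths(int addr_bits, int byte_bits,
                           unsigned cache_words, unsigned block_words)
{
    struct fields f;

    f.byte = byte_bits;
    f.block = log2_exact(block_words);
    f.set = log2_exact(cache_words / block_words);
    f.tag = addr_bits - f.set - f.block - f.byte;
    return f;
}

/* Simulate a direct-mapped cache that starts empty (all blocks invalid)
   over a trace of word addresses. result[r] is set to 'H' or 'M' for
   access r; the return value is the number of misses. */
int simulate_direct_mapped(const unsigned *trace, int n,
                           int set_bits, int block_bits, char *result)
{
    int valid[MAX_SETS] = {0};
    unsigned tags[MAX_SETS] = {0};
    int misses = 0;

    assert(set_bits >= 0 && (1 << set_bits) <= MAX_SETS);
    for (int r = 0; r < n; r++) {
        unsigned set = (trace[r] >> block_bits) & ((1u << set_bits) - 1u);
        unsigned tag = trace[r] >> (block_bits + set_bits);

        if (valid[set] && tags[set] == tag) {
            result[r] = 'H';
        } else {
            /* Miss: fetch the block and replace whatever the set held. */
            result[r] = 'M';
            valid[set] = 1;
            tags[set] = tag;
            misses++;
        }
    }
    return misses;
}
```

For example, field_widths(32, 2, 1024, 1) gives 20 tag, 10 set, and 0 block-offset bits, and field_widths(32, 2, 1024, 4) gives 20, 8, and 2, matching 3.1 and 3.2. Running simulate_direct_mapped on the Table 1 trace {0, 1, 3, 2, 4, 3, 5, 15} with one set bit and one block-offset bit yields M H M H M H H M, i.e., 4 misses out of 8.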