CPSC 3300 – Fall 2013 – Final Exam                Name: _______________________

No calculators or other electronics. Points add to 110, but you will be scored out of 100.

1. Give the CPU time equation in terms of clock frequency and define the terms you use. (4 pts.)

2. a) Find the execution time of a program that executes 10 billion instructions on a processor with an average CPI of 1.5 and a clock frequency of 3 GHz. (3 pts.)

   b) What is the MIPS value for the machine described in part (a)? (3 pts.)

3. Give the equation for Amdahl's law in terms of "f" and "s" and define the terms you use. (4 pts.)

4. What is the overall speedup if a performance enhancement with a tenfold speedup can be used 80% of the time? Give your answer as a fraction. (4 pts.)

5. What fraction of the time must a tenfold speedup be used to obtain an overall speedup of five? Give your answer as a fraction. (5 pts.)

6. Determine the simplest logic expression for the values in this Kmap. (d denotes a don’t-care.) (3 pts.)

       \ BC
      A \  00   01   11   10
         +----+----+----+----+
       0 |  1 |  0 |  d |  d |
         +----+----+----+----+
       1 |  1 |  d |  0 |  1 |
         +----+----+----+----+

   F(A,B,C) = _____________

7. Show the simplified logic expressions for F and G. (6 pts.)

       A B C | F G
      -------+-----
       0 0 0 | 1 0
       0 0 1 | 1 0
       0 1 0 | 0 1
       0 1 1 | 0 1
       1 0 0 | 1 0
       1 0 1 | 1 1
       1 1 0 | 1 0
       1 1 1 | 1 1

8. Explain the difference between a modulo counter and a saturating counter. Give at least one example in which a modulo counter is preferred and at least one example in which a saturating counter is preferred. (4 pts.)

9. Draw the state diagram for a 2-bit modulo counter on the left below and for a 2-bit saturating counter on the right below. For each diagram, use a single input signal, In, such that the counter counts down if In = 0 and counts up if In = 1. (4 pts.)

10. Choose one of the 2-bit counters from question 9 and implement it with D flip-flops and any necessary additional logic gates. (9 pts.)

11. Identify the pipeline stages of the standard five-stage pipeline and briefly state exactly what actions each stage performs for a store instruction. (8 pts.)

       sw   r2, 0(r1)      // memory[r1+0] <- r2

12. For the MIPS instruction sequence below, draw the data dependency graph and identify each data dependency as RAW, WAR, or WAW. (5 pts.)

       lw   r2, 0(r1)
       add  r1, r1, r3
       lw   r4, 4(r2)
       sub  r4, r5, r4

13. For the MIPS instruction sequence given in question 12, show the pipeline cycle (“staircase”) diagram for the standard 5-stage pipeline with forwarding. (6 pts.)

       lw   r2, 0(r1)
       add  r1, r1, r3
       lw   r4, 4(r2)
       sub  r4, r5, r4

Associate each term or statement below with aspects of branching. Circle A or P, for Address or Prediction, respectively. Note that some items may require both to be circled. (2 pts. each)

14.  A / P    BTAC
15.  A / P    BHT
16.  A / P    BTB
17.  A / P    gshare

18. The block diagram (high-level circuit) below shows the two-dimensional organization of a RAM. Identify the components and signals required to access the RAM by placing the letter, a-j, of the correct component or signal in each of the blanks numbered 1-10. (1 pt. each)

   a. address                       f. read/write control signal
   b. column decoder                g. row decoder
   c. column address strobe (CAS)   h. row buffer
   d. data bits                     i. row address strobe (RAS)
   e. memory cell array             j. sense/write circuitry

   1) ______                       2) ______
                                   +-+      +-------------+
   3) ______ --------------------->| |----->|             |
                  .--------/------>| | ...  |   4K x 4K   |
                  |    high bits   | |----->|             |
   4) ______ --/--<                +-+      +-------------+
                  |                           | | ... | |
                  |                         +-------------+
                  |                         |  5) ______  |<-- 6) ______
                  |                         +-------------+
                  |                +-+        | | ... | |
                  |     low bits   | |----->+-------------+
                  `--------/------>| | ...  |  8) ______  |
   7) ______ --------------------->| |----->+-------------+
                                   +-+         ^ ^ |...|
                                    9) ______  v v
                                                   10) ______

19. How does a cache exploit spatial locality? (4 pts.)

20. What is false sharing between multiprocessor caches, and how do you avoid it? (5 pts.)

21. Consider these possible steps in cache access:

       lookup using the index bits
       refill (read cache line from memory)
       route the correct line from one of the multiple banks
       selection of bytes from the line using the offset bits
       set the dirty bit
       set the valid bit
       tag match
       write cache line back to memory if dirty

    Write the steps in proper order for a read hit in a set-associative cache. (Note: some of the steps listed above may not be appropriate.) (4 pts.)

22. Assume a 256-byte main memory and a two-way set-associative cache with four total lines (two per bank) and four bytes per line. Replacement is LRU. The cache is initially empty. For the byte-address reference stream given below, circle the references that are hits and show the final contents of the cache. (The byte addresses are in decimal.) (6 pts.)

       4, 12, 5, 22, 6, 13, 7, 8, 14, 9, 23, 10

23. Some compilers align branch targets on cache lines, even if the preceding sequential block must be padded with no-op instructions to reach the end of the previous cache line. Explain the benefit of doing this. (5 pts.)
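Two groups of questions rest on a few formulas: questions 1 through 5 (CPU time, MIPS, and Amdahl's law) and questions 8 and 9 (counter behavior). These can be sketched as small Python functions, for example to check hand arithmetic. The sketch rests on some assumptions made only for illustration. The function names are hypothetical. Times are in seconds and frequencies in hertz. f and s follow the Amdahl notation of question 3: f is the fraction of time the enhancement is usable and s is its speedup. The formulas are the standard ones:

- CPU time = instruction count x CPI / clock frequency
- MIPS = clock frequency / (CPI x 10^6)
- speedup = 1 / ((1 - f) + f / s)

```python
from fractions import Fraction


def cpu_time(instruction_count, cpi, clock_hz):
    """CPU time (seconds) = instruction count x CPI / clock frequency."""
    return instruction_count * cpi / clock_hz


def mips(cpi, clock_hz):
    """MIPS rating = clock frequency / (CPI x 10^6)."""
    return clock_hz / (cpi * 10**6)


def amdahl_speedup(f, s):
    """Amdahl's law: overall speedup = 1 / ((1 - f) + f / s).

    f is the fraction of time the enhancement can be used; s is its speedup.
    Both are converted to Fraction so the result is an exact fraction.
    """
    f, s = Fraction(f), Fraction(s)
    return 1 / ((1 - f) + f / s)


def fraction_needed(target, s):
    """Solve target = 1 / ((1 - f) + f / s) for f, exactly."""
    target, s = Fraction(target), Fraction(s)
    return (1 - 1 / target) / (1 - 1 / s)


def modulo_next(state, up, bits=2):
    """Modulo counter: wraps max -> 0 counting up and 0 -> max counting down."""
    size = 1 << bits
    return (state + 1) % size if up else (state - 1) % size


def saturating_next(state, up, bits=2):
    """Saturating counter: holds at max counting up and at 0 counting down."""
    top = (1 << bits) - 1
    return min(state + 1, top) if up else max(state - 1, 0)
```

Passing `Fraction(4, 5)` or the string `"0.8"` for f keeps Amdahl results exact. A float such as `0.8` would instead be converted from its binary approximation, giving an unwieldy fraction.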
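The RAW/WAR/WAW classification in question 12 can be mechanized. Compare the register each instruction writes against the registers read and written by every later instruction. This is a minimal sketch under stated assumptions:

- Each instruction is hand-encoded as a hypothetical `Instr` tuple, not part of any MIPS toolchain.
- A load's base register counts as a source.
- Every ordered pair is checked, so a dependence that an intervening write would supersede is still reported.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str              # assembly text, for display only
    dest: Optional[str]    # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]  # registers read


def dependencies(prog: List[Instr]) -> List[Tuple[int, int, str, str]]:
    """Return (earlier, later, kind, register) for each dependence, 1-based.

    Every ordered pair is checked; pairs separated by an intervening write
    to the same register are not pruned.
    """
    deps = []
    for i, a in enumerate(prog):
        for j in range(i + 1, len(prog)):
            b = prog[j]
            if a.dest is not None and a.dest in b.srcs:
                deps.append((i + 1, j + 1, "RAW", a.dest))  # b reads what a wrote
            if b.dest is not None and b.dest in a.srcs:
                deps.append((i + 1, j + 1, "WAR", b.dest))  # b overwrites what a read
            if a.dest is not None and a.dest == b.dest:
                deps.append((i + 1, j + 1, "WAW", a.dest))  # both write the same register
    return deps


# Sequence from question 12, hand-encoded (hypothetical representation).
program = [
    Instr("lw  r2, 0(r1)", "r2", ("r1",)),
    Instr("add r1, r1, r3", "r1", ("r1", "r3")),
    Instr("lw  r4, 4(r2)", "r4", ("r2",)),
    Instr("sub r4, r5, r4", "r4", ("r5", "r4")),
]

if __name__ == "__main__":
    for earlier, later, kind, reg in dependencies(program):
        print(f"I{earlier} -> I{later}: {kind} on {reg}")
```

Each reported tuple is one edge of the dependency graph. Whether that edge causes a stall in the five-stage pipeline depends on forwarding, which this sketch does not model.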
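The hit/miss bookkeeping in question 22 can be checked with a small LRU simulation. This is a minimal sketch built on several assumptions:

- Only reads are modeled, so there are no dirty bits or write-backs.
- The simulation tracks memory block numbers rather than data.
- The geometry is the one stated in the question. Four lines in two ways give two sets, and lines are 4 bytes, so an 8-bit byte address splits into 2 offset bits, 1 index bit, and 5 tag bits.
- The names `simulate_lru` and `describe` are hypothetical.

```python
from typing import List, Tuple


def simulate_lru(addresses: List[int], num_sets: int = 2, ways: int = 2,
                 line_bytes: int = 4) -> Tuple[List[Tuple[int, bool]], List[List[int]]]:
    """Simulate reads to a set-associative cache with LRU replacement.

    Each set is a list of memory block numbers ordered from least to most
    recently used. Returns (per-reference (address, hit) pairs, final sets).
    """
    sets: List[List[int]] = [[] for _ in range(num_sets)]
    log: List[Tuple[int, bool]] = []
    for addr in addresses:
        block = addr // line_bytes       # discard the offset bits
        lines = sets[block % num_sets]   # index bits select the set
        hit = block in lines             # same set, so equal block number <=> tag match
        if hit:
            lines.remove(block)          # re-inserted below at the MRU end
        elif len(lines) == ways:
            lines.pop(0)                 # set is full: evict the LRU line
        lines.append(block)              # block is now most recently used
        log.append((addr, hit))
    return log, sets


def describe(sets: List[List[int]], num_sets: int = 2, line_bytes: int = 4) -> None:
    """Print each resident line as its set, tag, and byte-address range."""
    for index, lines in enumerate(sets):
        for block in lines:
            first = block * line_bytes
            print(f"set {index}: tag {block // num_sets}, "
                  f"bytes {first}-{first + line_bytes - 1}")


if __name__ == "__main__":
    refs = [4, 12, 5, 22, 6, 13, 7, 8, 14, 9, 23, 10]  # stream from question 22
    log, sets = simulate_lru(refs)
    print("hits:", [addr for addr, hit in log if hit])
    describe(sets)
```

Keeping each set as a list ordered from least to most recently used makes eviction a matter of removing the front element. Hardware tracks the same information with LRU state bits per set; for two ways, a single bit suffices.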