CPSC 3300 Fall 2014 -- Final Exam                Name: ____________________

1. Find the execution time for a program that executes 4 billion instructions on a processor with an avg. CPI of 1.5 and a clock frequency of 2 GHz. (3 pts.)

2. What is the MIPS rate for the processor in question (1)? (3 pts.)

3. For the following instruction set workload and cycle values, find the average CPI. (1 pt.)

    type   | freq  cycles
    -------+-------------
    alu    | 0.1   1
    branch | 0.24  2
    ld/st  | 0.66  3

4. The compiler group proposes new optimizations that use only 80% of the instruction count as compared to the compiler used for question 3 and also produce a workload distribution as follows:

    type   | freq  cycles
    -------+-------------
    alu    | 0.6   1
    branch | 0.2   2
    ld/st  | 0.2   3

   Would the computer running programs compiled with the optimized compiler be faster, and if so, what would be the speedup? (4 pts.)

5. What is the overall speedup if an enhancement with speedup of 8 can be used 4/5ths of the time? Express your answer as a fraction. (4 pts.)

Arithmetic/Harmonic/Geometric. Circle one of A, H, or G, as applies. (1 pt. each)

6. A H G   Used for averaging execution rates.

7. A H G   Used for averaging execution times.

8. A H G   Used for averaging by SPEC when reporting scores for benchmark suites.

9. Draw a circuit diagram using AND, OR, and NOT logic gates that implements a 2-to-1 multiplexer. (4 pts.)

10. Does a 2-to-1 multiplexer have an associated state diagram? If so, draw it. If not, explain why not. (4 pts.)

11. Fill in each blank with either “hardwired” or “microprogramming”. (1 pt. each)

    a) ______________________ uses a control store
    b) ______________________ typically produces a faster implementation
    c) ______________________ is more flexible and easier to change

Multiple choice. Circle one response. (1 pt. each)

12. A modern microprocessor found in laptops, such as the Intel Core i5, is:

    a) totally hardwired
    b) totally microprogrammed
    c) a combination of hardwired and microprogrammed
    d) neither hardwired nor microprogrammed

13. A modern microprocessor found in laptops, such as the Intel Core i5, is:

    a) superscalar
    b) VLIW
    c) neither superscalar nor VLIW

14. What is the relationship between a superscalar processor and the compiler?

    a) Correct execution requires the compiler to carefully order the instructions.
    b) Performance can increase when the compiler carefully orders the instructions.
    c) Neither (a) nor (b).

15. What are the five stages in the standard pipeline we studied, and what action does each perform for the store instruction sw r1,4(r2), which implements the action memory[ reg[2] + 4 ] <- reg[1]? (8 pts.)

16. For the MIPS instruction sequence below, identify the dependencies in a data dependency diagram. (8 pts.)

    i1:  lw   r4, 0( r1 )   // reg[4] <- memory[ reg[1] + 0 ]
    i2:  lw   r5, 4( r1 )   // reg[5] <- memory[ reg[1] + 4 ]
    i3:  mul  r6, r4, r5    // reg[6] <- reg[4] * reg[5]
    i4:  sub  r8, r6, r7    // reg[8] <- reg[6] - reg[7]
    i5:  sw   r8, 8( r1 )   // memory[ reg[1] + 8 ] <- reg[8]
    i6:  add  r1, r1, r2    // reg[1] <- reg[1] + reg[2]

17. Explain why gshare can be better than a Branch History Table of two-bit saturating counters for branch prediction. (3 pts.)

18. Consider a block diagram (high-level circuit) showing the two-dimensional organization of a RAM, and identify the components and signals required to access the RAM. Place the appropriate letter, a-j, of the correct component or signal in the blanks numbered 1-10. (0.5 pts. each)

    a) address                      f) read/write control signal
    b) data bits                    g) memory cell array
    c) row decoder                  h) sense/write circuitry
    d) row buffer                   i) column address strobe (CAS)
    e) column decoder               j) row address strobe (RAS)

                                 1) ______       2) ______
                                    +-+       +-------------+
    3) ______ --------------------->| |------>|             |
                   .--------/------>| |  ...  |   4K x 4K   |
                   | high bits      | |------>|             |
    4) ______ --/--<                +-+       +-------------+
                   |                            | | ... | |
                   |                          +-------------+
                   |                          |  5) ______  |<-- 6) ______
                   |                          +-------------+
                   |                +-+         | | ... | |
                   | low bits       | |------>+-------------+
                   `--------/------>| |  ...  |  8) ______  |
    7) ______ --------------------->| |------>+-------------+
                                    +-+           ^     ^
                                                  | ... |   9) ______
                                                  v     v
                                                10) ______

19. Identify at least three differences between DRAM and SRAM. (6 pts.)

20. Define temporal locality. (3 pts.)

21. Define spatial locality. (3 pts.)

22. C stores matrices in row-major order. Which of these two program segments in C will be faster? Explain your choice. (4 pts.)

    sum = 0;
    for(i=0; i<N; i++){
      for(j=0; j<N; j++){
        sum = sum + c[i][j];
      }
    }

    sum = 0;
    for(j=0; j<N; j++){
      for(i=0; i<N; i++){
        sum = sum + c[i][j];
      }
    }

23. Explain why a set-associative cache is typically preferred to a direct-mapped cache, even though a direct-mapped cache has a slightly faster hit time. (3 pts.)

24. Consider a 4 GB byte-addressable main memory with a level 1 data cache that is three-way set-associative, 96 KB in size, and has a 32-byte line size.

    a) How many total lines are there in the data cache? (not just per bank) (1 pt.)
    b) How many lines are there in a bank? (1 pt.)
    c) Show how the main memory address is partitioned into fields for the cache access, and give the bit lengths of those fields. (6 pts.)

25. Assume a 256-byte main memory and a four-line cache with two bytes per line. The cache is initially empty. For the byte address reference stream (reads) given below, circle the references that are hits for an 8-byte direct-mapped cache. Also, show the final contents of the cache. (The byte addresses are in decimal.) (6 pts.)

    0, 9, 1, 2, 10, 3, 4, 11, 5, 6, 12, 7

26. Assume a 256-byte main memory and a four-line cache with two bytes per line. The cache is initially empty. For the byte address reference stream (reads) given below, circle the references that are hits for an 8-byte fully-associative cache with FIFO replacement. Also, show the final contents of the cache. (The byte addresses are in decimal.) (6 pts.)

    0, 9, 1, 2, 10, 3, 4, 11, 5, 6, 12, 7

27. What is a “burst transfer” over the memory bus, and why is it useful for performing cache refills and write-backs? (5 pts.)

XC-1. What is “false sharing”? (up to 7 pts.)

True/False. (1 pt. each)

XC-2. T / F   The MESI cache coherency protocol requires four separate control bits for each cache line.

XC-3. T / F   Write update is a more popular cache coherency write policy than write invalidate.

XC-4. T / F   Load/store instructions can be used to access the memories in remote nodes in a cluster system.
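
The cache traces asked for in questions 25 and 26 can be checked mechanically. What follows is a minimal sketch, not part of the original exam: the function names, the constants, and the choice to store whole block numbers instead of separate tag bits are illustrative assumptions. It models a four-line cache with two-byte lines, once as direct-mapped and once as fully associative with FIFO replacement, and reports which byte addresses hit.

```python
from collections import deque

LINE_SIZE = 2   # bytes per cache line, as stated in questions 25 and 26
NUM_LINES = 4   # lines in the cache, as stated in questions 25 and 26


def simulate_direct_mapped(addresses, num_lines=NUM_LINES, line_size=LINE_SIZE):
    """Return (hits, lines) for a direct-mapped cache (hypothetical helper).

    hits  : byte addresses that hit, in reference order
    lines : block number held by each cache line at the end (None if empty)
    """
    lines = [None] * num_lines
    hits = []
    for addr in addresses:
        block = addr // line_size      # memory block containing this byte
        index = block % num_lines      # the one line this block may occupy
        if lines[index] == block:
            hits.append(addr)
        else:
            lines[index] = block       # miss: refill, replacing the old block
    return hits, lines


def simulate_fully_associative_fifo(addresses, num_lines=NUM_LINES, line_size=LINE_SIZE):
    """Return (hits, resident) for a fully associative cache with FIFO replacement.

    resident lists the blocks in the cache at the end, oldest first.
    """
    resident = deque()
    hits = []
    for addr in addresses:
        block = addr // line_size
        if block in resident:
            hits.append(addr)          # FIFO: a hit does not change the order
        else:
            if len(resident) == num_lines:
                resident.popleft()     # evict the block that was loaded first
            resident.append(block)
    return hits, list(resident)


if __name__ == "__main__":
    stream = [0, 9, 1, 2, 10, 3, 4, 11, 5, 6, 12, 7]
    for name, simulate in (("direct-mapped", simulate_direct_mapped),
                           ("fully associative, FIFO", simulate_fully_associative_fifo)):
        hits, contents = simulate(stream)
        print(name, "hits:", hits, "final blocks:", contents)
```

A block number b corresponds to byte addresses b * LINE_SIZE through b * LINE_SIZE + LINE_SIZE - 1, so the final contents can be read back as byte ranges. Comparing the two hit lists shows how conflict misses in the direct-mapped organization disappear once any block may occupy any line.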
