CSci6461al Lancaster Homework Set 3 Summer 2012

1. Suppose α is the fraction of a program p's code that can be executed simultaneously by n processors in a computer system, the remaining code must be executed sequentially by a single processor, and each processor has an execution rate of x million instructions per second.

   a. Derive an expression for the effective MIPS rate when using this system for the execution of this program in terms of x, n and α. (10 points)

      MIPS rate = [nα + (1 - α)]x = (nα - α + 1)x

   b. If n = 16 and x = 4 MIPS, determine the value of α that will yield a system performance of 34 MIPS. (15 points)

      Substituting into the expression from part a: (16α - α + 1)(4) = 34, so 15α + 1 = 8.5 and α = 0.5.

2. Directory protocols are more scalable than snooping protocols because they send explicit request and invalidate messages only to those nodes that have copies of a block, while snooping protocols broadcast all requests and invalidates to all nodes. Consider the 16-processor system illustrated in Figure 4.42 and assume that all caches not shown have invalid blocks. For each of the sequences below, identify which nodes receive each request and invalidate. (15 Points)

   a. [10] <4.4> P1: write 120 <-- 80
   b. [10] <4.4> P1: write 110 <-- 88
   c. [10] <4.4> P15: write 118 <-- 90
   d. [10] <4.4> P15: write 108 <-- 98

   a. P1: write 120 <-- 80. Send invalidate to P15.
   b. P1: write 110 <-- 88. Send fetch/invalidate to P0.
   c. P15: write 118 <-- 90. Send invalidate to P1.
   d. P15: write 108 <-- 98. Send invalidate to P0.

3. Using what we know about caches and the principles of spatial and temporal locality, optimize the following code. For each optimization technique you use, name the technique and briefly describe how it was applied.
(15 Points)

    int x[1000];      //assume these two lines
    double y[1000];   //do not translate into code
    x[0]=1;
    x[1]=1;
    for (i=2; i<1000; i++){
        x[i] = x[i-1]+x[i-2];
    }
    for (i=0; i<1000; i++){
        if (i>0) {
            y[i] = double(x[i])/double(x[i-1]);
        } else {
            y[i] = .61803;
        }
    }

   a. Put x and y in a struct and make an array of structs.
   b. Combine the two loops, running from i=2 to i<1000, and do the first two iterations before the loop.
   c. Remove the else and set y[0] before the loop.

4. Research the MESI cache protocol. Draw a diagram with the 4 states and describe each. Tell how each state compares with the basic protocol covered in the lecture; that is, associate the Invalid state of each and note how they are the same. Do the same comparison for the other states. Also explain why the extra state was added, that is, what the benefit of the additional state is. (15 points)

   The web has many references to MESI. The added fourth state (Exclusive) helps reduce bus traffic: a block held in only one cache can be written without sending an invalidate, so invalidates are sent only when necessary. Most of the other states map directly onto the states of the basic protocol.

5. For a snooping cache implementation, complete the following matrix for the operations given. The spaces in the spreadsheet between operations are not necessarily an indication of the content that must be provided. (15 Points)

6. For the same operations in a directory cache scheme, complete the matrix below. Again, the space in the matrix is no indication of the space required. (20 points)

   Matrix for Directory Problem
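
Returning to Problem 3, the three answers (often called array merging, loop fusion, and peeling the first iterations so the branch disappears) might be combined as in the C++ sketch below. This is only an illustration under stated assumptions, not the graded solution: the names Entry and build_table and the length parameter n are hypothetical and do not appear in the original. The length is a parameter because the sums no longer fit in a 32-bit int once i reaches 46, just as in the original code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record type (answer a): x[i] and y[i] sit next to each
// other in memory, so one cache block fetch brings in both values.
struct Entry {
    int x;
    double y;
};

// Hypothetical helper, not from the original solution.
// Precondition (assumption): n <= 46 when int is 32 bits wide, because
// the value at index 46 would overflow, exactly as in the original code.
std::vector<Entry> build_table(std::size_t n) {
    std::vector<Entry> t(n);

    // Answers b and c: the first two iterations are done before the loop,
    // so the loop body needs no if/else.
    if (n > 0) {
        t[0].x = 1;
        t[0].y = 0.61803;
    }
    if (n > 1) {
        t[1].x = 1;
        t[1].y = static_cast<double>(t[1].x) / static_cast<double>(t[0].x);
    }

    // Answer b: a single fused loop computes x[i] and immediately reuses
    // it for y[i] while it is still in a register or in the cache.
    for (std::size_t i = 2; i < n; ++i) {
        t[i].x = t[i - 1].x + t[i - 2].x;
        t[i].y = static_cast<double>(t[i].x) / static_cast<double>(t[i - 1].x);
    }
    return t;
}
```

Calling build_table(10), for example, fills ten entries in one pass over memory instead of two. Whether the struct layout actually helps depends on how the arrays are used later, so it is best treated as one option rather than a guaranteed win.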