Computer Architecture HW#1 Solution Exercises 1.8 a. 請參考課本第17頁: Integrated circuit logic technology—Transistor density increases by about 35% per year, quadrupling somewhat over four years. Increases in die size are less predictable and slower, ranging from 10% to 20% per year. The combined effect is a growth rate in transistor count on a chip of about 40% to 55% per year, or doubling every 18 to 24 months. This trend is popularly known as Moore’s law. 由此可知 transistor density 和 transistor count on a chip的定義是不同的, 題目問的是 “transistor count on a chip” 𝟏𝟐𝟎 𝟏𝟐𝟎 答案可以是 𝟐 𝟏𝟖 𝒐𝒓 𝒙 𝟐𝟒 = approximately 101 or 32 或是 (1.4)10 or (1.55)10 = approximately 29 or 80 b. 3200 × (1.4)12 = approximately 181,420 這只是概略性的預測, 用其它年的資料去推算, 都算答對 c. 3200 × (1.01)12 = approximately 3605 同上題, 用其它年的資料去推算, 都算答對 d. Power density, which is the power consumed over the increasingly small area, has created too much heat for heat sinks to dissipate. This has limited the activity of the transistors on the chip. Instead of increasing the clock rate, manufacturers are placing multiple cores on the chip. 有寫出具體原因就算答對 e. Anything in the 15–25% range would be a reasonable conclusion based on the decline in the rate over history. As the sudden stop in clock rate shows, though, even the declines do not always follow predictions. 這只是概略性的預測, 只要合理就算答對. 1.12 a. 題目未說明computer壞了是否有人去維修, 也沒給MTTR, 這會影響到答案, 本題送分 b. There are several correct answers. One would be that, with the current system, one computer fails approximately every 5 minutes. 5 minutes is unlikely to be enough time to isolate the computer, swap it out, and get the computer back on line again. 10 minutes, however, is much more likely. In any case, it would greatly extend the amount of time before 1/3 of the computers have failed at once. Because the cost of downtime is so huge, being able to extend this is very valuable. 有寫出具體原因就算答對 c. $90,000 = (x + x + x + 2x)/4 $360,000 = 5x $72,000 = x 4th quarter = $144,000/hr 有題目有問4th quarter和rest of the year的average cost of downtime per hour, 兩個答案都要寫出來. 1.14 此大題題意有問題, percentage of vectorization 係指the percentage of time that could be spent using vector mode, 而非指percentage of the computation performed in vector mode, 但 是這樣一來, (b)小題和(c)小題問的答案變成都是在問percentage of the computation run time is spent in vector mode, 有點不合常理. 從教科書所附的解答來看: percentage of vectorization的意思是指percentage of the computation performed in vector mode. 用這兩種題意所算出來的答案都算對 a. See Figure S.1. b. 2 = 1/((1 – x) + x /10) 5/9 = x = 0.56 or 56% c. 0.056/0.5 = 0.11 or 11% d. Maximum speedup = 1/(1/10) = 10 5 = 1/((1 – x) + x /10) 8/9 = x = 0.89 or 89% e. 題目沒說清楚” equal an addition 2×speedup in the vector unit (beyond the initial 10×)” 究竟 是指12倍還是20倍, 解答的答案也不正確, percentage of vectorization的定義也不清楚, 答案有太多種版本, 本題送分. 1.16 a. 1/(0.8 + 0.20/2) = 1.11 b. 1/(0.7 + 0.20/2 + 0.10 × 3/2) = 1.05 c. fp ops: 0.1/0.95 = 10.5%, cache: 0.15/0.95 = 15.8% 1.18 a. 1/(.2 + .8/N) b. 1/(.2 + 8 × 0.005 + 0.8/8) = 2.94 c. 1/(.2 + 3 × 0.005 + 0.8/8) = 3.17 d. 1/(.2 + logN × 0.005 + 0.8/N) e. d/dN(1/((1 – P) + logN × 0.005 + P/N)) = 0 有列出方程式即可, 不必求解, 之前扣分是助教改錯, 已經把分數加回來了. Case Study 2: Power Consumption in Computer Systems 1.4 a. Processor DRAM Hard drive Intel Pentium 4 (240-pin Kinston) * 2 7200rpm HD 7.9W 2.3W * 2 7.9W → (66 + 4.6 + 7.9) / 0.8(supply efficiency) = 99 W b. If full-loaded (always seeking for data), a 7200rpm HD will consume 7.9W. → 7.9 * 40% + 4.0 * 60% = 5.56W c. Solve the following four equations: seek7200 = .75 ×seek5400 seek7200 + idle7200 = 100 seek5400 + idle5400 = 100 seek7200 ×7.9 + idle7200 ×4 = seek5400 ×7 + idle5400 ×2.9 → idle7200 = 29.8% of the time. 1.5 a. Power consumption of the server = 66W + 2.3W + 7.9W = 76.2W, and a cooling door for a rack can dissipate 14KW → 14 KW 66 W + 2.3 W + 7.9 W = 𝟏𝟖𝟑 servers b. Power consumption of the server = 66W + 2.3W + 2×7.9W = 84.1W, and a cooling door for a rack can dissipate 14KW → 14 KW 66 W + 2.3 W +2× 7.9 W = 𝟏𝟔𝟔 systems c. 200 W ×11 = 2200 W 2200 / (76.2) = 28 racks Only 1 cooling door is required. 1.6 The IBM x346 could take less space, which would save money in real estate. The racks might be better laid out. It could also be much cheaper. In addition, if we were running applications that did not match the characteristics of these benchmarks, the IBM x346 might be faster. Finally, there are no reliability numbers shown. Although we do not know that the IBM x346 is better in any of these areas, we do not know it is worse, either. 有寫出具體原因就算答對 1.7 a. The application is 80% parallelizable. → 0.2 + 0.8/2 = 0.2 + 0.4 = 0.6 → Frequency can be 60%. b. core內部都有電晶體的負載電容量, 所以dual core就有兩倍的電容性負載 Powerdynamic = 1/2 * Capacitive load * Voltage2 * Frequency switched (C) (V) (F) 2 Powerdual-core / Powersingle-core = [ (2*C) * (0.6*V) * (0.6*F) ] / [C * V2 * F] = 0.432 (43.2%) c. There should be 1/0.75speedup. → 1/0.75 = 1/[(1-x)+x/2] → x = 0.5 → 50% of application program should be parallelizable. d. core內部都有電晶體的負載電容量, 所以dual core就有兩倍的電容性負載 Powerdual-core / Powersingle-core= [ (2*C) * (0.75*V)2 * (0.7*F) ] / [C * V2 * F] = 0.675 (67.5%)