HW1

Erik Lee
EN 525.712 Advanced Computer Architecture
Charles B. Cameron
Homework Set 1 (P1.6, P1.12, P1.15)
2/7/12
Problem 1.6 A program's run time is determined by the product of instructions per program,
cycles per instruction, and clock frequency. Assume the following instruction mix for a MIPS-like
RISC instruction set: 15% stores, 25% loads, 15% branches, and 35% integer arithmetic, 5%
integer shift, and 5% integer multiply. Given that load instructions require two cycles, branches
require four cycles, integer ALU instructions require one cycle, and integer multiplies require
ten cycles, compute the overall CPI.
Stores = 15%, 1 cycle
Loads = 25%, 2 cycles
Branches = 15%, 4 cycles
Integer Arithmetic = 35%, 1 cycle
Integer Shift = 5%, 1 cycle
Integer Multiply = 5%, 10 cycles
To calculate the overall cycles per instruction (CPI), we weight the cycle count of each
instruction type by the fraction of the instruction mix it represents and sum the results.
Stores make up 15% of the mix at one cycle each (the problem statement gives no store latency,
so one cycle is assumed), loads make up 25% at two cycles each, and so on until 100% of the
mix is accounted for.
$$\mathrm{CPI} = 1 \times 0.15 + 2 \times 0.25 + 4 \times 0.15 + 1 \times 0.35 + 1 \times 0.05 + 10 \times 0.05$$
$$\mathrm{CPI} = 0.15 + 0.50 + 0.60 + 0.35 + 0.05 + 0.50$$
$$\mathrm{CPI} = 2.15$$
Overall CPI = 2.15
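The weighted sum can be checked with a short script. This is a minimal sketch: the dictionary `MIX` and the function `overall_cpi` are illustrative names, and the one-cycle store latency is the same assumption made in the solution, not a value given by the problem.

```python
# Instruction mix for Problem 1.6: name -> (fraction of instructions, cycles each).
# The store latency of 1 cycle is an assumption; the problem does not state it.
MIX = {
    "store": (0.15, 1),
    "load": (0.25, 2),
    "branch": (0.15, 4),
    "integer arithmetic": (0.35, 1),
    "integer shift": (0.05, 1),
    "integer multiply": (0.05, 10),
}


def overall_cpi(mix):
    """Return the average CPI: sum of fraction * cycles over all instruction types."""
    total_fraction = sum(fraction for fraction, _ in mix.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction mix fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix.values())


cpi = overall_cpi(MIX)
for name, (fraction, cycles) in MIX.items():
    # Show each type's contribution so an arithmetic slip is easy to spot.
    print(f"{name:20s} {fraction * cycles:.2f}")
print(f"Overall CPI = {cpi:.2f}")
```

Printing the per-type contributions alongside the total makes it easy to compare each term against the hand calculation.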
Problem 1.12 Using Amdahl’s law, compute speedups for a program that is 98% vectorizable
for a system with 16, 64, 256, and 1024 processors. What would be a reasonable number of
processors to build into a system for running such an application?
Amdahl’s law for calculating relative speedup is below.
$$S = \frac{1}{T} = \frac{1}{(1 - f) + \frac{f}{N}}$$
In the speedup equation, f is the fraction of the program that can run in parallel (here 0.98),
N is the number of processors, and T is the run time normalized to the single-processor run
time. The resulting speedup is relative to a system with one processor.
$$S(f = 0.98, N = 16) = \frac{1}{(1 - 0.98) + \frac{0.98}{16}} \approx 12.3$$
$$S(f = 0.98, N = 64) = \frac{1}{(1 - 0.98) + \frac{0.98}{64}} \approx 28.3$$
$$S(f = 0.98, N = 256) = \frac{1}{(1 - 0.98) + \frac{0.98}{256}} \approx 41.97$$
$$S(f = 0.98, N = 1024) = \frac{1}{(1 - 0.98) + \frac{0.98}{1024}} \approx 47.7$$
N = 16, Speedup = 12.3
N = 64, Speedup = 28.3
N = 256, Speedup = 41.97
N = 1024, Speedup = 47.7
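The four speedups can be reproduced with a few lines of Python. This is a minimal sketch; `amdahl_speedup` is an illustrative name, and the final line adds the upper bound 1 / (1 - f) that the speedup approaches as N grows, which follows directly from the formula.

```python
def amdahl_speedup(f, n):
    """Amdahl's law: speedup over one processor when a fraction f runs on n processors."""
    return 1.0 / ((1.0 - f) + f / n)


F = 0.98  # parallel (vectorizable) fraction from Problem 1.12

for n in (16, 64, 256, 1024):
    print(f"N = {n:4d}, speedup = {amdahl_speedup(F, n):.2f}")

# As n grows, f / n vanishes and the speedup approaches 1 / (1 - f).
limit = 1.0 / (1.0 - F)
print(f"Upper bound as N grows: {limit:.1f}")
```

With f = 0.98 the bound is 50, which is why going from 256 to 1024 processors buys so little.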
Plotting the speedup against the number of processors shows how the gain flattens as
processors are added. The knee of the curve marks the transition to diminishing returns. As a
working criterion, we take the knee to be the point where the slope dS/dN falls to 0.5 (about
26.6 degrees on equally scaled axes; a 45-degree tangent would correspond to a slope of 1). We
differentiate the speedup equation with respect to N, set the derivative equal to 0.5, and
solve for N.
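$$S = \frac{1}{(1 - 0.98) + \frac{0.98}{N}} = \frac{50N}{N + 49}$$
$$\frac{dS}{dN} = \frac{2450}{(N + 49)^2}$$
$$0.5 = \frac{2450}{(N + 49)^2}$$
$$(N + 49)^2 = 4900, \quad N + 49 = 70$$
$$N = 21$$
A reasonable number of processors for this application is 21.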
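The knee calculation can be generalized and checked numerically. This is a minimal sketch: the function names are illustrative, and the slope threshold of 0.5 is the solution's own working criterion rather than a standard definition of the knee. For a general f, the derivative is dS/dN = f / ((1 - f)N + f)^2, which reduces to 2450 / (N + 49)^2 when f = 0.98.

```python
import math


def amdahl_speedup(f, n):
    """Amdahl's law: speedup over one processor for parallel fraction f on n processors."""
    return 1.0 / ((1.0 - f) + f / n)


def speedup_slope(f, n):
    """Analytic derivative dS/dN = f / ((1 - f) * n + f)^2."""
    return f / ((1.0 - f) * n + f) ** 2


def knee_processors(f, target_slope):
    """Solve speedup_slope(f, n) == target_slope for n."""
    return (math.sqrt(f / target_slope) - f) / (1.0 - f)


F = 0.98
TARGET_SLOPE = 0.5  # threshold chosen in the solution; an assumption, not a standard

n_knee = knee_processors(F, TARGET_SLOPE)
print(f"Knee at N = {n_knee:.1f}")
print(f"Slope there = {speedup_slope(F, n_knee):.3f}")
print(f"Speedup there = {amdahl_speedup(F, n_knee):.1f}")
```

Changing `TARGET_SLOPE` shows how sensitive the recommended processor count is to the choice of threshold.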
Problem 1.15 In 1995, the IBM AS/400 line of computers transitioned from a CISC instruction
set to a RISC instruction set. Because of the simpler instruction set, the realizable clock
frequency for a given technology generation and the CPI metric improved dramatically.
However, for the same reason, the number of instructions per program also increased
noticeably. Given the following parameters, compute the total performance improvement that
occurred with this transition. Furthermore, compute the break-even clock frequency, break-even cycles per instruction, and break-even code expansion ratios for this transition, assuming
the other two factors are held constant.
Performance factor                                  AS/400 CISC       AS/400 RISC          Actual   Break-Even
                                                    (IMPI) (Actual)   (PowerPC) (Actual)   Ratio    Ratio
Relative Frequency                                  50 MHz            125 MHz              2.5      ?
Cycles per Instruction                              7                 3                    0.43     ?
Relative Instructions per program (dynamic count)   1000              3300                 3.3      ?
To compute the performance improvement, we calculate the performance (the reciprocal of the
time to execute the program) before and after the transition.
$$\text{Performance} = \frac{1}{\text{instruction count}} \times \frac{\text{instructions}}{\text{cycle}} \times \text{frequency}$$
$$\text{Performance}_{old} = \frac{1}{1000} \times \frac{1}{7} \times 50\text{M} \approx 7143$$
$$\text{Performance}_{new} = \frac{1}{3300} \times \frac{1}{3} \times 125\text{M} \approx 12626$$
$$\text{Performance Improvement} = \frac{\text{new}}{\text{old}} \approx 1.77 \text{ times faster}$$
Break-even clock frequency
$$\frac{1}{3300} \times \frac{1}{3} \times f = \text{Performance}_{old} \approx 7143$$
$$f \approx 70.71 \text{ MHz}$$
Break-even cycles per instruction
$$\frac{1}{3300} \times \frac{1}{c} \times 125\text{M} = \text{Performance}_{old} \approx 7143$$
$$c \approx 5.3 \text{ cycles per instruction}$$
Break-even code expansion ratio
$$\frac{1}{i} \times \frac{1}{3} \times 125\text{M} = \text{Performance}_{old} \approx 7143$$
$$i \approx 5833.3 \text{ instructions}$$
Performance Improvement = 1.77 times faster
Break-even clock frequency = 70.71 MHz (ratio of about 1.41 relative to the 50 MHz CISC clock)
Break-even cycles per instruction = 5.3 cycles per instruction (ratio of about 0.76 relative to the CISC CPI of 7)
Break-even code expansion ratio = 5.833
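The performance ratio and the three break-even values can be recomputed without intermediate rounding. This is a minimal sketch: the variable and function names are illustrative, and performance is taken as frequency / (CPI * instruction count), matching the formula used in this solution. Each break-even value is found by holding the other two RISC factors at their actual values and solving for the one factor that makes RISC performance equal CISC performance.

```python
# Parameters from the Problem 1.15 table.
CISC = {"freq_hz": 50e6, "cpi": 7, "instructions": 1000}
RISC = {"freq_hz": 125e6, "cpi": 3, "instructions": 3300}


def performance(freq_hz, cpi, instructions):
    """Relative performance: program executions per second."""
    return freq_hz / (cpi * instructions)


perf_cisc = performance(**CISC)
perf_risc = performance(**RISC)
improvement = perf_risc / perf_cisc

# Solve performance(...) == perf_cisc for one RISC factor at a time.
be_freq_hz = perf_cisc * RISC["cpi"] * RISC["instructions"]
be_cpi = RISC["freq_hz"] / (RISC["instructions"] * perf_cisc)
be_instructions = RISC["freq_hz"] / (RISC["cpi"] * perf_cisc)

print(f"Improvement: {improvement:.3f}x")
print(f"Break-even frequency: {be_freq_hz / 1e6:.3f} MHz "
      f"(ratio {be_freq_hz / CISC['freq_hz']:.3f})")
print(f"Break-even CPI: {be_cpi:.3f} (ratio {be_cpi / CISC['cpi']:.3f})")
print(f"Break-even instructions: {be_instructions:.1f} "
      f"(ratio {be_instructions / CISC['instructions']:.3f})")
```

Working from the unrounded CISC performance avoids the small drift that appears when the rounded value 7143 is carried through by hand.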