4.1 Performance and Cost/performance
Table 4.1 Key characteristics of six passenger aircraft: all figures are approximate; some relate to a specific model/configuration of the aircraft or are
averages of cited range of values.
Figure 4.1 Performance improvement as a function of cost.
4.2 Defining Computer Performance
Performance = 1 / Execution time
• Throughput: the amount of work performed per unit time. It can be
measured as the number of processes completed per unit time.
• Turnaround time: the average time from the moment a job
is submitted until the moment it is completed. It measures how
long the average user has to wait for output.
• Response time: in an interactive system, the time from when a
user presses Enter or clicks the mouse until the system delivers a
final response.
To filter out variable factors (e.g., scheduling, interrupts, I/O delay), use CPU execution time:
Performance = 1 / CPU execution time
Figure 4.2 Pipeline analogy shows that imbalance between processing power and I/O capabilities leads to a performance bottleneck.
CPU execution time
= Instructions × (Cycles per instruction ) × (seconds per cycle)
= Instructions × CPI / (Clock rate)
(CPI: cycles per instruction)
Performance comparison
(Performance of M1) / (Performance of M2) =
(Execution time of M2) / (Execution time of M1)
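The two formulas above can be tried out with a minimal Python sketch. The function names and the two machines' instruction counts, CPIs, and clock rates are hypothetical values chosen for illustration; they do not come from the slides.

```python
def cpu_execution_time(instructions, cpi, clock_rate_hz):
    """CPU time in seconds = instructions x CPI / clock rate."""
    return instructions * cpi / clock_rate_hz


def performance_ratio(time_m1, time_m2):
    """(Performance of M1) / (Performance of M2) = time of M2 / time of M1."""
    return time_m2 / time_m1


# Hypothetical machines running the same 1-billion-instruction program.
time_m1 = cpu_execution_time(1_000_000_000, cpi=2.0, clock_rate_hz=2e9)
time_m2 = cpu_execution_time(1_000_000_000, cpi=1.5, clock_rate_hz=1e9)
ratio = performance_ratio(time_m1, time_m2)
print(f"M1: {time_m1:.2f} s, M2: {time_m2:.2f} s, M1/M2 performance: {ratio:.2f}")
```

In this made-up case M1 has the higher CPI but still finishes first because its clock rate is twice as high, which is why the three factors have to be considered together.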
4.3 Performance Enhancement and Amdahl’s Law
Amdahl’s law
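s = 1 / (f + (1 - f) / p) ≤ min(p, 1/f)
f: fraction of the execution time that is unaffected by the enhancement
(e.g., instructions that cannot be parallelized).
p: speedup factor applied to the remaining 1 - f fraction (through a
parallel computer, a redesigned CPU, or a better algorithm).
s: overall speedup.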
Study Example 4.1
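As a rough illustration of Amdahl's law and its upper bound min(p, 1/f), the formula can be evaluated directly. This is a sketch only; the function name and the sample values f = 0.1 and p = 10 are assumptions, not taken from Example 4.1.

```python
def amdahl_speedup(f, p):
    """Overall speedup when a fraction f is unaffected and 1 - f runs p times as fast."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if p <= 0:
        raise ValueError("p must be positive")
    return 1.0 / (f + (1.0 - f) / p)


# Hypothetical case: 10% of the task is unaffected, the rest runs 10 times as fast.
f, p = 0.1, 10.0
s = amdahl_speedup(f, p)
bound = min(p, 1.0 / f)
print(f"speedup = {s:.2f}, upper bound = {bound:.2f}")
```

Even with a tenfold enhancement, the unaffected 10% keeps the overall speedup well under the bound.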
Figure 4.4 Amdahl’s law: speedup achieved if a fraction f of a task is unaffected
and the remaining 1 – f part runs p times as fast.
4.4 Performance Measurement vs Modeling
Figure 4.5 Running times of six programs on three machines.
Benchmarks: real or synthetic programs that are
selected for comparative evaluation of machine
performance.
SPEC: Standard Performance Evaluation Corporation
Table 4.2 Summary of SPEC CPU2000 benchmark suite characteristics.
Figure 4.6 Example graphical depiction of SPEC benchmark results.
Study Example 4.3
Performance Estimation
System’s peak performance: expressed in instructions per second (MIPS)
or floating-point operations per second (MFLOPS).
Average CPI = Σ (class-i fraction) × (class-i CPI), summed over all instruction classes
CPU execution time = Instructions × Average CPI / (Clock rate)
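The weighted sum for average CPI can be sketched in a few lines of Python. The class names, fractions, CPIs, and the 800 MHz clock below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def average_cpi(mix, cpi):
    """Weighted average CPI: sum over classes of (class fraction) x (class CPI)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction-class fractions must sum to 1")
    return sum(fraction * cpi[name] for name, fraction in mix.items())


def mips_rating(clock_rate_mhz, avg_cpi):
    """Millions of instructions per second = clock rate in MHz / average CPI."""
    return clock_rate_mhz / avg_cpi


# Hypothetical instruction mix and per-class CPIs.
mix = {"alu": 0.5, "load_store": 0.3, "branch": 0.2}
cpi = {"alu": 1.0, "load_store": 2.0, "branch": 3.0}
avg = average_cpi(mix, cpi)
rating = mips_rating(800.0, avg)
print(f"average CPI = {avg:.2f}, rating = {rating:.1f} MIPS")
```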
Table 4.3 Usage frequency, in percentage, for various instruction classes in four
representative applications.
Example 4.4 CPI and IPS calculation
Solution
a.
For M1, assume all instructions are class I instructions,
Peak performance of M1 = 1 / (Avg. CPI × Clock cycle time) = Clock rate / Avg. CPI
= 600 / 2.0 = 300 MIPS
Note: CPI is dimensionless (cycles per instruction) and the clock cycle time is in seconds, so with the clock rate in MHz the result is in MIPS.
For M2, assume all instructions are class N instructions,
Peak performance of M2 = 1 / (Avg. CPI × Clock cycle time) = Clock rate / Avg. CPI
= 500 / 2.0 = 250 MIPS
b. Average CPI for M1 = 5.0×0.25 + 2.0×0.25 + 2.4×0.5 = 2.95
Average CPI for M2 = 4.0×0.25 + 3.8×0.25 + 2.0×0.5 = 2.95
c. 1. Average CPI = 2.5×0.25 + 2.0×0.25 + 2.4×0.5 = 2.325
MIPS for option 1 = 600/2.325 = 258
2. Average CPI = 5.0×0.25 + 1.2×0.25 + 2.4×0.5 = 2.75
MIPS for option 2 = 600/2.75 = 218
3. MIPS for option 3 = 750/2.95 = 254
Conclusion: Option 1 has the greatest impact.
d. With the larger cache, the cache miss rate drops by 2 percentage points (from 5% to 3%).
Because each cache miss imposes a 10-cycle penalty, every CPI is reduced by 10 × 2% = 0.2 cycles.
Average CPI for M1 = (5.0-0.2)×0.25 + (2.0-0.2)×0.25 + (2.4-0.2)×0.5 = 2.75
This option is comparable to option 2 in part c.
e. Average CPI for M1 = 5.0×x + 2.0×y + 2.4×(1-x-y) = 2.6x - 0.4y + 2.4
Average CPI for M2 = 4.0×x + 3.8×y + 2.0×(1-x-y) = 2x + 1.8y + 2
M1 is faster when 600/(2.6x-0.4y+2.4) > 500/(2x+1.8y+2), which simplifies to 2.56y > 0.2x.
That is, M1 runs faster than M2 for the given task when x/y < 12.8.
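The break-even condition in part e can be checked numerically with a short sketch. It reuses the per-class CPIs and the 600 MHz and 500 MHz clock rates that appear in the solution; the label "F" for the first instruction class is an assumption (the solution names only classes I and N), and the test points for x and y are hypothetical.

```python
# Per-class CPIs as used in the solution; "F" is an assumed label for the first class.
CPI_M1 = {"F": 5.0, "I": 2.0, "N": 2.4}
CPI_M2 = {"F": 4.0, "I": 3.8, "N": 2.0}
CLOCK_MHZ_M1 = 600.0
CLOCK_MHZ_M2 = 500.0


def average_cpi(cpi, x, y):
    """Average CPI when fractions x, y, and 1 - x - y fall in classes F, I, and N."""
    return cpi["F"] * x + cpi["I"] * y + cpi["N"] * (1.0 - x - y)


def m1_is_faster(x, y):
    """Compare MIPS ratings (clock rate in MHz / average CPI) for a given mix."""
    mips_m1 = CLOCK_MHZ_M1 / average_cpi(CPI_M1, x, y)
    mips_m2 = CLOCK_MHZ_M2 / average_cpi(CPI_M2, x, y)
    return mips_m1 > mips_m2


# The mix of part b (25% / 25% / 50%) gives the same average CPI on both machines.
cpi_b_m1 = average_cpi(CPI_M1, 0.25, 0.25)
cpi_b_m2 = average_cpi(CPI_M2, 0.25, 0.25)

# Hypothetical mixes on either side of the x/y = 12.8 threshold.
below_threshold = m1_is_faster(0.50, 0.05)  # x/y = 10
above_threshold = m1_is_faster(0.60, 0.04)  # x/y = 15
print(cpi_b_m1, cpi_b_m2, below_threshold, above_threshold)
```

Probing a point on each side of the threshold is a quick sanity check on the algebra, not a proof of the inequality.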
Example 4.5 MIPS rating can be misleading
a.
Runtime for the output of compiler 1 = (600M×1 + 400M×2)/10^9 = 1.4 s
Runtime for the output of compiler 2 = (400M×1 + 400M×2)/10^9 = 1.2 s
Compiler 2 produces the faster code.
b. Code produced by compiler 2 is 1.4/1.2 = 1.17 times as fast as that of
compiler 1.
c.
Average CPI for compiler 1 = (600M×1 + 400M×2)/1000M = 1.4
Average CPI for compiler 2 = (400M×1 + 400M×2)/800M = 1.5
MIPS rating of compiler 1 = 1000/1.4 = 714
MIPS rating of compiler 2 = 1000/1.5 = 667
By MIPS rating, compiler 1 appears faster, even though its code takes longer to run (part a).
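The mismatch between runtime and MIPS rating can be reproduced with a small sketch using the instruction counts and CPIs from the solution. The 1 GHz clock is inferred from the division by 10^9, and the function and variable names are illustrative only.

```python
CLOCK_HZ = 1e9  # inferred from the division by 10^9 in the solution

# (instruction count, CPI) pairs for the code produced by each compiler.
compiler_1 = [(600e6, 1), (400e6, 2)]
compiler_2 = [(400e6, 1), (400e6, 2)]


def runtime_seconds(program):
    """Total cycles divided by the clock rate."""
    cycles = sum(count * cpi for count, cpi in program)
    return cycles / CLOCK_HZ


def mips_rating(program):
    """Millions of instructions executed per second of runtime."""
    instructions = sum(count for count, _ in program)
    return instructions / runtime_seconds(program) / 1e6


t1, t2 = runtime_seconds(compiler_1), runtime_seconds(compiler_2)
r1, r2 = mips_rating(compiler_1), mips_rating(compiler_2)
print(f"compiler 1: {t1:.1f} s at {r1:.0f} MIPS")
print(f"compiler 2: {t2:.1f} s at {r2:.0f} MIPS")
```

Compiler 1's output executes more instructions per second but also has more instructions to execute, so the higher rating does not translate into a shorter runtime.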
4.5 Reporting Computer Performance
Table 4.4 Measured or estimated execution times for three programs.
Wrong method (arithmetic mean):
Speedup of Y over X = (0.1 + 10.0 + 10.0)/3 = 6.7   (1)
Speedup of X over Y = (10.0 + 0.1 + 0.1)/3 = 3.4   (contradicts (1))
Total time comparison: correct only if the programs are run the same number of times.
Geometric mean:
Speedup of Y over X = (0.1 × 10.0 × 10.0)^(1/3) = 2.15   (2)
Speedup of X over Y = (10.0 × 0.1 × 0.1)^(1/3) = 0.46   (consistent with (2), since 1/2.15 ≈ 0.46)
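A short sketch shows why the geometric mean gives consistent answers where the arithmetic mean does not. It uses the three per-program speedup ratios of Y over X quoted above (0.1, 10.0, 10.0) and their reciprocals; the function names are illustrative, and `math.prod` requires Python 3.8 or later.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product of n values."""
    return math.prod(values) ** (1.0 / len(values))


y_over_x = [0.1, 10.0, 10.0]
x_over_y = [1.0 / r for r in y_over_x]

am_yx, am_xy = arithmetic_mean(y_over_x), arithmetic_mean(x_over_y)
gm_yx, gm_xy = geometric_mean(y_over_x), geometric_mean(x_over_y)

# Reciprocal speedups should multiply to 1; only the geometric means do.
print(f"arithmetic: {am_yx:.2f} x {am_xy:.2f} = {am_yx * am_xy:.2f}")
print(f"geometric:  {gm_yx:.2f} x {gm_xy:.2f} = {gm_yx * gm_xy:.2f}")
```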
Example 4.6
Table 4.3 Usage frequency, in percentage, for various instruction classes in four
representative applications.
Answer:
a. CPI for data compression application on
M1=0.25×4.0+0.32×1.5+0.16×1.2+0×6.0+0.19×2.5+0.08×2.0
=2.31
CPI for data compression application on M2=2.54
CPI for nuclear reactor simulation application on M1=3.94
CPI for nuclear reactor simulation application on M2=2.89
b. Because the programs and clock rates are the same, the speedup
is given by the ratio of CPIs.
Data compression speedup (M2 over M1) = 2.31/2.54 = 0.91
Nuclear reactor simulation speedup (M2 over M1) = 3.94/2.89 = 1.36
c. The overall performance advantage of M2 over M1 (geometric mean) is
(0.91 × 1.36)^(1/2) = 1.11
4.6 The Quest for High Performance
Figure 4.7 Exponential growth of supercomputer performance [Bell92].
Figure 4.8 Milestones in the Accelerated Strategic Computing Initiative (ASCI) program,
sponsored by the U.S. Department of Energy, with extrapolation up to the PFLOPS level.
Problem 4.5
Problem 4.12