Benchmarking

Power calculation for transistor operation
$P = \text{CapacitiveLoad} \times \text{Voltage}^2 \times \text{ClockFrequency}$
or
$P = C V^2 f_{clock}$
• What will cause power consumption to increase?
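As a quick sketch of how the formula responds to each factor, the snippet below computes power relative to a baseline when C, V, and f are each scaled by some ratio. The function name `relative_power` and the scaling ratios are hypothetical, chosen only for illustration.

```python
def relative_power(c_scale: float, v_scale: float, f_scale: float) -> float:
    """Ratio P_new / P_old for P = C * V**2 * f, given a scale ratio for each factor."""
    return c_scale * v_scale ** 2 * f_scale


# Doubling the clock frequency doubles power (linear in f).
print(relative_power(1.0, 1.0, 2.0))  # 2.0
# Raising voltage by 10% raises power by about 21% (quadratic in V).
print(round(relative_power(1.0, 1.1, 1.0), 2))  # 1.21
# Hypothetical: cut capacitance, voltage, and frequency by 15% each.
print(round(relative_power(0.85, 0.85, 0.85), 3))  # 0.522
```

Because V is squared, a given percentage change in voltage moves power more than the same change in C or f.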
CS2710 Computer Organization
Measuring the current drawn by the ATmega32 microprocessor shows a linear relationship with clock frequency
[Figure: ATMEGA 32 Current versus Crystal Frequency. x-axis: Crystal Frequency (Hz), 0 to 20,000,000; y-axis: Microprocessor Current (mA), 0 to 50. Linear trendline: y = 1E-06x + 21.406, R² = 0.9921]
$P = C V^2 f_{clock}$
Also: $P = I V$
Thus: $I = C V f_{clock}$
Note: V = 5 V in this case.
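The trendline can be read through $I = C V f_{clock}$: its slope (1E-06 mA per Hz) plays the role of $C V$. The sketch below, assuming the chart's fitted coefficients and V = 5 V, predicts current at a given clock and backs out an effective switched capacitance. Treating the slope as exactly $C V$ is a simplifying assumption; the 21.4 mA intercept is current that does not scale with frequency in this fit. The helper names are hypothetical.

```python
SLOPE_MA_PER_HZ = 1e-6   # fitted slope from the chart (mA per Hz)
INTERCEPT_MA = 21.406    # fitted intercept from the chart (mA)
SUPPLY_VOLTAGE = 5.0     # V, as noted on the slide


def predicted_current_ma(freq_hz: float) -> float:
    """Current predicted by the linear trendline y = 1E-06x + 21.406."""
    return SLOPE_MA_PER_HZ * freq_hz + INTERCEPT_MA


def effective_capacitance_farads(slope_ma_per_hz: float, volts: float) -> float:
    """Assume slope = C * V (from I = C V f), so C = slope / V."""
    slope_amp_seconds = slope_ma_per_hz / 1000.0  # mA/Hz -> A*s
    return slope_amp_seconds / volts


print(round(predicted_current_ma(16e6), 3))  # 37.406 (mA at 16 MHz)
print(effective_capacitance_farads(SLOPE_MA_PER_HZ, SUPPLY_VOLTAGE))  # about 2e-10 F (200 pF)
```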
What effect does increasing voltage to a microprocessor have on power? On speed?
[Figure: Power versus Microprocessor Voltage. x-axis: Microprocessor Voltage (V), 0 to 6; y-axis: Microprocessor Power (mW), 0 to 250. Power-law trendline: y = 3.428x^2.5874, R² = 0.9984]
Below around 2.5 V (for this microprocessor), the transistors simply stop working.
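If the clock frequency was held fixed during this measurement (an assumption; the chart does not say), the formula alone predicts $P \propto V^2$, while the fitted exponent is about 2.59. The sketch below evaluates the chart's trendline and compares the 2.5 V to 5 V power ratio against pure $V^2$ scaling. The function names are hypothetical.

```python
FIT_COEFF = 3.428       # mW, from the chart's trendline
FIT_EXPONENT = 2.5874   # from the chart's trendline


def fitted_power_mw(volts: float) -> float:
    """Power predicted by the trendline y = 3.428 * x**2.5874 (mW)."""
    return FIT_COEFF * volts ** FIT_EXPONENT


def power_ratio_v_squared(v_new: float, v_old: float) -> float:
    """Ratio predicted by P ~ V**2 alone (C and f held constant)."""
    return (v_new / v_old) ** 2


for v in (2.5, 3.3, 5.0):
    print(f"{v} V -> {fitted_power_mw(v):.1f} mW")

# Going from 2.5 V to 5 V:
print(round(fitted_power_mw(5.0) / fitted_power_mw(2.5), 2))  # fit: about 6.01x
print(power_ratio_v_squared(5.0, 2.5))                         # V^2 model: 4.0
```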
The Power Wall: Why haven’t clock rates continued to increase at historical rates?
Manufacturers have turned to multi-core architectures to bypass the Power Wall
Clock speeds decrease, but overall performance increases
Lecture Objectives:
1) Explain the SPEC benchmarks.
2) Define Amdahl's law
3) Define MIPS
Amdahl’s Law (p51)
• The performance enhancement possible with a given improvement is limited by the amount that the improved feature is used.
$\text{ExecutionTimeImproved} = \frac{\text{ExecutionTimeAffectedByImprovement}}{\text{AmountOfImprovement}} + \text{ExecutionTimeUnaffected}$
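The formula can be sketched directly in code. The 80 s / 20 s split below is a hypothetical program, used only to show that the unaffected time caps the overall speedup no matter how large the improvement is.

```python
def improved_execution_time(affected: float, improvement: float, unaffected: float) -> float:
    """Amdahl's Law: the affected time shrinks by `improvement`; the rest is unchanged."""
    return affected / improvement + unaffected


def overall_speedup(affected: float, improvement: float, unaffected: float) -> float:
    """Old execution time divided by improved execution time."""
    old_time = affected + unaffected
    return old_time / improved_execution_time(affected, improvement, unaffected)


# Hypothetical program: 80 s in the part being improved, 20 s elsewhere.
print(overall_speedup(80, 4, 20))               # 2.5
# Even an enormous improvement cannot beat 100 s / 20 s = 5x.
print(round(overall_speedup(80, 1e9, 20), 3))   # 5.0
```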
Amdahl’s Law Applied
• A program spends 40 seconds performing network transfers and 60 seconds generating reports.
– Suppose we could rewrite the report generator to make it more efficient.
– What improvement in performance in the report generator would be necessary to increase the overall speed of the program by a factor of 2?
– How about by a factor of 3?
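Rearranging Amdahl's Law gives the improvement needed for a target overall speedup: the improved report time must equal (100 s / speedup) − 40 s. The sketch below uses the slide's 40 s and 60 s figures; the function name is hypothetical. A 2x overall speedup requires the report generator to run 6 times faster. A 3x speedup is impossible, because the 40 s of network transfers alone already exceeds the 33.3 s target.

```python
from typing import Optional


def required_improvement(affected: float, unaffected: float,
                         target_speedup: float) -> Optional[float]:
    """Factor by which `affected` time must shrink to reach `target_speedup` overall.

    Returns None when the unaffected time alone meets or exceeds the target time,
    i.e. no improvement of the affected part can reach the goal.
    """
    target_time = (affected + unaffected) / target_speedup
    remaining = target_time - unaffected  # time the improved part may still take
    if remaining <= 0:
        return None
    return affected / remaining


REPORTS_S = 60.0   # time affected by the improvement (report generation)
NETWORK_S = 40.0   # time unaffected (network transfers)

print(required_improvement(REPORTS_S, NETWORK_S, 2))  # 6.0
print(required_improvement(REPORTS_S, NETWORK_S, 3))  # None
```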
A Performance Metric: MIPS
$\text{MIPS} = \frac{\text{InstructionCount}}{\text{ExecutionTime} \times 10^6}$
Units: millions of instructions per second
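Since ExecutionTime = InstructionCount × CPI / ClockRate, MIPS can equivalently be written as ClockRate / (CPI × 10^6). The sketch below computes it both ways using the perl row of the CINT2006 table for the Opteron X4 2356 (2,118 × 10^9 instructions, CPI 0.75, 0.40 ns cycle, 637 s). The two results differ slightly because the table's values are rounded.

```python
def mips_from_time(instruction_count: float, exec_time_s: float) -> float:
    """MIPS = InstructionCount / (ExecutionTime * 10**6)."""
    return instruction_count / (exec_time_s * 1e6)


def mips_from_cpi(clock_rate_hz: float, cpi: float) -> float:
    """Equivalent form: MIPS = ClockRate / (CPI * 10**6)."""
    return clock_rate_hz / (cpi * 1e6)


# perl row from the CINT2006 table (Opteron X4 2356)
ic = 2118e9
clock_rate = 1 / 0.40e-9   # 0.40 ns cycle time -> 2.5 GHz

print(round(mips_from_time(ic, 637)))          # about 3325 (from measured time)
print(round(mips_from_cpi(clock_rate, 0.75)))  # about 3333 (from CPI)
```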
Issues with MIPS metrics
1. Measures instruction execution rate, but doesn’t consider the complexity of the instructions performed
2. Average instruction complexity varies between programs executing on a single computer
3. Different microprocessors implement instructions of differing complexities
• MIPS may vary independently from performance
• We cannot compare computers with different instruction sets using MIPS!
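Issue 2 shows up directly in the CINT2006 data for the Opteron X4 2356: with the same 0.40 ns cycle time, the per-program CPI ranges from 0.75 (perl) to 10.00 (mcf). The machine's MIPS rating therefore depends heavily on which program is measured. A minimal sketch using a few of the table's CPI values:

```python
CYCLE_TIME_NS = 0.40  # Opteron X4 2356 cycle time from the CINT2006 table

# CPI per program, taken from the CINT2006 table
cpi_by_program = {"perl": 0.75, "bzip2": 0.85, "mcf": 10.00, "xalancbmk": 2.70}

clock_rate_hz = 1 / (CYCLE_TIME_NS * 1e-9)
mips = {name: clock_rate_hz / (cpi * 1e6) for name, cpi in cpi_by_program.items()}

for name, value in sorted(mips.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {value:8.0f} MIPS")

# Same hardware, same clock: MIPS differs by more than 13x between programs.
print(round(max(mips.values()) / min(mips.values()), 1))  # 13.3
```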
Benchmarking: How do you decide which computer to buy?
SPEC Benchmark
• A set of programs used to measure performance
– Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
– Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
– Elapsed time to execute a selection of programs
• Negligible I/O, so focuses on CPU performance
– Normalize relative to reference machine
– Summarize as geometric mean of performance ratios
• CINT2006 (integer) and CFP2006 (floating-point)
$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$
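The summary step can be sketched as follows. Each benchmark's ratio is reference time divided by measured time, and the ratios are combined with a geometric mean. Here the geometric mean is computed as exp(mean(log x)), which is one way to avoid overflow when multiplying many ratios. The function names and the two benchmark timings are hypothetical.

```python
import math
from typing import Sequence


def spec_ratio(reference_time_s: float, exec_time_s: float) -> float:
    """Normalize to the reference machine: a higher ratio means faster than the reference."""
    return reference_time_s / exec_time_s


def geometric_mean(values: Sequence[float]) -> float:
    """n-th root of the product, computed via logs to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Two hypothetical benchmarks: 2x and 8x faster than the reference machine.
ratios = [spec_ratio(1000, 500), spec_ratio(800, 100)]
print(geometric_mean(ratios))  # about 4.0
```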
Geometric vs. Arithmetic Mean
• Arithmetic mean: $\frac{1}{n}\sum_{i=1}^{n} x_i$
• Geometric mean: $\sqrt[n]{\prod_{i=1}^{n} x_i}$
Which computer has better overall performance?
                 Computer A   Computer B   Computer C
Program 1                 1           10           20
Program 2              1000          100           20
Which computer has better overall performance?
                 Computer A   Computer B   Computer C
Program 1                 1           10           20
Program 2              1000          100           20
Arithmetic mean       500.5           55           20
Geometric mean        31.62        31.62           20
Treating the values as performance (higher is better), A is fastest via the arithmetic mean.
A and B are tied via the geometric mean.
The geometric mean is the appropriate mean when the values being compared span widely different ranges.
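Both means can be reproduced for the table in a few lines. This is a sketch that treats the values as performance (higher is better), with [Program 1, Program 2] per computer.

```python
import math
from statistics import fmean


def geometric_mean(values):
    """n-th root of the product of the values."""
    return math.prod(values) ** (1 / len(values))


# Performance values from the table (higher is better): [Program 1, Program 2]
computers = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}

for name, perf in computers.items():
    print(f"{name}: arithmetic {fmean(perf):7.2f}  geometric {geometric_mean(perf):6.2f}")
```

This prints 500.50 / 31.62 for A, 55.00 / 31.62 for B, and 20.00 / 20.00 for C, matching the table. The single very large value for A (1000) dominates the arithmetic mean but not the geometric mean.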
Benchmarking often computes performance relative to a standard reference
                 Computer A   Computer B   Computer C
Program 1                 1           10           20
Program 2              1000          100           20
Let’s say A is the “reference” computer. We adjust all performance values by dividing each value by the reference computer’s value for the same program. In this example, we divide all results for Program 2 by the reference computer’s performance value of 1000, giving:
                 Computer A   Computer B   Computer C
                (reference)
Program 1                 1           10           20
Program 2                 1          0.1         0.02
Scaling the results in this manner is called normalization. Note that no normalization was needed for Program 1 since the reference computer’s value was already 1.
Arithmetic and Geometric means based on the normalized values:
                 Computer A   Computer B   Computer C
Program 1                 1           10           20
Program 2                 1          0.1         0.02
Arithmetic mean           1         5.05        10.01
Geometric mean            1            1        0.632
Now C is fastest via Arithmetic mean!
A and B are still tied via Geometric mean.
Now consider computer B to be the “reference” computer and normalize A and C w.r.t. B
                 Computer A   Computer B   Computer C
                             (reference)
Program 1               0.1            1            2
Program 2                10            1          0.2
Arithmetic mean        5.05            1          1.1
Geometric mean            1            1        0.632
Now A is fastest via Arithmetic mean!
A and B are still tied via Geometric mean.
The geometric mean ranks the computers the same way regardless of which computer is used for normalization!
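The effect can be checked by normalizing to each reference in turn. In this sketch (function names hypothetical), the arithmetic-mean winner changes with the reference machine. Every geometric mean is divided by the same factor, the geometric mean of the reference computer's values, so the ratio between any two computers stays fixed.

```python
import math
from statistics import fmean

# Raw performance values (higher is better): [Program 1, Program 2]
computers = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}


def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))


def normalize(results, reference):
    """Divide each value by the reference computer's value for the same program."""
    ref = results[reference]
    return {name: [v / r for v, r in zip(vals, ref)] for name, vals in results.items()}


def best(scores):
    """Name of the computer with the highest mean performance."""
    return max(scores, key=scores.get)


for ref in ("A", "B"):
    norm = normalize(computers, ref)
    arith = {name: fmean(vals) for name, vals in norm.items()}
    geo = {name: geometric_mean(vals) for name, vals in norm.items()}
    # The arithmetic-mean winner changes with the reference machine...
    print(f"reference {ref}: arithmetic-mean winner = {best(arith)}")
    # ...but all geometric means scale by the same factor 1 / GM(reference),
    # so the ratio between any two computers (here C vs. A) does not change.
    print(f"  GM(C) / GM(A) = {geo['C'] / geo['A']:.3f}")
```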
The SPECjvm2008 application
– SPECjvm2008 is a benchmark suite for measuring the performance of a Java Runtime Environment (JRE), containing several real-life applications and benchmarks focusing on core Java functionality.
– The SPECjvm2008 workload mimics a variety of common general-purpose application computations.
CINT2006 integer performance benchmarks for the Opteron X4 2356
Name        Description                    IC×10^9    CPI  Tc (ns)  Exec time (s)  Ref time (s)  SPECratio
perl        Interpreted string processing    2,118   0.75     0.40            637         9,777       15.3
bzip2       Block-sorting compression        2,389   0.85     0.40            817         9,650       11.8
gcc         GNU C Compiler                   1,050   1.72     0.40            724         8,050       11.1
mcf         Combinatorial optimization         336  10.00     0.40          1,345         9,120        6.8
go          Go game (AI)                     1,658   1.09     0.40            721        10,490       14.6
hmmer       Search gene sequence             2,783   0.80     0.40            890         9,330       10.5
sjeng       Chess game (AI)                  2,176   0.96     0.40            837        12,100       14.5
libquantum  Quantum computer simulation      1,623   1.61     0.40          1,047        20,720       19.8
h264avc     Video compression                3,102   0.80     0.40            993        22,130       22.3
omnetpp     Discrete event simulation          587   2.94     0.40            690         6,250        9.1
astar       Games/path finding               1,082   1.79     0.40            773         7,020        9.1
xalancbmk   XML parsing                      1,058   2.70     0.40          1,143         6,900        6.0
Geometric mean of SPECratios: 11.7
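Each row follows the CPU performance equation, execution time = IC × CPI × Tc, and SPECratio = reference time / execution time. The sketch below recomputes both columns from the table's own values. IC is in units of 10^9 and Tc in ns, so their product with CPI is in seconds. Since all rows run on one processor, every row uses Tc = 0.40 ns; the gcc and sjeng rows use exec times of 724 s and 837 s, the values consistent with their SPECratios. Small differences from the published exec times come from rounded CPI values.

```python
import math

# (name, IC x 10^9, CPI, Tc in ns, exec time in s, ref time in s, SPECratio)
ROWS = [
    ("perl",       2118,  0.75, 0.40,  637,  9777, 15.3),
    ("bzip2",      2389,  0.85, 0.40,  817,  9650, 11.8),
    ("gcc",        1050,  1.72, 0.40,  724,  8050, 11.1),
    ("mcf",         336, 10.00, 0.40, 1345,  9120,  6.8),
    ("go",         1658,  1.09, 0.40,  721, 10490, 14.6),
    ("hmmer",      2783,  0.80, 0.40,  890,  9330, 10.5),
    ("sjeng",      2176,  0.96, 0.40,  837, 12100, 14.5),
    ("libquantum", 1623,  1.61, 0.40, 1047, 20720, 19.8),
    ("h264avc",    3102,  0.80, 0.40,  993, 22130, 22.3),
    ("omnetpp",     587,  2.94, 0.40,  690,  6250,  9.1),
    ("astar",      1082,  1.79, 0.40,  773,  7020,  9.1),
    ("xalancbmk",  1058,  2.70, 0.40, 1143,  6900,  6.0),
]

for name, ic, cpi, tc_ns, exec_s, ref_s, ratio in ROWS:
    # 10^9 instructions * cycles/instruction * 10^-9 s/cycle = seconds
    predicted_s = ic * cpi * tc_ns
    print(f"{name:10s} IC*CPI*Tc = {predicted_s:7.1f} s (table {exec_s} s), "
          f"ref/exec = {ref_s / exec_s:5.2f} (table {ratio})")

geo_mean = math.exp(sum(math.log(row[6]) for row in ROWS) / len(ROWS))
print(f"Geometric mean of SPECratios: {geo_mean:.1f}")  # 11.7
```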
SPEC and power: ssj_ops (server-side Java operations/sec)
• Power consumption of server at different workload levels
– Performance: ssj_ops/sec
– Power: Watts (Joules/sec)
$\text{Overall ssj\_ops per Watt} = \left(\sum_{i=0}^{10} \text{ssj\_ops}_i\right) \bigg/ \left(\sum_{i=0}^{10} \text{power}_i\right)$
A power benchmark: SPEC power versus load
SPECpower_ssj2008 for X4
Target Load %   Performance (ssj_ops/sec)   Average Power (Watts)
100%                              231,867                     295
90%                               211,282                     286
80%                               185,803                     275
70%                               163,427                     265
60%                               140,160                     256
50%                               118,324                     246
40%                                92,035                     233
30%                                70,500                     222
20%                                47,126                     206
10%                                23,066                     180
0%                                      0                     141
Overall sum                     1,283,590                   2,605
Overall ssj_ops per Watt = ∑ssj_ops / ∑power = 493
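The overall metric divides the total ssj_ops by the total power across all eleven load levels, rather than averaging per-level ratios. A minimal sketch with the table's values follows. The 40% row is read as 92,035 ssj_ops, the value that makes the column sum to the listed 1,283,590.

```python
# (target load %, ssj_ops/sec, average power in W) from the SPECpower_ssj2008 table
MEASUREMENTS = [
    (100, 231_867, 295), (90, 211_282, 286), (80, 185_803, 275),
    (70, 163_427, 265), (60, 140_160, 256), (50, 118_324, 246),
    (40, 92_035, 233), (30, 70_500, 222), (20, 47_126, 206),
    (10, 23_066, 180), (0, 0, 141),
]

total_ops = sum(ops for _, ops, _ in MEASUREMENTS)
total_power = sum(watts for _, _, watts in MEASUREMENTS)
overall = total_ops / total_power  # sum of ops over sum of power

print(total_ops, total_power)  # 1283590 2605
print(round(overall))          # 493
```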
Low power at low usage? No!
• Look back at X4 power benchmark
– At 100% load: 295 W
– At 50% load: 246 W (83% of peak)
– At 10% load: 180 W (61% of peak)
• Google data center
– Mostly operates at 10%–50% load
– At 100% load less than 1% of the time
• Future research/development: Design processors to make power proportional to load
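To make the gap concrete, the sketch below compares selected X4 load levels with a hypothetical energy-proportional server whose power scales linearly with load down to 0 W. That server is an idealization, not a real design. The sketch also shows how ssj_ops per watt falls at low load. The function name is hypothetical.

```python
# Selected rows of the SPECpower_ssj2008 table for the X4:
# (target load %, ssj_ops/sec, average power in W)
ROWS = [(100, 231_867, 295), (50, 118_324, 246), (10, 23_066, 180)]
PEAK_W = 295  # average power at 100% load


def describe(load_pct: int, ops: int, watts: int) -> str:
    share = watts / PEAK_W                     # fraction of peak power actually drawn
    proportional_w = PEAK_W * load_pct / 100   # idealized energy-proportional power
    return (f"{load_pct:3d}% load: {watts} W ({share:.0%} of peak), "
            f"{ops / watts:.0f} ssj_ops/W, proportional design: {proportional_w:.1f} W")


for row in ROWS:
    print(describe(*row))
```

At 10% load the real server draws 180 W, while the idealized proportional design would draw 29.5 W.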