L2-Perf - Peer Instruction for Computer Science

Reading: 2.4, 3.1-3.5.
Measuring and Discussing Computer System Performance
or
“My computer is faster than your computer”
Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Match (Best) Performance Metric to Domain
Prep with explanation of metrics and domains.

Performance Metrics
1. Network Bandwidth (data/sec)
2. Network Latency (ms)
3. Frame Rate (frames/sec)
4. Throughput (ops/sec)

Domains
Selection  Online WoW  Crysis (FPS)  Torrent Download  Google Server Farm
A          4           3             1                 2
B          4           1             3                 2
C          2           1             3                 4
D          2           3             1                 4
E          None of the above

Jack’s car buying analogy, we care about many….
Measures of “Performance”
• Execution Time
• Frame Rate
• Throughput (operations/time)
• Responsiveness
• Performance / Cost
• Performance / Power
• Performance / Power^2
Recall our O(n) discussion
• Much of computer science focuses on execution time, and much of our class will as well.
• Ultimately, we want much of what we do to be fast (response time). People sit in front of computers / iPhones / etc.
• So is time a reasonable metric?
  User time / clock time / CPU time
  CPU time for this class, for now.
All Together Now
CPU Execution Time = Instruction Count × CPI × Clock Cycle Time
All Together Now
CPU Execution Time = Instruction Count × CPI × Clock Cycle Time
seconds = instructions × cycles/instruction × seconds/cycle
CPU Execution Time = Instruction Count × CPI × Clock Cycle Time
• IC = 1 billion, 500 MHz processor, execution time of 3 seconds. What is the CPI for this program?

3 sec = 1*10^9 inst * CPI * 1 sec/(5*10^8 cycles)
1.5*10^9 cycles = 10^9 insts * CPI
1.5 = CPI

Selection  CPI
A          3
B          15
C          1.5
D          15*10^9
E          None of the above
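The arithmetic in the CPI question can be checked with a minimal Python sketch. The function name `cpi_from_time` and its parameter names are illustrative assumptions, not part of the original slides; the sketch simply rearranges CPU Execution Time = IC × CPI × CT with CT = 1 / clock rate.

```python
def cpi_from_time(exec_time_s: float, instruction_count: float, clock_hz: float) -> float:
    """Solve CPU time = IC * CPI * CT for CPI, where CT = 1 / clock_hz."""
    total_cycles = exec_time_s * clock_hz      # seconds * cycles/second
    return total_cycles / instruction_count    # cycles per instruction


# Numbers from the question: 1 billion instructions, 500 MHz, 3 seconds.
cpi = cpi_from_time(exec_time_s=3.0, instruction_count=1e9, clock_hz=500e6)
print(cpi)  # 1.5
```

Working in total cycles first mirrors the slide's intermediate step (1.5*10^9 cycles) before dividing by the instruction count.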
Individual only
Who Affects Performance?
CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT)
• There are a number of people involved in processor / programming design
• Each of these elements of the performance equation can be impacted by different designer(s)
• Next slides will be about who can impact what.
We’ll do speed voting (1 min ind, 1 min group) then discuss each slide.
Who Affects Performance?
1 min ind / 1 min group
CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT)
• What can a programmer influence?

Selection  Impacts
A          IC
B          IC, CPI
C          IC, CPI, and CT
D          IC and CT
E          None of the above
Who Affects Performance?
1 min ind / 1 min group
CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT)
• What can a compiler influence?

Selection  Impacts
A          IC
B          IC, CPI
C          IC, CPI, and CT
D          CPI and CT
E          None of the above
Who Affects Performance?
1 min ind / 1 min group
CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT)
• What can an instruction set architect influence?

Selection  Impacts
A          IC
B          IC, CPI
C          IC, CPI, and CT
D          CPI and CT
E          None of the above
Who Affects Performance?
1 min ind / 1 min group
CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT)
• What can a hardware designer influence?

Selection  Impacts
A          IC
B          IC, CPI
C          IC, CPI, and CT
D          CPI and CT
E          None of the above
Performance Variation
CPU Execution Time = Instruction Count × CPI × Clock Cycle Time
ROW 1: Same machine, different programs
ROW 2: Same programs, different machines, same ISA
ROW 3: Same programs, different machines, different ISA
Number of instructions
CPI
DIFF
Same
Diff
Same
DIFF
Same
DIFF
Same
Diff
Clock Cycle Time
Same
Diff
Diff
Selection  Row(s) Correct
A          1
B          1 and 3
C          2
D          2 and 3
E          None of the above
Other Performance Metrics
• Time is useful, but how else might we try to measure the “performance” of a machine?
  - MIPS
  - MFLOPS
MIPS
MIPS = Millions of Instructions Per Second
     = Instruction Count / (Execution Time * 10^6)
     = Clock rate / (CPI * 10^6)
• program-independent
• deceptive
Just crank up clock rate and have it execute tons of noops.
But we need to sell processors, what do we market?
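A rough Python sketch of why MIPS can be deceptive follows. It reuses the 500 MHz, 1 billion instruction, CPI 1.5 figures from the earlier CPI question. The "padded" program (the same work plus 1 billion no-ops at an assumed CPI of 1) is a hypothetical example, and all function and variable names are illustrative assumptions.

```python
def mips_from_counts(instruction_count: float, exec_time_s: float) -> float:
    """MIPS = Instruction Count / (Execution Time * 10^6)."""
    return instruction_count / (exec_time_s * 1e6)


def mips_from_clock(clock_hz: float, cpi: float) -> float:
    """MIPS = Clock rate / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)


CLOCK_HZ = 500e6  # 500 MHz, as in the earlier CPI question

# Original program: 1e9 instructions at CPI 1.5.
orig_ic, orig_cpi = 1e9, 1.5
orig_time = orig_ic * orig_cpi / CLOCK_HZ            # seconds
orig_mips = mips_from_counts(orig_ic, orig_time)

# Hypothetical padded program: same work plus 1e9 no-ops at an assumed CPI of 1.
padded_ic = orig_ic + 1e9
padded_cycles = orig_ic * orig_cpi + 1e9 * 1.0
padded_time = padded_cycles / CLOCK_HZ               # seconds
padded_mips = mips_from_counts(padded_ic, padded_time)

print(f"original: {orig_time:.1f} s, {orig_mips:.0f} MIPS")
print(f"padded:   {padded_time:.1f} s, {padded_mips:.0f} MIPS")
```

Under these assumptions the padded program reports a higher MIPS rating while taking longer to finish, which is the sense in which the metric misleads.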
Trying to market the “performance” of a processor.
• “Speed Demons” vs. “Brainiacs”
Intel vs. Alpha, Intel wins… but ends up remarketing.
If we can’t use something like raw processor speed (CT), if we want CPI, we need to look at performance on benchmarks
Benchmarks - Which Programs?
• peak throughput measures (simple programs)?
• synthetic benchmarks (Whetstone, Dhrystone,...)?
• Real applications
• SPEC (best of both worlds, but with problems of their own)
  - System Performance Evaluation Cooperative
  - Provides a common set of real applications along with strict guidelines for how to run them.
  - Provides a relatively unbiased means to compare machines.
Danger in Benchmark-Specific Performance Measures
• measures compiler as much as architecture!
SPEC Performance on Pentium III and Pentium 4
Focus on clock rate relative to change in INT vs. FP performance. SSE2 on P3 was a FP stack, P4 had independent registers
Speedup
• Often want to compare performance of one machine against another
Performance = 1 / Execution Time
Speedup (A over B) = Performance_A / Performance_B
Speedup (A over B) = ET_B / ET_A
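As a small illustration of the speedup definitions, the sketch below computes speedup both from performance and directly from execution times. The execution times (2 s for machine A, 3 s for machine B) are made-up values, and the function names are assumptions introduced only for this example.

```python
def performance(exec_time_s: float) -> float:
    """Performance = 1 / Execution Time."""
    return 1.0 / exec_time_s


def speedup(exec_time_a_s: float, exec_time_b_s: float) -> float:
    """Speedup (A over B) = ET_B / ET_A."""
    return exec_time_b_s / exec_time_a_s


# Hypothetical measurements: A finishes in 2 s, B in 3 s.
et_a, et_b = 2.0, 3.0
print(speedup(et_a, et_b))                    # 1.5
print(performance(et_a) / performance(et_b))  # same ratio, via Performance_A / Performance_B
```

A speedup above 1 means A is faster than B; swapping the arguments gives the reciprocal.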
Amdahl’s Law
Execution time after improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected
Amdahl’s Law and Parallelism
ISOMORPHIC
• Our program is 90% parallelizable (segment of code executable in parallel on multiple cores) and runs in 100 seconds with a single core. What is the execution time if you use 4 cores (assume no overhead for parallelization)?
Execution time after improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected

Selection  Execution Time
A          25 seconds
B          32.5 seconds
C          50 seconds
D          92.5 seconds
E          None of the above
Amdahl’s Law and Parallelism
ISOMORPHIC
• Our program is 90% parallelizable (segment of code executable in parallel on multiple cores) and runs in 100 seconds with a single core. What is the execution time if you use 2 cores (assume no overhead for parallelization)?
Execution time after improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected

Selection  Execution Time
A          55 seconds
B          50 seconds
C          100 seconds
D          95 seconds
E          None of the above
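The two parallelism questions above can be worked through with a short Python sketch of Amdahl's Law as stated on the slides. The function name `amdahl_time` and its parameters are illustrative assumptions. The sketch takes the slide's "no overhead" assumption at face value, so the improvement factor equals the core count.

```python
def amdahl_time(total_time_s: float, fraction_affected: float, improvement: float) -> float:
    """Execution time after improvement =
    (time affected / amount of improvement) + time unaffected."""
    affected = total_time_s * fraction_affected
    unaffected = total_time_s - affected
    return affected / improvement + unaffected


# 90% parallelizable, 100 seconds on one core, no parallelization overhead.
for cores in (4, 2):
    t = amdahl_time(total_time_s=100.0, fraction_affected=0.9, improvement=cores)
    print(f"{cores} cores: {t:.1f} seconds")
```

With 4 cores the 90 affected seconds shrink to 22.5 and the 10 unaffected seconds remain, giving 32.5 seconds; with 2 cores the result is 45 + 10 = 55 seconds.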
Amdahl’s Law
• So what does Amdahl’s Law mean at a high level?

Selection  “BEST” message from Amdahl’s Law
A          Parallel programming is critical for improving performance
B          Improving serial code execution is ultimately the most important goal.
C          Performance is strictly tied to the ability to determine which percentage of code is parallelizable.
D          The impact of a performance improvement is limited by the percent of execution time affected by the improvement
E          None of the above
Point out Phenom II x4 and x2 get same performance, but one has 4 cores, the other 2. What does this tell us? (Note: with some slower speeds of the Phenom this isn’t the case; 4 cores help.)
Speedup vs. Sizeup
• Speedup runs into problems for parallelization because of diminishing returns and dominance of serial execution.
• What if time were a constant?
  Human perception (graphics)
  Earthquake prediction
  Weather prediction
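The diminishing returns mentioned in the first bullet of this slide can be seen numerically with a minimal sketch. It reuses the 90% parallelizable program from the Amdahl's Law questions and assumes no parallelization overhead. The function name `amdahl_speedup` and the chosen core counts are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only parallel_fraction of the time scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Speedup for a 90% parallelizable program as the core count grows.
for cores in (1, 2, 4, 16, 256, 4096):
    print(f"{cores:5d} cores: speedup {amdahl_speedup(0.9, cores):.2f}")

# The serial 10% caps the speedup at 1 / (1 - 0.9), i.e. about 10,
# however many cores are added.
```

Each doubling of cores buys less than the one before, because the serial portion comes to dominate the remaining execution time.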
Key Points
• Be careful how you specify “performance”
• Execution time = IC * CPI * CT
• Use real applications, if possible
• Use standards, if possible
• Make the common case fast