Looking at Data
Dror Feitelson
Hebrew University
Disclaimer
No connection to www.lookingatdata.com
They have neat stuff – recommended
But we’ll just use very simple graphics
The Agenda
To promote the collection, sharing, and use of
real data about computer systems, in order to
ensure that our research is relevant to real-life
situations (as opposed to doing research
based on assumptions)
Computer “Science”
• Mathematics = abstract thought
• Engineering = building things
• Science = learning about the world
– Observation
– Measurement
– Experimentation
• The scientific method is also required for the
study of complex computer systems
(including complexity arising from humans)

Example 1: The Top500 list
The Top500 List
• List of the 500 most powerful supercomputers
in the world
• As measured by Linpack
• Started in 1993 by Dongarra, Meuer, Simon,
and Strohmaier
• Updated twice a year at www.top500.org
• Contains data about vendors, countries, and
machine types
• Egos and politics in the top spots
November 2002 list
(Rmax and Rpeak in Tflops)

rank  site                                   computer           vendor         procs  Rmax   Rpeak
1     Earth Simulator Ctr, JP                Earth Simulator    NEC            5120   35.9   40.9
2     LANL, USA                              ASCI Q             HP             4096   7.73   10.2
3     LANL, USA                              ASCI Q             HP             4096   7.73   10.2
4     LLNL, USA                              ASCI White         IBM            8192   7.23   12.3
5     LLNL, USA                              MCR Linux cluster  Linux NetworX  2304   5.69   11.1
6     Pittsburgh SC, USA                     AlphaServer        HP             3016   4.46   6.03
7     Commissariat à l'énergie atomique, FR  AlphaServer        HP             2560   3.98   5.12
Top500 Evolution: Scalar vs. Vector
[Chart: vector machines’ share of the list, 1993-2007; series: % machines, % processors, % Rmax]
1993 – 1998: Number of vector machines plummets: MPPs instead of Crays
Top500 Evolution: Scalar vs. Vector
[Chart: vector machines’ share of the list, 1993-2007; series: % machines, % processors, % Rmax]
1998 – 2003: Vector machines stabilize
• Earth Simulator
• Cray X1
Top500 Evolution: Scalar vs. Vector
What happened?
[Chart: vector machines’ share of the list, 1993-2007; series: % machines, % processors, % Rmax]
2003 – 2007: Vectors all but disappear
Top500 Evolution: Parallelism
[Chart: number of processors (log scale), 1993-2005; series: top rank, largest]
Most attention typically given to largest machines
Top500 Evolution: Parallelism
[Chart: number of processors (log scale), 1993-2005; series: top rank, largest, smallest, smallest up]
But let’s focus on the smallest ones: we need more and more proc’s to stay on the list
Top500 Evolution: Parallelism
[Chart: number of processors (log scale), 1993-2005; series: top rank, largest, smallest, smallest up]
Vectors needed double every 18 months
Microproc’s double every 2-3 years
So microproc’s are improving faster
Implication: in 2008 microprocessors finally closed the performance gap
Historical Perspective
Figure from a
1994 report
Top500 Evolution: Parallelism
Need more proc’s to stay on list
[Chart: number of processors (log scale), 1993-2005; series: top rank, largest, smallest, smallest up]
Implication: performance grows faster than Moore’s law
Top500 Evolution: Parallelism
[Chart: number of processors (log scale), 1993-2005; series: top rank, largest, smallest, smallest up]
Need more proc’s to stay on list = performance grows faster than Moore’s law
Since 2003 the slope increased due to slowing of microprocessor improvements
Top500 Evolution: Parallelism
[Chart: number of processors (log scale), 1993-2005; series: top rank, largest, smallest, smallest up]
BTW: largest machines stayed flat for 7 years
Everything else grew exponentially
Implication: indicates difficulty in usage and control
Example 1: The Top500 list
✓ Example 2: Parallel workload patterns

Parallel Workloads Archive
• All large scale supercomputers maintain
accounting logs
• Data includes job arrival, queue time,
runtime, processors, user, and more
• Many are willing to share them
(and shame on those who are not)
• Collection at
www.cs.huji.ac.il/labs/parallel/workload/
• Uses a standard format to ease use (a parsing sketch follows below)
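The standard workload format is plain text: header lines start with ';' and each job is one line of whitespace-separated fields. Below is a minimal parsing sketch in Python; it is my own helper, not code from the archive, and the field positions it assumes (submit time, wait time, runtime, allocated processors, user ID) should be checked against the format definition on the archive site.

    from collections import namedtuple

    Job = namedtuple("Job", "submit wait runtime procs user")

    def read_swf(path):
        """Read a trace in the archive's standard workload format (SWF)."""
        jobs = []
        with open(path) as f:
            for line in f:
                if not line.strip() or line.startswith(";"):
                    continue                            # skip blanks and header comments
                fields = line.split()
                jobs.append(Job(submit=float(fields[1]),   # submit time [s]
                                wait=float(fields[2]),     # wait time [s]
                                runtime=float(fields[3]),  # run time [s]
                                procs=int(fields[4]),      # allocated processors
                                user=fields[11]))          # user ID
        return jobs

    # jobs = read_swf("some-trace.swf")   # hypothetical file name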
NASA iPSC/860 trace

user      cmd    proc  runtm  date      time
user8     cmd33     1     31  10/19/93  18:06:10
sysadmin  pwd       1     16  10/19/93  18:06:57
sysadmin  pwd       1      5  10/19/93  18:08:27
intel0    cmd11    64    165  10/19/93  18:11:36
user2     cmd2      1     19  10/19/93  18:11:59
user2     cmd2      1     11  10/19/93  18:12:28
user2     nsh       0     10  10/19/93  18:16:23
user2     cmd1     32   2482  10/19/93  18:16:37
Parallelism Assumptions
• Large machines have thousands of
processors
• Cost many millions of dollars
• So expected to be used for large-scale
parallel jobs
(Ok, maybe also a few smaller debug runs)
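To confront these assumptions with data, one can tabulate job sizes directly from a trace. A small sketch, reusing the hypothetical read_swf/Job helpers above:

    from collections import Counter

    def size_profile(jobs):
        """Tabulate job sizes and print a few summary fractions."""
        sizes = [j.procs for j in jobs]
        total = len(sizes)
        serial = sum(1 for s in sizes if s == 1)
        pow2 = sum(1 for s in sizes if s > 0 and s & (s - 1) == 0)
        print(f"serial jobs:      {100 * serial / total:.1f}%")
        print(f"power-of-2 sizes: {100 * pow2 / total:.1f}%")
        return Counter(sizes)   # job-size histogram, as in the plots below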
Parallelism Data
[Histograms of job size (% jobs) for SDSC SP2, LANL O2K, HPC2N Cluster, and SDSC DataStar]
Parallelism Data
On all machines 15-50% of jobs are serial
Also very many small jobs
Implication:
bad news: small jobs may block out large jobs
good news: small jobs are easy to pack
[Histogram of job size (% jobs), SDSC SP2]
Parallelism Data
On all machines 15-50% of jobs are serial
Also very many small jobs
Majority of jobs use power of 2 nodes
– We think in binary
– No real application requirements
– Hypercube tradition
Implication: regardless of reason, reduces fragmentation
[Histogram of job size (% jobs), SDSC SP2]
Size-Runtime Correlation
• Parallel jobs require resources in two dimensions:
– A number of processors
– For a duration of time
• Assuming the parallelism is used for speedup, we can expect large jobs to run for less time
• Important for scheduling, because job size is known in advance
Potential implication: scheduling large jobs first also schedules short jobs first!
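One way to check this expectation against a log is to correlate job size with runtime, as in the table and scatter plot on the next slide. A minimal sketch (my own helper; the slide does not say whether the reported coefficients were computed on raw or log-scaled values, so the log scaling here is an assumption):

    import math

    def pearson(xs, ys):
        """Plain Pearson correlation coefficient."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    def size_runtime_cc(jobs):
        """Correlation between log job size and log runtime."""
        pts = [(j.procs, j.runtime) for j in jobs if j.procs > 0 and j.runtime > 0]
        return pearson([math.log(p) for p, _ in pts],
                       [math.log(r) for _, r in pts])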
Size-Runtime Correlation Data

System          CC
LANL CM-5        0.178
SDSC Paragon     0.280
CTC SP2          0.057
SDSC SP2         0.146
LANL O2K        -0.096
SDSC Blue        0.121
HPC2N cluster   -0.046
SDSC DataStar   -0.012

[Scatter plot: runtime [s] (log) vs. job size (log), SDSC Paragon]
“Distributional” Correlation
• Partition jobs into two groups based on size
– Small jobs (less than median)
– Large jobs (more than median)
• Find distribution of runtimes for each group
• Measure fraction of support where one
distribution dominates the other
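A sketch of one plausible implementation of this measure (my own reading of the slide; the exact definition used by the author may differ): split the jobs at the median size, build the empirical runtime CDF of each half, and report the signed fraction of the runtime support on which the large jobs' CDF lies below (i.e., their runtimes stochastically dominate) the small jobs' CDF.

    from bisect import bisect_right

    def ecdf(sample):
        """Return a function computing the empirical CDF of the sample."""
        xs = sorted(sample)
        n = len(xs)
        return lambda t: bisect_right(xs, t) / n

    def distributional_correlation(jobs):
        """Signed fraction of the support where large jobs' runtimes dominate."""
        ranked = sorted(jobs, key=lambda j: j.procs)
        half = len(ranked) // 2                  # split at the median job size
        small = [j.runtime for j in ranked[:half]]
        large = [j.runtime for j in ranked[half:]]
        F_small, F_large = ecdf(small), ecdf(large)
        support = sorted(set(small + large))
        large_dom = sum(1 for t in support if F_large(t) < F_small(t))
        small_dom = sum(1 for t in support if F_small(t) < F_large(t))
        return (large_dom - small_dom) / len(support)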
“Distributional” Correlation

System          distCC
LANL CM-5        0.986
SDSC Paragon     0.990
CTC SP2          0.892
SDSC SP2        -0.208
SDSC DataStar    0.962
LANL O2K        -0.872
SDSC Blue        0.993
HPC2N cluster   -0.173

[CDFs of runtime (log scale) for small vs. large jobs; panels: SDSC SP2, SDSC DataStar]

Implication: large jobs first ≠ short jobs first (maybe even long first)
Example 1: The Top500 list
✓ Example 2: Parallel workload patterns
✓ Example 3: “Dirty” data

Beware Dirty Data
• Looking at data is important
• But is all data worth looking at?
– Errors in data recording
– Evolution and non-stationarity
– Diversity between different sources
– Multi-class mixtures
– Abnormal activity
• Need to select relevant data source
• Need to clean dirty data
Abnormality Example
Some users are much more active than others
So much so that they single-handedly affect workload statistics
– Job arrivals (more)
– Job sizes (modal?)
Probably not generally representative
Implication: we may be optimizing for user 2
[Chart: jobs per week, HPC2N cluster, 28/07/2002 – 21/08/2005; series: user 2, 257 others]
Workload Flurries
• Bursts of activity by a single user (a simple way to flag such bursts is sketched after this list)
– Lots of jobs
– All these jobs are small
– All of them have similar characteristics
• Limited duration (days to weeks)
• Flurry jobs may be affected as a group, leading to potential instability (butterfly effect)
• This is a problem with evaluation methodology more than with real systems
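There is no single canonical way to flag a flurry. The sketch below is a simple heuristic of my own (the thresholds are arbitrary and not taken from the talk): flag weeks in which a single user accounts for a large share of all submissions.

    from collections import Counter, defaultdict

    def flag_flurries(jobs, week=7 * 24 * 3600, share=0.5, min_jobs=500):
        """Flag (week, user, count) where one user dominates that week's submissions."""
        start = min(j.submit for j in jobs)
        by_week = defaultdict(list)
        for j in jobs:
            by_week[int((j.submit - start) // week)].append(j.user)
        flags = []
        for w, users in sorted(by_week.items()):
            user, count = Counter(users).most_common(1)[0]
            if count >= min_jobs and count / len(users) >= share:
                flags.append((w, user, count))
        return flags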
Workload Flurries
[Charts: jobs per week; SDSC SP2 (user 374 vs. 427 others), 05/10/1998 – 04/09/2000; CTC SP2 (user 135 vs. 678 others), 07/07/1996 – 25/05/1997]
Instability Example
Simulate scheduling of parallel jobs with the EASY scheduler
Use the CTC SP2 trace as the input workload
Change the load by systematically modifying inter-arrival times
Leads to erratic behavior
[Chart: average bounded slowdown vs. offered load (0.5-1), CTC SP2]
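Modifying the load by changing inter-arrival times usually means stretching or compressing all arrival gaps by a constant factor. A sketch of that idea, reusing the hypothetical Job records from above (the exact procedure used in this experiment is not given on the slide):

    def rescale_load(jobs, target_load, num_procs):
        """Scale inter-arrival times so the offered load becomes target_load.
        Offered load is taken as total processor-seconds demanded divided by
        machine capacity over the (rescaled) trace duration."""
        jobs = sorted(jobs, key=lambda j: j.submit)
        demand = sum(j.procs * j.runtime for j in jobs)       # processor-seconds
        duration = jobs[-1].submit - jobs[0].submit
        current_load = demand / (num_procs * duration)
        factor = current_load / target_load     # stretch (>1) or compress (<1) arrivals
        base = jobs[0].submit
        return [j._replace(submit=base + (j.submit - base) * factor) for j in jobs]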
Instability Example
Simulate scheduling of parallel jobs with the EASY scheduler
Use the CTC SP2 trace as the input workload
Change the load by systematically modifying inter-arrival times
Leads to erratic behavior
Implication: using dirty data may lead to erroneous evaluation results
Removing a flurry by user 135 solves the problem
[Chart: average bounded slowdown vs. offered load (0.5-1), CTC SP2; series: all, w/o user 135]
Example 1: The Top500 list
✓ Example 2: Parallel workload patterns
✓ Example 3: “Dirty” data
✓ Example 4: User behavior

Independence vs. Feedback
• Modifying the offered load by changing inter-arrival times assumes an open system model
– Large user population insensitive to system
performance
– Jobs are independent of each other
• But real systems are often closed
– Limited user population
– New jobs submitted after previous ones terminate
• This leads to feedback from system
performance to workload generation
Evidence for Feedback
[Charts: jobs submitted vs. avg. node-seconds, for SDSC SP2, SDSC Paragon, CTC SP2, and HPC2N cluster]
Implication: jobs are not independent; modifying inter-arrivals is problematic
The Mechanics of Feedback
• If users perceive the system as loaded, they
will submit fewer jobs
• But what exactly do users care about?
– Response time: how long they wait for results
– Slowdown: how much longer than expected
• Answer needed to create a user model that
will react correctly to load conditions
Data Mining
• Available data: system accounting log
• Need to assess user reaction to momentary
condition
• The idea: associate the user’s think time with the performance of the previous job (see the sketch below)
– Good performance → satisfied user → continue work session → short think time
– Bad performance → dissatisfied user → go home → long think time
• “performance” = response time or slowdown
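One plausible way to mine this from an accounting log (a sketch under my own assumptions, not necessarily the exact procedure used in the talk): for each user, walk through their jobs in submission order, take the think time as the gap between one job's completion and the next submission, and pair it with the earlier job's response time.

    from collections import defaultdict

    def performance_vs_think_time(jobs):
        """Pairs of (response time of a job, think time until the user's next job)."""
        by_user = defaultdict(list)
        for j in sorted(jobs, key=lambda j: j.submit):
            by_user[j.user].append(j)
        pairs = []
        for seq in by_user.values():
            for prev, nxt in zip(seq, seq[1:]):
                finish = prev.submit + prev.wait + prev.runtime
                think = nxt.submit - finish
                if think >= 0:       # skip jobs submitted before the previous one finished
                    pairs.append((prev.wait + prev.runtime, think))
        return pairs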
The Data
[Scatter plots: think time [s] vs. response time [s], and think time [s] vs. slowdown (log scales)]
Implication: response time is a much better predictor of user behavior
Predictability = Locality
• Predicting the future is good
– Avoid constraints of on-line algorithms
– Approximate performance of off-line algorithms
– Ability to plan ahead
• Implies a correlation between events
• Application behavior characterized by locality
of reference
• User behavior characterized by locality of
sampling
Locality of Sampling
Workload attributes are modeled by a marginal distribution
But at different times the distributions may be quite distinct
Implication: the notion that more data is better is problematic
[CDFs of runtime [s], SDSC Paragon: all year vs. weeks 1-7/2/95, 15-21/4/95, 13-19/9/95]
Locality of Sampling
Workload attributes are modeled by a marginal distribution
But at different times the distributions may be quite distinct
Implication: the assumption of stationarity is problematic
[CDFs of runtime [s], SDSC Paragon: all year vs. weeks 1-7/2/95, 15-21/4/95, 13-19/9/95]
Locality of Sampling
Workload attributes are modeled by a marginal distribution
But at different times the distributions may be quite distinct
Thus the situation changes with time
Implication: locality is required to evaluate adaptive systems
[CDFs of runtime [s], SDSC Paragon: all year vs. weeks 1-7/2/95, 15-21/4/95, 13-19/9/95]
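One simple way to quantify this locality of sampling (my own sketch, not necessarily the metric used by the author) is to compare each week's empirical runtime distribution with that of the whole log, e.g. using the two-sample Kolmogorov-Smirnov distance:

    from bisect import bisect_right
    from collections import defaultdict

    def ks_distance(a, b):
        """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
        a, b = sorted(a), sorted(b)
        return max(abs(bisect_right(a, t) / len(a) - bisect_right(b, t) / len(b))
                   for t in a + b)

    def weekly_divergence(jobs, week=7 * 24 * 3600):
        """KS distance of each week's runtimes from the whole trace's runtimes."""
        all_runtimes = [j.runtime for j in jobs]
        start = min(j.submit for j in jobs)
        weeks = defaultdict(list)
        for j in jobs:
            weeks[int((j.submit - start) // week)].append(j.runtime)
        return {w: ks_distance(r, all_runtimes) for w, r in sorted(weeks.items())}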
Example 1: The Top500 list
✓ Example 2: Parallel workload patterns
✓ Example 3: “Dirty” data
✓ Example 4: User behavior
✓ Example 5: Mass-count disparity

Variability in Workloads
• Changing conditions
– locality of sampling
– Variability between different periods
• Heavy-tailed distributions
– Unique “high weight” samples
– Samples may be so big that they dominate the
workload
File Sizes Example
USENET survey by Gordon Irlam in 1993
Distribution of file sizes is concentrated around several KB
[CDF: % files vs. file size (log scale, 1 B to 1 GB)]
File Sizes Example
USENET survey by Gordon Irlam in 1993
Distribution of file sizes is concentrated around several KB
Distribution of disk space is spread over many MB
This is mass-count disparity
[CDFs: % files and % bytes vs. file size (log scale, 1 B to 1 GB)]
File Sizes Example
Joint ratio of 11/89:
89% of files have 11% of bytes, while the other 11% of files have 89% of bytes
(generalization of the 20/80 and 10/90 principles)
[CDFs: % files and % bytes vs. file size (log scale)]
File Sizes Example
Joint ratio of 11/89:
89% of files have 11% of bytes, while the other 11% of files have 89% of bytes
(generalization of the 20/80 and 10/90 principles)
0/50 rule:
50% of files have 0% of bytes, and 50% of bytes belong to 0% of files
Implication: optimizing storage of small files is not needed
[CDFs: % files and % bytes vs. file size (log scale)]
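Both the joint ratio and the 0/50 rule can be computed directly from a list of file sizes. A minimal sketch (my own helper; the Pareto sizes in the commented-out usage example are synthetic, not Irlam's data):

    import random

    def mass_count(sizes):
        """Joint ratio and the share of bytes held by the smallest half of the files."""
        sizes = sorted(sizes)
        n, total = len(sizes), float(sum(sizes))
        cum, joint = 0.0, None
        for i, s in enumerate(sizes, start=1):
            cum += s
            if joint is None and i / n + cum / total >= 1.0:
                joint = i / n   # smallest joint-fraction of files hold ~(1 - joint) of the bytes
        small_half_mass = sum(sizes[: n // 2]) / total
        return joint, small_half_mass

    # sizes = [int(1024 * random.paretovariate(1.1)) for _ in range(100000)]
    # joint, half = mass_count(sizes)
    # print(f"joint ratio ~ {100 * (1 - joint):.0f}/{100 * joint:.0f}")
    # print(f"smallest 50% of files hold {100 * half:.1f}% of bytes")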
Locality of Reference
• Spatial locality
• Temporal locality
– References to a location are concentrated in a short span of time
– Some locations are more popular than others
– Some are much more popular
References to Memory Locations
Joint ratio: 22/78
¾ of locations get <10 references
¾ of references are to popular locations
Implication: sampling a random location finds one that is unpopular
[Chart: distribution of references per memory location, SPEC 2000 twolf]
References to Memory Locations
Joint ratio: 22/78
¾ of locations get <10 references
¾ of references are to popular locations
Implication: sampling a random reference finds a popular location
[Chart: distribution of references per memory location, SPEC 2000 twolf]
Tomato Soup
Computer Science
• When confronted with a problem, we tend to
abstract
• Focus on the essentials
• Implies assumption that we can identify the
essentials
• To keep in touch with reality, we need to
look at data
• To look at data, we need to collect data and
share it
“Few of us escape being indoctrinated with these notions:
• Numerical calculations are exact, but graphs are rough;
• For any particular kind of statistical data there is just one set of calculations constituting a correct statistical analysis;
• Performing intricate calculations is virtuous, whereas actually looking at the data is cheating.”
F. J. Anscombe
The American Statistician 27(1), Feb 1973
Looking at data is not cheating.
Not looking at data is irresponsible.
Thank You
• Top500 list – Jack Dongarra, Hans Meuer, Horst Simon, and
Erich Strohmaier
• Parallel Workloads Archive
– CTC SP2 – Steven Hotovy and Dan Dwyer
– SDSC Paragon – Reagan Moore and Allen Downey
– SDSC SP2 and DataStar – Victor Hazlewood
– SDSC Blue Horizon – Travis Earheart and Nancy Wilkins-Diehr
– LANL CM-5 – Curt Canada
– LANL O2K – Fabrizio Petrini
– HPC2N cluster – Ake Sandgren and Michael Jack
– LLNL uBGL – Moe Jette
• Unix files survey – Gordon Irlam
• My students
– Dan Tsafrir
– Edi Shmueli
– Yoav Etsion