CS533 Modeling and Performance Evaluation of Network and Computer Systems

advertisement
CS533
Modeling and Performance
Evaluation of Network and
Computer Systems
The Art of Data
Presentation
1
Introduction
It’s not what you say, but how you say it. – A. Putt
• An analysis whose results cannot be
•
understood is as good as one that is never
performed.
General techniques
– Line charts, bar charts, pie charts,
histograms
• Some specific techniques
– Gantt charts, Kiviat graphs …
• A picture is worth a thousand words
– Plus, easier to look at, more interesting
2
Outline
• Types of Variables
• Guidelines
• Common Mistakes
• Pictorial Games
• Special Purpose Charts
• Decision Maker’s Games
• Ratio Games
3
•
•
Types of Variables
Qualitative (Categorical) variables
– Have states or subclasses
– Can be ordered or unordered
• Ex: PC, minicomputer, supercomputer  ordered
• Ex: scientific, engineering, educational  unordered
Quantitative variables
– Numeric levels
– Discrete or continuous
• Ex: number of processors, disk blocks, etc. is discrete
• Ex: weight of a portable computer is continuous
Variables
Qualitative
4
Ordered
Unordered
Quantitative
Discrete
Continuous
Outline
• Types of Variables
• Guidelines
• Common Mistakes
• Pictorial Games
• Special Purpose Charts
• Decision Maker’s Games
• Ratio Games
5
Guidelines for Good Graphs (1 of 5)
• Again, “art” not “rules”.
•
Learn with
experience. Recognize good/bad when see it.
Require minimum effort from reader
– Perhaps most important metric
– Given two, can pick one that takes less reader
effort
Ex:
6
a
b
c
a
b
c
Legend Box
Direct Labeling
Addition
• Figure curves should be distinguishable in
black-white printout
– Using color is fine, but must obey the above
principle
– Light color curve might be too light gray in
black-white printout
• Yellow, light green, …
• Can use clear marks to distinguish
7
Guidelines for Good Graphs (2 of 5)
• Maximize information
– Make self-sufficient
– Key words in place of symbols
• Ex: “PIII, 850 MHz” and not “System A”
• Ex: “Daily CPU Usage” not “CPU Usage”
– Axis labels as informative as possible
• Ex: “Response Time in seconds” not
“Response Time”
– Can help by using captions, too
• Ex: “Transaction response time in seconds
versus offered load in transactions per
second.”
8
Guidelines for Good Graphs (3 of 5)
• Minimize ink
– Maximize information-to-ink ratio
– Too much unnecessary ink makes chart
cluttered, hard to read
• Ex: no gridlines unless needed to help read
– Chart that gives easier-to-read for same
data is preferred
.1
1
9
Availability
•Same data
•Unavail = 1 – avail
•Right better
Unavailability
Guidelines for Good Graphs (4 of 5)
• Use commonly accepted practices
– Present what people expect
– Ex: origin at (0,0)
– Ex: independent (cause) on x-axis,
dependent (effect) on y-axis
– Ex: x-axis scale is linear
– Ex: increase left to right, bottom to top
– Ex: scale divisions equal
• Departures are permitted, but require
extra effort from reader so use sparingly
10
Guidelines for Good Graphs (5 of 5)
• Avoid ambiguity
–
–
–
–
11
Show coordinate axes
Show origin
Identify individual curves and bars
Do not plot multiple variables on same chart
Guidelines for Good Graphs
(Summary)
• Checklist in Jain, Box 10.1, p. 143
• The more “yes” answers, the better
– But, again, may consciously decide not to
follow these guidelines if better without
them
• In practice, takes several trials before
•
12
arriving at “best” graph
Want to present the message the most:
accurately, simply, concisely, logically
Outline
• Types of Variables
• Guidelines
• Common Mistakes
• Pictorial Games
• Special Purpose Charts
• Decision Maker’s Games
• Ratio Games
13
Common Mistakes (1 of 6)
• Presenting too many alternatives on one
•
chart
Guidelines
– More than 5 to 7 messages is too many
• (Maybe related to the limit of human shortterm memory?)
–
–
–
–
14
Line chart with 6 curves or less
Column chart with 10 bars
Pie chart with 8 components
Each cell in histogram should have 5+ values
Common Mistakes (2 of 6)
• Presenting many y-variables on a single
chart
– Better to make separate graphs
– Plotting many y-variables saves space, but
better to requires reader to figure out
relationship
throughput
15
utilization
Response time
• Space constraints for journal/conf!
Common Mistakes (3 of 6)
• Using symbols in place of text
• More difficult to read symbols than text
• Reader must flip through report to see
symbol mapping to text
– Even if “save” writers time, really “wastes”
it since reader is likely to skip!
1 job/sec
Y=3

Y=5
16

Service rate
Y=1
3 jobs/sec
5 jobs/sec
Arrival rate
Common Mistakes (4 of 6)
• Placing extraneous information on the
chart
– Goal is to convey particular message, so
extra information is distracting
– Ex: using gridlines only when exact values
are expected to be read
– Ex: “per-system” data when average data is
only part of message required
17
Common Mistakes (5 of 6)
• Selecting scale ranges improperly
– Most are prepared by automatic programs
(excel, gnuplot) with built-in rules
• Give good first-guess
– But
• May include outlying data points, shrinking
body
• May have endpoints hard to read since on axis
• May place too many (or too few) tics
– In practice, almost always over-ride scale
values
18
Common Mistakes (6 of 6)
• Using a Line Chart instead of Column Chart
– Lines joining successive points signify that
they can be approximately interpolated
– If don’t have meaning, should not use line
chart
MIPS
- No linear relationship
between processor
types!
- Instead, use column
chart
19
8000
8010
8020 8120
Outline
• Types of Variables
• Guidelines
• Common Mistakes
• Pictorial Games
• Special Purpose Charts
• Decision Maker’s Games
• Ratio Games
20
Pictorial Games
• Can deceive as easily as can convey meaning
• Note, not always a question of bad practice
but should be aware of techniques when
reading performance evaluation
21
Non-Zero Origins to Emphasize
(1 of 2)
• Normally, both axes meet at origin
• By moving and scaling, can magnify (or
reduce!) difference
MINE
2610
5200
YOURS
MINE
YOURS
2600
0
22
Which graph is better?
Non-Zero Origins to Emphasize
(2 of 2)
• Choose scale so that vertical height of
highest point is at least ¾ of the horizontal
offset of right-most point
– Three-quarters rule
• (And represent origin as 0,0)
MINE
2600
0
23
YOURS
Using Double-Whammy Graph
• Two curves can have twice as much impact
– But if two metrics are related, knowing one
predicts other … so use one!
Response Time
Goodput
Number of Users
24
Plotting Quantities without
Confidence Intervals
• When random quantification, representing
mean (or median) alone (or single data
point!) not enough
(Worse)
25
MINE
MINE
YOURS
YOURS
(Better)
Pictograms Scaled by Height
• If scaling pictograms, do by area not
height since eye drawn to area
– Ex: twice as good  doubling height
quadruples area
MINE
YOURS
(Worse)
26
MINE
YOURS
(Better)
Using Inappropriate Cell Size in
Histogram
• Getting cell size “right” always takes more
than one attempt
Frequency
Frequency
– If too large, all points in same cell
– If too small, lacks smoothness
0-2
27
2-4
4-6
6-8 8-10
0-6
6-10
Same data. Left is “normal” and
right is “exponential”
Using Broken Scales in Column
Charts
• By breaking scale in middle, can exaggerate
differences
– May be trivial, but then looks significant
– Similar to “zero origin” problem
System A-F
28
System A-F
Outline
• Types of Variables
• Guidelines
• Common Mistakes
• Pictorial Games
• Special Purpose Charts
• Decision Maker’s Games
• Ratio Games
29
Scatter Plot (1 of 2)
• Useful in statistical analysis
• Also excellent for huge quantities of data
– Can show patterns otherwise invisible
– (Another example next)
20
15
10
5
0
0
5
10
30
(Geoff Kuenning, 1998)
Scatter Plot (2 of 2)
31
Gantt Charts
• A type of bar chart that illustrates a
•
project schedule.
Want mix of jobs with significant overlap
– Show with Gantt Chart
• In general, represents Boolean condition …
on or off. Length of lines represent busy.
CPU
60
20
I/O
Network
32
30
10
(Example 10.1
Page 151
Next)
20
5
15
33
A Gantt chart showing three kinds of schedule
dependencies (in red) and percent complete indications.
Download