CS533 Modeling and Performance Evaluation of Network and Computer Systems The Art of Data Presentation 1 Introduction It’s not what you say, but how you say it. – A. Putt • An analysis whose results cannot be • understood is as good as one that is never performed. General techniques – Line charts, bar charts, pie charts, histograms • Some specific techniques – Gantt charts, Kiviat graphs … • A picture is worth a thousand words – Plus, easier to look at, more interesting 2 Outline • Types of Variables • Guidelines • Common Mistakes • Pictorial Games • Special Purpose Charts • Decision Maker’s Games • Ratio Games 3 • • Types of Variables Qualitative (Categorical) variables – Have states or subclasses – Can be ordered or unordered • Ex: PC, minicomputer, supercomputer ordered • Ex: scientific, engineering, educational unordered Quantitative variables – Numeric levels – Discrete or continuous • Ex: number of processors, disk blocks, etc. is discrete • Ex: weight of a portable computer is continuous Variables Qualitative 4 Ordered Unordered Quantitative Discrete Continuous Outline • Types of Variables • Guidelines • Common Mistakes • Pictorial Games • Special Purpose Charts • Decision Maker’s Games • Ratio Games 5 Guidelines for Good Graphs (1 of 5) • Again, “art” not “rules”. • Learn with experience. Recognize good/bad when see it. Require minimum effort from reader – Perhaps most important metric – Given two, can pick one that takes less reader effort Ex: 6 a b c a b c Legend Box Direct Labeling Addition • Figure curves should be distinguishable in black-white printout – Using color is fine, but must obey the above principle – Light color curve might be too light gray in black-white printout • Yellow, light green, … • Can use clear marks to distinguish 7 Guidelines for Good Graphs (2 of 5) • Maximize information – Make self-sufficient – Key words in place of symbols • Ex: “PIII, 850 MHz” and not “System A” • Ex: “Daily CPU Usage” not “CPU Usage” – Axis labels as informative as possible • Ex: “Response Time in seconds” not “Response Time” – Can help by using captions, too • Ex: “Transaction response time in seconds versus offered load in transactions per second.” 8 Guidelines for Good Graphs (3 of 5) • Minimize ink – Maximize information-to-ink ratio – Too much unnecessary ink makes chart cluttered, hard to read • Ex: no gridlines unless needed to help read – Chart that gives easier-to-read for same data is preferred .1 1 9 Availability •Same data •Unavail = 1 – avail •Right better Unavailability Guidelines for Good Graphs (4 of 5) • Use commonly accepted practices – Present what people expect – Ex: origin at (0,0) – Ex: independent (cause) on x-axis, dependent (effect) on y-axis – Ex: x-axis scale is linear – Ex: increase left to right, bottom to top – Ex: scale divisions equal • Departures are permitted, but require extra effort from reader so use sparingly 10 Guidelines for Good Graphs (5 of 5) • Avoid ambiguity – – – – 11 Show coordinate axes Show origin Identify individual curves and bars Do not plot multiple variables on same chart Guidelines for Good Graphs (Summary) • Checklist in Jain, Box 10.1, p. 143 • The more “yes” answers, the better – But, again, may consciously decide not to follow these guidelines if better without them • In practice, takes several trials before • 12 arriving at “best” graph Want to present the message the most: accurately, simply, concisely, logically Outline • Types of Variables • Guidelines • Common Mistakes • Pictorial Games • Special Purpose Charts • Decision Maker’s Games • Ratio Games 13 Common Mistakes (1 of 6) • Presenting too many alternatives on one • chart Guidelines – More than 5 to 7 messages is too many • (Maybe related to the limit of human shortterm memory?) – – – – 14 Line chart with 6 curves or less Column chart with 10 bars Pie chart with 8 components Each cell in histogram should have 5+ values Common Mistakes (2 of 6) • Presenting many y-variables on a single chart – Better to make separate graphs – Plotting many y-variables saves space, but better to requires reader to figure out relationship throughput 15 utilization Response time • Space constraints for journal/conf! Common Mistakes (3 of 6) • Using symbols in place of text • More difficult to read symbols than text • Reader must flip through report to see symbol mapping to text – Even if “save” writers time, really “wastes” it since reader is likely to skip! 1 job/sec Y=3 Y=5 16 Service rate Y=1 3 jobs/sec 5 jobs/sec Arrival rate Common Mistakes (4 of 6) • Placing extraneous information on the chart – Goal is to convey particular message, so extra information is distracting – Ex: using gridlines only when exact values are expected to be read – Ex: “per-system” data when average data is only part of message required 17 Common Mistakes (5 of 6) • Selecting scale ranges improperly – Most are prepared by automatic programs (excel, gnuplot) with built-in rules • Give good first-guess – But • May include outlying data points, shrinking body • May have endpoints hard to read since on axis • May place too many (or too few) tics – In practice, almost always over-ride scale values 18 Common Mistakes (6 of 6) • Using a Line Chart instead of Column Chart – Lines joining successive points signify that they can be approximately interpolated – If don’t have meaning, should not use line chart MIPS - No linear relationship between processor types! - Instead, use column chart 19 8000 8010 8020 8120 Outline • Types of Variables • Guidelines • Common Mistakes • Pictorial Games • Special Purpose Charts • Decision Maker’s Games • Ratio Games 20 Pictorial Games • Can deceive as easily as can convey meaning • Note, not always a question of bad practice but should be aware of techniques when reading performance evaluation 21 Non-Zero Origins to Emphasize (1 of 2) • Normally, both axes meet at origin • By moving and scaling, can magnify (or reduce!) difference MINE 2610 5200 YOURS MINE YOURS 2600 0 22 Which graph is better? Non-Zero Origins to Emphasize (2 of 2) • Choose scale so that vertical height of highest point is at least ¾ of the horizontal offset of right-most point – Three-quarters rule • (And represent origin as 0,0) MINE 2600 0 23 YOURS Using Double-Whammy Graph • Two curves can have twice as much impact – But if two metrics are related, knowing one predicts other … so use one! Response Time Goodput Number of Users 24 Plotting Quantities without Confidence Intervals • When random quantification, representing mean (or median) alone (or single data point!) not enough (Worse) 25 MINE MINE YOURS YOURS (Better) Pictograms Scaled by Height • If scaling pictograms, do by area not height since eye drawn to area – Ex: twice as good doubling height quadruples area MINE YOURS (Worse) 26 MINE YOURS (Better) Using Inappropriate Cell Size in Histogram • Getting cell size “right” always takes more than one attempt Frequency Frequency – If too large, all points in same cell – If too small, lacks smoothness 0-2 27 2-4 4-6 6-8 8-10 0-6 6-10 Same data. Left is “normal” and right is “exponential” Using Broken Scales in Column Charts • By breaking scale in middle, can exaggerate differences – May be trivial, but then looks significant – Similar to “zero origin” problem System A-F 28 System A-F Outline • Types of Variables • Guidelines • Common Mistakes • Pictorial Games • Special Purpose Charts • Decision Maker’s Games • Ratio Games 29 Scatter Plot (1 of 2) • Useful in statistical analysis • Also excellent for huge quantities of data – Can show patterns otherwise invisible – (Another example next) 20 15 10 5 0 0 5 10 30 (Geoff Kuenning, 1998) Scatter Plot (2 of 2) 31 Gantt Charts • A type of bar chart that illustrates a • project schedule. Want mix of jobs with significant overlap – Show with Gantt Chart • In general, represents Boolean condition … on or off. Length of lines represent busy. CPU 60 20 I/O Network 32 30 10 (Example 10.1 Page 151 Next) 20 5 15 33 A Gantt chart showing three kinds of schedule dependencies (in red) and percent complete indications.