Looking at Data
Dror Feitelson
Hebrew University

Disclaimer
• No connection to www.lookingatdata.com
• They have neat stuff – recommended
• But we'll just use very simple graphics

The Agenda
To promote the collection, sharing, and use of real data about computer systems, in order to ensure that our research is relevant to real-life situations (as opposed to doing research based on assumptions).

Computer "Science"
• Mathematics = abstract thought
• Engineering = building things
• Science = learning about the world
  – Observation
  – Measurement
  – Experimentation
• The scientific method is also required for the study of complex computer systems (including complexity arising from humans)

Example 1: The Top500 list

The Top500 List
• List of the 500 most powerful supercomputers in the world
• As measured by Linpack
• Started in 1993 by Dongarra, Meuer, Simon, and Strohmaier
• Updated twice a year at www.top500.org
• Contains data about vendors, countries, and machine types
• Egos and politics in the top spots

November 2002 list (top 7):

rank  site                               country  computer         vendor         procs  Rmax [Tflop/s]  Rpeak [Tflop/s]
1     Earth Simulator Center             JP       Earth Simulator  NEC             5120  35.9            40.9
2     LANL                               USA      ASCI Q           HP              4096   7.73           10.2
3     LANL                               USA      ASCI Q           HP              4096   7.73           10.2
4     LLNL                               USA      ASCI White       IBM             8192   7.23           12.3
5     LLNL                               USA      MCR cluster      Linux NetworX   2304   5.69           11.1
6     Pittsburgh SC                      USA      AlphaServer      HP              3016   4.46            6.03
7     Commissariat à l'énergie atomique  FR       AlphaServer      HP              2560   3.98            5.12

Top500 Evolution: Scalar vs. Vector
[Plot: % of machines, % of processors, and % of Rmax in vector machines, 1993–2007]
• 1993–1998: the number of vector machines plummets – MPPs instead of Crays
• 1998–2003: vector machines stabilize (Earth Simulator, Cray X1)
• 2003–2007: vectors all but disappear – what happened?
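Trends like these are easy to recompute from the published lists. Below is a minimal sketch in Python, assuming a hypothetical CSV export of the lists with columns year, arch, procs, and rmax — the actual download format at www.top500.org differs:

```python
# Sketch: per-year share of vector machines by count, processors, and Rmax.
# Assumes a hypothetical CSV with columns: year, arch, procs, rmax.
import csv
from collections import defaultdict

def vector_share(path):
    totals = defaultdict(lambda: [0, 0, 0.0])   # year -> [machines, procs, rmax]
    vectors = defaultdict(lambda: [0, 0, 0.0])
    with open(path) as f:
        for row in csv.DictReader(f):
            year = int(row["year"])
            vals = (1, int(row["procs"]), float(row["rmax"]))
            accs = [totals[year]]
            if row["arch"] == "vector":        # tally vector machines separately
                accs.append(vectors[year])
            for acc in accs:
                for i, v in enumerate(vals):
                    acc[i] += v
    return {y: [100.0 * v / t if t else 0.0
                for v, t in zip(vectors[y], totals[y])]
            for y in sorted(totals)}

for year, (m, p, r) in vector_share("top500.csv").items():
    print(f"{year}: {m:5.1f}% machines  {p:5.1f}% processors  {r:5.1f}% Rmax")
```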
Top500 Evolution: Parallelism
[Plot: processor counts of the top-ranked, largest, and smallest machines on the list, 1993–2005, log scale]
• Most attention is typically given to the largest machines
• But let's focus on the smallest ones: we need more and more processors to stay on the list
• The count needed doubles every 18 months for vector machines, but only every 2–3 years for microprocessors
• Implication: microprocessors are improving faster per processor, and in 2008 they finally closed the performance gap

Historical Perspective
[Figure from a 1994 report]

Top500 Evolution: Parallelism
• Needing more and more processors to stay on the list means that performance grows faster than Moore's law
• Since 2003 the slope has increased, due to the slowing of microprocessor improvements
• BTW: the largest machines stayed flat for 7 years, while everything else grew exponentially
• Implication: this indicates difficulty in usage and control

Example 1: The Top500 list
Example 2: Parallel workload patterns

Parallel Workloads Archive
• All large-scale supercomputers maintain accounting logs
• Data includes job arrival, queue time, runtime, processors, user, and more
• Many are willing to share them (and shame on those who are not)
• Collection at www.cs.huji.ac.il/labs/parallel/workload/
• Uses a standard format to ease use

NASA iPSC/860 trace (excerpt):

user      cmd    procs  runtime [s]  date      time
user8     cmd33      1           31  10/19/93  18:06:10
sysadmin  pwd        1           16  10/19/93  18:06:57
sysadmin  pwd        1            5  10/19/93  18:08:27
intel0    cmd11     64          165  10/19/93  18:11:36
user2     cmd2       1           19  10/19/93  18:11:59
user2     cmd2       1           11  10/19/93  18:12:28
user2     nsh        0           10  10/19/93  18:16:23
user2     cmd1      32         2482  10/19/93  18:16:37

Parallelism Assumptions
• Large machines have thousands of processors
• They cost many millions of dollars
• So they are expected to be used for large-scale parallel jobs (OK, maybe also a few smaller debug runs)

Parallelism Data
[Histograms: % of jobs by job size on the SDSC SP2, LANL O2K, HPC2N cluster, and SDSC DataStar]
• On all machines 15–50% of jobs are serial
• Also very many small jobs
• Implication: good news – small jobs are easy to pack; bad news – small jobs may block out large jobs
• The majority of jobs use a power of 2 nodes
  – We think in binary
  – No real application requirements
  – Hypercube tradition
• Implication: regardless of the reason, this reduces fragmentation
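Such statistics are easy to extract from the archive's logs. A minimal sketch, assuming the Standard Workload Format (SWF) described on the archive site: 18 whitespace-separated fields per job, with ';' header comment lines. Only the fields used here are named, and "trace.swf" is a placeholder file name:

```python
# Sketch: read an SWF log and reproduce the job-size statistics above.
from collections import namedtuple

Job = namedtuple("Job", "submit wait runtime procs user")

def load_swf(path):
    """Read a Standard Workload Format log; ';' lines are header comments."""
    jobs = []
    with open(path) as f:
        for line in f:
            if line.startswith(";") or not line.strip():
                continue
            fld = line.split()   # 18 whitespace-separated fields per job
            jobs.append(Job(submit=float(fld[1]), wait=float(fld[2]),
                            runtime=float(fld[3]), procs=int(fld[4]),
                            user=int(fld[11])))
    return jobs

def size_stats(jobs):
    """Fraction of serial and power-of-two sized jobs, as in the histograms."""
    n = len(jobs)
    serial = sum(1 for j in jobs if j.procs == 1)
    pow2 = sum(1 for j in jobs if j.procs > 0 and j.procs & (j.procs - 1) == 0)
    print(f"serial jobs:     {100 * serial / n:.1f}%")
    print(f"power-of-2 jobs: {100 * pow2 / n:.1f}%")

size_stats(load_swf("trace.swf"))   # placeholder file name
```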
Size-Runtime Correlation
• Parallel jobs require resources in two dimensions:
  – A number of processors
  – For a duration of time
• Assuming the parallelism is used for speedup, we can expect large jobs to run for less time
• Important for scheduling, because job size is known in advance
• Potential implication: scheduling large jobs first also schedules short jobs first!

Size-Runtime Correlation Data
[Scatter plot: runtime [s] vs. job size, log-log, SDSC Paragon]

System         CC
LANL CM-5      0.178
SDSC Paragon   0.280
CTC SP2        0.057
SDSC SP2       0.146
LANL O2K      -0.096
SDSC Blue      0.121
HPC2N cluster -0.046
SDSC DataStar -0.012

"Distributional" Correlation
• Partition the jobs into two groups based on size
  – Small jobs (less than the median)
  – Large jobs (more than the median)
• Find the distribution of runtimes for each group
• Measure the fraction of the support where one distribution dominates the other

"Distributional" Correlation Data
[CDFs of runtime for small vs. large jobs on the SDSC SP2 and SDSC DataStar]

System         distCC
LANL CM-5      0.986
SDSC Paragon   0.990
CTC SP2        0.892
SDSC SP2      -0.208
SDSC DataStar  0.962
LANL O2K      -0.872
SDSC Blue      0.993
HPC2N cluster -0.173

• Implication: large jobs first ≠ short jobs first (maybe even long jobs first)
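As a concrete reading of the CC column, here is a sketch that computes the size-runtime correlation, reusing the Job records and the load_swf() helper from the SWF sketch above. The slides do not say whether the correlation was computed on raw or log values; the log version is shown as one plausible choice, given that both sizes and runtimes span several orders of magnitude:

```python
# Sketch: Pearson correlation between log(size) and log(runtime).
import math

def size_runtime_cc(jobs):
    pts = [(math.log(j.procs), math.log(j.runtime))
           for j in jobs if j.procs > 0 and j.runtime > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n          # mean of log sizes
    my = sum(y for _, y in pts) / n          # mean of log runtimes
    cov = sum((x - mx) * (y - my) for x, y in pts)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pts))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pts))
    return cov / (sx * sy)

print(f"CC = {size_runtime_cc(load_swf('trace.swf')):.3f}")
```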
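And a sketch of the distributional measure as described on the slide: split the jobs at the median size, and see how much of the runtime axis has one group's CDF above the other's. The evaluation grid and sign convention are my own illustration choices, not necessarily those behind the distCC numbers:

```python
# Sketch: fraction of the (log) runtime axis where the small jobs' runtime
# CDF dominates the large jobs' CDF, minus the opposite fraction.
def dist_cc(jobs, grid=200):
    jobs = [j for j in jobs if j.runtime > 0]
    sizes = sorted(j.procs for j in jobs)
    median = sizes[len(sizes) // 2]
    small = [j.runtime for j in jobs if j.procs <= median]
    large = [j.runtime for j in jobs if j.procs > median]

    def cdf(sample, x):
        # fraction of sample values <= x (linear scan; fine for a sketch)
        return sum(1 for v in sample if v <= x) / len(sample)

    lo, hi = min(small + large), max(small + large)
    above = below = 0
    for i in range(grid):
        x = lo * (hi / lo) ** (i / (grid - 1))   # log-spaced evaluation point
        d = cdf(small, x) - cdf(large, x)
        if d > 0:
            above += 1    # small jobs tend to be shorter here
        elif d < 0:
            below += 1
    return (above - below) / grid
```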
Example 1: The Top500 list
Example 2: Parallel workload patterns
Example 3: "Dirty" data

Beware Dirty Data
• Looking at data is important
• But is all data worth looking at?
  – Errors in data recording
  – Evolution and non-stationarity
  – Diversity between different sources
  – Multi-class mixtures
  – Abnormal activity
• Need to select a relevant data source
• Need to clean dirty data

Abnormality Example
[Plot: jobs per week on the HPC2N cluster, 28/07/2002 to 21/08/2005, user 2 vs. the 257 other users]
• Some users are much more active than others
• So much so that they single-handedly affect workload statistics
  – Job arrivals (more of them)
  – Job sizes (modal?)
• Implication: we may be optimizing for user 2, who is probably not generally representative

Workload Flurries
• Bursts of activity by a single user
  – Lots of jobs
  – All these jobs are small
  – All of them have similar characteristics
• Limited duration (a day to weeks)
• Flurry jobs may be affected as a group, leading to potential instability (butterfly effect)
• This is a problem with evaluation methodology more than with real systems
[Plots: jobs per week on the SDSC SP2 (user 374 vs. 427 others, 05/10/1998 to 04/09/2000) and the CTC SP2 (user 135 vs. 678 others, 07/07/1996 to 25/05/1997)]

Instability Example
[Plot: average bounded slowdown vs. offered load (0.5 to 1) on the CTC SP2, with and without user 135]
• Simulate scheduling of parallel jobs with the EASY scheduler
• Use the CTC SP2 trace as the input workload
• Change the load by systematically modifying inter-arrival times
• This leads to erratic behavior
• Removing a flurry by user 135 solves the problem
• Implication: using dirty data may lead to erroneous evaluation results
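The load-modification step is the usual trick of rescaling inter-arrival times, sketched below with the Job records from earlier; the exact procedure used in the experiment may differ. The next example questions this very operation:

```python
# Sketch: vary the offered load by rescaling inter-arrival times.
def scale_load(jobs, factor):
    """Divide all inter-arrival times by `factor`: factor > 1 compresses
    arrivals (raises offered load), factor < 1 stretches them (lowers it)."""
    jobs = sorted(jobs, key=lambda j: j.submit)
    scaled, t, prev = [], 0.0, jobs[0].submit
    for j in jobs:
        t += (j.submit - prev) / factor
        prev = j.submit
        scaled.append(j._replace(submit=t))   # same job, shifted submit time
    return scaled
```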
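Identifying a flurry can be as simple as looking for weeks dominated by a single user. A rough detector in the spirit of the slides, with thresholds that are arbitrary illustration choices rather than the author's criterion:

```python
# Sketch: flag weeks in which one user submitted more than `share` of all jobs.
from collections import Counter

WEEK = 7 * 24 * 3600   # seconds; SWF submit times are in seconds

def find_flurries(jobs, share=0.5, min_jobs=500):
    weeks = {}
    for j in jobs:
        weeks.setdefault(int(j.submit) // WEEK, []).append(j.user)
    flagged = []
    for week, users in sorted(weeks.items()):
        user, count = Counter(users).most_common(1)[0]
        if len(users) >= min_jobs and count / len(users) >= share:
            flagged.append((week, user, count, len(users)))
    return flagged

# Cleaning then means filtering out the flagged users' jobs in those weeks.
```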
Example 1: The Top500 list
Example 2: Parallel workload patterns
Example 3: "Dirty" data
Example 4: User behavior

Independence vs. Feedback
• Modifying the offered load by changing inter-arrival times assumes an open system model
  – A large user population that is insensitive to system performance
  – Jobs are independent of each other
• But real systems are often closed
  – Limited user population
  – New jobs are submitted after previous ones terminate
• This leads to feedback from system performance to workload generation

Evidence for Feedback
[Plots: jobs submitted vs. average node-seconds on the SDSC SP2, SDSC Paragon, CTC SP2, and HPC2N cluster]
• Implication: jobs are not independent, so modifying inter-arrivals is problematic

The Mechanics of Feedback
• If users perceive the system as loaded, they will submit fewer jobs
• But what exactly do users care about?
  – Response time: how long they wait for results
  – Slowdown: how much longer than expected
• An answer is needed to create a user model that will react correctly to load conditions

Data Mining
• Available data: the system accounting log
• Need to assess user reaction to momentary conditions
• The idea: associate the user's think time with the performance of the previous job
  – Good performance → satisfied user → continue work session → short think time
  – Bad performance → dissatisfied user → go home → long think time
• "Performance" = response time or slowdown

The Data
[Scatter plots: think time [s] vs. response time [s], and think time [s] vs. slowdown, log scales]
• Implication: response time is a much better predictor of user behavior

Predictability = Locality
• Predicting the future is good
  – Avoid the constraints of on-line algorithms
  – Approximate the performance of off-line algorithms
  – Ability to plan ahead
• Implies a correlation between events
• Application behavior is characterized by locality of reference
• User behavior is characterized by locality of sampling

Locality of Sampling
[CDFs of runtime on the SDSC Paragon: the whole year vs. the weeks 1-7/2/95, 15-21/4/95, and 13-19/9/95]
• Workload attributes are modeled by a marginal distribution
• But at different times the distributions may be quite distinct
• Thus the situation changes with time
• Implication: the notion that more data is better is problematic
• Implication: the assumption of stationarity is problematic
• Implication: locality is required to evaluate adaptive systems

Example 1: The Top500 list
Example 2: Parallel workload patterns
Example 3: "Dirty" data
Example 4: User behavior
Example 5: Mass-count disparity

Variability in Workloads
• Changing conditions
  – Locality of sampling
  – Variability between different periods
• Heavy-tailed distributions
  – Unique "high weight" samples
  – Samples may be so big that they dominate the workload

File Sizes Example
[CDFs: % of files and % of bytes vs. file size, log scale from 1 byte to ~1 GB]
• USENET survey by Gordon Irlam in 1993
• The distribution of file sizes is concentrated around several KB
• The distribution of disk space is spread over many MB
• This is mass-count disparity
• Joint ratio of 11/89: 89% of files have 11% of the bytes, while the other 11% of files have 89% of the bytes (a generalization of the 20/80 and 10/90 principles)
• 0/50 rule: 50% of the files have 0% of the bytes, and 50% of the bytes belong to 0% of the files
• Implication: optimizing the storage of small files is not needed

Locality of Reference
• Spatial locality
  – Some locations are more popular than others
  – Some are much more popular
• Temporal locality
  – References to a location are concentrated in a short span of time

References to Memory Locations
[CDFs of references per location (count and mass), SPEC 2000 twolf]
• Joint ratio: 22/78
• ¾ of the locations get <10 references
• ¾ of the references are to popular locations
• Implication: sampling a random location finds one that is unpopular
• Implication: sampling a random reference finds a popular location

Tomato Soup Computer Science
• When confronted with a problem, we tend to abstract
• Focus on the essentials
• This implies the assumption that we can identify the essentials
• To keep in touch with reality, we need to look at data
• To look at data, we need to collect it and share it

"Few of us escape being indoctrinated with these notions:
• Numerical calculations are exact, but graphs are rough;
• For any particular kind of statistical data there is just one set of calculations constituting a correct statistical analysis;
• Performing intricate calculations is virtuous, whereas actually looking at the data is cheating."
F. J. Anscombe, The American Statistician 27(1), Feb 1973

Looking at data is not cheating. Not looking at data is irresponsible.

Thank You
• Top500 list – Jack Dongarra, Hans Meuer, Horst Simon, and Erich Strohmaier
• Parallel Workloads Archive
  – CTC SP2 – Steven Hotovy and Dan Dwyer
  – SDSC Paragon – Reagan Moore and Allen Downey
  – SDSC SP2 and DataStar – Victor Hazlewood
  – SDSC Blue Horizon – Travis Earheart and Nancy Wilkins-Diehr
  – LANL CM-5 – Curt Canada
  – LANL O2K – Fabrizio Petrini
  – HPC2N cluster – Ake Sandgren and Michael Jack
  – LLNL uBGL – Moe Jette
• Unix files survey – Gordon Irlam
• My students – Dan Tsafrir, Edi Shmueli, Yoav Etsion