12. Experimental Evaluation
18-749: Fault-Tolerant Distributed Systems
Tudor Dumitraş & Prof. Priya Narasimhan
Carnegie Mellon University
Recommended readings and these lecture slides are available on CMU's Blackboard
Electrical & Computer Engineering

What Are We Going To Do Today?
– Overview of experimental techniques
– Case study: "Fault-Tolerant Middleware and the Magical 1%"
– Experimental requirements for the project

Overview of Experimental Techniques
Basics
– Probability distributions, density functions
– Outlier detection: the 3σ test
Visual representation of data
– Boxplots
– 3D and contour plots
– Multivariate plots
Do's and don'ts of experimental science

Experimental Research
"God has chosen that which is the most simple in hypotheses and the most rich in phenomena [...] But when a rule is extremely complex, that which conforms to it passes for random."
Gottfried Wilhelm Leibniz, Discours de Métaphysique, 1686

Statistical Distributions
If a metric is measured repeatedly, then we can determine its probability distribution function (PDF)
– PDF(x) is the probability that the metric takes the value x
– ∫ from a to b of PDF(x) dx = Pr[a ≤ metric ≤ b]
– Matlab function: ksdensity
Common statistics
– Mean = sum of values / number of measurements (mean)
– Median = half the measured values are below this point (median)
– Mode = the measurement that appears most often in the dataset
– Standard deviation (σ) = how widely spread the data points are (std):
  σ = sqrt( (1/(n−1)) · Σ_{i=1..n} (X_i − X̄)² ), where X_i is a measurement and X̄ is the mean

Statistical Tools
Percentiles
– "The Nth percentile" is a value X such that N% of the measured samples are less than X
– The median is the 50th percentile
– Matlab function: prctile
Outlier detection: the 3σ test
– Any value that is more than 3 standard deviations away from the mean is an outlier
– For example, for latency: Latency_outlier > mean(Latency) + 3σ
– In Matlab: outliers = a(a > mean(a) + 3*std(a))

Basic Plots
Line plot (plot)
– Y-axis is a function of the X-axis values
– Can use error bars to show the standard deviation
– Can also do an area plot to emphasize overhead or the difference between similar metrics
Scatter plot (plot, scatter)
– Determine a relationship between two variables
– Reveal clustering of data
Bar graphs (bar, bar3)
– Compare discrete values
Pie charts (pie, pie3)
– Break a metric down into its constituent components

Boxplots
A "box and whisker" plot describes a probability distribution
– The box represents the inter-quartile range (the difference between the 25th and 75th percentiles of the dataset)
– The whiskers indicate the maximum and minimum values
– The median is also shown
– Matlab function: boxplot
In 1970, the US Congress instituted a random selection process for the military draft
– All 366 possible birth dates were placed in a rotating drum and selected one by one
– The order in which the dates were drawn defined the priority for drafting
– The boxplots show that men born later in the year were more likely to be drafted
From http://lib.stat.cmu.edu/DASL/Stories/DraftLottery.html

Impact of Two Variables
3D plots
– The Z axis is a function of the X and Y values
– Surface plots: mesh, surf
– Scatter plots: plot3, scatter3
– Volume: display the convex hull using convhulln and trisurf
Contour plots
– Represent a function of 2 variables (the X and Y axes)
– Suggest the values of the function through color and annotations
– Display the isolines (variable combinations that yield the same value) of the function
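As a quick illustration of these two plot types, here is a minimal Matlab sketch. The parameter ranges and the latency function are made-up placeholders for illustration only, not data from the case study that follows.

  % Minimal sketch (hypothetical data): visualize a metric that depends on two
  % parameters, e.g. latency as a function of the number of clients and the
  % reply size, as a surface plot and as an annotated contour plot.
  clients   = 1:22;                         % hypothetical X values
  replySize = [16 256 4096 65536];          % hypothetical Y values [bytes]
  [X, Y]    = meshgrid(clients, replySize);
  latency   = 500 + 30*X + 0.01*Y;          % stand-in for measured mean latency

  figure; surf(X, Y, latency);              % Z as a function of X and Y
  xlabel('Clients'); ylabel('Reply size [bytes]'); zlabel('Latency [\mus]');

  figure; [c, h] = contour(X, Y, latency);  % isolines of the same function
  clabel(c, h);                             % annotate the isolines with their values
  xlabel('Clients'); ylabel('Reply size [bytes]');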
Impact of Many Variables
– Multi-variate plot

Do …
Make Results Comparable
– Use the same hardware for all the experiments
– Use the same versions of your software
– Avoid interference from other programs, or make sure you always get the same interference
– Vary one parameter at a time
Make Results Reproducible
– Record and report all the parameters of your experimental setup
– Archive and publish the raw data
Be Rigorous
– Minimize the impact of your monitoring infrastructure
– Report the number of runs
– Report mean values and standard deviations
– Examine the statistical distributions (modes, long tails, etc.)

Don't …
– Forget to label the axes of your figures
– Use different axis limits when comparing results
– Plot mean values without looking at the error margin
(Example plots: latency vs. number of clients, drawn with and without these mistakes.)

FT Middleware and the Magical 1%
– Unpredictability of FT middleware
– Unpredictability limited to 1% of remote invocations
T. Dumitraş and P. Narasimhan. Fault-Tolerant Middleware and the Magical 1%. In ACM/IFIP/USENIX Conference on Middleware, Grenoble, France, Nov.-Dec. 2005.
http://www.ece.cmu.edu/~tdumitra/public_documents/dumitras05magical.pdf

Predictability in FT Middleware Systems
(Diagram: clients and a replicated server; each node runs the application over CORBA, a replicator, group communication, the host OS and the network.)
Faults are inherently unpredictable
What about the fault-free case?

System Configuration for Predictability
Can we configure an FT CORBA system for predictable latency?
Software configuration
– Operating system: RedHat Linux with the TimeSys 3.1 kernel
– Group communication: Spread v. 1.3.1
– Replication: MEAD v. 1.1
– ORB: TAO Real-Time ORB v. 1.4
– Micro-benchmark: 10,000 remote invocations per client
Hardware configuration
– 25 hosts on the Emulab test bed
– Pentium III at 850 MHz
– 100 Mb/s LAN

Experimental Methodology
Parameters varied:
– Replication style: active, warm passive
– Replication degree: 1, 2, 3 replicas
– Number of clients: 1, 4, 7, 10, 13, 16, 19, 22 clients
– Request arrival rates: 0, 0.5, 2, 8, 32 ms client pause
– Sizes of reply messages: 16, 256, 4096, 65536 bytes
Tested all 960 combinations and collected 9.1 GB of data
– Trace available at: www.ece.cmu.edu/~tdumitra/MEAD_trace
Statistical analysis of end-to-end latency:
– Means, medians, standard deviations
– Maximum and minimum values
– 1st, 5th, 95th, 99th percentiles
– Numbers and sizes of the outliers

Example of Unpredictability
– Maximum latency can be several orders of magnitude larger than the average
– The distribution is skewed to the right and has a long tail
– The long tail occurs on only one side because the latency cannot be arbitrarily low
  • MEAD latency is lower-bounded by the CORBA and group communication latency

Systematic Unpredictability
– Average values increase linearly with the number of clients
– Maximum values are unpredictable

Counting the Outliers
– An outlier is a measurement that fails the 3σ test
– In most cases, less than 1% of the measured latencies are outliers
– Outliers originate in various modules of the system:
  • The ORB
  • The group communication
  • The application

The "Magical" 1%
The "haircut" effect of removing 1% of the highest remote latencies

Observable Trends
(Figures: maximum latency and 99th-percentile latency as functions of the request size [bytes] and the request rate [req/s].)
The 99th percentile helps us identify trends in the data
– E.g., latency increases with the request rate and the request size

Interpretation
Predictable maximum latencies are hard to achieve
– Tried to achieve predictability by selecting a good FT CORBA configuration
– Even in the fault-free case, end-to-end latencies have skewed distributions for almost all 960 parameter combinations
– Maximums are several orders of magnitude higher than averages
– Unpredictability cannot be isolated to a single component
Magical 1%: achieving predictability through statistical approaches
– We remove 1% of the highest measured latencies
– The remaining samples have more deterministic properties
  • The 99th percentile helps us identify trends in the data
– This allows us to extract tunable, predictable behavior out of fairly complex, dependable systems

Experimental Evaluation of 18-749 Projects
Requirements for experimental evaluation
– List of client invocations
– Probes
– Graphs
– Tips
– Digging deeper

Requirements for Experimental Evaluation
Things to hand in:
– List of client invocations – the server methods you're going to exercise
– Raw data from the 7 probes in your application
– Graphs of end-to-end latency
– Interpretation of the results
Constraints
– All clients must run on separate machines
– Each client must issue at least 10,000 requests
– All requests must receive a reply (two-way invocations)
– The middle tier must have 2 replicas (e.g., primary & backup)
– Try all 48 combinations of the following:
  • Number of clients: 1, 4, 7, 10
  • Size of reply message: original, 256, 512, 1024 bytes
  • Inter-request time: 0 (no pause), 20, 40 ms
Administrative
– Each team must designate a chief experimenter

List of Client Invocations
METHOD        ONE_WAY   DB_ACCESS   SZ_REQUEST   SZ_REPLY
createObj()   No        Yes         16           4
getInfo()     No        Yes         4            256
deleteObj()   No        Yes         4            4

– METHOD: the name of the remote invocation
– ONE_WAY: is it a one-way invocation (no reply)?
– DB_ACCESS: does it require a DB access (all 3 tiers are involved)?
– SZ_REQUEST: size of the forward message before marshaling (the combined sizes of all the in and inout parameters)
– SZ_REPLY: size of the return message before marshaling (the combined sizes of all the out and inout parameters)

Application Modifications
Use only two-way invocations
– The client must receive a reply from the server for each invocation
– Suggestion: have at least 2 different invocations in your benchmark
Tunable size of replies
– Add a variable-sized parameter that is returned by the server (e.g., sequence<octet>)
– Try the following reply sizes: original, 256 bytes, 512 bytes and 1024 bytes
Inter-request time
– Insert a pause in between requests
– Try the following pauses: 0 (no pause), 20, 40 ms
– CAUTION:
  • sleep(0) inserts a non-zero pause
  • On most Linux kernels, you cannot pause for less than 10 ms
  • For more information: http://www.atl.lmco.com/projects/QoS/RTOS_html/periodic.html

Experiments Make Your Life Meaningful

Stages of an Invocation
(Diagram: a request travels from the client through the replication and middleware layers and the network to the server, and on to the database; the reply returns along the same path. The "in" and "out" points at the application layer are where the probes are placed.)

Data Probes (1 of 7)
Probe P1 – client side, when each request is issued
File name: DATA749_app_out_cli_${STY}_2srv_${C}cli_${IRT}us_${BYT}req_${HOST}_team${N}.txt
Data: time (in µs) when each request is issued
Example: 67605, 69070, 69877, 72807, ...
Legend (used in all the probe file names):
– ${STY}: replication style (ACTIVE or WARM_PASSIVE)
– ${C}: number of clients
– ${IRT}: inter-request time (in µs)
– ${BYT}: reply size (in bytes)
– ${HOST}: hostname
– ${N}: your team number

Data Probes (2 of 7)
Probe P2 – client side, when each reply is received
File name: DATA749_app_in_cli_${STY}_2srv_${C}cli_${IRT}us_${BYT}req_${HOST}_team${N}.txt
Data: time (in µs) when each reply is received
Example: 67605, 69070, 69877, 72807, ...

Data Probes (3 of 7)
Probe P3 – client side, the name of each invocation
File name: DATA749_app_msg_cli_${STY}_2srv_${C}cli_${IRT}us_${BYT}req_${HOST}_team${N}.txt
Data: name of each invocation
Example: createObj(), createObj(), getInfo(), deleteObj(), ...

Data Probes (example)
Example:
  probe1.record(new Long(gettimeofday()));
  remoteFactory.createObj();
  probe2.record(new Long(gettimeofday()));
  probe3.record(new String("createObj()"));

Data Probes (4 of 7)
Probe P4 – server side, when each request is received
File name: DATA749_app_in_srv_${STY}_2srv_${C}cli_${IRT}us_${BYT}req_${HOST}_team${N}.txt
Data: time (in µs) when each request is received
Example: 67605, 69070, 69877, 72807, ...
Data Probes (5 of 7)
Probe P5 – server side, when each reply is completed
File name: DATA749_app_out_srv_${STY}_2srv_${C}cli_${IRT}us_${BYT}req_${HOST}_team${N}.txt
Data: time (in µs) when each reply is completed
Example: 67605, 69070, 69877, 72807, ...

Data Probes (6 of 7)
Probe P6 – server side, the name of each invocation
File name: DATA749_app_msg_srv_${STY}_2srv_${C}cli_${IRT}us_${BYT}req_${HOST}_team${N}.txt
Data: name of each invocation
Example: createObj(), createObj(), getInfo(), deleteObj(), ...

Data Probes (7 of 7)
Probe P7 – server side, the client that sent each invocation
File name: DATA749_app_source_srv_${STY}_2srv_${C}cli_${IRT}us_${BYT}req_${HOST}_team${N}.txt
Data: hostname of the client sending the invocation
Example: black, black, blue, magenta, ...

Probe Invariant
Probes at the same side and same level must have the same number of records!

Computing End-To-End Latency
For request i:
  Latency(i) = P2(i) − P1(i)

Computing the Components of Latency
For request i:
  Server(i) = P5(i) − P4(i)
  Middleware(i) = Latency(i) − Server(i)

Computing the Request Arrival Rate
For request i:
  Req_rate(i) = 10^6 / (P4(i) − P4(i−1))

Computing the Server Throughput
For request i:
  Throughput(i) = Size_reply · 10^6 / (P4(i) − P4(i−1))

Graphs Required
– Line plots of latency for an increasing number of clients and different reply sizes (no pause)
– Area plots of (mean, max) latency and (mean, 99%) latency, sorted by increasing mean values
– Bar graphs of the latency component break-down for outliers and for normal requests
– 3D scatter plots of the impact of reply size and request rate on the max and 99% latency
– Latency vs. throughput
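The latency formulas above translate directly into a few lines of Matlab. This is a minimal sketch: the probe file names and the reply size are hypothetical placeholders (substitute the DATA749_* files produced by your probes and the actual reply size of each run).

  % Minimal sketch (hypothetical file names and reply size): compute the
  % end-to-end latency, its server and middleware components, the request
  % rate and the server throughput from the probe timestamps.
  P1 = load('probe1_client_out.txt');    % request issued by the client [us]
  P2 = load('probe2_client_in.txt');     % reply received by the client [us]
  P4 = load('probe4_server_in.txt');     % request received by the server [us]
  P5 = load('probe5_server_out.txt');    % reply completed by the server [us]

  latency    = P2 - P1;                  % Latency(i)    = P2(i) - P1(i)
  server     = P5 - P4;                  % Server(i)     = P5(i) - P4(i)
  middleware = latency - server;         % Middleware(i) = Latency(i) - Server(i)

  sizeReply  = 256;                      % hypothetical reply size [bytes]
  reqRate    = 1e6 ./ diff(P4);          % requests per second
  throughput = sizeReply * 1e6 ./ diff(P4);   % bytes per second

  % 3-sigma outliers and the 99th percentile, as in the case study
  outliers = latency(latency > mean(latency) + 3*std(latency));
  p99      = prctile(latency, 99);
  fprintf('mean %.0f us, 99th pct %.0f us, %d outliers\n', ...
          mean(latency), p99, numel(outliers));

  figure; plot(latency);                 % quick look at the raw latency trace
  xlabel('Request number'); ylabel('Latency [\mus]');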
Interpretation of Results
Short write-up containing the "lessons learned" from the experiments
What did you learn about your system?
– What can you tell (good or bad) about the performance, dependability and robustness of your application?
– Were the results surprising?
– If you observed some behavior you didn't expect, how can you explain it? What further experiments would be needed to verify your hypothesis?
– Do your results confirm or refute the magical 1% theory?

Tips for Experimental Evaluation
Avoid interference
– Use separate machines for each client, server replica, NamingService/JNDI, FT manager, database, etc.
– Make sure there are no other processes using your CPU or bandwidth
Minimize the impact of monitoring
– Store the data in a pre-allocated memory buffer
– Flush the buffers to disk at the end
– Record timestamps as time from the start of the process
  • Use 4-byte integers (long) for the timestamps
Automate the experimental process as much as possible
– Create scripts for launching the servers and clients, for collecting the data, for analyzing it and for creating the graphs
Use Matlab for graphs and data processing
– It is installed on the ECE cluster and is available to students
  • You can also download it from https://www.cmu.edu/myandrew/
– If you need help with plotting your graphs, please send email to us

Digging Deeper
Do the same thing while injecting faults
Other probes
– CPU usage (time spent in kernel and user mode)
– Memory (total, resident set)
– Bandwidth usage
– Context switches
– Major/minor page faults (page not in physical memory)
Other ways to represent data
– Boxplots for end-to-end latency (see the sketch after the Summary below)
– Impact of varying the number of clients, the reply size and the request rate on the number of outliers, the size of the outliers, the latency, etc.
– Do you see multi-modal distributions (can you explain them)?
Interpretation of results
– Are outliers isolated or do they come in bursts?
– What is the source of the outliers?
– Can you predict anything about the behavior of your system?
– What questions can you answer by looking at this data?

Summary of Lecture
What matters to you?
– What experiments should you run?
– What data should you collect?
– How should you present your data?
– What should you analyze?
– What lessons might you learn about your system?
Email all questions to the course mailing list
– The other two TAs and myself (Tudor) are on this list
– We're happy to sit down and work out the details with you and to help you run your experiments
It might sound like a lot of work, but the hard part is behind you – you've already built your system
– Now it's time to understand what you actually built!
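As a starting point for the "Digging Deeper" suggestions above, here is a minimal Matlab sketch; the probe file names are hypothetical placeholders (use your DATA749_* files).

  % Minimal sketch (hypothetical file names): boxplot and density estimate of
  % the end-to-end latency, useful for spotting skew and multi-modal behavior.
  P1 = load('probe1_client_out.txt');    % request issued by the client [us]
  P2 = load('probe2_client_in.txt');     % reply received by the client [us]
  latency = P2 - P1;

  figure; boxplot(latency);              % box, whiskers, median and outliers
  ylabel('Latency [\mus]');

  [f, x] = ksdensity(latency);           % kernel density estimate of the PDF
  figure; plot(x, f);
  xlabel('Latency [\mus]'); ylabel('Estimated PDF');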