MTAT.03.243 Software Engineering Management
Lecture 12: SPI & Empirical Methods - Part B
Dietmar Pfahl, Spring 2013
email: dietmar.pfahl@ut.ee
MTAT.03.243 / Lecture 12 / © Dietmar Pfahl 2013

Structure of Lecture 12
• Feedback on Project
• SPC
• Six-Sigma
• Notes on Experimental Design
• Exercise
• Homework 4
• Literature

Your feedback is appreciated!
• Please fill in the questionnaire
• 10 min

Basics of Statistical Process Control
• Statistical Process Control (SPC)
– monitoring production process to detect and prevent poor quality
• Sample
– subset of items produced to use for inspection
• Control Charts
– process is within statistical control limits
[Figure: control chart with upper and lower control limits, UCL and LCL]

Variability
• Random
– common causes
– inherent in a process
– can be eliminated only through improvements in the system
• Non-Random
– special causes
– due to identifiable factors
– can be modified through operator or management action

Statistical Process Control
• Understanding the process,
• Understanding the causes of variation, and
• Elimination of the sources of special cause variation.

Usage of control charts
1. Select process
2. Identify product or process characteristics that describe process performance
3. Select the appropriate type of control chart
4. Measure process performance over a period of time
5. Use appropriate calculations based on measurement data to determine center lines and control limits for performance characteristics
6. Plot measurement data on control charts
7. Are all measured values within limits and distributed randomly around centerlines?
8. If yes: process is stable; continue measuring
9. If no: process is not stable
10. Identify and remove assignable causes
Source: Florac & Carleton (1999)

Control Chart Patterns
• 8 consecutive points on one side of the center line
• 8 consecutive points up or down across zones
• 14 points alternating up or down
• 2 out of 3 consecutive points in zone C but still inside the control limits
• 4 out of 5 consecutive points in zone C or B

Detecting out-of-control situations
Source: Western Electric (1958)

Common questions for investigating an out-of-control process (1)
• Are there differences in the measurement accuracy of instruments/methods used?
• Are there differences in the methods used by different personnel?
• Is the process affected by the environment?
• Has there been a significant change in the environment?
• Is the process affected by predictable conditions?
– Example: tool wear.
• Were any untrained personnel involved in the process at the time?

Common questions for investigating an out-of-control process (2)
• Has there been a change in the source for input to the process?
– Example: plans, specs, information.
• Is the process affected by employee fatigue?
• Has there been a change in policies or procedures?
– Example: maintenance procedures.
• Is the process adjusted frequently?
• Did the samples come from different parts of the process? Shifts? Individuals?
• Are employees afraid to report “bad news”?
One should treat each “Yes” answer as a potential source of a special cause.

Ishikawa Chart
Example: Change Request Process
• Effect: It takes too long to process software change requests
• Cause categories: Collection, Evaluation, Resolution, Closure
• Causes shown in the chart:
– problem reports not logged in properly
– information missing from problem reports
– cannot isolate software artifact(s) containing the problem
– cannot determine what needs to be done to fix the problem
– cannot replicate problem
– change control board meets only once a week
– delays in approving changes
– change decisions not released in a timely manner
– takes time to make changes
– must reconfigure baselines
– delays in shipping changes and releases
– delays en-route
Source: SENG 511, © G. Ruhe 2012

Type of Chart depends on Type of Measures
• Attribute
– a product characteristic that can be evaluated with a discrete response
– good - bad; yes - no
• Variable
– a product characteristic that is continuous and can be measured
– e.g., complexity, length
• Types of charts
– Attributes
• p-chart
• c-chart
– Variables
• x-bar-chart (means)
• R-chart (range)

Control Charts for Attributes
• p-charts use the proportion defective in a sample
• c-charts use the number of defects in an item

p-Chart
$UCL = \bar{p} + z\,\sigma_p$
$LCL = \bar{p} - z\,\sigma_p$
$\sigma_p = \sqrt{\bar{p}(1 - \bar{p}) / n}$
where
z = number of standard deviations from process average
$\bar{p}$ = sample proportion defective; an estimate of process average
$\sigma_p$ = standard deviation of sample proportion
n = sample size

p-Chart Example
20 samples of 100 pairs of jeans

SAMPLE | NUMBER OF DEFECTIVES | PROPORTION DEFECTIVE
-------+----------------------+---------------------
1      | 6                    | .06
2      | 0                    | .00
3      | 4                    | .04
:      | :                    | :
20     | 18                   | .18
Total  | 200                  |

p-Chart Example (cont.)
$\bar{p} = \frac{\text{total defectives}}{\text{total sample observations}} = \frac{200}{20(100)} = 0.10$
$UCL = \bar{p} + z\sqrt{\bar{p}(1 - \bar{p}) / n} = 0.10 + 3\sqrt{0.10(1 - 0.10) / 100} = 0.190$
$LCL = \bar{p} - z\sqrt{\bar{p}(1 - \bar{p}) / n} = 0.10 - 3\sqrt{0.10(1 - 0.10) / 100} = 0.010$

p-Chart Example (cont.)
[Figure: p-chart, proportion defective vs. sample number (1 to 20), with UCL = 0.190, center line $\bar{p}$ = 0.10, LCL = 0.010]

c-Chart
$UCL = \bar{c} + z\,\sigma_c$
$LCL = \bar{c} - z\,\sigma_c$
$\sigma_c = \sqrt{\bar{c}}$
where c = number of defects per sample

c-Chart (cont.)
Number of defects in 15 samples

SAMPLE | NUMBER OF DEFECTS
-------+------------------
1      | 12
2      | 8
3      | 16
:      | :
15     | 15
Total  | 190

$\bar{c} = 190 / 15 = 12.67$
$UCL = \bar{c} + z\,\sigma_c = 12.67 + 3\sqrt{12.67} = 23.35$
$LCL = \bar{c} - z\,\sigma_c = 12.67 - 3\sqrt{12.67} = 1.99$

c-Chart (cont.)
[Figure: c-chart, number of defects vs. sample number, with UCL = 23.35, center line $\bar{c}$ = 12.67, LCL = 1.99]

Control Charts for Variables
• Mean chart (x-bar chart) uses the average of a sample
• Range chart (R-chart) uses the amount of dispersion in a sample

x-bar Chart
$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_k}{k}$
$UCL = \bar{\bar{x}} + A_2 \bar{R}$
$LCL = \bar{\bar{x}} - A_2 \bar{R}$
where $\bar{\bar{x}}$ = average of sample means

x-bar Chart Example (cont.)
[Figure: x-bar chart, sample mean vs. sample number (1 to 10), with UCL = 5.08, center line $\bar{\bar{x}}$ = 5.01, LCL = 4.94]

R-Chart
$UCL = D_4 \bar{R}$
$LCL = D_3 \bar{R}$
$\bar{R} = \frac{\sum R}{k}$
where
R = range of each sample
k = number of samples

R-Chart Example (cont.)
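The control limits of the four chart types above can be recomputed with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names are made up for illustration, z = 3 follows the worked examples, and clamping a negative lower limit to zero is a common convention assumed here. The slides do not state the sample size behind the x-bar and R-chart example, so the constants A2 = 0.577, D3 = 0 and D4 = 2.114 (the standard table values for samples of size 5) are an assumption, chosen because they reproduce the limits shown in the example figures.

```python
import math


def p_chart_limits(p_bar, n, z=3.0):
    """Return (LCL, UCL) of a p-chart for average proportion p_bar and sample size n."""
    sigma_p = math.sqrt(p_bar * (1.0 - p_bar) / n)
    # Assumed convention: a proportion cannot be negative, so clamp the LCL at 0.
    return max(0.0, p_bar - z * sigma_p), p_bar + z * sigma_p


def c_chart_limits(c_bar, z=3.0):
    """Return (LCL, UCL) of a c-chart for an average of c_bar defects per sample."""
    sigma_c = math.sqrt(c_bar)
    return max(0.0, c_bar - z * sigma_c), c_bar + z * sigma_c


def xbar_r_limits(x_dbar, r_bar, a2, d3, d4):
    """Return the (LCL, UCL) pairs of the x-bar chart and the R-chart."""
    return {
        "xbar": (x_dbar - a2 * r_bar, x_dbar + a2 * r_bar),
        "r": (d3 * r_bar, d4 * r_bar),
    }


# p-chart example: 200 defectives in 20 samples of 100 pairs of jeans.
p_lcl, p_ucl = p_chart_limits(200 / (20 * 100), n=100)

# c-chart example: 190 defects in 15 samples. The slide rounds c-bar to 12.67
# before taking the square root, so its limits differ slightly in the last digit.
c_lcl, c_ucl = c_chart_limits(190 / 15)

# x-bar / R-chart example, ASSUMING samples of size 5 (not stated in the slides).
variable_limits = xbar_r_limits(5.01, 0.115, a2=0.577, d3=0.0, d4=2.114)

print(f"p-chart: LCL = {p_lcl:.3f}, UCL = {p_ucl:.3f}")
print(f"c-chart: LCL = {c_lcl:.2f}, UCL = {c_ucl:.2f}")
print("x-bar chart:", variable_limits["xbar"])
print("R-chart:", variable_limits["r"])
```

A point that falls outside the returned interval would be treated as a candidate special cause and investigated with the questions listed earlier.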
[Figure: R-chart, range vs. sample number (1 to 10), with UCL = 0.243, center line $\bar{R}$ = 0.115, LCL = 0]

Required Sample Size
• Attribute charts require larger sample sizes
– 50 to 100 parts in a sample
• Variable charts require smaller samples
– 2 to 10 parts in a sample

Motorola and Six-Sigma
Mikel J. Harry
• Ph.D. – Arizona State University 1984
• M.A. – Ball State University 1981
• B.S. – Ball State University 1973

Six-Sigma
Key:
• σ = standard deviation
• µ = center of the distribution (shifted 1.5σ from its original, on-target location)
• +/-3σ & +/-6σ show the specifications relative to the original target

Six-Sigma
• Conceptually, the sigma level of a process or product is where its customer-driven specifications intersect with its distribution.
• A centered six-sigma process has a normal distribution with mean = target and specifications placed 6 standard deviations to either side of the mean. At this point, the portions of the distribution that are beyond the specifications contain 0.002 ppm of the data (0.001 on each side).
• Practice has shown that most manufacturing processes experience a shift (due to drift over time) of 1.5 standard deviations so that the mean no longer equals target. When this happens in a six-sigma process, a larger portion of the distribution now extends beyond the specification limits: 3.4 ppm.
Source: SEI http://www.sei.cmu.edu/str/descriptions/sigma6_body.html

How to Calculate Six-Sigma?
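The far-right tail probabilities used on this slide can be reproduced from the standard normal distribution. The following is a minimal Python sketch, not part of the original slides: the function names `upper_tail` and `dpmo` are made up for illustration, and the calculation counts only the tail on the side toward which the mean has shifted by 1.5σ, which is the same simplification the slide uses for P(x ≥ 4.5).

```python
import math


def upper_tail(z):
    """P(Z >= z) for a standard normal variable Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def dpmo(sigma_level, shift=1.5):
    """Defects per million opportunities for a given sigma level.

    Assumes the process mean has drifted by `shift` standard deviations
    toward one specification limit and ignores the (much smaller) far tail.
    """
    return upper_tail(sigma_level - shift) * 1_000_000


for level in (2, 4, 6):
    print(f"{level} sigma: about {dpmo(level):.1f} defects per million opportunities")
```

With the default shift of 1.5, the six-sigma case evaluates the tail at z = 4.5, the four-sigma case at z = 2.5, and the two-sigma case at z = 0.5.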
Far Right Tail Probabilities

 Z    P{Z to oo}  |  Z    P{Z to oo}  |  Z    P{Z to oo}   |  Z    P{Z to oo}
------------------+-------------------+--------------------+------------------
 2.0  0.02275     |  3.0  0.001350    |  4.0  0.00003167   |  5.0  2.867 E-7
 2.1  0.01786     |  3.1  0.0009676   |  4.1  0.00002066   |  5.5  1.899 E-8
 2.2  0.01390     |  3.2  0.0006871   |  4.2  0.00001335   |  6.0  9.866 E-10
 2.3  0.01072     |  3.3  0.0004834   |  4.3  0.00000854   |  6.5  4.016 E-11
 2.4  0.00820     |  3.4  0.0003369   |  4.4  0.000005413  |  7.0  1.280 E-12
 2.5  0.00621     |  3.5  0.0002326   |  4.5  0.000003398  |  7.5  3.191 E-14
 2.6  0.004661    |  3.6  0.0001591   |  4.6  0.000002112  |  8.0  6.221 E-16
 2.7  0.003467    |  3.7  0.0001078   |  4.7  0.000001300  |  8.5  9.480 E-18
 2.8  0.002555    |  3.8  0.00007235  |  4.8  7.933 E-7    |  9.0  1.129 E-19
 2.9  0.001866    |  3.9  0.00004810  |  4.9  4.792 E-7    |  9.5  1.049 E-21

Six-Sigma: P(x ≥ (6 - 1.5)) = P(x ≥ 4.5) = 0.000003398 = 3.398 / 1,000,000

Six-Sigma and the ±1.5σ Shift
• A run chart depicting a +1.5σ drift in a 6σ process. USL and LSL are the upper and lower specification limits and UNL and LNL are the upper and lower natural tolerance limits.

Six-Sigma
• Average industry in the US runs at four sigma, which corresponds to 6210 defects per million opportunities. Depending on the exact definition of "defect" in payroll processing, for example, this sigma level could be interpreted as 6 out of every 1000 paychecks having an error.
– As "four sigma" is the average current performance, there are industry sectors running above and below this value.
• Internal Revenue Service (IRS) phone-in tax advice, for instance, runs at roughly two sigma, which corresponds to 308,537 errors per million opportunities. Again, depending on the exact definition of defect, this could be interpreted as 30 out of 100 phone calls resulting in erroneous tax advice. ("Two Sigma" performance is where many noncompetitive companies run.)
• On the other extreme, domestic (U.S.) airline flight fatality rates run at better than six sigma, which could be interpreted as fewer than 3.4 fatalities per million passengers - that is, fewer than 0.00034 fatalities per 100 passengers [Harry 00], [Bylinsky 98], [Harrold 99].

Six-Sigma Assumptions:
• Normal Distribution
• Process Mean Shift of 1.5σ from Nominal is Likely
• Process Mean and Standard Deviation are known
• Defects are randomly distributed throughout units
• Parts and Process Steps are Independent

Experimental Designs
http://www.socialresearchmethods.net/kb/design.php
Group = Set of “experimental units” (subjects)

Experimental Designs (cont’d)
• One-Group designs (within-group):
– Post-Test: X O
– Pre-Test and Post-Test: O X O
– Interrupted time-series: O O X O O O X O X O …
With:
O = observation (measurement)
X = treatment (intervention)
• Multiple-Group designs (between-groups):
– With or without random sampling / assignment
– With or without blocking
– Balanced or unbalanced
– Factorial Designs:
• nested vs. crossed
• interaction between factors

Experimental Designs: Random Assignment /1
• Definition [Pfl94]:
– Randomization is the random assignment of subjects to groups or of treatments to experimental units, so that we can assume independence (and thus validity) of results.
• Rationale for Randomization [Pfl94]:
– Sometimes the results of an experimental treatment can be affected by the time, the place or unknown characteristics of the participants (= experimental units / subjects)
– These uncontrollable factors can have effects that hide or skew the results of the controllable variables.
– To spread and diffuse the effects of these uncontrollable or unknown factors, you can
• assign the order of treatments randomly,
• assign the participants to each treatment randomly, or
• assign the location of each treatment randomly,
whenever possible.

Experimental Designs: Random Assignment /2
Randomization is a prerequisite for a controlled experiment!

Experimental Designs: Blocking /1
• Definition [Pfl94]:
– Blocking (Stratification) means allocating experimental units to blocks (strata) or groups so the units within a block are relatively homogeneous.
• Rationale for Blocking [Pfl94]:
– The blocked design captures the anticipated variation in the blocks by grouping like varieties, so that the variation does not contribute to the experimental error.

Experimental Designs: Blocking /2
[Figure: blocks X, Y, Z with treatments A and B assigned within each block]
• Example [Pfl94]:
– Suppose you are investigating the comparative effects of two design techniques A and B on the quality of the resulting code.
– The experiment involves teaching the techniques to twelve developers and measuring the number of defects found per thousand lines of code to assess the code quality.
– It may be the case that the twelve developers graduated from three universities. It is possible that the universities trained the developers in very different ways, so that the effect of being from a particular university can affect the way in which the design technique is understood or used.
– To eliminate this possibility, three blocks can be defined so that the first block contains all developers from university X, the second block from university Y, and the third block from university Z. Then, the treatments are assigned at random to the developers from each block.
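The blocked random assignment described in this example can be sketched in Python as follows. This is an illustrative sketch only: the function name `blocked_assignment`, the developer identifiers, and the block sizes (6, 4 and 2 developers) are hypothetical, since the slides give only the total of twelve developers and three universities. Within each block the developers are shuffled and the treatments are then dealt out in turn, which keeps every block balanced between A and B.

```python
import random


def blocked_assignment(blocks, treatments=("A", "B"), seed=None):
    """Randomly assign treatments to subjects separately within each block.

    blocks maps a block name (here: a university) to a list of subject ids.
    Returns a dict mapping subject id -> (block name, treatment).
    """
    rng = random.Random(seed)
    assignment = {}
    for block, subjects in blocks.items():
        shuffled = list(subjects)
        rng.shuffle(shuffled)  # random order within the block
        for position, subject in enumerate(shuffled):
            # Deal treatments in turn so each block stays balanced.
            assignment[subject] = (block, treatments[position % len(treatments)])
    return assignment


# Hypothetical data: twelve developers from universities X, Y and Z.
developers = {
    "X": [f"dev{i:02d}" for i in range(1, 7)],   # six developers
    "Y": [f"dev{i:02d}" for i in range(7, 11)],  # four developers
    "Z": [f"dev{i:02d}" for i in range(11, 13)], # two developers
}

plan = blocked_assignment(developers, seed=2013)
for subject, (block, treatment) in sorted(plan.items()):
    print(subject, block, treatment)
```

Passing a seed only makes the plan reproducible for documentation purposes; which developer receives which technique is still decided by the shuffle rather than by the experimenter.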
If the first block has six developers, you would expect three to be assigned to design method A, and three to method B, for instance.

Experimental Designs: Blocking /3
[Figure: outcome distributions with blocking and without blocking]
Less variance increases statistical power (for the same mean difference)

Experimental Designs: Balancing
• Definition [Pfl94]:
– Balancing is the blocking and assigning of treatments so that an equal number of subjects is assigned to each treatment, wherever possible.
• Rationale for Balancing [Pfl94]:
– Balancing is desirable because it simplifies the statistical analysis, but it is not necessary.
– Designs can range from being completely balanced to having little or no balance.
[Figure: unbalanced assignment of treatments A and B within blocks X, Y, Z]

Experimental Designs: Factorial Designs
[Figure: two diagrams relating the levels L1, L2 of Factor 1 to the levels LA, LB, LC, LD of Factor 2]
• Definition of “Factorial Design”:
– The design of an experiment can be expressed by explicitly stating the number of factors and how they relate to the different treatments.
– Expressing the design in terms of factors tells you how many different treatment combinations are required.
• Crossed Design:
– Two factors, F1 and F2, in a design are said to be crossed if each level of each factor appears with each level of the other factor.
• Nested Design:
– Factor F2 is nested within factor F1 if each meaningful level of F2 occurs in conjunction with only one level of factor F1.

Experimental Designs: Interaction Effects
• Example: Measuring time to code a program module with or without using a reusable repository
– Case 1: No interaction between factors
– Case 2: Interaction effect
Effect on Time to Code (Factor 1) depends (also) on Size of Module (Factor 2)
[Figure: plots for Case 1 and Case 2]

Experimental Designs: Crossed vs. Nested Design
[Figure: crossed design with Design Method (A1, A2) against Tool Usage (B1, B2); nested design with Tool B1 Usage (yes / no) under Method A1 and Tool B2 Usage (yes / no) under Method A2; annotation: similar, but not necessarily identical factors]
Factorial Design:
– Useful for investigating one factor with two or more conditions
– Useful for looking at two factors, each with two or more conditions
• Crossing (each level of each factor appears with each level of the other factor)
• Nesting (each level of one factor occurs entirely in conjunction with one level of another factor)
• Proper nested or crossed design may reduce the number of cases to be tested.

Experimental Designs: Design Selection
Flow Chart for selecting an Experimental Design [Pfl95]
• [Pfl95] S. L. Pfleeger: Experimental Design and Analysis in Software Engineering. Annals of Software Engineering, vol. 1, pp. 219-253, 1995.
• Also appeared as: S. L. Pfleeger: Experimental design and analysis in software engineering, Parts 1 to 5, Software Engineering Notes, 1995 and 1996.

Exercise: Assessing the Quality of Reported Experiments
• Checklist
• Paper
• Work individually
– Make sure you take notes on the rationale for your assessment
• After ca. 25 min compare with your neighbour
• Report to class

Homework 4
Assignment
• Work in Pairs
• 2 Phases (A and B)
– Deadline A: Mon, 6 May, 17:00
– Deadline B: Wed, 15 May, 17:00
• 3 Tasks
– Phase A: Task 1 & 2
– Phase B: Task 3
• Task 1:
– Assess the quality of paper P1 or P1 (pick only one!)
• Task 2:
– Design a controlled experiment (pick one RQ!)
• Task 3:
– Review two designs of your peers

Literature on Empirical Methods in SE
• T. Dybå, B. A. Kitchenham, M. Jørgensen (2004) “Evidence-based Software Engineering for Practitioners”, IEEE Software
• F. Shull, J. Singer and D. I. K. Sjøberg: Advanced Topics in Empirical Software Engineering, Chapter 11, pp. 285-311, Springer London (ISBN-13: 978-1-84800-043-8)
– Chapter: S. Easterbrook et al. (2008) “Selecting Empirical Methods for Software Engineering Research”
• A. Endres and D. Rombach (2003) A Handbook of Software and Systems Engineering – Empirical Observations, Laws and Theories, Addison-Wesley
• S. L. Pfleeger (1995-96) “Experimental design and analysis in software engineering”, Parts 1 to 5, Software Engineering Notes
• H. Robinson, J. Segal, H. Sharp (2007) “Ethnographically-informed empirical studies of software practice”, in Information and Software Technology, 49(6), pp. 540-551
• W. L. Wallace (1971) The Logic of Science in Sociology, New York: Aldine
• R. K. Yin (2002) Case Study Research: Design and Methods, Sage, Thousand Oaks

Next Lecture
• Topic:
– Industry Presentation by Artur Assor (Nortal): "Rebuilding development infrastructure in Nortal" (tentative title)
• For you to do:
– Start working on Homework 4