Ask Mike
Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter

Q. I have just done my first regression analysis. A colleague of mine indicates that my analysis is incomplete since I did not test the raw data for normality. Do I need to test the x values and the y values for normality?

The simple answer is NO. What you should be testing for normality are the residuals associated with the fitted model. The general model for simple linear regression is the following expression:

y = a + bx + e

where a is the intercept term, b is the slope term and e is the random error.

There are four basic assumptions associated with regression analysis. These assumptions deal with linearity, independence, constant variance and normality. In other words:

1. The mean residual values (one mean residual value per value of x) lie on a straight line, or the mean y values (one mean y value per value of x) lie on a straight line.
2. The residual values are independent, or the y values are independent.
3. The subpopulations of residual values (one subpopulation per value of x) have the same variance, or the subpopulations of y values (one subpopulation per value of x) have the same variance.
4. For each value of x, the subpopulation of residual values is normally distributed, or for each value of x, the subpopulation of y values is normally distributed.

For simplicity it is usually assumed that the errors have a normal distribution with mean zero and variance s^2. This means that if repeat measurements of y are taken for a particular value of x, then most of them are expected to fall close to the regression line and very few to fall a long way from the line. The assumption of normality is checked using the residuals. You could check the assumption of normality for the y values at a given x value, but this would require multiple observed y values at each x value (and more work for the data analyst).
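As an illustration of checking the residuals rather than the raw data, here is a minimal Python sketch (the data values are made up for the example); it fits a simple linear regression with NumPy and applies the Shapiro-Wilk test from SciPy to the residuals:

    import numpy as np
    from scipy import stats

    # Made-up example data
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
    y = np.array([2.1, 4.3, 5.8, 8.2, 9.9, 12.1, 14.2, 15.8, 18.1, 20.3])

    # Fit y = a + b*x by least squares
    b, a = np.polyfit(x, y, 1)          # slope, intercept
    residuals = y - (a + b * x)         # these are what should be tested for normality

    # Shapiro-Wilk test on the residuals (not on the raw x or y values)
    w_stat, p_value = stats.shapiro(residuals)
    print(f"Shapiro-Wilk W = {w_stat:.3f}, p-value = {p_value:.3f}")
    # A large p-value gives no evidence against the normality assumption.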
Q. When estimating the performance of a process using process capability indices, what is the appropriate sample size for calculating valid Cp and Cpk values?

Process capability is the long-term performance level of the process after it has been brought under statistical control. In other words, process capability is the range over which the natural variation of the process occurs as determined by the system of common causes. The data chosen to estimate the variability of the process should attempt to encompass all natural variations (i.e., raw materials, time of day, changes in ambient conditions, people, etc.). For example, one organization might report a very good process capability value using only ten samples produced on one day, while another organization producing the same commodity might report a somewhat lower process capability number using data from a longer period of time that more closely represents the true process performance. If one were to compare these index numbers when choosing a supplier, the best supplier might not be chosen.

As a rule of thumb, a minimum of 20 subgroups (preferably of sample size at least 4 or 5) should be used to estimate the capability of a process. The number of samples used has a significant influence on the accuracy of the Cpk estimate. For example, for a random sample of size n = 100 drawn from a known normal population with Cpk = 1, the Cpk estimate can vary from about 0.85 to 1.15 (with 95% confidence). Smaller samples will therefore result in even larger variation of the Cpk statistic.

Q. What is meant by validation?

ISO defines validation as confirming that a product or system appropriately meets the intended use. Some people confuse this with verification. Verification is confirming that a product or system meets identified specifications. To some these two words mean the same thing, but there is a distinction: meets intended use vs. meets identified specifications. Verification and validation work together as a sort of "before" (verification) and "after" (validation) proof. Verification answers the question "Are we doing the job right?" while validation answers the question "Are we doing the right job?"

For example, suppose you assemble bicycles for your customers. One of the key characteristics is how tight the chain is. Your customers have a requirement of 70 ft-lbs ± 5 ft-lbs. After assembling the bike, every chain is checked with a torque wrench per the plant's procedure and all the results have been in specification. This is verification. How do you know that the torque wrench is still calibrated? This would be validation. You have data, but how reliable is the data? You are checking the chains with a torque wrench as the procedure indicates, but the procedure should probably indicate that the check be done with a calibrated torque wrench.

Q. When reading books and articles on process capability, the authors often mention short and long term capability. What is the difference between short term and long term capability?

The major difference between short term and long term capability is the type of variation being estimated. Typically, short term variation is estimated by using the within-sample estimate of variation (i.e., the average range or average standard deviation within many subgroups taken over time), while long term variation is estimated by using the between-samples estimate of variation. If the process is truly in control, both of these estimates of variability will be statistically equivalent. One must remember that the within-sample variation for any subgroup represents variation of the process over a very short time frame. The 3-5 samples in any one of the subgroups were taken extremely close in time and thus probably represent only common cause variation – probably the best the process can do. In contrast, especially if the process is not in control, the between-sample variation includes special causes (i.e., drifts, shifts, cycles, etc.) and thus will be larger than the average within-sample variation.
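To make the distinction concrete, here is a small Python sketch (the subgroup values are invented for the illustration) that estimates the short term, within-subgroup standard deviation from the average range using Rbar/d2, and the long term standard deviation from all of the individual values pooled together:

    import numpy as np

    # Invented data: 8 subgroups of size 5 taken over time
    subgroups = np.array([
        [10.1, 10.3,  9.8, 10.0, 10.2],
        [10.4, 10.6, 10.2, 10.5, 10.3],
        [ 9.9, 10.1, 10.0,  9.7, 10.2],
        [10.8, 10.9, 10.7, 11.0, 10.6],
        [10.0, 10.2,  9.9, 10.1, 10.3],
        [10.5, 10.4, 10.6, 10.7, 10.3],
        [ 9.8, 10.0,  9.9, 10.1,  9.7],
        [10.6, 10.8, 10.5, 10.7, 10.9],
    ])

    d2 = 2.326                       # control chart constant for subgroup size n = 5
    r_bar = np.mean(subgroups.max(axis=1) - subgroups.min(axis=1))

    sigma_short = r_bar / d2                       # within-subgroup (short term) estimate
    sigma_long = subgroups.ravel().std(ddof=1)     # overall (long term) estimate

    print(f"short term sigma (Rbar/d2): {sigma_short:.3f}")
    print(f"long term sigma (overall) : {sigma_long:.3f}")
    # For an in-control process the two estimates should be close;
    # drifts or shifts between subgroups inflate the long term estimate.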
Q. Which normality test is the best?

There are several normality tests: Anderson-Darling, Lin-Mudholkar, Shapiro-Wilk, the chi-square goodness-of-fit test, the skewness and kurtosis tests, and many others. Each normality test has different power in its ability to detect nonnormal distributions. Some of the tests are more powerful with small sample sizes and some are more powerful with large sample sizes. One must be cautious in selecting a normality test because there is no one test which is most powerful in all cases.

The Anderson-Darling test can be used to test most departures from normality and is a very powerful test for sample sizes between 6 and 20. It can be used for larger sample sizes but has a tendency to lose power. The test is very sensitive in detecting kurtosis issues.

The Lin-Mudholkar test is a very powerful test for sample sizes between 10 and 50. It can be used for smaller sample sizes but has a tendency to lose power. The test is particularly sensitive in detecting asymmetric (skewed) alternatives to normality.

The Shapiro-Wilk test is a very effective procedure for evaluating the assumption of normality against a wide spectrum of nonnormal alternatives, even if only a relatively small number of observations are collected. For example, if 20 observations are taken from a process that is actually exponentially distributed, the chances are about 80 out of 100 that the null hypothesis (the distribution is normal) will be rejected. The typical range for the Shapiro-Wilk test is 15 to 50 data points.

The chi-square goodness-of-fit test is a very useful procedure if the sample size is over thirty. Large sample sizes (usually greater than 50) are needed since the sample data are sorted into groups.

The skewness and kurtosis tests are good tests if the sample sizes are large. The sample size should be more than 50 for the skewness and kurtosis tests to be effective, since one or two extreme data points could otherwise cause these two tests to reject normality.
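As a quick illustration of running more than one normality test on the same data, the sketch below (sample values invented) applies the Shapiro-Wilk, Anderson-Darling and D'Agostino skewness/kurtosis tests available in SciPy; the Lin-Mudholkar test is not part of SciPy, so it is omitted here:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    data = rng.normal(loc=50.0, scale=2.0, size=30)    # invented sample of 30 values

    w, p_sw = stats.shapiro(data)                      # Shapiro-Wilk
    print(f"Shapiro-Wilk:      W = {w:.3f}, p = {p_sw:.3f}")

    ad = stats.anderson(data, dist='norm')             # Anderson-Darling
    print(f"Anderson-Darling:  A2 = {ad.statistic:.3f}, "
          f"5% critical value = {ad.critical_values[2]:.3f}")

    k2, p_k2 = stats.normaltest(data)                  # D'Agostino K^2 (skewness + kurtosis)
    print(f"D'Agostino K2:     K2 = {k2:.3f}, p = {p_k2:.3f}")
    # Small p-values (or A2 above its critical value) indicate evidence of nonnormality.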
Q. We are having an ongoing discussion at our office about calculating process capability indices. Some of us believe that process capability indices should be calculated no matter what the situation is (i.e., in control, not in control, periods of time when in control, etc.) and for both product characteristics and process variables, and some of us believe that process capability indices should only be calculated when appropriate and only for product characteristics. Which group is correct?

WOW! This question should be a topic/lecture within any course on process capability. Unfortunately it is not. Calculating capability indices (such as the typical ones: Cp, Cpk, Pp and Ppk) is not appropriate or statistically correct for all situations. The intent of process capability indices is to provide a measure of how well the product from the process is meeting the expectations of the customer.

First, one must remember that the typical capability indices should not be calculated for processes that do not meet the basic assumptions (I know all statistical software packages will make the calculations, but that does not mean that the values calculated are meaningful). These basic assumptions are:

(1) the process is in a state of statistical control
(2) the measurements follow a normal distribution
(3) the measurements are independent of each other

Violation of any or all of these assumptions will cause the estimated capability value to be erroneous (i.e., inflated or understated), which could lead to wrong decisions being made about the performance of the process and cause actions to be taken when they should not be, or cause actions not to be taken when they should be. For measurements from a process that behave with a known nonrandom pattern, like a downward or upward trend, you can see that we violate all three assumptions. There may be some type of linear mathematical model that the measurements follow that will allow us to predict what the next measurement may be, but it is not the Shewhart model.

Second, the concept of capability is more for product characteristics than for process variables. Product characteristics usually have targets and tolerances that represent the customers' window of acceptability for their usage of the product. Process variables may have targets and sometimes tolerances, but these tolerances usually represent the engineering window and are used to help minimize product unacceptability. The more important concept here is to keep the process variables in control and use them as "knobs" for adjustment in order to keep the process output in control and capable (that is, meeting the expectations of the customer).

Third, when dealing with capability (that is, using capability indices to estimate capability), we want to assure ourselves that the output from the process is meeting the expectations of the customer and that the process variables are our way of managing the process output. Thus we are more concerned with knowing when the process output goes out of control (or is going out of control), so that we can detect the out-of-control condition quickly, know which process variable needs to be adjusted in order to bring the output back to meeting the customer's expectations, know how much we need to adjust the process variable, and verify that the adjustment to the process variable was effective in resolving the out-of-control or out-of-specification issue.

One must keep in mind that capability indices are probably the most abused statistics in the field of quality. We have a tendency to put too much emphasis on their value. It is similar to the correlation coefficient in regression. Just keep in mind, your organization is not in the business of selling capability indices to your customers but of selling highly reliable parts that meet the expectations of the customer each and every time they are used.

Q. How often should you re-calculate process capability indices for a process?

If the output characteristic of a process is in control, the Cpk values will not change significantly from one set of calculations to the next. There is probably no need to calculate Cpk daily, weekly or monthly; just monitor the control chart. If the output characteristic of a process is not in control (shifts and drifts), then each time a Cpk value is calculated it will probably be different from the previous calculation. There is a need to calculate Cpk on a regular basis due to the special causes, but care must be taken in interpreting the value since one of the major assumptions has been violated.
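The sketch below (data and specification limits invented) shows one way to screen the three assumptions before reporting a Cpk value: a rough limits check against trial individuals-chart limits, a lag-1 autocorrelation check for independence, and a Shapiro-Wilk check for normality. It is only an illustration of the idea, not a substitute for proper control charting:

    import numpy as np
    from scipy import stats

    def screen_then_cpk(x, lsl, usl):
        """Report Cpk only after rough checks of control, independence and normality."""
        x = np.asarray(x, dtype=float)

        # 1. Rough control check: individuals-chart limits from the average moving range
        mr_bar = np.mean(np.abs(np.diff(x)))
        sigma_est = mr_bar / 1.128                   # d2 for moving ranges of size 2
        center = x.mean()
        in_limits = np.all(np.abs(x - center) <= 3 * sigma_est)

        # 2. Rough independence check: lag-1 autocorrelation
        r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
        independent = abs(r1) < 2 / np.sqrt(len(x))  # crude significance screen

        # 3. Normality check on the individual values
        _, p_norm = stats.shapiro(x)
        normal = p_norm > 0.05

        if not (in_limits and independent and normal):
            return None    # the usual Cpk interpretation is not trustworthy here

        return min(usl - center, center - lsl) / (3 * sigma_est)

    rng = np.random.default_rng(7)
    data = rng.normal(100.0, 2.0, size=60)           # invented measurements
    print(screen_then_cpk(data, lsl=94.0, usl=106.0))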
Q. What are cusum control charts?

The cumulative sum (Cusum) control chart is used primarily to maintain control of a process at some specified goal or target. The concept was developed by E. S. Page in 1954. The basic functions of Cusum control charts are similar to those of Shewhart control charts, except that Cusum control charts are more sensitive to small changes in process performance. The distinguishing feature of Cusum control charts is that each plotted point also contains information from all previous observations.

The process performance of a particular quality characteristic is measured by cumulating the differences between a particular statistic, Q, and a given target value, T. The statistic Q can be Xbar, X, R, s, p, c, etc. The cumulative sum after n observations, Sn, is given by:

Sn = Σ (Qi - T), summed over i = 1 to n

The Cusum technique derives its name from the fact that it accumulates successive deviations of the process characteristic from a target value. The major advantages of Cusum control charts are:

- Cusum control charts are good for detecting small changes in the performance of the process quickly. For shifts between 0.5 σ/√n and 2.0 σ/√n, Cusum control charts generally detect the change about 50% faster than Shewhart control charts.
- Cusum control charts use all the observations to detect whether a change in the process performance has occurred. Ordinarily, Shewhart control charts use only the current group of observations to detect whether a change in the process performance has occurred.

Q. What are target or nominal control charts?

Target or nominal control charts for variable data can be used when monitoring the behavior of a single quality characteristic produced by a process running different parts. A quality characteristic may be shared by many different parts, and the characteristic may have a different target value depending upon the part being monitored. This is of particular value in short-run process-control situations. Target charts across these parts are based on constructing centerlines and control limits with transformed data. Before control limits are calculated, each measurement is normalized (coded) by subtracting the target value from the measured value. Target charts are simply standard control charts but using transformed data; the typical transformation is the deviation from target. Target or nominal control charts are used:

- To support process-oriented SPC rather than part-by-part SPC.
- To better display, statistically control, and improve a family of parts.
- To better display, statistically control, and improve a process.
- To reduce the number of control charts needed.
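As a small illustration of the coding step (part names, targets and measurements are all invented), the sketch below subtracts each part's target from its measurements and then computes Xbar-R control limits on the pooled coded subgroups:

    import numpy as np

    # Invented short-run data: subgroups of size 3, each tied to a part with its own target
    targets = {"part_A": 25.0, "part_B": 40.0, "part_C": 12.5}
    subgroups = [
        ("part_A", [25.2, 24.9, 25.1]),
        ("part_B", [40.3, 40.1, 39.8]),
        ("part_A", [24.8, 25.0, 25.3]),
        ("part_C", [12.6, 12.4, 12.5]),
        ("part_B", [39.9, 40.2, 40.4]),
    ]

    # Code each measurement as the deviation from its part's target
    coded = np.array([[v - targets[p] for v in vals] for p, vals in subgroups])

    xbar = coded.mean(axis=1)
    ranges = coded.max(axis=1) - coded.min(axis=1)

    A2, D3, D4 = 1.023, 0.0, 2.574          # control chart constants for n = 3
    xbar_cl = xbar.mean()
    r_cl = ranges.mean()
    print(f"Xbar chart: CL = {xbar_cl:+.3f}, "
          f"UCL = {xbar_cl + A2 * r_cl:+.3f}, LCL = {xbar_cl - A2 * r_cl:+.3f}")
    print(f"R chart:    CL = {r_cl:.3f}, UCL = {D4 * r_cl:.3f}, LCL = {D3 * r_cl:.3f}")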
Q. Can you please give me a good explanation of where the "magical" number 5.15, used as a multiplier of the test error in a gage capability analysis, came from? I know it is supposed to represent 99% of the area under a normal curve, but this is not the value I get when I look up the z value for 99% in my favorite normal distribution table.

You are right that the 5.15 value represents an area of 99%; it is the width of the 99% probability band in z units. I suspect you get z = 2.326 from your table. The value 2.326 leaves 99% of the area under the normal curve to its left, or 1% to its right. If we knew the true value of a particular part and measured it several times with a non-perfect measuring system (a measuring system that has test error greater than zero), we would get some measured values higher than the true value and some lower – basically, the distribution of multiple measurements on the same part will follow a normal distribution. Therefore, we want the confidence band to be symmetrical around the true value – that is, we want the 99% to be distributed equally on both sides of the true value. This means we want 1% outside of the symmetrical band, or 0.5% in each of the two tails. Therefore, when we use a normal distribution table, we want to find the z value for either 0.5% or 99.5%. If we look up 99.5%, we find that the z value equals 2.575. Since we want our band around the true value to represent an area of 99%, we need to double this value. If we do, we get 5.15 – the width of the 99% symmetrical band around the true value, in z units.

I am not sure why 99% was originally chosen, but it has been the standard multiplying factor for several decades, ever since General Motors published the first discussion on gage capability analysis. Today, many organizations are using the multiplier 6 (a 99.7% probability band) rather than 5.15. Actually, the AIAG MSA manual (3rd edition) gives you the option of using 5.15 or 6 as the multiplier. The following table shows what the multiplying factor would be for different probability levels symmetrical around a true value.

Lower z   Area Below Lower z   Upper z   Area Below Upper z   Percent Between z Values   Width of Band
-1.645    0.0500               1.645     0.9500               90.0                       3.29
-1.960    0.0250               1.960     0.9750               95.0                       3.92
-2.250    0.0122               2.250     0.9878               97.6                       4.50
-2.575    0.0050               2.575     0.9950               99.0                       5.15
-3.000    0.0013               3.000     0.9987               99.7                       6.00

It should be noted that awareness of which multiplying factor is being used is critical to the integrity of the conclusions from a gage capability study. This is especially important if a comparison is to be made between two or more organizations on the same type of measurement system.
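The table above can be reproduced directly from the normal distribution; the short sketch below uses SciPy's norm.ppf to compute the z value and the full band width for each coverage percentage:

    from scipy.stats import norm

    for coverage in (0.90, 0.95, 0.976, 0.99, 0.9973):
        tail = (1.0 - coverage) / 2.0          # equal area in each tail
        z = norm.ppf(1.0 - tail)               # upper z value of the symmetric band
        print(f"{coverage:7.2%} band: z = +/-{z:.3f}, width = {2 * z:.2f}")
    # The 99% line gives a width of 5.15; the 99.73% line gives a width of 6.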
Q. What is the difference between key quality characteristics and key process variables?

Key quality characteristics are characteristics of the product or service produced by a process that customers have determined to be important to them. Key quality characteristics are such things as the speed of delivery of a service, the finish on a set of stainless steel shelves, the width of a table top, the precision with which an electronic component is calibrated, or the effectiveness of an administrative response to a tasking by higher authority. Every product or service has multiple key quality characteristics, and these need to be measured and monitored. When you are selecting processes to improve, you need to find the processes, or process steps, that produce the characteristics your customers perceive as important to product quality.

Key process variables are variables that affect the performance of the process and thus the quality of the product. Key process variables are such things as line speed, oven temperature, reaction time, pressure, water pH, etc. Key process variables also need to be measured and monitored. When you are selecting key process variables to measure and control, you need to identify those that have an influence on the key quality characteristics.

Q. I was recently reading an article on continuous improvement and the author mentioned a technique called EVOP. The article provided no information on the technique. Do you know what the technique is, and could you provide me with a brief description of it?

Evolutionary operation (EVOP), which was introduced by G. E. P. Box in 1957, is a technique that can be used to facilitate continuous improvement. Recognizing the problems with large-scale experimentation, the failures of pilot-plant scale-up and the apparent complexity of designed experiments, Dr. Box developed EVOP. EVOP is used today by many chemical companies in their effort to increase their rate of process improvement with only small investments in money and manpower. EVOP forces the plant process to produce information about itself without upsetting production.

EVOP is based upon experimental design concepts but is used differently than the classical design of experiments. With EVOP, only very small changes are made to the settings of the process variables so that the process is not disrupted, and thus there is generally no increase in the percentage of nonconforming units. But there is a difficulty when we make small changes: uncontrollable sources of variation cause the observed results to vary, making it hard to see the effect of the variables in the study. EVOP overcomes this by taking advantage of large-scale production quantities to build sample sizes large enough to detect differences in the response variable. The effect of these small changes upon product performance is noted and the process is shifted to obtain product improvement. The procedure is then continued until the optimum settings for the variables under study are found.

The following contrasts summarize the basic differences between design of experiments (DOX) and evolutionary operation (EVOP):

- DOX requires many experimental runs; EVOP requires few.
- DOX uses large differences between the settings of the levels; EVOP uses small differences.
- DOX finds the best settings quickly; EVOP finds them slowly.
- DOX usually disrupts production; EVOP usually does not.
- DOX produces information, but the product may not be salable; EVOP produces information and the product can be sold.
- With DOX, large differences must be seen before statistical significance can be declared; with EVOP, small differences can be assessed as statistically significant because of the high production volumes.
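To show the flavor of an EVOP analysis (all numbers invented), the sketch below averages the response at the five conditions of a two-variable EVOP pattern over several repeated cycles and computes simple main effects for each variable; in a real application the effects would be compared against an error estimate before moving the process:

    import numpy as np

    # Invented yields (%) observed at the 5 EVOP conditions over 4 repeated cycles.
    # Conditions: center (reference), A-low/B-low, A-high/B-high, A-high/B-low, A-low/B-high
    cycles = np.array([
        [72.1, 71.4, 73.0, 72.6, 71.8],
        [71.8, 71.2, 73.4, 72.9, 71.5],
        [72.3, 71.0, 73.1, 72.7, 71.9],
        [72.0, 71.5, 73.3, 72.8, 71.6],
    ])
    center, ll, hh, hl, lh = cycles.mean(axis=0)   # average each condition over the cycles

    # Simple main effects: average at the high level minus average at the low level
    effect_A = (hh + hl) / 2 - (ll + lh) / 2
    effect_B = (hh + lh) / 2 - (ll + hl) / 2
    print(f"effect of A: {effect_A:+.2f}   effect of B: {effect_B:+.2f}")
    # If an effect is clearly larger than the run-to-run noise, the reference
    # settings are moved a small step in the favorable direction and the cycles repeat.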
Q. I need help with how to chart and evaluate data that we are collecting from a process in which the number of observations per sample can be highly variable. The variable that we track is the width of the cut. We are using an Xbar and Range control chart and the number of cuts varies from 2 to 7. What would be the best way to manage these charts? For ease of chart maintenance we would prefer to have all the data on one chart, but if this is impractical then we could do one chart for each of the different sample sizes.

You do not need to construct charts for each sample size. What you could do is calculate the control limits for each possible sample size (n = 2, 3, ..., 7). You would then draw on the control chart the control limits for the sample size that appears most often. On the side of the chart, or on the back, you would list the control limits for all the possible sample sizes. If the actual sample size for the current sample is different from the sample size used to construct the control limits displayed on the chart, the individual plotting the data would look at this list and make the decision (in control or out of control) against the appropriate control limits for his or her sample size. You should also consider having the individual record on the control chart the actual sample size for each sample.
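Here is a small sketch of the limit table described above (the grand average and average range values are invented); it uses the standard Xbar-R constants to print Xbar and R chart limits for every subgroup size from 2 to 7:

    # Xbar-R control chart constants (A2, D3, D4) indexed by subgroup size n
    FACTORS = {
        2: (1.880, 0.000, 3.267),
        3: (1.023, 0.000, 2.574),
        4: (0.729, 0.000, 2.282),
        5: (0.577, 0.000, 2.114),
        6: (0.483, 0.000, 2.004),
        7: (0.419, 0.076, 1.924),
    }

    xbar_bar = 12.50   # invented grand average of the cut width
    r_bar = 0.42       # invented average range

    print(" n |  Xbar LCL  Xbar UCL |   R LCL    R UCL")
    for n, (A2, D3, D4) in FACTORS.items():
        print(f" {n} | {xbar_bar - A2 * r_bar:9.3f} {xbar_bar + A2 * r_bar:9.3f} |"
              f" {D3 * r_bar:8.3f} {D4 * r_bar:8.3f}")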
Q. Realistically, at some time a control chart being used to monitor a key product characteristic or a key process variable will show a pattern that violates one of the many out-of-control rules. Does this mean that the process is no longer in control?

Assessing whether a process is in control is not as simple as the textbooks imply. There are many documented rules (about 12 of them) for defining when a process is statistically out of control. The interpretation of these rules still requires that common sense be used in deciding what actions, if any, should be taken.

When a process is in a state of statistical control, it is an indication that only random variation is present, that is, all identifiable special causes have been eliminated or corrected for. The presence of special causes in a process is not necessarily a "bad" event; being out of control is not always "bad". Consider the case where an organization has been using a control chart to monitor the scrap rate of a particular process. The scrap rate has been in a state of statistical control at 15% for many months. For the last 8-9 weeks, the scrap rate has been steadily decreasing (exhibiting a downward trend on the control chart – a nonrandom pattern) and the pattern violates one or more of the standard out-of-control rules. Most managers and supervisors would be thrilled with this downward trend. Indeed, this process is out of control – that is, special causes are influencing the performance of the process. In this case (hopefully), the special causes are all the changes being made by the organization to improve the process. These changes should be known and documented. The organization would not want to undo the changes in order to bring the process back to a state of control at 15%.

Defining processes as either in control or out of control is a dichotomous view of process control. Control is really a continuous characteristic that is concerned with reducing variation and preventing nonconformance over time. Therefore, "in control" implies more than simply meeting the "in control" rules associated with control charts. An operational definition that is often used is as follows: A process is said to be in control if the output from the process exhibits only random or common variation on the appropriate control chart, and, when nonrandom or unexpected variations are present, actions are taken to understand the source of the special causes and, when appropriate, to eliminate that source in order to return the output of the process to a state of random variation.

This operational definition requires knowing the cause-and-effect relationships (the relationships between input variables and output variables) that govern the process. When people understand the cause-and-effect relationships of a process, they can quickly find and correct, when appropriate, whatever is making the process behave in an out-of-control state. Process control is about recognizing when special causes are present, deciding whether the effect of the special cause is understood and desirable, and taking action (following the reaction plan) to rectify the influence of the special causes when their effect is determined to be unacceptable. In this situation we can say that the organization is controlling the process even though the process may not be in "pure" statistical control.

Therefore, if our control chart shows an out-of-control situation, will we say the process is out of control? Yes. Will we immediately take action to rectify the out-of-control situation? No. If, after investigation, we conclude that the process is out of control due to actions (special causes) that we have taken to improve the process, we will conclude that we are controlling the process. If, after investigation, we conclude that the process is out of control due to special causes that are not actions we have taken, we will conclude that we are not controlling the process. Walter Shewhart once said: "While every process displays variation, some processes display controlled variation, while others display uncontrolled variation."

Q. I am a recent college graduate and I am currently working as a quality engineer. In school, I took one statistical quality control course. I am under the impression that a process in control is also one that is capable. Some of my co-workers are telling me that I am wrong. If there is a difference between a process in control and a capable process, please explain it to me.

There is a big difference between a process being in control and a process being in control and capable. When a process is in a state of statistical control, it is an indication that only random variation is present, that is, all identifiable special causes have been eliminated or corrected for. An in-control process is one that is predictable, that is, all expected future results will fall randomly between two limits based upon probability. My golf game is in control (unfortunately my golf scores are nowhere in the neighborhood of Tiger Woods's), but it is not capable if I want to play on the PGA tour. A capable process is a process that is not only in statistical control but also one where all the results are meeting the requirements. Since this type of process is in control, then, as long as we keep it in control, all future results will fall randomly between the two probability limits and will continue to meet the requirements.

There are four situations when dealing with the concepts of control and capability:

- a process in control and capable
- a process in control but not capable
- a process not in control but capable
- a process not in control and not capable

To evaluate whether a process is in control, control charts are used. To evaluate whether a process is capable, after achieving control, capability indices are used.
Q. What is EVOP?

Design of experiments is a systematic technique in which process variables known to affect the product are varied over a wide range of operating levels. The objective is to determine the relationship between the response variable and a number of process variables. This is accomplished by making large changes to the settings of the process variables. However, in doing so, the scrap rate or percentage of nonconforming items may increase significantly. This increase in scrap or nonconformance does not make the production group happy.

Evolutionary operation (EVOP) is a technique that can be used to facilitate continuous improvement. It is based upon experimental design concepts but is used differently than the classical design of experiments. With EVOP, only very small changes are made to the settings of the process variables so that the process is not disrupted, and thus there is generally no increase in the percentage of nonconforming units.

The basic principles of EVOP are:

- Make small changes to the process variables so that product quality is not endangered.
- Make changes to the process variables in a set pattern and repeat this pattern several times.
- Evaluate the effects of the small changes by grouping data and comparing the averaged results.
- Interpret the results for significance.
- Move the process settings (if the results are significant) in the direction with the best results.
- Repeat the procedure at the new settings.

EVOP uses planned runs that are repeated over and over. An EVOP design generally involves two levels of each variable being evaluated plus a center point. The center point denotes the reference condition; at the beginning of the process study, this reference condition represents the current process settings for the variables being evaluated. When two independent variables are being evaluated at two levels per variable, there are five experimental conditions:

- Reference settings for variable A and variable B
- Variable A low and variable B low
- Variable A high and variable B high
- Variable A high and variable B low
- Variable A low and variable B high

For those processes for which EVOP is applicable, EVOP can lead to some important benefits:

- product improvement
- better understanding of the process
- increased awareness of the process
- a sense of involvement with process performance by operating personnel

A minor disadvantage of EVOP is that its implementation costs time and money: training personnel, keeping and analyzing simple data, making process changes, the number of repeats needed to see a significant change, etc.
Q. I am new to the field of quality and would like to know: what is the difference between ANOVA and design of experiments?

Design of experiments is an advanced statistical tool for planning experiments, while ANOVA (analysis of variance) is a statistical test used to test a null hypothesis. In basic statistics, this is similar to the relationship between hypothesis testing and the t test: hypothesis testing is the general framework, while the t test is one of many statistical tests used to test a null hypothesis.

Design of experiments is a systematic approach for sorting out the important variables, or combinations of variables, that influence a system. The technique allows several variables to be evaluated at the same time. A process is some combination of machines, materials, methods, measurements, people and environment which, used together, forms a service, produces a product or completes a task. Thus, designed experiments are a scientific method which allows the experimenter to better understand a process or system and how the inputs affect the outputs or responses.

Analysis of variance (ANOVA), an advanced statistical test, is used to determine whether or not significant differences exist between several means. Basically, the analysis of variance technique extends the concept of the t test to more than two means while holding the pre-selected level of significance constant. The use of ANOVA results in an ANOVA table. The analysis of variance technique is based on two principles:

- Partitioning the total variability of the process into its components (the process variables selected for the study).
- Estimating the inherent variability of the population by two methods and comparing the two estimates. If the two estimates are close, there is no significant difference among the means. If the two estimates are not close, there is a significant difference among the means.

Q. As a follow-up to the above question, what is an ANOVA table?

The ANOVA table is output from an analysis of variance calculation used in design of experiments and regression. This table shows the source of variation, the sum of squares (SS), the degrees of freedom (df), the mean squares (MS), the F ratio and the significance level. An example of a very simple ANOVA table, for a completely randomized design of experiment, is illustrated below.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio   Level of Significance
Between Groups        70               3                    23.3          6.9       0.003
Within Groups         54               16                   3.4
Total                 124              19

The Source of Variation column lists the components of the total variation. The sources of variation for this simple case are between the groups and within the groups. The term "within groups" is sometimes referred to as the error of the experiment.

The Sum of Squares (SS) column shows the sum of squares for each component of variation and for the total variation. A sum of squares represents the squared deviations of a random variable from its mean. The between-groups sum of squares represents the variability between the groups in the study, the within-groups sum of squares represents the pooled variability within each of the groups, and the total sum of squares represents the variability among all the observations.

The Degrees of Freedom (df) column shows the degrees of freedom for each component of variability and for the total variability. Degrees of freedom represent the number of values that are free to vary before the remaining values are completely determined.

The Mean Squares (MS) column shows the estimated variances for each component of variability. A mean square is an unbiased estimate of a population variance and is determined by dividing a sum of squares by its degrees of freedom.

The F Ratio column shows the ratio of a component's mean square to the mean square for error. If this value is bigger than the critical F value, the null hypothesis is rejected.

The Level of Significance (sometimes called the p value) column shows the probability of obtaining an F ratio at least this large when the null hypothesis is true. If the calculated p value is less than the chosen significance level (the alpha value, or Type I error risk), then the null hypothesis is rejected. On the other hand, if the calculated p value is greater than the chosen alpha value, then the null hypothesis is not rejected.
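As an illustration, the sketch below (group data invented) computes the between- and within-group sums of squares by hand and checks the F ratio and p-value against SciPy's f_oneway; the numbers will not match the example table above because the data are different:

    import numpy as np
    from scipy import stats

    # Invented data: 4 groups of 5 observations (completely randomized design)
    groups = [
        np.array([12.1, 11.8, 12.4, 12.0, 11.9]),
        np.array([13.0, 12.7, 13.2, 12.9, 13.1]),
        np.array([11.5, 11.9, 11.6, 11.8, 11.7]),
        np.array([12.6, 12.4, 12.8, 12.5, 12.7]),
    ]

    all_obs = np.concatenate(groups)
    grand_mean = all_obs.mean()

    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)

    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_ratio = ms_between / ms_within
    p_value = stats.f.sf(f_ratio, df_between, df_within)
    print(f"by hand:   F = {f_ratio:.2f}, p = {p_value:.4f}")

    f_sp, p_sp = stats.f_oneway(*groups)          # same test via SciPy
    print(f"f_oneway:  F = {f_sp:.2f}, p = {p_sp:.4f}")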
Q. When evaluating the results of a process capability study, it is recommended that process capability indices be calculated. Two such indices are the Cp and the Cpk. Why is it necessary to calculate both of these indices, and what is the difference between the two?

A. Process capability indices, such as Cp and Cpk, measure the ability of a process to meet the customer's requirements. These two indices are concerned with only two characteristics of the process: location and dispersion. The relationship between the process and the customer specifications can be summarized by two questions: Is there enough room within the specification limits for the process to operate? Is the process properly located to take advantage of what room there is within the specification limits? The first question is answered by calculating Cp and the second by calculating Cpk.

Cp is the process potential index and measures a process's potential capability, which is defined as the allowable spread over the observed spread of the process. The allowable spread is the difference between the upper and lower specification limits given by the customer. The observed spread is determined from data gathered from the actual process by estimating the standard deviation of the process and multiplying this estimate by 6. The general formula is:

Cp = (USL - LSL) / (6S)

As the standard deviation of the process increases, Cp decreases in value; as the standard deviation decreases, Cp increases. By convention, when a process has a Cp value less than 1.0, it is considered potentially incapable of meeting the customer specification requirements. Ideally, the Cp should be as high as possible: the higher the Cp, the lower the variability with respect to the customer specification limits. However, a high Cp value does not guarantee that the process output falls within the customer specification limits, because the Cp value says nothing about where the observed spread is located within the allowable spread. This is why Cp is called the process potential.

The process capability index, Cpk, measures a process's ability to produce product within the customer specification limits. Cpk is the difference between the observed process average and the closest specification limit, divided by three times the estimated process standard deviation. The general formula is:

Cpk = min[ (USL - Xbar) / (3S), (Xbar - LSL) / (3S) ]

By convention, when Cpk is less than 1.0, the process is considered not capable. When Cpk is equal to or greater than 1.0, the process is considered capable of producing product within the customer specification limits. The Cpk is inversely proportional to the process standard deviation: the higher the Cpk, the narrower the observed process distribution compared to the customer specification limits and the more uniform the product. As the process standard deviation increases, the Cpk index decreases and, at the same time, the potential to produce product outside the customer specification limits increases.

The Cpk index can never be greater than the Cp, only equal to it; this happens when the observed process average falls exactly in the middle of the specification limits. The Cpk index equals 0.0 when the observed process average equals one of the specification limits, and it can be negative if the observed process average is outside one of the specification limits.

Calculating both indices helps in deciding what the issue is (location vs. variability) if the process is found not to be capable. If Cp ≥ 1.0 and Cpk < 1.0, then the issue is location related. If Cp < 1.0 and Cpk = Cp, then the issue is variability related. If Cp < 1.0 and Cpk < 1.0, then the issue is location and/or variability related.
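Here is a minimal Python sketch of the two formulas (specification limits and data are invented); note that it uses the overall sample standard deviation as S, whereas a capability study would normally use a within-subgroup estimate such as Rbar/d2:

    import numpy as np

    def cp_cpk(x, lsl, usl):
        """Return (Cp, Cpk) for measurements x against the specification limits."""
        x = np.asarray(x, dtype=float)
        s = x.std(ddof=1)                      # estimated process standard deviation
        mean = x.mean()
        cp = (usl - lsl) / (6 * s)
        cpk = min(usl - mean, mean - lsl) / (3 * s)
        return cp, cpk

    rng = np.random.default_rng(3)
    data = rng.normal(10.2, 0.15, size=100)    # invented, slightly off-center process
    cp, cpk = cp_cpk(data, lsl=9.5, usl=10.5)
    print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")   # Cp > Cpk because the process is off center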
Q. In recent months, I have seen in print and have heard at various conferences and seminars the phrase "process management." I think I know what this means, but could you tell me more about what process management is?

To achieve process excellence, an organization must manage its key processes. Process excellence implies that waste is minimized and variability is reduced. Minimization of waste brings about more efficient use of resources, raw materials and time. Reduced variability brings about process consistency and improved capability.

So what is required to manage a process? The organization must:

- satisfy the needs and wants of the external customer and the internal customer
- produce and deliver acceptable goods and services on time
- understand the capability of each activity within the process to produce acceptable goods and services
- identify in a timely fashion any changes in the process so that they can be properly managed (if the change is positive) or corrected (if the change is negative) before the process goes out of control and/or begins to produce unacceptable goods
- detect unacceptable goods or services resulting from activities or processes that are incapable of producing acceptable goods or services
- ensure that new people are trained before they become involved in the process, and provide refresher training, when appropriate, to assure that people continue to perform as expected
- report all unacceptable findings to the appropriate people
- define the root causes of problems and initiate a process to eliminate them
- obtain customer feedback that defines process problems so that the process can be improved
- develop an ongoing feedback system to the suppliers of the organization (within and outside the organization) that measures process performance

This is what people are calling process management. Some of the key elements of a process-management-oriented company include:

- Process ownership: the process owner is responsible for the process performance, maintenance, improvements and other aspects of its health; roles, responsibilities and accountabilities are well defined.
- Performance measures: performance of the overall process is measured, planned, and linked to process changes; compensation is often linked to it as well.
- Strategy: the process is part of the overall strategic planning (not merely the result of it).

Process management involves the design, improvement, monitoring and maintenance of an organization's most important processes in order to bring them up to the highest levels of excellence. In process management the goal is usually to maximize profits, achieve high levels of customer satisfaction, and achieve long-term business stability.
Q. At the last ASQ dinner meeting, someone at the table was talking about data mining. What is data mining?

Generally, data mining is the process of analyzing data from different perspectives and summarizing it into useful information – information that can be used to increase revenue, cut costs, or both. The concept of data mining is not new, but it is the latest buzzword, especially in the database industry. To use a simple analogy, it is finding the proverbial needle in the haystack. In this case, the needle is that single piece of knowledge your business needs, and the haystack is the large data file you have built up over a long period of time. Through the use of automated statistical analysis techniques, organizations are discovering trends and patterns in large datasets that previously went unnoticed.

Regression is the oldest and most well-known statistical technique that the data mining community utilizes. Basically, regression takes a numerical dataset and develops a mathematical model that fits the data. When you are ready to use the results to predict future behavior, you simply take your new data, plug it into the developed model, and you have a prediction. The major limitation of this technique is that it only works well with continuous quantitative data (like weight, speed or age). If you are working with categorical data where order is not significant (like color, name or gender), you are better off choosing another technique; in that case, classification analysis may be a better choice. Classification is capable of processing a wider variety of data than regression and is growing in popularity.

Data mining is primarily used today by companies with a strong consumer focus – retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. It enables them to determine the impact on sales, customer satisfaction, and corporate profits, and it enables them to "drill down" into summary information to view detailed transactional data.

With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions that appeal to specific customer segments. WalMart is pioneering massive data mining to transform its supplier relationships. WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive data file. WalMart allows more than 3,500 suppliers to access data on their products and perform data analyses. These suppliers use the data to identify customer buying patterns at the store display level, manage local store inventory, and identify new merchandising opportunities.

The easiest way to get started with data mining is with one of the available software packages. Data mining software is a growing field; nearly every statistical software company (e.g., SAS, SPSS, JMP, etc.) has developed a data mining program.

Q. What are skewness and kurtosis?

A fundamental task in many statistical analyses is to characterize the location and variability of a data set. A further characterization of the data includes skewness and kurtosis, which are measures of the shape of the distribution.

Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.

Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. Data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak; a uniform distribution would be the extreme case.

Skewness and kurtosis can be estimated for a dataset using Excel: the functions SKEW and KURT are used just like the functions AVERAGE and STDEV.
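For readers working in Python rather than Excel, the equivalent calculations are available in SciPy (the samples below are invented); note that scipy.stats.kurtosis reports excess kurtosis by default, so a normal distribution gives a value near 0 rather than 3:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(11)
    symmetric = rng.normal(0, 1, size=500)        # invented bell-shaped sample
    skewed = rng.exponential(1.0, size=500)       # invented right-skewed sample

    for name, data in [("normal-like", symmetric), ("exponential", skewed)]:
        print(f"{name:12s}  skewness = {stats.skew(data):+.2f}  "
              f"excess kurtosis = {stats.kurtosis(data):+.2f}")
    # Expect skewness near 0 for the symmetric sample and around +2 for the
    # exponential sample, which also shows positive excess kurtosis (a heavy right tail).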
Q. Six Sigma is still a little foreign to me and I am trying to learn what I can. I am wondering if there is a formula I could use to determine what my parts-per-million goal would have to be if I wanted to improve the process by 1 sigma?

Six Sigma technically means having no more than 3.4 defects per million opportunities – a defect being any instance of not meeting the requirements of the customer – in any process, product, or service. The number 3.4 is reached by assuming that the specification limits are 6 standard deviations away from the target and that the process average may drift over the long term by as much as 1.5 standard deviations despite best efforts to control it. This leaves 4.5 standard deviations between the process average and the nearer specification limit, and a one-sided integration under the normal curve beyond 4.5 standard deviations gives an area of about 3.4 defects per million opportunities. In contrast, the old three sigma quality standard of 99.73% translates to 2,700 defects per million, assuming zero drift in the mean: a process operating in this mode produces 1,350 defects per million opportunities beyond each specification limit (2,700 in total).

For processes with a series of steps, the overall yield is the product of the yields of the individual steps. For example, in a simple two-step process where step #1 has a yield of 80% and step #2 has a yield of 90%, the overall yield is 0.8 x 0.9 = 0.72 = 72%. Note that the overall yield from a process involving a series of steps is always less than the yield of the step with the lowest yield. If three sigma quality levels (99.73% yield) are obtained at every step of a ten-step process, the output at the end of the process will contain about 26,674 defects per million. Considering that the complexity of modern processes is usually far greater than ten steps, it is easy to see why Six Sigma quality is argued to be not optional but required if an organization is to remain viable.

The following table shows the number of defects per million for various sigma levels, with the assumed 1.5 sigma shift.

Sigma   Defects per Million
1.5     500,000
2.0     308,300
2.5     158,650
3.0     67,000
3.5     22,700
4.0     6,220
4.5     1,350
5.0     233
5.5     32
6.0     3.4

Six Sigma is not just twice as good as three sigma; it is almost 20,000 times better. This is because the relationship between defect rates at 1, 2, 3, 4, 5 and 6 sigma is not linear – remember that the area under the normal curve is not linear in the number of standard deviations.
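The table values can be reproduced from the normal distribution, as the sketch below shows; it applies the conventional 1.5 sigma shift, counts only the nearer tail, and also reproduces the ten-step rolled yield figure:

    from scipy.stats import norm

    def defects_per_million(sigma_level, shift=1.5):
        """One-sided tail area beyond (sigma_level - shift), expressed per million."""
        return norm.sf(sigma_level - shift) * 1_000_000

    for sigma in (3.0, 4.0, 5.0, 6.0):
        print(f"{sigma:.1f} sigma -> {defects_per_million(sigma):>10,.1f} DPM")
    # 6.0 sigma -> about 3.4 DPM; 3.0 sigma -> about 66,800 DPM

    # Rolled throughput yield for ten steps, each at the 99.73% (three sigma, no shift) level
    yield_10_steps = 0.9973 ** 10
    print(f"ten-step yield = {yield_10_steps:.4f} "
          f"({(1 - yield_10_steps) * 1e6:,.0f} defects per million)")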
Q. When should I recalculate the control limits on my control chart?

This is perhaps the most frequently asked question by people who use control charts. While there is no simple answer, there are some useful guidelines. Obviously, the control limits need to be revised/recalculated when the sample size has been changed or the specification target has been changed. Beyond that, the primary guideline for computing and/or recomputing control limits is this: the purpose of the control limits is to adequately reflect the voice of the process. Remember, control charts are intended as aids for making decisions, and as long as the limits appropriately reflect what the process can do, or can be made to do, then the control limits do not need to be revised. The following questions are generally used to help determine when control limits should be revised:

- Does the current data on the control chart display a distinctly different kind of behavior and/or pattern than past data?
- Is the reason for this change in behavior and/or pattern known?
- Is the new process behavior and/or pattern desirable?
- Is it intended and expected that the new behavior and/or pattern will continue?

If the answer to all four questions is yes, then it is appropriate to revise the control limits based on data collected since the change in the process. If the answer to question 1 is no, then there is no need to revise the control limits. If the answer to question 2 is no, then one should look for the assignable/special cause instead of wondering whether the control limits should be revised. If the answer to question 3 is no, then you should be working to remove the detrimental assignable/special cause instead of wondering whether the control limits should be revised. If the answer to question 4 is no, then you should again be looking for the assignable/special cause instead of wondering whether the control limits should be revised. The objective is to discover what the process can do, or can be made to do.

Q. The new ISO 9001:2000 standard stresses continual improvement. Recently, our third-party auditor was doing a surveillance audit and implied we were not involved in continual improvement activities. What is your interpretation of what is meant by continual improvement in the ISO 9001:2000 standard?

Many quality system standards include direct and indirect references to continual improvement – for example, clause 8.5.1 in ISO 9001:2000, clause 8.5.1 in ISO/TS 16949 and clause 4.2.5 in QS-9000:1998. In general, the primary requirement in these standards refers to continually improving the effectiveness of the quality management system. In my opinion this is a very narrow interpretation. Continual improvement should also include the entire organization's effectiveness if an organization is to improve customer satisfaction, gain a long-term competitive advantage and improve overall process performance.

Continual improvement should be viewed as a type of change that is focused on increasing the effectiveness and/or efficiency of the entire organization to fulfill its policy and objectives. It should not be limited to quality initiatives. Improvement in business strategy, business results, and customer, employee and supplier relationships can all be subject to continual improvement. Continual improvement should focus on enablers such as leadership, communication, resources, organizational architecture, people and processes, and it should lead to better results such as price, cost, productivity, time to market, delivery, responsiveness, profit, and customer and employee satisfaction.
Q. What is a Weibull distribution and what is it used for?

The Weibull distribution is another parametric probability distribution; it is a member of a special class of parametric distributions known as log-location-scale distributions. The Weibull distribution is characterized by its shape and scale parameters. By changing the shape parameter, the Weibull distribution can be made to take many different shapes, from highly skewed like an exponential distribution to nearly bell-shaped like a normal distribution. The distribution is named after Waloddi Weibull of Sweden, who described it in 1939.

While the normal distribution is described by two parameters, the mean and the standard deviation, the three-parameter Weibull distribution requires three parameters to define a particular Weibull: the scale parameter α, the shape parameter β and the location parameter γ. The density function for the Weibull is:

f(t) = (β/α) ((t - γ)/α)^(β-1) e^(-((t - γ)/α)^β)

The parameter β is also known as the Weibull slope and is a positive number. The parameter α is also a positive number; it is called the characteristic life since it corresponds to the 63.2nd percentile of the distribution. The Weibull distribution is used in many statistical data analyses and reliability analyses; it is frequently used as a distribution of the strength of certain materials and of time to failure.

Q. How will lack of normality affect my Cpk statistics for a process? Will the calculated statistics be higher or lower than they should be?

There is no one answer to this question. It depends on how different from normal the distribution is, whether the distribution is skewed to one side, how close to a specification limit the distribution sits, and other considerations. Even moderate departures from normality that affect the tails of the process distribution may severely impact the validity of the process capability calculations.

The construction and interpretation of process capability statistics are based on the process output following a normal distribution. For a normal distribution, approximately 99.73% of the observations fall within 3 standard deviations (s) above and below the mean. The Cp statistic is designed to equal 1.0 when the process spread (±3s) is the same as the specification width. With a Cp equal to 1.0 for a normally distributed and centered process, we would expect about 0.27% of the output (2,700 parts per million) to be beyond the specification limits. The Cp statistic assumes that the process is centered, which may not be true; therefore the Cpk statistic is typically reported. The Cpk statistic equals 1.0 when the ±3s process spread coincides with one or both of the specification limits. With a Cpk equal to 1.0 for a normally distributed process, we would expect about 0.27% or less of the output to be beyond the specification limits.

When the distribution differs significantly from a normal distribution, the calculated process capability indices will probably be misleading. There may also be significant discrepancies between the predicted (theoretical) proportion and the actual proportion of the process output beyond the specification limits, because the theoretical percentages are calculated from the tail probabilities of a normal distribution.
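The sketch below (distribution and specification limits invented) illustrates the point by computing Cpk for a skewed (lognormal) process and comparing the fraction of product actually beyond each specification limit with the fraction a normal-theory calculation would predict:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(5)
    data = rng.lognormal(mean=0.0, sigma=0.4, size=100_000)   # invented skewed process
    lsl, usl = 0.0, 3.0                                        # invented spec limits

    mean, s = data.mean(), data.std(ddof=1)
    cpk = min(usl - mean, mean - lsl) / (3 * s)
    print(f"Cpk = {cpk:.2f}")

    # Normal-theory prediction of out-of-spec product vs. what the skewed data actually does
    pred_low, pred_high = norm.cdf((lsl - mean) / s) * 1e6, norm.sf((usl - mean) / s) * 1e6
    act_low, act_high = (data < lsl).mean() * 1e6, (data > usl).mean() * 1e6
    print(f"below LSL: predicted {pred_low:8.0f} ppm, actual {act_low:8.0f} ppm")
    print(f"above USL: predicted {pred_high:8.0f} ppm, actual {act_high:8.0f} ppm")
    # The normal model badly overstates the lower tail (the data cannot go below 0)
    # and badly understates the heavy upper tail of the skewed distribution.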
Page 35 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. I have been using xbar control charts for several years and recently was told that I am making a mistake by not having a range control chart or a standard deviation chart with the xbar control chart. Is this true and if so why?
Yes, it is true; you are making a mistake. Control charts are used to check for process stability. In this context, a process is said to be in statistical control if the probability distribution representing the quality characteristic is constant over time. If there is some change over time in this distribution, the process is said to be out of control. When dealing with a quality characteristic that is a variable, it is standard practice to control both the mean value of the quality characteristic and its variability. Control of the process average is usually done with an xbar control chart, which shows how the average performance varies between samples over time. Process variability or dispersion can be controlled with either a control chart for the standard deviation or a control chart for the range; these charts show how uniform or consistent the individual values within each sample are. Therefore you need one control chart to monitor the average performance and another control chart to monitor the within sample variability.
Page 36 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. What is the best way to summarize the performance of a process: using control charts, using histograms or calculating capability indices?
I do not think that any one of these methods is better than another for summarizing the performance of a process. Control charts allow the user to determine if the process from which the data was collected is in a state of statistical control. Histograms allow the user to determine whether or not the data follow a normal distribution. In addition, histograms can be used to determine what percent of the data is in or out of specification. Capability indices allow the user to determine whether or not the process from which the data was collected is capable of meeting the customer's requirements. All three of these methods have their strengths and weaknesses. Control charts cannot tell the user whether the underlying distribution is normal or whether the process is capable. Histograms cannot tell the user whether the process is in control. Capability indices cannot tell the user whether the underlying distribution is normal or whether the process is in control. The best thing to do is to use all three methods. This will permit you to make a more intelligent decision about the process.
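Since the answer above recommends using all three views together, here is a minimal sketch, my own illustration rather than part of the column, that takes one set of subgrouped data and produces all three: xbar/R control limits for stability, a normality check in place of eyeballing a histogram, and Cp/Cpk for capability. The data and specification limits are hypothetical; the control chart constants are the standard values for subgroups of size 5.

# Illustration only: control chart limits, a normality check and capability
# indices from the same subgrouped data. Data and spec limits are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = rng.normal(50.0, 2.0, size=(25, 5))     # 25 subgroups of 5
USL, LSL = 58.0, 42.0                          # hypothetical specification limits

A2, D3, D4, d2 = 0.577, 0.0, 2.114, 2.326      # standard constants for n = 5

xbars = data.mean(axis=1)
ranges = data.max(axis=1) - data.min(axis=1)
xbarbar, rbar = xbars.mean(), ranges.mean()

# Control chart view: is the process stable?
print(f"xbar chart limits: {xbarbar - A2 * rbar:.2f} to {xbarbar + A2 * rbar:.2f}")
print(f"R chart limits:    {D3 * rbar:.2f} to {D4 * rbar:.2f}")

# Histogram/normality view: is a normal distribution reasonable?
print(f"Shapiro-Wilk p-value on the individual values: {stats.shapiro(data.ravel()).pvalue:.3f}")

# Capability view: can the process meet the specifications?
s_within = rbar / d2
cp = (USL - LSL) / (6 * s_within)
cpk = min((USL - xbarbar) / (3 * s_within), (xbarbar - LSL) / (3 * s_within))
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")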
Page 37 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. Recently, I have been hearing people talking about visual controls in a manufacturing environment. My readings indicate that it is a tool of lean manufacturing. Can you tell me what visual controls are?
A. Visual controls are indeed associated with the methodology of lean manufacturing, but they can be used in many other applications, not just manufacturing. The intent of visual controls is that the whole workplace is set up with signs, labels, color-coded markings, etc., such that anyone unfamiliar with the process can, in a matter of minutes, know what is going on, understand the process, and know what is being done correctly and what is out of place. There are two types of applications in a visual factory: displays and controls.
- A visual display relates information and data to employees in the area, for example, charts showing the monthly revenues of the company or a graphic depicting a certain type of quality issue that group members should be aware of.
- A visual control is intended to actually control or guide the action of the group members. Examples of controls are readily apparent in society: stop signs, handicap parking signs, no smoking signs, etc.
The most important benefit of visual controls is that they show when something is out of place, missing or not working correctly. Visual controls help keep things running as efficiently as they were designed to run. The efficient design of the production process that results from applying lean manufacturing carries with it a set of assumptions. The process will be as successful as it was designed to be as long as the assumptions hold true. A factory with extensive visual control applications will allow employees to immediately know when one of the assumptions has not held true. Visual controls can also help prevent mistakes. Color coding is a form of visual display often used to prevent errors. Shaded "pie slices" on a dial gauge tell the viewer instantly when the needle is out of the safe range. Matching color marks is another approach that can help people use the right tool or assemble the right part. Examples of visual controls include, but are not limited to, the following:
- color-coded pipes and wires
- painted floor areas for good stock, scrap, trash, etc.
- shadow boards for parts and tools
- indicator lights
- workgroup display boards with charts, metrics, procedures, etc.
- production status boards
- direction of flow indicators
Page 38 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. Several months ago, the Evansville-Owensboro newsletter mentioned in one of its articles the Box-Cox transformation as a way to convert non-normal data to data from a normal distribution. The article was vague in describing the technique; what exactly is the Box-Cox transformation?
A. Certain assumptions about the distributions of the populations are necessary for most statistical procedures to be valid. One such assumption is that one or more populations are normally distributed. When this assumption is violated, using a statistical procedure that requires normality is not valid. It sometimes happens that applying an appropriate transformation to the original data will more nearly satisfy the assumption of normality. The success in finding a good transformation depends in part on the experience one has in the particular field the data came from. One very useful transformation is the Box-Cox transformation. The Box-Cox transformation is a family of power law transformations. The general form is as follows:
y = (x^λ - 1)/λ
where x is the original data, y is the transformed data and λ is a number usually between -1 and 1. The task becomes selecting the appropriate value of λ so that the transformed data are normally distributed (verified by using some type of test for normality, such as the Shapiro-Wilk, the Anderson-Darling or others). It should be noted that if λ = 0, the transformation is simply the natural log of the original data. If λ = 1.0, then no transformation is needed; if λ = -1.0, the transformation is essentially the reciprocal of the original data; and if λ = 0.5, it is essentially the square root of the original data (the Box-Cox form only shifts and rescales these, which does not affect normality).
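A minimal sketch of the approach just described follows; it is my own illustration with hypothetical sample data, not part of the column. It tries a grid of λ values between -1 and 1, applies y = (x^λ - 1)/λ (the natural log when λ = 0), and keeps the λ whose transformed data looks most normal according to the Shapiro-Wilk test. For a likelihood-based choice of λ, scipy.stats.boxcox can be used instead.

# Illustration only: pick lambda for a Box-Cox transformation by trying a grid
# of values and keeping the one with the best Shapiro-Wilk statistic.
# The sample data are hypothetical (skewed, strictly positive).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.lognormal(mean=0.0, sigma=0.6, size=200)

def box_cox(x, lam):
    return np.log(x) if lam == 0 else (x**lam - 1.0) / lam

best_lam, best_w = None, -np.inf
for lam in np.round(np.arange(-1.0, 1.01, 0.1), 1):
    w, _ = stats.shapiro(box_cox(x, lam))
    if w > best_w:
        best_lam, best_w = lam, w

print(f"lambda with the most normal-looking result: {best_lam} (Shapiro-Wilk W = {best_w:.4f})")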
Page 39 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. How do you calculate Cpk if the customer gives you a target value of 0 with no tolerances? In addition, all the test data is positive (i.e., conductivity in a bath, where the target is 0).
A. The general formula for calculating Cpk is the following:
Cpk = min[ (USL - xbar)/3s , (xbar - LSL)/3s ]
where USL is the upper specification limit, LSL is the lower specification limit, xbar is the mean of the observed data and s is the standard deviation of the observed data. The USL and LSL are given to the organization by the customer, and xbar and s are statistics calculated from the observed data. As you can see from the above formula, the target is not used to calculate the Cpk value. If the customer only provides a target (i.e., a target of 0), then Cpk cannot be calculated. If the customer provides a USL (i.e., USL = 4) and a target value (i.e., a target of 0), then the following formula is used to calculate Cpk:
Cpk = (USL - xbar)/3s
For the above example, the organization wants the mean (xbar) of the process to be very close to zero, with very few observed results far from zero. This implies that the statistical distribution of the process will probably not be a normal distribution but a very skewed distribution. For this example, some type of transformation would have to be found and used to convert the skewed distribution to a normal distribution.
Page 40 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. What is measurement system discrimination and when is it a concern?
Discrimination, with respect to a measuring system, is the ability of the measuring system to detect small changes in the response being measured. A lack of discrimination exists when the measured response can only be grouped into a few data categories. If the measuring system does not provide a minimum number of data categories, the gage does not have enough discrimination and hence will not be able to monitor and evaluate the process of interest. If this is the case, an alternative measuring system with the discrimination required by the using organization needs to be found. One way to evaluate the discrimination of a measuring system is to use a range control chart. If the range chart shows four or fewer distinct values, it can be concluded that the measuring system has inadequate discrimination. For a measuring system to have adequate discrimination, the range chart must show five or more distinct values.
Page 41 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. What is the difference between a dependent and an independent variable?
With respect to regression analysis and design of experiments, variables can be classified into two categories: (1) dependent and (2) independent. The dependent variable is the response variable, that is, the variable that is used to assess the results of the process. Dependent variables represent the measurable outcomes from the study, such as product yield, product strength, failure rate, coating thickness, temperature, tensile strength, etc. An independent variable is a variable which might influence the dependent variable. It is important to understand the role each independent variable has on the dependent variable. Independent variables are variables that can change (knowingly or unknowingly) over time. As these independent variables change, the organization must understand how these changes affect the process and the product.
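The distinction is easy to see in a small regression setting. The sketch below is my own illustration with hypothetical data and variable names: coating thickness is treated as the dependent (response) variable, while line speed and oven temperature are independent variables that might influence it.

# Illustration only: a dependent (response) variable modeled as a function of
# two independent variables. Data and names are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
n = 30
line_speed = rng.uniform(10, 20, n)          # independent variable 1
oven_temp = rng.uniform(150, 200, n)         # independent variable 2
thickness = 5 + 0.8 * line_speed - 0.02 * oven_temp + rng.normal(0, 0.5, n)  # dependent variable

# Fit thickness = a + b1*line_speed + b2*oven_temp by least squares
X = np.column_stack([np.ones(n), line_speed, oven_temp])
coef, *_ = np.linalg.lstsq(X, thickness, rcond=None)
print("intercept and slopes:", np.round(coef, 3))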
Page 42 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. How do I generate a list of variables for an experiment?
The typical method used is brainstorming. The brainstorming session should involve process experts, process technicians and process operators. These are the people who have experience and knowledge of the process under investigation. After the brainstormed list has been created, the variables need to be prioritized according to their likely importance to the response variable. You should not try to run an experiment with more than 4 or 5 variables. Managing more than five variables is usually very difficult for most organizations.
Page 43 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. Why, when calculating the standard deviation, do we divide by n-1 rather than n?
The reason that n-1 is used instead of n in the formula for calculating the sample standard deviation is as follows. The sample variance (the square of the sample standard deviation) can be thought of as a random variable, i.e., a function which takes on different values for different samples from the same population. It is used as an estimate of the true variance of the population, because in the real world one typically does not know the true variance. Since the sample variance is a random variable, it has a mean or average value. One would hope that this average value is close to the actual value that the sample variance is estimating. In fact, if we use n-1 in the calculation of the sample variance, we do obtain an unbiased estimate of the true population variance. If we use n in the calculation, we obtain a biased estimate: when n is used, the average value of the sample variance is only (n-1)/n times the true variance. This can be illustrated using EXCEL and the normal distribution function. Have EXCEL randomly draw 100 sets of data in groups of 5 from a normal distribution with a known mean and variance. For each subgroup of 5, calculate the mean, the variance using n-1 in the formula and the variance using n in the formula. Then calculate the average of the 100 variance estimates for each method and see which method (n-1 or n) comes closer to the true variance you used.
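Here is a quick version of that exercise in Python rather than EXCEL; it is my own sketch, with an arbitrarily chosen known mean and variance, and it follows the steps described above: 100 subgroups of 5, then the average of the n-1 and n variance estimates compared with the true variance.

# Illustration only: compare the average of the n-1 (unbiased) and n (biased)
# variance estimates from 100 subgroups of 5 against the known true variance.
import numpy as np

rng = np.random.default_rng(42)
true_mean, true_sd = 10.0, 2.0                         # known population parameters
data = rng.normal(true_mean, true_sd, size=(100, 5))   # 100 subgroups of 5

var_n_minus_1 = data.var(axis=1, ddof=1)   # divide by n-1
var_n = data.var(axis=1, ddof=0)           # divide by n

print(f"true variance:              {true_sd**2:.3f}")
print(f"average variance using n-1: {var_n_minus_1.mean():.3f}")
print(f"average variance using n:   {var_n.mean():.3f}   (about (n-1)/n = 0.8 of the true value)")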
Possible actions to be taken could include the following:
- Taking a second set of samples.
- Checking the equipment for a malfunction.
- Adjusting the equipment.
- Calling the immediate supervisor.
- Checking the measuring equipment.
- Calling maintenance.
- Stopping the process.
The reaction plan should also discuss what is to be done with any items or parts found to be out of specification. Reaction plans help operators react quickly to out of specification or out of control conditions, help minimize downtime and help minimize scrap.
Page 45 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. What is the difference between reaction plans and corrective-preventive actions?
A reaction plan is a written document linking specific actions that should be taken if out of control conditions appear on the control chart or out of specification situations appear. The purpose of a reaction plan is to provide quick fixes to the operators of the processes for unacceptable conditions in order to minimize scrap or off-spec materials and to restore flow as quickly as possible. Corrective-preventive actions represent a process to identify root causes of problems in order to minimize or eliminate existing nonconformities, defects or undesirable situations and to prevent recurrence. The purpose of corrective-preventive actions is to find the actual root cause and implement effective solutions following some type of problem solving model. In reality you need both methodologies: reaction plans for the day-to-day management of processes and corrective-preventive actions for the long-term management of processes.
Page 46 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. What effect does changing the subgroup sample size have on calculating the Cpk value?
A. The answer is none. The general formula for calculating Cpk is the following:
Cpk = min[ (USL - xbar)/3s , (xbar - LSL)/3s ]
where USL is the upper specification limit, LSL is the lower specification limit, xbar is the mean of the observed data and s is the standard deviation of the observed data. In this formula, s is an estimate of the process variability. The estimate of process variability represents the variability between the individual values and not the variability between the sample means.
Page 47 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. If my process is in statistical control, should I expect the Cpk value to be the same value each and every time I sample the process?
No. Capability indices, like the Cpk value, are statistics just like xbar and s. Statistics are not parameters but estimates of parameters obtained by taking samples from the population. Parameters are fixed constants of the population. Statistics are not constants but vary from one sampling to another. The difference between sample statistics and population parameters is a result of sampling error. Since every item in the population is not likely to be included in the sample, sample statistics are unlikely to equal the population parameters. If you take 50 items from a population of 1000 items and calculate the mean of the sample, you expect the sample mean to be close to the actual mean of the population. If you take a second sample of 50 items, you do not expect this second sample mean to equal the first sample mean, only to be very close to it. The same is true for the Cpk value, which is based on a sample from the population and on the sample statistics xbar and s. You expect repeated estimates of Cpk, based on multiple samplings of the population, to be similar but not exactly the same.
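A minimal sketch of that sampling variation follows; it is my own illustration, with hypothetical specification limits and process parameters chosen so that the true Cpk is 1.0. Sampling the same stable process ten times gives ten similar, but not identical, Cpk estimates.

# Illustration only: repeated Cpk estimates from the same in-control process.
# Specs and process parameters are hypothetical (true Cpk = 1.0).
import numpy as np

rng = np.random.default_rng(0)
USL, LSL = 16.0, 4.0            # hypothetical specification limits
mu, sigma = 10.0, 2.0           # stable, in-control process

def cpk(sample):
    xbar, s = sample.mean(), sample.std(ddof=1)
    return min((USL - xbar) / (3 * s), (xbar - LSL) / (3 * s))

estimates = [cpk(rng.normal(mu, sigma, 100)) for _ in range(10)]
print("ten Cpk estimates from the same process:",
      " ".join(f"{c:.2f}" for c in estimates))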
Page 48 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. Are there maximum values for Cp, Cpk, Pp and Ppk?
No. As long as the specification range does not change and an organization continually reduces the process variation, the capability indices will increase. Typically, most processes seem to have capability indices that range between 0.8 and 5.0. I have seen a Cpk value as high as 54, though I am not sure why the organization was thrilled with it. Over the years, I have been known to say to senior management that if I were in charge and you spent money driving your capability indices above 6, I would fire you. Reducing process variation is the name of the game, but you must also reap the benefits of doing so. A better strategy would be to have the customer tighten the specification range; your competitors may have trouble achieving the new expectation, so you may get more orders, and that is a good thing. Right?
Page 49 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. Why would I have Cp and Cpk indices well over 1 when some of the observations in the data set are outside the customer's specification limits?
Without seeing your control charts for this data, I will have to guess that your control chart for location (individuals or xbar control chart) probably has some points out of control, even though your range or moving range control chart has all the points within the control limits. Before you calculate capability indices, you must verify that all the basic assumptions have been satisfied. The two most important assumptions to verify are: (1) the process being evaluated should be predictable, that is, the process is in a state of statistical control, and (2) the observed data points follow a normal distribution. Process capability software packages are nice, but most of them assume that the assumptions are being met; they leave the verification to the user of the package.
Page 50 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. How do I calculate capability indices with only an upper specification limit?
Very carefully. The usual formulas for Cp, Cpk, Pp and Ppk are written in terms of both a lower and an upper specification limit. When faced with a missing specification value, one could consider the following:
1. Not calculating the capability indices.
2. Entering an arbitrary value for the missing specification.
3. Not calculating Cp or Pp, and calculating only Cpk or Ppk for the specification value given.
Let's assume you are making a powdered material that has a moisture requirement stating that no more than 0.5 is allowed. If the product has too much moisture, it will cause manufacturing problems for the customer. Let's assume that the process is in statistical control and that the data come from a normal distribution. Let's also assume that for the last 100 lots produced the process average has been 0.0025 with a process standard deviation of 0.15. If you select Option 1, the customer will probably not be happy that you are not calculating the capability indices. If you select Option 2, you will probably argue that the lower specification limit is zero, since it is impossible to have a moisture value below zero. The calculated capability indices are then Cp = 0.55 and Cpk = 0.006. Your customer will not be satisfied with these values since they are below 1.0. If you select Option 3, Cpk = 1.10 and there is no value for Cp. Notice what happens under Option 2 as you improve the process, that is, as you continually reduce the moisture: the calculated Cpk gets even lower, even though improving a process should make the Cpk value increase. Therefore, when you only have one specification, you should enter only that specification and treat the other specification as missing.
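The short calculation below works through the moisture example in code; it is only a restatement of the numbers quoted above, with Option 2 treating zero as an artificial lower specification limit and Option 3 using the upper limit alone.

# Illustration only: the moisture example above under Options 2 and 3.
# The column quotes these values as roughly 0.55, 0.006 and 1.10.
xbar, s, usl = 0.0025, 0.15, 0.5

# Option 2: arbitrary LSL = 0
lsl = 0.0
cp_opt2 = (usl - lsl) / (6 * s)
cpk_opt2 = min((usl - xbar) / (3 * s), (xbar - lsl) / (3 * s))

# Option 3: upper specification only (no Cp)
cpk_opt3 = (usl - xbar) / (3 * s)

print(f"Option 2: Cp = {cp_opt2:.3f}, Cpk = {cpk_opt2:.3f}")
print(f"Option 3: Cpk = {cpk_opt3:.3f}")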
Page 51 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. Should I calculate my process capability indices if my process is not in a state of statistical control?
You cannot properly evaluate the capability of a process without establishing process control. It is certainly possible to calculate capability indices when a process is not in control, but you might ask what value these indices provide. The AIAG Statistical Process Control reference manual states: "The process must first be brought into statistical control by detecting and acting upon special causes of variation. Then its performance is predictable and its capability to meet customer expectations can be assessed. This is the basis for continual improvement." It is hard to say that you should not calculate capability indices if the process is not in control, because your customer may require you to calculate these indices. It is easier to say that the less predictable your process is, that is, the more out of control it is, the less meaningful the capability values are. If a process is not in control, then your estimate of the process mean, your estimate of the process variation, or both may not be a good representation of the real process performance.
Page 52 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. What are control plans?
Control plans are written descriptions of the system used for controlling product and processes. These plans are very similar to the "quality plans" described by Dr. Juran in his books. These plans address the important characteristics and engineering requirements of the product and of the process used to make the product. These plans discuss how the manufacturing process is controlled, how incoming materials are controlled, how operators are trained, how the finished product is controlled, how the measuring devices are controlled, and what corrective actions or reactions need to be taken, and by whom, when the process is not meeting its performance expectations. The format of these plans can take on many styles. The automotive industry has its version of what the form should look like, and so do many other industries. Items usually documented on the forms include: sample sizes, frequency of sampling and testing, type of charts to be used, identification of the type of measuring device to be used, specification or process limits, capability indices, work instructions to be followed, etc.
Page 53 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. Do members really call you or write to you, or do you make the questions up?
This is a question that has been asked of me several times at dinner meetings. The answer is yes. Fortunately for me, our members (and in many cases non-members) do call or email me with questions. If I had to make the questions up, I would not do this column. I may modify a question for the column, but I do not make them up. For your information, nearly 60% of the questions come to me by telephone and about 40% come by email. About 5% of the time, the person asking the question has requested that I not print it.
For your information, I have done this column since 1995 and the questions keep coming in.
Page 54 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. What are the differences between the concepts of quality control, quality assurance, quality management, and quality planning?
Quality management comprises all the activities of the overall management function that determine the quality policy, objectives and responsibilities and implement them by means such as quality planning, quality control, quality assurance and quality improvement within the quality system. Quality planning consists of the activities that establish the objectives and requirements for quality and for the application of quality system elements; it covers product planning, managerial and operational planning, and the preparation of quality plans. Quality control consists of the operational techniques and activities that are used to fulfill requirements for quality. It involves techniques that monitor a process and eliminate causes of unsatisfactory performance at all stages of the quality loop. Quality assurance consists of the planned and systematic activities implemented within the quality system and demonstrated as needed to provide adequate confidence that the organization is fulfilling the quality requirements. A quality system is the organizational structure, procedures, processes and resources needed to implement quality management.
Page 55 of 56 Ask Mike Excerpts from Mike Mazu's Columns for ASQ Section 0915 Newsletter
Q. I have heard a lot about ASQ's new Living Community Model. What is it?
At the February 2004 meeting, the ASQ Board of Directors approved Phase 1 implementation of the ASQ Living Community Model (LCM). This new membership model maintains the traditional values of ASQ membership, yet builds on them by offering a variety of new and enhanced member types and benefits suited to everyone interested in the practice and/or profession of quality. The Living Community Model approach advances ASQ's Vision as the "community of choice for everyone who seeks quality technology, concepts or tools to improve themselves and their world." The new membership plan enhances key strategic initiatives sought by current members and potential members. These include: proving the economic case for quality; enhancing the image of the quality professional and ASQ; enhanced activity on national issues, including a Washington, D.C. presence; growing new and diverse communities of practice; and providing more personalized member relationship management. "Historically, ASQ has taken a 'one-size-fits-all' approach to membership, a best practice in the association world for years." The Living Community Model provides value to individuals from all backgrounds and occupations who profess an interest in quality, offering them flexible choices of involvement and affiliation with the organization and the quality movement. The model is designed to appeal to current and prospective members with new and more diverse benefits and choices, multiple points of access, varying dues structures, and networking community options. After months of research and design, the Living Community Model's membership categories were proposed and approved by the board in November 2003. They are: Regular, Associate, Forum, Student, Organization, Corporate and Sponsor. The Living Community Model Phase 1 implementation primarily addresses four individual membership types: Regular Member, Associate Member, Forum Member, and Student Member.
Benefits and dues for these categories would go into effect for new and renewing members beginning July 1, 2004. Dues associated with all levels of the membership will help fund new activities, such as delivering tools and materials to substantiate the impact of quality management on business improvement (also known as the "Economic Case for Quality"), and a sustained, national and global effort to enhance the image, value and voice of quality professionals, practitioners and ASQ. Image enhancement is expected to include several promotional activities as well as a media campaign.
Page 56 of 56