Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. I have just done my first regression analysis. A colleague of mine indicates that my analysis
is incomplete since I did not test the raw data for normality. Do I need to test the x values
and the y values for normality?
The simple answer is NO. What you should be testing for normality are the residuals associated with
the expected model. The general model for simple linear regression is the following expression:
y = a + bx + ε
where a is the intercept term, b is the slope term and ε is the random error.
There are four basic assumptions associated with regression analysis. These assumptions deal with
linearity, independence, constant variance and normality.
In other words:
1. The mean residual values (one mean residual value per value of x) lie on a straight line, or
   the mean y values (one mean y value per value of x) lie on a straight line.
2. The residual values are independent, or the y values are independent.
3. The subpopulations of residual values (one subpopulation per value of x) have the same
   variance, or the subpopulations of y values (one subpopulation per value of x) have the same
   variance.
4. For each value of x, the subpopulation of residual values is normally distributed, or for each
   value of x, the subpopulation of y values is normally distributed.
For simplicity it is usually assumed that errors have a normal distribution with mean zero and
variance s². This means that if repeated measurements of y are taken for a particular value of x, then
most of them are expected to fall close to the regression line and very few to fall a long way from
the line. The assumption of normality is checked using the residuals. You can check the assumption
of normality for all y values for a given x value but this will require multiple observed y values at
each x value (this also requires more work for the data analyst).
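As an illustration, here is a minimal Python sketch (the x and y data are made up for illustration)
that fits the simple linear model by least squares and then tests the residuals, not the raw x or y
values, for normality:

    import numpy as np
    from scipy import stats

    # hypothetical data: x values and the observed y values
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8])

    # fit y = a + b*x; np.polyfit returns the slope first, then the intercept
    b, a = np.polyfit(x, y, deg=1)
    residuals = y - (a + b * x)

    # test the residuals for normality (Shapiro-Wilk)
    w_stat, p_value = stats.shapiro(residuals)
    print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}")

A small p value would suggest the residuals are not consistent with a normal distribution and the
model assumptions should be examined.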
Q. When estimating the performance of a process using process capability indices, what is the
appropriate sample size for calculating valid Cp and Cpk values?
Process capability is the long-term performance level of the process after it has been brought
under statistical control. In other words, process capability is the range over which the natural
variation of the process occurs as determined by the system of common causes. The data chosen to
estimate the variability of the process should attempt to encompass all natural variations (i.e., raw
materials, time of day, changes in ambient conditions, people, etc.). For example, one
organization might report a very good process capability value using only ten samples produced on
one day, while another organization producing the same commodity might report a somewhat lesser
process capability number using data from a longer period of time that more closely represents the
true process performance. If one were to compare these process index numbers when choosing a
supplier, the best supplier might not be chosen.
As a rule of thumb, a minimum of 20 subgroups (of sample size preferably at least 4 or 5) should
be used to estimate the capability of a process. The number of samples used has a significant
influence on the accuracy of the Cpk estimate. For example, for a random sample of size n = 100
drawn from a known normal population of Cpk = 1, the Cpk estimate can vary from 0.85 to 1.15
(with 95% confidence). Therefore, smaller samples will result in even larger variations of the Cpk
statistics.
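The 0.85 to 1.15 interval quoted above can be reproduced with one widely used approximation (often
attributed to Bissell) for the confidence interval of an estimated Cpk. The Python sketch below is
only an approximation under the usual normality assumption, with made-up inputs:

    import math

    def cpk_confidence_interval(cpk_hat, n, z=1.96):
        """Approximate confidence interval for an estimated Cpk (z = 1.96 gives roughly 95%)."""
        half_width = z * math.sqrt(1.0 / (9.0 * n) + cpk_hat**2 / (2.0 * (n - 1)))
        return cpk_hat - half_width, cpk_hat + half_width

    # example from the text: n = 100 observations, estimated Cpk = 1.0
    low, high = cpk_confidence_interval(1.0, 100)
    print(f"95% CI: {low:.2f} to {high:.2f}")   # roughly 0.85 to 1.15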
Q. What is meant by validation?
ISO defines validation as confirming that a product or system appropriately meets the intended
use. Some people confuse this with verification. Verification is confirming that a product or
system meets identified specifications. To some these two words mean the same but there is a
distinction – meets intended use vs. meets identified specifications. Verification and validation
work together as a sort of “before” (verification) and “after” (validation) proof. Verification
answers the question "Are we doing the job right?" while validation answers the question "Are we
doing the right job?"
For example, you assemble bicycles for your customers. One of the key characteristics is how tight
the chain is. Your customers have a requirement of 70 lb-ft ± 5 lb-ft. After assembling the bike,
every chain is checked with a torque wrench per the plant’s procedure and all the results have
been in specification. This is verification. How do you know that the torque wrench is still
calibrated? This would be validation. You have data, but how reliable is that data? You are checking
the chains with a torque wrench as the procedure indicates but the procedure should probably
indicate that the check should be done with a calibrated torque wrench.
Q. When reading books and articles on process capability, the authors often mention short and
long term capability. What is the difference between short term and long term capability?
The major difference between short term and long term capability is the type of variation being
estimated. Typically, short term variation is estimated by using the within sample estimates of
variation (i.e., the average range or average standard deviation within many subgroups taken over
time) while long term variation is estimated by using the between samples estimate of variation. If
the process is truly in control both of these estimates of variability will be statistically equivalent.
One must remember that the within sample variation for any subgroup represents variation of the
process for a very short time frame. The 3-5 samples in any one of the subgroups were taken
extremely close in time and thus probably only represent common cause variation – probably the
best the process can do. In contrast, especially if the process is not in control, the between sample
variation includes special causes (i.e., drifts, shifts, cycles, etc.) and thus will be larger than the
average within sample variation.
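A minimal Python sketch of the two estimates (the subgroup data are made up; d2 = 2.326 is the
standard control chart constant for subgroups of size 5):

    import numpy as np

    # hypothetical data: rows are subgroups of size 5 taken over time
    data = np.array([
        [10.1, 10.3,  9.9, 10.2, 10.0],
        [10.4, 10.2, 10.5, 10.3, 10.6],
        [ 9.8, 10.0,  9.9, 10.1,  9.7],
        [10.2, 10.1, 10.3, 10.0, 10.2],
    ])

    d2 = 2.326  # control chart constant for subgroup size n = 5

    # short-term (within-subgroup) estimate: average range divided by d2
    ranges = data.max(axis=1) - data.min(axis=1)
    sigma_within = ranges.mean() / d2

    # long-term (overall) estimate: standard deviation of all observations pooled together
    sigma_overall = data.std(ddof=1)

    print(f"short-term sigma estimate: {sigma_within:.3f}")
    print(f"long-term sigma estimate:  {sigma_overall:.3f}")

If the process is in control the two estimates should be close; special causes acting between
subgroups inflate the long-term estimate relative to the short-term one.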
Q. Which normality test is the best?
There are several normality tests: Anderson-Darling, Lin-Mudholkar, Shapiro-Wilk, Chi square test,
skewness and kurtosis tests and many others.
Each normality test has different power in its ability to detect nonnormal distributions. Some of
the tests are more powerful with small sample sizes and some are more powerful with large sample
sizes. One must be extremely cautious in selecting a normality test because there is no one test
which is most powerful in all cases.
The Anderson-Darling test can be used to test most departures from normality and is a very
powerful test for sample sizes between 6 and 20. It can be used for larger sample sizes but has a
tendency to lose power. The test is very sensitive in detecting a kurtosis issue.
The Lin-Mudholkar test is a very powerful test for sample sizes between 10 and 50. It can be used
for smaller sample sizes but has a tendency to lose power. The test is particularly sensitive to
detecting asymmetric (skewness) alternatives to normality.
The Shapiro-Wilk test is a very effective procedure for evaluating the assumption of normality
against a wide spectrum of nonnormal alternatives even if only a relatively small number of
observations are collected. For example, if 20 observations are taken from a process that is
actually exponentially distributed, the chances are about 80 out of 100 that the null hypothesis
(The distribution is normal) will be rejected. The typical range for the Shapiro-Wilk test is 15 to 50
data points.
The Chi square test of goodness of fit is a very useful procedure if the sample sizes are over thirty.
Large sample sizes (usually greater than 50) are needed since the sample data is sorted into
groups.
The skewness and kurtosis tests are good tests if the sample sizes are large. The sample size should
be more than 50 for the skewness and kurtosis tests to be effective since one or two extreme data
points could cause these two tests to reject normality.
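As an illustration, the Shapiro-Wilk and Anderson-Darling tests are both available in SciPy. The
sketch below uses deliberately non-normal (exponential) data generated for illustration:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    sample = rng.exponential(scale=2.0, size=20)   # deliberately non-normal data

    # Shapiro-Wilk: effective for small-to-moderate sample sizes
    w, p_sw = stats.shapiro(sample)
    print(f"Shapiro-Wilk: W = {w:.3f}, p = {p_sw:.4f}")

    # Anderson-Darling: compare the statistic to the tabulated critical values
    ad = stats.anderson(sample, dist='norm')
    print(f"Anderson-Darling: A2 = {ad.statistic:.3f}")
    for cv, sig in zip(ad.critical_values, ad.significance_level):
        print(f"  reject normality at the {sig:.0f}% level if A2 > {cv:.3f}")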
Q. We are having an on-going discussion at our office about calculating process capability
indices. Some of us believe that process capability indices should be calculated no matter
what the situation is (i.e., in control, not in control, periods of time when in control, etc.)
and for both product characteristics and process variables, and some of us believe that
process capability indices should only be calculated when appropriate and only for product
characteristics. Which group is correct?
WOW! This question should be a topic/lecture within any course on process capability.
Unfortunately it is not. Calculating capability indices (such as the typical ones: Cp, Cpk, Pp and
Ppk) is not appropriate or statistically correct for all situations. The intent of process capability
indices is to provide a measure of how well the product from the process is meeting the
expectations of the customer.
First of all, one must remember that the typical capability indices should not be calculated for
processes that do not meet the basic assumptions (I know all statistical software packages will
make the calculations but that does not mean that the values calculated are meaningful). These
basic assumptions are:
(1) the process is in a state of statistical control
(2) the measurements follow a normal distribution
(3) the measurements are independent of each other
Violation of any or all of these assumptions will cause the estimated capability value to be
erroneous (i.e., inflated or understated) which could lead to wrong decisions being made about the
performance of the process and cause actions to be taken when they should not be or cause actions
not to be taken when they should be.
For measurements from a process that behaves with a known nonrandom pattern, like a downward
trend or an upward trend, you can see we violate all three assumptions. There may be some type
of linear mathematical model that the measurements follow that will allow us to predict what the
next measurement may be but it is not the Shewhart model.
Second, the concept of capability is more for product characteristics than for process variables.
Product characteristics usually have targets and tolerances that represent the customers’ window
of acceptability for their usage of the product. Process variables may have targets and sometimes
tolerances but these tolerances usually represent the engineering window and are used to help
minimize product unacceptability. The more important concept here is to keep the process
variables in control and use them as “knobs” for adjustments in order to keep the process output in
control and capable (that is, meeting the expectations of the customer).
Third, when dealing with capability (that is, using capability indices to estimate capability) we
want to assure ourselves that the output from the process is meeting the expectations of the
customer and that the process variables are our way of managing the process output. Thus we are
more concerned with knowing when the process output goes out of control (or is going out of control)
so that we can detect the out of control quickly, know which process variable needs to be adjusted
in order to bring the output back to meeting the customer’s expectations, know how much we need
to adjust the process variable and verify that the adjustment to the process variable was effective
in resolving the out of control or out of specification issue.
One must keep in mind that capability indices are probably the most abused statistics in the
field of quality. We have a tendency to put too much emphasis on their value. It is similar to the
correlation coefficient in regression. Just keep in mind, your organization is not in the business of
selling capability indices to your customer but selling multiple highly reliable parts that meet the
expectations of the customer each and every time they are used.
Q. How often should you re-calculate process capability indices for a process?
If the output characteristic of a process is in control, the Cpk values will not change
significantly from one set of calculations to the next. There probably is no need to calculate
Cpk daily, weekly or monthly; just monitor the control chart.
If the output characteristic of a process is not in control (shifts and drifts), then each time a
Cpk value is calculated it will probably be different from the previous calculation. There is a need
to calculate Cpk on a regular basis due to the special causes. Care must be taken in interpreting the
value since one of the major assumptions has been violated.
Q. What are cusum control charts?
The cumulative sum (Cusum) control chart is used primarily to maintain control of a process at some
specified goal or target. The concept was developed by E. S. Page in 1954. The basic functions of
Cusum control charts are similar to those for Shewhart control charts except that Cusum control
charts are more sensitive to small changes in the process performance.
The distinguishing feature of Cusum control charts is that each plotted point also contains
information from all previous observations. The process performance of a particular quality
characteristic is measured by cumulating the differences between a particular statistic, Q, and a
given target value, T. The statistic, Q, can be X̄, X, R, s, p, c, etc. The cumulative sum (Sn) is equal
to the following expression:
Sn = (Q1 - T) + (Q2 - T) + ... + (Qn - T) = Σ(Qi - T)
The Cusum technique derives its name from the fact that it accumulates successive deviations of the
process characteristic from a target value.
The major advantages of Cusum control charts are:
- Cusum control charts are good for detecting small changes in the performance of
the process quickly. For shifts between 0.5 s/√n and 2.0 s/√n, the Cusum
control charts generally detect the change 50% faster than Shewhart control
charts.
- Cusum control charts use all the observations to detect whether a change in the
process performance has occurred or not. Ordinarily Shewhart control charts
only use the current group of observations to detect if a change in the process
performance has occurred or not.
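A minimal Python sketch of the cumulative sum calculation (the subgroup averages and target are
made up for illustration):

    import numpy as np

    def cusum(values, target):
        """Cumulative sum of deviations from target: Sn = (Q1 - T) + (Q2 - T) + ... + (Qn - T)."""
        return np.cumsum(np.asarray(values) - target)

    # hypothetical subgroup averages with a small upward shift after the fifth point
    xbar = [10.0, 10.1, 9.9, 10.0, 10.1, 10.3, 10.4, 10.3, 10.5, 10.4]
    print(cusum(xbar, target=10.0))

A sustained change in the slope of the plotted cumulative sums is the signal that the process has
shifted away from the target.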
Q. What are target or nominal control charts?
Target or nominal control charts for variable data can be used when monitoring the behavior of a
single quality characteristic produced by a process running different parts. A quality characteristic
may be shared by many different parts. The characteristic
may have different target values depending upon the part being monitored. This is of particular
value in short-run or process-control situations. Target charts across these parts are based on
constructing centerlines and control limits with transformed data. Before control limits are
calculated, each measurement is normalized (coded) by subtracting a target value from the
measured value.
Target charts are simply standard control charts as described but using transformed data. The
typical transformation is the deviation from target.
Target or nominal control charts are used:
- To support process-oriented SPC rather than part-by-part SPC.
- To better display, statistically control, and improve a family of parts.
- To better display, statistically control, and improve a process.
- To reduce the number of control charts needed.
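A minimal Python sketch of the deviation-from-target transformation (the measurements and part
targets are made up for illustration):

    import numpy as np

    # hypothetical measurements and the target value for each part run on the same process
    measurements = np.array([5.02, 5.05, 7.48, 7.52, 9.97, 10.03])
    targets      = np.array([5.00, 5.00, 7.50, 7.50, 10.00, 10.00])

    # deviation-from-target coding used for a target (nominal) chart
    coded = measurements - targets
    print(coded)   # these coded values are what get plotted on the target chart

The coded values from all the parts are then charted with a single set of centerlines and control
limits.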
Q. Can you please give me a good explanation of where the “magical” number 5.15, used as a
multiplier of the test error in a gage capability analysis, came from? I know it is supposed to
represent 99% of the area under a normal curve but this is not the value I get when I look
up the z value for 99% in my favorite normal distribution table.
You are right; the 5.15 value represents an area of 99%. The value 5.15 represents the width of the
99% probability band in z units. I suspect you get z = 2.326 from your table. This value of 2.326
represents 99% of the area under the normal curve to the left of this value or 1% to the right of
this value. If we knew the true value of a particular part and measured it several times with an
imperfect measuring system (a measuring system that has test error greater than zero), we would get
some measured values higher than the true value and some measured values lower than the true
value – basically, the distribution of multiple measures on the same part will follow a normal
distribution. Therefore, we want the confidence band to be symmetrical around the true value –
that is, we want 99% to be distributed equally on both sides of the true value. This means we want
1% outside of the symmetrical band or 0.5% in each of the two tails. Therefore, when we use a
normal distribution table, we want to find the z value for either 0.5% or 99.5%. If we look up 99.5%,
we find that the z value equals 2.575. Since we want our band around the true value to represent
an area of 99%, we need to double this value. If we do, we get 5.15 – the width of the 99%
symmetrical band around the true value in z units.
I am not sure why 99% was originally chosen but it has been the standard multiplying factor for
several decades, ever since General Motors published the first discussion on gage capability
analysis. Today, many organizations are using the multiplier 6 (99.7% probability band) rather than
5.15. Actually the AIAG MSA manual (3rd edition) gives you the option of using 5.15 or 6 as the
multiplier.
The following table shows what the multiplying factor would be for different probability values
symmetrical around a true value.
z Values              Area to Left    Area to Left    Percent Between    Width of
(symmetric band)      of Lower z      of Upper z      z Values           Confidence Band
-1.645 to 1.645       0.0500          0.9500          90.0               3.29
-1.960 to 1.960       0.0250          0.9750          95.0               3.92
-2.250 to 2.250       0.0122          0.9878          97.6               4.50
-2.575 to 2.575       0.0050          0.9950          99.0               5.15
-3.000 to 3.000       0.0013          0.9987          99.7               6.00
It should be noted that awareness of which multiplying factor is being used is critical to the
integrity of the conclusions from a gage capability study. This is especially important if a
comparison is to be made between two or more organizations on the same type of measurement
system.
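The multipliers in the table can be reproduced directly from the normal distribution, as in the
following Python sketch:

    from scipy import stats

    def band_width(prob):
        """Width, in z units, of a symmetric band containing the given probability."""
        tail = (1.0 - prob) / 2.0              # e.g. 0.005 in each tail for a 99% band
        return 2.0 * stats.norm.ppf(1.0 - tail)

    for p in (0.90, 0.95, 0.99, 0.9973):
        print(f"{p:.4f} -> multiplier {band_width(p):.2f}")
    # 0.99 gives about 5.15 and 0.9973 gives about 6.00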
Q. What is the difference between key quality characteristics and key process variables?
Key quality characteristics are characteristics of the product or service produced by a process that
customers have determined to be important to them. Key quality characteristics are such things as
the speed of delivery of a service, the finish on a set of stainless steel shelves, the width of the
table top, the precision with which an electronic component is calibrated, or the effectiveness of
an administrative response to a tasking by higher authority. Every product or service has multiple
key quality characteristics. These key quality characteristics need to be measured and monitored.
When you are selecting processes to improve, you need to find out the processes, or process steps,
that produce the characteristics your customers perceive as important to product quality.
Key process variables are variables that affect the performance of the process and thus the quality
of the product. Key process variables are such things as line speed, oven temperature, reaction
time, pressure, water pH, etc. Key process variables need to be measured and monitored. When
you are selecting key process variables to measure and control, you need to identify those that
have an influence on the key quality characteristics.
Q. I was recently reading an article on continuous improvement and the author mentioned a
technique called EVOP. The article provided no information on the technique. Do you know
what the technique is and could you provide me with a brief description of it?
Evolutionary operation (EVOP), which was introduced by G. E. P. Box in 1957, is a technique that
can be used to facilitate continuous improvement. Realizing the problem with large scale
experimentation, the failure of the pilot plant scale-up process and the apparent complexity of
design of experiments, Dr. Box developed EVOP. EVOP is used today by many chemical
companies in their effort to increase their rate of process improvement with only small
investments in money and manpower. EVOP forces the plant process to produce information about
itself without upsetting production.
EVOP is based upon experimental design concepts but is used differently than the classical design
of experiments. With EVOP only very small changes are made to the settings of the process
variables so that the process is not disrupted and thus there generally is no increase in the
percentage of nonconforming units. But there is a difficulty when we make small changes. The
difficulty lies in the fact that there are uncontrollable sources of variation that will cause the
observed results to vary, making it hard to see the effect of the variables in the study. EVOP overcomes
this by taking advantage of large scale production quantities to build sample sizes large enough to
overcome the problem of finding differences in the response variable. The effect of these small
changes upon product performance is noted and the process is shifted to obtain product
improvement. The procedure is then continued until the optimum settings for the variables under
study are found.
The following table summarizes the basic differences between design of experiments (DOX) and
evolutionary operations (EVOP).
DOX:
- requires many experimental runs.
- large difference between settings of levels.
- finds best settings fast.
- usually disrupts production.
- produces information but product may not be sold.
- large differences need to be seen before statistical significance can be declared.

EVOP:
- requires few experimental runs.
- small difference between settings of levels.
- finds best settings slowly.
- usually does not disrupt production.
- produces information but product can be sold.
- small differences can be assessed as statistically different due to high production volumes.
Q. I need help with how to chart and evaluate data that we are collecting from a process in
which the number of observations per sample can be highly variable. The variable that we
track is width of the cut. We are using a Xbar and Range control chart and the number of cuts
varies from 2 to 7. What would be the best way to manage these charts? For ease of chart
maintenance we would prefer to have all the data on one chart, but if this is impractical then
we could do one chart for each of the different sample sizes.
You do not need to construct charts for each sample size. What you could do is calculate the
control limits for each possible sample size (n=2,3,…7). You then would draw on the control chart
the control limits for the sample size that appears the most often. On the side of the chart or on
the back, you would list the control limits for all the possible sample sizes. If the actual sample
size for the current sample is different from the sample size used to construct the control limits
displayed on the control chart, the individual who is plotting the data would then look at this list
and make a decision (in control or out of control) against the appropriate control limits for his/her
sample size. You should also consider having the individual record the actual sample size for each
sample on the control chart.
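A minimal Python sketch of one way to pre-compute Xbar-chart limits for each possible subgroup size
(the grand average and the process standard deviation estimate are made up for illustration):

    import math

    xbar_bar = 25.0    # hypothetical grand average of the process
    sigma_hat = 2.0    # hypothetical estimate of the process standard deviation

    # one pair of Xbar-chart limits for each possible subgroup size (n = 2 to 7)
    for n in range(2, 8):
        margin = 3.0 * sigma_hat / math.sqrt(n)
        print(f"n = {n}: LCL = {xbar_bar - margin:.2f}, UCL = {xbar_bar + margin:.2f}")

The list printed by this sketch is what would be recorded on the side or back of the chart so the
person plotting can judge each point against the limits for that sample's actual size.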
Q. Realistically at some time, a control chart being used to monitor a key product
characteristic or a key process variable will show a pattern that violates one of the many out of
control rules. Does this mean that the process is no longer in control?
Assessing if a process is in control or not is not as simple as the textbooks imply. There are many
documented rules (about 12 of them) for defining when a process is statistically out of control. The
interpretation of these rules still requires that common sense be used in deciding what actions, if
any, should be taken.
When a process is in a state of statistical control, it is an indication that only random variation is
present, that is, all identifiable special causes have been eliminated or corrected for. The presence
of special causes in a process is not necessarily a “bad” event. Being out of control is not always
“bad”. Consider the case where an organization has been using a control chart to monitor the scrap
rate of a particular process. The scrap rate has been in a state of statistical control at 15% for
many months. For the last 8-9 weeks, the scrap rate has been steadily decreasing (exhibiting a
downward trend on the control chart – a nonrandom pattern) and the pattern violates one or more
of the standard out of control rules. Most managers and supervisors would be thrilled with this
downward trend. Indeed, this process is out of control – that is, special causes are influencing the
performance of the process. In this case (hopefully), the special causes are all the changes being
made by the organization to improve the process. These changes should be known and
documented. The organization would not want to undo the changes in order to bring the process
back to a state of control at 15%.
Defining processes as either in control or out of control is a dichotomous view of process control.
Control is really a continuous characteristic that is concerned with reducing variation and
preventing nonconformance over time. Therefore, "in control" implies more than simply meeting
the "in control" rules associated with control charts. An operational definition that is often used is
as follows:
A process is said to be in control if the output from the process exhibits only random or common
variations on the appropriate control chart and when nonrandom variations or unexpected
variations are present, actions are taken to understand the source of the special causes and, when
appropriate, actions are taken to eliminate the source of special causes in order to return the
output of the process to the state of random variations.
This operational definition requires knowing the cause and effect relationships (the relationships
between input variables and output variables) that govern the process. When people understand
cause and effect relationships of a process, they can quickly find and correct, when appropriate,
whatever is making the process behave out of control. Process control is about
recognizing when special causes are present, deciding if the effect due to the special cause is
understood and desirable and taking action (following the reaction plan) to rectify the influence of
the special causes when the effect of the special cause is determined to be unacceptable. In this
situation we can say that the organization is controlling the process even though the process may
not be in “pure” statistical control.
Therefore, if our control chart shows an out of control situation, will we say the process is out of
control? Yes. In this situation, will we immediately take action to rectify the out of control
situation? No. If after investigation, we conclude that the process is out of control due to actions
(special causes) that we have taken to improve the process, we will conclude that we are
controlling the process. If after investigation, we conclude that the process is out of control due to
special causes (no actions taken by us), we will conclude that we are not controlling the process.
Walter Shewhart once said:
“While every process displays variation, some processes display
controlled variation, while others display uncontrolled variation.”
Q. I am a recent college graduate and I am currently working as a quality engineer. In school, I
took one statistical quality control course. I am under the impression that a process in control
is also one that is capable. Some of my co-workers are telling me that I am wrong. If there is a
difference between a process in control and a capable process, please explain it to me.
There is a big difference between a process being in control and a process being in control and
capable. When a process is in a state of statistical control, it is an indication that only random
variation is present, that is, all identifiable special causes have been eliminated or corrected for.
An in control process is one that is predictable, that is, all expected future results will fall
randomly between two limits based upon probability. My golf game is in control (unfortunately my
golf scores are nowhere in the neighborhood of Tiger Woods) but not capable if I want to play on
the PGA tour.
A capable process is a process that is not only in statistical control but also a process where all the
results are meeting the requirements. Since this type of process is in control, this means that, as
long as we keep it in control, all future results will fall randomly between the two probability
limits and all future results will continue to meet the requirements.
There are four situations when dealing with the concepts of control and capability:
- a process in control and capable
- a process in control but not capable
- a process not in control but capable
- a process not in control and not capable
To evaluate if a process is in control, control charts are used. To evaluate if a process is capable,
after achieving control, capability indices are used.
Q. What is EVOP?
Design of experiments is a systematic technique in which process variables, known to affect the
product, are varied over a wide range of operating levels. The objective is to determine the
relationship between the response variable and a number of process variables. This is accomplished
by making large changes to the settings of the process variables. However in doing so, the scrap rate
or percentage of nonconforming items may increase significantly. This increase in scrap or
nonconformance does not make the production group happy.
Evolutionary operation (EVOP) is a technique that can be used to facilitate continuous improvement.
It is based upon experimental design concepts but is used differently than the classical design of
experiments. With EVOP only very small changes are made to the settings of the process variables so
that the process is not disrupted and thus there generally is no increase in the percentage of
nonconforming units.
The basic principles of EVOP are:
- make small changes to the process variables so that product quality is not
endangered.
- make changes in the process variables in a set pattern and repeat this pattern
several times.
- evaluate the effects of the small changes by grouping data and comparing the
averaged results.
- interpret the results for significance.
- move the process settings (if the results are significant) in the direction with the
best results.
- repeat the procedure at the new settings.
EVOP uses planned runs that are repeated over and over. An EVOP design generally involves two
levels of each variable being evaluated and a center point. The center point denotes the reference
condition. At the beginning of the process study, this reference condition represents the current
process settings for the variables being evaluated.
When two independent variables are being evaluated at two levels per variable, there are five
experimental conditions:
- Reference settings for variable A and variable B
- Variable A low and variable B low
- Variable A high and variable B high
- Variable A high and variable B low
- Variable A low and variable B high
For those processes for which EVOP is applicable, EVOP can lead to some important
benefits:
- product improvement.
- better understanding of the process
- an increased awareness of the process
- a sense of involvement with process performance by operating personnel
A minor disadvantage of EVOP is that its implementation costs time and money in training personnel,
keeping and analyzing simple data, making process changes, the number of repeats needed to see a
significant change, etc.
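A minimal Python sketch of one EVOP phase for two variables (all of the condition names and yield
numbers below are made up for illustration):

    import numpy as np

    # the five standard conditions for a two-variable EVOP phase
    conditions = ["reference", "A-low B-low", "A-high B-high", "A-high B-low", "A-low B-high"]

    # hypothetical yields observed for each condition in each of four repeated cycles
    yields = np.array([
        [72.1, 71.8, 72.3, 72.0],   # reference
        [71.5, 71.9, 71.6, 71.7],   # A low,  B low
        [73.0, 72.8, 73.2, 72.9],   # A high, B high
        [72.4, 72.2, 72.6, 72.3],   # A high, B low
        [71.9, 72.1, 71.8, 72.0],   # A low,  B high
    ])

    # compare the averaged results; the winning direction suggests where to move
    # the process settings for the next EVOP phase
    for name, mean in zip(conditions, yields.mean(axis=1)):
        print(f"{name:14s} average yield = {mean:.2f}")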
Q. I am new to the field of quality and would like to know what is the difference between
ANOVA and design of experiments?
Design of experiments is an advanced statistical tool for planning experiments while ANOVA
(analysis of variance) is a statistical test used to test the null hypothesis. In basic statistics, this is
similar to hypothesis testing and the t test. Hypothesis testing is a simple statistical tool while the t
test is one of many statistical tests used to test a null hypothesis.
Design of experiments is a systematic approach to sort out the important variables or combination
of variables that influence a system. This technique allows several variables to be evaluated at the
same time. The process is defined as some combination of machines, materials, methods,
measurements, people and environment which, used together, perform a service, produce a product or
complete a task. Thus design of experiments is a scientific method which allows the experimenter
to better understand a process or system and how the inputs affect the outputs or responses.
Analysis of variance (ANOVA), an advanced statistical test, is used to determine whether or not
significant differences exist between several means. Basically, the analysis of variance technique
extends the concept of the t test but holds the pre-selected level of significance constant. The use
of ANOVA results in an ANOVA table.
The analysis of variance technique is based on two principles:
- Partitioning the total variability of the process into its components (the process
variables selected for the study).
- Estimating the inherent variability of the population by two methods and
comparing the two estimates.
If the two estimates are close, then there is no significant difference in the means. If the two
estimates are not close, then there is a significant difference in the means.
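As an illustration, a one-way analysis of variance can be run in a few lines of Python (the three
groups of measurements below are made up for illustration):

    from scipy import stats

    # hypothetical measurements from three different process settings
    group_a = [10.2, 10.5, 10.1, 10.4, 10.3]
    group_b = [10.8, 11.0, 10.9, 11.1, 10.7]
    group_c = [10.3, 10.2, 10.4, 10.1, 10.5]

    # one-way ANOVA: are the group means significantly different?
    f_ratio, p_value = stats.f_oneway(group_a, group_b, group_c)
    print(f"F = {f_ratio:.2f}, p = {p_value:.4f}")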
Q. As a follow-up to the above question, what is an ANOVA table?
The ANOVA table is output from an analysis of variance calculation used in design of experiments
and regression. This table shows the source of variation, the sum of squares (SS), the degrees of
freedom (df), the mean squares (MS), the F ratio and the significance level. An example of a very
simple ANOVA table is illustrated below. This ANOVA table is for a completely randomized design of
experiment.
Source of          Sum of      degrees of    Mean      F Ratio    Level of
Variation          Squares     freedom       Square               Significance
Between Groups     70          3             23.3      6.9        0.003
Within Groups      54          16            3.4
Total              124         19

The Source of Variation column lists the components of the total variation. The sources of variation for
this simple case are between the groups and within the groups. The term within groups is
sometimes referred to as the error of the experiment.
The Sum of Squares (SS) column shows the sum of squares for each component of variation and the
total variation. A sum of square represents the deviations of a random variable from its mean. The
sum of squares for between groups represents the variability between all the groups in the study.
The sum of squares for within represents the averaged variability within each of the groups in the
study. The total sum of squares represents the variability among all the observations.
The degrees of freedom (df) column shows the degrees of freedom for each component of
variability and the total variability. Degrees of freedom represent the maximum number of
measurable characteristics that can freely be calculated before the rest of the characteristics are
completely determined.
The Mean Squares (MS) column shows the estimated variances for each component of variability. A
mean square is an unbiased estimate of a population variance and is determined by dividing a sum
of squares by its degrees of freedom.
The F Ratio column shows the ratio between a mean square for a component and a mean square for
error. If this value is bigger than the critical F value, then the null hypothesis is rejected.
The Level of Significance (sometimes called the p value) column shows the exact probability of
obtaining the F Ratio value. If the calculated p value is less than the chosen significance level (the
alpha value or Type I error), then the null hypothesis is rejected. On the other hand, if the
calculated p value is greater than the chosen alpha value, then the null hypothesis is not rejected.
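The entries in the example ANOVA table can be reproduced from the sums of squares and degrees of
freedom, as in this short Python sketch:

    from scipy import stats

    # values taken from the example table above
    ss_between, df_between = 70.0, 3
    ss_within,  df_within  = 54.0, 16

    ms_between = ss_between / df_between                      # 23.3
    ms_within  = ss_within / df_within                        # 3.4 (error mean square)
    f_ratio    = ms_between / ms_within                       # about 6.9
    p_value    = stats.f.sf(f_ratio, df_between, df_within)   # level of significance

    print(f"MS between = {ms_between:.1f}, MS within = {ms_within:.1f}")
    print(f"F = {f_ratio:.1f}, p = {p_value:.3f}")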
Q. When evaluating the results of a process capability study, it is recommended that process
capability indices be calculated. Two such indices are the Cp and the Cpk. Why is it necessary to
calculate both of these indices and what is the difference between the two indices?
A. Process capability indices, such as Cp and Cpk, measure the performance of a process to meet
the customer’s requirements. These two indices are concerned with only two characteristics of the
process - location and dispersion. The relationship between process and customer specifications
can be summarized by two questions:
 Is there enough room within the specification limits for the process to operate?
 Is the process properly located to take advantage of what room there is within the specification
limits?
The first question is answered by calculating Cp and the second question is answered by calculating
Cpk.
Cp is the process potential index and measures a process’s potential capability, which is defined as
the allowable spread over the observed spread of the process. The allowable spread is the
difference between the upper and lower specification limits given by the customer. The observed
spread is determined from data gathered from the actual process by estimating the standard
deviation of the process and multiplying this estimate by 6. The general formula is given by:
Cp = (USL - LSL) / (6S)
As the standard deviation increases in the process, the Cp decreases in value. As the standard
deviation decreases, the Cp increases in value. By convention, when a process has a Cp value less
than 1.0, it is considered potentially incapable of meeting the customer specification
requirements. Ideally, the Cp should be as high as possible. The higher the Cp, the lower the
variability with respect to the customer specification limits.
However, a high Cp value does not guarantee a production process falls within the customer
specification limits because the Cp value does not imply that the observed spread of the process is
centered within the allowable spread. This is why the Cp is called the process potential.
The process capability index, Cpk, measures a process’s ability to produce product within the
customer specification limits. Cpk represents the difference between the observed process average
and the closest specification limit over three times the estimated process standard deviation. The
general formula is given by:
Cpk = min[ (USL - X̄) / (3S), (X̄ - LSL) / (3S) ]
By convention, when Cpk is less than 1.0, the process is considered not capable. When Cpk is equal
to 1.0 or greater than 1.0, the process is considered capable of producing product within the
customer specification limits.
The Cpk is inversely proportional to the process standard deviation. The higher the Cpk, the
narrower the observed process distribution as compared to the customer specification limits and
the more uniform the product. As the process standard deviation increases, the Cpk index
decreases. At the same time, the potential to produce product outside the customer specification
limits increases.
The Cpk index can never be greater than the Cp, only equal to it. This happens when the observed
process average falls in the middle of the specification limits. The Cpk index will equal 0.0 when
the observed process average equals one of the specification limits. The Cpk index can be negative
if the observed process average is outside one of the specification limits.
The need to calculate both indices helps in deciding what the issue is (location vs. variability) if
the process is found not to be capable. If Cp ≥ 1.0 and Cpk < 1.0, then the issue is location related.
If Cp < 1.0 and Cpk = Cp, then the issue is variability related. If Cp < 1.0 and Cpk < 1.0, then the issue
is location and/or variability related.
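A minimal Python sketch of both calculations (the process mean, standard deviation and
specification limits are made up for illustration):

    def cp_cpk(mean, std, lsl, usl):
        """Cp and Cpk from an estimated process mean and standard deviation."""
        cp  = (usl - lsl) / (6.0 * std)
        cpk = min((usl - mean) / (3.0 * std), (mean - lsl) / (3.0 * std))
        return cp, cpk

    # hypothetical process: mean 10.2, standard deviation 0.15, specifications 9.5 to 10.5
    cp, cpk = cp_cpk(10.2, 0.15, 9.5, 10.5)
    print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")   # Cp = 1.11, Cpk = 0.67

Here Cp is above 1.0 while Cpk is well below it, which points to a location problem: the process is
off center even though its spread would fit within the specification limits.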
Q. In recent months, I have seen in print and have heard at various conferences and seminars,
the phrase process management. I think I know what this means but could you tell me more
about what process management is?
To achieve process excellence, an organization must manage their key processes. Process
excellence implies that waste is minimized and variability is reduced. Minimization of waste brings
about more efficient application of resources, raw materials and time. Reduced variability brings
about process consistency and improved capability. So what is required to manage a process? The
organization must:
- satisfy the needs and wants of the external customer and the internal customer
- produce and deliver acceptable goods and services on time
- understand the capabilities of each activity within the process to produce
acceptable goods and services
- identify in a timely fashion any changes in the process so that they can be
properly managed (if change is positive) or corrected (if change is negative)
before the process goes out of control and/or begins to produce unacceptable
goods
- detect unacceptable goods or services resulting from activities that come from
processes that are incapable of producing acceptable goods or services
- ensure that new people are trained before they become involved in the process
and to provide refresher training, when appropriate, to assure that the people
continue to perform as expected
- report all unacceptable findings to the appropriate people
- define the root causes of problems and initiate a process to eliminate them
- obtain customer feedback that defines process problems so that the process
can be improved
- develop an on-going feedback system to the suppliers of the organization
(within and outside the organization) that measures process performance
This is what people are calling process management. Some of the key elements of a process
management oriented company include:
Process ownership: The process owner is responsible for the process
performance, maintenance, improvements, and other aspects of its health;
roles and responsibilities and accountabilities are well defined.
Performance measures: Performance of the overall process is measured,
planned, and linked to process changes; compensation is often linked to
it as well.
Strategy: The process is part of the overall strategic planning (not the result of
it).
Process management involves the design, improvement, monitoring, and maintenance
of an organization's most important processes in order to bring them up to the highest levels of
excellence. In Process Management the goal is usually to maximize profits, have high levels of
customer satisfaction, and achieve long-term business stability.
Q. At the last ASQ dinner meeting, someone at the table was talking about data mining. What is
data mining?
Generally, data mining is the process of analyzing data from different perspectives and
summarizing it into useful information - information that can be used to increase revenue, cut
costs, or both. The concept of data mining is not new but it is the latest buzzword especially in
the database industry. To use a simple analogy, it's finding the proverbial needle in the haystack.
In this case, the needle is that single piece of knowledge your business needs and the haystack is
the large data file you've built up over a long period of time. Through the use of automated
statistical analysis techniques, organizations are discovering trends and patterns in the large
dataset that previously went unnoticed.
Regression is the oldest and most well-known statistical technique that the data mining community
utilizes. Basically, regression takes a numerical dataset and develops a mathematical model that
fits the data. When you're ready to use the results to predict future behavior, you simply take your
new data, plug it into the developed model and you've got a prediction! The major limitation of
this technique is that it only works well with continuous quantitative data (like weight, speed or
age). If you're working with categorical data where order is not significant (like color, name or
gender) you're better off choosing another technique. In this case, classification analysis may be a
better technique to use. This technique is capable of processing a wider variety of data than
regression and is growing in popularity.
Data mining is primarily used today by companies with a strong consumer focus - retail, financial,
communication, and marketing organizations. It enables these companies to determine
relationships among "internal" factors such as price, product positioning, or staff skills, and
"external" factors such as economic indicators, competition, and customer demographics. And, it
enables them to determine the impact on sales, customer satisfaction, and corporate profits.
Finally, it enables them to "drill down" into summary information to view detail transactional data.
With data mining, a retailer could use point-of-sale records of customer purchases to send targeted
promotions based on an individual's purchase history. By mining demographic data from comment
or warranty cards, the retailer could develop products and promotions to appeal to specific
customer segments.
WalMart is pioneering massive data mining to transform its supplier relationships. WalMart captures
point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data
to its massive data file. WalMart allows more than 3,500 suppliers to access data on their products
and perform data analyses. These suppliers use this data to identify customer buying patterns at
the store display level. They use this information to manage local store inventory and identify new
merchandising opportunities.
The easiest way to use data mining is by using the new software packages. Data mining software is
a growing field. Nearly every statistical software company (e.g., SAS, SPSS, JMP, etc.) has
developed a data mining program.
Q. What are skewness and kurtosis?
A fundamental task in many statistical analyses is to characterize the location and variability of a
data set. A further characterization of the data includes skewness and kurtosis. Skewness and
kurtosis are measures of the shape of the distribution.
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or
data set, is symmetric if it looks the same to the left and right of the center point.
Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That
is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly,
and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than
a sharp peak. A uniform distribution would be the extreme case.
Skewness and kurtosis can be estimated for a dataset using EXCEL. EXCEL provides the functions SKEW
and KURT, which are used like the functions AVERAGE and STDEV.
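They can also be estimated in Python with SciPy; the data below are made up for illustration (note
that stats.kurtosis reports excess kurtosis, so a normal distribution scores about 0):

    import numpy as np
    from scipy import stats

    data = np.array([2.1, 2.4, 2.2, 2.8, 2.3, 2.5, 2.2, 4.9, 2.4, 2.6])

    print(f"skewness: {stats.skew(data):.2f}")         # positive here: the data are right-skewed
    print(f"kurtosis: {stats.kurtosis(data):.2f}")     # excess kurtosis (normal distribution = 0)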
Q. Six Sigma is still a little foreign to me and I am trying to learn what I can. I am wondering if
there is a formula I could use to determine what my parts per million goal would have to be if I
wanted to improve the process 1 sigma?
Six sigma technically means having no more than 3.4 defects per million opportunities in any
process, product, or service not meeting the requirements of the customer. The number 3.4 is
reached by assuming that the specification limits are not only 6 standard deviations away from the
target, but that the process average may drift over the long term by as much as 1.5 standard
deviations despite best efforts to control it. This results in a one-sided integration under the
normal curve beyond 4.5 standard deviations, which produces an area of about 3.4 defects per
million opportunities.
In contrast, the old three sigma quality standard of 99.73% translates to 2,700 defects per million,
assuming zero drift in the mean. A process operating in this mode will produce 1,350 defects per
million opportunities beyond each specification limit (the total would be 2,700).
For processes with a series of steps, the overall yield is the product of the yields of the different
steps. For example, if we had a simple two step process where step #1 had a yield of 80% and step
#2 had a yield of 90%, then the overall yield would be 0.8 x 0.9 = 0.72 = 72%. Note that the overall
yield from processes involving a series of steps is always less than the yield of the step with the
lowest yield. If three sigma quality levels (99.73% yield) are obtained from every step in a ten step
process, the quality level at the end of the process will contain 26,674 defects per million!
Considering that the complexity of modern processes is usually far greater than ten steps, it is easy
to see that Six Sigma quality isn’t optional; it’s required if an organization is to remain viable.
The following table shows the number of defects per million for various sigma values with the
assumed 1.5 sigma shift.
Sigma    Number of Defects per Million
1.5      500,000
2.0      308,300
2.5      158,650
3.0      67,000
3.5      22,700
4.0      6,220
4.5      1,350
5.0      233
5.5      32
6.0      3.4
Six Sigma isn't twice as good as three Sigma, it's almost 20,000 times better. This is because the
relationship between 1, 2, 3, 4, 5, 6 sigma is not linear. Remember the area under the normal
curve is also not linear.
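The table values can be approximated from the one-sided normal tail area with the conventional 1.5
sigma shift, as in the following Python sketch:

    from scipy import stats

    def dpmo(sigma_level, shift=1.5):
        """Defects per million opportunities for a sigma level, assuming a 1.5 sigma mean shift."""
        return 1_000_000 * stats.norm.sf(sigma_level - shift)

    for s in (3.0, 4.0, 5.0, 6.0):
        print(f"{s:.1f} sigma -> {dpmo(s):,.1f} defects per million")
    # 6.0 sigma gives about 3.4 defects per million

The small differences from the table (for example 66,807 versus 67,000 at 3.0 sigma) come from
rounding in the published tables.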
Q. When should I recalculate the control limits on my control chart?
This is perhaps the most frequently asked question by people who use control charts. While there is
no simple answer, there are some useful guidelines. It is obvious that the control limits need to be
revised/recalculated when the sample size has been changed or the specification target has been
changed.
The primary guideline for computing and/or recomputing control limits is: The purpose of the
control limits is to adequately reflect the voice of the process.
Remember, control charts are intended as aids for making decisions, and as long as the limits
appropriately reflect what the process can do, or can be made to do, then the control limits do not
need to be revised. Therefore, the following questions are generally used to help determine when
control limits should be revised.
- Does the current data, on the control chart, display a distinctly different kind of behavior and/or
pattern than past data?
- Is the reason for this change in behavior and/or pattern known?
- Is the new process behavior and/or pattern desirable?
- Is it intended and expected that the new behavior and/or pattern will continue?
If the answer to all four questions is yes, then it is appropriate to revise the control limits based on
data collected since the change in the process.
If the answer to question 1 is no, then there is no need to revise the control limits.
If the answer to question 2 is no, then one should look for the assignable/special cause instead of
wondering whether the control limits should be revised.
If the answer to question 3 is no, then you should be working to remove the detrimental
assignable/special cause instead of wondering whether the control limits should be revised.
If the answer to question 4 is no, then you should again be looking for the assignable/special cause
instead of wondering whether the control limits should be revised. The objective is to discover
what the process can do, or can be made to do.
Q. The new ISO 9001:2000 standard stresses continual improvement. Recently, our third party
auditor was doing a surveillance audit and implied we were not involved in continual
improvement activities. What is your interpretation of what is meant by continual
improvement in the ISO 9001:2000 standard?
Many quality system standards include direct and indirect reference to continual improvement. For
example, clause 8.5.1 in ISO 9001:2000, clause 8.5.1 in ISO/TS 16949 and clause 4.2.5 in QS
9000:1998. In general, the primary requirement in these standards refers to continually improving
the effectiveness of the quality management system. In my opinion this appears to be a very
narrow interpretation. Continual improvement should also include the entire organization’s
effectiveness if an organization is to improve customer satisfaction, gain a long term competitive
advantage and improve overall process performance.
Continual improvement should be viewed as a type of change that is focused on increasing the
effectiveness and/or efficiency of the entire organization to fulfill its policy and objectives. It
should not be limited to just quality initiatives. Improvement in business strategy, business results,
customer, employee and supplier relationships can be subject to continual improvement. Continual
improvement should focus on enablers such as leadership, communication, resources, organization
architecture, people and processes. Continual improvement should also lead to better results such
as price, cost, productivity, time to market, delivery, responsiveness, profit and customer and
employee satisfaction.
Q. What is a Weibull distribution and what is it used for?
The Weibull distribution is another parametric probability distribution. It is a member of a special
class of parametric distributions known as location-scale distributions. The Weibull distribution is
characterized by its shape and scale parameters. By changing the shape parameter, the Weibull
distribution can be made to have many different shapes, from highly skewed like an exponential
distribution to nearly bell-shaped like a normal distribution. The Weibull distribution was named
after its inventor, Waloddi Weibull of Sweden in 1939.
While the normal distribution is described by two parameters, μ and σ, the Weibull distribution
requires three parameters to define a particular Weibull. These three parameters are the scale
parameter α, the shape parameter β and the location parameter γ. The following is the density
function for the Weibull:
f(t) = (β/α) [(t - γ)/α]^(β-1) e^(-[(t - γ)/α]^β)
The parameter β is also known as the Weibull slope and is a positive number. The parameter α is
also a positive number. In addition, α is called the characteristic life since it represents the 63.2nd
percentile of the distribution.
The Weibull distribution is used in many statistical data analyses and is especially common in
reliability and life-data analysis. Weibull originally applied it to modeling the breaking strength
of materials.
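For readers who want to experiment, the following is a minimal Python sketch (not part of the original column) that evaluates the density formula above and confirms the characteristic-life property numerically. It assumes SciPy is available; the parameter values chosen (β = 1.5, α = 100, γ = 10) are arbitrary examples.

# A minimal sketch illustrating the three-parameter Weibull density and the
# "characteristic life" property described above. Parameter values are arbitrary examples.
import math
from scipy.stats import weibull_min  # c = shape (beta), scale = alpha, loc = gamma

beta, alpha, gamma = 1.5, 100.0, 10.0
dist = weibull_min(c=beta, scale=alpha, loc=gamma)

# Density evaluated directly from the formula in the text, valid for t > gamma:
def weibull_pdf(t):
    z = (t - gamma) / alpha
    return (beta / alpha) * z ** (beta - 1) * math.exp(-z ** beta)

t = 75.0
print(weibull_pdf(t), dist.pdf(t))   # the two values should agree

# The characteristic life (gamma + alpha) is the 63.2nd percentile:
print(dist.cdf(gamma + alpha))       # ~0.632 = 1 - exp(-1)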
Page 34 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q: How will lack of normality affect my Cpk statistics for a process? Will the
calculated statistics be higher or lower than they should be?
There is no one answer to this question. It depends on how different from normal the distribution
is, whether the distribution is skewed to one side, how close to a specification the distribution is,
and other considerations. Even moderate departures from normality that affect the tails of the
process distribution may severely impact the validity of the process capability calculations.
The construction and interpretation of process capability statistics are based on the process being
distributed as a normal distribution. For a normal distribution, approximately 99.73% of the
observations should fall within 3 standard deviations (s) above the mean and 3 standard deviations
below the mean.
The Cp statistic is designed to be equal to 1.0 when the process spread (± 3s) is the same as the
specification width. With a Cp equal to 1.0 for a normally distributed and centered process, we
would expect about 0.27% of the output (2700 parts per million) to be beyond the specification
limits. The Cp statistic assumes that the process is centered, which may not be true. Therefore the
Cpk statistic is typically reported.
The Cpk statistic should be equal to 1 when the ± 3s process spread coincides with one or both of
the specification limits. With a Cpk equal to 1.0 for a normally distributed process, we would
expect about 0.27% or less of the output to be beyond the specification limits.
When the distribution differs significantly from a normal distribution, the calculated process
capability indices will probably be incorrect. There may also be significant discrepancies between
the predicted (theoretical) proportion and actual proportion of the process output that are beyond
the specification limits. This is because the theoretical percentages are calculated based on the
tail probabilities for a normal distribution.
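As a rough illustration (not part of the original column), the following Python sketch compares the fraction of output beyond the mean ± 3s limits for a normal distribution and for a skewed distribution that would show the same nominal Cp of 1.0. The distributions and sample sizes are invented for illustration only.

# Why the "0.27% outside the specification limits" prediction depends on normality:
# a skewed distribution with the same mean and standard deviation puts a different
# fraction of its output beyond mean +/- 3s.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

normal = rng.normal(loc=10.0, scale=1.0, size=n)
skewed = rng.gamma(shape=2.0, scale=0.5, size=n)      # right-skewed example

for name, data in [("normal", normal), ("skewed (gamma)", skewed)]:
    mean, s = data.mean(), data.std(ddof=1)
    lsl, usl = mean - 3 * s, mean + 3 * s             # limits giving Cp = Cpk = 1
    outside = np.mean((data < lsl) | (data > usl))
    print(f"{name:15s} fraction outside +/-3s: {outside:.5f}")  # ~0.0027 only if normal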
Page 35 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. I have been using xbar control charts for several years and recently was told
that I am making a mistake by not having a range control chart or a standard
deviation chart with the xbar control chart. Is this true and if so why?
Yes, it is true you are making a mistake. Control charts are used to check for process stability. In
this context, a process is said to be in statistical control if the probability distribution representing
the quality characteristic is constant over time. If there is some change over time in this
distribution, the process is said to be out of control. When dealing with a quality characteristic
that is variable, it is standard practice to control both the mean value of the quality characteristic
and its variability. Control of the process average is usually done with an xbar control chart,
which shows how the average performance varies between samples over time. Process variability or
dispersion is controlled with either a control chart for the standard deviation or a control chart
for the range; these charts show how uniform or consistent the individual values within each sample
are. Therefore you need one control chart to monitor the average performance and another control
chart to monitor the within-sample variability.
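As a simple illustration (not part of the original column), the Python sketch below computes the limits for a companion pair of xbar and R charts. The A2, D3 and D4 values are the standard Shewhart chart factors for subgroups of size 5; the data itself is an invented example.

# Companion charts: an xbar chart for between-sample averages and an R chart
# for within-sample variability.
import numpy as np

A2, D3, D4 = 0.577, 0.0, 2.114          # standard factors for subgroups of size n = 5

rng = np.random.default_rng(1)
subgroups = rng.normal(loc=50.0, scale=2.0, size=(20, 5))   # 20 subgroups of 5

xbars = subgroups.mean(axis=1)                              # plotted on the xbar chart
ranges = subgroups.max(axis=1) - subgroups.min(axis=1)      # plotted on the R chart

xbarbar, rbar = xbars.mean(), ranges.mean()

print("xbar chart:", xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar)  # LCL, CL, UCL
print("R chart:   ", D3 * rbar, rbar, D4 * rbar)                         # LCL, CL, UCL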
Page 36 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. What is the best way to summarize the performance of a process – using control
charts, using histograms or calculating capability indices?
I do not think that any one of these methods is better than another for summarizing the
performance of a process. Control charts allow the user to determine if the process from which the
data was collected is in a state of statistical control. Histograms allow the user to determine if the
data follows a normal distribution or not. In addition, histograms can be used to determine what
percent of the data is in or out of specification. Capability indices allow the user to determine if
the process from which the data was collected is capable of meeting the customer’s requirements
or not.
All three of these methods have their strengths and weaknesses. Control charts cannot help the
user determine whether the underlying distribution is normal or whether the process is capable.
Histograms cannot help the user determine whether the process is in control. Capability indices
cannot help the user determine whether the underlying distribution is normal or whether the process
is in control.
In the long run, the best thing to do is to use all three methods. This will permit you to make a
more intelligent decision about the process.
Page 37 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. Recently, I have been hearing people talking about visual controls in a manufacturing
environment. My readings indicate that it is a tool of lean manufacturing. Can you tell me what
visual controls are?
A. Visual controls are indeed associated with the methodology of lean manufacturing, but they can
be used in many other applications, not just manufacturing. The intent of visual controls is that
the whole workplace is set up with signs, labels, color-coded markings, etc. such that anyone
unfamiliar with the process can, in a matter of minutes, know what is going on, understand the
process, and know what is being done correctly and what is out of place.
There are two types of applications in a visual factory: displays and controls.
- A visual display relates information and data to employees in the area.
For example, charts showing the monthly revenues of the company or a
graphic depicting a certain type of quality issue that group members
should be aware of.
- A visual control is intended to actually control or guide the action of the
group members. Examples of controls are readily apparent in society:
stop signs, handicap parking signs, no smoking signs, etc.
The most important benefit of visual controls is that they show when something is out of place,
missing or not working correctly.
Visual controls help keep things running as efficiently as they were designed to run. The efficient
design of the production process that results from lean manufacturing application carries with it a
set of assumptions. The process will be as successful as it was designed to be as long as the
assumptions hold true. A factory with extensive visual control applications will allow employees to
know immediately when one of the assumptions no longer holds true.
Visual controls can also help prevent mistakes. Color coding is a form of visual display often used to
prevent errors. Shaded "pie slices" on a dial gauge tell the viewer instantly when the needle is out
of the safe range. Matching color marks is another approach that can help people use the right tool
or assemble the right part.
Examples of visual controls include, but are not limited to, the following:
- color-coded pipes and wires
- painted floor areas for good stock, scrap, trash, etc.
- shadow boards for parts and tools
- indicator lights
- workgroup display boards with charts, metrics, procedures, etc.
- production status boards
- direction of flow indicators
Page 38 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. Several months ago, the Evansville-Owensboro newsletter mentioned in one of their articles
the Box-Cox transformation as a way to transform data in order to convert non normal data to
data from a normal distribution. The article was vague in describing the technique; what
exactly is the Box-Cox transformation?
A. Certain assumptions about the distributions of the populations are necessary for most statistical
procedures to be valid. One such assumption is that one or more populations are normally
distributed. When this assumption is violated then using the statistical procedure that requires
normality is not valid. It sometimes happens that applying an appropriate transformation to the
original data will more nearly satisfy the assumption of normality. Success in finding a good
transformation depends in part on the experience one has in the particular field the data came
from. One very useful transformation is the Box-Cox transformation. The Box-Cox transformation is a
family of power law transformations. The general formula is as follows:

y = (x^λ - 1)/λ

where:
x is the original data
y is the transformed data
λ is a number usually between -1 and 1
The task becomes selecting the appropriate value for λ so that the transformed data is normally
distributed (verified by using some type of test for normality, such as the Shapiro-Wilk test, the
Anderson-Darling test or others).
It should be noted that if λ = 0, the transformation is defined as the natural log of the original
data. If λ = 1.0, then no transformation is needed, and if λ = -1.0, the transformation is
essentially the reciprocal of the original data (apart from the constant shift and sign in the
formula above). If λ = 0.5, the transformation is essentially the square root of the original data.
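As an illustration (not part of the original column), the following Python sketch applies a Box-Cox transformation with SciPy, which uses the same (x^λ - 1)/λ family described above, and checks normality before and after with the Shapiro-Wilk test. The lognormal sample is simply an invented example of skewed, positive-valued data.

# Box-Cox transformation of skewed, strictly positive data, with a normality
# check before and after the transformation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=0.8, size=200)   # skewed, strictly positive data

print("p-value before:", stats.shapiro(x).pvalue)  # small p-value: not normal

y, lam = stats.boxcox(x)                           # lambda chosen by maximum likelihood
print("selected lambda:", lam)                     # near 0 here (log transform)
print("p-value after: ", stats.shapiro(y).pvalue)  # larger p-value after transforming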
Page 39 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. How do you calculate Cpk if the customer gives you a target value of 0 with no tolerances?
In addition, all the test data is positive (i.e., conductivity in a bath, the target is 0)?
A. The general formula for calculating Cpk is the following:

Cpk = min[ (USL - x̄)/(3s), (x̄ - LSL)/(3s) ]

where USL is the upper specification limit, LSL is the lower specification limit, x̄ is the mean of
the observed data and s is the standard deviation of the observed data.
The USL and LSL are given to the organization by the customer, and x̄ and s are statistics
calculated from the observed data. As you can see from the above formula, the target is not used to
calculate the Cpk value.
If the customer only provides a target (i.e., target is 0) then Cpk cannot be calculated. If the
customer provides USL (i.e., USL = 4) and a target value (i.e., target is 0) then the following
formula is used to calculate Cpk:

Cpk = (USL - x̄)/(3s)
For the above example, the organization wants the mean (x̄) of the process to be very close to
zero, with most of the observed results also close to zero. This implies that the statistical
distribution of the process will probably not be a normal distribution but a very skewed one. For
this example, some type of transformation would have to be found and used to convert the skewed
distribution to a normal distribution.
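As a rough illustration of the formulas above (not part of the original column), the following Python sketch computes Cpk from a data set with one or both specification limits supplied; the helper function name and the example numbers are invented for illustration.

# Cpk from the min[(USL - xbar)/3s, (xbar - LSL)/3s] formula, dropping any missing side.
import numpy as np

def cpk(data, usl=None, lsl=None):
    xbar, s = np.mean(data), np.std(data, ddof=1)
    sides = []
    if usl is not None:
        sides.append((usl - xbar) / (3 * s))
    if lsl is not None:
        sides.append((xbar - lsl) / (3 * s))
    return min(sides) if sides else None   # no specs at all (target alone) gives no Cpk

rng = np.random.default_rng(3)
data = rng.normal(loc=1.0, scale=0.25, size=100)   # invented positive-valued example
print(cpk(data, usl=4.0))                          # one-sided Cpk, as in the example above
print(cpk(data, usl=4.0, lsl=0.0))                 # two-sided Cpk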
Page 40 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. What is measurement system discrimination and when is it a concern?
Discrimination, with respect to a measuring system, is the ability of the measuring system to
detect small changes in the response being measured. A lack of discrimination exists when the
measured responses fall into only a few data categories. If the measuring system does not produce a
minimum number of data categories, the gage does not have enough discrimination and will not be
able to monitor and evaluate the process of interest. If this is the case, an alternative measuring
system with the discrimination required by the using organization needs to be found.
One way to evaluate the discrimination of a measuring system is to use a range control chart. If
the range chart shows only four or fewer distinct values, it can be concluded that the measuring
system has inadequate discrimination. For a measuring system to have adequate discrimination, the
range chart should show five or more distinct values.
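As a simple illustration of this rule of thumb (not part of the original column), the Python sketch below counts the distinct values that appear on a range chart; the coarsely rounded subgroup readings are an invented example of a gage with poor discrimination.

# Count the distinct values that would appear on the range chart.
import numpy as np

subgroups = np.array([
    [5.0, 5.0, 5.1], [5.1, 5.0, 5.0], [5.0, 5.1, 5.1],
    [5.1, 5.1, 5.0], [5.0, 5.0, 5.0], [5.1, 5.0, 5.1],
])  # readings rounded to 0.1, so the ranges can only take a few values

ranges = subgroups.max(axis=1) - subgroups.min(axis=1)
distinct = np.unique(np.round(ranges, 6)).size

print("distinct range values:", distinct)
print("adequate discrimination" if distinct >= 5 else "inadequate discrimination")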
Page 41 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. What is the difference between a dependent and independent variable?
With respect to regression analysis and design of experiments, variables can be classified into two
categories: (1) dependent and (2) independent. The dependent variable is the response variable,
that is, the variable that is used to assess the results of the process. Dependent variables
represent the measurable outcomes of the study, such as product yield, product strength, failure
rate, coating thickness, temperature, tensile strength, etc.
The independent variable is the variable which might influence the dependent variable.
It is important to understand the role each independent variable has on the dependent variable.
Independent variables are variables that can be changed (knowingly or unknowingly) over time. As
these independent variables change, the organization must understand how these changes affect the
process and product.
Page 42 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. How do I generate a list of variables for an experiment?
The typical method used is brainstorming. This brainstorming session should involve process
experts, process technicians and process operators. These are the people who have experience and
knowledge of the process under investigation. After a brainstormed list has been created, the
variables need to be prioritized by their likely importance to the response variable. You should
not try to run an experiment with more than 4 or 5 variables. Managing more than five variables is
usually very difficult for most organizations.
Page 43 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. Why, when calculating the standard deviation, do we divide by n-1 rather than n?
The reason that n-1 is used instead of n in the formula for calculating the sample standard
deviation is as follows: The sample variance (the square of the sample standard deviation) can be
thought of as a random variable (i.e., a function which takes on different values for different
samples from the same population). Its use is as an estimate for the true variance of the
population. In the real world, one typically does not know the true variance. We typically use the
sample variance to estimate the true variance. Since the sample variance is a random variable, it
usually has a mean or average value. One would hope that this average value is close to the actual
value that the sample variance is estimating. In fact, if we use n-1 in the calculation of the sample
variance, we do obtain an unbiased estimate of the true population variance. If we use n in the
calculation, we obtain a biased estimate of the true population variance. In general, when using n,
the expected value of the sample variance is (n-1)/n times the true variance. This can be
illustrated using EXCEL and its normal distribution function. Have EXCEL randomly draw 100 sets of
data in groups of 5 from a normal distribution with a known mean and variance. Calculate the mean,
the variance using n-1 in the formula and the variance using n in the formula for each subgroup of
5. Then calculate the mean of the 100 variance estimates for each method and see which method (n-1
or n) comes closer to the true variance you used.
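For readers who prefer to run the simulation in Python rather than EXCEL, here is a minimal sketch (not part of the original column) of the same exercise; the true variance of 4.0 is an arbitrary choice.

# Draw 100 subgroups of 5 from a normal distribution with known variance and
# compare the average of the n-1 and n variance estimates.
import numpy as np

true_var = 4.0                                         # known population variance
rng = np.random.default_rng(4)
subgroups = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(100, 5))

var_n_minus_1 = subgroups.var(axis=1, ddof=1)          # divide by n - 1
var_n = subgroups.var(axis=1, ddof=0)                  # divide by n

print("true variance:        ", true_var)
print("mean of n-1 estimates:", var_n_minus_1.mean())  # should average near 4.0
print("mean of n estimates:  ", var_n.mean())          # should average near (4/5)*4.0 = 3.2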
Page 44 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. What are reaction plans and why should I use them with my control charts?
If the control chart indicates that the process is out of control, then the user of the control chart
must take action to bring the process back to a state of statistical control as quickly as possible. In
order to take action, the user must be given some guidelines or directions. These guidelines or
directions generally appear on a reaction plan. These plans are developed by the engineering,
quality and manufacturing groups. The reaction plan is a written document that links out-of-control
conditions on the control chart to the specific actions that should be taken.
The reaction plan should indicate:
- Criteria to determine when action is needed.
- Possible actions to take.
- Information to be recorded.
- Responsibilities for various actions.
Possible actions to be taken could include the following:
- Taking a second set of samples.
- Checking the equipment for a malfunction.
- Adjusting the equipment.
- Calling the immediate supervisor.
- Checking the measuring equipment.
- Calling maintenance.
- Stopping the process.
The reaction plan should also discuss what is to be done with any items or parts found to be out of
specification.
Reaction plans help operators react quickly to out of specification or out of control conditions, help
minimize downtime and minimize scrap.
Page 45 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. What is the difference between reaction plans and corrective-preventive actions?
A reaction plan is a written document that links out-of-control conditions on the control chart, or
out-of-specification situations, to the specific actions that should be taken. The purpose of a
reaction plan is to provide quick fixes for unacceptable conditions to the operators of the
processes, in order to minimize scrap or off-spec material and to restore flow as quickly as
possible.
Corrective-preventive actions represent a process to identify the root causes of problems so that
existing nonconformities, defects or undesirable situations can be minimized or eliminated and
their recurrence prevented. The purpose of corrective-preventive actions is to find the actual root
cause and implement effective solutions following some type of problem-solving model.
In reality you need both methodologies: reaction plans for the day-to-day management of processes
and corrective-preventive actions for the long-term management of processes.
Page 46 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. What effect does changing the subgroup sample size have on calculating the Cpk value?
A. The answer is none. The general formula for calculating Cpk is the following:

Cpk = min[ (USL - x̄)/(3s), (x̄ - LSL)/(3s) ]

where USL is the upper specification limit, LSL is the lower specification limit, x̄ is the mean of
the observed data and s is the standard deviation of the observed data.
In this formula, s is an estimate of the process variability. The estimate of process variability
represents the variability between the individual values and not the variability between the sample
means.
Page 47 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. If my process is in statistical control, should I expect the Cpk value to be the same value
each and every time I sample the process?
No. Capability indices, like the Cpk value, are statistics, just like x̄ and s. Statistics are not
parameters but estimates of parameters obtained by taking samples from the population. Parameters
are fixed constants of the population. Statistics are not constants but vary from one sampling to
another. The difference between sample statistics and population parameters is a result of sampling
error. Since every item in the population is not likely to be included in the sample, sample
statistics are unlikely to equal the population parameters. If you
take 50 items from the population (the population has 1000 items) and calculate the mean of the
sample, you expect the sample mean to be close to the actual mean of the population. If you take
a second sample of 50 items, you do not expect this sample mean to equal the first sample mean, but
you do expect it to be very close. The same is true for the Cpk value, which is based on a sample
from the population with sample statistics x̄ and s. You expect repeated estimates of Cpk, based on
multiple samplings of the population, to be similar but not exactly the same.
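As a simple illustration (not part of the original column), the Python sketch below draws repeated samples from the same stable process and computes Cpk for each; the process parameters and specification limits are invented for illustration.

# Cpk, like any statistic, varies from sample to sample even when the process is unchanged.
import numpy as np

rng = np.random.default_rng(5)
usl, lsl = 13.0, 7.0                                   # invented specification limits

def cpk(sample):
    xbar, s = sample.mean(), sample.std(ddof=1)
    return min((usl - xbar) / (3 * s), (xbar - lsl) / (3 * s))

# 20 repeated samplings of 50 items from the same stable process (mean 10, sigma 1)
estimates = [cpk(rng.normal(10.0, 1.0, size=50)) for _ in range(20)]

print("min Cpk:", min(estimates), "max Cpk:", max(estimates))  # similar, not identical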
Page 48 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. Are there maximum values for Cp, Cpk, Pp and Ppk?
No. As long as the specification range does not change and an organization continually reduces the
process variation, the capability indices will increase. Typically, most processes seem to have
capability indices that range between 0.8 and 5.0. I have seen a Cpk value as high as 54 – not sure
why the organization was thrilled with it. Over the years, I have been known to say to senior
management that if I was in charge and you spent money to drive your capability indices above 6, I
would fire you. Reducing process variation is the name of the game, but you must also reap the
benefits of doing so. A better strategy would be to have the customer tighten the specification
range; your competitors may have trouble achieving the new expectation, and therefore you may get
more orders, which is a good thing. Right?
Page 49 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. Why would I have a Cp and Cpk indices well over 1 when some of the observations in the
data set are outside the customer specifications limits?
Without seeing your control charts for this data, I will have to guess that your control chart for
location (individuals or xbar control chart) probably has some points out of control, even though
your range or moving range control chart has all the points within the control limits. Before you
calculate capability indices, you must verify that all the basic assumptions have been satisfied. The
two most important assumptions to verify are: (1) the process being evaluated should be
predictable, that is, the process is in a state of statistical control and (2) the observed data points
follow a normal distribution. Process capability software packages are nice but most of them
assume that the assumptions are being met – they leave the verification to the user of the package.
Page 50 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. How do I calculate capability indices with only an upper specification limit?
Very carefully. The formulas for Cp, Cpk, Pp and Ppk require a value for both the lower and upper
specification. When faced with a missing specification value, one could consider the following:
1. Not calculating the capability indices.
2. Entering an arbitrary value for the missing specification.
3. Not calculating Cp or Pp, and only calculating Cpk or Ppk for the specification value given.
Let’s assume you are making a powdered material that has a moisture requirement that states no
more than 0.5 is allowed. If the product has too much moisture, it will cause manufacturing
problems for the customer. Let’s assume that the process is in statistical control and the data
comes from a normal distribution. Let’s also assume that for the last 100 lots produced the process
average has been 0.0025 with a process standard deviation of 0.15.
If you select Option 1, the customer will probably not be happy that you are not calculating the
capability indices.
If you select Option 2, you will probably argue that the lower specification limit is zero since it is
impossible to have a moisture value below zero. The calculated capability indices are Cp=0.55 and
Cpk=0.006. Your customer will not be satisfied with these values since they are below 1.0.
If you select Option 3, Cpk =1.10 and there is no value for Cp.
With Option 2, as you improve the process, that is, as you continually reduce moisture, the
calculated Cpk will continue to get lower. When you improve a process, the Cpk value should
increase, not decrease. Therefore, when you only have one specification, you should enter only that
specification and treat the other specification as missing.
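As a quick check (not part of the original column), the following Python sketch reproduces the index values quoted above directly from the stated summary statistics.

# Reproduce the moisture-example indices from the stated summary statistics.
usl, mean, s = 0.5, 0.0025, 0.15

# Option 2: enter an arbitrary lower specification of zero.
lsl = 0.0
cp = (usl - lsl) / (6 * s)                                         # ~0.55
cpk_option2 = min((usl - mean) / (3 * s), (mean - lsl) / (3 * s))  # ~0.006
print(cp, cpk_option2)

# Option 3: treat the lower specification as missing.
cpk_option3 = (usl - mean) / (3 * s)                               # ~1.10
print(cpk_option3)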
Page 51 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. Should I calculate my process capability indices if my process is not in a state of statistical
control?
You cannot properly evaluate the capability of a process without establishing process control. It is
certainly possible to calculate capability indices when a process is not in control, but you might ask
what value these indices provide. The AIAG Statistical Process Control reference manual states:
“The process must first be brought into statistical control by detecting and acting upon special
causes of variation. Then its performance is predictable and its capability to meet customer
expectations can be assessed. This is the basis for continual improvement.”
It is hard to say that you should not calculate capability indices if the process is not in control
because your customer may require you to calculate these indices. It is easier to say that the less
predictable your process is, that is, the more out of control it is, the less meaningful are the
capability values. If a process is not in control, then your estimate of the process mean or your
estimate of the process variation or both may not be a good representation of the real process
performance.
Page 52 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. What are control plans?
Control plans are written descriptions of the system used for controlling products and processes.
These plans are very similar to the “quality plans” Dr. Juran describes in his books. They address
the important characteristics and engineering requirements of the product and of the process used
to make the product. They discuss how the manufacturing process is controlled, how incoming
materials are controlled, how operators are trained, how the finished product is controlled, how
the measuring devices are controlled, and what corrective actions or reactions need to be taken,
and by whom, when the process is not meeting its performance expectations. The format of these
forms can take on many styles. The automotive industry has its version of what the form should look
like, and so do many other industries. Items usually documented on the forms include: sample sizes,
frequency of sampling and testing, type of charts to be used, identification of the type of
measuring device to be used, specification or process limits, capability indices, work instructions
to be followed, etc.
Page 53 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. Do members really call you or write to you or do you make the questions up?
This is a question that has been asked of me several times at dinner meetings. The answer is yes.
Fortunately for me our members (and in many cases non members) do call or email me with
questions. If I had to make the questions up, I would not do this column. I may modify the
questions for the column, but I do not make them up. For your information, nearly 60% of the
questions come to me by telephone and the other 40% of questions come from emails. About 5% of
the time, the person asking the question has requested that I not print it. For your information, I
have done this column since 1995 and the questions keep coming in.
Page 54 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. What are the differences between the concepts of quality control, quality assurance, quality
management, and quality planning?
Quality management is all the activities of the overall management function that determine the
quality policy, objectives, and responsibilities and implement them by means such as quality
planning, quality control, quality assurance and quality improvement within the quality system.
Quality planning comprises the activities that establish the objectives and requirements for
quality and for the application of quality system elements. Quality planning covers product
planning, managerial and operational planning, and the preparation of quality plans.
Quality control comprises the operational techniques and activities that are used to fulfill
requirements for quality. It involves techniques that monitor a process and eliminate causes of
unsatisfactory performance at all stages of the quality loop.
Quality assurance is the planned and systematic activities implemented within the quality system
and demonstrated as needed to provide adequate confidence that the organization is fulfilling the
quality requirements.
A quality system is the organizational structure, procedures, processes and resources needed to
implement quality management.
Page 55 of 56
Ask Mike
Excerpts from Mike Mazu’s Columns for ASQ Section 0915 Newsletter
Q. I have heard a lot about ASQ’s new Living Community Model. What is it?
At the February 2004 meeting, the ASQ Board of Directors approved Phase 1 implementation of the
ASQ Living Community Model (LCM). This new membership model maintains the traditional values of
ASQ membership, yet builds on them by offering a variety of new and enhanced member types and
benefits suited to all interested in the practice and/or profession of quality. The Living Community
Model approach advances ASQ’s Vision as the “community of choice for everyone who seeks quality
technology, concepts or tools to improve themselves and their world.”
The new membership plan enhances key strategic initiatives sought by current members and
potential members. These include:
- proving the economic case for quality;
- enhancing the image of the quality professional and ASQ;
- enhanced activity on national issues, including a Washington, D.C., presence;
- growing new and diverse communities of practice; and
- providing more personalized member relationship management.
“Historically, ASQ has taken a ‘one-size-fits-all’ approach to membership, a best practice in the
association world for years. The Living Community Model provides value to individuals from all
backgrounds and occupations who profess an interest in quality, offering them flexible choices of
involvement and affiliation with the organization and the quality movement.”
The model is designed to appeal to current and prospective members with new and more diverse
benefits and choices, multiple points of access, varying dues structures, and networking community
options.
After months of research and design, the Living Community Model’s membership categories were
proposed and approved in November 2003 by the board. They are: Regular, Associate, Forum,
Student, Organization, Corporate and Sponsor.
The Living Community Model Phase 1 implementation primarily addresses four individual
membership types—Regular Member, Associate Member, Forum Member, and Student Member.
Benefits and dues for these categories would go into effect for new and renewing members
beginning July 1, 2004.
Dues associated with all levels of the membership will help fund new activities, such as delivering
tools and materials to substantiate the impact of quality management on business improvement (also
known as the “Economic Case for Quality”), and a sustained, national and global effort to enhance
the image, value and voice of quality professionals, practitioners, and ASQ. Image enhancement is
expected to include several promotional activities as well as a media campaign.
Page 56 of 56