Module III Inference
Lecture 1
The Logic of Statistical Decision Making
Assume that a manufacturer of computer devices has a process which coats a
computer part with a material that is supposed to be 100 microns (one micron =
1/1000 of a millimeter) thick. If the coating is too thin, then proper insulation of the
computer device will not occur and it will not function reliably. Similarly if the
coating is too thick, the device will not fit properly with other computer
components.
The manufacturer has calibrated the machine that applies the coating so that
it has an average coating depth of 100 microns with a standard deviation of 10
microns. When calibrated this way the process is said to be "in control".
Any physical process, however, will have a tendency to drift. Mechanical
parts wear out, sprayers clog, etc. Accordingly the process must be monitored to
make sure that it is "in control". How can statistics be applied to this problem?
To understand the way statistics might be applied to this problem, let us shift
to a different non-statistical problem.
Consider a person on trial for a "criminal" offense in the United States.
Under the US system a jury (or sometimes just the judge) must decide if the person
is innocent or guilty while in fact the person may be innocent or guilty. This results
in a table as shown below:
                           Person is
                      Innocent      Guilty
Jury Says  Innocent   No Error      Error
           Guilty     Error         No Error
Are both of these errors equally important? That is, is it as bad to decide
that a guilty person is innocent and let them go free as it is to decide an innocent
person is guilty and punish them for the crime?
Is a jury supposed to be totally objective, not assuming that the person is either innocent or guilty, and to make their decision based on the weight of the evidence one way or the other?
Actually in a criminal trial, there is a favored assumption. An initial bias if
you will. The jury is instructed to assume the person is innocent, and only decide
that the person is guilty if the evidence convinces them of such.
When there is a "favored assumption", as the presumed innocence of the
person is in this case, and the assumption is true, but the jury decides it is false and
decides the person is guilty, this is called a Type I error.
Conversely, if the "favored assumption" is false, i.e. the person is really
guilty, but the jury decides it is true, that is that the person is innocent, then this is
called a Type II error.
This is shown in the table below:
Favored Assumption: Person is Innocent

                           Person is
                      Innocent        Guilty
Jury Says  Innocent   No Error        Type II Error
           Guilty     Type I Error    No Error
In some countries the "favored assumption" is that the person is guilty. In
this case the roles of the Type I and Type II errors would reverse to give:
Favored Assumption: Person is Guilty

                           Person is
                      Innocent        Guilty
Jury Says  Innocent   No Error        Type I Error
           Guilty     Type II Error   No Error
Further in order to decide the person is guilty, the jury must find that the
evidence convinces them beyond a reasonable doubt.
In other words, if the balance of evidence toward guilt is just a bit more than
the balance toward innocence, the jury should decide the person is innocent.
Let us assume that the favored assumption is that the person is innocent, and let
α = P(Type I Error) = P(Jury Decides Guilty | Person is Innocent).
Clearly we would like to keep α small.
The alternative error is:
β = P(Type II Error) = P(Jury Decides Innocent | Person is Guilty).
Clearly we would also like to keep β small.
In fact one can make α = 0 by simply not convicting anyone. However, every guilty person would then be released, so β would then equal 1.
Alternatively, one can make β = 0 by convicting everyone. However, every innocent person would then be convicted, so α would be equal to 1.
Although the relationship between α and β is not simple, it is clear that they work in opposite directions.
One can think of α as related to the concept of "reasonable doubt". If α is small, say .001, then more evidence will be required to convict. On the other hand, if α is large, say .1, then less evidence will be required. If mathematics could be precisely applied to the court-room (which it can't), then by choosing an acceptable level of α, we could determine how convincing the evidence would have to be to be "beyond a reasonable doubt".
Suppose we pick an α, say .05, and then the jury hears the evidence in the
case. The defense attorney will ask the jury to decide innocence if they can think of
any way the person on trial could be considered innocent. The prosecuting attorney
will ask the jury to use their common sense and decide if something is beyond a
reasonable doubt.
The jury then makes their determination. It should be emphasized that just
because the jury decides the person is innocent, it does not mean that the person is
really innocent. Similarly, if the jury decides the person is guilty it does not mean
that the person really is guilty.
What does all this have to do with the manufacturing problem?
In the logic of statistical inference, one also has a favored assumption. This
favored assumption is called the Null Hypothesis. In the case of the jury trial the
favored assumption was that the person is innocent. In the manufacturing problem
the natural favored assumption is that the process is in control.
The alternative to the Null Hypothesis is called the Alternative Hypothesis. In
the case of the jury trial it was that the person is guilty. In the manufacturing
problem the alternative is that the process is out of control.
Our decision table would now look like the following in the manufacturing
problem:
Null Hypothesis: Process is In Control
Alternative Hypothesis: Process is Out of Control
                               Process
                        In Control      Out of Control
We Say  In Control      No Error        Type II Error
        Out of Control  Type I Error    No Error
In fact, the Null Hypothesis is still not sufficiently precise for we need to
define what it means to say that the process is in or out of control.
We shall say that the process is in control if the mean is 100 and out of
control if the mean is not equal to 100. One usually abbreviates this with a
shorthand notation. We let H0 stand for the Null Hypothesis and HA stand for the
alternative hypothesis. We would then write:
H0: μ = 100
HA: μ ≠ 100
as the formal specification of the decision. Our table would now look like:
Null Hypothesis: Process Mean equals 100
Alternative Hypothesis: Process Mean not 100
                                 Mean
                         Equals 100      Not Equal to 100
We Say  Mean is 100      No Error        Type II Error
        Mean is not 100  Type I Error    No Error
The next step is to define reasonable doubt or, equivalently, α. This is an
individual choice but the two most commonly used values are .05 (or about a one in
twenty chance) and .01 (or about a one in one hundred chance).
As in the trial we must now collect evidence. In a statistical situation, the
evidence consists of the data. We will have the data x1, x2, . . . , xn. In order to use
this data as evidence for or against the null hypothesis we need to know how to
summarize the information. Since our hypothesis is a statement about the
population mean, it is natural to use the sample mean as our evidence.
Let us assume that the thickness of the computer coating follows a normal
distribution with a mean of 100 and a standard deviation of 10. Suppose we took a
random sample of 4 chips and computed the sample mean.
Statistical theory would say that:
E(x̄) = 100
SD(x̄) = 10/√4 = 5
Statistical theory would also indicate that the sample mean is normally
distributed. This distribution is called the sampling distribution of the sample
mean.
To illustrate this, I simulated 200 values of the normal distribution with a
mean of 100 and a standard deviation of 10 using the command:
=norminv(rand(), 100, 10).
A histogram of the data is shown below:
[Histogram: Raw Normal Data; x-axis: Thickness (70 to 130 and above), y-axis: Frequency]
I then rearranged the values into 50 rows of four columns, and averaged the
values in each row which is equivalent to simulating a random sample of 50
averages of 4 normal observations. The results are shown below:
[Histogram: Means of size 4; x-axis: Average Thickness (85 to 115 and above), y-axis: Frequency]
As can be seen, the data is still bell shaped, but the standard deviation is much
smaller, in fact it is now 5.
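If you would rather reproduce this simulation outside EXCEL, here is a minimal Python sketch (using numpy, which is my assumption; the lecture itself uses the =norminv(rand(), 100, 10) formula above). It draws 200 values from a normal distribution with mean 100 and standard deviation 10, groups them into 50 rows of four, and averages each row; your numbers will differ slightly since the draws are random.

import numpy as np

rng = np.random.default_rng(0)                   # fixed seed so the sketch is reproducible
raw = rng.normal(loc=100, scale=10, size=200)    # 200 simulated coating thicknesses
means = raw.reshape(50, 4).mean(axis=1)          # 50 sample means, each based on n = 4

print(raw.std(ddof=1))     # close to 10, the population standard deviation
print(means.mean())        # close to 100, since E(xbar) = 100
print(means.std(ddof=1))   # close to 5, since SD(xbar) = 10 / sqrt(4)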
Notice that almost no values of the mean are below 90 or above 110. This
follows from the fact that the sampling distribution of the sample mean based on
samples of size 4 is normal with a mean of 100 and a standard deviation of the
sampling distribution of 5. Therefore we know that approximately 95% of the
observations will fall within the limits of +/- 1.96 standard deviations of the mean.
In this case the values are:
100 – 1.96*5 = 90.2
and
100 + 1.96*5 = 109.8.
We now have a procedure for determining if the observed data is beyond a
reasonable doubt. The procedure goes like this:
a) Take a random sample of four chips, compute the thickness for each and
average the results;
b) If the sample mean falls between 90.2 and 109.8, then this is what would
happen approximately 95 % of the time if the manufacturing process was in control,
therefore decide it is in control;
c) If the sample mean falls outside the range of 90.2 and 109.8 then either one
of two things has happened: I) the process is in control and it just fell outside by
chance (which will happen about 5% of the time); or II) the process is out of control.
Notice that under b) just because you decide the process is in control does not
mean that it in fact is in control.
Similarly under c) if you decide the process is out of control then it might just
be a random mean outside the interval which happens about 5% of the time.
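As a sketch of how this decision rule could be automated (Python; the function name and the example sample are illustrative, not from the lecture's files):

import math

def in_control(sample, mu0=100.0, sigma=10.0, z=1.96):
    """Decide 'in control' if the sample mean falls inside mu0 +/- z * sigma / sqrt(n)."""
    n = len(sample)
    xbar = sum(sample) / n
    half_width = z * sigma / math.sqrt(n)
    return (mu0 - half_width) <= xbar <= (mu0 + half_width)

# A hypothetical sample of four coating thicknesses:
print(in_control([103.1, 96.4, 108.2, 99.7]))    # True, so we decide the process is in control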
The procedure described above is an example of Quality Control. In this area, one monitors processes as we are doing here, trying to use sample data to decide if the process is performing as it should be. (The case we have been discussing is probably the simplest situation. If you are heavily involved in Quality Control in your job, you would need a more specialized course in statistics than the one you are now taking.)
If you load the EXCEL file qc.xls located in the Module III datasets, you will
see a chart like the one below:
[Control chart: Simulated Averages of 4 from Normal Mean = 100 SD = 10; x-axis: Trial (1 to 50), y-axis: mean (85 to 115)]
Notice that most of the averages are within the +/- 1.96 limits, with a few
points outside. If you press F9 when you are in EXCEL, you will get another set of
simulated values. Look at several of them. Remember, this process is in control so
all the values that fall outside the limits are false indicators of problems, that is, they
are Type I errors!
Here is another simulated sequence, notice that in this case several values
have fallen far outside the limits, and yet the process is in control.
[Control chart: Simulated Averages of 4 from Normal Mean = 100 SD = 10; x-axis: Trial (1 to 50), y-axis: mean (85 to 115)]
Often people expect the results to look like this sequence of simulated values:
[Control chart: Simulated Averages of 4 from Normal Mean = 100 SD = 10; x-axis: Trial (1 to 50), y-axis: mean (85 to 115)]
Here all the values stayed within the quality control limits.
I hope the above charts have indicated to you the great deal of variability
that can occur even when a process is in control. A statement that is often heard is
"Statistics has proved such and such". And of course there is the counter argument
that "Statistics can prove anything". In fact, now that you see the structure of
statistical logic, it should be clear that one doesn't "prove" anything. Rather we are
looking at consistency or inconsistency of the data with the hypothesis. If data is
"consistent" with the hypothesis (in our example the sample mean falls between 90.2
and 109.8), then instead of proving the hypothesis true, a better interpretation is
that there is no reason to doubt that the null hypothesis is true.
Similarly, if the data is "inconsistent" with the hypothesis (in our example
the sample mean falls outside the region 90.2 to 109.8), then either a rare event has
occurred and the null hypothesis is true (how rare?? less than .05 or .01), or in fact
the null hypothesis is not true.
The above quality control procedure was based on the fact that the sample
mean of four observations taken from a normal distribution, also followed a normal
distribution. However, we know that most business data does not follow a normal
distribution and in fact is usually right skewed!!! What do we do if we cannot
assume normality?
The answer is based on one of the most fundamental results in statistical
theory, the Central Limit Theorem. To understand the idea of the central limit
theorem, I will use simulation results.
First I simulate 500 pseudo random numbers using the =rand() command
from EXCEL. Theoretically, this should give me a frequency distribution that is
flat since each value should be equally likely. In the EXCEL file clt.xls, I did this
with the resulting histogram of the 500 simulated values looking like:
[Histogram: Distribution of Uniform Data; x-axis: x (0 to 1), y-axis: Frequency]
As you can see the sample values came out relatively uniform.
Now I arrange the 500 random numbers into 50 rows, each containing 10
numbers and I averaged the ten numbers. This corresponds to taking 50 random
samples from the uniform distribution, each of size 10, and computing the mean. I
now put the 50 means into a frequency distribution. This is called the sampling
distribution of the sample mean based on a sample size of 10. The resulting
frequency distribution is shown below:
[Histogram: Sampling Distribution of Means of 10 from Uniform; x-axis: xbar (0 to 1), y-axis: Frequency]
Notice that the distribution of the mean values is no longer flat! In fact the
results are beginning to look bell shaped!
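A short Python sketch of the same experiment (numpy assumed; the lecture itself uses =rand() in EXCEL) for readers who want to try it:

import numpy as np

rng = np.random.default_rng(1)
u = rng.uniform(0, 1, size=500)             # 500 "flat" pseudo random values
means = u.reshape(50, 10).mean(axis=1)      # 50 sample means, each of size n = 10

print(u.std(ddof=1))        # near the uniform's standard deviation, 1/sqrt(12), about 0.289
print(means.mean())         # near 0.5, the uniform's mean
print(means.std(ddof=1))    # near 0.289/sqrt(10), about 0.09: the bell is much narrower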
I repeated the above procedure, except this time, using some theoretical
results, I simulated 500 values from a distribution which was highly right skewed.
The histogram of the simulated data is shown below:
[Histogram: Right Triangular Distribution; x-axis: x (0 to 1), y-axis: Frequency]
I then rearranged the data exactly as before into 50 rows of ten numbers and
computed the average of the ten numbers in each row. The histogram of the 50
sample means is shown below:
[Histogram: Sampling Dist Mean of Ten from Right Triangular; x-axis: xbar (0 to 1), y-axis: frequency]
As can be seen the right skew has almost disappeared and the histogram
again looks bell shaped!
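The lecture does not say exactly which right-skewed distribution was used, so the sketch below is only one possibility: it assumes the "right triangular" density f(x) = 2(1 - x) on [0, 1] and samples it by the inverse-CDF method (Python, numpy assumed).

import numpy as np

rng = np.random.default_rng(2)
u = rng.uniform(0, 1, size=500)
x = 1 - np.sqrt(1 - u)                    # inverse-CDF draw from f(x) = 2(1 - x) on [0, 1]
means = x.reshape(50, 10).mean(axis=1)    # 50 sample means of size 10

print(x.mean())                           # near the population mean 1/3
print(means.mean(), means.std(ddof=1))    # means also near 1/3, but with a much smaller spread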
Finally, I generated 500 values randomly from a highly left skewed
distribution with the resulting histogram given below:
[Histogram: Left Triangular Distribution; x-axis: x (0 to 1), y-axis: Frequency]
Again, I rearranged the 500 values into 50 rows of ten numbers and
computed the average for each of the rows. The resulting histogram looked like:
[Histogram: Sampling Distribution Mean of 10 from Left Triangular; x-axis: xbar (0 to 1), y-axis: Frequency]
The left skew is much diminished and the resulting curve again is looking bell shaped!!
The above examples are all special cases of the central limit theorem. This
remarkable result holds under very general conditions and allows us to make
inferences about means with only one sample of data.
To motivate the Central Limit Theorem (CLT), assume we have a population
of arbitrary shape (right skewed, left skewed, flat, bimodal, etc.) with mean μ and standard deviation σ. Take 1000 random samples each of size n from the
population and for each compute the sample mean, the sample median and the
sample variance. You would wind up with a table that looked something like:
Sample    Sample Values                              Sample Statistics
1         x_11, x_12, ..., x_1n                      x̄_1, med_1, s_1²
2         x_21, x_22, ..., x_2n                      x̄_2, med_2, s_2²
.         .                                          .
.         .                                          .
1000      x_1000,1, x_1000,2, ..., x_1000,n          x̄_1000, med_1000, s_1000²
Now if we took any of the samples of size n, say the 2nd, and put the values
into a frequency distribution, the resulting frequency distribution would tend to
look like the population from which we took the random sample. This would be
true for any of the 1000 random samples.
Instead of doing that, take the 1000 mean values, each based on a random
sample of size n, and put these into a frequency distribution. This is called the
Sampling Distribution of the sample mean based on samples of size n.
The Central Limit Theorem states three things that hold regardless of the shape of the population distribution:
1) The sampling distribution of the sample mean tends to the normal
distribution;
2) the mean of this normal distribution is given by:
E(x̄) = μ
3) the standard deviation of the sampling distribution of the sample mean
(called simply the standard error of the sample mean) is given by:
SD(x̄) = σ_x̄ = σ/√n
The key feature here is that this tends to occur irrespective of the shape of
the distribution in the population from which the sample was taken.
Note that the Central Limit Theorem says that the distribution tends to the
normal distribution. This tendency is essentially governed by the size of n, the
sample size. If the population distribution is close to normal, then samples of size as
small as 5 will result in sampling distributions for the mean which are close to
normal. If the population distributions are highly skewed, then large sample sizes
will be necessary.
It is the case, however, that most samples of 30 or more, even from the most
skewed populations, will have sampling distributions that are close to the normal
curve.
The Central Limit Theorem also applies to proportions. Suppose we take a
random sample of n people and count the number who are female (say x). Define:
p̂ = x/n
which is the proportion of females in the sample. Now define a random variable Y_i which takes on the value 1 if the i-th person is female and 0 if not. Then it is easy to show that:
(Σ_i Y_i)/n = x/n = p̂
In other words, p̂ , the sample proportion of females, is an average. Thus by the
Central Limit Theorem, the sampling distribution of p̂ will be approximately
normal with mean
E(p̂) = p
and standard error given by:
SD(p̂) = √( p(1 - p)/n )
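A quick simulation check of these two formulas, as a Python sketch (numpy assumed; the true proportion p = 0.4 and sample size n = 100 are made-up illustration values):

import numpy as np

rng = np.random.default_rng(3)
p, n = 0.4, 100
phat = rng.binomial(1, p, size=(10000, n)).mean(axis=1)   # 10,000 simulated sample proportions

print(phat.mean())               # close to p = 0.4, illustrating E(phat) = p
print(phat.std(ddof=1))          # close to the standard error below
print(np.sqrt(p * (1 - p) / n))  # sqrt(p(1 - p)/n), about 0.049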
It turns out that regression coefficients (remember our discussion of
multiple regression in module I) also can be represented as averages so the Central
Limit Theorem also applies to them.
The major advantage of the Central Limit Theorem is that, since we know the sampling distribution of the mean will be normal, we do not have to take thousands of samples. Instead we can just take one sample and invoke the theoretical result.
The idea of knowing the sampling distribution of a statistic is so critical,
that theoretical statisticians devote a great deal of research to this topic. It turns out
that other statistics have sampling distributions which approach the normal
distribution, even though they are not averages.
For example, if I took the sample medians from the 1000 samples previously
discussed, and put them into a frequency distribution, I would get the sampling
distribution of the sample median of a sample of size n.
Surprisingly, this also approaches the normal distribution. However, instead of being centered around the population mean μ, the sampling distribution of the sample median is centered around the population median, say μ_median, and it has standard error (the standard deviation of the sampling distribution) approximately equal to:
SD(μ̂_median) ≈ (1.25) σ/√n
Thus for the same sample size the sample mean is a more precise estimate of
the population mean than the sample median is of the population median.
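Here is a simulation sketch of that comparison for a normal population (Python, numpy assumed; the values mu = 100, sigma = 10, n = 25 are illustrative):

import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n = 100, 10, 25
samples = rng.normal(mu, sigma, size=(10000, n))   # 10,000 samples of size 25

print(samples.mean(axis=1).std(ddof=1))        # SE of the mean, near sigma/sqrt(n) = 2.0
print(np.median(samples, axis=1).std(ddof=1))  # SE of the median, near 1.25 * 2.0 = 2.5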
At this point you might expect that all sampling distributions are
approximately normal. This is not the case. For example if you took the 1000
values of the sample variances (s_i²), and put them into a frequency distribution, you would not get a bell shaped curve. Instead you would get a right skewed curve called a χ² distribution.
However, the Central Limit Theorem applies to the most useful measures
used in business situations such as means, proportions, and regression coefficients.
We have now established one way of testing statistical hypotheses of the
form:
H0: μ = μ0
HA: μ ≠ μ0
based on a sample of size n. We shall call this the Quality Control method. In summary we set up the acceptance region:
μ0 - zα/2 σ/√n ≤ x̄ ≤ μ0 + zα/2 σ/√n
If our sample mean falls in this interval, we "accept" the null hypothesis, otherwise we reject. Of course we must specify our α level. If we use α = .05, then we take zα/2 = 1.96. If we use α = .01, then we take zα/2 = 2.576.
In our case with a mean of 100 and a standard deviation of 10, based on a sample of size 4 and with α = .05, our acceptance region is 90.2 to 109.8.
Consider the three cases:
a) x̄ = 110, in which case we would reject the null hypothesis;
b) x̄ = 105, in which case we accept the null hypothesis;
c) x̄ = 91.5, in which case we accept the null hypothesis.
A second method is commonly found in research articles. It is simply a rearrangement of the basic quality control results. The following steps show the rearrangement:
P( μ0 - zα/2 σ/√n ≤ x̄ ≤ μ0 + zα/2 σ/√n ) = 1 - α
if and only if
P( -zα/2 σ/√n ≤ x̄ - μ0 ≤ zα/2 σ/√n ) = 1 - α
if and only if
P( -zα/2 ≤ (x̄ - μ0)/(σ/√n) ≤ zα/2 ) = 1 - α
if and only if
P( -zα/2 ≤ zobs ≤ zα/2 ) = 1 - α
where
zobs = √n (x̄ - μ0)/σ.
This is called the z-score method and it is very easy to apply.
The steps are:
1) Determine α and look up zα/2 (in our case α = .05 so z.025 = 1.96);
2) Compute zobs = √n (x̄ - μ0)/σ;
3) If -zα/2 ≤ zobs ≤ zα/2 then accept H0;
4) Otherwise reject H0.
For α = .05, we use the cutoffs +/- 1.96.
So in Case a) zobs = √4 (110 - 100)/10 = 2.00, and we reject the hypothesis that μ = 100 since it is outside the range +/- 1.96.
In Case b) zobs = √4 (105 - 100)/10 = 1.00, so we accept the hypothesis that μ = 100 since it is inside the range +/- 1.96.
In Case c) zobs = √4 (91.5 - 100)/10 = -1.70, so we accept the hypothesis that μ = 100.
Notice that our conclusions are exactly as before since they are based on the same relationship.
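The same three calculations as a Python sketch (the function name is illustrative, not part of the lecture's materials):

import math

def z_test(xbar, mu0=100.0, sigma=10.0, n=4, z_crit=1.96):
    z_obs = math.sqrt(n) * (xbar - mu0) / sigma
    decision = "accept H0" if -z_crit <= z_obs <= z_crit else "reject H0"
    return round(z_obs, 2), decision

for xbar in (110, 105, 91.5):
    print(xbar, z_test(xbar))    # 2.0 reject, 1.0 accept, -1.7 accept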
The third method for testing hypotheses is used in most computer programs
(in fact we used it when we did step-wise regression in Module I). One of the steps
in testing a hypothesis is to pick the Type I error rate α. The computer program has no idea what value you will pick, so it cannot, for example, perform the z-score test discussed above. All it could do is compute the zobs statistic, and the manager would still have to look up the appropriate zα/2. One way around this problem is to compute the probability of the observed statistic or anything more extreme. This is called the p-value. Since we have defined rare as anything that has a lower probability than α of occurring, the p-value procedure is very simple:
Step 1: Compute the probability of the observed result or anything more extreme occurring; call it the p-value.
Step 2: If p-value < α, reject the hypothesis;
Step 3: Otherwise (i.e. if p-value ≥ α), accept the hypothesis.
Let us apply this to our three sample cases.
Case a)
Step 1 -- In this case the sample mean value is 110. What is more extreme?
Certainly any value greater than 110 makes us even more skeptical that the mean of
the process is at 100. But this is only the upper tail of the distribution. Remember
our rejection region consisted of values that were too large and also values that were
too small. If we only report the probability of exceeding 110, it can never exceed one
half of α. To get around this problem, when we are testing hypotheses where the alternative is non-equality, we simply double the computed value. This is called a two-sided p-value.
Step 2 -- In many EXCEL functions, the p-value is computed for you.
Unfortunately, EXCEL does not have a direct procedure for the specific example we
are studying. The two sided p-value for this particular case can be computed using
the EXCEL formula:
p-value = 2 * (1 - NORMSDIST( SQRT(n) * ABS(x̄ - μ0) / σ ))
This gives a two-sided p-value of .0455, which is less than .05, so we reject the hypothesis.
Case b)
In this case the mean is 105. Using the formula given above yields the two-sided p-value = .3173 which is greater than .05. Therefore we accept the null hypothesis.
Case c)
In this case the mean is 91.5. Using the formula given above, the two-sided p-value = .0891 which is greater than .05. Therefore we accept the null hypothesis.
Notice that once again the conclusions are exactly the same. This is because
we are just doing the same thing in different ways.
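Outside EXCEL, the same two-sided p-values can be computed with, for example, scipy's normal CDF; the Python sketch below (my choice of tool, not the lecture's) reproduces the three cases.

import math
from scipy.stats import norm

def two_sided_p(xbar, mu0=100.0, sigma=10.0, n=4):
    z_obs = math.sqrt(n) * abs(xbar - mu0) / sigma
    return 2 * (1 - norm.cdf(z_obs))

for xbar in (110, 105, 91.5):
    print(round(two_sided_p(xbar), 4))   # 0.0455, 0.3173, 0.0891, as in Cases a), b) and c)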
Sometimes in reading journals in business you may see both methods two
and three used simultaneously as in the following phrase:
"We tested the hypothesis that the mean agreed with our model's prediction of
100 and found the result in conformance with our theory ( z = 1.6, p= .11)."
What the authors are saying is that they tested the hypothesis that the mean
was equal to 100, obtained a zobs = 1.6 and since a common default value for α is .05,
accepted the hypothesis since the two-sided p-value was .1096 and thus greater than
.05.
Notice that all three of the above methods placed emphasis on the null
hypothesis value for the mean of 100. However, what would happen if we took
exactly the same cases and tested the hypothesis that the mean was equal to 99
instead of 100?
Method 1
Acceptance Region becomes 99 +/- 9.8 = 89.2 to 108.8.
Case a) sample mean = 110 → reject
Case b) sample mean = 105 → accept
Case c) sample mean = 91.5 → accept
Method 2
Case a) zobs = 2.2 → reject
Case b) zobs = 1.2 → accept
Case c) zobs = -1.5 → accept
Method 3
Case a) two-sided p-value = .0278 → reject
Case b) two-sided p-value = .2301 → accept
Case c) two-sided p-value = .1336 → accept
The conclusions are exactly the same! In Case a) we declare that the mean is
not 100 and the mean is not 99. The natural question becomes "Based on this data,
what values of the mean are consistent with the data?"
In Cases b) and c) the same data has indicated that we could say that the
mean is 100 and we can say that the mean is 99!
There is really no conflict since Cases b) and c) are saying that both the
values of 100 and 99 are "consistent" with the data. There is only a problem if you
make the incorrect inference that statistics has "proved" the mean is 100 or 99.
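The μ0 = 99 results above can be reproduced with the same machinery; for instance, a Python sketch (scipy assumed, purely to check the arithmetic):

import math
from scipy.stats import norm

sigma, n, mu0 = 10.0, 4, 99.0
for xbar in (110, 105, 91.5):
    z_obs = math.sqrt(n) * (xbar - mu0) / sigma
    p_val = 2 * (1 - norm.cdf(abs(z_obs)))
    print(xbar, round(z_obs, 1), round(p_val, 4), "reject" if p_val < 0.05 else "accept")
    # 110 -> 2.2, 0.0278, reject;  105 -> 1.2, 0.2301, accept;  91.5 -> -1.5, 0.1336, accept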
The natural question to ask is "What is the set of all possible values of μ which would be accepted if we did a formal hypothesis test at level α?" The answer to this question leads us to the fourth method, called the method of confidence intervals.
To apply this method to our particular problem, begin with the quality
control (method 1) acceptance region for a test at level .05 of:
μ - 1.96 σ/√n ≤ x̄ ≤ μ + 1.96 σ/√n
We could rearrange the left inequality as:
μ - 1.96 σ/√n ≤ x̄   if and only if   μ ≤ x̄ + 1.96 σ/√n
and also rearrange the right inequality as:
x̄ ≤ μ + 1.96 σ/√n   if and only if   x̄ - 1.96 σ/√n ≤ μ
This leads to the statement:
x̄ - 1.96 σ/√n ≤ μ ≤ x̄ + 1.96 σ/√n
This is called a confidence interval on μ. It is the set of all possible values of μ for which, if method one, two, or three were applied to test the corresponding null hypothesis, the resulting conclusion would be "accept the null hypothesis".
In Case a), with sample mean 110, the interval is:
100.2 ≤ μ ≤ 119.8
This implies that if we hypothesized any value of μ between 100.2 and 119.8, the observed mean of 110 would lead us to accept the hypothesis. Notice that 100 is not in the interval, so that value would not lead to accepting the hypothesis, which means it would be rejected. This leads us to the rule:
Accept the hypothesis that μ = μ0 if μ0 is in the confidence interval; otherwise reject the null hypothesis.
In Case b), with sample mean equal to 105, the confidence interval becomes:
95.2 ≤ μ ≤ 114.8
Since 100 is in the interval, we would accept the null hypothesis.
In Case c) with sample mean equal to 91.5, the confidence interval would be:
81.7 ≤ μ ≤ 101.3
Again 100 is in the interval, so we would accept the null hypothesis.
Notice again that the conclusions from all four methods are exactly the same.
When I derived the confidence interval from the quality control limit, I did
not include the probability statement. This is because the quality control limit is a
statement made before data is collected about what values of the sample mean are
expected if the hypothesized population mean value is 100.
The confidence interval, in contrast, can only be constructed after the data
has been collected (since you need the sample mean). It cannot therefore be a
probabilistic statement in the same way.
Taking α = .05, we know that 95% of the time the sample mean will fall within the quality control limits. This implies that 95% of the time the confidence interval will bracket the true population mean, μ. Therefore there is a 95% chance that the confidence interval constructed will contain the true mean. We would say that we have 95% confidence that a particular confidence interval contains the true mean value.
The general formula for a (1 - α)*100% confidence interval, in this situation, is:
x̄ - zα/2 σ/√n ≤ μ ≤ x̄ + zα/2 σ/√n
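A small Python sketch of this formula (scipy is used only to look up zα/2; the function name is mine, not the lecture's):

import math
from scipy.stats import norm

def confidence_interval(xbar, sigma, n, alpha=0.05):
    z = norm.ppf(1 - alpha / 2)              # z_{alpha/2}; about 1.96 when alpha = .05
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

for xbar in (110, 105, 91.5):
    lo, hi = confidence_interval(xbar, sigma=10, n=4)
    print(round(lo, 1), round(hi, 1))        # (100.2, 119.8), (95.2, 114.8), (81.7, 101.3)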
Since all four methods will always yield the same conclusions, which, if any, is preferred?
Clearly, if one is trying to regularly control a process, Method I, the quality control approach, is preferred.
Research articles in business tend to use Method II, the z-score method.
Most computer programs report the p-value of Method III.
However, by far the most practical for most managers are confidence
intervals. The most important reason is that if you reject a hypothesis about the
mean, say that the mean is equal to 100, the natural question is what value does it
have. In Case a) we rejected the null hypothesis mean value of 100. The confidence
interval indicated that the mean had shifted and was somewhere between 100.2 and
119.8. Secondly, even if we accept the hypothesis, the confidence interval indicates
that there is a range of values that are consistent with the data so we are not
tempted to claim that one particular value has been “proven”. For example in Case
b), we accepted the hypothesis that the mean equaled 100; however, we could just as easily have accepted any value between 95.2 and 114.8.
The only drawback is that in some cases, it is complicated to compute the
confidence interval (i.e. the mathematics involved is quite elaborate). However, for
most of the cases used by working managers and executives involving means and
proportions, relatively simple confidence intervals can be used.
Notice also, that one can construct a confidence interval even if one is not
formally testing a hypothesis. This is called interval estimation. Given the observed
data, the confidence interval gives a range of estimates of the population value.
Given the practical implications, I will tend to use confidence intervals
whenever possible.
Before finishing this section, I should point out there is another kind of
hypothesis, called a one-sided hypothesis test, that you will encounter in most statistics
books. This is usually invoked in a situation like the following:
A supplier requires that delivered goods have a defective rate no higher than 1%. Accordingly, our company does quality control to test the hypothesis that our manufacturing process does not have a higher defective rate.
The null hypothesis would usually be taken to be
H0: p ≤ .01
HA: p > .01
This implies that we are only concerned if we exceed the limit, and are
unconcerned if we have a much lower defective rate. Now any practical executive
would realize that if his product had a significantly lower defective rate than is
required, and he could compete on price, he would want to advertise this fact. Using
the above one-sided structure a significantly lower defective rate could never be
detected.
The only time we will discuss one-sided hypothesis test is in situations that
naturally involve the square of a statistic. When a value deviates too much in a
positive direction, the square will be large. Similarly, when a value deviates too
much in a negative direction, the square will also be large.
Since most business problems wish to determine whether or not a value
diverges from a target value in either a positive or negative direction, we shall do
almost all of our inferences using two-sided alternatives involving confidence
intervals.