ch04

4-1 Statistical Inference
• The field of statistical inference consists of those
methods used to make decisions or draw
conclusions about a population.
• These methods utilize the information contained in
a sample from the population in drawing
conclusions.
• Example: Estimate the average height of the class.
4
4-1 Statistical Inference
5
4-2 Point Estimation
6
4-2 Point Estimation
• An estimator should be close in some sense to the
true value of the unknown parameter.
• Θ is an unbiased estimator of θ if E(Θ) = θ.
• If the estimator Θ is not unbiased, then the
difference E(Θ) − θ is called the bias of the
estimator Θ.
7
4-2 Point Estimation
Example 4-1
• Suppose that X is a random variable with mean μ and
variance σ².
• Let X1, X2, ..., Xn be a random sample of size n from the
population represented by X.
• Show that the sample mean X̄ and sample variance S² are
unbiased estimators of μ and σ², respectively.
E(S²) = E[ (1/(n−1)) Σ (Xi − X̄)² ]
      = (1/(n−1)) E[ Σ (Xi² − 2 X̄ Xi + X̄²) ]
      = (1/(n−1)) E[ Σ Xi² − n X̄² ]
      = (1/(n−1)) [ Σ E(Xi²) − n E(X̄²) ]
where the sums run over i = 1, ..., n.
8
4-2 Point Estimation
Example 4-1
• E(Xi²) = μ² + σ²
• E(X̄²) = μ² + σ²/n
• Therefore
E(S²) = (1/(n−1)) [ Σ E(Xi²) − n E(X̄²) ]
      = (1/(n−1)) [ n(μ² + σ²) − n(μ² + σ²/n) ]
      = (1/(n−1)) [ nμ² + nσ² − nμ² − σ² ]
      = σ²
9
4-2 Point Estimation
Example 4-1
• The sample variance S² is an unbiased estimator of the
population variance σ².
• The sample standard deviation S is not an unbiased
estimator of the population standard deviation σ.
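This can be checked numerically. Below is a minimal simulation sketch (not from the slides; the normal population, sample size n = 5, and repetition count are arbitrary illustrative choices): the average of S² over many samples lands on σ², while the average of S falls below σ.

```python
import numpy as np

# Monte Carlo check: E(S^2) is close to sigma^2, while E(S) is below sigma.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 2.0, 5, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)   # sample variance with divisor n - 1
s = np.sqrt(s2)                    # sample standard deviation

print(np.mean(s2), sigma**2)   # ~4.00 vs 4.00 -> unbiased
print(np.mean(s), sigma)       # ~1.88 vs 2.00 -> biased downward
```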
10
4-2 Point Estimation
Different Unbiased Estimators
• Sometimes there are several unbiased estimators of the
same population parameter.
• Example: a random sample of size n = 10.
1. The sample mean
2. The sample median
3. The first observation X1
• All are unbiased estimators of the population mean μ.
• We cannot rely on the property of unbiasedness alone
to select the estimator.
11
4-2 Point Estimation
Different Unbiased Estimators
• Suppose that Θ1 and Θ2 are unbiased estimators of θ.
• The variances of the distributions of these two
estimators may be different.
• Suppose Θ1 has a smaller variance than Θ2.
• Then Θ1 is more likely to produce an estimate close to
the true value θ.
12
4-2 Point Estimation
How to Choose an Unbiased Estimator
•
A logical principle of estimation is to choose the
estimator that has minimum variance.
13
4-2 Point Estimation
How to Choose a Good Estimator
• In practice, one must occasionally use a biased
estimator.
• For example, S for σ.
• What is the criterion?
• Mean square error.
14
4-2 Point Estimation
Mean Square Error
• MSE(Θ) = E[(Θ − θ)²]
         = E[(Θ − E(Θ))²] + [E(Θ) − θ]²
         = V(Θ) + (bias)²
• Unbiased estimator ⇒ bias = 0.
• A good estimator is one that minimizes
V(Θ) + (bias)².
15
4-2 Point Estimation
Mean Square Error
• Given two estimators Θ1 and Θ2 of θ,
• the relative efficiency of Θ1 to Θ2 is defined as
r = MSE(Θ1)/MSE(Θ2)
• r < 1 ⇒ Θ1 is better than Θ2.
• Example:
Θ1 = X̄: the sample mean of a sample of size n.
Θ2 = Xi: the i-th observation.
16
4-2 Point Estimation
Mean Square Error
• r = MSE(Θ1)/MSE(Θ2) = (σ²/n)/σ² = 1/n
• The sample mean is a better estimator than a single
observation.
• The square root of the variance of an estimator,
√V(Θ), is called the standard error of the estimator.
17
4-2 Point Estimation
Methods of obtaining an estimator:
1. Method of Moments Estimator (MME)
Example:
X1, X2, ..., Xn iid ~ f(x, λ) = λe^(−λx), x > 0

E(X) = ∫ x f(x, λ) dx = 1/λ

If the random sample is really from the population with pdf
f(x, λ), the sample mean should resemble the population
mean, and the MME of λ can be obtained by solving
X̄ = 1/λ̂. Therefore, the MME of λ is λ̂ = 1/X̄.
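As a sketch of the same idea on data (assuming the rate parameterization λe^(−λx) above; the simulated sample and true rate are illustrative assumptions), the MME is just the reciprocal of the sample mean:

```python
import numpy as np

# Method-of-moments sketch for f(x; lam) = lam * exp(-lam * x).
# First population moment E(X) = 1/lam, so matching it to the sample
# mean gives lam_hat = 1 / xbar.
rng = np.random.default_rng(1)
true_lam = 0.5
x = rng.exponential(scale=1 / true_lam, size=1_000)   # scale = 1/rate

xbar = x.mean()
lam_mme = 1 / xbar
print(f"sample mean = {xbar:.3f}, MME of lambda = {lam_mme:.3f}")  # close to 0.5
```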
18
MME – cont’d
Example 2:
X1, X2, ..., Xn iid ~ N(μ, σ²)

E(X) = μ
V(X) = E(X²) − [E(X)]² = σ²

The MMEs of μ and σ² can be obtained by solving
X̄ = μ̂  and  (1/n) Σ Xi² = σ̂² + μ̂².
Therefore, the MMEs of μ and σ² are
μ̂ = X̄  and  σ̂² = (1/n) Σ (Xi − X̄)²
19
The MMEs of the population parameters can be
obtained by:
1. Express the k population moments as functions of
parameters
2. Replace the notation for population parameters by
those of the MMEs
3. Equate the sample moments to the population
moments
4. Solve the k equations to obtain the MMEs of the
parameters
20
4-2 Point Estimation
Methods of obtaining an estimator:
2. Maximum Likelihood Estimator (MLE)
The likelihood function of θ, given the data x1, x2, ..., xn,
is the joint distribution of X1, X2, ..., Xn:
L(θ) = f(x1, x2, ..., xn; θ) = Π f(xi; θ), the product over i = 1, ..., n.
The idea of ML estimation is to find an estimator of θ
which maximizes the likelihood of observing x1, x2, ..., xn.
21
4-2 Point Estimation
2. Maximum Likelihood Estimator (MLE)
Example: X1, X2, ..., Xn iid ~ f(x, λ) = λe^(−λx), x > 0

L(λ) = f(x1, x2, ..., xn; λ) = Π f(xi; λ) = λ^n e^(−λ Σ xi)

Maximizing L(λ) is equivalent to maximizing l = ln L(λ):
l = n ln λ − λ Σ xi

We obtain the MLE by solving the first-derivative equation
dl/dλ = n/λ − Σ xi = 0, hence the MLE of λ is λ̂ = 1/X̄.
22
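A short numerical sketch (not from the slides) showing that maximizing the log-likelihood l(λ) = n ln λ − λ Σ xi reproduces the closed-form MLE 1/x̄; the simulated data and optimizer bounds are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Exponential log-likelihood: maximize it numerically and compare
# with the closed-form MLE 1/xbar.
rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=500)   # true rate = 0.5

def neg_log_lik(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 10), method="bounded")
print(res.x, 1 / x.mean())   # the two estimates agree (~0.5)
```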
It can be shown that the MLEs of the mean μ and
variance σ² of the normal distribution are:
μ̂ = X̄  and  σ̂² = (1/n) Σ (Xi − X̄)²
23
4-3 Hypothesis Testing
The purpose of hypothesis testing is to determine whether
there is enough statistical evidence in favor of a certain
belief about a parameter.
Examples
• Is there statistical evidence, in a random sample of
potential customers, to support the hypothesis that
more than p% of the potential customers will purchase
a new product?
• Is a new drug effective in curing a certain disease? A
sample of patients is randomly selected. Half of them
are given the drug while the other half are given a placebo. The
improvement in the patients' conditions is then measured
and compared.
24
4-3 Hypothesis Testing
4-3.1 Statistical Hypotheses
We like to think of statistical hypothesis testing as the
data analysis stage of a comparative experiment,
in which the engineer is interested, for example, in
comparing the mean of a population to a specified
value (e.g. mean pull strength).
25
4-3 Hypothesis Testing
4-3.1 Statistical Hypotheses
The critical concepts of hypothesis testing.
• There are two hypotheses (about a population parameter)
[for example, μ = 5]:
H0: the null hypothesis
H1: the alternative hypothesis [μ > 5]
• Assume the null hypothesis is true.
• Build a statistic related to the parameter hypothesized.
• Pose the question: How probable is it to obtain a statistic
value at least as extreme as the one observed from the sample?
26
4-3 Hypothesis Testing
4-3.1 Statistical Hypotheses
The critical concepts of hypothesis testing-Continued
•
Make one of the following two decisions (based on the test):
Reject the null hypothesis in favor of the alternative
hypothesis.
Do not reject the null hypothesis in favor of the alternative
hypothesis.
•Two types of errors are possible when making the decision
whether to reject H0
Type I error - reject H0 when it is true.
Type II error - do not reject H0 when it is false.
27
4-3 Hypothesis Testing
4-3.1 Statistical Hypotheses
For example, suppose that we are interested in the
burning rate of a solid propellant used to power aircrew
escape systems.
• Now burning rate is a random variable that can be
described by a probability distribution.
• Suppose that our interest focuses on the mean burning
rate (a parameter of this distribution).
• Specifically, we are interested in deciding whether or
not the mean burning rate is 50 centimeters per second.
28
4-3 Hypothesis Testing
4-3.1 Statistical Hypotheses
Two-sided Alternative Hypothesis
One-sided Alternative Hypotheses
29
4-3 Hypothesis Testing
4-3.1 Statistical Hypotheses
Test of a Hypothesis
• A procedure leading to a decision about a particular
hypothesis
• Hypothesis-testing procedures rely on using the information
in a random sample from the population of interest.
• If this information is consistent with the hypothesis, then we
will conclude that the hypothesis is true; if this information is
inconsistent with the hypothesis, we will conclude that the
hypothesis is false.
30
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
Based on the hypotheses and the information in the
sample, the sample space is divided into two parts:
Reject Region and Accept Region.
31
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
The rejection region (RR) is the subset of the sample space that leads to
rejection of the null hypothesis: RR = {x̄ < 48.5 or x̄ > 51.5} in Fig. 4-3.
The complement of the rejection region is the acceptance region:
AR = {48.5 ≤ x̄ ≤ 51.5}.
32
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
• The probabilities of committing errors are calculated to
determine the performance of a test.
• Sometimes the type I error probability is called the
significance level, or the α-error, or the size of the test.
33
α = P{X̄ < 48.5 or X̄ > 51.5 | μ = 50}
  = P{X̄ < 48.5} + P{X̄ > 51.5}
  = P{Z < (48.5 − 50)/(2.5/√10)} + P{Z > (51.5 − 50)/(2.5/√10)}
  = P{Z < −1.9} + P{Z > 1.9}
  = 2(0.0288)
  = 0.0576
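This arithmetic can be checked with the standard normal CDF, as a sketch (SciPy; the inputs are the slide's σ = 2.5, n = 10, and acceptance limits 48.5 and 51.5):

```python
from math import sqrt
from scipy.stats import norm

# Type I error for the two-sided rejection region {xbar < 48.5 or xbar > 51.5}
# when mu = 50, sigma = 2.5, n = 10.
mu0, sigma, n = 50, 2.5, 10
se = sigma / sqrt(n)

alpha = norm.cdf((48.5 - mu0) / se) + norm.sf((51.5 - mu0) / se)
print(round(alpha, 4))   # ~0.0578 (the slide rounds z to 1.9, giving 0.0576)
```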
34
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
If the mean is 50, the
probability of obtaining a
sample mean less than 48.5
or greater than 51.5 is 0.0576.
35
β = P{48.5 ≤ X̄ ≤ 51.5 | H1 is true}

Since the alternative hypothesis is composite, there are many
distributions in the corresponding subset of the parameter space.
Therefore, the probability of committing a type II error depends
on the distribution chosen to calculate β.

β = P{48.5 ≤ X̄ ≤ 51.5 | μ = 52}
  = P{(48.5 − 52)/(2.5/√10) ≤ Z ≤ (51.5 − 52)/(2.5/√10)}
  = P{−4.43 ≤ Z ≤ −0.63}
  = P{Z ≤ −0.63} − P{Z ≤ −4.43}
  = 0.2643
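The same kind of check for the type II error, as a sketch (SciPy normal CDF; the true means 52 and 50.5 are the slide's two cases):

```python
from math import sqrt
from scipy.stats import norm

# Type II error: probability the sample mean falls in the acceptance
# region [48.5, 51.5] when the true mean is actually mu.
def beta(mu, sigma=2.5, n=10, lo=48.5, hi=51.5):
    se = sigma / sqrt(n)
    return norm.cdf((hi - mu) / se) - norm.cdf((lo - mu) / se)

print(round(beta(52), 4))     # ~0.264 (slide: 0.2643 with z rounded to -0.63)
print(round(beta(50.5), 4))   # ~0.891 (slide: 0.8923) -> small shifts are hard to detect
```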
36
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
If the mean is actually 52,
the probability of falsely
accepting H0: μ = 50 is 0.2643.
37
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
Similarly, the probability of
falsely accepting H0: μ = 50
when μ = 50.5 is 0.8923,
which is much higher than
in the previous case.
It is harder to detect the
difference if the two distributions
are close.
38
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
The probabilities of
committing errors depend
also on the sample size n, the
amount of information.
n = 10: β = 0.2643
n = 16: β = 0.2119
β decreases as n increases.
39
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
1. α can be decreased by making the AR larger, at the price that β will
increase.
2. The probability of a type II error is a function of the population mean μ.
3. The probabilities of committing both types of error can be reduced at
the same time only by increasing the sample size.
40
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
• The power is computed as 1 − β, and power can be interpreted as
the probability of correctly rejecting a false null hypothesis. We
often compare statistical tests by comparing their power properties.
• For example, consider the propellant burning rate problem when
we are testing H0: μ = 50 centimeters per second against H1: μ ≠ 50
centimeters per second. Suppose that the true value of the
mean is μ = 52. When n = 10, we found that β = 0.2643, so the
power of this test is 1 − β = 1 − 0.2643 = 0.7357 when μ = 52.
41
Example #4-19.(p157)
(a) α = P(X̄ > 185 | μ = 175) = P(Z > (185 − 175)/(20/√10))
      = P(Z > 1.58) = 1 − P(Z ≤ 1.58) = 1 − 0.94295
      = 0.057
(b) β = P(X̄ ≤ 185 | μ = 200) = P(Z ≤ (185 − 200)/(20/√10))
      = P(Z ≤ −2.37) = 0.00889
(c) 1 − β = 1 − 0.00889 = 0.99111 (the power of the
test when the mean is 200)
42
Example #4-20 (p157).
a) Reject the null hypothesis and conclude that the
mean foam height is greater than 175 mm.
b) P(X̄ ≥ 190 | μ = 175) = P(Z ≥ (190 − 175)/(20/√10))
                         = P(Z ≥ 2.4) ≈ 0.0082
The probability that a value of x̄ of at least 190 mm
would be observed (if the true mean height is 175
mm) is only 0.0082. Thus, the sample value of x̄ =
190 mm would be an unusual result.
43
4-3 Hypothesis Testing
4-3.3 P-Values in Hypothesis Testing
44
Example #4-21 (p157).
Using n = 16:
(a) α = P(X̄ > 185 | μ = 175) = P(Z > 2) = 0.0228
(b) β = P(X̄ ≤ 185 | μ = 200) = P(Z ≤ −3) = 0.00135
(c) 1 − β = 1 − 0.00135 = 0.99865
45
Example #4-22(a) (p157).
n = 16:
a) 0.0571 = P(X̄ ≥ c | μ = 175) = P(Z ≥ (c − 175)/(20/√16))
   so (c − 175)/(20/√16) = 1.58
   c = 175 + 1.58(20/√16) = 182.9
46
4-3 Hypothesis Testing
4-3.3 One-Sided and Two-Sided Hypotheses
Two-Sided Test:
One-Sided Tests:
47
4-3 Hypothesis Testing
4-3.5 General Procedure for Hypothesis Testing
48
4-4 Inference on the Mean of a Population,
Variance Known
Assumptions
49
4-4 Inference on the Mean of a Population,
Variance Known
4-4.1 Hypothesis Testing on the Mean
We wish to test:
The test statistic is:
50
4-4 Inference on the Mean of a Population,
Variance Known
• The analyst selects critical values according to a preassigned α.
The rejection of H0 is a "strong conclusion".
• β depends on the sample size and the true value of the parameter, so
"fail to reject H0" is a "weak conclusion". To accept H0, we need
more evidence.
• The critical values are determined by
2-sided hypothesis: α = P{Z0 < −z_{α/2} or Z0 > z_{α/2}}
1-sided hypothesis: α = P{Z0 < −z_α}  or  α = P{Z0 > z_α}
51
4-4 Inference on the Mean of a Population,
Variance Known
4-4.1 Hypothesis Testing on the Mean
Reject H0 if the observed value of the test statistic z0 is
either:
z0 > z_{α/2} or z0 < −z_{α/2}
Fail to reject H0 if
−z_{α/2} ≤ z0 ≤ z_{α/2}
where z0 is the observed value of the test statistic Z0.
52
4-4 Inference on the Mean of a Population,
Variance Known
4-4.1 Hypothesis Testing on the Mean
Reject H0 if the observed value of the test statistic z0 is
either:
or
Fail to reject H0 if
53
4-4 Inference on the Mean of a Population,
Variance Known
4-4.1 Hypothesis Testing on the Mean
54
4-4 Inference on the Mean of a Population,
Variance Known
4-4.1 Hypothesis Testing on the Mean
55
Example #4-33 (p173)
a) 1) The parameter of interest is the true mean yield, μ.
2) H0: μ = 90
3) H1: μ ≠ 90
4) α = 0.05
5) z0 = (x̄ − μ0)/(σ/√n)
6) Reject H0 if z0 < −z_{α/2} = −1.96 or z0 > z_{α/2} = 1.96
7) x̄ = 90.48, σ = 3
   z0 = (90.48 − 90)/(3/√5) = 0.36
8) Since −1.96 < 0.36 < 1.96, do not reject H0 and
conclude the yield is not significantly different from
90% at α = 0.05.
56
4-4 Inference on the Mean of a Population,
Variance Known
4-4.1 P-Values in Hypothesis Testing
57
2-sided, H1: μ ≠ μ0:
P-value = P{rejecting H0 when the critical value is taken to be |z0| | μ = μ0}
        = P{Z0 < −|z0| or Z0 > |z0|} = 2[1 − Φ(|z0|)]
1-sided, H1: μ > μ0:  P-value = P{Z0 > z0} = 1 − Φ(z0)
1-sided, H1: μ < μ0:  P-value = P{Z0 < z0} = Φ(z0)
58
• The p-value of a test is the probability of observing a
test statistic at least as extreme as the one computed,
given that the null hypothesis is true.
• The p-value provides information about the amount of
statistical evidence that supports the alternative
hypothesis.
•We can conclude that the smaller the p-value
the more statistical evidence exists to support the
alternative hypothesis.
59
4-4 Inference on the Mean of a Population,
Variance Known
4-4.2 P-Values in Hypothesis Testing
60
Example #4-33 (p173)-continued
a) P-value = P{|Z0| > 0.36} = 2[1 − Φ(0.36)]
           = 2(1 − 0.64058) = 0.7188
Interpreting the p-value:
Because the probability that the absolute value of the
standardized test statistic exceeds 0.36 when μ = 90 is so
large (0.7188), there is reason to believe that the null
hypothesis is true.
61
The p-value and rejection region methods
The p-value can be used when making decisions
based on rejection region methods as follows:
• Define the hypotheses to test, and the required
significance level α.
• Perform the sampling procedure, then calculate the
test statistic and the p-value associated with it.
• Compare the p-value to α. Reject the null
hypothesis only if p ≤ α; otherwise, do not
reject the null hypothesis.
62
Conclusions of a test of Hypothesis
•If we reject the null hypothesis, we conclude that
there is enough evidence to infer that the alternative
hypothesis is true.
•If we do not reject the null hypothesis, we conclude
that there is not enough statistical evidence to infer
that the alternative hypothesis is true.
63
4-4 Inference on the Mean of a Population,
Variance Known
4-4.2 Type II Error and Choice of Sample Size
Finding the Probability of Type II Error β
64
4-4 Inference on the Mean of a Population,
Variance Known
4-4.2 Type II Error and Choice of Sample Size
Finding the Probability of Type II Error β
65
For a 2-sided test, recall Z0 = (X̄ − μ0)/(σ/√n).

β = P{−z_{α/2} ≤ Z0 ≤ z_{α/2} | μ = μ0 + δ}
  = P{−z_{α/2} − δ√n/σ ≤ (X̄ − (μ0 + δ))/(σ/√n) ≤ z_{α/2} − δ√n/σ}
  = Φ(z_{α/2} − δ√n/σ) − Φ(−z_{α/2} − δ√n/σ)
66
If δ > 0, the second term can be neglected, so
−z_β ≈ z_{α/2} − δ√n/σ
Solving for n, we have
n ≈ (z_{α/2} + z_β)² σ² / δ²
67
4-4 Inference on the Mean of a Population,
Variance Known
4-4.2 Type II Error and Choice of Sample Size
Sample Size Formulas
68
4-4 Inference on the Mean of a Population,
Variance Known
4-4.2 Type II Error and Choice of Sample Size
Sample Size Formulas
69
β = P{Z0 ≤ z_α | μ = μ0 + δ}
  = P{(X̄ − (μ0 + δ))/(σ/√n) ≤ z_α − δ√n/σ}
  = Φ(z_α − δ√n/σ)

Setting −z_β = z_α − δ√n/σ, we have n = (z_α + z_β)² σ² / δ²
70
4-4 Inference on the Mean of a Population,
Variance Known
4-4.2 Type II Error and Choice of Sample Size
71
Example #4-33(p173)-continued: μ = 85, α = 0.05, β = 0.05

b) n = (z_{α/2} + z_β)² σ² / δ²
     = (z_{0.025} + z_{0.05})² 3² / (85 − 90)²
     = (1.96 + 1.65)² (9) / (−5)²
     = 4.67
   so take n = 5.

c) If n = 5 and μ = 92:
   β = Φ(z_{0.025} − (92 − 90)√5/3) − Φ(−z_{0.025} − (92 − 90)√5/3)
     = Φ(1.96 − 1.49) − Φ(−1.96 − 1.49)
     = Φ(0.47) − Φ(−3.45)
     = 0.680822 − 0.000280
     = 0.680542
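A sketch of the same sample-size and β calculations using SciPy quantiles and CDFs (inputs taken from the example above):

```python
from math import sqrt, ceil
from scipy.stats import norm

mu0, sigma, alpha, beta_target = 90, 3, 0.05, 0.05
delta = 85 - 90                                   # true mean 85

# Two-sided sample-size formula: n ~ (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2
z_half, z_beta = norm.ppf(1 - alpha / 2), norm.ppf(1 - beta_target)
n = (z_half + z_beta) ** 2 * sigma**2 / delta**2
print(n, ceil(n))                                 # ~4.67 -> n = 5

# Type II error at mu = 92 with n = 5
d = (92 - mu0) * sqrt(5) / sigma
print(norm.cdf(z_half - d) - norm.cdf(-z_half - d))   # ~0.68
```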
72
4-4 Inference on the Mean of a Population,
Variance Known
4-4.2 Type II Error and Choice of Sample Size
73
4-4 Inference on the Mean of a Population,
Variance Known
4-4.3 Large Sample Test
• We assume that σ2 is known
• In most practical situations, σ² is unknown
74
4-4 Inference on the Mean of a Population,
Variance Known
4-4.3 Large Sample Test
In general, if n ≥ 30, then according to the Central
Limit Theorem,
X̄ ≈ N(μ, σ²/n),
the sample variance s² will be close to σ² for
most samples, and so s can be substituted for
σ in the test procedures with little harmful
effect.
75
4-4 Inference on the Mean of a Population,
Variance Known
4-4.4 Some Practical Comments on Hypothesis
Testing
The Seven-Step Procedure
Only three steps are really required:
76
4-4 Inference on the Mean of a Population,
Variance Known
4-4.4 Some Practical Comments on Hypothesis
Testing
Statistical versus Practical Significance
77
4-4 Inference on the Mean of a Population,
Variance Known
4-4.4 Some Practical Comments on Hypothesis
Testing
Statistical versus Practical Significance
If the sample size becomes very large, it is almost certain that the
null hypothesis will be rejected and we will conclude that the true mean
is not equal to μ0.
78
4-4 Inference on the Mean of a Population,
Variance Known
4-4.5 Confidence Interval on the Mean
An interval estimator draws inferences
about a population by estimating the value
of an unknown parameter using an interval.
79
4-4 Inference on the Mean of a Population,
Variance Known
4-4.5 Confidence Interval on the Mean
Two-sided confidence interval:
One-sided confidence intervals:
Confidence coefficient:
80
4-4 Inference on the Mean of a Population,
Variance Known
4-4.5 Confidence Interval on the Mean
81
4-4 Inference on the Mean of a Population,
Variance Known
4-4.5 Confidence Interval on the Mean
How is an interval estimator produced from a
sampling distribution?
To estimate μ, a sample of size n is drawn from
the population, and its mean X̄ is calculated.
Under certain conditions, X̄ is normally
distributed (or approximately normally
distributed), thus
Z = (X̄ − μ)/(σ/√n)
82
4-4 Inference on the Mean of a Population,
Variance Known
4-4.5 Confidence Interval on the Mean
83
– We know that
P(−z_{α/2} ≤ (X̄ − μ)/(σ/√n) ≤ z_{α/2}) = 1 − α
– This leads to the relationship
P(X̄ − z_{α/2} σ/√n ≤ μ ≤ X̄ + z_{α/2} σ/√n) = 1 − α
– In repeated sampling from this distribution, 1 − α of all
intervals of the form
[ x̄ − z_{α/2} σ/√n , x̄ + z_{α/2} σ/√n ]
will include (cover) the expected value μ of the
population.
84
4-4 Inference on the Mean of a Population,
Variance Known
4-4.5 Confidence Interval on the Mean
85
Example
Discuss Example 4-5
86
Example #4-33(p173)–continued
(d) Find a 95% two-sided CI on the true mean yield.
For α = 0.05, z_{α/2} = z_{0.025} = 1.96, x̄ = 90.48:

l = x̄ − z_{α/2} σ/√n = 90.48 − 1.96 (3/√5) = 87.85
u = x̄ + z_{α/2} σ/√n = 90.48 + 1.96 (3/√5) = 93.11

Therefore, 87.85 ≤ μ ≤ 93.11 is the 95% CI on the
true mean yield.
With 95% confidence, we believe the true mean
yield of the chemical process is between 87.85%
and 93.11%.
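A sketch of the same interval using SciPy (inputs are the example's x̄ = 90.48, σ = 3, n = 5):

```python
from math import sqrt
from scipy.stats import norm

xbar, sigma, n, alpha = 90.48, 3, 5, 0.05

# Two-sided z confidence interval: xbar ± z_{alpha/2} * sigma / sqrt(n)
z = norm.ppf(1 - alpha / 2)
half_width = z * sigma / sqrt(n)
print(xbar - half_width, xbar + half_width)   # ~(87.85, 93.11)
```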
87
4-4 Inference on the Mean of a Population,
Variance Known
4-4.5 Confidence Interval on the Mean
Relationship between Tests of Hypotheses and
Confidence Intervals
If [l, u] is a 100(1 − α) percent confidence interval for the
parameter, then the test of significance level α of the
hypothesis
will lead to rejection of H0 if and only if the hypothesized
value is not in the 100(1 − α) percent confidence interval
[l, u].
88
Interval estimators can be used to test
hypotheses.
Calculate the 1 − α confidence level interval
estimator, then:
if the hypothesized parameter value falls within
the interval, do not reject the null hypothesis;
if the hypothesized parameter value falls outside
the interval, conclude that the null hypothesis can
be rejected (μ is not equal to the hypothesized
value).
89
Example #4.33(p.173)-continued
(e) Use the CI found in (d) to test the hypothesis.
The 95% CI on the true mean yield is
[87.85,93.11].
The null hypothesis cannot be rejected since
the hypothesized value, 90%, lies within this
interval.
90
4-4 Inference on the Mean of a Population,
Variance Known
4-4.5 Confidence Interval on the Mean
Confidence Level and Precision of Estimation
The length of the two-sided 95% confidence interval is
whereas the length of the two-sided 99% confidence
interval is
• Because the 99% confidence interval is wider, it is more
likely to include the value of μ.
91
Interpreting the interval estimate
It is wrong to state that the interval estimator is an
interval for which there is a 1 − α chance that the population
mean lies between l and u.
This is so because μ is a parameter, not a random
variable.
Note that l and u are random variables:
l = X̄ − z_{α/2} σ/√n,  u = X̄ + z_{α/2} σ/√n
Thus, it is correct to state that there is a 1 − α chance
that l will be less than μ and u will be greater than μ.
92
4-4 Inference on the Mean of a Population,
Variance Known
4-4.5 Confidence Interval on the Mean
Choice of Sample Size
The width of the interval estimate is a function of:
the population standard deviation, the confidence
level, and the sample size.
We can control the width of the interval estimate by
changing the sample size.
Thus, we determine the interval width first, and derive the
required sample size.
93
The phrase "estimate the mean to within E units"
translates to an interval estimate of the form x̄ ± E with
E = |X̄ − μ| ≤ z_{α/2} σ/√n.
Setting z_{α/2} σ/√n = E and solving for n gives
n = (z_{α/2} σ / E)²
94
4-4 Inference on the Mean of a Population,
Variance Known
4-4.5 Confidence Interval on the Mean
Choice of Sample Size
95
4-4 Inference on the Mean of a Population,
Variance Known
4-4.5 Confidence Interval on the Mean
Choice of Sample Size
96
4-4 Inference on the Mean of a Population,
Variance Known
4-4.5 Confidence Interval on the Mean
Choice of Sample Size
97
4-4 Inference on the Mean of a Population,
Variance Known
4-4.5 Confidence Interval on the Mean
One-Sided Confidence Bounds
98
Example #4.36(p174)
a) 1) The parameter of interest is the true mean life, μ.
2) H0: μ = 540
3) H1: μ > 540
4) α = 0.05
5) z0 = (x̄ − μ0)/(σ/√n)
6) Reject H0 if z0 > z_α where z_{0.05} = 1.65
7) x̄ = 551.33, σ = 20,
   z0 = (551.33 − 540)/(20/√15) = 2.19
8) Since 2.19 > 1.65, reject the null hypothesis and conclude there
is sufficient evidence to support the claim that the life exceeds 540
hrs at α = 0.05.
99
b) P-value = P(Z > 2.19) = 1 − P(Z ≤ 2.19)
           = 1 − Φ(2.19) = 0.0143

c) β = Φ(z_{0.05} − (560 − 540)√15/20)
     = Φ(1.65 − 3.873) = Φ(−2.223) = 0.0131

d) n = (z_α + z_β)² σ² / δ²
     = (z_{0.05} + z_{0.10})² σ² / (560 − 540)²
     = (1.65 + 1.29)² (20)² / (20)²
     = 8.644,
   so take n = 9.
100
e) x̄ − z_{0.05} σ/√n = 551.33 − 1.645 (20/√15) = 542.83,
   so 542.83 ≤ μ.
With 95% confidence, the true mean life is at
least 542.83 hrs.
f) Since 540 does not fall within this interval, we
can reject the null hypothesis in favor of
the alternative.
101
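A sketch reproducing the numbers of Example 4-36 with SciPy (inputs are the example's x̄ = 551.33, σ = 20, n = 15):

```python
from math import sqrt, ceil
from scipy.stats import norm

xbar, mu0, sigma, n, alpha = 551.33, 540, 20, 15, 0.05
se = sigma / sqrt(n)

z0 = (xbar - mu0) / se
print(z0, norm.sf(z0))                        # ~2.19, P-value ~0.014

# Type II error if the true mean is 560
beta = norm.cdf(norm.ppf(1 - alpha) - (560 - mu0) / se)
print(beta)                                   # ~0.013

# Sample size for beta = 0.10 at delta = 20 (one-sided formula)
n_req = (norm.ppf(1 - alpha) + norm.ppf(0.90)) ** 2 * sigma**2 / 20**2
print(ceil(n_req))                            # 9

# 95% lower confidence bound on the mean
print(xbar - norm.ppf(1 - alpha) * se)        # ~542.8
```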
4-4 Inference on the Mean of a Population,
Variance Known
4-4.6 General Method for Deriving a Confidence
Interval
102
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.1 Hypothesis Testing on the Mean
When σ is unknown, we replace it by an estimator of it:
T0 = (X̄ − μ0)/(S/√n) ~ t(n − 1)
Since S is a random variable, it increases the variation in T0.
103
Theorem. Let X1, X2, ..., Xn be a random sample of size n
from N(μ, σ²). Then
(1) X̄ ~ N(μ, σ²/n)
(2) Σ (Xi − X̄)² / σ² = (n − 1)S²/σ² ~ χ²(n − 1)
(3) X̄ and S² are independent.
104
Definition. Let Z ~ N(0, 1) and Y ~ χ²(ν) be
independent. Then the random variable defined by
T = Z / √(Y/ν)
has the Student's t-distribution with ν degrees of
freedom.
• The t probability density function is
f(x) = Γ[(k + 1)/2] / ( √(πk) Γ(k/2) [x²/k + 1]^((k+1)/2) ),  −∞ < x < ∞
105
T0 can be rewritten as
T0 = (X̄ − μ0)/(S/√n)
   = [ (X̄ − μ0)/(σ/√n) ] / √[ ((n − 1)S²/σ²) / (n − 1) ]
and we have the following result.
106
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.1 Hypothesis Testing on the Mean
107
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.1 Hypothesis Testing on the Mean
108
Example
P(T(1) > 0.325) = 0.4
P(T(10) > 1.812) = 0.05
P(−1.812 < T(10) < 1.812) = 0.9
P(T(4) < 2.776) = 1 − 0.025 = 0.975
P(T(4) < −2.776 or T(4) > 2.776) = 0.05
109
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.1 Hypothesis Testing on the Mean
Calculating the P-value
110
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.1 Hypothesis Testing on the Mean
111
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.1 Hypothesis Testing on the Mean
112
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.1 Hypothesis Testing on the Mean
Example 4-7
• Discuss Example 4-7
• Parameter of interest: the mean coefficient of
restitution, μ
• Null Hypothesis, H0: μ = 0.82
• Alternative Hypothesis, H1: μ > 0.82
• Test statistic: t0 = (x̄ − μ0)/(s/√n)
113
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.1 Hypothesis Testing on the Mean
114
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.1 Hypothesis Testing on the Mean
Example 4-7
• Reject H0: if the P-value < 0.05
• Computations: x̄ = 0.83725, t0 = 2.72
• Conclusion: P-value = P(T(14) > 2.72) = 0.008297 < 0.05
• REJECT H0
115
Example #4.50 (p.185) A particular brand of diet
margarine was analyzed to determine the level of
polyunsaturated fatty acid. A sample of 6 packages resulted
in the following data: 16.8, 17.2, 16.9, 17.4, 16.5, 17.1.
[Normal probability plot of the fatty acid data: Average = 16.9833,
StDev = 0.318852, N = 6; Anderson-Darling normality test:
A-Squared = 0.143, P-Value = 0.934.]
The normality assumption appears to be satisfied.
116
a) 1) The parameter of interest is the true mean level of
polyunsaturated fatty acid, μ.
2) H0: μ = 17
3) H1: μ ≠ 17
4) α = 0.01
5) t0 = (x̄ − μ0)/(s/√n)
6) Reject H0 if t0 < −t_{α/2,n−1} = −t_{0.005,5} = −4.032 or
   t0 > t_{0.005,5} = 4.032
7) x̄ = 16.98, s = 0.318, n = 6,
   t0 = (16.98 − 17)/(0.318/√6) = −0.154
8) Since −4.032 < −0.154 < 4.032, do not reject the null
hypothesis and conclude the true mean level is not
significantly different from 17% at α = 0.01.
117
P-value = 2 P(T(5) > 0.154); from the t-table with 5 degrees of freedom
we obtain 2(0.40) < P-value, i.e.
0.80 < P-value.
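The same test can be run directly from the six observations; the sketch below uses SciPy's one-sample t test and a t-based 99% interval (the exact data give a slightly smaller |t0| than the slide's rounded x̄ = 16.98):

```python
import numpy as np
from scipy import stats

fatty = np.array([16.8, 17.2, 16.9, 17.4, 16.5, 17.1])

# Two-sided one-sample t test of H0: mu = 17
t0, pval = stats.ttest_1samp(fatty, popmean=17)
print(t0, pval)   # t0 ~ -0.13 (slide: -0.154 from rounded xbar), P-value ~ 0.90

# 99% t confidence interval: xbar ± t_{0.005,5} * s / sqrt(n)
n = len(fatty)
half = stats.t.ppf(0.995, df=n - 1) * fatty.std(ddof=1) / np.sqrt(n)
print(fatty.mean() - half, fatty.mean() + half)   # ~(16.46, 17.51)
```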
118
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.1 Hypothesis Testing on the Mean
119
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.2 Type II Error and Choice of Sample Size
=0+
Where T0 has a noncentral t-distribution.
Fortunately, this unpleasant task has already been done,
and the results are summarized in a series of graphs in
Appendix A Charts Va, Vb, Vc, and Vd that plot for the
t-test against a parameter  for various sample sizes n.
120
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.2 Type II Error and Choice of Sample Size
These graphs are called operating characteristic
(or OC) curves. Curves are provided for two-sided
alternatives on Charts Va and Vb. The abscissa scale
factor d on these charts is defined as
121
b) Using the OC curves on Chart Vb with d = 1.567,
a sample size of about n = 10 is needed to achieve β ≤ 0.1.
Therefore, the current sample size of 6 is inadequate.
122
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.3 Confidence Interval on the Mean
P(t / 2,n1  T  t / 2,n1 )  1  
X 
P(t / 2,n1 
 t / 2,n1 )  1  
S/ n
Rearrange the above equation
P( X  t / 2,n1S / n    X  t / 2,n1S / n )  1  
123
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.3 Confidence Interval on the Mean
124
c) For α = 0.01, n = 6, t_{0.005,5} = 4.032:
16.98 ± 4.032 (0.318/√6)
gives 16.455 ≤ μ ≤ 17.505.
With 99% confidence, we believe the true mean level
of polyunsaturated fatty acid is between 16.455% and
17.505%.
125
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.3 Confidence Interval on the Mean
126
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.3 Confidence Interval on the Mean
127
4-6 Inference on the Variance of a
Normal Population
4-6.1 Hypothesis Testing on the Variance of a
Normal Population
Test Statistic:
128
4-6 Inference on the Variance of a
Normal Population
4-6.1 Hypothesis Testing on the Variance of a
Normal Population
129
4-6 Inference on the Variance of a
Normal Population
4-6.1 Hypothesis Testing on the Variance of a
Normal Population
130
4-6 Inference on the Variance of a
Normal Population
4-6.1 Hypothesis Testing on the Variance of a
Normal Population
Chi-squared
distribution
is a skewed
distribution.
131
4-6 Inference on the Variance of a
Normal Population
4-6.1 Hypothesis Testing on the Variance of a
Normal Population
132
4-6 Inference on the Variance of a
Normal Population
4-6.1 Hypothesis Testing on the Variance of a
Normal Population
133
4-6 Inference on the Variance of a
Normal Population
4-6.1 Hypothesis Testing on the Variance of a
Normal Population
Discuss Example 4-10
134
4-6 Inference on the Variance of a
Normal Population
4-6.2 Confidence Interval on the Variance of a
Normal Population
P(χ²_{1−α/2,n−1} ≤ (n − 1)S²/σ² ≤ χ²_{α/2,n−1}) = 1 − α
Rearranging the above equation,
P( (n − 1)S²/χ²_{α/2,n−1} ≤ σ² ≤ (n − 1)S²/χ²_{1−α/2,n−1} ) = 1 − α
135
4-6 Inference on the Variance of a
Normal Population
4-6.2 Confidence Interval on the Variance of a
Normal Population
136
4-6 Inference on the Variance of a
Normal Population
4-6.2 Confidence Interval on the Variance of a
Normal Population
Discuss Example 4-11
137
Example #4-59 (p.191): n = 15, s = 0.016 mm
(a) In order to use the χ² statistic in hypothesis testing and confidence
interval construction, we need to assume that the underlying
distribution is normal.
1) The parameter of interest is the true standard deviation of the
diameter, σ; the test is carried out in terms of σ².
2) H0: σ² = 0.0004 vs. H1: σ² > 0.0004
3) α = 0.05
4) χ²0 = (n − 1)s²/σ0² = 14(0.016)²/(0.02)² = 8.96
5) Reject H0 if χ²0 > χ²_{0.05,14} = 23.68
6) Since 8.96 < 23.68, do not reject H0 and conclude
there is insufficient evidence to indicate the true
standard deviation of the diameter exceeds 0.02 at
α = 0.05.
138
P-value = P(χ²(14) > 8.96); for 14 degrees of freedom: 0.5 < P-value < 0.9

b) 95% lower confidence bound on σ²:
For α = 0.05 and n = 15, χ²_{α,n−1} = χ²_{0.05,14} = 23.68
14(0.016)²/23.68 ≤ σ²
The 95% lower confidence bound on σ² is 0.00015 ≤ σ².
c) Based on the lower confidence bound, we cannot
reject the null hypothesis.
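A sketch of these χ² calculations with SciPy (inputs are the example's n = 15, s = 0.016, σ0 = 0.02):

```python
from scipy.stats import chi2

n, s, sigma0, alpha = 15, 0.016, 0.02, 0.05

# Test statistic and upper-tail critical value for H1: sigma^2 > sigma0^2
chi2_0 = (n - 1) * s**2 / sigma0**2
print(chi2_0, chi2.ppf(1 - alpha, df=n - 1))   # 8.96 vs 23.68 -> do not reject
print(chi2.sf(chi2_0, df=n - 1))               # P-value ~0.83 (between 0.5 and 0.9)

# 95% lower confidence bound on sigma^2
print((n - 1) * s**2 / chi2.ppf(1 - alpha, df=n - 1))   # ~0.00015
```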
139
4-7 Inference on Population Proportion
4-7.1 Hypothesis Testing on a Binomial Proportion
We will consider testing:
140
4-7 Inference on Population Proportion
4-7.1 Hypothesis Testing on a Binomial Proportion
141
4-7 Inference on Population Proportion
4-7.1 Hypothesis Testing on a Binomial Proportion
142
4-7 Inference on Population Proportion
4-7.1 Hypothesis Testing on a Binomial Proportion
143
4-7 Inference on Population Proportion
4-7.2 Type II Error and Choice of Sample Size
144
4-7 Inference on Population Proportion
4-7.2 Type II Error and Choice of Sample Size
145
4-7 Inference on Population Proportion
4-7.2 Type II Error and Choice of Sample Size
Discuss Example 4-13
146
Example #4-75 (p201): In the process of manufacturing lenses,
a machine will be qualified if the percentage of polished lenses
containing surface defects is less than 4%. A random sample of 300
lenses contains 11 defective lenses.
(a) Formulate and test the hypotheses at α = 0.05.
1) The parameter of interest is the true percentage of polished
lenses that contain surface defects, p.
2) H0 : p = 0.04 vs. H1 : p < 0.04
3) α = 0.05
4) z0 = (p̂ − p0)/√(p0(1 − p0)/n) = (x − np0)/√(np0(1 − p0))
6) Reject H0 if z0 < −z_α = −z_{0.05} = −1.645
7) Since z0 = (11 − 300(0.04))/√(300(0.04)(0.96)) = −0.295 > −1.645,
do not reject the null hypothesis and conclude the machine
cannot be qualified at the 0.05 level of significance.
147
b) P-value = Φ(−0.295) = 1 − 0.614 = 0.386

c) β = P(Z0 > −z_α | p = 0.02)
     = 1 − Φ( (p0 − p − z_α √(p0(1 − p0)/n)) / √(p(1 − p)/n) )
     = 1 − Φ( (0.04 − 0.02 − 1.645 √(0.04 × 0.96/300)) / √(0.02 × 0.98/300) )
     = 1 − Φ(0.1648) = 0.4345

d) Using Equation (4-69),
   n = [ (z_α √(p0(1 − p0)) + z_β √(p(1 − p))) / (p − p0) ]²
     = [ (1.645 √(0.04 × 0.96) + 1.645 √(0.02 × 0.98)) / (0.02 − 0.04) ]²
     = 763.6
   Take n = 764.
148
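A sketch of the one-sided proportion test and the follow-up β and sample-size calculations, using the same normal-approximation formulas as the example (SciPy supplies Φ and its inverse):

```python
from math import sqrt, ceil
from scipy.stats import norm

x, n, p0, alpha = 11, 300, 0.04, 0.05

# Lower-tail z test of H0: p = 0.04 vs H1: p < 0.04 (normal approximation)
z0 = (x - n * p0) / sqrt(n * p0 * (1 - p0))
print(z0, norm.cdf(z0))        # ~-0.295, P-value ~0.38 -> do not reject

# Type II error when the true proportion is p = 0.02
p = 0.02
num = p0 - p - norm.ppf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
beta = 1 - norm.cdf(num / sqrt(p * (1 - p) / n))
print(beta)                    # ~0.43

# Sample size so that beta ~ 0.05 at p = 0.02
z_a = z_b = norm.ppf(0.95)
n_req = ((z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p * (1 - p))) / (p - p0)) ** 2
print(ceil(n_req))             # ~764
```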
4-7 Inference on Population Proportion
4-7.3 Confidence Interval on a Binomial Proportion
P(−z_{α/2} ≤ Z ≤ z_{α/2}) = 1 − α,
where Z = (P̂ − p)/√(p(1 − p)/n) and P̂ = X/n.
Rearranging the above equation,
P( P̂ − z_{α/2} √(p(1 − p)/n) ≤ p ≤ P̂ + z_{α/2} √(p(1 − p)/n) ) = 1 − α
149
4-7 Inference on Population Proportion
4-7.3 Confidence Interval on a Binomial Proportion
But we don't know p! (We cannot construct the CI!)
Idea: use P̂ instead of p:
P( P̂ − z_{α/2} √(P̂(1 − P̂)/n) ≤ p ≤ P̂ + z_{α/2} √(P̂(1 − P̂)/n) ) ≈ 1 − α
150
4-7 Inference on Population Proportion
4-7.3 Confidence Interval on a Binomial Proportion
151
4-7 Inference on Population Proportion
4-7.3 Confidence Interval on a Binomial Proportion
152
4-7 Inference on Population Proportion
4-7.3 Confidence Interval on a Binomial Proportion
Choice of Sample Size
• P̂ is the point estimator of p
• E = |P̂ − p|
153
4-7 Inference on Population Proportion
4-7.3 Confidence Interval on a Binomial Proportion
Choice of Sample Size
Since p(1 − p) ≤ 1/4 for all p, using p(1 − p) = 1/4 gives a conservative sample size.
154
Example: E = |p̂ − p| ≤ 0.03, α = 0.05, so z_{α/2} = 1.96.
n = z²_{α/2} p(1 − p) / E² ≤ 1.96² / (4 × 0.03²) = 1067.11
Round up to n = 1068.
155
4-8 Other Interval Estimates for a
Single Sample
4-8.1 Prediction Interval
• In some situations, we are interested in predicting a
future observation of a random variable.
•
Find a range of likely values for the variable
associated with making the prediction
• Given X1, ..., Xn, predict Xn+1 ∈ [?, ?]
156
4-8 Other Interval Estimates for a
Single Sample
4-8.1 Prediction Interval
To predict the value of a single future observation X_{n+1},
the point estimator is X̄.
The expected value of the prediction error: E(X_{n+1} − X̄) = 0.
The variance of the prediction error: V(X_{n+1} − X̄) = σ²(1 + 1/n).
When σ² is unknown,
T = (X_{n+1} − X̄) / (S √(1 + 1/n))
has a t distribution with n − 1 degrees of freedom.
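As a sketch, a 95% two-sided prediction interval built from this result (the data reused from the fatty-acid example are only an illustration; SciPy supplies the t quantile):

```python
import numpy as np
from scipy.stats import t

# 100(1 - alpha)% prediction interval for a single future observation:
# xbar ± t_{alpha/2, n-1} * s * sqrt(1 + 1/n)
x = np.array([16.8, 17.2, 16.9, 17.4, 16.5, 17.1])   # reusing the fatty-acid data
alpha = 0.05
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

half = t.ppf(1 - alpha / 2, df=n - 1) * s * np.sqrt(1 + 1 / n)
print(xbar - half, xbar + half)   # interval for the next single observation
```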
157
4-8 Other Interval Estimates for a
Single Sample
4-8.1 Prediction Interval
Discuss Example 4-16
158
4-8 Other Interval Estimates for a
Single Sample
4-8.2 Tolerance Intervals for a Normal Distribution
159
4-10 Testing for Goodness of Fit
• So far, we have assumed the population or probability
distribution for a particular problem is known. (parametric
approach).
• There are many instances where the underlying
distribution is not known, and we wish to test a particular
distribution (a nonparametric approach).
• Use a goodness-of-fit test procedure based on the chi-square distribution.
160
4-10 Testing for Goodness of Fit
• Oi: the observed frequency in the i-th class interval
• Ei: the expected frequency in the i-th class interval from
the hypothesized probability distribution
Test Statistics:
161
where Ei is the expected number of observations in the i-th group, and
Oi is the observed number of observations in the i-th group.
Theorem:
X0² ~ χ²(k − p − 1)
where k is the number of groups and
p is the number of parameters estimated from the sample.
162
4-10 Testing for Goodness of Fit
• Discuss Example 4-18
163
Example #4-85 (p207)

Value:               0     1     2     3     4     5
Observed frequency:  8     25    23    21    16    7
Expected frequency:  9.1   21.8  26.2  20.9  12.5  6.02
(expected frequencies computed from a Poisson(2.4) distribution)

P(X=0)=0.091, P(X=1)=0.218, P(X=2)=0.262, P(X=3)=0.209,
P(X=4)=0.125, P(X=5)=0.06, P(X>5)=0.035

k = 6, p = 1, so df = k − p − 1 = 4

X0² = (8 − 9.1)²/9.1 + ... + (7 − 6.02)²/6.02 = 2.1 < χ²_{0.05,4} = 9.49
Do not reject H0.
P-value = P(χ²(4) > 2.1) lies between 0.5 and 0.9 (p. 437).
164
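A sketch of this goodness-of-fit computation (NumPy arithmetic, with SciPy's chi-square distribution for the critical value and tail probability):

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([8, 25, 23, 21, 16, 7])
expected = np.array([9.1, 21.8, 26.2, 20.9, 12.5, 6.02])   # Poisson(2.4) * 100

# Chi-square goodness-of-fit statistic with k = 6 classes and p = 1
# estimated parameter, so df = k - p - 1 = 4.
x2 = np.sum((observed - expected) ** 2 / expected)
df = len(observed) - 1 - 1
print(x2, chi2.ppf(0.95, df))    # ~2.1 vs 9.49 -> do not reject H0
print(chi2.sf(x2, df))           # P-value ~0.7 (between 0.5 and 0.9)
```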