Testing the Population Variance

Ch 12 Practice Session
Introduction

We shall develop techniques to estimate
and test three population parameters.



Population mean μ
Population variance σ²
Population proportion p
Jia-Ying Chen
Inference About a Population Mean When
the Population Standard Deviation Is
Unknown
Recall that when σ is known we use the following
statistic to estimate and test a population mean:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
When σ is unknown, we use its point estimator s,
and the z-statistic is replaced by the t-statistic.
The t-Statistic
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
The t distribution is mound-shaped
and symmetrical around zero.
The “degrees of freedom”
(a function of the sample size)
determine how spread out the
distribution is (compared to the
normal distribution).
[Figure: two t densities centered at 0, one with d.f. = v1 and one with d.f. = v2, where v1 < v2]
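Degrees of Freedom

In statistics, the degrees of freedom (d.f.) of a statistic
are the number of sample observations that are independent,
or free to vary, when the statistic is used to estimate
a population parameter.

Ex: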
How to calculate the sample variance
From the data we obtain $\sum x_i$ and $\sum x_i^2$, thus
$s^2 = \frac{\sum x_i^2 - (\sum x_i)^2 / n}{n - 1}$
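The shortcut formula for the sample variance can be sketched in a few lines of Python. This is only an illustration: the function name `sample_variance` and the data values are hypothetical and do not come from the slides.

```python
# Sample variance via the shortcut formula:
#   s^2 = (sum(x_i^2) - (sum(x_i))^2 / n) / (n - 1)
def sample_variance(data):
    n = len(data)
    sum_x = sum(data)                   # sum of x_i
    sum_x2 = sum(x * x for x in data)   # sum of x_i^2
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Hypothetical data, chosen only for illustration
values = [2.0, 4.0, 4.0, 5.0, 7.0]
s2 = sample_variance(values)
print(f"s^2 = {s2:.4f}")
```

The divisor n - 1 is the number of degrees of freedom of the statistic.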
Testing μ when σ is unknown

Example 1

In order to determine the number of workers
required to meet demand, the productivity of newly
hired trainees is studied.

It is believed that trainees can process and distribute
more than 450 packages per hour within one week
of hiring.

Can we conclude that this belief is correct, based on
productivity observations of 50 trainees?
Testing μ when σ is unknown

Example 1 – Solution

The problem objective is to describe the
population of the number of packages
processed in one hour.
The data are interval.
H0: μ = 450
H1: μ > 450
The t statistic:
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$, d.f. = n - 1 = 49
Testing μ when σ is unknown

Solution continued (solving by hand)

The rejection region is
$t > t_{\alpha, n-1}$, where $t_{\alpha, n-1} = t_{.05, 49} \approx t_{.05, 50} = 1.676$.
From the data we have
$\sum x_i = 23,019$ and $\sum x_i^2 = 10,671,357$, thus
$\bar{x} = \frac{23,019}{50} = 460.38$, and
$s^2 = \frac{\sum x_i^2 - (\sum x_i)^2 / n}{n - 1} = 1507.55$,
$s = \sqrt{1507.55} = 38.83$.
Testing μ when σ is unknown

The test statistic is
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{460.38 - 450}{38.83 / \sqrt{50}} = 1.89$

[Figure: t distribution with the rejection region t > 1.676 and the test statistic t = 1.89 marked]

Since 1.89 > 1.676 we reject the null hypothesis in
favor of the alternative.

There is sufficient evidence to infer that the mean
productivity of trainees one week after being hired
is greater than 450 packages at the .05 significance
level.
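As a cross-check on the hand calculation for Example 1, the following is a minimal sketch that starts from the two sums reported on the slide. It assumes the table value 1.676 (the t value for 50 degrees of freedom, used in place of 49) as the critical value. The variable names are illustrative only.

```python
import math

n = 50
sum_x = 23_019        # sum of x_i, as reported on the slide
sum_x2 = 10_671_357   # sum of x_i^2, as reported on the slide
mu0 = 450             # hypothesized mean under H0
t_crit = 1.676        # table value t_{.05,50}, used in place of t_{.05,49}

x_bar = sum_x / n
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # shortcut formula for s^2
s = math.sqrt(s2)
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
reject_h0 = t_stat > t_crit                # right-tailed test, H1: mu > 450

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```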
Estimating μ when σ is unknown

Confidence interval estimator of μ when σ
is unknown:
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$, d.f. = n - 1
Estimating μ when σ is unknown

Example 2

An investor is trying to estimate the return
on investment in companies that won
quality awards last year.
A random sample of 83 such companies is
selected, and the return on investment he would
have earned had he invested in them is calculated.
Construct a 95% confidence interval for the
mean return.
Estimating μ when σ is unknown

Solution (solving by hand)

The problem objective is to describe the
population of annual returns from buying
shares of quality award-winners.
The data are interval.
From the data we determine
$\bar{x} = 15.02$, $s^2 = 68.98$, $s = \sqrt{68.98} = 8.31$
With $t_{.025, 82} \approx t_{.025, 80} = 1.990$,
$\bar{x} \pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}} \approx 15.02 \pm 1.990 \frac{8.31}{\sqrt{83}} = 15.02 \pm 1.82 = [13.20, 16.84]$
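The interval for Example 2 can be recomputed from the summary statistics with a short sketch like the one below. It assumes the table value 1.990 (80 degrees of freedom, used as an approximation for 82), and the variable names are illustrative. Because s is not rounded before use, the last digit may differ slightly from a hand calculation.

```python
import math

n = 83
x_bar = 15.02    # sample mean, from the slide
s2 = 68.98       # sample variance, from the slide
t_crit = 1.990   # table value t_{.025,80}, approximating t_{.025,82}

s = math.sqrt(s2)
margin = t_crit * s / math.sqrt(n)          # half-width of the interval
lcl, ucl = x_bar - margin, x_bar + margin   # lower and upper confidence limits

print(f"95% CI for the mean return: [{lcl:.2f}, {ucl:.2f}]")
```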
Checking the required conditions

We need to check that the population is
normally distributed, or at least not
extremely nonnormal.
There are statistical methods to test for
normality.
From the sample histograms we see…
[Figure: A Histogram for Example 1 (horizontal axis: Packages)]
[Figure: A Histogram for Example 2 (horizontal axis: Returns)]
Summary of Test Statistics to be Used in a
Hypothesis Test about a Population Mean

n > 30?
  Yes: σ known?
    Yes: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
    No: use s to estimate σ, $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
  No: population approximately normal?
    Yes: σ known?
      Yes: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
      No: use s to estimate σ, $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
    No: increase n to > 30
Example 1

How much money do winners go home with from the
television quiz show Jeopardy? To determine an
answer, a random sample of winners was drawn and
the amount of money each won was recorded and is
listed here. Estimate with 95% confidence the mean
winnings for all the show’s players. (Assume the random
variable is normally distributed.)

26650 6060 52820 8490 13660
25840 49840 23790 51480 18960
990 11450 41810 21060 7860
Solution
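One way to work this exercise is sketched below, using the t-based interval estimator from the earlier slides. The critical value 2.145 is the standard table value for a .025 tail area with 14 degrees of freedom. It is not given on the slides, so treat it as an assumption, along with the variable names.

```python
import math
import statistics

winnings = [26650, 6060, 52820, 8490, 13660,
            25840, 49840, 23790, 51480, 18960,
            990, 11450, 41810, 21060, 7860]

n = len(winnings)
x_bar = sum(winnings) / n
s = statistics.stdev(winnings)   # sample standard deviation (divisor n - 1)
t_crit = 2.145                   # assumed table value t_{.025,14}

margin = t_crit * s / math.sqrt(n)
lcl, ucl = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.0f}, s = {s:.0f}, 95% CI = [{lcl:.0f}, {ucl:.0f}]")
```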
Example 2

A federal agency responsible for enforcing laws governing weights
and measures routinely inspects packages to determine whether the
weight of the contents is at least as great as that advertised on the
package. A random sample of 18 containers whose packaging
states that the contents weigh 8 ounces was drawn. The contents
were weighed and the results follow. Can we conclude at the
1% significance level that on average the containers are mislabeled?
(Assume the random variable is normally distributed.)

7.80 7.91 7.93 7.99 7.94 7.75
7.97 7.95 7.79 8.06 7.82 7.89
7.92 7.87 7.92 7.98 8.05 7.91
Solution

H0:μ=8
H1:μ<8
There is enough evidence to conclude that
the average container is mislabeled
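The conclusion can be checked with a short left-tailed t test, sketched here. The critical value 2.567 is the standard table value for a .01 tail area with 17 degrees of freedom. It is an assumption, since the slides do not list it, and the variable names are illustrative.

```python
import math
import statistics

weights = [7.80, 7.91, 7.93, 7.99, 7.94, 7.75,
           7.97, 7.95, 7.79, 8.06, 7.82, 7.89,
           7.92, 7.87, 7.92, 7.98, 8.05, 7.91]

n = len(weights)
mu0 = 8.0                        # advertised weight under H0
x_bar = sum(weights) / n
s = statistics.stdev(weights)    # sample standard deviation
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

t_crit = 2.567                   # assumed table value t_{.01,17}
reject_h0 = t_stat < -t_crit     # left-tailed test, H1: mu < 8

print(f"x_bar = {x_bar:.3f}, s = {s:.4f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```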
Inference About a Population Variance

Sometimes we are interested in making
inferences about the variability of
processes.
Examples:
The consistency of a production process for
quality control purposes.
Investors use variance as a measure of risk.
To draw inference about variability, the
parameter of interest is σ².
Inference About a Population Variance

The sample variance s² is an unbiased,
consistent and efficient point estimator for σ².
The statistic
$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$
has a distribution called chi-squared, with n - 1
degrees of freedom, if the population is
normally distributed.
[Figure: chi-squared densities for d.f. = 5 and d.f. = 10]
Testing the Population Variance

Example 3 (operations management application)

A container-filling machine is believed to fill 1-liter
containers so consistently that the variance
of the fills will be less than 1 cc² (1 cc = .001 liter).
To test this belief, a random sample of 25 1-liter
fills was taken, and the results were recorded.
Do these data support the belief that the
variance is less than 1 cc² at the 5% significance level?
Testing the Population Variance

Solution

The problem objective is to describe the population
of 1-liter fills from a filling machine.
The data are interval, and we are interested in the
variability of the fills.
The complete test is:
H0: σ² = 1
H1: σ² < 1
Testing the Population Variance
• Solving by hand
– Note that $(n - 1)s^2 = \sum (x_i - \bar{x})^2 = \sum x_i^2 - (\sum x_i)^2 / n$
– From the sample, we can calculate $\sum x_i = 24,996.4$
and $\sum x_i^2 = 24,992,821.3$
– Then $(n - 1)s^2 = 24,992,821.3 - (24,996.4)^2 / 25 = 20.78$
– The test statistic is $\chi^2 = (n - 1)s^2 / \sigma^2 = 20.78 / 1 = 20.78$, which is not less than the critical value 13.8484
There is insufficient evidence
to infer that the variance
is less than 1.
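Testing the Population Variance
[Figure: chi-squared distribution with α = .05 in the lower tail and 1 - α = .95 above it; rejection region $\chi^2 < \chi^2_{.95, 25-1} = 13.8484$; the test statistic 20.8 lies outside the rejection region]
Do not reject the null hypothesis.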
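The chi-squared test for the filling-machine example can be reproduced from the two sums given on the slide. The sketch below uses the slide's critical value 13.8484 and the hypothesized variance of 1. The variable names are illustrative only.

```python
n = 25
sum_x = 24_996.4         # sum of x_i, from the slide
sum_x2 = 24_992_821.3    # sum of x_i^2, from the slide
sigma2_0 = 1.0           # variance under H0
chi2_crit = 13.8484      # chi-squared critical value with 24 d.f., from the slide

ss = sum_x2 - sum_x ** 2 / n         # (n - 1) * s^2
chi2_stat = ss / sigma2_0
reject_h0 = chi2_stat < chi2_crit    # left-tailed test, H1: sigma^2 < 1

print(f"chi2 = {chi2_stat:.2f}, reject H0: {reject_h0}")
```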
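Testing and Estimating a Population
Variance

From the following probability statement
$P(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}) = 1 - \alpha$
we have, by substituting $\chi^2 = (n - 1)s^2 / \sigma^2$, the confidence interval estimator of σ²:
$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}$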
Example 3

With gasoline prices increasing, drivers are becoming
more concerned with their cars’ gasoline consumption.
For the past 5 years, a driver has tracked the gas
mileage of his car and found that the variance from
fill-up to fill-up was σ² = 23 mpg². Now that his car is 5
years old, he would like to know whether the
variability of gas mileage has changed. He recorded
the gas mileage from his last eight fill-ups; these are
listed here. Conduct a test at a 10% significance level
to infer whether the variability has changed.
28 25 29 25 32 36 27 24
Solution

H0: σ² = 23
H1: σ² ≠ 23
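A possible way to carry out this two-tailed test is sketched below. The two critical values are standard chi-squared table values for 7 degrees of freedom (tail areas .95 and .05). They do not appear on the slides and are assumptions here, as are the variable names.

```python
import statistics

mpg = [28, 25, 29, 25, 32, 36, 27, 24]

n = len(mpg)
sigma2_0 = 23                      # variance under H0 (mpg^2)
s2 = statistics.variance(mpg)      # sample variance (divisor n - 1)
chi2_stat = (n - 1) * s2 / sigma2_0

chi2_lower = 2.16735               # assumed table value chi^2_{.95,7}
chi2_upper = 14.0671               # assumed table value chi^2_{.05,7}
reject_h0 = chi2_stat < chi2_lower or chi2_stat > chi2_upper

print(f"s^2 = {s2:.2f}, chi2 = {chi2_stat:.2f}, reject H0: {reject_h0}")
```

With these table values the statistic falls between the two critical values, so the sketch does not reject H0 at the 10% level.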
Example 4

During annual checkups physicians routinely send their
patients to medical laboratories to have various tests
performed. One such test determines the cholesterol
level in patients’ blood. However, not all tests are
conducted in the same way. To acquire more information,
a man was sent to 10 laboratories and in each had his
cholesterol level measured. The results are listed here.
Estimate with 95% confidence the variance of these
measurements.
4.70 4.83 4.65 4.60 4.75 4.88 4.68 4.75 4.80 4.90
Solution
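The interval estimator of the variance can be applied to these data as in the sketch below. The chi-squared values are standard table values for 9 degrees of freedom (tail areas .025 and .975). They are assumptions, since the slides do not list them, and the variable names are illustrative.

```python
import statistics

levels = [4.70, 4.83, 4.65, 4.60, 4.75, 4.88, 4.68, 4.75, 4.80, 4.90]

n = len(levels)
s2 = statistics.variance(levels)   # sample variance (divisor n - 1)

chi2_upper = 19.0228               # assumed table value chi^2_{.025,9}
chi2_lower = 2.70039               # assumed table value chi^2_{.975,9}

# LCL = (n-1)s^2 / chi^2_{alpha/2}, UCL = (n-1)s^2 / chi^2_{1-alpha/2}
lcl = (n - 1) * s2 / chi2_upper
ucl = (n - 1) * s2 / chi2_lower

print(f"s^2 = {s2:.5f}, 95% CI for the variance: [{lcl:.4f}, {ucl:.4f}]")
```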
Inference About a Population Proportion


When the population consists of nominal
data, the only inference we can make is
about the proportion of occurrence of a
certain value.
The parameter p was used before to
calculate these probabilities under the
binomial distribution.
Inference About a Population Proportion

Statistic and sampling distribution

The statistic used when making inference
about p is:
$\hat{p} = \frac{x}{n}$
where
x = the number of successes.
n = sample size.
– Under certain conditions, [np > 5 and n(1 - p) > 5],
$\hat{p}$ is approximately normally distributed, with
μ = p and σ² = p(1 - p)/n.
Testing and Estimating the Proportion

Test statistic for p:
$z = \frac{\hat{p} - p}{\sqrt{p(1 - p) / n}}$
where $np \ge 5$ and $n(1 - p) \ge 5$

Interval estimator for p (1 - α confidence level):
$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p}) / n}$
provided $n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$
Example 5

A dean of a business school wanted to know
whether the graduates of her school used a
statistical inference technique during their first
year of employment after graduation. She
surveyed 314 graduates and asked about the
use of statistical techniques. After tallying up the
responses, she found that 204 used statistical
inference within one year of graduation.
Estimate with 90% confidence the proportion of
all business school graduates who use their
statistical education within a year of graduation.
Solution
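A minimal sketch of the interval estimator for p applied to this example follows. It obtains the z value from Python's standard-library `statistics.NormalDist` rather than from a table. The variable names are illustrative.

```python
import math
from statistics import NormalDist

x, n = 204, 314      # successes and sample size, from the example
conf = 0.90          # confidence level

p_hat = x / n
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}, about 1.645
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
lcl, ucl = p_hat - margin, p_hat + margin

# Required condition for the normal approximation
condition_ok = n * p_hat >= 5 and n * (1 - p_hat) >= 5

print(f"p_hat = {p_hat:.4f}, 90% CI = [{lcl:.4f}, {ucl:.4f}], condition met: {condition_ok}")
```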
Example 6

In some states the law requires drivers to turn on
their headlights when driving in the rain. A
highway patrol officer believes that less than
one-quarter of all drivers follow this rule. As a
test, he randomly samples 200 cars driving in
the rain and counts the number whose
headlights are turned on. He finds this number to
be 41. Does the officer have enough evidence at
the 10% significance level to support his belief?
Solution
There is enough evidence to support the
officer’s belief
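The conclusion can be checked with the z test for a proportion, sketched here under the hypotheses H0: p = .25 and H1: p < .25 implied by the officer's belief. The critical value and p-value come from Python's standard-library `statistics.NormalDist`. The variable names are illustrative.

```python
import math
from statistics import NormalDist

x, n = 41, 200       # cars with headlights on, and sample size
p0 = 0.25            # proportion under H0
alpha = 0.10         # significance level

p_hat = x / n
z_stat = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
z_crit = NormalDist().inv_cdf(alpha)   # lower-tail critical value, -z_alpha
reject_h0 = z_stat < z_crit            # left-tailed test, H1: p < .25
p_value = NormalDist().cdf(z_stat)

print(f"p_hat = {p_hat:.3f}, z = {z_stat:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```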
Selecting the Sample Size to Estimate the
Proportion

Recall: The confidence interval for the proportion
is
$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p}) / n}$

Thus, to estimate the proportion to within W, we
can write
$W = z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p}) / n}$
Selecting the Sample Size to Estimate the
Proportion

The required sample size is
$n = \left( \frac{z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})}}{W} \right)^2$
Selecting the Sample Size
Two methods – in each case we choose a value for $\hat{p}$, then
solve the equation for n.
Method 1: we have no knowledge of even a rough value of $\hat{p}$. This is
a ‘worst case scenario’, so we substitute $\hat{p} = .50$;
then $n = \left( \frac{z_{\alpha/2} \sqrt{(.5)(.5)}}{W} \right)^2$.
Method 2: we have some idea about the value of $\hat{p}$. This is
a better scenario, and we substitute our estimated
value.
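The two methods can be compared with a small helper, sketched below. The function name `sample_size_for_proportion`, the bound W = .03, the 95% confidence level, and the rough guess of .20 are all hypothetical values chosen for illustration. None of them come from the slides.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(w, conf, p_guess=0.5):
    """Smallest n that estimates p to within w at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    n = (z * math.sqrt(p_guess * (1 - p_guess)) / w) ** 2
    return math.ceil(n)                            # always round up

# Method 1: no prior knowledge, worst case p_hat = .50
n_worst = sample_size_for_proportion(0.03, 0.95)

# Method 2: a rough prior guess is available (here, hypothetically, .20)
n_guess = sample_size_for_proportion(0.03, 0.95, p_guess=0.20)

print(f"Method 1: n = {n_worst}, Method 2: n = {n_guess}")
```

Because p(1 - p) is largest at p = .50, Method 1 never gives a smaller sample size than Method 2.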