
CHAPTER 5
SAMPLING AND SAMPLING DISTRIBUTIONS
5-1.
Parameters are numerical measures of populations. Sample statistics are numerical measures of
samples. An estimator is a sample statistic used for estimating a population parameter.
5-2.
x̄ = 97.9225 (estimate of μ)
s = 51.8303 (estimate of σ)
s² = 2,686.38 (estimate of σ², the population variance)
5-3.
p̂ = x/n = 5/12 = 0.41667
(5 out of 12 accounts are over $100.)
5-4.
x̄ = 15.333
s = 2.5546
5-5.
a) average price: 1.690385
Template (Basic Statistics from Raw Data, Gas Prices):
Measures of central tendency: mean = 1.6903846, median = 1.69, mode = 1.69; range = 0.23, IQR = 0.115
Measures of dispersion: sample variance = 0.00545185, sample st. dev. = 0.07383662; population variance = 0.00524216, population st. dev. = 0.07240276
b) Assuming prices are normally distributed with mean 1.64 and standard deviation 0.12, P(X > 1.69039) = 0.3373.
Template: mean = 1.64, stdev = 0.12, x = 1.69039, P(X > x) = 0.3373
5-6.
p̂ = x/n = 11/18 = 0.6111, where x = the number of users of the product.
5-7.
We need 25 elements from a population of 950 elements. Use the rows of Table 5-1, the
rightmost 3 digits of each group starting in row 1 (left to right). So we skip any such 3-digit
number that is either > 950 or that has been generated earlier in this list, giving us a list of 25
different numbers in the desired range. The chosen numbers are:
480, 11, 536, 647, 646, 179, 194, 368, 573, 595, 393, 198, 402, 130, 360, 527, 265, 809, 830,
167, 93, 243, 680, 856, 376.
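A pseudorandom-number generator can stand in for Table 5-1 here; below is a minimal Python sketch, assuming the frame is simply the element numbers 1 through 950 (the fixed seed is arbitrary and only makes the run reproducible).

import random

# Frame: element numbers 1..950 (the population of 950 elements in the problem).
frame = range(1, 951)

# Simple random sample of 25 distinct elements: every subset of size 25
# has the same chance of being selected, which is what the table method emulates.
random.seed(42)  # arbitrary seed, for reproducibility only
sample = random.sample(frame, k=25)
print(sorted(sample))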
5-8.
We will again use Table 5-1, this time using columns. We will use the right-hand columns, taking the
first 4 digits from the right (going down the column):
4,194 3,402 4,830 3,537 1,305.
5-9
We will use Table 5-1, sets of 2 columns using all 5 digits from column 1 and the first 3 digits
from column 2, continuing by reading down in these columns. Then we will continue to the set:
column 3 and first 3 digits column 4. We skip any numbers that are > 40,000,000. The resulting
voter numbers are:
10,480,150
22,368,465
24,130,483
37,570,399
1,536,020.
5-10.
There are 7 x 24 x 60 minutes in one week: (7)(24)(60) = 10,080 minutes. We will use Table 5-1.
Start in the first row and go across the row, then on to the next row (left to right, using all 5 digits
in each set), discarding any of the resulting 5-digit numbers that are > 10,080. The resulting
minute numbers are:
1,536 2,011 6,243 7,856 6,121 6,907
5-11.
A sampling distribution is the probability distribution of a sample statistic. The sampling
distribution is useful in determining the accuracy of estimation results.
5-12.
Only if the population is itself normal.
5-13.
E(X̄) = μ = 125
SE(X̄) = σ/√n = 20/√5 = 8.944
5-14.
The fact that, in the limit, the population distribution does not matter. Thus the theorem is very
general.
5-15.
When the population distribution is unknown.
5-16.
The Central Limit Theorem does not apply.
5-17.
P̂ is binomial. Since np = 1.2, the Central Limit Theorem does not apply and we cannot use the
normal distribution.
5-18.
σ² = 10,000   μ = 1,247   n = 100
P(X̄ < 1,230) = P(Z < (1,230 − 1,247)/(100/√100)) = P(Z < −1.7) = .5 − .4554 = 0.0446
Template: population mean = 1,247, stdev = 100, n = 100; X̄ distribution: mean = 1,247, stdev = 10; x = 1,230, P(X̄ < x) = 0.0446
5-19.
P(|X̄ − μ| ≥ 8) = 1 − P(|X̄ − μ| < 8) = 1 − P(−8 < X̄ − μ < 8)
= 1 − P(−8/(55/√150) < Z < 8/(55/√150)) = 1 − P(−1.78 < Z < 1.78)
= 1 − 2(.4625) = 0.075
5-20.
P(X̄ > 3.6) = P(Z > (3.6 − 3.4)/(1.5/√100)) = P(Z > 1.333) = 0.0912
Template: population mean = 3.4, stdev = 1.5, n = 100; X̄ distribution: mean = 3.4, stdev = 0.15; x = 3.6, P(X̄ > x) = 0.0912
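The probability just computed can also be checked numerically; a small sketch using Python's standard library (statistics.NormalDist, available in Python 3.8+), with the numbers of Problem 5-20:

from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.4, 1.5, 100   # population mean, population stdev, sample size
se = sigma / sqrt(n)           # standard error of the sample mean = 0.15

# P(X-bar > 3.6) under the (approximately) normal sampling distribution
p = 1 - NormalDist(mu, se).cdf(3.6)
print(round(p, 4))             # about 0.0912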
5-21.
P(3.7 < X̄ < 3.9) = P((3.7 − 3.8)/(1.2/√36) < Z < (3.9 − 3.8)/(1.2/√36))
= P(−0.5 < Z < 0.5) = 2(.1915) = .3830 (approximately)
(Use template: Sampling Distribution.xls, sheet: x-bar)
Template: population mean = 3.8, stdev = 1.2, n = 36; X̄ distribution: mean = 3.8, stdev = 0.2; x1 = 3.7, x2 = 3.9, P(x1 < X̄ < x2) = 0.3829
5-22.
σ = 4,500   n = 225
P(|X̄ − μ| ≤ 800) = P(−800/(4,500/√225) ≤ Z ≤ 800/(4,500/√225))
= P(−2.667 ≤ Z ≤ 2.667) = 2(.4961) = 0.9923
5-23.
p = 0.18   n = 200
P(P̂ ≥ .20) = P(Z ≥ (.20 − .18)/√((.18)(.82)/200)) = P(Z ≥ .02/.02717) = P(Z ≥ .736)
= .5 − .2692 = 0.2308
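The same kind of numerical check works for a sample proportion; a minimal sketch with the numbers of Problem 5-23 (normal approximation only, as in the solution above):

from math import sqrt
from statistics import NormalDist

p, n = 0.18, 200
se = sqrt(p * (1 - p) / n)     # 0.02717

# Normal approximation to P(P-hat >= 0.20)
prob = 1 - NormalDist(p, se).cdf(0.20)
print(round(prob, 4))          # about 0.2308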
5-24.
The claim is that p = 0.58. We have n = 250 and x/n = 123/250 = 0.492.
P(P̂ ≤ .492) = P(Z ≤ (.492 − .58)/√((.58)(.42)/250)) = P(Z ≤ −2.819) = 0.0024
5-25.
a) P(X̄ > 125,000) = 0.0907
Template: population mean = 119,600, stdev = 35,000, n = 75; X̄ distribution: mean = 119,600, stdev = 4,041.45; x = 125,000, P(X̄ > x) = 0.0907
b) Recomputing P(X̄ > 125,000) for different population standard deviations:
stdev     P(X̄ > 125,000)
30,000    0.0595
32,000    0.0720
34,000    0.0845
36,000    0.0970
38,000    0.1092
40,000    0.1212
5-26.
n = 16   μ = 1.5   σ = 2
P(X̄ > 0) = P(Z > (0 − 1.5)/(2/√16)) = P(Z > −3) = .5 + .4987 = 0.9987
Template: population mean = 1.5, stdev = 2, n = 16; X̄ distribution: mean = 1.5, stdev = 0.5; x = 0, P(X̄ > x) = 0.9987
5-27.
p = 1/7
P(P̂ < .10) = P(Z < (.10 − .143)/√((1/7)(6/7)/180)) = P(Z < −1.648) = 0.5 − 0.4503 =
0.0497, a low probability. The sample size, along with np and n(1 − p), is large enough here that
the sampling distribution (over all the different samples of 180 people in the population) of the
proportion of people who get hospitalized during the year is going to be pretty close to normal.
Therefore, any one such sample proportion will be close to the predicted mean 1/7 with
reasonable probability, and 1/10 is far enough away from that mean, given our estimated sample
standard deviation, that the probability of falling even farther away from the mean is small.
5-28.
μ = 700   σ = 100   n = 60
P(680 ≤ X̄ ≤ 720) = P((680 − 700)/(100/√60) ≤ Z ≤ (720 − 700)/(100/√60))
= 2TA(1.549) = 0.8786
5-29.
p = 0.35   n = 500
SE(P̂) = √((0.35)(0.65)/500) = 0.0213
P(|P̂ − p| > 0.05) = P(P̂ < 0.30) + P(P̂ > 0.40)
= P(Z < (0.30 − 0.35)/0.0213) + P(Z > (0.40 − 0.35)/0.0213)
= 1 − 2TA(2.344) = 0.0190
5-30.
Estimator B is better. It has a small bias, but its variance is small. This estimator is more likely to
produce an estimate that is close to the parameter of interest.
5-31.
I would use this estimator because consistency means that as n → ∞ the probability of getting close
to the parameter increases. With a generous budget I can get a large sample size, which will make
this probability high.
5-32.
ŝ² = 1,287
5-33.
Advantage: uses all information in the data.
Disadvantage: may be too sensitive to the influence of outliers.
s² = (n/(n − 1)) ŝ² = (100/99)(1,287) = 1,300
5-34.
Depends also on efficiency and other factors. With respect to the bias:
A has bias = 1/n
B has bias = 0.01
A is better than B when 1/n < 0.01, that is, when n > 1/0.01 = 100
5-35.
Consistency is important because it means that as you get more data, your probability of getting
closer to your “target” increases.
5-36.
n1 = 30, n2 = 48, n3 = 32. The three sample means are known. The df for deviations from the
three sample means are:
df = n1 + n2 + n3 − 3 = 30 + 48 + 32 − 3 = 107
5-37.
a) The mean is the best number to use: mean = 43.667.
Sample   Deviation from mean   Deviation squared
34       −9.667                 93.45089
51        7.333                 53.77289
40       −3.667                 13.44689
38       −5.667                 32.11489
47        3.333                 11.10889
50        6.333                 40.10689
52        8.333                 69.43889
44        0.333                  0.110889
37       −6.667                 44.44889
SSD = 358
degrees of freedom = 8
MSD = SSD/df = 358/8 = 44.75
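The part (a) arithmetic can be reproduced with a short script (a sketch; it just recomputes the mean, SSD, df, and MSD from the nine sample values):

from statistics import mean

sample = [34, 51, 40, 38, 47, 50, 52, 44, 37]
xbar = mean(sample)                          # 43.667

ssd = sum((x - xbar) ** 2 for x in sample)   # sum of squared deviations
df = len(sample) - 1                         # one df lost to estimating the mean
msd = ssd / df

print(round(xbar, 3), round(ssd, 1), df, round(msd, 2))   # 43.667 358.0 8 44.75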
b) Choose the means of the respective blocks of numbers: 40.75, 49.667, 40.5. Minimized SSD =
195.917, df = 6, MSD = 32.65283.
Sample   Block mean   Deviation from mean   Deviation squared
34       40.75        −6.75                  45.5625
51       40.75        10.25                 105.0625
40       40.75        −0.75                   0.5625
38       40.75        −2.75                   7.5625
47       49.667       −2.667                  7.112889
50       49.667        0.333                  0.110889
52       49.667        2.333                  5.442889
44       40.5          3.5                   12.25
37       40.5         −3.5                   12.25
SSD = 195.9167
c) Each of the numbers themselves. SSD = 0. MSD indicates that the variance is zero, which is
true since we are using each of the individual numbers to reduce SSD to zero.
d) SSD = 719, df = 9, MSD = 79.889
Sample   Deviation from 50   Deviation squared
34       −16                 256
51         1                   1
40       −10                 100
38       −12                 144
47        −3                   9
50         0                   0
52         2                   4
44        −6                  36
37       −13                 169
SSD = 719
5-38.
No, because there are n − 1 = 19 − 1 = 18 degrees of freedom for these checks once you know
their mean. Since only 17 checks are known, which is one less than 18, a degree of freedom remains
and you cannot solve for the missing checks.
5-39.
Yes. (x1 + x2 + … + x18 + x19)/19 = x̄. Since 18 of the xi are known and so is x̄, we can solve the
equation for the unknown x19.
5-40.
df = n − k
As k increases, df decreases, SSD decreases, and MSD decreases.
5-41.
E(X̄) = μ = 1,065
V(X̄) = σ²/n = 500²/100 = 2,500
5-42.
σ² = 1,000,000
Want SD(X̄) ≤ 25.
SD(X̄) = σ/√n = 1,000/√n
1,000/√n ≤ 25
√n ≥ 1,000/25 = 40
n ≥ 1,600. The sample size must be at least 1,600.
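The same calculation in code; a minimal sketch (variable names are illustrative, not from the problem):

from math import ceil, sqrt

sigma = sqrt(1_000_000)   # population standard deviation = 1,000
max_se = 25               # required bound on SD(X-bar)

# SD(X-bar) = sigma / sqrt(n) <= max_se  is equivalent to  n >= (sigma / max_se)^2
n_min = ceil((sigma / max_se) ** 2)
print(n_min)              # 1600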
5-43.
μ = 53   σ = 10   n = 400
E(X̄) = μ = 53
SE(X̄) = σ/√n = 10/√400 = 0.5
Template: population mean = 53, stdev = 10, n = 400; X̄ distribution: mean = 53, stdev = 0.5
5-44.
p = 0.5   n = 120
SE(P̂) = √(p(1 − p)/n) = √((.5)(.5)/120) = 0.0456
5-45.
E(P̂) = p = 0.2
SE(P̂) = √(p(1 − p)/n) = √((.2)(.8)/90) = 0.04216
5-46.
p = 0.5 maximizes the variance of P̂.
Proof:
V(P̂) = p(1 − p)/n
dV(P̂)/dp = (1/n) d/dp (p − p²) = (1/n)(1 − 2p)
Set the derivative to zero:
(1/n)(1 − 2p) = 0
1 = 2p
p = 1/2
The assertion may also be demonstrated by trying different values of p.
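For instance, a quick numerical check: evaluate p(1 − p)/n over a grid of p values and find the maximizer (n = 100 is an arbitrary fixed choice; any n gives the same answer):

n = 100
candidates = [p / 100 for p in range(1, 100)]   # p = 0.01, 0.02, ..., 0.99
best_p = max(candidates, key=lambda p: p * (1 - p) / n)
print(best_p)                                   # 0.5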
5-47.
P(500 < X̄ < 700) = P((500 − 600)/(600/√30) < Z < (700 − 600)/(600/√30))
= P(−.913 < Z < .913) = 2(.3194) = .6388
Template: population mean = 600, stdev = 600, n = 30; X̄ distribution: mean = 600, stdev = 109.545; x1 = 500, x2 = 700, P(x1 < X̄ < x2) = 0.6387
5-48.
P(X̄ ≥ 1,000) = P(Z ≥ (1,000 − 1,065)/(500/√100)) = P(Z ≥ −65/50)
= P(Z ≥ −1.3) = .5 + .4032 = 0.9032
We need to use the Central Limit Theorem to justify using the normal distribution.
5-49.
μ = 53   σ = 10   n = 400
P(52 < X̄ < 54) = P((52 − 53)/(10/√400) < Z < (54 − 53)/(10/√400)) = P(−2 < Z < 2) = 0.9544

5-50.
p = 0.5
n = 120


.45  .5
 = P(Z  1.095) = 0.8632
P( P̂  .45) = P  Z 


(.
5
)(.
5
)
/
120


5-51.
a. $8,128.08, found by $3.3M/406 = $8,128.08
b. P(X̄ < 7,000) = P(Z < (7,000 − 8,128.08)/(2,000/√16)) = P(Z < −2.256)
= .5000 − .4880 = 0.012
Template: x = 7,000, P(X̄ < x) = 0.0120
5-52.
0.06 ≤ p ≤ 0.10
SE(P̂) = √(p(1 − p)/n) ≤ 0.03
Assume p = 0.06:
SE(P̂) = √((.06)(.94)/n) ≤ .03
(.06)(.94)/n ≤ .03²
n ≥ 62.66
Now assume the other extreme, p = 0.10:
SE(P̂) = √((.1)(.9)/n) ≤ .03
(.1)(.9)/n ≤ .03²
n ≥ 100
Now, we also know that the function SE( P̂ ) does not have a maximum point between p = 0.06
and p = 0.10 because the only maximum point of the function occurs at p = 0.5 (as we know from
Problem 5-46). Hence SE( P̂ ) is monotonic between p = 0.06 and 0.10, and thus n = 100 is the
minimum required sample size.
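The same reasoning can be scripted; a small sketch (the helper name n_needed is purely illustrative):

from math import ceil

# sqrt(p(1 - p)/n) <= 0.03 must hold for every p in [0.06, 0.10];
# the binding case is the p closest to 0.5, i.e. p = 0.10.
def n_needed(p, max_se=0.03):
    return ceil(p * (1 - p) / max_se ** 2)

print(n_needed(0.06), n_needed(0.10))   # 63 and 100, so n = 100 is required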
5-53.
Random samples from the entire population of interest reduce the chance of bias and increase the
chance of being representative of the entire population. Also, we have a known probability of
being within certain distances of the parameter of interest. We use a frame and a random number
generator or a table of random numbers. A simple random sample is such that every possible set
of n elements has an equal chance of being selected.
5-54.
A bias is a systematic deviation away from the target of estimation. A bias takes us away from
the target parameter in repeated sampling. If the bias is small and variance of the estimator is also
small, the bias may be tolerated, especially if the bias decreases as n increases.
5-55.
The sample median is unbiased. The sample mean is more efficient; it is also sufficient. This is
why we prefer the sample mean. We must assume normality for using the sample median to
estimate μ. The median is more resistant to outliers.
5-56.
S² has n − 1 in the denominator because there are n − 1 degrees of freedom for deviations from
the sample mean. Using n − 1 instead of n makes S² an unbiased estimator of σ².
5-57.
μ = 19.5   σ = 5.3   n = 100
P(X̄ > 20) = P(Z > (20 − 19.5)/(5.3/√100)) = P(Z > .9434) = .5 − .3273 = 0.1727
Template: population mean = 19.5, stdev = 5.3, n = 100; X̄ distribution: mean = 19.5, stdev = 0.53; x = 20, P(X̄ > x) = 0.1727
5-58.
95% bounds on X̄: μ ± 1.96σ/√n = 19.5 ± 1.96(5.3/10) = [18.4612, 20.5388]
90% bounds on X̄: 19.5 ± 1.645(5.3/10) = [18.62815, 20.37185]
Template (Symmetric Intervals): P(x1 < X̄ < x2) = 0.95 for [18.46122, 20.538779]; 0.90 for [18.62823, 20.371772]
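These bounds can be reproduced with Python's standard library; a small sketch using the μ, σ, and n of Problem 5-58 (statistics.NormalDist requires Python 3.8+):

from math import sqrt
from statistics import NormalDist

mu, sigma, n = 19.5, 5.3, 100
se = sigma / sqrt(n)                           # 0.53

for level in (0.95, 0.90):
    z = NormalDist().inv_cdf((1 + level) / 2)  # about 1.96 and 1.645
    print(level, (round(mu - z * se, 4), round(mu + z * se, 4)))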
5-59.
μ = 2.9   σ = 0.5   n = 25
P(X̄ > 3.0) = P(Z > (3.0 − 2.9)/(0.5/√25)) = P(Z > 1) = 0.5 − 0.3413 = 0.1587
(Use template: Sampling Distribution.xls, sheet: x-bar)
Template: population mean = 2.9, stdev = 0.5, n = 25; X̄ distribution: mean = 2.9, stdev = 0.1; x = 3, P(X̄ > x) = 0.1587
5-60.
df = (rows-1)(columns-1) = (5-1)(3-1) = 8
5-61.
p = 0.38   n = 100
P(P̂ > 0.30) = P(Z > (.30 − .38)/√((.38)(.62)/100)) = P(Z > −1.648)
= .5 + .4503 = 0.9503
In the template, stdev = SQRT(.38*.62) = 0.48539.
Template: population mean = 0.38, stdev = 0.48539, n = 100; X̄ distribution: mean = 0.38, stdev = 0.04854; x = 0.3, P(X̄ > x) = 0.9503
5-62.
X̄ is normal. But since σ is unknown and we use S, the quantity (X̄ − μ)/(S/√n)
has the t(n − 1) distribution rather than the standard normal distribution Z.
5-63.
No minimum (n = 1 is enough for normality).
5-64.
X̄, P̂, and S² are unbiased. S is the square root of an unbiased estimator of σ², thus
it is not unbiased. Proof:
Assume E(S) = σ
then: (E(S))² = σ²
and: E(S²) − (E(S))² = σ² − σ² = 0
(since E(S²) = σ²).
But E(S²) − (E(S))² = V(S)
V(S) = 0 means that S is not a statistical estimator. The contradiction establishes the proposition
that S is biased.
5-65.
This estimator is also consistent. It is more efficient than X̄, because σ²/n² < σ²/n.
5-66.
df = 124 –3 = 121
5-67.
a. Normal population requires the smallest minimum n.
b. Mound-shaped population requires the next higher minimum n.
c. Discrete population needs the highest minimum n.
d. Slightly skewed population: n more than for (b), less than for (c).
e. Highly skewed population: n less than for (c), but more than for (d).
The relative minimum required sample sizes are as follows:
na < nb < nd < ne < nc
5-68.
Yes. SE(X̄) decreases as n increases:
SE(X̄) = σ/√n, which goes to 0 as n goes to ∞. Statistically, it is always good to have as large
a sample as possible.
5-69.
Draw repeated samples, preferably by simulation on a computer, and determine the empirical
distribution of the statistic: the relative frequency distribution of its values.
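For example, a minimal simulation sketch of this idea for the sample mean (the population parameters, sample size, and number of replications are arbitrary illustrative choices):

import random
from statistics import mean

random.seed(1)  # arbitrary seed, for reproducibility
population = [random.gauss(50, 10) for _ in range(10_000)]

# Draw many samples of size 25 and record the sample mean of each.
sample_means = [mean(random.sample(population, k=25)) for _ in range(2_000)]

# The relative-frequency distribution of these values approximates the
# sampling distribution of X-bar for n = 25 from this population.
print(round(mean(sample_means), 2))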
5-70.
P(P̂ < .15) = P(Z < (.15 − .20)/√((.2)(.8)/250)) = P(Z < −1.976) = .5 − .4759 = 0.0241
5-71.
μ = 25   σ = 2   n = 100
P(X̄ < 24) = P(Z < (24 − 25)/(2/√100)) = P(Z < −5) = 0.0000003
Not probable at all.
5-72.
P(|P̂ − 0.80| ≤ 0.07) = P(0.73 ≤ P̂ ≤ 0.87)
= P((.73 − .80)/√((.80)(.20)/200) ≤ Z ≤ (.87 − .80)/√((.80)(.20)/200)) = P(−2.475 ≤ Z ≤ 2.475)
= 2TA(2.475) = 0.9866
In the template, stdev = SQRT(.80*.20) = 0.4.
Template: population mean = 0.8, stdev = 0.4, n = 200; X̄ distribution: mean = 0.8, stdev = 0.02828; x1 = 0.73, x2 = 0.87, P(x1 < X̄ < x2) = 0.9867
5-73.
P(1.52 < X̄ < 1.62) = P((1.52 − 1.57)/(0.4/√200) < Z < (1.62 − 1.57)/(0.4/√200))
= 2TA(1.768) = 0.923
5-74.
a) The point estimate of the population mean is the sample mean, 52.
Template: population mean = 52, stdev = 2.4, n = 40; X̄ distribution: mean = 52, stdev = 0.37947
b) P(52 < X̄ < 53) = 0.4958
5-75
(Use template: Sampling Distribution.xls, sheet: p-hat)
n = 400
p = 0.06
Template: population proportion p = 0.06, n = 400; P̂ distribution: mean = 0.06, stdev = 0.01187
P(P̂ < 0.05) = 0.1999
5-76
(Use template: Sampling Distribution.xls, sheet: x-bar)
μ = 15830
σ = 458
n = 10
Template: population mean = 15830, stdev = 458, n = 10; X̄ distribution: mean = 15830, stdev = 144.832
P(X̄ ≥ 16000) = 0.1202
5-77
(Use template: Sampling Distribution.xls, sheet: x-bar)
μ = 750.4
σ = 1.2
n = 50
Template: population mean = 750.4, stdev = 1.2, n = 50; X̄ distribution: mean = 750.4, stdev = 0.16971
P(749.5 ≤ X̄ ≤ 750.5) = 0.7222