STT 825 - Michigan State University

advertisement
STT 825
F 02
EXAM 1
SOLUTIONS
1. (32 points) We want to estimate the mean hourly wage for electricians working in Ingham County. We have
a list (available from the State of Michigan) of the 540 licensed electricians with addresses in Ingham Country,
and will take a simple random sample from this list.
a. Suppose we know the wages are between $20.00 and $50.00 per hour.
Assume a skewed (right triangle- J shape) for the distribution, and find S for planning the sample size.
S2 = 302/18 = 50,
S = 7.07
b. We decided to use an old study value of S = $5.00 for planning. How large a sample is needed to estimate
the mean wage to within  .75 with 95% confidence?
no = 170.73 = (1.96)2 52/ (.75)2
n* = 129.7,
recommend n* = 130
c. If, based on past experience, we think we will get 50% nonresponses, how many questionnaires should we send
out to get the precision we want in (b)?
2 (your answer in (b)) = 260
d. Some electricians are not licensed and thus do not appear on the original list. This would be a source of
[sampling error
selection bias
measurement error ]
in our study.
e. Some licensed electricians charge different hourly rates depending on whom they are working for. This would
be a source of
[sampling error
selection bias
measurement error ]
in our study.
f. Some of the electricians selected for the sample refused to respond to the survey. This would be a source of
[sampling error
selection bias
measurement error ]
in our study.
g. Some electricians work in Ingham County but don’t live there. This would be a source of
[sampling error
selection bias
measurement error ]
in our study.
h. Samples were taken by two different research firms and they then reported confidence intervals that weren’t
the same. This is due to
[sampling error,
selection bias,
1
measurement error ].
2. (55 points) Twelve adults signed up for a special fitness class.
Adult Label
Age
1
19
2
56
3
34
4
29
5
35
6
40
7
28
8
33
9
25
10
37
Note summary characteristics for these 12 adults.
Variable
age
N
12
Mean
34.08
Median
33.50
TrMean
33.40
StDev
9.39
SE Mean
2.71
We will take samples of size 4 from this population of 12 adults.
a. How many samples of size 4 are possible from this population of 12 adults?
12C4
= 495
b. Consider the sampling plan to the right:
Si
1,3,4,10
3,4,7,8
1,2,5,8
6,9,11,12
P(Si)
1/8
1/8
1/4
1/2
i. Find the probability that unit #8 is selected via this sampling plan.
1/8 + 1/4 = 3/8
ii. Find y for sample S1.
(19 + 34 + 29 + 37)/4 = 29.75
iii. Find E( y ).
(1/8)(29.75) + 1/8(31) + ¼(34.5) + ½(34.5) = 33.719
iv. Is y biased or unbiased under this sampling plan?
biased
2
y
****
31.0
35.5
34.5
11
31
12
42
#2 continued.
c. Describe the sampling plan which is systematic sampling, samples of size 4, from this population of size 12.
(Give all the Si and their P(Si).
Si
P(Si)
1,4,7,10
2,5,8,11
3,6,9,12
1/3
1/3
1/3
d. Suppose we take a simple random sample of size 4 from this population of 12 adults.
i. What is the probability of selecting sample {1,2,3,4}?
1/495
ii. Find Var( y ) for the simple random sampling plan. (Note population characteristics on the previous
page.)
(1-n/N) S2/n
= (1 - 4/12) (9.39)2/4 = 14.695
e. We are thinking of starting the special fitness class for a local athletic club, and want to estimate the mean age
of their 500 adult members. Use the characteristics from the population of size 12 given below for planning:
Variable
age
N
12
Mean
34.08
Median
33.50
TrMean
33.40
StDev
9.39
SE Mean
2.71
If we want to estimate the mean age to within 6% of the actual mean age with 95% confidence, how large a
sample should we take?
Relative error e = .06
CV = 9.39/34.08 = .2755
no = (1.96)2(.2755)2/(.06)2 = 81.01
n* = 69.71, report n*=70.
(iteration with t is also accepted)
f. Your answer in (e) is for a population of size 500. For larger populations (planning with the same parameters
values and for the same precision), we need a larger sample size. No matter how large the population, the needed
sample size never goes above what value? (Give one number.)
no = 81
3
3. (57 points) A local green house has 2000 flats with up to 6 plants in each flat. To monitor insect infestation, a
simple random sample of 40 flats is taken, the plants are examined, and the number of insects is counted for each
plant then totaled to get a count yi for each sample flat.
Sample Statistics:
Number of insects
Number of plants
Mean
14.0
5.4
Standard deviation
3.2
0.7
a. What is the sampling unit?
Flat
b. What is the observation unit?
Plant
c. The flats are labeled 1,2,…,2000. Use the portion of the random number table given below to get the first
THREE flat labels in the SRS.
23481 41279 04190 82147 59171 90812 75215 57214 59832 14879 64152 51087 23410 89743
1412, 1908, 1275
d. Note the sample statistics above.
Compute a 95% t-confidence interval for the mean number of insects per flat.
14 2.021 (.50) which gives 14  1.00
e. Note the sample statistics given above.
Estimate the mean number of insects per plant (do not find the standard error).
estimate a ratio by 14/5.4 = 2.6
4
#3 continued.
f. When doing the experiment, a simple random sample of 40 labels was taken from labels {1,2,…,2000}.
Flat #587 was selected, but was damaged and not usable in the experiment. To replace #587,
i. the technician just took Flat #588 instead. Does this produce an SRS?
YES
NO
ii. the technician flipped a coin to decide between putting Flat #588 and Flat #586 in the sample.
Does this produce an SRS?
YES
NO
iii. the technician randomly selected a flat from those not yet selected. Does this produce an SRS?
YES
NO
g. Suppose we sampled every 50th flat in the list of 2000, but started the selection at random. Does this produce
an SRS?
YES
NO
h. Suppose we decided to sample plants rather than flats.
i. Select a simple random sample of 100 flats, then select 2 plants at random from each selected flat.
Does this produce an SRS of plants?
YES
NO
ii. At each stage, select a flat at random, then select a plant at random from those not yet selected in the
flat (note that if m plants remain unselected in the flat, each has chance 1/m of being selected at that
stage). If all the plants have been selected in a flat, then select another flat at random. Does this produce
an SRS of plants?
YES
NO
iii. At each stage, select a flat at random, then select a plant position at random from 1-6. If that position
is empty or that plant has been selected, then begin selecting from all flats again. Does this produce an
SRS of plants?
YES
NO
iv. Consider sampling frame _ _ _ _/_ where the first 4 digits identify the flat, and the last digit
identifies the plant in the flat. If we take an SRS from this frame, will we get an SRS of plants?
YES
NO
================================================================================
5
4. (30 points) We took a simple random sample of 50 stores from a list of 370 stores in the area. Data and
summary statistics are reported under Problem #4 on the gold sheet. Sales are in $1000 units.
a. For an SRS of size 50, what is the probability that store #206 would be selected?
SRS is an epsem scheme, and thus, the probability of any single unit being selected = sampling fraction
= n/N = 50/370 = .135
b. A 95% confidence interval for the mean sales is given by 53.56  8.88.
i. Use this to get a 95% confidence interval for the total sales.
370 ( CI above) gives 19,817.2  3285.6
ii. The confidence interval in (i) requires approximate normality of the sample mean. Note the histogram
of the data. Would you say that n=50 is large enough to use the normal?
YES
NO
d. The number of employees was also recorded for each store. Note the scatter diagram and summary statistics
given under #4 on the gold sheet.
i. Looking at the scatter diagram for (x,y), which estimator of the total sales would you recommend?
[ ratio
regression ]
ii. Compute the ratio estimate of the total sales (note that the average number of employees for the 370
stores is 8.7 per store). Do NOT compute the standard error.
53.56 x (370 x 8.7)/10.34 = 16,674.05 Note that you had to get tx = 370 x 8.7 to use for estimating ty.
iii. Note observation #1 on your gold sheet, and find ei to use if you were computing the standard error
of your ratio estimator. Find the ei for only the FIRST observation, not the others!
ei = 55 - 5.18 (14) = - 17.52
===========================================================================
6
5. (26 points) We want to estimate the proportion of 370 stores that stay open at night.
a. Define the measurement yi.
= 1 if open at night
= 0 if not open at night
b. Data gave 46 stores out of 50 staying open at night. Compute a 95% confidence interval for the proportion
staying open at night. Use the Z-distribution.
sample proportion = 46/50 = .92
.92  (1 - 50/370)1/2 [.92(.08)/49]1/2
.92  1.96 (.93) (.0388)
.92  1.96 (.036)
.92  .0707
c. Is the confidence interval in (b) valid? (That is, do the needed assumptions approximately hold?)
Why or why not?
No, n (q-hat) = 50 (.08) = 4 < 5 required to use the normal.
d. We want to plan a similar study about stores in a larger area.
i. If we assume that the proportion staying open at night is more than 85%, what value of p should we
plan with?
.85
ii. If we assume that the proportion staying open at nights is below 90%, what value of p should we plan
with?
.5
7
PROBLEM #4
Variable
N
Mean
Median
TrMean
StDev
SE Mean
sales
50
53.56
53.50
53.36
17.67
2.50
number
50
10.340
11.000
10.364
5.542
0.784
Data Display
Obs
1
2
3
4
5
6
number
sales
14
55
15
63
13
64
16
85
7
40
14
74
etc. up to
50 observations
HISTOGRAM OF SALES DATA
7
6
Frequency
5
4
3
2
1
0
20
30
40
50
60
70
80
90
sales
SCATTER DIAGRAM (x=number of employees,
Regression Plot
sales = 24.0869 + 2.85040 number
S = 7.99314
R-Sq = 80.0 %
R-Sq(adj) = 79.5 %
90
80
sales
70
60
50
40
30
20
0
10
20
number
8
y=sales)
Download