Statistics 511
Study Guide 1
Fall 2001
This study guide is intended to help you determine whether you need to review prerequisite material.
Questions
A) Questions 1-11 refer to the data below.
The effect of apple mosaic on growth was measured on 2-year-old unpruned seedling
trees, propagated from 10 infected and 6 uninfected buds from the same
mother tree. The data are stem volumes in cubic centimeters. We will use only the
data from the 6 healthy (uninfected) buds. It is already known that uninfected buds
from an infected mother tree may differ significantly from buds from a disease-free
mother tree.
Stem Volume (cm³)
1384
1325
1870
1324
1065
1652
1) What is the population of interest in this study? (That is, what is the population which
the investigator wants to make inferences about?)
2) What is the sampling population? (What is the population from which the sample
was taken?)
3) What is the sample size?
4) Compute a 95% confidence interval for the population mean.
5) What assumptions have you made about the data in forming the confidence interval?
6) How could you check these assumptions?
7) Is the interval in 4) an interval for the mean of the population of interest, or of the
sampling population?
8) Compute an estimate of the population variance.
9) Compute an estimate of the variance of the sample mean.
10) From many previous studies it is known that the mean stem volume for buds on
healthy trees (no infected buds) is 1442. Test whether the mean stem volume for
healthy buds on this mother tree is 1442.
11) What assumptions about the data have you made in performing this test?
B)
12) Draw a normal probability plot to match each of the histograms below.
13) Which of the histograms show skewness? Which of them show heavy tails?
14) Suppose you had a sample of size 25, and the histogram of the sample looked like
one of the plots below. You hoped to use a t-test of the mean. For which of the plots
below would the test be valid?
STUDY GUIDE 1 SOLUTIONS
1. The population of interest is uninfected buds from infected mother trees.
2. The sampling population is uninfected buds from this one infected mother tree.
3. The sample size is 6.
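4. Inference about the mean is based on assumptions of normality and independence (see #5).
Under these assumptions, $Y_i$ is distributed $N(\mu, \sigma^2)$, $\bar{Y}$ is distributed $N(\mu, \sigma^2/n)$, and $(\bar{Y} - \mu)/S(\bar{Y})$ is distributed $t_{n-1}$.
Then confidence limits for $\mu$, the population mean, are given by $\bar{Y} \pm t[\alpha, n-1] \cdot S(\bar{Y})$,
where $S^2(\bar{Y}) = S^2/n$ and $S^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2/(n-1)$. Here $t[\alpha, n-1]$ is the two-tailed critical value, i.e., the upper $\alpha/2$ point of the $t$ distribution with $n-1$ degrees of freedom.
$\bar{Y} = 1{,}436.6667$, $t[\alpha, n-1] = t[.05, 5] = 2.571$
$S = 282.920248$, $S(\bar{Y}) = S/\sqrt{n} = 282.920248/\sqrt{6} = 115.5017$
The confidence interval is $1436.6667 \pm 115.501708 \times 2.571 = [1139.7, 1733.6]$.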
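The arithmetic in item 4 can be reproduced with Python's standard library. This is a minimal sketch. The variable names are our own, and the critical value 2.571 is hard-coded from a t table, as in the solution, because the standard library has no t distribution. If SciPy is available, `scipy.stats.t.ppf(0.975, 5)` gives the same value to three decimals.

```python
import math
from statistics import mean, stdev

# Stem volumes (cm^3) for the 6 uninfected buds from part A.
stem_volume = [1384, 1325, 1870, 1324, 1065, 1652]

n = len(stem_volume)
ybar = mean(stem_volume)      # sample mean
s = stdev(stem_volume)        # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)         # S(Ybar), the estimated standard error of the mean
T_CRIT = 2.571                # t[.05, 5], two-tailed value taken from a t table

lower, upper = ybar - T_CRIT * se, ybar + T_CRIT * se
print(f"ybar = {ybar:.4f}, S = {s:.6f}, S(Ybar) = {se:.4f}")
print(f"95% CI: [{lower:.1f}, {upper:.1f}]")
```

The printed interval should agree with the hand calculation above.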
5. Inference concerning the population mean $\mu$ is based on the assumptions:
1) The sample is a random sample (independent and identically distributed).
2) The observations have a normal distribution with unknown mean $\mu$ and variance $\sigma^2$.
6. The normality assumption can be checked with a histogram of the data. An NSCORE plot
(normal probability plot) of the data, or of the deviations from the mean, could also be obtained and
is more reliable for small samples. A bell-shaped histogram and a straight-line NSCORE
plot suggest normality. Independence may be difficult to check.
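The numbers behind a normal probability plot of the stem volumes can be computed without a plotting library. The sketch pairs each ordered observation with a normal score and reports the correlation between the two columns. It assumes Blom's plotting positions (i - 3/8)/(n + 1/4), which are one common choice and may differ slightly from a particular package's NSCORE output. The helpers `normal_scores` and `pearson_r` are hypothetical names used only for illustration. With only six points, even normal data can look somewhat crooked, so treat this as a rough check.

```python
from statistics import NormalDist

stem_volume = [1384, 1325, 1870, 1324, 1065, 1652]


def normal_scores(n):
    """Approximate expected standard normal order statistics for a sample of size n.

    Uses Blom's plotting positions (i - 3/8) / (n + 1/4), an assumed choice.
    """
    z = NormalDist()
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def pearson_r(x, y):
    """Pearson correlation; values near 1 mean the probability plot is nearly straight."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


ordered = sorted(stem_volume)             # Y axis: ordered observations
scores = normal_scores(len(ordered))      # X axis: normal scores
for z_i, y_i in zip(scores, ordered):
    print(f"{z_i:7.3f}  {y_i}")
print(f"r = {pearson_r(scores, ordered):.3f}")
```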
7. The interval in 4) is for the mean of the sampling population (uninfected buds from this one mother tree).
8. The population variance $\sigma^2$ is estimated by $S^2$, the sample variance:
$S^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2/(n-1) = 400{,}219.3333/5 = 80{,}043.8667 \approx 80{,}043.9$
9. The sample mean $\bar{Y}$ is normally distributed with mean $\mu$ and variance $\sigma^2/n$, i.e.,
$\bar{Y} \sim N(\mu, \sigma^2/n)$. An estimate of its variance is $S^2/n = 80{,}043.8667/6 \approx 13{,}340.6444$.
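Items 8 and 9 can be checked with a short computation. The simulation part also illustrates the formula $\mathrm{Var}(\bar{Y}) = \sigma^2/n$. For illustration only, it assumes normal data whose true mean and variance equal the sample estimates. It then shows that the variance of many simulated sample means lands near $\sigma^2/n$. The seed and the number of replicates are arbitrary choices.

```python
import random
from statistics import mean, variance

stem_volume = [1384, 1325, 1870, 1324, 1065, 1652]
n = len(stem_volume)

ybar = mean(stem_volume)
ss = sum((y - ybar) ** 2 for y in stem_volume)   # sum of squared deviations
s2 = ss / (n - 1)                                # item 8: S^2
var_ybar = s2 / n                                # item 9: estimated Var(Ybar)
assert abs(s2 - variance(stem_volume)) < 1e-6    # agrees with the library (divisor n - 1)

# ASSUMPTION for the check: Y_i ~ N(ybar, s2), i.e. the estimates are treated as the truth.
rng = random.Random(2001)
sigma = s2 ** 0.5
means = [sum(rng.gauss(ybar, sigma) for _ in range(n)) / n for _ in range(20_000)]

print(f"SS = {ss:.4f}, S^2 = {s2:.4f}, S^2/n = {var_ybar:.4f}")
print(f"simulated variance of the sample mean = {variance(means):.1f}")
```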
10. Test $H_0: \mu = \mu_0 = 1442$ vs. $H_A: \mu \neq \mu_0$.
The test statistic is $t^* = (\bar{Y} - \mu_0)/S(\bar{Y})$, which is distributed $t_{n-1}$ under $H_0$.
The $\alpha = .05$ decision rule is: reject $H_0$ if $|t^*| > t[\alpha, n-1] = t[.05, 5] = 2.571$.
$|t^*| = |\bar{Y} - \mu_0|/S(\bar{Y}) = |(1436.6667 - 1442)/115.501708| = |-0.046175| = 0.046175$.
Hence do not reject $H_0$.
Therefore, there is not enough evidence to conclude that the mean stem volume for healthy
buds on this tree is not 1442.
Alternatively, notice that 1442 lies within the 95% confidence interval for $\mu$. Since no value inside the
$(1-\alpha)100\%$ confidence interval is rejected by a two-tailed test of size $\alpha$, it is clear
that $H_0$ is not rejected.
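The test in item 10 and its link to the confidence interval can be sketched as follows. As in the solution, the two-tailed critical value 2.571 comes from a t table. The last step checks the duality argument: a two-tailed test of size .05 rejects exactly when $\mu_0$ falls outside the 95% interval.

```python
import math
from statistics import mean, stdev

stem_volume = [1384, 1325, 1870, 1324, 1065, 1652]
mu0 = 1442          # hypothesized mean from previous studies
T_CRIT = 2.571      # t[.05, 5], two-tailed critical value from a t table

n = len(stem_volume)
ybar = mean(stem_volume)
se = stdev(stem_volume) / math.sqrt(n)    # S(Ybar)
t_star = (ybar - mu0) / se
reject = abs(t_star) > T_CRIT

# Duality: mu0 inside the 95% CI  <=>  the two-tailed .05 test does not reject.
inside = ybar - T_CRIT * se <= mu0 <= ybar + T_CRIT * se
print(f"t* = {t_star:.6f}, reject H0: {reject}, mu0 inside CI: {inside}")
```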
11. The assumptions are exactly the same as for the confidence interval.
12. Normal probability plots check for normality by plotting the ordered observations (on the Y
axis) against their expected values if they were drawn from a normal population (on the X
axis). Points drawn from a normal population will tend to fall on a straight line. Points
drawn from non-normal distributions may systematically vary from a straight line.
a. NORMAL DISTRIBUTION: Points tend to fall on a straight line.
b. HEAVY-TAILED DISTRIBUTION: A "surplus" of extreme values causes curvature at both
ends of the normal score plot.
c. SKEWED-RIGHT DISTRIBUTION: A "surplus" of low values and extreme values in the
upper tail combine to give curvature to the normal score plot.
(Plot annotations: extreme positive values give an upward curve; a surplus of low values creates a bulge.)
d. LIGHT-TAILED DISTRIBUTION: A "dearth," or relative lack, of extreme values causes
curvature at both ends of the normal score plot.
(Plot annotation: the dearth of extreme values creates curvature.)
e. BIMODAL, SKEW LEFT: Think of a skew-left distribution, then add a "bump" caused by the
second hump in the histogram.
(Plot annotations: overall similar to a skew-left plot; "bump" caused by the second hump.)
f. SLIGHTLY SKEW DISTRIBUTION WITH LIGHT TAILS: Similar to the light-tailed
distribution, but asymmetric.
(Plot annotation: slightly more points in the lower end produce the asymmetry.)
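To see how the shapes in (a) through (d) bend a normal probability plot, the sketch below simulates large samples. For each sample it compares the plot's slope in each tail with its slope in the middle. Both the slope-ratio summary and the stand-in distributions are our own illustrative assumptions, not part of the original solutions. The stand-ins are Laplace for heavy tails, exponential for right skew, and uniform for light tails.

```python
import random
from statistics import NormalDist, quantiles

Z = NormalDist()  # standard normal; supplies the plot's X-axis quantiles


def plot_slope(pct, lo, hi):
    """Slope of the normal probability plot between the lo-th and hi-th percentiles.

    pct[k - 1] is the k-th sample percentile (Y axis); the denominator is the
    gap between the matching standard normal quantiles (X axis).
    """
    return (pct[hi - 1] - pct[lo - 1]) / (Z.inv_cdf(hi / 100) - Z.inv_cdf(lo / 100))


def tail_ratios(sample):
    """Return (lower, upper): each tail's slope divided by the central slope.

    Near 1: straight plot. Above 1: that end bends away (heavier tail than
    normal). Below 1: that end flattens (lighter tail than normal).
    """
    pct = quantiles(sample, n=100)      # 99 cut points: 1st through 99th percentiles
    center = plot_slope(pct, 25, 75)
    return plot_slope(pct, 1, 10) / center, plot_slope(pct, 90, 99) / center


rng = random.Random(511)
n = 20_000
samples = {
    "normal": [rng.gauss(0, 1) for _ in range(n)],
    # The difference of two independent Exp(1) variables is Laplace(0, 1).
    "heavy-tailed (Laplace)": [rng.expovariate(1) - rng.expovariate(1) for _ in range(n)],
    "skewed right (exponential)": [rng.expovariate(1) for _ in range(n)],
    "light-tailed (uniform)": [rng.random() for _ in range(n)],
}

for name, xs in samples.items():
    lower, upper = tail_ratios(xs)
    print(f"{name:27s} lower-tail ratio {lower:5.2f}   upper-tail ratio {upper:5.2f}")
```

In the output, the normal sample gives ratios near 1 at both ends, and the Laplace sample gives ratios well above 1 at both ends, as in (b). The uniform sample gives ratios well below 1 at both ends, as in (d). The exponential sample shows the asymmetric pattern of (c): a steep upper end and a flat lower end.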
13. Histogram (c) is skewed. Histogram (e), which is bimodal, may also be considered skewed.
Histogram (b) has heavy tails.
14. The t-test is valid for normal data, hence it is valid for histogram (a). T-tests are also
valid for near-normal data with light tails. This is because the relative lack of extreme
values tends to keep $\bar{Y}$ closer to $\mu$, so the t-statistic tends to be smaller than it would be if the data
were normal. Hence the test is less likely to reject $H_0$ when it is true; such a test is said to be
conservative. Histograms (d) and (f) will produce conservative t-tests.