Statistics 511 Study Guide 1 Fall 2001

This study guide is to help you determine whether you need to review the prerequisite material.

Questions

A) Questions 1-11 refer to the data below. The effect of apple mosaic on growth was measured on 2-year-old unpruned seedling trees, propagated from 10 infected and 6 uninfected buds from the same mother tree. The data are stem volumes in cubic centimeters. We will use only the data from the 6 healthy (uninfected) buds. It is already known that uninfected buds from an infected mother tree may differ significantly from buds from a disease-free mother tree.

Stem Volume: 1384 1325 1870 1324 1065 1652

1) What is the population of interest in this study? (That is, what is the population about which the investigator wants to make inferences?)

2) What is the sampling population? (What is the population from which the sample was taken?)

3) What is the sample size?

4) Compute a 95% confidence interval for the population mean.

5) What assumptions have you made about the data in forming the confidence interval?

6) How could you check these assumptions?

7) Is the interval in 4) an interval for the mean of the population of interest, or of the sampling population?

8) Compute an estimate of the population variance.

9) Compute an estimate of the variance of the sample mean.

10) From many previous studies it is known that the mean stem volume for buds on healthy trees (no infected buds) is 1442. Test whether the mean stem volume for healthy buds on this mother tree is 1442.

11) What assumptions about the data have you made in performing this test?

B)

12) Draw a normal probability plot to match each of the histograms below.

13) Which of the histograms show skewness? Which of them show heavy tails?

14) Suppose you had a sample of size 25, and the histogram of the sample looked like one of the plots below. You hoped to use a t-test of the mean. For which of the plots below would the test be valid?

[Figure: histograms (a) through (f); not reproduced.]
STUDY GUIDE 1 SOLUTIONS

1. The population of interest is uninfected buds from infected mother trees.

2. The sampling population is uninfected buds from this one infected mother tree.

3. The sample size is 6.

4. Inference about the mean is based on assumptions of normality and independence (see #5). Under these assumptions, $Y_i$ is distributed $N(\mu, \sigma^2)$ and

$$\frac{\bar{Y} - \mu}{S(\bar{Y})} \sim t_{n-1}.$$

Then confidence limits for $\mu$, the population mean, are given by

$$\bar{Y} \pm t[\alpha, n-1]\, S(\bar{Y}),$$

where $S^2(\bar{Y}) = S^2/n$, $S^2 = \sum (Y_i - \bar{Y})^2/(n-1)$, and $t[\alpha, n-1]$ is the two-sided critical value.

$\bar{Y} = 1436.6667$
$t[\alpha, n-1] = t[.05, 5] = 2.571$
$S = 282.920248$
$S(\bar{Y}) = S/\sqrt{n} = 282.920248/\sqrt{6} = 115.5017$

The confidence interval is $1436.6667 \pm 2.571 \times 115.501708 = [1139.7, 1733.6]$.

5. Inference concerning the population mean is based on the assumptions:
1) The sample is a random sample (independent and identically distributed).
2) The observations have a normal distribution with unknown mean $\mu$ and variance $\sigma^2$.

6. The normality assumption can be checked with a histogram of the data. An NSCORE plot (normal probability plot) of the data, or of the deviations from the mean, could also be obtained and is more reliable for small sample sizes. A "bell-shaped" histogram and a straight-line NSCORE plot suggest normality. Independence may be difficult to check.

7. The interval in 4) is for the sampling population mean.

8. The population variance $\sigma^2$ is estimated by $S^2$, the sample variance:

$$S^2 = \frac{\sum (Y_i - \bar{Y})^2}{n-1} = 400{,}219.3334/5 = 80{,}043.8667 \approx 80{,}043.9$$

9. The sample mean $\bar{Y}$ is distributed as a normal with mean $\mu$ and variance $\sigma^2/n$, i.e., $\bar{Y} \sim N(\mu, \sigma^2/n)$. An estimate of its variance is $S^2/n = 80{,}043.8667/6 \approx 13{,}340.64$.

10. Test $H_0: \mu = \mu_0 = 1442$ vs. $H_A: \mu \neq \mu_0$. The test statistic is

$$t^* = \frac{\bar{Y} - \mu_0}{S(\bar{Y})} \sim t[n-1] \text{ under } H_0.$$

The $\alpha = .05$ decision rule is: reject $H_0$ if $|t^*| > t[\alpha, n-1] = t[.05, 5] = 2.571$.

$|t^*| = |(\bar{Y} - \mu_0)/S(\bar{Y})| = |(1436.6667 - 1442)/115.501708| = |-0.0462| = 0.0462$. Hence do not reject $H_0$.
Therefore, there is not enough evidence to conclude that the mean stem volume for healthy buds on this tree differs from 1442. Alternatively, notice that 1442 is within the 95% confidence interval for $\mu$. Since any value in the $(1-\alpha)100\%$ confidence interval will not be rejected in a two-tailed test of size $\alpha$, it is clear that $H_0$ is not rejected.

11. The assumptions are exactly the same as for the confidence interval.

12. Normal probability plots check for normality by plotting the ordered observations (on the Y axis) against the values expected if they were drawn from a normal population (on the X axis). Points drawn from a normal population will tend to fall on a straight line. Points drawn from non-normal distributions may deviate systematically from a straight line.

a. NORMAL DISTRIBUTION: Points tend to fall on a straight line.

b. HEAVY-TAILED DISTRIBUTION: A "surplus" of extreme values causes curvature at both ends of the normal score plot.

c. SKEWED-RIGHT DISTRIBUTION: A "surplus" of low values and extreme values in the upper tail combine to give curvature to the normal score plot. (Plot annotations: extreme positive values give an upward curve; the surplus of low values creates a bulge.)

d. LIGHT-TAILED DISTRIBUTION: A "dearth", or relative lack, of extreme values causes curvature at both ends of the normal score plot. (Plot annotation: the dearth of extreme values creates curvature.)

e. BIMODAL, SKEW LEFT: Think of a skew-left distribution, then add a "bump" caused by the second hump in the histogram. (Plot annotations: overall, similar to a skew-left plot; "bump" caused by the second hump.)

f. SLIGHTLY SKEW DISTRIBUTION WITH LIGHT TAILS: Similar to the light-tailed distribution, but asymmetric. (Plot annotation: slightly more points in the lower end produce the asymmetry.)

13. Histogram (c) is skew. Histogram (e), which is bimodal, may also be considered to be skew. Histogram (b) has heavy tails.

14. t-tests are valid for normal data, hence the t-test is valid for histogram (a).
t-tests are also valid for near-normal data with light tails. This is true because the relative lack of extreme values will tend to make $\bar{Y}$ closer to $\mu$, and thus the t-statistic would be smaller than if the data were normal. Hence the test is less likely to reject $H_0$ when it is true. Such a test is said to be conservative. Histograms (d) and (f) will produce conservative t-tests.
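The arithmetic in solutions 4, 8, 9 and 10, and the normal scores behind the NSCORE plot mentioned in solution 6, can be checked with a short script. What follows is a minimal Python sketch and is not part of the original guide. The function names `t_summary` and `normal_scores` are hypothetical, the critical value 2.571 is copied from the solutions rather than computed, and the plotting position used for the normal scores (Blom's formula) is an assumption, since the guide does not say how NSCORE computes them.

```python
import math
import statistics

# Stem volumes (cubic centimeters) for the 6 uninfected buds, from the guide.
stem_volume = [1384, 1325, 1870, 1324, 1065, 1652]

# Two-sided 5% critical value of t with 5 degrees of freedom,
# copied from the solutions (a table lookup, not computed here).
T_CRIT = 2.571
MU_0 = 1442  # hypothesized mean from question 10


def t_summary(y, t_crit, mu_0):
    """Return the sample mean, S^2, S^2/n, S(Y-bar), the interval, and t*."""
    n = len(y)
    y_bar = statistics.mean(y)
    s2 = statistics.variance(y)      # sample variance, divisor n - 1
    se = math.sqrt(s2 / n)           # S(Y-bar), the standard error of the mean
    ci = (y_bar - t_crit * se, y_bar + t_crit * se)
    t_star = (y_bar - mu_0) / se
    return {"n": n, "mean": y_bar, "s2": s2, "var_mean": s2 / n,
            "se": se, "ci": ci, "t_star": t_star}


def normal_scores(y):
    """Pair each ordered observation with an approximate expected normal score.

    Assumption: Blom's plotting position (i - 0.375) / (n + 0.25).
    Other plotting positions give slightly different scores.
    """
    n = len(y)
    std_normal = statistics.NormalDist()
    return [(std_normal.inv_cdf((i - 0.375) / (n + 0.25)), obs)
            for i, obs in enumerate(sorted(y), start=1)]


result = t_summary(stem_volume, T_CRIT, MU_0)
reject = abs(result["t_star"]) > T_CRIT   # two-sided decision rule at alpha = .05
scores = normal_scores(stem_volume)

print(f"mean = {result['mean']:.4f}, S^2 = {result['s2']:.4f}")
print(f"S^2/n = {result['var_mean']:.4f}, S(Y-bar) = {result['se']:.4f}")
print(f"95% CI = [{result['ci'][0]:.1f}, {result['ci'][1]:.1f}]")
print(f"t* = {result['t_star']:.4f}, reject H0: {reject}")
for score, obs in scores:
    print(f"normal score {score:7.3f}   observation {obs}")
```

Plotting the observations against their normal scores gives the normal probability plot; with only 6 points, the straight-line check is rough at best.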