Notes Set #1 Stat402B (Spring 2016) Last update: January 18, 2016

Stat 402B (Spring 2016): Slide Set 1

Preliminaries
• Sir Ronald Fisher pioneered the use of mathematical statistics in design of experiments (DOE) at the Rothamsted Agricultural Experiment Station, 1920-1940.
• The major ideas of DOE were developed at that time, so the common terms in DOE naturally came from this agricultural background.
• Terms such as treatment, plot, and block are examples. These continue to be used today where appropriate.
• The origin of the terms does not really matter, because statisticians think in terms of linear mathematical models, which apply whatever the field of application.
• A field experiment to compare varieties of soybean and a lab experiment to compare operating temperatures in a catalytic cracker may use the same model.

Preliminaries (continued)
• Some differences exist between agricultural and industrial experimentation:
  – Time span of an experiment: usually long for agricultural experiments, requiring designs that allow more control, e.g., blocked designs.
  – Cost of experimentation: industrial experiments are more expensive, so they tend to be smaller experiments run in sequence, compared with complete experiments in agriculture.
  – Type of inference: estimation of effects is more important in industry, rather than the hypothesis tests (e.g., ANOVA) used in agriculture.
• The textbook takes these differences into consideration:
  – distributional assumptions made about data from agricultural experiments may not hold for industrial data
  – it explains how concepts in DOE enable us to make inferences from industrial data even if the assumptions do not hold

Choices of Sample Size
Example: Single sample case
• A $(1-\alpha)100\%$ CI for $\mu$ is given by $\bar{y} \pm t_{\alpha/2,\,n-1}\,(s/\sqrt{n})$. Thus the width of this CI is $W = 2\,(s/\sqrt{n})\,t_{\alpha/2,\,n-1}$.
• Require $W \le L$ for a fixed $\alpha$, i.e. $2\,(s/\sqrt{n})\,t_{\alpha/2,\,n-1} \le L$. Solving for $n$ gives the inequality
  $$n \ge \frac{4 s^2 t^2_{\alpha/2,\,n-1}}{L^2}$$
  Could we use this relationship to choose a sample size large enough to give the required precision, as specified by $L$?
• We must use an estimate $s^2$ from a preliminary experiment, and then collect additional observations if the sample size is not large enough.

Example
Suppose we want the mean of a certain measurement to be estimated to within ±0.15 of the sample mean with 95% confidence. That is, we specify $L = 0.30$ with $\alpha = 0.05$ in the above formula.
• Draw a sample of size 10:
  42.71 42.65 42.73 42.78 42.48 42.09 42.12 42.49 42.48 42.88
  $\bar{y} = 42.541$ and $s^2 = .070987$ give a 95% CI of $42.541 \pm .1905$, i.e. $W = .381$. Thus $W > L$: the precision of our estimate does not meet the specification.
• Compute a new sample size $n$ using the inequality, with $t_{.025,\,9} = 2.262$ from the pilot sample:
  $$n \ge \frac{4 s^2 t^2_{\alpha/2,\,n-1}}{L^2} = \frac{4(.070987)(2.262)^2}{(.3)^2} = 16.14,$$
  so take $n = 17$. Draw 17 − 10 = 7 more observations:
  42.80 42.82 42.66 42.24 42.84 42.15 42.88
  From the total sample of 17 observations we have $\bar{y} = 42.576$, $s^2 = .075959$, which gives a 95% CI of $42.576 \pm .146$, i.e. $W = .292$, so the specification $W \le L$ is now met.
• If an estimate of $s^2$ were available from a previous experiment, we could use trial and error to arrive at a suitable value for $n$. Suppose $s^2 \approx .08$. Then the inequality becomes
  $$n \ge \frac{4 \times .08\, t^2_{.025,\,n-1}}{(.3)^2}$$
  Now substitute values of $n$ on both sides of the inequality:

  n     10     15     16      17
  RHS   18.2   16.4   16.14   15.98

  The requirement that $n$ be at least as large as the RHS is first satisfied at $n = 17$.

Choices of Sample Size (Continued)
Hypothesis Testing
We want to select a sample size that will allow us to control the type II error rate (or equivalently the power of the test) for a specified type I error rate, $\alpha$.
Recall: $\beta = P(\text{type II error}) = P(\text{fail to reject } H_0 \mid H_0 \text{ is false})$, and $1-\beta$ is the power of the test for a specified alternative $H_a$.
We use the operating characteristic (OC) curve of a test to determine the sample size required to have a certain power.
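Returning to the confidence-interval example earlier in this section, the pilot-sample arithmetic and the trial-and-error search can be sketched in Python. This is only a sketch under stated assumptions: the critical values in `T_025` are copied from a standard t table rather than computed, and the names `pilot`, `T_025`, `rhs_pilot`, `n_required`, and `n_trial` are illustrative, not from the slides.

```python
import math
import statistics

# Two-sided 95% critical values t_{0.025, df}, copied from a standard t table
# (an assumption of this sketch; a statistics library could compute them).
T_025 = {9: 2.262, 14: 2.145, 15: 2.131, 16: 2.120}

pilot = [42.71, 42.65, 42.73, 42.78, 42.48, 42.09, 42.12, 42.49, 42.48, 42.88]
L = 0.30  # required total CI width (estimate within +/- 0.15)

n0 = len(pilot)
ybar = statistics.mean(pilot)
s2 = statistics.variance(pilot)  # sample variance, divisor n - 1
t_pilot = T_025[n0 - 1]

# Width of the 95% CI from the pilot sample: W = 2 * t * s / sqrt(n)
width = 2 * t_pilot * math.sqrt(s2 / n0)

# Right-hand side of n >= 4 s^2 t^2 / L^2, evaluated with the pilot-sample t
rhs_pilot = 4 * s2 * t_pilot ** 2 / L ** 2
n_required = math.ceil(rhs_pilot)  # round up to the next whole observation

print(f"ybar = {ybar:.3f}, s2 = {s2:.6f}, W = {width:.3f}")
print(f"RHS = {rhs_pilot:.2f}, so take n = {n_required}")

# Trial and error with a prior estimate s^2 = 0.08: the t value now depends
# on the candidate n, so both sides of the inequality change with n.
s2_prior = 0.08
n_trial = None
for n in (10, 15, 16, 17):
    rhs = 4 * s2_prior * T_025[n - 1] ** 2 / L ** 2
    print(f"n = {n:2d}, RHS = {rhs:.2f}")
    if n_trial is None and n >= rhs:
        n_trial = n  # first candidate that satisfies n >= RHS
```

Rounding up with `math.ceil` reflects that the inequality is a lower bound on `n`, so a fractional right-hand side always moves to the next integer.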
Example: Two-sample Problem
$H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 \ne \mu_2$ ($\sigma^2$ unknown)
Define $\delta = \mu_1 - \mu_2$ and $d = |\mu_1 - \mu_2|/(2\sigma) = |\delta|/(2\sigma)$.
The OC curve for the two-sided test is below:
[Figure: OC curves for the two-sided test]
Note that the sample-size value labeling each curve in this case is actually $n^* = 2n - 1$.

Some properties of the test can be illustrated using the OC curve:
1. For fixed $\alpha$ and $n$, $\beta$ decreases as $d$ increases. Thus, the larger the difference in the means, the more likely the test is to detect it, a useful property for a test to have.
2. For fixed $d$ and $\alpha$, $\beta$ decreases as the sample size $n$ increases. Thus, the larger the sample size, the higher the power of the test ($1-\beta$) to detect a given difference in the means.
Suppose that in the Portland cement mortar problem (see Table 2.1), if the difference in the means is as large as 0.5 kgf/cm² we want to detect it with type II error probability of 0.05 or less.
Compute $d = |\mu_1 - \mu_2|/(2\sigma) = 0.5/(2\sigma) = 0.25/\sigma$. So if $\sigma = .25$ (from past data or the experimenter's educated guess), $d = 1$, and from the curve the sample size needed to meet the criterion is approx. 8.

The power of a test is defined as $1-\beta$. Requiring a low type II error probability is equivalent to requiring the test to have high power.
We want to compare mean egg production for Diets A and B. Say we want to test whether B is better.
$H_0: \mu_A = \mu_B$ vs $H_a: \mu_B > \mu_A$
Set $\alpha = .05$. We want a power of .9 to detect a difference in mean egg production of 12. Assume $\sigma^2 = 100$.
• Compute $d = |\mu_A - \mu_B|/(2\sigma) = 12/20 = .6$
• From the chart for a one-sided test (see below), for $d = .6$ and $\beta = .1$, we get $n^*$ approximately equal to 25. Hence $n^* = 2n - 1 = 25$, giving the required sample size of $n = 13$.
[Figure: OC curves for the one-sided test]

The Paired Comparison Design (Section 2.5)
Design 1: Completely Randomized Design
Take 20 specimens (say, same metal but scrap pieces cut from different rods).
Assign Tip 1 to 10 specimens and Tip 2 to the other 10 completely at random. Obtain $y_{ij}$, $i = 1, 2$; $j = 1, \ldots, 10$.
Two-sample t-statistic:
$$t = \frac{\bar{y}_1 - \bar{y}_2}{\mathrm{S.E.}(\bar{y}_1 - \bar{y}_2)} = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
The estimate of error $S_p^2$ measures the variability of the specimens. Thus any difference due to the two tips may be harder to detect if $\mathrm{S.E.}(\bar{y}_1 - \bar{y}_2)$ is inflated because this variability is large.

Design 2: Paired Design (Section 2.5)
Take 10 metal specimens. Use each specimen to test both tips. Decide which tip (Tip 1 or Tip 2) is used first with each specimen by tossing a coin.
The two observations taken from each specimen are paired. Since the variability among specimens does not affect the differences within these pairs, the sample variance computed from the differences gives a more accurate estimate of $\mathrm{S.E.}(\bar{y}_1 - \bar{y}_2)$.
• The two measurements made on the same specimen of metal form a block. Hence the variability among specimens does not affect the difference.
• Blocks are formed so that experimental units within a block are more homogeneous than units in different blocks with respect to the response being measured.
• Blocking represents a restriction on randomization.
• In the CRD case, the t-statistic has 18 d.f. In the paired case it has 9 d.f., so we have “lost” 9 d.f. in estimating the variance of the difference in sample means. (Question: Is it better or worse to lose d.f.?)
• A “blocked experiment” is not always better. We have to compare the designs after the experiments.
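One way to explore the d.f. question is to compare 95% CI half-widths for the mean tip difference under the two designs. The sketch below assumes an additive model (reading = tip effect + specimen effect + error) with known, hypothetical variances `var_specimen` and `var_error`. These variances, the function name `half_widths`, and the tabled t values are illustrative assumptions and are not from the slides.

```python
import math

# Two-sided 95% critical values from a standard t table (assumed here).
T_025 = {9: 2.262, 18: 2.101}
N = 10  # readings per tip in both designs

def half_widths(var_specimen, var_error):
    """Return (CRD, paired) 95% CI half-widths for the mean tip difference.

    Hypothetical additive model: reading = tip + specimen + error, with the
    two variance components treated as known for illustration only.
    """
    # CRD: every reading carries specimen and error variability; 2N - 2 d.f.
    se_crd = math.sqrt(2 * (var_specimen + var_error) / N)
    # Paired: the specimen effect cancels within each difference; N - 1 d.f.
    se_paired = math.sqrt(2 * var_error / N)
    return T_025[2 * N - 2] * se_crd, T_025[N - 1] * se_paired

# Large specimen-to-specimen variability: pairing gives the narrower interval.
crd_big, paired_big = half_widths(var_specimen=1.0, var_error=0.1)
print(f"specimens vary:   CRD {crd_big:.3f}, paired {paired_big:.3f}")

# No specimen variability: pairing only costs d.f., so the CRD is narrower.
crd_none, paired_none = half_widths(var_specimen=0.0, var_error=0.1)
print(f"specimens alike:  CRD {crd_none:.3f}, paired {paired_none:.3f}")
```

Under these assumptions the larger critical value at 9 d.f. is the price of blocking, and it is worth paying only when the specimen variability removed from the standard error is large enough.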