Stat 402B (Spring 2016): Slide Set 1
Stat402B (Spring 2016) Notes Set #1
Last update: January 18, 2016

Preliminaries

• Sir Ronald Fisher pioneered the use of mathematical statistics in the design of experiments (DOE) at the Rothamsted Agricultural Experiment Station, 1920-1940.
• Major ideas of DOE were developed at the time. Naturally, common terms in DOE came from this background.
• Terms like treatment, plot, and block are some examples. When appropriate, these continue to be used today.
• This does not really matter, as statisticians think in terms of linear mathematical models, which apply whatever the field of application is.
• A field experiment to compare varieties of soybean and a lab experiment run to compare operating temperatures in a catalytic cracker may use the same model.

Preliminaries (continued)

• Some differences exist between agricultural and industrial experimentation:
  – Time span of an experiment: usually long for ag. expts., requiring designs that allow more control, e.g., blocked designs.
  – Cost of experimentation: industrial expts. are more expensive, so they are smaller expts. run in sequence, compared to complete expts. in ag.
  – Type of inference: estimation of effects is important in industry, rather than hypothesis tests as in the use of ANOVA in ag.
• The text book takes these differences into consideration:
  – distributional assumptions made about data from agricultural expts. may not hold for industrial data
  – it explains how concepts in DOE enable us to make inference from industrial data even if the assumptions do not hold

Choices of Sample Size

Example: Single sample case

• A $(1-\alpha)100\%$ CI for $\mu$ is given by $\bar{y} \pm t_{\alpha/2,\,n-1}\,(s/\sqrt{n})$. Thus the width of this CI is $W = 2\,(s/\sqrt{n})\,t_{\alpha/2,\,n-1}$.
• Could we use this relationship to choose a sufficiently large sample size to have the required precision as specified by $L$?

We require $W \le L$ for a fixed $\alpha$, i.e. $2\,(s/\sqrt{n})\,t_{\alpha/2,\,n-1} \le L$, which gives us the inequality

$$n \ge \frac{4\,s^2\,t^2_{\alpha/2,\,n-1}}{L^2}$$

Example

Suppose we want the mean of a certain measurement to be estimated within ±0.15 of the sample mean with 95% confidence. That is, we are specifying that $L = 0.30$ when $\alpha = 0.05$ in the above formula.

• Draw a sample of size 10:
  42.71 42.65 42.73 42.78 42.48 42.09 42.12 42.49 42.48 42.88
• $\bar{y} = 42.541$ and $s^2 = .070987$ give a 95% CI of $42.541 \pm .1905$, i.e. $W = .381$. Thus $W > L$: the precision of our estimate does not meet the specification.
• Compute a new sample size $n$ using the inequality:

$$n \ge \frac{4\,s^2\,t^2_{\alpha/2,\,n-1}}{L^2} = \frac{4(.070987)(2.262)^2}{.3^2} = 16.14, \quad \text{so take } n = 17$$

• Draw 17 − 10 = 7 more samples:
  42.80 42.82 42.66 42.24 42.84 42.15 42.88
• From the total sample of 17 observations we have $\bar{y} = 42.576$ and $s^2 = .075959$, which gives a 95% CI of $42.576 \pm .146$, i.e. $W = .292$, which now satisfies $W \le L$.

If an estimate $s^2$ was available from a previous experiment, we could use trial and error to arrive at a suitable value for $n$. Suppose $s^2 \approx .08$. Then the inequality becomes

$$n \ge \frac{4 \times .08\; t^2_{.025,\,n-1}}{.3^2}$$

Now substituting values for $n$ in both sides of the inequality:

  n     10     15     16     17
  RHS   18.2   16.4   16.14  15.98

The inequality ($n$ must be at least the RHS) is satisfied when $n \ge 17$.

• We must use an estimate $s^2$ from a preliminary experiment, and then collect additional observations if the sample size is not enough.

Choices of Sample Size (Continued)

Hypothesis Testing

We want to select a sample size that will allow us to control the type II error rate (or equivalently the power of the test) for a specified type I error rate, $\alpha$. Recall:

$\beta$ = P(type II error) = P(fail to reject $H_0$ | $H_0$ is false)

and $1 - \beta$ is the power of the test for a specified alternative $H_a$. We use the operating characteristic (OC) curve of a test for determining the sample size required to have a certain power.

Example: Two-sample Problem

$H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 \ne \mu_2$ ($\sigma^2$ unknown)

Define $\delta = \mu_1 - \mu_2$ and $d = |\mu_1 - \mu_2| / 2\sigma = |\delta| / 2\sigma$. The OC curve for the two-sided test is below:

[Figure: OC curves for the two-sided t-test]

Note that the value on the curve in this case is actually $n^* = 2n - 1$.

Some properties of the test can be illustrated using the OC curve:

1 For fixed $\alpha$ and $n$, $\beta$ decreases as $d$ increases. Thus, the larger the difference in the means, the more likely the test will detect it, a useful property for a test to have.
2 For fixed $d$ and $\alpha$, $\beta$ decreases as the sample size $n$ increases. Thus, the larger the sample size, the higher the power of the test ($1 - \beta$) to detect a given difference in the means.

Suppose that in the Portland cement mortar problem (see Table 2.1), if the difference in the means is as much as 0.5 kgf/cm², we want to detect it with type II error probability of 0.05 or less.

Compute $d = \dfrac{|\mu_1 - \mu_2|}{2\sigma} = \dfrac{0.5}{2\sigma} = \dfrac{0.25}{\sigma}$. So if $\sigma = .25$ (from past data or the experimenter's educated guess), $d = 1$, and from the curve the sample size needed to meet the criterion is approx. 8.

The power of a test is defined as $1 - \beta$. Requiring a low type II error probability is equivalent to the test having high power.

Want to compare mean egg production for Diets A and B. Say we want to test whether B is better.

$H_0: \mu_A = \mu_B$ vs $H_a: \mu_B > \mu_A$

• Set $\alpha = .05$. Want to have a power of .9 to detect a difference in mean egg production of 12. Assume $\sigma^2 = 100$.
• Compute $d = \dfrac{|\mu_A - \mu_B|}{2\sigma} = \dfrac{12}{2(10)} = .6$
• From the chart for a one-sided test (see below), for $d = .6$ and $\beta = .1$, we get $n^*$ approximately equal to 25.
• Hence $n^* = 2n - 1 = 25$, giving the required sample size of $n = 13$.

[Figure: OC curves for the one-sided t-test]

The Paired Comparison Design (Section 2.5)

Design 1: Completely Randomized Design

Take 20 specimens (say, same metal but scrap pieces cut from different rods). Assign Tip 1 to 10 specimens and Tip 2 to the other 10 completely randomly. Obtain $y_{ij}$, $i = 1, 2$; $j = 1, \ldots, 10$.

$$\text{Two-sample } t\text{-statistic} = \frac{\bar{y}_1 - \bar{y}_2}{\mathrm{S.E.}(\bar{y}_1 - \bar{y}_2)} = \frac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

The estimate of error $S_p^2$ measures the variability of the specimens. Thus any difference due to the two tips may be harder to detect if $\mathrm{S.E.}(\bar{y}_1 - \bar{y}_2)$ is inflated because this variability is large.

Design 2: Paired Design (Section 2.5)

Take 10 metal specimens. Use each specimen to test both tips. Decide which tip (Tip 1 or Tip 2) is used first with each specimen by tossing a coin.

The two observations taken from each specimen are paired. Since the variability among specimens does not affect the difference within these pairs of observations, the standard error computed from the sample variance of the differences is a more accurate estimate of $\mathrm{S.E.}(\bar{y}_1 - \bar{y}_2)$.

• The two measurements made from the same specimen of metal form a block. Hence the variability among specimens does not affect the difference.
• Blocks are formed so that experimental units within blocks are relatively more homogeneous than among blocks with respect to the response being measured.
• Blocking represents a restriction on randomization.
• In the CRD case, the t-statistic has 18 d.f.; in the paired design it has 9 d.f. We have "lost" 9 d.f. in estimating the variance of the difference in sample means. (Question: Is it better or worse to lose d.f.?)
• A "blocked experiment" is not always better. Have to compare designs after experiments.
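The sample-size arithmetic in the single-sample example (preliminary sample of 10, L = 0.30, α = 0.05) can be checked with a short script. This is a minimal sketch rather than part of the original notes: it uses only the Python standard library, the function names (`ci_half_width`, `rhs`, `smallest_n`) are hypothetical, and the t critical values are hard-coded as an assumption (2.262 is the value used in the notes; the entries for 14, 15, and 16 d.f. are taken from a standard t table).

```python
import math
import statistics

# First 10 observations from the single-sample example.
first_sample = [42.71, 42.65, 42.73, 42.78, 42.48,
                42.09, 42.12, 42.49, 42.48, 42.88]

# Upper 2.5% points of Student's t keyed by degrees of freedom.
# Assumption: standard table values; only 2.262 appears in the notes.
T_975 = {9: 2.262, 14: 2.145, 15: 2.131, 16: 2.120}


def ci_half_width(data, t_crit):
    """Half-width t * s / sqrt(n) of the t-based confidence interval."""
    return t_crit * statistics.stdev(data) / math.sqrt(len(data))


def rhs(s2, t_crit, L):
    """Right-hand side of n >= 4 s^2 t^2 / L^2."""
    return 4.0 * s2 * t_crit ** 2 / L ** 2


def smallest_n(s2, L, table):
    """Trial and error: first tabulated n (= d.f. + 1) with n >= RHS."""
    for df in sorted(table):
        n = df + 1
        if n >= rhs(s2, table[df], L):
            return n
    return None  # no tabulated n is large enough


ybar = statistics.mean(first_sample)
s2 = statistics.variance(first_sample)          # sample variance (n - 1 divisor)
half = ci_half_width(first_sample, T_975[9])    # compare 2 * half with L = 0.30
n_new = math.ceil(rhs(s2, T_975[9], 0.30))      # sample size from the pilot data
n_trial = smallest_n(0.08, 0.30, T_975)         # trial and error with s^2 = .08

print(round(ybar, 3), round(s2, 6), round(half, 4), n_new, n_trial)
```

Both routes point to 17 observations: the first plugs the pilot-sample variance and its 9 d.f. critical value into the inequality, while the second lets the critical value change with the candidate n, as in the table of RHS values.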
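The OC-curve calculations for the Portland cement and egg-production examples reduce to two small formulas, d = |δ|/(2σ) and n = (n* + 1)/2. The sketch below computes both and, as a rough cross-check only, compares them with a normal-approximation sample size, n ≈ 2(z_α + z_β)²σ²/δ² per group. That approximation is a swapped-in technique, not the OC chart used in the notes, and it ignores the extra uncertainty from estimating σ. The function names are hypothetical, and α = 0.05 (two-sided) for the cement example is an assumption, since the notes do not state it.

```python
import math
from statistics import NormalDist


def effect_size_d(delta, sigma):
    """OC-curve parameter d = |delta| / (2 * sigma)."""
    return abs(delta) / (2.0 * sigma)


def n_from_chart(n_star):
    """Convert the chart value n* = 2n - 1 into a per-group n (rounded up)."""
    return math.ceil((n_star + 1) / 2)


def n_normal_approx(delta, sigma, alpha, beta, two_sided):
    """Per-group n from the normal approximation (NOT the OC chart)."""
    z = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_a = z.inv_cdf(1 - tail)
    z_b = z.inv_cdf(1 - beta)
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)


# Portland cement example: delta = 0.5, sigma = 0.25, beta = 0.05.
d_cement = effect_size_d(0.5, 0.25)
# Assumption: alpha = 0.05, two-sided (not stated in the notes).
n_cement_approx = n_normal_approx(0.5, 0.25, 0.05, 0.05, two_sided=True)

# Egg production example: delta = 12, sigma = 10, alpha = .05, power = .9.
d_eggs = effect_size_d(12, 10)
n_eggs_chart = n_from_chart(25)  # n* = 25 read from the one-sided chart
n_eggs_approx = n_normal_approx(12, 10, 0.05, 0.10, two_sided=False)

print(d_cement, n_cement_approx, d_eggs, n_eggs_chart, n_eggs_approx)
```

The normal approximation gives 7 and 12 per group, slightly below the chart-based 8 and 13. That direction is expected, because the t-test has to pay for estimating σ.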
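To see why pairing can help, the two t-statistics from the paired comparison discussion can be computed side by side. The readings below are HYPOTHETICAL numbers invented for illustration (the notes give no data); they are built so that specimens differ a lot from one another while the tip-to-tip difference within a specimen is small and steady. Analyzing the same numbers both ways is only a device for comparing the two standard errors; in a real completely randomized design the two columns would come from different specimens. Function names are also hypothetical.

```python
import math
import statistics

# HYPOTHETICAL readings, invented for illustration only (not from the notes).
# Position j in each list is the same specimen tested with both tips.
tip1 = [7.0, 3.1, 3.4, 4.2, 8.1, 3.0, 2.2, 9.3, 5.1, 4.4]
tip2 = [6.6, 2.9, 3.1, 3.7, 7.8, 2.5, 2.0, 8.8, 4.9, 4.0]


def pooled_t(y1, y2):
    """Two-sample t-statistic with pooled variance; returns (t, d.f.)."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * statistics.variance(y1)
           + (n2 - 1) * statistics.variance(y2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.mean(y1) - statistics.mean(y2)) / se, n1 + n2 - 2


def paired_t(y1, y2):
    """Paired t-statistic computed from within-pair differences; returns (t, d.f.)."""
    diffs = [a - b for a, b in zip(y1, y2)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1


t_pooled, df_pooled = pooled_t(tip1, tip2)
t_paired, df_paired = paired_t(tip1, tip2)

print(f"pooled: t = {t_pooled:.2f} on {df_pooled} d.f.")
print(f"paired: t = {t_paired:.2f} on {df_paired} d.f.")
```

With these made-up numbers the pooled standard error is dominated by specimen-to-specimen variability, so the pooled statistic is small, while the paired statistic is large despite having half the degrees of freedom. When specimens are nearly homogeneous the comparison can go the other way, which is the point of the remark that a blocked experiment is not always better.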