Stat 402B (Spring 2016): Notes Set #2
Last update: January 10, 2016

Slide set 2
Introduction to the Design of Experiments

Planning of an Experiment

I The Experiment
(a) Statement of the problem to be solved.
(b) The response or the dependent variable to be studied. How will it be measured?
(c) Factors to be varied and levels of each factor. How are they chosen?

II The Statistical Design
(a) How many observations should be taken (the size of the experiment)?
(b) Order of experimentation.
(c) How should the randomization be carried out?
(d) What mathematical model or models are most meaningful for the experiment? What are the assumptions involved?

III The Analysis
(a) Data collection process.
(b) Computation of various descriptive statistics and of various test statistics.
(c) Computation of diagnostic statistics and other methods, such as graphical plots of residuals, predicted values, normal plots, etc., to determine the adequacy of the model.
(d) Interpretation of results.

Example

1. The Experiment
(a) A chemical engineer wants to investigate several catalysts in the hope of improving the yield of a petro-chemical in an oil refinery.
(b) Crude oil is fed into the plant, which is charged with the catalyst. The product is extracted from the liquid that comes out, and the response measured is the percentage of 'feedstock' converted into the product.
(c) Several plant runs are made using each of 4 catalysts.

2. The Design
(a) Make 5 runs using each catalyst.
(b) A barrel of crude oil is divided up into 4 portions, and one portion is used with each of the catalysts. The portion used with each catalyst is chosen randomly and the runs are made in random order.
(c) The above procedure is repeated for 5 barrels of crude. The design is a randomized complete block design with barrels as blocks. Thus a total of 20 runs are made in the experiment.
(d) The model:

$$y_{ij} = \mu_{ij} + \epsilon_{ij}; \quad i = 1, \ldots, 4; \quad j = 1, \ldots, 5$$

where
$y_{ij}$ = observed yield from the run using the $i$-th catalyst on the crude from the $j$-th barrel,
$\mu_{ij}$ = mean (expected) yield from this run,
$\epsilon_{ij}$ = random error or noise with mean 0 and variance $\sigma^2$.

The $\mu_{ij}$ may be partitioned as $\mu_{ij} = \mu + \alpha_i + \beta_j$ for further analysis, where $\alpha_i$ is the effect of the $i$-th catalyst and $\beta_j$ is the $j$-th block effect.

3. The Analysis
(a) Compute the analysis of variance and estimate all parameters (the $\mu_{ij}$'s and $\sigma^2$).
(b) Test hypotheses about pre-planned comparisons among the yield means or obtain confidence intervals.
(c) Use multiple comparisons to compare means if necessary.
(d) Was blocking necessary? Were there missing data, and how were these dealt with?

Some Terminology, Definitions and Basic Concepts

Treatments: Things that are being compared. These may be fertilizer levels, varieties, machines, methods, etc.
Experimental Units: The thing to which a treatment is applied in a single trial of the experiment.
Observation: The measurement made on the experimental unit, also called the response.
Replications: Independent applications of a treatment to experimental units.
Experimental Error: Variation among replicated observations.

Suppose $y_1, y_2, \ldots, y_n$ are the $n$ observations obtained from $n$ replications of a treatment.

Linear Model

$$y_i = \mu + \epsilon_i, \quad i = 1, \ldots, n$$

where $E(y_i) = \mu$ is the expected mean (fixed or constant), and $\epsilon_i$ is random with $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma^2$.

$\sigma^2$ measures the experimental error and is called the "Error Variance". It is estimated by the sample variance

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$

if the data are a random sample.

Example
Consider testing equality of means in a two-sample experiment:

$$H_0: \mu_1 = \mu_2 \quad \text{vs} \quad H_a: \mu_1 \neq \mu_2$$

$$t_c = \frac{\bar{y}_{1.} - \bar{y}_{2.}}{\mathrm{S.E.}(\bar{y}_{1.} - \bar{y}_{2.})} = \frac{\bar{y}_{1.} - \bar{y}_{2.}}{s\sqrt{2/n}}$$

If $n$ is fixed, a smaller $s^2$ will lead to a larger $|t_c|$, making it more likely that $H_0$ is rejected.

Suppose instead that we had a paired design. Let

$$d_j = y_{1j} - y_{2j}, \quad j = 1, 2, \ldots, n$$

$$d_1, d_2, \ldots, d_n \sim N(\mu_d, \sigma_d^2), \quad \bar{d} = \bar{y}_{1.} - \bar{y}_{2.}, \quad \mathrm{S.E.}(\bar{d}) = \frac{s_d}{\sqrt{n}}$$

If the experimental units in each pair are homogeneous, then one would expect $\mathrm{S.E.}(\bar{d}) = s_d/\sqrt{n}$ to be much less than $\mathrm{S.E.}(\bar{y}_{1.} - \bar{y}_{2.}) = s\sqrt{2/n}$ obtained from the unpaired experiment. Recall that for the paired design

$$t_c = \frac{\bar{d}}{s_d/\sqrt{n}}$$

and thus $H_0$ is more likely to be rejected.

Experimental Error
(i) Variation among experimental units
(ii) Lack of control of experimental procedure
    (a) Application of treatments
    (b) Measurement of observations
    (c) Experimental technique

Experimental error is important because we compare differences in treatment means to the standard error of the difference (which depends on the estimate of experimental error) to determine whether the difference is significant.

Suppose we wish to compare the expected means of 2 treatments:

$$y_{11}, y_{12}, \ldots, y_{1n} \sim N(\mu_1, \sigma^2)$$
$$y_{21}, y_{22}, \ldots, y_{2n} \sim N(\mu_2, \sigma^2)$$

When we construct CI's for $\mu_1 - \mu_2$ or test $H_0: \mu_1 = \mu_2$, we use

$$\mathrm{S.E.}(\bar{y}_{1.} - \bar{y}_{2.}) = \sqrt{\frac{2\hat{\sigma}^2}{n}}$$

When this is smaller, it is easier to detect a difference between $\mu_1$ and $\mu_2$.

Replications
• An independent application of a treatment to an experimental unit; an experimental run repeated under the same experimental conditions.
• Why is replication required in experiments? To determine whether treatment differences are significant, we need to compare the differences with the experimental error variance.
• The experimental error variance is estimated from the sample variance of observations obtained from experimental runs repeated under the same treatment, i.e., replications.

Randomization
• It is usually assumed that observations of experimental runs are random samples from normal distributions.
• The random allocation of treatments to the experimental units also ensures that any inherent sources of variation that the experimenter is not aware of do not systematically bias the response to the treatment.

Example
(1) Assign a number 1 through $n$ to each experimental unit.
(2) Randomly select a permutation of the numbers 1 through $n$.
(3) Assign treatment 1 to the $n_1$ experimental units corresponding to the first $n_1$ numbers in (2), and treatment 2 to the units corresponding to the remaining $n - n_1$ numbers.
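
The three randomization steps in the example (number the units, draw a random permutation of 1 through n, give treatment 1 to the first n1 numbers and treatment 2 to the rest) can be sketched in Python as follows. This is only an illustration: the function name `randomize_two_treatments`, the `seed` argument, and the values n = 10, n1 = 5 are assumptions made here and are not part of the notes.

```python
import random

def randomize_two_treatments(n, n1, seed=None):
    """Return a dict mapping each unit number (1..n) to treatment 1 or 2.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Step (1): number the experimental units 1 through n.
    units = list(range(1, n + 1))
    # Step (2): draw a random permutation of the unit numbers.
    rng.shuffle(units)
    # Step (3): the first n1 numbers in the permutation get treatment 1,
    # the remaining n - n1 numbers get treatment 2.
    assignment = {}
    for position, unit in enumerate(units):
        assignment[unit] = 1 if position < n1 else 2
    return assignment

# Example with assumed sizes: 10 units, 5 of them receiving treatment 1.
allocation = randomize_two_treatments(n=10, n1=5, seed=402)
for unit in sorted(allocation):
    print(f"unit {unit}: treatment {allocation[unit]}")
```

Fixing the seed makes the allocation reproducible for record keeping. Omitting it gives a fresh random allocation on every run.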
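
To make the comparison between the unpaired and paired designs concrete, the sketch below computes both test statistics from the formulas in these notes: S.E. = s√(2/n) with s² the pooled sample variance for the two-sample case, and S.E. = s_d/√n for the paired case. The data are made up for illustration (five pairs whose units differ greatly between pairs but little within a pair), and the function names are assumptions.

```python
import math
import statistics

def two_sample_t(y1, y2):
    """t statistic and S.E. for two independent samples of equal size n."""
    n = len(y1)
    # Pooled variance; with equal sample sizes it is the average of the two.
    s2 = (statistics.variance(y1) + statistics.variance(y2)) / 2
    se = math.sqrt(2 * s2 / n)
    return (statistics.mean(y1) - statistics.mean(y2)) / se, se

def paired_t(y1, y2):
    """t statistic and S.E. based on the within-pair differences d_j."""
    d = [a - b for a, b in zip(y1, y2)]
    se = statistics.stdev(d) / math.sqrt(len(d))
    return statistics.mean(d) / se, se

# Hypothetical responses: pair j shares a homogeneous pair of units.
y1 = [10.2, 12.1, 14.3, 16.0, 18.4]
y2 = [9.8, 11.5, 13.9, 15.2, 17.9]

t_unpaired, se_unpaired = two_sample_t(y1, y2)
t_paired, se_paired = paired_t(y1, y2)
print(f"unpaired: S.E. = {se_unpaired:.3f}, t = {t_unpaired:.3f}")
print(f"paired:   S.E. = {se_paired:.3f}, t = {t_paired:.3f}")
```

Both statistics share the same numerator, since the mean of the differences equals the difference of the means. Only the standard error changes, which is why pairing homogeneous units makes it easier to reject the null hypothesis.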
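
For the catalyst example, the partition of the mean yield into an overall mean, a catalyst effect, and a barrel (block) effect can be estimated from a 4 × 5 table of yields. The sketch below assumes the usual sum-to-zero constraints on the effects and estimates the error variance as the residual sum of squares divided by (4 − 1)(5 − 1) degrees of freedom. The notes do not state these constraints, and the yields are invented for illustration.

```python
def rcbd_estimates(y):
    """Estimate mu, alpha_i, beta_j and sigma^2 for y[i][j] (treatment i, block j).

    Assumes one observation per treatment-block cell and sum-to-zero effects.
    """
    a, b = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (a * b)
    # Treatment effect: treatment mean minus grand mean.
    alpha = [sum(row) / b - grand for row in y]
    # Block effect: block mean minus grand mean.
    beta = [sum(y[i][j] for i in range(a)) / a - grand for j in range(b)]
    # Residual sum of squares after removing the additive fit.
    sse = sum(
        (y[i][j] - grand - alpha[i] - beta[j]) ** 2
        for i in range(a)
        for j in range(b)
    )
    sigma2_hat = sse / ((a - 1) * (b - 1))
    return grand, alpha, beta, sigma2_hat

# Hypothetical percentage yields: rows are catalysts 1-4, columns are barrels 1-5.
yields = [
    [52.1, 49.8, 55.0, 50.3, 53.2],
    [54.0, 51.2, 57.1, 52.9, 54.8],
    [50.5, 48.0, 53.6, 49.1, 51.7],
    [55.3, 52.6, 58.2, 53.5, 56.0],
]

mu_hat, alpha_hat, beta_hat, sigma2_hat = rcbd_estimates(yields)
print("mu:", round(mu_hat, 3))
print("alpha:", [round(v, 3) for v in alpha_hat])
print("beta:", [round(v, 3) for v in beta_hat])
print("sigma^2:", round(sigma2_hat, 4))
```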