Slide set 2 Stat402B (Spring 2016) Last update: January 10, 2016

Stat 402B (Spring 2016): Notes Set #2
Introduction to the Design of Experiments
Planning of an Experiment
I The Experiment
(a) Statement of the problem to be solved.
(b) The response or the dependent variable to be studied. How will it be
measured?
(c) Factors to be varied and levels of each factor. How are they chosen?
II The Statistical Design
(a) How many observations should be taken (the size of the experiment)?
(b) Order of experimentation
(c) How should the randomization be carried out?
(d) What mathematical model or models are most meaningful for the
experiment? What are the assumptions involved?
III The Analysis
(a) Data collection process.
(b) Computation of various descriptive statistics and of various test
statistics.
(c) Computation of diagnostic statistics and other methods such as
graphical plots of residuals, predicted values, normal plots, etc., to
determine the adequacy of the model.
(d) Interpretation of results.
Example
1. The Experiment
(a) A chemical engineer wants to investigate several catalysts in the hope
of improving the yield of a petro-chemical in an oil refinery.
(b) Crude oil is fed into the plant which is charged with the catalyst. The
product is extracted from the liquid that comes out and the response
measured is the percentage of ‘feedstock’ converted into the product.
(c) Several plant runs are made using each of 4 catalysts.
2. The Design
(a) Make 5 runs using each catalyst.
(b) A barrel of crude oil is divided into 4 portions, and one portion is
used with each of the catalysts. The portion used with each catalyst is
chosen randomly, and the runs are made in random order.
(c) The above procedure is repeated for 5 barrels of crude. The design
is a randomized complete block design with barrels as blocks. Thus a
total of 20 runs are made in the experiment.
(d) The model:
$$y_{ij} = \mu_{ij} + \epsilon_{ij}; \quad i = 1, \ldots, 4; \; j = 1, \ldots, 5;$$
where
$y_{ij}$ = observed yield from the run using the $i$-th catalyst on the crude from the $j$-th barrel,
$\mu_{ij}$ = mean (expected) yield from this run,
$\epsilon_{ij}$ = random error or noise with mean 0 and variance $\sigma^2$.
The $\mu_{ij}$ may be partitioned as $\mu_{ij} = \mu + \alpha_i + \beta_j$ for further analysis,
where $\alpha_i$ is the effect of the $i$-th catalyst and $\beta_j$ is the $j$-th block effect.
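The randomization described in (b) and (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalyst labels A to D, the seed, and the function name `rcbd_layout` are hypothetical, and the run order is randomized within each barrel (the notes do not say whether the order is randomized within or across barrels).

```python
import random

def rcbd_layout(treatments, n_blocks, seed=None):
    """Randomize treatments independently within each block.

    Returns {block: [(run_order, portion, treatment), ...]}.
    """
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        # Randomly match the portions of this barrel to the treatments.
        assigned = list(treatments)
        rng.shuffle(assigned)
        portions = list(enumerate(assigned, start=1))  # (portion, treatment)
        # Randomize the order in which the runs in this block are made
        # (assumption: run order is randomized within each barrel).
        rng.shuffle(portions)
        layout[block] = [(run, portion, trt)
                         for run, (portion, trt) in enumerate(portions, start=1)]
    return layout

catalysts = ["A", "B", "C", "D"]  # hypothetical catalyst labels
layout = rcbd_layout(catalysts, n_blocks=5, seed=402)
for barrel, runs in layout.items():
    print(f"barrel {barrel}: {runs}")
```

Each barrel receives every catalyst exactly once, which is what makes the design a complete block design.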
3. The Analysis
(a) Compute the analysis of variance and estimate all parameters (the $\mu_{ij}$'s
and $\sigma^2$).
(b) Test hypotheses about pre-planned comparisons among the yield means,
or obtain confidence intervals.
(c) Use multiple comparisons to compare means if necessary.
(d) Was blocking necessary? Were there missing data, and if so, how were
they dealt with?
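Step (a) can be illustrated with a small pure-Python sketch of the analysis of variance for a randomized complete block design. The data are simulated from the additive model $\mu_{ij} = \mu + \alpha_i + \beta_j$ with hypothetical parameter values chosen only for illustration; they are not results from the catalyst experiment. The function name `rcbd_anova` is likewise an assumption. P-values are omitted because they would require an F table or a statistics library.

```python
import random

def rcbd_anova(y):
    """ANOVA for a randomized complete block design.

    y[i][j] is the response for treatment i in block j (one run per cell).
    """
    t, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (t * b)
    trt_means = [sum(row) / b for row in y]
    blk_means = [sum(y[i][j] for i in range(t)) / t for j in range(b)]

    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    ss_blk = t * sum((m - grand) ** 2 for m in blk_means)
    ss_tot = sum((v - grand) ** 2 for row in y for v in row)
    ss_err = ss_tot - ss_trt - ss_blk

    df_trt, df_blk, df_err = t - 1, b - 1, (t - 1) * (b - 1)
    mse = ss_err / df_err  # estimate of sigma^2
    return {
        "grand_mean": grand, "trt_means": trt_means, "blk_means": blk_means,
        "ss_error": ss_err, "df_error": df_err, "mse": mse,
        "f_treatments": (ss_trt / df_trt) / mse,
        "f_blocks": (ss_blk / df_blk) / mse,
    }

# Simulated yields from y_ij = mu + alpha_i + beta_j + eps_ij.
# All parameter values below are hypothetical.
rng = random.Random(2016)
mu, sigma = 60.0, 1.0
alpha = [-2.0, 0.0, 1.0, 1.0]       # catalyst effects
beta = [-3.0, -1.0, 0.0, 1.0, 3.0]  # barrel (block) effects
y = [[mu + a + bj + rng.gauss(0.0, sigma) for bj in beta] for a in alpha]

res = rcbd_anova(y)
print(res["mse"], res["f_treatments"], res["f_blocks"])
```

Under the additive model, the estimate of $\mu_{ij}$ is the $i$-th treatment mean plus the $j$-th block mean minus the grand mean. A large F ratio for blocks is a rough indication that blocking was worthwhile, which bears on question (d).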
Some Terminology, Definitions and Basic Concepts
Treatments The things that are being compared. These may be fertilizer
levels, varieties, machines, methods, etc.
Experimental Unit The thing to which a treatment is applied in a single
trial of the experiment.
Observation The measurement made on the experimental unit, also called
the response.
Replications Independent applications of a treatment to experimental units.
Suppose $y_1, y_2, \ldots, y_n$ are the $n$ observations obtained from $n$ replications
of a treatment.
Experimental Error Variation among replicated observations.
Linear Model
$$y_i = \mu + \epsilon_i, \quad i = 1, \ldots, n$$
where $E(y_i) = \mu$ is the expected mean (fixed or constant), and
$\epsilon_i$ is random with $E(\epsilon_i) = 0$, $\mathrm{Var}(\epsilon_i) = \sigma^2$.
$\sigma^2$ measures the experimental error and is called the “Error Variance”. It is
estimated by the sample variance
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$$
if the data are a random sample.
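Example Consider testing equality of means in a two-sample experiment:
$$H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_a: \mu_1 \neq \mu_2$$
$$t_c = \frac{\bar{y}_{1.} - \bar{y}_{2.}}{\mathrm{S.E.}(\bar{y}_{1.} - \bar{y}_{2.})} = \frac{\bar{y}_{1.} - \bar{y}_{2.}}{s\sqrt{2/n}}$$
If $n$ is fixed, a smaller $s^2$ leads to a larger $|t_c|$, making it more likely that $H_0$ is rejected.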
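The effect of $s^2$ on $t_c$ can be checked numerically. The sketch below assumes that $s^2$ is the pooled variance of the two samples (with equal $n$, the average of the two sample variances); the notes do not define $s$ explicitly. The data values and the function name `two_sample_t` are made up for illustration.

```python
import math
import statistics

def two_sample_t(y1, y2):
    """t statistic for two samples of equal size n, using the pooled variance."""
    n = len(y1)
    if len(y2) != n:
        raise ValueError("this sketch assumes equal sample sizes")
    # Pooled variance for equal n (assumption: this is the s^2 of the notes).
    s2 = (statistics.variance(y1) + statistics.variance(y2)) / 2
    se = math.sqrt(s2) * math.sqrt(2 / n)
    return (statistics.mean(y1) - statistics.mean(y2)) / se

# Two made-up data sets with the same difference in means (2.0)
# but different amounts of experimental error.
t_wide = two_sample_t([10.0, 12.0, 14.0], [8.0, 10.0, 12.0])
t_tight = two_sample_t([11.0, 12.0, 13.0], [9.0, 10.0, 11.0])
print(t_wide, t_tight)
```

With the mean difference and $n$ held fixed, the data set with the smaller spread gives the larger statistic.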
Suppose instead that we had a paired design. Let
$$d_j = y_{1j} - y_{2j}, \quad j = 1, 2, \ldots, n$$
$$d_1, d_2, \ldots, d_n \sim N(\mu_d, \sigma_d^2), \quad \bar{d} = \bar{y}_{1.} - \bar{y}_{2.}, \quad \mathrm{S.E.}(\bar{d}) = \frac{s_d}{\sqrt{n}}$$
If the experimental units in each pair are homogeneous, then one would
expect $\mathrm{S.E.}(\bar{d}) = s_d/\sqrt{n}$ to be much less than $\mathrm{S.E.}(\bar{y}_{1.} - \bar{y}_{2.}) = s\sqrt{2/n}$ obtained
from the unpaired experiment. Recall that for the paired design
$$t_c = \frac{\bar{d}}{s_d/\sqrt{n}},$$
and thus $H_0$ is more likely to be rejected.
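The comparison between the paired and unpaired standard errors can be sketched as follows. The data are made up so that units within a pair are similar while pairs differ widely. The unpaired standard error assumes $s^2$ is the pooled variance of the two samples, and `paired_vs_unpaired` is a hypothetical helper name.

```python
import math
import statistics

def paired_vs_unpaired(y1, y2):
    """Compare S.E.(d-bar) = s_d / sqrt(n) with the unpaired s * sqrt(2/n)."""
    n = len(y1)
    d = [a - b for a, b in zip(y1, y2)]
    diff = statistics.mean(d)  # equals mean(y1) - mean(y2)
    se_paired = statistics.stdev(d) / math.sqrt(n)
    # Pooled variance for equal n (assumption: this is the s^2 of the notes).
    s2 = (statistics.variance(y1) + statistics.variance(y2)) / 2
    se_unpaired = math.sqrt(s2) * math.sqrt(2 / n)
    return {
        "diff": diff,
        "se_paired": se_paired,
        "se_unpaired": se_unpaired,
        "t_paired": diff / se_paired,
        "t_unpaired": diff / se_unpaired,
    }

# Made-up data: large variation between pairs, small variation within pairs.
y1 = [10.0, 20.0, 30.0, 40.0]
y2 = [9.0, 18.0, 29.0, 38.0]
out = paired_vs_unpaired(y1, y2)
print(out)
```

Pairing removes the between-pair variation from the error, so the same mean difference is judged against a much smaller standard error.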
Experimental Error
(i) Variation among experimental units
(ii) Lack of control of experimental procedure
(a) Application of treatments
(b) Measurement of observations
(c) Experimental technique
Experimental error is important because we compare differences in treatment
means to the standard error of the difference (which depends on the estimate
of experimental error) to determine whether the difference is significant.
Suppose we wish to compare the expected means of 2 treatments:
$$y_{11}, y_{12}, \ldots, y_{1n} \sim N(\mu_1, \sigma^2)$$
$$y_{21}, y_{22}, \ldots, y_{2n} \sim N(\mu_2, \sigma^2)$$
When we construct CIs for $\mu_1 - \mu_2$ or test $H_0: \mu_1 = \mu_2$, we use
$$\mathrm{S.E.}(\bar{y}_{1.} - \bar{y}_{2.}) = \sqrt{\frac{2}{n}}\,\hat{\sigma}$$
When this is smaller, it is easier to detect a difference between $\mu_1$ and $\mu_2$.
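For reference, this standard error leads to the usual test statistic and confidence interval. The expressions below are a sketch under the assumption, not stated explicitly in these notes, that $\hat{\sigma}^2$ is the pooled sample variance of the two treatments, where $s_1^2$ and $s_2^2$ denote the two sample variances and each treatment has $n$ observations:

```latex
\hat{\sigma}^2 = \frac{s_1^2 + s_2^2}{2}, \qquad
t_c = \frac{\bar{y}_{1.} - \bar{y}_{2.}}{\hat{\sigma}\sqrt{2/n}}, \qquad
(\bar{y}_{1.} - \bar{y}_{2.}) \pm t_{\alpha/2,\,2(n-1)}\,\hat{\sigma}\sqrt{\frac{2}{n}}
```

The interval width is proportional to the standard error, so a smaller $\hat{\sigma}$ or a larger $n$ gives a narrower interval for $\mu_1 - \mu_2$.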
Randomization
• It is usually assumed that the observations from experimental runs are random
samples from normal distributions.
• Allocating treatments to the experimental units at random also
ensures that any inherent sources of variation that the experimenter is not
aware of do not systematically bias the response to the treatment.
• Example
(1) Assign a number 1 through $n$ to each experimental unit.
(2) Randomly select a permutation of the numbers 1 through $n$.
(3) Assign treatment 1 to the $n_1$ experimental units corresponding to the
first $n_1$ numbers in (2), and treatment 2 to the units corresponding to the remaining $n_2 = n - n_1$ numbers.
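The three steps of this example can be sketched in Python using only the standard library. The function name `randomize_two_treatments`, the seed, and the values $n = 10$, $n_1 = 5$ are illustrative assumptions.

```python
import random

def randomize_two_treatments(n, n1, seed=None):
    """Completely randomized allocation of two treatments to n units."""
    rng = random.Random(seed)
    units = list(range(1, n + 1))  # step (1): number the units 1..n
    rng.shuffle(units)             # step (2): random permutation of 1..n
    # Step (3): first n1 numbers get treatment 1, the rest get treatment 2.
    return {1: sorted(units[:n1]), 2: sorted(units[n1:])}

allocation = randomize_two_treatments(n=10, n1=5, seed=402)
print(allocation)
```

Every unit receives exactly one treatment, and every split of the units into groups of sizes $n_1$ and $n - n_1$ is equally likely.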
Replications
• An independent application of a treatment to an experimental unit; an
experimental run repeated under the same experimental conditions.
• Why is replication required in experiments? To determine whether
treatment differences are significant, we need to compare the differences
with the experimental error variance.
• The experimental error variance is estimated from the sample variance of
observations obtained from experimental runs repeated under the same
treatment, i.e., replications.
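One way to make the last point concrete is to pool the within-treatment sample variances, weighting each by its degrees of freedom. This is a sketch, assuming a common error variance across treatments; the data and the function name `pooled_error_variance` are made up for illustration.

```python
import statistics

def pooled_error_variance(groups):
    """Estimate the error variance from replicated observations.

    groups maps each treatment to its list of replicate observations.
    """
    df = sum(len(obs) - 1 for obs in groups.values())
    if df <= 0:
        # With one observation per treatment there is nothing to estimate from.
        raise ValueError("no replication: error variance cannot be estimated")
    ss = sum((len(obs) - 1) * statistics.variance(obs)
             for obs in groups.values() if len(obs) > 1)
    return ss / df

data = {"A": [1.0, 2.0, 3.0], "B": [4.0, 6.0, 8.0]}  # made-up observations
sigma2_hat = pooled_error_variance(data)
print(sigma2_hat)
```

Without replication the error degrees of freedom are zero, which is why the function refuses to return an estimate in that case.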