
Stat 512: Lecture 1
A Brief Review of Inference for Two-Sample
Means and Some Design Vocabulary
Since the idea of the class is the ANALYSIS of DESIGNED
EXPERIMENTS, I would like to focus on the three key
words.
1. Analysis
2. Design
3. Experiment
I am going to assume that you ALL have a working
knowledge of the following:
1. Random Variable
2. Mean and Variance
3. Calculating mean and variance
4. Expectation
5. The Normal Distribution
6. Looking up tables for Normal, t, chi-square and F
I would like us all to design the following simple
experiments for me:
1. Interested in seeing if there is a difference in average
prices between IGA and Safeway.
2. Interested in seeing if there is a difference in the
mean GRE Quantitative Scores between males and
females among students in the 512 class.
Study 1:
DESIGNING THE STUDY
Here we probably want to pick 10 items (say) at random
that are available in each store and get the prices from
both stores:
Example of a possible Layout:
Item                        Price at IGA    Price at Safeway
2% Milk
Yukon Potatoes 1 lb
Walla Walla Onions 1 lb
Fresh Express Salad 1 lb
Tyson Whole Chicken 2 lb
Mott Apple Juice 1 L
Ritz Crackers 2 lb
Oreo Cookies 1 lb
Atlantic Salmon 1 lb
Dannon Yogurt 1 lb
So here the data are paired and we would perform a
PAIRED t test. (Pairing on each item)
How to conduct the ANALYSIS:
Hypothesis:
$H_0: \mu_d = 0$
$H_1: \mu_d \neq 0$
To find the test statistic we first take the differences (price
at IGA minus price at Safeway). From the differences, calculate
$\bar{d}$ (the mean of the differences) and $s_d$ (the standard
deviation of the differences).
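The test statistic is
$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}},$$
where $\mu_d$ is the hypothesized mean difference under $H_0$ and
$n$ is the number of pairs.
Reject $H_0$ if the observed $|t| > t(\alpha/2,\, n-1)$.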
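The paired analysis can be sketched in a few lines of Python. This is only an illustration: the prices are hypothetical values made up for the example (they are not data from the lecture), the function name `paired_t` is my own, and the critical value assumes a two-sided test at the 0.05 level, read from a standard t table for 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical prices in dollars, made up for illustration only.
iga     = [3.49, 1.29, 0.99, 3.99, 7.49, 2.79, 4.29, 3.99, 9.99, 2.49]
safeway = [3.29, 1.19, 1.09, 3.79, 6.99, 2.99, 3.99, 3.79, 9.49, 2.39]

def paired_t(x, y, mu_d0=0.0):
    """Return (d_bar, s_d, t) for the paired differences x[i] - y[i]."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(x, y)]       # one difference per item
    n = len(d)
    d_bar = statistics.mean(d)              # mean of the differences
    s_d = statistics.stdev(d)               # sample SD (divisor n - 1)
    t = (d_bar - mu_d0) / (s_d / math.sqrt(n))
    return d_bar, s_d, t

d_bar, s_d, t = paired_t(iga, safeway)

# Assumed critical value: t(0.025, 9) from a standard t table.
T_CRIT = 2.262
reject = abs(t) > T_CRIT
print(f"d_bar={d_bar:.3f}  s_d={s_d:.3f}  t={t:.3f}  reject H0: {reject}")
```

The sketch uses a table value rather than a library quantile function, in keeping with the table lookups assumed as background for this class.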
Study 2:
DESIGNING THE STUDY
Here we randomly pick 5 males and 5 females from
the class and get their GRE Quant Scores.
Here there is no pairing done among the males and
females.
Layout:
Males GRE Score    Females GRE Score
This is not paired and so we consider the data
independent.
Analyzing the DATA:
Hypothesis:
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
We will also be given (or we can calculate) the sample
means $\bar{y}_1$, $\bar{y}_2$, the standard deviations $s_1$, $s_2$,
and the sample sizes $n_1$, $n_2$.
RECALL: HERE we assume that both the populations
have equal variance.
Calculate
$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}.$$
This is the “pooled” variance. Then sp is the “pooled”
standard deviation.
Define the pooled t-statistic as follows.
$$t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
This follows a t distribution with (n1+n2-2) degrees of
freedom.
Reject $H_0$ in favor of $H_1: \mu_1 \neq \mu_2$ if the observed
$|t| > t(\alpha/2,\, n_1+n_2-2)$.
This is essentially a review for all of you. But I want you
to think of things from a DESIGN perspective now.
1. WHY did we randomly select the items or students?
2. WHY did we use more than one item in our study?
3. WHAT advantage does pairing in STUDY 1 give us?
The answers to these questions lead to the basic tenets of
experimental design as suggested by Sir RA Fisher.
1. Randomization
2. Replication
3. Local Control
Randomization:
This is the procedure of selecting units at random from
available units or assigning units to treatments at random.
This reduces bias.
Replication:
This means using more than one unit for each treatment in the
comparison. This lets us estimate experimental error and
reduces bias.
Local Control:
It's the process of stratifying the units into homogeneous
groups or blocks and assigning treatments at random
within each homogeneous group. This reduces bias and
reduces experimental error.
In ANY Design context think of these three tenets.
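As a minimal sketch of how the three tenets can show up together in a design, the following Python assigns treatments at random (randomization) to several units per treatment (replication) separately within each block (local control). The function name `randomize_within_blocks`, the block labels, and the field names are all hypothetical, chosen only for this example.

```python
import random

def randomize_within_blocks(blocks, treatments, seed=None):
    """Randomly assign treatments to units, separately within each block.

    blocks: dict mapping a block label to a list of unit labels.
    Each block must hold a whole number of replicates of every treatment.
    Returns a dict mapping unit -> (block, treatment).
    """
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError(f"block {block!r} cannot be split evenly")
        reps = len(units) // len(treatments)
        labels = list(treatments) * reps    # replication within the block
        rng.shuffle(labels)                 # randomization within the block
        for unit, trt in zip(units, labels):
            assignment[unit] = (block, trt)
    return assignment

# Hypothetical example: eight fields grouped into two homogeneous blocks.
blocks = {
    "north": ["field1", "field2", "field3", "field4"],
    "south": ["field5", "field6", "field7", "field8"],
}
plan = randomize_within_blocks(blocks, ["variety A", "variety B"], seed=512)
for unit, (block, trt) in plan.items():
    print(unit, block, trt)
```

Fixing `seed` only makes the plan reproducible for record keeping; the assignment is still random with respect to the units.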
Some Definitions and Vocabulary in the context of
DESIGN:
1. Factor, Levels, Treatment:
Factor: any substance or item whose effect on the response is
to be studied. An experiment involving two or more
factors is called a factorial experiment.
Levels: values of the factor used in the experiment. The
levels of a factor are the specific types or amounts of the
factor that will actually be used in the experiment.
For example, in an experiment to assess the effects of
different amounts of UV radiation upon the growth rate of
smolt, the UV radiation was held at normal, 1/2 normal,
and 1/5 normal levels. These would be the three levels of
the factor UV Radiation, and we could call them
TREATMENTS.
2. UNIT:
Experimental Unit: the unit to which the treatment is
applied.
Observational unit (or Measurement unit): the unit
on which the response is measured. In some cases, the
observational unit may be different from the
experimental unit - be careful!
CAUTION: A common mistake in the analysis of
experimental data is to confuse the experimental and
observational unit.
For example, consider an experiment to investigate the
effects of UV levels on the growth of smolt. Two tanks are
prepared; one tank has high levels of UV light, the second
tank has no UV light. Many fish are placed in each tank.
The individual fish are measured. In this experiment, the
observational unit is the smolt, but the experimental unit
is the tank. The treatments are NOT individually
administered to single fish.
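One way to keep the two kinds of unit straight in an analysis is to collapse the fish-level measurements to one value per tank before comparing treatments. The sketch below assumes a one-factor design with tanks as experimental units; the weights are hypothetical values made up for illustration, and the function name `summarize_by_experimental_unit` is my own.

```python
import statistics
from collections import defaultdict

# Hypothetical records: (tank, treatment, fish weight in grams). Made-up values.
records = [
    ("tank1", "high UV", 41.0), ("tank1", "high UV", 39.5), ("tank1", "high UV", 40.2),
    ("tank2", "no UV", 44.1), ("tank2", "no UV", 45.3), ("tank2", "no UV", 43.8),
]

def summarize_by_experimental_unit(records):
    """Collapse fish-level measurements to one mean per tank."""
    by_tank = defaultdict(list)
    treatment_of = {}
    for tank, treatment, weight in records:
        by_tank[tank].append(weight)
        treatment_of[tank] = treatment
    return {tank: (treatment_of[tank], statistics.mean(w))
            for tank, w in by_tank.items()}

tank_means = summarize_by_experimental_unit(records)
n_units = len(tank_means)                                  # experimental units
n_treatments = len({trt for trt, _ in tank_means.values()})
error_df = n_units - n_treatments   # df for error among identically treated tanks

print(tank_means)
print("fish measured:", len(records),
      "| experimental units:", n_units,
      "| error df:", error_df)
```

With one tank per treatment, the error degrees of freedom at the tank level are zero no matter how many fish are measured, which is why treating each fish as an experimental unit overstates the amount of replication.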
3. Block: A homogeneous group of units is a block.
4. Replicate: The multiple units used for a treatment in the
experiment are the replicates.
5. Response Variable: The outcome that is being
measured. For example, in an experiment to measure
smolt growth in response to UV levels, the response
variable for each smolt could be final weight after 30
days.
6. Experimental error is the variation among
identically treated experimental units.
Terminology
Types of Studies
Comparative experimental studies are experiments in
which the treatments or conditions are assigned by the
researcher to the experimental units.
Comparative observational studies are studies in
which the treatments or conditions are observed, not assigned, by the
researcher on the units.
Examples: Let’s figure out the following for our two studies
Study 1: Comparing prices at IGA and Safeway
1. Factor
2. Level
3. Treatment
4. Response Variable
5. Block
6. Replicate
7. Whether it’s an experiment or an observational study
Study 2: Comparing male and female GRE Quant scores
1. Factor
2. Level
3. Treatment
4. Response Variable
5. Block
6. Replicate
7. Whether it’s an experiment or an observational study
a. An agricultural experimental station is going to test two varieties of
wheat. Each variety will be planted on 3 fields, and the yield from the
field will be measured.
1. Factor
2. Level
3. Treatment
4. Response Variable
5. Block
6. Replicate
7. Whether it’s an experiment or an observational study
b. An agricultural experimental station is going to test two varieties of
wheat.
Each variety will be tested with two types of fertilizers. Each
combination will be applied to two plots of land. The yield will be
measured for each plot.
1. Factor
2. Level
3. Treatment
4. Response Variable
5. Block
6. Replicate
7. Whether it’s an experiment or an observational study
c. Fish farmers want to study the effect of an anti-bacterial drug on the
amount of bacteria in fish gills. The drug is administered at three dose
levels (none, 20, and 40 mg/100L). Each dose is administered to a large
controlled tank through the filtration system. Each tank has 100 fish. At
the end of the experiment, the fish are killed, and the amount of bacteria
in the gills of each fish is measured.
1. Factor
2. Level
3. Treatment
4. Response Variable
5. Block
6. Replicate
7. Whether it’s an experiment or an observational study