Distribution of Sample Means

Chance Models,
Hypothesis Testing, Power
Q560: Experimental Methods in Cognitive Science
Lecture 6
Which is correct in the “real” world?
[Diagram: the unknown population (Stick vs. Switch) and an observed sample drawn from it]
Which is correct in the “real” world?
[Diagram: the same setup, with observed sample outcomes Stick, Switch, Switch]
Probability and Samples
So far, we’ve talked about samples of size 1.
In an experiment, we take a sample of several
observations and try to make generalizations
back to the population
How do we estimate how well the sample we
obtain represents the population?
The distribution of sample means is the set of all
sample means for samples of size n that can be
obtained from a population.
Sample Means
Let’s do an example from a very small
population of 4 scores:
X: 2, 4, 6, 8
Sample Means
We construct a
distribution of sample
means for n=2.
Step 1: Write down all
16 possible samples.
Sample Means
Step 2: Draw the distribution of sample means.
Sample Means
Things to note about the distribution:
1. Mean of sample means = mean of
population.
2. Shape looks normal.
3. We can use this distribution to answer
questions about probabilities.
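The distribution above can be checked directly in code. This Python sketch (not part of the original slides) enumerates all 16 possible samples of size n=2 drawn with replacement from the population {2, 4, 6, 8} and confirms note 1: the mean of the sample means equals the population mean.

```python
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]

# All 16 possible ordered samples of size n=2, drawn with replacement
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

print(len(samples))        # 16
print(mean(sample_means))  # 5.0 -- equals the population mean
print(mean(population))    # 5.0
```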
Central Limit Theorem
• For any population with mean μ and
standard deviation σ, the distribution of
sample means for sample size n will have a
mean of μ and a standard deviation of σ/√n,
and will approach a normal distribution as n
approaches infinity.
Central Limit Theorem
Even though we can’t compute all possible samples
of size n from this population to compare to, the
Central Limit Theorem tells us that for any DSM of
samples of size n:
1) μM = μ
2) σM = σ/√n
3) The DSM approaches a normal distribution as n
approaches infinity.
For large n, the DSM is approximately normal even
if the population was not normally distributed.
Central Limit Theorem
The mean of the distribution of sample means is
called the expected value of M.
The standard deviation of the distribution of
sample means is called the standard error of M.
standard error = σM
Standard deviation: standard distance between a
score X and the population mean μ.
Standard error: standard distance between a
sample mean M and the population mean μ.
Law of Large Numbers
The larger a sample, the better its mean
approximates the mean of the population.
Visualizing sampling distributions and CLT:
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
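A quick simulation (a sketch, not from the slides; the exponential population is an arbitrary choice to get a clearly non-normal shape) illustrates the CLT and the Law of Large Numbers: sample means drawn from a skewed population still cluster around μ with spread σ/√n.

```python
import random
import statistics

random.seed(1)

mu = 5.0           # population mean (for an exponential, sigma = mu = 5)
n = 30             # sample size
num_samples = 20000

# Draw many samples of size n from a skewed (exponential) population
means = [
    statistics.mean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(num_samples)
]

# Mean of the sample means is close to mu;
# their spread is close to sigma / sqrt(n) ~ 0.91
print(statistics.mean(means))
print(statistics.stdev(means), mu / n ** 0.5)
```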
Probability and the DSM
We can use the distribution of sample means to
find out probabilities (= proportions!).
For example: Given a population, how likely is it
to obtain a sample of size n with a certain M?
Probability and the DSM
Example:
SAT scores (μ=500, σ=100).
Take sample n=25.
What is p(M>540)?
p = .0228
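The p = .0228 above can be reproduced with the standard normal CDF. A sketch using only the Python standard library:

```python
from math import sqrt, erf

mu, sigma, n, M = 500, 100, 25, 540

sigma_M = sigma / sqrt(n)           # standard error = 100/5 = 20
z = (M - mu) / sigma_M              # z = (540 - 500)/20 = 2.0

phi = 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z
p = 1 - phi                         # P(M > 540)
print(round(p, 4))                  # 0.0228
```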
Another Example:
SAT scores (μ=500, σ=100).
Take sample n=25.
What range of values for M can be expected 80%
of the time (prediction)?
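For the prediction question we want the middle 80% of the DSM, which leaves 10% in each tail (z ≈ ±1.28). A sketch using Python's statistics.NormalDist:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 500, 100, 25
sigma_M = sigma / sqrt(n)            # standard error = 20

z = NormalDist().inv_cdf(0.90)       # leaves 10% in the upper tail, ~1.28
lower = mu - z * sigma_M
upper = mu + z * sigma_M
print(round(lower, 1), round(upper, 1))   # roughly 474.4 to 525.6
```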
Using the Standard Error
The standard error tells us how much error, on
average, should exist between a sample mean
and the population mean.
As the sample size n increases, the standard error
decreases.
Hypothesis Testing
What is Hypothesis Testing
A hypothesis test uses sample data to evaluate a
hypothesis about a population parameter.
The basic logic of hypothesis testing:
1. State hypothesis about a population.
2. Obtain random sample from population.
3. Compare sample data with population.
- if consistent, accept hypothesis
- if inconsistent, reject hypothesis
An Example:
Basic experimental situation:
Four Steps
The “4 Steps” of Hypothesis Testing:
1. State the hypothesis
2. Set decision criteria
3. Collect data and compute sample
statistic
4. Make a decision (reject / do not reject)
Step 1: State Hypothesis
Step 2: Set Criteria
Consider distribution of sample means if H0 is true.
Divide the distribution into two sections:
1. Sample means likely to be obtained if H0 is
true.
2. Sample means very unlikely to be obtained if
H0 is true.
Step 2: Set Criteria
Distribution of sample means:
Step 2: Set Criteria
Examples for boundaries:
Step 3: Collect Data/Statistics
Select random sample and perform “experiment”.
Compute sample statistic, e.g. sample mean.
Locate sample statistic within hypothesized
distribution (use z-score).
Is sample statistic located within the critical
region?
Step 4: Decision
1. Possibility: sample statistic is within critical
region. Reject H0.
2. Possibility: sample statistic is not within critical
region. Do not reject H0.
We reject or do not reject the null, we cannot
prove the alternate hypothesis
It is easier to demonstrate a hypothesis is false
than to demonstrate that it is true
Hypothesis Testing: An Example
It is known that corn in Bloomington grows to an
average height of μ=72 inches (σ=6) six months
after being planted.
We are studying the effect of “Plant Food 6000” on
corn growth. We randomly select a sample of 40
seeds from the above population and plant them,
using PF-6000 each week for six months. At the end
of the six month period, our sample has a height of
M=78 inches. Go through the steps of hypothesis
testing and draw a conclusion about PF-6000.
1. State hypotheses
2. Chance model/critical region
3. Collect data
4. Decision and conclusion
1. State hypotheses
• Null and alternate in both sentence and
parameter notation
2. Determine critical region in chance model
• Calculate and draw dist of sample means
(DSM)
• Determine alpha level (.05)
• Calculate upper and lower cutoff for means that
will be considered “unlikely” due to chance:
Mcrit = μ ± zcrit·σM
(for α=.05, zcrit=±1.96)
3. Collect data/compute test statistic
(Done for us)
4. Hypothesis Decision and Conclusion
• Does Mobt exceed Mcrit?
• If yes, reject null; if no, cannot reject null
Step 1: State Hypotheses
In words:
Null: PF6000 will not have an effect on corn growth
Alt: PF6000 will have an effect on corn growth
In “code” symbols:
H0: μ = 72
H1: μ ≠ 72
Step 2: Chance Model and Critical Value
a) Distribution of Sample Means:
n = 40
μM = 72
σM = σ/√n = 6/√40 ≈ 0.95
Draw the sampling distribution
b) Set alpha level α = .05 → zcrit = ±1.96
Shade in critical region on sampling distribution
Step 2: Chance Model and Critical Value
c) Compute critical values to correspond to zcrit
Mlower = μ − zcrit·σM = 72 − 1.96(.95) = 72 − 1.86 = 70.14
Mupper = μ + zcrit·σM = 72 + 1.96(.95) = 72 + 1.86 = 73.86
This is the range of means we will tolerate as due to chance
Beyond these values, the obtained sample mean is unlikely to
have come from this expected sampling distribution
Pencil these values onto our sampling distribution
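The two critical values can be computed directly; this sketch mirrors the arithmetic above:

```python
from math import sqrt

mu, sigma, n = 72, 6, 40
z_crit = 1.96                       # two-tailed, alpha = .05

sigma_M = sigma / sqrt(n)           # 6/sqrt(40) ~ 0.95
M_lower = mu - z_crit * sigma_M
M_upper = mu + z_crit * sigma_M
print(round(M_lower, 2), round(M_upper, 2))   # ~70.14 and ~73.86
```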
Step 3: Do Experiment
This is the part where we actually draw the
sample, conduct the experiment, and compute
the sample statistic (mean so far)
For the question, this part has already been done
for us, we just need to compare this obtained
sample mean to our chance model to
determine if any discrepancy between our
sample and the original population is due to:
1. Sampling Error
2. A true effect of our manipulation
Step 4: Decision and Conclusion
• Mcrit is 70.14 (lower) or 73.86 (upper)
• If Mobt exceeds either of these critical values
(i.e., is out of the “chance” range), we reject
H0. Otherwise, we cannot reject H0.
Mobt = 78
Mcrit = 73.86
Mobt exceeds Mcrit → Reject H0
Conclusion: We must reject the null hypothesis
that the chemical does not produce a
difference. Conclude that PF6000 has an effect
on corn growth.
Directional Tests
Directional = one-tailed
In a one-tailed test the hypotheses make a
statement about the expected direction of an
effect.
Example: experimental test of dietary drug
(expected: reduction in food intake)
H0: no reduction in food intake
H1: food intake is reduced
Errors and Uncertainty
Errors and Uncertainty
A hypothesis test may produce an erroneous
result (wrong decision).
Two types of errors can be made …
Type I Error: Concluding there is an effect
when there really is not
Type II Error: Concluding there is no effect
when there really is
Errors and Uncertainty
Type I error: H0 is rejected, while in fact the
treatment has no effect.
Example: Experimental treatment (behavior,
drug, etc.) has actually no effect, but sample
data make it look that way (due to sampling
error).
The alpha level is the probability that the test will
lead to a Type I error.
Researcher controls the magnitude of Type I error
by setting α.
Errors and Uncertainty
Type II error: Treatment effect really exists but
hypothesis test fails to detect it.
Example: treatment effect may be small
Symbol: β
Summary of possible outcomes of a
statistical decision (Experimenter’s Decision vs. Real World):
• Reject H0 (effect) when H0 is true (no effect): α (Type I Error)
• Reject H0 (effect) when H0 is false (real effect): 1−β (Power)
• Retain H0 (no effect) when H0 is true (no effect): 1−α (PCR)
• Retain H0 (no effect) when H0 is false (real effect): β (Type II Error)
Statistical Power
Another way of defining power:
Power is the probability of obtaining sample
data in the critical region when H0 is actually
false.
“Probability of detecting an effect if indeed one
exists”
Power is difficult to specify because it depends
in part on the magnitude of any treatment
effect.
Example …
Power, if treatment effect is 20 points:
Power, if treatment effect is 40 points:
Factors Affecting Power
1. Alpha (lowering α reduces power)
2. Sample size (increasing n increases power b/c
the standard error goes down)
3. Effect size (the bigger the effect, the greater
the power b/c distance between distributions is
bigger)
4. Tails (a one-tailed hypothesis test is more
powerful than a two-tailed hypothesis test)
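These factors can be made concrete for a two-tailed z-test. The sketch below is an illustration (it reuses the corn example's numbers with hypothetical true effects); it computes power as the probability that M lands in the critical region when the true mean differs from μ0.

```python
from math import sqrt
from statistics import NormalDist

def power_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a two-tailed z-test: P(M in critical region | mu_true)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)
    dsm = NormalDist(mu_true, se)     # DSM if the effect is real
    # Probability of landing beyond either critical value
    return dsm.cdf(mu0 - z_crit * se) + (1 - dsm.cdf(mu0 + z_crit * se))

# Increasing n increases power; a larger effect also increases power
print(power_z(72, 75, 6, 10))
print(power_z(72, 75, 6, 40))
print(power_z(72, 78, 6, 40))
```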
p and α
Sample means located in the critical region have
p < α (reject H0).
Sample means located outside of the critical
region have p > α (retain H0).
Why not z-test: An Example
It is thought that we are genetically hardwired to
recognize human faces.
In a preferential looking paradigm, newborns are
presented with two stimuli: one representing a
face, and one containing the same features, but
in a different configuration. The experimenter
records how long the infants look at the face
stimulus during a 60-sec presentation (let’s
assume they always look at one or the other).
By chance, we would only expect them to look at
the face stimulus for 30 seconds, but they look
for 35 seconds…is this effect significant?
Sample Variance
We don’t know the variability of the population.
But: we do know the variability of the sample.
Sample variance = s² = SS/(n−1) = SS/df
Sample standard deviation = √s²
Estimated Standard Error
We can use the estimated standard error as an
estimate of the real standard error.
Standard error = σM = σ/√n
Estimated standard error = sM = s/√n = √(s²/n)
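As a small check, the estimated standard error can be computed from raw scores; this sketch uses a made-up four-score sample:

```python
from math import sqrt

def estimated_standard_error(scores):
    n = len(scores)
    m = sum(scores) / n
    ss = sum((x - m) ** 2 for x in scores)  # sum of squared deviations
    s2 = ss / (n - 1)                       # sample variance, df = n - 1
    return sqrt(s2 / n)                     # s_M = sqrt(s^2/n) = s/sqrt(n)

scores = [2, 4, 6, 8]                       # hypothetical sample
print(estimated_standard_error(scores))     # sqrt((20/3)/4) ~ 1.29
```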
t-statistic
Substituting the estimated standard error in the
formula for the z-score gives us the following:
t statistic = t = (M − μ)/sM
The t-statistic approximates a z-score, using the
sample variance instead of the population variance
(which is unknown).
How well does that work?
Degrees of Freedom and t Statistic
Degrees of freedom describes the number of
scores in a sample that are free to vary.
degrees of freedom = df = n-1
The greater df, the better the t-statistic
approximates the z-score.
The set of t statistics for a given df (n) forms a t
distribution.
For large df (large n) the t distribution
approximates the normal distribution.
t distribution: Shape
Hypothesis Tests Using the t Statistic
Same procedure as with z-scores, except using the
t statistic instead.
Step 1: State hypothesis, in terms of population
parameter μ.
Step 2: Determine critical region, using α, df, and
looking up tcrit.
Step 3: Collect data and calculate value for t using
estimated standard error.
Step 4: Decide, based on whether t value for
sample falls within critical region
One-Sample t Test: An Example
We’ll go back to our preferential looking paradigm
and newborn babies. We show them the two
stimuli for 60 seconds, and measure how long they
look at the facial configuration. Our null
assumption is that they will not look at it for
longer than half the time: μ ≤ 30
Our alternate hypothesis is that they will look at
the face stimulus longer b/c face recognition is
hardwired in their brain, not learned (directional)
Our sample of n = 26 babies looks at the face
stimulus for M = 35 seconds, s = 16 seconds
Test our hypotheses (α = .05, one-tailed)
Step 1: Hypotheses
Sentence:
Null: Babies look at the face stimulus for less than
or equal to half the time
Alternate: Babies look at the face stimulus for
more than half the time
Code Symbols:
H0: μ ≤ 30
H1: μ > 30
Step 2: Determine Critical Region
Population variance is not known, so use sample
variance to estimate
n = 26 babies; df = n-1 = 25
Look up values for t at the limits of the critical
region from our critical values of t table
Set α = .05; one-tailed
tcrit = +1.708
Step 3: Calculate t statistic from sample
a) Sample variance: s² = 16² = 256
b) Estimated standard error: sM = √(s²/n) = √(256/26) ≈ 3.14
c) t statistic: t = (M − μ)/sM = (35 − 30)/3.14 ≈ 1.59
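The Step 3 arithmetic can be reproduced in a few lines (a sketch using only the standard library):

```python
from math import sqrt

mu0, M, s, n = 30, 35, 16, 26

s_M = sqrt(s ** 2 / n)       # estimated standard error ~ 3.14
t = (M - mu0) / s_M          # ~ 1.59
df = n - 1                   # 25
print(round(s_M, 2), round(t, 2), df)
```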
Step 4: Decision and Conclusion
The tobt = 1.59 does not exceed tcrit = 1.708
→ We must retain the null hypothesis
Conclusion: Babies do not look at the face
stimulus longer than expected by chance, t(25) = +1.59,
n.s., one-tailed. Our results do not support the
hypothesis that face processing is innate.