Exam
• Exam starts two weeks from today
Amusing Statistics
• Use what you know about normal
distributions to evaluate this finding:
The study, published in Pediatrics, the journal of the American Academy
of Pediatrics, found that among the 4,508 students in Grades 5-8 who
participated, 36 per cent reported excellent school performance, 38 per
cent reported good performance, 20 per cent said they were average
performers, and 7 per cent said they performed below average.
Review
• The Z-test is used to compare the mean
of a sample to the mean of a population
$$Z_{\bar{X}} = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$$
and
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
Review
• The Z-score is normally distributed
• Thus the probability of obtaining any
given Z-score by random sampling is
given by the Z table
Review
• We can likewise determine critical
values for Z such that we would reject
the null hypothesis if our computed
Z-score exceeds these values
– For alpha = .05:
• Zcrit (one-tailed) = 1.645
• Zcrit (two-tailed) = 1.96
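The critical values above can be recovered from the standard normal distribution rather than read from a Z table. The following is a minimal sketch using Python's standard library; the function names `z_critical` and `z_statistic` are illustrative, not part of the lecture.

```python
from statistics import NormalDist


def z_critical(alpha: float, two_tailed: bool) -> float:
    """Return the positive critical Z value for significance level alpha."""
    # A two-tailed test splits alpha evenly between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


def z_statistic(sample_mean: float, pop_mean: float, pop_sd: float, n: int) -> float:
    """Z = (sample mean - population mean) / standard error of the mean."""
    standard_error = pop_sd / n ** 0.5
    return (sample_mean - pop_mean) / standard_error


one_tailed = z_critical(0.05, two_tailed=False)
two_tailed = z_critical(0.05, two_tailed=True)
print(round(one_tailed, 3), round(two_tailed, 3))
```

A computed Z-score is then compared against the matching critical value: reject the null hypothesis when the score is more extreme than Zcrit.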
Confidence Intervals
• A related question you might ask:
– Suppose you’ve measured a mean and
computed a standard error of that mean
– What is the range of values such that there
is a 95% chance of the population mean
falling within that range?
Confidence Intervals
• There is a 2.5% chance that the population mean is
actually more than 1.96 standard errors above the
observed mean
[Figure: Gaussian (Normal) Distribution. x-axis: score (-4 to 4); y-axis: probability (0 to 0.6). The 2.5% upper tail beyond 1.96 is marked, with 95% below it. Annotation: "True mean?"]
Confidence Intervals
• There is a 2.5% chance that the population mean is
actually more than 1.96 standard errors below the
observed mean
[Figure: Gaussian (Normal) Distribution. x-axis: score (-4 to 4); y-axis: probability (0 to 0.6). The 2.5% lower tail beyond -1.96 is marked, with 95% above it. Annotation: "True mean?"]
Confidence Intervals
• Thus there is a 95% chance that the true population
mean falls within + or - 1.96 standard errors from a
sample mean
• Likewise, there is a 95% chance that the true
population mean falls within + or - 1.96 standard
deviations from a single measurement
Confidence Intervals
• This is called the 95% confidence interval…and it is
very useful
• It works like significance bounds…if the 95% C.I.
doesn’t include the mean of a population you’re
comparing your sample to, then your sample is
significantly different from that population
Confidence Intervals
• Consider an example:
• You measure the concentration of mercury in your
backyard to be .009 mg/kg
• The concentration of mercury in the Earth’s crust is
.007 mg/kg. Let’s pretend that, when measured at
many sites around the globe, the standard deviation
is known to be .002 mg/kg
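Confidence Intervals
$$x_{\text{backyard}} = .009 \text{ mg/kg}$$
$$\sigma = .002 \text{ mg/kg}$$
• The 95% confidence interval
for this mercury
measurement is
$$95\%\ \text{C.I.} = x \pm Z_{\text{crit (two-tailed)}} \times \sigma = .009 \pm 1.96 \times .002 \text{ mg/kg} = .0051 \text{ to } .0129 \text{ mg/kg}$$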

Confidence Intervals
• This interval includes .007 mg/kg which, it turns out,
is the mean concentration found in the earth’s crust in
general
$$.0051 < .007 < .0129$$
• Thus you would conclude that your backyard isn’t
artificially contaminated by mercury
Confidence Intervals
• Imagine you take 25 samples from around Alberta
and you found:
$$\bar{X} = .009 \text{ mg/kg}$$
$$\sigma = .002 \text{ mg/kg}$$
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{.002}{\sqrt{25}} = .0004$$


• .009 +/- (1.96 x .0004) = .008216 to .009784
• This interval doesn’t include the .007 mg/kg value for
the earth’s crust so you would conclude that Alberta
has an artificially elevated amount of mercury in the
soil
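Both mercury examples follow the same arithmetic, so they can be checked with one small function. This is a sketch using only the numbers given in the slides; the helper names `confidence_interval` and `contains` are illustrative assumptions, and the population standard deviation is treated as known.

```python
from statistics import NormalDist


def confidence_interval(mean, sd, n=1, level=0.95):
    """Two-tailed confidence interval around a mean, with known population sd."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z_crit * sd / n ** 0.5  # Zcrit times the standard error
    return mean - margin, mean + margin


def contains(interval, value):
    """True if value lies inside the closed interval."""
    low, high = interval
    return low <= value <= high


crust_mean = 0.007  # mg/kg, mean concentration in the Earth's crust

backyard = confidence_interval(0.009, 0.002, n=1)   # single measurement
alberta = confidence_interval(0.009, 0.002, n=25)   # mean of 25 samples

print(backyard, contains(backyard, crust_mean))
print(alberta, contains(alberta, crust_mean))
```

The single backyard measurement gives an interval wide enough to include .007 mg/kg, while the 25-sample interval is five times narrower and excludes it.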
Power
• We perform a Z-test and determine that
the difference between the mean of our
sample and the mean of the population
is unlikely to be due to chance (p < .05)
• We say that we have a significant
result…
• But what if p > .05?
Power
• What are the two reasons why p comes
out greater than .05?
– Your experiment lacked Statistical Power
and you made a Type II Error
– The null hypothesis really is true
Power
• Two approaches:
– The Hopelessly Jaded Grad Student
Solution
– The Wise and Well Adjusted Professor
Procedure
Power
1. Hopelessly Jaded Grad Student
Solution - conclude that your hypothesis
was wrong and go directly to the grad
student pub
Power
- This is not the recommended course
of action
Power
2. The Wise Professor Procedure - consider
the several reasons why you
might not have detected a significant
effect
Power
- recommended by wise professors the
world over
Power
• Why might p be greater than .05 ?
• Recall that:
$$Z_{\bar{X}} = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$$
and
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
Power
• Why might p be greater than .05 ?
1. Small effect size:
$\bar{X}$ is quite close to the mean of the population
– The effect doesn’t stand out from the
variability in the data
– You might be able to increase your effect
size (e.g. with a larger dose or treatment)
Power
• Why might p be greater than .05 ?
2. Noisy Data
$\sigma$, and therefore $\sigma_{\bar{X}}$, is quite large
– A large denominator will swamp the small
effect
– Take greater care to reduce measurement
errors
Power
• Why might p be greater than .05 ?
3. Sample Size is Too Small
$\sigma_{\bar{X}}$ is quite large because $n$ is small
– A large denominator will swamp the small
effect
– Run more subjects

Power
• The solution in each case is more
power:
• Power is like sensitivity - the ability to
detect small effects in noisy data
• It is the complement of the Type II Error rate: power = 1 - β, where β is the probability of a Type II Error
• So that you know: there are equations
for computing statistical power
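One such equation, for the simple case of a one-tailed (upper) Z-test with a known population standard deviation, can be sketched as follows. The function name `z_test_power` and all the numbers in the demonstration are hypothetical, chosen only to show how each of the three remedies (larger effect, less noise, more subjects) raises power.

```python
from statistics import NormalDist


def z_test_power(effect, pop_sd, n, alpha=0.05):
    """Power of a one-tailed (upper) Z-test.

    effect is the true difference between the sampled population's mean
    and the null-hypothesis mean, in the same units as pop_sd.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    standard_error = pop_sd / n ** 0.5
    # Probability that the computed Z exceeds z_crit when the true shift is `effect`.
    return 1 - std_normal.cdf(z_crit - effect / standard_error)


# Hypothetical baseline: small effect, noisy data, few subjects.
baseline = z_test_power(effect=0.5, pop_sd=2.0, n=16)
bigger_effect = z_test_power(effect=1.0, pop_sd=2.0, n=16)
less_noise = z_test_power(effect=0.5, pop_sd=1.0, n=16)
more_subjects = z_test_power(effect=0.5, pop_sd=2.0, n=64)

print(baseline, bigger_effect, less_noise, more_subjects)
```

With no true effect the function returns alpha itself, which is the Type I Error rate rather than power in any useful sense.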
Power
• An important point about power and the
null hypothesis:
– Failing to reject the null hypothesis DOES
NOT PROVE it to be true!!!
Power
• Consider an example:
– How to prove that smoking does not cause
cancer:
• enroll 2 people who smoke infrequently and
use an antique X-Ray camera to look for
cancer
• Compare the mean cancer rate in your group
(which will probably be zero) to the cancer rate
in the population (which won’t be) with a Z-test
Power
• Consider an example:
– If p came out greater than .05, you still
wouldn’t believe that smoking doesn’t
cause cancer
– You will, however, often encounter
statements such as “The study failed to
find…” misinterpreted as “The study
proved no effect of…”
Experimental Design
• We’ve been using examples in which a
single sample is compared to a
population
• Often we employ more sophisticated
designs
• What are some different ways you could
run an experiment?
Experimental Design
• Compare one mean to some value
– Often that value is zero
• Compare two means to each other
Experimental Design
• There are two general categories of
comparing two (or more) means with
each other
Experimental Design
1. Repeated Measures - also called
“within-subjects” comparison
• The same subjects are given pre- and post-measurements
• e.g. before and after taking a drug to lower
blood pressure
• Powerful because variability between
subjects is factored out
• Note that pre- and post- scores are linked -
we say that they are dependent
• Note also that you could have multiple tests
Experimental Design
1. Problems with Repeated-Measure
design:
• Practice/Temporal effect - subjects get
better/worse over time
• The act of measuring might preclude
further measurement - e.g. measuring
brain size via surgery
• Practice effect - subjects improve with
repeated exposure to a procedure
Experimental Design
2. Between-Subjects Design
• Subjects are randomly assigned to treatment
groups - e.g. drug and placebo
• Measurements are assumed to be statistically
independent
Experimental Design
2. Problems with Between-Subjects
design
• Can be less powerful because variability
between two groups of different subjects can
look like a treatment effect
• Often needs more subjects
Experimental Design
• We’ll need some statistical tests that
can compare:
– One sample mean to a fixed value
– Two dependent sample means to each
other (within-subject)
– Two independent sample means to each
other (between-subject)
Experimental Design
• The t-test can perform each of these
functions
• It also gets around a big problem with
the Z-test…
Problems with Z
and what to do instead
The Z statistic
• The Z statistic (which we compare to
Zcrit)
$$Z_{\bar{X}} = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$$
where
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
The Z statistic
• What is the problem you will encounter
in trying to use this statistic?
• Although you might have a guess about
the population mean, you will almost
certainly not know the population
variance!
The Z statistic
• What to do?
• Could we estimate $\sigma^2$?
• What would we use and what would
have to be the case for it to be useful?

• We could use our sample variance, $S^2$,
to estimate the population variance $\sigma^2$

Estimating Population
Variance
• Just like there are many sample means
(the sampling distribution of the mean)
there are many $S^2$s
• $\bar{X}$ tends to be near the value of $\mu$,
but does $S^2$ tend to be near the value of $\sigma^2$?

• No. It is a biased estimator. It tends to
be lower than $\sigma^2$

Estimating Population
Variance
• Why is $S^2$ biased?
• The sum of the deviation scores in your
sample must equal zero regardless of
where they came from in the population
• This means that the deviations in your
sample are somewhat more constrained
than in the population
• $S^2$ has relatively fewer degrees of
freedom than the entire population
Estimating Population
Variance
• Specifically $S^2$ has n - 1 degrees of
freedom
• So if we compute $S^2$ but use n - 1
instead of n in the denominator we’ll get
an unbiased estimator of $\sigma^2$
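The bias can be seen in a small simulation. This is an illustrative sketch, not part of the lecture: it assumes a standard normal population (true variance 1), samples of size 5, and a fixed random seed, and all function names are hypothetical.

```python
import random


def variance_with_denominator(sample, denominator):
    """Sum of squared deviations from the sample mean, divided by `denominator`."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / denominator


def average_variance_estimates(n=5, trials=20000, seed=0):
    """Average the n-denominator and (n - 1)-denominator estimates over many samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # population variance is 1
        total_n += variance_with_denominator(sample, n)
        total_n_minus_1 += variance_with_denominator(sample, n - 1)
    return total_n / trials, total_n_minus_1 / trials


biased, unbiased = average_variance_estimates()
print(biased, unbiased)
```

With n = 5 the n-denominator average settles near (n - 1)/n = 0.8 of the true variance, while the n - 1 version settles near 1.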
Estimating Population
Variance
• Of course if you’ve already computed $S^2$
using n in the denominator you can
multiply by n to recover the sum of
squared deviations and then divide by
n - 1
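The conversion described above is a two-step calculation. Here is a minimal sketch on made-up scores (the data are hypothetical); `statistics.pvariance` divides by n and `statistics.variance` divides by n - 1, so the latter serves as a check.

```python
from statistics import pvariance, variance

scores = [4.0, 7.0, 6.0, 3.0, 5.0]  # hypothetical sample, for illustration only
n = len(scores)

s_squared = pvariance(scores)                # S^2 computed with n in the denominator
sum_squared_deviations = n * s_squared       # multiply by n to undo the division
sigma_hat_squared = sum_squared_deviations / (n - 1)  # then divide by n - 1

print(s_squared, sigma_hat_squared, variance(scores))
```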
The t Statistic(s)
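• Using an estimated $\sigma^2$, which we’ll
call $\hat{\sigma}^2$, we can create an estimate of $\sigma_{\bar{X}}$,
which we’ll call $\hat{\sigma}_{\bar{X}}$
$$\hat{\sigma}_{\bar{X}} = \sqrt{\frac{\hat{\sigma}^2}{n}}$$
where
$$\hat{\sigma}^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{nS^2}{n - 1}$$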
The t Statistic(s)
• Using $\hat{\sigma}_{\bar{X}}$ instead of $\sigma_{\bar{X}}$, we get a
statistic that isn’t from a normal (Z)
distribution - it is from a family of
distributions called t
$$t_{n-1} = \frac{\bar{X} - \mu_{\bar{X}}}{\hat{\sigma}_{\bar{X}}}$$

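As a rough sketch of how the t statistic is computed in practice, the following uses only Python's standard library. The function name `t_statistic`, the sample scores, and the comparison mean are all hypothetical. Python's standard library has no t distribution, so the critical value still has to come from a t table with n - 1 degrees of freedom.

```python
from statistics import mean, stdev


def t_statistic(sample, pop_mean):
    """Return (t, degrees of freedom) for one sample mean against a fixed value."""
    n = len(sample)
    # stdev() uses n - 1 in the denominator, i.e. the unbiased variance estimate.
    estimated_standard_error = stdev(sample) / n ** 0.5
    t = (mean(sample) - pop_mean) / estimated_standard_error
    return t, n - 1


scores = [4.0, 7.0, 6.0, 3.0, 5.0]  # hypothetical sample, for illustration only
t, df = t_statistic(scores, pop_mean=3.0)
print(t, df)
```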