MEI Conference 2013
Using the t distribution
Stella Dudzic
Stella.dudzic@mei.org.uk
Samples of size 5 from $N(175, 15^2)$

If $X \sim N(175, 15^2)$ then $\bar{X} \sim N\left(175, \frac{15^2}{5}\right)$

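[Histogram: values of $\frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$; frequency f (0 to 400) against values from −5 to 5]

[Histogram: values of $\frac{\bar{x} - \mu}{s / \sqrt{n}}$; frequency f (0 to 400) against values from −5 to 5]

What do you notice?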
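The two histograms can be reproduced with a short simulation. The sketch below is an assumption about how such plots might be generated, not the code used for the slides: the number of samples (10,000), the seed and the names `simulate` and `tail_fraction` are illustrative choices. It draws samples of size 5 from N(175, 15²) and compares how often each statistic falls outside ±1.96.

```python
import math
import random
import statistics

MU, SIGMA, N = 175, 15, 5  # population mean, population sd, sample size (from the slide)


def simulate(num_samples=10_000, seed=1):
    """Return (z_values, t_values) for repeated samples of size N from N(MU, SIGMA^2)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_values, t_values = [], []
    for _ in range(num_samples):
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        x_bar = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample sd, divisor n - 1
        z_values.append((x_bar - MU) / (SIGMA / math.sqrt(N)))  # uses known sigma
        t_values.append((x_bar - MU) / (s / math.sqrt(N)))      # uses estimate s
    return z_values, t_values


def tail_fraction(values, cutoff=1.96):
    """Fraction of values lying outside (-cutoff, cutoff)."""
    return sum(abs(v) > cutoff for v in values) / len(values)


z_values, t_values = simulate()
print("outside +/-1.96 using sigma:", tail_fraction(z_values))
print("outside +/-1.96 using s:    ", tail_fraction(t_values))
```

With σ known, about 5% of the values should fall outside ±1.96. With s in place of σ the proportion is noticeably larger, because the statistic then follows a t distribution with 4 degrees of freedom rather than N(0, 1).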
Question: The exam marks for an A-level module were normally distributed last year
with a mean of 61 and a standard deviation of 17. I think the students in my area are
different, but just as variable; that is, I think their marks are normally distributed
with mean μ and standard deviation 17. I get the marks for a random sample of ten
of them and find that the sample mean is 62.9. Find a symmetrical 95% confidence
interval for μ.
Is there evidence to suggest that my students are different?
Solution (with annotations)

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$

$\bar{X} \sim N\left(\mu, \frac{17^2}{10}\right)$

Annotation: Write down the distribution of the sample mean – remember it is a random variable; you know what it turned out to be for this sample but different samples would give different values.

$z = \frac{\bar{x} - \mu}{17 / \sqrt{10}} \sim N(0, 1)$

Annotation: Standardise. You can put in the value of $\bar{x}$ and use your calculator right at the end – this saves on writing and reduces opportunities for errors.

Annotation: Use normal tables and a diagram of the normal curve to find the cut-off points from the N(0, 1) distribution: 1.96 and −1.96 for a 95% confidence interval.

$P\left(-1.96 < \frac{\bar{x} - \mu}{17 / \sqrt{10}} < 1.96\right) = 0.95$

Annotation: Write the information you have as a probability statement.

$-1.96 \times \frac{17}{\sqrt{10}} < \bar{x} - \mu < 1.96 \times \frac{17}{\sqrt{10}}$

Annotation: Rearrange the inequality.

$\bar{x} - 1.96 \times \frac{17}{\sqrt{10}} < \mu < \bar{x} + 1.96 \times \frac{17}{\sqrt{10}}$

Annotation: Continue re-arranging to get an interval for μ. Some people prefer to remember that the format for a confidence interval for μ is $\bar{x} \pm k \frac{\sigma}{\sqrt{n}}$, where k is the number from normal tables.

The 95% CI for μ is (52.4, 73.4)

Annotation: Put numbers in and round sensibly.

My confidence interval contains 61 so it does not provide evidence that my students are different; the mean for all the students in my area could be between 52.4 and 73.4.

Annotation: Interpret in the original context.
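The arithmetic in the final steps can be checked numerically. This is a minimal sketch using Python's standard library in place of normal tables; the function name `z_confidence_interval` and its parameters are illustrative, not part of the handout.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, level=0.95):
    """Symmetrical confidence interval for mu when sigma is known."""
    k = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = k * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Values from the question: sample mean 62.9, sigma 17, sample size 10.
lower, upper = z_confidence_interval(x_bar=62.9, sigma=17, n=10)
print(f"95% CI for mu: ({lower:.1f}, {upper:.1f})")
print("contains 61:", lower < 61 < upper)
```

The interval rounds to (52.4, 73.4) and contains last year's mean of 61, in agreement with the worked solution.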
38 has come up 213 times to end March 2010 but 20 has only come up 148 times. See www.lottery.co.uk/statistics/ for data.
This reference flowchart is one of a series of three, designed by Stella Dudzic.
The series includes: Hypothesis tests for one sample, Hypothesis tests for two samples, and
Experimental Design and Hypothesis tests for several samples: ANOVA (Analysis of Variance)
The series is also available as a set of three full colour posters in A2 size for wall display.
To view the colour posters and to place an order please visit the MEI website at
www.mei.org.uk
You could use a goodness of fit test to check if there is evidence that the lottery is not fair.
Hypothesis tests for one sample (flowchart)

Are the data single variable or bivariate?

Single variable
  Test on mean/median
    Do you have a large sample?
      Yes: Do you know the variance?
        Yes: Normal test
        No: Estimate variance as s² and use Normal test
      No: Are the data from a Normal distribution?
        Yes: Do you know the variance?
          Yes: Normal test
          No: Estimate variance as s² and use t test
        No: What distribution are the data from?
          Poisson: Poisson test
          Symmetrical distribution: Wilcoxon single sample test
          Other: Sign test
  Test on variance
    For Normal population: χ² test for variance
  Test of proportion
    Binomial test or Normal approximation
  Goodness of fit test
    χ² test or Kolmogorov-Smirnov (see fig 3)

Bivariate data
  Are the variables categories or numbers?
    Categories in a contingency table (see fig 1): χ² test
    Number pairs: Are the data from a bivariate Normal distribution? (see fig 2)
      Yes: Pearson’s product moment correlation test
      No: Spearman’s rank correlation test or Kendall’s rank correlation test

fig 1: Contingency table

                Male    Female
right handed     32       28
left handed       7        5

fig 2: With a large set of data, the scatter diagram for a bivariate Normal distribution is approximately elliptical.

fig 3: Test statistic for Kolmogorov-Smirnov test. [Plot of cumulative probability (0 to 1) against x, showing the observed and expected curves with the distance D marked between them]
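The part of the flowchart that deals with a test on a mean or median can be read as a decision procedure. The sketch below is one reading of that branch, written as a function; the name `choose_mean_test`, its arguments and the string labels for `distribution` are hypothetical and do not appear on the poster.

```python
def choose_mean_test(large_sample, variance_known, normal_population=False,
                     distribution="other"):
    """Follow the 'test on mean/median' branch of the one-sample flowchart.

    `distribution` is only consulted for a small sample from a non-Normal
    population; expected values are "poisson", "symmetrical" or "other".
    """
    if large_sample:
        if variance_known:
            return "Normal test"
        return "Estimate variance as s^2 and use Normal test"
    if normal_population:
        if variance_known:
            return "Normal test"
        return "Estimate variance as s^2 and use t test"
    if distribution == "poisson":
        return "Poisson test"
    if distribution == "symmetrical":
        return "Wilcoxon single sample test"
    return "Sign test"


# Small sample, unknown variance, Normal population: the t test case.
print(choose_mean_test(large_sample=False, variance_known=False,
                       normal_population=True))
```

Writing the branch this way makes it clear that the t test is reached by exactly one route: a small sample, an unknown variance and a population that may be assumed Normal.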
Extract from November 2008 MEI Newsletter
t tests in S3
In the specification for S3, the notes about t tests outline when it is appropriate to use
such a test for a mean: “In situations where the sample is small and the population variance
is unknown, but the population may be assumed to have a Normal distribution.”
Answers to some of the questions that students may ask about this are given below.
How small does the sample need to be to use a t test?
There is no exact answer; it depends on the situation, but a reasonable rule of thumb is that
samples of fewer than about 30 are small whereas samples of over 30 are large.
What if the sample is large?
If the population is Normal, the population variance is unknown and the sample size is large
then a Normal test should be used. The population variance is estimated from the sample.
Students learnt this in S2.
What if the population variance is known?
When the population is Normal and the variance is known, a Normal test should be used,
whether the sample is large or small. Students learnt this in S2.
What if the population does not have a Normal distribution?
If the sample size is large then the Central Limit Theorem implies that the sample mean will
be approximately Normally distributed. The Normal test can be used, either with known or
estimated variance.
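The effect of the Central Limit Theorem can be illustrated by simulation. The sketch below is an illustration under stated assumptions, not part of the newsletter: the exponential population (mean 1, standard deviation 1), the sample size of 50, the seed and the function names are all arbitrary choices. It standardises each sample mean and checks how often the result lies within ±1.96.

```python
import math
import random
import statistics


def standardised_means(n, num_samples=10_000, seed=2):
    """Standardised sample means from an exponential population (mean 1, sd 1)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        values.append((x_bar - 1.0) / (1.0 / math.sqrt(n)))
    return values


def coverage(values, cutoff=1.96):
    """Fraction of values lying inside (-cutoff, cutoff)."""
    return sum(abs(v) < cutoff for v in values) / len(values)


large = coverage(standardised_means(n=50))
print("n = 50, fraction within +/-1.96:", large)
```

Although the exponential population is strongly skewed, the fraction for samples of size 50 comes out close to the 0.95 expected under N(0, 1).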
How do we know whether the population has a Normal distribution?
There are techniques for testing whether the underlying population is Normal, for example
using Normal probability paper, a Kolmogorov-Smirnov test or a χ² goodness of fit
test. Only the χ² goodness of fit test is in the S3 syllabus. However, candidates may not
have the data needed to use this test in an examination question. So, when candidates need to
decide what test to use, the examiners will indicate whether the variable can be modelled by
a Normal distribution.
Is it OK to use a t test if the sample is large, the variance is unknown and the population
can be modelled by a Normal distribution?
If the population is not exactly Normal (and, in real life, it rarely is), then it is better to use a
Normal test for a large sample than a t test. For that reason, candidates should always use
Normal tests for large samples, even though the tables available in examinations give
percentage points of the t distribution for n = 50 and n = 100. As can be seen from the
specification (S3I7 page 163), the same rules apply to confidence intervals.
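To see how little the choice matters numerically for large samples, the two-sided 5% percentage points of the t distribution can be compared with the Normal value 1.96. The sketch below computes them from the t density using only the standard library (Simpson's rule for the integral, then bisection); it is a rough numerical illustration, not a replacement for the examination tables, and the function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(df, upper_tail=0.025):
    """Value c with P(T > c) = upper_tail, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid  # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


for df in (4, 9, 29, 50, 100):
    print(df, round(t_critical(df), 3))
```

The values fall from about 2.78 at 4 degrees of freedom towards 1.96 as the degrees of freedom increase; at 50 and 100 they are roughly 2.01 and 1.98.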
t distribution questions
1. Tick the correct statements.
The tails of a t distribution are wider than for the standard Normal distribution.
There is a higher probability of getting a value within one standard deviation of the
mean for a t distribution than for a standard Normal.
You should always use the t distribution when you don’t know the population
variance.
2. A t distribution with which of the following degrees of freedom is closest to a standard
Normal distribution?
0
1
5
10
3. A researcher wants to know whether the Silver Streak javelin generally travels further
than the Golden Arrow javelin. She asks a random sample of athletes to throw both javelins
and measures the distance travelled. She subtracts the Golden Arrow distance from the
Silver Streak distance and will do a hypothesis test with null hypothesis difference = 0. Tick
the important conditions that would indicate she should use a t test.
The distances for the Silver Streak are Normally distributed.
The distances for the Golden Arrow are Normally distributed.
The differences in distance are Normally distributed.
She has a large sample of distances.
There is no correlation between the distance that someone can throw the Silver
Streak and the distance he can throw the Golden Arrow.
Are there any other important conditions not listed above?
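In question 3 the hypothesis test is carried out on the differences. The sketch below shows how the test statistic for such a test on differences is calculated. The eight differences are invented purely for illustration and the function name `paired_t_statistic` is hypothetical; neither comes from the handout.

```python
import math
import statistics

# Hypothetical differences (Silver Streak minus Golden Arrow), in metres.
differences = [1.2, -0.5, 2.1, 0.8, 1.5, -0.3, 0.9, 1.7]


def paired_t_statistic(diffs, mu0=0.0):
    """t statistic and degrees of freedom for H0: mean difference = mu0."""
    n = len(diffs)
    d_bar = statistics.fmean(diffs)
    s = statistics.stdev(diffs)  # sample sd of the differences, divisor n - 1
    return (d_bar - mu0) / (s / math.sqrt(n)), n - 1


t_stat, df = paired_t_statistic(differences)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The statistic would then be compared with the percentage points of the t distribution with n − 1 degrees of freedom.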