Making a statistical inference at Level 7

Teaching activities towards
Achievement Standard 2.9
Use statistical methods to make an
inference.
Lindsay Smith, University of Auckland Stats Day 2011
What is new/changed?
• Use of exploratory data analysis.
• Statistical inference comparing two
populations.
• Informal confidence intervals for population
medians.
• Sampling variability.
• Using relevant contextual knowledge (given).
Historical development
• Replaces making an inference about a single
population
• Extends the curriculum material developed by
Chris Wild and his team at the University of
Auckland
Approaches
The approach you take will depend on
• Course offered (with maths or without)
• Time allowed for the topic
• Incorporating Stat Lit (reports) material
• Background of students
• Access to ICT
Key ideas 1
Sampling Variability
• Every sample contains sampling error due to the
sampling process
• Other, non-sampling errors may also be present because
of the sampling method used (bias)
• Developing an understanding that confidence in the
estimate will vary depending on factors such as sample
size, sampling method, the nature of the underlying
population, sources of bias.
• Experiencing evidence for the central limit theorem by
simulating samples and comparing the distribution of
sample medians for samples of different sizes.
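The simulation described in the last bullet can be sketched in Python. This is a minimal sketch under stated assumptions: the population is synthetic (10,000 right-skewed values drawn from a gamma distribution, standing in for something like schoolbag weights), and the sample sizes 15, 30 and 60 with 200 samples each are illustrative choices. The helper names `sample_medians` and `iqr` are hypothetical and not part of any Census at School tool.

```python
import random
import statistics

rng = random.Random(2011)  # fixed seed so the run is repeatable

# Hypothetical, synthetic population of 10,000 right-skewed values (not real data).
population = [rng.gammavariate(2.0, 2.5) for _ in range(10_000)]
true_median = statistics.median(population)

def sample_medians(pop, n, reps):
    """Medians of `reps` simple random samples of size n (drawn without replacement)."""
    return [statistics.median(rng.sample(pop, n)) for _ in range(reps)]

def iqr(values):
    """Interquartile range, using the statistics module's default quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Spread (IQR) of 200 sample medians for each sample size.
spread_by_n = {n: iqr(sample_medians(population, n, 200)) for n in (15, 30, 60)}

print(f"population median: {true_median:.2f}")
for n, spread in spread_by_n.items():
    print(f"n = {n:2d}: IQR of 200 sample medians = {spread:.2f}")
```

With samples four times as large, the sample medians should cluster roughly half as widely around the population median, since the spread of the medians shrinks roughly in proportion to one over the square root of the sample size.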
Sample statistics
[Diagram: a sample drawn from a population]
• Sample statistic: median of the sample (known)
• Population parameter: median (or other statistic) of
the whole population (unknown)
Key ideas 2
Using the Level 7 guideline for constructing
informal confidence intervals for the
population medians
• Informal development of the formula
Key ideas 3
Statistical literacy
• Using correct vocabulary: estimate, point
estimate, parameter, sample
• Developing critical thinking about media reports
that use a sample to make an inference
• Applying the PPDAC cycle
Introduction
http://www.nzherald.co.nz/
Possible data sets
• Stats NZ: Surf (synthetic unit record files 2003)
• Census at School: school survey data and Kiwi data
• http://seniorsecondary.tki.org.nz/Mathematics-andstatistics/Achievement-objectives/AO-S7-1
• Kiwi Kapers 1: explores the justification for using a
sample to make an inference and sampling variation
• Kiwi Kapers 2: explores how sample size affects the
confidence we can have in our estimate
• Sampling stuff: explores sampling methods to ensure
the sample is representative: stratified sampling
Using the Stage 1 data set
• Note sampling variability
• Note the effect of increasing the sample size
Showing the interval for the sample medians
[Figure: five box plots, each titled "Sample of Stage 1 Statistics Students", showing WEIGHTACTUAL on an axis from 40 to 110. Source: http://www.censusatschool.org.nz/2009/informal-inference/WPRH/]
Observing sampling variability
http://www.censusatschool.org.nz/2009/informal-inference/WPRH/
Investigation
What is the weight of schoolbags carried by
year 12 males?
http://www.censusatschool.org.nz/resources/data-analysis-tools/
Collections of medians
[Figure: dot plots of collected sample medians, with panels "Measures from Sample size 15", "Medians from 200 samples of size 30" and "Measures from Sample size 60", each on an axis from 40 to 110]
What else might affect the uncertainty
in estimating the population median?
• The spread of the population
• Example: comparing the heights of intermediate
school students (years 7 and 8) with those of junior
high (middle) school students (years 7 to 10)
Sampling variability: effect of spread
[Figure: dot plots of the "Intermediate" and "Middle School" populations, with box plots "Sample of Intermediate" and "Sample of Middle School"; height on an axis from 100 to 200]
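The effect of population spread can be sketched the same way. This is a minimal illustration under loudly stated assumptions: both populations are hypothetical normal distributions of heights in cm (means 155 and 158, standard deviations 8 and 14), chosen only so that the second is more spread out, as the years 7 to 10 group would be. The sample size (30) and the number of samples (200) are also illustrative, and `spread_of_medians` is a hypothetical helper.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical populations of heights in cm (NOT real data). The second is given a
# larger standard deviation to mimic the wider age range of years 7 to 10.
intermediate = [rng.gauss(155, 8) for _ in range(5_000)]
middle_school = [rng.gauss(158, 14) for _ in range(5_000)]

def iqr(values):
    """Interquartile range using the statistics module's default quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def spread_of_medians(pop, n=30, reps=200):
    """IQR of the medians of `reps` random samples of size n drawn from `pop`."""
    medians = [statistics.median(rng.sample(pop, n)) for _ in range(reps)]
    return iqr(medians)

narrow = spread_of_medians(intermediate)
wide = spread_of_medians(middle_school)
print(f"population IQRs: {iqr(intermediate):.1f} vs {iqr(middle_school):.1f}")
print(f"IQR of sample medians (n = 30): {narrow:.2f} vs {wide:.2f}")
```

With the sample size held fixed, the medians from the more spread-out population vary more from sample to sample, which is why the spread of the population appears in the interval formula.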
Estimating the spread of the
population
• Best estimate: the IQR of our sample
• The quartiles of our sample serve as point
estimates for the quartiles of the population
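As a small worked example, the sample quartiles and IQR can be computed with Python's standard `statistics` module. The nine data values are hypothetical. Quartile conventions differ between software packages; `statistics.quantiles` uses its "exclusive" method by default, so a calculator or other statistics package may give slightly different quartiles for the same data.

```python
import statistics

# Hypothetical sample of nine measurements (illustrative values only).
sample = [3, 5, 6, 7, 8, 9, 10, 12, 15]

# Sample quartiles used as point estimates of the population quartiles.
# statistics.quantiles uses the "exclusive" method by default.
q1, median, q3 = statistics.quantiles(sample, n=4)
sample_iqr = q3 - q1  # point estimate of the population IQR

print(q1, median, q3, sample_iqr)  # 5.5 8.0 11.0 5.5
```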
Providing an interval estimate (a
confidence interval) for the population
median
There are two factors which affect the uncertainty of
estimating the parameter:
1. Sample size
2. Spread of population, estimated with sample IQR
• How confident do we want to be that our interval
estimate contains the true population median?
Development of formula for
confidence interval
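$$\text{population median} = \text{sample median} \pm \frac{\text{measure of spread}}{\sqrt{\text{sample size}}}$$
To ensure that the interval captures the population median 90% of the time:
$$\text{population median} = \text{sample median} \pm 1.5 \times \frac{\text{measure of spread}}{\sqrt{\text{sample size}}}$$
Using the IQR as the measure of spread and n for the sample size:
$$\text{population median} = \text{sample median} \pm 1.5 \times \frac{\text{IQR}}{\sqrt{n}}$$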
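The Level 7 rule can be written as a short function. This is a sketch, assuming the sample quartiles come from Python's `statistics.quantiles` (default "exclusive" method); other software may compute quartiles slightly differently and so give slightly different intervals. The 16 data values and the name `informal_ci` are hypothetical.

```python
import math
import statistics

def informal_ci(sample):
    """Informal Level 7 interval: sample median +/- 1.5 * IQR / sqrt(n)."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    centre = statistics.median(sample)
    half_width = 1.5 * (q3 - q1) / math.sqrt(len(sample))
    return centre - half_width, centre + half_width

# Hypothetical sample of 16 values (illustrative only): median 8.5, IQR 6.75.
sample = [2, 4, 4, 5, 6, 7, 7, 8, 9, 10, 11, 12, 12, 14, 15, 18]
low, high = informal_ci(sample)
print(f"({low:.3f}, {high:.3f})")  # (5.969, 11.031)
```

Here the half-width is 1.5 × 6.75 / √16 ≈ 2.53, so the interval runs from about 5.97 to about 11.03.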
Justification for the calculation
Based on simulations,
• The interval includes the true population median
for about 9 out of 10 samples, so the population
median is probably somewhere in the interval
• This allows us to make a claim about the
populations when the intervals for two samples do
not overlap
• When there is no difference between the
populations, sampling variation alone produces a
shift large enough to make a mistaken claim only
about once in 40 pairs of samples
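A simulation in the spirit of this justification can be sketched as follows. It is an illustration under specific assumptions, not a reproduction of the original simulations: it uses a single hypothetical Normal(50, 10) population, samples of size 30 and 2,000 repetitions. It counts how often the interval contains the true median, and how often two samples from the same population give non-overlapping intervals (a mistaken claim of a difference). Other population shapes and sample sizes will give somewhat different rates.

```python
import math
import random
import statistics

def informal_ci(sample):
    """Sample median +/- 1.5 * IQR / sqrt(n)."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    centre = statistics.median(sample)
    half_width = 1.5 * (q3 - q1) / math.sqrt(len(sample))
    return centre - half_width, centre + half_width

def overlaps(a, b):
    """True if intervals a = (low, high) and b = (low, high) share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

rng = random.Random(40)
N, REPS = 30, 2000
TRUE_MEDIAN = 50.0  # median of the hypothetical Normal(50, 10) population

def draw_sample():
    return [rng.gauss(50, 10) for _ in range(N)]

hits = 0          # intervals that contain the true median
false_claims = 0  # pairs from the SAME population whose intervals do not overlap
for _ in range(REPS):
    low, high = informal_ci(draw_sample())
    if low <= TRUE_MEDIAN <= high:
        hits += 1
    if not overlaps(informal_ci(draw_sample()), informal_ci(draw_sample())):
        false_claims += 1

coverage = hits / REPS
false_rate = false_claims / REPS
print(f"coverage: {coverage:.3f}")
print(f"mistaken claims: {false_claims} out of {REPS} pairs")
```

For this population, coverage should come out near 90% and mistaken claims should be rare (a few percent at most), in line with the figures quoted above.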
Comparing two populations
• Sampling variation is always present and will
cause the sample medians to shift
• We are looking for sufficient evidence (a big
enough shift between the intervals for the
medians) to be able to claim that there is a
difference back in the populations
Technical aside
When the calculated intervals do not overlap, a
confidence interval for the difference in the
population medians ranges from the smaller
distance between the intervals (between their
nearest endpoints) to the larger distance between
the intervals (between their furthest endpoints).
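The aside can be expressed as a small function. This is a sketch of the rule as stated, with a hypothetical function name and hypothetical input intervals. It returns an interval for (population median B minus population median A); both endpoints are negative when B's interval lies below A's.

```python
def difference_interval(a, b):
    """Interval for (median B - median A) from non-overlapping intervals a and b.

    a and b are (low, high) tuples. The endpoints are the nearest and the
    furthest distances between the two intervals (both negative if b lies below a).
    """
    a_low, a_high = a
    b_low, b_high = b
    if a_low <= b_high and b_low <= a_high:
        raise ValueError("intervals overlap; the technical aside does not apply")
    return b_low - a_high, b_high - a_low

# Hypothetical intervals, e.g. from the Level 7 rule applied to two samples.
print(difference_interval((5.0, 11.0), (14.0, 19.0)))  # (3.0, 14.0)
```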
Retinal image
• http://www.censusatschool.org.nz/2009/informal-inference/WPRH/
• Scroll down to "two populations"
Observing sampling variability when
comparing two populations
Question to explore:
Do students who have a Facebook account
tend to have newer cellphones than those
who do not?
http://www.censusatschool.org.nz/2010/data-viewer/
Applying the PPDAC cycle
Question to explore:
Is the average schoolbag weight carried by boys as
they start secondary school more than the average
weight carried by boys as they start intermediate
school?
• http://www.censusatschool.org.nz/2010/data-viewer/