Teaching activities towards Achievement Standard 2.9: Use statistical methods to make an inference
Lindsay Smith, University of Auckland Stats Day 2011

What is new/changed?
• Use of exploratory data analysis
• Statistical inference comparing two populations
• Informal confidence intervals for population medians
• Sampling variability
• Using relevant contextual knowledge (given)

Historical development
• Replaces making an inference about a single population
• Extends the curriculum material developed by Chris Wild and his team at Auckland University

Approaches
The approach you take will depend on:
• Course offered (with maths or without)
• Time allowed for the topic
• Incorporating Stat Lit (reports) material
• Background of students
• Access to ICT

Key ideas 1
Sampling variability
• Every sample contains sampling error due to the sampling process
• Other errors (non-sampling errors) may be present due to the sampling method applied (bias)
• Developing an understanding that confidence in the estimate will vary depending on factors such as sample size, sampling method, the nature of the underlying population and sources of bias
• Experiencing evidence for the central limit theorem by simulating samples and comparing the distribution of sample medians for samples of different sizes
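The simulation activity in the last bullet can be sketched in Python. This is only a minimal sketch under stated assumptions, not the activity used in the session: the population is made up (normally distributed weights in kg), and the names sample_medians and spread_of_medians are hypothetical helpers introduced for illustration.

```python
import random
import statistics


def sample_medians(population, sample_size, n_samples, rng):
    """Return the medians of n_samples random samples drawn without replacement."""
    return [
        statistics.median(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


def spread_of_medians(medians):
    """Return the interquartile range of a collection of sample medians."""
    q1, _, q3 = statistics.quantiles(medians, n=4)
    return q3 - q1


rng = random.Random(2011)  # fixed seed so the run is repeatable

# Hypothetical population: 5000 made-up weights in kg (not real data).
population = [rng.gauss(70, 12) for _ in range(5000)]

results = {}
for size in (15, 30, 60):
    medians = sample_medians(population, size, n_samples=200, rng=rng)
    results[size] = spread_of_medians(medians)
    print(f"sample size {size:>2}: IQR of 200 sample medians = {results[size]:.2f}")
```

Students can compare the printed spreads: the collection of medians from the larger samples should be noticeably narrower, which is the sampling-variability effect described in the bullets above.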
Sample statistics
Sample statistic: median of the sample (known)
Population parameter: median (or other statistic) of the whole population (unknown)
[Diagram labels: sample, population]

Key ideas 2
Using the Level 7 guideline for constructing informal confidence intervals for population medians
• Informal development of the formula

Key ideas 3
Statistical literacy
• Using correct vocabulary: estimate, point estimate, parameter, sample
• Developing critical thinking about media reports that use sampling to make an inference
• Applying the PPDAC cycle

Introduction
http://www.nzherald.co.nz/

Possible data sets
• Stats NZ: SURF (synthetic unit record files, 2003)
• Census at School: school survey data, Kiwi data
• http://seniorsecondary.tki.org.nz/Mathematics-and-statistics/Achievement-objectives/AO-S7-1
• Kiwi Kapers 1: explores the justification for using a sample to make an inference, and sampling variation
• Kiwi Kapers 2: explores the effect of sample size so that we can have confidence in our estimate
• Sampling stuff: explores sampling methods to ensure the sample is representative (stratified sampling)

Using the Stage 1 data set
• Note sampling variability
• Note the effect of increasing the sample size

Showing the interval for the sample medians
[Figure: five box plots, each titled "Sample of Stage 1 Statistics Students"; horizontal axis WEIGHTACTUAL, 40 to 110]
http://www.censusatschool.org.nz/2009/informal-inference/WPRH/

Observing sampling variability
http://www.censusatschool.org.nz/2009/informal-inference/WPRH/
Investigation: What is the weight of schoolbags carried by year 12 males?
http://www.censusatschool.org.nz/resources/data-analysis-tools/

Collections of medians
[Figure: three dot plots titled "Measures from Sample size 15", "Measures from Sample size 60" and "Medians from 200 samples of size 30"; horizontal axis median, 40 to 110]

What else might affect the uncertainty in estimating the population median?
• The spread of the population
• Comparing the heights of intermediate school students (years 7 and 8) and the heights of junior high school students (years 7 to 10)

Sampling variability: effect of spread
[Figure: dot plots titled "Intermediate" and "Middle School", with box plots titled "Sample of Intermediate" and "Sample of Middle School"; horizontal axis height, 100 to 200]

Estimating the spread of the population
• Best estimate: using the IQR of our sample
• Using the quartiles of our sample as point estimates for the quartiles of the population

Providing an interval estimate (a confidence interval) for the population median
There are two factors which affect the uncertainty of estimating the parameter:
1. Sample size
2. Spread of the population, estimated with the sample IQR
• How confident do we want to be that our interval estimate contains the true population median?

Development of formula for confidence interval

$$\text{population median} = \text{sample median} \pm \frac{\text{measure of spread}}{\sqrt{\text{sample size}}}$$

To ensure we predict the population median 90% of the time:

$$\text{population median} = \text{sample median} \pm 1.5 \times \frac{\text{measure of spread}}{\sqrt{\text{sample size}}}$$

$$\text{population median} = \text{sample median} \pm 1.5 \times \frac{\text{IQR}}{\sqrt{n}}$$

Justification for the calculation
Based on simulations:
• The interval includes the true population median for 9 out of 10 samples, so the population median is probably somewhere in the interval
• This lets us make a claim about the populations when the two intervals do not overlap
• Sampling variation only produces a shift large enough to make a mistaken claim about once in 40 pairs of samples

Comparing two populations
• Sampling variation is always present and will cause a shift in the medians
• We are looking for sufficient evidence: a big enough shift in the intervals for the median to be able to make a claim that there is a difference back in the populations

Technical aside
When the calculated intervals do not overlap, a confidence interval for the difference in the population medians ranges from the smaller distance between the intervals (between their nearest ends) to the larger distance between the intervals (between their farthest ends).

Retinal image
• http://www.censusatschool.org.nz/2009/informal-inference/WPRH/
• Scroll down to two populations

Observing sampling variability when comparing two populations
Question to explore: Do students who have a Facebook account tend to have newer cellphones than those who do not?
http://www.censusatschool.org.nz/2010/data-viewer/

Applying the PPDAC cycle
Question to explore: Is the average schoolbag weight carried by boys as they start secondary school more than the average weight carried by boys as they start intermediate school?
• http://www.censusatschool.org.nz/2010/data-viewer/
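
The informal interval developed earlier (sample median ± 1.5 × IQR / √n) and the "no overlap" rule for comparing two populations can be sketched in Python. This is an illustrative sketch only: the schoolbag weights are made-up values rather than Census at School data, the function names informal_interval and intervals_overlap are hypothetical, and the quartiles use Python's "inclusive" method, which may differ slightly from the quartile rule taught in class.

```python
import math
import statistics


def informal_interval(sample):
    """Return (lower, upper) for sample median -/+ 1.5 * IQR / sqrt(n)."""
    n = len(sample)
    # Assumption: "inclusive" quartiles; classroom quartile rules may differ slightly.
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    margin = 1.5 * (q3 - q1) / math.sqrt(n)
    median = statistics.median(sample)
    return median - margin, median + margin


def intervals_overlap(a, b):
    """Return True when two (lower, upper) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical schoolbag weights in kg (made-up values for illustration).
intermediate = [2.1, 2.8, 3.0, 3.4, 3.5, 3.9, 4.2, 4.4, 4.8, 5.1, 5.5, 6.0]
secondary = [3.2, 3.9, 4.5, 4.8, 5.0, 5.6, 5.9, 6.3, 6.8, 7.1, 7.7, 8.4]

interval_int = informal_interval(intermediate)
interval_sec = informal_interval(secondary)
print(f"intermediate: {interval_int[0]:.2f} to {interval_int[1]:.2f}")
print(f"secondary:    {interval_sec[0]:.2f} to {interval_sec[1]:.2f}")

if intervals_overlap(interval_int, interval_sec):
    print("Intervals overlap: not enough evidence to make a claim.")
else:
    print("Intervals do not overlap: we can claim a difference back in the populations.")
```

Swapping in samples drawn with the data viewer linked above would let students apply the same check to the schoolbag question.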