YEAR 12 STATISTICS

S 7-1 Carry out investigations of phenomena, using the statistical inquiry cycle:
A – conducting surveys that require random sampling techniques and using existing data sets
B – evaluating the choice of measures for variables and the sampling and data collection methods used
C – using relevant contextual knowledge, exploratory data analysis, and statistical inference
Uses the statistical inquiry cycle to conduct surveys and to analyse existing data sets.
Conducts surveys to find solutions to problems (or uses existing data sets):
o Poses survey questions, considering sources of variation, for example, what are the variables to be collected, how each variable will be
measured.
o Designs, trials, and improves questionnaires using a range of appropriate question types, checking the survey questions using, for
example, desk review or pilot surveys.
o Selects and uses appropriate sampling methods, for example, simple random, systematic, stratified, cluster, and quota.
o Evaluates sampling method used, for example, is a sample sufficiently large, randomly chosen, and representative of the population.
o Collects and manages data.
o Uses exploratory data analysis to explore features of the data:
• Uses appropriate statistical plots and tables to explore the data and communicates relevant detail and overall distributions.
• Uses appropriate measures to communicate features of the data.
Uses relevant contextual knowledge when communicating findings.
Makes statistical inferences.
Communicates findings in a report which includes:
o relevant summary statistics, graphs and tables to support the findings of the survey
o quantitative and qualitative statements
o statistical inferences
o justified conclusions.





S 7-2 Make inferences from surveys and experiments:
B – using sample statistics to make point estimates of population parameters
C – recognising the effect of sample size on the variability of an estimate
B. Using sample statistics to make point estimates of population parameters.
Understands that the sample statistics can be used as point estimates of the population parameters, for example, sample medians and IQRs can
be used as point estimates for population medians and IQRs, or sample proportions for population proportions when using categorical data
C. Recognising the effect of sample size on the variability of an estimate:

Within the context of an investigation and statistical plots of observed data:
o Finds informal confidence intervals for population medians.
o Plots sample data showing informal confidence intervals (median ± 1.5 × IQR / √n) on box plots.
o Uses an informal confidence interval to make an inference about the population median from sample data plot.
o Makes a claim about whether one group has larger values than another group using informal confidence intervals for the population
medians.
o Explains the connections among sample, population, sampling variability, sample size effect, informal confidence interval, and degree
of confidence.
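The informal confidence interval described above can be sketched in Python. This is a minimal illustration, not part of the original notes: the function names and the sleep-time data are invented, and the quartiles use the "inclusive" method of Python's statistics module, which may differ slightly from the quartiles a graphics calculator reports.

```python
import math
import statistics


def informal_interval(values):
    """Return (lower, upper) for median ± 1.5 × IQR / √n."""
    n = len(values)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.5 * (q3 - q1) / math.sqrt(n)
    return median - half_width, median + half_width


def intervals_overlap(a, b):
    """True when two (lower, upper) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical sleep times in hours, invented for illustration only.
group_a = [6.5, 7.0, 7.0, 7.5, 7.5, 8.0, 8.0, 8.5, 9.0]
group_b = [8.5, 9.0, 9.0, 9.5, 9.5, 10.0, 10.0, 10.5, 11.0]

interval_a = informal_interval(group_a)
interval_b = informal_interval(group_b)

# A claim that one group tends to have larger values back in the population
# is only made when the two intervals do not overlap.
can_make_claim = not intervals_overlap(interval_a, interval_b)
print(interval_a, interval_b, can_make_claim)
```

With samples this small the intervals are wide. The same calculation on larger samples gives narrower intervals because of the division by √n.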
S 7-3 Evaluate statistically based reports:
B – identifying sampling and possible non-sampling errors in surveys, including polls
B. Identifying sampling and possible non-sampling errors in surveys, including polls:
• In a media report on a survey or a poll, identifies sampling error and explains the connection among sample, population, sampling variability,
and sampling error.
• In a media report on a survey or a poll, identifies and evaluates, using critical questions (look under the heading critical questions in the work
doc below), sampling methods and possible non-sampling errors such as self-selection, non-response bias, and behavioural considerations.
Exploratory data analysis notes



Exploratory data analysis starts with multivariate data. Investigative questions that can be asked of the data should be posed, such as:
o wondering whether there is a connection between two variables,
o wondering whether other variables should be taken into account when possible patterns are observed,
o exploring multiple representations of the data in order to unlock the stories in the sample data.
Technology such as a graphics calculator can draw a modified box plot, which shows whether extreme data values are outliers. Outliers are not
simply the greatest or least data values. Outliers are more than 1.5 times the interquartile range (IQR) above the upper quartile or below the lower
quartile.
If the sample box plot is approximately symmetrical and has no outliers it can be assumed the population has a similar distribution.
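The outlier check that a modified box plot performs can be sketched in Python. This sketch assumes the usual convention of fences placed 1.5 × IQR beyond the quartiles. The function name and the weights are invented for illustration, and the quartile method ("inclusive") is an assumption that may not match a particular calculator.

```python
import statistics


def find_outliers(values):
    """Return the values lying more than 1.5 × IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical kiwi weights in kg, invented for illustration only.
weights = [2.0, 2.2, 2.4, 2.5, 2.6, 2.8, 3.0, 3.1, 5.9]
outliers = find_outliers(weights)
print(outliers)
```

Here the least value (2.0) is not flagged, which illustrates that an outlier is not simply the greatest or least data value.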
If the sample data is skewed, then the median will be more reliable than the mean as an estimate of the population central value. However, if the
distribution of the sample data is skewed this does not imply that the population is skewed. The skewness may be an artefact of sampling
variability.
A statistical estimate is not a guess but an inference or prediction of the true population parameter based on sample statistics. The sample
median is used to infer (used as a point estimate of) the population median. Similarly the sample mean, quartiles, standard deviation can be used
as estimates of the corresponding population parameters. A sample proportion can be used to estimate a population proportion, for example, the
fraction or percentage of students who travel more than 30 minutes to and from school each day.
Evaluation of sampling and data collection methods must be based on identifying features of good sample design or good experimental design.
Appropriate considerations are those that would make the inference more reliable/less variable:
o such as further (described) strata,
o repeated sampling and averaging statistics,
o context factors
o relative size of the mean and standard deviation, i.e. if the standard deviation is small in relation to the mean, then the population is likely
to be closely spread about the population mean.
o If the sample contains at least 30 items, it may be trivial at Level 7 to suggest a larger sample would improve the inference of a
measurement.
Measure



An amount or quantity that is determined by measurement or calculation. The term ‘measure’ is used in two different ways in the curriculum.
One use is in the terms measure of centre, measure of spread, and measure of proportion, where these measures are calculated quantities that
represent characteristics of a distribution. The use of ‘using displays and measures’ in the level 6 (statistical investigation thread) achievement
objective is a reference to measures of centre, spread, and proportion.
The other use applies to a statistical investigation. The investigator decides on a subject of interest and then decides the aspects of it that can be
observed. These aspects are the ‘measures’.
Example
o An investigator decides that ‘well-being’ is a subject of interest and chooses ‘happiness’ to be one aspect of well-being. Happiness
could be measured by the variable ‘the average number of times a person laughs in a day’.
Non-sampling error
One of the two reasons for the difference between an estimate (from a sample) and the true value of a population parameter; the other reason being the
error caused because data are collected from a sample rather than the whole population (sampling error). Non-sampling errors have the potential to
cause bias in surveys or samples.
There are many types of non-sampling errors, and the names used for them are not consistent.
Some examples of non-sampling errors are:
o The sampling process is such that a specific group is excluded or under-represented in the sample, deliberately or inadvertently. If the excluded
or under-represented group is different, with respect to survey issues, then bias will occur.
o The sampling process allows individuals to select themselves. Individuals with strong opinions or those with substantial knowledge will tend to
be over-represented, creating bias.
o Bias will occur if people who refuse to answer have different views of the survey issues from those who respond. This can also happen with
people who are never contacted and people who have yet to make up their minds.
o If the response rate (the proportion of the sample that takes part in a survey) is low, bias can occur because respondents may tend consistently to
have views that are more extreme than those of the population in general.
o The wording of questions, the order in which they are asked, and the number and type of options offered can influence survey results.
o Answers given by respondents do not always reflect their true beliefs because they may feel under social pressure not to give an unpopular or
socially undesirable answer.
o Answers given by respondents may be influenced by the desire to impress an interviewer.
Sampling error



The error caused because data are collected from part of a population rather than the whole population.
An estimate of a population parameter, such as a sample median or sample proportion, is different for different samples (of the same size)
taken from the population. Sampling error is one of two reasons for the difference between an estimate and the true, but unknown, value of the
population parameter. The other reason is non-sampling error.
The error for a given sample is unknown, but when sampling is random, the size of the sampling error can be estimated by calculating the
margin of error.
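As a rough illustration of how a margin of error depends on sample size, the sketch below uses the standard large-sample approximation for a sample proportion, together with the common classroom rule of thumb 1/√n. Neither formula is spelled out in these notes, so both are assumptions here, as are the function names.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p_hat."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def rule_of_thumb(n):
    """Classroom approximation to the same quantity: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


# p_hat = 0.5 gives the widest margin, so it is the cautious choice.
for n in (30, 250, 1000):
    print(n, round(margin_of_error(0.5, n), 3), round(rule_of_thumb(n), 3))
```

For a sample of 250 both versions give a margin of about 6 percentage points. For a sample of 30 the margin is roughly 18 percentage points, which is too wide to say much about a population proportion.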
Sampling variation


The variation in a sample statistic from sample to sample.
Suppose a sample is taken and a sample statistic, such as a sample median, is found. If a second sample of the same size is taken from the same
population, it is almost certain that the sample median found from this sample will be different from that found from the first sample. If further
sample medians are found, by repeatedly taking samples of the same size from the same population, then the differences in these sample
medians illustrate sampling variation.
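Sampling variation can be seen directly with a short simulation. The sketch below is illustrative only: the population is randomly generated rather than real data, and the seed, sizes, and names are arbitrary choices.

```python
import random
import statistics

rng = random.Random(12)  # fixed seed so the run is repeatable

# Hypothetical population of 1000 weights in kg, invented for illustration.
population = [round(rng.gauss(2.5, 0.5), 2) for _ in range(1000)]


def sample_medians(population, sample_size, repeats, rng):
    """Median of each of `repeats` simple random samples of one size."""
    return [
        statistics.median(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]


# Five samples of the same size from the same population.
medians = sample_medians(population, sample_size=30, repeats=5, rng=rng)
print(medians)
```

The five medians differ from one another even though every sample came from the same population and has the same size.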
Sample size



The number of objects, individuals, or values in a sample.
Typically, a larger sample size leads to an increase in the precision of a statistic as an estimate of a population parameter.
The most common symbol for sample size is n.
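The effect of sample size on precision can be illustrated by repeating the sampling for several values of n and comparing how much the sample medians vary. This is a sketch under stated assumptions: the population is simulated, and the standard deviation of the medians is used as the measure of spread purely for convenience.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of 5000 weights in kg, invented for illustration.
population = [rng.gauss(2.5, 0.5) for _ in range(5000)]


def spread_of_medians(sample_size, repeats=200):
    """Standard deviation of the medians of repeated random samples."""
    medians = [
        statistics.median(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(medians)


# n is the sample size; a smaller spread means a more precise estimate.
spreads = {n: spread_of_medians(n) for n in (15, 30, 100, 400)}
for n, spread in spreads.items():
    print(f"n = {n:3d}: spread of sample medians = {spread:.3f}")
```

The spread shrinks as n grows, but with diminishing returns: quadrupling the sample size only roughly halves the spread.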
Sampling Notes





Reasons for sampling include time and cost considerations, lack of access to the entire population, and the nature of the data collection or test.
For example, a blood test does not require all of the blood to be taken, and testing the breaking strain of fishing line destroys the line.
Features of a good sampling technique are that the sample is sufficiently large, randomly chosen, and representative of the population.
Sample size affects the variability of an inference. If a sample is too small, it is more likely to be unusual and less likely to be representative. As
a rule of thumb, the Central Limit Theorem for sample means (a level 8 objective) is taken to apply to samples of at least 30 items, so random samples
of this size are acceptable. There is no statistical requirement that a sample be a fixed proportion of the population. For an inference about a
population proportion, however, a much larger sample size is needed, at least 250. This size comes from margin of error considerations (a level 8
objective), but at level 7 an intuitive understanding is sufficient.
Sampling techniques include simple random, systematic, stratified, and cluster sampling, which all rely on random selection, and quota sampling, which does not.
It is important to identify the positive features of each method and be able to carry out each method correctly in order for the sample to be as
representative as possible. Students must be able to provide evidence they have carried out their chosen sampling methods correctly. If a sample
is randomly chosen and sufficiently large, it is likely to be representative of the population.
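The random sampling methods named above can be sketched in Python. This is one possible way to carry out each method, not a prescribed procedure: the school roll, year levels, form classes, and function names are all invented for illustration. Quota sampling is left out because it does not rely on random selection.

```python
import random

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Hypothetical roll of 500 students as (student_id, year_level) pairs,
# with 100 students in each of Years 9 to 13.
roll = [(i, 9 + i // 100) for i in range(500)]


def year_level(student):
    return student[1]


def form_class(student):
    # Hypothetical form classes of 25 students each (20 classes).
    return student[0] // 25


def simple_random(frame, n):
    """Every student has the same chance of selection."""
    return rng.sample(frame, n)


def systematic(frame, n):
    """Random starting point, then every k-th student on the list."""
    step = len(frame) // n
    start = rng.randrange(step)
    return frame[start::step][:n]


def group_by(frame, key):
    groups = {}
    for item in frame:
        groups.setdefault(key(item), []).append(item)
    return groups


def stratified(frame, n, key):
    """Random sample from each stratum, in proportion to its size.
    Rounding means the total can differ slightly from n in general."""
    sample = []
    for members in group_by(frame, key).values():
        share = round(n * len(members) / len(frame))
        sample.extend(rng.sample(members, share))
    return sample


def cluster(frame, clusters_wanted, key):
    """Randomly choose whole clusters and take everyone in them."""
    groups = group_by(frame, key)
    chosen = rng.sample(sorted(groups), clusters_wanted)
    return [item for c in chosen for item in groups[c]]


srs = simple_random(roll, 50)
syst = systematic(roll, 50)
strat = stratified(roll, 50, key=year_level)
clus = cluster(roll, 2, key=form_class)
```

Recording the seed, the sampling frame, and the selected identifiers is one way for students to show that a method was carried out correctly.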
Sources of variation





The reasons for differences seen in the values of a variable. Some of these reasons are summarised in the following paragraphs.
Variation is present everywhere and is in everything. When the same variable is measured for different individuals, there will be differences in
the measurements, simply due to the fact that individuals are different. This can be thought of as individual-to-individual variation and is
often described as natural or real variation.
Repeated measurements on the same individual may vary because of changes in the variable being measured. For example, an individual’s
blood pressure is not exactly the same throughout the day. This can be thought of as occasion-to-occasion variation.
Repeated measurements on the same individual may vary because of some unreliability in the measurement device, such as a slightly different
placement of a ruler when measuring. This is often described as measurement variation.
The difference in measurements of the same quantity for different individuals, apart from natural variation, could be due to the effect of one or
more other factors. For example, the difference in growth of two tomato plants from the same packet of seeds planted in two different places
could be due to differences in the growing conditions at those places, such as soil fertility or exposure to sun or wind. Even if the two seeds
were planted in the same garden, there could be differences in the growth of the plants due to differences in soil conditions within the garden.
This is often described as induced variation.
Variation occurs in all sampling situations. Suppose a sample is taken and a sample statistic, such as a sample median, is found. If a second
sample of the same size is taken from the same population, it is almost certain that the sample median found from this sample will be different
from that found from the first sample. If further sample medians are found, by repeatedly taking samples of the same size from the same
population, then the differences in these sample medians illustrate sampling variation.
Statistical inference

The process of drawing conclusions about population parameters based on a sample taken from the population.
Example 1
o Using a sample median calculated from a random sample taken from a population to estimate the population median is an example of
statistical inference.
Example 2
o Using data from a random sample taken from a population to obtain a 95% confidence interval for the population proportion is an
example of statistical inference.
Activities
big ideas
Rationale
• different samples of the same size give different pictures of proportions
• showing that small samples for categorical data are unreliable because of the large variation between samples
Ideas of proportion are needed in year 13
Working with categorical data – sampling variability and sample size
Sampling errors - Variability of sample proportion
Using populations of counters to draw samples of size 20 and observe
the variability in outcomes. (some notes in introduction to making an
inference about a population)
to tag or not to tag http://www.censusatschool.org.nz/classroomactivities/to-tag-or-not-to-tag/
margin of error (stats teachers day 2006) Marina and Anne
http://www.stat.auckland.ac.nz/~teachers/index6.php
Sampling errors - Increasing sample size improves the reliability of a
point estimate
What size sample do we need to take to get a reliable proportion?
• Categorical data needs big sample sizes to be able to make a reliable estimate for the population proportions
Links to probability objectives from level 6,
starting to make the connections
It’s a new starting point for thinking about
inference
Understanding of polls and their reliability
as predictors is an essential tool for
citizenship
The media article selected should be a good
hook for student engagement
Statistical literacy link – sampling and non-sampling errors
Working with measurement/numerical data – sampling variability and
sample size
Kiwi Kapers 1 http://seniorsecondary.tki.org.nz/Mathematics-andstatistics/Achievement-objectives/AO-S7-1
Introducing sampling variability through repeated sampling from the
same population using sample sizes 15 and 30. Looking for patterns
across repeated samples of the same size.


Five samples of size 30 for each pair/group of students (keep a
record of these 5 sample medians for Kiwi Kapers 3)
5 samples of 15 and 5 samples of 30 needed for Kiwi Kapers 2
• different samples of the same size give different pictures of the distribution of weights of kiwi
• repeated samples of size 15 show more variation than repeated samples of size 30
• at this stage using the multiple samples to get a possible interval for the population median (early ideas that will be built on
later about getting an interval estimate of the population median from a single sample)
building on level 5 and 6 understandings of
the PPDAC cycle and sampling variability,
introducing the new data set that will be
used throughout following lessons as
statistical concepts are built
Kiwi Kapers 2
http://seniorsecondary.tki.org.nz/Mathematics-andstatistics/Achievement-objectives/AO-S7-1
Exploring different sample sizes and mutually agreeing to a sensible size
based on smaller variation of sample medians combined with practical
considerations such as the efficient use of time and resources. (Catching
kiwis to collect the data)
• deciding on the smallest sample size that will give us a reliable estimate of the population median
Students need to have the opportunity to
see how different sample sizes produce
different variation patterns for the sample
median. It is important that they engage in
the debate as to what makes a good sample
size.
This activity is built around using Fathom to generate multiple samples,
but this could easily be done using the data viewer
http://www.censusatschool.org.nz/2010/data-viewer/ on
CensusAtSchool to collect sample medians for samples of size 50 and
100. (Sample medians for sample size 15 and 30 can be used from the
Kiwi Kapers 1 activity).
• Under PLAN select the Kiwi Kapers 2008 population and select total sample size of 50 and then 100 (ignore the “I am a year 12 student” at
this stage).
• Select get my sample (this provides a sample which shows in DATA).
• Under ANALYSIS, select variable 1 as weight and then tick “Add summaries”. Now click “Do Analysis”. Record median.
• Click back in the sample size box and hit enter – generates a new sample.
• Repeat ANALYSIS, select variable 1 as weight and then tick “Add summaries”. Now click “Do Analysis”. Record median.
• Continue until X sample medians have been found.
• Collate class results.
Making inferences in summary situations
Kiwi Kapers 3 (see draft ideas in word document, kiwikapers3 NOTE
kiwikaper3v1 has the n=30 included, but would expect students to use
their own data as in kiwikapers3, v1 is to give the sense of what is
happening) Note: make a better student sheet. Thinking A4 landscape,
put the popn and n=400 and then make big IQR picture, leave space for
n=30, decent scale.
• developing the √n idea for the informal confidence interval for the population median
1.5 × IQR – activity to be written up, building to using 1.5 × IQR / √n
Note: take fathom output and put into word doc. Make 1, 1.5 and 2
separate. teaching notes to support this need to really spell out how you
collected the measures. EIS-T, this is to be done.
Some activities to use the whole cycle and practice making informal
interval estimates for the median. Note: This would include describing
the interval in words.
• Using the entire PPDAC cycle and making an inference about the population parameter in a summary situation
Money spent at canteen (see draft activity attached PPDAC L7
evaluating process- summary inference)
Do NZ teenagers get enough sleep?
• Starting to evaluate the PPDAC cycle with a particular focus on the sampling method.
Sampling
Sampling stuff 1 http://seniorsecondary.tki.org.nz/Mathematics-andstatistics/Achievement-objectives/AO-S7-1
Sampling stuff 2 http://seniorsecondary.tki.org.nz/Mathematics-andstatistics/Achievement-objectives/AO-S7-1
Check re: SURFs.
Making inferences in comparison situations
Comparing boys’ and girls’ sleep (see draft activity attached PPDAC L7
evaluating process - comparative inference activity)
Note: could come back to kiwi kapers, they can use C@S and use the
data viewer, tick year 12 student. Important to get the interval onto the
graph, leading into year 13.