Algebra 1 Summer Institute 2014 The Sampling Proclivity Summary

Algebra 1 Summer Institute 2014
The Sampling Proclivity
Summary
Goals
• Understand what is meant by simple random sampling
• Recognize the variability in subjective and random sampling techniques and
understand how to measure it
• Recognize how bias can occur in the sampling process
• Draw conclusions about the effect of sample size on statistical information
• Explore the sampling distributions of stratified random samples and clustered
samples
Technology
LCD Projector
Facilitator Laptop
Excel
Source
Navigating through Data Analysis in Grades 9-12
Materials
Poster paper
Markers
Participant Handouts
1. Random Rectangles
2. Sampling Rectangles
3. Distribution of Sample Average
4. Sample Size
5. Data Record
6. Sampling Method
Estimated Time
90 minutes
Mathematics Standards
Common Core State Standards for Mathematics
MAFS.6.SP.1: Develop understanding of statistical variability
1.1: Recognize a statistical question as one that anticipates variability in the
data related to the question and accounts for it in the answers.
MAFS.6.SP.2: Summarize and describe distributions
2.5: Summarize numerical data sets in relation to their context, such as by:
a. Reporting the number of observations
b. Describing the nature of the attribute under investigation, including how it
was measured and its units of measurement.
MAFS.912.S-ID.1: Summarize, represent, and interpret data on a single count or
measurement variable
1.3: Interpret differences in shape, center, and spread in the context of the data sets,
accounting for possible effects of extreme points (outliers).
Standards for Mathematical Practice
1. Make sense of problems and persevere in solving them
2. Reason abstractly and quantitatively
3. Construct viable arguments and critique the reasoning of others
4. Model with mathematics
5. Use appropriate tools strategically
Instructional Plan
In data analysis, we use graphs, tables, and numerical summaries to study the variation
present in our data. Often, we want to extend our interpretation to a larger group beyond
the particular group studied. Such generalizations are only valid, however, if the data we
examine are representative of that larger group. If not, our interpretation may
misrepresent the larger group!
The entire group that we want information about is called the population. We can gain
information about this group by examining a portion of the population, called a sample.
To gain useful information, the sample must be representative of the population. A
representative sample is one in which the relevant characteristics of the sample members
are generally the same as the characteristics of the population.
There are several good reasons that we use samples to study populations; chief among
them are feasibility and cost. For instance, in a nationwide political survey of the
population of all voters in the United States, it would be difficult, if not impossible, to
poll every voter. It would also be quite expensive. Statistical theory shows that a survey
of about 1,000 carefully selected voters can represent, within a small margin of error,
the opinions of the millions of people in the population of voters.
Another problem in answering questions about a population arises when we want to
inspect or test products. For example, testing an air bag to see if it works properly
means that we have to destroy it. We certainly can't test every air bag, but testing a
carefully selected sample of air bags will tell us what we need to know about all the air
bags in the population.
How we select a sample is extremely important. Improper or biased sample selection can
produce misleading conclusions. Sample selection is biased if it systematically favors
certain outcomes. If we select only Democrats to participate in a political survey, the
outcome will reflect Democrats' opinions, but not other political parties'. If we personally
select a sample of students we know and like for a school survey, we have just eliminated
the differing opinions of those whom we do not know and like. We need to select our
sample in an unbiased fashion.
Random sampling is a way to remove bias in sample selection. For example, to pick a
random sample of 20 people out of a population of 1,000, you might put all 1,000
names in a hat, then draw 20 of them. Random sampling attempts to reduce bias in
sample selection, since every member of the population has an equal chance of being
selected. There are different random sampling methods, including simple random
sampling, stratified sampling, and cluster sampling.
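The "names in a hat" idea above can be sketched in a few lines of Python. This is only an illustration: the placeholder names, the helper `draw_from_hat`, and the seed value are assumptions made for the example and do not come from the session materials.

```python
import random


def draw_from_hat(population, sample_size, seed=None):
    """Draw a simple random sample without replacement, like names from a hat."""
    rng = random.Random(seed)  # a fixed seed only makes the demo repeatable
    return rng.sample(population, sample_size)


# Hypothetical population: 1,000 placeholder names (not real data).
names = [f"Person {i}" for i in range(1, 1001)]

# Draw 20 names; every name has the same chance of being picked.
sample = draw_from_hat(names, 20, seed=2014)
print(sample)
```

Because `random.sample` draws without replacement, no name can appear twice, which matches pulling slips of paper out of a hat.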
Simple random sampling is the basic sampling technique where we select a group of
subjects (a sample) for study from a larger group (a population). Each individual is
chosen entirely by chance and each member of the population has an equal chance of
being included in the sample. Every possible sample of a given size has the same chance
of selection; i.e. each member of the population is equally likely to be chosen at any stage
in the sampling process.
There may often be factors that divide the population into sub-populations (groups or
strata), and we may expect the measurement of interest to vary among the different
sub-populations. This has to be accounted for when we select a sample, so that the
sample is representative of the population. This is achieved by stratified sampling. A
stratified sample is obtained by taking samples from each stratum or sub-group of a
population. (Slides 2, 3, 4)
The “Random Rectangles” handout shows the population of rectangles that participants
will use for the following activities. Each small square in the figure represents an area of
1 square unit.
NOTE: in this activity we refer to the measures of center as “average” since the term
mean will be formally introduced in later chapters.
1. Distribute the “Sampling Rectangles” activity pages and the “Random Rectangles”
handout to all participants; these will be used with all the activities in this
session. The activity directs participants to select 5 rectangles that they think are
“typical” of the group of rectangles on the page. Give them a limited time
(maybe 15 seconds) to choose their 5 typical rectangles and to record the areas.
2. Next, participants find the average area of the rectangles that they have chosen as
typical. (Slide 5)
3. Participants choose five more rectangles, this time by generating random numbers
between 1 and 100 and selecting the rectangles that correspond to those numbers.
(The random numbers could be generated by Excel by using the command
“=RANDBETWEEN(1,100)”). They record their data for this “random” sample
of rectangles, again calculating the average for the five rectangles’ areas. (Slide 6)
4. Now the participants are ready to pool their results. They could make a dot plot of
the two distributions – one for the average of the areas of the randomly chosen
rectangles and the other for the average of the areas of the subjectively chosen
rectangles. Participants could use the “Distribution of Sample Average Areas” handout to
make the dot plot. Display the sampling distributions. Ask participants to record
a measure of center (decide together if the average is going to be used) and a
measure of spread (range) for both distributions in order to compare the two
methods of choosing a sample of rectangles.
The activity Sampling Rectangles can help participants begin to see the difference
between a sample chosen by using one’s judgment and one chosen at random.
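For facilitators who want to preview what the pooled random-sample dot plot might look like, the following is a minimal sketch. The areas are made-up placeholders, NOT the values on the “Random Rectangles” handout, and the class size of 30, the seed, and the helper name `random_sample_average` are likewise assumptions for illustration.

```python
import random
from collections import Counter

# PLACEHOLDER areas for rectangles 1..100 (made up; replace with the handout's values).
areas = {number: (number * 7) % 18 + 1 for number in range(1, 101)}


def random_sample_average(areas, sample_size, rng):
    """Pick distinct rectangle numbers at random and average their areas."""
    chosen = rng.sample(sorted(areas), sample_size)
    return sum(areas[n] for n in chosen) / sample_size


rng = random.Random(1)  # fixed seed so the demo is repeatable
# Assumed class of 30 participants, each drawing one random sample of 5.
class_averages = [random_sample_average(areas, 5, rng) for _ in range(30)]

# Text dot plot: one star per participant, averages rounded to whole units.
counts = Counter(round(a) for a in class_averages)
for value in range(min(counts), max(counts) + 1):
    print(f"{value:>3} | {'*' * counts[value]}")
```

Unlike repeated calls to `=RANDBETWEEN(1,100)`, this sketch never selects the same rectangle twice within one sample.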
Randomization
People often think that they can make good judgments about what is typical. This
activity focuses on what randomization brings to the sampling process. This
activity highlights three facts:
i) Subjectively chosen samples may be biased.
ii) True random samples are in fact carefully chosen to ensure that every set
of a specified size consisting of individuals in the population under
consideration is equally likely to be selected.
iii) Though an individual event is not predictable, there is an underlying
pattern in long-term random behavior.
5. The population in this activity is the given set of rectangles. Because the entire
population is known (which is not usually the case), the population average – the
parameter – of all the rectangles’ areas can also be known. It can readily be
determined to be 7.42 square units. Draw a vertical line on the display of the
distributions through 7.42 on each plot to illustrate how the distributions from the
two sampling techniques vary with respect to the actual population average.
How do the two sample averages compare to 7.42? If using the median instead of
the average, the population median is 6. (Slide 7)
Sample size
In this activity the 100 rectangles will still be used. Here participants investigate
samples of size 10.
6. After participants have worked individually to find the sample average area of one
sample of 10 rectangles, using random numbers, they can pool their data and
work as a class to plot the simulated sampling distribution of the sample average
areas for the new sample size. Compare the simulated distributions of sample averages
for samples of size 5 and size 10 by using the dot plots. They could also use graphing
software that allows results to be displayed. Discuss with the whole group the
following questions:
a. Is the distribution for samples of size 10 more bell-shaped?
b. Which distribution shows a center closer to the population average of 7.42 for
all 100 rectangles?
c. Which distribution shows a wider spread?
d. In general, what can a sample of 5 or 10 individuals tell about a particular
population?
e. How many individuals should be sampled?
Formal statistical inference explores how confident one can be in reasoning about
a population from a sample. Participants’ work can communicate to them the
important message that as the sample size increases, the variability of the
sampling distribution for a sample statistic decreases. (Slide 8)
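The effect of sample size on variability can also be explored by simulation. The sketch below is one possible way to do it, under stated assumptions: the areas are made-up placeholders rather than the handout's values, and the 2,000 repetitions, the seed, and the helper name `simulate_sample_averages` are illustrative choices.

```python
import random
import statistics

# PLACEHOLDER areas for 100 rectangles (made up; not the handout's values).
areas = [(number * 7) % 18 + 1 for number in range(1, 101)]


def simulate_sample_averages(population, sample_size, repetitions, rng):
    """Return the averages of many simple random samples of the given size."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]


rng = random.Random(2014)  # fixed seed so the demo is repeatable
spread = {}
for size in (5, 10):
    averages = simulate_sample_averages(areas, size, 2000, rng)
    spread[size] = statistics.stdev(averages)
    sample_range = max(averages) - min(averages)
    print(f"size {size}: range = {sample_range:.2f}, std dev = {spread[size]:.2f}")
```

With this many repetitions, the standard deviation of the sample averages for size 10 should come out smaller than for size 5, which is the pattern participants look for in their dot plots.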
Other Sampling Methods
Stratified Random Sampling
In this method the population is divided into two or more groups called strata, according
to some criterion, such as geographic location or grade level, and subsamples are
randomly selected from each group.
Task A of the activity Sampling Methods illustrates these ideas. Here the “width” of a
rectangle is understood as the rectangle’s horizontal dimension. Table 1 and Table 2
show the numbers of the rectangles divided into two groups: those with widths less than 3,
and those with widths greater than or equal to 3.
These two strata are not of equal size. There are 59 rectangles with widths less than 3 and
41 with widths greater than or equal to 3.
7. Participants choose 5 rectangles (using the random number generator) from each
stratum; that is, 5 small rectangles and 5 large rectangles for a sample of 10. To
calculate the sample average of the rectangles’ areas for the combined strata
sample, they must weight each stratum’s average by its population proportion, as
follows: (Slide 9)
\[
\frac{59}{100} \cdot (\text{mean of sample from Table 1}) + \frac{41}{100} \cdot (\text{mean of sample from Table 2})
\]
8. Pooling the information from all participants, create a simulated sampling
distribution and find the sample average. Describe the shape, center and spread of
the distribution.
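The weighted average used for the stratified sample in Task A can be checked with a short calculation. In this sketch the stratum sizes 59 and 41 come from the activity, while the two sample averages (4.0 and 12.0) and the helper name `stratified_average` are hypothetical values chosen only to show the arithmetic.

```python
def stratified_average(stratum_means, stratum_sizes):
    """Weight each stratum's sample average by its share of the population."""
    total = sum(stratum_sizes)
    return sum(mean * size / total for mean, size in zip(stratum_means, stratum_sizes))


# Hypothetical sample averages for the two strata (not real class data).
mean_table_1 = 4.0   # sample from the 59 rectangles with widths less than 3
mean_table_2 = 12.0  # sample from the 41 rectangles with widths of 3 or more

combined = stratified_average([mean_table_1, mean_table_2], [59, 41])
print(round(combined, 2))  # 0.59 * 4.0 + 0.41 * 12.0 = 7.28
```

A plain average of the two numbers would give 8.0, which over-represents the smaller stratum; weighting by 59/100 and 41/100 corrects for the unequal stratum sizes.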
Cluster Sampling
Cluster sampling is often used when studies look across large populations whose
members may be widely dispersed and a reliable list of all the individuals or objects in
the population is not available.
In Task B of the activity Sampling Methods, participants explore the technique of cluster
sampling by considering the 100 rectangles again, this time in clusters that they will
sample to make another simulated sampling distribution of the sample average areas.
Table 3 has the rectangles divided into clusters of 5 by proximity.
9. Using a random number generator, participants select 2 of the 20 clusters and
find the average area of the resulting sample of 10 rectangles. Compile the class
data and show all sample averages in a simulated sampling distribution. (Slide 10)
10. Have a class discussion comparing the sampling methods used in this session.
a. Does the sample size have an effect on the results of the investigation?
b. What observations can be made about the sampling methods explored:
simple random sampling, stratified random sampling, and cluster
sampling? (Slide 11)
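The cluster-sampling step in Task B can be sketched as follows. Both the areas and the cluster assignments here are placeholders: the real clusters in Table 3 are grouped by proximity on the page, whereas this example simply groups consecutive rectangle numbers. The helper name `cluster_sample_average` and the seed are also assumptions.

```python
import random

# PLACEHOLDER areas for rectangles 1..100 (made up; not the handout's values).
areas = {number: (number * 7) % 18 + 1 for number in range(1, 101)}

# ASSUMPTION: 20 clusters of 5 consecutive numbers stand in for Table 3.
clusters = [list(range(start, start + 5)) for start in range(1, 101, 5)]


def cluster_sample_average(areas, clusters, clusters_to_pick, rng):
    """Pick whole clusters at random and average the areas of all their members."""
    chosen_clusters = rng.sample(clusters, clusters_to_pick)
    chosen = [number for cluster in chosen_clusters for number in cluster]
    return sum(areas[number] for number in chosen) / len(chosen), chosen


rng = random.Random(10)  # fixed seed so the demo is repeatable
average, chosen = cluster_sample_average(areas, clusters, 2, rng)
print(f"rectangles {chosen} -> average area {average:.2f}")
```

Note that the random choice is made among clusters, not among individual rectangles, so every rectangle in a selected cluster enters the sample.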