P. STATISTICS LESSON 5

CHAPTER 5: PRODUCING DATA
Section 5.1 – Designing Samples
INTRODUCTION



Our goal in choosing a sample is to obtain a
picture of the population that is disturbed as
little as possible by the act of gathering
information.
Sample surveys are one kind of observational
study.
In other settings, we gather data from an
experiment.
OBSERVATION VS. EXPERIMENT
• An observational study observes individuals and measures variables of interest but does not attempt to influence the responses.
• An experiment, on the other hand, deliberately imposes some treatment on individuals in order to observe their responses.
• Both have important roles depending on the situation and the questions to be answered.
• See example 5.1 on p.270
ADDITIONAL FACTS
• Observational studies of the effect of one variable on another often fail because the explanatory variable is confounded with lurking variables.
• In some situations, it may not be possible to observe individuals directly or to perform an experiment. In other cases, it may be logistically difficult or simply inconvenient.
POPULATION AND SAMPLE
• The entire group of individuals that we want information about is called the population.
• A sample is a part of the population that we actually examine in order to gather information.
SAMPLING VS. CENSUS
• Sampling involves studying a part in order to gain information about the whole.
• A census attempts to contact every individual in the entire population.
• The design of a sample refers to the method used to choose the sample from the population.

Poor sample design can produce misleading
conclusions.
VOLUNTARY RESPONSE SAMPLE
• A voluntary response sample consists of people who choose themselves by responding to a general appeal.


Voluntary response samples are biased because
people with strong opinions, especially negative
opinions, are most likely to respond.
See example 5.2 on p.272
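
The effect of voluntary response can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the population size, the 30% share of negative opinions, and the response rates (20% for people with a negative opinion, 2% for everyone else) are hypothetical numbers chosen for illustration, not figures from the textbook.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical population: True means the person holds a negative opinion.
population = [rng.random() < 0.30 for _ in range(100_000)]

def responds(is_negative):
    """Assumed response rates: 20% if negative, 2% otherwise."""
    return rng.random() < (0.20 if is_negative else 0.02)

# Voluntary response: people select themselves.
respondents = [opinion for opinion in population if responds(opinion)]

true_share = sum(population) / len(population)
voluntary_share = sum(respondents) / len(respondents)
srs_share = sum(rng.sample(population, 1000)) / 1000  # chance selection instead

print(f"true share negative:       {true_share:.2f}")
print(f"voluntary response sample: {voluntary_share:.2f}")
print(f"simple random sample:      {srs_share:.2f}")
```

Under these assumptions the self-selected respondents overstate the negative share by a wide margin, while the sample chosen by chance stays close to the true value.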
• Another type of bad sampling is convenience sampling, which chooses the individuals easiest to reach.

See example 5.3 on p.272
BIAS
• The design of a study is biased if it systematically favors certain outcomes.
• Choosing a sample by chance attacks bias by giving all individuals an equal chance to be chosen.
SIMPLE RANDOM SAMPLE



A simple random sample (SRS) of size n consists of n
individuals from the population chosen in such a way that
every set of n individuals has an equal chance to be the
sample actually selected.
An SRS not only gives each individual an equal chance to
be chosen (thus avoiding bias in the choice) but also gives
every possible sample an equal chance to be chosen.
Not every sample in which each individual has an equal chance is a simple random sample:

Suppose 10 names are listed in order and we flip a coin to take either the first five or the last five. Each name has a 1-in-2 chance of being chosen, but only two samples are possible, so this is not an SRS.
Drawing names from a hat without replacement, by contrast, does give an SRS. The chance of being picked on a given draw changes (1 out of 10 on the first draw, 1 out of 9 among the names left on the second), but every set of n names is still equally likely to end up as the sample.
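
A minimal Python sketch of drawing an SRS, assuming a hypothetical population of ten labeled names. The function name `draw_srs` is illustrative. The standard-library call `random.sample` selects without replacement, and the check at the end counts how often each possible pair appears over many repeated draws of size 2.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical population: labels standing in for 10 names.
population = [f"name{i}" for i in range(1, 11)]

def draw_srs(pop, n, rng):
    """Draw n individuals without replacement; every set of n is equally likely."""
    return rng.sample(pop, n)

rng = random.Random(1)  # fixed seed so the run is repeatable
sample = draw_srs(population, 3, rng)
print(sample)

# Check on a small case: draw many samples of size 2 and count each pair.
counts = Counter(frozenset(draw_srs(population, 2, rng)) for _ in range(45_000))
num_pairs = len(list(combinations(population, 2)))  # 45 possible pairs
print(num_pairs, min(counts.values()), max(counts.values()))
```

With 45,000 draws and 45 possible pairs, each pair should appear roughly 1,000 times, which is what "every set of n individuals has an equal chance" looks like in practice.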
RANDOM DIGITS
A table of random digits is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with the following two properties:
1. Each entry in the table is equally likely to be any of the 10 digits 0 through 9.
2. The entries are independent of each other. That is, knowledge of one part of the table gives no information about any other part.
*Table B (in the back of your textbook) is a Random Digit Table
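
For illustration, a line resembling a random digit table can be produced in Python. This sketch uses the standard library's pseudo-random generator as a stand-in for Table B; the function names and the grouping into blocks of five are assumptions made for readability, and the output is not the textbook's table.

```python
import random

def random_digit_line(length, rng):
    """Return `length` digits, each drawn independently and equally likely to be 0-9."""
    return "".join(str(rng.randrange(10)) for _ in range(length))

def format_groups(digits, group=5):
    """Split a digit string into groups (five by default) for easier reading."""
    return " ".join(digits[i:i + group] for i in range(0, len(digits), group))

rng = random.Random(2024)  # fixed seed so the same line is produced each run
line = random_digit_line(40, rng)
print(format_groups(line))
```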
CHOOSING AN SRS
Choose an SRS in two steps:
1. Label: Assign a numerical label to every individual in the population.
2. Table: Use Table B to select labels at random.

See example 5.4 on p.276
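
The label-then-table procedure can be sketched in Python. Everything here is hypothetical: the 30-member population, the function name `srs_from_digits`, and the digit string, which is made up for this example and is not a line from Table B. Labels run from 00 to 29, digits are read in groups of two, and groups that are out of range or already chosen are skipped.

```python
def srs_from_digits(population, digits, n):
    """Label individuals 00, 01, ... then read fixed-width groups from `digits`."""
    width = len(str(len(population) - 1))  # digits per label, e.g. 2 for 00-29
    labeled = {f"{i:0{width}d}": person for i, person in enumerate(population)}
    picked = []  # labels in the order they were read
    for start in range(0, len(digits) - width + 1, width):
        group = digits[start:start + width]
        if group in labeled and group not in picked:  # skip invalid and repeated labels
            picked.append(group)
        if len(picked) == n:
            break
    return [(label, labeled[label]) for label in picked]

population = [f"client{i:02d}" for i in range(30)]  # hypothetical individuals
digits = "45108 73921 06621 98547 23001 75986".replace(" ", "")  # made-up digits

result = srs_from_digits(population, digits, 5)
print([label for label, _ in result])  # ['10', '21', '06', '19', '23']
```

Reading the string two digits at a time gives 45, 10, 87, 39, 21, 06, 62, 19, 85, 47, 23, and so on; only 10, 21, 06, 19, and 23 fall in the range 00 to 29 before five labels have been found.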
PROBABILITY SAMPLE



A probability sample is a sample chosen by
chance. We must know what samples are
possible and what chance, or probability, each
possible sample has.
Systematic sampling is a type of probability
sampling in which the sampling starts by
selecting an element from the list at random and
then every kth element in the frame is selected to
be part of the sample.
The use of chance to select the sample is the
essential principle of statistical sampling.
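
A short sketch of systematic sampling in Python. It assumes one common variant in which the random start is chosen among the first k elements of the frame; the frame of 100 numbered individuals and the function name are hypothetical.

```python
import random

def systematic_sample(frame, k, rng):
    """Pick a random start among the first k elements, then take every kth element."""
    start = rng.randrange(k)  # random position 0 .. k-1
    return frame[start::k]

frame = list(range(1, 101))  # hypothetical sampling frame of 100 individuals
rng = random.Random(7)
sample = systematic_sample(frame, 10, rng)
print(sample)
```

Each individual has a 1-in-k chance of selection, but only k different samples are possible, so a systematic sample is a probability sample without being an SRS.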
STRATIFIED RANDOM SAMPLE
• To select a stratified random sample, first divide the population into groups of similar individuals, called strata. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample.
• This method is usually used for sampling from large populations spread out over a wide area.
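
A minimal sketch of stratified random sampling in Python. The strata (students grouped by class year), the sample sizes per stratum, and the function name are assumptions made for illustration.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Take a separate SRS of the given size in each stratum and combine them."""
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, sizes[name]))
    return sample

# Hypothetical strata: students grouped by class year.
strata = {
    "juniors": [f"J{i}" for i in range(200)],
    "seniors": [f"S{i}" for i in range(100)],
}
sizes = {"juniors": 20, "seniors": 10}  # assumed allocation, proportional here

rng = random.Random(42)
sample = stratified_sample(strata, sizes, rng)
print(len(sample))  # 30
```

Because each stratum contributes a fixed number of individuals, a sample made up entirely of juniors can never occur, which is one way a stratified sample differs from an SRS of size 30.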
CLUSTER SAMPLING


With cluster sampling, the researcher divides the
population into separate groups, called clusters. Then, a
simple random sample of clusters is selected from the
population and ALL members in the cluster become part of
the sample.
For example, if a researcher is studying the attitudes of
Catholic Church members surrounding the recent exposure
of sex scandals in the Catholic Church, he or she might
first sample a list of Catholic churches across the country.
Let’s say that the researcher selected 50 Catholic Churches
across the United States. He or she would then survey all
church members from those 50 churches.
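
The same idea as a Python sketch, using hypothetical data: 200 congregations of 50 members each stand in for the list of churches, and the function name is illustrative. An SRS of 50 clusters is drawn, and every member of each chosen cluster enters the sample.

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Choose an SRS of clusters, then keep ALL members of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members

# Hypothetical clusters: 200 congregations with 50 members each.
clusters = {
    f"church{c:03d}": [f"church{c:03d}-member{m}" for m in range(50)]
    for c in range(200)
}

rng = random.Random(9)
chosen, members = cluster_sample(clusters, 50, rng)
print(len(chosen), len(members))  # 50 2500
```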
MULTISTAGE SAMPLING


A typical example of multistage sampling is the Current
Population Survey sampling design, which is conducted as
follows:

Stage 1: Divide the United States into 2,007 geographical areas
called Primary Sampling Units (PSUs) and select a sample of PSUs.

Stage 2: Divide each PSU selected into smaller areas
called “neighborhoods” using ethnic and other
information and take a stratified sample of
neighborhoods.

Stage 3: Sort the housing units in each neighborhood
into clusters of four nearby units. Interview the
households in a random sample of these clusters.
This method saves time and money.
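
The staged structure can be sketched in Python. This is a deliberate simplification, not the actual Current Population Survey procedure: it uses plain random selection at every stage (the real design stratifies), and the frame of 20 PSUs, 6 neighborhoods per PSU, and 40 housing units per neighborhood is hypothetical.

```python
import random

def housing_units(psu, hood):
    """Hypothetical frame: 40 housing units in each neighborhood."""
    return [f"psu{psu}-n{hood}-u{u}" for u in range(40)]

def multistage_sample(num_psus, hoods_per_psu, clusters_per_hood, rng):
    households = []
    for psu in rng.sample(range(20), num_psus):            # stage 1: sample PSUs
        for hood in rng.sample(range(6), hoods_per_psu):   # stage 2: sample neighborhoods
            units = housing_units(psu, hood)
            # Stage 3: group nearby units into clusters of four, then sample clusters.
            clusters = [units[i:i + 4] for i in range(0, len(units), 4)]
            for cluster in rng.sample(clusters, clusters_per_hood):
                households.extend(cluster)
    return households

rng = random.Random(11)
households = multistage_sample(5, 2, 3, rng)
print(len(households))  # 5 * 2 * 3 * 4 = 120
```

Only the selected PSUs and neighborhoods need to be listed in detail, which is where the savings in time and money come from.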
CAUTIONS ABOUT SAMPLE SURVEYS

We need a complete and accurate list of the population.

Undercoverage occurs when some groups in the population
are left out of the process of choosing the sample.

See example 5.6 on p.281

Nonresponse occurs when an individual chosen for the
sample can’t be contacted or does not cooperate.

Response bias occurs when respondents give inaccurate answers, for
example by lying when asked about illegal or unpopular behaviors.

An interviewer whose attitude suggests that some answers are
more desirable than others will get these answers more often.

The wording of questions is the most important influence
on the answers given to a sample survey.

See example 5.7 on p.282
INFERENCE ABOUT THE POPULATION
• Using chance to choose a sample eliminates bias in the actual selection of the sample.
• Because we deliberately use chance, the results obey the laws of probability that govern chance behavior.
• Larger random samples give more accurate results than smaller samples.
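
The last point can be checked with a small simulation. This is a sketch under stated assumptions: a very large population in which 60% hold some opinion (a made-up value), so that each sampled individual can be treated as an independent draw, with sample sizes of 25 and 400 chosen only for illustration.

```python
import random
import statistics

def spread_of_sample_proportions(p, n, reps, rng):
    """Standard deviation of the sample proportion over `reps` random samples of size n."""
    props = []
    for _ in range(reps):
        hits = sum(rng.random() < p for _ in range(n))
        props.append(hits / n)
    return statistics.pstdev(props)

rng = random.Random(5)  # fixed seed so the run is repeatable
small = spread_of_sample_proportions(0.6, 25, 2000, rng)
large = spread_of_sample_proportions(0.6, 400, 2000, rng)
print(f"n = 25:  spread {small:.3f}")
print(f"n = 400: spread {large:.3f}")
```

The sample proportions from the larger samples cluster much more tightly around 0.6, which is the sense in which larger random samples are more accurate.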

Homework: