Daniel S. Yates
The Practice of Statistics
Third Edition
Chapter 5:
Producing Data
Copyright © 2008 by W. H. Freeman & Company
Chapter 5 – Producing Data
• Sampling – a technique used to study a part or sample of a
larger group in order to gain information about the entire
group. The sample must be chosen carefully.
• Experiment – involves more than observations or questions
of individuals. A condition is imposed on the individuals
in order to observe a response. Experiments must be
designed carefully.
• Confounding variables can disguise the effects of
explanatory variables on response variables. Experiments
must be designed to control these variables.
Section 5.1 – Designing Samples
• Sample design is the method used to select a sample from a
population. Poor sample design will lead to bias and
misleading conclusions.
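• Simple random sample (SRS) – a sampling design in which
every set of n individuals has the same chance of being the
sample selected. Because chance alone does the choosing, an
SRS eliminates bias in the selection of the sample.
• The easiest way to construct an SRS is to place names,
numbers, etc. in a “hat”, mix them well, and draw without looking.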
• Using a random number table to choose an SRS:
1. Assign a numerical label to every individual in the
population.
2. Use Table B to select labels at random.
– Don’t scramble labels as you assign them. The
table will randomize.
– All labels must have the same number of digits.
Ex. To choose 5 individuals out of 30, assign the labels
01, 02, 03, …, 30, not 1, 2, 3, …, 30.
– You can read Table B in any order and start
anywhere. Standard practice is to read across
rows.
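The Table B procedure can be mimicked in code. Below is a minimal Python sketch. It assumes the digits are read across a row in groups as wide as the labels, and it follows the usual convention of skipping groups that are not labels as well as labels already chosen. The digit row is made up for illustration and is not a line of Table B. The function name `select_from_digits` is hypothetical.

```python
def select_from_digits(digits: str, population_size: int, sample_size: int) -> list[str]:
    """Read random digits in fixed-width groups, Table B style (hypothetical helper).

    Labels run from 1 to population_size and all have the same number of
    digits (e.g. 01..30), so each group read from the row has that width.
    Groups outside the label range and repeated labels are skipped.
    """
    width = len(str(population_size))                     # 30 -> 2-digit labels
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop the spaces between groups
    chosen: list[str] = []
    for start in range(0, len(digits) - width + 1, width):
        label = digits[start:start + width]
        if 1 <= int(label) <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen

# Made-up row of random digits (NOT an actual line of Table B).
row = "07431 98831 07260 25514"
print(select_from_digits(row, population_size=30, sample_size=5))
# ['07', '19', '26', '02', '14']
```

Reading the row as 07, 43, 19, 88, 31, 07, 26, 02, 55, 14, the groups 43, 88, 31 and 55 are not labels, and the second 07 is a repeat, so they are skipped.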
Other sampling designs
Multistage Sample – the sample is chosen in stages, and each
stage is selected by an SRS.
Ex. Suppose we want to personally interview 60,000 people
in the U.S.
Stage 1 – Take an SRS of the roughly 3000 counties in the
U.S.
Stage 2 – Take an SRS of the towns within each
chosen county.
Stage 3 – Select an SRS of streets within each
chosen town.
Stage 4 – Select an SRS of households on each
chosen street.
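A minimal Python sketch of a multistage sample. It assumes a small hypothetical nested population (counties containing towns, towns containing streets, streets containing households) in place of real U.S. data. At every stage, `random.sample` draws an SRS of the units inside each unit chosen at the previous stage. The function name `multistage_sample` and the stage sizes are illustrative assumptions.

```python
import random

def multistage_sample(population, sizes, rng):
    """Take an SRS at every level of a nested population (hypothetical helper).

    population: nested dict (county -> town -> street -> list of households).
    sizes[k]:   how many units to draw at stage k + 1.
    """
    size, rest = sizes[0], sizes[1:]
    if isinstance(population, dict):
        units = rng.sample(sorted(population), size)  # SRS of units at this stage
        sample = []
        for unit in units:
            sample.extend(multistage_sample(population[unit], rest, rng))
        return sample
    return rng.sample(population, size)              # last stage: SRS of households

# Hypothetical population: 4 counties x 3 towns x 3 streets x 5 households.
pop = {
    f"c{c}": {
        f"t{t}": {
            f"s{s}": [f"c{c}-t{t}-s{s}-h{h}" for h in range(5)]
            for s in range(3)
        }
        for t in range(3)
    }
    for c in range(4)
}
households = multistage_sample(pop, [2, 2, 2, 3], random.Random(1))
print(len(households))  # 2 counties x 2 towns x 2 streets x 3 households = 24
```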
Cautions about sample surveys
• Bias may be introduced into a sample survey by the
following:
• Response Bias – Respondents may lie, or their answers may
be influenced by the race, sex, or attitude of the interviewer
or by the interviewer’s questioning technique. The wording of
the question can also introduce bias.
• Even if great care is taken to design and carry out a
sample survey, it is highly unlikely that the sample
reflects the population exactly.
• However, because the sample is chosen at random, the
results obey the laws of probability. This lets us determine
the margin of error. Using a sample to draw conclusions
about the whole population in this way is called
statistical inference.
• Larger random samples tend to give more
accurate results than smaller samples.
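The claim that larger samples give more accurate results can be checked by simulation. This sketch assumes a hypothetical population in which 60% would answer “yes” and treats the population as so large that each draw is independent. It repeatedly draws random samples of several sizes and measures how much the sample proportions spread around 0.6. The helper name `sample_proportions` is hypothetical.

```python
import random
import statistics

def sample_proportions(p: float, n: int, reps: int, rng: random.Random) -> list[float]:
    """Draw `reps` random samples of size n from a population in which a
    fraction p says "yes"; return each sample's proportion of "yes"."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(2008)
p = 0.6  # hypothetical population proportion, chosen for illustration
spread = {}
for n in (25, 100, 400):
    props = sample_proportions(p, n, reps=1000, rng=rng)
    spread[n] = statistics.stdev(props)  # how far sample results typically stray
    print(f"n = {n:3d}: mean {statistics.mean(props):.3f}, spread (SD) {spread[n]:.3f}")
```

The sample proportions center near 0.6 for every n, but their spread shrinks as n grows. In theory each fourfold increase in sample size roughly halves the spread.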
• Observational Study – observing and measuring specific
characteristics without attempting to modify
the subjects being studied
• Experiment – apply some treatment and then observe its
effects on the subjects or experimental units
• Simple Random Sample – a sample of n subjects selected in
such a way that every possible sample of the same size n
has the same chance of being chosen
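Python’s `random.sample` draws n distinct items without replacement, which is the “names in a hat” method of choosing an SRS. The sketch below uses a hypothetical population of four individuals labeled A to D. It repeats the draw many times to check the defining property: each of the C(4, 2) = 6 possible samples of size 2 should turn up about one sixth of the time.

```python
import random
from collections import Counter
from itertools import combinations

population = ["A", "B", "C", "D"]  # hypothetical population
n = 2
reps = 60_000
rng = random.Random(5)

# random.sample draws n distinct individuals without replacement: an SRS.
# frozenset ignores order, so {A, B} and {B, A} count as the same sample.
counts = Counter(frozenset(rng.sample(population, n)) for _ in range(reps))

# List every possible sample of size n with its observed relative frequency.
for pair in combinations(population, n):
    print(sorted(pair), round(counts[frozenset(pair)] / reps, 3))
```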
• Probability Sample – selecting members from a population in
such a way that each member of the population
has a known (but not necessarily the same)
chance of being selected
• Systematic Sampling – select some starting point and then
select every kth element in the population
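A minimal sketch of systematic sampling. It assumes the starting point is chosen at random among the first k individuals, which is a common choice; the definition above only says “some starting point”. The population of 100 labels, k = 10, and the helper name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(population: list, k: int, rng: random.Random) -> list:
    """Pick a random start among the first k elements, then take every kth element."""
    start = rng.randrange(k)      # random start in positions 0..k-1
    return population[start::k]   # slice steps through the list k at a time

people = list(range(1, 101))      # hypothetical population labeled 1..100
sample = systematic_sample(people, k=10, rng=random.Random(7))
print(sample)
```

Once the start is chosen, the rest of the sample is fixed, which is why a random start matters.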
• Convenience Sampling – use results that are easy to get
• Stratified Sampling – subdivide the population into at
least two different subgroups (strata) so that subjects within
the same subgroup share the same characteristics, then draw
a sample from each subgroup
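A sketch of stratified sampling, in which a separate SRS is drawn from each stratum. The strata (students grouped by class year), their sizes, the choice to sample 10% of each stratum, and the helper name `stratified_sample` are hypothetical values picked for illustration.

```python
import random

def stratified_sample(strata: dict[str, list], n_per_stratum: dict[str, int],
                      rng: random.Random) -> dict[str, list]:
    """Take a separate SRS from each stratum (subgroup)."""
    return {name: rng.sample(members, n_per_stratum[name])
            for name, members in strata.items()}

# Hypothetical strata: students grouped by class year.
strata = {
    "freshman": [f"F{i}" for i in range(1, 41)],   # 40 students
    "sophomore": [f"S{i}" for i in range(1, 31)],  # 30 students
    "junior": [f"J{i}" for i in range(1, 21)],     # 20 students
}
# Assumed allocation: 10% of each stratum.
sizes = {name: len(members) // 10 for name, members in strata.items()}
sample = stratified_sample(strata, sizes, random.Random(3))
for name, chosen in sample.items():
    print(name, chosen)
```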
• Cluster Sampling – divide the population area into sections
(clusters); randomly select some of those sections; choose all
members from the selected sections
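In contrast to stratified sampling, cluster sampling picks whole sections at random and keeps everyone in them. This sketch assumes a hypothetical area of 10 city blocks with 5 households each and selects 3 blocks. The helper name `cluster_sample` is hypothetical.

```python
import random

def cluster_sample(clusters: dict[str, list], n_clusters: int,
                   rng: random.Random) -> list:
    """Randomly choose whole clusters (sections) and keep every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of clusters
    return [member for name in chosen for member in clusters[name]]

# Hypothetical clusters: 10 city blocks, each with 5 households.
blocks = {f"b{b}": [f"b{b}-house{h}" for h in range(1, 6)] for b in range(1, 11)}
sample = cluster_sample(blocks, n_clusters=3, rng=random.Random(11))
print(len(sample), sample)
```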
• Multistage Sampling – collect data by using some combination
of the basic sampling methods. Pollsters select a sample in
different stages, and each stage might use different methods of
sampling.
• Randomization
is used when subjects are assigned to
different groups through a process of
random selection. The logic is to use
chance as a way to create two groups that
are similar.
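Random assignment to two groups can be sketched by shuffling the list of subjects and splitting it in half. The 20 subject names and the helper name `randomize` are hypothetical. Because `random.shuffle` makes every ordering equally likely, every split into two groups of 10 is equally likely.

```python
import random

def randomize(subjects: list, rng: random.Random) -> tuple[list, list]:
    """Split subjects into two equal-sized groups by chance."""
    shuffled = subjects[:]        # copy so the caller's list is untouched
    rng.shuffle(shuffled)         # chance, not the experimenter, decides the order
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subjects = [f"subject{i}" for i in range(1, 21)]  # hypothetical 20 volunteers
treatment, control = randomize(subjects, random.Random(42))
print("treatment:", treatment)
print("control:  ", control)
```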
• Replication
is the repetition of an experiment on more
than one subject. Samples should be large
enough so that the erratic behavior that is
characteristic of very small samples will not
disguise the true effects of different
treatments.
• Blinding
is a technique in which the subject doesn’t
know whether he or she is receiving a
treatment or a placebo. Blinding allows us
to determine whether the treatment effect is
significantly different from a placebo effect,
which occurs when an untreated subject
reports improvement in symptoms.
• Double-Blind
Blinding occurs at two levels:
(1) The subject doesn’t know whether he or
she is receiving the treatment or a
placebo.
(2) The experimenter does not know
whether he or she is administering the
treatment or placebo.
• Confounding
occurs in an experiment when the
experimenter is not able to distinguish
between the effects of different factors.
Controlling Effects of Variables
• Completely Randomized Experimental Design
assign subjects to different treatment groups
through a process of random selection
• Randomized Block Design
a block is a group of subjects that are similar, but
blocks differ in ways that might affect the outcome
of the experiment. Subjects are randomly assigned to
treatments separately within each block.
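A minimal sketch of a randomized block design. It assumes two hypothetical blocks (men and women, 8 subjects each) and two treatments. Randomization happens separately inside each block, so each treatment group gets the same number of subjects from every block. The helper name `randomized_block` is hypothetical.

```python
import random

def randomized_block(blocks: dict[str, list], treatments: list[str],
                     rng: random.Random) -> dict[str, dict[str, list]]:
    """Within each block, randomly split the subjects among the treatments."""
    design = {}
    for block, subjects in blocks.items():
        shuffled = subjects[:]
        rng.shuffle(shuffled)  # a separate randomization for every block
        # Deal the shuffled subjects out to the treatments in turn.
        design[block] = {t: shuffled[i::len(treatments)]
                         for i, t in enumerate(treatments)}
    return design

# Hypothetical blocks of similar subjects.
blocks = {
    "men": [f"M{i}" for i in range(1, 9)],
    "women": [f"W{i}" for i in range(1, 9)],
}
design = randomized_block(blocks, ["treatment", "placebo"], random.Random(1))
for block, groups in design.items():
    print(block, groups)
```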
• Rigorously Controlled Design
carefully assign subjects to different treatment
groups, so that those given each treatment are
similar in ways that are important to the experiment
• Matched Pairs Design
compare exactly two treatment groups using
subjects matched in pairs that are somehow
related or have similar characteristics
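One common way to randomize a matched pairs design is to flip a coin for each pair to decide which member receives which treatment. The sketch below assumes hypothetical pairs and treatment labels A and B. The helper name `assign_pairs` is hypothetical.

```python
import random

def assign_pairs(pairs: list[tuple[str, str]],
                 rng: random.Random) -> list[tuple[str, str]]:
    """For each matched pair, a fair coin flip decides who gets treatment A.

    Returns (gets_A, gets_B) tuples.
    """
    assignments = []
    for first, second in pairs:
        if rng.random() < 0.5:    # "heads": first member gets treatment A
            assignments.append((first, second))
        else:                     # "tails": second member gets treatment A
            assignments.append((second, first))
    return assignments

# Hypothetical pairs, e.g. twins matched on age and health.
pairs = [("Ann", "Amy"), ("Bob", "Ben"), ("Cal", "Cy")]
result = assign_pairs(pairs, random.Random(9))
for a, b in result:
    print(f"{a} -> treatment A, {b} -> treatment B")
```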
Summary
Three very important considerations in the design
of experiments are the following:
1. Use randomization to assign subjects to
different groups.
2. Use replication by repeating the experiment on
enough subjects so that effects of treatment or
other factors can be clearly seen.
3. Control the effects of variables by using such
techniques as blinding and a completely
randomized experimental design.
Section 5.3 – Simulating Experiments
• Simulation – the imitation of chance
behavior, based on a model that accurately
reflects the experiment under consideration.
– Ex. Flipping a coin to simulate the sex of a newborn
baby: heads -> boy, tails -> girl.
• Basic simulation procedure
1. State the problem or describe the experiment.
• Ex. What is the probability of a run of 3 or more
consecutive heads or tails when a coin is tossed
10 times?
2. State the assumptions.
• A head and a tail are equally likely to occur
on each toss.
• Tosses are independent of each other.
3. Assign digits to represent outcomes.
• Use a random number table or calculator.
• One digit represents one toss of the coin.
• Odd digits represent heads; even digits
represent tails.
4. Simulate many repetitions.
5. State the conclusion.
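The five steps can be carried out in Python. This is a sketch that assumes Python’s pseudo-random `random.randrange` stands in for the random digit table. Each repetition draws 10 digits, maps odd digits to heads and even digits to tails, and checks for a run of 3 or more identical outcomes. Because there are only 2^10 = 1024 equally likely sequences, the sketch also counts them all to get the exact probability for comparison. The function names are hypothetical.

```python
import random
from itertools import product

def has_run_of_three(tosses: str) -> bool:
    """True if the sequence contains 3 (or more) identical outcomes in a row."""
    return "HHH" in tosses or "TTT" in tosses

def one_repetition(rng: random.Random) -> str:
    """Step 3: draw 10 random digits; odd digit = heads, even digit = tails."""
    digits = [rng.randrange(10) for _ in range(10)]
    return "".join("H" if d % 2 == 1 else "T" for d in digits)

def simulate(reps: int, seed: int = 2008) -> float:
    """Step 4: repeat many times; return the proportion with a run of 3."""
    rng = random.Random(seed)
    hits = sum(has_run_of_three(one_repetition(rng)) for _ in range(reps))
    return hits / reps

# Exact answer for comparison: check all 2**10 = 1024 equally likely sequences.
exact = sum(has_run_of_three("".join(s)) for s in product("HT", repeat=10)) / 2**10

estimate = simulate(10_000)
print(f"simulated: {estimate:.3f}   exact: {exact:.4f}")
```

The exact count is 846 of the 1024 sequences, a probability of about 0.826. With 10,000 repetitions the simulated proportion typically lands within about 0.01 of that value.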