Chapter 5 - danagoins

Sampling Design
How do we gather data?
• Surveys
   – Opinion polls
   – Interviews
• Studies
   – Observational: Retrospective (past) or Prospective (future)
   – Experiments
Population – the entire group of individuals that we want information about
Census – a complete count of the population
Sample – a part of the population that we actually examine in order to gather information
• Use sample to generalize to population
Sampling design – refers to the method used to choose the sample from the population
Sampling frame – a list of every individual in the population
Simple Random Sample (SRS)
• consists of n individuals from the population chosen in such a way that
   – every individual has an equal chance of being selected
   – every set of n individuals has an equal chance of being selected
Suppose we were to take an SRS of 100 SHS students – put each student's name in a hat. Then randomly select 100 names from the hat. Each student has the same chance to be selected!
Not only does each student have the same chance to be selected, but every possible group of 100 students has the same chance to be selected! Therefore, it has to be possible for all 100 students to be seniors in order for it to be an SRS!
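The names-in-a-hat idea can be sketched in a few lines of Python. This is only an illustration: the roster of 2000 made-up student IDs and the function name `simple_random_sample` are assumptions, not part of the notes.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every set of n is equally likely."""
    rng = random.Random(seed)
    # random.sample draws without replacement, like pulling names from a hat.
    return rng.sample(population, n)

# Hypothetical roster standing in for the names in the hat.
students = [f"student_{i}" for i in range(1, 2001)]
srs = simple_random_sample(students, 100, seed=1)
print(len(srs), srs[:3])
```

Because the draw is made from the whole roster at once, any group of 100 students (even 100 seniors) is a possible outcome.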
Stratified random sample
• population is divided into homogeneous groups called strata
• SRS's are pulled from each stratum
Homogeneous groups are groups that are alike based upon some characteristic of the group members.
Suppose we were to take a stratified random sample of 100 SHS students. Since students are already divided by grade level, grade level can be our strata. Then randomly select 50 seniors and randomly select 50 juniors.
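A minimal sketch of the stratified example, assuming a made-up roster of 500 seniors and 500 juniors (the roster sizes and the helper name `stratified_sample` are illustrative only):

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take a separate SRS of the requested size from each stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        sample[name] = rng.sample(members, per_stratum[name])
    return sample

# Hypothetical strata: grade level is the stratifying characteristic.
strata = {
    "senior": [f"senior_{i}" for i in range(1, 501)],
    "junior": [f"junior_{i}" for i in range(1, 501)],
}
picked = stratified_sample(strata, {"senior": 50, "junior": 50}, seed=1)
print({grade: len(group) for grade, group in picked.items()})
```

Unlike an SRS of 100, this design can never return 100 seniors, because the count taken from each stratum is fixed in advance.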
Cluster Sample
• based upon location
• randomly pick a location & sample all there
Suppose we want to do a cluster sample of SHS students. One way to do this would be to randomly select 10 classrooms during 2nd period. Sample all students in those rooms!
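The classroom example might look like this in Python. The 80 rooms of 25 students are invented for the sketch, as is the function name `cluster_sample`.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then keep everyone inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {room: list(clusters[room]) for room in chosen}

# Hypothetical 2nd-period classrooms, 25 students in each.
classrooms = {
    f"room_{r}": [f"room_{r}_student_{s}" for s in range(1, 26)]
    for r in range(1, 81)
}
sampled_rooms = cluster_sample(classrooms, 10, seed=1)
print(sorted(sampled_rooms))
```

The randomness is in which rooms are picked; once a room is picked, nobody in it is left out.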
Systematic random sample
• select sample by following a systematic approach
• randomly select where to begin
Suppose we want to do a systematic random sample of SHS students - number a list of students. (There are approximately 2000 students – if we want a sample of 100, 2000/100 = 20.) Select a number between 1 and 20 at random. That student will be the first student chosen, then choose every 20th student from there.
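The arithmetic in this example (2000/100 = 20, random start between 1 and 20, then every 20th student) can be sketched as follows. The numbered roster and the name `systematic_sample` are assumptions made for illustration.

```python
import random

def systematic_sample(ordered_list, sample_size, seed=None):
    """Pick a random start within the first interval, then every step-th item."""
    rng = random.Random(seed)
    step = len(ordered_list) // sample_size   # 2000 // 100 = 20
    start = rng.randint(1, step)              # random position from 1 to step
    return ordered_list[start - 1::step][:sample_size]

# Hypothetical numbered list of 2000 students.
roster = list(range(1, 2001))
chosen = systematic_sample(roster, 100, seed=1)
print(chosen[:5])
```

Only the starting point is random; every later pick is determined by it.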
Multistage sample
• select successively smaller groups within the population in stages
To use a multistage approach to sampling SHS students, we could first divide 2nd period classes by level (AP, Honors, Regular, etc.) and randomly select 4 second period classes from each group. Then we could randomly select 5 students from each of those classes. The selection process is done in stages!
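A rough sketch of the two stages, assuming three levels with ten classes of 25 students each (all of these numbers and names are made up for the example):

```python
import random

def multistage_sample(classes_by_level, classes_per_level, students_per_class, seed=None):
    """Stage 1: pick classes within each level. Stage 2: pick students within each class."""
    rng = random.Random(seed)
    sample = []
    for level, classes in classes_by_level.items():
        for class_name in rng.sample(sorted(classes), classes_per_level):
            sample.extend(rng.sample(classes[class_name], students_per_class))
    return sample

# Hypothetical 2nd-period classes grouped by level.
levels = ["AP", "Honors", "Regular"]
classes_by_level = {
    level: {
        f"{level}_class_{c}": [f"{level}_class_{c}_student_{s}" for s in range(1, 26)]
        for c in range(1, 11)
    }
    for level in levels
}
stage_sample = multistage_sample(classes_by_level, 4, 5, seed=1)
print(len(stage_sample))
```

With 3 levels, 4 classes per level, and 5 students per class, the sketch returns 3 × 4 × 5 = 60 students.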
Identify the sampling design
1) The Educational Testing Service
(ETS) needed a sample of colleges.
ETS first divided all colleges into
groups of similar types (small
public, small private, etc.). Then
they randomly selected 3 colleges
from each group.
Stratified random sample
Identify the sampling design
2) A county commissioner wants to
survey people in her district to
determine their opinions on a
particular law up for adoption. She
decides to randomly select blocks in
her district and then survey all who
live on those blocks.
Cluster sampling
Identify the sampling design
3) A local restaurant manager wants
to survey customers about the
service they receive. Each night
the manager randomly chooses a
number between 1 & 10. He then
gives a survey to that customer,
and to every 10th customer after
them, to fill it out before they
leave.
Systematic random sampling
Random digit table
• each entry is equally likely to be any of the 10 digits
• digits are independent of each other
The following is part of the random digit table found on page 847 of your textbook:
Row
1   4 5 1 8 5 0 3 3 7 1
2   4 2 5 5 8 0 4 5 7 0
3   8 9 9 3 4 3 5 0 6 3
Numbers can be read across.
Numbers can be read vertically.
Numbers can be read diagonally.
Suppose your population consisted of these 20 people:
1) Aidan     6) Fred      11) Kathy     16) Paul
2) Bob       7) Gloria    12) Lori      17) Shawnie
3) Chico     8) Hannah    13) Matthew   18) Tracy
4) Doug      9) Israel    14) Nan       19) Uncle Sam
5) Edward    10) Jung     15) Opus      20) Vernon
Use the following random digits to select a sample of five from these people.
Row
1   4 5 1 8 0 5 1 3 7 1
2   0 1 5 5 8 0 1 5 7 0
3   8 9 9 3 4 3 5 0 6 3
We will need to use double digit random numbers, ignoring any number greater than 20. Start with Row 1 and read across. Stop when five people are selected.
Reading across: 45 (ignore), 18, 05, 13, 71 (ignore), 01, 55 (ignore), 80 (ignore), 15.
So my sample would consist of: Aidan, Edward, Matthew, Opus, and Tracy
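The table-reading procedure can be checked with a short Python sketch. The function name `select_from_digit_table` is hypothetical, and two details are assumptions the slide does not spell out: the pair 00 is ignored, and a repeated number would be skipped.

```python
def select_from_digit_table(rows, labels, sample_size, width=2):
    """Read the digits across in groups of `width`, keeping only valid labels."""
    digits = "".join(rows).replace(" ", "")
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        # Ignore numbers with no person attached (00 or above 20) and repeats.
        if number in labels and labels[number] not in chosen:
            chosen.append(labels[number])
        if len(chosen) == sample_size:
            break
    return chosen

names = ["Aidan", "Bob", "Chico", "Doug", "Edward", "Fred", "Gloria",
         "Hannah", "Israel", "Jung", "Kathy", "Lori", "Matthew", "Nan",
         "Opus", "Paul", "Shawnie", "Tracy", "Uncle Sam", "Vernon"]
labels = {i: name for i, name in enumerate(names, start=1)}

rows = ["4 5 1 8 0 5 1 3 7 1",
        "0 1 5 5 8 0 1 5 7 0",
        "8 9 9 3 4 3 5 0 6 3"]
sample = select_from_digit_table(rows, labels, 5)
print(sample)  # ['Tracy', 'Edward', 'Matthew', 'Aidan', 'Opus']
```

The order of selection is 18, 05, 13, 01, 15; sorted alphabetically, that is the same five people listed on the slide.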
Bias
• A systematic error in measuring the estimate
• favors certain outcomes
Anything that causes the data to be wrong! It might be attributed to the researchers, the respondent, or to the sampling method!
Sources of Bias
• things that can cause bias in your sample
• cannot do anything
Undercoverage
• some groups of population are left out of the selection process
Suppose you take a sample by randomly selecting names from the phone book – some groups will not have the opportunity of being selected!
• People with unlisted phone numbers – usually high-income families
• People without phone numbers – usually low-income families
• People with ONLY cell phones – usually young adults
Voluntary response
• People choose to respond
• Usually only people with very strong opinions respond
An example would be the surveys in magazines that ask readers to mail in the survey. Other examples are call-in shows, American Idol, etc.
Remember, the respondents select themselves to participate in the survey!
Remember – the way to determine voluntary response is: Self-selection!!
Nonresponse
• occurs when an individual chosen for the sample can't be contacted or refuses to cooperate
• telephone surveys 70% nonresponse
People are chosen by the researchers, BUT refuse to participate. NOT self-selected! This is often confused with voluntary response!
Because of huge telemarketing efforts in the past few years, telephone surveys have a MAJOR problem with nonresponse! One way to help with the problem of nonresponse is to make follow-up contact with the people who are not home when you first contact them.
Response bias
• occurs when the behavior of respondent or interviewer causes bias in the sample
• wrong answers
Suppose we wanted to survey high school students on drug abuse and we used a uniformed police officer to interview each student in our sample – would we get honest answers?
Response bias occurs when for some reason (interviewer's or respondent's fault) you get incorrect answers.
Convenience sampling
• Ask people who are easy to ask
• Produces biased results
The data obtained by a convenience sample will be biased – however this method is often used for surveys & results reported in newspapers and magazines!
An example would be stopping friendly-looking people in the mall to survey. Another example is the surveys left on tables at restaurants - a convenient method!
Wording of the Questions
• wording can influence the answers that are given
• connotation of words
• use of “big” words or technical words
Questions must be worded as neutral as possible to avoid influencing the response.
The level of vocabulary should be appropriate for the population you are surveying
– if surveying Podunk, AR, then you should avoid complex vocabulary.
– if surveying doctors, then use more complex, technical wording.
1. A uniformed policeman interviews a group
of 50 college freshmen. He asks each one his
or her name and then if he or she has used an
illegal drug in the last month.
A. Selection bias
B. Measurement or Response bias
C. Nonresponse bias
D. Systematic rejection bias
2. A survey about the food in the school
cafeteria was conducted by passing out
questionnaires to students as they entered
the cafeteria. A drop box for completed
forms was on a table by the cash register.
A. Selection bias
B. Measurement or Response bias
C. Nonresponse bias
D. Systematic rejection bias
3. The magazine Harley Davidson Today sent
a survey to its subscribers asking whom they
admire most in America.
A. Selection bias
B. Measurement or Response bias
C. Nonresponse bias
D. Systematic rejection bias
4. A poll of parents in Texas found that 90%
of parents say they have spoken to their
teenagers about the dangers of drinking and
driving, while only 45% of those teens say
they recall such a discussion.
A. Selection bias
B. Measurement or Response bias
C. Nonresponse bias
D. Systematic rejection bias
5. In a census in Russia, 1.8 million more
women than men reported that they were
married.
A. Selection bias
B. Measurement or Response bias
C. Nonresponse bias
D. Systematic rejection bias
6. One year after the Detroit race riots of
1967, interviewers asked a sample of black
residents in Detroit if they felt they could
trust most white people, some white people,
or none at all. When the interviewer was
white, 35% answered "most"; when the
interviewer was black, 7% answered "most".
A. Selection bias
B. Measurement or Response bias
C. Nonresponse bias
D. Systematic rejection bias
7. A political party mailed questionnaires to
all registered voters in Texas, asking
whether or not the party should support the
death penalty. The voters mailed the
completed questionnaires back in an envelope
provided.
A. Selection bias
B. Measurement or Response bias
C. Nonresponse bias
D. Systematic rejection bias
8. The Nielsen rating service estimates the popularity of
television stations in the Dallas area. Suppose that four
times a year, Nielsen takes a random sample of about 5000
viewers. Every member of the household over age 12 is
asked to fill out a diary, showing what he or she watches
every quarter hour from 6:00 am to midnight. Each diarist
receives $5 for his or her trouble. At the end of 12 weeks,
Nielsen tallies the results from the usable diaries - usually
between 33% and 50% of the 5000 sent out.
A. Selection bias
B. Measurement or Response bias
C. Nonresponse bias
D. Systematic rejection bias
9. In the 1936 presidential election, Franklin D. Roosevelt
ran for reelection against Alfred Landon. As it had done
since 1916, the Literary Digest, a popular magazine, ran a
preelection poll. To obtain its sample, the magazine compiled
a list of about 10 million names from sources such as
telephone books, lists of automobile owners, club
membership lists, and its own subscription lists. All 10
million people received questionnaires, about 2.4 million
returned them; these people made up the sample. Literary
Digest had correctly predicted the winner in all presidential
races since 1916. Then in 1936, based on sample responses,
the magazine predicted that Landon would win, 57% to 43%.
In fact, Roosevelt won, 62% to 38%.
A. Selection bias
B. Measurement or Response bias
C. Nonresponse bias
D. Systematic rejection bias
Which of the following sampling methods
produces a simple random sample?
10. From a class of 25 students, the teacher
selects the last 5 to enter the room to be in
the sample.
A) Is a simple random sample
B) Is not a simple random sample
11. From a group of 100 employees, the
manager selects those whose phone
numbers end in 7.
A) Is a simple random sample
B) Is not a simple random sample
12. A large elementary school has 15 classes
with 24 children in each classroom. A sample
of 30 is chosen by the following procedure:
Each of the 15 teachers selects 2 children
from his or her classroom to be in the sample
by numbering the children from 1 to 24, then
using a random digit table to select two
different numbers between 01 and 24. The
two children with those numbers are in the
sample.
A) Is a simple random sample
B) Is not a simple random sample
13. Suppose that in a class of 24 there are 12
boys and 12 girls. The teacher selects 6
students for a sample by numbering the boys
from 1 to 12 and the girls from 1 to 12. Then
using a random digit table, the first number
between 01 and 12 is a boy, the next number
between 01 and 12 is a girl and so on until the
6 students are selected.
A) Is a simple random sample
B) Is not a simple random sample
14. Suppose that in a class of 24 there are 12
boys and 12 girls. The teacher selects 6
students for a sample by numbering the boys
from 1 to 12 and the girls from 13 to 24. Then
she uses a random number table to select 6
two-digit numbers between 01 and 24.
A) Is a simple random sample
B) Is not a simple random sample
Definitions:
1) Observational study - observe outcomes without
imposing any treatment
2) Experiment - actively impose
some treatment in order to
observe the response
3) Experimental unit – the single
individual (person, animal,
plant, etc.) to which the
different treatments are
assigned
4) Factor – is the explanatory
variable – it’s what we test
5) Level – a specific value for
the factor
6) Response variable – what you
measure
7) Treatment – a specific
experimental condition applied to
the units
8) Control group – a group that
is used to compare the factor
against; can be a placebo or the
“old” or current item
9) Placebo – a “dummy”
treatment that can have no
physical effect
10) blinding - method used so
that units do not know which
treatment they are getting
11) double blind - neither the
units nor the evaluator know
which treatment a subject
received
Principles of Experimental
Design
• Control of effects of extraneous
variables on the response – by
comparing treatment groups to a
control group (placebo or “old”)
• Replication of the experiment on
many subjects to quantify the
natural variation in the experiment
• Randomization – the use of chance
to assign subjects to treatments

The ONLY way to show
cause & effect is with a
well-designed, well-controlled experiment!!!
Example 1: A farm-product manufacturer wants
to determine if the yield of a crop is different
when the soil is treated with three different
types of fertilizers. Fifteen similar plots of
land are planted with the same type of seed
but are fertilized differently. At the end of
the growing season, the mean yield from the
sample plots is compared.
Experimental units?
Plots of land
Factors?
Type of fertilizer
Levels?
Fertilizer types A, B, & C
Response variable?
Yield of crop
How many treatments?
3
Example 2: A consumer group wants to
test cake pans to see which works the
best (bakes evenly). It will test aluminum,
glass, and plastic pans in both gas and
electric ovens.
Experimental units?
Cake batter
Factors?
Two factors - type of pan & type of oven
Levels?
Type of pan has 3 levels (aluminum, glass, & plastic)
& type of oven has 2 levels (electric & gas)
Response variable?
How evenly the cake bakes
Number of treatments?
6
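The count of 6 comes from pairing every level of one factor with every level of the other (3 × 2). A small sketch, using only the levels named in the example:

```python
from itertools import product

# Levels of the two factors in the cake-pan example.
pans = ["aluminum", "glass", "plastic"]
ovens = ["gas", "electric"]

# Each treatment is one combination of a pan level and an oven level.
treatments = list(product(pans, ovens))
for pan, oven in treatments:
    print(pan, "pan in", oven, "oven")
print(len(treatments))  # 6
```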
Experiment Designs
• Completely randomized – all experimental units are allocated at random among all treatments
[Diagram: explanatory variable → Treatment group 1, Treatment group 2, Treatment group 3 → response variable]
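As an illustration of a completely randomized design, the 15 plots and three fertilizers of Example 1 could be assigned like this. Equal group sizes are an assumption (the notes do not state them), and the function name is hypothetical.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle all units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# 15 plots of land, fertilizer types A, B, and C.
plots = [f"plot_{i}" for i in range(1, 16)]
assignment = completely_randomized(plots, ["A", "B", "C"], seed=1)
print({fertilizer: len(group) for fertilizer, group in assignment.items()})
```

Every plot has the same chance of landing in any treatment group, with no grouping of the plots beforehand.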
• Randomized block – units are blocked into groups (homogeneous) and then randomly assigned to treatments
[Diagram: explanatory variable → Group 1 and Group 2 → random assignment within each group to Treatment 1, Treatment 2, Treatment 3 → response variable]
Units should be blocked on a variable that affects the response!!!
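A sketch of the randomized block idea: the random assignment is carried out separately inside each block. The two blocks of six made-up units and the function name `randomized_block` are assumptions for illustration.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomly assign treatments within each block, one block at a time."""
    rng = random.Random(seed)
    design = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        # Deal the shuffled units of this block out to the treatments.
        design[block_name] = {
            t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)
        }
    return design

# Hypothetical homogeneous groups of experimental units.
blocks = {
    "Group1": [f"g1_unit_{i}" for i in range(1, 7)],
    "Group2": [f"g2_unit_{i}" for i in range(1, 7)],
}
design = randomized_block(blocks, ["Treatment 1", "Treatment 2", "Treatment 3"], seed=1)
print(design["Group1"])
```

Each treatment ends up with units from every block, so differences between blocks cannot pile up in one treatment group.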
• Matched pairs - a special type of block design
   – match up experimental units according to similar characteristics & randomly assign one to one treatment & the other automatically gets the 2nd treatment
   – have each unit do both treatments in random order
   – the assignment of treatments is dependent
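The first form of matched pairs can be sketched with a coin flip per pair. The pairs below are invented, and `matched_pairs` is a hypothetical helper name.

```python
import random

def matched_pairs(pairs, seed=None):
    """Within each pair, chance decides who gets treatment 1; the partner gets treatment 2."""
    rng = random.Random(seed)
    assignment = []
    for first, second in pairs:
        if rng.random() < 0.5:   # coin flip for this pair
            first, second = second, first
        assignment.append({"treatment_1": first, "treatment_2": second})
    return assignment

# Hypothetical units already matched on similar characteristics.
pairs = [("unit_1a", "unit_1b"), ("unit_2a", "unit_2b"), ("unit_3a", "unit_3b")]
pair_assignment = matched_pairs(pairs, seed=1)
print(pair_assignment)
```

Only one random choice is made per pair, which is why the two assignments in a pair are dependent.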
12) Confounding variable – the
effect of the confounding
variable on the response cannot
be separated from the effects of
the explanatory variable (factor)
Example 5: Four new word-processing
programs are to be compared by measuring
the speed with which standard tasks can
be completed. One hundred volunteers are
randomly assigned to one of the four
programs and their speeds are measured.
Is this an experiment? Why or why not?
Yes, a treatment is imposed.
What type of design is this?
Completely randomized
Factors? Levels? one factor: word-processing
program with 4 levels
Response variable? speed
Example 5 (continued):
Is there a potential confounding variable?
NO, completely randomized designs have no confounding
Can this design be improved? Explain.
You could do a block design where each person uses each program in random order.
Is there another way to reduce variability?
Randomization reduces bias by spreading any uncontrolled confounding variables evenly throughout the treatment groups.
Blocking also helps reduce variability.
Bias is a systematic error in measuring the estimate.
Variability is controlled by sample size. Larger samples produce statistics with less variability.
High bias & high variability
Low bias & high variability
High bias & low variability
Low bias & low variability
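The claim that larger samples produce statistics with less variability can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the simulated population of 5000 values, the sample sizes 10 and 400, and the 200 repetitions.

```python
import random
import statistics

def spread_of_sample_means(population, n, repetitions, rng):
    """Standard deviation of the sample mean over many repeated SRSs of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repetitions)]
    return statistics.stdev(means)

rng = random.Random(1)
# Hypothetical population of 5000 values between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(5000)]

spread_small = spread_of_sample_means(population, 10, 200, rng)
spread_large = spread_of_sample_means(population, 400, 200, rng)
print(round(spread_small, 2), round(spread_large, 2))
```

The sample means from the larger samples cluster much more tightly. Sample size does nothing about bias, though: a biased sampling method stays biased however large the sample.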