
Topic 4
Random Sampling
In-Class Activities
Activity 4-1: Sampling Words
4-1, 4-2, 4-3, 4-4, 4-7, 4-8, 8-9, 9-15, 14-6
a.
Answers will vary.
The answers given here are one example.
b.
Word       Number of Letters     Word       Number of Letters
score      5                     did        3
forth      5                     here       4
whether    7                     full       4
have       4                     resolve    7
might      5                     perish     6
c.
[letters.pdf]
observational units = words
variable = number of letters per word; type = quantitative
d.
average = 5 letters per word; statistic
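The average in (d) can be checked directly from the example table in (b). The following is a minimal Python sketch; the variable names are ours and the word list is just the one example sample shown above.

```python
from statistics import mean

# The ten words from the example sample in part (b).
sample_words = ["score", "forth", "whether", "have", "might",
                "did", "here", "full", "resolve", "perish"]

# The variable recorded for each observational unit (word) is its number of letters.
letters = [len(word) for word in sample_words]

# The sample average is a statistic: it describes only these ten words.
sample_average = mean(letters)
print(letters)
print("average letters per word:", sample_average)
```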
e. An example set of responses from one class:
Rossman/Chance, Workshop Statistics, 3/e
Solutions, Unit 1, Topic 4
1
[averagelengths.pdf]
f.
observational units = samples of 10 words
variable = average number of letters per word; type = quantitative
g.
In this example 8/10 = .8 of the students produced a sample average greater than 4.29
letters per word.
h.
Yes – this sampling method appears to be biased. It appears to tend to overestimate the
population mean. This is evident from the dotplot because it is centered at about 5.7 (rather than
4.29), and indicates that a large proportion of the class selected samples that had means greater
than 4.29.
i.
Our eyes are most likely drawn to the longer words; we tend to overlook the short,
common words like “a,” “and,” “is,” and “or.” Thus, when we try to choose representative
samples, we do not select enough short words.
j.
If we use this method we would also be likely to select too many long words in our
sample because the long words take up more space on the page and therefore have a greater
chance of being selected when we blindly point to a location.
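The size of this effect can be sketched numerically. The Python example below assumes, as a simplification, that blind pointing selects a word with probability exactly proportional to its number of letters. It uses only the first sentence of the Gettysburg Address (30 words) as a stand-in population, so its numbers will not match the 4.29 average of the full 268-word population.

```python
# First sentence of the Gettysburg Address (30 words), used only as a small
# stand-in population; the activity itself uses all 268 words.
sentence = ("Four score and seven years ago our fathers brought forth on this "
            "continent a new nation conceived in Liberty and dedicated to the "
            "proposition that all men are created equal")
lengths = [len(word) for word in sentence.split()]

# Equal chance for every word: the expected length of the selected word is
# the ordinary population mean.
equal_chance_mean = sum(lengths) / len(lengths)

# Blind pointing (ASSUMPTION: chance of selection is proportional to the
# number of letters): the expected length is a length-weighted mean.
total_letters = sum(lengths)
pointing_mean = sum(n * (n / total_letters) for n in lengths)

print(round(equal_chance_mean, 2), round(pointing_mean, 2))
```

Under this assumption the second number is larger than the first whenever the words are not all the same length, which is the overrepresentation of long words described in (j).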
k.
No – increasing the sample size will not make up for the biased sampling method. We
would still tend to overrepresent the long words.
l.
We need to employ a truly random method to select the words. For example, we could write each
word on the same size slip of paper, put the slips in a hat, mix them thoroughly, and then draw ten
slips from the hat.
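A computer can play the role of the hat. In the Python sketch below, the function name draw_from_hat and the 30-word stand-in population are our own illustrative choices; random.sample draws positions without replacement, so every word position has the same chance of being chosen.

```python
import random

def draw_from_hat(words, n, seed=None):
    """Return n (position, word) pairs chosen so every position is equally likely."""
    rng = random.Random(seed)
    # Sampling positions without replacement mimics drawing distinct slips.
    positions = rng.sample(range(len(words)), n)
    return [(p + 1, words[p]) for p in positions]

# Stand-in population: the first sentence of the Gettysburg Address.
words = ("Four score and seven years ago our fathers brought forth on this "
         "continent a new nation conceived in Liberty and dedicated to the "
         "proposition that all men are created equal").split()

sample = draw_from_hat(words, 10, seed=1)
for position, word in sample:
    print(position, word, len(word))
```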
Activity 4-2: Sampling Words
4-1, 4-2, 4-3, 4-4, 4-7, 4-8, 8-9, 9-15, 14-6
a.
Many answers are possible. The following was obtained using the beginning of line 60:
      Random Digits    Word    Word Length
1     031              now     3
2     025              that    4
3     052              can     3
4     076              a       1
5     059              a       1
b.
average word length = 2.4 letters per word
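The lookup in (a) and the average in (b) can be reproduced with a short Python sketch. It assumes the words are numbered 001, 002, ... in the order they appear in the address, with punctuation removed; only the first 76 words are typed in, since that is enough to cover the five labels in the example.

```python
from statistics import mean

# First 76 words of the Gettysburg Address, punctuation removed.
# ASSUMPTION: word k in this text corresponds to label k in the numbered list.
opening = ("Four score and seven years ago our fathers brought forth on this "
           "continent a new nation conceived in Liberty and dedicated to the "
           "proposition that all men are created equal Now we are engaged in a "
           "great civil war testing whether that nation or any nation so "
           "conceived and so dedicated can long endure We are met on a great "
           "battlefield of that war We have come to dedicate a portion of that "
           "field as a")
words = opening.split()

# Three-digit labels from the example table in part (a).
labels = ["031", "025", "052", "076", "059"]

# Labels start at 001, while Python list positions start at 0.
sample = [words[int(label) - 1] for label in labels]
word_lengths = [len(word) for word in sample]

print(sample)
print("average word length:", mean(word_lengths))
```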
c.
Answers will vary; below is an example from one class.
[samplemeans.pdf]
d.
This distribution is much closer to being centered at 4.29 and has a smaller horizontal
spread than the previous one did (though the latter is not always the case).
e.
The sample averages are roughly split evenly on both sides of 4.29.
f.
Yes – random sampling appears to have produced unbiased estimates of the average word
length in the population.
Activity 4-3: Sampling Words
4-1, 4-2, 4-3, 4-4, 4-7, 4-8, 8-9, 9-15, 14-6
Answers will vary. The following are from one particular running of the applet.
a.
      Word      Number of Letters
1     The       3
2     these     5
3     here      4
4     for       3
5     should    6
Average number of letters = 4.2
b.
You will probably not obtain the same sample of words or the same average length the
second time.
c.
Average of the 500 sample averages = 4.31 letters per word
d.
Yes – this appears to be ‘around’ 4.29.
e.
Answers will vary according to student expectation.
f.
The center of this distribution should also be near 4.29, but the horizontal spread should be much
smaller.
g.
The distribution of the samples of size 20 has less variability (more consistency) in the
values of the sample average word length.
h.
The result of a single sample is more likely to be close to 4.29 with a sample of size 20
than with a sample of size 5.
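The comparison in (f) through (h) can be imitated without the applet. The Python sketch below is not the applet's code: it uses the first 76 words of the address as a stand-in population (so the center will not be 4.29), 1000 repetitions, and a fixed seed, all of which are our assumptions.

```python
import random
from statistics import stdev

# Stand-in population: lengths of the first 76 words of the Gettysburg Address.
opening = ("Four score and seven years ago our fathers brought forth on this "
           "continent a new nation conceived in Liberty and dedicated to the "
           "proposition that all men are created equal Now we are engaged in a "
           "great civil war testing whether that nation or any nation so "
           "conceived and so dedicated can long endure We are met on a great "
           "battlefield of that war We have come to dedicate a portion of that "
           "field as a")
population = [len(word) for word in opening.split()]

rng = random.Random(20)  # fixed seed so the run is repeatable

def sample_averages(n, repetitions=1000):
    """Average word length in each of many simple random samples of size n."""
    return [sum(rng.sample(population, n)) / n for _ in range(repetitions)]

# Spread (standard deviation) of the sample averages for each sample size.
spread = {n: stdev(sample_averages(n)) for n in (5, 20)}
for n, s in spread.items():
    print(f"n = {n:2d}: standard deviation of sample averages = {s:.2f}")
```

The spread for samples of size 20 comes out smaller than for samples of size 5, so a single sample of size 20 is more likely to land close to the population average.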
i.
No – increasing the sample size when using a biased sampling method will not reduce the
bias. The results from different samples will tend to be closer together but will still be centered
in the wrong location (not around the parameter value of interest). If you want to reduce the bias
you must change the sampling method.
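A small simulation illustrates the point in (i). The Python sketch below assumes a biased method in which a word's chance of selection is proportional to its length, and it uses the 30 word lengths of the address's first sentence as a stand-in population. Both choices are ours, for illustration only.

```python
import random
from statistics import mean, stdev

# Lengths of the 30 words in the first sentence of the Gettysburg Address.
lengths = [4, 5, 3, 5, 5, 3, 3, 7, 7, 5, 2, 4, 9, 1, 3,
           6, 9, 2, 7, 3, 9, 2, 3, 11, 4, 3, 3, 3, 7, 5]
population_mean = sum(lengths) / len(lengths)

rng = random.Random(0)  # fixed seed so the run is repeatable

def biased_sample_average(n):
    # ASSUMPTION: longer words are proportionally more likely to be picked.
    picks = rng.choices(lengths, weights=lengths, k=n)
    return sum(picks) / n

centers, spreads = {}, {}
for n in (5, 20):
    averages = [biased_sample_average(n) for _ in range(1000)]
    centers[n] = mean(averages)
    spreads[n] = stdev(averages)
    print(f"n = {n:2d}: center {centers[n]:.2f}, spread {spreads[n]:.2f}, "
          f"population mean {population_mean:.2f}")
```

With the larger sample size the averages cluster more tightly, but they still cluster around a value well above the population mean.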
Activity 4-4: Sampling Words
4-1, 4-2, 4-3, 4-4, 4-7, 4-8, 8-9, 9-15, 14-6
a. Below is one example set of results.
[wordsapplet.pdf]
b.
Both distributions are roughly bell-shaped, centered at about 4.29 with a horizontal
spread from about 2 to 7.
c.
Yes – these distributions seem to have similar variability.
d.
No – not much changed when we sampled from the larger population.
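The same comparison can be sketched in Python. Here the larger population is built by repeating a small stand-in population four times, so both populations have the same distribution of word lengths. The stand-in data, the factor of four, and the sample size of 5 are our assumptions, not the applet's settings.

```python
import random
from statistics import stdev

# Lengths of the 30 words in the first sentence of the Gettysburg Address.
base = [4, 5, 3, 5, 5, 3, 3, 7, 7, 5, 2, 4, 9, 1, 3,
        6, 9, 2, 7, 3, 9, 2, 3, 11, 4, 3, 3, 3, 7, 5]
small_population = base
large_population = base * 4  # same distribution, four times as many words

rng = random.Random(2)  # fixed seed so the run is repeatable

def spread_of_sample_averages(population, n=5, repetitions=2000):
    """Standard deviation of the averages of many random samples of size n."""
    averages = [sum(rng.sample(population, n)) / n for _ in range(repetitions)]
    return stdev(averages)

small_spread = spread_of_sample_averages(small_population)
large_spread = spread_of_sample_averages(large_population)
print(f"small population: {small_spread:.2f}")
print(f"large population: {large_spread:.2f}")
```

The two spreads are close to each other, which agrees with the observation that the population size matters little when the sample is a small part of the population.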
Activity 4-5: Back to Sleep
4-5, 6-5, 21-2
a.
The population of interest is all infants younger than eight months in the United States in
those years. The sampling frame is the list of households with such infants, generated from birth
records, infant photography companies, and infant formula companies. The sample consists of
the infants in the 1002 households whose mother (or other caregiver) participated in the
interview.
b.
The sample size is 1002. (Actually, a total of 1015 infants were in the sample because
some households had twins.)
c.
The researchers did not technically obtain a simple random sample of infants. One reason
is that the sampling frame did not include the entire population. Another reason is that more than
half of the numbers called did not lead to an interview. Infants who were not included in the
sampling frame or whose mother declined to participate might differ systematically in some
ways from those who were included. Nevertheless, the researchers did use randomness to select
their sample, and they probably obtained as representative a sample as reasonably possible.
d.
Perhaps mothers in those groups were in a lower economic class and, therefore, less
likely to have phones in the first place, or perhaps they had to work so their children were in
daycare.
e.
These comparisons address the issue of bias, not precision. The sampling method was
slightly biased with regard to the mother’s race and age and the infant’s birth weight.
f.
These percentages are statistics because they are based on the sample.
g.
The large sample size produces high precision. This means that the sample statistics are
likely to be close to their population counterparts. For example, the population proportion of
infants who sleep on their back should be close to the sample proportion who sleep on their
backs.
h.
The sample size for subgroups is smaller than for the whole group, so the sample results
would be less precise.
Homework Activities
Activity 4-6: Rating Chain Restaurants
a.
It seems unlikely that this sample was randomly chosen, as it would be extremely difficult
to give each Consumer Reports reader an equal chance of being selected for the sample
and to ensure that everyone selected responded. It is much more likely that the responders self-selected by returning a survey.
b.
The authors probably make the disclaimer because the sample was not randomly selected
from the entire population but came only from their readers, who may have different habits and attitudes
from non-readers. The results therefore cannot reasonably be extended to the general population.
c.
Answers will vary, but we probably should generalize these results only to Consumer
Reports readers who tend to visit full-service restaurant chains and like to complete surveys.
Activity 4-7: Sampling Words
4-1, 4-2, 4-3, 4-4, 4-7, 4-8, 8-9, 9-15, 14-6
a.
categorical (binary)
b.
99/268 = .369
c.
parameter. .369 is the proportion of all 268 words (the population) in the Gettysburg
Address that are over 5 letters long.
d.
No – because of sampling variability we would not expect the sample proportion to equal
.369, but we would expect it to be reasonably close most of the time. (In fact, with a sample of
size 5, the sample proportion could not equal .369; it could only be 0, .2, .4, .6, .8, or 1.)
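The parenthetical remark can be verified by listing every proportion a sample of five words can produce. A minimal Python sketch (variable names are ours):

```python
# Parameter from part (b): proportion of long words in the population.
population_proportion = 99 / 268

# With n = 5 words, the number of long words in the sample is 0, 1, ..., 5,
# so the sample proportion must be one of these six values.
n = 5
possible_proportions = [k / n for k in range(n + 1)]

# The attainable value nearest to the parameter.
closest = min(possible_proportions, key=lambda p: abs(p - population_proportion))

print(round(population_proportion, 3))
print(possible_proportions)
print("closest attainable value:", closest)
```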
Activity 4-8: Sampling Words
4-1, 4-2, 4-3, 4-4, 4-7, 4-8, 8-9, 9-15, 14-6
Answers will vary. These are based on one particular running of the applet.
a.
Yes – this distribution should be centered at about .369 (it is .38 in this case).
b.
This distribution should still be centered at .369 (the mean is .37), but with much less
variability.
c.
Since we are taking random samples, we expect our sample proportions to center around
the parameter (.369), regardless of the sample size. However, as we increase the sample size, we
expect our samples to become more precise, that is, we expect the variability between samples to
decrease.
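This behavior can be imitated with a short simulation. The Python sketch below builds a population of 268 labels with 99 long words, matching the count in Activity 4-7(b). The sample sizes 5 and 20, the 2000 repetitions, and the seed are our assumptions and are not necessarily the settings used in the applet.

```python
import random
from statistics import mean, stdev

# 1 marks a long word, 0 a short word: 99 long words out of 268 in total.
population = [1] * 99 + [0] * 169
parameter = sum(population) / len(population)

rng = random.Random(8)  # fixed seed so the run is repeatable

def sample_proportions(n, repetitions=2000):
    """Proportion of long words in each of many simple random samples of size n."""
    return [sum(rng.sample(population, n)) / n for _ in range(repetitions)]

results = {}
for n in (5, 20):  # ASSUMED sample sizes
    proportions = sample_proportions(n)
    results[n] = (mean(proportions), stdev(proportions))
    print(f"n = {n:2d}: center {results[n][0]:.3f}, spread {results[n][1]:.3f}")
print(f"parameter: {parameter:.3f}")
```

Both centers fall near the parameter, while the spread is smaller for the larger sample size.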
Activity 4-9: Sampling Senators
4-9, 4-18
a.
observational units = U.S. senators
variable = years of service in the senate
population = current 100 U.S. Senators
sample = 5 selected current U.S. senators
parameter = average years of service of all 100 U.S. senators
statistic = average years of service of the 5 selected senators
b.
This sampling method would most likely overestimate the average years of service, since
your classmates would tend to select the names of well-known senators who have been serving
in the Senate for a long time. (You also need to worry about a tendency for students to mention
the senators from their own state more often than those from other states.)
c.
No – increasing the sample size will not correct for a biased sampling method. Students
would still tend to overrepresent the senators who have served longer.
d.
Obtain a list of the current senators. Number each senator in the list from 00-99. Select
any row of the table of random digits and read the row as a sequence of 2-digit numbers. These
2-digit numbers tell you which senators from your list will make up your sample. Continue
selecting senators until you have five senators in your sample. Skip any repeated 2-digit
numbers.
e.
Obtain a list of the current representatives. Number each representative in the list from
000-434. Select any row of the table of random digits and read the row as a sequence of 3-digit
numbers. These 3-digit numbers tell you which representatives from your list will make up your
sample. Skip any repeated 3-digit numbers and any numbers greater than 434. Continue selecting
representatives until you have five representatives in your sample. If necessary, continue to
another row of the table of random digits.
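The procedures in (d) and (e) follow the same recipe, which can be sketched in Python. The function name select_labels and the row of digits below are hypothetical (the digits are made up, not taken from the book's table), and the sketch reads only a single row.

```python
def select_labels(digits, width, max_label, sample_size):
    """Read `digits` in chunks of `width`, keeping labels that are at most
    max_label and have not appeared before, until sample_size are chosen."""
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        if label > max_label or label in chosen:
            continue  # skip out-of-range labels and repeats
        chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

# HYPOTHETICAL row of random digits, made up for this illustration.
row = "127312735190228804673104925"

# (d) Senators labeled 00-99: read 2-digit numbers.
senators = select_labels(row, width=2, max_label=99, sample_size=5)

# (e) Representatives labeled 000-434: read 3-digit numbers.
representatives = select_labels(row, width=3, max_label=434, sample_size=5)

print(senators)         # [12, 73, 51, 90, 22]
print(representatives)  # [127, 312, 190, 228, 104]
```

In the 2-digit reading the repeated labels 12 and 73 are skipped. In the 3-digit reading the labels 735, 804, and 673 are skipped because they exceed 434. If a row runs out before five labels are chosen, the function returns fewer than five and the reading would continue on another row.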
Activity 4-10: Responding to Katrina
2-12, 4-10, 16-13
Based on the sample sizes, the non-Hispanic white adults’ responses probably come closer to
reflecting the group’s population value than the black adults’ responses do because there were so
many more white adults sampled. If both samples were selected randomly, the larger sample is
more likely to produce a sample result similar to the population parameter.
Activity 4-11: Rose-y Opinions
a.
observational units = 1000 individuals
variable = did they have a favorable or unfavorable opinion of Pete Rose? (categorical)
b.
population = American sports fans
sample = first 1000 people leaving an LA Lakers basketball game
c.
This was not a randomly selected sample. People attending this basketball game are not
necessarily sports fans in general; they may be extreme LA Lakers fans (fanatics) or just basketball
fans. This is an example of convenience sampling and is unlikely to result in a representative
sample.
d.
No, the individuals in the sample may still be only interested in basketball and not sports
in general.
e.
If you have a list of subscribers to Sports Illustrated, you could number the list and use a
table of random digits or a computer to select a random sample of subscribers. The population
represented by this sample would be all readers of Sports Illustrated, which would
certainly be more representative of the general sports fan than the previous methods.
f.
The parameter is the percentage of American sports fans that have an unfavorable
opinion of Pete Rose. Its value is unknown. The statistic is the 49% of the 1000 people
interviewed by the Gallup pollsters who said they had an unfavorable opinion of Pete Rose.
g.
The value of the statistic would most likely change if Gallup had selected another random
sample of 1000 people to interview. But the value of the parameter would remain the same.
Activity 4-12: Sampling on Campus
a.
observational units = college freshmen; variable = weight gained during the first term at
college; population = all U.S. college freshmen; sample = random sample of college freshmen;
parameter = average weight gained by all college freshmen during their first term.
Since it would be impossible to obtain a random sample of all U.S. college freshmen,
work with freshmen at a particular college. Obtain a list of all freshmen from the registrar.
Number the list and use a table of random digits to obtain a random sample of freshmen.
b.
observational units = college students; variable = price paid for textbooks; population =
all U.S. college students; sample = random sample of college students; parameter = average
price paid for textbooks by all college students.
Since it would be impossible to obtain a random sample of all U.S. college students, work
with students at a particular college. Obtain a list of all students from the registrar. Number the
list and use a table of random digits to obtain a random sample of students.
c.
observational units = pages of your history book; variable = number of words on each
page; population = all pages in your history book; sample = random sample of pages from your
history book; parameter = average number of words per page in your history book.
Number all the pages in your history book consecutively. Use a table of random digits to
select a sample of pages from your book and count all the words on these pages.
d.
observational units = college faculty; variable = political party registration; population =
all U.S. college faculty; sample = random sample of U.S. college faculty; parameters =
percentages of U.S. college faculty that are registered in each political party.
Since it would be impossible to obtain a random sample of all U.S. college faculty, work
with faculty at a particular college. Obtain a list of all faculty, and number the list. Then use a
table of random digits to obtain a random sample of faculty.
Activity 4-13: Sport Utility Vehicles
a.
observational units = vehicles
variable = whether or not the vehicle is an SUV
population = all vehicles on the road in your hometown
sample = the vehicles that pass by the intersection between 7 and 8 AM that morning
parameter = the proportion of all vehicles on the road in your hometown that are SUVs
statistic = the proportion of all vehicles that pass by that morning that are SUVs
b.
The vehicles that you observed between 7 and 8 AM may not be representative of all
vehicles on the road. For example, many of the vehicles may be carpooling children to school and
therefore overrepresent larger families with larger cars. Alternatively, they may be
predominantly commuter vehicles rather than weekend recreational vehicles and so underrepresent
the proportion of SUVs.
c.
The sampling frame is the list of cars sold by that dealer.
d.
The recently purchased vehicles will probably not represent the vehicles on the road in
your town. For example, there may have been a backlash against SUVs recently because of high
gas prices so that fewer SUVs were purchased in the last year, yet many people would still own
them from purchases several years ago.
Activity 4-14: Generation M
3-8, 4-14, 13-6, 16-1, 16-3, 16-7, 18-1, 21-11, 21-12
a.
Your classmates form a sample as they are only a subset of all students at your school.
b.
Answers will vary. This number is a statistic since it is collected from your class (a
sample).
c.
Answers will vary from class to class, but the numbers calculated will all be statistics.
d.
No – you and your classmates do not constitute a random sample of the students at your
school because not every student had an equal chance of being selected for the sample.
e.
Answers will vary by school and class.
f.
Answers will vary by school and class.
Activity 4-15: Emotional Support
4-15, 18-19
a.
Hite’s sampling method is likely to be biased in the direction of women who think they
give more support than they receive. She sampled women in women’s groups, and women often join
such groups because they aren’t getting the kind of companionship they want from their husbands or
boyfriends.
b.
Hite’s poll surveyed the larger number of women.
c.
The ABC News/Washington Post poll was probably more representative of the truth
about the population of all American women since they used random sampling that was
presumably unbiased.
Activity 4-16: College Football Players
a.
position = categorical; weight = quantitative; class = categorical
b.
Example answer – using line 13 of the table:
First delete the 17 red-shirted freshmen from the list. Then renumber the remaining list
from 01 to 82. Then, use line 13 to select players 54 Brock Daniels (275 lbs), 40 Aris Borjas
(200 lbs), 02 Courtney Brown (205 lbs), 21 Anthony Randolph (220 lbs), 50 Jason Relyea (220
lbs), 56 Perris Kelly (285 lbs), 55 Kenny Calderone (285 lbs), 87 (skipped: greater than 82), 52 Bobby Best (245 lbs), 86 (skipped: greater than 82),
07 Pat Johnston (195 lbs), 30 Drew Robinson (195 lbs), 34 Martin Mates (185 lbs), 05 Mike
Anderson (180 lbs), 60 Lucas Trily (235 lbs), 57 Patrick Koligian (250 lbs), 62 Julai Tuua (275
lbs).
The average weight in this sample is 230 lbs. This weight should be fairly close to the
average weight of all 82 players since we took a random sample, but we don’t expect it to match
exactly. In particular, while this value will vary from sample to sample, we don’t expect a
tendency to consistently overestimate or underestimate the population mean weight.
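The sample average in this example can be checked from the labels and weights listed above. In the Python sketch below, None marks a label that matches no player because the renumbered list runs only from 01 to 82; the variable names are ours.

```python
from statistics import mean

# (label, weight in lbs) in the order read from line 13 in the example.
# None marks a label with no matching player.
draws = [(54, 275), (40, 200), (2, 205), (21, 220), (50, 220), (56, 285),
         (55, 285), (87, None), (52, 245), (86, None), (7, 195), (30, 195),
         (34, 185), (5, 180), (60, 235), (57, 250), (62, 275)]

# Keep only labels that fall in the valid range 01-82.
weights = [weight for label, weight in draws if label <= 82]

sample_size = len(weights)
average_weight = mean(weights)
print(sample_size, "players, average weight", average_weight, "lbs")
```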
Activity 4-17: Phone Book Gender
4-17, 16-16, 18-11
a.
parameter = the proportion of women living in San Luis Obispo County
statistic = the proportion of women listed on the randomly selected phone book page
b.
This sampling technique will give a biased estimate of the proportion of women living in
San Luis Obispo County because many married women are listed only
under their husbands’ names. In addition, many single women choose not to list their phone
numbers to avoid harassing phone calls. Therefore, we expect the statistic to be an
underestimate of the population parameter.
Activity 4-18: Sampling Senators
4-9, 4-18
a.
This would produce the most variability because it has the smallest sample size.
b.
This would produce the least variability because it has the largest sample size.
c.
This would have less variability than (a) but more than (d).
d.
This would have more variability than (b) but less than (c).
From most variability to least variability: a, c, d, b.
As the sample size increases, regardless of the size of the population, the variability in the
sample values decreases.
Activity 4-19: Voter Turnout
4-19, 18-10
a.
1783/2613 = .682
b.
This is a statistic because it is a number calculated from a sample (of 2613 adults).
c.
[Bar graph: “Did you vote in the 1996 election?” Horizontal axis: Response (yes, no). Vertical axis: Proportion, from 0 to 1.]
d.
This number (49%) is a parameter because the Federal Election Commission has the
records of all registered voters. Everyone who was eligible to vote was included in this number.
e.
No, the sample grossly overestimated the proportion of eligible voters who actually
voted.
f.
While the sample result is unlikely to exactly match the population value, this difference
is probably too large to be attributed to sampling variability.
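A rough calculation supports the claim in (f). The Python sketch below uses the standard result that, for a simple random sample of size n from a population with proportion p, the sample proportion has standard deviation about sqrt(p(1 - p)/n). That formula is not developed in this topic, and treating the survey as a simple random sample is our assumption.

```python
from math import sqrt

n = 2613
sample_proportion = 1783 / n   # statistic from part (a)
population_proportion = 0.49   # parameter from part (d)

# ASSUMPTION: simple random sample, so the sample proportion varies from
# sample to sample with roughly this standard deviation.
standard_deviation = sqrt(population_proportion * (1 - population_proportion) / n)

# How far the observed statistic lies from the parameter, in those units.
gap = sample_proportion - population_proportion
gap_in_sd_units = gap / standard_deviation

print(f"sample proportion:   {sample_proportion:.3f}")
print(f"typical variation:   {standard_deviation:.4f}")
print(f"gap in sd units:     {gap_in_sd_units:.1f}")
```

The gap is roughly 20 times the typical sample-to-sample variation, far more than random sampling alone would plausibly produce.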
g.
People may be reluctant to tell the truth (and seem unpatriotic) and so may claim to have
voted when they did not. They might also not remember that they didn’t vote in this particular
election. Even with random samples, we have to worry about the honesty of the respondents in
surveys.
Activity 4-20: Nonsampling Sources of Bias
a.
The proportions of “yes” responses would most likely differ between these two groups.
The question that includes the words “horrific murder” is obviously putting a negative idea into
the minds of those surveyed, while the other question seems neutral.
b.
The proportions declaring agreement with the policy might differ between these two
groups. Those interviewed by the smoker might feel pressured into disagreement.
c.
The proportion of “yes” responses would probably be lower than the actual proportion of
married people in the community who have engaged in extramarital sex. This manner of survey
is not very confidential, and the surveyor would be hard-pressed to get honest answers to such a
personal and potentially harmful question.
d.
We should not be surprised that the proportions would differ between these two groups.
The President’s views on foreign policy would be fresh in the minds of one group, while the
other group would have to recall past speeches or actions of the President in order to form an
opinion. Approval ratings tend to rise shortly after rousing speeches but then come back down
again over time.
e.
How the question is worded, appearance of the interviewer, lack of confidentiality,
knowledge of the topic and timing of the question.
Activity 4-21: Prison Terms and Car Trips
a.
Prisoners with longer terms have a higher probability of ending up in the sample (similar
to how longer words are more likely to be selected when you point your finger at one spot on the
page).
b.
Cars engaged in longer trips have a higher chance of being observed at a particular time
point than cars on short trips.
c.
Many answers are possible, but one example is estimating the average length of time that
people have been employed by a particular company. If we take a random sample of employees,
employees that have been around longer have a better chance of ending up in the sample.