sections 8.5-8.7

Chapter 8
Sections 8.1 – 8.4
Bell Ringer
I CAN:
Daily Agenda
• Bell Ringer
• Review Bell Ringer
• I CAN
• Chapter 8
The distinction between population and sample is basic to statistics. To make sense of any
sample result, you must know what population the sample represents.
8.1 Sampling Students.
A political scientist wants to know how college students feel about the Social Security system.
She obtains a list of the 3456 undergraduates at her college and mails a questionnaire to 250
students selected at random. Only 104 questionnaires are returned.
(a) What is the population in this study? Be careful: about what group does she want
information?
(b) What is the sample? Be careful: from what group does she actually obtain information?
The important message in this problem is that the sample can redefine the population about
which information is obtained.
8.2 Student Archaeologists.
An archaeological dig turns up large numbers of pottery shards, broken stone
implements, and other artifacts. Students working on the project classify each artifact
and assign it a number. The counts in different categories are important for
understanding the site, so the project director chooses 2% of the artifacts at random and
checks the students’ work. What are the population and the sample here?
8.3 Software Survey.
A statistical software company is planning on updating Version 8.1 of its software and wants to
know what features are most important to users. The company’s managers have the email
addresses of 1100 individuals, mostly faculty at universities, for whom they have supplied free
courtesy copies of Version 8.1. They email these 1100 individuals and ask them to complete a
survey online. A total of 186 of these individuals complete the survey.
(a) What is the population of interest to the software company? Do you think the 1100
individuals contacted are representative of the population? Explain your reasons.
(b) What is the sample? From what group is information actually obtained?
8.4 Sampling on Campus.
You see a student standing in front of the student center, now and then
stopping other students to ask them questions. She says that she is collecting
student opinions for a class assignment. Explain why this sampling method is
almost certainly biased.
8.5 More Sampling on Campus.
You would like to start a club for psychology majors on campus, and you are interested in
finding out what proportion of psychology majors would join. The dues would be $35 and
used to pay for speakers to come to campus. You ask five psychology majors from your
senior psychology honors seminar whether they would be interested in joining this club
and find that four of the five students questioned are interested. Is this sampling method
biased, and if so, what is the likely direction of bias?
8.6 Apartment Living.
You are planning a report on apartment living in a college town. You decide to select three
apartment complexes at random for in-depth interviews with residents. Use Table B (start at
line 128) to select a simple random sample of three of the following apartment complexes.
Ashley Oaks
Bay Pointe
Beau Jardin
Bluffs
Brandon Place
Briarwood
Brownstone
Burberry Place
Cambridge
Chauncey Village
Country View
Country Villa
Crestview
Del-Lynn
Fairington
Fairway Knolls
Fowler
Franklin Park
Georgetown
Greenacres
Mayfair Village
Nobb Hill
Pemberly Courts
Peppermill
Pheasant Run
River Walk
Sagamore Ridge
Salem Courthouse
Village Square
Waterford Court
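For readers who use software instead of Table B, the selection can be sketched in Python. This is a minimal sketch under stated assumptions: the helper name `choose_srs` and the seed value are illustrative choices, and the complexes picked will not match a Table B selection because the random digits differ.

```python
import random

# The 30 apartment complexes listed in the exercise.
complexes = [
    "Ashley Oaks", "Bay Pointe", "Beau Jardin", "Bluffs", "Brandon Place",
    "Briarwood", "Brownstone", "Burberry Place", "Cambridge", "Chauncey Village",
    "Country View", "Country Villa", "Crestview", "Del-Lynn", "Fairington",
    "Fairway Knolls", "Fowler", "Franklin Park", "Georgetown", "Greenacres",
    "Mayfair Village", "Nobb Hill", "Pemberly Courts", "Peppermill", "Pheasant Run",
    "River Walk", "Sagamore Ridge", "Salem Courthouse", "Village Square",
    "Waterford Court",
]

def choose_srs(items, n, seed=None):
    """Return a simple random sample of n distinct items.

    Every set of n items has the same chance of being chosen,
    which is the defining property of an SRS.
    """
    rng = random.Random(seed)  # seed is an arbitrary assumption, for reproducibility
    return rng.sample(items, n)

sample = choose_srs(complexes, 3, seed=128)
print(sample)
```

Fixing the seed makes the draw repeatable, so someone else can check the selection. Omitting it gives a fresh sample on each run.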
8.7 Minority Managers.
A firm wants to understand the attitudes of its minority managers toward its system for
assessing management performance. Following is a list of all the firm’s managers who are
members of minority groups. Use Table B at line 141 to choose five managers to be
interviewed in detail about the performance appraisal system.
Adelaja
Ahmadiani
Barnes
Bonds
Burke
Deis
Ding
Draguljic
Fernandez
Fox
Gao
Gemayel
Gupta
Hernandez
Huo
Ippolito
Jiang
Jung
Mani
Mazzeo
Modur
Rettiganti
Rodriguez
Sanchez
Sgambellone
Yajima
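The Table B procedure (label, read digit groups, skip unusable groups) can be sketched in Python. This is only an illustration: the digit string below is made up for the example and is NOT line 141 of Table B, and the function name `sample_from_digits` is hypothetical. Labels 01 to 26 are assigned in the alphabetical order of the list.

```python
managers = [
    "Adelaja", "Ahmadiani", "Barnes", "Bonds", "Burke", "Deis", "Ding",
    "Draguljic", "Fernandez", "Fox", "Gao", "Gemayel", "Gupta", "Hernandez",
    "Huo", "Ippolito", "Jiang", "Jung", "Mani", "Mazzeo", "Modur",
    "Rettiganti", "Rodriguez", "Sanchez", "Sgambellone", "Yajima",
]

def sample_from_digits(names, n, digits):
    """Choose n names by reading fixed-width labels from a string of digits.

    Labels run from 1 to len(names). Groups that are out of range or that
    repeat an earlier label are skipped. If the digits run out first,
    fewer than n names are returned.
    """
    width = len(str(len(names)))  # digits per label: 2 for 26 names
    stream = "".join(ch for ch in digits if ch.isdigit())
    chosen = []
    for i in range(0, len(stream) - width + 1, width):
        label = int(stream[i:i + width])
        if 1 <= label <= len(names) and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return [(f"{label:0{width}d}", names[label - 1]) for label in chosen]

# Hypothetical digits, invented for illustration (not taken from Table B).
made_up_line = "48210 07356 91125 63024 18870 25593"
picked = sample_from_digits(managers, 5, made_up_line)
print(picked)
```

With these invented digits the two-digit groups are 48, 21, 00, 73, 56, 91, 12, 56, 30, 24, 18, 87, 02, so the labels kept are 21, 12, 24, 18, and 02. Substituting the digits from the actual table line gives the sample the exercise asks for.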
8.8 Sampling Gravestones.
The local genealogical society in Coles County, Illinois, has compiled records on all 55,914
gravestones in cemeteries in the county for the years 1825 to 1985. Historians plan to use
these records to learn about African Americans in Coles County’s history. They first choose
an SRS of 395 records to check their accuracy by visiting the actual gravestones.
(a) How would you label the 55,914 records?
(b) Use Table B, starting at line 137, to choose the first five records for the SRS.
8.9 Ask More People.
In the 2012 presidential pre-election surveys, Pew Research sampled 1,112 likely voters
during October 4-7, 2012, and asked if they were planning to vote for Obama, and then
asked the same question of a sample of 1,495 likely voters taken from October 24-28,
2012. However, in their last survey taken October 31-November 3, 2012, just before the
election held on November 6, 2012, they asked this question of a sample of 2,709 likely
voters. Why do you think Pew did this?
8.10 How Accurate Is the Poll?
The New York Times/CBS News poll conducted during February 19-23, 2014, included 1644
adults, of whom 519 were Republicans, 515 were Democrats, 550 were Independents, and 60
didn’t know or didn’t respond. Each person sampled was asked their opinion on a variety of
issues facing the nation, such as, “Do you feel that the distribution of money and wealth in
this country is fair, or do you feel that the money and wealth in this country should be more
evenly distributed among more people?” The margin of error (we will give more detail in later
chapters) was reported as ±3% for the entire sample. When considering the opinions of only
the Republicans in the sample, the margin of error was reported as ±6%. What do you think
explains the fact that estimates for Republicans were less precise than for the entire sample?
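The link between sample size and precision can be explored with a rough calculation. The sketch below assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst-case proportion p = 0.5. None of these assumptions comes from the poll's own methodology, so the output is not expected to reproduce the reported ±3% and ±6%.

```python
import math

def conservative_margin(n, z=1.96):
    """Approximate margin of error for a sample proportion from an SRS of size n.

    Uses p = 0.5, the value that makes the margin as large as possible.
    """
    return z * math.sqrt(0.25 / n)

# Sample sizes taken from the exercise text.
for group, n in [("entire sample", 1644), ("Republicans only", 519)]:
    margin = conservative_margin(n)
    print(f"{group}: n = {n}, margin of error about ±{100 * margin:.1f}%")
```

Under these assumptions the margin shrinks in proportion to 1/√n, so a subgroup of about one-third the size has a margin roughly √3 ≈ 1.7 times as large. The reported margins are somewhat larger than this formula gives, which the simple SRS calculation does not account for.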
8.11 Sampling Metro Chicago.
Cook County, Illinois, has the second-largest population of any county in the United States
(after Los Angeles County, California). Cook County has 30 suburban townships and an
additional eight townships that make up the city of Chicago. The suburban townships are:
Barrington
Berwyn
Bloom
Bremen
Calumet
Cicero
Elk Grove
Evanston
Hanover
Lemont
Leyden
Lyons
Maine
New Trier
Niles
Northfield
Norwood Park
Oak Park
Orland
Palatine
Palos
Proviso
Rich
River Forest
Riverside
Schaumburg
Stickney
Thornton
Wheeling
Worth
The Chicago townships are
Hyde Park
Jefferson
Lake
Lake View
North Chicago
Rogers Park
South Chicago
West Chicago
Because city and suburban areas may differ, the first stage of a multistage sample chooses a
stratified sample of five suburban townships and three of the more heavily populated
Chicago townships. Use software, the Simple Random Sample applet, or Table B to choose
this sample. (If you use Table B, assign labels in alphabetical order and start at line 116 for the
suburbs and at line 126 for Chicago.)
8.12 Academic Dishonesty.
A study of academic dishonesty among college students used a two-stage sampling design. The
first stage chose a sample of 30 colleges and universities. Then, the study authors mailed
questionnaires to a stratified sample of 200 seniors, 100 juniors, and 100 sophomores at each
school. One of the schools chosen has 1127 freshmen, 989 sophomores, 943 juniors, and 895
seniors. You have alphabetical lists of the students in each class. Explain how you would assign
labels for stratified sampling. Then use software or Table B, starting at line 140, to select the
first five students in the sample from each stratum. After selecting five students for a stratum,
continue to select the students for the next stratum.
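A stratified sample is a separate SRS within each stratum, which can be sketched in Python as follows. The class sizes and per-school sample sizes come from the exercise. The labeling scheme (1 up to the class size, in alphabetical order), the function name `draw_stratified`, and the seed are assumptions made for illustration. Freshmen are not sampled in this design, so they get no stratum here.

```python
import random

# Class sizes and sample sizes as stated in the exercise.
strata = {
    "seniors":    {"population": 895, "sample": 200},
    "juniors":    {"population": 943, "sample": 100},
    "sophomores": {"population": 989, "sample": 100},
}

def draw_stratified(strata, seed=None):
    """Draw an independent SRS of labels within each stratum."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    drawn = {}
    for name, info in strata.items():
        labels = range(1, info["population"] + 1)  # labels 1..N within the class
        drawn[name] = rng.sample(labels, info["sample"])
    return drawn

drawn = draw_stratified(strata, seed=140)
for name, labels in drawn.items():
    fraction = strata[name]["sample"] / strata[name]["population"]
    print(f"{name}: first five labels {labels[:5]}, sampling fraction {fraction:.1%}")
```

The sampling fractions differ across classes (about 22% of seniors versus about 10% of sophomores), so students in different classes do not have the same chance of selection even though each stratum is sampled at random.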
8.13 A Survey of 100,000 Physicians.
In 2010, the Physicians Foundation conducted a survey of physicians’ attitudes about health
care reform, calling the report “a survey of 100,000 physicians.” The survey was sent to 100,000
randomly selected physicians practicing in the United States: 40,000 via post-office mail and
60,000 via email. A total of 2,379 completed surveys were received.10
(a) State carefully what population is sampled in this survey and what is the sample size. Could
you draw conclusions from this study about all physicians practicing in the United States?
(b) What is the rate of nonresponse for this survey? How might this affect the credibility of the
survey results?
(c) Why is it misleading to call the report “a survey of 100,000 physicians”?
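The arithmetic behind a nonresponse rate is short enough to check directly. The sketch below uses only the two counts given in the exercise and assumes that every survey sent counts as a selected individual.

```python
surveys_sent = 100_000       # physicians selected and contacted
surveys_completed = 2_379    # completed surveys received

response_rate = surveys_completed / surveys_sent
nonresponse_rate = 1 - response_rate

print(f"response rate:    {response_rate:.1%}")
print(f"nonresponse rate: {nonresponse_rate:.1%}")
```

The nonresponse rate is the fraction of the selected sample from whom no data were obtained, so it is computed against the 100,000 contacted rather than against the 2,379 who replied.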
8.14 Gays in the Military.
In 2010, a Quinnipiac University Poll and a CNN Poll each asked a nationwide sample about
their views on openly gay men and women serving in the military.11 Here are the two
questions:
Question A: Federal law currently prohibits openly gay men and women from serving in the
military. Do you think this law should be repealed or not?
Question B: Do you think people who are openly gay or homosexual should or should not be
allowed to serve in the U.S. military?
One of these questions had 78% responding “should,” and the other question had only 57%
responding “should.” Which wording is slanted toward a more negative response on gays in
the military? Why?
First, call screening is now common. A large majority of American households have
answering machines, voicemail, or caller ID, and many use these methods to screen their
calls. Calls from polling organizations are rarely returned.
More seriously, the percent of cell-phone-only households is increasing rapidly. By mid-2007,
14% of American households had a cell phone but no landline phone; by the end of
2009, that percent had increased to almost 25%; and in 2012, the percent was almost 36%.
It’s clear from these numbers that RDD reaching only landline numbers is in trouble. Can
surveys just add cell phone numbers? Not easily. Federal regulations prohibit automated
dialing to cell phones, which rules out computerized RDD sampling and requires hand
dialing of cell phone numbers, which is expensive. A cell phone can be anywhere, and
many people keep their cell number despite moving, so stratifying by location becomes
difficult. And a cell phone user may be driving or otherwise unable to talk safely.
One alternative is to use web surveys, an increasingly popular survey method, rather than
telephone surveys. Web surveys have several advantages over more traditional survey
methods. It is possible to collect large amounts of survey data at lower costs than traditional
methods allow. Anyone can put survey questions on dedicated sites offering free services;
thus large-scale data collection is available to almost every person with access to the
Internet. Furthermore, web surveys allow delivery of multimedia survey content to
respondents, opening up new realms of survey possibilities that would be extremely difficult
to implement using traditional methods. Some argue that eventually web surveys will
replace traditional survey methods.
Although web surveys are easy to do, they are not easy to do well. Three major problems
are voluntary response, undercoverage, and nonresponse. Voluntary response appears
in several forms in online surveys. Example 8.3 is a survey that invited individuals to a
particular website to participate in a poll. Other web surveys solicit participation through
announcements in news groups, email invitations, and banner ads on high-traffic sites.
Undercoverage is a serious problem for even careful web surveys, because about 25% of
Americans lack Internet access and only about 70% have broadband access. People
without Internet access are more likely to be poor, elderly, minority, or rural than the
overall population, so the potential for bias in a web survey is clear. There is no easy way
to choose a random sample even from people with web access because there is no
technology that generates personal email addresses at random in the way that RDD
generates residential telephone numbers, and individuals may have several email
addresses. Even if such technology existed, etiquette and regulations aimed at spammers
would prevent mass emailing. For the present, web surveys work well only for restricted
populations, for example, surveying students at your university using the school’s list of
student email addresses. Here is an example of a successful web survey.
8.15 NPR Facebook Survey.
In 2010, National Public Radio (NPR) conducted a survey of preferences and habits of its Facebook fans by recruiting
respondents through messages posted on its Facebook page. The survey was conducted online and deployed July 12-19.
A total of 40,043 respondents began the survey, with 33,304 completing all questions. It was found that people accessed
NPR on the radio, at NPR.org, through iPhone apps, and several other platforms. Asked about time spent with NPR, about
20% of respondents indicated that they spent more than three hours per day, including radio listening.
(a) Here is what NPR says about the survey methodology: “Respondents were self-selected and the resulting sample is
non-random—therefore a margin of error cannot be calculated, and the survey results cannot be projected to any
population other than the sample itself.”17 Why can’t inference about any population be made?
(b) Suppose that people who spent more time with NPR were more likely to respond to the survey. Do you think the true
percentage of NPR’s Facebook fans who spend more than three hours with NPR is higher or lower than the 20% found
from the survey? Explain why.
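The effect of unequal response rates on a voluntary response sample can be illustrated with a small calculation. Every number below is hypothetical and chosen only to show the mechanism. None of it is data from the NPR survey, and the function name `observed_share` is an assumption.

```python
def observed_share(true_share, p_heavy_responds, p_other_responds):
    """Share of heavy users among respondents when the two groups
    respond at different rates."""
    heavy = true_share * p_heavy_responds
    other = (1 - true_share) * p_other_responds
    return heavy / (heavy + other)

# Hypothetical scenario: 10% of fans are heavy users, and heavy users are
# four times as likely as other fans to answer the survey.
share = observed_share(true_share=0.10, p_heavy_responds=0.40, p_other_responds=0.10)
print(f"true share of heavy users:     {0.10:.1%}")
print(f"share among survey responders: {share:.1%}")
```

In this made-up scenario the group that responds more readily is overrepresented among respondents, so the survey percentage differs from the true percentage in a predictable direction.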
8.16 More on Random Digit Dialing.
In the first half of 2013, about 38% of adults lived in households with a cell phone and no landline phone. Among adults
aged 25 to 29, this percent was about 65%, while among adults over 65, the percent was only 13%.18
(a) Write a survey question for which the opinions of adults with landline phones only are likely to differ from the opinions of
adults with cell phones only. Give the direction of the difference of opinion.
(b) For the survey question in part (a), suppose a survey was conducted using random digit dialing of landline phones only.
Would the results be biased? What would be the direction of bias?
(c) Most surveys now supplement the landline sample contacted by RDD with a second sample of respondents reached
through random dialing of cell phone numbers. The landline respondents are weighted to take account of household size and
number of telephone lines into the residence, whereas the cell phone respondents are weighted according to whether they
were reachable only by cell phone or also by landline. Explain why it is important to include both a landline sample and a cell
phone sample. Why is the number of telephone lines into the residence important? (Hint: How does the number of
telephone lines into the residence affect the chance of the household being included in the RDD sample?)
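The role of the number of telephone lines can be sketched with inverse-probability weights. The example assumes that a household with k lines is k times as likely to be dialed as a household with one line, so each response gets weight 1/k. The data and the function name `weighted_proportion` are hypothetical, and real polls apply further adjustments (such as household size) that are not shown here.

```python
def weighted_proportion(responses):
    """Estimate a proportion from (answer, lines) pairs.

    answer is 1 for "yes" and 0 for "no"; lines is the number of
    telephone lines into the household. Each response is weighted by
    1 / lines to offset its higher chance of being dialed.
    """
    weights = [1 / lines for _, lines in responses]
    weighted_yes = sum(w * answer for w, (answer, _) in zip(weights, responses))
    return weighted_yes / sum(weights)

# Hypothetical responses: (answer, number of telephone lines).
responses = [(1, 1), (0, 1), (1, 2), (1, 2), (0, 1), (1, 3)]

unweighted = sum(answer for answer, _ in responses) / len(responses)
weighted = weighted_proportion(responses)
print(f"unweighted proportion: {unweighted:.3f}")
print(f"weighted proportion:   {weighted:.3f}")
```

In this invented data set the multi-line households all answered "yes", so the unweighted figure overstates their share. Weighting by 1/lines pulls the estimate back toward what a one-chance-per-household sample would give.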