Example 2.2: Fake Cell Phone Calls

SECTION 2.2
STATISTICAL INFERENCE
FROM SAMPLE TO
POPULATION
BIG IDEA OF THE DAY
Chapter 1: sampling from a process
Observing a small number of attempts from an
infinite number of possible attempts.
 Buzz and Doris do the experiment forever.

Chapter 2: sampling from a finite population
A limited (finite) number of individuals
 Sample a portion from the finite population

How does this affect the chance model?
EXAMPLE 2.2: FAKE CELL PHONE CALLS
Have you ever pretended to be talking on a cell
phone to avoid interacting with people around
you?
 Pew Research Center surveyed a random sample
of 1,858 American cell phone users and found
that 13% admitted to faking a cell phone call in
the past 30 days.

iPhone App
Fake-A-Call ™
By Excelltech Inc.
EXAMPLE 2.2: FAKE CELL PHONE CALLS
What are the:
Observational units?
Variable of interest?
Population?
Sample?
Parameter of interest?
Statistic for this study?
EXAMPLE 2.2: FAKE CELL PHONE CALLS
Does this survey convince you that more than 1
in 10 cell phone users in the U.S. has engaged in
such fake cell phone use in the past 30 days?
 If not, what could be another explanation for the
survey results?

EXAMPLE 2.2: FAKE CELL PHONE CALLS
The sample result (13%) is greater than 1 in 10
(or 10%).
 It’s possible that 10% of the population of cell
phone users have faked a call and the
researchers just happened to have a higher
percentage by the luck of the draw.
 How plausible (believable) is this explanation for
the higher sample percentage?

EXAMPLE 2.2: FAKE CELL PHONE CALLS

Research Question: Do more than 1 in 10 cell
phone users in the U.S. admit to engaging in fake
cell phone use in the past 30 days?
What is the Null Hypothesis?

10% of all American cell phone users admit to faking
a call in the past 30 days
What is the Alternative Hypothesis?

More than 10% of all American cell phone users
admit to faking a call in the past 30 days
EXAMPLE 2.2: FAKE CELL PHONE CALLS
Notice that the null and alternative hypotheses
are statements about the unknown parameter.
 We don’t know the proportion of all cell phone
users who admit to faking a call. (We don’t know
the parameter.)
 We only have information from 1858 cell phone
users. (We know the statistic.)
 Do the 1858 responders give us meaningful
information about this proportion for the entire
population of all cell phone users (the
parameter)?

EXAMPLE 2.2: FAKE CELL PHONE CALLS
As before, assume the null hypothesis is true.
 If the observed statistic is unlikely to have
happened by chance alone, we have evidence
against the null hypothesis in favor of the
alternative hypothesis.


The p-value assesses the probability we would
get a sample proportion as large as 0.13 if in fact
the proportion of all cell phone users faking calls
is only 0.10.
EXAMPLE 2.2: FAKE CELL PHONE CALLS
 If the null hypothesis is true, what is the
probability that the first person selected
will admit to faking a call?

This probability is equal to the parameter (the
proportion of all cell phone users who admit to
faking a call), under the null hypothesis, which
is 0.10.
WHAT IS THE IMPACT OF SAMPLING FROM A
FINITE POPULATION RATHER THAN A PROCESS?
 If the first person admits to faking a call, what
is the probability that the second person
selected also admits to faking a call?
Sampling “without replacement.”
 Once someone is selected, they aren’t replaced
before the next person is selected.
 If there were 255 million cell phone users when the
first person is selected, then there are 254,999,999
cell phone users when the second person is selected.
 The probability that the second person selected also
admits to faking a call: 25,499,999/254,999,999 ≈
0.099999996.
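The arithmetic above can be checked directly. The short Python sketch below assumes, as the slide does, a population of 255 million cell phone users in which exactly 10% would admit to faking a call; the function name is made up for illustration.

```python
def second_draw_probability(population_size, successes):
    """Chance that the second person drawn is a 'success', given that the
    first person drawn was one, when sampling without replacement."""
    return (successes - 1) / (population_size - 1)


population_size = 255_000_000      # population figure used on the slide
fakers = population_size // 10     # 10% of the population under the null hypothesis

p_first = fakers / population_size
p_second = second_draw_probability(population_size, fakers)

print(f"First draw:  {p_first:.9f}")
print(f"Second draw: {p_second:.9f}")
print(f"Difference:  {p_first - p_second:.2e}")
```

The two probabilities differ only in the ninth decimal place, which is why removing one person from such a large population barely changes the chance model.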

EXAMPLE 2.2: FAKE CELL PHONE CALLS
When the population size is large (more than
10 or 20 times the size of the sample), we can
still consider random sampling from a finite
population to be equivalent to random
sampling from a process.
Is it safe to use a model of random sampling for
sampling from a finite population in this study?
 Model the probability of success as the same for
each observational unit in our sample.
 Under the null hypothesis, every person selected
has a 10% chance of admitting to faking cell
phone calls in the past 30 days.

EXAMPLE 2.2: FAKE CELL PHONE CALLS
 We can conduct the same type of
simulation we used in Chapter 1 on
processes for sampling from a population.
 Let’s go to an applet and try this.
 Remember we are testing to see if the
population proportion of cell phone fakers
is more than 10% and the results of our
poll showed that 13% of 1858 respondents
admitted faking.
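For readers without the applet at hand, the same kind of simulation can be sketched in Python. This is only an illustration, not the applet's actual implementation: the number of repetitions (1,000), the random seed, and the function name are assumptions, and the reported 13% is treated as the exact observed proportion even though the source gives it rounded.

```python
import random


def simulate_null_proportions(n, null_p, repetitions, seed=0):
    """Simulate sample proportions assuming each respondent independently
    has probability null_p of admitting to faking a call."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(repetitions):
        successes = sum(rng.random() < null_p for _ in range(n))
        proportions.append(successes / n)
    return proportions


n = 1858             # sample size from the Pew survey
null_p = 0.10        # proportion claimed by the null hypothesis
observed = 0.13      # reported sample proportion (rounded in the source)
repetitions = 1000   # assumed number of simulated samples

null_proportions = simulate_null_proportions(n, null_p, repetitions)

# One-sided p-value: share of simulated proportions at least as large as observed.
p_value = sum(p >= observed for p in null_proportions) / repetitions

print(f"Largest simulated proportion: {max(null_proportions):.3f}")
print(f"Approximate p-value: {p_value}")
```

If simulated proportions of 0.13 or more are rare or absent, the approximate p-value is close to 0, which is what "unlikely to have happened by chance alone" means here.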
EXAMPLE 2.2: FAKE CELL PHONE CALLS
 We have convincing evidence that the
sample proportion of 0.13 didn’t just
“happen by chance.”
 Thus we have very strong evidence that
the population proportion of cell phone
users who will admit faking a call is
larger than 0.10.
SUMMARY
Null Hypothesis: 10% of all cell phone users
admit to faking a call in the past 30 days
 Alternative Hypothesis: More than 10% of cell
phone users admit to faking a call in the past 30
days



We simulate and find a
tiny p-value (≈ 0).
Thus we have very strong evidence that the
population proportion of cell phone users who
will admit faking a call is larger than 0.10.
EXAMPLE 2.2: FAKE CELL PHONE CALLS
Random Sampling Error
 It is still possible that the researchers were
unlucky.
 However, the probability of this is so low
that a more believable explanation is that the
population proportion does indeed exceed
0.10.
 Random sampling also allows us to
estimate how much “random sampling
error” we expect.
EXAMPLE 2.2: FAKE CELL PHONE CALLS
The sampling error is roughly $1/\sqrt{n}$, where n is
the number of observational units (sample
size) in the sample.
 Therefore, the sampling error in our poll
is about $1/\sqrt{1858} \approx 0.023 = 2.3\%$. Therefore,
we can expect our sample percentage to be
within 2.3% of the population proportion.
 The 10% we were testing is outside of this
range.
EXAMPLE 2.2: FAKE CELL PHONE CALLS
The sampling error is roughly $1/\sqrt{n}$, where n is the
number of observational units in the sample.
 What if we only sample 500 people?
 The sampling error in this smaller poll is about
$1/\sqrt{500} \approx 0.045 = 4.5\%$. Therefore, we can expect our
sample percentage to be within 4.5% of the
population proportion.
 The 10% we were testing is NOT outside of this
range.
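The two sampling-error calculations can be reproduced with a few lines of Python. This is a minimal sketch of the 1/sqrt(n) rule of thumb from the slides; the function names are hypothetical, and the 500-person case assumes, for the sake of comparison, that the smaller poll also found 13%.

```python
import math


def approx_sampling_error(n):
    """Rule-of-thumb random sampling error for a sample proportion: 1/sqrt(n)."""
    return 1 / math.sqrt(n)


def plausible_range(sample_proportion, n):
    """Values within the approximate sampling error of the sample proportion."""
    error = approx_sampling_error(n)
    return sample_proportion - error, sample_proportion + error


sample_proportion = 0.13   # reported sample result
null_value = 0.10          # value being tested

for n in (1858, 500):
    low, high = plausible_range(sample_proportion, n)
    inside = low <= null_value <= high
    print(f"n={n}: error about {approx_sampling_error(n):.3f}, "
          f"range {low:.3f} to {high:.3f}, 10% inside range: {inside}")
```

With 1,858 respondents the range excludes 10%, while with 500 respondents it does not, matching the two slides.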
EXAMPLE 2.2: FAKE CELL PHONE CALLS
We reduce variability (or random sampling error)
with larger sample sizes.
 Simple random sampling gives a lot of
predictability in sample proportions
 This predictability depends strongly on the
sample size, not on the population size as long as
the population size is large.
 There is also the possibility of nonsampling error.
We will talk about this in the next section.
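The claim that predictability depends on the sample size rather than the population size can be explored with a small simulation. The sketch below is an illustration under assumed values: the two population sizes (100,000 and 10,000,000), the 10% population proportion, the 400 repetitions, and the function name are all choices made here, not figures from the slides.

```python
import random
import statistics


def sample_proportion_sd(population_size, true_p, n, repetitions, seed=0):
    """Estimate the spread of sample proportions for simple random samples
    drawn without replacement from a finite population."""
    rng = random.Random(seed)
    successes_in_population = round(population_size * true_p)
    proportions = []
    for _ in range(repetitions):
        # Label individuals 0..N-1; labels below the cutoff count as successes.
        chosen = rng.sample(range(population_size), n)
        successes = sum(1 for person in chosen if person < successes_in_population)
        proportions.append(successes / n)
    return statistics.stdev(proportions)


n = 1858            # same sample size as the Pew survey
repetitions = 400   # assumed number of simulated samples per population

sd_smaller_pop = sample_proportion_sd(100_000, 0.10, n, repetitions)
sd_larger_pop = sample_proportion_sd(10_000_000, 0.10, n, repetitions)

print(f"SD of sample proportions, population of 100,000:    {sd_smaller_pop:.4f}")
print(f"SD of sample proportions, population of 10,000,000: {sd_larger_pop:.4f}")
```

Both populations are far more than 20 times the sample size, so the two spreads should come out very similar even though one population is 100 times larger than the other.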

 Let’s work on Exploration 2.2:
Gettysburg Address Revisited
SECTION 2.3: NONSAMPLING ERRORS
Simple random sampling is an unbiased way to
take a sample, but we did see that there still
could be random sampling error.
 We can calculate random sampling error.
 However, other things can go wrong when you
use a sample to infer something about a
population. These other things are lumped
together in what we call nonsampling errors.

EXAMPLE 2.3: THE BRADLEY EFFECT
In 1982, Tom Bradley (D), the Black mayor of Los
Angeles, ran against George Deukmejian (R)
(pronounced Duke-may-jon) for governor of California.
 Polls showed that Bradley had a significant lead
on Deukmejian shortly before the election as well
as in exit polls. However, Deukmejian narrowly
defeated Bradley.

THE BRADLEY EFFECT
After the election, research suggested that a
smaller percentage of white voters had voted for
Bradley than polls had predicted and a very large
proportion of voters who, in the polls, claimed to
be undecided, had voted for Deukmejian.
 Some people may answer polling questions in the
way they think the interviewer wants them to
answer—the politically correct way.
 Some argue that this is what happened in this
election: a number of white voters told an
interviewer they would vote for Bradley but, in
the anonymity of the voting booth, voted for
Deukmejian.

OBAMA VS CLINTON:
NEW HAMPSHIRE 2008
The same sort of thing happened in the New
Hampshire Democratic Presidential Primary in
2008.
 Polls showed Barack Obama with a significant
lead over Hillary Clinton. (41% to 28% with a
sample size of 778.)
 Clinton won that election with 39% of the vote
compared to Obama’s 36%.
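One way to see that random sampling error alone is unlikely to explain this gap is to apply the 1/sqrt(n) rule of thumb from Section 2.2 to the poll. The sketch below uses only the figures quoted above. It is a rough comparison, since the poll percentages were measured before election day and may include undecided respondents; the variable names are chosen here for illustration.

```python
import math

poll_n = 778                                  # poll sample size
poll = {"Obama": 0.41, "Clinton": 0.28}       # poll percentages
result = {"Obama": 0.36, "Clinton": 0.39}     # actual primary results

# Rule-of-thumb random sampling error for a sample of this size.
sampling_error = 1 / math.sqrt(poll_n)

# Absolute gap between the poll and the actual result for each candidate.
gaps = {name: abs(result[name] - poll[name]) for name in poll}

print(f"Approximate sampling error: {sampling_error:.3f}")
for name, gap in gaps.items():
    verdict = "exceeds" if gap > sampling_error else "is within"
    print(f"{name}: gap of {gap:.2f} {verdict} the sampling error")
```

Both gaps are larger than the approximate sampling error, which points toward something other than the luck of the draw.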

SOME MORE DETAILS
The poll used random digit dialing to get
respondents for the poll.
 Only 9% of those whose phone numbers were
chosen actually responded to the question.
Others either didn’t answer their phone or
refused to answer the question.

OUR MODEL MAKES THE
FOLLOWING ASSUMPTIONS
1. Random digit dialing is a reasonable way to get
a sample of likely voters.
2. The 9% of individuals reached by phone who
agree to participate are like the 91% who didn’t.
3. Voters who said they plan to vote in the
upcoming Democratic primary will vote in the
upcoming primary.
4. Respondents’ answers about who they say they
will vote for match who they actually vote for in
the primary.
Assumption #1. Random digit dialing is a
reasonable way to get a sample of likely
voters
 Random digit dialing is roughly equivalent to a
simple random sample of all New Hampshire
residents who have a landline or cell phone,
except for slightly over-representing individuals
who have more than one phone. Random digit
dialing is a common survey technique in cases
where a sampling frame (list of all members of
the population) is unavailable.
Assumption #2. The 9% of individuals reached
by phone who agree to participate are like
the 91% who didn’t
 The assumption is that the respondents are like
the non-respondents. Although the response rate
was very low, it is in line with many polls and other
surveys conducted by phone. So, though it is
possible for non-respondents to be the cause of
the bias observed, many other political surveys
conducted around the same time had similar
response rates, but no bias. Of course, there is no
guarantee that the 9% are representative.
Assumption #3. Voters who said they plan to
vote in the upcoming Democratic primary
will vote in the upcoming primary
 It is typical to ask voters whether they plan to
vote in the upcoming election/primary. But, there
is no guarantee that they actually will.
Assumption #4. Respondents’ answers about who
they say they will vote for match who they
actually vote for in the primary.
 There is no guarantee that people won’t do
something different in the voting booth than they
say they will do when on the phone.
 They could just change their mind or they could
not be honest with the polling interviewer.
The American Association for Public Opinion
Research conducted an independent
investigation and concluded the following were
among the most likely explanations for the
discrepancies:
 People changed their opinion about who they were
voting for at the last minute. (Assumption #4)
 People in favor of Hillary Clinton were more likely to
be non-respondents. (Assumption #2)
 Social desirability based on the race of the
interviewer. (Assumption #4) Black telephone
interviewers were more likely to generate
respondents who were in favor of Obama than were
white interviewers. This is an example of the
Bradley effect.
 Clinton was listed before Obama on every ballot.
(Assumption #4)
If Assumption #4 is not valid, as seems
plausible here, then even a pure random
sample would still have exhibited this
discrepancy in the results.
 Simple random samples should produce a
representative sample, but do nothing to control
for the action of the individuals in the sample.
 Respondents changing their minds or
misrepresenting their answers are examples of
nonsampling errors: reasons, separate from
sampling error, why the statistic may not be
close to the parameter.

DO YOU AGREE WITH THIS STATEMENT?

“Quality of life lies in knowledge, in culture.
Values are what constitute true quality of life,
the supreme quality of life, even above food,
shelter and clothing.”
Thomas Jefferson
HOW ABOUT THIS ONE?

“Quality of life lies in knowledge, in culture.
Values are what constitute true quality of life,
the supreme quality of life, even above food,
shelter and clothing.”
Fidel Castro
EXPLORATION 2.3
Let’s work on Exploration 2.3.
 I won’t be giving you a survey like it says in the
exploration. We will just discuss them.
