Statistics, 1e

Chapter 6
Probability Distributions

Learn…
• To analyze how likely it is that sample results will be “close” to population values
• How probability provides the basis for making statistical inferences
Agresti/Franklin Statistics, 1 of 139
Inferential Statistics

Use sample data to make decisions
and predictions about a population
Section 6.1
How Can We Summarize
Possible Outcomes and Their
Probabilities?
Randomness

The numerical values that a variable
assumes are the result of some
random phenomenon:
• Selecting a random sample from a population
or
• Performing a randomized experiment
Random Variable

A random variable is a numerical
measurement of the outcome of a
random phenomenon.
Random Variable

Use letters near the end of the alphabet,
such as x, to symbolize variables.

Use a capital letter, such as X, to refer to
the random variable itself.

Use a small letter, such as x, to refer to a
particular value of the variable.
Probability Distribution

The probability distribution of a
random variable specifies its possible
values and their probabilities.
Discrete Random Variable

The possible outcomes are a set of
separate numbers: (0, 1, 2, …).
Probability Distribution of a
Discrete Random Variable

A discrete random variable X takes a set of
separate values (such as 0,1,2,…)

Its probability distribution assigns a
probability P(x) to each possible value x:
• For each x, the probability P(x) falls between 0 and 1
• The sum of the probabilities for all the possible x values equals 1
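The two conditions on P(x) can be checked mechanically. The sketch below is only an illustration: the function name `is_valid_distribution` and the two-coin-flip distribution are assumptions made for the example and do not come from the slides.

```python
import math

def is_valid_distribution(dist):
    """Check the two conditions for a discrete probability distribution.

    dist maps each possible value x to its probability P(x).
    """
    each_in_range = all(0.0 <= p <= 1.0 for p in dist.values())
    sums_to_one = math.isclose(sum(dist.values()), 1.0)
    return each_in_range and sums_to_one

# Hypothetical example: number of heads in two flips of a fair coin.
two_flips = {0: 0.25, 1: 0.50, 2: 0.25}
print(is_valid_distribution(two_flips))
```

The tolerance-based comparison (`math.isclose`) is used because probabilities stored as floating-point numbers rarely sum to exactly 1.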
Example: How many Home Runs
Will the Red Sox Hit in a Game?

What is the estimated probability of at least
three home runs?
Parameters of a Probability
Distribution

Parameters: numerical summaries of
a probability distribution.
The Mean of a Probability
Distribution

The mean of a probability distribution
is denoted by the parameter, µ.
The Mean of a Discrete
Probability Distribution

The mean of a probability distribution for a
discrete random variable is
$\mu = \sum x \, p(x)$
where the sum is taken over all possible
values of x.
Expected Value of X

The mean of a probability distribution of a
random variable X is also called the
expected value of X.

The expected value reflects not what we’ll
observe in a single observation, but rather
what we expect for the average in a long run
of observations.
Example: What’s the Expected Number of
Home Runs in a Baseball Game?

Find the mean of this probability
distribution.
Example: What’s the Expected Number of
Home Runs in a Baseball Game?

The mean:
$\mu = \sum x \, p(x) = 0(0.23) + 1(0.38) + 2(0.22) + 3(0.13) + 4(0.03) + 5(0.01) = 1.38$
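As a rough check of this arithmetic, the sketch below recomputes the mean from the probabilities that appear in the sum, and also answers the earlier question about at least three home runs. The dictionary name `home_runs` is an assumption for the example; the probabilities are the ones in the calculation.

```python
# Probabilities p(x) for x = 0..5 home runs, taken from the terms of the sum.
home_runs = {0: 0.23, 1: 0.38, 2: 0.22, 3: 0.13, 4: 0.03, 5: 0.01}

# Mean of a discrete distribution: sum of x * p(x) over all possible x.
mean = sum(x * p for x, p in home_runs.items())

# P(X >= 3): add the probabilities of 3, 4, and 5 home runs.
p_at_least_three = sum(p for x, p in home_runs.items() if x >= 3)

print(f"mean = {mean:.2f}")
print(f"P(X >= 3) = {p_at_least_three:.2f}")
```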
The Standard Deviation of a
Probability Distribution

The standard deviation of a
probability distribution, denoted by
the parameter, σ, measures its
spread.

Larger values of σ correspond to
greater spread.
Continuous Random Variable


A continuous random variable has an
infinite continuum of possible values
in an interval.
Examples are: time, age and size
measures such as height and weight.
Probability Distribution of a
Continuous Random Variable




A continuous random variable has
possible values that form an interval.
Its probability distribution is specified by
a curve.
Each interval has probability between 0
and 1.
The interval containing all possible values
has probability equal to 1.
Continuous Variables are Measured
in a Discrete Manner because of
Rounding.
Which Wager do You Prefer?

You are given $100 and told that you must pick
one of two wagers, for an outcome based on
flipping a coin:
A. You win $200 if it comes up heads and
lose $50 if it comes up tails.
B. You win $350 if it comes up heads and lose
your original $100 if it comes up tails.

Without doing any calculation, which wager
would you prefer?
You win $200 if it comes up heads and
lose $50 if it comes up tails.
Find the expected outcome for this
wager.
a. $100
b. $25
c. $50
d. $75
You win $350 if it comes up heads and lose
your original $100 if it comes up tails.
Find the expected outcome for this
wager.
a. $100
b. $125
c. $350
d. $275
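The expected outcome of each wager is the mean of a two-value probability distribution. The sketch below assumes a fair coin (probability 0.5 for each side) and treats a loss as a negative payoff; the helper name `expected_value` is introduced only for this illustration.

```python
def expected_value(outcomes):
    """Expected value of a list of (payoff, probability) pairs."""
    return sum(payoff * prob for payoff, prob in outcomes)

# Payoffs in dollars; heads and tails are each assumed to have probability 0.5.
wager_a = [(200, 0.5), (-50, 0.5)]
wager_b = [(350, 0.5), (-100, 0.5)]

print(expected_value(wager_a))  # 75.0
print(expected_value(wager_b))  # 125.0
```

Wager B has the larger expected outcome, although it also carries the larger possible loss.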
Section 6.2
How Can We Find Probabilities for
Bell-Shaped Distributions?
Normal Distribution


The normal distribution is symmetric,
bell-shaped and characterized by its mean
µ and standard deviation σ.
The probability of falling within any
particular number of standard deviations
of µ is the same for all normal
distributions.
Z-Score

Recall: The z-score for an observation
is the number of standard deviations
that it falls from the mean.
Z-Score

For each fixed number z, the probability
within z standard deviations of the mean
is the area under the normal curve
between
µ - zσ and µ + zσ
Z-Score

For z = 1:
68% of the area (probability) of a normal
distribution falls between:
µ - 1σ and µ + 1σ
Z-Score

For z = 2:
95% of the area (probability) of a normal
distribution falls between:
µ - 2σ and µ + 2σ
Z-Score

For z = 3:
Nearly 100% of the area (probability) of a normal
distribution falls between:
µ - 3σ and µ + 3σ
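The 68%, 95%, and nearly 100% figures can be reproduced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of a printed table; the function name `prob_within` is an assumption for the example.

```python
from statistics import NormalDist

def prob_within(z):
    """Probability that a normal variable falls within z standard deviations of its mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(z) - std_normal.cdf(-z)

for z in (1, 2, 3):
    print(f"z = {z}: {prob_within(z):.4f}")
```

Because the probability depends only on z, the same values hold for every normal distribution, whatever its µ and σ.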
The Normal Distribution: The
Most Important One in Statistics

It’s important because…
• Many variables have approximately normal distributions.
• It’s used to approximate many discrete distributions.
• Many statistical methods use the normal distribution even when the data are not bell-shaped.
Finding Normal Probabilities for
Various Z-values

Suppose we wish to find the
probability within, say, 1.43 standard
deviations of µ.
Z-Scores and the Standard
Normal Distribution

When a random variable has a normal
distribution and its values are converted
to z-scores by subtracting the mean and
dividing by the standard deviation, the
z-scores have the standard normal
distribution.
Example: Find the probability within
1.43 standard deviations of µ

Probability below µ + 1.43σ = 0.9236

Probability above µ + 1.43σ = 0.0764

By symmetry, probability below
µ - 1.43σ = 0.0764

Total probability under the curve = 1
Example: Find the probability within
1.43 standard deviations of µ

The probability of falling more than 1.43
standard deviations from the mean, in either
tail, is 2(0.0764) = 0.1528.
The probability of falling within 1.43
standard deviations of the mean
equals:
1 – 0.1528 = 0.8472, about 85%
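The same three steps (cumulative probability, upper tail, then removal of both tails) can be sketched in code. This is an illustration using `statistics.NormalDist` (Python 3.8 or later) rather than Table A, so the last digit may differ slightly from the table-based value.

```python
from statistics import NormalDist

z = 1.43
std_normal = NormalDist()

below = std_normal.cdf(z)      # cumulative probability below mu + 1.43 sigma
above = 1 - below              # upper-tail probability
within = 1 - 2 * above         # remove both tails, using symmetry

print(f"{below:.4f} {above:.4f} {within:.4f}")
```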
How Can We Find the Value of z for a
Certain Cumulative Probability?

Example: Find the value of z for a
cumulative probability of 0.025.
Example: Find the Value of z For a
Cumulative Probability of 0.025



Look up the cumulative probability of
0.025 in the body of Table A.
A cumulative probability of 0.025
corresponds to z = -1.96.
So, a probability of 0.025 lies below
µ - 1.96σ.
Example: What IQ Do You Need
to Get Into Mensa?

Mensa is a society of high-IQ people
whose members have a score on an
IQ test at the 98th percentile or higher.
Example: What IQ Do You Need
to Get Into Mensa?

How many standard deviations above
the mean is the 98th percentile?
• The cumulative probability of 0.980 in the
body of Table A corresponds to z = 2.05.
• The 98th percentile is 2.05 standard
deviations above µ.
Example: What IQ Do You Need
to Get Into Mensa?

What is the IQ for that percentile?
• Since µ = 100 and σ = 16, the 98th percentile
of IQ equals:
µ + 2.05σ = 100 + 2.05(16) = 132.8, or about 133
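Going from a cumulative probability back to a z-score, and then to a value on the original scale, can be sketched with the inverse cumulative distribution function. The example uses `statistics.NormalDist.inv_cdf` (Python 3.8 or later) instead of reading the body of Table A; the variable names are assumptions for the illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()

# z-value for a given cumulative probability (the reverse of a table lookup).
z_low = std_normal.inv_cdf(0.025)
z_98 = std_normal.inv_cdf(0.98)

# Convert the z-score back to the IQ scale: x = mu + z * sigma.
mu, sigma = 100, 16
iq_98 = mu + z_98 * sigma

print(round(z_low, 2), round(z_98, 2), round(iq_98))
```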
Z-Score for a Value of
a Random Variable

The z-score for a value of a random variable
is the number of standard deviations that x
falls from the mean µ.

It is calculated as:
$z = \dfrac{x - \mu}{\sigma}$
Example: Finding Your Relative
Standing on The SAT

Scores on the verbal or math portion of
the SAT are approximately normally
distributed with mean µ = 500 and
standard deviation σ = 100. The scores
range from 200 to 800.
Example: Finding Your Relative
Standing on The SAT

If one of your SAT scores was x = 650,
how many standard deviations from
the mean was it?
Example: Finding Your Relative
Standing on The SAT

Find the z-score for x = 650.
$z = \dfrac{x - \mu}{\sigma} = \dfrac{650 - 500}{100} = 1.50$
Example: Finding Your Relative
Standing on The SAT

What percentage of SAT scores was
higher than yours?
• Find the cumulative probability for the z-score of 1.50 from Table A.
• The cumulative probability is 0.9332.
Example: Finding Your Relative
Standing on The SAT



The cumulative probability below 650
is 0.9332.
The probability above 650 is
1 – 0.9332 = 0.0668
About 6.7% of SAT scores are higher
than yours.
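The SAT calculation can be sketched end to end: compute the z-score, find the cumulative probability, then take the complement. The example assumes the stated normal model (µ = 500, σ = 100) and uses `statistics.NormalDist` in place of Table A.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

x = 650
z = (x - sat.mean) / sat.stdev   # number of standard deviations above the mean
below = sat.cdf(x)               # cumulative probability below 650
above = 1 - below                # proportion scoring higher

print(f"z = {z:.2f}, below = {below:.4f}, above = {above:.4f}")
```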
Example: What Proportion of
Students Get A Grade of B?



On the midterm exam in introductory
statistics, an instructor always gives a
grade of B to students who score between
80 and 90.
One year, the scores on the exam have
approximately a normal distribution with
mean 83 and standard deviation 5.
About what proportion of students
get a B?
Example: What Proportion of
Students Get A Grade of B?

Calculate the z-score for 80 and for 90:
$z = \dfrac{x - \mu}{\sigma} = \dfrac{90 - 83}{5} = 1.40$
$z = \dfrac{x - \mu}{\sigma} = \dfrac{80 - 83}{5} = -0.60$
Example: What Proportion of
Students Get A Grade of B?

Look up the cumulative probabilities in
Table A.
• For z = 1.40, cumulative probability = 0.9192
• For z = -0.60, cumulative probability = 0.2743

It follows that about 0.9192 – 0.2743 =
0.6449, or about 64% of the exam scores,
were in the ‘B’ range.
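A probability between two values is the difference of two cumulative probabilities. The sketch below applies that to the exam example, assuming the stated normal model (mean 83, standard deviation 5); the result may differ from the table-based figure in the last digit because no rounding to two-decimal z-scores is involved.

```python
from statistics import NormalDist

exam = NormalDist(mu=83, sigma=5)

# P(80 < X < 90) = P(X < 90) - P(X < 80)
p_b = exam.cdf(90) - exam.cdf(80)

print(f"{p_b:.4f}")
```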
Using z-scores to Find
Normal Probabilities

If we’re given a value x and need to find a
probability, convert x to a z-score using:
$z = \dfrac{x - \mu}{\sigma}$

Use a table of normal probabilities to get a
cumulative probability.

Convert it to the probability of interest.
Using z-scores to Find
Random Variable x Values

If we’re given a probability and need
to find the value of x, convert the
probability to the related cumulative
probability.

Find the z-score using a normal table.

Evaluate x = zσ + µ.
Example: How Can We Compare Test
Scores That Use Different Scales?



When you applied to college, you scored 650
on an SAT exam, which had mean µ = 500
and standard deviation σ = 100.
Your friend took the comparable ACT in
2001, scoring 30. That year, the ACT had µ =
21.0 and σ = 4.7.
How can we tell who did better?
What is the z-score for your SAT
score of 650?
For the SAT scores: µ = 500 and σ = 100.
a. 2.15
b. 1.50
c. -1.75
d. -1.25
What percentage of students
scored higher than you?
a. 10%
b. 5%
c. 2%
d. 7%
What is the z-score for your
friend’s ACT score of 30?
The ACT scores had a mean of 21 and a
standard deviation of 4.7.
a. 1.84
b. -1.56
c. 1.91
d. -2.24
What percentage of students
scored higher than your friend?
a. 3%
b. 6%
c. 10%
d. 1%
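Scores on different scales become comparable once each is converted to a z-score. The sketch below works through the SAT and ACT comparison, assuming both score distributions are approximately normal; the helper name `z_score` is introduced only for this illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x falls from the mean."""
    return (x - mu) / sigma

std_normal = NormalDist()

z_sat = z_score(650, 500, 100)   # your SAT score
z_act = z_score(30, 21.0, 4.7)   # your friend's ACT score

# Percentage of test takers scoring higher, from the upper-tail probability.
pct_above_sat = 100 * (1 - std_normal.cdf(z_sat))
pct_above_act = 100 * (1 - std_normal.cdf(z_act))

print(f"SAT: z = {z_sat:.2f}, {pct_above_sat:.0f}% scored higher")
print(f"ACT: z = {z_act:.2f}, {pct_above_act:.0f}% scored higher")
```

The larger z-score marks the better relative standing, since fewer test takers scored above it.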
Standard Normal Distribution


The standard normal distribution is
the normal distribution with mean
µ = 0 and standard deviation σ = 1.
It is the distribution of normal
z-scores.
Section 6.3
How Can We Find Probabilities When
Each Observation Has Two Possible
Outcomes?
The Binomial Distribution

Each observation is binary: it has one of
two possible outcomes.

Examples:
• Accept, or decline, an offer from a bank for a credit card.
• Have, or have not, health insurance.
• Vote yes or no in a referendum.
Conditions for the Binomial
Distribution

Each of n trials has two possible outcomes:
“success” and “failure”.

Each trial has the same probability of
success, denoted by p.

The n trials are independent.

The binomial random variable X is the
number of successes in the n trials.
Example: Finding Binomial
Probabilities for An ESP Experiment


John Doe claims to possess ESP.
An experiment is conducted:
• A person in one room picks one of the integers 1, 2, 3, 4, 5 at random.
• In another room, John Doe identifies the number he believes was picked.
• The experiment is done with three trials.
• Doe got the correct answer twice.
Example: Finding Binomial
Probabilities for An ESP Experiment

If John Doe does not actually have ESP and
is actually guessing the number, what is the
probability that he’d make a correct guess
on two of the three trials?
Example: Finding Binomial
Probabilities for An ESP Experiment

The three ways John Doe could make two
correct guesses in three trials are: SSF,
SFS, and FSS.

Each of these has probability:
(0.2)²(0.8) = 0.032.

The total probability of two correct guesses
is 3(0.2)²(0.8) = 0.096.
Probabilities for a Binomial
Distribution

Denote the probability of success on a
trial by p.

For n independent trials, the probability
of x successes equals:
$p(x) = \dfrac{n!}{x!\,(n - x)!} \, p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n$
Example: Using the Binomial Formula in ESP
Experiment

The probability of exactly 2 correct guesses is
the binomial probability with n = 3 trials, x = 2
correct guesses and p = 0.2 probability of a
correct guess.
$p(2) = \dfrac{3!}{2!\,1!} (0.2)^2 (0.8)^1 = 3(0.04)(0.8) = 0.096$
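The binomial formula translates directly into code. The sketch below uses `math.comb` (Python 3.8 or later) for the n!/(x!(n - x)!) term; the function name `binomial_pmf` is an assumption for the example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# ESP experiment: n = 3 trials, p = 0.2 chance of a correct guess.
for x in range(4):
    print(x, round(binomial_pmf(x, 3, 0.2), 3))
```

Listing every x from 0 to n gives the full probability distribution, whose probabilities sum to 1.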
Example: Are Women Passed
over for Managerial Training?

Example: Presence of bias in
promotion.
• Large supermarket in Florida.
• Group of women claimed that female
employees were passed over for
management training.
Example: Are Women Passed
over for Managerial Training?



Large employee pool of more than 1000
people.
Half the employees are male; half are
female.
None of the 10 employees chosen for
management training were female.
Example: Are Women Passed
over for Managerial Training?

How can we investigate statistically
the women’s assertion of gender
bias?
Example: Are Women Passed
over for Managerial Training?

If the employees are selected
randomly in terms of gender, about
half of the employees picked should
be females and about half should be
males.
Example: Are Women Passed
over for Managerial Training?

Due to ordinary sampling variation, it
need not happen that exactly 50% of
those selected are females.
Example: Are Women Passed
over for Managerial Training?

If employees were actually selected at
random for the training, what are the
chances that none of the 10
employees selected were females?
Example: Are Women Passed
over for Managerial Training?

The probability that no females are
chosen equals:
$p(0) = \dfrac{10!}{0!\,10!} (0.50)^0 (0.50)^{10} = 0.001$
Example: Are Women Passed
over for Managerial Training?

It is very unlikely (one chance in a
thousand) that none of the 10
selected for management training
would be female.
Do the Binomial Conditions
Apply?

Before you use the binomial
distribution, check that its three
conditions apply:
• Binary data (success or failure).
• The same probability of success for each trial (denoted by p).
• Independent trials.
Do the Binomial Conditions Apply to
the Managerial Training Example?



The data are binary.
If employees are selected randomly, the
probability of selecting a female on a
given trial is 0.50.
With random sampling from a large
population, outcomes from trials are
independent.
Binomial Mean and Standard
Deviation

The binomial probability distribution for n
trials with probability p of success on
each trial has mean µ and standard
deviation σ given by:
$\mu = np, \quad \sigma = \sqrt{np(1 - p)}$
Example: How Can We Check for
Racial Profiling?


Study conducted by the American Civil
Liberties Union.
Study analyzed whether African-American
drivers were more likely than others in the
population to be targeted by police for
traffic stops.
Example: How Can We Check for
Racial Profiling?

Data:
• 262 police car stops in Philadelphia in 1997.
• 207 of the drivers stopped were African-American.
• In 1997, Philadelphia’s population was 42.2% African-American.
Example: How Can We Check for
Racial Profiling?

Does the number of African-Americans
stopped suggest possible bias, being
higher than we would expect (other things
being equal, such as the rate of violating
traffic laws)?
Example: How Can We Check for
Racial Profiling?

Assume:
• 262 car stops represent n = 262 trials.
• Successive police car stops are independent.
• P(driver is African-American) is p = 0.422.
Example: How Can We Check for
Racial Profiling?

Calculate the mean and standard
deviation of this binomial distribution:
$\mu = np, \quad \sigma = \sqrt{np(1 - p)}$
Example: How Can We Check for
Racial Profiling?
$\mu = 262(0.422) \approx 111$
$\sigma = \sqrt{262(0.422)(0.578)} \approx 8$
Example: How Can We Check for
Racial Profiling?

Recall: Empirical Rule
• When a distribution is bell-shaped, about
100% of it falls within 3 standard
deviations of the mean.
Example: How Can We Check for
Racial Profiling?
µ - 3σ = 111 - 3(8) = 87
µ + 3σ = 111 + 3(8) = 135
Example: How Can We Check for
Racial Profiling?

If no racial profiling is happening, we would not
be surprised if between about 87 and 135 of the
262 people stopped were African-American.

The actual number stopped (207) is well above
these values.

The number of African-Americans stopped is too
high, even taking into account random variation.
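The whole traffic-stop calculation can be sketched in a few lines. The example keeps the unrounded mean and standard deviation, so intermediate values differ slightly from the rounded 111 and 8, and it adds a z-score for the observed count as an extra summary that the slides do not compute.

```python
from math import sqrt

n, p = 262, 0.422          # number of stops and assumed probability per stop
observed = 207             # African-American drivers among those stopped

mu = n * p                             # binomial mean
sigma = sqrt(n * p * (1 - p))          # binomial standard deviation
low, high = mu - 3 * sigma, mu + 3 * sigma
z = (observed - mu) / sigma            # standard deviations above the mean

print(f"mu = {mu:.1f}, sigma = {sigma:.1f}")
print(f"within 3 sd: {low:.0f} to {high:.0f}")
print(f"z for 207 stops: {z:.1f}")
```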
Example: How Can We Check for
Racial Profiling?

Limitation of the analysis:
• Different people do different
amounts of driving, so we don’t
really know that 42.2% of the
potential stops were African-American.
When Is the Binomial
Distribution Bell Shaped?

The binomial distribution has close to
a symmetric, bell shape when the
expected number of successes, np,
and the expected number of failures,
n(1-p) are both at least 15.
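This rule of thumb is easy to apply programmatically. In the sketch below, the function name `binomial_is_bell_shaped` and its `threshold` parameter are assumptions for the example; the two calls reuse the n and p values from the traffic-stop and managerial-training examples.

```python
def binomial_is_bell_shaped(n, p, threshold=15):
    """Rule of thumb: expected successes and failures are both at least 15."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(binomial_is_bell_shaped(262, 0.422))  # expected counts about 110.6 and 151.4
print(binomial_is_bell_shaped(10, 0.50))    # expected counts 5 and 5
```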
Section 6.4

How Likely Are the Possible
Values of a Statistic?

The Sampling Distribution
Statistic

Recall: A statistic is a numerical
summary of sample data, such as a
sample proportion or a sample mean.
Parameter

Recall: A parameter is a numerical
summary of a population, such as a
population proportion or a population
mean.
Statistics and Parameters



In practice, we seldom know the values
of parameters.
Parameters are estimated using
sample data.
We use statistics to estimate
parameters.
Example: 2003 California Recall
Election

Prior to counting the votes, the
proportion in favor of recalling
Governor Gray Davis was an
unknown parameter.

An exit poll of 3160 voters reported
that the sample proportion in favor of
a recall was 0.54.
Example: 2003 California Recall
Election

If a different random sample of about
3000 voters were selected, a different
sample proportion would occur.
Example: 2003 California Recall
Election


Imagine all the distinct samples of
3000 voters you could possibly get.
Each such sample has a value for the
sample proportion.
Statistics and Parameters


How do we know that a sample
statistic is a good estimate of a
population parameter?
To answer this, we need to look at a
probability distribution called the
sampling distribution.
Sampling Distribution

The sampling distribution of a
statistic is the probability distribution
that specifies probabilities for the
possible values the statistic can take.
The Sampling Distribution of the
Sample Proportion




Look at each possible sample.
Find the sample proportion for each
sample.
Construct the frequency distribution of
the sample proportion values.
This frequency distribution is the
sampling distribution of the sample
proportion.
Example: Sampling Distribution

Which Brand of Pizza Do You Prefer?
• Two Choices: A or D.
• Assume that half of the population prefers Brand A and half prefers Brand D.
• Take a random sample of n = 3 tasters.
Example: Sampling Distribution
Sample     No. Prefer Pizza A   Proportion
(A,A,A)    3                    1
(A,A,D)    2                    2/3
(A,D,A)    2                    2/3
(D,A,A)    2                    2/3
(A,D,D)    1                    1/3
(D,A,D)    1                    1/3
(D,D,A)    1                    1/3
(D,D,D)    0                    0
Example: Sampling Distribution
Sample Proportion   Probability
0                   1/8
1/3                 3/8
2/3                 3/8
1                   1/8
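The pizza example is small enough to enumerate every possible sample, which is exactly the construction of a sampling distribution described earlier. The sketch below assumes, as in the example, that A and D are equally likely for each taster, so all eight samples are equally likely; exact fractions are used to match the table.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

n = 3
samples = list(product("AD", repeat=n))   # all 8 equally likely samples

# Tally the sample proportion preferring A across every possible sample.
counts = Counter(Fraction(s.count("A"), n) for s in samples)
sampling_dist = {prop: Fraction(c, len(samples)) for prop, c in sorted(counts.items())}

for prop, prob in sampling_dist.items():
    print(prop, prob)
```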
Mean and Standard Deviation of the
Sampling Distribution of a Proportion

For a binomial random variable with n trials and
probability p of success for each, the sampling
distribution of the proportion of successes has:
Mean = p and standard deviation = $\sqrt{\dfrac{p(1 - p)}{n}}$
To obtain these values, take the mean np and
standard deviation $\sqrt{np(1 - p)}$ for the binomial
distribution of the number of successes and divide
by n.
Example: 2003 California Recall
Election

Sample: Exit poll of 3160 voters.

Suppose that exactly 50% of the
population of all voters voted in favor
of the recall.
Example: 2003 California Recall
Election

Describe the mean and standard deviation of
the sampling distribution of the number in the
sample who voted in favor of the recall.
• µ = np = 3160(0.50) = 1580
• $\sigma = \sqrt{np(1 - p)} = \sqrt{3160(0.50)(0.50)} = 28.1$
Example: 2003 California Recall
Election

Describe the mean and standard deviation of the
sampling distribution of the proportion in the
sample who voted in favor of the recall.
Mean = p = 0.50
Standard deviation = $\sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{(0.50)(0.50)}{3160}} = \sqrt{0.000079} = 0.0089$
The Standard Error

To distinguish the standard deviation
of a sampling distribution from the
standard deviation of an ordinary
probability distribution, we refer to it
as a standard error.
Example: 2003 California Recall
Election


If the population proportion supporting
recall was 0.50, would it have been
unlikely to observe the exit-poll sample
proportion of 0.54?
Based on your answer, would you be
willing to predict that Davis would be
recalled from office?
Example: 2003 California Recall
Election

Fact: The sampling distribution of the
sample proportion is bell-shaped with a
mean µ = 0.50 and a standard deviation
σ = 0.0089.
Example: 2003 California Recall
Election

Convert the sample proportion value of
0.54 to a z-score:
$z = \dfrac{0.54 - 0.50}{0.0089} = 4.5$
Example: 2003 California Recall
Election


The sample proportion of 0.54 is more
than four standard errors from the
expected value of 0.50.
The sample proportion of 0.54 voting
for recall would be very unlikely if the
population support were p = 0.50.
Example: 2003 California Recall
Election



A sample proportion of 0.54 would be
even more unlikely if the population
support were less than 0.50.
We therefore have strong evidence that the
population support was larger than 0.50.
The exit poll gives strong evidence that
Governor Davis would be recalled.
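The exit-poll reasoning can be retraced numerically: compute the standard error under the assumed p = 0.50, then express the observed 0.54 as a z-score. The sketch below also confirms that dividing the standard deviation of the count by n gives the same standard error; the variable names are assumptions for the example.

```python
from math import sqrt

n = 3160          # exit-poll sample size
p = 0.50          # population proportion assumed for the check
p_hat = 0.54      # observed sample proportion

standard_error = sqrt(p * (1 - p) / n)
z = (p_hat - p) / standard_error

# The same standard error, obtained by dividing the binomial count's
# standard deviation sqrt(n p (1 - p)) by n.
count_sd = sqrt(n * p * (1 - p))

print(f"standard error = {standard_error:.4f}")
print(f"count sd = {count_sd:.1f}, count sd / n = {count_sd / n:.4f}")
print(f"z = {z:.1f}")
```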
Summary of the Sampling
Distribution of a Proportion

For a random sample of size n from a population
with proportion p, the sampling distribution of the
sample proportion has
Mean = p and standard error = $\sqrt{\dfrac{p(1 - p)}{n}}$

If n is sufficiently large such that the expected
numbers of outcomes of the two types, np and n(1 - p),
are both at least 15, then this sampling
distribution is bell-shaped.
Section 6.5
How Close Are Sample Means to
Population Means?
The Sampling Distribution of the
Sample Mean



The sample mean, x̄, is a random
variable.
The sample mean varies from sample
to sample.
By contrast, the population mean, µ,
is a single fixed number.
Mean and Standard Error of the
Sampling Distribution of the Sample
Mean

For a random sample of size n from a population
having mean µ and standard deviation σ, the
sampling distribution of the sample mean has:
• Center described by the mean µ (the same as the mean of the population).
• Spread described by the standard error, which equals the population standard deviation divided by the square root of the sample size: $\dfrac{\sigma}{\sqrt{n}}$
Example: How Much Do Mean
Sales Vary From Week to Week?

Daily sales at a pizza restaurant vary
from day to day.

The sales figures fluctuate around a
mean µ = $900 with a standard
deviation σ = $300.
Example: How Much Do Mean
Sales Vary From Week to Week?



The mean sales for the seven days in a
week are computed each week.
The weekly means are plotted over time.
These weekly means form a sampling
distribution.
Example: How Much Do Mean
Sales Vary From Week to Week?

What are the center and spread of the
sampling distribution?
$\mu = \$900$
$\dfrac{\sigma}{\sqrt{n}} = \dfrac{300}{\sqrt{7}} \approx 113$
Sampling Distribution vs.
Population Distribution
Standard Error

Knowing how to find a standard error
gives us a mechanism for
understanding how much variability
to expect in sample statistics “just by
chance.”
Standard Error

The standard error of the sample mean:
$\dfrac{\sigma}{\sqrt{n}}$

As the sample size n increases, the denominator
increases, so the standard error decreases.
With larger samples, the sample mean is more
likely to fall close to the population mean.
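How quickly the standard error shrinks can be seen by evaluating σ/√n for a few sample sizes. The sketch below reuses σ = $300 from the pizza sales example; the sample sizes other than 7 are hypothetical, chosen so that each is four times the previous one.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the sample mean for samples of size n."""
    return sigma / sqrt(n)

sigma = 300  # daily sales standard deviation in dollars, from the pizza example
for n in (7, 28, 112):
    print(n, round(standard_error(sigma, n), 1))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.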
Central Limit Theorem

Question: How does the sampling
distribution of the sample mean relate
with respect to shape, center, and
spread to the probability distribution
from which the samples were taken?
Central Limit Theorem


For random sampling with a large
sample size n, the sampling
distribution of the sample mean is
approximately a normal distribution.
This result applies no matter what the
shape of the probability distribution
from which the samples are taken.
Central Limit Theorem:
How Large a Sample?

The sampling distribution of the sample
mean takes more of a bell shape as the
random sample size n increases. The more
skewed the population distribution, the
larger n must be before the shape of the
sampling distribution is close to normal. In
practice, the sampling distribution is
usually close to normal when the sample
size n is at least about 30.
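A small simulation can illustrate the central limit theorem. The sketch below is an assumption-laden example, not part of the slides: it draws samples of size n = 30 from an exponential population (strongly right-skewed, with mean 1 and standard deviation 1) and summarizes the resulting sample means. The seed, the number of replications, and the choice of population are all arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

random.seed(0)    # fixed seed so the run is repeatable

n = 30            # sample size
reps = 5000       # number of simulated samples

# Each entry is the mean of one random sample of size n.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

center = fmean(sample_means)          # should be close to mu = 1
spread = stdev(sample_means)          # should be close to sigma / sqrt(n)
theory_se = 1 / sqrt(n)

# Share of sample means within 2 standard errors of mu, compared with
# the share a normal distribution would give.
share = sum(abs(m - 1) <= 2 * theory_se for m in sample_means) / reps
normal_share = NormalDist().cdf(2) - NormalDist().cdf(-2)

print(f"center = {center:.3f}, spread = {spread:.3f}, theory = {theory_se:.3f}")
print(f"within 2 se: {share:.3f} (normal: {normal_share:.3f})")
```

Rerunning with a smaller n, such as 2 or 5, should show the sampling distribution retaining more of the population's skew.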
A Normal Population Distribution
and the Sampling Distribution

If the population distribution is
approximately normal, then the
sampling distribution is
approximately normal for all sample
sizes.
How Does the Central Limit Theorem
Help Us Make Inferences?


For large n, the sampling distribution
is approximately normal even if the
population distribution is not.
This enables us to make inferences
about population means regardless of
the shape of the population
distribution.
Section 6.6
How Can We Make Inferences
About a Population?
Three Distinct Types of
Distributions

Population Distribution

Data Distribution

Sampling Distribution
Population Distribution

This is the probability distribution
from which we take the sample.

Values of its parameters, such as the
population proportion p and the
population mean µ, are usually
unknown.
Data Distribution

This is the distribution of the sample data.

It’s described by sample statistics, such as a
sample proportion or a sample mean.

With random sampling, the larger the sample size
n, the more closely it resembles the population
distribution.

With larger n, the higher the probability that a
sample statistic falls close to the population
parameter.
Sampling Distribution



This is the probability distribution of a
sample statistic, such as a sample
proportion or sample mean.
It provides the key for telling us how
close a sample statistic falls to the
unknown parameter.
For large n, the sampling distribution is
approximately a normal distribution.