Means and and Proportions Proportions as Random as Random

advertisement
Chapter 9
Means and
Proportions
as Random
Variables
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
9.1 Understandingg Dissimilarityy
Among Samples
K
Key:
Needd to understand
d
d what
h ki
kindd off dissimilarity
di i il i
we should expect to see in various samples from the
same population
population.
• Suppose knew most samples were likely to provide an
answer that
th t is
i within
ithi 10% off the
th population
l ti answer.
• Then would also know the population answer should
be within 10% of whatever our specific sample gave.
gave
• => Have a good guess about the population value
based on jjust the sample
p value.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
2
Statistics and Parameters
A statistic is a numerical value computed
p
from a
sample. Its value may differ for different samples.
e.g. sample mean x , sample standard deviation s,
and sample proportion p̂.
A parameter is a numerical value associated with
a population. Considered fixed and unchanging.
e g population mean μ, population standard
e.g.
deviation σ, and population proportion p.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
3
Sampling Distributions
Each new sample taken =>
sample
l statistic
i i will
ill change.
h
The distribution of possible values of a statistic
for repeated samples of the same size from a
population is called the sampling distribution
of the statistic.
Many statistics of interest have sampling distributions
that are approximately normal distributions
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
4
Example 9.1 Mean Hours of Sleep
f C
for
College
ll
Students
St d t
Survey of n = 190 college students.
“How many hours of sleep did you get last night?”
Sample mean = 7.1 hours.
If we repeatedly took
samples of 190 and each
time computed the sample
mean, the histogram of the
resultingg sample
p mean
values would look like the
histogram at the right:
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
5
9.2 Sampling
p g Distributions
for Sample Proportions
• Suppose (unknown to us) 40% of a population
carry the gene for a disease, (p = 0.40).
• We
W will
ill take
t k a random
d
sample
l off 25 people
l from
f
this population and count X = number with gene.
• Although we expect (on average) to find 10 people
(40%) with the gene, we know the number will
vary for different samples of n = 25.
• In this case, X is a binomial random variable
with n = 25 and p = 0.4.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
6
Many Possible Samples
Four possible random samples of 25 people:
Sample
p 1: X =12,, pproportion
p
with ggene =12/25 = 0.48 or 48%.
Sample 2: X = 9, proportion with gene = 9/25 = 0.36 or 36%.
Sample 3: X = 10, proportion with gene = 10/25 = 0.40 or 40%.
Sample 4: X = 7, proportion with gene = 7/25 = 0.28 or 28%.
Note:
• Each sample gave a different answer, which did not
always
y match the population
p p
value of 40%.
• Although we cannot determine whether one sample will
accurately reflect the population, statisticians have
d t
determined
i d what
h t to
t expect for
f mostt possible
ibl samples.
l
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
7
The Normal Curve Approximation
Rule for Sample Proportions
L t p = population
Let
l ti proportion
ti off interest
i t
t
or binomial probability of success.
p̂ = sample
Let p
p pproportion
p
or pproportion
p
of successes.
If numerous random samples or repetitions of the same size n
are taken, the distribution of possible values of p̂ is
approximately a normal curve distribution with
• Mean = p
p (1 − p )
p̂ ) =
• Standard
St d d d
deviation
i ti = s.d.(
d(p
n
This approximate distribution is sampling distribution of p̂ .
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
8
The Normal Curve Approximation
Rule for Sample Proportions
Normal Approximation Rule can be applied in two situations:
Situation 1: A random sample is taken from a population.
Situation 2: A binomial experiment is repeated numerous times.
In each situation, three conditions must be met:
Condition 1: The Physical Situation
Th is
There
i an actuall population
l i or repeatable
bl situation.
i
i
Condition 2: Data Collection
A random sample is obtained or situation repeated many times.
times
Condition 3: The Size of the Sample or Number of Trials
The size of the sample
p or number of repetitions
p
is relatively
y large,
g ,
np and n(1-p) must be at least 5 and preferably at least 10.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
9
Examples for which Rule Applies
• Election Polls: to estimate proportion who favor a
candidate; units = all voters.
voters
• Television Ratings: to estimate proportion of
households watching TV program; units = all households
with TV.
• Consumer Preferences: to estimate proportion of
consumers who prefer new recipe compared with old;
units = all consumers.
• Testing ESP: to estimate probability a person can
successfully guess which of 5 symbols on a hidden card;
repeatable situation = a guess.
guess
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
10
Example 9.2 Possible Sample Proportions
F
Favoring
i a Candidate
C did t
Suppose 40% all voters favor Candidate X. Pollsters take a
sample of n = 2400 voters.
voters Rule states the sample proportion
who favor X will have approximately a normal distribution with
mean = p = 0.4
0 4 and s.d.(
s d (p
p̂ ) =
p (1 − p )
n
=
0.4(1 − 0.4)
2400
= 0.01
Histogram at right
shows sample
proportions resulting
from simulatingg this
situation 400 times.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
11
Estimating the Population Proportion
f
from
a Single
Si l Sample
S
l Proportion
P
i
In practice, we don’t know the true population proportion p,
so we cannot compute the standard deviation of p̂ˆ ,
s.d.( p̂ ) =
p (1 − p )
n
.
In practice, we only take one random sample, so we only have
one sample proportion p̂ . Replacing p with p̂ in the standard
deviation expression gives us an estimate that is called the
standard error of p̂ .
s.e.( p̂ ) =
pˆ (1 − pˆ )
n
.
If p̂ = 0.39 and n = 2400, then the standard error is 0.01. So
the true proportion who support the candidate is almost surely
between 0.39 – 3(0.01) = 0.36 and 0.39 + 3(0.01) = 0.42.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
12
9.3 What to Expect
p of
Sample Means
• Suppose we want to estimate the mean weight loss
for all who attend clinic for 10 weeks. Suppose
(unknown to us) the distribution of weight loss is
approximately N(8 pounds, 5 pounds).
• We will take a random sample of 25 people from
this population and record for each X = weight loss.
• We know the value of the sample mean will vary
for different samples of n = 25.
• What
Wh t do
d we expectt those
th
means to
t be?
b ?
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
13
Many Possible Samples
Four possible random samples of 25 people:
Sample 1: Mean = 8.32
8 32 pounds,
pounds standard deviation = 4.74
4 74 pounds.
pounds
Sample 2: Mean = 6.76 pounds, standard deviation = 4.73 pounds.
Sample
p 3: Mean = 8.48 pounds,
p
, standard deviation = 5.27 ppounds.
Sample 4: Mean = 7.16 pounds, standard deviation = 5.93 pounds.
Note:
• Each sample gave a different answer, which did not always
match the population mean of 8 pounds.
• Although we cannot determine whether one sample mean will
accurately reflect the population mean, statisticians have
determined what to expect for most possible sample means.
means
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
14
The Normal Curve Approximation
Rule for Sample Means
Lett μ = mean for
L
f population
l ti off interest.
i t
t
Let σ = standard deviation for population of interest.
Let x = sample
p mean.
If numerous random samples of the same size n are taken, the
distribution of possible values of x is approximately a normal
curve distribution with
• Mean = μ
σ
• Standard
St d d d
deviation
i ti = s.d.(
d (x ) =
n
This approximate distribution is sampling distribution of x .
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
15
The Normal Curve Approximation
Rule for Sample Means
N
Normal
l Approximation
A
i ti Rule
R l can be
b applied
li d in
i two
t situations:
it ti
Situation 1: The population of measurements of interest is
bell shaped and a random sample of any size is measured.
bell-shaped
measured
Situation 2: The population of measurements of interest is not
bell shaped but a large random sample is measured.
bell-shaped
measured
Note: Difficult to get a Random Sample? Researchers usually
willing to use Rule as long as they have a representative sample
with no obvious sources of confounding or bias.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
16
Examples for which Rule Applies
• Average Weight Loss: to estimate average weight
loss;; weight
g assumed bell-shaped;
p ; population
p p
= all
current and potential clients.
• Average
g Age
g At Death: to estimate average
g age
g at
which left-handed adults (over 50) die; ages at death not
bell-shaped so need n ≥ 30; population = all left-handed
ppeople
p who live to be at least 50.
• Average Student Income: to estimate mean monthly
income of students at universityy who work; incomes not
bell-shaped and outliers likely, so need large random
sample of students; population = all students at university
who work.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
17
Example 9.4 Hypothetical Mean
W i ht Loss
Weight
L
Suppose the distribution of weight loss is approximately
N(8
( p
pounds,, 5 p
pounds)) and we will take a random sample
p
of n = 25 clients. Rule states the sample mean weight loss
will have a normal distribution with
σ
5
=
= 1 pound
mean = μ = 8 pounds and s.d.(
s d ( x) =
n
25
Histogram at right shows
sample
p means resulting
g
from simulating this
situation 400 times.
Empirical
E
i i l Rule:
R l
It is almost certain that
the sample mean will be
b
between
5 and
d 11 pounds.
d
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
18
Standard Error of the Mean
In practice, the population standard deviation σ is rarely
known, so we cannot compute the standard deviation of x ,
σ
s.d.( x ) =
.
n
In practice, we only take one random sample, so we only have
the sample mean x and the sample standard deviation s.
Replacing σ with s in the standard deviation expression gives
us an estimate that is called the standard error of x .
s
s.e.( x ) =
.
n
For a sample of n = 25 weight losses,
the standard deviation is s = 4.74 pounds.
So the standard error of the mean is 00.948
948 pounds
pounds.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
19
Increasing the Size of the Sample
Suppose we take n = 100 people instead of just 25.
The standard deviation of the mean would be
s.d.(
d (x) =
σ
n
=
5
= 0.5 pounds.
d
100
• For samples of n = 25,
sample means are likely to
range between 8 ± 3 pounds
=>
> 5 to 11 pounds.
pounds
• For samples of n = 100,
sample means are likely to
range only
l between
b t
8 ± 1.5
15
pounds => 6.5 to 9.5 pounds.
Larger
a ge samples
sa ples tend
e d too result
esu in more
o e accurate
accu ate es
estimates
es
of population values than smaller samples.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
20
Sampling for a Long, Long Time:
Th L
The
Law off L
Large N
Numbers
b
LLN: the
LLN
h sample
l mean x will
ill eventually
ll get
“close” to the population mean μ no matter how
small a difference you use to define “close
close.”
LLN = peace off mind
i d to casinos,
i
insurance
i
companies.
i
• Eventually, after enough gamblers or customers,
the mean net profit will be close to the theoretical mean.
mean
• Price to pay = must have enough $ on hand to pay the
occasional winner or claimant.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
21
9.4 What to Expect
p in Other
Situations: CLT
The Central Limit Theorem states that if n is
sufficiently large, the sample means of random
samples
l from
f
a population
l ti with
ith mean μ and
d
finite standard deviation σ are approximately
normally distributed with mean μ and
standard deviation σ n .
Technical
T
h i l Note:
N t
The mean and standard deviation given in the CLT hold
for any sample size; it is only the “approximately normal”
shape that requires n to be sufficiently large.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
22
Example 9.5 California Decco Winnings
California Decco lottery game: mean amount lost per
ticket over millions of tickets sold is μ = $0.35; standard
deviation σ = $29.67
$29 67 =>
> large variability in possible
amounts won/lost, from net win of $4999 to net loss of $1.
Suppose store sells 100,000
100 000 tickets in a year.
year CLT =>
>
distribution of possible sample mean loss per ticket
is approximately normal with …
mean (loss) = μ = $0.35and s.d.( x) =
σ
$29.67
=
= $0.09
n
100000
Empirical Rule: The mean loss is almost surely between
$0.08 and $0.62 => total loss for the 100,000 tickets is
likely between $8,000 to $62,000!
There are better ways to invest $100,000.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
23
9.6 Standardized Statistics
If conditions are met, these standardized
statistics have, approximately, a standard
( , )
normal distribution N(0,1).
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
24
Example 9.7 Unpopular TV Shows
Networks cancel shows with low ratings. Ratings based
on random sample of households, using the sample
proportion
ti pˆ watching
t hi show
h as estimate
ti t off population
l ti
proportion p. If p < 0.20, show will be cancelled.
Suppose in a random sample of 1600 households,
households 288 are
watching (for proportion of 288/1600 = 0.18). Is it likely
to see pp̂ = 0.18 even if p were 0.20 ((or higher)?
g )
z=
pˆ − p
p( 1− p)
n
=
0.18 − 0.20
0.20( 1− 0.20 )
1600
= −2.00
The sample proportion of 0.18 is about
2 standard deviations below the mean of 0.20.
0 20
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
25
9.7 Student’s
Student s tt-Distribution:
Distribution:
Replacing σ with s
Dil
Dilemma:
we generally
ll don’t
d ’t know
k
σ. Using
U i s we have:
h
x−μ
x−μ
t=
=
=
s.e.(( x ) s / n
n (x − μ)
s
If the sample
p size n is small,,
this standardized statistic will
not have a N(0,1) distribution
but rather a tt-distribution
distribution with
n – 1 degrees of freedom (df).
More on t-distributions and tables of probability areas in Chapters 12-13.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
26
Example 9.8 Standardized Mean Weights
Claim: mean weight loss is μ = 8 pounds.
Sample of n =25
25 people gave a sample mean
weight loss of x = 8.32 pounds and a sample
standard deviation of s = 4.74 pounds.
Is the sample mean of 8.32 pounds reasonable
to expect if μ = 8 pounds?
t=
x −μ
s
n
=
8.32 −8
4.74
25
= 0.34
The sample mean of 8.32 is only about one-third
of a standard error above 8, which is consistent
with
ith a population
l ti mean weight
i ht loss
l
off 8 pounds.
d
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
27
9.8 Statistical Inference
• Confidence Intervals: uses sample data
t provide
to
id an interval
i t
l off values
l
that
th t the
th
researcher is confident covers the true value
for the population.
population
• Hypothesis
H
th i Testing
T ti or Significance
Si ifi
Testing:
T ti
uses sample data to attempt to reject the
hypothesis that nothing interesting is
happening, i.e. to reject the notion that
chance alone can explain the sample results.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
28
Case Study 9.1 Do Americans Really Vote
Wh Th
When
They SSay Th
They D
Do?
?
Election of 1994:
• Time Magazine Poll: n = 800 adults (two days after election),
56% reported that they had voted.
• Info
I f from
f
C
Committee
itt for
f the
th Study
St d off the
th American
A
i
Electorate:
El t t
only 39% of American adults had voted.
If p = 0.39 then sample proportions for samples of size
n = 800 should vary approximately normally with …
mean = p = 0.39 and s.d.( p̂ ) =
p (1 − p )
n
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
=
0.39(1 − 0.39)
800
= 0.017
29
Case Study 9.1 Do Americans Really Vote
Wh Th
When
They SSay Th
They D
Do?
?
If respondents were telling the truth, the sample percent
should be no higher than 39% + 3(1.7%) = 44.1%,
nowhere near the reported percentage of 56%.
If 39% of the population voted, the standardized score
f the
for
th reported
t d value
l off 56% is
i …
0.56 − 0.39
z=
= 10.0
0.017
It is virtually impossible to obtain a standardized score of 10.
Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.
30
Download