Stat 200: Introduction to Inference

Presentation 8
First Part
Introduction to Inference:
Confidence Intervals
and Hypothesis Testing
What is inference?
Inference is the process of using a sample to draw conclusions
about a population.
1. Draw a Representative
SAMPLE from the POPULATION
2. Describe the SAMPLE
[Figure: a small example data table (three variables per sampled unit) and a bar chart summarizing the sample]
3. Use Rules of Probability and
Statistics to make Conclusions about
the POPULATION from the SAMPLE.
Population Parameters
p = population proportion
 µ = population mean
 σ = population standard deviation
 β1 = population slope (we will see this in Ch. 14)

Sample Statistics

p̂ = sample proportion

x̄ = sample mean
s = sample standard deviation
 b1 = sample slope (we will see this in Ch. 14)

Two Types of Inference
1. Confidence Intervals: (Ch. 10 & 12)
– Confidence Intervals give us a range in which the population
parameter is likely to fall.
– We use confidence intervals whenever the research question calls for
an estimation of a population parameter.
Example: What is the mean age of trees in the forest?
Estimate the proportion of US adults who would vote for candidate A.
2. Hypothesis Testing: (Ch. 11 & 13)
– Hypothesis tests are tests of population parameters.
Example: Is the proportion of US adult women who would vote for
candidate A >50%?
– We can only find evidence that a population parameter is ‘different’
from our null value. We cannot prove that a population parameter is
equal to some value.
Valid Hypothesis: Is the mean age of trees in the forest > 50 years?
Invalid Hypothesis: Is the mean age of trees in the forest equal to
50 years?
Types of CIs and Hypothesis Tests
For Hypothesis Tests and C.I.’s:
1-proportion (1 categorical variable)
1-mean (1 quantitative variable)
Difference in 2 proportions (2 categorical variables, both with 2 levels)
Difference in 2 means (1 quantitative and 1 categorical variable, or 2 quantitative variables, independent samples)
Regression, slope (2 quantitative variables)
For Hypothesis Tests only:
Chi-Square Test (2 categorical variables, at least one
with 3 or more levels!)
Some Examples…

Mike wants to estimate the mean high-school GPA of incoming
freshmen at Penn State.
Solution: CI for one population mean.

George wants to know if the proportion of students who engage in
underage drinking is greater than 25%.
Solution: Test of one proportion
Ho: p ≤ .25
Ha: p > .25

Doug wants to estimate the difference in the proportion of men and
women who smoke.
Solution: CI for difference in 2 proportions.
Interpreting CI and Hypothesis Testing

Confidence Intervals:
Given the confidence level β (= 90%, 95%, 99%, etc.), conclude that
with β% confidence the population parameter is within the
confidence interval.
Example: Suppose the 90% CI for age of trees in the forest is
(32,45) years. Then, we are 90% confident that the true mean age
of trees in the forest is between 32 and 45 years.

Hypothesis Testing:
Use the p-value to determine whether we can reject the null
hypothesis.
We do not need to know the exact definition now, or how to
calculate the p-value, but generally the p-value is a measure of
how consistent the data is with the null hypothesis. A small
p-value (<.05) indicates the data we obtained was UNLIKELY under
the null hypothesis.
Decision Rule:
If the p-value is <.05 we REJECT the null hypothesis, and accept the
alternative. We have a statistically significant result!
If the p-value is >.05 then we say that we do NOT have enough evidence
to reject the null hypothesis.
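As a concrete illustration of this decision rule, here is a minimal Python sketch (not part of the original slides; the function name and the example p-values are made up):

    def decision(p_value, alpha=0.05):
        """Apply the decision rule: reject the null hypothesis when the p-value is below alpha."""
        if p_value < alpha:
            return "Reject H0: statistically significant result"
        return "Do NOT reject H0: not enough evidence against it"

    print(decision(0.03))  # Reject H0
    print(decision(0.40))  # Do NOT reject H0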
Second Part
Confidence Intervals
for 1-Proportion
Review of Ch.9: Sample Proportion

Mean of p̂:  E(p̂) = p
Std. Dev. of p̂:  sd(p̂) = √( p(1-p)/n )
Standard Error of p̂:  se(p̂) = √( p̂(1-p̂)/n )
If np and n(1-p) are both greater than or equal to 10, the sampling
distribution of p̂ is approximately normal with mean p and
standard deviation √( p(1-p)/n ).
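For reference, a minimal Python sketch (not part of the slides; the function names and sample values are illustrative) of the standard error and the normality condition above:

    import math

    def se_phat(p_hat, n):
        """Standard error of the sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
        return math.sqrt(p_hat * (1 - p_hat) / n)

    def normal_approx_ok(p_hat, n):
        """Check the 'np >= 10 and n(1-p) >= 10' condition, with p_hat standing in for p."""
        return n * p_hat >= 10 and n * (1 - p_hat) >= 10

    # Illustrative values: 580 successes out of n = 1200 (the numbers used in Example 1 later).
    n, successes = 1200, 580
    p_hat = successes / n
    print(p_hat, se_phat(p_hat, n), normal_approx_ok(p_hat, n))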
From Sampling Distributions to
Confidence Intervals…

The sample proportion will fall close to the true
proportion.

Thus the true proportion is likely to be close to the
observed sample proportion. How close?

95% of the p̂ values would be expected to fall within ± 2
standard deviations of the true proportion p.

So if we were to construct intervals around the p̂’s extending ± 2
standard deviations in each direction, these intervals would
contain the TRUE population proportion 95% of the time!
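This coverage claim can be checked by simulation. Below is a minimal Python sketch (not from the slides); the true proportion, sample size, and number of repetitions are arbitrary illustrative choices.

    import math
    import random

    def coverage_simulation(p=0.4, n=500, reps=10_000, seed=0):
        """Estimate how often the interval p_hat +/- 2*sd(p_hat) captures the true p."""
        rng = random.Random(seed)
        sd = math.sqrt(p * (1 - p) / n)   # true standard deviation of p_hat
        hits = 0
        for _ in range(reps):
            successes = sum(rng.random() < p for _ in range(n))
            p_hat = successes / n
            if abs(p_hat - p) <= 2 * sd:
                hits += 1
        return hits / reps

    print(coverage_simulation())  # expected to be close to 0.95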
Margin of Error & C.I.
p̂ is an estimator of p, but it is not exactly equal to p.

How far is p̂ from p?
– The Margin of Error is a measure of accuracy providing a likely upper
limit for the difference between p̂ and p.
– This difference is almost always less than the Margin of Error.
– “Almost always” here means “with large probability.” Usually we
are talking about 90%, 95% or 99% probability.
– This probability is the confidence level. For example, if the
confidence level is 95%, it means that for 95% of samples the
difference between p̂ and p is less than the Margin of Error (i.e., we
expect 38 out of 40 samples to give a p̂ whose difference from
p is less than the Margin of Error).

Example: Based on a sample of 1000 voters, the proportion of voters
who favor candidate A is 34% with a 3% Margin of Error based on a
95% confidence level. What does this tell us?
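One way to answer the question: the reported numbers translate directly into an interval estimate. The short sketch below (not part of the slides) simply carries out that arithmetic.

    p_hat = 0.34             # reported sample proportion
    margin_of_error = 0.03   # reported 95% Margin of Error

    # With 95% confidence, the true proportion of voters who favor
    # candidate A lies within p_hat +/- Margin of Error.
    lower, upper = p_hat - margin_of_error, p_hat + margin_of_error
    print(f"95% CI: ({lower:.2f}, {upper:.2f})")   # (0.31, 0.37)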
95% C.I. for 1-proportion (Derivation)

If np and n(1-p) are ≥ 10, the sampling distribution of p̂ is
approximately normal with mean p and standard deviation sd(p̂).

From the empirical rule we have that for about 95% of the samples,
p̂ is going to fall within 2 sd(p̂) of p, i.e. with 95% probability we
have

    p - 2 sd(p̂) ≤ p̂ ≤ p + 2 sd(p̂)
    ⇔  -2 sd(p̂) ≤ p̂ - p ≤ 2 sd(p̂)
    ⇔  p̂ - 2 √( p(1-p)/n ) ≤ p ≤ p̂ + 2 √( p(1-p)/n )

There is a problem here! Since p is the unknown parameter of
interest, sd(p̂) is also unknown. Thus, we substitute sd(p̂) with
the se(p̂). Doing so we have that if np̂ and n(1-p̂) are both ≥ 10,
then with 95% probability we have

    -2 se(p̂) ≤ p̂ - p ≤ 2 se(p̂)
    ⇔  p̂ - 2 √( p̂(1-p̂)/n ) ≤ p ≤ p̂ + 2 √( p̂(1-p̂)/n )
95% Margin of Error and C.I. for p

Thus, if np̂ and n(1-p̂) are both ≥ 10, the 95% Margin of Error is

    2 × se(p̂) = 2 √( p̂(1-p̂)/n )

and the 95% C.I. for p is

    Sample Statistic ± Margin of Error
    = p̂ ± 2 × se(p̂) = p̂ ± 2 √( p̂(1-p̂)/n )

Note that we are using p̂ instead of p for the condition!
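As a concrete illustration, here is a minimal Python sketch of this 95% interval (not part of the slides; the function name prop_ci_95 is made up):

    import math

    def prop_ci_95(successes, n):
        """95% confidence interval for a population proportion using the +/- 2*se rule."""
        p_hat = successes / n
        if n * p_hat < 10 or n * (1 - p_hat) < 10:
            raise ValueError("normal approximation condition not met")
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        return p_hat - 2 * se, p_hat + 2 * se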
Example 1: Obtaining a 95% C.I. for p.
A sample of 1200 people is polled to determine the
percentage that are in favor of candidate A. Suppose 580
say they are in favor. Construct a 95% CI for the true
population proportion.
p̂  580/1200  .483
p̂(1  p̂)
.483(1 - .483)
se(p̂) 

 .0144
n
1200
So the 95% CI for p is:
p̂  2  se(p̂)  .483  2(.0114)  (.455,.512)
Conclusion: We are 95% confident that the
true population proportion of those who
support candidate A is between 45.5% and
51.2%.
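As a quick, self-contained numerical check of this arithmetic (illustrative only, not part of the slides):

    import math

    n, successes = 1200, 580
    p_hat = successes / n                        # about 0.483
    se = math.sqrt(p_hat * (1 - p_hat) / n)      # about 0.0144
    lower, upper = p_hat - 2 * se, p_hat + 2 * se
    print(f"({lower:.3f}, {upper:.3f})")         # roughly (0.454, 0.512), matching the slide up to rounding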
Any C.I. for 1-proportion

Conditions: We need to have np̂ ≥ 10 and n(1-p̂) ≥ 10.

β% CI for p:  p̂ ± z* se(p̂)

Margin of Error = z* times the std. error
– The z* multiplier depends on the desired confidence level, β%.
– z* is such that P(-z* < Z < z*) = β%. The most common multipliers are

    Conf. level, β%     Multiplier, z*
    90                  1.64
    95                  1.96 ≈ 2
    98                  2.33
    99                  2.58

Interpretation: We are β% confident that the true population
proportion, p, is contained within the confidence interval.
Another interpretation is that for about β% of samples from the
population, the CI captures p.
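The z* values in the table come from the standard normal distribution, so they can be reproduced with the Python standard library. A minimal sketch (not from the slides; the function names are made up):

    import math
    from statistics import NormalDist

    def z_star(conf_level):
        """z* such that P(-z* < Z < z*) = conf_level, e.g. 0.95 -> about 1.96."""
        return NormalDist().inv_cdf(0.5 + conf_level / 2)

    def prop_ci(successes, n, conf_level=0.95):
        """beta% CI for a proportion: p_hat +/- z* * se(p_hat)."""
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        z = z_star(conf_level)
        return p_hat - z * se, p_hat + z * se

    for conf in (0.90, 0.95, 0.98, 0.99):
        print(conf, round(z_star(conf), 2))   # 1.64, 1.96, 2.33, 2.58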
Example 2: Obtaining a 99% C.I. for p.
300 high-risk patients received an experimental AIDS
vaccine. The patients were followed for a period of 5 years
and ultimately 53 came down with the virus. Assuming all
patients were exposed to the virus, construct a 99% CI for
the proportion of individuals protected.
We have that the 99% CI for p is:

    p̂ ± z* se(p̂)

where z* = 2.58. (Can you see why, using the Normal table?)

    p̂ = 247/300 = .823

    se(p̂) = √( p̂(1-p̂)/n ) = √( .823(1-.823)/300 ) = .0220
So the 99% CI for p = .823 ± 2.58(.0220) = (.767,.880)
We are 99% confident that the true proportion of those
protected by the vaccine is between 76.7% and 88.0%.
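The same interval can be reproduced numerically (an illustrative check, not part of the slides):

    import math
    from statistics import NormalDist

    n, protected = 300, 300 - 53                 # 247 patients never contracted the virus
    p_hat = protected / n                        # about 0.823
    se = math.sqrt(p_hat * (1 - p_hat) / n)      # about 0.0220
    z = NormalDist().inv_cdf(0.995)              # z* for 99% confidence, about 2.58
    print(f"({p_hat - z * se:.3f}, {p_hat + z * se:.3f})")   # roughly (0.767, 0.880)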
The Width of a Confidence Interval is affected by:

n: as the sample size increases, the standard error of p̂ decreases
and the confidence interval gets narrower. So a larger sample size
gives us a more precise estimate of p.

z*: as the confidence level (β%) increases, the multiplier z*
increases, leading to a wider CI.

So, if we want to control the length of the C.I. we can
either adjust the confidence level or the sample size...
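To see both effects numerically, here is a minimal Python sketch (not from the slides; the chosen p̂, sample sizes, and confidence levels are arbitrary):

    import math
    from statistics import NormalDist

    def ci_width(p_hat, n, conf_level):
        """Width of the z*-based CI for a proportion: 2 * z* * se(p_hat)."""
        z_star = NormalDist().inv_cdf(0.5 + conf_level / 2)
        return 2 * z_star * math.sqrt(p_hat * (1 - p_hat) / n)

    # Larger n -> narrower interval (at a fixed 95% confidence level).
    for n in (100, 400, 1600):
        print(n, round(ci_width(0.5, n, 0.95), 3))        # 0.196, 0.098, 0.049

    # Higher confidence level -> wider interval (at a fixed n = 400).
    for conf in (0.90, 0.95, 0.99):
        print(conf, round(ci_width(0.5, 400, conf), 3))   # 0.082, 0.098, 0.129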
Question: What sample size is appropriate in order to
obtain a 95% C.I. that is not too wide (i.e. with a small Margin of Error)?

The Margin of Error for a 95% CI is equal to 2 × se(p̂).
Before collecting the sample, p̂ is unknown, thus we
cannot calculate the exact Margin of Error.
A conservative Margin of Error is equal to 1/√n, noting that

    2 √( p̂(1-p̂)/n ) ≤ 2 × (1/2)/√n = 1/√n,  since p̂(1-p̂) ≤ 1/4.
This implies that p̂ differs from p by at most ___________ .
Using the conservative Margin of Error, the length of the
C.I. is equal to _____________.
How large should n be to get a 95% CI of some length L?
n=___________.
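The blanks above are left for the reader to fill in; the sketch below (not part of the slides; the helper names and the target length L = 0.06 are purely illustrative) simply applies the conservative bound numerically.

    import math

    def conservative_margin_95(n):
        """Conservative 95% Margin of Error: 1/sqrt(n), since p_hat*(1 - p_hat) <= 1/4."""
        return 1 / math.sqrt(n)

    def sample_size_for_length(L):
        """Smallest n so that the conservative 95% CI length, 2/sqrt(n), is at most L."""
        return math.ceil((2 / L) ** 2)

    L = 0.06                                  # desired total CI length
    n = sample_size_for_length(L)
    print(n, 2 * conservative_margin_95(n))   # 1112, about 0.0599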