Chapter 8 Notes (Word)

Chapter 8: Statistical Inference Confidence Intervals
Background:
Critical values of Z
Zα is the value such that P(Z > Zα) = α, where Z ~ N(0, 1) and
0 < α < 1.
Ex:
P(Z > 1.645) = .05, so Z.05 = 1.645
P(Z > 1.96) = .025, so Z.025 = 1.96
P(Z > 2.326) = .01, so Z.01 = 2.326
P(Z > 2.576) = .005, so Z.005 = 2.576
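The critical values above can be checked numerically. The following is a minimal sketch using Python's standard-library statistics.NormalDist; the helper name z_critical is an illustrative choice and does not come from these notes.

```python
from statistics import NormalDist


def z_critical(a: float) -> float:
    """Return Za, the value with P(Z > Za) = a for Z ~ N(0, 1)."""
    if not 0 < a < 1:
        raise ValueError("a must be strictly between 0 and 1")
    # P(Z > Za) = a is the same statement as P(Z <= Za) = 1 - a.
    return NormalDist().inv_cdf(1 - a)


for a in (0.05, 0.025, 0.01, 0.005):
    print(f"a = {a}: Za = {z_critical(a):.3f}")
```

Rounded to three decimals, these agree with the table values listed above.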
A statistic is a number that describes the sample. Examples include the sample mean (xbar), the sample standard deviation (s) and the sample proportion (p-hat = number of
successes in the sample divided by the sample size).
A parameter is a number that describes a population. Examples include the population
mean (μ), the population standard deviation (σ) and the population proportion (p).
Population parameters are usually unknown, so we want to estimate them. The simplest
way to estimate a parameter is with a statistic, which we call a point estimate. We can
do better by not just using a single point, but by setting up an interval around the point
and stating how much confidence or certainty we have that the parameter is in the
interval. We call this interval estimation.
Interval Estimation is the beginning of Statistical Inference, which is the action of
generalizing about a population from a sample. Statistical inference lets us make
statements about population parameters with a stated level of certainty. Statistical
inference also enables us to make better decisions.
In order for Statistical inference to be correct we need to make sure that the sample is
representative of the population. We do this by taking random samples. These are
samples where each unit / subject has the same probability of being selected as any
other unit in the population.
Finding CI’s for the population mean.
First we find a point estimator (statistic) that estimates the population parameter.
For μ we use x-bar the sample mean.
Second we find the standard error of the estimator.
For x-bar the SE is simple:
SE = σ / sqrt(n) ≈ s / sqrt(n)
Where σ is the population standard deviation (which will not be known much of the
time), s is the sample standard deviation and n is the sample size. (These numbers will
be given to you.)

Third we find the margin of error. This will depend on whether or not we know σ. If we
know σ, the margin of error involves multiplying the SE by a z-score from the standard
normal distribution.
To find a 95% CI for μ when σ is known:
(x-bar – Z.025 * SE, x-bar + Z.025 * SE)
To find a CI at any confidence level, which we denote (1 – α) where α is between 0 and 1, we need to
find Zα/2 and use it in place of Z.025.
Common CI levels are 95%, 99% and 90%.
Fourth, in order to use the methodology we need to know that the CLT applies. We
need to know that either:
n ≥ 30
or n < 30 and the sample comes from an approximately normally distributed population.
Of course we are also assuming that the sample was obtained randomly.
You do not want to use either methodology to find CI’s for μ if the data is binary, like
yes-no questions.
Ex:
Suppose you want to find a 95% CI for µ when the sample mean is 27.5 from a random
sample of 36 subjects. The population standard deviation is 8.6.
Since n = 36 > 30 we can use the above methodology.
X-bar = 27.5 and σ = 8.6.
Our confidence level = .95 so α = .05 and α/2 = .025 so Zα/2 = 1.96.
Therefore our 95% CI for µ is:
(27.5 – 1.96 * 8.6/6, 27.5 + 1.96 * 8.6/6)
(27.5 – 2.809, 27.5 + 2.809)
(24.691, 30.309)
Interpretation:
Therefore we are 95% sure that the population mean is between 24.691 and 30.309.
Inference:
If someone claimed that the population mean is 22, we can state that we are 95%
confident that they are incorrect, because 22 is not in the interval.
If someone claimed that the population mean is 32, we can state that we are 95%
confident that they are incorrect, because 32 is not in the interval.
However, if someone claimed that the population mean is 26, we cannot state that we
are 95% confident that they are correct, because 26 is in the interval. We can only state
that we are NOT 95% confident that they are wrong.
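As a cross-check on the hand calculation, here is a minimal Python sketch of the known-σ interval and of the three claims discussed above. The function name z_interval and its argument names are illustrative assumptions, not something defined in these notes.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, level=0.95):
    """CI for the population mean when sigma is known."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    se = sigma / sqrt(n)                     # standard error of x-bar
    margin = z * se                          # margin of error
    return x_bar - margin, x_bar + margin


low, high = z_interval(x_bar=27.5, sigma=8.6, n=36, level=0.95)
print(f"95% CI: ({low:.3f}, {high:.3f})")

# A claimed mean is rejected at this confidence level only if it
# falls outside the interval.
for claim in (22, 32, 26):
    inside = low <= claim <= high
    print(claim, "is inside the interval" if inside else "is outside the interval")
```

The sketch uses the unrounded Zα/2 rather than 1.96, which does not change the interval at three decimal places.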
On the TI 83/84
Hit [STAT] select TESTS select 7: ZInterval…
ZInterval
Inpt: Data Stats
select Stats
σ: 8.6
x-bar: 27.5
n: 36
C-Level: .95
Calculate
Output
(24.691, 30.309)
x-bar: 27.5
n: 36
Ex:
A random sample of 50 Twisty pretzels is taken. Their mean baking time was 13.2
minutes. The population standard deviation is 4.1 minutes. Find a 95% CI for the
population mean baking time of the pretzels.
Solution:
Definition of terms:
n = 50
x-bar = 13.2 σ = 4.1
μ = unknown mean baking time for all Twisty pretzels, this is what we are finding the CI
for. Note that n > 30.
The 95% CI is (12.064, 14.336)
We are 95% sure that the mean baking time for all Twisty Pretzels is between 12.064
and 14.336 minutes.
T-Intervals
For CI’s of μ when we do not know σ we will use s to approximate σ and we find the
margin of error by multiplying the SE by a t-score which comes from the T-distributions.
The T-distributions are very similar to the Standard Normal Distribution, but are a little
more complicated because each T-distribution depends on n. Recall that the normal
distribution has 2 parameters, μ and σ. The T-distribution has only one parameter,
called v = n -1 = degrees of freedom. The mean of any T-distribution is always 0.
For this class it will simply mean hitting a different button on your calculator.
To find a 100(1 – α)% CI for μ when σ is unknown (for a 95% CI, α = .05):
(x-bar – Tα/2 * SE, x-bar + Tα/2 * SE)
In the old days you would need to learn how to read the T-table and calculate
the margin of error by hand, but your calculators will now calculate CI’s for μ
using T-scores for you. The Tα/2’s will change depending on n.
Ex: Same as before but now we use T
A random sample of 50 Twisty pretzels is taken. Their mean baking time was 13.2
minutes with a standard deviation of 4.1 minutes. Find a 95% CI for the population
mean baking time of the pretzels.
Solution:
Definition of terms:
n = 50
x-bar = 13.2 s = 4.1
μ = unknown mean baking time for all Twisty pretzels, this is what we are finding the CI
for. Note that n > 30.
On the TI 83/84
Hit [STAT] select TESTS select 8: TInterval…
TInterval
Inpt: Data Stats
select Stats
x-bar: 13.2
Sx: 4.1
n: 50
C-Level: .95
Calculate
Output
(12.035, 14.365)
x-bar: 13.2
Sx: 4.1
n: 50
We are 95% sure that the mean baking time for all Twisty Pretzels is between 12.035
and 14.365 minutes.
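The same T-interval can be sketched in Python. The standard library has no T-distribution, so this sketch assumes the critical value is supplied by the caller: 2.0096 is the table value of T.025 with 49 degrees of freedom. The name t_interval and its arguments are illustrative only.

```python
from math import sqrt


def t_interval(x_bar, s, n, t_crit):
    """CI for the population mean when sigma is unknown.

    t_crit is T_{alpha/2} with n - 1 degrees of freedom, supplied by
    the caller (for example from a T-table).
    """
    se = s / sqrt(n)        # s stands in for the unknown sigma
    margin = t_crit * se
    return x_bar - margin, x_bar + margin


# Assumed table value: T_.025 with 49 degrees of freedom is about 2.0096.
low, high = t_interval(x_bar=13.2, s=4.1, n=50, t_crit=2.0096)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

If SciPy happens to be installed, scipy.stats.t.ppf(0.975, df=49) is one way to obtain the critical value instead of reading it from a table. Because 2.0096 is larger than 1.96, the T-interval is slightly wider than the Z-interval for the same data.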
Ex2. A random sample of 40 students taking the Kaplan SAT prep course is taken. Their
average SAT score was 1165 with a standard deviation of 86.3. Find a 95% CI for the
mean SAT score for all students taking the Kaplan SAT prep course.
n = 40
x-bar = 1165 s = 86.3
95% CI for mean SAT score for all Kaplan students is
(1137.4, 1192.6)
We are 95% sure that the mean SAT score for all Kaplan students is between 1137.4 and
1192.6
ETS reports that the mean score for all students taking the SAT’s is 1120. Based on the
data can Kaplan claim that the mean score for all its students is greater than the mean
for all students in general?
Yes, because Kaplan is 95% sure that the mean for all its students is between 1137.4
and 1192.6, and 1120 is below this interval.
(1120 < 1137.4)
The Princeton Review, a rival of Kaplan, says that the mean for all its students is 1180.
Can the Princeton Review claim that the mean SAT score for all its students is greater
than the mean SAT score for all Kaplan’s students?
No, because 1180 is within the CI (1137.4 < 1180 < 1192.6) so there is insufficient evidence to state
that the mean SAT score for all Princeton Review students is greater than the mean SAT
score for all Kaplan students.
What CI’s really mean?
You should really use T because σ is rarely known.
Approximate CI’s for p
Ex
Suppose we want to know what percentage / proportion of the US approves of the
president.
We cannot sample every one of the 300 million people in the US.
Survey companies take random samples of people in the US and calculate the sample
proportion. Usually they also report a margin of error. Something like 42 % plus or
minus 2%.
This margin of error gives us an interval for estimation.
(40%, 44%)
Other survey companies may take similar surveys and find slightly different results. But
if all the companies were careful to take random samples and word their questions the
same, the results will be very similar.
If the sample proportions are slightly different that does not mean that one poll is
wrong and one is right. The sample proportion is a statistic and like all statistics it is also
a random variable. It will change from sample to sample.
Ex. Assume that the population proportion of US people who approve of the president is
actually 40%. If 10 different survey companies take random samples of 100 people,
what might the sample proportions look like?
Sample   p-hat
S1       0.39
S2       0.41
S3       0.42
S4       0.44
S5       0.47
S6       0.39
S7       0.40
S8       0.42
S9       0.43
S10      0.37
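To see this sample-to-sample variation for yourself, here is a small simulation sketch in Python. It assumes the true proportion is .40 and that each of the 100 respondents answers independently; the function name, the seed, and the use of Python's random module are illustrative choices, and the numbers it prints will not match the table above.

```python
import random


def simulate_sample_proportions(p, n, num_surveys, seed=None):
    """Draw num_surveys random samples of size n and return each p-hat."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_surveys):
        # Each respondent approves with probability p.
        approvals = sum(1 for _respondent in range(n) if rng.random() < p)
        proportions.append(approvals / n)
    return proportions


p_hats = simulate_sample_proportions(p=0.40, n=100, num_surveys=10, seed=1)
for i, p_hat in enumerate(p_hats, start=1):
    print(f"S{i}: {p_hat:.2f}")
```

Changing the seed gives a different set of ten sample proportions, all scattered around .40.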
P-hat = sample proportion (the name used by the calculator and most texts). Your text calls it P’.
P-Hat is a good estimator for p for 3 important reasons.
1. The mean value of p-hat is p. On average if you could find all possible p-hat’s and
averaged them you would get p. We say that p-hat is an unbiased estimator for p.
2. The standard error (or standard deviation) of p-hat is known and is small when
n * p-hat and n * q-hat are both ≥ 5.
The SE of p-hat is:
SE(p-hat) = sqrt(p-hat * (1 – p-hat) / n)
3. If n * p-hat and n * q-hat are both ≥ 5, then the distribution of p-hat is approximately
Normally distributed. This can be shown by the Central Limit Theorem.
These 3 reasons make finding Confidence intervals for p very easy.
Note that x-bar is an unbiased estimator of μ, the SE(x-bar) is also small and known and
x-bar is approximately normally distributed when the CLT applies, which makes finding CI’s for μ also very easy.
To Find a 100 (1 – α) % Confidence Interval for p
(p-hat – Zα/2 * SE, p-hat + Zα/2 * SE)
where
SE = sqrt(p-hat * (1 – p-hat) / n)
For example to find a 95% Confidence Interval for p
p-hat – 1.96 * SE, p-hat + 1.96 * SE
The -1.96 and +1.96 come from the standard normal or Z distribution.
P(-1.96 < Z < +1.96) = .95
Ex:
In a random sample of 100 SCSU students 72 said that there was not ample parking on
campus. Find and interpret a 95% CI for the true proportion of SCSU students who do
think that there is not ample parking on campus.
n = 100 and x = 72, therefore p-hat = x/n = .72
So q-hat = 1- .72 = .28
SE = sqrt(.72 * .28 / 100) = .0449
First check to see if n*p-hat and n*q-hat are bigger than 5.
n*p-hat = 100 * .72 = 72 > 5
n*q-hat = 100 * .28 = 28 > 5
So the 95% CI for p is:
(.72 – 1.96* .0449, .72 + 1.96 * .0449)
(.72 - .088, .72 + .088) Note that .088 is the margin of error or error bound.
(.632, .808)
We are 95% sure that the population proportion for all SCSU students who think that
there is not ample parking is between .632 and .808.
So if the administration were to say that 50% of students think there is not ample
parking you could say that you are 95% sure that they are lying.
(.50 is not in the interval)
Or if some radical student association said that 90% of students say there is not ample
parking, you could say that you are 95% sure that they are lying.
(.90 is not in the interval)
If someone else said that 75% of students do think there is not ample parking, you
CANNOT say that you are 95% sure that they are right. You only can say that the true or
population proportion of students who think that there is not ample parking is
somewhere between .632 and .808. Where p is in the interval you do not know.
To find a 95% CI for p on the TI 83/84 :
Hit [STAT] then go over to Tests, scroll down to
A: 1-PropZInt Hit [ENTER]
(NOT 1-Prop-ZTest)
1-PropZInt
x: 72
n: 100
C-Level : .95
Calculate
You can use the calculator to find a CI for p for any C-level between 0% and 100%, not
inclusive.
The most common CI’s are 95% and 99%.
The formula for the 99% CI is the same as the 95% CI except where there is 1.96 you
substitute 2.576.
Find a 99% CI for p, using the previous data:
1-PropZInt
x: 72
n: 100
C-Level : .99
Calculate
(.604, .836)
Note that this interval is a little wider.
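The 95% and 99% intervals for the parking example can be reproduced with a short Python sketch. The function name prop_z_interval is only loosely modeled on the calculator's menu entry and is an assumption of this sketch, as is the choice to raise an error when the n * p-hat and n * q-hat check fails.

```python
from math import sqrt
from statistics import NormalDist


def prop_z_interval(x, n, level=0.95):
    """Approximate CI for a population proportion p."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # The normal approximation needs both counts to be at least 5.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("need n * p-hat and n * q-hat both >= 5")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # Z_{alpha/2}
    se = sqrt(p_hat * q_hat / n)
    margin = z * se                                # margin of error
    return p_hat - margin, p_hat + margin


for level in (0.95, 0.99):
    low, high = prop_z_interval(x=72, n=100, level=level)
    print(f"{level:.0%} CI: ({low:.3f}, {high:.3f})")
```

Raising the confidence level increases Zα/2, which increases the margin of error and widens the interval.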
Determining the Sample Size
CI for p and µ are very similar. Each CI depended on finding the SE of the statistic and
using a distribution, Z for p and T for μ.
Each CI also involved dividing by sqrt(n). Therefore the width of the CI depends on n.
The problem we dealt with last time was finding an interval that we could use for
statistical inference. Another similar problem is, we want to estimate a parameter (p or
μ) with a certain confidence level (like 95% or 99%) and we want the margin of error
(1/2 the width of the CI) to be a certain maximal amount and we want to know how
many observations we need to accomplish this.
Recall that to find a 95% Confidence Interval for p the formula is:
(p-hat – 1.96 * SE, p-hat + 1.96 * SE)
where
SE = sqrt(p-hat * (1 – p-hat) / n)
The margin of error is 1.96*SE.
Note that as n increases the SE will decrease so the margin of error will decrease. We
can always find an n big enough such that we will find a small enough margin of error to
satisfy the problem. Whether that n is realistic is another question.
The formula for finding the n needed for a 95% CI for p with margin of error EBP (Error
Bound for Proportion) is:
n = p-hat * (1 – p-hat) * 1.96^2 / EBP^2
Ex: A drug company wants to find a 95% CI for the population proportion of people its
new drug will cure. It wants the margin of error to be 10%. In previous testing they
estimate p to be near 70%. How many subjects must they sample?
Information given:
95% CI for p, EBP = .10 and p-hat = .70 so 1 – p-hat = .30
n = .7 * .3 * 1.96^2 / .1^2
n = .7 * .3 * 3.8416 / .01 = 80.6736
The final answer is 81, because n has to be an integer and you ALWAYS round up!
The formula for finding the n needed for a 99% CI for p with margin of error EBP is:
n = p-hat * (1 – p-hat) * 2.576^2 / EBP^2
Ex2: If we use the same information as the previous example, and find n for a 99% CI we
would get:
99% CI for p, EBP = .10 and p-hat = .70 so 1 – p-hat = .30
n = .7 * .3 * 6.636 / .01 = 139.356
The final answer is 140, because n has to be an integer and you ALWAYS round up!
Increasing the confidence level while EBP and p-hat stay the same makes n
increase. The more confident you want to be, the more observations you need.
In general the formula for finding n is:
n = p-hat * (1 – p-hat) * (Zα/2)^2 / EBP^2
Ex 3: If the company wanted to be more accurate, so they want their margin of error to
be .01 instead of .1, how many observations would they need to find a 95% CI, assuming
p-hat = .70.
n = .7 * .3 * 3.8416 / .0001 = 8067.36
n = 8068, a much bigger number than for a 10% margin of error.
As EBP gets smaller n increases.
Ex 4: What if the company did not know that p-hat was about 70%? Because in the
numerator of the formula for n we are multiplying p-hat * (1 – p-hat) and p-hat has to
be between 0 and 1, we can maximize the numerator by assuming p-hat = 0.5.
So if you are given a problem where p-hat is not given and you are asked to find n, you
assume that p-hat = .5 = 1 – p-hat.
To find the sample size required for a 95% CI for p, when we want the margin of error to
be 10%, then:
n = .5 * .5 * 3.8416 / .01 = 96.04
n = 97
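The four sample-size examples for p can be checked with a short Python sketch. The function name and its defaults are illustrative assumptions; in particular, defaulting p_hat to 0.5 simply encodes the worst-case rule described above.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(ebp, level=0.95, p_hat=0.5):
    """Smallest n giving margin of error at most ebp for a CI for p."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # Z_{alpha/2}
    n = p_hat * (1 - p_hat) * z**2 / ebp**2
    return ceil(n)  # n must be an integer, so always round up


print(sample_size_for_proportion(ebp=0.10, level=0.95, p_hat=0.70))
print(sample_size_for_proportion(ebp=0.10, level=0.99, p_hat=0.70))
print(sample_size_for_proportion(ebp=0.01, level=0.95, p_hat=0.70))
print(sample_size_for_proportion(ebp=0.10, level=0.95))  # p-hat unknown
```

The sketch uses the unrounded Zα/2 instead of 1.96 or 2.576, so the intermediate decimals differ slightly from the hand calculations, but the rounded-up sample sizes are the same.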
To find the sample size for a 100(1 – α) % CI for μ:
n = (Zα/2 * σ / EBM)^2 = (Zα/2)^2 * σ^2 / EBM^2
Note that EBM stands for Error Bound for Mean and will sometimes be called the
margin of error.
Also note that we are using σ and Z here not s and T, why?
Ex 5: Find the sample size required for a 95% CI for μ with a margin of error of 3, when
the standard deviation is 10.
n = 3.8416 * 100 / 9 = 42.68, so n = 43
Ex 6: Find the sample size required for a 99% CI for μ with a margin of error of 3, when
the standard deviation is 10.
n = 6.636 * 100 / 9 = 73.7, so n = 74
Ex 7: Find the sample size required for a 95% CI for μ with a margin of error of 4, when
the standard deviation is 10.
n = 3.8416 * 100 / 16 = 24.01, so n = 25
How is the width of the CI affected by σ, α, EBM?
Ex 8: Find the sample size required for a 95% CI for μ with a margin of error of $1000 for
the mean annual income of Native Americans in Onondaga County, NY. We are not
given any information about the standard deviation. We know that nearly all the
incomes fall between $0 and $120,000 and that the distribution is close to bell shaped.
First we need to approximate s. If the distribution of salaries is between 0 and 120,000
and bell shaped then this would be the interval (μ – 3σ, μ + 3σ). This means that 6σ =
120000, which means that σ = 20000. We will use this as s.
n = 3.8416 * 20000^2 / 1000^2 = 1536.64
n = 1537
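The sample-size calculations for μ, including the income example where σ is approximated as range / 6, can be sketched in Python as follows. The function name and argument names are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, ebm, level=0.95):
    """Smallest n giving margin of error at most ebm for a CI for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # Z_{alpha/2}
    return ceil((z * sigma / ebm) ** 2)            # always round up


print(sample_size_for_mean(sigma=10, ebm=3, level=0.95))
print(sample_size_for_mean(sigma=10, ebm=3, level=0.99))
print(sample_size_for_mean(sigma=10, ebm=4, level=0.95))

# Income example: nearly all values lie within 3 sigma of the mean,
# so the range from 0 to 120000 is treated as 6 sigma.
sigma_guess = (120000 - 0) / 6
print(sample_size_for_mean(sigma=sigma_guess, ebm=1000, level=0.95))
```

Comparing the first three calls shows the pattern: a higher confidence level or a larger σ pushes n up, while allowing a larger EBM pulls n down.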
Homework: Page 343 (353) 1, 3, 5, 7, 9, 11, 13