Chapter 9 Estimation and Confidence Intervals
Homework #9 (Hilary Term Week 3): Chapter 9
Exercises 2, 12, 16, 24 & 26.
A Brief Review
Chapter 7: Continuous Probability Distributions
$X_i \sim N(\mu, \sigma^2) \Rightarrow P(\mu - 1.96\sigma < X_i < \mu + 1.96\sigma) = 0.95$
Chapter 8: Sampling Methods and the Central Limit Theorem
The population parameters $\mu$ and $\sigma^2$ were assumed known, and the objective was to draw conclusions about the possible values of the sample mean, $\bar{x}$.
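$X_i \sim N(\mu, \sigma^2)$ {or $X_i \sim\ ?(\mu, \sigma^2)$, i.e. any distribution with mean $\mu$ and variance $\sigma^2$, with n > 30}
$\Rightarrow \bar{x} \sim N(\mu, \sigma^2/n)$
$\Rightarrow P(\mu - 1.96\sigma/\sqrt{n} < \bar{x} < \mu + 1.96\sigma/\sqrt{n}) = 0.95$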
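The bracketed claim (any distribution, provided n > 30) can be illustrated by simulation. This is a sketch under assumed, hypothetical settings: a skewed Exponential population with rate 1 (so mu = sigma = 1), samples of n = 36 and 20,000 replicates, with `random.Random.expovariate` drawing each observation.

```python
import random
from math import sqrt

# Hypothetical setup: X_i ~ Exponential(rate 1), which is skewed (not Normal)
# but has mu = 1 and sigma = 1.
MU, SIGMA, N, REPS = 1.0, 1.0, 36, 20_000
rng = random.Random(12345)  # fixed seed so the run is reproducible

half_width = 1.96 * SIGMA / sqrt(N)
hits = 0
for _ in range(REPS):
    xbar = sum(rng.expovariate(1.0) for _ in range(N)) / N
    # Count sample means that land inside mu +/- 1.96*sigma/sqrt(n)
    if MU - half_width < xbar < MU + half_width:
        hits += 1

coverage = hits / REPS
print(f"share of sample means inside the band: {coverage:.3f}")
```

The share should be close to, but not exactly, 0.95: the Central Limit Theorem only makes the Normal shape of $\bar{x}$ approximate.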
Estimation
A more interesting question to ask is: given the value of $\bar{x}$, what can be said about the population parameters, such as $\mu$?
Point Estimation: A single value is used to provide the
best estimate of the parameter of interest.
Interval Estimation: “Interval estimates are better
for the consumer of the statistics, since they not only
show the estimate of the parameter but also give an
idea of the confidence which the researcher has in
that estimate.”
Estimation: Large Sample Size (n > 30)
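$\bar{x} \sim N(\mu, \sigma^2/n)$ (approximately, for all distributions of $x_i$, by the Central Limit Theorem!)
$\Rightarrow P(\mu - 1.96\sigma/\sqrt{n} < \bar{x} < \mu + 1.96\sigma/\sqrt{n}) = 0.95$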
Rearranging this inequality,
1. $(\mu - 1.96\sigma/\sqrt{n} < \bar{x}) \Rightarrow (\mu < \bar{x} + 1.96\sigma/\sqrt{n})$
2. $(\bar{x} < \mu + 1.96\sigma/\sqrt{n}) \Rightarrow (\bar{x} - 1.96\sigma/\sqrt{n} < \mu)$
The interval $[\bar{x} - 1.96\sigma/\sqrt{n},\ \bar{x} + 1.96\sigma/\sqrt{n}]$ is referred to as the 95% confidence interval for $\mu$.
The interval $[\bar{x} - 1.64\sigma/\sqrt{n},\ \bar{x} + 1.64\sigma/\sqrt{n}]$ is referred to as the 90% confidence interval for $\mu$.
The greater the degree of confidence required, the
wider the confidence interval has to be.
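The two intervals above can be computed with a small helper. This is a minimal sketch, assuming sigma is known; `z_interval` and the numbers in the usage lines are hypothetical. `NormalDist().inv_cdf` returns unrounded critical values (about 1.960 and 1.645), so the last digit may differ slightly from hand calculations with 1.96 and 1.64.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, confidence: float = 0.95):
    """Confidence interval for mu when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%, 1.645 for 90%
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical numbers: xbar = 50, sigma = 12, n = 36, so sigma/sqrt(n) = 2.
ci95 = z_interval(50, 12, 36, 0.95)
ci90 = z_interval(50, 12, 36, 0.90)
print(ci95, ci90)  # the 90% interval is the narrower of the two
```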
$\sigma^2$ Unknown?
Replace $\sigma^2$ with the sample variance $s^2$ (as long as n > 30).
The interval $[\bar{x} - 1.96s/\sqrt{n},\ \bar{x} + 1.96s/\sqrt{n}]$ is referred to as the 95% confidence interval for $\mu$.
The interval $[\bar{x} - 1.64s/\sqrt{n},\ \bar{x} + 1.64s/\sqrt{n}]$ is referred to as the 90% confidence interval for $\mu$.
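With raw data, $\bar{x}$ and s come straight from the sample. The sketch below assumes a simulated (hypothetical) sample of 36 observations; `statistics.stdev` uses the n - 1 divisor, which is the usual definition of s. The function name `large_sample_interval` is illustrative only.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def large_sample_interval(data, confidence=0.95):
    """z interval for mu with sigma replaced by the sample SD s (needs n > 30)."""
    n = len(data)
    if n <= 30:
        raise ValueError("this large-sample interval assumes n > 30")
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation, divisor n - 1
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical sample of n = 36 observations (simulated, not real data).
rng = random.Random(7)
sample = [rng.gauss(40, 10) for _ in range(36)]
print(large_sample_interval(sample, 0.95))
print(large_sample_interval(sample, 0.90))
```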
Estimation: Small Sample Size
Importance of Large Sample Size
1. Central Limit Theorem: the sampling distribution of the sample mean can be assumed to be Normal:
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
2. Unknown $\sigma^2$: replace $\sigma^2$ with $s^2$:
$Z = \frac{\bar{X} - \mu}{s/\sqrt{n}} \sim N(0,1)$ (approximately)
Small Sample Size (n < 30)?
1. Given that the Central Limit Theorem can no longer be used, we must know (or simply assume/hope) that the underlying distribution is Normal:
$X_i \sim N(\mu, \sigma^2) \Rightarrow \bar{x} \sim N(\mu, \sigma^2/n) \Rightarrow Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
2. Claim: If the population is Normally distributed, the statistic
$T = \frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}$
has a distribution called the t distribution with n - 1 degrees of freedom.
Student’s t distribution
“Student” was Gosset’s pseudonym (Guinness brewery, Dublin).
The shape of the t distribution depends on the number of degrees of freedom (= n - 1). Like the standard normal (Z) distribution, the t distribution is symmetric about zero. For small sample sizes, however, it has wider (i.e. fatter) tails than the standard normal distribution. For n > 25 or 30, there is little or no difference between the t distribution and the Z distribution.
[Figure: density curves labelled Z, T20 and T10, comparing the standard normal with t distributions]
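The fatter tails can be seen by evaluating the t density with $\nu$ degrees of freedom, $f(t) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1 + t^2/\nu\right)^{-(\nu+1)/2}$, next to the standard normal density. This is a plain-Python sketch; `t_pdf` is a hypothetical helper, and `math.lgamma` is used instead of `math.gamma` so that large $\nu$ does not overflow.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # Work on the log scale: lgamma avoids overflow of gamma() for large df
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


z_pdf = NormalDist().pdf
for x in (0.0, 2.0, 3.0):
    print(f"x={x}: Z={z_pdf(x):.4f}  T20={t_pdf(x, 20):.4f}  T10={t_pdf(x, 10):.4f}")
```

At x = 0 the t curves sit below the Normal curve; at x = 3 they sit above it, with T10 highest. That extra mass in the tails is what widens t-based intervals.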
Use of t Table(s)
The interval $[\bar{x} - t_{0.025,n-1}\, s/\sqrt{n},\ \bar{x} + t_{0.025,n-1}\, s/\sqrt{n}]$ is referred to as the 95% confidence interval for $\mu$.
Examples:
$n = 10 \Rightarrow n - 1 = 9 \Rightarrow t_{0.025,9} = 2.262$
$n = 20 \Rightarrow n - 1 = 19 \Rightarrow t_{0.025,19} = 2.093$
$n = \infty \Rightarrow n - 1 = \infty \Rightarrow t_{0.025,\infty} = 1.96$
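Table values such as 2.262 and 2.093 can be reproduced without a table. This sketch assumes only the standard library: the t density is integrated with Simpson's rule to get the upper-tail probability, and bisection then finds the critical value $t_{\alpha,\nu}$ with $P(T > t_{\alpha,\nu}) = \alpha$. All helper names are hypothetical; in practice a statistics library (for example SciPy's `scipy.stats.t.ppf`) would normally be used instead.

```python
from math import exp, lgamma, log, pi


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def t_upper_tail(x: float, df: float, steps: int = 1000) -> float:
    """P(T > x) for x >= 0: 0.5 minus a Simpson's-rule integral over [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # symmetry about zero gives P(T > 0) = 0.5


def t_crit(alpha: float, df: float) -> float:
    """Critical value t with P(T > t) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > alpha:  # widen the bracket until it contains t
        lo, hi = hi, 2 * hi
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail still too big: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2


for n in (10, 20):
    print(f"n = {n}: t_0.025,{n - 1} = {t_crit(0.025, n - 1):.3f}")
```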
Example: Given the sample data $\bar{x} = 40$, s = 10 and n = 36, calculate the 99% confidence interval estimate of the population mean $\mu$. If the sample size were 20, how would the method of calculation and the width of the interval be altered?
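As n > 30, the 99% confidence interval for $\mu$ is
$[\bar{x} - 2.57s/\sqrt{n},\ \bar{x} + 2.57s/\sqrt{n}] = [40 - 2.57 \times 10/6,\ 40 + 2.57 \times 10/6] = [35.72, 44.28]$
n = 20: the sample is now small, so (assuming the population is Normally distributed) the t distribution with n - 1 = 19 degrees of freedom replaces Z. The 99% confidence interval for $\mu$ is
$[\bar{x} - t_{0.005,19}\, s/\sqrt{n},\ \bar{x} + t_{0.005,19}\, s/\sqrt{n}] = [40 - 2.861 \times 10/\sqrt{20},\ 40 + 2.861 \times 10/\sqrt{20}] = [33.60, 46.40]$
The interval is wider (width ≈ 12.8 rather than ≈ 8.6), because the t critical value is larger and $s/\sqrt{n}$ is larger for the smaller sample.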
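The arithmetic in this example can be reproduced in a few lines. This sketch plugs in the critical values quoted in the notes (2.57 for the large sample and $t_{0.005,19}$ = 2.861 from the t table) rather than computing them; `mean_interval` is a hypothetical helper.

```python
from math import sqrt


def mean_interval(xbar, s, n, crit):
    """Return (lower, upper) = xbar -/+ crit * s / sqrt(n)."""
    half_width = crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Critical values as quoted in the notes (not recomputed here):
large_n = mean_interval(40, 10, 36, 2.57)   # n = 36 > 30: z value 2.57
small_n = mean_interval(40, 10, 20, 2.861)  # n = 20: t_{0.005,19} = 2.861

for label, (lo, hi) in [("n = 36", large_n), ("n = 20", small_n)]:
    print(f"{label}: [{lo:.2f}, {hi:.2f}]  width {hi - lo:.2f}")
```

This prints [35.72, 44.28] with width 8.57 and [33.60, 46.40] with width 12.79, matching the hand calculation.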
Estimating a Proportion
$\pi$: proportion of the population that has a particular
characteristic, e.g. unemployed, FF voter, …
p: proportion of a sample that has a particular
characteristic, e.g. unemployed, FF voter, …
n: sample size
Review: The Binomial Distribution
n: number of trials
x: number of “successes” within n trials
$\pi$: probability of “success” in any individual trial
$(1-\pi)$: probability of “failure” in any individual trial
$P(x) = {}^{n}C_{x}\, \pi^{x} (1-\pi)^{n-x}$
Claims:
1. $E(x) = n\pi$ (intuitive)
2. $\mathrm{Var}(x) = n\pi(1-\pi)$ (not so intuitive)
See previous notes for proofs.
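The two claims can be checked numerically by summing over the probability function. This sketch assumes the hypothetical values n = 10 and pi = 0.3; `math.comb` (Python 3.8+) gives ${}^{n}C_{x}$, and the parameter is spelled `pi_` to avoid confusion with `math.pi`.

```python
from math import comb


def binom_pmf(x: int, n: int, pi_: float) -> float:
    """P(x) = nCx * pi^x * (1 - pi)^(n - x)."""
    return comb(n, x) * pi_**x * (1 - pi_) ** (n - x)


n, pi_ = 10, 0.3  # hypothetical number of trials and success probability
probs = [binom_pmf(x, n, pi_) for x in range(n + 1)]

# E(x) = sum of x * P(x); Var(x) = sum of (x - E(x))^2 * P(x)
mean_x = sum(x * p for x, p in enumerate(probs))
var_x = sum((x - mean_x) ** 2 * p for x, p in enumerate(probs))
print(round(sum(probs), 10), round(mean_x, 10), round(var_x, 10))
print(n * pi_, n * pi_ * (1 - pi_))  # n*pi and n*pi*(1 - pi) for comparison
```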
$x \sim B(n, \pi)$, i.e. x has mean $n\pi$ and variance $n\pi(1-\pi)$.
Claim: If $x \sim B(n, \pi)$,
$\Rightarrow x \sim N(n\pi, n\pi(1-\pi))$ approximately [if $n\pi > 5$ and $n(1-\pi) > 5$]
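A rough check of the Normal approximation, as a sketch: with hypothetical values n = 50 and pi = 0.4 (so $n\pi = 20$ and $n(1-\pi) = 30$, both above 5), the exact binomial probabilities are compared with the density of $N(n\pi, n\pi(1-\pi))$ at each integer x. `binom_pmf` is a hypothetical helper.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(x: int, n: int, pi_: float) -> float:
    """Exact binomial probability P(x)."""
    return comb(n, x) * pi_**x * (1 - pi_) ** (n - x)


n, pi_ = 50, 0.4  # hypothetical values
print("n*pi > 5 and n*(1-pi) > 5:", n * pi_ > 5 and n * (1 - pi_) > 5)
approx = NormalDist(mu=n * pi_, sigma=sqrt(n * pi_ * (1 - pi_)))

# Largest gap between exact probabilities and the approximating Normal density
max_gap = max(abs(binom_pmf(x, n, pi_) - approx.pdf(x)) for x in range(n + 1))
for x in (14, 20, 26):
    print(x, round(binom_pmf(x, n, pi_), 4), round(approx.pdf(x), 4))
print("largest gap:", round(max_gap, 4))
```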
Estimating a Proportion
Sample proportion = (number of “successes”) / (number of trials),
i.e., $p = x/n$
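$p \sim\ ?(?, ?)$
$\Rightarrow x \sim N(n\pi, n\pi(1-\pi))$ and p is a linear transformation of x
$\Rightarrow p \sim N(?, ?)$
$\Rightarrow E(p)$?
$E(p) = E(x/n) = E(x)/n = n\pi/n = \pi$ (as expected)
$\Rightarrow \mathrm{Var}(p)$?
$\mathrm{Var}(p) = \mathrm{Var}(x/n) = \mathrm{Var}(x)/n^2 = n\pi(1-\pi)/n^2 = \pi(1-\pi)/n$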
Therefore,
$p \sim N(\pi, \pi(1-\pi)/n)$
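The result $p \sim N(\pi, \pi(1-\pi)/n)$ can be illustrated by simulation. This sketch uses hypothetical settings (pi = 0.4, n = 50, 20,000 simulated samples, fixed seed): each sample's proportion p = x/n is recorded, and the mean and variance of those p values are compared with $\pi$ and $\pi(1-\pi)/n = 0.0048$.

```python
import random
from statistics import mean, pvariance

PI, N, REPS = 0.4, 50, 20_000  # hypothetical population proportion and sample size
rng = random.Random(2024)

# Each replicate: n Bernoulli(PI) trials -> x successes -> p = x / n
props = []
for _ in range(REPS):
    x = sum(rng.random() < PI for _ in range(N))
    props.append(x / N)

print(f"mean of p: {mean(props):.4f}  (pi = {PI})")
print(f"variance of p: {pvariance(props):.5f}  (pi(1-pi)/n = {PI * (1 - PI) / N:.5f})")
```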
Example: Given the sample data p = 0.4 and n = 50, calculate the 99% confidence interval estimate of the true proportion $\pi$.
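$p \sim N(\pi, \pi(1-\pi)/n)$
Therefore, the 99% confidence interval for $\pi$ can be written down as:
$[p - 2.57\{p(1-p)/n\}^{0.5},\ p + 2.57\{p(1-p)/n\}^{0.5}]$
$= [0.4 - 2.57\{0.4 \times 0.6/50\}^{0.5},\ 0.4 + 2.57\{0.4 \times 0.6/50\}^{0.5}] = [0.4 - 0.178,\ 0.4 + 0.178] = [0.22, 0.58]$
Note: The known $p(1-p)/n$ is being used as a replacement for the unknown $\pi(1-\pi)/n$.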
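The proportion interval can be computed the same way. In this sketch, `proportion_interval` is a hypothetical helper that, like the notes, substitutes p(1 - p)/n for the unknown $\pi(1-\pi)/n$. It is run once with the rounded critical value 2.57 used above and once with the unrounded value from `NormalDist().inv_cdf`.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p, n, confidence=0.99, z=None):
    """Large-sample CI for pi, using p(1-p)/n in place of pi(1-pi)/n."""
    if z is None:
        z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 2.576 for 99%
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


notes = proportion_interval(0.4, 50, z=2.57)  # rounded z, as in the notes
exact_z = proportion_interval(0.4, 50)        # unrounded z
print([round(v, 2) for v in notes], [round(v, 2) for v in exact_z])
```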