Uploaded by Nat Talie

stats lec 13

Lecture 13
Summary
- Collecting data about the entire population is expensive and time-consuming,
so we collect data from a sample and use the sample statistics to
learn about the population parameters. This process is called statistical
inference.
- E.g. the sample mean is used to learn about the population mean; the sample
variance is used to learn about the population variance.
Statistical inference:
1. There are 2 procedures we follow to make statistical inferences:
i) Estimation
ii) Hypothesis testing
2. In lecture 13, we do the first procedure, i.e. estimation.
3. There are 2 ways to make a statistical inference based on estimation:
i) Inference based on a point estimate
ii) Inference based on an interval estimate
4. A point estimate helps us understand the parameter based on a single value, e.g. we
understand the population mean using a single estimate, the sample mean.
5. In this topic, we learn to do inference based on interval estimates, using two
values, a lower limit (LCI) and an upper limit (UCI), to infer about the population parameter.
This pair of values is what we call a confidence interval.
6. When the population standard deviation σ is known, use the Z table. If not, use the t table.
7. The formula for the confidence interval when σ is known is:
$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
Question 1: Based on a sample of 100, the average number of hours Australian children
spend watching television is 27.191 hours per week. Based on past experience, the standard
deviation is 8 hours.
i) Calculate the 95% confidence interval and interpret.
$27.191 \pm z_{\alpha/2} \cdot \frac{8}{\sqrt{100}}$
$27.191 \pm 1.96 \cdot \frac{8}{\sqrt{100}}$
(27.191 - 1.568), (27.191 + 1.568)
25.62, 28.76
General: we are __% confident that μ is between the lower limit (LCI) and the upper limit (UCI).
We are 95% confident that the average number of hours Australian kids watch
TV per week is between 25.62 and 28.76.
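The calculation in part i) can be checked with a short Python sketch. The helper name `z_confidence_interval` is made up for illustration, and the critical value is taken from the standard library's `NormalDist` rather than from a printed Z table, so it is slightly more precise than 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Return (lower, upper) for a confidence interval when sigma is known."""
    alpha = 1 - confidence                     # significance level
    z = NormalDist().inv_cdf(1 - alpha / 2)    # critical value z_{alpha/2}
    margin = z * sigma / sqrt(n)               # z times the standard error
    return sample_mean - margin, sample_mean + margin

# Values from Question 1: mean 27.191, sigma 8, n 100.
lower, upper = z_confidence_interval(27.191, 8, 100, 0.95)
print(round(lower, 2), round(upper, 2))
```

Passing `confidence=0.99` reuses the same helper for a 99% interval.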
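ii) Calculate the 99% confidence interval and interpret.
α = 0.01
α/2 = 0.005
$27.191 \pm 2.57 \cdot \frac{8}{\sqrt{100}}$
27.191 ± 2.06
25.13, 29.25
We are 99% confident that the average number of hours Australian kids watch
TV per week is between 25.13 and 29.25.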
iii) Can we use the CLT here?
Yes, because the sample size is > 30. If n > 30, we can assume the sampling distribution of the sample mean is approximately normal.
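A small simulation can make the CLT claim concrete. This is only a sketch under assumed conditions: the population is a made-up skewed one (exponential with mean 1 and standard deviation 1), not the TV-hours data, and the number of repetitions is arbitrary. If the CLT approximation holds for n = 100, the sample means should centre on 1, have a standard deviation near σ/√n = 0.1, and about 95% of them should fall within 1.96 standard errors of the population mean.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)                      # fixed seed so the run is repeatable
n, reps = 100, 2000                 # n matches Question 1; reps is arbitrary

# Draw repeated samples from a skewed (exponential) population, mean 1, sd 1.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

standard_error = 1.0 / sqrt(n)      # sigma / sqrt(n) = 0.1
within = sum(abs(m - 1.0) <= 1.96 * standard_error for m in sample_means)
coverage = within / reps            # share of sample means within 1.96 SE of mu

print(round(mean(sample_means), 3), round(stdev(sample_means), 3), round(coverage, 3))
```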
iv) What will happen if the sample size increases? Is our interval more accurate or less
accurate?
If n increases, the standard error decreases, so the interval gets narrower and the
estimate becomes more accurate (more precise).
v) What is the confidence level in part i) and in part ii)?
0.95, 0.99
vi) What is the significance level in part i) and in part ii)?
0.05, 0.01
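vii) Calculate half of the width of the confidence interval (here the 99% interval from part ii), using the formula
below:
$\text{half-width} = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = \frac{\text{UCI} - \text{LCI}}{2}$
Width = 29.25 – 25.13 = 4.12
Half of the width = 4.12 ÷ 2 = 2.06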
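As a quick check of the arithmetic, the sketch below takes the 99% limits from part ii), halves their difference, and compares the result with z·σ/√n using the table value z = 2.57 from the notes. The variable names are illustrative only.

```python
from math import sqrt

lower, upper = 25.13, 29.25           # 99% interval from part ii)
width = upper - lower                 # full width of the interval
half_width = width / 2                # half of the width
margin = 2.57 * 8 / sqrt(100)         # z * sigma / sqrt(n) with z = 2.57

print(round(width, 2), round(half_width, 2), round(margin, 2))
```

The half-width and the margin agree to two decimal places, which is why the half-width can be read straight off the ± term of the interval.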
viii) What will happen to the width of the interval if:
1. Standard deviation increases
Width increases
2. Sample size increases
Standard error decreases, width decreases
3. Confidence level increases
95% CL: z = 1.96; 99% CL: z = 2.57
CL increases → z increases → width increases
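The three effects above can be seen by changing one input at a time. In this sketch the function name `interval_width` and the alternative values (σ = 12, n = 400) are made up for illustration; only σ = 8, n = 100 and the 95% and 99% levels come from Question 1.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(sigma, n, confidence):
    """Full width of a known-sigma confidence interval: 2 * z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)

base = interval_width(8, 100, 0.95)           # Question 1, part i)
larger_sigma = interval_width(12, 100, 0.95)  # hypothetical larger sigma
larger_n = interval_width(8, 400, 0.95)       # hypothetical larger sample
higher_cl = interval_width(8, 100, 0.99)      # higher confidence level

print(round(base, 2), round(larger_sigma, 2), round(larger_n, 2), round(higher_cl, 2))
```

Because n sits under a square root, quadrupling the sample size only halves the width.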
ix) Do we prefer a wider interval or a narrower interval?
We prefer a narrower interval because, at the same confidence level, it gives a more precise estimate.
Desirable characteristics of an estimator:
1. An estimator helps us make inferences about the unknown population parameter.
For example, the sample mean is an estimator that helps us understand the
population mean. In simple words, an estimator is just a formula used to calculate
a statistic that estimates the parameter.
2. Since the primary use of an estimator is to understand the parameter, we
need it to have certain characteristics.
3. Before that, we need to understand what ‘bias’ is. This is the difference between the
expected value of the estimator and the population parameter.
4. The properties are:
i. Unbiasedness: we need the estimator to be unbiased. If the expected value
of an estimator is equal to the parameter, the estimator is called
unbiased.
ii. Consistency: an estimator may be biased, but this is not a big problem if the
same estimator is consistent. Consistency is a property related to sample size. If the
estimator gets closer to the parameter (the bias shrinks) as n increases, the estimator is called consistent.
This is the more desirable property of the two.
iii. Efficiency: there can be many estimators of the same parameter. Among unbiased
estimators, the one with the lowest variance is the efficient estimator.
iv. If an estimator satisfies all these conditions, then it is the ‘best’ estimator.
5. Does the sampling distribution of sample means satisfy all these conditions? To check
efficiency, compare the variance of the sample mean with the variance of the sample
median.
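The three properties can be illustrated with a simulation. This is a sketch under assumptions that are not in the notes: a normal population with made-up values μ = 50 and σ = 10, and 1000 repetitions per setting. For a normal population, a standard large-sample result is that the sample median's variance is roughly 1.57 times that of the sample mean, so the mean should come out as the more efficient of the two.

```python
import random
from statistics import mean, median, pvariance

random.seed(1)                     # fixed seed so the run is repeatable
MU, SIGMA = 50.0, 10.0             # hypothetical normal population

def simulate(estimator, n, reps=1000):
    """Apply an estimator to `reps` random samples of size n."""
    return [
        estimator([random.gauss(MU, SIGMA) for _ in range(n)])
        for _ in range(reps)
    ]

means_small = simulate(mean, 10)       # sample means, n = 10
means_large = simulate(mean, 100)      # sample means, n = 100
medians_large = simulate(median, 100)  # sample medians, n = 100

# Unbiasedness: the average of the sample means should be close to MU.
print(round(mean(means_large), 2))
# Consistency: the spread of the sample mean shrinks as n grows.
print(round(pvariance(means_small), 2), round(pvariance(means_large), 2))
# Efficiency: compare the variance of the mean with that of the median.
print(round(pvariance(medians_large) / pvariance(means_large), 2))
```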