Lecture 13 Summary

- Collecting data about an entire population is expensive and time-consuming, so we collect data from a sample and use the sample statistics to learn about the population parameters. This process is called statistical inference.
- E.g. the sample mean is used to learn about the population mean; the sample variance is used to learn about the population variance.

Statistical inference:

1. There are 2 procedures we follow to make statistical inferences.
   i) Estimation
   ii) Hypothesis testing
2. In lecture 13, we do the first of these, i.e. estimation.
3. There are 2 ways to make a statistical inference based on estimation.
   i) Inference based on a point estimate
   ii) Inference based on an interval estimate
4. A point estimate helps us understand the parameter using a single value, e.g. we estimate the population mean using a single number, the sample mean.
5. In this topic, we learn to make inferences based on interval estimates, using two values, a lower limit and an upper limit, to infer about the population parameter. This pair of limits is what we call a confidence interval.
6. When the population standard deviation σ is known, use the Z table. If not, use the t table.
7. The formula for the confidence interval (σ known) is:
   $\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

Question 1: Based on a sample of 100, the average number of hours Australian children spend watching television is 27.191 hours per week. Based on past experience, the standard deviation is 8 hours.

i) Calculate the 95% confidence interval and interpret it.
   $27.191 \pm z_{\alpha/2} \cdot \frac{8}{\sqrt{100}}$
   $27.191 \pm 1.96 \times 0.8 = 27.191 \pm 1.568$
   (27.191 - 1.568, 27.191 + 1.568) = (25.62, 28.76)
   General: we are __% confident that the population parameter is between the LCI and the UCI (the lower and upper limits of the confidence interval).
   We are 95% confident that the average number of hours Australian kids watch TV per week is between 25.62 and 28.76.
ii) Calculate the 99% confidence interval and interpret it.
   α = 0.01, α/2 = 0.005, so $z_{\alpha/2} = 2.57$
   $27.191 \pm 2.57 \times \frac{8}{\sqrt{100}} = 27.191 \pm 2.06$
   (25.13, 29.25)
   We are 99% confident that the average number of hours Australian kids watch TV per week is between 25.13 and 29.25.
iii) Can we use the CLT here?
   Yes, because the sample size is > 30. If n > 30, we can assume the sampling distribution of the sample mean is approximately normal.
iv) What will happen if the sample size increases? Is our interval more accurate or less accurate?
   If n increases, the standard error σ/√n decreases, so the interval gets narrower and the estimate becomes more precise.
v) What is the confidence level in part i) and in part ii)?
   0.95, 0.99
vi) What is the significance level in part i) and in part ii)?
   0.05, 0.01
vii) Calculate half of the width of the confidence interval.
   Half-width = $z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
   For the 99% interval in part ii): full width = 29.25 - 25.13 = 4.12, so half-width = 4.12 / 2 = 2.06 (the same as 2.57 × 0.8).
viii) What will happen to the width of the interval if:
   1. Standard deviation increases: width increases.
   2. Sample size increases: standard error decreases, so width decreases.
   3. Confidence level increases: at 95% CL z = 1.96, at 99% CL z = 2.57. As the CL increases, z increases, so width increases.
ix) Do we prefer a wider interval or a narrower interval?
   We prefer a narrower interval because, at the same confidence level, it is more precise.

Desirable characteristics of an estimator:

1. An estimator helps us make inferences about the unknown population parameter. For e.g., the sample mean is an estimator and it helps us understand the population mean. In simple words, an estimator is just a formula applied to sample data to calculate an estimate of a parameter.
2. Since the primary use of an estimator is to understand the parameter, we need it to meet certain characteristics.
3. Before that, we need to understand what ‘bias’ is. This is the difference between the expected value of the estimator and the population parameter.
4. Properties are:
   i. Unbiasedness: we need the estimator to be unbiased. If the expected value of an estimator is equal to the parameter, then the estimator is called unbiased.
   ii. Consistency: an estimator may be biased, but this is not a big problem if the same estimator is consistent. Consistency is a property tied to sample size. If the estimator gets closer to the parameter (its bias and variance shrink) as n increases, then the estimator is called consistent. This is the more desirable property of the two.
   iii. Efficiency: several estimators can be used for the same parameter. Among unbiased estimators, the one with the lowest variance is the efficient estimator.
   iv.
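       If an estimator satisfies all these conditions, then it is the ‘best’ estimator.
5. Does the sample mean (as an estimator of the population mean) satisfy all these conditions? To check efficiency, compare it with the sample median: the variance of the sample mean is $\sigma^2 / n$, whereas the variance of the sample median is approximately $1.57\,\sigma^2 / n$ (for a normal population). The sample mean has the smaller variance, so it is the more efficient of the two.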
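
The efficiency comparison between the sample mean and the sample median can be checked with a small simulation. The sketch below is an illustration only and is not part of the lecture: it assumes a normal population, and the function name, sample size (n = 25), number of repetitions and seed are arbitrary choices. For a normal population and large n, the variance of the sample median is roughly π/2 ≈ 1.57 times the variance of the sample mean, so the printed ratio should come out clearly above 1.

```python
import random
import statistics


def simulate_estimator_variances(n=25, reps=20000, mu=0.0, sigma=1.0, seed=0):
    """Estimate the sampling variance of the mean and of the median.

    Hypothetical helper for illustration: draws `reps` samples of size `n`
    from an assumed Normal(mu, sigma) population and records both estimators.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    means = []
    medians = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # Sample variance of each estimator across the repeated samples
    return statistics.variance(means), statistics.variance(medians)


var_mean, var_median = simulate_estimator_variances()
ratio = var_median / var_mean

print(f"variance of sample mean   : {var_mean:.5f}")
print(f"variance of sample median : {var_median:.5f}")
print(f"ratio (median / mean)     : {ratio:.2f}")
```

With σ = 1 and n = 25, the variance of the sample mean should be close to σ²/n = 0.04, and the ratio above 1 shows that the median is the less efficient estimator here. For small n the ratio sits somewhat below 1.57, since that figure is a large-sample approximation.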