Inferential statistics

Suppose we have a bag of nuts. I pick one nut, crack it, and find it empty. What can I conclude?
The optimist says: "Never mind! Only one nut was bad and I happened to pull it. At least we got rid of it."
The pessimist says: "This is what I was afraid of: the bag is full of bad nuts."
What does the statistician say? "Both the pessimist and the optimist may be right. To determine whether the nuts in the bag are bad, it is enough to select a few nuts from different places in the bag and crack them ..."

doc.Ing. Zlata Sojková, CSc.

Statistical inference is based on sample investigation

Statistical inference is the process of using sample results to draw conclusions about the parameters of a population. The sample should be representative of the population. In the picture this is not the case ...

Examples of inferential statistics

• Household accounts
• Marketing research of consumer behavior patterns
• Sample investigation of agricultural enterprises
• Survey of public opinion
• Quality control

Inferential statistics (or statistical inference)

Assume that we work with a sample and calculate sample statistics such as the sample average, the sample variance, and the sample standard deviation. Based on the sample we infer the properties of the population. That is, the values of the sample statistics are used to estimate the unknown values of the population parameters. Usually we estimate population parameters such as the population mean, the population variance, and the population standard deviation.

Graphically

• Population of size $N$ (or infinite, $\infty$); population parameters: $\mu$, $\sigma^2$, $\sigma$, generally $Q$.
• Sample of size $n$; sample characteristics: $\bar{x}$, $s^2$, $s$, generally $u_n$.

Statistical inference (SI) has two basic tasks

• Statistical estimation: unknown population parameters are estimated by sample characteristics.
• Statistical hypothesis testing: we express assumptions about the unknown parameters of the population. If we can formulate these assumptions as statistical hypotheses and verify their validity by statistical procedures, the process is called statistical hypothesis testing.

Some other tasks of SI

• To determine the sample size $n$ that is sufficient for a reliable estimation of parameters.
• To determine the method of sampling statistical units from the population.

Explanation: the sample characteristics are deterministic in relation to the sample, but they are random variables in relation to the population, so they have a probability distribution. It is therefore important to choose the right model for the distribution of the sample characteristic used in statistical inference (statisticians have already worked this out for us). The arithmetic average, standardized with the sample standard deviation, usually follows Student's distribution; for large samples ($n > 30$) Student's distribution can be approximated by the normal distribution.

Random sampling

There are many methods that can be used to select a sample from a population.

From the repetition point of view:
• selection with replacement
• selection without replacement

Classification based on the subdivision of the population:
• simple random sample (finite or infinite population)
• composite sampling, which can be, for example, based on the selection of groups, quota sampling, etc.

Theory of estimation

Recap: the main goal of the theory of estimation is to estimate population parameters such as $\mu$ and $\sigma$ by using sample characteristics.

There are two types of estimates:
• Point estimate
• Interval estimate
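
The two selection schemes named on the random sampling slide (with and without replacement) can be illustrated with a minimal Python sketch. The population of 20 numbered units, the sample size of 8, and the random seed are assumptions made only for this illustration; they do not come from the slides.

```python
import random

# Hypothetical finite population of N = 20 numbered statistical units.
population = list(range(1, 21))
rng = random.Random(42)  # fixed seed so the illustration is repeatable

# Selection with replacement: the same unit may be drawn more than once.
with_replacement = rng.choices(population, k=8)

# Selection without replacement: every unit appears at most once.
without_replacement = rng.sample(population, k=8)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

With replacement, the draws are independent of each other; without replacement, each draw changes the composition of what is left in the population, which matters mainly when the population is small relative to the sample.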

Point estimation of population parameter Q (generally)

A point estimate is a single numerical value used as an estimate of the population parameter $Q$; geometrically, it is one point.
Estimate (estimator) is abbreviated "est"; notation: $\operatorname{est} Q = u_n$.
Mostly we estimate:
• the population mean $\mu$
• the population variance $\sigma^2$
• the population standard deviation $\sigma$

Attributes of point estimates

The best estimator satisfies the following conditions:
• Unbiasedness
• Consistency
• Efficiency
• Sufficiency
We explain the first three conditions.

Unbiasedness

$$E(u_n - Q) = 0 \;\Rightarrow\; E(u_n) = Q$$

If we repeat the sampling many times, we get a different error each time, and therefore a different average $\bar{x}$ each time. Unbiasedness requires that the expected value of all errors equals zero. We require that all errors are only random, so that we neither systematically underestimate nor overestimate the population mean.

An asymptotically unbiased estimator of $Q$ is a sample characteristic that satisfies the condition:

$$\lim_{n \to \infty} E(u_n) = Q$$

Consistency

$$\lim_{n \to \infty} P(|u_n - Q| < \varepsilon) = 1$$

The principle of consistency lies in the law of large numbers. In statistical practice, consistency ensures that the error of estimation decreases as the sample size increases. For large samples the error of estimation is very small.
A sufficient condition for consistency is that $u_n$ is asymptotically unbiased and that:

$$\lim_{n \to \infty} D(u_n) = 0$$

Efficiency of a point estimate

Any sample characteristic is a random variable with some variance. If we have two unbiased point estimators of the same population parameter, the one with the smaller variance is said to have greater efficiency.

$$D(u_n) \to \min$$

Point estimator of population mean $\mu$

$$E(\bar{x}) = \mu, \qquad D(\bar{x}) = \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$$
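
The two properties of the sample average, $E(\bar{x}) = \mu$ and $D(\bar{x}) = \sigma^2/n$, can be checked empirically by repeating the sampling many times. The following is a minimal simulation sketch; the normal population with $\mu = 100$ and $\sigma = 15$, the sample sizes, the number of repetitions, and the function name `simulate_sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from N(mu, sigma^2); return their averages."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

MU, SIGMA = 100.0, 15.0  # hypothetical population parameters

for n in (4, 25, 100):
    means = simulate_sample_means(MU, SIGMA, n, repetitions=2000)
    observed_mean = statistics.fmean(means)          # should be close to mu
    observed_variance = statistics.pvariance(means)  # should be close to sigma^2 / n
    print(f"n={n:3d}  mean of averages={observed_mean:7.2f}  "
          f"variance of averages={observed_variance:7.2f}  sigma^2/n={SIGMA**2 / n:7.2f}")
```

The average of the simulated averages stays near $\mu$ for every $n$ (unbiasedness), while their variance shrinks roughly like $\sigma^2/n$ (the sufficient condition for consistency).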
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$ is the standard deviation of the average, i.e. the mean (standard) error of estimation.

Since $\bar{x}$ is an unbiased estimator of $\mu$ and

$$\lim_{n \to \infty} D(\bar{x}) = \lim_{n \to \infty} \frac{\sigma^2}{n} = 0,$$

the sufficient condition for consistency is satisfied, and $\bar{x}$ is an unbiased and consistent estimator of the population mean: $\operatorname{est} \mu = \bar{x}$.

Point estimator of variance $\sigma^2$ (resp. $\sigma$)

$$E(s^2) = \ldots = \frac{n - 1}{n} \sigma^2$$

The sample variance $s^2$ is not an unbiased estimator of the population variance $\sigma^2$; it gives a negatively biased estimate. The bias equals $-\frac{1}{n} \sigma^2$.
The sample variance is an asymptotically unbiased estimator of $\sigma^2$, since

$$\lim_{n \to \infty} E(s^2) = \lim_{n \to \infty} \frac{n - 1}{n} \sigma^2 = \sigma^2$$

So the unbiased point estimator of the population variance $\sigma^2$ is the sample variance $s_1^2$, which is computed with Bessel's correction:

$$s_1^2 = \frac{n}{n - 1} s^2 = \frac{1}{n - 1} \sum_{j=1}^{n} (x_j - \bar{x})^2$$

Conclusion: the difference between $s_1^2$ and $s^2$ decreases with increasing sample size $n$. For sample sizes greater than 50 ($n > 50$) the difference is negligible.

$$\operatorname{est} \mu = \bar{x}, \qquad \operatorname{est} \sigma^2 = s_1^2$$

Example: For 400 randomly selected households in one region of the Slovak Republic, expenditures on alcoholic drinks and cigarettes were investigated. We make a point estimate of the mean and of the standard error.

$$\operatorname{est} \mu = \bar{x} = 973 \text{ Sk}, \qquad \operatorname{est} \sigma = s_1 = 286 \text{ Sk}$$

$$\sigma_{\bar{x}} \approx \frac{s_1}{\sqrt{n}} = \frac{286}{20} = 14.3 \text{ Sk}$$

The estimated standard error of the mean is relatively small: it is only 1.5 % of the mean. We can expect that the error in the estimate of average expenditures on alcoholic drinks and cigarettes is not too large.

Comparison of the distribution of the attribute X in the population with the distribution of the sample average $\bar{x}$

[Figure: density $f(x)$ of X with standard deviation $\sigma$, compared with the narrower density $f(\bar{x})$ of the sample average with standard deviation $\sigma / \sqrt{n}$.]

Interval estimate of parameter Q

$$P(q_1 \le Q \le q_2) = 1 - \alpha$$

• $q_1$, $q_2$: lower and upper limits of the interval (random)
• $\alpha$: risk of estimation
• $(1 - \alpha)$: confidence level

[Figure: density $f(g)$ with probability $1 - \alpha$ between $q_1$ and $q_2$ and probability $\alpha/2$ in each tail.]
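
The negative bias of $s^2$ (divisor $n$) and the effect of Bessel's correction (divisor $n - 1$) discussed for the point estimator of variance can be checked numerically. This is a minimal simulation sketch under assumed values: a normal population with $\sigma = 2$, small samples of size $n = 5$, and 20000 repetitions; the function name `average_variance_estimates` is hypothetical.

```python
import random
import statistics

def average_variance_estimates(sigma, n, repetitions, seed=1):
    """Return the averages of s^2 (divisor n) and s_1^2 (divisor n - 1) over many samples."""
    rng = random.Random(seed)
    biased, corrected = [], []
    for _ in range(repetitions):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        biased.append(statistics.pvariance(sample))    # s^2: divides by n
        corrected.append(statistics.variance(sample))  # s_1^2: divides by n - 1
    return statistics.fmean(biased), statistics.fmean(corrected)

SIGMA, N = 2.0, 5  # hypothetical population standard deviation and sample size

mean_s2, mean_s1_squared = average_variance_estimates(SIGMA, N, repetitions=20000)
print(f"average s^2   = {mean_s2:.3f}  (theory: (n-1)/n * sigma^2 = {(N - 1) / N * SIGMA**2:.3f})")
print(f"average s_1^2 = {mean_s1_squared:.3f}  (theory: sigma^2 = {SIGMA**2:.3f})")
```

With such a small $n$ the gap between the two estimators is clearly visible; repeating the run with a larger `N` shows the gap closing, in line with the conclusion that the difference becomes negligible for large samples.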

Interval estimation of population mean $\mu$

Suppose that the statistical attribute has a normal distribution, $X \sim N(\mu, \sigma^2)$. If we choose a sample of size $n$, then the arithmetic average also has a normal distribution, $\bar{x} \sim N(\mu, \sigma^2 / n)$.
The confidence interval for $\mu$ depends on the available information and on the sample size:

a) If the population variance is known (a theoretical assumption), we can form the standardized normal variable

$$u = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

$u$ has the $N(0, 1)$ distribution, independently of the estimated value $\mu$.

$$P\left( -u_{1-\alpha/2} \le \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \le u_{1-\alpha/2} \right) = 1 - \alpha$$

[Figure: density $f(u)$ of $N(0, 1)$ with probability $1 - \alpha$ between $-u_{1-\alpha/2}$ and $u_{1-\alpha/2}$.]

After transformation we get

$$P\left( \bar{x} - u_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + u_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha$$

$\Delta = u_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}$ is the sampling error: half of the interval width, which determines the accuracy of the estimate.
The interval estimate is actually the point estimate $\pm \Delta$, i.e.

$$\bar{x} - \Delta \le \mu \le \bar{x} + \Delta$$

b) If the population variance is unknown, $\operatorname{est} \sigma^2 = s_1^2$, and the sample size is large, $n > 30$:

$$\bar{x} \pm u_{1-\alpha/2} \frac{s_1}{\sqrt{n}}$$

We can use $N(0, 1)$.

c) If the population variance is unknown, $\operatorname{est} \sigma^2 = s_1^2$, and the sample size is small, $n \le 30$:

$$\bar{x} \pm t_{\alpha}(n - 1) \frac{s_1}{\sqrt{n}}$$

$t_{\alpha}(n - 1)$ is the critical value of Student's distribution at level $\alpha$ with $n - 1$ degrees of freedom.

Example: Based on the point estimates of household expenditure on cigarettes and alcohol, we construct an interval estimate with 95 % probability.

$$n = 400, \qquad \bar{x} = 973, \qquad s_1 = 286$$

$$\sigma_{\bar{x}} \approx \frac{s_1}{\sqrt{n}} = \frac{286}{\sqrt{400}} = 14.3$$

$$u_{1-0.025} = u_{0.975} = 1.96 \quad \text{(Excel: NORMSINV(0.975))}$$

$$\Delta = u_{1-\alpha/2} \cdot \sigma_{\bar{x}} = 1.96 \cdot 14.3 = 28.03$$

$$973 - 28.03 < \mu < 973 + 28.03, \quad \text{i.e.} \quad 944.97 < \mu < 1001.03$$

With 95 % probability we estimate the average expenditure to be between 945 Sk and 1001 Sk.

Example: A study investigated the weight loss of carrots after one week of storage. Twenty samples, each weighing 1 kg at the beginning of storage, were analyzed and their weight loss was measured. The average weight loss was 49 g with a sample standard deviation of 4 g. We assume that the weight loss has a normal distribution. We estimate the average weight loss with 95 % confidence.
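Because $n < 30$, we use Student's distribution:

$$\bar{x} \pm t_{\alpha}(n - 1) \frac{s_1}{\sqrt{n}} = 49 \pm 2.09 \cdot \frac{4}{\sqrt{20}} = 49 \pm 1.9$$

$t_{\alpha}(n - 1)$ is the quantile of Student's distribution, $t_{0.05}(19) = 2.09$ (Excel: TINV(0.05;19)).
With 95 % confidence, the average weight loss of a 1 kg carrot sample lies in the interval from 47.1 g to 50.9 g:

$$47.1 \le \mu \le 50.9$$

What does the size of the sampling error $\Delta$ depend on?

• the confidence probability $(1 - \alpha)$
• the mean error of the average, which depends on:
  • the variability of the attribute (we cannot change it)
  • the sample size (this we can change!)

The sample size needed to achieve the required reliability and accuracy can be determined using the formula:

$$n = \frac{u_{1-\alpha/2}^2 \cdot s_1^2}{\Delta^2}$$

Confidence interval for variance $\sigma^2$

$$P\left( \chi^2_{1-\alpha/2} \le \frac{(n - 1) s_1^2}{\sigma^2} \le \chi^2_{\alpha/2} \right) = 1 - \alpha$$

Here $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are the critical values of the chi-square distribution with $n - 1$ degrees of freedom that are exceeded with probability $\alpha/2$ and $1 - \alpha/2$, respectively.

[Figure: density $f(\chi^2)$ with probability $1 - \alpha$ between the critical values $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ and probability $\alpha/2$ in each tail.]

After transformation we obtain:

$$P\left( \frac{(n - 1) s_1^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1) s_1^2}{\chi^2_{1-\alpha/2}} \right) = 1 - \alpha$$

Respectively, the confidence interval for the standard deviation:

$$P\left( \sqrt{\frac{(n - 1) s_1^2}{\chi^2_{\alpha/2}}} \le \sigma \le \sqrt{\frac{(n - 1) s_1^2}{\chi^2_{1-\alpha/2}}} \right) = 1 - \alpha$$

Questions

• What is the essential difference between point and interval estimation?
• How do the boundaries of the interval depend on the confidence level?
• How does the confidence level influence the accuracy of the confidence interval?
• How can we ensure an interval estimate of the mean with a chosen confidence and accuracy?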
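
As a companion to the last question, the two worked examples and the sample-size formula can be reproduced with a short Python sketch that uses only the standard library. The function names are hypothetical. Because the standard library has no Student quantile function, the critical value 2.09 is taken from the slides and passed in as an argument. The target accuracy of 20 Sk in the last call is an assumption made for illustration.

```python
import math
from statistics import NormalDist

def normal_interval(mean, s1, n, confidence):
    """Interval x_bar +/- u_(1-alpha/2) * s1 / sqrt(n), for large samples (n > 30)."""
    alpha = 1 - confidence
    u = NormalDist().inv_cdf(1 - alpha / 2)  # same quantity as Excel NORMSINV
    delta = u * s1 / math.sqrt(n)
    return mean - delta, mean + delta

def student_interval(mean, s1, n, t_critical):
    """Interval x_bar +/- t * s1 / sqrt(n); t_critical must come from a table or Excel TINV."""
    delta = t_critical * s1 / math.sqrt(n)
    return mean - delta, mean + delta

def required_sample_size(s1, delta, confidence):
    """Smallest n with u_(1-alpha/2)^2 * s1^2 / delta^2 <= n."""
    u = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((u * s1 / delta) ** 2)

# Household expenditure example: n = 400, mean 973 Sk, s1 = 286 Sk, 95 % confidence.
household = normal_interval(973, 286, 400, 0.95)
print(household)  # about (944.97, 1001.03)

# Carrot example: n = 20, mean 49 g, s1 = 4 g, t_0.05(19) = 2.09 from the slides.
carrot = student_interval(49, 4, 20, 2.09)
print(carrot)  # about (47.13, 50.87)

# Consistency check: the accuracy 28.03 Sk from the example leads back to n = 400.
n_check = required_sample_size(286, 28.03, 0.95)
print(n_check)  # 400

# Hypothetical tighter target: accuracy of 20 Sk at 95 % confidence.
n_tighter = required_sample_size(286, 20, 0.95)
print(n_tighter)  # 786
```

The last two calls show the trade-off behind the question: at a fixed confidence level, a smaller $\Delta$ can only be bought with a larger sample, and the required $n$ grows with the square of the improvement in accuracy.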