Unit 6: Statistical Inference

Standard Error of Means

Let a random sample of size $n$ be drawn from a population with mean $\mu$ and standard deviation $\sigma$. Then the sample mean $\bar{x}$ has mean and standard error

$$E(\bar{x}) = \mu, \qquad SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$

Difference in Sample Means

Let a random sample of size $n_1$ be drawn from a population with mean $\mu_1$ and standard deviation $\sigma_1$. Similarly, let another random sample of size $n_2$ be drawn from a population with mean $\mu_2$ and standard deviation $\sigma_2$. Let $\bar{x}_1$ be the mean of the sample drawn from the first population and $\bar{x}_2$ be the mean of the sample drawn from the second population. Then the mean and standard error of the difference of sample means are:

$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$$

$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Standard Error of Proportions

Let a random sample of size $n$ be drawn from a population with proportion $P$. Then the sample proportion $p$ has mean $P$ and standard error

$$E(p) = P, \qquad SE(p) = \sqrt{\frac{PQ}{n}}$$

where $Q = 1 - P$.

Difference of Sample Proportions

Let a random sample of size $n_1$ be drawn from a population with proportion $P_1$. Similarly, let another random sample of size $n_2$ be drawn from a population with proportion $P_2$. Let $p_1$ be the proportion in the sample drawn from the first population and $p_2$ be the proportion in the sample drawn from the second population. Then the mean and standard error of the difference in sample proportions are:

$$E(p_1 - p_2) = P_1 - P_2$$

$$SE(p_1 - p_2) = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}$$

or, when both populations have a common proportion $P$ (so that $P_1 = P_2 = P$ and $Q = 1 - P$),

$$SE(p_1 - p_2) = \sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

The Hypothesis Concept

A hypothesis is an assumption about a population. To check whether the assumption is right or wrong, we test the hypothesis. The null hypothesis (H0) is the basic assumption. Let's say you feel the mean height of a class of 20 students is 150 cm. This is the null hypothesis. The alternative hypothesis (H1) can be:

a) Mean height is not equal to 150 cm
b) Mean height is more than 150 cm
c) Mean height is less than 150 cm

To decide between the null hypothesis and the alternative hypothesis, we use an appropriate test statistic.
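The four standard error formulas at the start of this unit can be checked numerically. The sketch below is only an illustration: the function names and the sample numbers are hypothetical, not part of the original notes.

```python
import math

def se_mean(sigma, n):
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def se_diff_means(sigma1, n1, sigma2, n2):
    """Standard error of the difference of two sample means."""
    return math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

def se_proportion(P, n):
    """Standard error of a sample proportion: sqrt(PQ / n), with Q = 1 - P."""
    return math.sqrt(P * (1 - P) / n)

def se_diff_proportions(P1, n1, P2, n2):
    """Standard error of the difference of two sample proportions."""
    return math.sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)

# Made-up numbers, for illustration only.
print(se_mean(10, 25))                       # sigma = 10, n = 25
print(se_diff_means(3, 9, 4, 16))            # two populations
print(se_proportion(0.5, 100))               # P = 0.5, n = 100
print(se_diff_proportions(0.5, 100, 0.4, 150))
```

When both proportions equal a common value P, the last function gives the same result as the pooled form sqrt(PQ(1/n1 + 1/n2)).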
The test statistic has the general form

$$Z_{cal} = \frac{\text{Relevant statistic} - \text{Hypothetical value}}{\text{Standard error}}$$

Depending on the alternative hypothesis, the critical region or critical values are determined. The critical region marks where the value of $Z_{cal}$ must fall for the null hypothesis to be rejected. This is represented on a normal curve. For the previous example of 150 cm being the mean height of a class, the various alternative hypotheses are tested as follows:

a) Mean height is not equal to 150 cm (two-tailed test)
b) Mean height is more than 150 cm (right-tailed test)
c) Mean height is less than 150 cm (left-tailed test)

The graph must be drawn to show the region in which $Z_{cal}$ falls, in order to either accept or reject the null hypothesis. The critical values below help us decide:

α      Two-tailed test (-k, +k)    One-tailed test, left (-k)    One-tailed test, right (+k)
5%     -1.96, 1.96                 -1.65                         1.65
1%     -2.58, 2.58                 -2.33                         2.33

If the calculated value $Z_{cal}$ lies in the acceptance region, the null hypothesis is accepted; otherwise it is rejected.

a) If the test is two-tailed and α = 5%, the null hypothesis is accepted when -1.96 < Zcal < 1.96
b) If the test is right-tailed and α = 5%, the null hypothesis is accepted when Zcal < 1.65
c) If the test is left-tailed and α = 5%, the null hypothesis is accepted when Zcal > -1.65

Test for Population Mean

$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $\mu_0$ is the hypothetical population mean, $\sigma$ is the population standard deviation (if it is unknown, it is replaced by the sample standard deviation $s$) and $n$ is the sample size.

Test for Equality of Means of Two Populations

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $\sigma_1$ and $\sigma_2$ are the population standard deviations (if they are unknown, they are replaced by the sample standard deviations $s_1$ and $s_2$) and $n_1$ and $n_2$ are the sample sizes.

Test for Population Proportion

$$Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}}, \qquad p = \frac{x}{n}$$

where $p$ is the sample proportion, $x$ is the number of sample units possessing the attribute, $n$ is the sample size, $P$ is the hypothetical population proportion and $Q = 1 - P$.
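The Z statistics and the 5% decision rules above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function names, the `tail` labels and the sample figures are all assumptions made for the example.

```python
import math

def z_mean(xbar, mu0, sigma, n):
    """Z statistic for a population mean: (xbar - mu0) / (sigma / sqrt(n))."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

def z_proportion(x, n, P):
    """Z statistic for a population proportion, with p = x / n and Q = 1 - P."""
    p = x / n
    return (p - P) / math.sqrt(P * (1 - P) / n)

def accept_h0(z, tail, k):
    """Return True when z lies in the acceptance region.

    tail is one of "two", "right", "left" (hypothetical labels);
    k is the positive critical value from the table, e.g. 1.96 or 1.65.
    """
    if tail == "two":
        return -k < z < k
    if tail == "right":
        return z < k
    if tail == "left":
        return z > -k
    raise ValueError("tail must be 'two', 'right' or 'left'")

# Made-up data: sample mean 152 cm from n = 36, sigma = 5, H0: mean = 150 cm.
z = z_mean(152, 150, 5, 36)
print(z, accept_h0(z, "two", 1.96))

# Made-up data: 60 of 100 units have the attribute, H0: P = 0.5, right-tailed.
zp = z_proportion(60, 100, 0.5)
print(zp, accept_h0(zp, "right", 1.65))
```

In both made-up cases the statistic falls outside the acceptance region, so the null hypothesis would be rejected at the 5% level.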
Test for Equality of Proportions of Two Populations

$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where

$$p_1 = \frac{x_1}{n_1}, \qquad p_2 = \frac{x_2}{n_2} \qquad \text{and} \qquad P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}, \qquad Q = 1 - P$$

t-tests (Small Sample Tests or Samples of Size < 30)

This test is used for:

a) Testing the significance of the mean of a population using a small sample.
b) Testing the difference between the means of two populations.
c) Testing the difference between the means of two populations using paired observations.

Testing Mean of a Population using Small Sample

The test statistic is

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}}$$

where
$\bar{x}$ is the sample mean,
$\mu$ is the hypothetical mean of the population,
$n$ is the sample size,
$s$ is the standard deviation of the sample, given as

$$s = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$

Under H0 the test statistic

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}}$$

follows the t distribution with $(n - 1)$ degrees of freedom.

Degrees of freedom are the number of independent observations. If there are $n$ observations, then

$$DF = n - c$$

where $c$ is the number of independent constraints.

Testing Difference between Means of Two Populations using two Samples

Given two samples of sizes $n_1$ and $n_2$ with means $\bar{x}_1$ and $\bar{x}_2$ and standard deviations $s_1$ and $s_2$, we may be interested in testing the hypothesis that the samples come from the same normal population. The test statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad \text{with } (n_1 + n_2 - 2) \text{ DF}$$

where the pooled variance is

$$s_p^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$$

Testing the Difference between Means of Two Populations using Paired Observations

Let $d$ be the difference between the two observations in each pair and $n$ the number of pairs. The test statistic is

$$t = \frac{\bar{d}}{s_d / \sqrt{n - 1}} \quad \text{with } (n - 1) \text{ DF}$$

$$s_d = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}$$

Chi-Square Test

If $Z_1, Z_2, \ldots, Z_n$ are $n$ independently distributed standard normal variables, then

$$\chi^2 = Z_1^2 + Z_2^2 + \cdots + Z_n^2$$

follows the chi-square distribution with $n$ degrees of freedom.

Chi-square tests are used:

1. To test if a population has a given variance / standard deviation
2. To test 'goodness of fit' of a theoretical distribution to an observed distribution
3. To test independence of attributes in a contingency table.
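The three small-sample t statistics described earlier can be sketched as follows. The sketch follows the convention of these notes, in which the sample standard deviation uses divisor n and the statistic divides by sqrt(n - 1). The function names and the data are hypothetical.

```python
import math

def sd_divisor_n(xs):
    """Sample standard deviation with divisor n: sqrt(sum(x^2)/n - (sum(x)/n)^2)."""
    n = len(xs)
    return math.sqrt(sum(x * x for x in xs) / n - (sum(xs) / n) ** 2)

def t_one_sample(xs, mu):
    """t = (xbar - mu) / (s / sqrt(n - 1)); returns (t, degrees of freedom)."""
    n = len(xs)
    xbar = sum(xs) / n
    return (xbar - mu) / (sd_divisor_n(xs) / math.sqrt(n - 1)), n - 1

def t_two_sample(xs1, xs2):
    """Pooled-variance t for two independent samples; returns (t, n1 + n2 - 2)."""
    n1, n2 = len(xs1), len(xs2)
    m1, m2 = sum(xs1) / n1, sum(xs2) / n2
    ss = sum((x - m1) ** 2 for x in xs1) + sum((x - m2) ** 2 for x in xs2)
    pooled_var = ss / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2)), n1 + n2 - 2

def t_paired(first, second):
    """Paired t: a one-sample test on d = second - first against a mean of 0."""
    d = [b - a for a, b in zip(first, second)]
    return t_one_sample(d, 0.0)

# Made-up data, for illustration only.
print(t_one_sample([1, 2, 3, 4, 5], 2))
print(t_two_sample([1, 2, 3], [4, 5, 6]))
print(t_paired([10, 12, 14], [11, 14, 17]))
```

Each function returns the statistic together with its degrees of freedom, so the result can be compared with the tabulated t value for that DF. The functions assume the data are not all identical, since a zero standard deviation would make the statistic undefined.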
Test for Variance / Standard Deviation

If the variance of a normal population is not known, we may want to test whether the population has a given variance. The null hypothesis is:

H0: $\sigma^2 = \sigma_0^2$ (the population variance is $\sigma_0^2$)

The alternative hypothesis is one of:

1. H1: $\sigma^2 \neq \sigma_0^2$; the test is two-tailed
2. H1: $\sigma^2 > \sigma_0^2$; the test is one-tailed with the critical region in the upper tail
3. H1: $\sigma^2 < \sigma_0^2$; the test is one-tailed with the critical region in the lower tail.

The test statistic

$$\chi^2 = \frac{n s^2}{\sigma_0^2}$$

is a chi-square variate with $(n - 1)$ degrees of freedom. If the value of $s$ is not given, it can be calculated using the following:

$$s^2 = \frac{\sum u^2}{n} - \left[\frac{\sum u}{n}\right]^2, \qquad u = (x - A)$$

where $A$ is any convenient constant (an assumed mean).

To Test Goodness of Fit

Suppose there are observed frequencies $O_i$, and let theoretical frequencies $E_i$ be fitted to the observed distribution. The null hypothesis is:

H0: The theoretical frequency distribution is a good fit to the observed frequency distribution

The alternative hypothesis is:

H1: The theoretical frequency distribution is not a good fit to the observed frequency distribution

The test statistic is

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

Under H0, this is a chi-square variate with $(n - c)$ d.f., where
$n$ is the number of terms in the $\chi^2$ column after pooling frequencies less than 5 with the adjacent ones,
$c$ is one more than the number of parameters estimated from the observed distribution.

Test for Independence of Attributes

Suppose we want to test the independence of two attributes A and B in a population. We apply the chi-square test to the following 2 x 2 table:

            A        not A     Total
B           a        c         a+c
not B       b        d         b+d
Total       a+b      c+d       N = a+b+c+d

The null hypothesis is:

H0: Attributes A and B are independent

The alternative hypothesis is:

H1: Attributes A and B are not independent.

The test statistic is

$$\chi^2 = \frac{N (ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}$$

Under H0 this is a chi-square variate with 1 d.f. This test is one-tailed (upper).
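The three chi-square statistics above can be computed directly from their formulas. The sketch below is illustrative only: the function names, the `params_estimated` argument and the data are assumptions, and the goodness-of-fit function expects that frequencies below 5 have already been pooled by the caller.

```python
def chi2_variance(n, s_squared, sigma0_squared):
    """chi^2 = n * s^2 / sigma0^2, with (n - 1) degrees of freedom."""
    return n * s_squared / sigma0_squared, n - 1

def chi2_goodness_of_fit(observed, expected, params_estimated=0):
    """chi^2 = sum((O - E)^2 / E), with (n - c) degrees of freedom.

    n is the number of classes supplied (already pooled by the caller)
    and c is one more than the number of estimated parameters.
    """
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - (params_estimated + 1)

def chi2_independence_2x2(a, b, c, d):
    """chi^2 = N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)), with 1 degree of freedom."""
    N = a + b + c + d
    stat = N * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, 1

# Made-up data, for illustration only.
print(chi2_variance(20, 9, 6))
print(chi2_goodness_of_fit([18, 22, 20, 20], [20, 20, 20, 20]))
print(chi2_independence_2x2(10, 20, 30, 40))
```

Each function returns the statistic with its degrees of freedom; the decision is then made by comparing the statistic with the tabulated chi-square critical value for that DF and the chosen α.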