Unit 6: Statistical Inference

Standard Error of Means
Let a random sample of size $n_1$ be drawn from a population with mean $\mu_1$ and standard deviation $\sigma_1$. Similarly, let another random sample of size $n_2$ be drawn from a population with mean $\mu_2$ and standard deviation $\sigma_2$.
For a single sample of size $n$ drawn from a population with mean $\mu$ and standard deviation $\sigma$, the mean and standard error of the sample mean are
$$E(\bar{x}) = \mu, \qquad SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$
Difference in Sample Means
Let $\bar{x}_1$ be the mean of the sample drawn from the first population and $\bar{x}_2$ be the mean of the sample drawn from the second population. Then the mean and standard error of the difference of the sample means are:
$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2, \qquad SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
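As a quick illustration, here is a minimal Python sketch that evaluates these two formulas. The population parameters and sample sizes below are made-up values chosen only for the example, not figures from the text.

```python
from math import sqrt

# Assumed (illustrative) population parameters and sample sizes
mu1, sigma1, n1 = 52.0, 6.0, 40
mu2, sigma2, n2 = 50.0, 5.0, 35

# Mean and standard error of the difference of sample means
expected_diff = mu1 - mu2                          # E(x1_bar - x2_bar) = mu1 - mu2
se_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)    # SE(x1_bar - x2_bar)

print(f"E(x1_bar - x2_bar) = {expected_diff:.3f}")
print(f"SE(x1_bar - x2_bar) = {se_diff:.3f}")
```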
Standard Error of Proportions
Let a random sample of size $n$ be drawn from a population with proportion $P$. Then the sample proportion $p$ has mean $P$ and standard error
$$E(p) = P, \qquad SE(p) = \sqrt{\frac{PQ}{n}}$$
where $Q = 1 - P$.
Difference of Sample Proportions
Let a random sample of size $n_1$ be drawn from a population with proportion $P_1$. Similarly, let another random sample of size $n_2$ be drawn from a population with proportion $P_2$.
Let $p_1$ be the proportion of the sample drawn from the first population and $p_2$ be the proportion of the sample drawn from the second population.
Then the mean and standard error of the difference of the sample proportions are
$$E(p_1 - p_2) = P_1 - P_2, \qquad SE(p_1 - p_2) = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}$$
or, when the two population proportions are equal ($P_1 = P_2 = P$, as under the null hypothesis),
$$SE(p_1 - p_2) = \sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
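A small Python sketch of both forms of this standard error. The proportions and sample sizes are illustrative assumptions, and the common value $P$ is illustrated here with the pooled average of the two proportions.

```python
from math import sqrt

# Assumed (illustrative) population proportions and sample sizes
P1, n1 = 0.40, 120
P2, n2 = 0.35, 150
Q1, Q2 = 1 - P1, 1 - P2

# General form: possibly unequal population proportions
se_general = sqrt(P1 * Q1 / n1 + P2 * Q2 / n2)

# Form used when P1 = P2 = P (e.g. under the null hypothesis);
# here P is illustrated as the pooled average of P1 and P2
P = (n1 * P1 + n2 * P2) / (n1 + n2)
Q = 1 - P
se_common = sqrt(P * Q * (1 / n1 + 1 / n2))

print(f"SE (general form)  = {se_general:.4f}")
print(f"SE (common P form) = {se_common:.4f}")
```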
The Hypothesis Concept
A hypothesis is an assumption about a population. To test whether the assumption is right or wrong, we use hypothesis testing.
The null hypothesis (H0) is the basic assumption. Let's say you feel the mean height of a class of 20 students is 150 cm. This is the null hypothesis. The alternative hypothesis (H1) can then be:
a) The mean height is not equal to 150 cm
b) The mean height is more than 150 cm
c) The mean height is less than 150 cm
To decide whether the null hypothesis or the alternative hypothesis is correct, we use an appropriate test statistic.
The test statistic has the general form
$$Z_{cal} = \frac{\text{Relevant statistic} - \text{Hypothesized value}}{\text{Standard error of the statistic}}$$
Depending on the alternative hypothesis, the critical region or critical values are determined. The critical region represents the range of values in which $Z_{cal}$ may fall, and it is represented on a normal curve.
For example, in the previous case where 150 cm is the hypothesized mean of the class, the alternative hypotheses correspond to the following tests:
a) The mean height is not equal to 150 cm (two-tailed test)
b) The mean height is more than 150 cm (right-tailed test)
c) The mean height is less than 150 cm (left-tailed test)
The normal curve is drawn to show the region in which $Z_{cal}$ falls, in order to either accept or reject the null hypothesis.
The critical values (k) that help us decide are:

α       Two-tailed test (-k, +k)     One-tailed test, left (-k)     One-tailed test, right (+k)
5%      -1.96, +1.96                 -1.65                          +1.65
1%      -2.58, +2.58                 -2.33                          +2.33
If the calculated value $Z_{cal}$ lies in the acceptance region, the null hypothesis is accepted; otherwise it is rejected.
a) If the test is two-tailed and α = 5%, the null hypothesis is accepted if $-1.96 < Z_{cal} < 1.96$
b) If the test is right-tailed and α = 5%, the null hypothesis is accepted if $Z_{cal} < 1.65$
c) If the test is left-tailed and α = 5%, the null hypothesis is accepted if $Z_{cal} > -1.65$
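The critical values in the table above can be reproduced, and the accept/reject decision automated, with a short Python sketch. The helper function `decide` and the example value of $Z_{cal}$ are illustrative assumptions; the quantiles come from the standard normal distribution in Python's statistics module.

```python
from statistics import NormalDist

std_normal = NormalDist()      # standard normal distribution
alpha = 0.05

# Critical values, matching the table above
k_two = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 (two-tailed, 5%)
k_one = std_normal.inv_cdf(1 - alpha)       # about 1.645 (one-tailed, 5%)

def decide(z_cal: float, tail: str, alpha: float = 0.05) -> str:
    """Accept or reject H0 for a given Z_cal and type of test (illustrative helper)."""
    q = NormalDist().inv_cdf(1 - (alpha / 2 if tail == "two" else alpha))
    if tail == "two":
        ok = -q < z_cal < q
    elif tail == "right":
        ok = z_cal < q
    else:                       # "left"
        ok = z_cal > -q
    return "accept H0" if ok else "reject H0"

print(round(k_two, 2), round(k_one, 2))   # approximately 1.96 and 1.64
print(decide(2.10, "two"))                # reject H0 at the 5% level
```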
Test for Population Mean
π‘₯Μ… − πœ‡ 0
𝜎
√𝑛
𝑍=
Where π‘₯Μ…the sample mean, σ is the population standard deviation. If it is unknown then it is replaced by
sample standard deviation ‘s’ and ‘n’ is sample size
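A minimal sketch of this test in Python. The numbers (sample mean 148 cm, hypothesized mean 150 cm, standard deviation 10 cm, n = 36) are assumptions chosen for illustration, not data from the text.

```python
from math import sqrt
from statistics import NormalDist

# Assumed (illustrative) data
x_bar, mu0, sigma, n = 148.0, 150.0, 10.0, 36

# Z statistic for a test of the population mean
z_cal = (x_bar - mu0) / (sigma / sqrt(n))     # -1.2 for these numbers

# Two-tailed decision at alpha = 5%
k = NormalDist().inv_cdf(0.975)               # about 1.96
print(f"Z_cal = {z_cal:.2f}, critical value = ±{k:.2f}")
print("accept H0" if -k < z_cal < k else "reject H0")
```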
Test for Equality of Means of Two Populations
The test statistic is
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
where $\bar{x}_1$ and $\bar{x}_2$ are the sample means and $\sigma_1$ and $\sigma_2$ are the population standard deviations. If they are unknown, they are replaced by the sample standard deviations $s_1$ and $s_2$; $n_1$ and $n_2$ are the sample sizes.
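A sketch of the two-sample Z test in Python, with made-up summary statistics (the sample sizes, means and standard deviations below are assumptions for illustration).

```python
from math import sqrt
from statistics import NormalDist

# Assumed (illustrative) summary statistics for the two samples
x1_bar, s1, n1 = 67.5, 2.5, 40
x2_bar, s2, n2 = 68.0, 2.0, 50

# Population sigmas are unknown, so s1 and s2 are used in their place
z_cal = (x1_bar - x2_bar) / sqrt(s1**2 / n1 + s2**2 / n2)

k = NormalDist().inv_cdf(0.975)    # two-tailed critical value at 5%
print(f"Z_cal = {z_cal:.2f}")
print("accept H0 (means equal)" if -k < z_cal < k else "reject H0")
```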
Test for Population Proportion
The test statistic is
$$Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}}, \qquad p = \frac{x}{n}$$
where $x$ is the number of members of the sample possessing the attribute, $n$ is the sample size, and $p$ is the sample proportion.
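A minimal Python sketch, assuming a hypothesized proportion P = 0.5 and a sample with x = 62 successes out of n = 100 (illustrative numbers only).

```python
from math import sqrt
from statistics import NormalDist

# Assumed (illustrative) data
P, x, n = 0.5, 62, 100
Q = 1 - P
p = x / n                                  # sample proportion

z_cal = (p - P) / sqrt(P * Q / n)          # 2.4 for these numbers

k = NormalDist().inv_cdf(0.975)            # two-tailed critical value at 5%
print(f"p = {p}, Z_cal = {z_cal:.2f}")
print("accept H0" if -k < z_cal < k else "reject H0")
```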
Test for Equality of Proportions of Two Populations
The test statistic is
$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where
$$p_1 = \frac{x_1}{n_1}, \qquad p_2 = \frac{x_2}{n_2}, \qquad P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}, \qquad Q = 1 - P$$
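A sketch of the two-proportion Z test in Python; the success counts x1, x2 and the sample sizes below are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

# Assumed (illustrative) counts of successes and sample sizes
x1, n1 = 45, 100
x2, n2 = 60, 120

p1, p2 = x1 / n1, x2 / n2
P = (n1 * p1 + n2 * p2) / (n1 + n2)     # pooled proportion = (x1 + x2) / (n1 + n2)
Q = 1 - P

z_cal = (p1 - p2) / sqrt(P * Q * (1 / n1 + 1 / n2))

k = NormalDist().inv_cdf(0.975)          # two-tailed critical value at 5%
print(f"Z_cal = {z_cal:.2f}")
print("accept H0 (proportions equal)" if -k < z_cal < k else "reject H0")
```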
t-tests (Small Sample Tests, Samples of Size < 30)
This test is used for:
a) Testing the significance of the mean of a population using a small sample
b) Testing the difference between the means of two populations
c) Testing the difference between the means of two populations using paired observations
Testing Mean of a Population using Small Sample
The test statistic is
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}}$$
where
$\bar{x}$ is the sample mean,
$\mu$ is the hypothesized mean of the population,
$n$ is the sample size, and
$s$ is the standard deviation of the sample, given by
$$s = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2}$$
Under H0, the test statistic is
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}} \quad \text{with } (n - 1) \text{ degrees of freedom}$$
Degrees of freedom are the number of independent observations. If there are $n$ observations, then
$$DF = n - c$$
where $c$ is the number of independent constraints.
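A minimal Python sketch of this small-sample test, following the formulas above (s computed with divisor n, and $t = (\bar{x} - \mu)/(s/\sqrt{n-1})$). The sample data and hypothesized mean are assumptions for illustration, and scipy is used only to obtain the t critical value.

```python
from math import sqrt
from scipy.stats import t as t_dist

# Assumed (illustrative) small sample and hypothesized mean
x = [148, 152, 151, 149, 150, 153, 147, 151]
mu0 = 150.0
n = len(x)

x_bar = sum(x) / n
s = sqrt(sum(xi**2 for xi in x) / n - (sum(x) / n) ** 2)   # s as defined above (divisor n)

t_cal = (x_bar - mu0) / (s / sqrt(n - 1))
df = n - 1

k = t_dist.ppf(0.975, df)     # two-tailed critical value at 5%
print(f"t_cal = {t_cal:.3f}, critical value = ±{k:.3f}, df = {df}")
print("accept H0" if -k < t_cal < k else "reject H0")
```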
Testing Difference between Means of Two Populations using two Samples
Given two samples of sizes $n_1$ and $n_2$ with means $\bar{x}_1$ and $\bar{x}_2$ and standard deviations $s_1$ and $s_2$, we may be interested in testing the hypothesis that the samples come from the same normal population.
The test statistic is
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_c^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad \text{with } (n_1 + n_2 - 2) \text{ degrees of freedom}$$
where the pooled variance is
$$s_c^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$$
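A sketch of this pooled two-sample t test in Python; the two small samples below are illustrative assumptions, and scipy supplies the t critical value.

```python
from math import sqrt
from scipy.stats import t as t_dist

# Assumed (illustrative) small samples
x1 = [24, 27, 26, 23, 25, 26]
x2 = [22, 24, 23, 25, 21]
n1, n2 = len(x1), len(x2)

x1_bar = sum(x1) / n1
x2_bar = sum(x2) / n2

# Pooled variance s_c^2 as defined above
sc2 = (sum((v - x1_bar) ** 2 for v in x1)
       + sum((v - x2_bar) ** 2 for v in x2)) / (n1 + n2 - 2)

t_cal = (x1_bar - x2_bar) / sqrt(sc2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

k = t_dist.ppf(0.975, df)     # two-tailed critical value at 5%
print(f"t_cal = {t_cal:.3f}, df = {df}")
print("accept H0 (same population)" if -k < t_cal < k else "reject H0")
```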
Testing the Difference between Means of Two Populations using Paired Observations
The test statistic is
$$t = \frac{\bar{d}}{s_d / \sqrt{n - 1}} \quad \text{with } (n - 1) \text{ degrees of freedom}$$
where $d$ is the difference between each pair of observations, $\bar{d}$ is the mean of the differences, and
$$s_d = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}$$
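A sketch of the paired test in Python; the before/after values are illustrative assumptions.

```python
from math import sqrt
from scipy.stats import t as t_dist

# Assumed (illustrative) paired observations (e.g. before and after a treatment)
before = [72, 75, 70, 78, 74, 73]
after  = [70, 73, 71, 74, 72, 70]

d = [a - b for a, b in zip(after, before)]   # paired differences
n = len(d)

d_bar = sum(d) / n
s_d = sqrt(sum(di**2 for di in d) / n - (sum(d) / n) ** 2)   # s_d as defined above

t_cal = d_bar / (s_d / sqrt(n - 1))
df = n - 1

k = t_dist.ppf(0.975, df)     # two-tailed critical value at 5%
print(f"t_cal = {t_cal:.3f}, df = {df}")
print("accept H0 (no difference)" if -k < t_cal < k else "reject H0")
```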
Chi-Square Test
If $Z_1, Z_2, \ldots, Z_n$ are $n$ independently distributed standard normal variables, then
$$\chi^2 = Z_1^2 + Z_2^2 + \cdots + Z_n^2$$
follows a chi-square distribution with $n$ degrees of freedom.
Chi-square tests are used for the following:
1. To test if a population has a given variance / standard deviation
2. To test the 'goodness of fit' of a theoretical distribution to an observed distribution
3. To test the independence of attributes in a contingency table.
Test for Variance / Standard Deviation
If the variance of a normal population is not known, we may want to test whether the population has a given variance $\sigma_0^2$.
The null hypothesis is:
H0: $\sigma^2 = \sigma_0^2$ (the population variance is $\sigma_0^2$)
The alternative hypothesis is one of:
1. H1: $\sigma^2 \neq \sigma_0^2$ (the test is two-tailed)
2. H1: $\sigma^2 > \sigma_0^2$ (the test is one-tailed with the critical region in the upper tail)
3. H1: $\sigma^2 < \sigma_0^2$ (the test is one-tailed with the critical region in the lower tail)
The test statistic is
$$\chi^2 = \frac{n s^2}{\sigma_0^2}$$
which is a chi-square variate with $(n - 1)$ degrees of freedom.
If the value of $s$ is not given, it can be calculated using
$$s^2 = \frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2, \qquad u = (x - A)$$
where $A$ is an assumed mean (arbitrary origin).
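A minimal sketch of the variance test in Python; the sample, the hypothesized variance, and the choice of an upper-tailed alternative are illustrative assumptions, with scipy providing the chi-square critical value.

```python
from scipy.stats import chi2

# Assumed (illustrative) sample and hypothesized variance
x = [12.1, 11.8, 12.5, 12.9, 11.6, 12.3, 12.7, 11.9, 12.4, 12.2]
sigma0_sq = 0.10
n = len(x)

mean = sum(x) / n
s_sq = sum((xi - mean) ** 2 for xi in x) / n   # s^2 with divisor n, as in n*s^2/sigma0^2

chi2_cal = n * s_sq / sigma0_sq
df = n - 1

# Upper-tailed test at 5% (H1: sigma^2 > sigma0^2)
k = chi2.ppf(0.95, df)
print(f"chi2_cal = {chi2_cal:.3f}, critical value = {k:.3f}, df = {df}")
print("accept H0" if chi2_cal < k else "reject H0")
```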
To Test Goodness of Fit
Suppose there are observed frequencies $O_i$, and let theoretical frequencies $E_i$ be fitted to the observed distribution.
The null hypothesis is:
H0: The theoretical frequency distribution is a good fit to the observed frequency distribution
The alternative hypothesis is
H1: The theoretical frequency distribution is not a good fit to the observed frequency distribution
The test statistic is
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
Under H0, this is a chi-square variate with $(n - c)$ degrees of freedom, where
$n$ is the number of terms in the $\chi^2$ column after pooling frequencies less than 5 with the adjacent ones, and
$c$ is one more than the number of parameters estimated from the observed distribution.
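A sketch of the goodness-of-fit calculation in Python. The observed die-roll counts and the fitted uniform expected frequencies are illustrative assumptions; in practice $E_i$ comes from whatever theoretical distribution is being fitted, and $c$ depends on how many of its parameters were estimated.

```python
from scipy.stats import chi2

# Assumed (illustrative) observed die-roll counts and a fitted uniform (fair die) model
observed = [22, 17, 20, 26, 22, 13]            # O_i
total = sum(observed)
expected = [total / 6] * 6                     # E_i under the uniform model

chi2_cal = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

n_terms = len(observed)    # classes after pooling small frequencies (none needed here)
c = 1                      # no parameters estimated from the data, so c = 1
df = n_terms - c

k = chi2.ppf(0.95, df)     # upper-tailed critical value at 5%
print(f"chi2_cal = {chi2_cal:.3f}, critical value = {k:.3f}, df = {df}")
print("good fit (accept H0)" if chi2_cal < k else "not a good fit (reject H0)")
```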
Test for Independence of Attributes
Suppose we want to test the independence of two attributes A and B in a population. We apply the chi-square test as follows:
            A          not A      Total
B           a          c          a + c
not B       b          d          b + d
Total       a + b      c + d      N = a + b + c + d
The null hypothesis is:
H0: Attributes A and B are independent
The alternative hypothesis is:
H1: Attributes A and B are not independent.
The test statistic is
$$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}$$
Under H0, this is a chi-square variate with 1 degree of freedom.
This test is one-tailed (upper tail).
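A sketch of the 2×2 independence test in Python using the shortcut formula above; the cell counts a, b, c, d are illustrative assumptions. For larger tables, or as a cross-check, scipy.stats.chi2_contingency performs the general contingency-table test.

```python
from scipy.stats import chi2

# Assumed (illustrative) 2x2 cell counts, laid out as in the table above
a, c = 30, 20     # row B
b, d = 25, 45     # row "not B"
N = a + b + c + d

chi2_cal = N * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

k = chi2.ppf(0.95, 1)      # upper-tailed critical value at 5%, 1 d.f.
print(f"chi2_cal = {chi2_cal:.3f}, critical value = {k:.3f}")
print("independent (accept H0)" if chi2_cal < k else "not independent (reject H0)")
```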