Inferential Statistics

UMAR.KHAYAM
Lecturer in Economics and Biz Statistics
KARDAN Institute of Higher Education
Kabul, Afghanistan

Sampling Theory

Sampling

Sampling is a statistical technique used in almost every field to collect information, and on the basis of this information inferences about the characteristics of a population are made. The values of the population characteristics are summarized by certain numerical descriptive measures, called parameters. The values of the population parameters, which are unknown in most situations, have to be estimated, and to get estimates we resort to sampling. The observations composing a sample are used to calculate a corresponding numerical descriptive measure, called a statistic. Thus we use statistics to estimate parameters.

Advantages of sampling

The important advantages of sampling over complete enumeration are briefly stated below:
1) Sampling saves money, as it is much cheaper to collect the desired information from a small sample than from the whole population.
2) Sampling saves a lot of time and energy, as the needed data are collected and processed much faster than census information. This is a very important consideration in all types of investigations or surveys.
3) Sampling makes it possible to obtain more detailed information from each unit of the sample, as data collected from a few units of the population (i.e. the sample) can be more complete.

Sampling With and Without Replacement

Samples may be selected with replacement or without replacement.

Sampling is said to be with replacement when a sampling unit is drawn from a population, observed, and then returned to the population before another unit is drawn. The population in this case remains the same, and a sampling unit might be selected more than once.
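The definition of sampling with replacement can be illustrated with a minimal sketch. The three-unit population "A", "B", "C" and the helper name `samples_with_replacement` are hypothetical, chosen only for illustration; the sketch lists every ordered sample of size 2 and picks out those in which the same unit was drawn twice.

```python
from itertools import product

def samples_with_replacement(population, n):
    """All ordered samples of size n when each unit is returned before the next draw."""
    return list(product(population, repeat=n))

population = ["A", "B", "C"]  # hypothetical sampling units, not from the lecture
samples = samples_with_replacement(population, 2)

# Because the unit goes back into the population, it can be drawn again.
repeated = [s for s in samples if s[0] == s[1]]

print(len(samples))  # 9
print(repeated)      # [('A', 'A'), ('B', 'B'), ('C', 'C')]
```

Order is treated as distinct here, so ('A', 'B') and ('B', 'A') count as two samples.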
If we sample with replacement, the number of all possible samples of size $n$ that could be selected is $N^n$.

If, on the other hand, a sampling unit is chosen and not returned to the population after it has been observed, the sampling is said to be without replacement. Here a sampling unit cannot be selected again for the sample, as the units drawn are not replaced. If we sample without replacement, the number of all possible samples of size $n$ that could be selected is $\binom{N}{n}$.

Example No. 1

· Assume that a population consists of 5 students and the marks obtained by them in a certain statistics class are 20, 15, 12, 16 and 18. Draw all possible random samples of two students when sampling is performed (i) with replacement, (ii) without replacement.
· Calculate the mean marks for each sample.
· Solution on white board.

Sampling Distribution

· A sampling distribution is defined as the probability distribution of the values of a statistic, such as a mean or a standard deviation, computed from all possible samples of the same size that might be selected, with or without replacement, from a population.

Standard Error

· The standard deviation of the sampling distribution of a sample statistic is called the standard error (abbreviated to S.E.) of the statistic. For the sample mean it is denoted by $\sigma_{\bar{x}}$ and is given as

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Exercise 6 page 231

If the standard error of the mean for the sampling distribution of random samples of size 36 from a large population is 2, how large must the sample size become if the standard error is to be reduced to 1.2?

Solution on white board

Sampling distribution of the mean

The sampling distribution of the mean is the probability distribution or the relative frequency distribution of the means $\bar{X}$ of all possible random samples of the same size that could be selected from a given population. The mean of this distribution is represented by $\mu_{\bar{x}}$ and the standard deviation, which is called the standard error of the mean, by $\sigma_{\bar{x}}$, where

$$\mu_{\bar{x}} = \sum \bar{x}\, f(\bar{x})$$

and

$$\sigma_{\bar{x}} = \sqrt{\sum \bar{x}^2 f(\bar{x}) - \left[\sum \bar{x}\, f(\bar{x})\right]^2}$$

Example No. 2

Assume that a population consists of 5 similar containers having the following weights (kilograms): 9.8, 10.2, 10.4, 10.0 and 9.6.
a) Find the mean $\mu$ and the standard deviation $\sigma$ of the given population.
b) Draw random samples of 2 containers without replacement and calculate the mean weight of each sample.
c) Form the sampling distribution of $\bar{X}$.
d) Find the mean and the standard deviation of the sampling distribution of $\bar{X}$.

Solution on white board

Properties of the Sampling Distribution of $\bar{X}$

The sampling distribution of $\bar{X}$ has the following properties.
1) $\mu_{\bar{x}} = \mu$
2) In case of sampling with replacement, $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.
   In case of sampling without replacement, $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}}$.

Example Page 211

Assume that a uniform population consists of 4 values 0, 1, 2 and 3.
a) Find the mean $\mu$ and the standard deviation $\sigma$.
b) Draw random samples of size 2 with replacement and calculate the mean $\bar{X}$ of each sample.
c) Find the sampling distribution of $\bar{X}$.
d) Find the mean and the standard deviation of the sampling distribution of $\bar{X}$.
e) Verify that $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.

Solution

a) Mean

$$\mu = \frac{\sum X}{N} = \frac{0+1+2+3}{4} = \frac{6}{4} = 1.5$$

Standard deviation

X       X^2
0       0
1       1
2       4
3       9
Total   6       14      (sum of X = 6, sum of X^2 = 14)

$$\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2} = \sqrt{\frac{14}{4} - \left(\frac{6}{4}\right)^2} = \sqrt{1.25} = 1.1180$$

b) Total number of possible samples: $N^n = 4^2 = 16$

S.No   Sample   Mean      S.No   Sample   Mean
1      (0,0)    0         9      (2,0)    1
2      (0,1)    0.5       10     (2,1)    1.5
3      (0,2)    1         11     (2,2)    2
4      (0,3)    1.5       12     (2,3)    2.5
5      (1,0)    0.5       13     (3,0)    1.5
6      (1,1)    1         14     (3,1)    2
7      (1,2)    1.5       15     (3,2)    2.5
8      (1,3)    2         16     (3,3)    3

c) Sampling distribution of $\bar{X}$

Mean    f(Mean)
0       1/16
0.5     2/16
1       3/16
1.5     4/16
2       3/16
2.5     2/16
3       1/16

d) Mean and standard deviation of the sampling distribution of $\bar{X}$

Mean    f(Mean)   Mean x f(Mean)   Mean^2 x f(Mean)
0       1/16      0                0
0.5     2/16      1/16             0.5/16
1       3/16      3/16             3/16
1.5     4/16      6/16             9/16
2       3/16      6/16             12/16
2.5     2/16      5/16             12.5/16
3       1/16      3/16             9/16
Total             24/16            46/16

· Mean of $\bar{X}$:

$$\mu_{\bar{x}} = \sum \bar{x}\, f(\bar{x}) = \frac{24}{16} = 1.5$$

· Standard error of $\bar{X}$:

$$\sigma_{\bar{x}} = \sqrt{\sum \bar{x}^2 f(\bar{x}) - \left[\sum \bar{x}\, f(\bar{x})\right]^2} = \sqrt{\frac{46}{16} - \left(\frac{24}{16}\right)^2} = \sqrt{2.875 - 2.25} = \sqrt{0.625} = 0.79$$

e) Verification

As $\mu_{\bar{x}} = 1.5$ and $\mu = 1.5$, hence $\mu_{\bar{x}} = \mu$.

Also $\sigma = 1.1180$, so $\dfrac{\sigma}{\sqrt{n}} = \dfrac{1.1180}{\sqrt{2}} = 0.79$, and $\sigma_{\bar{x}} = 0.79$. Hence $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.

Example Page 214

Given the population 1, 1, 1, 3, 4, 5, 6, 6, 6 and 7, find the probability that a random sample of size 36, selected with replacement, will yield a sample mean greater than 3.8 but less than 4.5 if the mean is measured to the nearest tenth.

Solution on White Board

Areas under the standard normal curve from 0 to Z

Z     0.00     0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000   0.0040  0.0080  0.0120  0.0159  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398   0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793   0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179   0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554   0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915   0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257   0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2518  0.2549
0.7   0.2580   0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881   0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159   0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413   0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643   0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849   0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032   0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192   0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332   0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4430  0.4441
1.6   0.4452   0.4463  0.4474  0.4485  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554   0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641   0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713   0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772   0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821   0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861   0.4865  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893   0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918   0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938   0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953   0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965   0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974   0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4980  0.4980  0.4981
2.9   0.4981   0.4982  0.4983  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.49865  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
3.1   0.49903  0.4991  0.4991  0.4991  0.4992  0.4992  0.4992  0.4992  0.4993  0.4993

Exercise Page 230 Thinking Challenge

· A random sample of size 54 is drawn with replacement from a finite population 2, 4 and 6. What is the probability that the sample mean will be greater than 4.1 but less than 4.4? Assume the means to be measured to the nearest tenth.

Example Page 217

Given the population 1, 1, 1, 3, 4, 5, 6, 6, 6 and 7, find the mean and standard deviation of the sampling distribution of means for samples of size 4 selected at random without replacement. Between what two values would you expect at least ¾ of the sample means to fall?

Solution on White Board

Example page 218

An electrical firm manufactures light bulbs that have a length of life that is normally distributed, with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.

Solution on white board

Exercise 8 Page 231 Thinking Challenge

If all possible samples of size 16 are drawn from a normal population with mean equal to 50 and standard deviation equal to 5, what is the probability that a sample mean $\bar{X}$ will fall in the interval from 47.63 to 49.5?
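The hand computation in Example Page 211 (population 0, 1, 2, 3; samples of size 2 with replacement) can be cross-checked by brute-force enumeration. The following is a minimal sketch, not part of the lecture; the function name `sampling_distribution_of_mean` is an assumption made for illustration, and exact fractions are used so the probabilities match the 1/16, 2/16, ... column of the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

def sampling_distribution_of_mean(population, n):
    """Map each sample mean to its probability over all ordered samples
    of size n drawn with replacement."""
    samples = list(product(population, repeat=n))
    counts = Counter(Fraction(sum(s), n) for s in samples)
    return {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

population = [0, 1, 2, 3]
n = 2
dist = sampling_distribution_of_mean(population, n)

# Population mean and standard deviation (divisor N, as in part a).
mu = sum(population) / len(population)
sigma = sqrt(sum(x * x for x in population) / len(population) - mu ** 2)

# Mean and standard error of the sampling distribution (part d).
mu_xbar = float(sum(m * p for m, p in dist.items()))
sigma_xbar = sqrt(float(sum(m * m * p for m, p in dist.items())) - mu_xbar ** 2)

print(mu, mu_xbar)                                      # 1.5 1.5
print(round(sigma / sqrt(n), 4), round(sigma_xbar, 4))  # 0.7906 0.7906
```

Both properties in part e) hold: the two means agree, and the standard error equals $\sigma/\sqrt{n}$ (0.79 to two decimals).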
Exercise 10 Page 231

The heights of 1000 students are approximately normally distributed with a mean of 174.5 centimeters and a standard deviation of 6.9 centimeters. If 200 random samples of size 25 are drawn from this population and the means recorded to the nearest tenth of a centimeter, determine
a) The mean and standard error of the sampling distribution of $\bar{X}$.
b) The number of sample means that fall between 172.5 and 175.8 centimeters inclusive.
c) The number of sample means falling below 172.0 centimeters.

Solution on white board

Exercise 12 page 232 Thinking challenge

· If a certain machine makes electrical resistors having a mean resistance of 40 ohms and a standard deviation of 2 ohms, what is the probability that a random sample of 36 of these resistors will have a combined resistance of more than 1458 ohms?

Sampling distribution of the difference between means

· Suppose we have two distinct populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$ respectively. Let independent random samples of sizes $n_1$ and $n_2$ be selected from the respective populations, and the differences $\bar{X}_1 - \bar{X}_2$ between the means of all possible pairs of samples be computed.
· Then a probability distribution of the differences $\bar{X}_1 - \bar{X}_2$ can be obtained. Such a distribution is called the sampling distribution of the differences of sample means $\bar{X}_1 - \bar{X}_2$.

Properties of the sampling distribution of $\bar{X}_1 - \bar{X}_2$

· The sampling distribution of the differences has the following properties.
· 1) $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
· 2) $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$

Example page 224

Draw all possible random samples of size $n_1 = 2$ with replacement from a finite population consisting of 4, 6, 8. Similarly, draw all possible random samples of size $n_2 = 2$ with replacement from another finite population consisting of 1, 2, 3.
a) Find the possible differences between the sample means of the two populations.
b) Construct the sampling distribution of $\bar{X}_1 - \bar{X}_2$ and compute its mean and variance.
c) Verify that $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$ and $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$.

SOLUTION:

Whenever we are sampling with replacement from a finite population, the total number of possible samples is $N^n$ (where $N$ is the population size and $n$ is the sample size). Hence, in this example, there are $(3)^2 = 9$ possible samples which can be drawn with replacement from each population.

These two sets of samples and their means are given below.

From Population 1                       From Population 2
Sample No.   Sample Value   Mean        Sample No.   Sample Value   Mean
1            4, 4           4           1            1, 1           1.0
2            4, 6           5           2            1, 2           1.5
3            4, 8           6           3            1, 3           2.0
4            6, 4           5           4            2, 1           1.5
5            6, 6           6           5            2, 2           2.0
6            6, 8           7           6            2, 3           2.5
7            8, 4           6           7            3, 1           2.0
8            8, 6           7           8            3, 2           2.5
9            8, 8           8           9            3, 3           3.0

a) Since there are 9 samples from the first population as well as 9 from the second, there are 81 possible combinations of $\bar{x}_1$ and $\bar{x}_2$. The 81 possible differences $\bar{x}_1 - \bar{x}_2$ are presented in the following table (rows: $\bar{x}_1$, columns: $\bar{x}_2$):

Mean 1 \ Mean 2   1.0   1.5   2.0   1.5   2.0   2.5   2.0   2.5   3.0
4                 3.0   2.5   2.0   2.5   2.0   1.5   2.0   1.5   1.0
5                 4.0   3.5   3.0   3.5   3.0   2.5   3.0   2.5   2.0
6                 5.0   4.5   4.0   4.5   4.0   3.5   4.0   3.5   3.0
5                 4.0   3.5   3.0   3.5   3.0   2.5   3.0   2.5   2.0
6                 5.0   4.5   4.0   4.5   4.0   3.5   4.0   3.5   3.0
7                 6.0   5.5   5.0   5.5   5.0   4.5   5.0   4.5   4.0
6                 5.0   4.5   4.0   4.5   4.0   3.5   4.0   3.5   3.0
7                 6.0   5.5   5.0   5.5   5.0   4.5   5.0   4.5   4.0
8                 7.0   6.5   6.0   6.5   6.0   5.5   6.0   5.5   5.0

b) The sampling distribution of $\bar{X}_1 - \bar{X}_2$ is as follows, where $d = \bar{x}_1 - \bar{x}_2$:

d       Tally            f     Probability f(d)   d f(d)    d^2 f(d)
1.0     |                1     1/81               1/81      1.0/81
1.5     ||               2     2/81               3/81      4.5/81
2.0     ||||             5     5/81               10/81     20.0/81
2.5     |||| |           6     6/81               15/81     37.5/81
3.0     |||| ||||        10    10/81              30/81     90.0/81
3.5     |||| ||||        10    10/81              35/81     122.5/81
4.0     |||| |||| |||    13    13/81              52/81     208.0/81
4.5     |||| ||||        10    10/81              45/81     202.5/81
5.0     |||| ||||        10    10/81              50/81     250.0/81
5.5     |||| |           6     6/81               33/81     181.5/81
6.0     ||||             5     5/81               30/81     180.0/81
6.5     ||               2     2/81               13/81     84.5/81
7.0     |                1     1/81               7/81      49.0/81
Total   ---              81    1                  324/81    1431/81

Thus the mean and the variance are

$$\mu_{\bar{x}_1 - \bar{x}_2} = \sum d\, f(d) = \frac{324}{81} = 4,$$

and

$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sum d^2 f(d) - \left[\sum d\, f(d)\right]^2 = \frac{1431}{81} - \left(\frac{324}{81}\right)^2 = \frac{53}{3} - 16 = \frac{5}{3} = 1.67$$

c) In order to verify the properties of the sampling distribution of $\bar{X}_1 - \bar{X}_2$, we first need to compute the mean and variance of each population.

The mean and variance of the first population are:

$$\mu_1 = \frac{4+6+8}{3} = 6, \quad \text{and} \quad \sigma_1^2 = \frac{(4-6)^2 + (6-6)^2 + (8-6)^2}{3} = \frac{8}{3}$$

And the mean and variance of the second population are:

$$\mu_2 = \frac{1+2+3}{3} = 2, \quad \text{and} \quad \sigma_2^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3}$$

Verification

Now

$$\mu_{\bar{x}_1 - \bar{x}_2} = 4 = 6 - 2 = \mu_1 - \mu_2,$$

and

$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \frac{8}{3} \cdot \frac{1}{2} + \frac{2}{3} \cdot \frac{1}{2} = \frac{4}{3} + \frac{1}{3} = \frac{5}{3} = 1.67 = \sigma^2_{\bar{x}_1 - \bar{x}_2}$$

Example 7 Page 227

A sample of size $n_1 = 5$ is drawn at random from a population that is normally distributed with mean $\mu_1 = 50$ and variance $\sigma_1^2 = 9$, and the sample mean $\bar{X}_1$ is recorded. A second random sample of size $n_2 = 4$ is selected from a second population that is also normally distributed, with mean $\mu_2 = 40$ and variance $\sigma_2^2 = 4$, and the sample mean $\bar{X}_2$ is recorded. What is $P(\bar{X}_1 - \bar{X}_2 < 8.2)$?

Solution on white board

Exercise 21 Page 233 Thinking Challenge

Draw all possible random samples of size $n_1 = 2$ with replacement from a finite population consisting of 2, 3 and 7. Similarly, draw all possible random samples of size $n_2 = 2$ with replacement from another finite population consisting of 1, 1 and 3.
a) Find the possible differences between the sample means of the two populations.
b) Construct the sampling distribution of $\bar{X}_1 - \bar{X}_2$ and compute its mean and variance.
c) Verify that $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$ and $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$.

Example 8 page 228 Thinking Challenge

The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a standard deviation of 0.9 years, while those of manufacturer B have a mean lifetime of 6.0 years and a standard deviation of 0.8 years. What is the probability that a random sample of 36 tubes from manufacturer A will have a mean lifetime that is at least 1 year more than the mean lifetime of a sample of 49 tubes from manufacturer B?

Exercise 23 page 233

A random sample of size 25 is taken from a normal population having a mean of 80 and a standard deviation of 5. A second random sample of size 36 is taken from a different normal population having a mean of 75 and a standard deviation of 3. Find the probability that the sample mean computed from the 25 measurements will exceed the sample mean computed from the 36 measurements by at least 3.4 but less than 5.9.

Solution on white board

Exercise 25 page 233 Thinking Challenge

The mean score for freshmen on an aptitude test at a college is 540, with a standard deviation of 50. What is the probability that two groups of students selected at random, consisting of 32 and 50 students respectively, will differ in their mean score by (a) more than 20 points; (b) an amount between 5 and 10 points?
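The two properties of the sampling distribution of the difference between means can be checked numerically for Example page 224 (populations 4, 6, 8 and 1, 2, 3, with samples of size 2 drawn with replacement). This is a minimal sketch under those stated assumptions, not the white-board solution; the helper names `sample_means` and `mean_and_variance` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def sample_means(population, n):
    """Means of all ordered samples of size n drawn with replacement."""
    return [Fraction(sum(s), n) for s in product(population, repeat=n)]

def mean_and_variance(values):
    """Mean and variance of equally likely values (divisor = number of values)."""
    k = len(values)
    mean = sum(values, Fraction(0)) / k
    var = sum((v - mean) ** 2 for v in values) / k
    return mean, var

pop1, pop2 = [4, 6, 8], [1, 2, 3]
n1 = n2 = 2

# All 9 x 9 = 81 differences between a mean from population 1 and one from population 2.
diffs = [a - b for a in sample_means(pop1, n1) for b in sample_means(pop2, n2)]
mu_d, var_d = mean_and_variance(diffs)

mu1, var1 = mean_and_variance(pop1)
mu2, var2 = mean_and_variance(pop2)

print(len(diffs), mu_d, var_d)           # 81 4 5/3
print(mu1 - mu2, var1 / n1 + var2 / n2)  # 4 5/3
```

Changing `pop1` and `pop2` to the populations of Exercise 21 gives a way to check a hand solution of that exercise as well.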