Sampling Distributions

Concepts Covered
1. Basic concept of sampling distribution
2. Usage of sampling distributions
3. Central limit theorem
4. Application of the central limit theorem

Introduction
As a task of statistical inference, we usually follow these steps:
1. Data collection: collect a sample from the population.
2. Statistics: compute a statistic from the sample.
3. Statistical inference: from the statistic, make various statements concerning the values of population parameters; for example, about the population mean from the sample mean.

Basic Terminology
Some basic terms closely associated with the above-mentioned tasks are given below.
Population: A population consists of the totality of the observations with which we are concerned.
Sample: A sample is a subset of a population.
Statistical inference: An analysis basically concerned with generalization and prediction.
Statistical Inference
There are two facts which are key to statistical inference:
1. Population parameters are fixed numbers whose values are usually unknown.
2. Sample statistics are known values for any given sample, but they vary from sample to sample, even when the samples are taken from the same population.

It is unlikely that any two independently drawn samples will produce identical values of a sample statistic. This variability of sample statistics, called sampling variation, is always present and must be accounted for in any inferential procedure.
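To see sampling variation concretely, the Python sketch below draws several independent samples from one fixed population and prints their means. The population (10,000 values generated from a normal distribution with mean 50 and standard deviation 10), the sample size of 25 and the random seed are all illustrative assumptions, not part of the lecture.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 observations (illustrative assumption).
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)  # the population parameter: one fixed number

# Five independent samples of size 25 from the SAME population.
sample_means = [statistics.fmean(rng.sample(population, 25)) for _ in range(5)]

print(f"population mean: {mu:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.2f}")
```

The five sample means differ from one another and from the population mean even though every sample comes from the same population; that spread is the sampling variation described above.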
Sampling Distributions
A sample statistic is a random variable and, like any other random variable, it has a probability distribution. This distribution is used to describe the variability of the sample statistic.

Definition (Sampling Distribution): The sampling distribution of a statistic is the probability distribution of that statistic.

The probability distribution of the sample mean (hereafter denoted $\bar{X}$) is called the sampling distribution of the mean (also referred to as the distribution of the sample mean). Similarly, the probability distribution of the sample variance (denoted $S^2$) is called the sampling distribution of the variance.

Using the values of $\bar{X}$ and $S^2$ for different random samples of a population, we make inferences about the parameters $\mu$ and $\sigma^2$ of the population.
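When listing every possible sample is impractical, the sampling distributions of $\bar{X}$ and $S^2$ can be approximated by simulation. The sketch below uses a small hypothetical population (the list of eight values is an assumption), draws many samples of size 5 with replacement, and compares the average of the simulated statistics with $\mu$ and $\sigma^2$. It relies on the known fact that $S^2$ computed with divisor $n-1$ (as Python's `statistics.variance` does) has expected value $\sigma^2$.

```python
import random
import statistics

# Hypothetical population (illustrative assumption, not from the lecture).
population = [2, 4, 4, 5, 7, 9, 10, 15]
mu = statistics.fmean(population)          # population mean (7.0)
sigma2 = statistics.pvariance(population)  # population variance, divisor N (15.5)

def simulate_statistics(pop, n, reps, seed=0):
    """Draw `reps` random samples of size n with replacement and return
    the list of sample means and the list of sample variances S^2."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        sample = rng.choices(pop, k=n)
        means.append(statistics.fmean(sample))
        variances.append(statistics.variance(sample))  # divisor n - 1
    return means, variances

means, variances = simulate_statistics(population, n=5, reps=10_000)
print(f"mu = {mu:.3f}, average of sample means = {statistics.fmean(means):.3f}")
print(f"sigma^2 = {sigma2:.3f}, average of S^2 = {statistics.fmean(variances):.3f}")
```

A histogram of `means` (or `variances`) is the simulated counterpart of the sampling distribution; increasing `reps` makes the approximation closer.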
Issues with Sampling Distributions
In practical situations, for a large population, it is infeasible to obtain all possible samples, and hence the exact probability distribution of a sample statistic. The sampling distribution of a statistic depends on:
1. the type of the population,
2. the size of the samples, and
3. the method of choosing the samples.

Sampling Distribution - Example 1
Problem
Consider five identical disks numbered 1, 2, 3, 4 and 5. Consider an experiment consisting of drawing two disks, replacing the first before drawing the second, and then computing the mean of the values of the two disks.

Solution
The following table lists all possible samples and their means.

Sample  Mean     Sample  Mean     Sample  Mean
[1,1]   1.0      [2,5]   3.5      [4,4]   4.0
[1,2]   1.5      [3,1]   2.0      [4,5]   4.5
[1,3]   2.0      [3,2]   2.5      [5,1]   3.0
[1,4]   2.5      [3,3]   3.0      [5,2]   3.5
[1,5]   3.0      [3,4]   3.5      [5,3]   4.0
[2,1]   1.5      [3,5]   4.0      [5,4]   4.5
[2,2]   2.0      [4,1]   2.5      [5,5]   5.0
[2,3]   2.5      [4,2]   3.0
[2,4]   3.0      [4,3]   3.5

Sampling distribution of means:

$\bar{x}$:      1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0
$f(\bar{x})$:   1/25   2/25   3/25   4/25   5/25   4/25   3/25   2/25   1/25

[Figure: bar chart of $f(\bar{x})$ against $\bar{x}$]
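The table above can be generated mechanically: list all $5 \times 5 = 25$ equally likely ordered samples (drawing with replacement), compute each mean, and count how often each value occurs. A minimal Python sketch using exact fractions:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

disks = [1, 2, 3, 4, 5]

# All ordered samples of size 2 drawn with replacement: 5 * 5 = 25 outcomes.
samples = list(product(disks, repeat=2))

# Exact mean of each sample, e.g. (1, 2) -> 3/2.
means = [Fraction(sum(s), len(s)) for s in samples]

# Every sample is equally likely, so f(xbar) = (number of samples with mean xbar) / 25.
counts = Counter(means)
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, c in sorted(counts.items()):
    print(f"xbar = {float(m):.1f}   f(xbar) = {c}/{len(samples)}")
```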
Sampling Distribution - Example 1 (contd.)
The sampling distribution of the means resembles a normal distribution more closely than a uniform distribution, even though the population of disk values itself is uniform. The mean of the distribution of $\bar{x}$ values is 3 and the variance is 1.
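The values 3 and 1 follow directly from the table of $f(\bar{x})$: the mean is $\sum \bar{x}\, f(\bar{x})$ and the variance is $\sum (\bar{x} - 3)^2 f(\bar{x})$. The short Python check below uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction

# Sampling distribution of the mean from Example 1.
xbar = [Fraction(k, 2) for k in range(2, 11)]   # 1.0, 1.5, ..., 5.0
freq = [1, 2, 3, 4, 5, 4, 3, 2, 1]              # counts out of 25
f = [Fraction(c, 25) for c in freq]             # probabilities f(xbar)

mean_xbar = sum(x * p for x, p in zip(xbar, f))                    # E[Xbar]
var_xbar = sum((x - mean_xbar) ** 2 * p for x, p in zip(xbar, f))  # Var[Xbar]
print(mean_xbar, var_xbar)  # prints: 3 1
```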
Theorem on Sampling Distribution
Theorem (Sampling Distribution of the Mean): The sampling distribution of $\bar{X}$ from a random sample of size $n$ drawn from a population with mean $\mu$ and variance $\sigma^2$ will have mean and variance
$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}.$$

With reference to the data in Example 1 (a uniform population over the values 1, 2, 3, 4, 5), the population has
$$\mu = \frac{5+1}{2} = 3, \qquad \sigma^2 = \frac{5^2-1}{12} = 2.$$
Applying the theorem with $n = 2$, we have $\mu_{\bar{X}} = 3$ and $\sigma^2_{\bar{X}} = 2/2 = 1$, which matches the mean and variance of the sampling distribution obtained in Example 1. Hence, the theorem is verified.
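The verification can be pushed a little further by brute force. The Python sketch below computes the exact population mean and variance of the disk values, then enumerates every ordered sample of size $n$ (with replacement, as in Example 1) and checks $\mu_{\bar{X}} = \mu$ and $\sigma^2_{\bar{X}} = \sigma^2/n$ for $n = 2$ and $n = 3$; the case $n = 3$ is an extra illustration, not part of the lecture.

```python
from fractions import Fraction
from itertools import product

def population_moments(values):
    """Exact population mean and variance (divisor N)."""
    N = len(values)
    mu = Fraction(sum(values), N)
    sigma2 = sum((v - mu) ** 2 for v in values) / N
    return mu, sigma2

def mean_moments(values, n):
    """Exact mean and variance of Xbar over all N**n equally likely ordered
    samples of size n drawn with replacement."""
    means = [Fraction(sum(s), n) for s in product(values, repeat=n)]
    m = sum(means) / len(means)
    v = sum((x - m) ** 2 for x in means) / len(means)
    return m, v

disks = [1, 2, 3, 4, 5]
mu, sigma2 = population_moments(disks)
print("population:", mu, sigma2)  # prints: population: 3 2

for n in (2, 3):
    m, v = mean_moments(disks, n)
    print(f"n = {n}: mean of Xbar = {m}, variance of Xbar = {v}, sigma^2/n = {sigma2 / n}")
```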
Central Limit Theorem
The theorem on sampling distribution also holds if we sample from a population with an unknown distribution. Moreover, the sampling distribution of $\bar{X}$ will then be approximately normal with mean $\mu$ and variance $\sigma^2/n$. This can be established with the famous central limit theorem, which is stated below.

Theorem (Central Limit Theorem): If random samples, each of size $n$, are taken from any distribution with mean $\mu$ and finite variance $\sigma^2$, the sample mean $\bar{X}$ will have a distribution that is approximately normal with mean $\mu$ and variance $\sigma^2/n$. The approximation becomes better as $n$ increases.
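A quick way to see the central limit theorem at work is to simulate it. The sketch below assumes a Uniform(0, 1) population (mean 1/2, variance 1/12), which is clearly not normal, draws 20,000 samples of size $n = 30$, standardizes each mean as $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$, and compares a few simulated probabilities with the standard normal CDF from Python's `statistics.NormalDist`. The population, sample size, number of repetitions and seed are illustrative choices.

```python
import random
from statistics import NormalDist, fmean

def standardized_means(n, reps, seed=1):
    """Simulate Z = (Xbar - mu) / (sigma / sqrt(n)) for samples of size n
    drawn from a Uniform(0, 1) population (mu = 1/2, sigma^2 = 1/12)."""
    rng = random.Random(seed)
    mu, sigma = 0.5, (1 / 12) ** 0.5
    se = sigma / n ** 0.5  # standard error of the mean
    return [(fmean([rng.random() for _ in range(n)]) - mu) / se for _ in range(reps)]

z_values = standardized_means(n=30, reps=20_000)
std_normal = NormalDist()  # mean 0, standard deviation 1

for c in (-1.96, -1.0, 0.0, 1.0, 1.96):
    simulated = sum(z <= c for z in z_values) / len(z_values)
    print(f"P(Z <= {c:5.2f}): simulated {simulated:.3f}, normal {std_normal.cdf(c):.3f}")
```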
Applicability of Central Limit Theorem
1. The theorem is an asymptotic result (exactly true only as $n \to \infty$); however, the approximation is usually very good for quite moderate values of $n$.
2. The sample size required for the approximation to be useful depends on the nature of the population distribution.
3. For populations that resemble the normal, sample sizes of 10 or more are usually sufficient.
4. Sample sizes in excess of 30 are adequate for virtually all populations, unless the distribution is extremely skewed.
5. If the population is normally distributed, the sampling distribution of the mean is exactly normal regardless of sample size.
6. Finally, one very important application of the central limit theorem is the determination of reasonable values of the population mean $\mu$.
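Points 2 to 4 can be illustrated with a skewed population. The sketch below assumes an Exponential population with mean 1 (so $\mu = \sigma = 1$; this choice, the seed and the repetition count are assumptions for illustration) and estimates the two tail probabilities $P(Z < -1.96)$ and $P(Z > 1.96)$ of the standardized mean. Under an exact normal distribution each would be about 0.025; with this skewed population the two tails are clearly unequal at $n = 5$ and move toward 0.025 as $n$ grows.

```python
import random
from statistics import NormalDist

def tail_probabilities(n, reps, z=1.96, seed=2):
    """Estimate P(Z < -z) and P(Z > z) for Z = (Xbar - mu) / (sigma / sqrt(n)),
    where Xbar is the mean of n draws from an Exponential population with
    mean 1 (so mu = sigma = 1 and Z simplifies to (Xbar - 1) * sqrt(n))."""
    rng = random.Random(seed)
    lower = upper = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        zval = (xbar - 1.0) * n ** 0.5
        if zval < -z:
            lower += 1
        elif zval > z:
            upper += 1
    return lower / reps, upper / reps

target = 1 - NormalDist().cdf(1.96)  # about 0.025 in each tail for a normal Z
for n in (5, 30, 100):
    lo, hi = tail_probabilities(n, reps=10_000)
    print(f"n = {n:3d}: lower tail {lo:.4f}, upper tail {hi:.4f}, normal {target:.4f}")
```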
Usefulness of the Sampling Distribution
The mean of the sampling distribution of the mean is the population mean. This implies that, "on the average", the sample mean is the same as the population mean. We therefore say that the sample mean is an unbiased estimate of the population mean.

The variance of the distribution of the sample means is $\sigma^2/n$. The standard deviation of the sampling distribution of the mean, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, is often called the standard error of the mean.

If $\sigma$ is large, a single sample mean can lie far from $\mu$ and is therefore less reliable; however, as the sample size grows ($n \to \infty$), the standard error tends to zero.
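The standard error formula $\sigma/\sqrt{n}$ says that quadrupling the sample size halves the spread of $\bar{X}$. The Python sketch below tabulates it for an illustrative $\sigma = 40$ and then checks one case by simulation, assuming a normal population with mean 800; these values and the seed are assumptions chosen for illustration.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 40.0  # illustrative population standard deviation
for n in (1, 4, 16, 64, 256):
    print(f"n = {n:3d}: standard error = {standard_error(sigma, n)}")

# Simulation check for n = 16, assuming a normal population with mean 800.
rng = random.Random(3)
means = [statistics.fmean([rng.gauss(800, sigma) for _ in range(16)]) for _ in range(5_000)]
print(f"simulated SD of sample means: {statistics.stdev(means):.2f}")  # close to 10
```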
Example 2
Problem
An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed, with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.

Solution
Here, a sample of 16 bulbs is drawn from the population.
Mean of the sampling distribution of the mean: $\mu_{\bar{X}} = \mu = 800$ hours.
Standard deviation of the sample mean (standard error): $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 40/\sqrt{16} = 10$ hours.
$$P(\bar{X} < 775) = P\left(Z < \frac{775 - 800}{10}\right) = P(Z < -2.5) = 0.0062.$$

Sampling Distribution
Theorem (Sampling Distribution of the Difference Between Two Means): If independent samples of size $n_1$ and $n_2$ are drawn at random from two populations, discrete or continuous, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then the sampling distribution of the difference of means, $\bar{X}_1 - \bar{X}_2$, is approximately normally distributed with mean and variance given by
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \quad \text{and} \quad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
Hence,
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
is approximately a standard normal variable.
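Both results in this part can be checked numerically in Python. The first lines reproduce Example 2 with `statistics.NormalDist` in place of a printed normal table. The rest is a Monte Carlo check of the difference-of-means theorem for two deliberately non-normal populations, a Uniform(0, 6) population and an Exponential population with mean 2; these populations, the sample sizes $n_1 = 10$, $n_2 = 15$ and the seed are hypothetical choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, pvariance

# Example 2: bulb life with mu = 800 h, sigma = 40 h; sample size n = 16.
se = 40 / sqrt(16)                # standard error of the mean (10 h)
z = (775 - 800) / se              # standardized value
p_example2 = NormalDist().cdf(z)  # P(Z < -2.5)
print(f"se = {se}, z = {z}, P = {p_example2:.4f}")  # prints: se = 10.0, z = -2.5, P = 0.0062

# Difference of two means: simulate Xbar1 - Xbar2 from two hypothetical populations.
def simulate_diffs(n1, n2, reps, seed=4):
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar1 = fmean([rng.uniform(0, 6) for _ in range(n1)])     # Uniform(0, 6)
        xbar2 = fmean([rng.expovariate(0.5) for _ in range(n2)])  # Exponential, mean 2
        diffs.append(xbar1 - xbar2)
    return diffs

n1, n2 = 10, 15
mu1, var1 = 3.0, 36 / 12  # Uniform(0, 6): mean 3, variance 6**2 / 12 = 3
mu2, var2 = 2.0, 4.0      # Exponential with rate 0.5: mean 2, variance 4
diffs = simulate_diffs(n1, n2, reps=20_000)
print("theory:   ", mu1 - mu2, var1 / n1 + var2 / n2)
print("simulated:", round(fmean(diffs), 3), round(pvariance(diffs), 3))
```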
Sampling Distribution (contd.)
Theorem: Sampling Distribution of the Difference Between Two Means (contd.)
Reproductive property of the normal distribution: If $X_1, X_2, \ldots, X_n$ are independent random variables having normal distributions with means $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, then the random variable
$$Y = a_1X_1 + a_2X_2 + \cdots + a_nX_n$$
has a normal distribution with
$$\mu_Y = a_1\mu_1 + a_2\mu_2 + \cdots + a_n\mu_n \quad \text{and} \quad \sigma_Y^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \cdots + a_n^2\sigma_n^2.$$

Example 3
Problem
Two independent experiments are run in which two different types of paint are compared. Eighteen specimens are painted using type A, and the drying time, in hours, is recorded for each. The same is done with type B. The population standard deviations are both known to be 1.0. Assuming that the mean drying time is equal for the two types of paint, find $P(\bar{X}_A - \bar{X}_B > 1.0)$, where $\bar{X}_A$ and $\bar{X}_B$ are the average drying times for samples of size $n_A = n_B = 18$.

Solution
From the sampling distribution of $\bar{X}_A - \bar{X}_B$, we know that the distribution is approximately normal with mean
$$\mu_{\bar{X}_A - \bar{X}_B} = \mu_A - \mu_B = 0$$
and variance
$$\sigma^2_{\bar{X}_A - \bar{X}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B} = \frac{1}{18} + \frac{1}{18} = \frac{1}{9}.$$
Corresponding to the value $\bar{X}_A - \bar{X}_B = 1.0$, we have
$$z = \frac{1 - (\mu_A - \mu_B)}{\sqrt{1/9}} = \frac{1 - 0}{1/3} = 3.0,$$
so
$$P(Z > 3.0) = 1 - P(Z < 3.0) = 1 - 0.9987 = 0.0013.$$

CONCLUSION
In this lecture we learned the basics of sampling distributions. Next, we learned three very important theorems of statistics:
1. Theorem on sampling distribution
2. Central limit theorem
3. Sampling distribution of the difference between two means
Learners are advised to understand these theorems theoretically and also through practice problems. In the next lecture, we will look into more concepts of sampling distributions.
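As a practice check on the last theorem, Example 3 can be reproduced in a few lines of Python. The helper `prob_diff_exceeds` is a hypothetical name introduced here; it applies the normal distribution of $\bar{X}_A - \bar{X}_B$ and uses `statistics.NormalDist` instead of a table. The last part writes $\bar{X}_A - \bar{X}_B$ as a linear combination $Y = \sum a_i X_i$ with coefficients $\pm 1/18$, so the reproductive property gives its variance directly.

```python
from fractions import Fraction
from math import sqrt
from statistics import NormalDist

def prob_diff_exceeds(d, mu_a, mu_b, sigma_a, sigma_b, n_a, n_b):
    """P(Xbar_A - Xbar_B > d) using the normal distribution of the difference of
    two independent sample means (exact when both populations are normal)."""
    mean = mu_a - mu_b
    sd = sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)
    return 1 - NormalDist(mean, sd).cdf(d)

# Example 3: equal means (difference 0), sigma_A = sigma_B = 1.0, n_A = n_B = 18.
z = (1.0 - 0.0) / sqrt(1 / 18 + 1 / 18)
p = prob_diff_exceeds(1.0, 0.0, 0.0, 1.0, 1.0, 18, 18)
print(f"z = {z:.2f}, P = {p:.4f}")  # prints: z = 3.00, P = 0.0013

# Reproductive property: Xbar_A - Xbar_B = sum of a_i * X_i with a_i = +1/18 (type A)
# and a_i = -1/18 (type B); every X_i has variance 1, so Var(Y) = sum of a_i**2.
coeffs = [Fraction(1, 18)] * 18 + [Fraction(-1, 18)] * 18
var_y = sum(a**2 * 1 for a in coeffs)
print(var_y)  # prints: 1/9
```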