P2010 Lecture Notes
Sampling, Sampling Distributions Ch 5

Samples vs. Populations

Population: A complete set of observations or measurements about which conclusions are to be drawn.

Sample: A subset or part of a population. A sample is not necessarily random.

Statistics vs. Parameters

Parameter: A summary characteristic of a population, such as a summary of central tendency, variability, shape, or correlation.
E.g., population mean, population standard deviation, population median, proportion of the population of registered voters voting for Bush, population correlation between systolic and diastolic BP.

Statistic: A summary characteristic of a sample. Any of the above computed from a sample taken from the population.
E.g., sample mean, sample standard deviation, sample median, sample correlation coefficient.

Inferential Statistics

We take a sample and compute a description of a characteristic of the sample: central tendency (usually), variability, or shape. That is, we compute the value of a sample statistic.

We use the sample statistic to make an educated guess about the corresponding population parameter.

The basic concept is easy. The devil is in the details.

Biderman’s P201 Handouts Topic 10: Probability and Sampling Distributions - 1 2/5/2016

Types of sampling techniques

Random Sampling
Every element of the population must have the same probability of being selected, and every combination of elements must have the same probability of being selected. Usually done by having a computer program generate a “random” order for selection of participants. Very difficult to achieve in practice.

Systematic Sampling
Every Kth element of a population is selected. The first person is selected arbitrarily.
xxkxxxxkxxxxkxxxxkxxxxkxxxxkxxxxkxxxxkxxxxkxxxxkxxxxk . . .

Stratified Sampling
Stratum: A subgroup of a population.
When different strata of a population may give different responses to a survey question, survey researchers will usually attempt to make sure that each stratum is represented in a sample.
Such sampling is called stratified sampling. Typical strata: gender groups, ethnic groups, political groups, likelihood-of-voting groups.

Convenience Sampling
Taking whoever is available, without any attempt to randomly pick from a population or to stratify. Most samples in psychology are convenience samples.

The Researcher’s Curse: Variation of sample statistics from sample to sample

Research involves taking samples and making decisions based on the sample results.

Unfortunately, sample characteristics vary from one sample to the next.

So, my decision based on a sample I took might be different from your decision based on a sample you took.

This means that to perform research, we have to know something about how sample characteristics vary from sample to sample.

Sampling Distributions (Should be called Sample Statistic Distributions)

Consider a population of IQ scores. (Illustrated in Corty, p. 139.)

Here’s part of the population . . .

86 99 96 123 96 100 102 95 112 98 117 116 111 92 106 110 100 113 113 77 98 81 73 89 92 115 135 110 93 . . . 95 72 73 95 125 97 95 95 120 97 95 110 85 100 116 79 101 101 105 82 64 112 116 106 68 126 93 107 99 79 113 93 125 101 111 80 84 85 97 104 123 96 75 91 112 93 77 93 104 106 121 83 108 103 101 123 92 102 111 116 93 83 111 114 72 109 82 88 99 102 96 80 83 121 87 93 73 77 115 111 109 100 87 96 88 95 83 117 120 82 99 106 100 106 85 93 135 90 93 116 115 83 126 107 90 86 70 111 94 88 87 69 93 71 74 106 81 126 89 81 106 104 85 116 97 92 122 103 81 92 106 97 104 108 61 95 104 102 98 93 78 105 54 106 107 109 89 97 83 78 110 98 95 105 121 79 121 118 131 108 91 119 101 133 93 83 88 115 123 101 89

Now consider taking a sample of size 4 from that population. Compute the mean of that sample.

Now repeat the above steps thousands upon thousands of times.

The result is a population of sample means.
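The procedure just described (draw a sample of size 4, compute its mean, repeat many times) can be sketched in Python. This is only an illustrative sketch, not the course's SPSS syntax file. The population here is a hypothetical set of 10,000 IQ-like scores generated with mean 100 and SD 15 (values assumed for illustration, not taken from Corty), and the function name `simulate_sample_means` is made up for this example.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples of sample_size scores and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.mean(rng.sample(population, sample_size))  # sample without replacement
        for _ in range(n_samples)
    ]


# Hypothetical population: 10,000 IQ-like scores (mean 100 and SD 15 are assumed values).
pop_rng = random.Random(1)
population = [round(pop_rng.gauss(100, 15)) for _ in range(10_000)]

# The "population of sample means": 5,000 means of samples of size 4.
sample_means = simulate_sample_means(population, sample_size=4, n_samples=5_000)

print("Population mean:       ", round(statistics.mean(population), 2))
print("Mean of sample means:  ", round(statistics.mean(sample_means), 2))
print("First few sample means:", sample_means[:5])
```

A histogram or dot plot of `sample_means` would be one way to picture the resulting distribution.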
The frequency distribution of the sample means is called the Sampling Distribution of Means.

[Figure: frequency distribution of a few of the sample means. Horizontal axis: values of sample mean.]

Simulating taking samples from a population . . .

Open and run the Syntax file “Input program to simulate sampling disltribution of means.sps”.

[Figure: dot plot of population]

A few means of samples of size 4 . . .

Report: y
Mean      N    Std. Deviation
 88.25    4    11.815
111.25    4    19.873
 95.00    4    23.721
 97.50    4     8.347
109.50    4    12.897
 94.00    4    16.793
100.00    4    12.884
 95.50    4    14.012

A few means of samples of size 25 . . .

Report: y
Mean      N    Std. Deviation
102.08    25   14.370
 99.88    25   12.303
100.24    25   13.959
102.68    25   13.548
102.56    25   15.589
 98.76    25   15.199
101.28    25   15.339
100.60    25   14.483
101.20    25   19.489

Three theoretical facts and one practical fact about the distribution of sample means . . .

The theoretical facts are about 1) central tendency, 2) variability, and 3) shape . . .

1. The mean of the population of sample means will be the same as the mean of the population from which the samples were taken. The mean of the means is the mean.

$\mu_M = \mu$ (from Corty, p. 140.)

Implication: The sample mean is an unbiased estimate of the population mean. If you take a random sample from a population, its mean has no systematic tendency to fall below or above the population mean; when the sampling distribution is symmetric, it is just as likely to be smaller than the population mean as it is to be larger.

2.
The standard deviation of the population of sample means, called the standard error of the mean, will be equal to the original population's standard deviation divided by the square root of N, the size of each sample. (Corty, Eq. 5.1, p. 142)

In Corty’s notation,

$$\sigma_M = \frac{\sigma}{\sqrt{N}}$$

The standard deviation of the sample means ($\sigma_M$) is called the standard error of the mean.

Implication: Means are less variable than individual scores. Means are likely to be closer to the population mean than individual scores. You can make it as likely as you want that a sample mean falls close to the population mean if you can afford a large enough sample.

3. The shape of the distribution of the population of sample means will be the normal distribution if the original distribution is normal, or will approach the normal as N gets larger in all other cases.

This fact is called the Central Limit Theorem. It is the foundation upon which most of modern-day inferential statistics rests. See Corty, p. 141.

Why do we care about #3? Because we’ll need to compute probabilities associated with sample means when doing inferential statistics. To compute those probabilities, we need a probability distribution.

Practical fact

4. The distribution of Z's computed from each sample, using the formula

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}}$$

will be or approach (as sample size gets large) the Standard Normal Distribution with mean = 0 and SD = 1.

Another test question: What are three facts about the distribution of sample means: a fact about central tendency, a fact about variability, and a fact about the shape of the distribution of sample means?
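The three theoretical facts and the practical fact can be checked by simulation. The sketch below is an illustration under stated assumptions, not the SPSS program used with these notes. It uses a hypothetical, deliberately skewed population (exponential with mean near 100) so that the effect of N on shape is visible, and it samples with replacement so that $\sigma / \sqrt{N}$ applies exactly. The helper name `skewness` and all variable names are made up for this example.

```python
import math
import random
import statistics


def skewness(values):
    """Skewness of a list of values: 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


rng = random.Random(2010)  # fixed seed so the run is repeatable

# Hypothetical skewed population (assumed for illustration only).
population = [rng.expovariate(1 / 100) for _ in range(20_000)]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

REPS = 4_000  # number of samples drawn for each sample size
results = {}
for n in (4, 25):
    # rng.choices samples with replacement.
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(REPS)]
    se_theory = sigma / math.sqrt(n)            # Fact 2: standard error of the mean
    z = [(m - mu) / se_theory for m in means]   # Fact 4: Z for each sample mean
    results[n] = {
        "mean_of_means": statistics.fmean(means),  # Fact 1: should be near mu
        "sd_of_means": statistics.pstdev(means),   # Fact 2: should be near se_theory
        "se_theory": se_theory,
        "z_mean": statistics.fmean(z),             # Fact 4: should be near 0
        "z_sd": statistics.pstdev(z),              # Fact 4: should be near 1
        "skew_of_means": skewness(means),          # Fact 3: should shrink as n grows
    }

print(f"population: mu = {mu:.2f}, sigma = {sigma:.2f}, skew = {skewness(population):.2f}")
for n, r in results.items():
    print(
        f"N = {n:2d}: mean of means = {r['mean_of_means']:.2f}, "
        f"SD of means = {r['sd_of_means']:.2f} (theory {r['se_theory']:.2f}), "
        f"Z mean = {r['z_mean']:.2f}, Z SD = {r['z_sd']:.2f}, "
        f"skew of means = {r['skew_of_means']:.2f}"
    )
```

With this skewed population, the skewness of the sample means should move toward 0 as N goes from 4 to 25, which is the Central Limit Theorem at work, while the SD of the means should track $\sigma / \sqrt{N}$ at both sample sizes.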