Samples and Sampling Distributions Ch 5

P2010 Lecture Notes
Sampling, Sampling Distributions Ch 5
Samples vs. Populations
Population: A complete set of observations or measurements about which conclusions are to be drawn.
Sample: A subset or part of a population.
Not necessarily random
Statistics vs. Parameters
Parameter: A summary characteristic of a population.
Summary of Central tendency, variability, shape, correlation
E.g., Population mean, Population Standard Deviation, Population Median, Proportion of population of
registered voters voting for Bush, Population correlation between Systolic & Diastolic BP
Statistic: A summary characteristic of a sample. Any of the above computed from a sample taken from the
population.
E.g., Sample mean, Sample Standard Deviation, median, correlation coefficient
Inferential Statistics
We take a sample and compute a description of a characteristic of the sample – central tendency (usually),
variability or shape. That is, we compute the value of a sample statistic.
We use the sample statistic to make an educated guess about the corresponding population parameter.
The basic concept is easy. The devil is in the details.
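A minimal sketch of the parameter vs. statistic distinction. It assumes, purely for illustration, that the first twelve IQ scores listed later in these notes are the entire population; the sample size of 4 and the seed are arbitrary choices.

```python
import random
import statistics

# The first twelve IQ scores listed later in these notes, treated here
# (purely for illustration) as if they were the entire population.
population = [86, 99, 96, 123, 96, 100, 102, 95, 112, 98, 117, 116]

# Parameter: a summary characteristic of the whole population.
mu = statistics.mean(population)

# Statistic: the same summary computed from a sample of that population.
rng = random.Random(1)  # fixed seed (arbitrary) so the run is repeatable
sample = rng.sample(population, 4)
sample_mean = statistics.mean(sample)

print("population mean (parameter):", mu)
print("sample:", sample)
print("sample mean (statistic):", sample_mean)
```

The sample mean is the educated guess about the population mean. Changing the seed gives a different sample and usually a different guess.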
Biderman’s P201 Handouts
Topic 10: Probability and Sampling Distributions - 1
2/5/2016
Types of sampling techniques
Random Sampling
Every element of the population must have the same probability of being selected, and every combination
of elements of a given size must have the same probability of being selected.
Usually done by having a computer program generate a “random” order for selection of participants.
Very difficult to achieve in practice.
Systematic Sampling.
Every Kth element of a population. The first person is selected arbitrarily.
xxkxxxxkxxxxkxxxxkxxxxkxxxxkxxxxkxxxxkxxxxkxxxxkxxxxk . . .
Stratified Sampling
Stratum: A subgroup of a population.
When different strata of a population may give different responses to a survey question, survey
researchers will usually attempt to make sure that each stratum is represented in a sample. Such
sampling is called stratified sampling.
Typical strata: Gender groups, Ethnic groups, political groups, likelihood of voting groups.
Convenience Sampling
Taking whoever is available, without any attempt to randomly pick from a population or to stratify.
Most samples in psychology are convenience samples.
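The four techniques can be sketched in Python as follows. The population, the stratum labels "A" and "B", and all function names are hypothetical. Drawing randomly within each stratum is one common way to stratify, assumed here rather than taken from these notes.

```python
import random

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable

# Hypothetical population: 20 people, each tagged with a stratum label.
population = [{"id": i, "stratum": "A" if i < 12 else "B"} for i in range(20)]

def simple_random_sample(pop, n):
    """Every combination of n elements is equally likely to be chosen."""
    return rng.sample(pop, n)

def systematic_sample(pop, k, start):
    """Every k-th element, beginning at an arbitrarily chosen start index."""
    return pop[start::k]

def stratified_sample(pop, n_per_stratum):
    """Sample within each stratum so that every stratum is represented."""
    strata = {}
    for person in pop:
        strata.setdefault(person["stratum"], []).append(person)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen

def convenience_sample(pop, n):
    """Take whoever is available first; no randomization at all."""
    return pop[:n]

print([p["id"] for p in simple_random_sample(population, 4)])
print([p["id"] for p in systematic_sample(population, k=5, start=2)])
print([(p["id"], p["stratum"]) for p in stratified_sample(population, 2)])
print([p["id"] for p in convenience_sample(population, 4)])
```

With k = 5 and start = 2 the systematic sample picks positions 2, 7, 12, 17, the same spacing as the xxkxxxxk pattern shown above.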
The Researcher’s Curse: Variation of sample statistics from sample to sample
Research involves taking samples and making decisions based on the sample results.
Unfortunately, sample characteristics vary from one sample to the next.
So, my decision based on a sample I took might be different from your decision based on a sample you took.
This means that to perform research, we have to know something about how sample characteristics vary
from sample to sample.
Sampling Distributions (Should be called Sample Statistic Distributions)
Consider a population of IQ scores. (Illustrated on Corty p. 139)
Here’s part of the population . . .
86 99 96 123 96 100 102 95 112 98 117 116 111 92 106 110 100 113 113 77
98 81 73 89 92 115 135 110 93 95 72 73 95 125 97 95 95 120 97 95
110 85 100 116 79 101 101 105 82 64 112 116 106 68 126 93 107 99 79 113
93 125 101 111 80 84 85 97 104 123 96 75 91 112 93 77 93 104 106 121
83 108 103 101 123 92 102 111 116 93 83 111 114 72 109 82 88 99 102 96
80 83 121 87 93 73 77 115 111 109 100 87 96 88 95 83 117 120 82 99
106 100 106 85 93 135 90 93 116 115 83 126 107 90 86 70 111 94 88 87
69 93 71 74 106 81 126 89 81 106 104 85 116 97 92 122 103 81 92 106
97 104 108 61 95 104 102 98 93 78 105 54 106 107 109 89 97 83 78 110
98 95 105 121 79 121 118 131 108 91 119 101 133 93 83 88 115 123 101 89
. . .
Now consider taking a sample of size 4 from that population.
Compute the mean of that sample.
Now repeat the above steps thousands upon thousands of times.
The result is a population of sample means.
The frequency distribution of the sample means is called the Sampling Distribution of Means.
[Figure: frequency distribution of the sample means, with a few of the sample means labeled. Horizontal axis: Values of sample mean.]
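The repeated-sampling procedure above can be sketched in Python. The population here is an assumed stand-in (10,000 roughly normal scores with mean 100 and SD 15), not the population from Corty or from these notes. The function name, seed, and bin width are illustrative choices.

```python
import random
import statistics

rng = random.Random(2016)  # fixed seed (arbitrary) so the run is repeatable

# Assumed stand-in population: 10,000 IQ-like scores, roughly normal with
# mean 100 and SD 15 (these values are assumptions, not from the notes).
population = [round(rng.gauss(100, 15)) for _ in range(10_000)]

def sample_means(pop, n, reps):
    """Draw `reps` random samples of size n and return the mean of each."""
    return [statistics.mean(rng.sample(pop, n)) for _ in range(reps)]

means_4 = sample_means(population, 4, 5_000)

# Crude text histogram of the sample means, in bins 5 points wide.
bins = {}
for m in means_4:
    lo = int(m // 5) * 5
    bins[lo] = bins.get(lo, 0) + 1
for lo in sorted(bins):
    print(f"{lo:>4}-{lo + 5:<4} {'*' * (bins[lo] // 25)}")

print("mean of the population:  ", round(statistics.mean(population), 2))
print("mean of the sample means:", round(statistics.mean(means_4), 2))
```

The printed histogram is a rough picture of the Sampling Distribution of Means for samples of size 4. Passing 25 instead of 4 gives a noticeably narrower pile.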
Simulating taking samples from a population . . .
Open and run the Syntax file “Input program to simulate sampling disltribution of means.sps”.
Dot plot of population . . .
A few means of samples of size 4 . . .
Report for y, N = 4 in each sample
Mean:            88.25   111.25  95.00   97.50   109.50  94.00   100.00  95.50
Std. Deviation:  11.815  19.873  23.721  8.347   12.897  16.793  12.884  14.012

A few means of samples of size 25 . . .
Report for y, N = 25 in each sample
Mean:            102.08  99.88   100.24  102.68  102.56  98.76   101.28  100.60  101.20
Std. Deviation:  14.370  12.303  13.959  13.548  15.589  15.199  15.339  14.483  19.489
Three theoretical facts and one practical fact about the distribution of sample means . . .
The theoretical facts are about 1) central tendency, 2) variability, and 3) shape . . .
1. The mean of the population of sample means will be the same as the mean of the population from which
the samples were taken. The mean of the means is the mean. (µM = µ, from Corty, p. 140.)
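Implication: The sample mean is an unbiased estimate of the population mean. If you take a random
sample from a population, its mean has no systematic tendency to be smaller or larger than the population
mean: averaged over all possible samples, the overestimates and underestimates cancel. (When the sampling
distribution is symmetric, the sample mean is also just as likely to be smaller than the population mean as
it is to be larger.)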
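Fact 1 can be checked exactly on a population small enough to list every possible sample. The sketch below assumes a made-up population of four scores and samples of size 2 drawn with replacement, so that all 16 samples can be enumerated; none of these values come from the notes.

```python
from itertools import product
from statistics import mean

# Tiny hypothetical population (illustrative values only).
population = [2, 4, 6, 12]
mu = mean(population)

# Every possible ordered sample of size 2, drawn with replacement
# (an assumption made so that all 4 * 4 = 16 samples can be listed).
all_means = [mean(s) for s in product(population, repeat=2)]

print("population mean:", mu)
print("number of possible samples:", len(all_means))
print("mean of all sample means:", mean(all_means))
```

Individual sample means range from 2 to 12, but their average equals the population mean of 6.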
2. The standard deviation of the population of sample means, called the standard error of the mean, will be
equal to the original population's standard deviation divided by the square root of N, the size of each
sample. (Corty, Eq. 5.1, p. 142)
In Corty’s notation,
σM = σ / √N
The standard deviation of the sample means (σM) is called the standard error of the mean.
Implication: Means are less variable than individual scores. Means are likely to be closer to the population
mean than individual scores. You can make a sample mean as close as you want to the population mean if
you can afford a large sample.
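A short worked example of the standard error formula, followed by a simulation check. The population SD of 15 and mean of 100 are assumed (conventional IQ values, not stated in these notes). The function names and seed are illustrative.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

# Assumed population SD of 15 (a conventional IQ value, not from the notes).
sigma = 15
for n in (4, 25, 100):
    print(f"N = {n:>3}: standard error = {standard_error(sigma, n):.2f}")

# Simulation check with an assumed normal population, mean 100 and SD 15.
rng = random.Random(5)  # fixed seed (arbitrary) so the run is repeatable

def simulated_se(n, reps=20_000):
    """SD of the means of `reps` simulated samples of size n."""
    means = [
        statistics.fmean([rng.gauss(100, sigma) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.pstdev(means)

print(f"simulated SE for N = 25: {simulated_se(25):.2f}")
```

The formula gives 7.50, 3.00, and 1.50 for N = 4, 25, and 100. Halving the standard error takes four times the sample size, which is why "if you can afford a large sample" matters.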
3. The shape of the distribution of the population of sample means will be the normal distribution if the
original distribution is normal, and will approach the normal as N gets larger in other cases (for any
population with a finite standard deviation). This fact is called the Central Limit Theorem. It is the
foundation upon which most of modern-day inferential statistics rests. See Corty, p. 141.
Why do we care about #3? Because we’ll need to compute probabilities associated with sample means when
doing inferential statistics. To compute those probabilities, we need a probability distribution.
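A minimal simulation sketch of fact 3. It assumes a strongly right-skewed population (exponential with mean 1, chosen only for illustration) and uses skewness as a rough index of non-normal shape. The helper names and seed are hypothetical.

```python
import random
import statistics

rng = random.Random(10)  # fixed seed (arbitrary) so the run is repeatable

def skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_means(n, reps=5_000):
    """Means of `reps` samples of size n from a right-skewed population
    (exponential with mean 1, an assumption made for illustration)."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

for n in (1, 4, 25, 100):
    print(f"N = {n:>3}: skewness of sample means = {skewness(sample_means(n)):.2f}")
```

The skewness should shrink toward 0 as N grows, which is the "approach the normal" part of the theorem. Skewness is only one aspect of shape, so this is a partial check rather than a proof.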
Practical fact
4. The distribution of Z's computed from each sample, using the formula
Z = (X-bar − µ) / (σ / √N)
will be or approach (as sample size gets large) the Standard Normal Distribution with mean = 0
and SD = 1.
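A sketch of the Z computation, reading the formula as Z = (sample mean − µ) / (σ / √N). The population values µ = 100 and σ = 15 are assumed (conventional IQ values, not stated in these notes). The sample mean 111.25 is one of the size-4 means shown earlier. The function name and seed are illustrative.

```python
import math
import random
import statistics

def z_for_sample_mean(sample_mean, mu, sigma, n):
    """Z = (sample mean - population mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))

# Worked example with assumed population values mu = 100 and sigma = 15.
z = z_for_sample_mean(111.25, 100, 15, 4)
print("Z for a size-4 sample mean of 111.25:", z)  # (111.25 - 100) / 7.5 = 1.5

# Simulation: Z's from many samples of size 4 drawn from an assumed
# normal population should have mean near 0 and SD near 1.
rng = random.Random(3)  # fixed seed (arbitrary) so the run is repeatable
zs = []
for _ in range(10_000):
    sample = [rng.gauss(100, 15) for _ in range(4)]
    zs.append(z_for_sample_mean(statistics.fmean(sample), 100, 15, 4))

print(f"mean of Z's = {statistics.fmean(zs):.2f}, SD of Z's = {statistics.pstdev(zs):.2f}")
```

Converting sample means to Z's is what allows one table, the Standard Normal, to supply probabilities for any sample size.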
Another test question: What are three facts about the distribution of sample means – a fact about central
tendency, a fact about variability, and a fact about shape of the distribution of sample means?