Uploaded by ruth.liprini

Statistics for the social sciences - sampling distribution of the mean

STATISTICS FOR THE SOCIAL SCIENCES: A
SESSION 8: PROBABILITY AND SAMPLES
INTRODUCTION
On a piece of paper, rate how cute you think Delta is
• I am going to calculate:
– Population mean
– Means for each of the samples I draw
• Key things to notice:
1. The mean of each sample ≠ the mean of the population
2. The means of each sample ≠ each other
• On point 1: The discrepancy between the sample mean and the population parameter is called
sampling error
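The two points above can be tried out with a small simulation. This is only a sketch: the population of 200 ratings on a 1-10 scale, the sample size of 10, and the helper name `sampling_errors` are all made up for illustration.

```python
import random
import statistics

random.seed(8)  # fixed seed so the run is repeatable

# Hypothetical population: 200 ratings on a 1-10 scale (illustrative data only).
population = [random.randint(1, 10) for _ in range(200)]
mu = statistics.mean(population)

def sampling_errors(population, n, n_samples):
    """Draw n_samples random samples of size n.

    Returns a list of (sample mean, sampling error) pairs, where
    sampling error = sample mean - population mean.
    """
    pop_mean = statistics.mean(population)
    pairs = []
    for _ in range(n_samples):
        sample = random.sample(population, n)  # one random sample, without replacement
        m = statistics.mean(sample)
        pairs.append((m, m - pop_mean))
    return pairs

results = sampling_errors(population, n=10, n_samples=5)
print(f"population mean = {mu:.2f}")
for m, err in results:
    print(f"M = {m:.2f}, sampling error = {err:+.2f}")
```

Each printed sample mean typically differs from the population mean and from the other sample means; the signed difference in the last column is that sample's sampling error.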
DISTRIBUTION OF SAMPLE MEANS
• A THEORETICAL distribution: what we would get if we collected all the possible random samples of a
given size from a population and plotted their means as a distribution
• A sampling distribution: a distribution of statistics (remember, statistics refers to samples)
obtained by selecting all the possible samples of a specific size from a population
– We could plot means, or standard deviations, or variances. NB: we are plotting information from
each sample into a new distribution
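For a very small population the "all possible samples" idea can be carried out literally. The sketch below assumes a hypothetical population of four scores (2, 4, 6, 8) and samples of size n = 2 drawn with replacement; none of these numbers come from the slides.

```python
from collections import Counter
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]  # hypothetical population of N = 4 scores
n = 2                      # sample size

# Every possible sample of size n, drawn with replacement: 4**2 = 16 samples.
all_samples = list(product(population, repeat=n))

# One statistic (the mean) per sample.
sample_means = [mean(s) for s in all_samples]

# Frequency table of the sample means: this is the distribution of sample means.
freq = Counter(sample_means)
for m in sorted(freq):
    print(f"M = {m}: {'#' * freq[m]}")

print("population mean:", mean(population))
print("mean of all sample means:", mean(sample_means))
```

The printed frequency table piles up around the population mean of 5 and thins out towards 2 and 8, and the mean of all 16 sample means equals the population mean.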
DISTRIBUTION OF SAMPLE MEANS: WHY??
• The ultimate reason is for inferential statistics, where we will have to include the concept of
error in our calculations to account for the fact that our sample statistics do not accurately
reflect the population parameters.
• We are going to go on a logical journey:
What the distribution of sample means is
→ Characteristics of DoSM
→ Central limit theorem
→ Standard error of M
→ Z-scores and probability for sample means
→ Inferential stats and the DoSM
SAMPLES AND DISTRIBUTIONS
SAMPLE DISTRIBUTION
• So far we have spoken about populations and samples – the distribution of our sample is called a
sample distribution
• Is practical
• Created by drawing a sample of scores from a population and plotting each of the scores onto a
frequency distribution table
• Plots individual scores drawn from a single sample of the population
SAMPLING DISTRIBUTION
• Is theoretical
• Created by drawing all possible random samples from a population, calculating a statistic (e.g.
mean) for each of these samples, and then plotting each of the sample statistics onto a frequency
distribution table
• Plots individual sample statistics calculated from multiple samples drawn from a population
CHARACTERISTICS OF THE DISTRIBUTION OF SAMPLE MEANS
1. The distribution of sample means almost always approximates a normal-shaped distribution
– It should make sense that most of the means will cluster around the population mean (μ) and it is
relatively rare to find sample means that differ greatly from the population mean
– This happens even when the associated population doesn’t have a normal distribution (because
#mathematicians)
2. The larger the sample size, the closer the sample means will be to the population mean
– Stated another way, as sample size increases, the sampling distribution becomes more compact
(clustered around the mean)
CENTRAL LIMIT THEOREM
• This theorem provides us with a precise description of the distribution that would be obtained
if you selected every possible sample, calculated every sample mean, and constructed the
distribution of sample means (again, because #mathematicians)
“For any population with mean μ and standard deviation σ, the
distribution of sample means for sample size n will have a mean of μ and
a standard deviation of σ/√n and will approach a normal distribution as n
approaches infinity.”
• Where it says “the distribution of sample means will approach a normal distribution”:
– In practice the magic number is n = 30: by that sample size the distribution is almost perfectly normal
http://onlinestatbook.com/stat_sim/
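The mean and standard deviation claims in the theorem can also be checked with a rough simulation. The sketch below assumes a deliberately skewed population (exponential with rate 1, so μ = 1 and σ = 1) and approximates "every possible sample" with 5000 random samples; both choices, and the helper name `simulate_sample_means`, are illustrative only. It does not test the shape of the distribution.

```python
import random
import statistics
from math import sqrt

random.seed(30)  # fixed seed so the run is repeatable

MU, SIGMA = 1.0, 1.0  # an exponential population with rate 1 has mean 1 and SD 1

def simulate_sample_means(n, n_samples=5000):
    """Means of n_samples random samples of size n from the skewed population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]

results = {}
for n in (2, 30):
    means = simulate_sample_means(n)
    # Mean and SD of the simulated distribution of sample means.
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(
        f"n = {n:>2}: mean of sample means = {results[n][0]:.3f} (mu = {MU}), "
        f"SD of sample means = {results[n][1]:.3f} "
        f"(sigma/sqrt(n) = {SIGMA / sqrt(n):.3f})"
    )
```

For both sample sizes the mean of the sample means should land close to μ, and the standard deviation of the sample means should land close to σ/√n, shrinking as n grows.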
CENTRAL LIMIT THEOREM: SHAPE OF DOSM
• The shape of the distribution of sample means tends to be nearly perfectly normal if either of
these conditions holds:
– The population from which the samples are drawn is a normal distribution
– The number of scores per sample is relatively large (n ≥ 30)
CENTRAL LIMIT THEOREM: MEAN OF DOSM
• The mean of the distribution of sample means = the population mean (μ)
• The mean of the DoSM is called the expected value of M, as the sample means are “expected” to
be near the population mean
µM = µ
• We use the Greek letter µ because the distribution of means is a kind of population
CENTRAL LIMIT THEOREM: STANDARD DEVIATION
• The standard deviation of a DoSM is called the standard error of M and is denoted by σM
• Why is it called standard error?
– It provides an estimate of how much distance is expected, on average, between a sample mean and a
population mean
– Because we would ideally want our sample mean to = our population mean, any deviations are
considered “error”
• The standard error describes how spread out the sample means are (variability of the sample means)
– When it is small, the sample means are close together (clustered around the mean)
– When it is large it implies that the sample means are spread out (big differences from one mean to
another)
• The standard error also measures how well an individual sample mean represents the entire
distribution by telling us how much distance is reasonable to expect between sample mean and
overall mean
• The standard error (σM) is calculated:
σM = σ / √n
OR
σM = √(σ² / n)
• The magnitude of standard error is consequently affected by
– Sample size
– Population standard deviation
• Sample size: the greater the sample size, the more accurately the sample mean represents the
population mean (as sample size ↑, error ↓)
• Standard deviation: the larger σ is, the larger the standard error; when a sample consists of a
single score (n = 1), σM = σ
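The effect of sample size on standard error is easy to tabulate. In this sketch the function name `standard_error` and the value σ = 100 are illustrative assumptions; the code simply evaluates the two equivalent formulas for a few sample sizes.

```python
from math import isclose, sqrt

def standard_error(sigma, n):
    """Standard error of M for population SD sigma and sample size n: sigma / sqrt(n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma / sqrt(n)

sigma = 100  # illustrative population standard deviation

for n in (1, 4, 16, 100):
    se = standard_error(sigma, n)
    se_from_variance = sqrt(sigma ** 2 / n)  # second form: sqrt(variance / n)
    assert isclose(se, se_from_variance)     # the two formulas agree
    print(f"n = {n:>3}: standard error = {se:.1f}")
```

With n = 1 the standard error equals σ, and each fourfold increase in n halves it.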
THE THREE DISTRIBUTIONS
Z-SCORES AND PROBABILITY OF SAMPLE MEANS
• The process for calculating z-scores of sample means is the same as the calculation for
individual scores, except we infer some information:
– The distribution is normal
– The distribution’s mean = population mean
– Standard error can be calculated
• E.g. population of Stats A scores forms a normal distribution, where μ = 500 and σ = 100. If I
take a random sample of n = 16 students, what is the probability that the sample mean will be
greater than M = 525?
1. Reconceptualise as a proportions problem
2. Infer the relevant characteristics:
– Distribution is normal because population distribution is normal
– Distribution has a mean of 500 because μ = 500
– For n = 16, standard error is:
– σM = σ / √n = 100 / √16 = 100 / 4 = 25
3. Calculate the z-score for the stipulated mean score
z = (M − μ) / σM = (525 − 500) / 25 = 25 / 25 = +1
4. Use the unit normal table to determine the proportions associated with z = +1. The table
indicates that 0,1587 (15,87%) of the distribution is in the tail
5. It is therefore relatively unlikely, p = 0,1587, to obtain a random sample of n = 16 students
with an average Stats A score > 525.
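The same steps can be checked in a few lines of Python. This is a sketch that swaps the unit normal table for the standard library's `statistics.NormalDist` (Python 3.8+); the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 500, 100  # population mean and standard deviation from the example
n, M = 16, 525        # sample size and the sample mean of interest

sigma_M = sigma / sqrt(n)    # step 2: standard error of M
z = (M - mu) / sigma_M       # step 3: z-score for the sample mean
p = 1 - NormalDist().cdf(z)  # step 4: proportion in the tail above z

print(f"sigma_M = {sigma_M}, z = {z:+.2f}, p = {p:.4f}")
# prints: sigma_M = 25.0, z = +1.00, p = 0.1587
```

The tail proportion matches the table value of 0,1587 used in step 4.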
MORE ABOUT ERROR
SAMPLING ERROR
• Refers to the idea that a sample typically does not provide a perfectly accurate description of
its population
• There will therefore be some discrepancy between the mean of a sample and the mean of the
corresponding population
• The discrepancy is random (sample dependent)
STANDARD ERROR
• A measure of the standard/typical distance between the population mean and a sample mean
• Size of standard error depends on size of samples
• Discrepancy is not random
(Figure labels: SEM, µ, µM)
STANDARD ERROR & INFERENTIAL STATS
• When we move into inferential stats calculations, we often use the measure of standard error in
our calculations. E.g.:
z = (X − μ) / σ
t = (M − μ) / sM
• We can also use data from two separate samples to draw inferences about the mean difference
between two populations / treatment conditions
EXAMPLE