Psych 524, 10/10/05
Normal Distributions and Sampling Distributions (based on Kirk, Ch.9)
The Normal Distribution (9.1)
The normal distribution has many applications in science.
Many populations are approximately normally distributed (e.g., weight, height, IQ scores).
Binomial distributions approach a normal distribution as the number of trials increases.
But binomial distributions are just one class of a larger set of sampling
distributions (distributions of sample statistics), which are also often
approximately normally distributed. More on this later…
The normal distribution is defined by an equation which tells us what the height
of the distribution is for any given score:
$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(X-\mu)^2/(2\sigma^2)}$$
(remember, π ≈ 3.142 and e ≈ 2.718)
You will not be asked to apply this formula! However, please note that the
formula is based on the values of µ and σ. Thus, the height of the curve will
vary based on the mean and the standard deviation.
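To see how μ and σ enter the formula, here is a minimal Python sketch that evaluates f(X) directly. The helper name `normal_pdf` and the IQ-style values (mean 100, SDs of 10, 15, and 20) are illustrative assumptions.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve at score x (hypothetical helper for illustration)."""
    coefficient = 1.0 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Height of the curve at the mean for three different SDs:
# a larger SD spreads the same total area out, so the peak is lower.
for sigma in (10, 15, 20):
    print(f"sigma = {sigma}: height at the mean = {normal_pdf(100, mu=100, sigma=sigma):.4f}")
```

Changing μ slides the whole curve along the score axis without changing its height; changing σ changes both its spread and its height.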
It is also important to know that the points one standard deviation on either side
of the mean are the inflection points, where the curvature changes direction:
between them the curve is concave down (bulging upward around the mean), and
beyond them it is concave up (flattening out toward the tails).
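The inflection-point claim can be checked numerically. This sketch uses the Python standard library's `statistics.NormalDist` for the curve and a central finite-difference estimate of the second derivative (negative means concave down, positive means concave up). The step size `h`, the helper `second_difference`, and the IQ-style parameters are assumptions chosen for illustration, and a finite difference only approximates the true derivative.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # illustrative IQ-style parameters
curve = NormalDist(mu, sigma)


def second_difference(x: float, h: float = 0.01) -> float:
    """Approximate second derivative of the curve at x (hypothetical helper)."""
    return (curve.pdf(x + h) - 2 * curve.pdf(x) + curve.pdf(x - h)) / h ** 2


# Points expressed as multiples of sigma above the mean.
for k in (0.0, 0.5, 0.9, 1.1, 1.5, 2.0):
    x = mu + k * sigma
    shape = "concave down" if second_difference(x) < 0 else "concave up"
    print(f"{k:.1f} SD above the mean (X = {x:.1f}): {shape}")
```

The sign flips between 0.9 and 1.1 SD; by symmetry the same change happens one SD below the mean.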
Converting from Scores to Z-Scores to Areas (9.1)
Because the formula tells us the height of the curve at every score (and, by
extension, the area under the curve over any interval of scores), we can
convert back and forth between scores and areas.
Instead of using the formula, though, we can use computers or tables (e.g., Table
D.2) to help us with this.
Because there are an infinite number of normal distributions (because there are an
infinite number of possible means and standard deviations!), using a table to
perform these conversions might become unwieldy…we would have to use an
infinite number of tables.
But all normal distributions share the same characteristic shape: they are
symmetric and bell-shaped, with the inflection points occurring one standard
deviation on either side of the mean.
[Figure: standard normal distribution]
If we convert all scores into standard deviation units (e.g., a score of 115 on
an IQ test with a mean of 100 and a standard deviation of 15 is one standard
deviation unit above the mean), we will not change the basic shape of the
distribution. Instead, this linear transformation changes the scale into one
that can be consistently applied to all normal distributions (mean is 0 and
standard deviation is 1, with the inflection points occurring at one SD unit
above and below the mean).
The distribution that results from this scale change is referred to as the
standard normal distribution.
The scale of the scores is now in standard deviation units, and these scores are
referred to as standard scores, or z-scores.
The formula for computing z-scores takes the general form of
z = (original score-mean)/standard deviation
Again, we’re just quantifying how many standard deviation units a score lies from
the mean. These standard scores can be positive (above the mean) or negative
(below the mean).
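The general z-score formula translates directly into code. In this minimal Python sketch the function name `z_score` and the small set of scores are hypothetical; the last part checks that standardizing a whole set of scores (treated as a population, hence `pstdev`) gives a mean of 0 and an SD of 1.

```python
import statistics


def z_score(score: float, mean: float, sd: float) -> float:
    """Number of SD units a score lies above (+) or below (-) the mean (hypothetical helper)."""
    return (score - mean) / sd


# IQ example from the text: mean 100, SD 15.
print(z_score(115, 100, 15))  # 1.0  (one SD above the mean)
print(z_score(70, 100, 15))   # -2.0 (two SDs below the mean)

# Standardizing a whole (hypothetical) set of scores treated as a population.
scores = [70, 85, 100, 115, 130]
m, s = statistics.mean(scores), statistics.pstdev(scores)
z_scores = [z_score(x, m, s) for x in scores]
print(round(statistics.mean(z_scores), 10), round(statistics.pstdev(z_scores), 10))
```

Because the conversion only subtracts a constant and divides by a constant, it relabels the axis without changing the shape of the distribution.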
Because normal distributions exist in many situations, the exact symbols we use
for the z-score formula vary from context to context. For example:
for a population: $z = \frac{X - \mu}{\sigma}$
for a sample: $z = \frac{X - \bar{X}}{S}$
for a sampling distribution of the mean: $z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$
…more on this later!
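The third formula has the same structure as the first; only the mean and SD of the relevant distribution change. This hedged Python sketch uses hypothetical numbers (population mean 100, SD 15, samples of n = 25) and computes σ_X̄ with the standard-error formula σ_X/√n discussed later in these notes. The function names are illustrative.

```python
import math


def z_for_score(x: float, mu: float, sigma: float) -> float:
    """z-score of one individual score in the population of scores."""
    return (x - mu) / sigma


def z_for_sample_mean(sample_mean: float, mu: float, sigma: float, n: int) -> float:
    """z-score of a sample mean within the sampling distribution of the mean."""
    standard_error = sigma / math.sqrt(n)  # SD of the SDOM
    return (sample_mean - mu) / standard_error


print(z_for_score(106, mu=100, sigma=15))              # 0.4
print(z_for_sample_mean(106, mu=100, sigma=15, n=25))  # 2.0
```

A single score of 106 is unremarkable, but a mean of 106 across 25 people lies two standard errors above the population mean.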
To use the tables to convert a score (or a statistic) into an area under the
curve, we must first convert the score into a z-score using the appropriate z-score formula. The table then gives the corresponding area under the
curve. Refer to page 1 of “In Class Exercise”
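Software can stand in for the table. In this Python sketch, `statistics.NormalDist().cdf` returns the area under the standard normal curve to the left of a z-score. The helper `area_below` and the IQ values are illustrative assumptions; printed tables may report a different area (for example, between the mean and z), so check what your table lists.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def area_below(x: float, mu: float, sigma: float) -> float:
    """Proportion of a normal population scoring below x (hypothetical helper)."""
    z = (x - mu) / sigma              # step 1: score -> z-score
    return standard_normal.cdf(z)     # step 2: z-score -> area to the left of z


# IQ scores with mean 100 and SD 15:
below_115 = area_below(115, 100, 15)
between_85_115 = area_below(115, 100, 15) - area_below(85, 100, 15)
above_130 = 1 - area_below(130, 100, 15)
print(f"{below_115:.4f} {between_85_115:.4f} {above_130:.4f}")
```

This prints 0.8413 0.6827 0.0228: about 84% of people score below 115, about 68% fall within one SD of the mean, and about 2.3% score above 130.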
Converting from Areas to Z-Scores to Scores (9.1)
Here we reverse the process: we start with a known area, look up that area in the
table to get the z-score, and then convert the z-score to a score.
To make this final conversion, we can use the same z-score formulas given earlier,
solved for X. So, for example, for a population:
$X = \mu + z\sigma$
Refer to page 2 of “In Class Exercise”
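The reverse direction can be sketched the same way. Here `NormalDist().inv_cdf` plays the role of looking up an area in the table, and the second step applies X = μ + zσ. The helper name `score_at_area` and the "top 10%" example are illustrative assumptions.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def score_at_area(area_below: float, mu: float, sigma: float) -> float:
    """Score X with the given proportion of the population below it (hypothetical helper)."""
    z = standard_normal.inv_cdf(area_below)  # step 1: area -> z-score
    return mu + z * sigma                    # step 2: z-score -> score (X = mu + z*sigma)


# IQ score marking the top 10% (i.e., 90% of the area lies below it):
print(round(score_at_area(0.90, 100, 15), 1))
```

This prints 119.2, since the z-score with 90% of the area below it is about 1.28.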
Z-scores and Percentile Ranks
Note that the area under the curve represents a proportion; therefore, z-scores can
easily be converted to percentile ranks (the percentile rank of a score is 100 times
the proportion of the area lying below its z-score).
These relationships can be depicted on a normal curve.
It’s important to note that if we were to graph percentile rank on the x-axis using
equal intervals, the resulting distribution would be rectangular (flat).
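A quick simulation illustrates the rectangular shape. This Python sketch draws z-scores from a standard normal distribution, converts each to a percentile rank, and counts how many ranks fall in each 10-point interval. The seed, the number of draws, the 10-point bins, and the helper `percentile_rank` are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist

standard_normal = NormalDist()


def percentile_rank(z: float) -> float:
    """Percentage of a normal distribution lying below z (hypothetical helper)."""
    return 100 * standard_normal.cdf(z)


rng = random.Random(524)
z_values = [rng.gauss(0, 1) for _ in range(100_000)]

# Tally percentile ranks into ten equal intervals: 0-10, 10-20, ..., 90-100.
counts = [0] * 10
for z in z_values:
    counts[min(int(percentile_rank(z) // 10), 9)] += 1

print(counts)  # every interval holds roughly 10% of the 100,000 draws
```

Even though most z-scores cluster near 0, each interval of percentile ranks contains about the same number of cases, so the histogram is flat.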
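Although percentile ranks are easily interpretable by the general public, they are
less tractable than z-scores: the conversion from scores to percentile ranks is
nonlinear, so equal differences in percentile rank do not correspond to equal
differences in scores. They are therefore of little use to statisticians.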
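The nonlinearity shows up when we take the same step on the z scale at different places in the distribution. In this minimal Python sketch the three half-SD steps are arbitrary examples and `percentile_rank` is a hypothetical helper.

```python
from statistics import NormalDist

standard_normal = NormalDist()


def percentile_rank(z: float) -> float:
    """Percentage of a normal distribution lying below z (hypothetical helper)."""
    return 100 * standard_normal.cdf(z)


# The same half-SD step on the z scale, taken at three places in the distribution.
steps = [(0.0, 0.5), (1.0, 1.5), (2.0, 2.5)]
gains = [percentile_rank(hi) - percentile_rank(lo) for lo, hi in steps]
for (lo, hi), gain in zip(steps, gains):
    print(f"z from {lo} to {hi}: percentile rank rises {gain:.1f} points")
```

The same half-SD gain is worth about 19 percentile points near the mean, about 9 points one SD out, and under 2 points two SDs out.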
Sampling Distributions (9.3)
Recall that in statistical inference, the goal is to draw a conclusion about a
population (e.g., make an estimate or conduct a hypothesis test) based on a
sample drawn from that population.
If we repeatedly sample from a given population, the sample statistics (e.g., the
mean and SD) will vary; we can therefore generate a distribution that
illustrates this variability.
Sampling Distribution: theoretical relative frequency distribution of the values of
a statistic (not regular scores!!) that would be obtained by chance from an
infinite number of samples of a particular size drawn from a given population
This is key to understanding inferential statistics!
The sampling distribution allows us to determine what sample statistics are
likely to occur by chance and with what probability.
When we compute a statistic from a sample, we are interested in knowing
where that statistic falls in a hypothesized sampling distribution. When we
conduct hypothesis tests, our hypotheses are evaluated based on distributions
of hypothetical samples, not distributions of scores.
Sampling Distribution of the Mean (SDOM): theoretical relative frequency
distribution of all means that would be obtained by chance from an infinite
number of samples of a particular size drawn from a given population
Refer to demo at http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/ (to try
this at home, your computer will need to have Java installed)
Relationship between SDOM and the Population Distribution of Scores (9.3)
Mean of the SDOM:
The expected value of the mean of the SDOM is simply the mean of the
population of scores; we would expect the “mean of the means” to be the
mean of the population.
Symbolically, we say:
$\mu_{\bar{X}} = \mu_X$
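The same idea can be explored without the applet. This Python sketch approximates an SDOM by drawing many samples and recording each sample's mean, then averages those means. The normal population (mean 100, SD 15), the sample size, the number of repetitions, the seed, and the helper name `simulate_sample_means` are all assumptions for illustration; a finite simulation only approximates the theoretical (infinite) sampling distribution.

```python
import random
import statistics


def simulate_sample_means(draw, n: int, reps: int, rng: random.Random) -> list[float]:
    """Draw `reps` samples of size n (one score per call to draw) and return each mean.

    Hypothetical helper: a finite stand-in for the theoretical SDOM.
    """
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


rng = random.Random(524)
means = simulate_sample_means(lambda r: r.gauss(100, 15), n=10, reps=20_000, rng=rng)
print(round(statistics.fmean(means), 2))  # the "mean of the means", close to 100
```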
Standard Deviation of the SDOM
Thought question: If we take samples of size 2 or larger, would the means we
obtain be more or less variable than the individual scores?
It turns out that the standard deviation of the SDOM, which is technically
called the standard error, can be quantified as:
$\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$
Thus, as the sample size increases, the denominator ($\sqrt{n}$) gets larger, which
means that, all else being equal, the standard error (standard deviation of the
SDOM) gets smaller. This is closely related to the law of large numbers (as the
sample size grows, the sample mean tends to get closer to the population mean),
and it makes sense because larger samples give more accurate estimates of the
mean, which means there will be less variability in those estimates.
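The standard-error formula can be compared against simulation. This Python sketch, assuming a hypothetical normal population with mean 100 and SD 15 and an arbitrary seed, draws 10,000 samples at each of three sample sizes and compares the SD of the resulting means with σ/√n.

```python
import math
import random
import statistics

rng = random.Random(524)
mu, sigma = 100, 15  # hypothetical population of scores
results = {}
for n in (4, 16, 64):
    # 10,000 sample means, each computed from a sample of size n.
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(10_000)]
    simulated_se = statistics.stdev(means)
    theoretical_se = sigma / math.sqrt(n)
    results[n] = (simulated_se, theoretical_se)
    print(f"n = {n:2d}: SD of sample means = {simulated_se:.2f}, sigma/sqrt(n) = {theoretical_se:.2f}")
```

Quadrupling n halves the standard error, because √n doubles.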
Shape of the SDOM
If the population of scores is normally distributed, we would expect the
SDOM to be normally distributed, too.
But what about populations that are not normally distributed?
Central Limit Theorem: If random samples are selected from a population
with a mean of μ and SD of σ, then as the sample size (n) increases, the SDOM
approaches a normal distribution with a mean of μ and SD of $\sigma / \sqrt{n}$.
So how big does the sample size need to be for the SDOM of a non-normal
distribution to approach normal? The answer will depend on how non-normal
the distribution is. Many texts say n >= 25 or 30 is large enough, but Kirk
says about 100.
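To get a feel for how quickly the SDOM becomes normal, this Python sketch samples from a strongly right-skewed population (an exponential distribution with mean 1, an assumption chosen only because it is clearly non-normal) and measures the skewness of the simulated sample means; a normal distribution has skewness 0. The sample sizes, repetitions, seed, and the helper `skewness` are arbitrary illustrative choices.

```python
import random
import statistics


def skewness(values: list[float]) -> float:
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


rng = random.Random(524)
skew_by_n = {}
for n in (1, 5, 30, 100):
    # 10,000 sample means from a skewed (exponential) population.
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(10_000)]
    skew_by_n[n] = skewness(means)
    print(f"n = {n:3d}: skewness of the sample means = {skew_by_n[n]:.2f}")
```

For this particular population the skewness of the mean shrinks roughly like 2/√n (about 2, 0.9, 0.37, and 0.2 for these n), so how large n must be depends on how skewed the population is and how close to normal the SDOM needs to be.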
Experimenting with the demo
(http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/ ) should give you an
intuitive sense of the central limit theorem.