Chapter 5: Normal distribution

L5_S1 Normal curve
In many frequency distributions, most of the observations are situated around the mean, with
fewer and fewer observations towards the extremes of the range of values. If n is large, the frequency
polygons of many biological data distributions are “bell shaped” and look like the figure here. Two
parameters define the curve: the mean and the standard deviation.
The mathematician Gauss discovered that distributions of this shape, estimated from large samples
of measurements drawn from a single population, often agree well with a model of this form.
This is the most important distribution in statistics for two main reasons.
First, we see it when a variable is measured for a large number of nominally identical objects, and
the variation can be assumed to be caused by many factors, each exerting a small positive or
negative random influence on an individual object.
L5_S2 Normal distribution example
As an example, we can look at the weight of a group of female students, where the variation is
caused by many factors such as age, diet, exercise, heights of parents, bone structure and so on.
Second, the properties of the normal distribution have very important applications in the statistical
theory of drawing conclusions from sample data about the populations from which the samples were drawn.
Not all continuous variables are normally distributed, however. Some follow other distributions, such
as the rectangular (continuous uniform) distribution.
L5_S3 Normal curve equation
This is the equation that describes the normal curve. It allows us to compute the height of the curve
(Y) for a given value of X (data point).
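The slide's equation is not reproduced in these notes. As a minimal sketch, assuming the standard form of the normal density, Y = 1 / (sigma * sqrt(2 * pi)) * exp(-(X - mu)^2 / (2 * sigma^2)), the height of the curve could be computed as follows. The function name normal_curve_height is hypothetical and chosen only for illustration.

```python
import math


def normal_curve_height(x, mu, sigma):
    """Height Y of the normal curve at X = x (standard density form, assumed)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Height of the curve at its apex (x = mu) for mu = 0 and sigma = 1
apex_height = normal_curve_height(0.0, 0.0, 1.0)
print(round(apex_height, 4))
```

The two arguments mu and sigma are the only quantities besides X that the function needs, which is why the curve is said to have two parameters.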
L5_S4 Eg. mu = 0
This results in a symmetrical curve whose apex occurs where x is equal to the population mean.
The equation has two parameters, mu and sigma, and each can vary continuously, generating an
infinite variety of normal curves.
For any given mean (mu), an infinite number of normal curves is possible, each with a different
value of sigma.
Here the graph shows normal curves for mu = 0 and sigma = 1, 1.5 and 2.
L5_S5 Eg sigma = 1
Likewise, for any given sigma there are an infinite number of normal curves, each with a different
value of mu. Here the graph shows normal curves for sigma = 1 and mu = 0, 1, and 2.
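The two families of curves described in these slides can be explored numerically. The sketch below uses Python's standard-library statistics.NormalDist and the parameter values quoted for the two graphs. It is an illustration only, not a reproduction of the original figures.

```python
from statistics import NormalDist

# First graph: mu fixed at 0, sigma = 1, 1.5 and 2.
# A larger sigma gives a lower, wider curve.
peak_by_sigma = {
    sigma: NormalDist(mu=0.0, sigma=sigma).pdf(0.0)
    for sigma in (1.0, 1.5, 2.0)
}

# Second graph: sigma fixed at 1, mu = 0, 1 and 2.
# Changing mu shifts the curve along the x axis without changing its shape.
peak_by_mu = {
    mu: NormalDist(mu=mu, sigma=1.0).pdf(mu)
    for mu in (0.0, 1.0, 2.0)
}

for sigma, height in peak_by_sigma.items():
    print(f"mu = 0, sigma = {sigma}: height at apex = {height:.4f}")
for mu, height in peak_by_mu.items():
    print(f"mu = {mu}, sigma = 1: height at apex = {height:.4f}")
```

In this sketch the apex height falls as sigma grows, while the three curves that share sigma = 1 all have the same apex height and differ only in where the apex sits.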
L5_S6 x = mu
A normal curve is symmetrical, with the axis of symmetry passing through the baseline where x = mu,
in other words, through one of the parameters of the curve. The standard normal distribution is a
normal distribution with a mean of zero and a standard deviation of 1.
Theoretically the two tails never touch the horizontal axis.
L5_S7 Probability distribution
If the vertical axis of the distribution is re-scaled by dividing by the number of observations, it
effectively becomes a probability distribution or, strictly, a probability density.
The total probability encompassed by the density is then 1.
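The re-scaling can be sketched with a small grouped data set. The values and the class width below are hypothetical, chosen only for illustration. Dividing each class frequency by the number of observations gives relative frequencies that sum to 1. Dividing again by the class width gives a density, for which the total area of the bars is 1.

```python
from collections import Counter

# Hypothetical measurements (illustrative only, not data from this chapter)
values = [48, 52, 55, 57, 58, 60, 61, 61, 63, 64, 66, 69, 71, 75]
bin_width = 5  # assumed class width

# Frequency of observations in each class, keyed by the class lower bound
counts = Counter((v // bin_width) * bin_width for v in values)
n = len(values)

# Relative frequency: class frequency divided by the number of observations
relative = {low: c / n for low, c in sorted(counts.items())}

# Density: relative frequency per unit of x, so that bar areas sum to 1
density = {low: p / bin_width for low, p in relative.items()}

total_probability = sum(relative.values())
total_area = sum(d * bin_width for d in density.values())
print(total_probability, total_area)
```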
L5_S8 Total area under curve
If we say that the total area under the curve is 100%, then one of the mathematical properties of the
normal curve is that the area bounded by one standard deviation on either side of the central axis is
approximately 68.26% of the total area. Equivalently, if the total area is equal to 1, the area between
the central axis (the mean) and one standard deviation on each side is 0.3413, so the two sides
together cover 0.3413 times 2, which is equal to 0.6826.
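These areas can be checked with the cumulative distribution function of the standard normal distribution. The sketch below uses statistics.NormalDist from the Python standard library. The variable names are ours.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area between the mean and one standard deviation above it
one_side = standard_normal.cdf(1.0) - standard_normal.cdf(0.0)

# Area within one standard deviation on either side of the mean
both_sides = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

print(f"one side:   {one_side:.4f}")
print(f"both sides: {both_sides:.4f}")
```

Computed without intermediate rounding, the two-sided area is closer to 0.6827. The figure 0.6826 comes from doubling the rounded table value 0.3413.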
L5_S9 Probability determination
Because we have the exact mathematical equation describing the normal distribution, we can
determine probabilities (or proportions) of the normal distribution quite easily. In order to do so, we
need a value for x, the mean and standard deviation of the normal distribution, and a table of
“proportions of the normal curve”, which you have in your textbook by Zar: table B.2, “proportions of
the curve (one-tailed)”, on page 483 of the second edition.
We will use the following example: we have a normal distribution of values, with a mean of 50 and a
standard deviation of 15. We are going to ask “what is the probability of finding a value greater than
75?”
L5_S10 Z distribution
Since there is an infinite family of normal distributions, we need a way to find the probability without
needing a different table for each possible distribution. So we convert our data point (which is 75)
into what is called a “Z score” or a “standard score”. To convert a data point to a Z score, we use
the formula shown here: Z = (Xi - mean) / standard deviation. We call the Z score a standard score
because it expresses the data point in standard deviations from the mean and so has no units.
Applying the formula will always produce a transformed distribution with a mean of zero and a
standard deviation of 1. However, the shape of the distribution will not be affected by the
transformation.
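As a minimal sketch of this transformation, the code below converts a small set of hypothetical values to Z scores and confirms that the transformed values have a mean of zero and a standard deviation of 1. The data and the function name z_scores are assumptions for illustration. The population standard deviation (statistics.pstdev) is used because the formula is written in terms of the population parameters.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Convert each value to a Z score: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]


data = [20, 35, 50, 65, 80]  # hypothetical values, not from the chapter
z = z_scores(data)

print([round(score, 3) for score in z])
print(mean(z), pstdev(z))
```

The Z scores keep the same order and relative spacing as the original values, which is what is meant by the shape of the distribution being unaffected.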
L5_S11 Z distribution example
In our example, Xi = 75, the mean = 50, and the standard deviation is 15. Our Z score is therefore 75
minus 50, divided by 15, which is 1.67 to two decimal places. Now we go to table B.2 in Zar, where we
have the normal distribution of Z scores. Go down the column in the table labelled “Z” to 1.6, then go
across to the column under “7”. This gives you the proportion of the normal distribution lying beyond
Z = 1.67 in one tail. The table value is 0.0475. The proportion of the normal distribution greater than
1.67 is therefore 0.0475, so the probability of finding a value greater than 75 is
0.0475.
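The table lookup can be cross-checked in code. The sketch below uses statistics.NormalDist in place of the printed table. It computes the tail proportion both for Z rounded to two decimals, as the table requires, and for the unrounded Z. The variable names are ours.

```python
from statistics import NormalDist

mu, sigma, x = 50.0, 15.0, 75.0

# Z score of the data point
z = (x - mu) / sigma

# The printed table is indexed by Z to two decimal places
z_table = round(z, 2)

# Upper-tail proportion beyond the rounded Z (what the table lookup gives)
p_table = 1.0 - NormalDist().cdf(z_table)

# Upper-tail proportion without rounding Z first
p_unrounded = 1.0 - NormalDist(mu, sigma).cdf(x)

print(f"Z = {z:.4f}, rounded for the table: {z_table}")
print(f"P(X > 75) via rounded Z:   {p_table:.4f}")
print(f"P(X > 75) via unrounded Z: {p_unrounded:.4f}")
```

The two results differ slightly in the fourth decimal place because the table requires Z to be rounded to 1.67 before the lookup.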
L5_S12 Area under two tails
The proportion greater than 1.67 is 0.0475 and, because the curve is symmetrical, the proportion
less than minus 1.67 is also 0.0475.
The proportion between minus 1.67 and 1.67 can then be easily calculated if you remember that the area
under the graph (total proportion) always has to be 1. The two tails together are (0.0475 x 2) = 0.095,
which we then subtract from 1 to get an answer of 0.905.
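The same two-tail calculation can be sketched in code, again using statistics.NormalDist as a stand-in for the printed table. The variable names are ours.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1
z = 1.67

upper_tail = 1.0 - standard_normal.cdf(z)  # proportion greater than 1.67
lower_tail = standard_normal.cdf(-z)       # proportion less than -1.67

# The total area is 1, so the middle is what remains after removing both tails
between = 1.0 - (upper_tail + lower_tail)

print(f"upper tail: {upper_tail:.4f}")
print(f"lower tail: {lower_tail:.4f}")
print(f"between -1.67 and 1.67: {between:.3f}")
```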