Chapter 5: Normal distribution

L5_S1 Normal curve
In most frequency distributions, the bulk of the observations lie close to the mean, with fewer and fewer observations towards the extremes of the range of values. If n is large, the frequency polygons of many biological data distributions are “bell shaped” and look like the figure here. Two parameters define the curve: the mean and the standard deviation. The mathematician Gauss discovered that distributions of this shape, estimated from large samples of measurements drawn from a single population, often agree well with a model of this form. This is the most important distribution in statistics for two main reasons. First, we see it when a variable is measured for a large number of nominally identical objects and the variation may be assumed to be caused by a number of factors, each exerting a small positive or negative random influence on an individual object.

L5_S2 Normal distribution example
As an example, we can look at the weights of a group of female students, where the variation is caused by many factors such as age, diet, exercise, heights of parents, bone structure and so on. The second reason is that the properties of the normal distribution have very important applications in the statistical theory of drawing conclusions from sample data about the populations from which the samples were drawn. Not all continuous variables are normally distributed, however; some follow other distributions, such as the rectangular (continuous uniform) distribution.

L5_S3 Normal curve equation
This is the equation that describes the normal curve. It allows us to compute the height of the curve (Y) for a given value of X (a data point).

L5_S4 E.g. mu = 0
This results in a symmetrical curve whose apex occurs where x is equal to the population mean. Because there are two parameters in the equation, mu and sigma, and each may vary continuously, there is an infinite variety of normal curves.
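The equation on the slide is not reproduced in these notes. As a minimal sketch, assuming it is the standard normal density, Y = 1 / (sigma * sqrt(2 * pi)) * exp(-(X - mu)^2 / (2 * sigma^2)), the height of the curve could be computed as follows. The function name normal_curve_height is a hypothetical helper chosen for illustration.

```python
import math


def normal_curve_height(x, mu=0.0, sigma=1.0):
    """Height Y of the normal curve at X = x (standard density formula, assumed)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# The apex sits at x = mu, and the curve is symmetrical about that point.
apex = normal_curve_height(0.0, mu=0.0, sigma=1.0)
left = normal_curve_height(-1.0, mu=0.0, sigma=1.0)
right = normal_curve_height(1.0, mu=0.0, sigma=1.0)
```

Holding mu fixed and increasing sigma lowers and widens the curve; holding sigma fixed and changing mu shifts the curve along the x axis without changing its shape.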
For any given mean (mu), an infinite number of normal curves is possible, each with a different value of sigma. The graph here shows normal curves for mu = 0 and sigma = 1, 1.5 and 2.

L5_S5 E.g. sigma = 1
Likewise, for any given sigma an infinite number of normal curves is possible, each with a different value of mu. The graph here shows normal curves for sigma = 1 and mu = 0, 1 and 2.

L5_S6 x = mu
A normal curve is symmetrical, with the axis of symmetry passing through the baseline where x = mu, in other words, through one of the parameters of the curve. The standard normal distribution is a normal distribution with a mean of zero and a standard deviation of 1. Theoretically, the two tails never touch the horizontal axis.

L5_S7 Probability distribution
If the vertical axis of the distribution is re-scaled by dividing by the number of observations, it effectively becomes a probability distribution or, strictly, a probability density. The total probability encompassed by the density is then 1.

L5_S8 Total area under curve
If we say that the total area under the curve is 100%, then one of the mathematical properties of the normal curve is that the area bounded by one standard deviation on either side of the central axis is approximately 68.26% of the total area. Put another way, if the total area is equal to 1, the area within one standard deviation on each side of the central axis (the mean) is 0.3413, so the two sides together give 0.3413 x 2 = 0.6826.

L5_S9 Probability determination
Because we have the exact mathematical equation describing the normal distribution, we can determine probabilities (or proportions) of the normal distribution quite easily.
To determine a probability, we need a value for x, the mean and standard deviation of the normal distribution, and a table of “proportions of the normal curve”. You have one in your textbook by Zar: table B.2, “proportions of the curve (one-tailed)”, on page 483 of the second edition. We will use the following example: we have a normal distribution of values with a mean of 50 and a standard deviation of 15. We are going to ask “what is the probability of finding a value greater than 75?”

L5_S10 Z distribution
Since there is an infinite family of normal distributions, we need a way to find the probability without having a different table for each possible distribution. So what we do is convert our data point (which is 75) into what is called a “Z score” or “standard score”. To convert a data point to a Z score, we use the formula shown here: Z = (Xi - mu) / sigma, that is, the data point minus the mean, divided by the standard deviation. We call the Z score a standard score because it has no units. Applying the formula will always produce a transformed distribution with a mean of zero and a standard deviation of 1. However, the shape of the distribution will not be affected by the transformation.

L5_S11 Z distribution example
In our example, Xi = 75, the mean is 50, and the standard deviation is 15. Our Z score is therefore (75 - 50) / 15 = 1.67, rounded to two decimal places. Now we go to table B.2 in Zar, which gives the normal distribution of Z scores. Go down the column labelled “Z” to 1.6, then across to the column headed “7”. This gives the proportion of the normal distribution lying beyond Z = 1.67 in one tail. The table value is 0.0475. The proportion of the normal distribution greater than 1.67 is therefore 0.0475, so the probability of finding a value greater than 75 is 0.0475.

L5_S12 Area under two tails
The proportion greater than 1.67 is 0.0475 and, because the curve is symmetrical, the proportion less than minus 1.67 is also 0.0475.
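The Z score, the upper-tail proportion and the symmetry of the two tails can be cross-checked numerically. The sketch below is an illustration only: it assumes the complementary error function in Python's standard math module as a stand-in for table B.2, and z_score and upper_tail are hypothetical helper names.

```python
import math


def z_score(x, mean, sd):
    """Convert a data point to a standard (Z) score."""
    return (x - mean) / sd


def upper_tail(z):
    """Proportion of the standard normal curve lying above z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Round Z to two decimals, as a printed Z table requires.
z = round(z_score(75, 50, 15), 2)

p_upper = upper_tail(z)           # proportion greater than +z
p_lower = 1.0 - upper_tail(-z)    # proportion less than -z
```

Rounding Z to two decimal places before the lookup mirrors how the printed table is used; without that rounding the tail area differs slightly in the fourth decimal place.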
With 0.0475 in each tail, the proportion between minus 1.67 and 1.67 is easy to calculate if you remember that the total area under the curve (the total proportion) is always 1. The two tails together are 0.0475 x 2 = 0.095, which we subtract from 1 to get 0.905.
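The subtraction of the two tails from 1 can be sketched in code as well. This is an illustration only, again assuming math.erfc from Python's standard library in place of table B.2; central_proportion is a hypothetical helper name.

```python
import math


def central_proportion(z):
    """Proportion of the standard normal curve between -z and +z."""
    one_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
    # Total area is 1, so remove both tails to leave the central area.
    return 1.0 - 2.0 * one_tail


between = central_proportion(1.67)        # area between -1.67 and 1.67
within_one_sd = central_proportion(1.0)   # area within one standard deviation
```

For Z = 1.67 the result rounds to 0.905, as in the worked example. For Z = 1 it is about 0.6827, close to the 0.6826 obtained by doubling the rounded table value 0.3413.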