The normal distribution (Session 08)
SADC Course in Statistics

Learning Objectives

At the end of this session you will be able to:
• describe the normal probability distribution
• state and interpret parameters associated with the normal distribution
• use a calculator and statistical tables to calculate normal probabilities
• appreciate the value of the normal distribution in practical situations

The Normal Distribution

• In the previous two sessions, you were introduced to two discrete distributions, the Binomial and the Poisson.
• In this session, we introduce the Normal Distribution, one of the commonest distributions followed by a continuous random variable.
• For example, heights of persons, their blood pressure, the time taken for banana plants to grow, and the weights of animals are likely to follow a normal distribution.

Example: Weights of maize cobs

The graph shows a histogram of the weights of 100 maize cobs. Data which follow the bell shape of this histogram are said to follow a normal distribution.

Frequency definition of probability

In a histogram, the bar areas correspond to frequencies. For example, there are 3 maize cobs with weight < 100 g, and 19 maize cobs with weight < 120 g. Hence, using the frequency approach to probability, we can say that Prob(X < 120) = 19/100 = 0.19.

The areas under the curve can be regarded as representing probabilities, since the curve and the edges of the histogram would coincide as n → ∞.

Probability Distribution Function

Two parameters are associated with the normal distribution: its mean μ and variance σ².
[Figure: normal density curve, f(x) plotted against x]

The mathematical expression describing the form of the normal distribution is

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Properties of the Normal Distribution

[Figure: normal density curve, f(x) plotted against x]

• Total area under the curve is 1
• characterised by mean & variance: N(μ, σ²)
• symmetric about the mean (μ)
• approximately 95% of observations lie within ± 2σ of the mean

The Standard Normal Distribution

This is a distribution with μ = 0 and σ = 1, shown below in comparison with N(0, σ²), σ = 3.

[Figure: density curves of N(0, 1) and N(0, σ²) with σ = 3, f(x) plotted against x from -6 to +6]

The Standard Normal Distribution

This is a distribution with μ = 0 and σ = 1. Tables give probabilities associated with this distribution, i.e. for every value z of a random variable Z which has a standard normal distribution, values of Pr(Z < z) are tabulated. In the graph on the right, P = Pr(Z < z).

[Figure: standard normal curve with 0 and z marked on the horizontal axis; P is the area under the curve to the left of z]

Symmetry means any area (probability) can be found.

Calculating normal probabilities

Any random variable, say X, having a normal distribution with mean μ and standard deviation σ, can be converted to a value (say z) from the standard normal distribution. This is done using the formula

$$z = \frac{X - \mu}{\sigma}$$

The z values are called z-scores. The z-scores can be used to compute probabilities associated with X.

An example

The pulse rate (say X) of healthy individuals is expected to have a normal distribution with mean of 75 beats per minute and a standard deviation of 8. What is the chance that a randomly selected individual will have a pulse rate < 65?

We need to find Pr(X < 65), i.e.
Pr(X - 75 < 65 - 75) = Pr[ (X - 75)/8 < -10/8 ] = Pr(Z < -1.25) ≈ Pr(Z < -1.3) = 0.0968
(using tables of the standard normal distribution, with z rounded to one decimal place)

A practical application

Malnutrition amongst children is generally measured by comparing their weight-for-age with that of a standard, age-specific reference distribution for well-nourished children.

A child’s weight-for-age is converted to a standardised normal score (a z-score), standardised to the mean μ and standard deviation σ of the reference distribution for the child’s gender and age. Children whose z-score < -2 are regarded as being underweight.

A Class Exercise

Similarly to the above, height-for-age is used as a measure of stunting, again converted to a standardised z-score (stunted if z-score < -2).

Suppose, for example, the reference distribution for 32-month-old girls has mean 91 cm with standard deviation 3.6 cm. What is the probability that a randomly selected girl of 32 months will have height between 83.8 and 87.4 cm? The graph below shows the area required. A class discussion will follow to get the answer.

Depicting required probability as an area under the normal curve

[Figure: normal curve with the horizontal axis marked at 83.8, 87.4, 91.0, 94.6 and 98.2 cm]

Answer =

Is a child stunted?

Suppose a 32-month-old girl has height-for-age value = 82.1 cm. Would you consider this child to be stunted? Discuss this question with your neighbour and write down your answer below.

Cumulative normal distribution

The cumulative distribution is given by the function F(a) = P(X ≤ a).

[Figure: normal curve with the area to the left of a shaded]

In the example above, a = 0.6, and the shaded area F(a), read from tables of the standard normal distribution, is 0.726.

[Figure: normal curve with points a and b marked on the x-axis]

P(a < X < b) = F(b) - F(a) is the area under the normal curve between points a and b.
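
The z-score calculations in this session can also be checked by computer rather than with printed tables. The following is a minimal Python sketch, not part of the original course material. It assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`. The helper names `z_score`, `prob_less_than` and `prob_between` are hypothetical, chosen only for illustration. Because the code does not round z, the pulse-rate probability it gives (about 0.1056 for z = -1.25) differs slightly from the table value 0.0968 obtained with z rounded to -1.3.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_score(x, mean, sd):
    """Standardise x using z = (x - mean) / sd."""
    return (x - mean) / sd


def prob_less_than(x, mean, sd):
    """Pr(X < x) for X ~ N(mean, sd^2), computed through the z-score."""
    return STANDARD_NORMAL.cdf(z_score(x, mean, sd))


def prob_between(a, b, mean, sd):
    """Pr(a < X < b) = F(b) - F(a)."""
    return prob_less_than(b, mean, sd) - prob_less_than(a, mean, sd)


# Pulse-rate example: mean 75 beats per minute, standard deviation 8.
pulse_z = z_score(65, 75, 8)
pulse_prob = prob_less_than(65, 75, 8)

# Class exercise: heights of 32-month-old girls, mean 91 cm, sd 3.6 cm.
height_prob = prob_between(83.8, 87.4, 91.0, 3.6)

# Stunting question: height 82.1 cm against the same reference distribution.
child_z = z_score(82.1, 91.0, 3.6)
child_is_stunted = child_z < -2  # cut-off stated in the session

print(f"Pulse: z = {pulse_z:.2f}, Pr(X < 65) = {pulse_prob:.4f}")
print(f"Height: Pr(83.8 < X < 87.4) = {height_prob:.4f}")
print(f"Child: z = {child_z:.2f}, stunted = {child_is_stunted}")
```

The sketch is probably most useful after the class discussion, as a check on answers worked out by hand from the tables.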
Practical work follows …
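
As one possible starting point for practical work, the properties of the normal distribution listed in this session can be checked numerically. The sketch below is an illustration only, not part of the original course material. It codes the density formula f(x) directly and approximates areas under the curve with a simple midpoint rule. The function names `normal_pdf` and `area_under_curve`, the number of steps, and the choice of ±10 standard deviations as a stand-in for the whole real line are all assumptions made for this example.

```python
import math


def normal_pdf(x, mean, sd):
    """Density f(x) = exp(-(x - mean)^2 / (2 sd^2)) / sqrt(2 pi sd^2)."""
    variance = sd ** 2
    exponent = -((x - mean) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)


def area_under_curve(lower, upper, mean, sd, steps=10_000):
    """Approximate the area under the density on [lower, upper] (midpoint rule)."""
    width = (upper - lower) / steps
    heights = (normal_pdf(lower + (i + 0.5) * width, mean, sd) for i in range(steps))
    return sum(heights) * width


# Parameters of the pulse-rate example used in this session.
mean, sd = 75.0, 8.0

# Total area: +/- 10 standard deviations is used here in place of the whole line.
total_area = area_under_curve(mean - 10 * sd, mean + 10 * sd, mean, sd)

# Area within two standard deviations of the mean (the "95%" property).
within_two_sd = area_under_curve(mean - 2 * sd, mean + 2 * sd, mean, sd)

# Height of the standard normal curve at its mean (top of the plotted axis).
peak_standard = normal_pdf(0.0, 0.0, 1.0)

print(f"Total area under the curve: {total_area:.4f}")
print(f"Area within 2 sd of the mean: {within_two_sd:.4f}")
print(f"Peak of the standard normal density: {peak_standard:.4f}")
```

Changing `mean` and `sd` should leave the first two results essentially unchanged, which illustrates that these properties hold for any normal distribution and not only for the pulse-rate example.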