Probability distributions

By Dr. Ameer kadhim Hussein. M.B.Ch.B. FICMS (Community Medicine).

1. Discrete probability distributions: examples are the binomial distribution and the Poisson distribution.
2. Continuous probability distributions: a continuous distribution is described by a probability density function. The total area under its smooth curve equals 1, and the relative frequency of values falling between any two points equals the area under the curve between those two points and above the x-axis.

The normal distribution

The normal distribution is the most important distribution in biostatistics. It is frequently called the Gaussian distribution, and it is used for continuous variables. Its two parameters are the mean (µ) and the standard deviation (σ), and its graph is the familiar bell-shaped curve.

[Figure: graph of a normal (Gaussian) distribution]

Properties of the normal distribution:
1. It is symmetrical around the mean.
2. The mean, median and mode are all equal.
3. The total area under the curve above the x-axis is 1 square unit. Therefore 50% of the area lies to the right of the mean and 50% lies to the left of the mean.
4. Perpendiculars drawn at ±1σ from the mean enclose about 68% of the area under the curve, at ±2σ about 95%, and at ±3σ about 99.7%.

Relationship between the normal curve and the standard deviation

All normal curves share this property: the SD cuts off a constant proportion of the distribution of scores.

[Figure: frequency plotted against the number of standard deviations either side of the mean (-3 to +3); about 68%, 95% and 99.7% of scores lie within ±1, ±2 and ±3 SD]

The standard normal distribution

A normal distribution is determined by µ and σ, so the normal distribution is really a family of distributions, one for each pair of values of µ and σ. The most important member of that family is the standard normal distribution, which has µ = 0 and σ = 1.
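The 68%, 95% and 99.7% proportions in property 4 can be checked numerically. The sketch below is a minimal illustration, assuming Python 3.8 or later; it uses the standard library's `statistics.NormalDist`, and the helper name `area_within` is hypothetical. It also checks that a normal curve with a different mean and SD (µ = 140, σ = 50, chosen arbitrarily) cuts off the same proportion within ±1 SD.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def area_within(k: float, dist: NormalDist = std_normal) -> float:
    """Area under the normal curve between mean - k*SD and mean + k*SD."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    # cdf(x) is the area to the left of x, so the difference is the area between.
    return dist.cdf(upper) - dist.cdf(lower)


for k in (1, 2, 3):
    print(f"within ±{k} SD: {area_within(k):.4f}")

# Any normal curve cuts off the same proportions, e.g. mean 140, SD 50.
other = NormalDist(mu=140, sigma=50)
print(f"within ±1 SD of N(140, 50): {area_within(1, other):.4f}")
```

This prints 0.6827, 0.9545 and 0.9973 for ±1, ±2 and ±3 SD, which round to the 68%, 95% and 99.7% quoted above, and 0.6827 again for the non-standard curve.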
Standard z score

The standard z score is obtained by creating a variable z whose value is

$$z = \frac{x - \mu}{\sigma}$$

Given the values of µ and σ, we can convert a value of x to a value of z and find its probability using the table of normal curve areas, which gives the area to the left of z.

Finding probabilities

a) What is the probability that z < -1.96?
(1) Sketch a normal curve.
(2) Draw a line for z = -1.96.
(3) Find the area in the table.
(4) The answer is the area to the left of the line: P(z < -1.96) = 0.0250.

b) What is the probability that z > 1.96?
(1) Sketch a normal curve.
(2) Draw a line for z = 1.96.
(3) Find the area in the table.
(4) The answer is the area to the right of the line, found by subtracting the table value from 1.0000: P(z > 1.96) = 1.0000 - 0.9750 = 0.0250.

c) What is the probability that -1.96 < z < 1.96?
P(-1.96 < z < 1.96) = 0.9750 - 0.0250 = 0.95.

Example: If Z has the standard normal distribution, then P(Z < 2) is the area to the left of 2, and it equals 0.9772.

Example: P(-2.55 < Z < 2.55) is the area between -2.55 and 2.55, so
P(-2.55 < Z < 2.55) = 0.9946 - 0.0054 = 0.9892.

Example: P(-2.74 < Z < 1.53) is the area between -2.74 and 1.53, so
P(-2.74 < Z < 1.53) = 0.9370 - 0.0031 = 0.9339.

Example: P(Z > 2.71) is the area to the right of 2.71, so
P(Z > 2.71) = 1 - 0.9966 = 0.0034.

Given the following probabilities, find z1:
1. P(z ≤ z1) = 0.0055. This area appears in the table at z = -2.54, so z1 = -2.54.
2. P(z1 ≤ z ≤ 2.98) = 0.1117. The area to the left of 2.98 is 0.9986, so the area to the left of z1 is 0.9986 - 0.1117 = 0.8869, which gives z1 = 1.21.

How to transform a normal distribution (X) to the standard normal distribution (Z)?

This is done with the following formula:

$$z = \frac{x - \mu}{\sigma}$$

Example: If X is normally distributed with µ = 3 and σ = 2, find the value of the standard normal Z when X = 6.
Answer: $z = \frac{6 - 3}{2} = 1.5$

Example: Suppose that systolic blood pressure among teachers is approximately normally distributed with a mean of 140 and a standard deviation of 50. Find the probability that a teacher picked at random will have a systolic blood pressure less than 100. We follow the steps to find the solution.
(1) Write the given information: µ = 140, σ = 50, x = 100.
(2) Convert x to a z score: $P(X < 100) = P\left(Z < \frac{100 - 140}{50}\right) = P(Z < -0.8)$.
(3) Find the area in the table: P(Z < -0.8) = 0.2119.
(4) Complete the answer: the probability that a teacher picked at random will have a systolic blood pressure less than 100 is 0.2119.

Example: In a study of children aged 8 to 15 years, the researchers found that the amount of time children spend in the upright position followed a normal distribution with a mean of 5.4 hours and a standard deviation of 1.3 hours. If a child is selected at random, then:

1. The probability that the child spends less than 3 hours in the upright position in a 24-hour period is
$P(X < 3) = P\left(Z < \frac{3 - 5.4}{1.3}\right) = P(Z < -1.85) = 0.0322$

2. The probability that the child spends more than 5 hours in the upright position in a 24-hour period is
$P(X > 5) = P\left(Z > \frac{5 - 5.4}{1.3}\right) = P(Z > -0.31) = 1 - 0.3783 = 0.6217$

3. The probability that the child spends from 4.5 to 7.3 hours in the upright position in a 24-hour period is
$P(4.5 < X < 7.3) = P\left(\frac{4.5 - 5.4}{1.3} < Z < \frac{7.3 - 5.4}{1.3}\right) = P(-0.69 < Z < 1.46) = 0.9279 - 0.2451 = 0.6828$

Skewed data

Data may have a positive skewness (long tail to the right) or a negative skewness (long tail to the left).

Kurtosis

Kurtosis indicates whether data are bunched together or spread out. Data that are bunched together give a tall, thin distribution, which is not normal; this is called leptokurtic. Data that are spread out give a low, flat distribution, which is not normal; this is called platykurtic.
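The notes describe skewness and kurtosis qualitatively. One common way to put numbers on them (an addition here, not a definition taken from the notes) is the moment coefficient of skewness $g_1 = m_3 / m_2^{3/2}$ and the excess kurtosis $g_2 = m_4 / m_2^{2} - 3$, where $m_k$ is the k-th central moment of the data. A normal distribution has $g_1 = 0$ and $g_2 = 0$; $g_1 > 0$ indicates a long right tail, $g_1 < 0$ a long left tail, $g_2 > 0$ a leptokurtic shape and $g_2 < 0$ a platykurtic shape. The sketch assumes Python 3.9 or later; the function name `shape_coefficients` and both data sets are hypothetical, for illustration only.

```python
from statistics import fmean


def shape_coefficients(data: list[float]) -> tuple[float, float]:
    """Return (skewness g1, excess kurtosis g2) from the central moments of data.

    Assumes data contains at least two distinct values (otherwise m2 is 0).
    """
    n = len(data)
    mean = fmean(data)
    # k-th central moment: the average of (x - mean) ** k.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    g1 = m3 / m2 ** 1.5      # 0 for symmetrical data
    g2 = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return g1, g2


# Hypothetical data sets, for illustration only.
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 10]
spread_out = [1, 2, 3, 4, 5, 6, 7, 8, 9]

for name, data in [("long right tail", long_right_tail), ("spread out", spread_out)]:
    g1, g2 = shape_coefficients(data)
    print(f"{name}: skewness = {g1:.2f}, excess kurtosis = {g2:.2f}")
```

The first data set has positive skewness (its single large value forms a right tail), while the evenly spread second set is symmetrical with negative excess kurtosis, i.e. a flat, platykurtic shape.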
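The table look-ups in the worked examples above can be reproduced in code. The sketch below assumes Python 3.8 or later and uses the standard library's `statistics.NormalDist`: its `cdf` method returns the area to the left of a value (the quantity read from the table of normal curve areas), and its `inv_cdf` method performs the reverse look-up used to find z1. The helper names `z_score`, `table_z`, `p_less`, `p_greater` and `p_between` are hypothetical. Because a printed table is indexed by z to two decimals, `table_z` rounds z the same way, so the results agree with the hand-calculated answers up to the table's rounding.

```python
from statistics import NormalDist

# Standard normal distribution; Z.cdf(z) is the area to the LEFT of z.
Z = NormalDist(mu=0.0, sigma=1.0)


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x from a normal distribution with mean mu and SD sigma."""
    return (x - mu) / sigma


def table_z(x: float, mu: float, sigma: float) -> float:
    """z score rounded to 2 decimals, as when reading a printed z table."""
    return round(z_score(x, mu, sigma), 2)


def p_less(x: float, mu: float, sigma: float) -> float:
    """P(X < x): the area to the left of the z score."""
    return Z.cdf(table_z(x, mu, sigma))


def p_greater(x: float, mu: float, sigma: float) -> float:
    """P(X > x): 1 minus the area to the left."""
    return 1.0 - p_less(x, mu, sigma)


def p_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X < b): difference of the two left-hand areas."""
    return p_less(b, mu, sigma) - p_less(a, mu, sigma)


# Systolic blood pressure example: mu = 140, sigma = 50.
print(f"P(X < 100)       = {p_less(100, 140, 50):.4f}")

# Upright-time example: mu = 5.4 hours, sigma = 1.3 hours.
print(f"P(X < 3)         = {p_less(3, 5.4, 1.3):.4f}")
print(f"P(X > 5)         = {p_greater(5, 5.4, 1.3):.4f}")
print(f"P(4.5 < X < 7.3) = {p_between(4.5, 7.3, 5.4, 1.3):.4f}")

# Reverse look-ups: find z1 from a given area.
print(f"z1 with P(z <= z1) = 0.0055:        {Z.inv_cdf(0.0055):.2f}")
print(f"z1 with P(z1 <= z <= 2.98) = 0.1117: {Z.inv_cdf(Z.cdf(2.98) - 0.1117):.2f}")
```

Removing the rounding in `table_z` gives slightly different probabilities (for example, the exact z for 3 hours is about -1.846 rather than -1.85); the difference comes only from the two-decimal resolution of the printed table.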