CONTINUOUS RANDOM VARIABLES

The Binomial random variable is called a discrete random variable because it takes on the discrete values 0, 1, 2, 3, ..., n. Consider the probability histogram of X, Binomial with n = 3, p = .7. The areas under the histogram are related to the probability distribution.

$P(X = 2)$ =
$P(X \le 2)$ =

The total area under this probability histogram is p(0) + p(1) + p(2) + p(3) = 1.

A continuous random variable X is one which represents measurements that (theoretically) can be made to any degree of accuracy. For example, suppose X = the weight (in kg) of a randomly chosen newborn baby. Depending on the accuracy of our scale, the weight X of a randomly selected baby could be recorded as 3 or 3.3 or 3.26 or 3.258, etc.

The probability histogram for such a variable has to be formed in a very different way than for a discrete random variable. For example, in the case of "X = the birth weight of a newborn", we could take a very large sample from the population of newborns, measure the sample birth weights very accurately (to many decimal places) and form a histogram of the birth weights using classes of small width. If the vertical scale is adjusted so that the total area under the histogram is one, then the area under the histogram can be used to calculate (approximately) the probabilities. The larger the sample and the more accurate our measurements, the more accurate these probabilities will be. In the illustration below, a simple random sample (SRS) of 812 babies was used to form a probability histogram.

[Figure: histogram "Birth Weight (kg) of 812 Newborns"; vertical axis: Frequency; horizontal axis: WT (kg)]

Let X be the birth weight of a randomly chosen newborn.
$P(X \le 3)$ = Shaded Area

The larger the sample, the smaller we can make these rectangles, and the "smoother" the resulting histogram will be. Thus a "model" of the distribution could be obtained by fitting a curve to such a histogram and using the area under the curve to calculate the probabilities (this curve is called a Probability Density Function).
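The histogram construction described above can be sketched in a few lines of Python. This is only an illustration: the 812 weights are simulated from a bell-shaped model with a hypothetical mean of 3.4 kg and standard deviation of 0.5 kg (these values, the class width of 0.25 kg, and the function names are assumptions, not data from the notes). The sketch rescales the bar heights so the total area is 1 and then adds up the area to the left of 3.

```python
import math
import random


def density_histogram(data, lo, hi, width):
    """Return (left_edges, heights) with heights scaled so total bar area is 1."""
    n_bins = round((hi - lo) / width)
    counts = [0] * n_bins
    for x in data:
        k = math.floor((x - lo) / width)
        if 0 <= k < n_bins:
            counts[k] += 1
    total = sum(counts)
    # height = relative frequency / class width, so that area = relative frequency
    heights = [c / (total * width) for c in counts]
    edges = [lo + k * width for k in range(n_bins)]
    return edges, heights


def area_left_of(edges, heights, width, cutoff):
    """Add up the histogram area lying to the left of `cutoff`."""
    area = 0.0
    for left, h in zip(edges, heights):
        right = left + width
        if right <= cutoff:
            area += h * width              # whole bar is left of the cutoff
        elif left < cutoff:
            area += h * (cutoff - left)    # only part of the bar counts
    return area


# Simulated sample (hypothetical mean and standard deviation, assumed for illustration).
rng = random.Random(0)
weights = [rng.gauss(3.4, 0.5) for _ in range(812)]

WIDTH = 0.25
edges, heights = density_histogram(weights, 0.0, 6.0, WIDTH)
total_area = sum(h * WIDTH for h in heights)
p_le_3 = area_left_of(edges, heights, WIDTH, 3.0)

print("total area under histogram:", total_area)
print("approximate P(X <= 3):", p_le_3)
```

Because 3 is a class boundary here, the shaded area equals the fraction of sampled weights below 3. With a larger sample and narrower classes, this area settles toward the area under the fitted curve.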
Thus $P(X \le 3)$ is the area under the curve to the left of 3, and the total area under the curve must be 1.

Note: As we know, there are many different shapes among various populations (e.g. left skewed, right skewed, symmetric, etc.). In the class of bell-shaped curves there is a specific one called the normal curve (there is a mathematical formula which defines it exactly). If a population can be modeled by this particular bell-shaped curve, the population is said to have a Normal (or Gaussian) Distribution.

The Normal Distribution

A Normal (or Gaussian) Population is one that can be modeled by a certain bell-shaped curve called the normal curve. The population of weights of newborn babies described above is an example of such a population. In describing such a population, we need to know two quantities, $\mu$ and $\sigma$. $\mu$ stands for the population mean and $\sigma$ stands for the population standard deviation. In our example, $\mu$ is the mean (average) weight of all newborn babies in the population and $\sigma$ measures the spread of the population values about the mean $\mu$.

A Normal (or Gaussian) random variable X represents a randomly chosen measurement sampled from this population. Probabilities about X are found by finding the appropriate areas under the curve, i.e. $P(X \le x)$ = the area under the normal curve to the left of x.

Note:
(a) The total area under the curve is 1.
(b) For a normal random variable, $P(X = x)$ is always 0. Practically speaking, this means that if we can measure observations very accurately, the chance of finding a newborn weighing exactly 3.0000000000 kg, say, is very small. Thus for all practical purposes $P(X = 3) = 0$. One consequence of this is that $P(X \le x) = P(X < x)$.

The population of Z-scores of a normal population is called the Standard Normal Population. A randomly chosen measurement from the standard normal population is denoted by Z. Z is simply the Z-score of a normal random variable X:

$Z = \dfrac{X - \mu}{\sigma}$

For the standard normal random variable Z: $\mu_Z = 0$ and $\sigma_Z = 1$.
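As a small check on the Z-score idea, the following Python sketch compares the area to the left of x under a normal curve with the area to the left of the corresponding Z-score under the standard normal curve. The birth-weight mean of 3.4 kg and standard deviation of 0.5 kg are hypothetical values chosen only for illustration, and `z_score` is a helper name introduced here; `statistics.NormalDist` is part of the Python standard library.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Z-score of x for a population with mean mu and standard deviation sigma."""
    return (x - mu) / sigma


# Hypothetical population values (assumed for illustration only).
mu, sigma = 3.4, 0.5

X = NormalDist(mu, sigma)   # normal random variable X
Z = NormalDist(0, 1)        # standard normal random variable Z

x = 3.0
p_direct = X.cdf(x)                       # area under the curve of X to the left of x
p_via_z = Z.cdf(z_score(x, mu, sigma))    # same area, found after standardizing

print("z-score of", x, "is", z_score(x, mu, sigma))
print("P(X <= x) computed directly:", p_direct)
print("P(Z <= z) after standardizing:", p_via_z)
```

The two areas agree, which is why a single table for Z is enough to answer probability questions about any normal random variable.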
We will show later that most of the time an observed value of Z will fall between $-3$ and $+3$.

Probabilities for the Standard Normal

Probabilities for the standard normal distribution can be obtained from Table A on pages T-2 and T-3.

Examples:
(a) (i) $P(Z \le 1.5)$ =
    (ii) $P(Z > 1.5)$ =
(b) (i) $P(Z \le -1.5)$ =
    (ii) $P(Z \ge -1.5)$ =
(c) $P(-1.5 \le Z \le 1.5)$ =
(d) $P(-1.5 < Z < 2.21)$ =

Probabilities for Normal Random Variables in General

Example: The heart rate of patients suffering from heart disease is normally distributed with a mean of 97 beats per minute and a standard deviation of 18 beats per minute. For a randomly chosen patient, find the probability that the heart rate is (a) below 80, (b) more than 140, (c) between 55 and 90.

Let X = the heart rate of a randomly selected patient; $\mu = 97$, $\sigma = 18$.
(a) $P(X < 80)$ =
(b) $P(X > 140)$ =
(c) $P(55 < X < 90)$ =

Example: Let X be any normal random variable with mean $\mu$ and standard deviation $\sigma$. What is the probability that X is within two standard deviations of the mean?

First note that the statement "X is within two standard deviations of the mean" means that X lies between $\mu - 2\sigma$ and $\mu + 2\sigma$. Thus
P(X is within two standard deviations of the mean) = $P(\mu - 2\sigma < X < \mu + 2\sigma)$ =

Note: Similarly, P(X is within one standard deviation of the mean) =
Note: P(X is within three standard deviations of the mean) =

Percentiles of the Standard Normal Distribution (Using the Normal Tables backwards)

(a) Find the 95th percentile of the standard normal distribution, i.e. find the value of $z_0$ such that $P(Z \le z_0) = .95$
(b) Find $z_0$ such that $P(Z \le z_0) = .41$
(c) Find $z_0$ such that $P(z_0 \le Z \le 0) = .1$
(d) Find $z_0$ such that $P(-z_0 \le Z \le z_0) = .95$

Example: Let X be a normal random variable with mean $\mu = 100$ and standard deviation $\sigma = 10$. Find $x_0$ such that
(a) $P(X \le x_0) = .80$ [i.e.
$x_0$ is the 80th percentile of X]
(b) $P(X \le x_0) = .025$

Example: The scores on the Scholastic Aptitude Test (SAT) for verbal ability of high school seniors are normally distributed with mean 430 and standard deviation 100. What score must a student attain in order to be in the top 5% of all the students who took the test?

Example: The time to first failure for a certain model of television set is normally distributed with mean 5 years and standard deviation 1.56 years. If the manufacturer wishes to repair only 10% of the sets sold, for how long should he guarantee his product?

Normal Approximation to Binomial Probabilities

Let X be a binomial random variable with parameters n and p. The mean and standard deviation of this distribution are given by $\mu = np$ and $\sigma = \sqrt{npq}$, where $q = 1 - p$. Then if both $np \ge 5$ and $nq \ge 5$, the binomial probabilities may be closely approximated by Normal probabilities in the following way. For a = 0, 1, 2, 3, ..., n,

$P(X \le a) \approx P\left(Z \le \dfrac{a + .5 - np}{\sqrt{npq}}\right)$

Example: An automotive plant employs 5000 workers and suffers a daily absentee rate of 1%.
(a) What is the mean (or expected value) of the number of absentees on a given day?
(b) What are the variance and standard deviation of the number of absentees on a given day?
(c) Find the probability that on a given day (i) at most 30 of the workers are absent, (ii) at least 60 of the workers are absent, (iii) between 30 and 60 (inclusive) of the workers are absent.

Solution: Let X = the number of workers absent on a given day. Therefore, X is binomial with n = 5000 and p = .01.
(a) $\mu = np$ =
(b) $\sigma^2 = npq$ =
    $\sigma$ =
(c) Since the values for n and p are not in Table C (Binomial Probabilities), and since using the binomial formula would take a very long time (not to mention the difficulty of the calculation when n = 5000), we use the normal approximation to the binomial.
Check: np =        ; nq =
(i) $P(X \le 30) \approx$
(ii) $P(X \ge 60)$ =
(iii) $P(30 \le X \le 60)$ =

Example: A travel agency promotes vacation packages by telephoning households at random in the evening hours.
Historically, only 65% of heads of households are at home when the agency calls. Suppose that 30 households are phoned in a given evening. Let X = the number of households phoned where the head of the household is at home. Find the probability that the agency will find
(a) between 15 and 25 households, inclusive, with the head of the household at home;
(b) fewer than 23 households with the head of the household at home;
(c) more than 17 but fewer than 28 such households, i.e. $P(17 < X < 28)$.
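The normal approximation with the .5 correction can be checked against exact binomial probabilities on a computer. The Python sketch below does this for the travel agency example (n = 30, p = .65). The function names are introduced here for illustration only, and the exact values come from summing the binomial formula directly rather than from Table C.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf_exact(a, n, p):
    """Exact P(X <= a) for X binomial(n, p), summing the binomial formula."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, a + 1))


def binom_cdf_normal(a, n, p):
    """Approximate P(X <= a) by P(Z <= (a + .5 - np) / sqrt(npq))."""
    q = 1 - p
    if n * p < 5 or n * q < 5:
        raise ValueError("need np >= 5 and nq >= 5 for the approximation")
    z = (a + 0.5 - n * p) / sqrt(n * p * q)
    return NormalDist().cdf(z)


n, p = 30, 0.65

# (a) P(15 <= X <= 25) = P(X <= 25) - P(X <= 14)
approx_a = binom_cdf_normal(25, n, p) - binom_cdf_normal(14, n, p)
exact_a = binom_cdf_exact(25, n, p) - binom_cdf_exact(14, n, p)

# (b) fewer than 23 means X <= 22
approx_b = binom_cdf_normal(22, n, p)
exact_b = binom_cdf_exact(22, n, p)

# (c) P(17 < X < 28) = P(18 <= X <= 27) = P(X <= 27) - P(X <= 17)
approx_c = binom_cdf_normal(27, n, p) - binom_cdf_normal(17, n, p)
exact_c = binom_cdf_exact(27, n, p) - binom_cdf_exact(17, n, p)

for label, approx, exact in [("(a)", approx_a, exact_a),
                             ("(b)", approx_b, exact_b),
                             ("(c)", approx_c, exact_c)]:
    print(label, "normal approx:", round(approx, 4), " exact:", round(exact, 4))
```

Note how each strict or inclusive inequality is first rewritten in terms of $P(X \le a)$ before the approximation is applied; that rewriting step is where most mistakes occur when working these problems by hand.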