STAT 211 Handout 4 (Chapter 4): Continuous Random Variables

A r.v. X is said to be continuous if its set of possible values is an entire interval of numbers. A probability distribution of X is then described by a function f(x) (pdf: probability density function) such that for any two numbers a and b with $a \le b$,

$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

Conditions to be a legitimate probability distribution:
(i) $f(x) \ge 0$ for all x
(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (area under the entire graph of f(x)).

If X is a continuous r.v., then for any number c, P(X=c)=0.

Cumulative Distribution Function for X (cdf):

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy.$$

$F(-\infty)=0$, $F(\infty)=1$, $P(a \le X \le b) = P(a < X < b) = F(b) - F(a)$, $P(X>a) = P(X \ge a) = 1 - F(a)$.

Example 1: A college professor never finishes his lecture before the bell rings to end the period and always finishes his lectures within 2 min after the bell rings. Let X = the time that elapses between the bell and the end of the lecture and suppose the pdf of X is

$$f(x) = \begin{cases} k x^2, & 0 \le x \le 2 \\ 0, & \text{otherwise.} \end{cases}$$

Find the value of k which makes f(x) a legitimate pdf. (Answer: 3/8)

Example 2: The amount of bread (in hundreds of pounds) that a certain bakery is able to sell in a day is a random variable with pdf

$$f(x) = \begin{cases} Ax, & 0 \le x < 5 \\ A(10-x), & 5 \le x < 10 \\ 0, & \text{otherwise.} \end{cases}$$

(a) Find the value of A which makes f(x) a legitimate pdf.

$$\int_0^5 Ax\,dx + \int_5^{10} A(10-x)\,dx = A\left[\frac{x^2}{2}\right]_0^5 + A\left[10x - \frac{x^2}{2}\right]_5^{10} = 25A.$$

Then A = 1/25.

(b) What is the probability that the number of pounds of bread that will be sold tomorrow is

(i) more than 500 pounds

$$P(X>5) = \frac{1}{25}\int_5^{10} (10-x)\,dx = \frac{1}{25}\left[10x - \frac{x^2}{2}\right]_5^{10} = 0.5$$

(ii) less than 500 pounds

$$P(X<5) = \frac{1}{25}\int_0^5 x\,dx = \frac{1}{25}\left[\frac{x^2}{2}\right]_0^5 = 0.5$$

(iii) between 250 and 750 pounds

$$P(2.5<X<7.5) = \frac{1}{25}\int_{2.5}^{5} x\,dx + \frac{1}{25}\int_5^{7.5} (10-x)\,dx = 0.75$$

(c) What is F(x)?

$$F(x) = \begin{cases} 0, & x < 0 \\ \int_0^x \frac{1}{25}\,y\,dy = \frac{x^2}{50}, & 0 \le x < 5 \\ \frac{1}{2} + \int_5^x \frac{1}{25}(10-y)\,dy = \frac{1}{25}\left(10x - \frac{x^2}{2}\right) - 1, & 5 \le x < 10 \\ 1, & x \ge 10 \end{cases}$$

Obtaining f(x) from F(x): If X is a continuous r.v. with pdf f(x) and cdf F(x), then at every x at which the derivative exists, F'(x) = f(x).

Percentile of a continuous distribution: Let p be a number between 0 and 1.
The (100p)th percentile of the distribution of a continuous r.v. X, denoted by r(p), is defined by

$$p = F(r(p)) = P(X \le r(p)) = \int_{-\infty}^{r(p)} f(y)\,dy.$$

The median is the 50th percentile: $P(X \le \text{median}) = P(X > \text{median}) = 0.50$.

Expected value for the continuous random variable X:

$$\mu = E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx.$$

Variance for the continuous random variable X:

$$\sigma^2 = Var(X) = E[(X-\mu)^2] = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx,$$

or the shortcut is

$$\sigma^2 = Var(X) = E(X^2) - \mu^2 \quad \text{where} \quad E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx.$$

If h(X) is a function of the random variable X,

$$E(h(X)) = \int_{-\infty}^{\infty} h(x)\,f(x)\,dx, \qquad Var(h(X)) = \int_{-\infty}^{\infty} \big(h(x) - E(h(X))\big)^2 f(x)\,dx.$$

If h(X) is a linear function of X, the rules of the mean and the variance can be used directly instead of going through the integration: $E(aX+b) = aE(X)+b$ and $Var(aX+b) = a^2 Var(X)$.

Example 1 (continued): Go back to the example with the college professor in this chapter and show that

(i) the cumulative distribution function is

$$F(x) = \begin{cases} 0, & x < 0 \\ \int_0^x k y^2\,dy = \frac{k x^3}{3} = \frac{x^3}{8}, & 0 \le x \le 2 \\ 1, & x > 2 \end{cases}$$

(ii) the expected value of X is 4k = 1.5:

$$E(X) = \int_0^2 x\,(k x^2)\,dx = \left[\frac{k x^4}{4}\right]_0^2 = 4k = 1.5$$

(iii) the variance of X is $Var(X) = \frac{32k}{5} - 16k^2 = 0.15$, where $Var(X) = E(X^2) - [E(X)]^2$ and

$$E(X^2) = \int_0^2 x^2 (k x^2)\,dx = \left[\frac{k x^5}{5}\right]_0^2 = \frac{32k}{5} = 2.4$$

(iv) the 90th percentile of X is $\sqrt[3]{2.7/k} = 1.931$. Set F(r) = 0.90 and solve for r to find it; r is the 90th percentile of X.

(v) the median is $\sqrt[3]{1.5/k} = 1.5874$. Set F(median) = 0.50 and solve for the median to find it.

(vi) how to obtain f(x) from F(x): differentiate, $f(x) = F'(x) = \frac{d}{dx}\left(\frac{k x^3}{3}\right) = k x^2$ for $0 < x < 2$.

Uniform Distribution: X ~ U[a,b]

$$f(x) = \frac{1}{b-a}, \quad a \le x \le b$$

$E(X) = \frac{a+b}{2}$ and $Var(X) = \frac{(b-a)^2}{12}$.

Chebyshev's Inequality:

$$P(|X-\mu| \ge k\sigma) \le \frac{1}{k^2}.$$

The probability that the value of X lies at least k standard deviations from its mean is at most $1/k^2$.

Example 3: If k = 2 then the probability that the value of X lies at least 2 standard deviations from its mean is at most 1/4 = 0.25 = 25%. Equivalently, at least 75% of the values are within two standard deviations of the mean.
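The results of Example 1 (continued) can be checked numerically. The following is a minimal sketch in Python using only the standard library; the helper names `simpson` and `percentile` are illustrative choices, not part of the handout, and Simpson's rule with bisection is just one convenient way to evaluate the integrals and solve F(r) = p.

```python
from typing import Callable


def simpson(g: Callable[[float], float], a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule on [a, b] using n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


# Normalising constant: k * (integral of x^2 over [0, 2]) must equal 1.
k = 1 / simpson(lambda x: x**2, 0.0, 2.0)


def pdf(x: float) -> float:
    """f(x) = k x^2 on [0, 2], and 0 otherwise."""
    return k * x**2 if 0.0 <= x <= 2.0 else 0.0


def cdf(x: float) -> float:
    """F(x) = integral of f from 0 to x, clamped to [0, 1] outside [0, 2]."""
    if x <= 0.0:
        return 0.0
    if x >= 2.0:
        return 1.0
    return simpson(pdf, 0.0, x)


mean = simpson(lambda x: x * pdf(x), 0.0, 2.0)
second_moment = simpson(lambda x: x**2 * pdf(x), 0.0, 2.0)
variance = second_moment - mean**2  # shortcut formula E(X^2) - [E(X)]^2


def percentile(p: float, tol: float = 1e-10) -> float:
    """Solve F(r) = p for r on [0, 2] by bisection."""
    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"k = {k:.4f}, E(X) = {mean:.4f}, Var(X) = {variance:.4f}")
print(f"median = {percentile(0.50):.4f}, 90th percentile = {percentile(0.90):.4f}")
```

The same pattern applies to any pdf with bounded support: replace the integrand and the interval, then reuse `simpson` and `percentile`.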
Empirical Rule: If the population distribution of a random variable is (approximately) normal, then
Roughly 68% of the values are within one standard deviation of the mean
Roughly 95% of the values are within two standard deviations of the mean
Roughly 99.7% of the values are within three standard deviations of the mean

Normal Distribution: $X \sim N(\mu, \sigma^2)$

A continuous r.v. X is said to have a normal distribution with parameters $\mu$ and $\sigma^2$, where $-\infty < \mu < \infty$ and $\sigma > 0$. The pdf of X is

$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty.$$

It is symmetric and bell-shaped.

The standard normal random variable $Z = \frac{X-\mu}{\sigma}$ has $\mu = 0$ and $\sigma^2 = 1$: $Z \sim N(0,1)$. The cdf of Z is $\Phi(z) = P(Z \le z)$. Appendix table A.3 can be used to compute $\Phi(z)$.

The (100p)th percentile of X with $N(\mu, \sigma^2)$ = $\mu + \sigma\,$[the (100p)th percentile of Z with N(0,1)].

Example 4: On a mathematics test, assume that the test scores (X) are approximately normally distributed with a mean of 76.8 and a standard deviation of 13.94.

a. If a score below 60 represents a grade of F (failure), approximately what percent of students failed the test?

$$P(X<60) = P\left(Z < \frac{60-76.8}{13.94}\right) = P(Z < -1.21) = 0.1131 = 11.31\%$$

b. If the cutoff for a grade of A is the lowest score of the top 15%, what is that cutoff point?

$P(X \ge x^*) = 0.15$, then $P(Z \ge z^*) = P\left(Z \ge \frac{x^*-76.8}{13.94}\right) = 0.15$. By looking at the table, $z^* = \frac{x^*-76.8}{13.94} = 1.04$, then $x^* = 91.2976$.

c. How many points must be added to the student scores so that only 10% fail (with less than 60 being the failing grade)?

$P(X < x^*) = 0.10$, then $P(Z < z^*) = P\left(Z < \frac{x^*-76.8}{13.94}\right) = 0.10$. $z^* = \frac{x^*-76.8}{13.94} = -1.28$, then $x^* = 58.9568$. So 60 - 58.9568 = 1.0432 points should be added.

d. If the cutoff for a grade of C is the lowest score of the top 45%, what is that cutoff point?

$P(X \ge x^*) = 0.45$, then $P(Z \ge z^*) = P\left(Z \ge \frac{x^*-76.8}{13.94}\right) = 0.45$. $z^* = \frac{x^*-76.8}{13.94} = 0.13$, then $x^* = 78.6122$.

e. What is the 90th percentile of test scores (x)?

$P(X \le x^*) = 0.90$, then $P(Z \le z^*) = P\left(Z \le \frac{x^*-76.8}{13.94}\right) = 0.90$. $z^* = \frac{x^*-76.8}{13.94} = 1.28$, then $x^* = 94.6432$.
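The table lookups in Example 4 can be cross-checked in software. The sketch below assumes Python 3.8 or later and uses `statistics.NormalDist` from the standard library; the variable names are illustrative. Because the code does not round z to two decimals the way a table lookup does, its answers differ slightly from the handout's table-based values.

```python
from statistics import NormalDist

# Test scores: mean 76.8, standard deviation 13.94 (from Example 4).
scores = NormalDist(mu=76.8, sigma=13.94)

# a. Proportion scoring below 60.
p_fail = scores.cdf(60)

# b. Lowest score of the top 15% is the 85th percentile.
cutoff_a = scores.inv_cdf(1 - 0.15)

# c. 10th percentile, and the shift needed to move it up to 60.
tenth_percentile = scores.inv_cdf(0.10)
points_to_add = 60 - tenth_percentile

# d. Lowest score of the top 45% is the 55th percentile.
cutoff_c = scores.inv_cdf(1 - 0.45)

# e. 90th percentile.
p90 = scores.inv_cdf(0.90)

print(f"a. P(X < 60)       = {p_fail:.4f}")
print(f"b. cutoff for A    = {cutoff_a:.2f}")
print(f"c. points to add   = {points_to_add:.2f}")
print(f"d. cutoff for C    = {cutoff_c:.2f}")
print(f"e. 90th percentile = {p90:.2f}")
```

Note that "top 15%" is converted to a lower-tail probability before calling `inv_cdf`, which always works with $P(X \le x)$.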
Example 5: A chemical plant superintendent orders a process shutdown and setting readjustment whenever the pH of the final product falls below 6.9 or above 7.1. The sample pH is normally distributed with unknown $\mu$ and standard deviation $\sigma = 0.05$. Determine the probability

(a) of readjusting when the process is operating as intended and $\mu = 7$

$$P(X<6.9 \text{ or } X>7.1) = P(X<6.9) + P(X>7.1) = P(Z<-2) + P(Z>2) = 0.0228 + 0.0228 = 0.0456$$

(b) of failing to readjust when the process is too alkaline and the mean pH is $\mu = 7.15$

$$P(6.9 \le X \le 7.1) = P(X \le 7.1) - P(X < 6.9) = P(Z \le -1) - P(Z < -5) = 0.1587 - 0 = 0.1587$$

If X is a random variable whose logarithm is normally distributed, then X has a lognormal distribution.

Normal Approximation to the Binomial Distribution: Let X be a binomial r.v. based on n trials with success probability p. Then if the binomial probability histogram is not too skewed, X has approximately a normal distribution with $\mu = np$ and $\sigma = \sqrt{np(1-p)}$, so that

$$P(X \le x) \approx P\left(Z \le \frac{x + 0.5 - np}{\sqrt{np(1-p)}}\right) = \Phi\left(\frac{x + 0.5 - np}{\sqrt{np(1-p)}}\right)$$

(check that $np \ge 10$ and $n(1-p) \ge 10$ before using the formula).

Example 6 (Exercise 4.50, 6th edition, Exercise 4.48, 5th edition): Suppose that 10% of all steel shafts produced by a certain process are nonconforming but can be reworked (rather than having to be scrapped). Consider a random sample of 200 shafts and let X denote the number among these that are nonconforming and can be reworked. What is the approximate probability that X is

(a) At most 30?
$P(X \le 30) = P(Z \le 2.48) = 0.9934$

(b) Less than 30?
$P(X < 30) = P(X \le 29) = P(Z \le 2.24) = 0.9875$ (because X is discrete)

(c) Between 15 and 25 (inclusive)?
$P(15 \le X \le 25) = P(X \le 25) - P(X \le 14) = P(Z \le 1.30) - P(Z \le -1.30) = 0.9032 - 0.0968 = 0.8064$

The Gamma Distribution: $X \sim Gamma(\alpha, \beta)$

$$f(x; \alpha, \beta) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, \quad x \ge 0, \ \alpha > 0, \ \beta > 0$$

It is a skewed distribution. Note that

$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx, \quad \alpha > 0,$$

$\Gamma(\alpha) = (\alpha-1)\,\Gamma(\alpha-1)$ for $\alpha > 1$, $\Gamma(n) = (n-1)!$ where n is any positive integer, and $\Gamma(1/2) = \sqrt{\pi}$.

$E(X) = \alpha\beta$, $Var(X) = \alpha\beta^2$

If $\beta = 1$ then it is called the standard gamma distribution. When the random variable is a standard gamma r.v., the cdf is called the incomplete gamma function (Appendix Table A.4).
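When $\alpha$ is a positive integer, the incomplete gamma function has the closed form $F(x;\alpha) = 1 - e^{-x}\sum_{j=0}^{\alpha-1} x^j/j!$, so table entries for integer $\alpha$ can be reproduced without a lookup. The sketch below is an illustration under that integer-$\alpha$ assumption only; the function names `standard_gamma_cdf` and `gamma_cdf` are hypothetical, and the rescaling by $\beta$ uses the standard fact that $X/\beta$ is standard gamma when $X \sim Gamma(\alpha, \beta)$.

```python
import math


def standard_gamma_cdf(x: float, alpha: int) -> float:
    """Incomplete gamma function F(x; alpha) for a positive integer alpha."""
    if alpha < 1 or alpha != int(alpha):
        raise ValueError("this closed form requires a positive integer alpha")
    if x <= 0:
        return 0.0
    # Sum of x^j / j! for j = 0, ..., alpha - 1.
    partial = sum(x**j / math.factorial(j) for j in range(int(alpha)))
    return 1.0 - math.exp(-x) * partial


def gamma_cdf(x: float, alpha: int, beta: float) -> float:
    """Gamma(alpha, beta) cdf via rescaling to the standard gamma cdf."""
    return standard_gamma_cdf(x / beta, alpha)


# Gamma function identities listed above.
print(math.gamma(5), math.factorial(4))     # Gamma(n) = (n - 1)!
print(math.gamma(0.5), math.sqrt(math.pi))  # Gamma(1/2) = sqrt(pi)

# A value of the kind tabulated for the incomplete gamma function.
print(round(standard_gamma_cdf(6, 5), 3))
```

For non-integer $\alpha$ this closed form does not apply, and a table or a numerical library is needed instead.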
For $X \sim Gamma(\alpha, \beta)$ the cdf satisfies $F(x; \alpha, \beta) = F(x/\beta; \alpha)$, where $F(\cdot\,; \alpha)$ is the incomplete gamma function.

If $\alpha = 1$ then it is Exponential($\lambda = 1/\beta$). It is used as a model for the distribution of times between the occurrence of successive events. The gamma distribution is used for the segment of time or space that elapses until some specified number of events has occurred, where $\lambda = 1/\beta$ is the average process rate and $\alpha$ is the specified number of events that must occur before X is reached.

The pdf for an exponentially distributed random variable is

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0, \ \lambda > 0$$

$P(X > x^*) = e^{-\lambda x^*}$

$E(X) = 1/\lambda$ and $Var(X) = 1/\lambda^2$

If $\alpha = r/2$ and $\beta = 2$ then it is Chi-squared(r). (Appendix table A.7)

If the random variable X is distributed Exponential($\lambda$), then $Y = X^{1/\alpha}$ has a Weibull($\alpha$, $\beta$) distribution (with $\beta = \lambda^{-1/\alpha}$ under the parametrization $F(y) = 1 - e^{-(y/\beta)^{\alpha}}$).

Example 7 (Exercise 4.56, 6th edition, Exercise 4.57, 5th edition): Suppose the time spent by a randomly selected student who uses a terminal connected to a local time sharing facility has a gamma distribution with mean 20 min and variance 80 min$^2$.

(a) What are the values of $\alpha$ and $\beta$?
$E(X) = \alpha\beta = 20$ and $Var(X) = \alpha\beta^2 = 80$, then $\beta = 4$ and $\alpha = 5$.

(b) What is the probability that a student uses the terminal for at most 24 min?
$P(X \le 24) = P(X/\beta \le 6) = 0.715$ using the table of the incomplete gamma function.

(c) What is the probability that a student spends between 20 and 40 min using the terminal?
$P(20 \le X \le 40) = P(5 \le X/\beta \le 10) = P(X/\beta \le 10) - P(X/\beta < 5) = 0.971 - 0.560 = 0.411$ using the table of the incomplete gamma function.

Example 8: Find the probability that the time taken for the next two cars to arrive at a tollbooth will be 1 minute or less when $\lambda = 2$ per minute.
$X \sim Gamma(\alpha = 2, \beta = 1/2)$
$P(X \le 1) = P(X/\beta \le 2) = 0.594$ using the table of the incomplete gamma function, or using the pdf of the gamma distribution,

$$\int_0^1 4x e^{-2x}\,dx = 0.594$$

Example 9: Flaws in a reel of high-fidelity radar recording tape occur on the average of once every 10 feet. Determine the probability that the next recording will begin on a flawless stretch of tape over 5 feet long.
$X \sim Exponential(\lambda = 0.1)$

$$P(X > 5) = \int_5^{\infty} 0.1 e^{-0.1x}\,dx = e^{-0.5} = 0.6065$$

Example 10: A series system consists of 100 independent units, each with an exponentially distributed lifetime with $\lambda = 0.005$.
Find the system reliability over a span of t = 10. A series system works only if every unit works, so

$$R_s(10) = [P(X > 10)]^{100} = \left[\int_{10}^{\infty} 0.005 e^{-0.005x}\,dx\right]^{100} = (0.9512)^{100} = 0.0067$$

Example 11 (Exercise 4.61, 6th edition, Exercise 4.60, 5th edition): Extensive experience with fans of a certain type used in diesel engines has suggested that the exponential distribution provides a good model for time until failure. Suppose the mean time until failure is 25000 hours, so that

$$f(x) = \frac{1}{25000}\, e^{-x/25000}, \quad x > 0.$$

What is the probability that

(a) A randomly selected fan will last at least 20000 hours?
$P(X \ge 20000) = e^{-20000/25000} \approx 0.4493$
A randomly selected fan will last at most 30000 hours?
$P(X \le 30000) = 1 - e^{-30000/25000} \approx 0.6988$
A randomly selected fan will last between 20000 and 30000 hours?
$P(20000 \le X \le 30000) = e^{-20000/25000} - e^{-30000/25000} \approx 0.1481$

(b) The lifetime of a fan exceeds the mean value by more than 2 standard deviations?
$P(X > \mu + 2\sigma) = P(X > 25000 + 2(25000)) = e^{-75000/25000} \approx 0.0498$
The lifetime of a fan exceeds the mean value by more than 3 standard deviations?
$P(X > \mu + 3\sigma) = P(X > 25000 + 3(25000)) = e^{-100000/25000} \approx 0.0183$

The exponential distribution shares the memoryless property of the geometric distribution: $P(X \ge t + t_0 \mid X \ge t_0) = P(X \ge t)$. That is, the distribution of additional lifetime is exactly the same as the original distribution of lifetime.

Probability plots: Order the n sample observations from smallest to largest. Then the ith smallest observation in the list is taken to be the [100(i-0.5)/n]th sample percentile.

Example 12 (Exercise 4.82, 6th edition, Exercise 4.80, 5th edition): Ten observations on bearing lifetime (in hours) are collected. Construct a normal probability plot and comment on the plausibility of the normal distribution as a model for bearing lifetime.
i     (i-0.5)/n    z         Data
1     .05          -1.645    152.7
2     .15          -1.036    172
3     .25          -0.675    172.5
4     .35          -0.385    173.3
5     .45          -0.126    193
6     .55           0.126    204.7
7     .65           0.385    216.5
8     .75           0.675    243.9
9     .85           1.036    262.6
10    .95           1.645    422.6

[Figure: plot of the data (vertical axis, 0 to 450) against the z percentiles (horizontal axis, -2 to 2).]

[Figure: Normal Probability Plot for lifetime, ML Estimates with 95% CI; axes Percent versus Data. ML Estimates: Mean 221.38, StDev 74.6053. Goodness of Fit: AD* 1.902.]

Example 13 (Exercise 4.91, 6th edition, Exercise 4.87, 5th edition): The failure time observations (1000's of hours) resulted from accelerated life testing of 16 integrated circuit chips of a certain type. Use the corresponding percentiles of the exponential distribution with $\lambda = 1$ to construct a probability plot. Comment on whether the sample could have been generated from an exponential distribution.

$$F(x) = 1 - e^{-\lambda x} = 1 - e^{-x}, \quad x \ge 0$$

That means, for example, that for the 5th percentile F(x) = 0.05 and we are trying to find what x is. x can be found as $x = -\ln(1 - F(x))$.

Data     i     (i-0.5)/n    x
11.6     1     5/160        0.031749
26.5     2     15/160       0.09844
82.8     3     25/160       0.169899
179.7    4     35/160       0.24686
204.6    5     45/160       0.330242
212.6    6     55/160       0.421213
229.9    7     65/160       0.521297
242      8     75/160       0.632523
244.8    9     85/160       0.757686
304.3    10    95/160       0.900787
307.8    11    105/160      1.067841
359.5    12    115/160      1.268511
366.7    13    125/160      1.519826
379.1    14    135/160      1.856298
502.5    15    145/160      2.367124
558.9    16    155/160      3.465736

[Figure: exponential probability plot for the failure times; axes labelled "data" and "x".]
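The percentile columns in Examples 12 and 13 can be regenerated from the rule that the ith smallest observation is the [100(i-0.5)/n]th sample percentile. The following is a minimal sketch assuming Python 3.8 or later; the function names are illustrative, and the code only produces the plot coordinates (drawing the plot is left to whatever plotting tool is at hand).

```python
import math
from statistics import NormalDist


def plotting_positions(n: int) -> list:
    """Probabilities (i - 0.5)/n for i = 1, ..., n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def normal_scores(n: int) -> list:
    """Standard normal percentiles z matching the plotting positions."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(p) for p in plotting_positions(n)]


def exponential_scores(n: int) -> list:
    """Exponential(lambda = 1) percentiles: solve 1 - exp(-x) = p for x."""
    return [-math.log(1 - p) for p in plotting_positions(n)]


# Example 12 data (bearing lifetimes in hours).
bearing = [152.7, 172, 172.5, 173.3, 193, 204.7, 216.5, 243.9, 262.6, 422.6]

# Pair each ordered observation with its z percentile; a roughly straight
# line through these points supports the normal model.
for z, x in zip(normal_scores(len(bearing)), sorted(bearing)):
    print(f"{z:7.3f}  {x:7.1f}")
```

For Example 13, `exponential_scores(16)` gives the x column, to be paired with the 16 sorted failure times in the same way.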