Continuous Random Variables
Lecture 4

© 2010, All Rights Reserved, Robi Polikar. No part of this presentation may be used without explicit written permission. Such permission will be given – upon request – for noncommercial educational purposes only. Limited permission is hereby granted, however, to post or distribute this presentation if you agree to all of the following:
1. you do so for noncommercial educational purposes;
2. the entire presentation is kept together as a whole, including this entire notice;
3. you include the following link/reference on your site: Robi Polikar, http://engineering.rowan.edu/~polikar.

ECE 09.360 Dr. P.’s Clinic Consultant Module in Probability & Statistics in Engineering

Today in P&S
• Review of Discrete Random Variables
  - Binomial distribution
  - Hypergeometric and negative binomial distributions
  - Poisson distribution
• Continuous Random Variables and Their Probability Distributions
  - Probability density (distribution) function
  - Cumulative distribution function
  - Percentiles, Expected Values & Variances of Cont. Random Variables
• The Normal (Gaussian) Distribution
  - Standard & non-standard normal distribution
  - The normal approximation to the binomial distribution
• Other continuous distributions
  - Gamma, Beta, Exponential, Chi-Squared and Weibull Distributions

Random Variables
A random variable X is a function that maps every possible event in the space S of a random experiment to a real number.
[Figure: the fish example, with the species Salmon, Trout, Seabass, Sword and Other mapped to the integers 1 to 5]
A cumulative distribution function tells us the probability of X assuming a value x or less: $F(x) = P(X \le x)$.
A probability mass (distribution) function tells us the probability of each observation of X, that is, of X assuming a specific value: $p(x) = P(X = x)$.
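As a small illustration of the pmf and cdf definitions above, the sketch below builds F(x) = P(X ≤ x) by accumulating p(x). The probabilities in `pmf` are made-up values for a five-category variable coded 1 to 5, not numbers read off the slide, and the helper name `cdf_at` exists only for this example.

```python
from itertools import accumulate

# Hypothetical pmf for a discrete r.v. X coded 1..5 (illustrative values only).
pmf = {1: 0.40, 2: 0.20, 3: 0.15, 4: 0.10, 5: 0.15}

# A valid pmf is non-negative and sums to 1.
assert all(p >= 0 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# F(x) = P(X <= x): running sum of p over the ordered values of X.
xs = sorted(pmf)
cdf = dict(zip(xs, accumulate(pmf[x] for x in xs)))

def cdf_at(x):
    """Return F(x) = P(X <= x) for any real x (a step function)."""
    below = [v for v in xs if v <= x]
    return cdf[below[-1]] if below else 0.0

print(cdf_at(2))    # P(X <= 2)
print(cdf_at(2.7))  # equals F(2): there is no probability mass between 2 and 3
print(cdf_at(0))    # below the smallest value of X
```

The cdf of a discrete r.v. is a staircase: it jumps by p(x) at each value x and stays flat in between.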
[Figure: pmf p(x) and cdf F(x) for the fish example]
The expected value of a random variable is its mean, computed as
$$E(X) = \mu_X = \sum_{x \in D} x \, p(x)$$
The variance of a r.v. is the average squared distance of the data from the mean:
$$\sigma^2 = E\left[(X-\mu)^2\right] = \sum_{x \in D} (x-\mu)^2 \, p(x)$$

Bernoulli Experiments & Associated Distributions
Each individual trial in a series of n binary-outcome (S/F) experiments, where the probability of success p is constant and the trials are independent, is a Bernoulli experiment (B.E.).
• Binomial: the r.v. defined as the number of successes among n trials of a Bernoulli experiment is a binomial r.v. The probability of observing x successes in n trials is given by the binomial distribution:
$$b(x; n, p) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad E(X) = \mu = np, \qquad \sigma^2 = np(1-p) = npq$$
• Hypergeometric: if the samples are selected without replacement, we obtain the hypergeometric distribution. X = the number of S's in a random sample of size n drawn from a population consisting of M S's and (N - M) F's has the pmf
$$h(x; n, M, N) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$$
• Negative binomial: if a B.E. continues until a total of r successes are observed, the r.v. defined as the number of failures that precede the r-th success has a negative binomial distribution:
$$nb(x; r, p) = \binom{x+r-1}{r-1} p^r (1-p)^x$$
• Geometric: if r = 1, the above r.v. is the number of failures that precede the first success (one fewer than the number of trials required to observe it), which has a geometric distribution:
$$g(x; p) = p(1-p)^x$$
• Poisson: if a r.v. describes the number of times a rare event happens over a period of time / area with an average rate of λ, the r.v. has a Poisson distribution, which is the limiting case of a binomial distribution where n → ∞ and p → 0 while np → λ:
$$p(x; \lambda) = \frac{e^{-\lambda}\lambda^x}{x!}$$

Continuous Random Variables
A r.v. X is continuous if its set of possible outcomes includes an entire interval, say between a and b. Therefore, a continuous r.v. may assume an infinite number of possible values:
• The actual wattage of a 60W bulb, the actual speed of a 3GHz chip
• The weights / heights of students in a population
• The exact point on which a spinning pointer lands (but not the number on which the roulette ball lands; why not?)
Note that in each case, the r.v. may assume any one of an infinite number of values!

PDF for Cont. Distributions
Let X = the actual speed of a 3 GHz chip. For simplicity, let us assume that the actual speed of a 3 GHz chip stays within the interval [2.5, 3.5] GHz. Then X may assume any number between 2,500,000,000 Hz and 3,500,000,000 Hz.
• Now assume we have a device that can measure the actual speed to the nearest 1 MHz. We measure the speeds of a certain number of chips and plot their histogram (a).
• We then get a device that can measure the actual speed to the nearest 1 kHz. We again measure the speeds and plot the histogram (b).
• Continuing like this, in the limiting case, if we had a device that could measure the speed precisely (is it possible..?) and plotted the histogram, we would obtain the smooth histogram in (c). Such a histogram, in the limiting case, defines a continuous probability distribution.
Now, what is the probability that a chip’s actual speed is 3 GHz? How about P(speed = 3.023987423452974562347652937456293475629347524752693475 GHz) = ?
[Figure: histograms of chip speed over 2.5 to 3.5 GHz: (a) 1 MHz resolution, (b) 1 kHz resolution, (c) the smooth limiting curve]

PDF & CDF
Unlike the pmf in the discrete case, in the continuous domain the points on the pdf do not represent probabilities, because the probability of any single value on the continuous axis is zero; rather, the pdf represents densities:
P(speed = 3.2398742345129745623476529374562934756293475 GHz) = 0, or P(weight = 175.246541635465468748764546879876523543) = 0, or P(weight = 175) = 0.
Therefore, we can only compute the probability that the r.v. lies within an interval, say between a and b.
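To see why a single point carries zero probability, the sketch below shrinks a measurement window around 3 GHz and watches the probability vanish. It assumes, purely for illustration, a symmetric triangular density on [2.5, 3.5] GHz peaking at 3 GHz; the slides do not specify the chip-speed distribution, and the function names `tri_cdf` and `prob_within` are hypothetical.

```python
def tri_cdf(x, lo=2.5, peak=3.0, hi=3.5):
    """cdf of a symmetric triangular density on [lo, hi] peaking at `peak` (assumed model)."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    half = peak - lo  # half-width of the support
    if x <= peak:
        return 0.5 * ((x - lo) / half) ** 2
    return 1.0 - 0.5 * ((hi - x) / half) ** 2

def prob_within(width, center=3.0):
    """P(center - width/2 <= X <= center + width/2)."""
    return tri_cdf(center + width / 2) - tri_cdf(center - width / 2)

# Window widths in GHz: 1 MHz, 1 kHz, 1 Hz, and finally a single point.
widths = [1e-3, 1e-6, 1e-9, 0.0]
probs = [prob_within(w) for w in widths]
for w, p in zip(widths, probs):
    print(f"width = {w:g} GHz -> probability = {p:.3e}")
```

Each time the window narrows by a factor of 1000, the probability drops by about the same factor, so in the limit of an exact value nothing is left. What stays finite is the ratio probability / width, which is the density.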
The actual probability that a r.v. lies between two values is given by the area under the pdf curve between those two points:
$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$$
[Figure: pdf f(x) of the actual speed of a 3 GHz chip (GHz), axis from 2.5 to 3.5 with 2.9, 3 and 3.1 marked; histogram of the weights of NJ residents (lbs)]

PDF for Continuous Distributions
Let X be a continuous r.v. Then a probability distribution or probability density function (pdf) of X is a function f(x) such that for any two numbers a and b with a ≤ b,
$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$$
The graph of f(x) is then called the density curve. For f(x) to be a pdf:
1. f(x) ≥ 0 for all values of x.
2. The area of the region between the graph of f and the x-axis is equal to 1:
$$P(-\infty < X < \infty) = \int_{-\infty}^{\infty} f(x)\,dx = 1$$

The actual probability is the area under the PDF… I repeat…The Actual Probability is …
P(a ≤ X ≤ b) is the area of the shaded region.
[Figure: density curve y = f(x) with the region between a and b shaded]

Uniform Distribution
If all outcomes of a random experiment are equally likely, the random variable is said to have a uniform distribution. More formally, a continuous r.v. X is said to have a uniform distribution on the interval [a, b] if the pdf of X is
$$f(x; a, b) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
[Figure: uniform pdf f(x), a rectangle of height 1/(b - a) between a and b]

Cumulative Distribution Function
Similar to the discrete case, the cumulative distribution function F(x) for a continuous r.v. X is defined for every number x as the probability P(X ≤ x):
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
Note that for each x, F(x) is the area under the density curve to the left of x.
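The area interpretation can be checked numerically. The sketch below uses a uniform density on an arbitrarily chosen interval [0, 10] (an assumption made only for this example) and compares the area under the pdf between two points with the difference of cdf values; the midpoint-rule helper `area` is a hypothetical name, not a library function.

```python
from math import fsum

def uniform_pdf(x, a, b):
    """pdf of the uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """cdf of the uniform distribution on [a, b]: area under the pdf to the left of x."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def area(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * fsum(f(lo + (i + 0.5) * h) for i in range(n))

a, b = 0.0, 10.0                      # assumed interval, for illustration only
f = lambda x: uniform_pdf(x, a, b)

total = area(f, a, b)                 # property 2 of a pdf: total area is 1
p_area = area(f, 2.0, 5.0)            # P(2 <= X <= 5) as an area under the pdf
p_cdf = uniform_cdf(5.0, a, b) - uniform_cdf(2.0, a, b)  # the same probability from the cdf

print(total, p_area, p_cdf)
```

Both routes give the same probability, which is the relation P(a ≤ X ≤ b) = F(b) − F(a) for any continuous distribution.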
[Figure: density curve f(x) and the corresponding cdf F(x), with x = 200 marked]
$$F(200) = P(X \le 200) = \int_{-\infty}^{200} f(x)\,dx$$
Conversely, the pdf can be obtained from the cdf as
$$f(x) = F'(x) = \frac{dF(x)}{dx}$$

Cumulative Distribution Function
[Figure: f(x) and F(x), with the interval from 100 to 200 marked]
$$P(100 \le X \le 200) = \int_{100}^{200} f(x)\,dx = F(200) - F(100)$$

Percentiles in Cont. Distributions
Percentiles indicate the relative standing in ordered data. For example, if we are talking about SAT scores and you are in the 90th percentile, then 90% of all test takers did more poorly than you did, and 10% did better. So the (100p)th percentile, η(p), 0 ≤ p ≤ 1, is the value that exceeds 100p% of all scores and is exceeded by 100(1-p)% of all scores.
More formally: let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous r.v. X, denoted by η(p), is defined by
$$p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} f(y)\,dy$$
Thus on a pdf curve, η(p) is the value on the horizontal axis such that 100p% of the area under f(x) lies to the left of η(p) and 100(1-p)% lies to the right!

Parameters of Cont. RVs: Mean & Median
The median of a continuous distribution, denoted by $\tilde{\mu}$, is the 50th percentile. So $\tilde{\mu}$ satisfies $0.5 = F(\tilde{\mu})$. That is, half the area under the density curve is to the left of $\tilde{\mu}$.
The mean, or expected value, of a cont. r.v. is defined similarly to its discrete counterpart, with the summation replaced by integration:
$$\mu_X = E(X) = \int_{-\infty}^{\infty} x \, f(x)\,dx$$
Often we wish to compute the expected value of some function h(X) of the r.v. X. Simply weight h(x) by the pdf f(x) of X, and hence
$$\mu_{h(X)} = E[h(X)] = \int_{-\infty}^{\infty} h(x) \, f(x)\,dx$$

Variance of a Continuous rv.
The variance and standard deviation of a continuous r.v. are also defined similarly to their discrete counterparts, with summations replaced by integrals:
$$\sigma_X^2 = V(X) = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx = E\left[(X-\mu)^2\right] = E(X^2) - [E(X)]^2$$
$$\sigma_X = \sqrt{\sigma_X^2}$$

The Normal (Gaussian) Distribution
By far the most important distribution in all of probability and statistics, because it is the one most commonly occurring in nature:
• It provides a good explanation for many, but not all, continuously valued phenomena
• Physical measurements of length, weight, width, etc., measurement errors, exam scores, quality control results, outcomes of medical diagnostic tests, many financial indicators…
• Even if the individual variables of an experiment are not normal, their sum tends toward a normal distribution (Central Limit Theorem, CLT)
• Even if the individual factors affecting an experiment outcome are not normal, their combination that determines the actual experiment outcome is approximately normal !!!
• Entirely determined by just two parameters. Knowing them means knowing everything!
• Well studied and well understood
• Has a nice bell shape to it…!

The Normal Distribution
A continuous r.v. X is said to have a normal distribution with parameters μ and σ (or σ²), where -∞ < μ < ∞ and σ > 0, if the pdf of X is
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
[Figure: normal density curve; about 68.2% of the area lies within μ ± σ (a width of 2σ), 95.4% within μ ± 2σ (4σ), and 99.7% within μ ± 3σ (6σ)]

Computing Normal Distribution
As in other distributions, to compute the probability of a random variable assuming a value in a particular range, we need to integrate the normal pdf over that range. For example, if the weights of the students in this class are normally distributed (probably true) with a mean of, say, 170 lbs and a std. dev. of 20 lbs, then the probability that a randomly selected student weighs between 185 and 200 lbs would be:
$$P(185 \le X \le 200) = \int_{185}^{200} f(x)\,dx = \int_{185}^{200} \frac{1}{20\sqrt{2\pi}} \, e^{-\frac{(x-170)^2}{2 \cdot 20^2}}\,dx = 0.1598 \approx 16\%$$
[Figure: normal curve with the area between 185 and 200 shaded]

Computing Normal Distributions
The problem, however, is that the integral of the function exp(-x²) cannot be computed analytically! Instead, the integral is computed numerically for a range of values and the results are tabulated. However, tabulating integrals for every possible value of μ and σ is impossible…! Therefore we define the standard normal distribution:
The normal distribution with parameter values μ = 0 and σ = 1 is called a standard normal distribution. The random variable for this distribution is typically denoted by Z. The pdf is therefore
$$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}} \, e^{-z^2/2}$$
The associated cdf for Z, typically denoted by Φ(z), is
$$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} f(y; 0, 1)\,dy = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \, e^{-y^2/2}\,dy$$
We therefore list the tables for the std. norm. dist. only. Although the standard norm. dist. does not occur too often, it is very commonly used as a reference distribution. It is straightforward to convert a nonstandard normal dist. to and from the standard dist.

Using Gaussian Tables
Area under the curve on each side of zero is 0.5.
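The numerical integration that stands behind a Gaussian table can be sketched in a few lines. The code below applies a simple midpoint rule to the class-weight example (μ = 170 lbs, σ = 20 lbs) and also evaluates Φ(z) through the error function `math.erf`, which is one common way to compute it and not necessarily how printed tables were produced. The helper names `normal_pdf`, `integrate` and `phi` are chosen for this example only.

```python
from math import erf, exp, fsum, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal density f(x; mu, sigma)."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def integrate(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * fsum(f(lo + (i + 0.5) * h) for i in range(n))

def phi(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# P(185 <= X <= 200) for weights modeled as N(170, 20), by direct integration.
p_weights = integrate(lambda x: normal_pdf(x, 170.0, 20.0), 185.0, 200.0)

# Area under the standard normal curve to the left of zero
# (the tail below -8 is negligibly small, so -8 stands in for minus infinity).
left_half = integrate(lambda z: normal_pdf(z, 0.0, 1.0), -8.0, 0.0)

print(round(p_weights, 4))
print(round(left_half, 6), phi(0.0))
```

The integration limit of -8 is an arbitrary cut-off for this sketch; any value far enough into the tail gives the same result to the printed precision.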
The curve is symmetric, so the total area is 1.
[Figure: standard normal curve split at z = 0.82 into areas A (from 0 to 0.82), B (everything to the left of 0.82) and C (everything to the right of 0.82)]
Example: if z = 0.82
• A = area under the curve for [0, 0.82] = 0.294
• B = total area for (-∞, 0.82] = 0.5 + 0.294 = 0.794 = Φ(0.82). This value is the probability that Z < 0.82.
• C = total area for [0.82, ∞) = 1 - B = 0.5 - A = 0.206 = 1 - Φ(0.82). This value is the probability that Z > 0.82.

Using Gaussian Tables
In some books, the standard cdf Φ(z) is tabulated, rather than the area under the pdf from 0 to z:
P(-0.44 < Z < 0.82) = Φ(0.82) - Φ(-0.44) = 0.7939 - 0.3300 = 0.4639
[Figure: standard normal curve with the area between -0.44 and 0.82 shaded]

Yeah…but what if my μ and σ are not 0 and 1 …?
It is very simple to convert a nonstandard normal pdf to a standard pdf. The procedure, which simply consists of the following substitution, is called standardization. If X has a normal distribution with mean μ and std. dev. σ (denoted as N(μ, σ)), then the variable obtained by (X - μ)/σ has a standard normal distribution N(0, 1):
$$Z = \frac{X - \mu}{\sigma}$$
The z-value obtained from this normalization is often called the z-score. It represents the number of std. dev.’s the value x is away from its mean.
Once the substitution is made, the probabilities remain identical:
• Recall our example of the class weight distribution being N(170, 20), for which we computed P(185 < X < 200) as 0.1598. You can also reach this solution by
$$z_{200} = \frac{200 - 170}{20} = 1.5, \qquad z_{185} = \frac{185 - 170}{20} = 0.75$$
Then P(185 < X < 200) = P(0.75 < Z < 1.5) = Φ(1.5) - Φ(0.75) = 0.9332 - 0.7734 = 0.1598
(200 is 1.5σ away from the mean 170)

Normal Distribution in Discrete Populations
Oftentimes we use the normal dist., a continuous dist., even for discrete outcomes. Is this kosher…?
Say we want to find out the probability of a Rowan student taking 14 credits or less. We can reasonably assume that the distribution is normal, with a mean of, say, 12.5 (average number of credits taken on campus) and a std. dev. of 2.5.
We may be tempted to standardize this by (14 - 12.5)/2.5…but wait…the Gaussian is a continuous distribution, whereas the outcomes of this experiment are always discrete (unless you are taking 12.238423 credits!). But…let’s give it a try:
• There are two ways to compute this, one more precise than the other…

Continuity Correction
[Figure: normal curve centered at 12.5; the continuity correction moves the boundary of the shaded area from 14 to 14.5]
Without correction: P(X* < 14) = P[Z < (14 - 12.5)/2.5] = P(Z < 0.6) = 0.7257 (Gaussian estimated probability)
With continuity correction: P(X* < 14.5) = P[Z < (14.5 - 12.5)/2.5] = P(Z < 0.8) = 0.7881, which is far closer to the true probability.

Normal Approximation to Binomial Dist.
We note that the previous example is in fact a binomial distribution: for a population of n students, we describe the Bernoulli experiment as the number of courses each student takes and define the r.v. X = the number of students taking 14 credits. By a similar thought process, the number of credits taken by any student can be defined as the number of successes and hence modeled as a binomial variable.
The exact value of P(X ≤ 14) was in fact computed using the binomial distribution, with the probability of success being 0.5. Of course one can argue that as we move away from the mean, p may not be 0.5. After all, the probability of a student taking 12 or 13 credits may be 0.5, but the probability of taking 18 credits would certainly not be.
In any case, however, if np > 5 (better yet, if np > 10) the binomial distribution becomes mostly symmetric and can be approximated by the normal distribution with μ = np and σ = √(npq):
$$\text{bin}(n, p) \approx N\left(np, \sqrt{npq}\right)$$

Example
At a particular small college the pass rate of Clinic Consultant is 72%. If 500 students enroll in a semester, determine the probability that at least 375 students pass. Note that there is no mention of the distribution being normal.
However, this is a binomial distribution with np = 500 × 0.72 = 360 >> 10, hence it can be approximated by a normal distribution:
$$\mu = np = 500(0.72) = 360, \qquad \sigma = \sqrt{npq} = \sqrt{500(0.72)(0.28)} \approx 10$$
$$P(X \le 375) \approx \Phi\left(\frac{375.5 - 360}{10}\right) = \Phi(1.55) = 0.9394$$
Since "at least 375" is the event X ≥ 375, the same continuity correction applied at 374.5 gives
$$P(X \ge 375) \approx 1 - \Phi\left(\frac{374.5 - 360}{10}\right) = 1 - \Phi(1.45) = 1 - 0.9265 = 0.0735$$

Probability Plots
In most disciplines of science and engineering, it is common practice to assume that the data come from a Gaussian (normal) distribution, which is in part supported by the Central Limit Theorem (we’ll see later). But is the data really Gaussian? How to tell?
There are many ways; the most rigorous ones involve certain hypothesis tests (which we will study later).
A simpler way is to use normal probability plots, which show how well the percentiles (or more accurately, the quantiles) of the observed data match the expected percentiles of truly normally distributed data.
• Quantiles are points taken at regular intervals from the cumulative distribution function of a random variable

Probability Plots in Matlab
Normplot - Normal probability plot
h = normplot(X) displays a normal probability plot of the data in X. For matrix X, normplot displays a line for each column of X. h is a handle to the plotted lines.
The plot has the sample data displayed with the plot symbol '+'. Superimposed on the plot is a line joining the first and third quartiles (25th and 75th percentiles) of each column of X (a robust linear fit of the sample order statistics). This line is extrapolated out to the ends of the sample to help evaluate the linearity of the data.
The purpose of a normal probability plot is to graphically assess whether the data in X could come from a normal distribution. If the data are normal the plot will be linear. Other distribution types will introduce curvature in the plot. normplot uses midpoint probability plotting positions. Use probplot when the data include censored observations.
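The idea behind a normal probability plot can be sketched without any plotting: sort the data, pair the i-th smallest value with the standard normal quantile at the midpoint plotting position (i − 0.5)/n, and check how close the resulting points are to a straight line. The code below is an illustration of that idea in Python, not a reimplementation of normplot; the function names and the use of a correlation coefficient as a "straightness" score are assumptions made for this example.

```python
from math import log, sqrt
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Pair each sorted observation with the standard normal quantile
    at the midpoint plotting position (i - 0.5) / n."""
    xs = sorted(data)
    n = len(xs)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return z, xs

def straightness(data):
    """Pearson correlation between the two axes of the plot; 1.0 means perfectly linear."""
    z, xs = normal_plot_points(data)
    mz, mx = mean(z), mean(xs)
    sxy = sum((a - mz) * (b - mx) for a, b in zip(z, xs))
    sxx = sum((a - mz) ** 2 for a in z)
    syy = sum((b - mx) ** 2 for b in xs)
    return sxy / sqrt(sxx * syy)

# Two idealized samples of size 100, built from exact quantiles instead of random draws.
n = 100
positions = [(i - 0.5) / n for i in range(1, n + 1)]
normal_like = [10 + 1.0 * NormalDist().inv_cdf(q) for q in positions]  # quantiles of N(10, 1)
skewed = [-log(1 - q) for q in positions]                              # quantiles of an exponential

print(round(straightness(normal_like), 4))  # points fall on a line
print(round(straightness(skewed), 4))       # curvature lowers the correlation
```

With real data the points scatter around the line even for normal samples, so the plot is judged by eye or with a formal test; the score here only makes the linear versus curved distinction concrete.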
Examples
Generate a normal sample and a normal probability plot of the data.
    x = normrnd(10,1,25,1);
    normplot(x)

Probability Plots
    x = normrnd(10,1,100,1);
    normplot(x)

    y = rand(100,1);
    normplot(y)
[Figure: two normal probability plots (Probability vs. Data, probability axis from 0.003 to 0.997), one for the normal sample x with data values between about 7 and 13, one for the uniform sample y with data values between 0 and 1]

Other Continuous (less worthy) Distributions
The Normal distribution is a symmetric distribution. Not all phenomena lend themselves to symmetric distributions. For phenomena that generate skewed data, there are non-symmetric distributions as well. These include:
• Gamma distribution
• Exponential distribution
• Chi-squared distribution (coming soon to this classroom!!!)
• Beta distribution
• Log-normal distribution
• Weibull distribution

Homework
Problems from Section 4.5: 8, 14, 20, 24
Problems from Section 4.10: 2, 4, 8
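Returning to the pass-rate example (n = 500, p = 0.72), the quality of the normal approximation can be checked against the exact binomial sum. The sketch below uses only the Python standard library; the variable names are chosen for this example, and σ is kept at its unrounded value √(npq) instead of the rounded 10 used in a hand calculation, so the approximate result differs slightly in the third decimal.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 500, 0.72
q = 1 - p
mu = n * p                 # mean of the binomial
sigma = sqrt(n * p * q)    # standard deviation of the binomial

def binom_pmf(x):
    """b(x; n, p): probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * q**(n - x)

# Exact probability that at least 375 students pass.
exact = sum(binom_pmf(x) for x in range(375, n + 1))

# Normal approximation with continuity correction: X >= 375 becomes X* > 374.5.
approx = 1 - NormalDist(mu, sigma).cdf(374.5)

print(f"mu = {mu:.1f}, sigma = {sigma:.3f}")
print(f"exact binomial  P(X >= 375) = {exact:.4f}")
print(f"normal approx.  P(X >= 375) = {approx:.4f}")
```

Because np and nq are both far above 10 here, the two values should agree closely; repeating the comparison with a small n or with p near 0 or 1 shows where the approximation starts to break down.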