Discrete Math, CS 2800
Prof. Bart Selman, selman@cs.cornell.edu

Module: Probability --- Part d)
1) Probability Distributions
2) Markov and Chebyshev Bounds

Discrete random variable

A discrete random variable takes on one of a finite (or at least countable) number of different values.
– X = 1 if heads, 0 if tails
– Y = 1 if male, 0 if female (phone survey)
– Z = # of spots on the face of a thrown die

Continuous random variable

A continuous random variable (r.v.) takes on one value in an infinite (uncountable) range of different values.
– W = % GDP grows (shrinks?) this year
– V = hours until a light bulb fails

For a discrete r.v., we have P(X = x), i.e., the probability that the r.v. X takes on a given value x. What is the probability that a continuous r.v. takes on a specific value? E.g.,

  P(X_light_bulb_fails = 3.14159265 hrs) = 0.

However, ranges of values can have non-zero probability. E.g.,

  P(3 hrs ≤ X_light_bulb_fails ≤ 4 hrs) = 0.1.

Probability distribution

The probability distribution is a complete probabilistic description of a random variable. All other statistical concepts (expectation, variance, etc.) are derived from it. Once we know the probability distribution of a random variable, we know everything we can learn about it from statistics.

The probability function

One form in which the probability distribution of a discrete random variable may be expressed. It expresses the probability that X takes the value x as a function of x (as we saw before):

  P_X(x) = P(X = x)

The probability function may be tabular:

  X = 1  w.p. 1/2
      2  w.p. 1/3
      3  w.p. 1/6

It may be graphical. (Bar chart: bars of height .50, .33, .17 at x = 1, 2, 3.)

It may be formulaic:

  P_X(x) = (4 − x) / 6   for x = 1, 2, 3.

Probability distribution: fair die

  X = x  w.p. 1/6,  for x = 1, 2, …, 6.

(Bar chart: six bars of equal height 1/6 at x = 1, …, 6.)

Properties of the probability function:

  P_X(x) ≥ 0 for each x
  Σ_x P_X(x) = 1

Cumulative probability distribution

The cumulative distribution function (cdf) describes the probability that a random variable does not exceed a value:

  F_X(x) = P(X ≤ x)

Does this make sense for a continuous r.v.? Yes!

The relationship between the cdf and the probability function:

  F_X(x) = P(X ≤ x) = Σ_{y ≤ x} P_X(y)

Die-throwing, where P_X(x) = P(X = x) = 1/6. In tabular form:

  F_X(x) = 0     for x < 1
           1/6   for 1 ≤ x < 2
           2/6   for 2 ≤ x < 3
           3/6   for 3 ≤ x < 4
           4/6   for 4 ≤ x < 5
           5/6   for 5 ≤ x < 6
           6/6   for x ≥ 6

(Graphically: a staircase with steps of height 1/6 at x = 1, …, 6.)

The cdf may also be formulaic (die-throwing):

  F_X(x) = P(X ≤ x) = floor(min(max(x, 0), 6)) / 6

Properties of the cdf:

  0 ≤ F_X(x) ≤ 1 for each x
  F_X(x) is non-decreasing
  F_X(x) is continuous from the right

Example cdfs: of a discrete probability distribution; of a continuous probability distribution; of a distribution which has both a continuous part and a discrete part.

Functions of a random variable

It is possible to calculate expectations and variances of functions of random variables:

  E[g(X)] = Σ_x g(x) P(X = x)
  V[g(X)] = Σ_x (g(x) − E[g(X)])² P(X = x)

Example: you are paid a number of dollars equal to the square root of the number of spots on a thrown die. What is a fair bet to get into this game? (A quick numeric check follows; the worked table comes after it.)
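As a quick check, here is a minimal Python sketch (ours, not from the slides) that computes E[√X] directly from the definition E[g(X)] = Σ_x g(x)·P(X = x), and compares it against a simulation:

```python
import random
from math import sqrt

# Exact expectation: E[g(X)] = sum over x of g(x) * P(X = x),
# with g(x) = sqrt(x) and P(X = x) = 1/6 for a fair die.
exact = sum(sqrt(x) * (1 / 6) for x in range(1, 7))

# Monte Carlo check: average payout over many simulated die rolls.
n = 1_000_000
simulated = sum(sqrt(random.randint(1, 6)) for _ in range(n)) / n

print(f"exact E[sqrt(X)]     = {exact:.4f}")   # ~1.805
print(f"simulated E[sqrt(X)] = {simulated:.4f}")
```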
Tabulating the same calculation:

  x    √x      P(X = x)   √x · P(X = x)
  1    1.000   1/6        0.167
  2    1.414   1/6        0.236
  3    1.732   1/6        0.289
  4    2.000   1/6        0.333
  5    2.236   1/6        0.373
  6    2.449   1/6        0.408
                Total  ≈  1.805

So the expected payout, about $1.80, is a fair price to pay to play.

Functions of a random variable: linear functions

If a and b are constants and X is a random variable, it can be shown that:

  E[aX + b] = a E[X] + b
  V[aX + b] = a² V[X]

Intuitively, why does b not appear in the variance? And why a²? (Adding b shifts the whole distribution without changing its spread; scaling by a scales every deviation from the mean by a, and variance squares those deviations.)

The most common discrete probability distributions (some discussed before):
1) Bernoulli distribution
2) Binomial
3) Geometric
4) Poisson

Bernoulli distribution

The Bernoulli distribution is the "coin flip" distribution. X is Bernoulli if its probability function is:

  X = 1  w.p. p
      0  w.p. 1 − p

X = 1 is usually interpreted as a "success." E.g.:
– X = 1 for heads in a coin toss
– X = 1 for male in a survey
– X = 1 for defective in a test of product
– X = 1 for "made the sale" in tracking performance

Expectation:

  E[X] = p · 1 + (1 − p) · 0 = p

Variance:

  V[X] = E[X²] − (E[X])²
       = p · 1² + (1 − p) · 0² − p²
       = p − p² = p(1 − p)

Binomial distribution

The binomial distribution is just n independent Bernoullis added up. It is the number of "successes" in n trials. If Z1, Z2, …, Zn are independent Bernoulli, then X is binomial:

  X = Z1 + Z2 + … + Zn

Example: testing for defects "with replacement."
– Have many light bulbs.
– Pick one at random, test for defect, put it back.
– Pick one at random, test for defect, put it back.
– If there are many light bulbs, you do not actually have to replace.

Let's figure out a binomial r.v.'s probability function. Suppose we are looking at a binomial with n = 3, writing 1 for success and 0 for failure.

We want P(X = 0):
– Can happen one way: 000
– (1 − p)(1 − p)(1 − p) = (1 − p)³

We want P(X = 1):
– Can happen three ways: 100, 010, 001
– p(1 − p)(1 − p) + (1 − p)p(1 − p) + (1 − p)(1 − p)p = 3p(1 − p)²

We want P(X = 2):
– Can happen three ways: 110, 011, 101
– pp(1 − p) + (1 − p)pp + p(1 − p)p = 3p²(1 − p)

We want P(X = 3):
– Can happen one way: 111
– ppp = p³

So, the binomial r.v.'s probability function for n = 3:

  X = 0  w.p. (1 − p)³
      1  w.p. 3p(1 − p)²
      2  w.p. 3p²(1 − p)
      3  w.p. p³

In general:

  P_X(x) = (# of ways) · p^x (1 − p)^(n−x)
         = C(n, x) · p^x (1 − p)^(n−x)
         = n! / (x!(n − x)!) · p^x (1 − p)^(n−x)

Typical shape of the binomial: roughly symmetric, bell-shaped.

Expectation:

  E[X] = E[Σ_{i=1}^n Zi] = Σ_{i=1}^n E[Zi] = np

Variance (using independence of the Zi):

  V[X] = V[Σ_{i=1}^n Zi] = Σ_{i=1}^n V[Zi] = Σ_{i=1}^n p(1 − p) = np(1 − p)

Aside: V[X + Y] = V(X) + V(Y) = 2V(X) if V(X) = V(Y). And? But V[X + X] = V(2X) = 4V(X). Hmm… (The additivity of variance requires X and Y to be independent, and X is certainly not independent of itself.)

Binomial example: a salesman claims that he closes a deal 40% of the time. This month, he closed 1 out of 10 deals. How likely is it that he did 1/10 or worse, given his claim?

  P(X = 0) + P(X = 1)
    = 10!/(0!10!) · 0.4⁰ · 0.6¹⁰ + 10!/(1!9!) · 0.4¹ · 0.6⁹
    ≈ 0.006 + 0.040 = 0.046

Note: P(X ≤ 1) ≈ 0.046, less than 5%, or 1 in 20. So it's unlikely that his success rate is really 0.4. For comparison, the most likely outcome under his claim:

  P(X = 4) = 10!/(4!6!) · 0.4⁴ · 0.6⁶ = 210 · 0.0256 · 0.0467 ≈ 0.251

Binomial and normal / Gaussian distribution

The normal distribution is a good approximation to the binomial distribution B(n, p) for "large" n and small skew. (Plot: binomial probability function overlaid with the normal probability density function.)
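A short sketch (ours, not the slides') reproduces the salesman calculation; binom_pmf is just an illustrative helper name:

```python
from math import comb

def binom_pmf(n: int, k: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.4  # 10 deals, claimed 40% close rate

# Probability of doing this badly (1 closed deal or fewer) if the claim holds.
p_at_most_1 = binom_pmf(n, 0, p) + binom_pmf(n, 1, p)
print(f"P(X <= 1) = {p_at_most_1:.3f}")   # ~0.046

# For contrast, the most likely single outcome under the claim.
print(f"P(X = 4)  = {binom_pmf(n, 4, p):.3f}")  # ~0.251
```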
Geometric distribution

A geometric distribution is usually interpreted as the number of time periods until a failure occurs. Imagine a sequence of coin flips, and let the random variable X be the flip number on which the first tails occurs. The probability of a head (a success) is p.

Let's find the probability function for the geometric distribution:

  P(X = 1) = 1 − p
  P(X = 2) = p(1 − p)
  P(X = 3) = p · p · (1 − p) = p²(1 − p)
  etc.

So, in general:

  P(X = x) = p^(x−1) (1 − p)   (x is a positive integer)

Notice there is no upper limit on how large X can be.

Let's check that these probabilities add to 1:

  Σ_{x=1}^∞ P(X = x) = Σ_{x=1}^∞ p^(x−1)(1 − p)
                     = (1 − p) Σ_{x=0}^∞ p^x
                     = (1 − p) · 1/(1 − p) = 1,

using the geometric series Σ_{x=0}^∞ p^x = 1/(1 − p) for |p| < 1.

Expectation. Differentiate both sides of the geometric series with respect to p:

  Σ_{x=1}^∞ x p^(x−1) = 1/(1 − p)²

(see Rosen, page 158, Example 17). Then:

  E[X] = Σ_{x=1}^∞ x P(X = x) = (1 − p) Σ_{x=1}^∞ x p^(x−1)
       = (1 − p) · 1/(1 − p)² = 1/(1 − p)

Variance:

  V[X] = p / (1 − p)²

Poisson distribution

The Poisson distribution is typical of random variables which represent counts:
– Number of requests to a server in 1 hour.
– Number of sick days in a year for an employee.

The Poisson distribution is derived from the following underlying arrival-time model:
– The probability of a unit arriving is uniform through time.
– Two units never arrive at exactly the same time.
– Arrivals are independent --- the arrival of one unit does not make the next unit more or less likely to arrive quickly.

The probability function for the Poisson distribution with parameter λ is:

  P(X = x) = e^(−λ) λ^x / x!   for x = 0, 1, 2, 3, …

λ is like the arrival rate --- higher λ means more/faster arrivals. Also:

  E[X] = V[X] = λ

Shape: (plots for low, medium, and high λ; as λ grows, the mass shifts right and spreads out).

Markov and Chebyshev bounds

Often, you don't know the exact probability distribution of a random variable. We would still like to say something about the probabilities involving that random variable --- e.g., the probability of X being larger (or smaller) than some given value. We often can, by bounding the probability of events based on partial information about the underlying probability distribution: the Markov and Chebyshev bounds. Note: they relate the cumulative distribution to the expected value.

Theorem (Markov inequality). Let X be a nonnegative random variable with E[X] = μ. Then, for any t > 0,

  P(X ≥ t) ≤ μ / t.

Hmm. What if t ≤ μ? Then μ/t ≥ 1, and the bound is trivially true; e.g., t = μ/2 gives P(X ≥ μ/2) ≤ 2. Sure! The theorem has content only for t > μ: "can't have too much prob. to the right of E[X]."

Proof:

  t · P(X ≥ t) = t · Σ_{x ≥ t} P(X = x)
               ≤ Σ_{x ≥ t} x P(X = x)
               ≤ Σ_x x P(X = x)
               = E[X]

I.e., P(X ≥ t) ≤ E[X]/t. Where did we use X ≥ 0? In the 3rd line: extending the sum to all x can only increase it because every term x · P(X = x) is nonnegative.

Alternative proof of the Markov inequality. Define the discrete random variable

  Y = 0  if X < t    (w.p. 1 − P(X ≥ t))
      t  if X ≥ t    (w.p. P(X ≥ t))

Then Y ≤ X, so E[Y] ≤ E[X]. But

  E[Y] = 0 · (1 − P(X ≥ t)) + t · P(X ≥ t) = t · P(X ≥ t),

so again P(X ≥ t) ≤ E[X]/t.

Example: consider a system with mean time to failure 100 hours. Use the Markov inequality to bound the reliability of the system, R(t) = P(X ≥ t), for t = 90, 100, 110, 200. Here X is the time to failure of the system, with E[X] = 100. By Markov:

  P(X ≥ 90)  ≤ 100/90  ≈ 1.11  (vacuous)
  P(X ≥ 100) ≤ 100/100 = 1     (vacuous)
  P(X ≥ 110) ≤ 100/110 ≈ 0.91
  P(X ≥ 200) ≤ 100/200 = 0.5

The Markov inequality is somewhat crude, since only the mean is assumed to be known.
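To see how crude Markov can be, here is a small sketch comparing the bound with the exact reliability under one concrete lifetime model. The exponential distribution with mean 100 is our illustrative assumption; the slides do not specify a distribution:

```python
import math

MEAN = 100.0  # mean time to failure, in hours

def markov_bound(t: float) -> float:
    """Markov: P(X >= t) <= E[X]/t, capped at 1 since it is a probability."""
    return min(1.0, MEAN / t)

def exponential_reliability(t: float) -> float:
    """Exact P(X >= t) if lifetimes were exponential with mean 100."""
    return math.exp(-t / MEAN)

for t in (90, 100, 110, 200):
    print(f"t={t:>3}: Markov bound {markov_bound(t):.2f}, "
          f"exact (exponential model) {exponential_reliability(t):.2f}")
```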
Theorem (Chebyshev's inequality). Assume that the mean μ_X = E[X] and the variance σ_X² = V[X] are given. Then we get a better estimate of the probability of events of interest:

  P(|X − μ_X| ≥ t) ≤ σ_X² / t².

Proof: apply the Markov inequality to the nonnegative r.v. (X − μ_X)² and the number t²:

  P(|X − μ_X| ≥ t) = P((X − μ_X)² ≥ t²) ≤ E[(X − μ_X)²] / t² = σ_X² / t².

Chebyshev's inequality: alternate forms

Setting t = kσ_X gives yet two other forms of Chebyshev's inequality. They say something about the probability of being "k standard deviations from the mean":

  P(|X − μ_X| ≥ kσ_X) ≤ 1/k²
  P(|X − μ_X| < kσ_X) ≥ 1 − 1/k²

For example:

  k = 2:  P(|X − μ_X| < 2σ_X) ≥ 0.75
  k = 3:  P(|X − μ_X| < 3σ_X) ≥ 0.889
  k = 4:  P(|X − μ_X| < 4σ_X) ≥ 0.938

Facts, for comparison: if X ~ N(μ, σ²), the actual probabilities of falling within 2 and 3 standard deviations of the mean are about 0.95 and 0.997. Chebyshev's bounds are much weaker because they hold for every distribution with finite variance.

Example. Suppose X has mean μ_X = 60 and standard deviation σ_X = 6.

Aside, with "just" Markov, P(X ≥ t) ≤ E[X]/t:

  P(X ≥ 84) ≤ 60/84 ≈ 0.71.

With Chebyshev instead: X ≥ 84 means X − μ_X ≥ 24 = 4σ_X, so

  P(X ≥ 84) ≤ P(|X − μ_X| ≥ 4σ_X) ≤ 1/4² = 1/16 ≈ 0.06.

So out of, say, 1000 independent trials, Markov only tells us to expect at most about 714 outcomes of 84 or more, while Chebyshev guarantees at most 62.5.
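Finally, a sketch that checks both bounds against one concrete nonnegative distribution with this mean and standard deviation. The choice Gamma(shape=100, scale=0.6), which has mean 60 and standard deviation 6, is our illustrative assumption:

```python
import random

MU, SIGMA, T = 60.0, 6.0, 84.0

# Bounds valid for ANY nonnegative X with this mean and standard deviation.
markov = MU / T                        # P(X >= 84) <= 60/84 ~ 0.71
chebyshev = (SIGMA / (T - MU)) ** 2    # P(X >= 84) <= (6/24)^2 = 1/16

# Monte Carlo check on one concrete distribution: Gamma(100, 0.6),
# whose mean is 100 * 0.6 = 60 and variance 100 * 0.6^2 = 36.
random.seed(0)
n = 200_000
hits = sum(random.gammavariate(100, 0.6) >= T for _ in range(n))

print(f"Markov bound:    {markov:.3f}")
print(f"Chebyshev bound: {chebyshev:.3f}")
print(f"empirical P(X >= 84) for the Gamma example: {hits / n:.5f}")
```

Both bounds are valid, but for this particular distribution the true tail probability is far smaller than either --- the price of assuming only a mean, or only a mean and a variance.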