Basics on Probability
Jingrui He
09/11/2007

Coin Flips
- You flip a coin: head with probability 0.5.
- You flip 100 coins: how many heads would you expect?

Coin Flips cont.
- You flip a coin: head with probability p.
  - Binary random variable; a Bernoulli trial with success probability p.
- You flip k coins: how many heads would you expect?
  - Number of heads X: a discrete random variable following a Binomial distribution with parameters k and p.

Discrete Random Variables
- Random variables (RVs) which may take on only a countable number of distinct values.
  - E.g. the total number of heads X you get if you flip 100 coins.
- X is an RV with arity k if it can take on exactly one value out of x_1, \ldots, x_k.
  - E.g. the possible values that X can take on are 0, 1, 2, \ldots, 100.

Probability of Discrete RV
- Probability mass function (pmf): P(X = x_i).
- Easy facts about the pmf:
  - \sum_i P(X = x_i) = 1
  - P(X = x_i \land X = x_j) = 0 if i \neq j
  - P(X = x_i \lor X = x_j) = P(X = x_i) + P(X = x_j) if i \neq j
  - P(X = x_1 \lor X = x_2 \lor \cdots \lor X = x_k) = 1

Common Distributions
- Uniform: X \sim U\{1, \ldots, N\}; X takes values 1, 2, \ldots, N, with P(X = i) = 1/N.
  - E.g. picking balls of different colors from a box.
- Binomial: X \sim \mathrm{Bin}(n, p); X takes values 0, 1, \ldots, n, with P(X = i) = \binom{n}{i} p^i (1 - p)^{n - i}.
  - E.g. coin flips.

Coin Flips of Two Persons
- Your friend and you both flip coins: head with probability 0.5.
- You flip 50 times; your friend flips 100 times.
- How many heads will each of you get?

Joint Distribution
- Given two discrete RVs X and Y, their joint distribution is the distribution of X and Y together.
  - E.g. P(you get 21 heads AND your friend gets 70 heads).
- \sum_x \sum_y P(X = x, Y = y) = 1
  - E.g. \sum_{i=0}^{50} \sum_{j=0}^{100} P(\text{you get } i \text{ heads AND your friend gets } j \text{ heads}) = 1

Conditional Probability
- P(X = x \mid Y = y) is the probability of X = x, given the occurrence of Y = y.
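The joint-distribution facts above can be made concrete with a short sketch (Python, standard library only; the 50/100 flip counts come from the two-person coin-flip example, and independence of the two flippers is assumed):

```python
from math import comb

def binom_pmf(n, p, i):
    # P(X = i) for X ~ Bin(n, p)
    return comb(n, i) * p**i * (1 - p)**(n - i)

# You flip 50 coins, your friend flips 100; the flips are independent,
# so the joint pmf factorizes: P(X = i, Y = j) = P(X = i) * P(Y = j).
joint = {(i, j): binom_pmf(50, 0.5, i) * binom_pmf(100, 0.5, j)
         for i in range(51) for j in range(101)}

# The joint pmf sums to 1 over all outcomes (up to floating-point error).
total = sum(joint.values())
print(round(total, 10))

# E.g. P(you get 21 heads AND your friend gets 70 heads):
print(joint[(21, 70)])
```

The dictionary-of-pairs representation is only for illustration; any tabular layout of P(X = i, Y = j) works the same way.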
- E.g. the probability that you get 0 heads, given that your friend gets 61 heads.
- P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}

Law of Total Probability
- Given two discrete RVs X and Y, which take values in \{x_1, \ldots, x_m\} and \{y_1, \ldots, y_n\}, we have
  - P(X = x_i) = \sum_j P(X = x_i, Y = y_j) = \sum_j P(X = x_i \mid Y = y_j) P(Y = y_j)

Marginalization
- Marginal probability from the joint probability: P(X = x_i) = \sum_j P(X = x_i, Y = y_j)
- Marginal probability from the conditional probability: P(X = x_i) = \sum_j P(X = x_i \mid Y = y_j) P(Y = y_j)

Bayes Rule
- X and Y are discrete RVs:
  - P(X = x_i \mid Y = y_j) = \frac{P(Y = y_j \mid X = x_i) P(X = x_i)}{P(Y = y_j)}
  - P(X = x_i \mid Y = y_j) = \frac{P(Y = y_j \mid X = x_i) P(X = x_i)}{\sum_k P(Y = y_j \mid X = x_k) P(X = x_k)}

Independent RVs
- Intuition: X and Y are independent means that X = x neither makes it more nor less probable that Y = y.
- Definition: X and Y are independent iff P(X = x, Y = y) = P(X = x) P(Y = y).

More on Independence
- P(X = x, Y = y) = P(X = x) P(Y = y) implies
  - P(X = x \mid Y = y) = P(X = x)
  - P(Y = y \mid X = x) = P(Y = y)
- E.g. no matter how many heads you get, your friend will not be affected, and vice versa.

Conditionally Independent RVs
- Intuition: X and Y are conditionally independent given Z means that once Z is known, the value of X does not add any additional information about Y.
- Definition: X and Y are conditionally independent given Z iff P(X = x, Y = y \mid Z = z) = P(X = x \mid Z = z) P(Y = y \mid Z = z).

More on Conditional Independence
- P(X = x, Y = y \mid Z = z) = P(X = x \mid Z = z) P(Y = y \mid Z = z) implies
  - P(X = x \mid Y = y, Z = z) = P(X = x \mid Z = z)
  - P(Y = y \mid X = x, Z = z) = P(Y = y \mid Z = z)

Monty Hall Problem
- You're given the choice of three doors: behind one door is a car; behind the others, goats.
- You pick a door, say No. 1.
- The host, who knows what's behind the doors, opens another door, say No. 3, which has a goat.
- Do you want to pick door No. 2 instead?
- (Diagram: if you picked the car, the host reveals Goat A or Goat B; if you picked Goat A, the host must reveal Goat B; if you picked Goat B, the host must reveal Goat A.)

Monty Hall Problem: Bayes Rule
- C_i: the car is behind door i, i = 1, 2, 3; P(C_i) = 1/3.
- H_{ij}: the host opens door j after you pick door i.
- P(H_{ij} \mid C_k) =
  - 0 if i = j (the host never opens your door),
  - 0 if j = k (the host never reveals the car),
  - 1/2 if i = k and j \neq k,
  - 1 if i \neq k and j \neq k.

Monty Hall Problem: Bayes Rule cont.
- WLOG, let i = 1, j = 3:
  - P(C_1 \mid H_{13}) = \frac{P(H_{13} \mid C_1) P(C_1)}{P(H_{13})} = \frac{\frac{1}{2} \cdot \frac{1}{3}}{P(H_{13})} = \frac{1/6}{P(H_{13})}
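The switching claim can also be sanity-checked by brute-force simulation, independently of the Bayes-rule calculation. A minimal Python sketch, assuming a uniformly random car placement and first pick; the trial count and seed are arbitrary choices:

```python
import random

def monty_trial(switch, rng):
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)
n = 100_000
switch_wins = sum(monty_trial(True, rng) for _ in range(n)) / n
stay_wins = sum(monty_trial(False, rng) for _ in range(n)) / n
print(switch_wins, stay_wins)  # switching wins about 2/3 of the time
```

The simulation agrees with the posterior probabilities derived below: roughly 2/3 for switching versus 1/3 for staying.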
Monty Hall Problem: Bayes Rule cont.
- P(H_{13}) = P(H_{13}, C_1) + P(H_{13}, C_2) + P(H_{13}, C_3)
  = P(H_{13} \mid C_1) P(C_1) + P(H_{13} \mid C_2) P(C_2) + P(H_{13} \mid C_3) P(C_3)
  = \frac{1}{6} + \frac{1}{3} + 0 = \frac{1}{2}
- P(C_1 \mid H_{13}) = \frac{1/6}{1/2} = \frac{1}{3}

Monty Hall Problem: Bayes Rule cont.
- P(C_1 \mid H_{13}) = \frac{1/6}{1/2} = \frac{1}{3}
- P(C_2 \mid H_{13}) = 1 - P(C_1 \mid H_{13}) = \frac{2}{3} (since P(C_3 \mid H_{13}) = 0)
- You should switch!

Continuous Random Variables
- What if X is continuous?
- Probability density function (pdf) instead of probability mass function (pmf).
- A pdf is any function f(x) that describes the probability density in terms of the input variable x.

PDF
- Properties of a pdf:
  - f(x) \geq 0, \forall x
  - \int_{-\infty}^{\infty} f(x)\, dx = 1
  - f(x) \leq 1? No: a density value can exceed 1, since only the integral is constrained.
- Actual probability is obtained by taking the integral of the pdf.
  - E.g. the probability of X being between 0 and 1 is P(0 \leq X \leq 1) = \int_0^1 f(x)\, dx.

Cumulative Distribution Function
- F_X(v) = P(X \leq v)
- Discrete RVs: F_X(v) = \sum_{v_i \leq v} P(X = v_i)
- Continuous RVs: F_X(v) = \int_{-\infty}^{v} f(x)\, dx, and \frac{d}{dx} F_X(x) = f(x)

Common Distributions
- Normal: X \sim N(\mu, \sigma^2)
  - f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right), \quad x \in (-\infty, \infty)
  - E.g. the height of the entire population.
- [Figure: pdf of the standard normal distribution, plotted for x from -5 to 5.]

Common Distributions cont.
- Beta: X \sim \mathrm{Beta}(\alpha, \beta)
  - f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad x \in [0, 1]
  - \alpha = \beta = 1: uniform distribution between 0 and 1.
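One point above is worth making concrete: a pdf value may exceed 1, since only the integral must equal 1. A small Python sketch using the normal density (the narrow sigma = 0.1 and the Riemann-sum grid are my choices, not from the slides):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Density of N(mu, sigma^2) at x.
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

# With sigma = 0.1, the density at the mean is about 3.99 -- well above 1.
print(normal_pdf(0.0, sigma=0.1))

# Yet the density still integrates to (approximately) 1: Riemann sum over [-5, 5).
dx = 0.001
total = sum(normal_pdf(-5 + k * dx, sigma=0.1) * dx for k in range(10_000))
print(round(total, 4))  # ≈ 1.0
```

This is the key difference from a pmf: individual pmf values are probabilities and cannot exceed 1, while pdf values are densities and can.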
- E.g. the conjugate prior for the parameter p in the Binomial distribution.
- [Figure: pdf of a Beta distribution, plotted for x from 0 to 1.]

Joint Distribution
- Given two continuous RVs X and Y, the joint pdf can be written as f_{X,Y}(x, y), with
  - \int_x \int_y f_{X,Y}(x, y)\, dx\, dy = 1

Multivariate Normal
- Generalization to higher dimensions of the one-dimensional normal.
- f_X(x_1, \ldots, x_d) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)
  - \mu: mean vector; \Sigma: covariance matrix.

Moments
- Mean (expectation): \mu = E[X]
  - Discrete RVs: E[X] = \sum_{v_i} v_i P(X = v_i)
  - Continuous RVs: E[X] = \int_{-\infty}^{\infty} x f(x)\, dx
- Variance: V(X) = E[(X - \mu)^2]
  - Discrete RVs: V(X) = \sum_{v_i} (v_i - \mu)^2 P(X = v_i)
  - Continuous RVs: V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx

Properties of Moments
- Mean:
  - E[X + Y] = E[X] + E[Y]
  - E[aX] = a E[X]
  - If X and Y are independent, E[XY] = E[X] E[Y].
- Variance:
  - V(aX + b) = a^2 V(X)
  - If X and Y are independent, V(X + Y) = V(X) + V(Y).

Moments of Common Distributions
- Uniform X \sim U\{1, \ldots, N\}: mean \frac{1 + N}{2}; variance \frac{N^2 - 1}{12}
- Binomial X \sim \mathrm{Bin}(n, p): mean np; variance np(1 - p)
- Normal X \sim N(\mu, \sigma^2): mean \mu; variance \sigma^2
- Beta X \sim \mathrm{Beta}(\alpha, \beta): mean \frac{\alpha}{\alpha + \beta}; variance \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}

Probability of Events
- X denotes an event that could possibly happen.
- P(X) denotes the likelihood that X happens, i.e. that X = true.
  - E.g. X = "you will fail in this course"; what's the probability that you will fail in this course?
- \Omega denotes the entire event set: \Omega = \{X, \neg X\}.

The Axioms of Probability
- 0 \leq P(X) \leq 1
- P(\Omega) = 1
- P(X_1 \lor X_2 \lor \cdots) = \sum_i P(X_i), where the X_i are disjoint events.
- Useful rules:
  - P(X_1 \lor X_2) = P(X_1) + P(X_2) - P(X_1 \land X_2)
  - P(\neg X) = 1 - P(X)

Interpreting the Axioms
- [Venn diagram of events X_1 and X_2.]
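The moment formulas above can be checked by computing E[X] and V(X) directly from the discrete definitions. A Python sketch (the particular parameter values n = 100, p = 0.5, N = 10 are arbitrary choices):

```python
from math import comb

def binom_pmf(n, p, i):
    # P(X = i) for X ~ Bin(n, p)
    return comb(n, i) * p**i * (1 - p)**(n - i)

# Binomial: mean should be n*p = 50, variance n*p*(1-p) = 25.
n, p = 100, 0.5
mean = sum(i * binom_pmf(n, p, i) for i in range(n + 1))
var = sum((i - mean)**2 * binom_pmf(n, p, i) for i in range(n + 1))
print(mean, var)

# Uniform on {1, ..., N}: mean (1+N)/2 = 5.5, variance (N^2-1)/12 = 8.25.
N = 10
u_mean = sum(i * (1 / N) for i in range(1, N + 1))
u_var = sum((i - u_mean)**2 * (1 / N) for i in range(1, N + 1))
print(u_mean, u_var)
```

Both computed values match the closed-form expressions up to floating-point error, which is a useful sanity check whenever a new distribution's moments are quoted.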