Machine Learning
Probability Review for Discrete Random Variables
Fabio Vandin
October 1st, 2021

Probability Refresher

Definition
A probability space has three components:
1. A sample space $Z$, which is the set of all possible outcomes of the random process modeled by the probability space;
2. A family $\mathcal{F}$ of sets representing the allowable events, where each event $A$ is a subset of $Z$: $A \subseteq Z$ (Must be a $\sigma$-field...);
3. A probability distribution $\mathcal{D} : \mathcal{F} \to [0, 1]$ that satisfies the following conditions:
   1. $\mathcal{D}[Z] = 1$;
   2. Let $E_1, E_2, E_3, \dots$ be any finite or countably infinite sequence of pairwise mutually disjoint events ($E_i \cap E_j = \emptyset$ for all $i \neq j$); then
      $$\mathcal{D}\left[\bigcup_{i \geq 1} E_i\right] = \sum_{i \geq 1} \mathcal{D}[E_i].$$

Distributions and Probability

We use $z \sim \mathcal{D}$ to say that outcome $z \in Z$ is sampled according to $\mathcal{D}$.

Given a function $f : Z \to \{\text{true}, \text{false}\}$, define the probability of $f(z)$ as
$$\mathbb{P}_{z \sim \mathcal{D}}[f(z)] = \mathcal{D}(\{z \in Z : f(z) = \text{true}\})$$

In many cases, we express an event $A \subseteq Z$ using a function $\pi_A : Z \to \{\text{true}, \text{false}\}$, that is:
$$A = \{z \in Z : \pi_A(z) = \text{true}\}$$
where $\pi_A(z) = \text{true}$ if $z \in A$ and $\pi_A(z) = \text{false}$ otherwise.

In this case we have
$$\mathbb{P}[A] = \mathbb{P}_{z \sim \mathcal{D}}[\pi_A(z)] = \mathcal{D}(A)$$

Note: sometimes we use $\pi : Z \to \{0, 1\}$ instead of $\pi : Z \to \{\text{true}, \text{false}\}$.
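The definitions above can be made concrete on a small finite sample space. The following is a minimal sketch, not part of the original slides: the fair six-sided die, the dictionary `weights`, and the helper names `D`, `prob` and `pi_A` are all assumptions chosen for illustration. It computes the probability of an event from outcome weights, evaluates the probability of a predicate, and checks the two conditions on the distribution for one particular pair of disjoint events.

```python
from fractions import Fraction

# Hypothetical example (assumption, not from the slides): a fair six-sided die.
Z = [1, 2, 3, 4, 5, 6]                       # sample space
weights = {z: Fraction(1, 6) for z in Z}     # probability of each single outcome


def D(event):
    """Probability of an event, i.e. of a collection of outcomes from Z."""
    return sum((weights[z] for z in event), Fraction(0))


def prob(f):
    """Probability of a predicate f: D({z in Z : f(z) is true})."""
    return D({z for z in Z if f(z)})


def pi_A(z):
    """Indicator function of the event A = 'the outcome is even'."""
    return z % 2 == 0


A = {z for z in Z if pi_A(z)}    # the event expressed as a set
p_A = prob(pi_A)                 # the same event expressed as a predicate

total = D(Z)                     # condition 1: the whole sample space has probability 1

# Condition 2, checked on one pair of disjoint events only.
E1, E2 = {1, 2}, {5}
additive = D(E1 | E2) == D(E1) + D(E2)
```

Here `D(A)` and `prob(pi_A)` agree, which mirrors the identity between the probability of an event and the probability of its indicator. Exact fractions are used instead of floats so that the equality checks are not affected by rounding.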
Independent Events

Definition
Two events $E$ and $F$ are independent ($E \perp F$) if and only if
$$\mathbb{P}[E \cap F] = \mathbb{P}[E] \cdot \mathbb{P}[F]$$
More generally, events $E_1, E_2, \dots, E_k$ are mutually independent if and only if for any subset $I \subseteq [1, k]$,
$$\mathbb{P}\left[\bigcap_{i \in I} E_i\right] = \prod_{i \in I} \mathbb{P}[E_i].$$

Random Variable (R.V.)

Definition
A (scalar) random variable $X(z)$ on a sample space $Z$ is a real-valued function on $Z$; that is, $X : z \in Z \to \mathbb{R}$.

Discrete random variable: codomain is finite or countably infinite. Example: $\mathbb{Z}$, $\mathbb{N}$, $\{0, 1\}$, ...
Continuous random variable: codomain is continuous. Example: $\mathbb{R}$, $[a, b]$, ...

Description of R.V.

Discrete:
• $p_X(x) = \mathbb{P}[X = x]$ [Probability Mass Function - PMF]
• $F_X(x) = \mathbb{P}[X \leq x] = \sum_{k \leq x} p_X(k)$ [Cumulative Distribution Function - CDF]

Example: coin flipping

Vector Valued R.V.

Example
$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$, with $X_1, X_2$ discrete:
$$p_X(x) = p_{X_1, X_2}(x_1, x_2) = \mathbb{P}_X[X_1 = x_1, X_2 = x_2]$$
[“$X_1 = x_1, X_2 = x_2$” are joint events]

Note: If $X$ is obvious, we may write $\mathbb{P}[X_1 = x_1, X_2 = x_2]$ instead of $\mathbb{P}_X[X_1 = x_1, X_2 = x_2]$.
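The PMF, CDF, joint PMF and the independence condition can all be computed by enumeration when the sample space is small. The sketch below is an illustration only, and assumes a setting not given in the slides: two fair coin flips with 1 for heads and 0 for tails. The names `P`, `X1`, `X2`, `S`, `pmf`, `cdf` and `joint_pmf` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example (assumption): two fair coin flips, 1 = heads, 0 = tails.
Z = list(product([0, 1], repeat=2))          # outcomes z = (z1, z2)
weights = {z: Fraction(1, 4) for z in Z}     # uniform distribution over Z


def P(f):
    """Probability that the predicate f holds for an outcome drawn from Z."""
    return sum((weights[z] for z in Z if f(z)), Fraction(0))


# Random variables are real-valued functions on the sample space.
def X1(z):
    return z[0]              # result of the first flip


def X2(z):
    return z[1]              # result of the second flip


def S(z):
    return z[0] + z[1]       # number of heads


def pmf(X, x):
    """p_X(x) = P[X = x]."""
    return P(lambda z: X(z) == x)


def cdf(X, x):
    """F_X(x) = P[X <= x]."""
    return P(lambda z: X(z) <= x)


def joint_pmf(x1, x2):
    """p_{X1,X2}(x1, x2) = P[X1 = x1, X2 = x2]."""
    return P(lambda z: X1(z) == x1 and X2(z) == x2)


pmf_S = {x: pmf(S, x) for x in (0, 1, 2)}
cdf_S_1 = cdf(S, 1)

# The events {X1 = 1} and {X2 = 1} satisfy the product rule, so they are independent.
independent = joint_pmf(1, 1) == pmf(X1, 1) * pmf(X2, 1)

# The events {X1 = 1} and {S = 2} do not satisfy it, so they are not independent.
dependent = P(lambda z: X1(z) == 1 and S(z) == 2) != pmf(X1, 1) * pmf(S, 2)
```

The CDF value `cdf(S, 1)` equals `pmf(S, 0) + pmf(S, 1)`, which is the sum over $k \leq x$ in the CDF definition. Enumeration like this only works for small finite sample spaces.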