Probability
CSE 5522
Instructor: Alan Ritter

What is Probability?
• "The probability the coin will land heads is 0.5"
  – Q: what does this mean?
• Two interpretations:
  – Frequentist (repeated trials)
    • If we flip the coin many times, the fraction of heads approaches 0.5
  – Bayesian (degree of belief)
    • We believe there is an equal chance of heads and tails
    • Advantage: applies to events that do not have long-term frequencies,
      e.g., Q: What is the probability the polar ice caps will melt by 2050?

Probability basics
Begin with a set Ω, the sample space, e.g., the 6 possible rolls of a die.
ω ∈ Ω is a sample point / possible world / atomic event.
A probability space or probability model is a sample space with an assignment P(ω) for every ω ∈ Ω such that
  0 ≤ P(ω) ≤ 1
  Σ_ω P(ω) = 1
e.g., P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6.
An event A is any subset of Ω:
  P(A) = Σ_{ω ∈ A} P(ω)
E.g., P(die roll < 4) = P(1) + P(2) + P(3) = 1/6 + 1/6 + 1/6 = 1/2.

Random variables
A random variable is a function from sample points to some range, e.g., the reals or Booleans; e.g., Odd(1) = true.
P induces a probability distribution for any r.v. X:
  P(X = x_i) = Σ_{ω: X(ω) = x_i} P(ω)
e.g., P(Odd = true) = P(1) + P(3) + P(5) = 1/6 + 1/6 + 1/6 = 1/2.

Propositions
Think of a proposition as the event (the set of sample points) where the proposition is true.
Given Boolean random variables A and B:
  event a = set of sample points where A(ω) = true
  event ¬a = set of sample points where A(ω) = false
  event a ∧ b = points where A(ω) = true and B(ω) = true
Often in AI applications, the sample points are defined by the values of a set of random variables, i.e., the sample space is the Cartesian product of the ranges of the variables.
With Boolean variables, sample point = propositional logic model, e.g., A = true, B = false, or a ∧ ¬b.
A proposition is the disjunction of the atomic events in which it is true:
  e.g., (a ∨ b) ≡ (¬a ∧ b) ∨ (a ∧ ¬b) ∨ (a ∧ b)
  ⇒ P(a ∨ b) = P(¬a ∧ b) + P(a ∧ ¬b) + P(a ∧ b)

Syntax for propositions
Propositional or Boolean random variables:
  e.g., Cavity (do I have a cavity?)
  Cavity = true is a proposition, also written cavity
Discrete random variables (finite or infinite):
  e.g., Weather is one of ⟨sunny, rain, cloudy, snow⟩
  Weather = rain is a proposition
  Values must be exhaustive and mutually exclusive
Continuous random variables (bounded or unbounded):
  e.g., Temp = 21.6; also allow, e.g., Temp < 22.0
Arbitrary Boolean combinations of basic propositions
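To make these definitions concrete, here is a minimal Python sketch of the die example (the names sample_space, prob_event, and odd are illustrative, not from the slides):

  from fractions import Fraction

  # Sample space for a fair die: each sample point has probability 1/6.
  sample_space = {w: Fraction(1, 6) for w in range(1, 7)}

  def prob_event(event):
      """P(A) = sum of P(w) over the sample points w in event A."""
      return sum(sample_space[w] for w in event)

  # Event: die roll < 4.
  print(prob_event({1, 2, 3}))  # 1/2

  # The random variable Odd is a function from sample points to Booleans;
  # P(Odd = true) sums the points that Odd maps to true.
  def odd(w):
      return w % 2 == 1

  print(prob_event({w for w in sample_space if odd(w)}))  # 1/2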
Prior probability
Prior or unconditional probabilities of propositions, e.g.,
  P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72,
correspond to belief prior to the arrival of any (new) evidence.
A probability distribution gives values for all possible assignments:
  P(Weather) = ⟨0.72, 0.1, 0.08, 0.1⟩ (normalized, i.e., sums to 1)
A joint probability distribution for a set of r.v.s gives the probability of every atomic event on those r.v.s (i.e., every sample point):
  P(Weather, Cavity) = a 4 × 2 matrix of values:

                   Weather = sunny   rain   cloudy   snow
  Cavity = true         0.144        0.02   0.016    0.02
  Cavity = false        0.576        0.08   0.064    0.08

Every question about a domain can be answered by the joint distribution, because every event is a sum of sample points.

Gaussian density
  P(x) = (1/√(2πσ²)) e^(−(x−µ)²/(2σ²))
i.e., the density of a normal distribution with mean µ and standard deviation σ.

Conditional probability
Conditional or posterior probabilities, e.g.,
  P(cavity|toothache) = 0.8
i.e., given that toothache is all I know.
NOT "if toothache then 80% chance of cavity".
(Notation for conditional distributions: P(Cavity|Toothache) = 2-element vector of 2-element vectors.)
If we know more, e.g., cavity is also given, then we have
  P(cavity|toothache, cavity) = 1
Note: the less specific belief remains valid after more evidence arrives, but is not always useful.
New evidence may be irrelevant, allowing simplification, e.g.,
  P(cavity|toothache, 49ersWin) = P(cavity|toothache) = 0.8
This kind of inference, sanctioned by domain knowledge, is crucial.

Inference by enumeration
Start with the joint distribution:

                 toothache           ¬toothache
              catch    ¬catch      catch    ¬catch
  cavity      0.108     0.012      0.072     0.008
  ¬cavity     0.016     0.064      0.144     0.576

For any proposition φ, sum the atomic events where it is true:
  P(φ) = Σ_{ω: ω ⊨ φ} P(ω)
E.g.,
  P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
  P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
Can also compute conditional probabilities:
  P(¬cavity|toothache) = P(¬cavity ∧ toothache) / P(toothache)
                       = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4

Normalization
The denominator can be viewed as a normalization constant α:
  P(Cavity|toothache) = α P(Cavity, toothache)
    = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
    = α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
    = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables; a short code sketch of this procedure follows below.
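A minimal Python sketch of enumeration and normalization over this joint distribution (the names joint and prob and the lambda propositions are mine, not from the slides):

  # Full joint P(Cavity, Toothache, Catch) from the table above,
  # keyed by (cavity, toothache, catch) truth values.
  joint = {
      (True,  True,  True):  0.108, (True,  True,  False): 0.012,
      (True,  False, True):  0.072, (True,  False, False): 0.008,
      (False, True,  True):  0.016, (False, True,  False): 0.064,
      (False, False, True):  0.144, (False, False, False): 0.576,
  }

  def prob(phi):
      """P(phi): sum the joint entries for the worlds where phi holds."""
      return sum(p for world, p in joint.items() if phi(*world))

  print(prob(lambda c, t, k: t))       # P(toothache) = 0.2
  print(prob(lambda c, t, k: c or t))  # P(cavity or toothache) = 0.28

  # P(Cavity | toothache) via normalization: sum out Catch, then rescale.
  unnorm = [prob(lambda c, t, k, v=v: c == v and t) for v in (True, False)]
  alpha = 1.0 / sum(unnorm)
  print([alpha * p for p in unnorm])   # [0.6, 0.4]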
Inference by enumeration, contd.
Let X be all the variables. Typically, we want the posterior joint distribution of the query variables Y given specific values e for the evidence variables E.
Let the hidden variables be H = X − Y − E.
Then the required summation of joint entries is done by summing out the hidden variables:
  P(Y|E = e) = α P(Y, E = e) = α Σ_h P(Y, E = e, H = h)
The terms in the summation are joint entries, because Y, E, and H together exhaust the set of random variables.
Obvious problems:
1) Worst-case time complexity O(d^n), where d is the largest arity
2) Space complexity O(d^n) to store the joint distribution
3) How to find the numbers for O(d^n) entries?

Independence
A and B are independent iff
  P(A|B) = P(A)   or   P(B|A) = P(B)   or   P(A, B) = P(A) P(B)
[Figure: the full model over Cavity, Toothache, Catch, Weather decomposes into one model over Cavity, Toothache, Catch and a separate Weather node.]
  P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)
32 entries reduced to 12; for n independent biased coins, 2^n → n.
Absolute independence is powerful but rare.
Dentistry is a large field with hundreds of variables, none of which are independent. What to do?

Conditional independence
P(Toothache, Cavity, Catch) has 2³ − 1 = 7 independent entries.
If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
  (1) P(catch|toothache, cavity) = P(catch|cavity)
The same independence holds if I haven't got a cavity:
  (2) P(catch|toothache, ¬cavity) = P(catch|¬cavity)
Catch is conditionally independent of Toothache given Cavity:
  P(Catch|Toothache, Cavity) = P(Catch|Cavity)
Equivalent statements:
  P(Toothache|Catch, Cavity) = P(Toothache|Cavity)
  P(Toothache, Catch|Cavity) = P(Toothache|Cavity) P(Catch|Cavity)

Conditional independence contd.
Write out the full joint distribution using the chain rule:
  P(Toothache, Catch, Cavity)
    = P(Toothache|Catch, Cavity) P(Catch, Cavity)
    = P(Toothache|Catch, Cavity) P(Catch|Cavity) P(Cavity)
    = P(Toothache|Cavity) P(Catch|Cavity) P(Cavity)
I.e., 2 + 2 + 1 = 5 independent numbers (equations 1 and 2 remove 2).
In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n.
Conditional independence is our most basic and robust form of knowledge about uncertain environments.

Bayes' Rule
Product rule: P(a ∧ b) = P(a|b) P(b) = P(b|a) P(a)
⇒ Bayes' rule: P(a|b) = P(b|a) P(a) / P(b)
or in distribution form:
  P(Y|X) = P(X|Y) P(Y) / P(X) = α P(X|Y) P(Y)
Useful for assessing diagnostic probability from causal probability:
  P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect)
E.g., let M be meningitis, S be stiff neck:
  P(m|s) = P(s|m) P(m) / P(s) = (0.8 × 0.0001) / 0.1 = 0.0008
Note: the posterior probability of meningitis is still very small!

Bayes' Rule and conditional independence
  P(Cavity|toothache ∧ catch)
    = α P(toothache ∧ catch|Cavity) P(Cavity)
    = α P(toothache|Cavity) P(catch|Cavity) P(Cavity)
This is an example of a naive Bayes model:
  P(Cause, Effect_1, …, Effect_n) = P(Cause) Π_i P(Effect_i|Cause)
[Figure: naive Bayes networks: Cavity with arrows to Toothache and Catch; generically, Cause with arrows to Effect_1, …, Effect_n.]
The total number of parameters is linear in n.
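To illustrate, here is a small Python sketch of the naive Bayes posterior for the dentistry example. The conditional probability tables below are not given on the slides; they are derived from the earlier joint distribution under the naive Bayes assumption:

  # Parameters derived from the joint table (an assumption of this sketch):
  # P(cavity) = 0.2, plus P(toothache|Cavity) and P(catch|Cavity).
  p_cavity = 0.2
  p_toothache = {True: 0.6, False: 0.1}  # P(toothache | Cavity = cav)
  p_catch = {True: 0.9, False: 0.2}      # P(catch | Cavity = cav)

  def posterior(toothache, catch):
      """P(Cavity | evidence) = alpha * P(Cavity) * prod_i P(effect_i | Cavity)."""
      scores = {}
      for cav in (True, False):
          prior = p_cavity if cav else 1 - p_cavity
          pt = p_toothache[cav] if toothache else 1 - p_toothache[cav]
          pc = p_catch[cav] if catch else 1 - p_catch[cav]
          scores[cav] = prior * pt * pc
      alpha = 1.0 / sum(scores.values())
      return {cav: alpha * s for cav, s in scores.items()}

  # P(cavity | toothache, catch) is about 0.871, which matches
  # 0.108 / (0.108 + 0.016) computed directly from the joint table.
  print(posterior(toothache=True, catch=True))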