Bayes Rule

$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$

Rev. Thomas Bayes (1702–1761)

• How is this rule derived? (From the product rule: $P(A, B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$.)
• Using Bayes rule for probabilistic inference:

$P(\text{Cause} \mid \text{Evidence}) = \frac{P(\text{Evidence} \mid \text{Cause})\, P(\text{Cause})}{P(\text{Evidence})}$

– P(Cause | Evidence): diagnostic probability
– P(Evidence | Cause): causal probability

Bayesian decision theory

• Suppose the agent has to make a decision about the value of an unobserved query variable X given some observed evidence E = e
– Partially observable, stochastic, episodic environment
– Examples: X = {spam, not spam}, e = email message; X = {zebra, giraffe, hippo}, e = image features
– The agent has a loss function, which is 0 if the value of X is guessed correctly and 1 otherwise
– What is the agent's optimal estimate of the value of X?
• Maximum a posteriori (MAP) decision: the value of X that minimizes the expected loss is the one with the greatest posterior probability P(X = x | e)

MAP decision

• X = x: value of the query variable
• E = e: evidence

$x^* = \arg\max_x P(x \mid e) = \arg\max_x \frac{P(e \mid x)\, P(x)}{P(e)} = \arg\max_x P(e \mid x)\, P(x)$

$P(x \mid e) \propto P(e \mid x)\, P(x)$ (posterior $\propto$ likelihood $\times$ prior)

• Maximum likelihood (ML) decision: $x^* = \arg\max_x P(e \mid x)$ (a short code sketch of the MAP rule appears after the spam-filter slides below)

Example: Spam Filter

• We have X = {spam, ¬spam}, E = email message.
• What should be our decision criterion?
– Compute P(spam | message) and P(¬spam | message), and assign the message to the class with the higher posterior probability:

$P(\text{spam} \mid \text{message}) \propto P(\text{message} \mid \text{spam})\, P(\text{spam})$
$P(\neg\text{spam} \mid \text{message}) \propto P(\text{message} \mid \neg\text{spam})\, P(\neg\text{spam})$

Example: Spam Filter

• We need to find P(message | spam) P(spam) and P(message | ¬spam) P(¬spam)
• How do we represent the message?
– Bag-of-words model:
  • The order of the words is not important
  • Each word is conditionally independent of the others given the message class
• If the message consists of words (w1, …, wn), how do we compute P(w1, …, wn | spam)?
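Before the Naïve Bayes assumption below answers that question, it may help to see the MAP rule $x^* = \arg\max_x P(e \mid x)\,P(x)$ as code. This is a minimal sketch; the class names, priors, and likelihood values are hypothetical numbers chosen only for illustration, not taken from the slides.

```python
# Minimal sketch of a MAP decision: pick the class x maximizing P(e | x) * P(x).
# All numbers below are hypothetical, for illustration only.

def map_decision(priors, likelihoods):
    """Return the class with the largest unnormalized posterior P(e | x) * P(x)."""
    return max(priors, key=lambda x: likelihoods[x] * priors[x])

# Hypothetical model: P(spam) = 0.3, and the observed message e is ten times
# more likely under the spam class than under the non-spam class.
priors = {"spam": 0.3, "not_spam": 0.7}
likelihoods = {"spam": 0.010, "not_spam": 0.001}   # P(e | x) for the observed e

print(map_decision(priors, likelihoods))  # -> "spam", since 0.010*0.3 > 0.001*0.7
```

Dropping the prior factor from the score gives the ML decision $x^* = \arg\max_x P(e \mid x)$.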
– Naïve Bayes assumption: each word is conditionally independent of the others given the message class:

$P(\text{message} \mid \text{spam}) = P(w_1, \ldots, w_n \mid \text{spam}) = \prod_{i=1}^{n} P(w_i \mid \text{spam})$

Example: Spam Filter

• Our filter will classify the message as spam if

$P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam}) > P(\neg\text{spam}) \prod_{i=1}^{n} P(w_i \mid \neg\text{spam})$

• In practice, likelihoods are pretty small numbers, so we need to take logs to avoid underflow:

$\log\Big( P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam}) \Big) = \log P(\text{spam}) + \sum_{i=1}^{n} \log P(w_i \mid \text{spam})$

• Model parameters:
– Priors P(spam), P(¬spam)
– Likelihoods P(wi | spam), P(wi | ¬spam)
• These parameters need to be learned from a training set (a representative sample of email messages marked with their classes)

Parameter estimation

• Model parameters:
– Priors P(spam), P(¬spam)
– Likelihoods P(wi | spam), P(wi | ¬spam)
• Estimation by empirical word frequencies in the training set:

P(wi | spam) = (# of occurrences of wi in spam messages) / (total # of words in spam messages)

– This happens to be the parameter estimate that maximizes the likelihood of the training data:

$\prod_{d=1}^{D} \prod_{i=1}^{n_d} P(w_{d,i} \mid \text{class}_d)$, where d indexes training documents and i indexes the words within a document

• Parameter smoothing: dealing with words that were never seen or were seen too few times
– Laplacian smoothing: pretend you have seen every vocabulary word one more time than you actually did (pulled together in the spam-filter code sketch at the end of this section)

Bayesian decision making: Summary

• Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E
• Inference problem: given some evidence E = e, what is P(X | e)?
• Learning problem: estimate the parameters of the probabilistic model P(X | E) given a training sample {(x1, e1), …, (xn, en)}

Bag-of-words models for images

Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)

1. Extract image features
2. Learn a “visual vocabulary”
3. Map image features to visual words
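Steps 2 and 3 above are commonly implemented by clustering local feature descriptors and then assigning each descriptor to its nearest cluster center. The sketch below is one minimal way to do this with k-means; the random vectors stand in for real local features (e.g., SIFT descriptors), and the use of scikit-learn is an illustrative assumption rather than something specified in the slides.

```python
# Sketch of steps 2-3: cluster local descriptors into a "visual vocabulary"
# with k-means, then map each descriptor to its nearest visual word.
# Random vectors stand in for real local features (e.g., SIFT descriptors).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(5000, 128))      # placeholder 128-D local features

# Step 2: learn the visual vocabulary (cluster centers = visual words).
vocab_size = 200
kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0).fit(descriptors)

# Step 3: map the features of one image to visual words and build its
# bag-of-words histogram, analogous to word counts in a text message.
image_descriptors = rng.normal(size=(300, 128))
words = kmeans.predict(image_descriptors)       # visual-word index per descriptor
histogram = np.bincount(words, minlength=vocab_size)
```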
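Finally, returning to the spam filter: the sketch below pulls together the pieces from the earlier slides, namely parameter estimation by word frequencies, Laplacian smoothing, log-space scoring, and the MAP decision rule. The toy training corpus and all helper names are hypothetical, for illustration only.

```python
# Naive Bayes spam filter sketch: Laplacian-smoothed word likelihoods,
# log-space scoring, and a MAP decision. The training data is a toy example.
import math
from collections import Counter

train = [("win cash now", "spam"), ("cash prize win", "spam"),
         ("meeting at noon", "not_spam"), ("lunch meeting today", "not_spam")]

classes = {"spam", "not_spam"}
vocab = {w for text, _ in train for w in text.split()}
word_counts = {c: Counter() for c in classes}
doc_counts = Counter()
for text, c in train:
    doc_counts[c] += 1
    word_counts[c].update(text.split())

def log_likelihood(word, c):
    # Laplacian smoothing: add 1 to every vocabulary word's count.
    return math.log((word_counts[c][word] + 1) /
                    (sum(word_counts[c].values()) + len(vocab)))

def classify(message):
    # MAP decision: argmax over classes of log P(class) + sum_i log P(w_i | class).
    # Words outside the training vocabulary are ignored here for simplicity.
    def score(c):
        prior = math.log(doc_counts[c] / len(train))
        return prior + sum(log_likelihood(w, c) for w in message.split() if w in vocab)
    return max(classes, key=score)

print(classify("win a cash prize"))   # -> "spam" on this toy corpus
```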