CSC242 Second-Half Review

1. Summarize briefly why using logic exclusively to formalize a domain like medical diagnosis is hard.
2. State the principle of maximum expected utility briefly.
3. True or false: A prior (or unconditional) probability takes into account all relevant other information.
4. Posterior (conditional) probabilities are conditional on what?
5. Prove that P(a | b ∧ a) = 1. Use the definition of conditional probability and some basic properties of conjunction.
6. Suppose we have the following random variables and domains (values) to describe the state of a car at a repair shop:
   • Problem: {brakes, clutch, electrical, steering, tires}
   • Rattling: {true, false}
   • Squeaky: {true, false}
   • Type: {economy, midrange, luxury}
   Give an English translation of the following conditional probability statement using these variables and values and the notational conventions in the book:
   P(clutch | rattling ∧ ¬squeaky ∧ luxury) = 0.4
7. Assume a random variable Cuisine with values {american, japanese, chinese, french, polish} representing the type of meal served in a cafeteria for lunch. What does the following probability statement say?
   P(Cuisine) = ⟨0.5, 0.2, 0.2, 0.1, 0⟩
8. Using the random variables from the car repair shop example above, draw (or describe) tables showing the elements of the following joint probability distributions (of course you can't fill in the values):
   (a) P(Problem, Type)
   (b) P(Problem, Type, Rattling)
   (c) P(Problem, Type, Rattling, Squeaky)
9. What does it mean for two random variables to be independent? Be specific and use formal notation where possible.
10. What does it mean for two random variables to be conditionally independent? Be specific and use formal notation where possible.
11. Why are independence assertions useful for inference?
12. State Bayes' Rule briefly and formally.
13. What are the components of a Bayesian Network for a set of random variables {X_i}?
14. Exact inference in Bayesian Networks relies on two properties:

   \[ \mathbf{P}(X \mid \mathbf{e}) = \alpha\, \mathbf{P}(X, \mathbf{e}) = \alpha \sum_{\mathbf{y}} \mathbf{P}(X, \mathbf{e}, \mathbf{y}) \]

   and

   \[ P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i)) \]

   Briefly explain what these equations mean.
15. Is exact inference for Bayesian networks feasible? Why or why not?
16. What does it mean for a probability estimate to be consistent? Be brief.
17. What are the main strength and main weakness of rejection sampling?
18. How does likelihood weighting improve on rejection sampling?
19. Briefly explain how Gibbs sampling works.
20. The basic approach to modeling uncertainty in a changing world is to view the world as a series of "snapshots" or time slices. Each time slice includes a set X of state variables assumed to be unobservable, and a set E of evidence variables that can be observed.
   (a) Explain briefly what the (first-order) Markov assumption is and what it means for the state variables X. Be formal.
   (b) Explain briefly what a stationary process assumption is and what it means for the state variables X. Be formal.
   (c) Explain briefly what the (first-order) sensor Markov assumption means for the evidence variables E. Be formal.
   (d) Explain briefly what a stationary process assumption means for the evidence variables E. Be formal.
   (e) Give an example of a Bayesian network for a temporal model with a single state variable X and a single evidence variable E. Draw a picture. Include formulas describing the conditional probabilities required by the network (no numbers needed).
21. The book (and class) identified four inference tasks for temporal models.
   (a) Identify the four tasks and give the posterior distribution that each task needs to compute (or estimate). Use the notation X for the state variables, E for the evidence variables, e for the evidence values (a vector of values of the evidence variables), and x for a vector of values of the state variables, with appropriate subscripts.
   (b) Can the basic temporal inference tasks on Bayesian networks be computed efficiently? Why or why not (very briefly)?
   (c) Why is this important?
22. Given a nondeterministic transition function Result(a) (or Result(s, a)) and a utility function U(s):
   (a) Give an expression for the expected utility of performing action a given evidence e.
   (b) State the principle of maximum expected utility formally and in words.
23. Define the expected monetary value (EMV) of a gamble or lottery [p_1, v_1; ...; p_n, v_n], where the gamble pays value v_i with probability p_i.
24. Is the expected monetary value of a gamble equal to its expected utility? Why or why not?
25. Tickets to a lottery cost $1. There are two possible prizes: a $10 payoff with probability 1/50, and a $1,000,000 payoff with probability 1/2,000,000.
   (a) What is the expected monetary value of the lottery?
   (b) When (if ever) is it rational to buy a ticket? Be formal. Assume S_n is the state of having n dollars, you currently have k dollars (i.e., you are in state S_k), and U(S_k) = 0. State and justify any other assumptions you need to make about utilities.
   (c) Studies show that lower income people buy more lottery tickets. How would you explain this using the current framework?
26. Suppose you are in the market for a new house. A house can be in good shape (g) or bad shape (¬g). There are various tests and inspections that you could perform, each with an associated cost. The tests can indicate what shape the house is in. Suppose you are considering buying a house for $150,000. If it's in good shape, you believe that its market value is really $200,000. If not, it will need $70,000 in repairs to get it into shape. You believe that it has a 70% chance of being in good shape. Suppose you only have time to perform at most one test, which costs $5000.
   (a) Draw the decision network that represents this problem and justify your design.
   (b) Calculate the expected net gain (in dollars) of buying the house, given no test is performed.
   (c) Suppose you have some knowledge about the probability that the test will accurately reflect the shape of the house:
       P(pass | g) = 0.8
       P(pass | ¬g) = 0.35
       That is, a house in good shape will pass the test 80% of the time (and fail it 20% of the time, a false negative), while a house in bad shape will nonetheless pass the test 35% of the time (a false positive). Compute the conditional probability for the shape of the house given whether the test is passed (that is, P(G | Pass)).
   (d) Calculate the optimal decisions for the cases of a pass and of a fail, and their expected utilities.
   (e) What does this say about the test? What does it say about the optimal plan for buying the house?
27. Identify the elements of a Markov Decision Problem (or Process). Be formal.
28. What counts as a solution to an MDP?
29. Why is it important for the MDP approach that the environment be fully observable?
30. Define the optimal solution to an MDP formally, assuming the use of discounted rewards with discount factor γ.
31. (a) Identify the two main algorithms for computing optimal policies for MDPs.
   (b) Explain briefly how they differ.
32. State the assumptions underlying the MDP approach. How reasonable are they?
33. For each of the following types of learning, briefly describe what is learned and what knowledge, data, or feedback the learning agent receives to help it learn.
   (a) (2 points) Unsupervised learning
   (b) (2 points) Supervised learning
   (c) (2 points) Reinforcement learning
34. The figure below shows a data set fit with several different function hypotheses.
   [Figure: three panels showing the same data set fit by (a) y = mx + b, (b) y = ax^3 + bx^2 + cx + d, (c) y = ax + b + c sin(x)]
   Briefly compare the different hypotheses according to each of the following criteria:
   (a) (2 points) Simplicity of the hypothesis
   (b) (2 points) Goodness of fit
   (c) (2 points) Generalization
35. (2 points) Why (besides the philosophical or aesthetic appeal of Occam's Razor) do we prefer simpler hypotheses for machine learning?
36. (3 points) Write down the Propositional Logic sentence that is equivalent to the decision tree shown in Figure 18.6 (the one induced from the examples by the Decision-Tree-Learning algorithm).
37. (2 points) When learning a decision tree, suppose that you have considered all the attributes but still you are left with a subset of the examples that cannot be classified uniquely. What does this mean? Could it happen in real life? Why or why not?
38. (2 points) Explain briefly how to get the most out of your data when evaluating it using the training set/testing set paradigm.
39. (a) (2 points) Express the problem of fitting a line to two-dimensional data in terms of an optimization of the weight vector w.
   (b) (2 points) Do you need to do search in order to fit a line to two-dimensional data?
40. Suppose you want to learn a function that predicts the season batting average of a baseball player given information available at the start of the season. (This is sort of a simplified version of the premise of the movie Moneyball, by the way.)
   (a) (3 points) Explain how you would formalize this as a multivariate regression problem.
   (b) (2 points) Is it reasonable to model this relationship using a linear model? Why or why not?
   (c) (2 points) Explain briefly how you might learn the model using an iterative algorithm.
41. (3 points) Briefly define the following terms:
   (a) Decision boundary
   (b) Linear separator
   (c) Linearly separable
42. (2 points) Explain briefly the relationship between linear regression (learning a linear model that fits the data) and classification. You might want to use some of the terms you defined above.
43. (2 points) Explain briefly the attraction(s) of logistic regression. Is there a downside to using it?
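
Study aid for the questions on iterative learning and logistic regression: the sketch below shows one possible way to fit a logistic model by batch gradient ascent on the log-likelihood. It is only an illustration under stated assumptions, not the course's reference solution. The function names (sigmoid, train_logistic, predict), the learning rate, the number of epochs, and the toy one-dimensional data set are all hypothetical choices made for this example.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(examples, alpha=0.1, epochs=2000):
    """Fit weights by batch gradient ascent on the log-likelihood.

    examples: list of (x, y) pairs, x a list of floats, y in {0, 1}.
    A constant 1.0 is prepended to each x so that w[0] acts as the bias.
    alpha and epochs are illustrative values, not tuned.
    """
    n = len(examples[0][0]) + 1
    w = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for x, y in examples:
            xb = [1.0] + list(x)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
            # Each example contributes (y - p) * x_i to the gradient.
            for i in range(n):
                grad[i] += (y - p) * xb[i]
        # Step in the direction of the averaged gradient.
        w = [wi + alpha * gi / len(examples) for wi, gi in zip(w, grad)]
    return w


def predict(w, x):
    """Return the model's estimate of P(y = 1 | x)."""
    xb = [1.0] + list(x)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))


# Toy data (made up): small x values are class 0, large x values are class 1.
data = [([0.0], 0), ([1.0], 0), ([2.0], 0), ([3.0], 1), ([4.0], 1), ([5.0], 1)]
w = train_logistic(data)
```

Calling predict(w, [0.0]) gives a probability below 0.5 and predict(w, [5.0]) gives one above 0.5. The output is a graded probability rather than a hard 0/1 label, which is one point to consider when answering the question on logistic regression.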