An Enquiry into Computer Understanding
Peter Cheeseman
Computational Intelligence 4 (1) pp58-66 (1988)

Cheeseman’s Thesis
• McDermott (1987)
  – Common-sense reasoning is not logical, but plausible
    • E.g. drop a glass: will it break?
  – The logicistic approach is fatally flawed
  – Thus, AI is doomed to “procedural ad hocery”
• Cheeseman (here)
  – No! Bayesian probability removes these difficulties…
  – and is theoretically sound…
  – and will also account for inductive inference!

“Only the myopic belief – that logic (and its underlying semantics of ‘truth’) is the possible language for describing the world – could have lead AI researchers into shoehorning all reasoning into the logical mold whether it fitted or not.”

The Argument (1)
• Common-sense reasoning is non-monotonic
• Non-monotonic logics don’t work
  – “default” conclusions are just as “true” as logically sound ones, so can’t distinguish
    • Which conclusions to revise
    • How strongly conclusions should be believed
• Bayesian reasoning
  – is monotonic
  – But captures non-monotonicity by using conditional probs
    • P(milk off | 3 days old) = 0.1
    • P(milk off | 3 days old & smelly) = 0.95
  – Extra info doesn’t invalidate old statements
  – There’s no “real” prob: it’s all subjective
  – Unlike logic, where truth is unconditional

The Argument (2)
• In logic, degrees of truth are restricted to 1 or 0 (t or f)
• So how can we say “unlikely the glass will break”?
  – In logic it must break or it’s impossible to break

Real-world knowledge is rarely of this categorical form – why anyone in AI ever thought it is, I will never understand.

Why Bayesian reasoning?
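As an illustration of the milk example from The Argument (1), here is a minimal sketch of how conditioning on extra evidence raises a belief without invalidating the earlier statement. Only the two probabilities 0.1 and 0.95 come from the slides; the function name `posterior` and the two likelihoods for "smelly" are assumptions, chosen purely so that the update reproduces the slide's second number.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' rule for a binary hypothesis H and a single piece of evidence E."""
    numerator = prior * p_evidence_given_h
    return numerator / (numerator + (1.0 - prior) * p_evidence_given_not_h)


# From the slides: P(milk off | 3 days old) = 0.1
p_off_given_old = 0.1

# ASSUMED likelihoods (not in the source), picked so the result matches 0.95:
# how often 3-day-old milk smells when it is off vs. when it is still fresh.
p_smelly_given_off = 0.95
p_smelly_given_fresh = 0.95 / 171.0

# P(milk off | 3 days old & smelly)
p_off_given_old_and_smelly = posterior(
    p_off_given_old, p_smelly_given_off, p_smelly_given_fresh
)

print(round(p_off_given_old_and_smelly, 2))
```

Both conditional statements hold at once: P(milk off | 3 days old) is still 0.1 after the smell is observed, which is the sense in which the slides call Bayesian reasoning monotonic.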
Cox’s requirements:
• Propositions well-defined
• A single number is necessary & sufficient for belief
• All propositions have a unique degree of belief
• Degree of belief can depend on other degrees of belief
  – (violated by fuzzy sets & Dempster-Shafer)
• Need to calculate belief(p1…pN) given belief(p1),..,belief(pN)
• As belief(p) ↑, belief(¬p) ↓
• Equal belief in p1, …, pN if p1,…,pN have same truth value
Bayes is the answer!

Induction
• Bayes: find most likely hypotheses from priors + data
• Can use this for induction & clustering
• Basic premise: gives a principled way of ranking theories from observations
  – “If Bayesian inference is the solution, why isn’t there a pile of papers on it in AI? Ignorance? Belief that numbers ≠ AI?”
• Popper: Theories can only be proved incorrect
• Cheeseman: Theories are only more or less plausible depending on the priors and data

More against logic…
• “if A then B” (English) ≠ “A → B” (logic)
• E.g. “if there’s smoke, there’s fire”
  – x smoke(x) fire(x) ?
  – x smoke(x) fire(x) ?
• Raven’s paradox: ∀x raven(x) → black(x) [1]
  – Black ravens should increase belief in this
  – But so should non-black non-ravens!, as [1] = ∀x ¬black(x) → ¬raven(x)
• Bayes doesn’t have these problems
• How else can we represent “all ravens are black”?

Even more against logic…
In logic, it is sufficient to find a chain of inference from the premise to the conclusion to be able to establish its truth. Additional chains of inference to the same conclusion are redundant. However, in Bayesian reasoning, all evidence that is relevant should be used in making a probability assessment…. The ability to combine information from multiple sources and balance different contributions is lacking in logic.

Discovery of new information leads to a revision of past beliefs. Probabilities give a measure of how much the new information alters our beliefs.

The Counter-Arguments…

Priors & The Independence Assumption
• Aleliunas: Where do the priors come from?
  – Bayes’ hunger for probabilities makes it generally infeasible
• Bundy: Bayes is only proof-functional (A and B → A & B) by making horrendous independence assumptions
  – P(A) = .5, P(B) = .5, P(A&B) = 0 to 0.5 …yuk!
• Suppose Fred & Sue & Joe & … all say Mike is tall → Mike is 7’ tall for sure?

Where do the models come from?
Dempster: Bayes is essentially a calculus for deducing posterior probabilities and expectations from specified models. Neither Fisher (1950) nor Cheeseman tell us where the formal models come from in the first place.

Israel: Non-monotonicity isn’t just updating probabilities; it’s the much broader problem of revision or defeasibility, i.e. specifying principles of rational belief change under the pressure of new info. Revision has to do with global principles governing change of the total cognitive state.

Where do the models come from? (cont)
• Kanal & Perlis: Bayes isn’t so defeasible:
  – Bayes assumes a fixed model: rules p(H|E) fixed
  – We can only vary belief in the inputs to the model
  – Doesn’t account for “rule revision”, e.g. I no longer believe that E indicates H any more

(Not) representing imprecision
Dubois & Prade:
• Cox’s “axioms” rule out imprecision: Cheat!!
• Bayes can’t distinguish uncertainty vs. ignorance
  – Roll a die: p(6) = 1/6
  – After 1000 tries, we’re more sure it’s unbiased, so our ignorance is less. But p(6) = 1/6 still.
• Sometimes you need to know the error bounds before acting

Probabilities don’t solve everything!
• Kanal & Perlis: “Cat isa animal”
• McDermott: Put this in probabilities, pal: “My wife teaches music. A student who’d borrowed some music called to arrange to return it Monday. My wife suggested just keep it until your next lesson on Thursday. But she returned it Monday anyway. Then Wednesday she called saying she was sick and couldn’t come Thursday. My wife suspected she was lying.”
  – p(bring music back | planning not to show for lesson) = ??
• Pearl: Does the Yale Shooting Problem with Bayes

What do we Condition on?
• Dalkey, Schafer:
  – p(black|raven)
  – How do we characterize the population “raven”? Major problem.
• Dempster:
  – P(morning star | F) …. What is F?
  – F = my whole state of knowledge
  – Isn’t this rather big? And continuously changing?
• McDermott: The real problem is deciding what evidence to look at in the first place, given the intractability of taking everything into account.

What are Probabilities Anyway?
• Hayes, Schafer: What does “probability” mean anyway?
  – Objective: Statistics over a population?
  – Subjective: e.g. betting odds?
  – Degree of belief??

Against Numbers…
• Greiner:
  – Would prefer ATMS, but have meta-rules to resolve conflicts and rank arguments, rather than numbers.
• Kanal & Perlis:
  – Logical reasoning is an integral part of common-sense