Bayesian Logic Programs Summer School on Relational Data Mining 17 and 18 August 2002, Helsinki, Finland Kristian Kersting, Luc De Raedt Albert-Ludwigs University Freiburg, Germany Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Context Real-world applications uncertainty complex, structured domains probability theory discrete, continuous logic objects, relations,functors Bayesian networks Logic Programming (Prolog) + Bayesian logic programs Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Outline • Bayesian Logic Programs • Examples and Language • Semantics and Support Networks • Learning Bayesian Logic Programs • Data Cases • Parameter Estimation • Structural Learning Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning • • • • • Bayesian Logic Programs Probabilistic models structured using logic Extend Bayesian networks with notions of objects and relations Probability density over (countably) infinitely many random variables Flexible discrete-time stochastic processes Generalize pure Prolog, Bayesian networks, dynamic Bayesian networks, dynamic Bayesian multinets, hidden Markov models,... Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Bayesian Networks • • One of the successes of AI State-of-the-art to model uncertainty, in particular the degree of belief • Advantage [Russell, Norvig 96]: „strict separation of qualitative and quantitative aspects of the world“ • Disadvantge [Breese, Ngo, Haddawy, Koller, ...]: Propositional character, no notion of objects and relations among them Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning • • • Stud farm (Jensen ´96) The colt John has been born recently on a stud farm. John suffers from a life threatening hereditary carried by a recessive gene. The disease is so serious that John is displaced instantly, and the stud farm wants the gene out of production, his parents are taken out of breeding. What are the probabilities for the remaining horses to be carriers of the unwanted gene? Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Bayesian networks [Pearl ´88] Based on the stud farm example [Jensen ´96] bt_unknown1 bt_fred bt_ann bt_cecily bt_brian bt_dorothy bt_gwenn bt_eric bt_henry bt_irene bt_john Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany bt_unknown2 Pa bt _ john Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Bayesian networks [Pearl ´88] Based on the stud farm example [Jensen ´96] bt_unknown1 bt_ann bt_dorothy bt_fred bt_cecily bt_brian bt_gwenn bt_eric bt_henry bt_unknown2 bt_irene Pa bt _ john (Conditional) Probability distribution P(bt_john) bt_henry bt_irene (1.0,0.0,0.0) aa aa (0.5,0.5,0.0) aa aA (0.0,1.0,0.0) aa AA ... bt_john P(bt_cecily=aA|bt_john=aA)=0.1499 P(bt_john=AA|bt_ann=aA)=0.6906 P(bt_john=AA)=0.9909 AA AA Summer(0.33,0.33,0.33) School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning • • Bayesian networks (contd.) acyclic graphs probability distribution over a finite set X1 , , X n of random variables: P X1, , Xn P X1 X 2 , , X n P X 2 X3, , Xn P X 1 Pa X 1 P X 2 Pa X 2 n P X i Pa X i i 1 Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany P Xn P X n Pa X n Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning bt_unknown1 bt_fred From Bayesian Networks to Bayesian Logic Programs bt_ann bt_cecily bt_brian bt_dorothy bt_eric bt_henry bt_gwenn bt_irene bt_john Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany bt_unknown2 Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning bt_unknown1. bt_fred From Bayesian Networks to Bayesian Logic Programs bt_ann. bt_cecily. bt_brian. bt_dorothy bt_eric bt_henry bt_gwenn bt_irene bt_john Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany bt_unknown2. Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning From Bayesian Networks to Bayesian Logic Programs bt_unknown1. bt_ann. bt_cecily. bt_brian. bt_unknown2. Pa bt _ fred bt_fred | bt_unknown1, bt_ann. bt_dorothy | bt_ann, bt_brian. bt_eric| bt_brian, bt_cecily. bt_henry bt_irene bt_john Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany bt_gwenn | bt_ann, bt_unknown2. Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning From Bayesian Networks to Bayesian Logic Programs bt_unknown1. bt_ann. bt_cecily. bt_brian. bt_unknown2. Pa bt _ fred bt_fred | bt_unknown1, bt_ann. bt_dorothy | bt_ann, bt_brian. bt_eric| bt_brian, bt_cecily. bt_henry | bt_fred, bt_dorothy. bt_irene | bt_eric, bt_gwenn. bt_john Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany bt_gwenn | bt_ann, bt_unknown2. Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning From Bayesian Networks to Bayesian Logic Programs bt_unknown1. bt_ann. bt_cecily. bt_brian. bt_unknown2. Pa bt _ fred bt_fred | bt_unknown1, bt_ann. bt_dorothy | bt_ann, bt_brian. bt_henry | bt_fred, bt_dorothy. bt_eric| bt_brian, bt_cecily. bt_irene | bt_eric, bt_gwenn. bt_john | bt_henry ,bt_irene. Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany bt_gwenn | bt_ann, bt_unknown2. P(bt_john) bt_henry bt_irene (1.0,0.0,0.0) aa aa (0.5,0.5,0.0) aa aA (0.0,1.0,0.0) aa AA AA AA ... (0.33,0.33,0.33) Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning From Bayesian Networks to Bayesian Logic Programs % apriori nodes Domain e.g. finite, discrete, continuous bt_ann. bt_brian. bt_cecily. bt_unknown1. bt_unknown1. % aposteriori nodes bt_henry | bt_irene | bt_fred | bt_dorothy| bt_eric | bt_gwenn | bt_john | bt_fred, bt_dorothy. bt_eric, bt_gwenn. bt_unknown1, bt_ann. bt_brian, bt_ann. bt_brian, bt_cecily. bt_unknown2, bt_ann. bt_henry, bt_irene. (conditional) probability distribution P(bt_john) bt_henry bt_irene (1.0,0.0,0.0) aa aa (0.5,0.5,0.0) aa aA (0.0,1.0,0.0) aa AA AA AA ... (0.33,0.33,0.33) Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning From Bayesian Networks to Bayesian Logic Programs % apriori nodes bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown1). % aposteriori nodes bt(henry) | bt(irene) | bt(fred) | bt(dorothy)| bt(eric) | bt(gwenn) | bt(john) | bt(fred), bt(dorothy). bt(eric), bt(gwenn). bt(unknown1), bt(ann). bt(brian), bt(ann). bt(brian), bt(cecily). bt(unknown2), bt(ann). bt(henry), bt(irene). (conditional) probability distribution P(bt(john)) bt(henry) bt(irene) (1.0,0.0,0.0) aa aa (0.5,0.5,0.0) aa aA (0.0,1.0,0.0) aa AA AA AA ... (0.33,0.33,0.33) Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning From Bayesian Networks to Bayesian Logic Programs % ground facts / apriori bt(ann). bt(brian). bt(cecily). bt(unkown1). father(unkown1,fred). father(brian,dorothy). father(brian,eric). father(unkown2,gwenn). father(fred,henry). father(eric,irene). father(henry,john). bt(unkown1). mother(ann,fred). mother(ann, dorothy). mother(cecily,eric). mother(ann,gwenn). mother(dorothy,henry). mother(gwenn,irene). mother(irene,john). % rules / aposteriori bt(X) | father(F,X), bt(F), mother(M,X), bt(M). (conditional) probability distribution P(bt(X)) father(F,X) bt(F) mother(M,X) bt(M) (1.0,0.0,0.0) true Aa true aa AA false AA ... (0.33,0.33,0.33) false Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Dependency graph = Bayesian network bt(unknown2) father(brian,dorothy) bt(ann) bt(cecily) bt(brian) father(brian,eric) mother(ann,dorothy) bt(eric) mother(ann,gwenn) mother(cecily,eric) bt(dorothy) bt(unknown1) bt(gwenn) father(eric,irene) bt(fred) bt(irene) mother(dorothy,henry) father(unknown1,fred) bt(henry) father(unknown2,eric) bt(john) mother(gwenn,irene) mother(ann,fred) father(fred,henry) father(henry,john) Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany mother(irene,john) Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning bt_unknown1 bt_fred Dependency graph = Bayesian network bt_ann bt_cecily bt_brian bt_dorothy bt_eric bt_henry bt_gwenn bt_irene bt_john Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany bt_unknown2 Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Bayesian Logic Programs - a first definition A BLP B consists of • • a finite set of Bayesian clauses. To each clause c in B a conditional probability distribution cpd c is associated: cpd c P head c body c • • • Proper random variables ~ LH(B) graphical structure ~ dependency graph Quantitative information ~ CPDs Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Bayesian Logic Programs - Examples % apriori nodes nat(0). nat(0) nat(s(0)) nat(s(s(0)) % aposteriori nodes nat(s(X)) | nat(X). % apriori nodes state(0). % aposteriori nodes state(s(Time)) | state(Time). output(Time) | state(Time) % apriori nodes n1(0). state(0) state(s(0)) output(0) output(s(0)) n2(s(0)) n2(0) n3(0) % aposteriori nodes n1(s(TimeSlice) | n2(TimeSlice). n1(0) n2(TimeSlice) | n1(TimeSlice). n3(TimeSlice) | n1(TimeSlice), n2(TimeSlice). Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany ... ... ... n3(s(0)) n1(s(0)) Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning • Associated CPDs represent generically the CPD for each ground instance of the corresponding Bayesian clause. P(bt(X)) father(F,X) bt(F) mother(M,X) bt(M) (1.0,0.0,0.0) true Aa true aa AA false AA ... (0.33,0.33,0.33) 1 false 2 ... n Multiple ground instances of clauses having the same head atom? Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Combining Rules Multiple ground instances of clauses having the same head atom? % ground facts as before % rules bt(X) | father(F,X), bt(F). bt(X) | mother(M,X), bt(M). cpd(bt(john)|father(henry,john), bt(henry)) and cpd(bt(john)|mother(henry,john), bt(irene)) But we need !!! cpd(bt(john)|father(henry,john),bt(henry),mother(irene,john),bt(irene)) Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Combining Rules (contd.) Any algorithm which P(A|B) and P(A|C) combines a set of PDFs cpd A A , i1 , Aini 1 i m into the (combined) PDFs cpd A B1 , CR where B1 , P(A|B,C) , Bk , Bk m 1 i 1 A , has an empty output if and only if the input is empty E.g. noisy-or, regression, ... Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany , Aini Bayesian Logic Programs E Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Bayesian Logic Programs - a definition A BLP B consists of • • • • • • a finite set of Bayesian clauses. To each clause c in B a conditional probability distribution cpd c is associated: cpd c P head c body c To each Bayesian predicate p a combining rule cr is associated to combine CPDs of multiple ground instances of clauses having the same head p Proper random variables ~ LH(B) graphical structure ~ dependency graph Quantitative information ~ CPDs and CRs Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Outline • Bayesian Logic Programs • Examples and Language E • Semantics and Support Networks • Learning Bayesian Logic Programs • Data Cases • Parameter Estimation • Structural Learning Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language E Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Discrete-Time Stochastic Process • Family X t , t J of random variables X t over a domain X, where J 0,1, 2, • for each linearization of the partial order induced by the dependency graph a Bayesian logic program specifies a discrete-time stochastic process Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language E Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Theorem of Kolmogorov Existence and uniqueness of probability measure X : a Polish space • • H J : set of all non-empty, finite subsets of J I • PI : the probability measure over X , I H • J If the projective family PI I H J exists then there exists a unique probability measure Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language E Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Consistency Conditions • Probability measure PI , I H J is represented by a finite Bayesian network which is a subnetwork of the dependency graph over LH(B): Support Network • (Elimination Order): All stochastic processes represented by a Bayesian logic program B specify the same probability measure over LH(B). Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language E Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Support network bt(unknown2) father(brian,dorothy) bt(ann) bt(cecily) bt(brian) father(brian,eric) mother(ann,dorothy) bt(eric) mother(ann,gwenn) mother(cecily,eric) bt(dorothy) bt(unknown1) bt(gwenn) father(eric,irene) bt(fred) bt(irene) mother(dorothy,henry) father(unknown1,fred) bt(henry) father(unknown2,eric) bt(john) mother(gwenn,irene) mother(ann,fred) father(fred,henry) father(henry,john) Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany mother(irene,john) Bayesian Logic Programs Examples and Language E Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Support network bt(unknown2) father(brian,dorothy) bt(ann) bt(cecily) bt(brian) father(brian,eric) mother(ann,dorothy) bt(eric) mother(ann,gwenn) mother(cecily,eric) bt(dorothy) bt(unknown1) bt(gwenn) father(eric,irene) bt(fred) bt(irene) mother(dorothy,henry) father(unknown1,fred) bt(henry) father(unknown2,eric) bt(john) mother(gwenn,irene) mother(ann,fred) father(fred,henry) father(henry,john) Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany mother(irene,john) Bayesian Logic Programs Examples and Language E Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Support network bt(unknown2) father(brian,dorothy) bt(ann) bt(cecily) bt(brian) father(brian,eric) mother(ann,dorothy) bt(eric) mother(ann,gwenn) mother(cecily,eric) bt(dorothy) bt(unknown1) bt(gwenn) father(eric,irene) bt(fred) bt(irene) mother(dorothy,henry) father(unknown1,fred) bt(henry) father(unknown2,eric) bt(john) mother(gwenn,irene) mother(ann,fred) father(fred,henry) father(henry,john) Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany mother(irene,john) Bayesian Logic Programs Examples and Language E Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning • Support network Support network N x of x LH B is the induced subnetwork of S x y LH B y is influencing x • Support network N x of x LH B is defined as N x N x xx • Computation utilizes And/Or trees Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language E Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Queries using And/Or trees • A probabilistic query ?- Q1...,Qn|E1=e1,...,Em=em. asks for the distribution P(Q1, ..., Qn |E1=e1, ..., Em=em). • • Or node is proven if at least one bt(eric) cpd father(brian,eric),bt(brian), mother(cecily,eric),bt(cecily) bt(brian) of its successors is provable. And node is proven if all of its successors are provable. bt(cecily) father(brian,eric) mother(cecily,eric) bt(eric) bt(brian) • ?- bt(eric). bt(cecily) father(brian,eric) mother(cecily,eric) Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany combined cpd Bayesian Logic Programs Examples and Language E Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Consistency Condition (contd.) Projective family PI IH J exists if • the dependency graph is acyclic, and • every random variable is influenced by a finite set of random variables only well-defined Bayesian logic program Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language E Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Relational Character % ground facts bt(petra). bt(bsilvester). bt(brian). bt(anne). bt(wilhelm). bt(unknown1). bt(beate). % ground facts bt(ann). bt(cecily). bt(unknown1). father(silvester,claudien). mother(beate,claudien). father(unknown1,fred). mother(ann,fred). mother(anne, marthe). father(wilhelm,marta). % ground facts father(brian,dorothy). mother(ann, dorothy). ... bt(susanne). bt(ralf).father(brian,eric). mother(cecily,eric). bt(peter). bt(uta). father(unknown2,gwenn). mother(ann,gwenn). father(fred,henry). mother(dorothy,henry). father(ralf,luca). mother(susanne,luca). father(eric,irene). mother(gwenn,irene). ... father(henry,john). mother(irene,john). % rules bt(X) | father(F,X), bt(F), mother(M,X), bt(M). P(bt(X)) father(X,F) bt(F) mother(X,M) bt(M) (1.0,0.0,0.0) true Aa true aa AA false AA ... (0.33,0.33,0.33) false Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language E Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning • • • • • • • Bayesian Logic Programs - Summary First order logic extension of Bayesian networks constants, relations, functors discrete and continuous random variables ground atoms = random variables CPDs associated to clauses Dependency graph = (possibly) infinite Bayesian network Generalize dynamic Bayesian networks and definite clause logic (range-restricted) Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Applications • Probabilistic, logical • • • • Description and prediction Regression Classification Clustering • APrIL IST-2001-33053 • Computational Biology • Web Mining • Query approximation • Planning, ... Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Other frameworks • • • • • • Probabilistic Horn Abduction [Poole 93] Distributional Semantics (PRISM) [Sato 95] Stochastic Logic Programs [Muggleton 96; Cussens 99] Relational Bayesian Nets [Jaeger 97] Probabilistic Logic Programs [Ngo, Haddawy 97] Object-Oriented Bayesian Nets [Koller, Pfeffer 97] Probabilistic Frame-Based Systems [Koller, Pfeffer 98] Probabilistic Relational Models [Koller 99] Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Outline • Bayesian Logic Programs • Examples and Language • Semantics and Support Networks E • Learning Bayesian Logic Programs • Data Cases • Parameter Estimation • Structural Learning Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning D Data Cases Parameter Estimation Structural Learning Learning Bayesian Logic Programs Data + Background Knowledge A | E, B. learning algorithm E B P(A) e b .9 .1 e b .7 .3 e b .8 .2 e b .99 .01 Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning D Data Cases Parameter Estimation Structural Learning Why Learning Bayesian Logic Programs ? Learning within Bayesian network Inductive Logic Programming Learning within Bayesian Logic Programs Of interest to different communities ? • scoring functions, pruning techniques, theoretical insights, ... Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning E Data Cases Parameter Estimation Structural Learning What is the data about ? m ann, dorothy true, f brian, dorothy true, pc brian b, bt ann a, bt brian ?, bt dorothy a m cecily, fred true, f henry, fred true, bt cecily ab, bt henry b, bt fred ?, m kim, bob true, f fred , bob true, bt kim ?, bt bob b A data case Di D is a partially observed joint state of a finite, nonempty subset x LH B Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases EParameter Estimation Structural Learning Learning Task Given: • set D D1 , , Dn of data cases • a Bayesian logic program B Goal: for each c B the parameters λ(c) c 1 , , c e c of cpd(c ) that best fit the given data Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases EParameter Estimation Structural Learning Parameter Estimation (contd.) • „best fit“ ~ ML-Estimation λ* arg max PB λ D λH where the hypothesis space H is spanned by the product space over the possible values of λ c λ : : cB Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases EParameter Estimation Structural Learning Parameter Estimation (contd.) λ* arg max PB λ D λH arg max ln PB λ D λH Assumption: D1,...,DN are independently sampled from indentical distributions (e.g. totally separated families),), arg max ln PB λ Di λH i arg max ln PB λ Di λH i Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases EParameter Estimation Structural Learning Parameter Estimation (contd.) λ arg max ln PB λ Di * λH marginal of i arg max ln PN var Di Di λH N λ var Di i arg max ln PN λ Di λH i N λ is an ordinary BN marginal of N λ : Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany i N λ var Di Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases EParameter Estimation Structural Learning Parameter Estimation (contd.) • Reduced to a problem within Bayesian networks: given structure, partially observed random varianbles • EM [Dempster, Laird, Rubin, ´77], [Lauritzen, ´91] • Gradient Ascent [Binder, Koller, Russel, Kanazawa, ´97], [Jensen, ´99] Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases EParameter Estimation Structural Learning • Decomposable CRs Parameters of the clauses and not of the support network. A11 A12 A1 ... A1n1 A21 A22 ... A2 Am1 A2n2 ... Am 2 ... Amnm Am Single ground instance Multiple ground instance of a Bayesian clause of the sameA Bayesian clause CPD for Combining Rule Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases EParameter Estimation Structural Learning Gradient Ascent Bayesian ground clause ln PN λ D cpd c jk Bayesian clause ln PN λ D cpd c jk cpd c jk subst. j , k cpd c j k subst. ln PN λ D cpd c jk Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany 1, if j j , k k 0, if j j , k k Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases EParameter Estimation Structural Learning Gradient Ascent ln PN λ D Bayesian Network Inference Engine cpd c jk subst. ln PN λ D cpd c jk n subst. i 1 PN λ head c u j , body c uk Di cpd c jk Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases EParameter Estimation Structural Learning Algorithm Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases E Parameter Estimation Structural Learning 1. Expectation-Maximization Initialize parameters 2. E-Step and M-Step, i.e. compute expected counts for each clause and treat the expected count as counts cpd c jk subst. n i 1 PN λ head c u j , body c uk Di n i 1 PN λ body c uk Di subst. 3. If not converged, iterate to 2 Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases EParameter Estimation Structural Learning • Experimental Evidence [Koller, Pfeffer ´97] support network is a good approximation • [Binder et al. ´97] equality constraints speed up learning m(M,X) f(F,X) pdf (c)(h(X)|h(M),h(F)) true true N(0.5*h(M)+0.5*h(F),s) true false N(165,s) false true N(165,s) false false N(165,s) • 100 data cases • constant step-size • Estimation of means • 13 iterations • Estimation of the weights • sum = 1.0 Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Outline • Bayesian Logic Programs • Examples and Language • Semantics and Support Networks • Learning Bayesian Logic Programs • Data Cases • Parameter Estimation E • Structural Learning Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning • Structural Learning Combination of Inductive Logic Programming and Bayesian network learning • Datalog fragment of Bayesian logic programs (no functors) • intensional Bayesian clauses Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning Idea - CLAUDIEN learning from interpretations • all data cases are Herbrand interpretations • a hypothesis should reflect what is in the data probabilistic extension all data cases are partially observed joint states of Herbrand intepretations all hypotheses have to be (logically) true in all data cases Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning What is the data about ? m ann, dorothy true, f brian, dorothy true, pc brian b, bt ann a, bt brian ?, bt dorothy a ... m cecily, fred true, f henry, fred true, bt cecily ab, bt henry b, bt fred ?, m kim, bob true, f fred , bob true, bt kim ?, bt bob b Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning : Claudien Learning From Interpretations • D • C : set of all clauses that can be part of set of data cases hypotheses H C (logically) valid iff Di D : H is logically true in Di H C logical solution iff H is a logically maximally general valid hypothesis H C probabilistic solution iff H is (logically) valid and the Bayesian network induced by B on D is acyclic Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning Learning Task Given: • set D D1 , , Dn of data cases • a set H of Bayesian logic programs • a scoring function scoreD : H R Goal: probabilistic solution H * H • matches the data best according to score D Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning Algorithm Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning Example Original Bayesian logic program mc(X) | m(M,X), mc(M), pc(M). pc(X) | f(F,X), mc(F), pc(F). bt(X) | mc(X), pc(X). mc(ann) mc(eric) pc(ann) m(ann,john) pc(eric) mc(john) Data cases bc(john) {m(ann,john)=true, pc(ann)=a, mc(ann)=?, f(eric,john)=true, pc(eric)=b, mc(eric)=a, mc(john)=ab, pc(john)=a, bt(john) = ? } ... Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany pc(john) f(eric,john) Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning Example Original Bayesian logic program mc(X) | m(M,X), mc(M), pc(M). pc(X) | f(F,X), mc(F), pc(F). bt(X) | mc(X), pc(X). mc(ann) Initial hypothesis mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X). mc(eric) pc(ann) m(ann,john) pc(eric) mc(john) bc(john) Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany pc(john) f(eric,john) Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning Example Original Bayesian logic program mc(X) | m(M,X), mc(M), pc(M). pc(X) | f(F,X), mc(F), pc(F). bt(X) | mc(X), pc(X). mc(ann) Initial hypothesis mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X). mc(eric) pc(ann) m(ann,john) pc(eric) mc(john) bc(john) Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany pc(john) f(eric,john) Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning Example Original Bayesian logic program mc(X) | m(M,X), mc(M), pc(M). pc(X) | f(F,X), mc(F), pc(F). bt(X) | mc(X), pc(X). mc(ann) pc(ann) Initial hypothesis mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X). mc(eric) m(ann,john) pc(eric) mc(john) bc(john) Refinement mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X), pc(X). Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany pc(john) f(eric,john) Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning Example Original Bayesian logic program mc(X) | m(M,X), mc(M), pc(M). pc(X) | f(F,X), mc(F), pc(F). bt(X) | mc(X), pc(X). mc(ann) pc(ann) Initial hypothesis mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X). mc(eric) pc(eric) mc(john) m(ann,john) pc(john) bc(john) Refinement mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X), pc(X). Refinement mc(X) | m(M,X),mc(X). pc(X) | f(F,X). bt(X) | mc(X), pc(X). Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany f(eric,john) Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning Example Original Bayesian logic program mc(X) | m(M,X), mc(M), pc(M). pc(X) | f(F,X), mc(F), pc(F). bt(X) | mc(X), pc(X). mc(ann) pc(ann) Initial hypothesis mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X). mc(eric) m(ann,john) pc(eric) mc(john) pc(john) bc(john) Refinement mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X), pc(X). Refinement mc(X) | m(M,X),pc(X). pc(X) | f(F,X). bt(X) | mc(X), pc(X). Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany f(eric,john) Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning Example Original Bayesian logic program mc(X) | m(M,X), mc(M), pc(M). pc(X) | f(F,X), mc(F), pc(F). bt(X) | mc(X), pc(X). mc(ann) pc(ann) Initial hypothesis mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X). mc(eric) m(ann,john) pc(eric) mc(john) pc(john) f(eric,john) bc(john) Refinement mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X), pc(X). Refinement mc(X) | m(M,X),pc(X). pc(X) | f(F,X). bt(X) | mc(X), pc(X). Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany ... Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning • • • • • Properties All relevant random variables are known First order equivalent of Bayesian network setting Hypothesis postulates true regularities in the data Logical solutions as inital hypotheses Highlights Background Knowledge Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning Example Experiments mc(X) | m(M,X), mc(M), pc(M). pc(X) | f(F,X), mc(F), pc(F). bt(X) | mc(X), pc(X). Data: sampling from 2 families, each 1000 samples Score: LogLikelihood Goal: learn the definition of bt CLAUDIEN Bloodtype mc(X) | m(M,X). pc(X) | f(F,X). mc(X) | m(M,X), mc(M), pc(M). pc(X) | f(F,X), mc(F), pc(F). bt(X) | mc(X), pc(X). highest score Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation Structural Learning Conclusion • EM-based and Gradient-based method to do ML parameter estimation • Link between ILP and learning Bayesian networks CLAUDIEN setting used to define and to traverse the search space Bayesian network scores used to evaluate hypotheses • • Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany Bayesian Logic Programs Examples and Language Semantics and Support Networks Learning Data Cases Parameter Estimation E Structural Learning Thanks ! Summer School on Relational Data Mining, 17 and 18 August, Helsinki, Finland K. Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany