Probabilistic Models of Relational Domains
Daphne Koller, Stanford University
© Daphne Koller, 2003

Overview
- Relational logic
- Probabilistic relational models
- Inference in PRMs
- Learning PRMs
- Uncertainty about domain structure
- Summary

Attribute-Based Models & Relational Models

Bayesian Networks: Problem
- Bayesian nets use a propositional representation
- The real world has objects, related to each other
- These "instances" are not independent
[Figure: ground variables Intell_Jane, Intell_George, Diffic_CS101, Diffic_Geo101; Grade_Jane_CS101 = A, Grade_George_CS101 = C, Grade_George_Geo101]

The University BN
[Figure: Difficulty_CS101, Difficulty_Geo101, Intelligence_Jane, Intelligence_George as parents of Grade_Jane_CS101, Grade_George_CS101, Grade_George_Geo101]

The Genetics BN
- G = genotype, B = bloodtype
[Figure: pedigree BN over Harry, Betty, Homer, Marge, Selma, Bart, Lisa, Maggie; each genotype G_x depends on the parents' genotypes, and each bloodtype B_x on G_x]

Simple Approach
- Graphical model with shared parameters ... and shared local dependency structure
- Want to encode this constraint:
  - for the human knowledge engineer
  - for the network learning algorithm
- How do we specify:
  - which nodes share params?
  - shared (but different) structure across nodes?

Simple Approach II
- We can write a special-purpose program for each domain:
  - genetic inheritance (family tree imposes constraints)
  - university (course registrations impose constraints)
- Is there something more general?
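The shared-parameter idea above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the attribute values and CPD numbers are made up. The point is that one template CPD is specified once and instantiated for every ground Grade node.

```python
# Shared CPD: P(Grade | Intelligence, Difficulty).
# One table, reused by every (student, course) instance in the ground BN.
# (Numbers are hypothetical, for illustration only.)
GRADE_CPD = {
    ("smart", "easy"): {"A": 0.8, "B": 0.15, "C": 0.05},
    ("smart", "hard"): {"A": 0.5, "B": 0.35, "C": 0.15},
    ("weak",  "easy"): {"A": 0.3, "B": 0.4,  "C": 0.3},
    ("weak",  "hard"): {"A": 0.1, "B": 0.3,  "C": 0.6},
}

def grade_dist(intelligence, difficulty):
    """Look up the shared local model for one ground Grade variable."""
    return GRADE_CPD[(intelligence, difficulty)]

# Instantiate the template for several registrations:
# the ground nodes differ, but the parameters do not.
registrations = [
    ("Jane",   "smart", "CS101",  "hard"),
    ("George", "weak",  "CS101",  "hard"),
    ("George", "weak",  "Geo101", "easy"),
]
for student, intel, course, diff in registrations:
    print(student, course, grade_dist(intel, diff))
```

Tying the parameters this way is exactly the constraint the slides ask for: a learner estimating `GRADE_CPD` pools the data from every registration, instead of fitting one table per ground node.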
Attribute-Based Worlds
- Smart_Jane & easy_CS101 → GetA_Jane_CS101
- Smart_Mike & easy_Geo101 → GetA_Mike_Geo101
- Smart_Jane & easy_Geo101 → GetA_Jane_Geo101
- Smart_Rick & easy_CS221 → GetA_Rick_CS221
- World = assignment of values to attributes / truth values to propositional symbols

Object-Relational Worlds
- ∀x,y (Smart(x) & Easy(y) & Take(x,y) → Grade(A,x,y))
- World = relational interpretation:
  - Objects in the domain
  - Properties of these objects
  - Relations (links) between objects

Relational Logic
- General framework for representing:
  - objects & their properties
  - classes of objects with the same model
  - relations between objects
- Represent a model at the template level, and apply it to an infinite set of domains
- Given a finite domain, each instantiation of the model is propositional, but the template is not

Relational Schema
- Specifies the types of objects in the domain, the attributes of each type of object, & the types of relations between objects
[Figure: classes Professor (Teaching-Ability), Student (Intelligence), Course (Difficulty), Registration (Grade, Satisfaction); relations Teach, Has, In]

St. Nordaf University
[Figure: an example world — Prof. Smith and Prof. Jones with Teaching-abilities High/Low; courses Geo101 (Difficulty Easy) and CS101 (Difficulty Hard); students George (Intelligence Weak) and Jane (Intelligence Smart); registrations with Grades A/B/C and Satisfactions Hate/Like]

Relational Logic: Summary
- Vocabulary:
  - Classes of objects: Person, Course, Registration, ...
  - Individual objects in a class: George, Jane, ...
  - Attributes of these objects: George.Intelligence, Reg1.Grade
  - Relationships between these objects: Of(Reg1,George), Teaches(CS101,Smith)
- A world specifies:
  - A set of objects, each in a class
  - The values of the attributes of all objects
  - The relations that hold between the objects

Binary Relations
- Any relation can be converted into an object:
  - R(x1,x2,...,xk) → new "relation" object y, with R1(x1,y), R2(x2,y), ..., Rk(xk,y)
  - E.g., registrations are "relation objects"
- Can restrict attention to binary relations R(x,y)

Relations & Links
- Binary relations can also be viewed as links, specifying the set of objects related to x via R:
  - R(x,y) → y ∈ x.R1, x ∈ y.R2
- E.g., Teaches(p,c):
  - p.Courses = {courses c : Teaches(p,c)}
  - c.Instructor = {professors p : Teaches(p,c)}

Probabilistic Relational Models: Relational Bayesian Networks

Probabilistic Models
- Uncertainty model: a space of "possible worlds" and a probability distribution over this space
- In attribute-based models, a world is an assignment of values to a fixed set of random variables
- In relational models, a world specifies:
  - a set of domain elements
  - their properties
  - the relations between them

PRM Scope
- The entire set of relational worlds is infinite and too broad
- Assume a circumscribed class of worlds Ω[x], consistent with some type of background knowledge x
- A PRM is a template defining P(Ω[x]) for any such x
- Simplest class — attribute-based PRMs:
  - x = relational skeleton: a finite set of entities E and the relations between them
  - Ω[x] = all assignments of values to all attributes of the entities in E
  - The PRM template defines a distribution over Ω[x] for any such x
[K. & Pfeffer '98; Friedman, Getoor, K., Pfeffer '99]

Relational Bayesian Network
- Universals: probabilistic patterns hold for all objects in a class
- Locality: represent direct probabilistic dependencies; links define the potential interactions
[Figure: Professor.Teaching-Ability, Student.Intelligence, Course.Difficulty, Reg.Grade, Reg.Satisfaction; CPD bar chart P(Grade ∈ {A,B,C} | Difficulty, Intelligence) for easy/hard × low/high]
[K. & Pfeffer; Poole; Ngo & Haddawy]

RBN: Semantics
- x: a set of objects & the relations between them
- Ω[x]: the set of all assignments of values to all attributes of all objects in x
- P(Ω[x]) is defined by a ground Bayesian network:
  - variables: the attributes of all objects
  - dependencies: determined by the relational links in x and the dependency structure of the RBN model

RBN Structure
- For each class X and attribute A, the structure specifies the parents of X.A:
  - X.B → X.A (e.g., Course.Level → Course.Difficulty): for any object x in class X, x.B is a parent of x.A
  - X.ρ.B → X.A, where ρ is a link or chain of links (e.g., Reg.In.Difficulty → Reg.Grade): for any object x in class X, x.ρ.B is a parent of x.A

RBN Semantics
[Figure: ground BN for St. Nordaf — Prof. Jones and Prof. Smith Teaching-ability, George and Jane Intelligence, Geo101 and CS101 Difficulty, and each registration's Grade and Satisfaction]

The Web of Influence
[Figure: observing a C in CS101 shifts the posteriors over the course's difficulty (easy/hard) and the student's intelligence (low/high), which propagate in turn to other courses and students]

Aggregate Dependencies
- Problem!!! An attribute such as Jane Doe's Honors depends on the grades of a set of registrations (Reg #5077, #5054, #5639) — need dependencies on sets
- Solution: aggregate the set, e.g. the avg of the grades
[Table: P(Honors | avg grade) — avg = A: y 0.8, n 0.2; avg = B: y 0.4, n 0.6; avg = C: y 0.1, n 0.9]
- Useful aggregates: sum, min, max, avg, mode, count
[Figure: Professor.Stress-Level ← count; Course.Rating ← avg of Reg.Satisfaction; Student.Honors ← avg of Reg.Grade]

Basic RBN: Summary
- An RBN specifies:
  - a probabilistic dependency structure S: a set of parents X.ρ.B for each class attribute X.A
  - a set of local probability models: the aggregator to use for each multi-valued dependency, and the CPD parameters θ_X.A
- Given a relational skeleton x, the RBN induces a probability distribution over worlds ω, defined via the ground BN over the attributes x.A:
  P(ω | x, S, θ) = ∏_{classes X} ∏_{objects x ∈ X} ∏_{attributes A} P(x.A | parents_{S,x}(x.A), θ_{X.A})

Extension: Class Hierarchy
- Subclasses inherit all attributes of their parents, but may have additional ones
- For an inherited attribute X.A, a subclass can:
  - inherit the parent's probabilistic model
  - overwrite it with a local probabilistic model
- Example: Professor has subclasses assistant, associate, full
  - inherit the distribution over Stress-Level
  - modify the distribution over Salary
[K. & Pfeffer '98]

Extension: Class Hierarchies
- Hierarchies allow reuse, in knowledge engineering and in learning: parameters and dependency models are shared across more objects
- If the class assignments are specified in x, the class hierarchy does not introduce complications
[K. & Pfeffer '98]

Coherence & Acyclicity
[Figure: Professor.Teaching-Ability, Professor.Stress-Level, Reg.Grade, Reg.Satisfaction; the template dependencies let Smith.Stress-level depend probabilistically on itself]
- For a given skeleton x, the PRM asserts dependencies between attributes of objects: y.B → x.A
- The PRM defines a coherent probability model over Ω[x] if the ground dependency graph G_x is acyclic
[Friedman, Getoor, K., Pfeffer '99]

Guaranteeing Acyclicity
- How do we guarantee that a PRM is acyclic for every object skeleton x?
- PRM dependency structure S → template-level dependency graph: edge Y.B → X.A if X.ρ.B ∈ Parents(X.A) and the class of X.ρ is Y
- Attribute stratification: class dependency graph acyclic ⇒ G_x acyclic for all x

Limitation of Stratification
[Figure: Person.M-chromosome, Person.P-chromosome, Person.Blood-type with Father and Mother links; at the class level, the Person-to-Person edges form cycles through Person.M-chrom, Person.P-chrom, Person.B-type]
- Prior knowledge: the Father-of relation is acyclic
- Dependence of Person.A on Person.Father.B cannot induce cycles

Guaranteeing Acyclicity with Guaranteed Acyclic Relations
- With guaranteed acyclic (g.a.) relations, some cycles in the dependency graph are guaranteed to be safe
- Color the edges in the dependency graph:
  - yellow: within a single object (X.B → X.A)
  - green: via a g.a. relation
  - red: via other relations
- A cycle is safe if it has a green edge and no red edge
[Figure: in the genetics example, Person.M-chrom → Person.P-chrom → Person.B-type with green Father/Mother edges — a safe cycle]

Object-Oriented Bayesian Nets*
- OOBNs are RBNs with only one type of relation: one object can be "part-of" another
- Objects can only interact with their component parts; other types of relationships must be embedded into the "part-of" framework
- Defines a "neat" hierarchy of related objects
- Provides a clearly defined interface between an object x and its enclosing object y
(* simplified view) [K. & Pfeffer '97]

OOBN Example
- Drivers, cars, and the collision must all be viewed as parts of a situation object
[Figure: Driver1 and Driver2 (Owner), Car1 and Car2 (Value, Speed, Damage), Collision (Severity)]

Part-of Hierarchy
- Objects in the model are encapsulated within each other, forming a tree
[Figure: Accident situation contains Severity, Driver1 (Income), Car1 (Price, Speed, Engine (Power)), Car2, Driver2]

Lines of Communication
[Figure: the same tree, with probabilistic influence flowing along the part-of hierarchy, e.g. Driver.Income ↔ Car.Price and Engine.Power ↔ Car.Speed]

Probabilistic Relational Models: Relational Markov Networks

Why Undirected Models?
- Symmetric, non-causal interactions
  - Patterns involving multiple entities; e.g., on the web, the categories of linked pages are correlated
- Cannot introduce direct edges because of cycles (e.g., web "triangle" patterns); directed edges are not appropriate
- "Solution": impose an arbitrary direction
  - Not clear how to parameterize the CPD for variables involved in multiple interactions
  - Very difficult within a class-based parameterization
[Taskar, Abbeel, K. 2001]

Markov Networks: Review
[Figure: ring network Alice–Bob–Chris–Dave–Eve; potential φ(A,B,C) as a table over LLL ... HHH]
- P(A,B,C,D,E) = (1/Z) φ1(A,B,C) φ2(C,D) φ3(D,E) φ4(E,A)

Markov Networks: Review II
- A Markov network is an undirected graph over some set of variables V
- The graph is associated with a set of potentials φ_i
- Each potential is a factor over a subset V_i; the variables in V_i must form a (sub)clique in the network
- P(V) = (1/Z) ∏_i φ_i(V_i)

Relational Markov Networks
- Probabilistic patterns hold for groups of objects
- Groups are defined as sets of (typed) elements linked in particular ways
[Figure: Student.Intelligence, Reg.Grade, Course.Difficulty; two registrations in a study group share a template potential over the (Grade, Grade) pairs CC ... AA]

RMN Language
- Define clique templates, e.g. all tuples {reg R1, reg R2, group G} s.t. In(G, R1), In(G, R2)
- Compatibility potential φ(R1.Grade, R2.Grade)
- The ground Markov network contains a potential φ(r1.Grade, r2.Grade) for all appropriate r1, r2

Ground MN (or Chain Graph)
[Figure: George, Jane, Jill (Intelligence); Geo101, CS101 (Difficulty); Grades linked through the Geo and CS study groups]

PRM Inference

Inference: Simple Method
- Define the ground network as in the semantics; apply standard inference methods
- Problem:
  - Very large models can be specified very easily
  - The resulting ground network is often highly connected
  - Exact inference is typically intractable
  - In practice, often must resort to approximate methods such as belief propagation

Exploiting Structure: Encapsulation
- Objects interact only in limited ways
- Can define an object interface:
  - Outputs: object attributes influencing other objects
  - Inputs: external attributes influencing the object
- The object is independent of everything else given its interface
- Inference can be encapsulated within objects, with "communication" limited to interfaces
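A toy sketch of the encapsulation and reuse ideas (the attributes and all numbers here are hypothetical, not from the slides): an object's internal variables are summed out once, leaving a factor over its interface alone; for generic, evidence-free objects that same marginal can then be reused, and k identical objects contribute a closed-form power.

```python
# One Student-like object with an internal attribute (Intelligence)
# and a one-variable output interface (GPA). Hypothetical numbers.
p_intel = {"hi": 0.4, "lo": 0.6}                 # P(Intelligence)
p_gpa_given_intel = {                            # P(GPA | Intelligence)
    "hi": {"good": 0.8, "poor": 0.2},
    "lo": {"good": 0.3, "poor": 0.7},
}

def interface_marginal():
    """Sum out the internal attribute, leaving a factor over GPA only."""
    return {
        gpa: sum(p_intel[i] * p_gpa_given_intel[i][gpa] for i in p_intel)
        for gpa in ("good", "poor")
    }

# Encapsulation: the enclosing model only ever sees this factor.
marg = interface_marginal()

# Reuse with combinatorics: generic objects share the marginal, so the
# joint contribution of k identical evidence-free students is a power,
# not k separate inference runs.
k = 10
p_all_good = marg["good"] ** k
print(marg, p_all_good)
```

This mirrors the claimed speedups: the per-object marginalization happens once per class rather than once per object, and the power replaces a product over k ground objects.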
[Figure: Student object — Upbringing, Intelligence, Personality internal; GPA, Honors, Job-prospects on the interface]
[K. & Pfeffer, 1997]

Exploiting Structure: Encapsulation II
- Marginalize each object's distribution onto its interface
- Perform inference over the interfaces
- The dependency graph over interfaces is induced by the inter-object dependencies, and hence by the relational structure (e.g., the part-of hierarchy in some OOBNs)
- If the interaction graph has low tree-width, can use exact inference
- If the relational structure is more complex, can use BP: a form of Kikuchi BP, where cluster selection is guided by the relational structure

Exploiting Structure: Reuse
- Objects from the same class have the same model
- For generic objects — no internal evidence — the marginalized interface is the same
- Can reuse inference: a form of "lifting"
[Figure: George and Jane instantiate the same Student model — Upbringing, Intelligence, GPA, Honors, Personality, Job-prospects]
[Pfeffer & K. 1998]

Exploiting Structure: Reuse II
- Generic objects often play the same role in the model, e.g. multiple students that all take the same class
- Reuse: compute the interface once
- Combinatorics: compute the total contribution to the probability in closed form:
  P(k students like the class | teaching-ability = low) = P(generic student likes the class | teaching-ability = low)^k
[Pfeffer & K. 1998]

Case Study
- Battlefield situation assessment for missile units: several locations, many units, each with a detailed model
- Example object classes: Battalion, Battery, Vehicle, Location, Weather
- Example relations: At-Location, Has-Weather, Sub-battery/In-battalion, Sub-vehicle/In-battery

Effect of Exploiting Structure
- Query: P(Next-Mission)
[Figure: running time in seconds vs. #vehicles of each type in each battery (1–10), comparing flat BN, no reuse, with reuse, and with combinatorics; the flat BN has 2400 nodes and costs 20 minutes, vs. 9 seconds with structure exploited]
[Pfeffer & K. 1998]

PRM Learning

Outline
- Relational Bayesian networks: likelihood function; ML parameter estimation; EM; structure learning
- Relational Markov networks: parameter estimation
- Applications: collective classification — web data; relational clustering — biological data

PRM Learning: Input
- Input: a full world ω
[Figure: the St. Nordaf world with all attribute values observed — teaching abilities, difficulties, intelligences, grades, satisfactions]

Likelihood Function
- L(S, θ : ω) = P(ω | x, S, θ) = ∏_{classes X} ∏_{objects x ∈ X} ∏_{attributes A} P(x.A | parents_{S,x}(x.A), θ_{X.A})
- The likelihood of a BN with shared parameters
- The joint likelihood is a product of likelihood terms, one for each attribute X.A and its family
- For each X.A, the likelihood function aggregates counts from all occurrences x.A in the world
[Friedman, Getoor, K., Pfeffer, 1999]

Likelihood Function: Multinomials
- Log-likelihood:
  log P(ω | x, S, θ) = Σ_{X.A} Σ_{u ∈ Val(Pa(X.A))} Σ_{a ∈ Val(X.A)} M[a, u] · log θ_{a|u}
- Sufficient statistics:
  M[a, u] = |{x ∈ X : x.A = a, parents_{S,x}(x.A) = u}|

RBN Parameter Estimation
- MLE parameters, e.g.:
  P̂(Reg.Grade = A | Student.Intell = hi, Course.Diff = lo)
  = M[Reg.Grade = A, Student.Intell = hi, Course.Diff = lo] / M[Reg.Grade = *, Student.Intell = hi, Course.Diff = lo]
- Bayesian estimation: a prior for each attribute X.A; the posterior uses the aggregated sufficient statistics

Learning w. Missing Data
- The EM algorithm applies essentially unchanged:
  - E-step computes expected sufficient statistics, aggregated over all objects in a class
  - M-step uses ML (or MAP) parameter estimation
- Key difference: in general, the hidden variables are not independent
- Computing the expected sufficient statistics requires inference over the entire network (same reasoning as for the forward-backward algorithm in temporal models)

Learning w. Missing Data: EM [Dempster et al. 77]
[Figure: students and courses with hidden attributes; the learned CPD P(Registration.Grade | Course.Difficulty, Student.Intelligence) over easy/hard × low/high]

Learning RBN Structure
- Define the set of legal RBN structures: ones with legal class dependency graphs
- Define a scoring function — Bayesian score = marginal likelihood × prior:
  Score(S : ω) = log [ P(ω | S, x) · P(S) ], where P(ω | S, x) = ∫ L(S, θ_S : ω) P(θ_S) dθ_S
- The score is a product of family scores, one for each X.A, using aggregated sufficient statistics
- Search for a high-scoring legal structure
[Friedman, Getoor, K., Pfeffer, 1999]

Learning RBN Structure II
- All operations are done at the class level:
  - dependency structure = parents for X.A
  - acyclicity checked using the class dependency graph
  - score computed at the class level
- Individual objects only contribute to the sufficient statistics, which can be obtained efficiently using standard DB queries

Exploiting Locality: Phased Search
- Phase 0: consider only dependencies within a class
- Phase 1: consider dependencies one link away
- Phase k: consider dependencies k links away
[Figure: Course/Reg/Student templates with progressively longer slot chains; an example learned structure over Professor (Popularity, Teaching-Ability), Course (Difficulty, Rating), Student (Intelligence, Honors), and Reg (Grade, Satisfaction)]

TB Patients in San Francisco
- Input: a relational database of 1000 strains, 2300 patients, 20000 contacts
- Output: a PRM for the domain
[Figure: learned PRM over Strain (Suscept, Outbreak), Patient (Gender, POB, Race, HIV, Age, TB-type, XRay, Smear, CareLoc, Infected, Treatment, Subcase), and Contact (CloseCont, CareLoc, Relation)]
[Getoor, Rhee, K., Small, 2001]

Outline (recap)
- Next: Relational Markov networks — parameter estimation

Learning RMN Parameters
[Figure: study-group template potential φ(Reg.Grade, Reg2.Grade) over the pairs CC ... AA]
- Parameterize the potentials as a log-linear model:
  φ = exp{ w_AA · f_AA + ... + w_CC · f_CC }
[Taskar, Wong, Abbeel, K., 2002]

Learning RMN Parameters II
- Log-likelihood:
  ℓ(w : ω) = log P_w(ω) = Σ_i w_i · f_i(ω) − log Z
  where the f_i(ω) are feature counts in ω and Z is the partition function
- For example: f_AA(ω) = # of tuples {reg r1, reg r2, group g} s.t. In(g, r1), In(g, r2), r1.Grade = A, r2.Grade = A

Learning RMN Parameters III
- Parameter estimation is not closed form
- Convex problem ⇒ unique global maximum; can use methods such as conjugate gradient
- Gradient:
  ∂ℓ/∂w_AA = #(Grade = A, Grade = A) [actual count] − E[#(Grade = A, Grade = A)] [expected count]
- The gradient process tries to find parameters such that the expected counts equal the actual counts
- Computing the expected counts requires inference over the ground Markov network

Learning RMNs
[Figure: ground network over Grades, Intelligence, Difficulty with the template potential φ(Reg1.Grade, Reg2.Grade); L = log P(Grades, Intelligence, Difficulty); ∂L/∂w_AA = #(Grade = A, Grade = A) − Σ P(Grade = A, Grade = A | Diffic, ...)]

Outline (recap)
- Next: applications — collective classification on web data

Collective Classification
- Training data (a full world) → learn the PRM model structure → apply to new data by inference → conclusions
- Example: train on one year of student intelligence, course difficulty, and grades; given only the grades in the following year, predict all students' intelligence

Discriminative Training
- Goal: given the values of the observed variables ω.O = o, predict the desired target values ω.T = t*
- Do not necessarily want the model to fit the joint distribution P(ω.O = o, ω.T = t*)
- To maximize classification accuracy, we can consider other optimization criteria:
  - Maximize the conditional log likelihood log P(ω.T = t* | ω.O = o)
  - Maximize the margin log P(ω.T = t* | ω.O = o) − log max_{t ≠ t*} P(ω.T = t | ω.O = o), i.e. the gap to the second-highest-probability label
[Taskar, Abbeel, K., 2002; Guestrin, Taskar, K. 2003]

Web KB
[Figure: Tom Mitchell (Professor), WebKB Project, Sean Slattery (Student); relations Project-of, Member, Advisor-of]
[Craven et al.]
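The RMN gradient condition from the learning slides — at the optimum, expected feature counts match actual counts — can be checked by brute force on a toy model with a single f_AA feature. The data, learning rate, and iteration count below are hypothetical, chosen only so the fixed point is visible.

```python
import itertools
import math

# Toy log-linear model over one pair of grades: P(g1, g2) ∝ exp(w * f_AA),
# with the single feature f_AA = 1 iff both grades are A.
GRADES = ["A", "B"]

def f_AA(g1, g2):
    return 1.0 if g1 == "A" and g2 == "A" else 0.0

def expected_count(w):
    """E[f_AA] under the model, by enumerating all joint assignments."""
    scores = {(g1, g2): math.exp(w * f_AA(g1, g2))
              for g1, g2 in itertools.product(GRADES, repeat=2)}
    Z = sum(scores.values())                      # partition function
    return sum(s / Z * f_AA(g1, g2) for (g1, g2), s in scores.items())

def gradient(w, data):
    """dl/dw = actual count - expected count (per training pair)."""
    actual = sum(f_AA(g1, g2) for g1, g2 in data) / len(data)
    return actual - expected_count(w)

# Gradient ascent on a tiny hypothetical sample; the empirical f_AA
# frequency is 2/4 = 0.5, so the fit should drive E[f_AA] toward 0.5.
data = [("A", "A"), ("A", "B"), ("A", "A"), ("B", "B")]
w = 0.0
for _ in range(200):
    w += 1.0 * gradient(w, data)
print(round(expected_count(w), 3))  # ≈ 0.5, matching the empirical count
```

In a real RMN the expectation cannot be enumerated like this; it requires inference over the ground Markov network, which is exactly why the slides turn to belief propagation. The convexity claim shows up here too: the one-dimensional objective has a single fixed point, w = ln 3 for this data.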
Web Classification Experiments
- WebKB dataset:
  - Four CS department websites
  - Bag of words on each page
  - Links between pages
  - Anchor text for links
- Experimental setup: train on three universities, test on the fourth; repeated for all four combinations

Standard Classification
- Page.Category ← Word1 ... WordN (bag of words, e.g. "professor", "department", "extract", "information", "computer", "science", "machine", "learning")
- Categories: faculty, course, project, student, other
- Discriminatively trained model = logistic regression

Standard Classification II
- Add link words from incoming anchor text, e.g. "working with Tom Mitchell"
[Figure: test error bars for the logistic model]

Power of Context
- Professor? Student? Post-doc?
[Figure: a page that is ambiguous in isolation but identifiable from its neighbors]

Collective Classification
- From-Page.Category and To-Page.Category, each with its words; Link with link words and a compatibility potential φ(From, To) over the category pairs FF, FS, ..., CC
- Classify all pages collectively, maximizing the joint label probability
[Figure: test error bars for Logistic vs. Links]

More Complex Structure
[Figure: pages also have internal section structure; section categories S mediate between the words W1 ... Wn and the page category C for faculty, student, and course pages]

Collective Classification: Results
- 35.4% relative reduction in error relative to the strong flat approach
[Figure: test error bars for Logistic, Links, Section, and Link+Section]
[Taskar, Abbeel, K., 2002]

Relational Clustering
- Unlabeled relational data → learn model structure → clustering of instances
- Example: given only students' grades, cluster similar students

Movie Data
- Internet Movie Database, http://www.imdb.com

Discovering Hidden Types
- Hidden Type attribute for Actor, Director, and Movie
- Observed movie attributes: Genres, Year, MPAA Rating, Rating, #Votes
[Taskar, Segal, K., 2001]

Discovering Hidden Types II
- Example discovered types:
  - Movies: Wizard of Oz, Cinderella, Sound of Music, The Love Bug, Pollyanna, The Parent Trap, Mary Poppins, Swiss Family Robinson vs. Terminator 2, Batman, Batman Forever, GoldenEye, Starship Troopers, Mission: Impossible, Hunt for Red October
  - Actors: Sylvester Stallone, Bruce Willis, Harrison Ford, Steven Seagal, Kurt Russell, Kevin Costner, Jean-Claude Van Damme, Arnold Schwarzenegger vs. Anthony Hopkins, Robert De Niro, Tommy Lee Jones, Harvey Keitel, Morgan Freeman, Gary Oldman
  - Directors: Alfred Hitchcock, Stanley Kubrick, David Lean, Milos Forman, Terry Gilliam, Francis Coppola vs. Steven Spielberg, Tim Burton, Tony Scott, James Cameron, John McTiernan, Joel Schumacher

Biology 101: Pathways
- Pathways are sets of genes that act together to achieve a common function

Biology 101: Expression
- Cells express different subsets of their genes in different tissues and under different conditions
[Figure: DNA (control and coding regions) → RNA → protein; the Swi5 transcription factor binding a control region]

Finding Pathways: Attempt I
- Use protein-protein interaction data
- Problems:
  - The data is very noisy
  - Structure is lost: a large connected component (3527/3589 genes) in the interaction graph

Finding Pathways: Attempt II
- Use gene expression data: thousands of arrays available under different conditions; cluster the genes
- Problems:
  - Expression is only a 'weak' indicator of interaction
  - The data is noisy
  - Interacting pathways are not separable

Finding Pathways: Our Approach
- Use both types of data to find pathways:
  - Find "active" interactions using gene expression
  - Find pathway-related co-expression using interactions

Probabilistic Model
- Genes are partitioned into "pathways":
  - Every gene is assigned to one of k pathways
  - A random variable g.C for each gene, with domain {1, ..., k}
- Expression component: the model likelihood is higher when genes in the same pathway have similar expression profiles
- Interaction component: the model likelihood is higher when genes in the same pathway interact
[Segal, Wang, K., 2003]

Expression Component
- Naïve Bayes: the pathway g.C of gene g is the parent of the expression levels g.E1, ..., g.Em of gene g in the m arrays

Protein Interaction Component
- Interacting genes are more likely to be in the same pathway
- A compatibility potential φ(g1.C, g2.C) for each protein-product interaction

Joint Probabilistic Model
[Figure: genes 1–4 with pathway variables g_i.C, expression variables g_i.E_j, and compatibility potentials between the pathway variables of interacting genes]

Learning Task
- Large Markov network with high connectivity; use loopy belief propagation
- E-step: compute pathway assignments
- M-step: estimate the Gaussian distribution parameters; estimate the compatibility potentials using cross-validation

Capturing Protein Complexes
- Evaluated against an independent data set of interacting proteins
[Figure: number of complexes vs. complex coverage (%), comparing our method and standard expression clustering; 124 complexes covered at 50% for our method vs. 46 for clustering]

RNAse Complex Pathway
- Found pathway: YHR081W, RRP40, RRP42, MTR3, RRP45, RRP4, RRP43, DIS3, TRM7, SKI6, RRP46, CSL4
- Includes all 10 known pathway genes; only 5 genes found by clustering

Interaction Clustering
- The RNAse complex is found by interaction clustering only as part of a cluster with 138 genes

Uncertainty about Domain Structure — or: PRMs are not just template BNs/MNs

Structural Uncertainty
- Class uncertainty: to which class does an object belong?
- Relational uncertainty: what is the relational (link) structure?
- Identity uncertainty: which "names" refer to the same objects? (also covers data association)

Relational Uncertainty
- Probability distribution over graph structures
- Link existence model (e.g., hyperlinks between webpages):
  - Each potential link is a separate event; its existence is a random variable
- Link reference model (e.g., instructors teaching a course):
  - Distribution over the # of endpoints for each outgoing link
  - Each link has a distribution over its endpoint; e.g., the instructor link for a CS course is likely to point to a CS prof
- Many other models possible
[Getoor, Friedman, K., Taskar, 2002]

Link Existence Model
- Background knowledge x is an object skeleton: a set of entity objects
- The PRM defines a distribution over worlds:
  - assignments of values to all attributes
  - existence of links between objects
[Getoor, Friedman, K., Taskar, 2002]

Link Existence as a PRM
- Define objects for all potential links: a potential link object for any pair of webpages w1, w2
- Each potential link object has a link-existence attribute, denoting whether the link exists or not
- The link existence variables have a probabilistic model, e.g. P(Link.Exists | From.Type, To.Type):

  From.Type   To.Type     False    True
  Student     Student     0.9999   0.0001
  Student     Professor   0.9995   0.0005
  Professor   Student     0.9998   0.0002
  Professor   Professor   0.9999   0.0001

[Getoor, Friedman, K., Taskar, 2002]

Why Are Link Models Useful?
- Predict which links exist in the world:
  - Which professor teaches each course
  - Which student will register for which course
- Use known links to infer values of attributes:
  - Given that a student registered for a hard class, is she more likely to be a graduate student?
  - Given that one page points to another, is it more likely to be a faculty page?

Predicting Relationships
- Predict and classify relationships between objects
[Figure: Tom Mitchell (Professor), WebKB Project, Sean Slattery (Student); Member, Advisor-of]

Flat Model
- From-Page and To-Page words, link words, and the relationship variable Rel ∈ {NONE, advisor, instructor, TA, member, project-of}

Collective Classification: Links
- Add the page categories to the model, classified jointly with the relationships
[Taskar, Wong, Abbeel, K., 2002]

Link Model
[Figure: chain of linked pages with relationship variables]

Triad Model
[Figure: Professor–Student–Course triangle with Advisor, TA, and Instructor relationship variables]

Link Prediction: Results
- 72.9% relative reduction in error relative to the strong flat approach
- Error measured over links predicted to be present; link presence cutoff at the precision/recall break-even point (30% for all models)
[Figure: error bars for Flat, Links, Triad]
[Taskar, Wong, Abbeel, K., 2002]

Identity Uncertainty Model
- Background knowledge x is an object universe: a set of potential objects
- The PRM defines a distribution over worlds:
  - assignments of values to object attributes
  - a partition of the objects into equivalence classes; objects in the same class have the same attribute values
[Pasula, Marthi, Milch, Russell, Shpitser, 2002]

Citation Matching Model*
- Classes: Author (Name), Paper (Title, PubType, Authors), Citation (ObsTitle, Author-as-Cited, Text)
- Link chain: Appears-in . Refers-to . Written-by
- Each citation object is associated with a paper object
- Uncertainty over the equivalence classes for papers: if P1 = P2, they have the same attributes & links
(* simplified)

Identity Uncertainty
- Depending on the choice of equivalence classes:
  - the number of objects changes
  - the dependency structure changes
  - there is no "nice" corresponding ground BN
- Algorithm:
  - Each partition hypothesis defines a simple BN
  - Use MCMC over the equivalence-class partition
  - Exact inference over the resulting BN defines the acceptance probability for the Markov chain
[Pasula, Marthi, Milch, Russell, Shpitser, 2002]

Identity Uncertainty Results
- Accuracy of citation recovery: % of actual citation clusters recovered perfectly
- PRM+MCMC vs. Phrase Match: 61.5% relative reduction in error relative to the state of the art
[Figure: accuracy on the Face, RL, Reasoning, and Constraint datasets, and their average]
[Pasula, Marthi, Milch, Russell, Shpitser, 2002]

Summary: PRMs ...
- Inherit the advantages of graphical models:
  - Coherent probabilistic semantics
  - Exploit the structure of local interactions
- Allow us to represent the world in terms of:
  - Objects and classes of objects
  - Properties of objects
  - Relations

So What Do We Gain?
- A convenient language for specifying complex models
- "Web of influence": subtle & intuitive reasoning
- A mechanism for tying parameters and structure, within models and across models
- A framework for learning from relational and heterogeneous data

So What Do We Gain? II
- A new way of thinking about models & problems: incorporating heterogeneous data by connecting related entities
- New problems:
  - Collective classification
  - Relational clustering
  - Uncertainty about richer structures: link graph structure, identity

But What Do We Really Gain?
- Simple PRMs = relational logic with a fixed domain; they only induce a "propositional" BN
  - Are PRMs just a convenient language for specifying attribute-based graphical models?
- Can augment the language with additional expressivity: existential quantifiers & functions, equality
- The resulting language is inherently more expressive, allowing us to represent distributions over:
  - worlds where dependencies vary significantly [Getoor et al., Pasula et al.]
  - worlds with different numbers of objects [Pfeffer et al., Pasula et al.]
  - worlds with infinitely many objects [Pfeffer & K.]
- Big questions: inference & learning

Acknowledgments
Pieter Abbeel, Nir Friedman, Lise Getoor, Carlos Guestrin, Brian Milch, Avi Pfeffer, Eran Segal, Ken Takusagawa, Ben Taskar, Haidong Wang, Ming-Fai Wong
http://robotics.stanford.edu/~koller/
http://dags.stanford.edu/