Probabilistic Entity-Relationship Models, PRMs, and Plate Models
David Heckerman, Chris Meek, and Daphne Koller
Slides from SRL 2004 talk

History/Motivation
• Began with: Plates (stats) ~ PRMs (ML)
• Found it important to distinguish between entities and relationships
• Discovered the ER model (e.g., Ullman and Widom, Ch. 2)
• Created a probabilistic version of the ER model: the PER model
  – The PER model is more expressive than the plate model or the PRM and helps to show their connections
  – The PER model provides a strong link to the database community because it is built on top of the ER model

Outline
• Entity-Relationship (ER) Model
• Probabilistic Entity-Relationship (PER) Model
• Connections to plate model, PRM
• Modeling issues

ER Model
• An abstract representation of data
• Creating an ER model is often the first step in constructing a relational database.
• Often constructed before any data has arrived (much as we construct models before collecting data).

ER Model -- Example
A university database maintains records on students and their IQs, courses and their difficulty, and the courses taken by students and the grades they receive.
[Figure: ER diagram with entity classes Course and Student, relationship class Takes, and attribute classes Diff (Course), IQ (Student), and Grade (Takes)]
• Course entities: CS107, Stats10, …
• Student entities: John, Mary, …
• Takes relations: (John, CS107), …
• Attributes: John.IQ, CS107.Diff, …

ER Model generates attributes
[Figure: ER model (Course, Takes, Student with attribute classes Diff, Grade, IQ) + skeleton => attributes]
Skeleton:
  Student: john, mary
  Course: cs107, stat10
  Takes (Student, Course): (john, cs107), (mary, cs107), (mary, stat10)
Attributes: cs107.Diff, stat10.Diff, john.IQ, mary.IQ, T(john,cs107).G, T(mary,cs107).G, T(mary,stat10).G

PER Model -- Example
Continuing the university database example, a student's grade in a course depends both on the student's IQ and on the difficulty of the course.
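To make the "ER Model generates attributes" step concrete, here is a minimal Python sketch, assuming a hypothetical dictionary encoding of the university ER model (the names ENTITY_ATTRIBUTES, RELATIONSHIP_ATTRIBUTES, and generate_attributes are illustrative, not from the talk). It shows only that an ER model plus a skeleton determines the attributes: one per entity for each attribute class of its entity class, and one per relation for each attribute class of its relationship class.

```python
# Hypothetical encoding of the university ER model: the attribute classes
# attached to each entity class and to the relationship class.
ENTITY_ATTRIBUTES = {"Student": ["IQ"], "Course": ["Diff"]}
RELATIONSHIP_ATTRIBUTES = {"Takes": ["G"]}  # Takes relates (Student, Course)


def generate_attributes(skeleton):
    """List the attributes an ER model generates for a given skeleton.

    `skeleton` maps each entity class to its entities and each relationship
    class to its relations (tuples of entities).
    """
    attributes = []
    # One attribute per (entity, attribute class of its entity class).
    for entity_class, attr_names in ENTITY_ATTRIBUTES.items():
        for entity in skeleton[entity_class]:
            attributes.extend(f"{entity}.{a}" for a in attr_names)
    # One attribute per (relation, attribute class of its relationship class).
    for rel_class, attr_names in RELATIONSHIP_ATTRIBUTES.items():
        for relation in skeleton[rel_class]:
            args = ",".join(relation)
            # Abbreviate the relationship class by its first letter, as in T(john,cs107).G
            attributes.extend(f"{rel_class[0]}({args}).{a}" for a in attr_names)
    return attributes


skeleton = {
    "Student": ["john", "mary"],
    "Course": ["cs107", "stat10"],
    "Takes": [("john", "cs107"), ("mary", "cs107"), ("mary", "stat10")],
}

print(generate_attributes(skeleton))
# ['john.IQ', 'mary.IQ', 'cs107.Diff', 'stat10.Diff',
#  'T(john,cs107).G', 'T(mary,cs107).G', 'T(mary,stat10).G']
```

Adding a student or a Takes relation to the skeleton adds the corresponding attributes without touching the model itself, which fits the point that an ER model can be written before any data arrives.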
[Figure: PER diagram in which arc classes run from Course.Diff and from Student.IQ to Takes.Grade. Not shown: the local distribution class for Grade.]

PER Model generates Bayes net
[Figure: PER model (Course, Takes, Student with attribute classes Diff, Grade, IQ) + skeleton => Bayes net over the attributes]
Skeleton:
  Student: john, mary
  Course: cs107, stat10
  Takes (Student, Course): (john, cs107), (mary, cs107), (mary, stat10)
Attributes (the nodes of the generated Bayes net): cs107.Diff, stat10.Diff, john.IQ, mary.IQ, T(john,cs107).G, T(mary,cs107).G, T(mary,stat10).G

Constraints on arc classes
[Figure: the same model, skeleton, and attributes, with a constraint on each arc class: course[Diff] = course[Grade] on the arc class from Course.Diff to Takes.Grade, and student[IQ] = student[Grade] on the arc class from Student.IQ to Takes.Grade]
With these constraints, a course's Diff is a parent only of grades in that course, and a student's IQ only of that student's grades: cs107.Diff is a parent of T(john,cs107).G, but stat10.Diff is not.

More on constraints
A database contains diseases and symptoms for a given patient. Both diseases and symptoms have labels from a common set of categories (e.g., cardiovascular, neuro, urinary). The possible causes of a symptom are diseases that have at least one category in common with that symptom.
[Figure: entity classes Disease (attribute Present), Category, and Symptom (attribute Present); relationship class R1 links Disease to Category and R2 links Symptom to Category; the arc class from Disease.Present to Symptom.Present carries the constraint ∃c R1(d, c) ∧ R2(s, c)]

More on constraints
A constraint on the arc class from X.A to Y.B in a PER model is any first-order expression involving entities and relationship classes in the PER model such that the expression is bound when the tail and head entities are taken to be constants. To determine whether to draw an arc from x.A to y.B, we evaluate the first-order expression using the tail and head entities of the putative arc (it must evaluate to true or false). We draw the arc from x.A to y.B only if the expression is true.
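The arc-drawing rule on the last slide can be read as a brute-force procedure: for every arc class and every candidate pair of tail and head entities, evaluate the constraint and draw the arc only if it is true. The sketch below assumes constraints are encoded as Python predicates over (tail entity, head entity, skeleton); build_arcs is a hypothetical helper, the disease/symptom names d1, d2, s1, s2 and their R1/R2 entries are made-up data, and enumerating all pairs is the simplest reading of the definition rather than an efficient algorithm.

```python
from itertools import product


def build_arcs(instances, arc_classes, skeleton):
    """Draw arc x.A -> y.B for each arc class (A, B, constraint) and each
    (tail, head) entity pair for which the constraint evaluates to True."""
    arcs = []
    for tail_attr, head_attr, constraint in arc_classes:
        for x, y in product(instances[tail_attr], instances[head_attr]):
            if constraint(x, y, skeleton):
                arcs.append(((tail_attr, x), (head_attr, y)))
    return arcs


# University example: Takes relations are (student, course) pairs.
university_instances = {
    "Course.Diff": ["cs107", "stat10"],
    "Student.IQ": ["john", "mary"],
    "Takes.Grade": [("john", "cs107"), ("mary", "cs107"), ("mary", "stat10")],
}
university_arc_classes = [
    # course[Diff] = course[Grade]: tail course must be the course in the relation
    ("Course.Diff", "Takes.Grade", lambda c, t, sk: c == t[1]),
    # student[IQ] = student[Grade]: tail student must be the student in the relation
    ("Student.IQ", "Takes.Grade", lambda s, t, sk: s == t[0]),
]
university_arcs = build_arcs(university_instances, university_arc_classes, None)

# Disease/symptom example with hypothetical entities and category labels.
disease_skeleton = {
    "Category": ["cardiovascular", "neuro", "urinary"],
    "R1": {("d1", "cardiovascular"), ("d2", "neuro")},
    "R2": {("s1", "cardiovascular"), ("s2", "neuro"), ("s2", "urinary")},
}


def shares_category(d, s, sk):
    # Exists c: R1(d, c) and R2(s, c)
    return any((d, c) in sk["R1"] and (s, c) in sk["R2"] for c in sk["Category"])


disease_instances = {"Disease.Present": ["d1", "d2"], "Symptom.Present": ["s1", "s2"]}
disease_arcs = build_arcs(
    disease_instances,
    [("Disease.Present", "Symptom.Present", shares_category)],
    disease_skeleton,
)
```

Here university_arcs contains six arcs (each grade gets exactly one Diff parent and one IQ parent), and disease_arcs links d1 to s1 and d2 to s2 only, because those are the pairs sharing a category.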
Local distribution classes
[Figure: the disease/category/symptom PER model with constraint ∃c R1(d, c) ∧ R2(s, c); the local distribution class for Symptom.Present is, e.g., noisy-OR]

Caveat
Typically, a PER model is not based on the ER model of a database.

PER model, plate model, & PRM
[Figure: the university example (Course.Diff, Student.IQ, Takes.Grade, with constraints course[Diff] = course[Grade] and student[IQ] = student[Grade]) drawn side by side as a PER model, a plate model, and a PRM]

Modeling issues
• Restricted relationships
• Self relationships
• Probabilistic relationships

Restricted relationship: Example
Hierarchical model: A binary outcome O is measured on patients in multiple hospitals. Each patient is treated in exactly one hospital. Outcomes in any given hospital h are believed to be i.i.d. given the binomial parameter h.q, and these binomial parameters are themselves i.i.d. across hospitals given hyperparameters a.
[Figure: PER model with Hospital (attribute q) and Patient (attribute O) linked by the relationship In(h, p), plus hyperparameters a; the arc class from Hospital.q to Patient.O carries the constraint In(h, p). In the generated Bayes net, a is a parent of h1.q, …, hm.q; h1.q is a parent of p11.O, …, p1n1.O; and hm.q is a parent of pm1.O, …, pmnm.O]

Restricted, Self, and Uncertain Relationships: Full Example
A student's grade in a course depends on whether an advisor of the student is a friend of a teacher of the course.
[Figure: entity classes Professor, Course, and Student; relationship classes Friend (a self relationship on Professor, with uncertain existence F(p, pf)), Teaches, Advises, and Takes (attribute Grade). Takes.Grade has parents Course.Diff (constraint c[D] = c[G]), Student.IQ (constraint s[IQ] = s[G]), and F(p, pf) (constraint Teaches(p, c) ∧ Advises(pf, s))]

In the paper… (Google -> Heckerman -> Papers)
• Formal definitions and theorems
• Precise differences between PER models, plate models, and PRMs
• Undirected PER models
• PER models for asymmetric independence
• Many more examples
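A local distribution class is a single parametric form shared by every instance of an attribute class, even though different instances (here, different symptoms) can have different numbers of parents. Below is a minimal sketch of noisy-OR for Symptom.Present, assuming one inhibition probability per parent disease and an optional leak term; this parameterization and the numbers are hypothetical, since the talk names noisy-OR only as an example.

```python
import math


def noisy_or(parent_present, inhibit, leak=0.0):
    """P(symptom present | parent diseases) under a noisy-OR model.

    parent_present: booleans, one per parent Disease.Present node
    inhibit: per-parent probabilities q_i that a present disease fails to
             produce the symptom (hypothetical parameters)
    leak: probability the symptom appears when no parent disease is present
    """
    if len(parent_present) != len(inhibit):
        raise ValueError("need one inhibition probability per parent")
    # Only present parents can cause the symptom; each is inhibited independently.
    q = math.prod(qi for on, qi in zip(parent_present, inhibit) if on)
    return 1.0 - (1.0 - leak) * q


# The same class serves symptoms with different numbers of parents.
p_one_parent = noisy_or([True], [0.2])                           # 1 - 0.2
p_two_parents = noisy_or([True, True], [0.2, 0.5], leak=0.1)     # 1 - 0.9 * (0.2 * 0.5)
p_none_present = noisy_or([False, False], [0.2, 0.5], leak=0.1)  # leak only
```

Because the number of parameters grows linearly with the number of parents, one noisy-OR class can cover every symptom regardless of how many diseases the constraint connects to it.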