Probabilistic Entity-Relationship
Models, PRMs, and Plate Models
David Heckerman, Chris Meek,
and Daphne Koller
Slides from SRL 2004 talk
History/Motivation
• Began with: Plates (stats) ~ PRMs (ML)
• Found it to be important to distinguish between
entities and relationships
• Discovered the ER model (e.g., Ullman and
Widom, Ch 2)
• Created probabilistic version of ER model: PER
model
– PER Model is more expressive than Plate Model or
PRM and helps to show their connections
– PER Model provides a strong link to the db
community by virtue of being built on top of ER Model
Outline
• Entity-Relationship (ER) Model
• Probabilistic Entity-Relationship (PER)
Model
• Connections to plate model, PRM
• Modeling issues
ER Model
• An abstract representation of data
• The creation of an ER model is often the
first step in the process of constructing a
relational database.
• Often constructed before any data has
arrived (much like we construct models
before collecting data).
ER Model -- Example
A university database maintains records on students and
their IQs, courses and their difficulty, and the courses
taken by students and the grades they receive.
Figure: entity classes Course and Student; relationship class Takes, relating Student and Course; attribute classes Diff (of Course), Grade (of Takes), and IQ (of Student).
Course entities: CS107, Stats10, …
Student entities: John, Mary, …
Takes relations: (John, CS107), …
Attributes: John.IQ, CS107.Diff, …
ER Model generates attributes
ER model (entity classes Course and Student, relationship class Takes; attribute classes Diff, Grade, IQ)
+
Skeleton
Student: john, mary
Course: cs107, stat10
Takes (Student, Course): (john, cs107), (mary, cs107), (mary, stat10)
=>
Attributes
cs107.Diff, stat10.Diff
john.IQ, mary.IQ
T(john,cs107).G, T(mary,cs107).G, T(mary,stat10).G
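A minimal Python sketch of the step above, assuming a hypothetical dict encoding of the ER model and skeleton (`ENTITY_ATTRS`, `RELATIONSHIP_ATTRS`, `ENTITIES`, `RELATIONSHIPS`, and `generate_attributes` are illustrative names, not from the paper). Each attribute class is instantiated once per entity or relationship in the skeleton; relationship attributes are spelled out as Takes(student,course).Grade rather than the slide's T(…).G.

```python
# Hypothetical encoding of the university ER model: the attribute
# classes carried by each entity class and relationship class.
ENTITY_ATTRS = {"Student": ["IQ"], "Course": ["Diff"]}
RELATIONSHIP_ATTRS = {"Takes": ["Grade"]}

# Skeleton: the entities and relationships that actually exist.
ENTITIES = {"Student": ["john", "mary"], "Course": ["cs107", "stat10"]}
RELATIONSHIPS = {"Takes": [("john", "cs107"), ("mary", "cs107"), ("mary", "stat10")]}


def generate_attributes(entity_attrs, rel_attrs, entities, relationships):
    """Return one attribute per (skeleton object, attribute class) pair."""
    attrs = []
    for cls, objs in entities.items():
        for obj in objs:
            for a in entity_attrs.get(cls, []):
                attrs.append(f"{obj}.{a}")
    for rel, tuples in relationships.items():
        for tup in tuples:
            for a in rel_attrs.get(rel, []):
                # Relationship attributes are indexed by the tuple of entities they relate.
                attrs.append(f"{rel}({','.join(tup)}).{a}")
    return attrs


attributes = generate_attributes(ENTITY_ATTRS, RELATIONSHIP_ATTRS, ENTITIES, RELATIONSHIPS)
for name in attributes:
    print(name)
```

Adding a row to the Takes table adds exactly one Grade attribute; the entity attributes are unaffected.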
PER Model -- Example
Continuing the university database example, a student's
grade in a course depends both on the student's IQ and
on the difficulty of the course.
Figure: the university ER model plus two arc classes, Course.Diff → Takes.Grade and Student.IQ → Takes.Grade.
Not shown: the local distribution class for Grade.
PER Model generates Bayes net
PER model (the university example above)
+
Skeleton
Student: john, mary
Course: cs107, stat10
Takes (Student, Course): (john, cs107), (mary, cs107), (mary, stat10)
=>
Bayes net whose nodes are the attributes
cs107.Diff, stat10.Diff
john.IQ, mary.IQ
T(john,cs107).G, T(mary,cs107).G, T(mary,stat10).G
Constraints on arc classes
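PER model with constrained arc classes:
• Course.Diff → Takes.Grade, constraint course[Diff] = course[Grade]
• Student.IQ → Takes.Grade, constraint student[IQ] = student[Grade]
+
Skeleton (the same Student, Course, and Takes tables as before)
=>
Bayes net over cs107.Diff, stat10.Diff, john.IQ, mary.IQ, T(john,cs107).G, T(mary,cs107).G, T(mary,stat10).G, in which a Diff node points to a Grade node only when it belongs to the course of that Takes relationship, and an IQ node only when it belongs to its student.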
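A sketch of how the two constraints prune arcs, again over a hypothetical encoding of the skeleton (all names are illustrative). Each candidate tail/head pair of an arc class is kept only if its constraint holds; the `use_constraints=False` flag exists only for comparison and keeps every candidate pair.

```python
from itertools import product

# Hypothetical skeleton for the university example.
STUDENTS = ["john", "mary"]
COURSES = ["cs107", "stat10"]
TAKES = [("john", "cs107"), ("mary", "cs107"), ("mary", "stat10")]


def grade(s, c):
    return f"Takes({s},{c}).Grade"


def generate_arcs(use_constraints=True):
    """Instantiate the two arc classes over the skeleton, optionally filtering by constraint."""
    arcs = []
    # Arc class Course.Diff -> Takes.Grade; constraint course[Diff] = course[Grade].
    for x, (s, c) in product(COURSES, TAKES):
        if not use_constraints or x == c:
            arcs.append((f"{x}.Diff", grade(s, c)))
    # Arc class Student.IQ -> Takes.Grade; constraint student[IQ] = student[Grade].
    for y, (s, c) in product(STUDENTS, TAKES):
        if not use_constraints or y == s:
            arcs.append((f"{y}.IQ", grade(s, c)))
    return arcs


constrained = generate_arcs()
unconstrained = generate_arcs(use_constraints=False)
print(len(constrained), len(unconstrained))  # 6 12
```

With the constraints, each grade gets exactly one Diff parent and one IQ parent.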
More on constraints
A database contains diseases
and symptoms for a given
patient. Both diseases and
symptoms have labels from a
common set of categories
(e.g., cardiovascular, neuro,
urinary). The possible causes
of a symptom are diseases
that have at least one category
in common with that symptom.
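Figure: entity classes Disease (attribute Present), Category, and Symptom (attribute Present); relationship class R1 between Disease and Category, and R2 between Symptom and Category; arc class Disease.Present → Symptom.Present with constraint ∃c R1(d, c) ∧ R2(s, c).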
More on constraints
A constraint on the arc class
from X.A to Y.B in a PER model
is any first-order expression
involving entities and
relationship classes in the PER
model such that the expression
is bound when the tail and head
entities are taken to be
constants. To determine
whether to draw an arc from x.A
to y.B, we evaluate the
first-order expression using the
tail and head entities of the
putative arc. (It must evaluate to
true or false.) We draw the arc
from x.A to y.B only if the
expression is true.
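A sketch of evaluating the constraint ∃c R1(d, c) ∧ R2(s, c) for each putative arc, with d and s bound to constants as described above. The diseases, symptoms, and their labels are hypothetical; only the three category names come from the slide.

```python
# Hypothetical category labels (R1: disease-category, R2: symptom-category).
R1 = {("angina", "cardiovascular"), ("stroke", "cardiovascular"),
      ("stroke", "neuro"), ("uti", "urinary")}
R2 = {("chest_pain", "cardiovascular"), ("numbness", "neuro"),
      ("frequent_urination", "urinary")}
CATEGORIES = {c for _, c in R1} | {c for _, c in R2}


def constraint(d, s):
    """Evaluate  exists c . R1(d, c) and R2(s, c)  with d and s bound to constants."""
    return any((d, c) in R1 and (s, c) in R2 for c in CATEGORIES)


diseases = sorted({d for d, _ in R1})
symptoms = sorted({s for s, _ in R2})
# Draw d.Present -> s.Present only when the constraint evaluates to True.
arcs = [(f"{d}.Present", f"{s}.Present")
        for d in diseases for s in symptoms if constraint(d, s)]
for tail, head in arcs:
    print(tail, "->", head)
```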
Local distribution classes
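Figure: the disease/symptom model above; the local distribution class for Symptom.Present must handle a varying number of Disease.Present parents, since the constraint gives each symptom its own set of parents.
E.g., noisy OR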
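A minimal noisy-OR sketch. It assumes one strength parameter per present disease parent plus an optional leak term; the slide does not specify this parameterization. Because the product runs over however many parents are present, the same function serves a symptom with any number of Disease.Present parents.

```python
import math


def noisy_or(parents_present, strengths, leak=0.0):
    """P(child present | parents) under a noisy-OR model.

    Each present parent i independently fails to turn the child on with
    probability 1 - strengths[i]; `leak` (a hypothetical extra parameter)
    is the probability the child is present when no parent is present.
    """
    prob_absent = (1.0 - leak) * math.prod(
        1.0 - p for present, p in zip(parents_present, strengths) if present
    )
    return 1.0 - prob_absent


# A symptom with two disease parents, both present.
print(noisy_or([True, True], [0.8, 0.5]))  # approximately 0.9
```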
Caveat
Typically, a PER model is not based on the
ER model of a database
PER model, plate model, & PRM
Figure: the university example (Course, Student, Takes; Diff, Grade, IQ; constraints course[Diff] = course[Grade] and student[IQ] = student[Grade]) drawn side by side as a PER model, a plate model, and a PRM.
Modeling issues
• Restricted relationships
• Self relationships
• Probabilistic relationships
Restricted relationship: Example
Hierarchical model: A binary outcome O is measured on patients in multiple
hospitals. Each patient is treated in exactly one hospital. It is believed that
outcomes in any given hospital h are i.i.d. given binomial parameter h.q; and
that these binomial parameters are themselves i.i.d. across hospitals given
hyperparameters a.
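Figure, left (PER model): hyperparameters a; entity class Hospital with attribute q; entity class Patient with attribute O; restricted relationship class In between Hospital and Patient; arc classes a → Hospital.q and Hospital.q → Patient.O, the latter with constraint In(h, p).
Figure, right (generated Bayes net): a → h1.q, …, hm.q; h1.q → p11.O, …, p1n1.O; …; hm.q → pm1.O, …, pmnm.O.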
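A sketch of ancestral sampling from the hospital model. The slide only says the h.q are i.i.d. given hyperparameters a; the Beta(alpha, beta) prior used here is an assumption, as are the patient and hospital names. Encoding In as a patient-to-hospital dict enforces the restriction that each patient is treated in exactly one hospital.

```python
import random

# Restricted relationship In: each patient maps to exactly one hospital,
# so a dict (patient -> hospital) enforces the restriction by construction.
IN = {"p11": "h1", "p12": "h1", "p13": "h1", "p21": "h2", "p22": "h2"}


def sample(in_rel, a, rng):
    """Ancestral sampling: h.q given hyperparameters a, then p.O given its hospital's q."""
    alpha, beta = a  # assumed Beta(alpha, beta) prior on each h.q
    hospitals = sorted(set(in_rel.values()))
    # i.i.d. binomial parameters across hospitals
    q = {h: rng.betavariate(alpha, beta) for h in hospitals}
    # binary outcomes, i.i.d. within each hospital given its q
    outcome = {p: int(rng.random() < q[h]) for p, h in in_rel.items()}
    return q, outcome


q, outcome = sample(IN, a=(2.0, 2.0), rng=random.Random(0))
print(q)
print(outcome)
```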
Restricted, Self, and Uncertain Relationship: Example
A student's grade in
a course depends
on whether an
advisor of the
student is a friend
of a teacher of
the course.
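Figure: entity classes Professor, Course (attribute Diff), and Student (attribute IQ); self relationship class Friend on Professor, with attribute F(p, pf) indicating whether p and pf are friends; relationship classes Teaches (Professor and Course), Advises (Professor and Student), and Takes (Student and Course, attribute Grade). Arc classes into Takes.Grade:
• Friend.F → Takes.Grade, constraint Teaches(p, c) ∧ Advises(pf, s)
• Course.Diff → Takes.Grade, constraint c[D] = c[G]
• Student.IQ → Takes.Grade, constraint s[IQ] = s[G]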
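A sketch of how the constraint Teaches(p, c) ∧ Advises(pf, s) picks out the Friend variables F(p, pf) that become parents of each grade. The professors and the teaching and advising assignments are hypothetical; F(p, pf) is treated as one binary node per ordered pair of professors.

```python
# Hypothetical skeleton for the full example.
PROFESSORS = ["brown", "jones", "lee", "smith"]
TEACHES = {("smith", "cs107"), ("jones", "stat10")}   # (professor, course)
ADVISES = {("lee", "john"), ("brown", "mary")}       # (professor, student)
TAKES = [("john", "cs107"), ("mary", "cs107"), ("mary", "stat10")]


def friend_parents(s, c):
    """F(p, pf) nodes allowed as parents of Takes(s, c).Grade by
    the constraint Teaches(p, c) and Advises(pf, s)."""
    return [f"F({p},{pf})"
            for p in PROFESSORS for pf in PROFESSORS
            if (p, c) in TEACHES and (pf, s) in ADVISES]


for s, c in TAKES:
    print(f"Takes({s},{c}).Grade <-", friend_parents(s, c))
```

Of the 16 F(p, pf) nodes the self relationship generates here, only those linking a teacher of the course to an advisor of the student feed into a given grade.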
In the paper…
(Google -> Heckerman -> Papers)
• Formal definitions and theorems
• Precise differences between PER models,
plate models, and PRMs
• Undirected PER models
• PER models for asymmetric independence
• Many more examples