Introduction to probability theory and graphical models

Translational Neuroimaging Seminar on Bayesian Inference
Spring 2013
Jakob Heinzle
Translational Neuromodeling Unit (TNU)
Institute for Biomedical Engineering (IBT)
University and ETH Zürich
Literature and References
• Literature:
  • Bishop (Chapters 1.2, 1.3, 8.1, 8.2)
  • MacKay (Chapter 2)
  • Barber (Chapters 1, 2, 3, 4)
• Many images in this lecture are taken from the above references.
Bayesian Inference - Introduction to probability theory
Probability distribution
A probability p(x = true) is defined on a sample space (domain) and specifies, for every possible event in the sample space, how certain that event is to occur.
Sample space: dom(X)={0,1}
𝑝 π‘₯ = 1 = 0.4, 𝑝 π‘₯ = 0 = 0.6
𝑝 π‘₯ =1
π‘₯∈π‘‘π‘œπ‘š(π‘₯)
Probabilities sum to one.
Bishop, Fig. 1.11
Probability theory: Basic rules
Sum rule* - $p(X) = \sum_{Y} p(X, Y)$
$p(X)$ is also called the marginal distribution.
Product rule - $p(X, Y) = p(X \mid Y)\, p(Y)$
* According to Bishop
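The two rules can be checked numerically on a small joint table. The following is a minimal sketch, assuming a made-up joint distribution over two binary variables; the numbers and the function names are illustrative and do not come from the lecture.

```python
# Hypothetical joint distribution p(X, Y) over two binary variables.
# The numbers are illustrative and do not come from the lecture.
joint = {
    (0, 0): 0.3, (0, 1): 0.3,
    (1, 0): 0.1, (1, 1): 0.3,
}

def marginal_x(joint):
    """Sum rule: p(X) = sum_Y p(X, Y)."""
    p_x = {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
    return p_x

def marginal_y(joint):
    """Sum rule applied to the other variable: p(Y) = sum_X p(X, Y)."""
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return p_y

def conditional_x_given_y(joint):
    """Product rule rearranged: p(X | Y) = p(X, Y) / p(Y)."""
    p_y = marginal_y(joint)
    return {(x, y): p / p_y[y] for (x, y), p in joint.items()}

p_x = marginal_x(joint)
p_y = marginal_y(joint)
p_x_given_y = conditional_x_given_y(joint)

# Product rule check: p(X, Y) = p(X | Y) p(Y) for every cell of the table.
for (x, y), p in joint.items():
    assert abs(p_x_given_y[(x, y)] * p_y[y] - p) < 1e-12
```

Storing the joint as a dictionary keyed by (x, y) keeps the correspondence with the formulas visible; for larger tables an array library would be the more usual choice.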
Conditional and marginal probability
$p(x \mid y) \equiv p(x, y) / p(y)$
$p(x) \equiv \sum_{y} p(x, y)$
Conditional and marginal probability
Bishop, Fig. 1.11
Independent variables
𝑝 π‘₯, 𝑦 = 𝑝 π‘₯ 𝑝 𝑦
𝑝 π‘₯ 𝑦 = 𝑝(π‘₯)
Question for later: What does this mean for Bayes?
Probability theory: Bayes’ theorem
$p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}$
is derived from the product rule
$p(Y \mid X)\, p(X) = p(X, Y) = p(X \mid Y)\, p(Y)$
Rephrasing and naming of Bayes’ rule
D: data, θ: parameters, H: hypothesis we put into the model.
MacKay
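The equation shown on this slide is not preserved in the text. As a hedged reconstruction, the standard form of Bayes’ rule with these three quantities, writing θ for the parameters and using the usual names for each term, is:

```latex
\underbrace{P(\theta \mid D, H)}_{\text{posterior}}
  = \frac{\overbrace{P(D \mid \theta, H)}^{\text{likelihood}}\;
          \overbrace{P(\theta \mid H)}^{\text{prior}}}
         {\underbrace{P(D \mid H)}_{\text{evidence}}}
```

The evidence follows from the sum rule (or its integral form) applied to the numerator: $P(D \mid H) = \int P(D \mid \theta, H)\, P(\theta \mid H)\, d\theta$.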
Example: Bishop Fig. 1.9
Box (B): blue (b) or red (r)
Fruit (F): apple (a) or orange (o)
p(B=r) = 0.4, p(B=b) = 0.6.
What is the probability that the box was red, given that an orange has been drawn?
Bishop, Fig. 1.9
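The question can be worked through with Bayes’ theorem. The sketch below assumes the box contents shown in Bishop, Fig. 1.9 (red box: 2 apples and 6 oranges; blue box: 3 apples and 1 orange), which are not stated in the slide text, and assumes that each fruit in the chosen box is equally likely to be drawn. The function names are illustrative.

```python
from fractions import Fraction

# Prior over boxes, as given on the slide.
p_box = {"r": Fraction(4, 10), "b": Fraction(6, 10)}

# ASSUMPTION: box contents as in Bishop, Fig. 1.9 (not stated in the slide text):
# the red box holds 2 apples and 6 oranges, the blue box 3 apples and 1 orange.
contents = {"r": {"a": 2, "o": 6}, "b": {"a": 3, "o": 1}}

def likelihood(fruit, box):
    """p(F = fruit | B = box), assuming every fruit in a box is equally likely."""
    counts = contents[box]
    return Fraction(counts[fruit], sum(counts.values()))

def posterior_box(fruit):
    """Bayes' theorem: p(B | F) = p(F | B) p(B) / p(F)."""
    joint = {box: likelihood(fruit, box) * p_box[box] for box in p_box}
    evidence = sum(joint.values())  # sum rule: p(F) = sum_B p(F, B)
    return {box: p / evidence for box, p in joint.items()}

post = posterior_box("o")
print(post["r"])  # 2/3
```

Under these assumptions, drawing an orange raises the probability of the red box from the prior 0.4 to 2/3, because oranges are much more common in the red box.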
Probability density
𝑏
𝑝 π‘₯ ∈ π‘Ž, 𝑏
+∞
=
𝑝 π‘₯ 𝑑π‘₯
π‘Ž
𝑝 π‘₯ 𝑑π‘₯ = 1
−∞
PDF and CDF
Bishop, Fig. 1.12
Cumulative distribution
𝑧
𝑃(𝑧) =
𝑝 π‘₯ 𝑑π‘₯
−∞
Short example: How to use the cumulative distribution
to transform a uniform distribution!
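One common reading of this short example is inverse transform sampling: if u is uniform on [0, 1) and P is a cumulative distribution with an inverse, then z = P⁻¹(u) is distributed according to the density p. The sketch below assumes an exponential density as the target, which is an illustrative choice and not necessarily the example used in the lecture.

```python
import math
import random

def sample_exponential(rate, rng):
    """Draw one sample from the density p(x) = rate * exp(-rate * x), x >= 0.

    Its cumulative distribution is P(z) = 1 - exp(-rate * z), so solving
    u = P(z) for z gives z = -ln(1 - u) / rate.
    """
    u = rng.random()  # uniform on [0, 1)
    return -math.log(1.0 - u) / rate

rng = random.Random(0)  # fixed seed so the run is repeatable
rate = 2.0              # illustrative choice, not from the lecture
samples = [sample_exponential(rate, rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)  # should be close to 1 / rate = 0.5
```

The same recipe works for any density whose cumulative distribution can be inverted, either in closed form or numerically.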
Marginal densities
$p(x) = \int_{Y} p(x, y)\, dy$
Integration instead of summing
Two views on probability
●
Probability can …
–
… describe the frequency of outcomes in random
experiments οƒ  classical interpretation.
–
… describe the degree of belief about a particular
event οƒ  Bayesian viewpoint or subjective
interpretation of probability.
MacKay, Chapter 2
Expectation of a function
𝔼𝑓 =
𝑋
𝑝 π‘₯ 𝑓(π‘₯)
Or
𝔼𝑓 =
𝑝 π‘₯ 𝑓 π‘₯ 𝑑π‘₯
𝑋
Graphical models
1. They provide a simple way to visualize the structure of a probabilistic model and can be used to design and motivate new models.
2. Insights into the properties of the model, including conditional independence properties, can be obtained by inspection of the graph.
3. Complex computations, required to perform inference and learning in sophisticated models, can be expressed in terms of graphical manipulations, in which underlying mathematical expressions are carried along implicitly.
Bishop, Chap. 8
Graphical models overview
Directed Graph
Undirected Graph
Names: nodes (vertices), edges (links), paths, cycles,
loops, neighbours
For summary of definitions see Barber, Chapter 2
Graphical models overview
Barber, Introduction
Graphical models
𝑝(π‘Ž, 𝑏, 𝑐) = 𝑝 𝑐|π‘Ž, 𝑏 𝑝(π‘Ž, 𝑏)
= 𝑝 𝑐 π‘Ž, 𝑏 𝑝 𝑏 π‘Ž 𝑝(π‘Ž)
Bishop, Fig. 8.1
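The factorisation can be made concrete by building the joint table from its three factors. The sketch below assumes binary variables and made-up conditional probability tables; none of the numbers come from the lecture.

```python
from itertools import product

# Hypothetical conditional probability tables for binary a, b, c.
p_a = {0: 0.7, 1: 0.3}                     # p(a)
p_b_given_a = {0: {0: 0.9, 1: 0.1},        # p(b | a), outer key is a
               1: {0: 0.4, 1: 0.6}}
p_c_given_ab = {(0, 0): {0: 0.8, 1: 0.2},  # p(c | a, b), outer key is (a, b)
                (0, 1): {0: 0.5, 1: 0.5},
                (1, 0): {0: 0.3, 1: 0.7},
                (1, 1): {0: 0.1, 1: 0.9}}

def joint(a, b, c):
    """p(a, b, c) = p(c | a, b) p(b | a) p(a)."""
    return p_c_given_ab[(a, b)][c] * p_b_given_a[a][b] * p_a[a]

# Tabulate the joint over all eight configurations.
table = {(a, b, c): joint(a, b, c) for a, b, c in product((0, 1), repeat=3)}
total = sum(table.values())  # a valid joint distribution sums to one
```

Each factor corresponds to one node of the graph, conditioned on that node's parents, which is why the graph and the factorisation carry the same information.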
Graphical models: parents and children
Node a is a parent of node b, node b is a child of node a.
Bishop, Fig. 8.1
Belief networks = Bayesian belief networks = Bayesian networks
In general: every probability distribution can be represented by a directed acyclic graph (DAG), for example by applying the product rule repeatedly, which yields a fully connected graph.
Important: no directed cycles!
Bishop, Fig. 8.2
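The "no directed cycles" requirement can be checked mechanically. The following is a minimal sketch using Kahn's algorithm (a standard topological-sort technique, not something specific to the lecture); the function name `is_dag` and the parent-list representation are assumptions made for this example.

```python
from collections import deque

def is_dag(parents):
    """Return True if the directed graph has no directed cycles.

    `parents` maps each node to the list of its parent nodes, i.e. there is
    an edge parent -> node. Uses Kahn's algorithm: repeatedly remove nodes
    whose parents have all been removed already.
    """
    nodes = set(parents)
    for ps in parents.values():
        nodes.update(ps)
    remaining = {n: len(set(parents.get(n, ()))) for n in nodes}
    children = {n: [] for n in nodes}
    for node, ps in parents.items():
        for p in set(ps):
            children[p].append(node)
    queue = deque(n for n, k in remaining.items() if k == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for child in children[n]:
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    # If a cycle exists, its nodes never reach zero remaining parents.
    return removed == len(nodes)

# The three-node graph for p(a, b, c) = p(c | a, b) p(b | a) p(a).
fully_connected = {"a": [], "b": ["a"], "c": ["a", "b"]}
# Adding the edge c -> a would create the directed cycle a -> b -> c -> a.
with_cycle = {"a": ["c"], "b": ["a"], "c": ["a", "b"]}

print(is_dag(fully_connected), is_dag(with_cycle))  # True False
```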
Conditional independence
A variable a is conditionally independent of b given c if
$p(a, b \mid c) = p(a \mid b, c)\, p(b \mid c) = p(a \mid c)\, p(b \mid c)$
In Bayesian networks, conditional independence can be tested by following some simple rules.
Conditional independence – tail-to-tail path
Is a independent of b?
c unobserved: No!
c observed: Yes!
Bishop, Chapter 8.2
Conditional independence – head-to-tail path
Is a independent of b?
c unobserved: No!
c observed: Yes!
Bishop, Chapter 8.2
Conditional independence – head-to-head path
Is a independent of b?
c unobserved: Yes!
c observed: No!
Bishop, Chapter 8.2
Conditional independence – notation
Bishop, Chapter 8.2
Conditional independence – three basic structures
Bishop, Chapter 8.2.2
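The behaviour of the three basic structures can be verified numerically by tabulating a joint distribution for each graph and testing independence of a and b with c summed out (unobserved) and with c fixed (observed). This is a sketch with made-up conditional probability tables over binary variables; the helper names are hypothetical.

```python
from itertools import product

VALUES = (0, 1)

def bern(p_one):
    """Distribution over {0, 1} with p(1) = p_one."""
    return {1: p_one, 0: 1.0 - p_one}

def build_joint(factor):
    """Tabulate p(a, b, c) from a function returning the factorised joint."""
    return {(a, b, c): factor(a, b, c) for a, b, c in product(VALUES, repeat=3)}

def independent(joint, given_c, tol=1e-12):
    """Check whether a is independent of b, marginally or given each value of c."""
    c_values = VALUES if given_c else (None,)
    for c0 in c_values:
        # Unnormalised p(a, b, c = c0), or p(a, b) when c is summed out.
        p_ab = {(a, b): sum(joint[(a, b, c)] for c in VALUES
                            if c0 is None or c == c0)
                for a, b in product(VALUES, repeat=2)}
        norm = sum(p_ab.values())
        p_a = {a: sum(p_ab[(a, b)] for b in VALUES) / norm for a in VALUES}
        p_b = {b: sum(p_ab[(a, b)] for a in VALUES) / norm for b in VALUES}
        for a, b in product(VALUES, repeat=2):
            if abs(p_ab[(a, b)] / norm - p_a[a] * p_b[b]) > tol:
                return False
    return True

# Hypothetical conditional probability tables (illustrative numbers only).
tail_to_tail = build_joint(  # a <- c -> b : p(c) p(a | c) p(b | c)
    lambda a, b, c: bern(0.5)[c] * bern(0.2 + 0.6 * c)[a] * bern(0.3 + 0.5 * c)[b])
head_to_tail = build_joint(  # a -> c -> b : p(a) p(c | a) p(b | c)
    lambda a, b, c: bern(0.4)[a] * bern(0.1 + 0.7 * a)[c] * bern(0.3 + 0.5 * c)[b])
head_to_head = build_joint(  # a -> c <- b : p(a) p(b) p(c | a, b)
    lambda a, b, c: bern(0.4)[a] * bern(0.7)[b] * bern(0.1 + 0.4 * a + 0.4 * b)[c])

for name, table in [("tail-to-tail", tail_to_tail),
                    ("head-to-tail", head_to_tail),
                    ("head-to-head", head_to_head)]:
    print(name, independent(table, given_c=False), independent(table, given_c=True))
```

For these tables the tail-to-tail and head-to-tail graphs give dependence when c is unobserved and independence when c is observed, while the head-to-head graph gives the opposite pattern. A numerical check of this kind only confirms the result for the particular tables chosen; the graph-based rules hold for every choice of tables.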
More conventions in graphical notations
Regression model (figures): short form, and form with parameters made explicit
Bishop, Chapter 8
More conventions in graphical notations
Trained on data $t_n$ → complete model used for prediction
Bishop, Chapter 8
Summary – things to remember
• Probabilities and how to compute with them → product rule, Bayes’ rule, sum rule
• Probability densities → PDF, CDF
• Conditional and marginal distributions
• Basic concepts of graphical models → directed vs. undirected, nodes and edges, parents and children
• Conditional independence in graphs and how to check it
Bishop, Chapter 8.2.2