Introduction to Probabilistic Reasoning and Bayesian Networks
Hongtao Du
Group Presentation
Outline

Uncertain Reasoning

Probabilistic Reasoning

Bayesian Network (BN)

Dynamic Bayesian Network (DBN)
Reasoning

The activity of guessing the state of the
domain from prior knowledge and
observations.
 Causal reasoning
 Diagnostic reasoning
 Combinations of these two

Uncertain Reasoning (Guessing)

Some aspects of the domain are often
unobservable and must be estimated
indirectly through other observations.

The relationships among domain events
are often uncertain, particularly the
relationship between the observables
and non-observables.

The observations themselves may be
unreliable.

Even when events are observable, we often lack sufficient resources to observe all relevant ones.

Even when the relations among events are certain, it is often impractical to analyze all of them.
Probabilistic Reasoning

A methodology founded on Bayesian probability theory.

Events and objects in the real world are
represented by random variables.

Probabilistic models:
 Bayesian reasoning
 Evidence theory
 Robust statistics
 Recursive operators
Graphical Model

A tool that visually illustrates conditional independence among variables in a given problem.

Consists of nodes (random variables or states) and edges (connections between two nodes, directed or undirected).

The absence of an edge represents conditional independence between variables.

Key concepts: chain, path, cycle, Directed Acyclic Graph (DAG), parents and children.
Bayesian Network (BN)

Also known as a probabilistic network, belief network, or causal network.

A specific type of graphical model that is
represented as a Directed Acyclic Graph.

A BN consists of
 A set of variables (nodes) V = {1, 2, …, k}
 A set of dependencies (edges) D
 A set of probability distribution functions (pdfs), P, one for each variable
Assumptions
 $0 \le P(X) \le 1$
 $P(X) = 1$ if and only if $X$ is certain
 If $X$ and $Y$ are mutually exclusive, then $P(X \vee Y) = P(X) + P(Y)$
 Joint probability: $P(X, Y) = P(X \mid Y)\,P(Y)$

Bayes' rule:
$$P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)}$$
where
 $X$ represents the hypothesis
 $Y$ represents the evidence
 $P(Y \mid X)$ is the likelihood
 $P(X \mid Y)$ is the posterior probability

If $X$ and $Y$ are conditionally independent given $Z$:
$P(X \mid Z, Y) = P(X \mid Z)$
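As a minimal numeric sketch of Bayes' rule (all probabilities below are hypothetical, chosen only for illustration):

```python
# Bayes' rule for a binary hypothesis X given evidence Y.
# All probabilities are hypothetical illustration values.
prior = 0.01         # P(X)
likelihood = 0.90    # P(Y | X)
false_alarm = 0.05   # P(Y | not X)

# Total probability: P(Y) = P(Y|X) P(X) + P(Y|~X) P(~X)
evidence = likelihood * prior + false_alarm * (1 - prior)

# Posterior: P(X | Y) = P(Y|X) P(X) / P(Y)
posterior = likelihood * prior / evidence
print(f"P(X|Y) = {posterior:.4f}")  # 0.1538
```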

Given evidence, a BN operates by propagating beliefs throughout the network.
For example, for the chain $Z \to Y \to U \to V$:
$$P(Z, Y, U, V) = P(Z)\,P(Y \mid Z)\,P(U \mid Y)\,P(V \mid U)$$

In general,
$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P[X_i \mid Pa(X_i)]$$
where $Pa(X_i)$ denotes the parents of node $X_i$.
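A minimal sketch of this factorization for the chain above, using hypothetical CPT values for four binary variables:

```python
from itertools import product

# Hypothetical CPTs for binary variables on the chain Z -> Y -> U -> V.
# Outer key: parent value; inner key: child value.
P_Z = {True: 0.3, False: 0.7}
P_Y_given_Z = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
P_U_given_Y = {True: {True: 0.6, False: 0.4}, False: {True: 0.3, False: 0.7}}
P_V_given_U = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}

def joint(z, y, u, v):
    # P(Z, Y, U, V) = P(Z) P(Y|Z) P(U|Y) P(V|U)
    return P_Z[z] * P_Y_given_Z[z][y] * P_U_given_Y[y][u] * P_V_given_U[u][v]

# Sanity check: the sixteen joint entries sum to 1.
print(sum(joint(*a) for a in product([True, False], repeat=4)))  # 1.0
```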

Explaining away
 If a node is observed, its parents become dependent.
 Two causes (parents) compete to explain the observed data (child).
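To see explaining away numerically, the sketch below uses the classic sprinkler/rain/wet-grass network (CPT values are hypothetical, not from the slides): once the wet grass is observed, also observing the sprinkler lowers the posterior probability of rain.

```python
from itertools import product

# Hypothetical CPTs: Rain (R) and Sprinkler (S) are independent parents
# of the observed child WetGrass (W).
P_R = {True: 0.2, False: 0.8}
P_S = {True: 0.1, False: 0.9}
P_W_true = {  # P(W = True | S, R)
    (True, True): 0.99, (True, False): 0.90,
    (False, True): 0.80, (False, False): 0.00,
}

def joint(s, r, w):
    pw = P_W_true[(s, r)] if w else 1.0 - P_W_true[(s, r)]
    return P_S[s] * P_R[r] * pw

def posterior_rain(**evidence):
    # P(R = True | evidence) by brute-force enumeration of the joint.
    num = den = 0.0
    for s, r, w in product([True, False], repeat=3):
        assign = {"s": s, "r": r, "w": w}
        if any(assign[k] != v for k, v in evidence.items()):
            continue
        p = joint(s, r, w)
        den += p
        if r:
            num += p
    return num / den

print(posterior_rain(w=True))          # ~0.69: wet grass makes rain likely
print(posterior_rain(w=True, s=True))  # ~0.22: the sprinkler "explains away" rain
```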
Tasks in Bayesian Networks
 Inference
 Learning
Inference

Inference is the task of computing the
probability of each state of a node in a
BN when other variables are known.

Method: divide the set of BN nodes into non-overlapping subsets of conditionally independent nodes.
Example
$Z_L = X_N \cup Y_M$, where $L = N + M$,
$X_N = \{x_0, \ldots, x_{N-1}\}$, $Y_M = \{y_0, \ldots, y_{M-1}\}$,
and $U_K \subseteq Z_L$.

Given that $Y_M$ is observed, the goal is to find the conditional pdf $P(U_K \mid Y_M)$ over $U_K$.

Case 1: $U_K \subseteq Y_M$
$$P(U_K \mid Y_M) = \prod_{i=1}^{K} \delta(u_i - y_i), \qquad \delta(x) = \begin{cases} 1 & x = 0 \\ 0 & \text{otherwise} \end{cases}$$

Case 2: $U_K \subseteq X_N$
$$P(U_K \mid Y_M) = \frac{P(U_K, Y_M)}{P(Y_M)} = \frac{P(U_K, Y_M)}{\sum_{U_K} P(U_K, Y_M)}$$
$$P(U_K, Y_M) = \sum_{x \in X_N \setminus U_K} P(x, U_K, Y_M)$$
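A minimal sketch of Case 2, assuming a toy network with hidden variables $X_N = \{x_1, x_2\}$, one observed variable $y$, and query set $U_K = \{x_1\}$ (all factor values hypothetical):

```python
# Hypothetical factors for hidden X_N = {x1, x2} and observed y;
# each conditional sums to 1, so the implied joint is normalized.
P_x1 = {0: 0.6, 1: 0.4}
P_x2_given_x1 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
P_y_given_x2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}

def joint(x1, x2, y):
    return P_x1[x1] * P_x2_given_x1[x1][x2] * P_y_given_x2[x2][y]

# P(U_K, Y): sum the joint over X_N \ U_K, here summing out x2.
y_obs = 1
p_uk_y = {x1: sum(joint(x1, x2, y_obs) for x2 in (0, 1)) for x1 in (0, 1)}

# P(U_K | Y): normalize by P(Y) = sum over U_K of P(U_K, Y).
p_y = sum(p_uk_y.values())
print({x1: p / p_y for x1, p in p_uk_y.items()})  # {0: ~0.43, 1: ~0.57}
```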
Learning

Goal: completing the missing beliefs in
the network.

Adjusting the parameters of the Bayesian network so that the pdfs defined by the network sufficiently describe the statistical behavior of the observed data.


 $M$: a BN model
 $\Theta$: parameters of the probability distributions
 $Z_L$: observed data

Goal: estimate $\Theta$ so as to maximize the posterior probability $P(M \mid Z_L)$:
$$P(M \mid Z_L) = \frac{P(M)}{P(Z_L)} \int P(Z_L \mid \Theta, M)\, P(\Theta \mid M)\, d\Theta$$

Assume $P(\Theta \mid M, Z_L)$ is highly peaked around the maximum-likelihood estimate $\Theta_{ML}$:
$$P(M \mid Z_L) \approx \frac{P(M)}{P(Z_L)}\, P(Z_L \mid \Theta_{ML}, M)\, P(\Theta_{ML} \mid M)$$
$$\Theta_{ML} = \arg\max_{\Theta} \log P(Z_L \mid \Theta)$$
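For fully observed discrete data, this maximization decomposes per CPT and reduces to relative-frequency counting, as in this sketch for a single parent-child edge (data hypothetical):

```python
from collections import Counter

# Fully observed (parent, child) samples for one edge of the network
# (hypothetical data).
samples = [(0, 0), (0, 1), (0, 0), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1)]

pair_counts = Counter(samples)
parent_counts = Counter(p for p, _ in samples)

# Theta_ML: P(child | parent) = N(parent, child) / N(parent), which
# maximizes log P(Z_L | Theta) when the data are complete.
cpt = {(p, c): pair_counts[(p, c)] / parent_counts[p] for (p, c) in pair_counts}
print(cpt)  # e.g. P(child = 0 | parent = 0) = 0.75
```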

Dynamic Bayesian Network (DBN)

A Bayesian network extended with time series to represent temporal dependencies.

Dynamically changing or evolving over time.

Directed graphical model of stochastic
processes.

Aimed especially at time-series modeling.

Satisfying the Markovian condition:
The state of a system at time t depends only on its immediate
past state at time t-1.
Representation

[Figure: the network unrolled over time slices $t_1, t_2, \ldots, t_k$]

The transition matrix that represents these time dependencies is called the Conditional Probability Table (CPT).
Description

 $T$: the time boundary we are investigating
 $Y = \{y_0, \ldots, y_{T-1}\}$: observable variables
 $X = \{x_0, \ldots, x_{T-1}\}$: hidden-state variables

$$P(X, Y) = \prod_{t=1}^{T-1} P(x_t \mid x_{t-1}) \prod_{t=1}^{T-1} P(y_t \mid x_t)\, P(x_0)$$

 $P(x_t \mid x_{t-1})$: state transition pdfs, specifying the time dependencies between states.
 $P(y_t \mid x_t)$: observation pdfs, specifying the dependencies of the observation nodes on the other nodes at time slice $t$.
 $P(x_0)$: initial state distribution.
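A minimal sketch of this factorization for a hypothetical two-state DBN (all tables are illustration values):

```python
import math

# Hypothetical two-state DBN parameters.
P_x0 = [0.5, 0.5]                   # initial state distribution P(x0)
P_trans = [[0.7, 0.3], [0.4, 0.6]]  # P(x_t | x_{t-1}), rows indexed by x_{t-1}
P_obs = [[0.9, 0.1], [0.2, 0.8]]    # P(y_t | x_t), rows indexed by x_t

def log_joint(states, obs):
    # log P(X, Y) = log P(x0) + sum_t [log P(x_t|x_{t-1}) + log P(y_t|x_t)],
    # following the factorization on this slide.
    lp = math.log(P_x0[states[0]])
    for t in range(1, len(states)):
        lp += math.log(P_trans[states[t - 1]][states[t]])
        lp += math.log(P_obs[states[t]][obs[t]])
    return lp

print(log_joint(states=[0, 0, 1], obs=[0, 0, 1]))
```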
Tasks in DBN

Inference

Decoding

Learning

Pruning
Inference

Estimating the pdfs of unknown states from the given observations and initial probability distributions.

Goal: finding
$$P(X_0^{T-1} \mid Y_0^{T-1})$$
 $Y_0^{T-1} = \{y_0, \ldots, y_{T-1}\}$: a finite set of $T$ consecutive observations
 $X_0^{T-1} = \{x_0, \ldots, x_{T-1}\}$: the set of corresponding hidden variables

[Figure: the DBN unrolled over time, hidden nodes $x_t$ above their observation nodes $y_t$]
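One standard way to compute these posteriors is the forward (filtering) recursion; a minimal self-contained sketch using the same hypothetical two-state tables as above:

```python
# Forward (filtering) recursion for a hypothetical two-state DBN:
# alpha_t(x) = P(y_t|x) * sum_{x'} alpha_{t-1}(x') * P(x|x').
P_x0 = [0.5, 0.5]
P_trans = [[0.7, 0.3], [0.4, 0.6]]
P_obs = [[0.9, 0.1], [0.2, 0.8]]

def forward(obs):
    # Initialize with P(x0) * P(y0 | x0).
    alpha = [P_x0[x] * P_obs[x][obs[0]] for x in range(2)]
    for y in obs[1:]:
        alpha = [P_obs[x][y] * sum(alpha[xp] * P_trans[xp][x] for xp in range(2))
                 for x in range(2)]
    z = sum(alpha)                 # accumulates P(y_0..y_t)
    return [a / z for a in alpha]  # filtered posterior P(x_t | y_0..y_t)

print(forward([0, 0, 1]))
```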
Decoding

Finding the best-fitting values of the hidden states that generated the known observations.
Goal: determine the sequence of hidden states with the highest posterior probability:
$$\hat{X}_0^{T-1} = \arg\max_{X_0^{T-1}} P(X_0^{T-1} \mid Y_0^{T-1})$$
Learning

Given a number of observations, estimating the parameters of the DBN that best fit the observed data.

Goal: Maximizing the joint probability
distribution.
$$\log P(X_0^{T-1}, Y_0^{T-1} \mid \Theta) = \sum_{t=1}^{T-1} \log P(x_t \mid x_{t-1}) + \sum_{t=1}^{T-1} \log P(y_t \mid x_t) + \log P(x_0)$$
 $\Theta$: the model parameter vector

The estimate is obtained by setting the derivative of the log-likelihood to zero:
$$\frac{\partial}{\partial \Theta} \left[ \log P(X_0^{T-1}, Y_0^{T-1} \mid \Theta) \right] = 0$$
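When the hidden-state sequence is also observed in the training data, this maximization again reduces to counting; a minimal sketch for the transition pdfs (data hypothetical). With truly hidden states one would instead use an iterative method such as EM:

```python
from collections import Counter

# Hypothetical fully observed two-state sequence.
states = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0]

trans_counts = Counter(zip(states, states[1:]))
from_counts = Counter(states[:-1])

# ML transition table: P(x_t = j | x_{t-1} = i) = N(i -> j) / N(i -> *).
P_trans = [[trans_counts[(i, j)] / from_counts[i] for j in range(2)]
           for i in range(2)]
print(P_trans)  # [[0.5, 0.5], [0.4, 0.6]]
```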
Pruning

An important but difficult task in DBNs.

Distinguishing which nodes are
important for inference, and removing
the unimportant nodes.

Actions:
 Deleting states from a particular node
 Removing the connection between nodes
 Removing a node from the network

At time slice $t$:
 $W_1(t), \ldots, W_q(t)$: designated world nodes, a subset of the nodes representing the part we want to inspect.
 $V(t_k)$, $t_k < t$: nodes from earlier time slices.

If the state of $W_i(t)$ is known, i.e. $P(W_i(t) = s_i) = 1$, then the nodes $V(t_k)$ are no longer relevant to the overall goal of the inference. Thus:
(1) delete all nodes $V(t_k)$
(2) incorporate the knowledge that $W_i(t) = s_i$
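A minimal sketch of these two actions on a dict-based graph (node names hypothetical, and assuming the $V(t_k)$ nodes influence only the observed world node):

```python
# Hypothetical DBN fragment: each node maps to its list of parents.
# V(t-1) and V(t-2) belong to earlier time slices and feed only W(t).
parents = {
    "W(t)": ["V(t-1)", "V(t-2)"],
    "V(t-1)": ["V(t-2)"],
    "V(t-2)": [],
}
evidence = {}

def ancestors(node):
    # Collect every earlier-slice node V(t_k) that feeds into `node`.
    seen, stack = set(), list(parents.get(node, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen

def observe_world_node(node, state):
    irrelevant = ancestors(node)
    for v in irrelevant:          # (1) delete all nodes V(t_k)
        parents.pop(v, None)
    evidence[node] = state        # (2) record that W_i(t) = s_i
    parents[node] = []

observe_world_node("W(t)", "s_i")
print(parents, evidence)  # {'W(t)': []} {'W(t)': 's_i'}
```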
Future work

Probabilistic reasoning in multiagent
systems.

Different DBNs and applications.

Discussion of DBN problems.