Introduction to Probabilistic Reasoning and Bayesian Networks
Hongtao Du
Group Presentation

Outline
- Uncertain Reasoning
- Probabilistic Reasoning
- Bayesian Network (BN)
- Dynamic Bayesian Network (DBN)

Reasoning
The activity of inferring the state of the domain from prior knowledge and observations.
- Causal reasoning
- Diagnostic reasoning
- Combinations of the two

Uncertain Reasoning (Guessing)
- Some aspects of the domain are often unobservable and must be estimated indirectly through other observations.
- The relationships among domain events are often uncertain, particularly the relationships between observables and non-observables.
- The observations themselves may be unreliable.
- Even when events are observable, we often lack sufficient resources to observe all relevant events.
- Even when the relations among events are certain, it is often impractical to analyze all of them.

Probabilistic Reasoning
A methodology founded on Bayesian probability theory. Events and objects in the real world are represented by random variables.
Probabilistic models:
- Bayesian reasoning
- Evidence theory
- Robust statistics
- Recursive operators

Graphical Model
A tool that visually illustrates conditional independence among the variables of a given problem. It consists of nodes (random variables or states) and edges (connecting two nodes, directed or undirected). The absence of an edge represents conditional independence between variables.
Key terms: chain, path, cycle, directed acyclic graph (DAG), parents and children.

Bayesian Network (BN)
Also called a probabilistic network, belief network, or causal network. A BN is a specific type of graphical model represented as a directed acyclic graph. It consists of:
- a set of variables (nodes) V = {1, 2, ..., k}
- a set of dependencies (edges) D
- a set of probability distribution functions (pdfs) P, one per variable

Assumptions (probability axioms):
- 0 \le P(X) \le 1
- P(X) = 1 if and only if X is certain
- If X and Y are mutually exclusive, then P(X \vee Y) = P(X) + P(Y)

Joint probability: P(X, Y) = P(X | Y) P(Y)

Bayes' rule:
  P(X | Y) = P(Y | X) P(X) / P(Y)
where X represents the hypothesis, Y represents the evidence, P(Y | X) is the likelihood, and P(X | Y) is the posterior probability.

If X and Y are conditionally independent given Z: P(X | Z, Y) = P(X | Z).

Given some certain evidence, a BN operates by propagating beliefs throughout the network. For a chain Z -> Y -> U -> V:
  P(Z, Y, U, V) = P(Z) P(Y | Z) P(U | Y) P(V | U)
In general,
  P(X_1, ..., X_n) = \prod_{i=1}^{n} P(X_i | Pa(X_i))
where Pa(X_i) denotes the parents of node X_i.

Explaining away
If a node is observed, its parents become dependent: the two causes (parents) compete to explain the observed data (child).

Tasks in Bayesian Network
- Inference
- Learning

Inference
Inference is the task of computing the probability of each state of a node in a BN when the other variables are known. Method: divide the set of BN nodes into non-overlapping subsets of conditionally independent nodes.

Example. Let the network nodes Z_L = X_N \cup Y_M (L = N + M) consist of the hidden variables X_N = {x_0, ..., x_{N-1}} and the observed variables Y_M = {y_0, ..., y_{M-1}}, and let U_K \subseteq Z_L. Given that Y is the observed variable, the goal is to find the conditional pdf P(U_K | Y) over U_K.

Case 1: U_K \subseteq Y.
  P(U_K | Y) = \prod_{i=1}^{K} \delta(u_i - y_i),  where \delta(x) = 1 if x = 0, and 0 otherwise.

Case 2: U_K \subseteq X.
  P(U_K | Y) = P(U_K, Y) / P(Y) = P(U_K, Y) / \sum_{U_K} P(U_K, Y),
  where P(U_K, Y) = \sum_{x \in X \setminus U_K} P(x, U_K, Y).
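To make Case 2 concrete, here is a minimal brute-force sketch in Python. It reuses the chain Z -> Y -> U -> V from the propagation example above; the numeric tables and function names are illustrative assumptions, not from the slides. It computes P(U | Y) by summing the joint over the hidden variables outside U_K and normalizing.

```python
# Brute-force inference sketch for the chain Z -> Y -> U -> V, whose joint
# factorizes as P(Z,Y,U,V) = P(Z) P(Y|Z) P(U|Y) P(V|U).
# All tables below are made-up illustrative numbers over binary states {0, 1}.
from itertools import product

p_z = [0.6, 0.4]                          # P(Z)
p_y_given_z = [[0.7, 0.3], [0.2, 0.8]]    # P(Y|Z), indexed [z][y]
p_u_given_y = [[0.9, 0.1], [0.4, 0.6]]    # P(U|Y), indexed [y][u]
p_v_given_u = [[0.5, 0.5], [0.1, 0.9]]    # P(V|U), indexed [u][v]

def joint(z, y, u, v):
    """P(Z=z, Y=y, U=u, V=v) from the chain factorization."""
    return p_z[z] * p_y_given_z[z][y] * p_u_given_y[y][u] * p_v_given_u[u][v]

def posterior_u(y_obs):
    """Case 2 of the slides: P(U | Y=y_obs) = sum_{Z,V} P(Z,y_obs,U,V) / P(y_obs)."""
    unnorm = [sum(joint(z, y_obs, u, v) for z, v in product([0, 1], repeat=2))
              for u in [0, 1]]
    total = sum(unnorm)                   # this marginal is P(Y = y_obs)
    return [p / total for p in unnorm]    # normalize to get the posterior

print(posterior_u(y_obs=1))               # e.g. P(U | Y=1)
```

Enumeration like this is exponential in the number of hidden variables; the subset-decomposition method named above is what makes inference tractable in larger networks.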
Learning
Goal: completing the missing beliefs in the network, i.e., adjusting the parameters of the Bayesian network so that the pdfs defined by the network sufficiently describe the statistical behavior of the observed data.

Let M be a BN model, \theta the parameters of its probability distributions, and Z_L the observed data. Goal: estimate \theta so as to maximize the posterior probability
  P(M | Z_L) = [P(M) / P(Z_L)] \int P(Z_L | \theta, M) P(\theta | M) d\theta
Assume P(\theta | Z_L, M) is highly peaked around the maximum likelihood estimate \theta_{ML}; then
  P(M | Z_L) \approx [P(M) / P(Z_L)] P(Z_L | \theta_{ML}, M) P(\theta_{ML} | M),
  \theta_{ML} = \arg\max_{\theta} \log P(Z_L | \theta).

Dynamic Bayesian Network (DBN)
A Bayesian network extended over a time series to represent temporal dependencies; it changes, or evolves, dynamically over time. A DBN is a directed graphical model of stochastic processes, aimed especially at time-series modeling, and it satisfies the Markov condition: the state of the system at time t depends only on its immediate past state at time t-1.

Representation
The network is unrolled into time slices t_1, t_2, ..., t_k. The transition matrix that represents these time dependencies is called the Conditional Probability Table (CPT).

Description
- T: the time boundary we are investigating
- Y = {y_0, ..., y_{T-1}}: observable variables
- X = {x_0, ..., x_{T-1}}: hidden-state variables

  P(X, Y) = P(x_0) \prod_{t=1}^{T-1} P(x_t | x_{t-1}) \prod_{t=0}^{T-1} P(y_t | x_t)

- P(x_t | x_{t-1}): state transition pdfs, specifying the time dependencies between states.
- P(y_t | x_t): observation pdfs, specifying the dependence of the observation nodes on the other nodes at time slice t.
- P(x_0): initial state distribution.

Tasks in DBN
- Inference
- Decoding
- Learning
- Pruning

Inference
Estimating the pdf of the unknown states from the given observations and the initial probability distributions.
Goal: finding P(X_0^{T-1} | Y_0^{T-1}), where
- Y_0^{T-1} = {y_0, ..., y_{T-1}}: a finite set of T consecutive observations
- X_0^{T-1} = {x_0, ..., x_{T-1}}: the set of corresponding hidden variables
(Figure: the DBN unrolled over time, with a hidden node x_t and an observation node y_t in each time slice.)

Decoding
Finding the best-fitting probability values for the hidden states that have generated the known observations.
Goal: determine the sequence of hidden states with the highest probability (see the code sketch at the end of the document):
  \hat{X}_0^{T-1} = \arg\max_{X_0^{T-1}} P(X_0^{T-1} | Y_0^{T-1})

Learning
Given a number of observations, estimating the parameters of the DBN that best fit the observed data.
Goal: maximizing the joint probability distribution
  \log P(X_0^{T-1}, Y_0^{T-1} | \theta) = \sum_{t=1}^{T-1} \log P(x_t | x_{t-1}) + \sum_{t=0}^{T-1} \log P(y_t | x_t) + \log P(x_0)
where \theta is the model parameter vector; the estimate is found by setting
  \nabla_{\theta} [\log P(X_0^{T-1}, Y_0^{T-1} | \theta)] = 0.

Pruning
An important but difficult task in DBNs: distinguishing which nodes are important for inference, and removing the unimportant ones. Actions:
- deleting states from a particular node
- removing the connection between nodes
- removing a node from the network

At time slice t, let W_1(t), ..., W_q(t) be designated world nodes, a subset of the nodes representing the part we want to inspect, and let V(t_k), t_k < t, be nodes of earlier slices. If the state of each W_i(t) is known, i.e., P(W_i(t) = s_i) = 1, then the nodes V(t_k) are no longer relevant to the overall goal of the inference. Thus we can (1) delete all nodes V(t_k), and (2) incorporate the knowledge that W_i(t) = s_i.

Future work
- Probabilistic reasoning in multiagent systems.
- Different DBNs and their applications.
- Discussion of DBN problems.
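Closing example: the Decoding task above. The slides do not name an algorithm, but the stated \arg\max is standardly solved by Viterbi-style dynamic programming over log probabilities; the sketch below assumes two hidden states and made-up transition/observation tables, so all numbers and names are illustrative.

```python
# Viterbi-style decoding sketch for the DBN factorization above:
# X_hat = argmax_X P(x_0) * prod_t P(x_t | x_{t-1}) * prod_t P(y_t | x_t).
# The two-state tables below are made-up illustrative numbers.
import math

p_x0 = [0.5, 0.5]                      # initial state distribution P(x_0)
p_trans = [[0.8, 0.2], [0.3, 0.7]]     # state transition pdfs P(x_t | x_{t-1})
p_obs = [[0.9, 0.1], [0.2, 0.8]]       # observation pdfs P(y_t | x_t)

def viterbi(obs):
    """Return the most probable hidden-state sequence for the observations."""
    n = len(p_x0)
    # delta[s] = best log-probability of any state path ending in state s
    delta = [math.log(p_x0[s]) + math.log(p_obs[s][obs[0]]) for s in range(n)]
    back = []                          # backpointers for path recovery
    for y in obs[1:]:
        prev, delta, ptr = delta, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + math.log(p_trans[r][s]))
            delta.append(prev[best] + math.log(p_trans[best][s])
                         + math.log(p_obs[s][y]))
            ptr.append(best)
        back.append(ptr)
    # Trace back from the best final state to recover the full sequence.
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

print(viterbi([0, 0, 1, 1, 1]))        # decode a short observation sequence
```

Working in log space keeps the products from underflowing on long sequences; the same table layout would serve the forward algorithm for the DBN Inference task.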