SYMBOLIC BAYESIAN NETWORKS

Edwin DIDAY (1), Richard EMILION (2, *)
1. CEREMADE, University Paris-Dauphine
2. MAPMO, University of Orléans, France
* Contact author: richard.emilion@univ-orleans.fr
Keywords: Bayesian network, Conditional distribution, Dirichlet distribution, Independence test,
Symbolic.
Bayesian networks (see e.g. Darwiche, 2009) are probabilistic directed acyclic graphs used to model the behavior of a system through conditional distributions. They generally deal with correlated categorical or real-valued random variables. We consider the case of Bayesian networks that deal with probability-distribution-valued random variables.
1 Statistical setting
Let $X = (X_1, \dots, X_j, \dots, X_p)$ be a random vector, where $p \ge 1$ is an integer and each $X_j$ takes values in the space of probability measures defined on a measurable space $(V_j, \mathcal{V}_j)$, $j = 1, \dots, p$.
Let $(X_{k,1}, \dots, X_{k,j}, \dots, X_{k,p})$, $k = 1, \dots, K$, be a sample of size $K$ of $X$. Here $k$ can be regarded as a row index and $j$ as a column index.
2 Motivation
In practice, the sample $(X_{k,1}, \dots, X_{k,j}, \dots, X_{k,p})$, $k = 1, \dots, K$, is not observed but only estimated from observed data. In symbolic data analysis (SDA), each observation belongs to one of $K$ disjoint classes, say $c_1, \dots, c_K$. The observations can be either vectors in $\prod_{j=1}^{p} V_j$ or elements of some $V_j$, as seen in the two examples below, which illustrate two different situations.
The empirical distribution of the data in $V_j$ that belong to class $c_k$ is an estimate of the probability distribution $X_{k,j}$. This distribution is considered as the $j$-th descriptor of class $c_k$.
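As an illustration of this estimation step, the following Python sketch computes, for a single real-valued descriptor, the empirical frequency vector of each class over a common set of bins. The function name class_histograms, the choice of bins and the toy data are assumptions made for illustration only; they are not taken from the data sets discussed here.

```python
import numpy as np

def class_histograms(values, labels, bin_edges):
    """Return {class label: frequency vector} for one descriptor.

    Each frequency vector is the empirical distribution, over the given
    bins, of the observations that belong to that class.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    out = {}
    for c in np.unique(labels).tolist():
        counts, _ = np.histogram(values[labels == c], bins=bin_edges)
        # Assumes every class has at least one observation inside the bins.
        out[c] = counts / counts.sum()
    return out

# Hypothetical toy data: six observations, two classes, two bins.
values = [0.1, 0.2, 0.7, 0.9, 0.4, 0.6]
labels = ["c1", "c1", "c1", "c2", "c2", "c2"]
edges = [0.0, 0.5, 1.0]
hist = class_histograms(values, labels, edges)
```

Each entry of hist plays the role of one estimated distribution $X_{k,j}$, represented as a probability vector.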
2.1 Paired Samples
In the well-known Fisher's iris data set, $K = 3$, $c_1$ = 'setosa', $c_2$ = 'versicolor', $c_3$ = 'virginica', and $p = 4$. The observations are 50 irises in each of these 3 classes. The observed samples are paired since each iris is described by a vector of 4 measurements. As an example, $X_{3,2}$ is the probability distribution of sepal width in the 'virginica' class.
2.2 Unpaired Samples
Let $c_1, \dots, c_K$ be $K$ students, and consider $p$ professors who each grade several of the students' exams. Let $X_{k,j}$ be the distribution of the grades given to student $c_k$ by professor $j$. Here the samples are unpaired since the exams and the number of exams can differ from one professor to another.
2.3 Dependencies
Clearly, in the case of paired samples, within each class, the data of descriptor $j$ are correlated with the data of descriptor $j'$, while this correlation is meaningless in the case of unpaired samples. However, considering the $K$ pairs of estimated distributions $(X_{k,j}, X_{k,j'})$, $k = 1, \dots, K$, $j, j' = 1, \dots, p$, $j \ne j'$, it is seen that the random distributions $X_j$ and $X_{j'}$ can be correlated. This motivates us to consider Bayesian networks dealing with probability distributions.
3 The case of finite sets
Assume that all the sets $V_j$ are finite. Then each estimated distribution $X_{k,j}$ is just a probability vector of frequencies, whose size can differ from one index $j$ to another. Bayesian networks are built by testing the independence between $X_j$ and $X_{j'}$. Here, this can be done by using any independence test between two random vectors. We have used the indep.etest() function implemented in the 'energy' package for R (Szekely and Rizzo, 2013).
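The testing step can be sketched in Python as follows. The sketch uses a permutation test based on the sample distance covariance; this is a stand-in chosen for illustration, not the indep.etest() routine of the 'energy' package. The function names, the number of permutations and the toy data are assumptions.

```python
import numpy as np

def _centered_distances(a):
    """Double-centered matrix of pairwise Euclidean distances between rows."""
    d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=2)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def dcov_perm_test(x, y, n_perm=199, seed=0):
    """Permutation test of independence between the rows of x and y.

    x is K x m and y is K x m'; row k of each matrix holds the probability
    vector of class k. Returns (squared sample distance covariance, p-value).
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    A, B = _centered_distances(x), _centered_distances(y)
    stat = (A * B).mean()
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(len(y))
        # Permuting the rows of y permutes rows and columns of B.
        if (A * B[np.ix_(p, p)]).mean() >= stat:
            count += 1
    return stat, (count + 1) / (n_perm + 1)

# Hypothetical toy data: K = 40 classes, vectors of sizes 3 and 2.
rng = np.random.default_rng(1)
x = rng.dirichlet([1.0, 1.0, 1.0], size=40)
y = rng.dirichlet([1.0, 1.0], size=40)      # drawn independently of x
stat_xy, p_xy = dcov_perm_test(x, y)
stat_xx, p_xx = dcov_perm_test(x, x)        # perfectly dependent pair
```

Note that the two vectors need not have the same size, which matches the remark above that the size of $X_{k,j}$ can differ from one index $j$ to another. A small p-value for a pair $(j, j')$ is evidence against independence of $X_j$ and $X_{j'}$.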
Distributions and conditional distributions are estimated using kernels in the nonparametric case
while Dirichlet distributions are used in the parametric case.
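For the parametric case, a minimal Python sketch of fitting a Dirichlet distribution to the $K$ probability vectors of one descriptor is given below. It uses simple moment matching, which is only one possible estimator and is not claimed to be the one used here; the function name dirichlet_moments and the simulated data are assumptions.

```python
import numpy as np

def dirichlet_moments(P):
    """Moment-matching estimate of Dirichlet parameters.

    P is a K x m matrix whose rows are probability vectors. For a
    Dirichlet(alpha) vector with alpha0 = sum(alpha):
        E[p_i]   = alpha_i / alpha0
        Var[p_i] = E[p_i] * (1 - E[p_i]) / (alpha0 + 1)
    so each component yields an estimate of alpha0; these are averaged.
    """
    P = np.asarray(P, dtype=float)
    m = P.mean(axis=0)
    v = P.var(axis=0, ddof=1)
    alpha0 = np.mean(m * (1.0 - m) / v - 1.0)
    return alpha0 * m

# Simulated check with hypothetical parameters (2, 3, 5).
rng = np.random.default_rng(0)
P = rng.dirichlet([2.0, 3.0, 5.0], size=20000)
alpha_hat = dirichlet_moments(P)
```

With a large simulated sample, alpha_hat should be close to the parameters used for the simulation; with only a few classes, as is common in practice, the estimate can be much less stable.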
4 The case of densities
Assume that each $V_j$ is a measurable subset of some $\mathbb{R}^{d_j}$ and that $X_{k,j}$ has a density $f_{k,j}$ w.r.t. the Lebesgue measure. Then independence tests can be performed and conditional distributions can be estimated using functional data analysis methods (Ramsay and Silverman, 2005). One can either use a finite number of coordinates on some basis, reducing the problem to the preceding finite-sets case, or use kernel estimators w.r.t. a distance on the space of functions.
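The first option can be sketched as follows, assuming for illustration that $d_j = 1$, that $V_j = [0, 1]$, and that the orthonormal cosine basis $\varphi_m(x) = \sqrt{2}\cos(m\pi x)$, $m \ge 1$, is used. Since the coordinate of $f_{k,j}$ on $\varphi_m$ is $\int \varphi_m f_{k,j} = E[\varphi_m(Z)]$ for $Z$ with density $f_{k,j}$, it can be estimated by averaging $\varphi_m$ over the observations of class $c_k$. The basis, the function name cosine_coefficients and the simulated data are assumptions, not choices stated in the text.

```python
import numpy as np

def cosine_coefficients(sample, n_coef=5):
    """Estimate the first n_coef cosine-basis coordinates of a density on [0, 1].

    The m-th coordinate is estimated by the sample mean of
    sqrt(2) * cos(m * pi * x), for m = 1, ..., n_coef.
    """
    x = np.asarray(sample, dtype=float)
    m = np.arange(1, n_coef + 1)
    return np.sqrt(2.0) * np.cos(np.pi * np.outer(x, m)).mean(axis=0)

# Hypothetical data: two classes with different densities on [0, 1].
rng = np.random.default_rng(0)
samples = [rng.beta(2.0, 5.0, size=200), rng.beta(5.0, 2.0, size=300)]

# One row of coordinates per class; rows can be fed to a vector-based test.
coords = np.vstack([cosine_coefficients(s, n_coef=4) for s in samples])
```

Each row of coords is a finite-dimensional summary of one density, so independence tests between random vectors apply to these rows as in the finite-sets case.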
References
Darwiche, A. (2009). Modeling and Reasoning with Bayesian Networks. Cambridge University
Press.
Ramsay, J.O., Silverman, B.W. (2005). Functional Data Analysis. Springer.
Szekely, G.J., Rizzo, M.L. (2013). The distance correlation t-test of independence in high dimension. J. Multivariate Anal. 117, 193–213. http://dx.doi.org/10.1016/j.jmva.2013.02.012