Graphical Models for Marked Point Processes
based on Local Independence
VANESSA DIDELEZ
University College London
Abstract
A new class of graphical models capturing the dependence structure of events
that occur in time is proposed. It is based on so–called local dependence, instead
of conditional dependence, as the former is better suited for representing dynamic
dependencies. Local dependence is an asymmetric concept similar to Granger causality and thus leads to properties of the graphs that cannot be treated with classical
graph theory. Consequently, a new graph separation is introduced and its implications for the underlying model are discussed. Besides this crucial difference to the
classical graphical models there are a number of analogies, e.g. the factorisation of
the likelihood. The benefits of local (in)dependence graphs are discussed with regard
to the reasoning about and understanding of dynamic dependencies as well as to
computational simplifications.
Keywords: Event history analysis; Conditional independence; Counting processes; Granger causality; Graphoid; Multistate models.
1 Introduction
Graphical models are used to deal with complex data structures that arise in many different fields whenever the interrelationship of variables in a multivariate setting is investigated. The analysis of such data typically involves considerable computational effort and
the multifaceted results require careful interpretation. The underlying idea of graphical
models is to represent the conditional dependence structure of the variables by a graph, facilitating the interpretation and helping to reduce the computational effort by indicating
e.g. how the structure of the whole system can be split into smaller sub-graphs corresponding to simpler and lower dimensional sub-models. Graphical models, investigated
and refined over the last two decades, have proven to be a valuable tool for probabilistic
modelling and multivariate data analysis in such different fields as expert systems and
artificial intelligence (Cowell et al., 1999, Jordan, 1999), hierarchical Bayesian modelling
(cf. BUGS project http://www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml), causal reasoning (Pearl, 2000; Spirtes et al., 2000), as well as sociological, bio-medical, and econometric applications. For overviews and many different applications see for instance the
monographs by Whittaker (1990), Cox and Wermuth (1996), Edwards (2000).
In contrast, the application of graphical models to time dependent data, such as event
histories or time series, is only slowly getting under way. Dahlhaus (2000) has proposed a
way of graphically modelling multivariate time series, but this approach does not capture
Research Report No. 258, Department of Statistical Science, University College London.
Date: October 2005.
the dynamics, i.e. the way in which the present or future of the system depends on, or is affected
by, the past. In contrast, the idea of Eichler (1999, 2000) is, instead of representing conditional (in)dependence, to graphically model dependencies based on Granger–causality,
thus capturing in a more intuitive way the dynamic structure of multivariate time series.
As will be discussed later, Granger (non)-causality can be regarded as the discrete time
version of local (in)dependence that we will use to investigate the structure of marked
point processes.
For continuous time processes, one approach, called Dynamic Bayesian networks, is to
discretise time and provide directed acyclic graphs that encode the independence structure for the transitions from t to t + 1 (Dean and Kanazawa, 1989). The method proposed
by Nodelman et al. (2002, 2003) comes closest to the kind of graphs that we are suggesting. In their Continuous Time Bayesian Networks they propose to represent a multistate
Markov process with nodes corresponding to subprocesses and edges corresponding to
dependencies of transition rates on states of other subprocesses. Our approach is more
general in that we consider continuous time marked point processes and investigate the
properties of the resulting graphs in more detail.
Marked point processes are commonly used to model event history data. The term event
history data or life history data seems to originate from sociology where it is often of
interest to investigate the dynamics behind events such as finishing college, finding a job,
getting married, starting a family, durations of unemployment, illness etc. Recognition of
the importance of longitudinal studies for gaining insight into inherently dynamic systems
leads to more and more surveys being carried out, making event history data publicly
available. But such data also occurs in medical contexts, e.g. in survival analysis with
intermediate events such as the onset of a side effect, change of medication etc. The
complex nature of such data does not need to be pointed out; for a discussion in particular
of the problem of time dependent confounding see Keiding (1999).
For a graphical representation of the dependencies in event history data we need to define a suitable notion of dynamic dependence. In this paper we use the so–called local
independence, first proposed by Schweder (1970) for the case of Markov processes and applied e.g. in Aalen et al. (1980). It has been generalised to processes with a Doob–Meyer
decomposition by Aalen (1987) who focusses on the bivariate case, i.e. local dependence
between two processes. The analogy of the bivariate case to Granger-non-causality has
been pointed out by Florens and Fougère (1996). Here we extend this approach to more
than two processes, namely the counting processes associated with a marked point process. The basic idea of local independence is that the intensity of one type of event is
independent of certain past events once we know about specific other past events. Note
that the notion of ‘local independence’ used by Allard et al. (2001) is a different (symmetric) one.
The outline of the paper is as follows. We first set out the necessary notations and assumptions for marked point processes in Section 2.1 followed by the formal definition of
local independence in Section 2.2, the emphasis here being on the generalisation to a version that allows conditioning on the past of other processes and hence describes dynamic
dependencies for multivariate processes. Section 2.3 defines graphs that are appropriate
to represent local (in)dependence. These graphs are different from the popular directed
acyclic graphs (DAGs) in that they allow cycles. Also, a new notion of graph separation, called δ–separation, is introduced which is required because local independence is
asymmetric and hence δ–separation needs to be asymmetric.
The main definitions and results are given in Section 3. Local independence graphs are
defined and their properties investigated. In particular we prove that δ-separation can
indeed be used to read local independencies off the graph, i.e. we show the equivalence
of the dynamic versions of the pairwise local and global graphical Markov properties,
paralleling classical graphical models. Also, in Section 3.3, the likelihood of a process
with given local independence graph is investigated and it is shown how it factorises in a
useful way.
The potential of local independence graphs is discussed in Section 4.
Apart from providing all the proofs, the appendix introduces asymmetric graphoids in
order to investigate whether δ-separation and local independence satisfy the asymmetric
versions of the graphoid axioms that are known to hold under certain assumptions for the
classical graph separation and conditional independence. This provides a deeper insight into
the properties of local independence and is used to carry out the proof of the equivalence
of the Markov properties.
2 Preliminaries
We start by setting out the notation and giving an overview of the basic notions required. Marked point processes and their relation to counting processes are presented in
Section 2.1 using the notation of Andersen et al. (1993). In Section 2.2 the concept of
local independence is explained in some detail. The necessary terminology for graphs is
introduced in Section 2.3.
2.1 Marked point processes and counting processes
Let E = {e1 , . . . , eK }, K < ∞, denote the (finite) mark space, i.e. the set containing all
types of events of interest, and T the time space in which the observations take place. We
assume that time is measured continuously so that we have T = [0, τ ] or T = [0, τ ) where
τ < ∞. The marked point process (MPP) Y consists of events given by pairs of variables
(Ts , Es ), s = 1, 2, . . ., on a probability space (Ω, F , P ) where Ts ∈ T , 0 < T1 < T2 . . ., are
the times of occurrences of the respective types of events Es ∈ E.
The mark specific counting processes Nk (t) associated with an MPP are given by

Nk (t) = Σ_{s=1}^{∞} 1{Ts ≤ t; Es = ek },    k = 1, . . . , K.
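As a concrete illustration, the mark specific counting processes can be evaluated directly from a finite realisation of an MPP. The following sketch uses a hypothetical event list with two marks; it simply counts the events of a given mark up to time t.

```python
# A minimal sketch (hypothetical event list, marks 'a' and 'b'): evaluating the
# mark specific counting process N_k(t) from a finite realisation of an MPP.
from bisect import bisect_right

def jump_times(events, mark):
    """Ordered occurrence times T_s whose mark E_s equals the given mark."""
    return sorted(t for t, e in events if e == mark)

def N(events, mark, t):
    """N_k(t) = number of events of the given mark with T_s <= t."""
    return bisect_right(jump_times(events, mark), t)

events = [(0.7, 'a'), (1.2, 'b'), (3.5, 'a'), (3.9, 'a'), (8.1, 'b')]
print(N(events, 'a', 3.5))  # 2 (counts the jumps at 0.7 and 3.5)
```

Note that `bisect_right` implements the "≤ t" convention of the definition, so the process jumps exactly at the occurrence times.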
For the applications we have in mind it is reasonable to assume that the cumulative counting process Σk Nk is non–explosive. We denote the internal filtration of a marked point
process by Ft = σ{(Ts , Es ) | Ts ≤ t, Es ∈ E} which is equal to σ{(N1 (s), . . . , NK (s)) |
s ≤ t}. Further, for A ⊂ {1, . . . , K}, we define FtA = σ{Na (s) | a ∈ A, s ≤ t} yielding
the filtrations of subprocesses, in particular Ftk is the internal filtration of an individual
counting process Nk .
Under rather general assumptions (cf. Fleming and Harrington, 1991, p. 61), a Doob–
Meyer decomposition of Nk (t) into a compensator and a martingale exists. Both these
processes depend on the considered filtration which is here taken to be the internal filtration of the whole MPP Y so that the Ft –compensator of Nk is given as Λk (t) = ∫_0^t Λk (ds)
with the interpretation
Λk (dt) = E(Nk (dt) | Ft− ).
(1)
The differences Nk − Λk are Ft –martingales. If the compensator is absolutely continuous
we denote the corresponding intensity process by λk (t).
It will be important to distinguish between the Ft –compensator Λk , based on the past
of the whole MPP, and the FtA –compensators ΛAk , A ⊂ {1, . . . , K}, based on the past of
the subprocess on marks in A, i.e. ΛAk (dt) = E(Nk (dt) | FtA− ). It can be computed using the
innovation theorem (Brémaud, 1981, pp. 83)

ΛAk (dt) = E(Λk (dt) | FtA− )    ∀ t ∈ T ,
and a particular way of doing so is given in Arjas et al. (1992).
We assume that the multivariate counting process N = (N1 , . . . , NK ) has the property
that no jumps occur at the same time. Formally, this means that the Ft –martingales
Nk − Λk are orthogonal. It implies for the MPP that the marks in E are distinct, i.e. they
are defined such that they cannot occur systematically at the same time.
2.2 Local independence
The bivariate case is defined as follows (cf. Aalen, 1987).
Definition 2.1 Local independence (bivariate).
Let Y be an MPP with E = {e1 , e2 } and N1 and N2 the associated counting processes on
(Ω, F , P ). Then, N1 is said to be locally independent of N2 over T if Λ1 (t) is measurable
w.r.t. Ft1 for all t ∈ T . Otherwise we speak of local dependence.
The process N1 being locally independent of N2 is symbolised by N2 →/ N1 . Equivalently
we will sometimes say that e1 is locally independent of e2 , or e2 →/ e1 .
The essence of the above definition is that the compensator Λ1 (t), i.e. our ‘short–term’
prediction of N1 , remains the same under the reduced filtration Ft1 as compared to the full
one Ft . This implies that we do not lose any essential information by ignoring how often
and when event e2 has occurred before t. One could say that if N2 →/ N1 then the present
of N1 is conditionally independent of the past of N2 given the past of N1 , symbolising this
by

N1 (t) ⊥⊥ Ft2− | Ft1− ,    (2)
where A ⊥⊥ B | C means ‘A is conditionally independent of B given C’ (cf. Dawid, 1979).
For general processes, this is a stronger property than local independence but it holds for
marked point processes as their distributions are determined by the compensators. In addition, N1 (t) ⊥⊥ N2 (t) will only hold if the two processes are mutually locally independent
of each other, hence the name local independence.
Note that the Ft1 –measurability of Λ1 (t) would trivially be true if e1 = e2 but, in such
a case, we would not want to speak of independence of e1 and e2 . This is the reason for
excluding jumps at the same time.
As can easily be checked, local (in)dependence need be neither symmetric, reflexive nor
transitive. However, since in most practical situations a subprocess depends at least on
its own past we will assume that for all considered processes local dependence is reflexive
(i.e. a process is always locally dependent on itself). An example for a subprocess that
depends only on the history of a different subprocess and not on its own history is given
in Cox and Isham (1980, p. 122).
Example 1. Skin disease.
In a study with women of a certain age, Aalen et al. (1980) consider two events: occurrence of a particular skin disease and onset of menopause. Their analysis reveals that the
intensity for developing this skin disease is greater once the menopause has started than
before. In contrast, and as one would expect, the intensity for onset of menopause does
not change once the person has developed this skin disease. We can therefore say that
menopause is locally independent of this skin disease but not vice versa.
This particular analysis (Aalen et al., 1980) is based on a multistate model, with the states
‘0 = initial state’, ‘S = skin disease has occurred (but no menopause)’, ‘M = menopause
has started (but no skin disease)’ and ‘M/S = both menopause and skin disease have
started’. The assumptions about the transitions are that the transition rates α0→S , α0→M ,
αM→M/S , αS→M/S are non-zero but all others are zero. In particular α0→M/S = 0 reflects
the assumption that skin disease and menopause do not start systematically at the same
time, corresponding to the above ‘no jumps at the same time’ assumption. It is well
known that multistate (Markov) models can be represented using counting processes for
each type of transition. The local independence found in the study is then simply equivalent to α0→M = αS→M/S , while α0→S < αM→M/S .
A more theoretical example for local independence of a bivariate Poisson process (without
calling it that) is given in Daley and Vere–Jones (2003, p. 250).
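The correspondence between the multistate formulation and local independence can be made concrete in code. The sketch below uses illustrative rate values, not those of the study; local independence of menopause from skin disease is exactly the equality α0→M = αS→M/S, i.e. the menopause intensity does not depend on the skin-disease component of the state.

```python
# Transition rates of the four-state model (illustrative values, an assumption).
# States: '0' initial, 'S' skin disease only, 'M' menopause only, 'MS' both.
rates = {('0', 'S'): 0.02, ('0', 'M'): 0.05,
         ('M', 'MS'): 0.08, ('S', 'MS'): 0.05}

def menopause_intensity(state):
    """Intensity of the menopause-onset event given the current state."""
    if state == '0':
        return rates[('0', 'M')]
    if state == 'S':
        return rates[('S', 'MS')]
    return 0.0  # menopause has already occurred

def skin_intensity(state):
    """Intensity of the skin-disease event given the current state."""
    if state == '0':
        return rates[('0', 'S')]
    if state == 'M':
        return rates[('M', 'MS')]
    return 0.0  # skin disease has already occurred

# Menopause is locally independent of skin disease: same intensity either way.
print(menopause_intensity('0') == menopause_intensity('S'))  # True
# Skin disease is locally dependent on menopause: 0.02 < 0.08.
print(skin_intensity('0') < skin_intensity('M'))             # True
```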
Let us now turn to the case of more than two types of events. This requires conditioning
on the past of other processes as follows.
Definition 2.2 Local independence (multivariate).
Let N = (N1 , . . . , NK ) be a multivariate counting process associated with an MPP. Let
further A, B, C be disjoint subsets of {1, . . . , K}. We then say that a subprocess NB is
locally independent of NA given NC over T if all FtA∪B∪C –compensators Λk , k ∈ B, are
measurable with respect to FtB∪C for all t ∈ T . This is denoted by NA →/ NB | NC or
briefly A →/ B | C. Otherwise, NB is locally dependent on NA given NC , i.e. A → B | C.
Conditioning on a subset C thus means that we retain the information about whether and
when events with marks in C have occurred in the past when considering the intensities
for marks in B. If these intensities are independent of the information on whether and
when events with marks in A have occurred we have conditional local independence. In
analogy to (2) the above definition of multivariate local independence is equivalent to

NB (t) ⊥⊥ FtA− | FtB∪C−    ∀ t ∈ T .    (3)
Example 2. Graft versus host disease.
Keiding et al. (2001) consider patients who have received bone marrow transplantation
as a treatment for leukaemia, some of whom develop graft versus host disease (GvHD),
where the transplanted immune cells attack the host tissues. The events of interest are
two types of GvHD, acute and chronic, as well as relapse and death, where the latter
both are terminal events (once a relapse occurs the patient is taken out of the study).
Patients can develop acute GvHD before it becomes chronic, or can develop chronic
GvHD without going through the acute stage first. Using a multistate model, one of the
findings was that the intensity for relapse after having only gone through chronic GvHD
is the same as after having gone through both acute and chronic GvHD. In addition,
the intensity for relapse after developing only acute GvHD was the same as when not
having had any GvHD yet. We may therefore say that relapse is locally independent of
acute GvHD given chronic GvHD, and given that the person is still alive of course. In
other words, the intensity for relapse is fully explained by knowing whether or not the
patient has developed chronic GvHD, knowledge of acute GvHD not being relevant. Note
that if we ignore whether or not the patient has reached the chronic stage, relapse will
not be (marginally) locally independent of acute GvHD because acute GvHD increases
the intensity for chronic GvHD. This example illustrates that the bivariate Definition 2.1
captures a different property than the multivariate Definition 2.2, where we condition on
the past of other processes, just as conditional independence of variables neither implies
nor follows from marginal independence.
An additional finding of the study was that the event of death is not locally independent
of any of the other events. Its intensity increases clearly when a patient has gone through
both acute and chronic GvHD, as compared to only acute or only chronic GvHD.
In order to see the relation with local independence, we briefly review Granger non–
causality (Granger, 1969). Let XV (t) = {XV (t) | t ∈ Z} with XV (t) = (X1 (t), . . . , XK (t))
be a multivariate time series, where V = {1, . . . , K} is the index set. For any A ⊂ V we
define XA = {XA (t)} as the multivariate subprocess with components Xa , a ∈ A. Further
let X̄A (t) = {XA (s) | s ≤ t}. Then, for disjoint subsets A, B ⊂ V we say that XA is
strongly Granger–non-causal for XB up to horizon h ∈ ℕ if

XB (t + k) ⊥⊥ X̄A (t) | X̄V \A (t),    k = 1, . . . , h, for all t ∈ Z.

The interpretation for h = 1 is similar to that for local independence, i.e. the present value of XB is independent of the past of XA given its own past
and the past of all other components C = V \(A ∪ B), in analogy to (3).
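For a linear VAR(1) time series, strong Granger non-causality of XA for XB given the rest corresponds to a zero coefficient block. The sketch below, with assumed illustrative coefficients, shows that the one-step predictor of XB is then unaffected by the past value of XA.

```python
# A sketch for the discrete-time analogue (illustrative VAR(1) coefficients,
# an assumption): X(t+1) = A X(t) + noise with components (X_B, X_C, X_A).
# The zero in row B, column A makes X_A strongly Granger-non-causal for X_B
# given the rest (horizon h = 1).
A = [[0.5, 0.3, 0.0],   # X_B equation: coefficient on X_A is 0
     [0.2, 0.4, 0.1],   # X_C equation
     [0.0, 0.2, 0.6]]   # X_A equation

def predict_B(x):
    """One-step conditional mean of X_B given the current state x = X(t)."""
    return sum(a * xi for a, xi in zip(A[0], x))

x = [1.0, -2.0, 5.0]
x_perturbed = [1.0, -2.0, 100.0]   # change only the X_A coordinate
print(predict_B(x) == predict_B(x_perturbed))  # True: X_A's past is irrelevant
```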
Finally, in this section, we want to indicate how the definition of local independence can
be generalised to stopped processes. This can be of interest in particular when there are
absorbing states such as death. In that case all other events will be locally dependent
on this one because all intensities are zero once death has occurred. However, the dependence is ‘trivial’ and not of much interest. Let T be a Ftk –stopping time, e.g. the first
time an event of type ek occurs, and let NT = (N1T , . . . , NKT ) be a multivariate counting
process stopped at T . Then the compensators of NjT are given by ΛTj , j ∈ V , and local
independence can be formalised as follows.
Definition 2.3 Local independence for stopped processes.
Let NT = (N1T , . . . , NKT ) be a multivariate counting process associated with an MPP
and stopped at time T . Then we say that A →/ B | C if there exist FtB∪C –measurable
predictable processes Λ̃k , k ∈ B, such that the FtA∪B∪C –compensators of NBT are given
by ΛTk (t) = Λ̃k 1{t ≤ T }, k ∈ B.
The local independencies in a stopped process have to be interpreted as being valid as
long as t ≤ T .
2.3 Directed graphs and δ–separation
An obvious way of representing the local independence structure of an MPP by a graph is
to depict the marks as vertices and to use an arrow as symbol for local dependence. This
leads to directed graphs that may have more than one directed edge between a pair of
vertices in case of mutual local dependence, and that may have cycles. These graphs will
be introduced in some detail below as they differ considerably from the usual directed
acyclic graphs (DAGs) or undirected graphs (e.g. Cowell et al., 1999).
Graph notation and terminology.
A graph is a pair G = (V, E), where V = {1, . . . , K} is a finite set of vertices and E is a
set of edges. The graph is said to be undirected if E ⊂ E u (V ) = {{j, k} | j, k ∈ V, j ≠ k}. It
is called directed if E ⊂ E d (V ) = {(j, k) | j, k ∈ V, j ≠ k}.
Undirected edges {j, k} are depicted by lines, j — k, and directed edges (j, k) by arrows,
j −→ k. If (j, k) ∈ E and (k, j) ∈ E this is shown by stacked arrows, j ⇄ k.
For A ⊂ V the induced subgraph GA of a directed or undirected graph G is defined as
(A, EA ) with EA = E ∩ E d (A) or EA = E ∩ E u (A), respectively. In an undirected graph
a subgraph GA is called complete if all nodes in A are linked by edges, i.e. {j, k} ∈ E for
all j, k ∈ A, j ≠ k. A maximal complete subgraph is called a clique of G.
Consider a directed or undirected graph G = (V, E). An ordered (n + 1)–tuple (j0 , . . . , jn )
of distinct vertices is called an undirected path from j0 to jn if {ji−1 , ji } ∈ E and a directed path if (ji−1 , ji ) ∈ E for all i = 1, . . . , n. A (directed) path of length n with j0 = jn
is called a (directed) cycle. A subgraph π = (V ′ , E ′ ) of G with V ′ = {j0 , . . . , jn } and
E ′ = {e1 , . . . , en } ⊂ E is called a trail between j0 and jn if ei = (ji−1 , ji ) or ei = (ji , ji−1 )
or ei = {ji , ji−1 } for all i = 1, . . . , n.
For directed graphs we further require the following almost self-explanatory notations. Let
A ⊂ V . The parents of A are given by pa(A) = {k ∈ V \A|∃ j ∈ A : (k, j) ∈ E} whereas
the children are ch(A) = {k ∈ V \A|∃ j ∈ A : (j, k) ∈ E}. The set cl(A) = pa(A) ∪ A is
called the closure of A. The ancestors of A are defined as an(A) = {k ∈ V \A|∃ j ∈ A with
a directed path from k to j} while the descendants are de(A) = {k ∈ V \A|∃ j ∈ A with a
directed path from j to k}. Consequently, nd(A) = V \(de(A)∪A) are the non–descendants
of A.
If pa(A) = ∅, then A is called ancestral. For general A ⊂ V , An(A) is the smallest ancestral set containing A, given as A ∪ an(A).
Example 3. Directed graph.
Figure 1 is an example of a general directed graph, where V = {1, 2, 3, 4, 5}. It contains
directed cycles, e.g. 2 → 1 → 2 or 2 → 5 → 1 → 2. A directed path from 5 to 2 is given
Figure 1: General directed graph
via 1, while there are many trails between 5 and 2, e.g. 5 → 4 ← 3 → 2. The parents of 2,
for instance, are pa(2) = {1, 3} while the children are ch(2) = {1, 5}, hence the node 1 is
a parent and a child of 2 at the same time. The ancestors of 2 are an(2) = {1, 3, 5} while
the descendants are de(2) = {1, 4, 5}. The only ancestral sets in this graph are given by
{3}, {1, 2, 3, 5} and, trivially, V itself.
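These operations are straightforward to compute. The sketch below implements parents, children, ancestors and descendants for the graph of Example 3; the edge set is an assumption, reconstructed so as to be consistent with the parent, child, ancestor and descendant sets stated in the example.

```python
# Edge set of Figure 1 as reconstructed from the text of Example 3 (assumption).
E = {(1, 2), (2, 1), (1, 5), (5, 1), (2, 5), (3, 1), (3, 2), (3, 4), (5, 4)}

def pa(A, edges):
    """Parents of A: vertices outside A with an arrow into A."""
    return {j for (j, k) in edges if k in A} - set(A)

def ch(A, edges):
    """Children of A: vertices outside A with an arrow from A."""
    return {k for (j, k) in edges if j in A} - set(A)

def an(A, edges):
    """Ancestors of A: vertices outside A with a directed path into A."""
    anc, frontier = set(), set(A)
    while frontier:
        frontier = pa(frontier, edges) - anc
        anc |= frontier
    return anc - set(A)

def de(A, edges):
    """Descendants of A: vertices outside A reachable by a directed path."""
    des, frontier = set(), set(A)
    while frontier:
        frontier = ch(frontier, edges) - des
        des |= frontier
    return des - set(A)

print(pa({2}, E) == {1, 3}, an({2}, E) == {1, 3, 5})  # True True
```

A set A is ancestral precisely when `pa(A, E)` is empty, which reproduces the ancestral sets {3} and {1, 2, 3, 5} listed above.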
Graph separation.
In undirected graphs we say that subsets A, B ⊂ V are separated by C ⊂ V if any path
from elements in A to elements in B is intersected by C. This is symbolised by A ⊥⊥g B | C.
For directed (not necessarily acyclic) graphs as introduced above we define a similar type
of separation that can be based on the moral graph.
The undirected version of a directed graph G is defined as G∼ = (V, E ∼ ) with E ∼ =
{{j, k}| (j, k) ∈ E or (k, j) ∈ E}, i.e. G∼ is the undirected graph replacing all arrows by
undirected edges. The moral graph Gm = (V, E m ) is given by inserting undirected edges
between any two vertices that have a common child (if they are not already joined anyway)
and then taking the undirected version of the resulting graph, i.e. E m = {{j, k}|∃ i ∈ V :
j, k ∈ paG (i)} ∪ E ∼ .
Now, we can define a separation for directed graphs which will allow us to read properties
of the local independence structure off these graphs as will be addressed in Sections 3.1
and 3.2.
Definition 2.4 δ–Separation.
Let G = (V, E) be a directed graph. For B ⊂ V , let GB denote the graph obtained by
deleting all directed edges of G starting in B, i.e. GB = (V, E B ) with E B = E\{(j, k)|j ∈
B, k ∈ V }.
Then, we say for pairwise disjoint subsets A, B, C ⊂ V that C δ–separates A from B
in G if A ⊥⊥g B | C in the undirected graph (GB An(A∪B∪C))m , i.e. the moral graph of the
subgraph on An(A ∪ B ∪ C), where the directed edges starting in B have previously been deleted.
In case that A, B and C are not disjoint we define that C δ–separates A from B if C\B
δ–separates A\(B ∪ C) from B. We define that the empty set is always separated from B.
Additionally, we define that the empty set δ–separates A from B if A and B are unconnected in (GB An(A∪B))m .
Figure 2: Moral graph to check if {3, 5} δ–separates {1, 2} from {4}.
Note that, except for the fact that we delete edges starting in B, δ–separation parallels
the corresponding separation for DAGs. However, it is the initial edge deletion which makes δ–separation asymmetric. Asymmetry is required because we aim at modelling an asymmetric concept of
dependence, local dependence. Further properties of δ–separation are discussed in Appendix 5.2.
Figure 3: Moral graph to check which sets are δ–separated from {5}.
Example 3 ctd. Directed graph.
For the graph in Figure 1 it seems that node 4 is only ‘directly influenced’ by 3 and 5. So
let us check if {3, 5} δ–separates {1, 2} from {4}. No edges out of 4 need to be removed
as there are none, but one edge between 3 and 5 has to be added due to their common
child 4. This results in the graph in Figure 2 where the extra edge is indicated as a dotted
line and a box indicates that arrows out of node 4 have to be deleted. We can see that
indeed the required separation holds.
We might further ask which sets are δ–separated from 5. This can be read off the graph
in Figure 3. The two edges out of 5 have been deleted and no edge has to be added as any
vertices with common children are already joined. Hence we can see that {1, 2} δ–separates
3 as well as 4 from 5, but also that 3 alone separates 4 from 5.
Finally consider a situation where δ–separation does not hold. Any directed path from 5
to 2 goes via 1, one might therefore think that {1} δ–separates {5} from {2}. The relevant
moral graph is given in Figure 4. Note that the node 4 has been deleted as it is not in the
minimal ancestral set of {1, 2, 5}. An edge between 3 and 5 has to be added due to their
common child node 1. This latter edge however creates a path from 5 to 2 which is not
intersected by 1 and hence the desired δ–separation does not hold.
Figure 4: Moral graph to check if {1} δ–separates {5} from {2}.
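The moralisation criterion of Definition 2.4 can be implemented directly. The following sketch checks the three worked cases above; the edge set of Figure 1 is again an assumption, reconstructed from the text of Example 3.

```python
# delta-separation via the moralisation criterion of Definition 2.4, on the
# reconstructed edge set of the Example 3 graph (an assumption).
E = {(1, 2), (2, 1), (1, 5), (5, 1), (2, 5), (3, 1), (3, 2), (3, 4), (5, 4)}

def parents(k, edges):
    return {j for (j, i) in edges if i == k}

def ancestral_closure(A, edges):
    """An(A): the set A together with all of its ancestors."""
    result = set(A)
    while True:
        new = {j for k in result for j in parents(k, edges)} - result
        if not new:
            return result
        result |= new

def delta_separates(C, A, B, edges):
    """Does C delta-separate A from B? (A, B, C pairwise disjoint.)"""
    A, B, C = set(A), set(B), set(C)
    # Step 1: delete all directed edges starting in B.
    eb = {(j, k) for (j, k) in edges if j not in B}
    # Step 2: restrict to the smallest ancestral set containing A, B and C.
    keep = ancestral_closure(A | B | C, eb)
    eb = {(j, k) for (j, k) in eb if j in keep and k in keep}
    # Step 3: moralise -- join common parents, then drop edge directions.
    und = {frozenset(e) for e in eb}
    for k in keep:
        ps = list(parents(k, eb))
        und |= {frozenset((p, q)) for p in ps for q in ps if p != q}
    # Step 4: undirected separation -- search from A without entering C.
    frontier = seen = A - C
    while frontier:
        frontier = {k for k in keep - C - seen
                    if any(frozenset((j, k)) in und for j in frontier)}
        if frontier & B:
            return False
        seen = seen | frontier
    return True

print(delta_separates({3, 5}, {1, 2}, {4}, E))  # True  (Figure 2)
print(delta_separates({1, 2}, {3}, {5}, E))     # True  (Figure 3)
print(delta_separates({3}, {4}, {5}, E))        # True
print(delta_separates({1}, {5}, {2}, E))        # False (Figure 4)
```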
An equivalent way of reading off δ–separation is formulated below using the notion of
blocked trails, similar to d–separation for DAGs (Pearl, 1988; Verma and Pearl, 1990).
For a directed graph we say that a trail between j and k is blocked by C if it contains a
vertex γ such that either (i) directed edges of the trail do not meet head–to–head at γ
and γ ∈ C, or (ii) directed edges of the trail meet head–to–head at γ and neither γ nor
any of its descendants is an element of C. Otherwise the trail is called active.
Proposition 2.5 Trail condition for δ–separation
Let G = (V, E) be a directed graph and A, B, C pairwise disjoint subsets of V . Call a trail
from A to B allowed if it contains no edge of the form (b, k), b ∈ B, k ∈ V \B.
Then C δ–separates A from B if and only if all allowed trails from A to B are blocked by C.
The proof is given in Appendix 5.1.
Example 3 ctd. Directed graph.
All allowed trails from 3 to 5 are

3 → 1 → 5,    3 → 1 → 2 → 5,    3 → 1 ← 2 → 5,
3 → 2 → 5,    3 → 2 → 1 → 5,    3 → 2 ← 1 → 5.

In each of these, at least one of the vertices 1 and 2 appears as a non–collider, i.e. the
directed edges of the trail do not meet head–to–head there; hence {1, 2} blocks every
allowed trail from 3 to 5.
Checking all allowed trails in this way seems rather cumbersome, so we will henceforth
mainly use the moralisation criterion according to Definition 2.4.
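For completeness, the brute-force trail check can be sketched as follows, again on the reconstructed edge set of Example 3 (an assumption): it enumerates all simple allowed trails and tests each for blocking.

```python
# Brute-force check of the trail criterion (Proposition 2.5) on the
# reconstructed Example 3 edge set (an assumption).
E = {(1, 2), (2, 1), (1, 5), (5, 1), (2, 5), (3, 1), (3, 2), (3, 4), (5, 4)}

def descendants(v, edges):
    des, frontier = set(), {v}
    while frontier:
        frontier = {k for (j, k) in edges if j in frontier} - des
        des |= frontier
    return des

def allowed_trails(a, b, edges, B):
    """All simple allowed trails from a to b: no edge starting in B is used.
    Each step is (tail, head, forward?) for one traversed directed edge."""
    usable = {e for e in edges if e[0] not in B}
    found = []
    def extend(v, steps, visited):
        if v == b:
            found.append(steps)
            return
        for (j, k) in usable:
            if j == v and k not in visited:
                extend(k, steps + [(j, k, True)], visited | {k})
            elif k == v and j not in visited:      # traverse edge backwards
                extend(j, steps + [(j, k, False)], visited | {j})
    extend(a, [], {a})
    return found

def blocked(steps, C, edges):
    """Blocked by C: some interior vertex is a non-collider in C, or a
    collider with neither itself nor any descendant in C."""
    for i in range(len(steps) - 1):
        v = steps[i][1] if steps[i][2] else steps[i][0]  # interior vertex
        collider = steps[i][2] and not steps[i + 1][2]   # both arrows hit v
        if collider:
            if v not in C and not (descendants(v, edges) & C):
                return True
        elif v in C:
            return True
    return False

trails = allowed_trails(3, 5, E, B={5})
print(len(trails))                                 # 6 allowed trails, as listed
print(all(blocked(t, {1, 2}, E) for t in trails))  # True: {1,2} blocks them all
# {1} does not block every allowed trail from 5 to 2 (cf. Figure 4).
print(any(not blocked(t, {1}, E) for t in allowed_trails(5, 2, E, B={2})))
```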
3 Local independence graphs
Having introduced the idea of local independence and suitable graphs we can now bring
both together and present the main definitions and results of this paper. In Section 3.1 we
define local independence graphs and in Section 3.2 it is explored what properties can be
read off such graphs. In analogy to conditional independence graphs we consider pairwise,
local, and global Markov properties. It is then shown in Section 3.3 that local independence
graphs induce a specific factorisation of the distribution which is again similar to the one
known for conditional independence graphs, in particular DAGs. Finally, some possible
extensions of local independence graphs are indicated in Section 3.4.
3.1 Definition of local independence graphs
The following defining property (4) is called the pairwise dynamic Markov property, where
we say dynamic to emphasise the difference to graphs based on conditional independence.
Definition 3.1 Local independence graph
Let NV = (N1 , . . . , NK ) be a multivariate counting process associated with an MPP Y with
mark space E = {e1 , . . . , eK }. Let further G = (V, E) be a directed graph, V = {1, . . . , K}.
Then, G is called the local independence graph of Y if for all j, k ∈ V
(j, k) ∈/ E  ⇔  {j} →/ {k} | V \{j, k}.    (4)
Example 1 ctd. Skin disease.
The local independence graph for the relation between menopause and skin disease is
very simple, cf. Figure 5. The absence of an arrow from skin disease to menopause indicates that the intensity for start of menopause does not change once the skin disease
has occurred. Nothing more can be read off such a graph when only two events are being
considered. Like for classical graphical models the interesting case is when there are more
than two nodes and we can investigate separations. However, notice that even with this
simple example there is no way of expressing the local independence using a graph based
on conditional independence for the two times T1 = ‘time of occurrence of skin disease’
and T2 = ‘time of occurrence of menopause’ as these are simply dependent.
Figure 5: Local independence graph for skin disease example.
Example 2 ctd. Graft versus host disease.
The relation between acute and chronic GvHD on the one hand and relapse and death on
the other hand is depicted in Figure 6. Figure 6(a) contains dotted arrows to indicate that,
because relapse and death are terminal events, the intensities for all other events logically
change (to zero) once either death or relapse have occurred. Figure 6(b) shows the graph
for the stopped process, stopping at the time when death or relapse occur, whichever is
earlier. As mentioned before, when interpreting this graph one has to remember that any
Figure 6: Local independence graph for GvHD example. (a) shows the original graph, (b)
shows the graph for the process stopped at the time when relapse or death occurs.
properties read off the graph are only valid as long as the patient is still alive and has not
yet relapsed.
The absence of a directed edge from acute GvHD to relapse, in both graphs of Figure 6,
reflects the local independence identified earlier, i.e. given we know whether or not the
patient has reached the chronic stage (as long as he or she is still alive) the intensity for
relapse does not change when knowing whether the patient has gone through the acute
stage. The directed edges present in the graph indicate the following. Firstly, acute GvHD
increases the intensity for chronic GvHD so that there must be an arrow from acute to
chronic GvHD. Secondly, it is impossible to go from the chronic back to the acute stage
so the intensity for the latter is then zero and hence we need an arrow from chronic to
acute GvHD. The intensity for the event of death is different for each of the four possible
previous histories: no GvHD, only acute, only chronic GvHD or both versions. Therefore
there are directed edges from both GvHD nodes.
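The pairwise property (4) suggests a simple recipe for building a local independence graph from a specification of which pasts each intensity depends on. The sketch below encodes the GvHD findings for the stopped process of Figure 6(b) as such a specification; the dependency sets are our reading of the example, not output of any fitted model.

```python
# Constructing a local independence graph from declared intensity dependencies,
# following the pairwise property (4): draw j -> k unless the intensity of
# mark k is measurable without the past of mark j. The dependency sets encode
# the GvHD example for the stopped process (an assumption based on the text).
depends_on = {                 # mark -> marks whose past its intensity needs
    'acute':   {'acute', 'chronic'},
    'chronic': {'acute', 'chronic'},
    'relapse': {'relapse', 'chronic'},       # acute GvHD is not needed
    'death':   {'death', 'acute', 'chronic'},
}

def local_independence_graph(depends_on):
    """Edges (j, k) of the local independence graph (self-loops omitted)."""
    return {(j, k) for k, deps in depends_on.items()
                   for j in deps if j != k}

edges = local_independence_graph(depends_on)
print(('acute', 'relapse') in edges)    # False: the local independence found
print(('chronic', 'relapse') in edges)  # True
```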
3.2 Dynamic Markov properties
In order to investigate what else can be read off local independence graphs we next present
two further, slightly stronger properties than the pairwise dynamic Markov property.
Definition 3.2 Local dynamic Markov property
Let G = (V, E) be a directed graph. For an MPP Y the property
    ∀ k ∈ V :  V\cl(k) ↛ {k} | pa(k),        (5)
is called the local dynamic Markov property w.r.t. G.
The above property (5) could for instance be violated if two components in pa(k) are a.s.
identical. Let for instance N1 = N2 a.s. Then it holds for any N3 that N1 ↛ N3 | N2 as well
as N2 ↛ N3 | N1, but not necessarily (N1, N2) ↛ N3, as would be claimed by property (5).
However, N1 = N2 a.s. is prevented by the orthogonality assumption, because for a.s.
identical counting processes the martingales resulting from a Doob–Meyer decomposition
are not orthogonal (Andersen et al., 1993, p. 73). As shown in Appendix 5.3, the exact
condition for property (5) to follow from (4) is that

    F_t^A ∩ F_t^B = F_t^{A∩B}   ∀ A, B ⊂ V, ∀ t ∈ T,        (6)

where we define F_t^∅ = {∅, Ω}. Property (6) formalises the intuitive condition that the
components of N are different enough to ensure that common events are necessarily due
to common components.
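To make the role of condition (6) concrete, here is a small worked check in the notation above: the counterexample N1 = N2 a.s. from the previous paragraph indeed violates (6), since with A = {1} and B = {2},

```latex
\mathcal{F}_t^{\{1\}} \cap \mathcal{F}_t^{\{2\}}
  = \mathcal{F}_t^{\{1\}}
  \neq \{\emptyset,\Omega\}
  = \mathcal{F}_t^{\emptyset}
  = \mathcal{F}_t^{\{1\}\cap\{2\}}
\qquad \text{as soon as } 0 < P\bigl(N_1(t)\geq 1\bigr) < 1 ,
```

so the common information in the two filtrations is not due to a common component.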
An even stronger dynamic Markov property is the global dynamic Markov property which
links δ–separation and local independence.
Definition 3.3 Global dynamic Markov property
Let NV = (N1 , . . . , NK ) be a multivariate counting process associated to an MPP Y and
G = (V, E) a directed graph. For disjoint subsets A, B, C ⊂ V the property
    C δ–separates A from B in G  ⇒  A ↛ B | C        (7)
is called the global dynamic Markov property w.r.t. G.
The significance of the global Markov property is that it provides a way to verify whether
a subset C ⊂ V\(A ∪ B) is given such that A ↛ B | C, i.e. local independence is preserved
even when ignoring the information in the events V\(A ∪ B ∪ C). This would imply that
F_t^{B∪C} is sufficient to assess the local independence N_A ↛ N_B | N_C. Clearly, this property
is useful to reduce complexity in a multivariate setting.
All prerequisites for the main result in this section have now been gathered.
Theorem 3.4 Equivalence of dynamic Markov properties
Let Y be a marked point process with local independence graph G = (V, E). Under the
assumption of (6) and further regularity conditions (cf. Appendix 5.3), the pairwise, local
and global dynamic Markov properties are equivalent.
The proof is given in Appendix 5.3.
Example 2 ctd. Graft versus host disease.
Given what we have said before, we should find that the nodes C = {‘chronic’, ‘death’}
δ–separate A =‘acute’ from B =‘relapse’ in the local independence graph, Figure 6(a).
In Figure 7(a) we remove the arrows out of ‘relapse’ and then, in Figure 7(b), we obtain
the undirected version (note that no edges need to be added as all nodes with common
children are already joined). In Figure 7(b) we can easily verify that ‘acute’ and ‘relapse’
are separated by the other two nodes, but ‘acute’ and ‘relapse’ are not separated if we do
not condition on ‘chronic’.
If the same graphical check is performed on the graph in Figure 6(b), it looks as if ‘chronic
GvHD’ is the only separating node, but it has to be kept in mind that this graph is only
valid as long as the patient is alive.
Figure 7: Checking for δ–separation by (a) removing arrows out of ‘relapse’ and verifying
separations in the undirected version (b).
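The graphical check just performed can also be carried out algorithmically. The following is a minimal sketch (not from the paper) of δ–separation as used above: delete the arrows out of B, restrict to the ancestral set, moralise, and test ordinary undirected separation. The edge set is our reconstruction of Figure 6(a), including the ‘trivial’ arrows out of the terminal events ‘relapse’ and ‘death’; the node names are ours.

```python
from itertools import combinations

def ancestors(edges, S):
    """All nodes with a directed path into S, together with S itself."""
    an = set(S)
    changed = True
    while changed:
        changed = False
        for (u, v) in edges:
            if v in an and u not in an:
                an.add(u)
                changed = True
    return an

def delta_separated(edges, A, B, C):
    """Does C delta-separate A from B?  Steps: (1) delete edges out of B,
    (2) restrict to An(A u B u C), (3) moralise, i.e. join parents of a
    common child and drop directions, (4) test whether every path from A
    to B in the resulting undirected graph is intersected by C."""
    A, B, C = set(A), set(B), set(C)
    e1 = {(u, v) for (u, v) in edges if u not in B}          # step (1)
    an = ancestors(e1, A | B | C)                            # step (2)
    e1 = {(u, v) for (u, v) in e1 if u in an and v in an}
    und = {frozenset(e) for e in e1 if len(set(e)) == 2}     # step (3)
    for child in an:
        for p, q in combinations({u for (u, v) in e1 if v == child}, 2):
            und.add(frozenset((p, q)))
    reach = set(A) - C                                       # step (4)
    frontier = list(reach)
    while frontier:
        x = frontier.pop()
        for e in und:
            if x in e:
                for y in e - {x}:
                    if y not in reach and y not in C:
                        reach.add(y)
                        frontier.append(y)
    return not (reach & B)

# Assumed reconstruction of Figure 6(a): 'relapse' and 'death' are
# terminal events, so each has (dotted) arrows into every other node.
edges = {("acute", "chronic"), ("chronic", "acute"),
         ("acute", "death"), ("chronic", "death"), ("chronic", "relapse"),
         ("death", "acute"), ("death", "chronic"), ("death", "relapse"),
         ("relapse", "acute"), ("relapse", "chronic"), ("relapse", "death")}

print(delta_separated(edges, {"acute"}, {"relapse"}, {"chronic", "death"}))  # True
print(delta_separated(edges, {"acute"}, {"relapse"}, {"death"}))             # False
```

Under this reconstruction, {‘chronic’, ‘death’} δ–separates ‘acute’ from ‘relapse’, while ‘death’ alone does not, in agreement with the discussion of Figure 7.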
The main point of Theorem 3.4 is that it allows us to check whether local independencies are
preserved when some components of the multivariate process are ignored, as illustrated in the
following example.
Example 4. Home visits.
This example is not taken from the literature but similar studies have been carried out
(e.g. Vass et al., 2002, 2004). In some countries programmes exist to assist the elderly
through regular home visits by a nurse. This is meant to reduce unnecessary hospitalisations
while increasing the person's quality of life, by allowing them to stay in a familiar
environment and giving them a feeling of being in control. It is hoped that this also
increases survival time. Assume that data are available only on the number and times of
home visits, hospitalisation and the time of death. In this case one might find that the
home visits have a negative effect on survival. This is because typically the nurses will
increase their visits if the patient’s health is deteriorating which might be a precursor of
death. An indicator for deteriorating health could e.g. be increasing dementia. The graph
in Figure 8(a) shows the local independence graph for the three observed processes, again
omitting the trivial arrows out of the node ‘death’. However, the true underlying structure might be as depicted in Figure 8(b), where once information on the dementia process
is included (in addition to hospitalisation) information about past home visits does not
change the intensity for death. It can be seen from the graph that analysing the effect
of home visits on survival including information on hospitalisation is not enough because
the node ‘hospitalisation’ does not separate ‘home visits’ from ‘death’. One can regard
‘dementia’ as an unobserved confounder creating an association between ‘home visits’ and
‘death’.
More surprisingly, ‘hospitalisation’ would still not be a separating set even if the home
visits were modulated, e.g. to take place exactly once a week like in a controlled study.
They would then be locally independent of any other process as they would be controlled
externally (this is called an exogenous process in the econometrics literature). But as can
Figure 8: Local independence graph for home visits example; (a) as one might expect for
the three observed processes if (b) is the underlying true structure.
be seen from Figure 9, especially from the moralised graph in part (b), ‘hospitalisation’
still does not separate ‘home visits’ from ‘death’, because both have an arrow into
‘hospitalisation’, which creates the moral edge between these two nodes. For example, if a
patient is hospitalised due to a home visit, then this hospitalisation was less likely due
to dementia; hence a history of hospitalisation with a preceding home visit predicts survival
differently from a hospitalisation without a preceding home visit.
Notice that for the discrete time case it is known (Robins, 1986, 1987, 1997) that in a
situation like Figure 8(b) we cannot draw any conclusions about the causal effect of home
visits on survival. In contrast, from Figure 9 this is possible even when no information on
dementia is available. However, standard methods that just model the intensity for survival
with time varying covariates for the times of previous home visits and hospitalisations will
typically give misleading results due to the conditional association between ‘home visits’
and ‘death’ given ‘hospitalisation’.
3.3 Likelihood factorisation
In order to discuss properties and implications of the likelihood for graphical MPPs we
will regard the data consisting of times and types of events (t1 , e1 ), (t2 , e2 ), . . . , (tn , en ) as
a realisation of the history process H_t = {(T_s, E_s) | T_s ≤ t}. As for filtrations, H_{t−} denotes
the strict pre-t history process. Additionally, H_t^A, A ⊂ {1, . . . , K}, defined as

    H_t^A = {(T_s, E_s) | T_s ≤ t and ∃ k ∈ A : E_s = e_k, s = 1, 2, . . .}

denotes the history process restricted to the marks in A. Any set of marked points for
which t_s = t_u, s ≠ u, implies e_s = e_u can be a history, i.e. a realisation of
H_t. Note that (up to completion by null sets) the different filtrations can be regarded as
being generated by the history processes, i.e. F_t^A = σ{H_t^A}, A ⊂ {1, . . . , K}.
Figure 9: Home visits example assuming modulated home visits; (a) shows that ‘home
visits’ is locally independent of any other process as it is controlled externally; (b) shows
the moral graph.
Before deriving the likelihood for a given local independence graph, we recall the likelihood
for the general case. Based on the mark specific compensators Λ_k(t) and intensity
processes λ_k(t), the corresponding crude quantities are given by

    Λ(t) = Σ_{k=1}^{K} Λ_k(t)   and   λ(t) = Σ_{k=1}^{K} λ_k(t).
These are the compensator and intensity process of the cumulative counting process
Σ_k N_k. In case that Λ is not absolutely continuous, the continuous part is denoted by
Λ^C and the jumps by ∆Λ, i.e. ∆Λ(t) = Λ(t) − Λ(t−) and Λ(t) = Σ_{s≤t} ∆Λ(s) + Λ^C(t). The
jumps ∆N_k(t) of the counting processes are defined analogously. We assume that Λ^C possesses
an intensity. Also note that by the assumptions made earlier there can only be countably
many jumps ∆Λ. The likelihood process L(t|H_t) for a marked point process is then given
as follows (cf. Arjas, 1989):

    L(t|H_t) = Π_{T_s ≤ t} Λ_{E_s}(dT_s) · Π_{s ≤ t} (1 − ∆Λ(s))^{1−∆N(s)} · exp(−Λ^C(t)).
In the absolutely continuous case this simplifies to

    L(t|H_t) = Π_{T_s ≤ t} λ_{E_s}(T_s) · exp(−∫_0^t λ(s) ds).        (8)
As noted by Arjas (1989), the likelihood for MPPs already has a product form over time,
i.e. it is constructed as the product of the likelihoods for the occurrence or non–occurrence
of an event given the past history.
If G is the local independence graph of Y we can rewrite (8) as follows:
    L(t|H_t) = Π_{k=1}^{K} Π_{T_s ≤ t} λ_k(T_s)^{1{E_s = e_k}} · exp(−∫_0^t Σ_{k=1}^{K} λ_k(s) ds)

             = Π_{k=1}^{K} [ Π_{T_s(k) ≤ t} λ_k(T_s(k)) · exp(−∫_0^t λ_k(s) ds) ],

where T_s(k) with E_s(k) = e_k are the occurrence times of mark e_k. The inner product of
the above can be regarded as the mark specific likelihood and is denoted by L_k(t|H_t). By
the definition of a local independence graph and the equivalence of the pairwise and local
dynamic Markov properties under condition (6), we have that λ_k(s) is F_t^{cl(k)}–measurable,
where cl(k) is the closure of node k. It follows that

    L_k(t|H_t) = L_k(t|H_t^{cl(k)}),        (9)

i.e. the mark specific likelihood L_k based on the whole past remains the same if the
available information is restricted to how often and when those marks that are parents of
e_k in the graph, and e_k itself, have occurred in the past, symbolised by H_t^{cl(k)}.
If the compensator is not absolutely continuous we still have that

    Π_{s≤t} (1 − ∆Λ(s))^{1−∆N(s)} = Π_{k=1}^{K} Π_{s≤t} (1 − ∆Λ_k(s))^{1−∆N_k(s)}.

To see this, recall that ∆Λ_k(t) = E(∆N_k(t) | F_{t−}) = P(∆N_k(t) = 1 | F_{t−}), so that a
discontinuity ∆Λ_k(t) ≠ 0 implies that there is a positive probability for event e_k to occur
at time t. Since we make the assumption that no two events occur at the same time, we
have that for any t ∈ T where ∆Λ(t) ≠ 0 there exists an e_k ∈ E such that ∆Λ(t) = ∆Λ_k(t).
It follows that under (6) the likelihood factorises as

    L(t|H_t) = Π_{k∈V} L_k(t|H_t^{cl(k)}),        (10)

where

    L_k(t|H_t^{cl(k)}) = Π_{T_s(k) ≤ t} Λ_k(dT_s(k)) · Π_{s≤t} (1 − ∆Λ_k(s))^{1−∆N_k(s)} · exp(−Λ_k^C(t)).
This parallels the factorisation for DAGs where the joint density is decomposed into the
univariate conditional distributions given the parents. Here, we replace the parents by the
closure because we also need to condition on the past of a component itself which is not
required in the static case.
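As a small numerical illustration (not from the paper), the following sketch evaluates the log of the absolutely continuous likelihood (8) for a toy two-mark process and checks that it agrees with the sum of the mark specific log-likelihoods log L_k, as in the factorisation (10). The concrete intensities, with mark 1 acting as a ‘parent’ of mark 2, are our assumptions for illustration only.

```python
import math

def history_before(events, t):
    """Strict pre-t history H_{t-} as a list of (time, mark) pairs."""
    return [(u, m) for (u, m) in events if u < t]

def total_loglik(events, intensities, t_end, n_grid=20000):
    """log L(t|H_t) from (8): sum of log lambda_{E_s}(T_s) over events,
    minus a Riemann-sum approximation of the integral of the total
    intensity lambda = sum_k lambda_k over [0, t_end]."""
    ll = sum(math.log(intensities[e](t, history_before(events, t)))
             for (t, e) in events)
    dt = t_end / n_grid
    for i in range(n_grid):
        s = (i + 0.5) * dt
        h = history_before(events, s)
        ll -= sum(lam(s, h) for lam in intensities.values()) * dt
    return ll

def mark_loglik(events, intensities, t_end, k, n_grid=20000):
    """Mark specific log-likelihood log L_k(t|H_t): same formula, but
    only the events of mark k and only the intensity lambda_k."""
    ll = sum(math.log(intensities[k](t, history_before(events, t)))
             for (t, e) in events if e == k)
    dt = t_end / n_grid
    for i in range(n_grid):
        s = (i + 0.5) * dt
        ll -= intensities[k](s, history_before(events, s)) * dt
    return ll

# Toy specification (assumed): lambda_1 is constant, while lambda_2
# depends on whether a mark-1 event has occurred, i.e. 1 is a parent of 2.
intensities = {
    1: lambda t, h: 0.5,
    2: lambda t, h: 0.3 if any(m == 1 for (_, m) in h) else 0.1,
}
events = [(0.4, 1), (1.1, 2), (1.7, 1)]

total = total_loglik(events, intensities, 2.0)
parts = sum(mark_loglik(events, intensities, 2.0, k) for k in (1, 2))
print(abs(total - parts) < 1e-9)  # the likelihood factorises over marks
```

The agreement is exact here because each λ_k enters the joint likelihood only through its own factor; the graphical content of (10) is that each λ_k may be taken to depend only on the cl(k)-history.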
Example 5. Independent censoring.
The above reasoning is actually quite well known in the context of independent censoring.
In a simple survival analysis with possible right censoring there are two events: death
and censoring. The latter being independent means that the intensity for survival can be
assumed to be the same regardless of whether we know that censoring has taken place
or not (cf. Andersen et al., 1993, p. 139), i.e. survival is locally independent of censoring. We
can then use the above argument to justify using only the likelihood based on the
observed survival times, the partial likelihood, without needing to specify a model for
censoring. Note that censoring is allowed to locally depend on survival, and it usually does in
the sense that long survival times are more likely to be censored than short ones. One
might wish to further elaborate the reasons for censoring by including some unobservable
processes, e.g. patients moving house, in the local independence graph, so that it can be
verified whether they affect both censoring and survival, which would imply a violation
of the independent censoring assumption.
Two consequences of the above factorisation regarding the relation of local and conditional
independence are given next. For the general case of marked point processes it holds that
for disjoint A, B, C ⊂ V, where C separates A and B, i.e. A ⊥⊥_g B | C in (G_{An(A∪B∪C)})^m,
we have

    F_t^A ⊥⊥ F_t^B | F_t^C   ∀ t ∈ T.        (11)
This can be seen as follows. The factorisation (10) implies that the marginal likelihood
for the marked point process, discarding events not in An(A ∪ B ∪ C), i.e. {(T_s, E_s) |
s = 1, 2, . . . ; E_s ∈ E_{An(A∪B∪C)}}, is given by just integrating out the contributions of
e_k ∉ E_{An(A∪B∪C)}, yielding

    L(t | H_t^{An(A∪B∪C)}) = Π_{k ∈ An(A∪B∪C)} L_k(t | H_t^{cl(k)}).
Hence, the likelihood may be written as a product over factors that only depend on cl(k) =
{k} ∪ pa(k), k ∈ An(A∪B∪C). Let C = {cl(k) | k ∈ An(A∪B∪C)} be the set containing
all such sets. Then we have

    L(t | H_t^{An(A∪B∪C)}) = Π_{c ∈ C} L_c(t | H_t^c).

Further, the sets in C can be taken to be the cliques of the graph (G_{An(A∪B∪C)})^m. The above
thus corresponds to the factorisation property of undirected graphs (Lauritzen, 1996, p.
35) and implies the conditional independence (11). A property similar to (11) has been
noted by Schweder (1970, Theorems 3 and 4) for Markov processes.
Example 3 ctd. GvHD.
Applying (11) to Figure 7, we can see that

    F_t^{relapse} ⊥⊥ F_t^{acute} | F_t^{chronic, death}   ∀ t ≤ T,

where T is the (stopping) time when relapse or death occurs.
With a similar argument we can reformulate (2) and (3): for any B ⊂ V we have

    N_B(t) ⊥⊥ F_{t−}^{V\cl(B)} | F_{t−}^{cl(B)},        (12)

i.e. the present of N_B is independent of the past of N_{V\cl(B)} given the past of N_{cl(B)}.
3.4 Extensions
Local independence graphs can easily be extended to include time–fixed covariates, such
as sex, age and socioeconomic background of patients. The filtration at the start, F0 , then
has to be enlarged to include the information on these variables. They can be represented
by additional nodes in the graph with the restrictions that the subgraph on the non–
dynamic nodes must be a DAG and no directed edges are allowed to point from processes
to time–fixed covariates. A process being locally independent of a time–fixed covariate
then simply means that the intensity does not depend on this particular covariate given
all the other covariates and information on the past of all processes, and δ–separation can
still be applied to find further independencies. Similarly, these graphs can be combined
with chain graphs where the processes have to be nodes in the terminal chain component;
for an application of this latter idea see Gottard (2002).
The nodes in a local independence graph do not necessarily have to stand for only one
mark (or the associated counting process), marks can be combined into one node. This
might be of interest when there are logical dependencies. For example if a particular illness is considered then the events ‘falling ill’ and ‘recovering’ from this illness are trivially
locally dependent. If one node is used to represent a collection of marks corresponding to
a multivariate subprocess of the whole multivariate counting process then an arrow into
this node will mean that the intensity of at least one (but not necessarily all) of these
marks depends on the origin of the arrow. However, some interesting information could
be lost. For instance, if events such as ‘giving birth to 1st child’, ‘giving birth to 2nd
child’ etc. are considered, it might be relevant whether a woman is married or not when
considering the event of ‘giving birth to 1st child’ but it might not be relevant anymore
when considering ‘giving birth to 2nd child.’
A similar issue arises if it is found that certain events can systematically occur at the
same time. For instance, in some social contexts, ‘moving out of the parents house’ is
often equivalent to ‘getting married’. In this case we propose to regard the joined marks
as a new mark of its own and include it with an additional node in the graph. Again there
will be some logical dependencies but the possibly different histories leading to ‘moving
out of the parents house without getting married’ and ‘moving out of the parents house
and getting married’ could be read off the graph.
In many data situations the mark space is not finite, e.g. when measuring the magnitude
of an electrical impulse, the amount of income in a new job or the dosage of a drug. One could
then discretise the mark space, e.g. into ‘finding a well paid job’ and ‘finding a badly paid
job’. However, it must be suspected that too many of these types of events will generate
too many logical dependencies that are of no interest and will make the graphs cluttered.
As we have seen in some of the examples it is sometimes sensible to consider stopped
processes in order to avoid having to represent logical and uninteresting dependencies.
More generally, one might want to relax the requirement ‘for all t ∈ T’ in Definition 2.2
and instead consider suitably defined intervals based on stopping times. E.g. the
independence structure might be very different between the times of finishing education
and starting the first job than before or after that period. This deserves further
investigation.
4 Discussion and conclusions
The well–known benefits of the classical graphical models for the non-dynamic case lie
mainly in (a) replacing algebraic manipulations by graphical ones, and as a consequence
thereof (b) the facilitation of structuring and communicating complex data situations, in
particular when addressing causal inference, as well as (c) the various ways in which they
allow a reduction of computational complexity, hence accelerating calculations, e.g. for
expert systems or model search.
In the following we discuss the potential of local independence graphs in these regards.
Graphical manipulations. The main point of graphical models is that properties of
the graph correspond to properties of the statistical model, so that by ‘looking’ at the
graph and manipulating the graph (e.g. separation, moralisation) we can easily read off
information that would otherwise require some (usually intricate) algebra on the joint
distribution, or in case of processes, on the intensities or likelihood. With respect to local
independence graphs we refer especially to the global Markov property, which allows one to
assess local independencies within subprocesses without even needing to consider the specific intensities of these subprocesses in more detail. The same can be said for properties
(11) and (12). These results hold for very general counting processes. Once more specific
classes of processes, such as e.g. Markov processes or parametric models, are considered,
it can be expected that more properties of the local independence graphs might become
useful, but this requires further investigation.
Facilitation of structuring and communicating. We see the main benefit of the
graphs in situations where the local independence structure of the system is at least
partially known so that specific questions about the remaining structure can be addressed.
The equivalence in Theorem 3.4 means that in order to draw the graph we can focus on
every type of event individually and elicit which previous events are relevant, possibly
including the event censoring. Once the graph is constructed in this way it can be further
investigated what separations hold and, e.g., what the effect would be of ignoring processes
that are perhaps difficult or impossible to observe (cf. Example 4). Hence, δ–separation
allows one to identify situations where, even though some important processes or events cannot
directly be observed, there is enough information in the observable events to address the
aims of the investigation.
Unobservable events are of particular interest when we aim at causal inference, as they may
constitute confounders. Causal inference is of course a topic of its own and it is not the aim
of this paper to go into much detail in this respect, except for the following few comments.
Local independence graphs represent dependencies in Λ_k(dt) = E(N_k(dt) | F_{t−}), where
conditioning is on having observed F_{t−}; hence these dependencies are associations and,
as we have seen in Example 4, they depend very much on whether we condition on F_{t−} or on
different subsets (or even extensions) thereof. Causal inference is about predicting N_k(dt)
after intervening in F_{t−}, e.g. by fixing the times of certain events, as in the home visits
example when modulating the visits. It is well known that conditioning on observation
example when modulating the visits. It is well known that conditioning on observation
is not the same as conditioning on intervention (Pearl, 2000; Lauritzen, 2000). Hence,
without making some further assumptions, the arrows in local independence graphs do
not necessarily represent causal dependencies: the intensity of an event being dependent
on whether another event has been observed before does not imply causation, in the same
way as correlation does not imply causation. Such further assumptions could be that
all ‘relevant’ events (or processes) have been taken into account, as originally proposed
by Granger in order to justify the use of the term ‘causality’ for what is now known as
Granger-causality. E.g. if, in Figure 8(b), we are satisfied that by including ‘dementia’ all
relevant processes have been taken into account, then we could say that home visits are
not directly causal but indirectly causal for ‘death’ (or survival), as there is a directed
path from ‘home visits’ to ‘death’. Obviously, in this particular example, there are many
other processes, like other illnesses or the death of a partner, that might also be
relevant.
In general, ‘including all relevant processes’ is too vague a guideline to be useful, and
it is also not necessary. Instead, the literature on (non-dynamic) graphical models and
causality has identified different sufficient graphical criteria that allow causal inference
and it would be desirable to transfer these to the dynamic case. A similar framework
based on local independence graphs would require more prerequisites than we have given
in this paper and such criteria would need to take into account that the graphs can contain
cycles. Hence this is a topic for further research.
For non–graphical approaches to causal reasoning in a continuous time event history setting consider Eerola (1994), Lok (2001), Arjas and Parner (2004) and Fosen et al. (2004).
Reduction of computational complexity. One important step towards reducing computational
effort is to reduce dimensionality, and many methods based on graphical models
do exactly this. They are based on factorisation, model or parametric collapsibility
and separation. Expert systems perform fast calculations by exploiting the local structure
of graphical models, decomposing it into its cliques. Model search can be accelerated,
e.g. by testing the presence or absence of an edge within the smallest relevant subset of
variables which can be determined by graphical criteria.
For local independence graphs, we have formulated a separation criterion and deduced a
factorisation of the likelihood, both of which can be exploited to reduce dimensionality.
For Markov processes, for example, Nodelman et al. (2003) show how the graphical structure can be learned from data when it is not known from background knowledge.
A further topic of research is the question of collapsibility. Graphical collapsibility means
that the local independence structure of a subprocess remains the same after ignoring the
other components of the MPP. Ideally, we would also want to remain in the same model
class, i.e. if the whole process is Markov we would like the subprocess to have the same
independence structure and also be Markov. A first attempt at tackling the question of
collapsibility of local independence graphs has been made in Didelez (2000, pp.102) but
further investigation is still required.
Finally, it should be said that local independence graphs are not a stand–alone method as
they only encode the presence or absence of a dependence. One would typically want to
further quantify and analyse the dependencies that are found. Also, they do not provide
any particular inference method, apart from simplifying the likelihood. Local independence graphs can be combined with non–, semi– or fully parametric methods but more
research is required to investigate how the graphical representation of the local independence structure can simplify inference.
5 Appendix
First of all we prove that the trail condition of Proposition 2.5 is indeed equivalent to
Definition 2.4.
Then we discuss some further properties of δ–separation as well as local independence
which will finally be used to prove Theorem 3.4.
5.1 Proof of Proposition 2.5
We have to show A ⊥⊥_g B | C in (G^B_{An(A∪B∪C)})^m if and only if every allowed trail from A to
B in G is blocked by C.
We start with some preliminary remarks. Let π be an allowed trail from A to B in G.
Note first that the situation of (j, k) ∈ E and (k, j) ∈ E for j, k on π implies that
there are two different trails, one for each directed edge. Second, a trail always consists
of distinct vertices. From both statements it follows that cycles within a trail are not
possible. Further, we know that if there is a trail π^m between A and B in (G^B_{An(A∪B∪C)})^m
then there exists an allowed trail π^d from A to B in G on the same vertices or containing
additional ones which are in G common children of some of the vertices in π^m. Conversely,
if there is an allowed trail π^d from A to B in G then either there exists a trail π^m in
(G^B_{An(A∪B∪C)})^m which differs from the former in that it can omit some of the vertices on
π^d because they are common children of some other vertices on π^d (this is called a short
cut), or the trail π^d contains vertices that are not in An(A ∪ B ∪ C), from which it follows
that there exists a vertex γ on π^d with γ ∉ An(A ∪ B ∪ C) where the directed edges on
this trail meet head–to–head.
Let now A ⊥⊥_g B | C in (G^B_{An(A∪B∪C)})^m and π^d be an allowed trail in G from A to B. If
this trail contains vertices which are not in An(A ∪ B ∪ C) then it follows from the above
considerations that there exists a vertex γ on π^d where the edges meet head–to–head and
which, as well as all its descendants, is not in C. Thus, this trail is blocked by C. Otherwise,
there exists a trail π^m between A and B in (G^B_{An(A∪B∪C)})^m with the same vertices or some
short–cuts, which is intersected by C. In case that no short–cuts are possible, there are
no vertices on π^d where edges meet head–to–head and therefore there has to be at least
one such vertex γ ∈ C. In the other case, it holds for all γ ∈ {pa(η) | η on π^d and edges
meet head–to–head at η} that no edges meet head–to–head in γ. Since π^m is intersected
by C, it follows that at least one of the vertices on π^d where no edges meet head–to–head
is an element of C. Thus, this trail is also blocked by C.
Finally, let every allowed trail from A to B be blocked by C and consider a trail π^m between
A and B in (G^B_{An(A∪B∪C)})^m. For the corresponding allowed trail π^d it either holds that
at least one of its vertices where no edges meet head–to–head is an element of C; in this
case the vertex has to be on π^m as well, so that the latter is intersected by C. Or there exists
a vertex γ on π^d where edges meet head–to–head such that neither γ nor any of its descendants
is an element of C. Note that γ has to be an ancestor of A ∪ B ∪ C, because otherwise its
parents would not be linked and would not be elements of π^d. Thus, there is a directed path from γ
to A or to B (not to C, by definition of γ). Since one of these directed paths would yield
an allowed trail from A to B without vertices where edges meet head–to–head, it has to
be intersected by C, which is again not possible because then some of the descendants of γ
would be elements of C. This argumentation applies to all vertices on π^d where edges meet
head–to–head. Thus, any allowed trail from A to B in G which yields a path between A
and B in (G^B_{An(A∪B∪C)})^m always includes at least one vertex which is an element of C and
where no edges meet head–to–head, and therefore any such path is intersected by C.
5.2 Asymmetric graphoids
As both δ–separation and local independence are new notions of graph separation and
independence, respectively, and both are asymmetric, the properties known for classical
graph separation and for stochastic independence are not directly applicable. We therefore
develop a new framework generalising the notion of graphoid axioms (Pearl and Paz, 1987;
Pearl, 1988; Dawid, 1998) to the asymmetric case. These characterise ternary relations
(A ir B | C), which can be read as ‘A is irrelevant for B given C’. It is then shown which of
these asymmetric properties hold for δ–separation and local independence, respectively.
Some properties are only satisfied if we make further assumptions, as will be detailed.
The main aim is to use these properties in order to prove Theorem 3.4 in Section 5.3.
We start by generalising the graphoid axioms to the case of an asymmetric ternary relation
on a general lattice. A space V equipped with a semi–order ‘≤’ is called a lattice if for
any elements A, B ∈ V there exists a least upper bound, denoted by A ∨ B, such that for
all C ∈ V with A ≤ C and B ≤ C we have (A ∨ B) ≤ C. The greatest lower bound is
similarly given by A ∧ B. Due to the asymmetry we formulate a left and a right version
of each graphoid axiom as follows.
Definition 5.1 Asymmetric (semi)graphoid
Let (A ir B | C) be a ternary relation on a lattice (V, ≤). The following properties are
called the asymmetric semigraphoid properties:
Left redundancy: A ir B | A
Right redundancy: A ir B | B
Left decomposition: A ir B | C, D ≤ A ⇒ D ir B | C
Right decomposition: A ir B | C, D ≤ B ⇒ A ir D | C
Left weak union: A ir B | C, D ≤ A ⇒ A ir B | (C ∨ D)
Right weak union: A ir B | C, D ≤ B ⇒ A ir B | (C ∨ D)
Left contraction: A ir B | C and D ir B | (A ∨ C) ⇒ (A ∨ D) ir B | C
Right contraction: A ir B | C and A ir D | (B ∨ C) ⇒ A ir (B ∨ D) | C
If additionally the following properties hold we have an asymmetric graphoid:
Left intersection: A ir B | C and C ir B | A ⇒ (A ∨ C) ir B | (A ∧ C)
Right intersection: A ir B | C and A ir C | B ⇒ A ir (B ∨ C) | (B ∧ C).
Note that for the case that V is the power set of V and A, B, C ∈ V, if left redundancy,
decomposition and contraction hold, then

    A ir B | C ⇔ A\C ir B | C.        (13)

To see this, note that it follows directly from left decomposition that A ir B | C ⇒
A\C ir B | C. To show A\C ir B | C ⇒ A ir B | C, note that trivially A\C ir B |
C ∪ (C ∩ A). Additionally, it follows from left redundancy (i.e. C ir B | C) and left
decomposition that (C ∩ A) ir B | C. Left contraction now yields the desired result.
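For readability, the chain of implications in this argument can be displayed as follows:

```latex
\begin{aligned}
&(C \cap A)\ \mathrm{ir}\ B \mid C
  &&\text{(left redundancy and left decomposition)}\\
&(A\setminus C)\ \mathrm{ir}\ B \mid \bigl((C\cap A)\cup C\bigr) = C
  &&\text{(the assumed left-hand side of (13))}\\
&\Rightarrow\ \bigl((C\cap A)\cup(A\setminus C)\bigr)\ \mathrm{ir}\ B \mid C,
  \quad\text{i.e. } A\ \mathrm{ir}\ B \mid C
  &&\text{(left contraction).}
\end{aligned}
```

Here left contraction is applied with A, C, D of Definition 5.1 instantiated as C ∩ A, C and A\C, respectively.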
Similarly we can show that if right redundancy, decomposition and contraction hold, then

    A ir B | C ⇔ A ir B\C | C.        (14)
Properties (13) and (14) permit us to generalise results for disjoint sets to the case of
overlapping sets. The following corollary, for instance, exploits this to reformulate the
intersection property in a more familiar way.
Corollary 5.2 Alternative intersection property
Let V be the power set of V and A, B, C, D ∈ V. Given a ternary relation on V that satisfies
all ‘left’ versions of Definition 5.1 (except weak union), then for pairwise disjoint sets
A, B, C, D

    A ir B | (C ∪ D) and C ir B | (A ∪ D) ⇒ (A ∪ C) ir B | D.        (15)

If all ‘right’ versions of Definition 5.1 (except weak union) are satisfied, then

    A ir B | (C ∪ D) and A ir C | (B ∪ D) ⇒ A ir (B ∪ C) | D.        (16)
Proof:
As the conditions for property (13) are given, we have that A ir B | (C ∪ D) ⇔ (A ∪ C ∪
D) ir B | (C ∪ D), from which it follows with left decomposition that (A ∪ D) ir B | (C ∪ D).
With the same argument we get C ir B | (A ∪ D) ⇒ (C ∪ D) ir B | (A ∪ D). Left intersection
yields (A ∪ C ∪ D) ir B | D, which is again equivalent to (A ∪ C) ir B | D because of (13).
Implication (16) can be shown analogously.
5.2.1 Graphoid properties satisfied by δ–separation
The interpretation of δ–separation as an irrelevance relation should be that if C δ–separates A from B in G then A is irrelevant for B given C. This is denoted by A irδ B | C. The semi–order is given by set inclusion ‘⊂’, the join and meet operations by union and intersection, respectively.
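To make the δ–separation relation concrete, the following sketch implements the check used throughout this section: A irδ B | C holds iff A and B are separated by C in the moral graph of G^B restricted to the ancestral set An(A ∪ B ∪ C), where G^B is G with all edges starting in B deleted. The function name and the adjacency–dict representation are illustrative choices, not from the paper; the sketch assumes pairwise disjoint A, B, C and that every vertex appears as a key of G.

```python
def delta_separated(G, A, B, C):
    """Check whether C delta-separates A from B in the directed graph G.

    G: dict mapping each vertex to the set of its children (directed edges).
    Implements the criterion used in the proofs: separation of A and B by C
    in the moral graph of G^B restricted to An(A | B | C), where G^B deletes
    all edges *starting* in B. Assumes A, B, C pairwise disjoint.
    """
    A, B, C = set(A), set(B), set(C)
    # parent map of G
    parents = {v: set() for v in G}
    for v, children in G.items():
        for w in children:
            parents[w].add(v)
    # ancestral set An(A | B | C): closure under taking parents
    anc = set(A | B | C)
    stack = list(anc)
    while stack:
        v = stack.pop()
        for p in parents[v]:
            if p not in anc:
                anc.add(p)
                stack.append(p)
    # G^B restricted to the ancestral set: drop edges starting in B
    sub = {v: {w for w in G[v] if w in anc and v not in B} for v in anc}
    # moralise: join all parents of each vertex, then drop directions
    und = {v: set() for v in anc}
    for v in anc:
        for w in sub[v]:
            und[v].add(w)
            und[w].add(v)
        pa = [p for p in anc if v in sub[p]]
        for i in range(len(pa)):
            for j in range(i + 1, len(pa)):
                und[pa[i]].add(pa[j])
                und[pa[j]].add(pa[i])
    # undirected separation: no path from A\C to B avoiding C
    seen, stack = set(A - C), list(A - C)
    while stack:
        v = stack.pop()
        for w in und[v]:
            if w in C or w in seen:
                continue
            if w in B:
                return False
            seen.add(w)
            stack.append(w)
    return True
```

Note the asymmetry: deleting the edges out of B before moralising is what makes δ–separation directional, mirroring the asymmetry of local independence (swapping the roles of A and B can change the answer).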
Proposition 5.3 Graphoid properties for δ–separation
Let G be a directed graph. The δ–separation satisfies the following graphoid axioms:
(i) left redundancy,
(ii) left decomposition,
(iii) left and right weak union,
(iv) left and right contraction,
(v) left and right intersection.
Proof:
(i) We have by definition for non–disjoint sets that A irδ B | A ⇔ A\(B ∪ A) irδ B | A\B
⇔ ∅ irδ B | A\B which is again by definition always true.
It is easily checked that for properties (ii), (iii), and left contraction we can assume that
all involved sets are pairwise disjoint without loss of generality.
(ii) It follows from A ⊥⊥g B | C in (G^B_{An(A∪B∪C)})^m that D ⊥⊥g B | C in (G^B_{An(A∪B∪C)})^m for D ⊂ A since decomposition holds for ⊥⊥g. Further, (G^B_{An(B∪C∪D)})^m is a subgraph of (G^B_{An(A∪B∪C)})^m or has even fewer edges. Thus, D ⊥⊥g B | C in (G^B_{An(B∪C∪D)})^m.
(iii) Left weak union can easily be shown: A ⊥⊥g B | C in (G^B_{An(A∪B∪C)})^m implies A ⊥⊥g B | (C ∪ D) in (G^B_{An(A∪B∪C∪D)})^m = (G^B_{An(A∪B∪C)})^m because ordinary weak union holds for graph separation and because An(A ∪ B ∪ C ∪ D) = An(A ∪ B ∪ C) for D ⊂ A. Right weak union is trivial since application of the definition for overlapping sets yields that (C ∪ D)\B = C.
(iv) For left contraction we have to show that
A ⊥⊥g B | C in (G^B_{An(A∪B∪C)})^m  (17)
and
D ⊥⊥g B | (A ∪ C) in (G^B_{An(A∪B∪C∪D)})^m  (18)
imply (A ∪ D) ⊥⊥g B | C in (G^B_{An(A∪B∪C∪D)})^m. Since contraction holds for ordinary graph separation it is sufficient to show that A ⊥⊥g B | C in (G^B_{An(A∪B∪C∪D)})^m. Thus, we have to show that there is no path between A and B in the latter graph not intersected by C. Let F be the additional vertices in (G^B_{An(A∪B∪C∪D)})^m, i.e. F = An(D)\An(A ∪ B ∪ C). If F = ∅ left contraction trivially holds. Otherwise we do not only have these additional vertices but also the additional edges due to directed edges starting in F and due to marrying parents of children in F. By definition of F we have that for all vertices f ∈ F there exists a directed path in G from f to some d ∈ D which is not intersected by An(A ∪ B ∪ C). From this it follows first of all that any path between B and F in (G^B_{An(A∪B∪C∪D)})^m has to be intersected by A ∪ C because otherwise (18) would be violated. Thus, there is no new path between A and B through F not intersected by A ∪ C. Secondly, for all parents k in G^B of some f ∈ F there can be no path between k and B not intersected by A ∪ C in (G^B_{An(A∪B∪C∪D)})^m for the same reason. Thus, marrying parents of F does not yield a new path from or through pa(F) to B not intersected by A ∪ C. A path from A to B not intersected by C would therefore have to be a direct one. This is in turn not possible because of (17) and because A and B trivially have no common children in F (in G^B).
In order to show right contraction in full generality we have to apply the definition for overlapping sets. Let A* = A\(B ∪ C) and C* = C\B; then we have to show that
A* ⊥⊥g B | C* in (G^B_{An(A∪B∪C)})^m  (19)
and
A*\D ⊥⊥g D | (B ∪ C*)\D in (G^D_{An(A∪B∪C∪D)})^m  (20)
imply A*\D ⊥⊥g (B ∪ D) | C*\D in (G^{B∪D}_{An(A∪B∪C∪D)})^m. From (19) we get by left decomposition that A*\D ⊥⊥g B | C* in (G^B_{An(A∪B∪C)})^m. From this it follows by similar arguments as applied in the proof for left contraction that A*\D ⊥⊥g B | C* in (G^B_{An(A∪B∪C∪D)})^m. The same separation also holds in (G^{B∪D}_{An(A∪B∪C∪D)})^m since the latter graph results by deleting some edges of the former. Thus, we have A*\D ⊥⊥g B | C* in (G^{B∪D}_{An(A∪B∪C∪D)})^m. From (20) we get that A*\D ⊥⊥g D | (B ∪ C*)\D in (G^{B∪D}_{An(A∪B∪C∪D)})^m since the latter graph has again the same or fewer edges than the original one. With Ã = A*\D we can show by the properties of ordinary graph separation (decomposition, contraction, and intersection) that Ã ⊥⊥g B | C* and Ã ⊥⊥g D | (B\D) ∪ (C*\D) imply Ã ⊥⊥g (B ∪ D) | C*\D as desired. This can be seen as follows: with decomposition we get Ã ⊥⊥g B\D | (C*\D) ∪ (C* ∩ D) and Ã ⊥⊥g (C* ∩ D) | (C*\D) ∪ (B\D). Intersection and decomposition yield Ã ⊥⊥g B\D | C*\D. This, together with Ã ⊥⊥g D | (B\D) ∪ (C*\D) and the contraction property, provides the desired result.
(v) We omit this proof as these properties are not actually required for the proof of Theorem 3.4 (cf. Didelez, 2000, pp. 69).
Note that with the above results both properties (13) and (15) hold for δ–separation.
Right redundancy does not hold and hence neither does (14) in general. However, we
have the following result.
Lemma 5.4 Special case of right decomposition for δ–separation
Given a directed graph G, it holds that:
A irδ B | C, D ⊂ B ⇒ A irδ D | (C ∪ B)\D
Proof:
Due to property (13) we can assume that A ∩ C = ∅. Let A* = A\B and C* = C\B. Then, we have to show that A* ⊥⊥g B | C* in (G^B_{An(A∪B∪C)})^m implies A* ⊥⊥g D | C* ∪ (B\D) in (G^D_{An(A∪B∪C)})^m. Note that A* ⊥⊥g D | C* ∪ (B\D) in (G^B_{An(A∪B∪C)})^m holds due to weak union and decomposition of ordinary graph separation. Changing the graph to (G^D_{An(A∪B∪C)})^m means that all edges that are present in G as directed edges starting in B\D are added. Additionally, those edges have to be added which result from vertices in B\D having common children with other vertices. Since all these new edges involve vertices in B\D there can be no additional path between A* and D in (G^D_{An(A∪B∪C)})^m not intersected by C* ∪ (B\D).
Although (14) does not hold in full generality, it is easily checked that (16) holds; we omit the details.
5.2.2 Graphoid properties satisfied by local independence
An obvious way to translate local independence into an irrelevance relation is by letting A ir B | C stand for A →/ B | C, A, B, C ⊂ V. The semi–order is again given by set inclusion ‘⊂’, the join and meet operations by union and intersection, respectively.
Proposition 5.5 Graphoid properties of local independence
The following properties hold for local independence as defined in Definition 2.2:
(i) left redundancy,
(ii) left decomposition,
(iii) left and right weak union,
(iv) left contraction,
(v) right intersection.
Proof:
(i) Left redundancy holds since obviously the F_t^{A∪B}–compensators of N_B are F_t^{A∪B}–measurable, i.e. if the past of N_A is known, then the past of N_A is of course irrelevant.
(ii) Left decomposition holds since the F_t^{A∪B∪C}–compensators Λ_k(t), k ∈ B, are F_t^{B∪C}–measurable by assumption, so that the same must hold for the F_t^{B∪C∪D}–compensators Λ_k(t), k ∈ B, for D ⊂ A.
(iii) Left and right weak union also trivially hold since adding information on the past of components that are already uninformative (left) or included (right) does not change the compensator.
(iv) Left contraction holds since we have that the F_t^{A∪B∪C∪D}–compensators Λ_k, k ∈ B, are by assumption F_t^{A∪B∪C}–measurable and these are again by assumption F_t^{B∪C}–measurable. (Right contraction is not so easily seen and we omit the proof as it is not required for the proof of Theorem 3.4.)
(v) The property of right intersection can be checked by noting that in the definition of local independence the filtration w.r.t. which the intensity process should be measurable is always generated at least by the process itself.
With the above Proposition, (13) holds. In contrast, it is clear by the definition of local independence that right redundancy does not hold because otherwise any process would always be locally independent of any other process given its own past. It follows that the equivalence (14) does not hold. However, it is always true that A →/ B | C ⇒ A →/ B\C | C.
For symmetric irrelevance relations it is the property of intersection that makes a semi-graphoid a graphoid. Its importance for the equivalence of pairwise, local and global Markov properties in undirected conditional independence graphs is well–known (cf. Lauritzen, 1996). As shown later, it is of similar importance for local independence graphs; hence we now give a condition for left intersection to hold.
Proposition 5.6 Left intersection for local independence
Under the assumption of (6) local independence satisfies left intersection.
Proof:
Left intersection assumes that the F_t^{A∪B∪C}–compensators Λ_k(t), k ∈ B, are F_t^{B∪C}– as well as F_t^{A∪B}–measurable. With (6) we get that they are F_t^{B∪(A∩C)}–measurable, which yields the desired result.
The remaining property, right decomposition, requires special consideration because it makes a statement about the irrelevance of a process N_A after discarding part of the possibly relevant information N_{B\D}. If the irrelevance of N_A is due to knowing the past of N_{B\D} then it will not necessarily be irrelevant anymore if the latter is discarded. Right decomposition therefore only holds under specific restrictions on the relation between the potentially irrelevant process and the relevant process to be discarded.
One particular such situation is given in the following proposition.
Proposition 5.7 Conditions for right decomposition of local independence
Consider a marked point process and assume that the cumulative counting process ∑_k N_k is non–explosive, that Λ_C admits an intensity and that there are only countably many jumps ∆Λ. Let A, B, C, D ⊂ V with (B ∩ A)\(C ∪ D) = ∅. Right decomposition holds under the conditions that
B →/ A\(C ∪ D) | (C ∪ D)  (21)
and
A →/ {k} | C ∪ B or B →/ {k} | (C ∪ D ∪ A)  (22)
for all k ∈ C\D.
Proof:
In this proof we proceed somewhat informally for the sake of simplicity. The formal proof
is based on the results of Arjas et al. (1992) and given in Didelez (2000, pp. 72).
Redefine A* = A\(C ∪ D), B* = B\D and C* = C\D. Then the above assumptions ensure that A* ∩ B* = ∅ and in a local independence graph G on A ∪ B ∪ C ∪ D according to Definition 3.1 we have that pa(B*) ⊆ C ∪ D, pa(A*) ⊆ C ∪ D, A* has no children in D and that A* and B* have no common children in C*. Hence, C ∪ D must separate A* and B* in G^m. Thus, by (12) and (11), the assumptions of this proposition imply
N_D(t) ⊥⊥ F_{t−}^{A*} | F_{t−}^{B*∪C*∪D}
as well as
F_t^{A*} ⊥⊥ F_t^{B*} | F_t^{C∪D}.
We want to show that the F_t^{A∪C∪D}–compensator Λ̃_D(t) of N_D(t) is F_t^{C∪D}–measurable. With the above and interpretation (1) we have
Λ̃_D(dt) = E(N_D(dt) | F_{t−}^{A∪C∪D}) = E(N_D(dt) | F_{t−}^{A*∪C*∪D})
         = E(E(N_D(dt) | F_{t−}^{A*∪B*∪C*∪D}) | F_{t−}^{A*∪C*∪D})
         = E(E(N_D(dt) | F_{t−}^{B*∪C*∪D}) | F_{t−}^{A*∪C*∪D})
         = E(N_D(dt) | F_{t−}^{C*∪D}) = E(N_D(dt) | F_{t−}^{C∪D}),
as desired.
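For readability, the role of the two conditional independences in this chain of equalities can be annotated as follows (a sketch; F denotes the filtrations as above):

```latex
\begin{align*}
\tilde{\Lambda}_D(dt)
 &= E\bigl(N_D(dt)\mid \mathcal{F}_{t-}^{A^*\cup C^*\cup D}\bigr) \\
 &= E\Bigl(E\bigl(N_D(dt)\mid \mathcal{F}_{t-}^{A^*\cup B^*\cup C^*\cup D}\bigr)
      \,\Big|\, \mathcal{F}_{t-}^{A^*\cup C^*\cup D}\Bigr)
      && \text{(tower property)}\\
 &= E\Bigl(E\bigl(N_D(dt)\mid \mathcal{F}_{t-}^{B^*\cup C^*\cup D}\bigr)
      \,\Big|\, \mathcal{F}_{t-}^{A^*\cup C^*\cup D}\Bigr)
      && \text{(first conditional independence)}\\
 &= E\bigl(N_D(dt)\mid \mathcal{F}_{t-}^{C^*\cup D}\bigr)
      && \text{(second conditional independence).}
\end{align*}
```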
5.3 Proof of Theorem 3.4
It is easily checked that (7) ⇒ (5) ⇒ (4): First, pa(k) always δ–separates V \(pa(k) ∪ {k})
from {k} in G, hence (5) is just a special case of (7). Second, (5) ⇒ (4) holds by left
weak union and left decomposition of local independence. Also, it is easy to see that the
equivalence of the pairwise and local dynamic Markov properties immediately follows from
left intersection assuming (6), left weak union and left decomposition. Thus, the following
proof considers situations where A, B, C do not form a partition of V or pa(B) ⊄ C. The
structure of the proof corresponds to the one given by Lauritzen (1996, p. 34) for the
equivalence of the Markov properties in undirected conditional independence graphs. Due
to the asymmetry of local independence, however, this version is more involved.
Assume that (4) holds and that C δ–separates A from B in the local independence graph. We have to show that A →/ B | C, i.e. that the F_t^{A∪B∪C}–compensators Λ_k(t), k ∈ B, are F_t^{B∪C}–measurable. The proof is via backward induction on the number |C| of vertices in the separating set. If |C| = |V| − 2 then both A and B consist of only one element and (7) trivially holds. If |C| < |V| − 2 then A or B consists of more than one element.
Let us first consider the case that A, B, C is a partition of V and none of them is empty.
If |A| > 1, let α ∈ A. Then, by left weak union and left decomposition of δ–separation we have that C ∪ (A\{α}) δ–separates {α} from B, i.e.
{α} irδ B | C ∪ (A\{α}),
and C ∪ {α} δ–separates A\{α} from B in G, i.e.
A\{α} irδ B | (C ∪ {α}).
Therefore, we have by the induction hypothesis that
{α} →/ B | C ∪ (A\{α}) and A\{α} →/ B | (C ∪ {α}).
From this it follows with the modified version of left intersection as given in (15) (which can be applied because of the assumption that (6) holds) that A →/ B | C as desired.
If |B| > 1, let β ∈ B. With Lemma 5.4 we have that C ∪ (B\{β}) δ–separates A from {β}, i.e.
A irδ {β} | C ∪ (B\{β}),
and C ∪ {β} δ–separates A from B\{β}, i.e.
A irδ B\{β} | (C ∪ {β}).
Therefore, we have again by the induction hypothesis that
A →/ {β} | C ∪ (B\{β}) and A →/ B\{β} | (C ∪ {β}).  (23)
The above immediately implies (by definition of local independence) that A →/ B | C. Note that this argumentation could not be applied to the first part of the proof because A →/ B | (C ∪ {α}) does not imply A →/ B | C.
Let us now consider the case that A, B, C ⊂ V are disjoint but not a partition of V. First, we assume that they are a partition of An(A ∪ B ∪ C), i.e. that A ∪ B ∪ C is an ancestral set. Let γ ∈ V\(A ∪ B ∪ C), i.e. γ is not an ancestor of A ∪ B ∪ C. Thus, every allowed trail
(cf. Proposition 2.5) from γ to B is blocked by A ∪ C since any such trail includes an edge
(k, b) for some b ∈ B where no edges meet head–to–head in k and k ∈ A ∪ C. Therefore,
we get
{γ} irδ B | (A ∪ C).
Application of left contraction, weak union, and decomposition for δ–separation yields
A irδ B | (C ∪ {γ}).
It follows with the induction hypothesis that
A →/ B | (C ∪ {γ}) as well as {γ} →/ B | (A ∪ C).
With left intersection as given by (15) in Corollary 5.2 and left decomposition for local
independence we get the desired result.
Finally, let A, B, C be disjoint subsets of V and A ∪ B ∪ C not necessarily an ancestral set. Choose γ ∈ an(A ∪ B ∪ C) and define G̃^B = G^B_{An(A∪B∪C)}. Since A ⊥⊥g B | C in (G̃^B)^m we know from the properties of ordinary graph separation that
(i) either {γ} ⊥⊥g B | (A ∪ C) in (G̃^B)^m
(ii) or A ⊥⊥g {γ} | (B ∪ C) in (G̃^B)^m.
(i) In the first case {γ} irδ B | (A ∪ C) and it follows from left contraction that
(A ∪ {γ}) irδ B | C.
Application of left weak union and left decomposition yields A irδ B | (C ∪ {γ}). With the
induction hypothesis we therefore get
A→
/ B | (C ∪ {γ}) and {γ} →
/ B | (A ∪ C).
Left intersection according to (15) and left decomposition for local independence yields
A→
/ B | C.
(ii) The second case is the most complicated and the proof now makes use of right decomposition for local independence under the conditions given in Proposition 5.7. First, we have from (ii) that A ⊥⊥g {γ} | (B ∪ C) in (G_{An(A∪B∪C)})^m since the additional edges starting in B can only yield additional paths between A and γ that must be intersected by B. Since deleting further edges out of γ does not create new paths, it holds that
A irδ {γ} | (B ∪ C).
With A irδ B | C, application of right contraction for δ–separation yields A irδ (B ∪ {γ}) |
C. Now, we can apply Lemma 5.4 to get A irδ B | (C ∪ {γ}), from which it follows with the induction hypothesis that
A→
/ B | (C ∪ {γ}) and A →
/ {γ} | (B ∪ C).
With right intersection for local independence we get A →/ (B ∪ {γ}) | C. In addition, {γ} →/ A | (B ∪ C) by the same arguments as given above for A →/ {γ} | (B ∪ C). In order to apply Proposition 5.7 we still have to show that for all k ∈ C either A irδ {k} | (C ∪ B ∪ {γ}) or {γ} irδ {k} | (C ∪ B ∪ A), which by the induction hypothesis implies the corresponding local independencies. To see this, assume that there exists a vertex k ∈ C for which neither holds. With the trail condition we then have that in G_{An(A∪B∪C)} there exists an allowed trail from A and γ, respectively, to k such that every vertex where edges do not meet head–to–head is not in (C ∪ B ∪ {γ})\{k} and (C ∪ B ∪ A)\{k}, respectively, and every vertex where edges meet head–to–head, or some of its descendants, is in (C ∪ B ∪ {γ})\{k} and (C ∪ B ∪ A)\{k}, respectively. This would yield a path between A and γ which is not blocked by C ∪ B (note that k is a head–to–head node on this trail) in G_{An(A∪B∪C)}. This in turn contradicts the separation of A and γ by B ∪ C in G^B_{An(A∪B∪C)}
because the edges starting in B cannot contribute to this trail. Consequently we can apply
right decomposition and get the desired result.
Acknowledgements:
I wish to thank Iris Pigeot for her persistent encouragement and advice, and Niels Keiding for valuable comments and suggestions. Financial support by the German Research
Foundation (SFB 386) is also gratefully acknowledged.
References
Aalen, O. O. (1987). Dynamic Modelling and Causality. Scandinavian Actuarial Journal, 177 – 190.
Aalen, O. O., Ø. Borgan, N. Keiding, and J. Thormann (1980). Interaction between
Life History Events: Nonparametric Analysis of Prospective and Retrospective Data
in the Presence of Censoring. Scandinavian Journal of Statistics 7, 161 – 171.
Allard, D., A. Brix, and J. Chadoeuf (2001). Testing local independence between two point processes. Biometrics 57, 508 – 517.
Andersen, P. K., Ø. Borgan, R. D. Gill, and N. Keiding (1993). Statistical Models Based
on Counting Processes. Springer, New York.
Arjas, E. (1989). Survival Models and Martingale Dynamics (with Discussion). Scandinavian Journal of Statistics 16, 177 – 225.
Arjas, E., P. Haara, and I. Norros (1992). Filtering the Histories of a Partially Observed
Marked Point Process. Stochastic Processes and Applications 40, 225 – 250.
Arjas, E., and J. Parner (2004). Causal Reasoning from Longitudinal Data. Scandinavian Journal of Statistics 31, 171 – 201.
Brémaud, P. (1981). Point Processes and Queues: Martingale Dynamics. Springer, New
York.
Cowell, R. G., A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter (1999). Probabilistic
Networks and Expert Systems. Springer, New York.
Cox, D. R. and V.S. Isham (1980). Point Processes. Chapman and Hall, London.
Cox, D. R. and N. Wermuth (1996). Multivariate Dependencies — Models, Analysis and Interpretation. Chapman and Hall, London.
Dahlhaus, R. (2000). Graphical Interaction Models for Multivariate Time Series.
Metrika 51, 157 – 172.
Daley, D. J. and D. Vere-Jones (2003). An introduction to the theory of point processes.
Springer, London.
Dawid, A. P. (1979). Conditional Independence in Statistical Theory. Journal of the
Royal Statistical Society, Series B 41, 1 – 31.
Dawid, A. P. (1998). Conditional Independence. In S. Kotz, C. B. Read, and D. L.
Banks (Eds.), Encyclopedia of Statistical Sciences, Update Volume 2, pp. 146 – 155.
Wiley–Interscience, New York.
Dean, T. and K. Kanazawa (1989). A model for reasoning about persistence and causation. Comp. Intelligence 5, 142 – 150.
Didelez, V. (2000). Graphical models for event history data based on local independence.
Logos, Berlin.
Edwards, D. (2000). Introduction to Graphical Modelling (2nd ed.). Springer, New York.
Eerola, M. (1994). Probabilistic Causality in Longitudinal Studies, Volume 92 of Lecture
Notes in Statistics. Springer, Berlin.
Eichler, M. (1999). Graphical Models in Time Series Analysis. Ph. D. thesis, University
of Heidelberg.
Eichler, M. (2000). Causality Graphs for Stationary Time Series. Technical report,
University of Heidelberg.
Fleming, T. R. and D. P. Harrington (1991). Counting Processes and Survival Analysis.
Wiley, New York.
Florens, J. P. and D. Fougère (1996). Noncausality in Continuous Time. Econometrica 64, 1195 – 1212.
Fosen, J., Borgan, O., Weedon-Fekjoer, H. and O. O. Aalen (2004) Path analysis for
survival data with recurrent events. Technical Report, University of Oslo, Norway.
Gottard, A. (2002) Graphical duration models. Technical Report 2002/07, University
of Florence, Italy.
Granger, C. W. J. (1969). Investigating Causal Relations by Econometric Models and
Cross–spectral Methods. Econometrica 37, 424 – 438.
Jordan, M. (1999). Learning in Graphical Models. MIT Press, Cambridge.
Keiding, N. (1999). Event history analysis and inference from observational epidemiology. Statistics in Medicine 18, 2353 – 2363.
Keiding, N., Klein, J. P. and M. M. Horowitz (2001). Multi-state models and outcome
prediction in bone marrow transplantation. Statistics in Medicine 20, 1871 – 1885.
Lauritzen, S. L. (1996). Graphical Models. Clarendon Press, Oxford.
Lauritzen, S. L. (2000). Causal Inference from Graphical Models. In O. E. Barndorff-Nielsen, D. R. Cox, and C. Klüppelberg (Eds.), Complex Stochastic Systems. Chapman and Hall, London.
Lok, J. J. (2001). Statistical modelling of causal effects in time. Ph. D. thesis, Free
University of Amsterdam.
Nodelman, U., Shelton, C. R. and D. Koller (2002). Continuous Time Bayesian Networks. Proceedings of the 18th Annual Conference on Uncertainty in Artificial Intelligence, 378 – 387. Morgan Kaufman, San Francisco.
Nodelman, U., Shelton, C. R. and D. Koller (2003). Learning Continuous Time Bayesian
Networks. Proceedings of the 19th Annual Conference on Uncertainty in Artificial
Intelligence, 451 – 458. Morgan Kaufman, San Francisco.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufman, San
Mateo.
Pearl, J. (2000). Causality. Models, Reasoning, and Inference. Cambridge University
Press, Cambridge.
Pearl, J. and A. Paz (1987). Graphoids: A Graph–Based Logic for Reasoning about
Relevancy Relations. In B. Du Boulay, D. Hogg, and L. Steel (Eds.), Advances in
Artificial Intelligence – II, pp. 357 – 363.
Robins J. (1986). A new approach to causal inference in mortality studies with sustained
exposure period — applications to control of the healthy workers survivor effect.
Mathematical Modelling 7, 1393 – 1512.
Robins J. (1987). Addendum to “A new approach to causal inference in mortality
studies with sustained exposure period — applications to control of the healthy
workers survivor effect”. Computers and Mathematics with Applications 14, 923 –
945.
Robins J. (1997). Causal inference from complex longitudinal data. In M. Berkane,
Latent Variable Modeling and Applications to Causality. Lecture Notes in Statistics
120. Springer, New York.
Schweder, T. (1970). Composable Markov Processes. Journal of Applied Probability 7,
400 – 410.
Spirtes, P., C. Glymour, and R. Scheines (2000). Causation, Prediction, and Search
(2nd ed.). MIT Press, Cambridge, Massachusetts.
Vass, M., Avlund, K., Hendriksen, C., Andersen C. K. and N. Keiding (2002). Preventive home visits to older people in Denmark: methodology of a randomized controlled
study. Aging Clin. Exp. Res 14, 509 – 515.
Vass, M., Avlund, K., Kvist, K., Hendriksen, C., Andersen C. K. and N. Keiding (2004).
Structured home visits to older people. Are they only of benefit for women? A
randomised controlled trial. Scand. J. Prim. Health Care 22, 106 – 111.
Verma, T. and J. Pearl (1990). Causal Networks: Semantics and Expressiveness. In
R. D. Shachter, T. S. Levitt, L. N. Kanal, and J. F. Lemmer (Eds.), Uncertainty
and Artificial Intelligence, pp. 69 – 76. North Holland, Amsterdam.
Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. Wiley, Chichester.