Graphical Models for Marked Point Processes Based on Local Independence

VANESSA DIDELEZ
University College London

Abstract

A new class of graphical models capturing the dependence structure of events that occur in time is proposed. It is based on so–called local dependence, instead of conditional dependence, as the former is better suited for representing dynamic dependencies. Local dependence is an asymmetric concept similar to Granger causality and thus leads to properties of the graphs that cannot be treated with classical graph theory. Consequently, a new graph separation is introduced and its implications for the underlying model are discussed. Besides this crucial difference from classical graphical models, there are a number of analogies, e.g. the factorisation of the likelihood. The benefits of local (in)dependence graphs are discussed with regard to reasoning about and understanding dynamic dependencies, as well as to computational simplifications.

Keywords: Event history analysis; Conditional independence; Counting processes; Granger causality; Graphoid; Multistate models.

1 Introduction

Graphical models are used to deal with complex data structures that arise in many different fields whenever the interrelationship of variables in a multivariate setting is investigated. The analysis of such data typically involves considerable computational effort, and the multifaceted results require careful interpretation. The underlying idea of graphical models is to represent the conditional dependence structure of the variables by a graph, facilitating the interpretation and helping to reduce the computational effort by indicating e.g. how the structure of the whole system can be split into smaller sub-graphs corresponding to simpler and lower dimensional sub-models.
Research Report No. 258, Department of Statistical Science, University College London. Date: October 2005.

Graphical models, investigated and refined over the last two decades, have proven to be a valuable tool for probabilistic modelling and multivariate data analysis in such different fields as expert systems and artificial intelligence (Cowell et al., 1999; Jordan, 1999), hierarchical Bayesian modelling (cf. the BUGS project, http://www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml), causal reasoning (Pearl, 2000; Spirtes et al., 2000), as well as sociological, bio-medical, and econometric applications. For overviews and many different applications see for instance the monographs by Whittaker (1990), Cox and Wermuth (1996), and Edwards (2000). In contrast, the application of graphical models to time dependent data, such as event histories or time series, is only slowly getting under way. Dahlhaus (2000) has proposed a way of graphically modelling multivariate time series, but this approach does not capture the dynamics, i.e. the way in which the present or future of the system depends on or is affected by the past. In contrast, the idea of Eichler (1999, 2000) is, instead of representing conditional (in)dependence, to graphically model dependencies based on Granger causality, thus capturing in a more intuitive way the dynamic structure of multivariate time series. As will be discussed later, Granger (non-)causality can be regarded as the discrete time version of the local (in)dependence that we will use to investigate the structure of marked point processes. For continuous time processes, one approach, called Dynamic Bayesian networks, is to discretise time and provide directed acyclic graphs that encode the independence structure for the transitions from t to t + 1 (Dean and Kanazawa, 1989). The method proposed by Nodelman et al. (2002, 2003) comes closest to the kind of graphs that we are suggesting.
In their Continuous Time Bayesian Networks they propose to represent a multistate Markov process with nodes corresponding to subprocesses and edges corresponding to dependencies of transition rates on the states of other subprocesses. Our approach is more general in that we consider continuous time marked point processes and investigate the properties of the resulting graphs in more detail.

Marked point processes are commonly used to model event history data. The term event history data, or life history data, seems to originate from sociology, where it is often of interest to investigate the dynamics behind events such as finishing college, finding a job, getting married, starting a family, durations of unemployment, illness etc. Recognition of the importance of longitudinal studies for gaining insight into inherently dynamic systems has led to more and more surveys being carried out, making event history data publicly available. But such data also occurs in medical contexts, e.g. in survival analysis with intermediate events such as the onset of a side effect, a change of medication etc. The complex nature of such data does not need to be pointed out; for a discussion, in particular of the problem of time dependent confounding, see Keiding (1999).

For a graphical representation of the dependencies in event history data we need to define a suitable notion of dynamic dependence. In this paper we use so–called local independence, first proposed by Schweder (1970) for the case of Markov processes and applied e.g. in Aalen et al. (1980). It has been generalised to processes with a Doob–Meyer decomposition by Aalen (1987), who focuses on the bivariate case, i.e. local dependence between two processes. The analogy of the bivariate case to Granger non-causality has been pointed out by Florens and Fougère (1996). Here we extend this approach to more than two processes, namely the counting processes associated with a marked point process.
The basic idea of local independence is that the intensity of one type of event is independent of certain past events once we know about specific other past events. Note that the notion of ‘local independence’ used by Allard et al. (2001) is a different (symmetric) one.

The outline of the paper is as follows. We first set out the necessary notation and assumptions for marked point processes in Section 2.1, followed by the formal definition of local independence in Section 2.2; the emphasis here is on the generalisation to a version that allows conditioning on the past of other processes and hence describes dynamic dependencies for multivariate processes. Section 2.3 defines graphs that are appropriate to represent local (in)dependence. These graphs are different from the popular directed acyclic graphs (DAGs) in that they allow cycles. Also, a new notion of graph separation, called δ–separation, is introduced; it is required because local independence is asymmetric and hence δ–separation needs to be asymmetric, too. The main definitions and results are given in Section 3. Local independence graphs are defined and their properties investigated. In particular we prove that δ–separation can indeed be used to read local independencies off the graph, i.e. we show the equivalence of the dynamic versions of the pairwise, local and global graphical Markov properties, paralleling classical graphical models. Also, in Section 3.3, the likelihood of a process with a given local independence graph is investigated and it is shown how it factorises in a useful way. The potential of local independence graphs is discussed in Section 4. Apart from providing all the proofs, the appendix introduces asymmetric graphoids in order to investigate whether δ–separation and local independence satisfy the asymmetric versions of the graphoid axioms that are known to hold, under certain assumptions, for classical graph separation and conditional independence.
This provides deeper insight into the properties of local independence and is used to carry out the proof of the equivalence of the Markov properties.

2 Preliminaries

We start by setting out the notation and giving an overview of the basic notions required. Marked point processes and their relation to counting processes are presented in Section 2.1, using the notation of Andersen et al. (1993). In Section 2.2 the concept of local independence is explained in some detail. The necessary terminology for graphs is introduced in Section 2.3.

2.1 Marked point processes and counting processes

Let E = {e_1, . . . , e_K}, K < ∞, denote the (finite) mark space, i.e. the set containing all types of events of interest, and T the time space in which the observations take place. We assume that time is measured continuously, so that we have T = [0, τ] or T = [0, τ) where τ < ∞. The marked point process (MPP) Y consists of events given by pairs of variables (T_s, E_s), s = 1, 2, . . ., on a probability space (Ω, F, P), where T_s ∈ T, 0 < T_1 < T_2 < . . ., are the times of occurrence of the respective types of events E_s ∈ E. The mark specific counting processes N_k(t) associated with an MPP are given by

    N_k(t) = Σ_{s=1}^∞ 1{T_s ≤ t; E_s = e_k},   k = 1, . . . , K.

For the applications we have in mind it is reasonable to assume that the cumulative counting process Σ_k N_k is non–explosive. We denote the internal filtration of a marked point process by F_t = σ{(T_s, E_s) | T_s ≤ t, E_s ∈ E}, which is equal to σ{(N_1(s), . . . , N_K(s)) | s ≤ t}. Further, for A ⊂ {1, . . . , K}, we define F_t^A = σ{N_a(s) | a ∈ A, s ≤ t}, yielding the filtrations of subprocesses; in particular, F_t^k is the internal filtration of an individual counting process N_k. Under rather general assumptions (cf. Fleming and Harrington, 1991, p. 61), a Doob–Meyer decomposition of N_k(t) into a compensator and a martingale exists.
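As a concrete illustration of the construction above, the mark specific counting processes can be evaluated directly from a realisation (T_s, E_s) of an MPP. The sketch below uses a made-up event list and helper names of my own choosing; it is not part of the paper's formal development.

```python
from bisect import bisect_right

def jump_times(events, mark):
    """Sorted times T_s at which an event with the given mark occurs."""
    return sorted(t for t, e in events if e == mark)

def N(times, t):
    """Evaluate the counting process N_k(t) = #{s : T_s <= t, E_s = e_k}.

    bisect_right makes N_k right-continuous: events at exactly t count.
    """
    return bisect_right(times, t)

# A made-up realisation of an MPP with mark space E = {"a", "b"}:
# pairs (T_s, E_s) with 0 < T_1 < T_2 < ...
events = [(0.4, "a"), (1.1, "b"), (1.7, "a"), (2.5, "a"), (3.2, "b")]

Na = jump_times(events, "a")
Nb = jump_times(events, "b")

print(N(Na, 2.0))               # N_a(2.0) = 2 (jumps at 0.4 and 1.7)
print(N(Na, 2.0) + N(Nb, 2.0))  # cumulative counting process at t = 2.0: 3
```

Each N_k is a piecewise-constant, right-continuous step process, and their sum is the cumulative counting process assumed non-explosive above.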
Both these processes depend on the filtration considered, which is here taken to be the internal filtration of the whole MPP Y, so that the F_t–compensator of N_k is given as Λ_k(t) = ∫_0^t Λ_k(ds) with the interpretation

    Λ_k(dt) = E(N_k(dt) | F_{t−}).   (1)

The differences N_k − Λ_k are F_t–martingales. If the compensator is absolutely continuous we denote the corresponding intensity process by λ_k(t). It will be important to distinguish between the F_t–compensator Λ_k, based on the past of the whole MPP, and the F_t^A–compensators Λ_k^A, A ⊂ {1, . . . , K}, based on the past of the subprocess on marks in A, i.e. Λ_k^A(dt) = E(N_k(dt) | F_{t−}^A). The latter can be computed using the innovation theorem (Brémaud, 1981, pp. 83)

    Λ_k^A(dt) = E(Λ_k(dt) | F_{t−}^A)   ∀ t ∈ T,

and a particular way of doing so is given in Arjas et al. (1992). We assume that the multivariate counting process N = (N_1, . . . , N_K) has the property that no jumps occur at the same time. Formally, this means that the F_t–martingales N_k − Λ_k are orthogonal. It implies for the MPP that the marks E are distinct, i.e. they are defined such that they cannot occur systematically at the same time.

2.2 Local independence

The bivariate case is defined as follows (cf. Aalen, 1987).

Definition 2.1 Local independence (bivariate).
Let Y be an MPP with E = {e_1, e_2} and N_1 and N_2 the associated counting processes on (Ω, F, P). Then N_1 is said to be locally independent of N_2 over T if Λ_1(t) is measurable w.r.t. F_t^1 for all t ∈ T. Otherwise we speak of local dependence. The process N_1 being locally independent of N_2 is symbolised by N_2 ↛ N_1. Equivalently we will sometimes say that e_1 is locally independent of e_2, or e_2 ↛ e_1.

The essence of the above definition is that the compensator Λ_1(t), i.e. our ‘short–term’ prediction of N_1, remains the same under the reduced filtration F_t^1 as compared to the full one F_t.
This implies that we do not lose any essential information by ignoring how often and when the event e_2 has occurred before t. One could say that if N_2 ↛ N_1 then the present of N_1 is conditionally independent of the past of N_2 given the past of N_1, symbolised by

    N_1(t) ⊥⊥ F_{t−}^2 | F_{t−}^1,   (2)

where A ⊥⊥ B | C means ‘A is conditionally independent of B given C’ (cf. Dawid, 1979). For general processes, this is a stronger property than local independence, but it holds for marked point processes as their distributions are determined by the compensators. In addition, N_1(t) ⊥⊥ N_2(t) will only hold if the two processes are mutually locally independent of each other — hence the name local independence.

Note that the F_t^1–measurability of Λ_1(t) would trivially be true if e_1 = e_2, but in such a case we would not want to speak of independence of e_1 and e_2. This is the reason for excluding jumps at the same time. As can easily be checked, local (in)dependence need be neither symmetric, reflexive nor transitive. However, since in most practical situations a subprocess depends at least on its own past, we will assume that for all considered processes local dependence is reflexive (i.e. a process is always locally dependent on itself). An example of a subprocess that depends only on the history of a different subprocess and not on its own history is given in Cox and Isham (1980, p. 122).

Example 1. Skin disease.
In a study with women of a certain age, Aalen et al. (1980) consider two events: occurrence of a particular skin disease and onset of menopause. Their analysis reveals that the intensity for developing this skin disease is greater once the menopause has started than before. In contrast, and as one would expect, the intensity for onset of menopause does not change once the person has developed this skin disease. We can therefore say that menopause is locally independent of this skin disease, but not vice versa.
This particular analysis (Aalen et al., 1980) is based on a multistate model with the states ‘0 = initial state’, ‘S = skin disease has occurred (but no menopause)’, ‘M = menopause has started (but no skin disease)’ and ‘M/S = both menopause and skin disease have started’. The assumptions about the transitions are that the transition rates α_{0→S}, α_{0→M}, α_{M→M/S}, α_{S→M/S} are non-zero but all others are zero. In particular, α_{0→M/S} = 0 reflects the assumption that skin disease and menopause do not start systematically at the same time, corresponding to the above ‘no jumps at the same time’ assumption. It is well known that multistate (Markov) models can be represented using counting processes for each type of transition. The local independence found in the study is then simply equivalent to α_{0→M} = α_{S→M/S}, while α_{0→S} < α_{M→M/S}. A more theoretical example of local independence for a bivariate Poisson process (without calling it that) is given in Daley and Vere–Jones (2003, p. 250).

Let us now turn to the case of more than two types of events. This requires conditioning on the past of other processes as follows.

Definition 2.2 Local independence (multivariate).
Let N = (N_1, . . . , N_K) be the multivariate counting process associated with an MPP. Let further A, B, C be disjoint subsets of {1, . . . , K}. We then say that the subprocess N_B is locally independent of N_A given N_C over T if all F_t^{A∪B∪C}–compensators Λ_k, k ∈ B, are measurable with respect to F_t^{B∪C} for all t ∈ T. This is denoted by N_A ↛ N_B | N_C or briefly A ↛ B | C. Otherwise, N_B is locally dependent on N_A given N_C, i.e. A → B | C.

Conditioning on a subset C thus means that we retain the information about whether and when events with marks in C have occurred in the past when considering the intensities for marks in B. If these intensities are independent of the information on whether and when events with marks in A have occurred, we have conditional local independence.
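The rate equalities of Example 1 can be made concrete in a small sketch. The numeric values of the transition rates below are hypothetical (they are not taken from Aalen et al., 1980); they are chosen only to respect the qualitative pattern described above.

```python
# Four-state model of Example 1 with states "0", "S", "M", "M/S".
# The numeric rate values are hypothetical, for illustration only.
alpha = {
    ("0", "S"):   0.02,  # skin disease before menopause
    ("0", "M"):   0.10,  # onset of menopause, no skin disease yet
    ("M", "M/S"): 0.05,  # skin disease after menopause: increased rate
    ("S", "M/S"): 0.10,  # menopause after skin disease
}

def rate(i, j):
    """Transition rate alpha_{i -> j}; unlisted transitions have rate 0."""
    return alpha.get((i, j), 0.0)

# 'No jumps at the same time': no direct 0 -> M/S transition.
assert rate("0", "M/S") == 0.0

# Menopause locally independent of skin disease: its intensity is the
# same whether or not skin disease has occurred ...
assert rate("0", "M") == rate("S", "M/S")
# ... but not vice versa: the skin-disease intensity changes with menopause.
assert rate("0", "S") < rate("M", "M/S")
print("local (in)dependence pattern of Example 1 reproduced")
```

The point is that the asymmetric notion of local independence translates into an equality constraint on one pair of transition rates but not on the other.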
In analogy to (2), the above definition of multivariate local independence is equivalent to

    N_B(t) ⊥⊥ F_{t−}^A | F_{t−}^{B∪C}   ∀ t ∈ T.   (3)

Example 2. Graft versus host disease.
Keiding et al. (2001) consider patients who have received bone marrow transplantation as a treatment for leukaemia, some of whom develop graft versus host disease (GvHD), where the transplanted immune cells attack the host tissues. The events of interest are two types of GvHD, acute and chronic, as well as relapse and death, where both of the latter are terminal events (once a relapse occurs the patient is taken out of the study). Patients can develop acute GvHD before it becomes chronic, or can develop chronic GvHD without going through the acute stage first. Using a multistate model, one of the findings was that the intensity for relapse after having only gone through chronic GvHD is the same as after having gone through both acute and chronic GvHD. In addition, the intensity for relapse after developing only acute GvHD was the same as when not having had any GvHD yet. We may therefore say that relapse is locally independent of acute GvHD given chronic GvHD, and given that the person is still alive, of course. In other words, the intensity for relapse is fully explained by knowing whether or not the patient has developed chronic GvHD, knowledge of acute GvHD not being relevant. Note that if we ignore whether or not the patient has reached the chronic stage, relapse will not be (marginally) locally independent of acute GvHD, because acute GvHD increases the intensity for chronic GvHD. This example illustrates that the bivariate Definition 2.1 captures a different property than the multivariate Definition 2.2, where we condition on the past of other processes, just as conditional independence of variables neither implies nor follows from marginal independence. An additional finding of the study was that the event of death is not locally independent of any of the other events.
Its intensity clearly increases when a patient has gone through both acute and chronic GvHD, as compared to only acute or only chronic.

In order to see the relation with local independence, we briefly review Granger non-causality (Granger, 1969). Let X_V = {X_V(t) | t ∈ ℤ} with X_V(t) = (X_1(t), . . . , X_K(t)) be a multivariate time series, where V = {1, . . . , K} is the index set. For any A ⊂ V we define X_A = {X_A(t)} as the multivariate subprocess with components X_a, a ∈ A. Further, let X̄_A(t) = {X_A(s) | s ≤ t}. Then, for disjoint subsets A, B ⊂ V, we say that X_A is strongly Granger non-causal for X_B up to horizon h ∈ ℕ if

    X_B(t + k) ⊥⊥ X̄_A(t) | X̄_{V\A}(t),   k = 1, . . . , h,

for all t ∈ ℤ. The interpretation for h = 1 is similar to that for local independence, i.e. the present value of X_B is independent of the past of X_A given its own past and that of all other components C = V\(A ∪ B), in analogy to (3).

Finally in this section, we want to indicate how the definition of local independence can be generalised to stopped processes. This can be of interest in particular when there are absorbing states such as death. In that case all other events will be locally dependent on this one, because all intensities are zero once death has occurred. However, the dependence is ‘trivial’ and not of much interest. Let T be an F_t^k–stopping time, e.g. the first time an event of type e_k occurs, and let N^T = (N_1^T, . . . , N_K^T) be the multivariate counting process stopped at T. Then the compensators of N_j^T are given by Λ_j^T, j ∈ V, and local independence can be formalised as follows.

Definition 2.3 Local independence for stopped processes.
Let N^T = (N_1^T, . . . , N_K^T) be the multivariate counting process associated with an MPP and stopped at time T. Then we say that A ↛ B | C if there exist F_t^{B∪C}–measurable predictable processes Λ̃_k, k ∈ B, such that the F_t^{A∪B∪C}–compensators of N_B^T are given by Λ_k^T(t) = Λ̃_k(t) 1{t ≤ T}, k ∈ B.
The local independencies in a stopped process have to be interpreted as being valid only as long as t ≤ T.

2.3 Directed graphs and δ–separation

An obvious way of representing the local independence structure of an MPP by a graph is to depict the marks as vertices and to use an arrow as the symbol for local dependence. This leads to directed graphs that may have more than one directed edge between a pair of vertices in the case of mutual local dependence, and that may have cycles. These graphs will be introduced in some detail below, as they differ considerably from the usual directed acyclic graphs (DAGs) or undirected graphs (e.g. Cowell et al., 1999).

Graph notation and terminology.
A graph is a pair G = (V, E), where V = {1, . . . , K} is a finite set of vertices and E is a set of edges. The graph is said to be undirected if E ⊂ E^u(V) = {{j, k} | j, k ∈ V, j ≠ k}. It is called directed if E ⊂ E^d(V) = {(j, k) | j, k ∈ V, j ≠ k}. Undirected edges {j, k} are depicted by lines, j — k, and directed edges (j, k) by arrows, j → k. If (j, k) ∈ E and (k, j) ∈ E this is shown by stacked arrows, j ⇄ k. For A ⊂ V, the induced subgraph G_A of a directed or undirected graph G is defined as (A, E_A) with E_A = E ∩ E^d(A) or E_A = E ∩ E^u(A), respectively. In an undirected graph a subgraph G_A is called complete if all nodes in A are linked by edges, i.e. {j, k} ∈ E for all j, k ∈ A, j ≠ k. A maximal complete subgraph is called a clique of G.

Consider a directed or undirected graph G = (V, E). An ordered (n + 1)–tuple (j_0, . . . , j_n) of distinct vertices is called an undirected path from j_0 to j_n if {j_{i−1}, j_i} ∈ E, and a directed path if (j_{i−1}, j_i) ∈ E, for all i = 1, . . . , n. A (directed) path of length n with j_0 = j_n is called a (directed) cycle. A subgraph π = (V′, E′) of G with V′ = {j_0, . . . , j_n} and E′ = {e_1, . . . , e_n} ⊂ E is called a trail between j_0 and j_n if e_i = (j_{i−1}, j_i) or e_i = (j_i, j_{i−1}) or e_i = {j_i, j_{i−1}} for all i = 1, . . . , n.
For directed graphs we further require the following, almost self-explanatory, notation. Let A ⊂ V. The parents of A are given by pa(A) = {k ∈ V\A | ∃ j ∈ A : (k, j) ∈ E}, whereas the children are ch(A) = {k ∈ V\A | ∃ j ∈ A : (j, k) ∈ E}. The set cl(A) = pa(A) ∪ A is called the closure of A. The ancestors of A are defined as an(A) = {k ∈ V\A | ∃ j ∈ A with a directed path from k to j}, while the descendants are de(A) = {k ∈ V\A | ∃ j ∈ A with a directed path from j to k}. Consequently, nd(A) = V\(de(A) ∪ A) are the non-descendants of A. If pa(A) = ∅, then A is called ancestral. For general A ⊂ V, An(A) is the smallest ancestral set containing A, given as A ∪ an(A).

Example 3. Directed graph.
Figure 1 is an example of a general directed graph, where V = {1, 2, 3, 4, 5}.

[Figure 1: General directed graph]

It contains directed cycles, e.g. 2 → 1 → 2 or 2 → 5 → 1 → 2. A directed path from 5 to 2 goes via 1, while there are many trails between 5 and 2, e.g. 5 → 4 ← 3 → 2. The parents of 2, for instance, are pa(2) = {1, 3}, while the children are ch(2) = {1, 5}; hence the node 1 is a parent and a child of 2 at the same time. The ancestors of 2 are an(2) = {1, 3, 5}, while the descendants are de(2) = {1, 4, 5}. The only ancestral sets in this graph are given by {3}, {1, 2, 3, 5} and, trivially, V itself.

Graph separation.
In undirected graphs we say that subsets A, B ⊂ V are separated by C ⊂ V if any path from elements in A to elements in B is intersected by C. This is symbolised by A ⊥⊥_g B | C. For directed (not necessarily acyclic) graphs as introduced above, we define a similar type of separation that can be based on the moral graph. The undirected version of a directed graph G is defined as G∼ = (V, E∼) with E∼ = {{j, k} | (j, k) ∈ E or (k, j) ∈ E}, i.e. G∼ is the undirected graph obtained by replacing all arrows by undirected edges.
The moral graph G^m = (V, E^m) is obtained by inserting undirected edges between any two vertices that have a common child (if they are not already joined anyway) and then taking the undirected version of the resulting graph, i.e. E^m = {{j, k} | ∃ i ∈ V : j, k ∈ pa_G(i)} ∪ E∼. Now we can define a separation for directed graphs which will allow us to read properties of the local independence structure off these graphs, as will be addressed in Sections 3.1 and 3.2.

Definition 2.4 δ–Separation.
Let G = (V, E) be a directed graph. For B ⊂ V, let G^B denote the graph obtained by deleting all directed edges of G starting in B, i.e. G^B = (V, E^B) with E^B = E\{(j, k) | j ∈ B, k ∈ V}. Then, we say for pairwise disjoint subsets A, B, C ⊂ V that C δ–separates A from B in G if A ⊥⊥_g B | C in the undirected graph (G^B_{An(A∪B∪C)})^m, i.e. the moral subgraph on An(A ∪ B ∪ C), where the directed edges starting in B have previously been deleted. In case that A, B and C are not disjoint we define that C δ–separates A from B if C\B δ–separates A\(B ∪ C) from B. We define that the empty set is always separated from B. Additionally, we define that the empty set δ–separates A from B if A and B are unconnected in (G^B_{An(A∪B)})^m.

[Figure 2: Moral graph to check if {3, 5} δ–separates {1, 2} from {4}.]

Note that, except for the fact that we delete edges starting in B, δ–separation parallels the separation for DAGs. However, it is the initial edge deletion which makes δ–separation asymmetric. Asymmetry is required because we aim at modelling an asymmetric concept of dependence, local dependence. Further properties of δ–separation are discussed in Appendix 5.2.

[Figure 3: Moral graph to check which sets are δ–separated from {5}.]

Example 3 ctd. Directed graph.
For the graph in Figure 1 it seems that node 4 is only ‘directly influenced’ by 3 and 5. So let us check if {3, 5} δ–separates {1, 2} from {4}.
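Such checks can also be carried out mechanically. The sketch below implements the moralisation recipe of Definition 2.4; the edge set is my own reconstruction of Figure 1 from the statements in Example 3, and the function name is arbitrary.

```python
from itertools import combinations

# Edge set of the directed graph of Figure 1, reconstructed from the
# statements in Example 3 (this reconstruction is my own reading).
E = {(1, 2), (1, 5), (2, 1), (2, 5), (3, 1), (3, 2), (3, 4), (5, 1), (5, 4)}

def delta_separates(C, A, B, edges):
    """Check whether C delta-separates A from B (pairwise disjoint),
    following the moralisation steps of Definition 2.4."""
    A, B, C = set(A), set(B), set(C)
    # Step 1: delete all directed edges starting in B.
    eb = {(j, k) for (j, k) in edges if j not in B}
    # Step 2: restrict to the smallest ancestral set containing A, B, C.
    anc = A | B | C
    while True:
        more = {j for (j, k) in eb if k in anc} - anc
        if not more:
            break
        anc |= more
    sub = {(j, k) for (j, k) in eb if j in anc and k in anc}
    # Step 3: moralise -- join any two parents of a common child,
    # then drop all directions.
    undirected = {frozenset(e) for e in sub}
    for child in anc:
        parents = {j for (j, k) in sub if k == child}
        undirected |= {frozenset(p) for p in combinations(parents, 2)}
    # Step 4: C separates A from B iff no node of B is reachable from A
    # in the moral graph without passing through C.
    reached, stack = set(A), list(A)
    while stack:
        v = stack.pop()
        for e in undirected:
            if v in e:
                w = next(iter(e - {v}))
                if w not in reached and w not in C:
                    reached.add(w)
                    stack.append(w)
    return not (reached & B)

# The separations discussed in Example 3:
print(delta_separates({3, 5}, {1, 2}, {4}, E))  # True
print(delta_separates({1, 2}, {3}, {5}, E))     # True
print(delta_separates({3}, {4}, {5}, E))        # True
print(delta_separates({1}, {5}, {2}, E))        # False
```

The four calls correspond to the checks carried out by hand in this example: {3, 5} separating {1, 2} from {4}, {1, 2} separating {3} from {5}, {3} alone separating {4} from {5}, and the failing case of {1} between {5} and {2}.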
No edges out of 4 need to be removed, as there are none, but an edge between 3 and 5 has to be added due to their common child 4. This results in the graph in Figure 2, where the extra edge is indicated as a dotted line and a box indicates that arrows out of node 4 have to be deleted. We can see that indeed the required separation holds.

We might further ask which sets are δ–separated from 5. This can be read off the graph in Figure 3. The two edges out of 5 have been deleted and no edge has to be added, as any vertices with common children are already joined. Hence we can see that {1, 2} δ–separates 3 as well as 4 from 5, but also that 3 alone separates 4 from 5.

Finally, consider a situation where δ–separation does not hold. Any directed path from 5 to 2 goes via 1; one might therefore think that {1} δ–separates {5} from {2}. The relevant moral graph is given in Figure 4. Note that the node 4 has been deleted, as it is not in the smallest ancestral set of {1, 2, 5}. An edge between 3 and 5 has to be added due to their common child, node 1. This latter edge, however, creates a path from 5 to 2 which is not intersected by 1, and hence the desired δ–separation does not hold.

[Figure 4: Moral graph to check if {1} δ–separates {5} from {2}.]

An equivalent way of reading off δ–separation is formulated below using the notion of blocked trails, similar to d–separation for DAGs (Pearl, 1988; Verma and Pearl, 1990). For a directed graph we say that a trail between j and k is blocked by C if it contains a vertex γ such that either (i) the directed edges of the trail do not meet head–to–head at γ and γ ∈ C, or (ii) the directed edges of the trail meet head–to–head at γ and neither γ nor any of its descendants is an element of C. Otherwise the trail is called active.

Proposition 2.5 Trail condition for δ–separation.
Let G = (V, E) be a directed graph and A, B, C pairwise disjoint subsets of V. Call a trail from A to B allowed if it contains no edge of the form (b, k), b ∈ B, k ∈ V\B.
For disjoint subsets A, B, C of V, we have that C δ–separates A from B if and only if all allowed trails from A to B are blocked by C.

The proof is given in Appendix 5.1.

Example 3 ctd. Directed graph.
All allowed trails from 3 to 5 are 3 → 1 → 5, 3 → 1 → 2 → 5, 3 → 1 ← 2 → 5, 3 → 2 → 5, 3 → 2 → 1 → 5 and 3 → 2 ← 1 → 5. Each of these contains a vertex in {1, 2} at which the arrows do not meet head–to–head; hence {1, 2} blocks every allowed trail from 3 to 5. Checking all allowed trails in this way seems rather cumbersome, so we will henceforth mainly use the moralisation criterion according to Definition 2.4.

3 Local independence graphs

Having introduced the idea of local independence and suitable graphs, we can now bring both together and present the main definitions and results of this paper. In Section 3.1 we define local independence graphs, and in Section 3.2 it is explored what properties can be read off such graphs. In analogy to conditional independence graphs, we consider pairwise, local, and global Markov properties. It is then shown in Section 3.3 that local independence graphs induce a specific factorisation of the distribution, which is again similar to the one known for conditional independence graphs, in particular DAGs. Finally, some possible extensions of local independence graphs are indicated in Section 3.4.

3.1 Definition of local independence graphs

The following defining property (4) is called the pairwise dynamic Markov property, where we say dynamic to emphasise the difference from graphs based on conditional independence.

Definition 3.1 Local independence graph.
Let N_V = (N_1, . . . , N_K) be the multivariate counting process associated with an MPP Y with mark space E = {e_1, . . . , e_K}. Let further G = (V, E) be a directed graph, V = {1, . . . , K}. Then G is called the local independence graph of Y if for all j, k ∈ V

    (j, k) ∉ E  ⇔  {j} ↛ {k} | V\{j, k}.   (4)

Example 1 ctd. Skin disease.
The local independence graph for the relation between menopause and skin disease is very simple, cf. Figure 5. The absence of an arrow from skin disease to menopause indicates that the intensity for the start of menopause does not change once the skin disease has occurred. Nothing more can be read off such a graph when only two events are being considered. As for classical graphical models, the interesting case is when there are more than two nodes and we can investigate separations. However, notice that even with this simple example there is no way of expressing the local independence using a graph based on conditional independence for the two times T_1 = ‘time of occurrence of skin disease’ and T_2 = ‘time of occurrence of menopause’, as these are simply dependent.

[Figure 5: Local independence graph for skin disease example.]

Example 2 ctd. Graft versus host disease.
The relation between acute and chronic GvHD on the one hand and relapse and death on the other is depicted in Figure 6. Figure 6(a) contains dotted arrows to indicate that, because relapse and death are terminal events, the intensities for all other events logically change (to zero) once either death or relapse has occurred. Figure 6(b) shows the graph for the stopped process, stopping at the time when death or relapse occurs, whichever is earlier. As mentioned before, when interpreting this graph one has to remember that any properties read off the graph are only valid as long as the patient is still alive and has not yet relapsed.

[Figure 6: Local independence graph for GvHD example. (a) shows the original graph, (b) shows the graph for the process stopped at the time when relapse or death occurs.]

The absence of a directed edge from acute GvHD to relapse, in both graphs of Figure 6, reflects the local independence identified earlier, i.e.
given we know whether or not the patient has reached the chronic stage (and as long as he or she is still alive), the intensity for relapse does not change when knowing whether the patient has gone through the acute stage. The directed edges present in the graph indicate the following. Firstly, acute GvHD increases the intensity for chronic GvHD, so that there must be an arrow from acute to chronic GvHD. Secondly, it is impossible to go from the chronic back to the acute stage, so the intensity for the latter is then zero, and hence we need an arrow from chronic to acute GvHD. The intensity for the event of death is different for each of the four possible previous histories: no GvHD, only acute, only chronic GvHD, or both versions. Therefore there are directed edges from both GvHD nodes.

3.2 Dynamic Markov properties

In order to investigate what else can be read off local independence graphs, we next present two further, slightly stronger properties than the pairwise dynamic Markov property.

Definition 3.2 Local dynamic Markov property.
Let G = (V, E) be a directed graph. For an MPP Y the property

    ∀ k ∈ V :  V\cl(k) ↛ {k} | pa(k)   (5)

is called the local dynamic Markov property w.r.t. G.

The above property (5) could, for instance, be violated if two components in pa(k) are a.s. identical. Let for instance N_1 = N_2 a.s. Then it holds for any N_3 that N_1 ↛ N_3 | N_2 as well as N_2 ↛ N_3 | N_1, but not necessarily (N_1, N_2) ↛ N_3, as would be claimed by property (5). However, N_1 = N_2 a.s. is prevented by the orthogonality assumption, because for a.s. identical counting processes the martingales resulting from a Doob–Meyer decomposition are not orthogonal (Andersen et al., 1993, pp. 73). As shown in Appendix 5.3, the exact condition for property (5) to follow from (4) is that

    F_t^A ∩ F_t^B = F_t^{A∩B}   ∀ A, B ⊂ V, ∀ t ∈ T,   (6)

where we define F_t^∅ = {∅, Ω}.
Property (6) formalises the intuitive condition that the components of N are different enough to ensure that common events are necessarily due to common components. An even stronger dynamic Markov property is the global dynamic Markov property, which links δ-separation and local independence.

Definition 3.3 Global dynamic Markov property
Let N_V = (N_1, . . . , N_K) be a multivariate counting process associated with an MPP Y and G = (V, E) a directed graph. For disjoint subsets A, B, C ⊂ V the property

C δ-separates A from B in G ⇒ A ↛ B | C (7)

is called the global dynamic Markov property w.r.t. G.

The significance of the global Markov property is that it provides a way to verify whether a subset C ⊂ V\(A ∪ B) is such that A ↛ B | C, i.e. whether the local independence is preserved even when ignoring the information in the events V\(A ∪ B ∪ C). This would imply that F_t^{B∪C} is sufficient to assess the local independence N_A ↛ N_B | N_C. Clearly, this property is useful to reduce complexity in a multivariate setting. All prerequisites for the main result in this section have now been gathered.

Theorem 3.4 Equivalence of dynamic Markov properties
Let Y be a marked point process with local independence graph G = (V, E). Under assumption (6) and further regularity conditions (cf. Appendix 5.3), the pairwise, local and global dynamic Markov properties are equivalent.

The proof is given in Appendix 5.3.

Example 2 ctd. Graft versus host disease. Given what we have said before, we should find that the nodes C = {'chronic', 'death'} δ-separate A = 'acute' from B = 'relapse' in the local independence graph, Figure 6(a). In Figure 7(a) we remove the arrows out of 'relapse' and then, in Figure 7(b), we obtain the undirected version (note that no edges need to be added, as all nodes with common children are already joined).
In Figure 7(b) we can easily verify that 'acute' and 'relapse' are separated by the other two nodes, but 'acute' and 'relapse' are not separated if we do not condition on 'chronic'. If the same graphical check is performed on the graph in Figure 6(b), it looks as if 'chronic GvHD' were the only separating node, but it has to be kept in mind that this graph is only valid as long as the patient is alive.

Figure 7: Checking for δ-separation by (a) removing arrows out of 'relapse' and verifying separations in the undirected version (b).

The main point of Theorem 3.4 is that it allows us to check whether local independencies are preserved when some components of the multivariate process are ignored, as illustrated in the following example.

Example 4. Home visits. This example is not taken from the literature, but similar studies have been carried out (e.g. Vass et al., 2002, 2004). In some countries programmes exist to assist the elderly through regular home visits by a nurse. These are meant to reduce unnecessary hospitalisations while increasing the quality of life of the person, allowing them to stay in a familiar environment and giving them the feeling of being in control. It is hoped that this also increases the survival time. Assume that data are available only on the number and times of home visits, hospitalisations and the time of death. In this case one might find that the home visits have a negative effect on survival. This is because the nurses will typically increase their visits if the patient's health is deteriorating, which might be a precursor of death. An indicator of deteriorating health could, e.g., be increasing dementia. The graph in Figure 8(a) shows the local independence graph for the three observed processes, again omitting the trivial arrows out of the node 'death'.
However, the true underlying structure might be as depicted in Figure 8(b), where once information on the dementia process is included (in addition to hospitalisation), information about past home visits does not change the intensity for death. It can be seen from the graph that analysing the effect of home visits on survival including information on hospitalisation is not enough, because the node 'hospitalisation' does not separate 'home visits' from 'death'. One can regard 'dementia' as an unobserved confounder creating an association between 'home visits' and 'death'. More surprisingly, 'hospitalisation' would still not be a separating set even if the home visits were modulated, e.g. to take place exactly once a week as in a controlled study. The visits would then be locally independent of any other process, as they would be controlled externally (this is called an exogenous process in the econometrics literature).

Figure 8: Local independence graph for home visits example; (a) as one might expect for the three observed processes if (b) is the underlying true structure.

But as can be seen from Figure 9, especially from the moralised graph in part (b), 'hospitalisation' still does not separate 'home visits' from 'death', because both have an arrow into 'hospitalisation', which creates the moral edge between these two nodes. For example, if a patient is hospitalised due to a home visit, then this hospitalisation was less likely due to dementia; hence a history of hospitalisation with a preceding home visit predicts survival differently from a hospitalisation without a preceding home visit. Notice that for the discrete time case it is known (Robins, 1986, 1987, 1997) that in a situation like Figure 8(b) we cannot draw any conclusions about the causal effect of home visits on survival. In contrast, from Figure 9 this is possible even when no information on dementia is available.
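The δ-separation checks performed graphically in Figures 7 and 9 can be mechanised. The following is a minimal Python sketch, assuming the remove–moralise–separate recipe used in those figures: delete arrows out of B, restrict to the ancestral set of A ∪ B ∪ C, moralise, and test undirected separation. The node names and edge lists are our own encoding of Figures 6(a) and 8(b); the dotted arrows out of the terminal events are included explicitly.

```python
from itertools import combinations

def ancestors(edges, nodes):
    """All vertices with a directed path into `nodes`, plus `nodes` itself."""
    result = set(nodes)
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            if v in result and u not in result:
                result.add(u)
                changed = True
    return result

def delta_separated(edges, A, B, C):
    """Does C delta-separate A from B in the directed graph `edges`?

    (1) delete arrows out of B, (2) keep only the ancestral set of
    A, B and C, (3) moralise, (4) test undirected separation by C."""
    A, B, C = set(A), set(B), set(C)
    g = [(u, v) for u, v in edges if u not in B]
    an = ancestors(g, A | B | C)
    g = [(u, v) for u, v in g if u in an and v in an]
    adj = {n: set() for n in an}
    for u, v in g:
        adj[u].add(v)
        adj[v].add(u)
    for child in an:                      # marry parents of a common child
        for p, q in combinations([u for u, v in g if v == child], 2):
            adj[p].add(q)
            adj[q].add(p)
    visited = set(A) - C                  # search for an A-B path avoiding C
    stack = list(visited)
    while stack:
        node = stack.pop()
        if node in B:
            return False
        for nxt in adj[node] - C - visited:
            visited.add(nxt)
            stack.append(nxt)
    return True

# Figure 6(a): the terminal events 'relapse' and 'death' have (dotted)
# arrows into everything else; acute -> relapse is the absent edge.
gvhd = [("acute", "chronic"), ("chronic", "acute"),
        ("acute", "death"), ("chronic", "death"), ("chronic", "relapse"),
        ("death", "acute"), ("death", "chronic"), ("death", "relapse"),
        ("relapse", "acute"), ("relapse", "chronic"), ("relapse", "death")]
print(delta_separated(gvhd, {"acute"}, {"relapse"}, {"chronic", "death"}))  # True
print(delta_separated(gvhd, {"acute"}, {"relapse"}, {"death"}))             # False

# Figure 8(b), with the trivial arrows out of 'death' omitted as in the text:
visits = [("dementia", "homevisits"), ("dementia", "hospital"),
          ("dementia", "death"), ("homevisits", "hospital"),
          ("hospital", "death")]
print(delta_separated(visits, {"homevisits"}, {"death"}, {"hospital"}))              # False
print(delta_separated(visits, {"homevisits"}, {"death"}, {"hospital", "dementia"}))  # True
```

The last two calls reproduce the finding above: conditioning on 'hospitalisation' alone does not δ-separate 'home visits' from 'death', whereas adding 'dementia' to the separating set does.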
However, standard methods that just model the intensity for survival with time-varying covariates for the times of previous home visits and hospitalisations will typically give misleading results due to the conditional association between 'home visits' and 'death' given 'hospitalisation'.

3.3 Likelihood factorisation

In order to discuss properties and implications of the likelihood for graphical MPPs, we regard the data, consisting of times and types of events (t_1, e_1), (t_2, e_2), . . . , (t_n, e_n), as a realisation of the history process H_t = {(T_s, E_s) | T_s ≤ t}. As for filtrations, H_{t−} denotes the strict pre-t history process. Additionally, H_t^A, A ⊂ {1, . . . , K}, defined as

H_t^A = {(T_s, E_s) | T_s ≤ t and ∃ k ∈ A: E_s = e_k, s = 1, 2, . . .},

denotes the history process restricted to the marks in A. Any set of marked points for which t_s = t_u, s ≠ u, implies e_s = e_u can be a history, i.e. a realisation of H_t. Note that (up to completion by null sets) the different filtrations can be regarded as being generated by the history processes, i.e. F_t^A = σ{H_t^A}, A ⊂ {1, . . . , K}.

Figure 9: Home visits example assuming modulated home visits; (a) shows that 'home visits' is locally independent of any other process as it is controlled externally; (b) shows the moral graph.

Before deriving the likelihood for a given local independence graph, we recall the likelihood for the general case. Based on the mark specific compensators Λ_k(t) and intensity processes λ_k(t), the corresponding crude quantities are given by

Λ(t) = Σ_{k=1}^K Λ_k(t) and λ(t) = Σ_{k=1}^K λ_k(t).

These are the compensator and intensity process of the cumulative counting process N = Σ_k N_k. In case Λ is not absolutely continuous, the continuous part is denoted by Λ^C and the jumps by ∆Λ, i.e. ∆Λ(t) = Λ(t) − Λ(t−) and Λ(t) = Σ_{s≤t} ∆Λ(s) + Λ^C(t).
The jumps ∆N_k(t) of the counting processes are defined analogously. We assume that Λ^C possesses an intensity. Also note that, by the assumptions made earlier, there can only be countably many jumps ∆Λ. The likelihood process L(t | H_t) for a marked point process is then given as follows (cf. Arjas, 1989):

L(t | H_t) = ∏_{T_s ≤ t} Λ_{E_s}(dT_s) · ∏_{s ≤ t} (1 − ∆Λ(s))^{1 − ∆N(s)} · exp(−Λ^C(t)).

In the absolutely continuous case this simplifies to

L(t | H_t) = ∏_{T_s ≤ t} λ_{E_s}(T_s) · exp( − ∫_0^t λ(s) ds ). (8)

As noted by Arjas (1989), the likelihood for MPPs already has a product form over time, i.e. it is constructed as the product of the likelihoods for the occurrence or non-occurrence of an event given the past history. If G is the local independence graph of Y, we can rewrite (8) as follows:

L(t | H_t) = ∏_{k=1}^K ∏_{T_s ≤ t} λ_k(T_s)^{1{E_s = e_k}} · exp( − ∫_0^t Σ_{k=1}^K λ_k(s) ds )
           = ∏_{k=1}^K ( ∏_{T_s^{(k)} ≤ t} λ_k(T_s^{(k)}) · exp( − ∫_0^t λ_k(s) ds ) ),

where the T_s^{(k)} with E_s^{(k)} = e_k are the occurrence times of mark e_k. The inner product above can be regarded as the mark specific likelihood and is denoted by L_k(t | H_t). By the definition of a local independence graph and the equivalence of the pairwise and local dynamic Markov properties under condition (6), we have that λ_k(s) is F_t^{cl(k)}-measurable, where cl(k) is the closure of node k. It follows that

L_k(t | H_t) = L_k(t | H_t^{cl(k)}), (9)

i.e. the mark specific likelihood L_k based on the whole past remains the same if the available information is restricted to how often and when those marks that are parents of e_k in the graph, and e_k itself, have occurred in the past, symbolised by H_t^{cl(k)}. If the compensator is not absolutely continuous we still have that

∏_{s ≤ t} (1 − ∆Λ(s))^{1 − ∆N(s)} = ∏_{k=1}^K ∏_{s ≤ t} (1 − ∆Λ_k(s))^{1 − ∆N_k(s)}.

To see this, recall that ∆Λ_k(t) = E(∆N_k(t) | F_{t−}) = P(∆N_k(t) = 1 | F_{t−}), so that a discontinuity ∆Λ_k(t) ≠ 0 implies that there is a positive probability for event e_k to occur at time t.
Since we make the assumption that no two events occur at the same time, we have that for any t ∈ T with ∆Λ(t) ≠ 0 there exists an e_k ∈ E such that ∆Λ(t) = ∆Λ_k(t). It follows that under (6) the likelihood factorises as

L(t | H_t) = ∏_{k∈V} L_k(t | H_t^{cl(k)}), (10)

where

L_k(t | H_t^{cl(k)}) = ∏_{T_s^{(k)} ≤ t} Λ_k(dT_s^{(k)}) · ∏_{s ≤ t} (1 − ∆Λ_k(s))^{1 − ∆N_k(s)} · exp(−Λ_k^C(t)).

This parallels the factorisation for DAGs, where the joint density is decomposed into the univariate conditional distributions given the parents. Here we replace the parents by the closure because we also need to condition on the past of a component itself, which is not required in the static case.

Example 5. Independent censoring. The above reasoning is actually quite well known in the context of independent censoring. In a simple survival analysis with possible right censoring there are two events: death and censoring. The latter being independent means that the intensity for survival can be assumed to be the same regardless of whether we know that censoring has taken place or not (cf. Andersen et al., 1993, p. 139), i.e. survival is locally independent of censoring. We can then use the above argument to justify using only the likelihood based on the observed survival times, the partial likelihood, without needing to specify a model for censoring. Note that censoring is allowed to locally depend on survival, and it usually does in the sense that long survival times are more likely to be censored than short ones. One might wish to further elaborate the reasons for censoring by including some unobservable processes, e.g. patients moving house, in the local independence graph, so that it can be verified whether they affect both censoring and survival, which would imply a violation of the independent censoring assumption. Two consequences of the above factorisation regarding the relation of local and conditional independence are given next.
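As an aside before these consequences, the factorisation (10) can be illustrated numerically in the absolutely continuous case. The following sketch uses two hypothetical marks 'a' and 'b', where the intensity of 'b' doubles once 'a' has occurred, so the local independence graph has the single arrow a → b; all names and rates are illustrative assumptions, and the integral in (8) is approximated by a midpoint rule on a grid.

```python
import math

# Hypothetical two-mark example: 'a' has constant intensity, while the
# intensity of 'b' doubles once 'a' has occurred.  All rates are
# illustrative, not taken from the paper.
BASE = {"a": 0.5, "b": 0.3}

def intensity(mark, t, history):
    """lambda_k(t) given the closure-restricted past: 'b' only looks at
    past 'a' events, 'a' ignores 'b' entirely."""
    lam = BASE[mark]
    if mark == "b" and any(m == "a" and s < t for s, m in history):
        lam *= 2.0
    return lam

def log_likelihood(events, t_end, marks=("a", "b"), n_grid=5000):
    """log of (8): sum of log-intensities at the event times minus the
    integrated total intensity (midpoint rule on a grid)."""
    ll = sum(math.log(intensity(m, s, events)) for s, m in events)
    dt = t_end / n_grid
    for i in range(n_grid):
        t = (i + 0.5) * dt
        past = [(s, m) for s, m in events if s < t]
        ll -= sum(intensity(m, t, past) for m in marks) * dt
    return ll

def mark_log_likelihood(events, t_end, mark, n_grid=5000):
    """log of the mark-specific factor L_k in (10), on the same grid."""
    ll = sum(math.log(intensity(mark, s, events))
             for s, m in events if m == mark)
    dt = t_end / n_grid
    for i in range(n_grid):
        t = (i + 0.5) * dt
        past = [(s, m) for s, m in events if s < t]
        ll -= intensity(mark, t, past) * dt
    return ll

events = [(0.7, "a"), (1.4, "b"), (2.2, "b")]
full = log_likelihood(events, 3.0)
factored = sum(mark_log_likelihood(events, 3.0, m) for m in ("a", "b"))
# full and factored agree: the likelihood splits into mark-specific factors
```

Since each intensity here depends only on the history of cl(k), the two mark-specific factors multiply to the full likelihood exactly as in (10); with an arrow b → a added, the factorisation would still hold, but the factor for 'a' would then also need the past of 'b'.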
For the general case of marked point processes it holds that for disjoint A, B, C ⊂ V, where C separates A and B in (G_{An(A∪B∪C)})^m, i.e. A ⊥_g B | C, we have

F_t^A ⊥⊥ F_t^B | F_t^C ∀ t ∈ T. (11)

This can be seen as follows. The factorisation (10) implies that the marginal likelihood for the marked point process discarding events not in An(A ∪ B ∪ C), i.e. {(T_s, E_s) | s = 1, 2, . . . ; E_s ∈ E_{An(A∪B∪C)}}, is given by just integrating out the contributions of e_k ∉ E_{An(A∪B∪C)}, yielding

L(t | H_t^{An(A∪B∪C)}) = ∏_{k ∈ An(A∪B∪C)} L_k(t | H_t^{cl(k)}).

Hence, the likelihood may be written as a product over factors that only depend on cl(k) = {k} ∪ pa(k), k ∈ An(A ∪ B ∪ C). Let C = {cl(k) | k ∈ An(A ∪ B ∪ C)} be the set containing all such sets. Then we have

L(t | H_t^{An(A∪B∪C)}) = ∏_{c ∈ C} L_c(t | H_t^c).

Further, the sets in C can be taken to be the cliques of the graph (G_{An(A∪B∪C)})^m. The above thus corresponds to the factorisation property of undirected graphs (Lauritzen, 1996, p. 35) and implies the conditional independence (11). A property similar to (11) has been noted by Schweder (1970, Theorems 3 and 4) for Markov processes.

Example 2 ctd. GvHD. Applying (11) to Figure 7, we can see that

F_t^{relapse} ⊥⊥ F_t^{acute} | F_t^{chronic, death} ∀ t ≤ T,

where T is the (stopping) time when relapse or death occurs. With a similar argument we can reformulate (2) and (3): for any B ⊂ V we have

N_B(t) ⊥⊥ F_{t−}^{V\cl(B)} | F_{t−}^{cl(B)}, (12)

i.e. the present of N_B is independent of the past of N_{V\cl(B)} given the past of N_{cl(B)}.

3.4 Extensions

Local independence graphs can easily be extended to include time-fixed covariates, such as sex, age and socioeconomic background of patients. The filtration at the start, F_0, then has to be enlarged to include the information on these variables. They can be represented by additional nodes in the graph, with the restrictions that the subgraph on the non-dynamic nodes must be a DAG and that no directed edges are allowed to point from processes to time-fixed covariates.
A process being locally independent of a time-fixed covariate then simply means that the intensity does not depend on this particular covariate given all the other covariates and the information on the past of all processes; δ-separation can still be applied to find further independencies. Similarly, these graphs can be combined with chain graphs where the processes have to be nodes in the terminal chain component; for an application of this latter idea see Gottard (2002). The nodes in a local independence graph do not necessarily have to stand for only one mark (or the associated counting process); marks can be combined into one node. This might be of interest when there are logical dependencies. For example, if a particular illness is considered, then the events of falling ill with and recovering from this illness are trivially locally dependent. If one node is used to represent a collection of marks corresponding to a multivariate subprocess of the whole multivariate counting process, then an arrow into this node means that the intensity of at least one (but not necessarily all) of these marks depends on the origin of the arrow. However, some interesting information could be lost. For instance, if events such as 'giving birth to 1st child', 'giving birth to 2nd child' etc. are considered, it might be relevant whether a woman is married or not when considering the event 'giving birth to 1st child', but it might not be relevant anymore when considering 'giving birth to 2nd child'. A similar issue arises if certain events can systematically occur at the same time. For instance, in some social contexts, 'moving out of the parents' house' is often equivalent to 'getting married'. In this case we propose to regard the joined marks as a new mark of its own and to include it as an additional node in the graph.
Again there will be some logical dependencies, but the possibly different histories leading to 'moving out of the parents' house without getting married' and 'moving out of the parents' house and getting married' could be read off the graph. In many data situations the mark space is not finite, e.g. when measuring the magnitude of an electrical impulse, the amount of income in a new job or the dosage of a drug. One could then discretise the mark space, e.g. into 'finding a well paid job' and 'finding a badly paid job'. However, it must be suspected that too many of these types of events will generate too many logical dependencies that are of no interest and will make the graphs cluttered. As we have seen in some of the examples, it is sometimes sensible to consider stopped processes in order to avoid having to represent logical but uninteresting dependencies. More generally, one might want to relax the requirement 'for all t ∈ T' in Definition 2.2 and instead consider suitably defined intervals based on stopping times. For example, the independence structure might be very different between the times of finishing education and starting the first job than before or after that period. This deserves further investigation.

4 Discussion and conclusions

The well-known benefits of the classical graphical models for the non-dynamic case lie mainly in (a) replacing algebraic manipulations by graphical ones and, as a consequence thereof, (b) the facilitation of structuring and communicating complex data situations, in particular when addressing causal inference, as well as (c) the various ways in which they allow a reduction of computational complexity, hence accelerating calculations, e.g. for expert systems or model search. In the following we discuss the potential of local independence graphs in these regards.

Graphical manipulations.
The main point of graphical models is that properties of the graph correspond to properties of the statistical model, so that by 'looking' at the graph and manipulating it (e.g. separation, moralisation) we can easily read off information that would otherwise require some (usually intricate) algebra on the joint distribution or, in the case of processes, on the intensities or likelihood. With respect to local independence graphs we refer especially to the global Markov property, which allows us to assess local independencies within subprocesses without needing to consider the specific intensities of these subprocesses in more detail. The same can be said for properties (11) and (12). These results hold for very general counting processes. Once more specific classes of processes, such as Markov processes or parametric models, are considered, it can be expected that more properties of the local independence graphs will become useful, but this requires further investigation.

Facilitation of structuring and communicating. We see the main benefit of the graphs in situations where the local independence structure of the system is at least partially known, so that specific questions about the remaining structure can be addressed. The equivalence in Theorem 3.4 means that, in order to draw the graph, we can focus on every type of event individually and elicit which previous events are relevant, possibly including the event 'censoring'. Once the graph is constructed in this way, it can be further investigated which separations hold and, e.g., what the effect would be of ignoring processes that are difficult or impossible to observe (cf. Example 4). Hence, δ-separation allows us to identify situations where, even though some important processes or events cannot be observed directly, there is enough information in the observable events to address the aims of the investigation.
Unobservable events are of particular interest when we aim at causal inference, as they may constitute confounders. Causal inference is of course a topic of its own, and it is not the aim of this paper to go into much detail in this respect beyond the following few comments. Local independence graphs represent dependencies in Λ_k(dt) = E(N_k(dt) | F_{t−}), where conditioning is on having observed F_{t−}; hence these dependencies are associations and, as we have seen in Example 4, they depend very much on whether we condition on F_{t−} or on different subsets (or even extensions) thereof. Causal inference is about predicting N_k(dt) after intervening in F_{t−}, e.g. by fixing the times of certain events, as in the home visits example when modulating the visits. It is well known that conditioning on observation is not the same as conditioning on intervention (Pearl, 2000; Lauritzen, 2000). Hence, without making further assumptions, the arrows in local independence graphs do not necessarily represent causal dependencies: the intensity of an event being dependent on whether another event has been observed before does not imply causation, in the same way as correlation does not imply causation. Such a further assumption could be that all 'relevant' events (or processes) have been taken into account, as originally proposed by Granger in order to justify the use of the term 'causality' for what is now known as Granger-causality. For example, if, in Figure 8(b), we are satisfied that by including 'dementia' all relevant processes have been taken into account, then we could say that home visits are not directly causal but indirectly causal for 'death' (or survival), as there is a directed path from 'home visits' to 'death'. Obviously, in this particular example, there are many other processes, like other illnesses or the death of a partner, that might be relevant.
In general, 'including all relevant processes' is too vague a guideline to be useful, and it is also not necessary. Instead, the literature on (non-dynamic) graphical models and causality has identified different sufficient graphical criteria that allow causal inference, and it would be desirable to transfer these to the dynamic case. A similar framework based on local independence graphs would require more prerequisites than we have given in this paper, and such criteria would need to take into account that the graphs can contain cycles. Hence, this is a topic for further research. For non-graphical approaches to causal reasoning in a continuous time event history setting, see Eerola (1994), Lok (2001), Arjas and Parner (2004) and Fosen et al. (2004).

Reduction of computational complexity. One important step towards reducing computational effort is to reduce dimensionality. Many methods based on graphical models do exactly this. They are based on factorisation, on model or parametric collapsibility, and on separation. Expert systems perform fast calculations by exploiting the local structure of graphical models, decomposing the graph into its cliques. Model search can be accelerated, e.g. by testing for the presence or absence of an edge within the smallest relevant subset of variables, which can be determined by graphical criteria. For local independence graphs, we have formulated a separation criterion and deduced a factorisation of the likelihood, both of which can be exploited to reduce dimensionality. For Markov processes, for example, Nodelman et al. (2003) show how the graphical structure can be exploited to find the graph when it is not known from background knowledge. A further topic of research is the question of collapsibility. Graphical collapsibility means that the local independence structure of a subprocess remains the same after ignoring the other components of the MPP. Ideally, we would also want to remain in the same model class, i.e.
if the whole process is Markov, we would like the subprocess to have the same independence structure and also be Markov. A first attempt at tackling the question of collapsibility of local independence graphs has been made in Didelez (2000, pp. 102), but further investigation is still required. Finally, it should be said that local independence graphs are not a stand-alone method, as they only encode the presence or absence of a dependence. One would typically want to further quantify and analyse the dependencies that are found. Also, they do not provide any particular inference method, apart from simplifying the likelihood. Local independence graphs can be combined with non-, semi- or fully parametric methods, but more research is required to investigate how the graphical representation of the local independence structure can simplify inference.

5 Appendix

First of all we prove that the trail condition of Proposition 2.5 is indeed equivalent to Definition 2.4. Then we discuss some further properties of δ-separation as well as of local independence, which will finally be used to prove Theorem 3.4.

5.1 Proof of Proposition 2.5

We have to show that A ⊥_g B | C in (G^B_{An(A∪B∪C)})^m if and only if every allowed trail from A to B in G is blocked by C. We start with some preliminary remarks. Let π be an allowed trail from A to B in G. Note first that the situation of (j, k) ∈ E and (k, j) ∈ E for j, k on π implies that there are two different trails, one for each directed edge. Second, a trail always consists of distinct vertices. From both statements it follows that cycles within a trail are not possible. Further, we know that if there is a trail π^m between A and B in (G^B_{An(A∪B∪C)})^m, then there exists an allowed trail π^d from A to B in G on the same vertices or containing additional ones which are, in G, common children of some of the vertices on π^m.
Conversely, if there is an allowed trail π^d from A to B in G, then either there exists a trail π^m in (G^B_{An(A∪B∪C)})^m which differs from the former in that it may omit some of the vertices on π^d because they are common children of some other vertices on π^d (this is called a short cut), or the trail π^d contains vertices that are not in An(A ∪ B ∪ C), from which it follows that there exists a vertex γ on π^d with γ ∉ An(A ∪ B ∪ C) where the directed edges on this trail meet head-to-head.

Let now A ⊥_g B | C in (G^B_{An(A∪B∪C)})^m and let π^d be an allowed trail in G from A to B. If this trail contains vertices which are not in An(A ∪ B ∪ C), then it follows from the above considerations that there exists a vertex γ on π^d where the edges meet head-to-head and which, as well as all its descendants, is not in C. Thus, this trail is blocked by C. Otherwise, there exists a trail π^m between A and B in (G^B_{An(A∪B∪C)})^m with the same vertices, or with some short cuts, which is intersected by C. In case no short cuts are possible, there are no vertices on π^d where edges meet head-to-head, and therefore there has to be at least one such vertex γ ∈ C. In the other case, it holds for all γ ∈ {pa(η) | η on π^d and edges meet head-to-head at η} that no edges meet head-to-head at γ. Since π^m is intersected by C, it follows that at least one of the vertices on π^d where no edges meet head-to-head is an element of C. Thus, this trail is also blocked by C.

Finally, let every allowed trail from A to B be blocked by C and consider a trail π^m between A and B in (G^B_{An(A∪B∪C)})^m. For the corresponding allowed trail π^d it either holds that at least one of its vertices where no edges meet head-to-head is an element of C; in this case it has to be on π^m, so that the latter is intersected by C. Or there exists a vertex γ on π^d where edges meet head-to-head and γ, as well as all its descendants, is not an element of C.
Note that γ has to be an ancestor of A ∪ B ∪ C, because otherwise its parents would not be linked and would be no elements of π^d. Thus, there is a directed path from γ to A or to B (not to C, by definition of γ). Since one of these directed paths would yield an allowed trail from A to B without vertices where edges meet head-to-head, it would have to be intersected by C, which is again not possible because then some of the descendants of γ would be elements of C. This argumentation applies to all vertices on π^d where edges meet head-to-head. Thus, any allowed trail from A to B in G which yields a path between A and B in (G^B_{An(A∪B∪C)})^m always includes at least one vertex which is an element of C and where no edges meet head-to-head, and therefore any such path is intersected by C.

5.2 Asymmetric graphoids

As both δ-separation and local independence are new notions of graph separation and independence, respectively, and both are asymmetric, the properties known for classical graph separation and for stochastic independence cannot be applied directly. We therefore develop a new framework generalising the notion of graphoid axioms (Pearl and Paz, 1987; Pearl, 1988; Dawid, 1998) to the asymmetric case. These axioms characterise ternary relations (A ir B | C), which can be read as 'A is irrelevant for B given C'. It is then shown which of these asymmetric properties hold for δ-separation and local independence, respectively. Some properties are only satisfied if we make further assumptions, as will be detailed. The main aim is to use these properties to prove Theorem 3.4 in Section 5.3. We start by generalising the graphoid axioms to the case of an asymmetric ternary relation on a general lattice. A space V equipped with a semi-order '≤' is called a lattice if for any elements A, B ∈ V there exists a least upper bound, denoted by A ∨ B, i.e. an upper bound such that for all C ∈ V with A ≤ C and B ≤ C we have (A ∨ B) ≤ C. The greatest lower bound is similarly given by A ∧ B.
Due to the asymmetry we formulate a left and a right version of each graphoid axiom as follows.

Definition 5.1 Asymmetric (semi)graphoid
Let (A ir B | C) be a ternary relation on a lattice (V, ≤). The following properties are called the asymmetric semigraphoid properties:

Left redundancy: A ir B | A
Right redundancy: A ir B | B
Left decomposition: A ir B | C, D ≤ A ⇒ D ir B | C
Right decomposition: A ir B | C, D ≤ B ⇒ A ir D | C
Left weak union: A ir B | C, D ≤ A ⇒ A ir B | (C ∨ D)
Right weak union: A ir B | C, D ≤ B ⇒ A ir B | (C ∨ D)
Left contraction: A ir B | C and D ir B | (A ∨ C) ⇒ (A ∨ D) ir B | C
Right contraction: A ir B | C and A ir D | (B ∨ C) ⇒ A ir (B ∨ D) | C

If additionally the following properties hold, we have an asymmetric graphoid:

Left intersection: A ir B | C and C ir B | A ⇒ (A ∨ C) ir B | (A ∧ C)
Right intersection: A ir B | C and A ir C | B ⇒ A ir (B ∨ C) | (B ∧ C)

Note that in the case where V is the power set of V and A, B, C ∈ V, if left redundancy, decomposition and contraction hold, then

A ir B | C ⇔ A\C ir B | C. (13)

To see this, note that it follows directly from left decomposition that A ir B | C ⇒ A\C ir B | C. To show A\C ir B | C ⇒ A ir B | C, note that trivially A\C ir B | C ∪ (C ∩ A). Additionally, it follows from left redundancy (i.e. C ir B | C) and left decomposition that (C ∩ A) ir B | C. Left contraction now yields the desired result. Similarly, we can show that if right redundancy, decomposition and contraction hold, then

A ir B | C ⇔ A ir B\C | C. (14)

Properties (13) and (14) permit generalising results for disjoint sets to the case of overlapping sets. The following corollary, for instance, exploits this to reformulate the intersection property in a more familiar way.

Corollary 5.2 Alternative intersection property
Let V be the power set of V and A, B, C ∈ V.
Given a ternary relation on V that satisfies all 'left' versions of Definition 5.1 (except weak union), then for pairwise disjoint sets A, B, C, D ∈ V

A ir B | (C ∪ D) and C ir B | (A ∪ D) ⇒ (A ∪ C) ir B | D. (15)

If all 'right' versions of Definition 5.1 (except weak union) are satisfied, then

A ir B | (C ∪ D) and A ir C | (B ∪ D) ⇒ A ir (B ∪ C) | D. (16)

Proof: As the conditions for property (13) are given, we have that A ir B | (C ∪ D) ⇔ (A ∪ C ∪ D) ir B | (C ∪ D), from which it follows with left decomposition that (A ∪ D) ir B | (C ∪ D). With the same argument we get C ir B | (A ∪ D) ⇒ (C ∪ D) ir B | (A ∪ D). Left intersection yields (A ∪ C ∪ D) ir B | D, which is again equivalent to (A ∪ C) ir B | D because of (13). Implication (16) can be shown analogously.

5.2.1 Graphoid properties satisfied by δ-separation

The interpretation of δ-separation as an irrelevance relation should be that if C δ-separates A from B in G, then A is irrelevant for B given C. This is denoted by A ir_δ B | C. The semi-order is given by set inclusion '⊂', the join and meet operations by union and intersection, respectively.

Proposition 5.3 Graphoid properties for δ-separation
Let G be a directed graph. δ-separation satisfies the following graphoid axioms: (i) left redundancy, (ii) left decomposition, (iii) left and right weak union, (iv) left and right contraction, (v) left and right intersection.

Proof: (i) We have by definition for non-disjoint sets that A ir_δ B | A ⇔ A\(B ∪ A) ir_δ B | A\B ⇔ ∅ ir_δ B | A\B, which is by definition always true. It is easily checked that for properties (ii), (iii) and left contraction we can assume without loss of generality that all involved sets are pairwise disjoint.

(ii) It follows from A ⊥_g B | C in (G^B_{An(A∪B∪C)})^m that D ⊥_g B | C in (G^B_{An(A∪B∪C)})^m for D ⊂ A, since decomposition holds for ⊥_g. Further, (G^B_{An(B∪C∪D)})^m is a subgraph of (G^B_{An(A∪B∪C)})^m or has even fewer edges. Thus, D ⊥_g B | C in (G^B_{An(B∪C∪D)})^m.
(iii) Left weak union is easily shown: A ⊥g B | C in (G^B_{An(A∪B∪C)})^m implies A ⊥g B | (C ∪ D) in (G^B_{An(A∪B∪C∪D)})^m = (G^B_{An(A∪B∪C)})^m, because ordinary weak union holds for graph separation and because An(A ∪ B ∪ C ∪ D) = An(A ∪ B ∪ C) for D ⊂ A. Right weak union is trivial, since application of the definition for overlapping sets yields that (C ∪ D)\B = C.
(iv) For left contraction we have to show that

A ⊥g B | C in (G^B_{An(A∪B∪C)})^m (17)

and

D ⊥g B | (A ∪ C) in (G^B_{An(A∪B∪C∪D)})^m (18)

imply (A ∪ D) ⊥g B | C in (G^B_{An(A∪B∪C∪D)})^m. Since contraction holds for ordinary graph separation, it is sufficient to show that A ⊥g B | C in (G^B_{An(A∪B∪C∪D)})^m. Thus, we have to show that there is no path between A and B in the latter graph that is not intersected by C. Let F be the set of additional vertices in (G^B_{An(A∪B∪C∪D)})^m, i.e. F = An(D)\An(A ∪ B ∪ C). If F = ∅, left contraction trivially holds. Otherwise we have not only these additional vertices but also additional edges, due to directed edges starting in F and due to marrying parents of children in F. By definition of F, for every vertex f ∈ F there exists a directed path in G from f to some d ∈ D which is not intersected by An(A ∪ B ∪ C). From this it follows, first of all, that any path between B and F in (G^B_{An(A∪B∪C∪D)})^m has to be intersected by A ∪ C, because otherwise (18) would be violated. Thus, there is no new path between A and B through F that is not intersected by A ∪ C. Secondly, for every parent k in G^B of some f ∈ F there can be no path between k and B not intersected by A ∪ C in (G^B_{An(A∪B∪C∪D)})^m, for the same reason. Thus, marrying parents of F does not yield a new path from or through pa(F) to B that is not intersected by A ∪ C. A path from A to B not intersected by C would therefore have to be a direct one. This is in turn not possible because of (17) and because A and B trivially have no common children in F (in G^B).
In order to show right contraction in full generality we have to apply the definition for overlapping sets. Let A* = A\(B ∪ C) and C* = C\B; then we have to show that

A* ⊥g B | C* in (G^B_{An(A∪B∪C)})^m (19)

and

A*\D ⊥g D | (B ∪ C*)\D in (G^D_{An(A∪B∪C∪D)})^m (20)

imply A*\D ⊥g (B ∪ D) | C*\D in (G^{B∪D}_{An(A∪B∪C∪D)})^m. From (19) we get by left decomposition that A*\D ⊥g B | C* in (G^B_{An(A∪B∪C)})^m. From this it follows, by similar arguments as applied in the proof of left contraction, that A*\D ⊥g B | C* in (G^B_{An(A∪B∪C∪D)})^m. The same separation also holds in (G^{B∪D}_{An(A∪B∪C∪D)})^m, since the latter graph results from deleting some edges of the former. Thus, we have A*\D ⊥g B | C* in (G^{B∪D}_{An(A∪B∪C∪D)})^m. From (20) we get that A*\D ⊥g D | (B ∪ C*)\D in (G^{B∪D}_{An(A∪B∪C∪D)})^m, since the latter graph again has the same or fewer edges than the original one. With Ã = A*\D we can show by the properties of ordinary graph separation (decomposition, contraction, and intersection) that Ã ⊥g B | C* and Ã ⊥g D | (B\D) ∪ (C*\D) imply Ã ⊥g (B ∪ D) | C*\D, as desired. This can be seen as follows: with decomposition we get Ã ⊥g B\D | (C*\D) ∪ (C* ∩ D) and Ã ⊥g (C* ∩ D) | (C*\D) ∪ (B\D). Intersection and decomposition yield Ã ⊥g B\D | C*\D. This, together with Ã ⊥g D | (B\D) ∪ (C*\D) and the contraction property, provides the desired result.
(v) We omit this proof, as these properties are not actually required for the proof of Theorem 3.4 (cf. Didelez, 2000, p. 69).

Note that with the above results both properties (13) and (15) hold for δ–separation. Right redundancy does not hold, and hence neither does (14) in general. However, we have the following result.

Lemma 5.4 Special case of right decomposition for δ–separation
Given a directed graph G, it holds that A irδ B | C, D ⊂ B ⇒ A irδ D | (C ∪ B)\D.

Proof: Due to property (13) we can assume that A ∩ C = ∅. Let A* = A\B and C* = C\B.
Then we have to show that A* ⊥g B | C* in (G^B_{An(A∪B∪C)})^m implies A* ⊥g D | C* ∪ (B\D) in (G^D_{An(A∪B∪C)})^m. Note that A* ⊥g D | C* ∪ (B\D) in (G^B_{An(A∪B∪C)})^m holds due to weak union and decomposition of ordinary graph separation. Changing the graph to (G^D_{An(A∪B∪C)})^m means that all edges that are present in G as directed edges starting in B\D are added. Additionally, those edges have to be added which result from vertices in B\D having common children with other vertices. Since all these new edges involve vertices in B\D, there can be no additional path between A* and D in (G^D_{An(A∪B∪C)})^m that is not intersected by C* ∪ (B\D).

Although (14) does not hold in full generality, it is easily checked that (16) holds; we omit the details.

5.2.2 Graphoid properties satisfied by local independence

An obvious way to interpret local independence as an irrelevance relation is by letting A ir B | C stand for A →/ B | C, A, B, C ⊂ V. The semi–order is again given by set inclusion '⊂', and the join and meet operations by union and intersection, respectively.

Proposition 5.5 Graphoid properties of local independence
The following properties hold for local independence as defined in Definition 2.2: (i) left redundancy, (ii) left decomposition, (iii) left and right weak union, (iv) left contraction, (v) right intersection.

Proof:
(i) Left redundancy holds since the F_t^{A∪B}–compensators of N_B are obviously F_t^{A∪B}–measurable, i.e. if the past of N_A is known, then the past of N_A is of course irrelevant.
(ii) Left decomposition holds since the F_t^{A∪B∪C}–compensators Λ_k(t), k ∈ B, are F_t^{B∪C}–measurable by assumption, so that the same must hold for the F_t^{B∪C∪D}–compensators Λ_k(t), k ∈ B, for D ⊂ A.
(iii) Left and right weak union also trivially hold, since adding information on the past of components that are already uninformative (left) or included (right) does not change the compensator.
(iv) Left contraction holds since the F_t^{A∪B∪C∪D}–compensators Λ_k, k ∈ B, are by assumption F_t^{A∪B∪C}–measurable, and these are again by assumption F_t^{B∪C}–measurable. (Right contraction is not so easily seen and we omit the proof, as it is not required for the proof of Theorem 3.4.)
(v) Right intersection can be checked by noting that in the definition of local independence the filtration w.r.t. which the intensity process should be measurable is always generated at least by the process itself.

With the above proposition, (13) holds. In contrast, it is clear from the definition of local independence that right redundancy does not hold, because otherwise any process would always be locally independent of any other process given its own past. It follows that the equivalence (14) does not hold. However, it is always true that A →/ B | C ⇒ A →/ B\C | C.

For symmetric irrelevance relations it is the intersection property that makes a semigraphoid a graphoid. Its importance for the equivalence of the pairwise, local and global Markov properties in undirected conditional independence graphs is well known (cf. Lauritzen, 1996). As shown later, it is of similar importance for local independence graphs; hence we now give a condition for left intersection to hold.

Proposition 5.6 Left intersection for local independence
Under assumption (6), local independence satisfies left intersection.

Proof: Left intersection assumes that the F_t^{A∪B∪C}–compensators Λ_k(t), k ∈ B, are F_t^{B∪C}– as well as F_t^{A∪B}–measurable. With (6) we get that they are F_t^{B∪(A∩C)}–measurable, which yields the desired result.

The remaining property, right decomposition, requires special consideration because it makes a statement about the irrelevance of a process N_A after discarding part of the possibly relevant information N_{B\D}. If the irrelevance of N_A is due to knowing the past of N_{B\D}, then it will not necessarily remain irrelevant once the latter is discarded.
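The measurability statements manipulated in these proofs have a concrete finite rendering: in a discretised toy model, a compensator of N_B is "F_t^{B∪C}-measurable" precisely when, viewed as a function of the full past, it is invariant to the history coordinates outside B ∪ C. A minimal Python sketch of this brute-force check follows; the hazard functions and all names are illustrative assumptions, not part of the paper.

```python
from itertools import product

# Toy discrete-time hazards for components of a multivariate event process.
# `past` maps each component name to a tuple of 0/1 event indicators over
# the previous time steps.
def hazard_b(past):
    return 0.1 + 0.3 * (1 in past["c"])   # reads only component c's past

def hazard_a(past):
    return 0.05 + 0.2 * (1 in past["b"])  # reads only component b's past

def measurable_wrt(hazard, components, kept, horizon=3):
    """Brute-force measurability check: the hazard is 'F^kept-measurable'
    iff its value is invariant under arbitrary changes of the past of the
    components outside `kept`."""
    histories = list(product([0, 1], repeat=horizon))
    free = [k for k in components if k not in kept]
    for kept_hist in product(histories, repeat=len(kept)):
        base = None
        for free_hist in product(histories, repeat=len(free)):
            past = {**dict(zip(kept, kept_hist)), **dict(zip(free, free_hist))}
            val = hazard(past)
            if base is None:
                base = val
            elif val != base:
                return False
    return True

V = ["a", "b", "c"]
# a is locally independent of b given c: hazard_b ignores a's past ...
print(measurable_wrt(hazard_b, V, kept=["b", "c"]))   # True
# ... but b is not locally independent of a given c:
print(measurable_wrt(hazard_a, V, kept=["a", "c"]))   # False
```

Note that measurable_wrt(h, V, kept=V) is vacuously True for any hazard h, which mirrors the argument for left redundancy above: retaining every component's past makes the invariance requirement empty.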
Right decomposition therefore only holds under specific restrictions on the relation between the potentially irrelevant process and the relevant process to be discarded. One such situation is given in the following proposition.

Proposition 5.7 Conditions for right decomposition of local independence
Consider a marked point process and assume that the cumulative counting process Σ_k N_k is non–explosive, that Λ_C admits an intensity, and that there are only countably many jumps ∆Λ. Let A, B, C, D ⊂ V with (B ∩ A)\(C ∪ D) = ∅. Right decomposition holds under the conditions that

B →/ A\(C ∪ D) | (C ∪ D) (21)

and

A →/ {k} | (C ∪ B) or B →/ {k} | (C ∪ D ∪ A) for all k ∈ C\D. (22)

Proof: In this proof we proceed somewhat informally for the sake of simplicity; the formal proof is based on the results of Arjas et al. (1992) and given in Didelez (2000, p. 72). Redefine A* = A\(C ∪ D), B* = B\D and C* = C\D. Then the above assumptions ensure that A* ∩ B* = ∅, and in a local independence graph G on A ∪ B ∪ C ∪ D according to Definition 3.1 we have that pa(B*) ⊆ C ∪ D, pa(A*) ⊆ C ∪ D, A* has no children in D, and A* and B* have no common children in C*. Hence, C ∪ D must separate A* and B* in G^m. Thus, by (12) and (11), the assumptions of this proposition imply

N_D(t) ⊥⊥ F_{t−}^{A*} | F_{t−}^{B*∪C*∪D} as well as F_t^{A*} ⊥⊥ F_t^{B*} | F_t^{C∪D}.

We want to show that the F_t^{A∪C∪D}–compensator Λ̃_D(t) of N_D(t) is F_t^{C∪D}–measurable. With the above and interpretation (1) we have

Λ̃_D(dt) = E(N_D(dt) | F_{t−}^{A∪C∪D}) = E(N_D(dt) | F_{t−}^{A*∪C*∪D})
= E( E(N_D(dt) | F_{t−}^{A*∪B*∪C*∪D}) | F_{t−}^{A*∪C*∪D} )
= E( E(N_D(dt) | F_{t−}^{B*∪C*∪D}) | F_{t−}^{A*∪C*∪D} )
= E(N_D(dt) | F_{t−}^{C*∪D}) = E(N_D(dt) | F_{t−}^{C∪D}),

as desired.

5.3 Proof of Theorem 3.4

It is easily checked that (7) ⇒ (5) ⇒ (4): First, pa(k) always δ–separates V\(pa(k) ∪ {k}) from {k} in G, hence (5) is just a special case of (7). Second, (5) ⇒ (4) holds by left weak union and left decomposition of local independence.
Also, it is easy to see that the equivalence of the pairwise and local dynamic Markov properties follows immediately from left intersection (assuming (6)), left weak union and left decomposition. Thus, the following proof considers situations where A, B, C do not form a partition of V or where pa(B) ⊄ C. The structure of the proof corresponds to the one given by Lauritzen (1996, p. 34) for the equivalence of the Markov properties in undirected conditional independence graphs; due to the asymmetry of local independence, however, this version is more involved.

Assume that (4) holds and that C δ–separates A from B in the local independence graph. We have to show that A →/ B | C, i.e. that the F_t^{A∪B∪C}–compensators Λ_k(t), k ∈ B, are F_t^{B∪C}–measurable. The proof is via backward induction on the number |C| of vertices in the separating set. If |C| = |V| − 2, then A and B both consist of only one element and (7) trivially holds. If |C| < |V| − 2, then A or B consists of more than one element.

Let us first consider the case that A, B, C form a partition of V and none of them is empty. If |A| > 1, let α ∈ A. Then, by left weak union and left decomposition of δ–separation, C ∪ (A\{α}) δ–separates {α} from B, i.e. {α} irδ B | C ∪ (A\{α}), and C ∪ {α} δ–separates A\{α} from B in G, i.e. A\{α} irδ B | (C ∪ {α}). Therefore, we have by the induction hypothesis that {α} →/ B | C ∪ (A\{α}) and A\{α} →/ B | (C ∪ {α}). From this it follows with the modified version of left intersection as given in (15) (which can be applied because of the assumption that (6) holds) that A →/ B | C, as desired. If |B| > 1, let β ∈ B. With Lemma 5.4 we have that C ∪ (B\{β}) δ–separates A from {β}, i.e. A irδ {β} | C ∪ (B\{β}), and C ∪ {β} δ–separates A from B\{β}, i.e. A irδ B\{β} | (C ∪ {β}). Therefore, we have again by the induction hypothesis that

A →/ {β} | C ∪ (B\{β}) and A →/ B\{β} | (C ∪ {β}). (23)
The above immediately implies, by the definition of local independence, that A →/ B | C. Note that this argument could not be applied to the first part of the proof, because A →/ B | (C ∪ {α}) does not imply A →/ B | C.

Let us now consider the case that A, B, C ⊂ V are disjoint but not a partition of V. First, assume that they form a partition of An(A ∪ B ∪ C), i.e. that A ∪ B ∪ C is an ancestral set. Let γ ∈ V\(A ∪ B ∪ C), i.e. γ is not an ancestor of A ∪ B ∪ C. Then every allowed trail (cf. Proposition 2.5) from γ to B is blocked by A ∪ C, since any such trail includes an edge (k, b) for some b ∈ B where no edges meet head–to–head in k and k ∈ A ∪ C. Therefore, we get {γ} irδ B | (A ∪ C). Application of left contraction, weak union, and decomposition for δ–separation yields A irδ B | (C ∪ {γ}). It follows with the induction hypothesis that A →/ B | (C ∪ {γ}) as well as {γ} →/ B | (A ∪ C). With left intersection as given by (15) in Corollary 5.2 and left decomposition for local independence we get the desired result.

Finally, let A, B, C be disjoint subsets of V with A ∪ B ∪ C not necessarily an ancestral set. Choose γ ∈ an(A ∪ B ∪ C) and define G̃^B = G^B_{An(A∪B∪C)}. Since A ⊥g B | C in (G̃^B)^m, we know from the properties of ordinary graph separation that

(i) either {γ} ⊥g B | (A ∪ C) in (G̃^B)^m
(ii) or A ⊥g {γ} | (B ∪ C) in (G̃^B)^m.

(i) In the first case {γ} irδ B | (A ∪ C), and it follows from left contraction that (A ∪ {γ}) irδ B | C. Application of left weak union and left decomposition yields A irδ B | (C ∪ {γ}). With the induction hypothesis we therefore get A →/ B | (C ∪ {γ}) and {γ} →/ B | (A ∪ C). Left intersection according to (15) and left decomposition for local independence yield A →/ B | C.

(ii) The second case is the most complicated; the proof now makes use of right decomposition for local independence under the conditions given in Proposition 5.7.
First, we have from (ii) that A ⊥g {γ} | (B ∪ C) in (G_{An(A∪B∪C)})^m, since the additional edges starting in B can only yield additional paths between A and γ that must be intersected by B. Since deleting further edges out of γ does not create new paths, it holds that A irδ {γ} | (B ∪ C). With A irδ B | C, application of right contraction for δ–separation yields A irδ (B ∪ {γ}) | C. Now we can apply Lemma 5.4 to get A irδ B | (C ∪ {γ}), from where it follows with the induction hypothesis that A →/ B | (C ∪ {γ}) and A →/ {γ} | (B ∪ C). With right intersection for local independence we get A →/ (B ∪ {γ}) | C. In addition, {γ} →/ A | (B ∪ C) by the same arguments as given above for A →/ {γ} | (B ∪ C). In order to apply Proposition 5.7 we still have to show that for all k ∈ C either A irδ {k} | (C ∪ B ∪ {γ}) or {γ} irδ {k} | (C ∪ B ∪ A), which by the induction hypothesis implies the corresponding local independencies. To see this, assume that there exists a vertex k ∈ C for which neither holds. With the trail condition we then have that in G_{An(A∪B∪C)} there exists an allowed trail from A and from γ, respectively, to k such that every vertex where edges do not meet head–to–head is not in (C ∪ B ∪ {γ})\{k} and (C ∪ B ∪ A)\{k}, respectively, and every vertex where edges meet head–to–head, or some of its descendants, is in (C ∪ B ∪ {γ})\{k} and (C ∪ B ∪ A)\{k}, respectively. This would yield a path between A and γ which is not blocked by C ∪ B (note that k is a head–to–head node on this trail) in G_{An(A∪B∪C)}. This in turn contradicts the separation of A and γ by B ∪ C in G^B_{An(A∪B∪C)}, because the edges starting in B cannot contribute to this trail. Consequently, we can apply right decomposition and get the desired result.

Acknowledgements: I wish to thank Iris Pigeot for her persistent encouragement and advice, and Niels Keiding for valuable comments and suggestions. Financial support by the German Research Foundation (SFB 386) is also gratefully acknowledged.

References

Aalen, O. O. (1987).
Dynamic Modelling and Causality. Scandinavian Actuarial Journal, 177 – 190.
Aalen, O. O., Ø. Borgan, N. Keiding, and J. Thormann (1980). Interaction between Life History Events: Nonparametric Analysis of Prospective and Retrospective Data in the Presence of Censoring. Scandinavian Journal of Statistics 7, 161 – 171.
Allard, D., A. Brix, and J. Chadoeuf (2001). Testing Local Independence between Two Point Processes. Biometrics 57, 508 – 517.
Andersen, P. K., Ø. Borgan, R. D. Gill, and N. Keiding (1993). Statistical Models Based on Counting Processes. Springer, New York.
Arjas, E. (1989). Survival Models and Martingale Dynamics (with Discussion). Scandinavian Journal of Statistics 16, 177 – 225.
Arjas, E., P. Haara, and I. Norros (1992). Filtering the Histories of a Partially Observed Marked Point Process. Stochastic Processes and their Applications 40, 225 – 250.
Arjas, E. and J. Parner (2004). Causal Reasoning from Longitudinal Data. Scandinavian Journal of Statistics 31, 171 – 201.
Brémaud, P. (1981). Point Processes and Queues: Martingale Dynamics. Springer, New York.
Cowell, R. G., A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter (1999). Probabilistic Networks and Expert Systems. Springer, New York.
Cox, D. R. and V. S. Isham (1980). Point Processes. Chapman and Hall, London.
Cox, D. R. and N. Wermuth (1996). Multivariate Dependencies — Models, Analysis and Interpretation. Chapman and Hall, London.
Dahlhaus, R. (2000). Graphical Interaction Models for Multivariate Time Series. Metrika 51, 157 – 172.
Daley, D. J. and D. Vere-Jones (2003). An Introduction to the Theory of Point Processes. Springer, London.
Dawid, A. P. (1979). Conditional Independence in Statistical Theory. Journal of the Royal Statistical Society, Series B 41, 1 – 31.
Dawid, A. P. (1998). Conditional Independence. In S. Kotz, C. B. Read, and D. L. Banks (Eds.), Encyclopedia of Statistical Sciences, Update Volume 2, pp. 146 – 155. Wiley–Interscience, New York.
Dean, T. and K. Kanazawa (1989).
A Model for Reasoning about Persistence and Causation. Computational Intelligence 5, 142 – 150.
Didelez, V. (2000). Graphical Models for Event History Data Based on Local Independence. Logos, Berlin.
Edwards, D. (2000). Introduction to Graphical Modelling (2nd ed.). Springer, New York.
Eerola, M. (1994). Probabilistic Causality in Longitudinal Studies, Volume 92 of Lecture Notes in Statistics. Springer, Berlin.
Eichler, M. (1999). Graphical Models in Time Series Analysis. Ph.D. thesis, University of Heidelberg.
Eichler, M. (2000). Causality Graphs for Stationary Time Series. Technical report, University of Heidelberg.
Fleming, T. R. and D. P. Harrington (1991). Counting Processes and Survival Analysis. Wiley, New York.
Florens, J. P. and D. Fougère (1996). Noncausality in Continuous Time. Econometrica 64, 1195 – 1212.
Fosen, J., Ø. Borgan, H. Weedon-Fekjoer, and O. O. Aalen (2004). Path Analysis for Survival Data with Recurrent Events. Technical report, University of Oslo, Norway.
Gottard, A. (2002). Graphical Duration Models. Technical Report 2002/07, University of Florence, Italy.
Granger, C. W. J. (1969). Investigating Causal Relations by Econometric Models and Cross–spectral Methods. Econometrica 37, 424 – 438.
Jordan, M. (1999). Learning in Graphical Models. MIT Press, Cambridge.
Keiding, N. (1999). Event History Analysis and Inference from Observational Epidemiology. Statistics in Medicine 18, 2353 – 2363.
Keiding, N., J. P. Klein, and M. M. Horowitz (2001). Multi-state Models and Outcome Prediction in Bone Marrow Transplantation. Statistics in Medicine 20, 1871 – 1885.
Lauritzen, S. L. (1996). Graphical Models. Clarendon Press, Oxford.
Lauritzen, S. L. (2000). Causal Inference from Graphical Models. In O. E. Barndorff-Nielsen, D. R. Cox, and C. Klüppelberg (Eds.), Complex Stochastic Systems. Chapman and Hall, London.
Lok, J. J. (2001). Statistical Modelling of Causal Effects in Time. Ph.D. thesis, Free University of Amsterdam.
Nodelman, U., C. R. Shelton, and D.
Koller (2002). Continuous Time Bayesian Networks. In Proceedings of the 18th Annual Conference on Uncertainty in Artificial Intelligence, pp. 378 – 387. Morgan Kaufmann, San Francisco.
Nodelman, U., C. R. Shelton, and D. Koller (2003). Learning Continuous Time Bayesian Networks. In Proceedings of the 19th Annual Conference on Uncertainty in Artificial Intelligence, pp. 451 – 458. Morgan Kaufmann, San Francisco.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge.
Pearl, J. and A. Paz (1987). Graphoids: A Graph–Based Logic for Reasoning about Relevancy Relations. In B. Du Boulay, D. Hogg, and L. Steel (Eds.), Advances in Artificial Intelligence – II, pp. 357 – 363. North Holland, Amsterdam.
Robins, J. (1986). A New Approach to Causal Inference in Mortality Studies with Sustained Exposure Period — Applications to Control of the Healthy Worker Survivor Effect. Mathematical Modelling 7, 1393 – 1512.
Robins, J. (1987). Addendum to "A New Approach to Causal Inference in Mortality Studies with Sustained Exposure Period — Applications to Control of the Healthy Worker Survivor Effect". Computers and Mathematics with Applications 14, 923 – 945.
Robins, J. (1997). Causal Inference from Complex Longitudinal Data. In M. Berkane (Ed.), Latent Variable Modeling and Applications to Causality, Lecture Notes in Statistics 120. Springer, New York.
Schweder, T. (1970). Composable Markov Processes. Journal of Applied Probability 7, 400 – 410.
Spirtes, P., C. Glymour, and R. Scheines (2000). Causation, Prediction, and Search (2nd ed.). MIT Press, Cambridge, Massachusetts.
Vass, M., K. Avlund, C. Hendriksen, C. K. Andersen, and N. Keiding (2002). Preventive Home Visits to Older People in Denmark: Methodology of a Randomized Controlled Study. Aging Clinical and Experimental Research 14, 509 – 515.
Vass, M., K. Avlund, K. Kvist, C. Hendriksen, C. K. Andersen, and N. Keiding (2004).
Structured Home Visits to Older People. Are They Only of Benefit for Women? A Randomised Controlled Trial. Scandinavian Journal of Primary Health Care 22, 106 – 111.
Verma, T. and J. Pearl (1990). Causal Networks: Semantics and Expressiveness. In R. D. Shachter, T. S. Levitt, L. N. Kanal, and J. F. Lemmer (Eds.), Uncertainty in Artificial Intelligence, pp. 69 – 76. North Holland, Amsterdam.
Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. Wiley, Chichester.