J. R. Statist. Soc. B (2010) 72, Part 1, pp. 129–142

Criteria for surrogate end points based on causal distributions

Chuan Ju and Zhi Geng
Peking University, Beijing, People’s Republic of China

[Received November 2008. Final revision September 2009]

Summary. When a treatment has a positive average causal effect (ACE) on an intermediate variable or surrogate end point which in turn has a positive ACE on a true end point, the treatment may nevertheless have a negative ACE on the true end point because of unobserved confounders; this is called the surrogate paradox. A criterion for surrogate end points based on ACEs has recently been proposed to avoid the surrogate paradox. For a continuous or ordinal discrete end point, the distributional causal effect (DCE) may be a more appropriate measure for a causal effect than the ACE. We discuss criteria for surrogate end points based on DCEs. We show that commonly used models, such as generalized linear models and Cox’s proportional hazard models, can make the sign of the DCE of the treatment on the true end point determinable by the sign of the DCE of the treatment on the surrogate even if the models include unobserved confounders. Furthermore, for a general distribution without any assumption of parametric models, we give a sufficient condition for a distributionally consistent surrogate and prove that it is almost necessary.

Keywords: Causal diagram; Causal effect; Surrogate end point

1. Introduction

In many biomedical studies, a surrogate end point is an attractive alternative when the true end point is difficult to obtain. However, misuse of a surrogate may lead to severe consequences and even disaster. A typical disastrous example of the unreliability of a surrogate end point was reported by Fleming and DeMets (1996) and Moore (1995). Ventricular arrhythmia is associated with a significant increase in the risk of death, and thus it was hoped that suppressing ventricular arrhythmia would reduce the death rate.
Three drugs were found to suppress arrhythmia and were approved by the Food and Drug Administration for use in patients with severely symptomatic arrhythmia. However, among patients who took these medicines, the death rate was not reduced but significantly increased. As a result, approximately 50 000 people died (Moore, 1995). Further examples of surrogates causing problems in clinical trials, such as CD4 cell counts for survival time in studies of acquired immune deficiency syndrome and bone mass for fracture in osteoporosis studies, can be found in Fleming and DeMets (1996) and Manns et al. (2006). In recent years, many references have questioned the validity and use of surrogate markers (Fleming and DeMets, 1996; Baker, 2006; Alonso and Molenberghs, 2008).

Address for correspondence: Zhi Geng, School of Mathematical Sciences, Peking University, Beijing 100871, People’s Republic of China. E-mail: zgeng@math.pku.edu.cn
© 2010 Royal Statistical Society 1369–7412/10/72129

Several criteria for surrogates have been proposed. The first intuitive criterion requires that there is a strong correlation between the surrogate and the true end point. But Baker and Kramer (2003) illustrated that, for a strongly correlated surrogate, the treatment may have a positive effect on the surrogate but a negative effect on the true end point. Prentice (1989) proposed a criterion which requires the conditional independence of a treatment and the true end point given the surrogate. Frangakis and Rubin (2002) and Rubin (2004) pointed out that the conditional independence criterion may not satisfy the property of causal necessity, and they proposed a principal surrogate criterion which satisfies the property. Lauritzen (2004) used a causal network to depict a strong surrogate criterion and showed that a strong surrogate is always a principal surrogate. On the basis of average causal effects (ACEs), Chen et al.
(2007) illustrated that, even for a principal surrogate or a strong surrogate, when a treatment has a positive ACE on the surrogate which in turn has a positive ACE on the true end point, the treatment may have a negative ACE on the true end point due to the presence of unobserved confounders, which is called the surrogate paradox. In the case where an end point or response is a continuous or ordinal discrete variable, the ACE may not be a suitable measure when the response has different variances for different treatment levels or when some values of the response mean special benefits or costs. Thus a more appropriate measure of causal effects than the ACE may be the distributional causal effect (DCE), which is defined as the difference between the cumulative distributions of potential responses for two different treatment levels. In this paper, we present the notion of distributionally consistent surrogates (DCSs) based on DCEs. We say that an intermediate variable is a DCS if a non-negative (0 or greater), a non-positive (0 or less) and a null (0) DCE of a treatment on the surrogate imply a non-negative, a non-positive and a null DCE of the treatment on the true end point respectively. Further we say that it is a strict DCS if the inequalities in the conditions are strict (i.e. change ‘0 or greater’ and ‘0 or less’ to ‘greater than 0’ and ‘less than 0’ respectively). We shall show that, when the model between a treatment and an intermediator and the model between the intermediator and a true end point are generalized linear models or Cox’s proportional hazard models, the intermediator will be a strict DCS for the true end point even if the models include unobserved confounders. Furthermore we shall also present a sufficient condition for DCS end points for a general distribution without any assumption of parametric models and prove that the condition is almost necessary. The paper is organized as follows. 
In Section 2, we first give an example to illustrate some problems with criteria for surrogate end points based on ACEs, and then we present the notions of a DCS and strict DCS based on DCEs. In Section 3, we prove that distributional consistency and strict distributional consistency are generally guaranteed under many commonly used models, such as generalized linear models and proportional hazard models. In Section 4, we give a condition for DCSs for a general distribution without any assumption of parametric models. In Section 5 we give some discussion. All proofs of the theorems are presented in Appendix A.

2. Definitions of consistent surrogates based on causal distributions

Fig. 1. Causal diagram for depicting a surrogate (Lauritzen, 2004) [edges: T → S, S → Y, U → S, U → Y]

Lauritzen (2004) proposed the causal diagram in Fig. 1 to depict the relationship between a treatment T, a surrogate S, a true end point Y and an unobserved confounder set U in a randomized trial, where T, S and Y are univariate, and U is univariate or multivariate, which consists of all non-descendants of S and all non-descendants of Y except T and S. The arrows from U to S and Y mean that all arrows between U and $(S, Y)$ are oriented from U to S or Y; in other words, there is no arrow from S or Y pointing at U. Since we focus on randomized trials rather than observational studies, the unobserved confounder set U does not affect the treatment T, but it may affect both the surrogate S and the true end point Y since usually the surrogate cannot be randomized in a trial. In this paper we use this diagram to represent a surrogate end point and all definitions and theoretical results will be based on this graphical model, for which the joint probability can be factorized into
$$p(t, s, y, u) = p(t)\,p(u)\,p(s \mid t, u)\,p(y \mid s, u) = \prod_{x \in V} p(x \mid \mathrm{pa}_x),$$
where $V = \{t, s, y, u\}$ is the set of all variables and $\mathrm{pa}_x$ is the parent set of x.
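As a rough illustration of this factorization, the sketch below assembles the joint distribution from the four factors for binary variables. All probability values and the helper names `bern` and `joint` are hypothetical and serve only to show the structure; they are not taken from the paper.

```python
from itertools import product

# Hypothetical factors for binary T, S, Y, U (illustrative values only).
p_t = {0: 0.5, 1: 0.5}                                                # p(t)
p_u = {0: 0.3, 1: 0.7}                                                # p(u)
p_s1_given_tu = {(0, 0): 0.2, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.8}  # P(S=1 | t, u)
p_y1_given_su = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.9}  # P(Y=1 | s, u)


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(value = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(t, s, y, u):
    """p(t, s, y, u) = p(t) p(u) p(s | t, u) p(y | s, u), as in Fig. 1."""
    return (p_t[t] * p_u[u]
            * bern(p_s1_given_tu[(t, u)], s)
            * bern(p_y1_given_su[(s, u)], y))


# The factors define a proper joint distribution.
total = sum(joint(t, s, y, u) for t, s, y, u in product((0, 1), repeat=4))

# Marginal of (T, U): T and U are independent because neither is a parent of the other.
p_tu = {
    (t, u): sum(joint(t, s, y, u) for s, y in product((0, 1), repeat=2))
    for t, u in product((0, 1), repeat=2)
}
```

Because T has no parents in the diagram, the marginal of (T, U) computed this way factorizes as p(t) p(u), which reflects randomization of the treatment.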
Let $\mathrm{do}(T = t)$ denote an external intervention to force treatment to level $T = t$. In the context of a randomized trial, we only ever condition on an intervention $\mathrm{do}(T = t)$, never on an observation $T = t$, and thus we make all probabilities conditional on $\mathrm{do}(T = t)$ and omit $p(t)$. Following the causal interpretation in Pearl (2000), the post-interventional distribution resulting from the intervention $\mathrm{do}(X = x)$ can be determined by deleting the factor $p(x \mid \mathrm{pa}_x)$ from the joint probability. Graphically the intervention $\mathrm{do}(X = x)$ means that the arrows entering X are cut off. For the causal diagram in Fig. 1, we have the post-intervention distributions
$$p\{s, y, u \mid \mathrm{do}(t)\} = p(u)\,p(s \mid t, u)\,p(y \mid s, u)$$
and
$$p\{y, u \mid \mathrm{do}(s), \mathrm{do}(t)\} = p(u)\,p(y \mid s, u).$$

The ACE of a variable X on another variable Z for comparing two treatment levels $x'$ and $x''$ ($x' > x''$) is defined as
$$\mathrm{ACE}\{X \to Z \mid \mathrm{do}(x'), \mathrm{do}(x'')\} = E\{Z \mid \mathrm{do}(x')\} - E\{Z \mid \mathrm{do}(x'')\}.$$
Criteria for consistent surrogates based on ACEs were proposed by Chen et al. (2007) to avoid the phenomenon that a treatment has a positive ACE on a surrogate which in turn has a positive ACE on a true end point, but the treatment has a negative ACE on the true end point. Below we give an example to illustrate that the ACE consistent surrogates have not completely avoided the surrogate paradox and a more rigorous criterion for surrogate end points is needed.

2.1. Example 1

Moore (1995) described the dramatic story of one of the worst drug disasters. Irregular heart beat, or arrhythmia, was predictive of early mortality. A new drug approved by the Food and Drug Administration in 1985 could reduce the risk of arrhythmia to a third. However, taking the drug would increase the death rate threefold. An estimated 50 000 patients in the USA died because they took this and other antiarrhythmic drugs.
Suppose that the treatment T, the surrogate S and the unobserved confounder U are binary with values 0 and 1, and the true end point Y is ternary with values 0, 1 and 2. U may denote a particular gene or heart injury that could affect both heartbeat and sudden death. Let $T = 1$ denote treatment, $S = 1$ correction of arrhythmia, $U = 0$ heart injury, $Y = 0$ sudden death immediately, $Y = 1$ a survival time within 5 years and $Y = 2$ a survival time of more than 5 years. The artificial probabilities are shown in Table 1, with $P(Y = 2 \mid U = u, S = s) = 1 - \sum_{y=0}^{1} P(Y = y \mid U = u, S = s)$ omitted and $P(U = 0) = 0.2$.

Table 1. Artificial probabilities for example 1

U   P(S = 1|U = u, T = t)   P(Y = 0|U = u, S = s)   P(Y = 1|U = u, S = s)
    T = 0     T = 1         S = 0     S = 1         S = 0     S = 1
0   0.95      0.7           1         0.03          0         0.2
1   0.75      0.99          0.01      0.01          0.8       0.1

With the given probabilities we have that $\mathrm{ACE}\{T \to S \mid \mathrm{do}(T = 1), \mathrm{do}(T = 0)\} = 0.14 > 0$, $\mathrm{ACE}\{S \to Y \mid \mathrm{do}(S = 1), \mathrm{do}(S = 0)\} = 0.40 > 0$ and $\mathrm{ACE}\{T \to Y \mid \mathrm{do}(T = 1), \mathrm{do}(T = 0)\} = 0.05 > 0$. According to the definition of a consistent surrogate based on ACEs, the surrogate S is an ACE consistent surrogate for the true end point Y. But it can be shown that the treatment T increases the death rate threefold (i.e. $P(Y = 0 \mid T = 1)/P(Y = 0 \mid T = 0) = 3.05$) even though it reduces the risk of arrhythmia to a third (i.e. $P(S = 0 \mid T = 1)/P(S = 0 \mid T = 0) = 0.32$), which fits the numbers in Moore (1995) very well. Thus the surrogate S does not raise a surrogate paradox based on ACEs, but it raises a surrogate paradox based on causal distributions, i.e. T has a positive distributional causal effect on S in the sense that the causal distribution of S given $\mathrm{do}(T = 1)$ stochastically dominates the causal distribution of S given $\mathrm{do}(T = 0)$, and S also has such a positive distributional causal effect on Y, but T may still have a negative causal effect on Y. This example shows that a consistent surrogate based on ACEs may not be a good criterion for some applications.
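The quantities in example 1 can be recomputed from Table 1 and $P(U = 0) = 0.2$. The sketch below is one way to do so; the dictionary layout is an assumption about how the flattened table should be read (it reproduces the reported values 0.14, 0.05, 3.05 and 0.32), and all function names are hypothetical.

```python
P_U = {0: 0.2, 1: 0.8}
# Key (u, t): P(S = 1 | U = u, T = t), read from Table 1.
P_S1 = {(0, 0): 0.95, (0, 1): 0.70, (1, 0): 0.75, (1, 1): 0.99}
# Key (u, s): [P(Y=0), P(Y=1), P(Y=2)], with P(Y=2) filled in as the complement.
P_Y = {
    (0, 0): [1.00, 0.00, 0.00],
    (0, 1): [0.03, 0.20, 0.77],
    (1, 0): [0.01, 0.80, 0.19],
    (1, 1): [0.01, 0.10, 0.89],
}


def p_s(s, u, t):
    return P_S1[(u, t)] if s == 1 else 1.0 - P_S1[(u, t)]


def dist_s_do_t(t):
    """[P(S=0 | do(T=t)), P(S=1 | do(T=t))], averaging over U."""
    return [sum(P_U[u] * p_s(s, u, t) for u in P_U) for s in (0, 1)]


def dist_y_do_s(s):
    """Distribution of Y under do(S=s): sum_u p(u) p(y | s, u)."""
    return [sum(P_U[u] * P_Y[(u, s)][y] for u in P_U) for y in range(3)]


def dist_y_do_t(t):
    """Distribution of Y under do(T=t): sum_{u,s} p(u) p(s | t, u) p(y | s, u)."""
    return [sum(P_U[u] * p_s(s, u, t) * P_Y[(u, s)][y]
                for u in P_U for s in (0, 1)) for y in range(3)]


def mean(dist):
    return sum(value * prob for value, prob in enumerate(dist))


def dce(dist_hi, dist_lo):
    """P(Z > z | hi) - P(Z > z | lo) for every threshold z below the top level."""
    return [sum(dist_hi[z + 1:]) - sum(dist_lo[z + 1:])
            for z in range(len(dist_hi) - 1)]


ace_ts = mean(dist_s_do_t(1)) - mean(dist_s_do_t(0))
ace_sy = mean(dist_y_do_s(1)) - mean(dist_y_do_s(0))
ace_ty = mean(dist_y_do_t(1)) - mean(dist_y_do_t(0))
death_ratio = dist_y_do_t(1)[0] / dist_y_do_t(0)[0]
arrhythmia_ratio = dist_s_do_t(1)[0] / dist_s_do_t(0)[0]
dce_sy = dce(dist_y_do_s(1), dist_y_do_s(0))
dce_ty = dce(dist_y_do_t(1), dist_y_do_t(0))
```

Under this reading, the DCE of T on Y is negative at the threshold $y = 0$ and positive at $y = 1$, so the ACE of T on Y is positive although the treatment raises the probability of immediate death.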
Thus we now introduce a more rigorous criterion for surrogates based on causal distributions to avoid this problem.

Definition 1. The DCE of a variable X on another variable Z for a specific threshold z comparing two treatment levels $x'$ and $x''$ ($x' > x''$) is defined as the difference of post-intervention cumulative probabilities
$$\mathrm{DCE}\{X \to (Z > z) \mid \mathrm{do}(x'), \mathrm{do}(x'')\} = P\{Z > z \mid \mathrm{do}(x')\} - P\{Z > z \mid \mathrm{do}(x'')\}.$$
We say that X has a non-negative, non-positive or null DCE on Z if, for any two levels $x' > x''$, $\mathrm{DCE}\{X \to (Z > z) \mid \mathrm{do}(x'), \mathrm{do}(x'')\}$ is respectively greater than or equal to, less than or equal to or equal to 0 for all z. If, in addition, there are z and $x' > x''$ such that $\mathrm{DCE}\{X \to (Z > z) \mid \mathrm{do}(x'), \mathrm{do}(x'')\}$ is greater than or less than 0, then we say that X has respectively a positive or negative DCE on Z.

We have a non-negative DCE of X on Z if and only if the distribution of Z given $\mathrm{do}(x)$ is stochastically non-decreasing in x. Note that DCE depends on a specific threshold z but the sign (negative, positive or zero) of the DCE does not. Hereafter we compare only two levels $t'$ and $t''$ of treatment T. For a treatment T with more than two levels, we can compare its levels pairwise, and thus we assume that T is binary without loss of generality.

In applications of surrogates, we observe a surrogate S instead of observing the true end point Y, and the confounder U is unobserved. Thus the distribution of the observed variables $(T, S)$ can be evaluated from observed data, but the conditional distributions of Y and the marginal distribution of U cannot be observed. Below we give definitions of a DCS end point and a strict DCS end point such that they can be used to assess the effect of the treatment T on the true end point Y when we have prior knowledge about the sign of the DCE of S on Y, in spite of the unobserved variables.

Definition 2. An intermediate variable S in Fig.
1 is a DCS end point for Y if
(a) for a non-negative DCE of S on Y, a non-negative or a non-positive DCE of T on S implies respectively a non-negative or non-positive DCE of T on Y,
(b) for a non-positive DCE of S on Y, a non-negative or non-positive DCE of T on S implies respectively a non-positive or non-negative DCE of T on Y and
(c) a null DCE of T on S implies a null DCE of T on Y.

By this definition, a DCS end point ensures that the sign (non-strict inequalities) of the DCE of the treatment T on the surrogate S can determine the sign (non-strict inequalities) of the DCE of the treatment T on the true end point Y. Thus the surrogate paradox does not occur in the DCE context, although it is still possible for the treatment T to have a positive or negative DCE on the surrogate S but a null DCE on the true end point. Further we introduce the strict notion as follows.

Definition 3. An intermediate variable S in Fig. 1 is a strict DCS end point for Y if
(a) for a positive DCE of S on Y, a positive or negative DCE of T on S implies respectively a positive or negative DCE of T on Y,
(b) for a negative DCE of S on Y, a positive or negative DCE of T on S implies respectively a negative or positive DCE of T on Y and
(c) a null DCE of T on S implies a null DCE of T on Y.

In definitions 2 and 3, we implicitly require that the DCE of T on S has the same sign for all thresholds s and also that the DCE of S on Y has the same sign for all thresholds y.

From the equation
$$E(X) = \int_0^{\infty} P(X > x)\,dx - \int_{-\infty}^{0} \{1 - P(X > x)\}\,dx$$
(Chung (1974), page 49), we have that
$$\mathrm{ACE}\{X \to Z \mid \mathrm{do}(x'), \mathrm{do}(x'')\} = \int_{-\infty}^{\infty} \mathrm{DCE}\{X \to (Z > z) \mid \mathrm{do}(x'), \mathrm{do}(x'')\}\,dz.$$
Thus the sign of a DCE implies that the corresponding ACE has the same sign (note that $P(Z > z)$ is right continuous in z). For example, a positive DCE and a non-negative DCE imply a positive ACE and a non-negative ACE respectively.
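For a response taking values 0, 1, ..., K, the integral above reduces to a sum of the DCEs over the thresholds $z = 0, \ldots, K - 1$, because $P(Z > z)$ is constant between consecutive integers. The sketch below checks this identity numerically; the probability mass functions are hypothetical and the function names are illustrative.

```python
def survival(pmf, z):
    """P(Z > z) for Z supported on {0, 1, ..., len(pmf) - 1}."""
    return sum(pmf[z + 1:])


def mean(pmf):
    return sum(k * p for k, p in enumerate(pmf))


def ace(pmf_hi, pmf_lo):
    """E{Z | do(x')} - E{Z | do(x'')}."""
    return mean(pmf_hi) - mean(pmf_lo)


def dce(pmf_hi, pmf_lo):
    """DCE at each threshold z = 0, ..., K - 1."""
    return [survival(pmf_hi, z) - survival(pmf_lo, z)
            for z in range(len(pmf_hi) - 1)]


# Hypothetical post-intervention distributions of Z under do(x'') and do(x').
pmf_lo = [0.5, 0.3, 0.2]
pmf_hi = [0.2, 0.3, 0.5]

# A second hypothetical pair whose DCE changes sign across thresholds.
mixed_lo = [0.1, 0.8, 0.1]
mixed_hi = [0.2, 0.4, 0.4]
```

In the first pair every DCE is positive and so is the ACE. In the second pair the ACE is positive although the DCE at $z = 0$ is negative, which illustrates that a positive ACE does not imply a positive DCE.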
According to the definitions, a (strictly) consistent surrogate based on DCEs is also a (strictly) consistent surrogate based on ACEs. From example 1, however, it can be seen that the converse is not true.

We define the conditional DCE of T on S given $U = u$ as
$$\mathrm{DCE}\{T \to (S > s) \mid \mathrm{do}(t'), \mathrm{do}(t''), u\} = P\{S > s \mid \mathrm{do}(t'), u\} - P\{S > s \mid \mathrm{do}(t''), u\},$$
and we also define $\mathrm{DCE}\{T \to (Y > y) \mid \mathrm{do}(t'), \mathrm{do}(t''), u\}$ and $\mathrm{DCE}\{S \to (Y > y) \mid \mathrm{do}(s'), \mathrm{do}(s''), u\}$ similarly. Then we have the following relationship between the conditional DCEs on the causal pathway from T to Y.

Lemma 1. If T has a non-negative conditional DCE on S given $U = u$, and S has a non-negative conditional DCE on Y given $U = u$, then T has a non-negative conditional DCE on Y given $U = u$.

From lemma 1, we see that, when U is empty, a non-negative or non-positive DCE of T on S and a non-negative DCE of S on Y imply respectively a non-negative or non-positive DCE of T on Y. But this is not true with ACEs. For example, for the probabilities that are given in Table 2, we have $\mathrm{ACE}\{T \to S \mid \mathrm{do}(T = 1), \mathrm{do}(T = 0)\} = 0.7 > 0$, $\mathrm{ACE}\{S \to Y \mid \mathrm{do}(S = 0), \mathrm{do}(S = -1)\} = 0.79 > 0$ and $\mathrm{ACE}\{S \to Y \mid \mathrm{do}(S = 1), \mathrm{do}(S = 0)\} = 0.01 > 0$, but $\mathrm{ACE}\{T \to Y \mid \mathrm{do}(T = 1), \mathrm{do}(T = 0)\} = -0.071 < 0$.

Table 2. A set of probabilities when U is empty

S     P(S = s|T = t)      P(Y = y|S = s)
      T = 0     T = 1     Y = 0     Y = 1
−1    0         0.1       0.9       0.1
0     0.9       0         0.11      0.89
1     0.1       0.9       0.1       0.9

3. Models for which an intermediator is a consistent surrogate

In this section, we show that strict distributional consistency is guaranteed by a large class of commonly used models including generalized linear models and proportional hazard models.

3.1. Generalized linear models and extensions

Suppose that the true end point Y is from the exponential family which has densities of the form
$$p(y; \theta, \phi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right\},$$
and also suppose that the intermediator S is from a (typically) different exponential family with different functions $a(\cdot)$, $b(\cdot)$ and $c(\cdot)$.
Further we make a commonly used assumption for generalized linear models that the dispersion parameter $\phi$ is constant over individuals, as described in section 2.2 of McCullagh and Nelder (1989).

Let $a_i, b_i, \ldots$ denote parameters, and let $a_i(\cdot), b_i(\cdot), \ldots$ denote unknown functions. First we consider the following model:
$$g\{E(Y \mid S = s, U = u)\} = a_1 s + c_1(u),$$
$$h\{E(S \mid T = t, U = u)\} = a_2 t + c_2(u),$$
where $g(\cdot)$ and $h(\cdot)$ are strictly monotone link functions (model A). The generalized linear model that was proposed by McCullagh and Nelder (1989) is a special case of model A. We have the following result for model A.

Theorem 1. For model A, the intermediator S is a strict DCS for the true end point Y.

Next we consider a more general model:
$$g\{E(Y \mid S = s, U = u)\} = a_1(s)\,b_1(u) + c_1(u),$$
$$h\{E(S \mid T = t, U = u)\} = a_2(t)\,b_2(u) + c_2(u),$$
where $g(\cdot)$ and $h(\cdot)$ are strictly monotone link functions (model B).

Model B can be seen as a generalization of model II in Chen et al. (2007): $E(Y \mid U = u, S = s) = h(u)s + g(u)$ and $E(S \mid U = u, T = t) = q(t) + r(u)$. We show below that, for model B with some additional conditions, S is a DCS.

Theorem 2. For model B, if $a_1(s)$ and $a_2(t)$ are monotone functions, and both $b_1(u)$ and $b_2(u)$ do not change their signs with changes in u, then S is a DCS for the true end point Y.

3.2. Proportional hazard models

Proportional hazard models proposed by Cox (1972) are frequently used for survival data. We consider the following model to guarantee strict distributional consistency:
$$\lambda(y \mid S = s, U = u) = \lambda_0(y) \exp\{a_1 s + c_1(u)\},$$
$$\lambda^*(s \mid T = t, U = u) = \lambda_0^*(s) \exp\{a_2 t + c_2(u)\},$$
where $\lambda(\cdot)$ and $\lambda^*(\cdot)$ denote hazard functions and $\lambda_0(\cdot)$ and $\lambda_0^*(\cdot)$ are baseline hazards (model C).

Theorem 3. For model C, S is a strict DCS for the true end point Y.
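Theorem 3 can be illustrated numerically for one instance of model C. The sketch below assumes unit baseline hazards for both S and Y, a binary confounder with $P(U = 1) = 0.6$, and arbitrary coefficients; these choices, the confounder terms `c1` and `c2`, and the function name `survival_y_do_t` are all hypothetical. It is a Monte Carlo check with common random numbers, not a proof.

```python
import math
import random


def survival_y_do_t(t, y, draws, a1=-0.5, a2=-0.7, c1=(0.0, 1.0), c2=(0.0, 0.5)):
    """Monte Carlo estimate of P(Y > y | do(T = t)) under a model-C sketch
    with both baseline hazards fixed at 1 (an illustrative assumption)."""
    total = 0.0
    for u, e in draws:
        # With unit baseline hazard, S | t, u is exponential with rate exp(a2*t + c2(u)).
        s = e / math.exp(a2 * t + c2[u])
        # With unit baseline hazard, P(Y > y | s, u) = exp(-y * exp(a1*s + c1(u))).
        total += math.exp(-y * math.exp(a1 * s + c1[u]))
    return total / len(draws)


rng = random.Random(0)
# Each draw is (u, e): u ~ Bernoulli(0.6) and e ~ Exp(1). The same draws are
# reused for both treatment arms (common random numbers).
draws = [(int(rng.random() < 0.6), rng.expovariate(1.0)) for _ in range(20000)]

dce_ty = {
    y: survival_y_do_t(1, y, draws) - survival_y_do_t(0, y, draws)
    for y in (0.5, 1.0, 2.0)
}
```

In a hazard model a negative coefficient lowers the hazard, so with $a_2 < 0$ the treatment has a positive DCE on S and with $a_1 < 0$ the surrogate has a positive DCE on Y. The estimated DCE of T on Y is then positive at each threshold, in line with the theorem.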
Consider the Cox models $\lambda(y \mid S = s, U = u) = \lambda_0(y) \exp(a_1 s + b_1 u)$ and $\lambda^*(s \mid T = t, U = u) = \lambda_0^*(s) \exp(a_2 t + b_2 u)$. Clearly, this belongs to model C and so S is a strict DCS for Y under the Cox model.

We consider the following generalization of model C:
$$\lambda(y \mid S = s, U = u) = \lambda_0(y)\{a_1(s)\,b_1(u) + c_1(u)\},$$
$$\lambda^*(s \mid T = t, U = u) = \lambda_0^*(s)\{a_2(t)\,b_2(u) + c_2(u)\},$$
where $\lambda(\cdot)$ and $\lambda^*(\cdot)$ denote hazard functions and $\lambda_0(\cdot)$ and $\lambda_0^*(\cdot)$ are baseline hazards (model D).

Theorem 4. For model D, if $a_1(s)$ and $a_2(t)$ are monotone functions, and both $b_1(u)$ and $b_2(u)$ do not change their signs with changes in u, then S is a DCS for the true end point Y.

3.3. Hybrid models

Model A and model C (and also model B and model D) have similar structures. Our results can be extended to a hybrid model from models A–D, in which each equation may come from a different model. For example, model AC denotes that the first model equation for Y comes from model A, and the second model equation for S comes from model C. Similarly we can have other hybrid models, and we have the following results.

Corollary 1. For models AC and CA, S is a strict DCS for Y. For models BD and DB, S is a DCS for Y if $a_1(s)$ and $a_2(t)$ are monotone functions and both $b_1(u)$ and $b_2(u)$ do not change their signs with changes in u.

Below we give some examples to illustrate our results in this section and also give some extended models.

3.4. Example 2

Strict DCS holds for all common parametric models and their combinations in Table 3, where $(X, Z)$ are replaced by $(Y, S)$ and $(S, T)$. Strict DCS also holds for linear regressions with non-normal errors.

Table 3. Common parametric models for which strict DCS holds

Type of regression   Form
Linear               E(X|Z = z, U = u) = az + b(u)
Logistic             logit{p(X = 1|Z = z, U = u)} = az + b(u)
Poisson              log{E(X|Z = z, U = u)} = az + b(u)
Cox                  λ(x|Z = z, U = u) = λ0(x) exp{az + b(u)}

4. Consistent surrogates for general distributions

In the previous section, we assumed that S and Y follow the exponential family distributions or some commonly used models. In this section, we only assume the causal diagram in Fig. 1 for T, S, Y and U, but we do not make any assumption about their distributions. We shall first present a general sufficient condition for a DCS and then show that it is almost necessary.

First, we show the sufficient condition for a DCS in the following theorem. Following the graphical terminology in Pearl (2000), a backdoor path from S to Y is a path between S and Y with an arrow pointing at S. A subset $U^*$ of U blocks a backdoor path p from S to Y if p contains a chain $i \to m \to j$ or a fork $i \leftarrow m \to j$ such that m is in $U^*$, or if p contains a collider $i \to m \leftarrow j$ such that m is not in $U^*$ and no descendant of m is in $U^*$. Intuitively, when $U^*$ blocks all backdoor paths between S and Y, the confounding bias between S and Y is removed by conditioning on $U^*$.

Theorem 5. Suppose that $U^*$ is a subset of U that blocks all backdoor paths from S to Y. Then S is a DCS for the true end point Y for any distribution on $U^*$ if
(a) the DCE of S on Y conditional on $U^* = u^*$ has the same sign for all $u^*$ and
(b) the DCE of T on S conditional on $U^* = u^*$ has the same sign for all $u^*$.

The confounder set U may contain many latent factors in a complex mechanism. $U^*$ can be defined as the parent set $\mathrm{pa}_S$ or $\mathrm{pa}_Y$, and then all backdoor paths from S to Y will be blocked.

Here we compare the conditions in theorem 5 with the conditions for consistent surrogates based on ACEs in theorem 1 of Chen et al. (2007): (a) the expectation of Y on S is monotonic conditionally on U (i.e.
$\mathrm{ACE}\{S \to Y \mid \mathrm{do}(s'), \mathrm{do}(s''), u\} \ge 0$ for all $s' > s''$ and u, or $\mathrm{ACE}\{S \to Y \mid \mathrm{do}(s'), \mathrm{do}(s''), u\} \le 0$ for all $s' > s''$ and u), and (b) T is a risk (or preventive) factor to S (i.e. $P(S \le s \mid t', u) - P(S \le s \mid t'', u) \le 0$ for all s, u and $t' > t''$, or $P(S \le s \mid t', u) - P(S \le s \mid t'', u) \ge 0$ for all s, u and $t' > t''$). Comparing the two conditions, we see that, first, our condition (a) requires monotonicity on distribution, whereas Chen’s condition (a) only requires monotonicity on expectation, and the former is stronger than the latter; second, our condition (b) is the same as Chen’s condition (b); third, we generalize the set U in the conditions to any subset $U^*$ which blocks all backdoor paths from S to Y. Below we give an example to illustrate the generality.

4.1. Example 3

Consider example 1 again, where $U = \{U_0, U_1, U_2\}$ denotes three unknown genes with binary values. Suppose that gene $U_1$ affects both arrhythmia S and gene $U_2$, gene $U_2$ affects survival Y, and gene $U_0$ affects both $U_1$ and $U_2$, as shown in Fig. 2.

Fig. 2. Causal diagram with three unknown genes

Further assume that the causal diagram has the probabilities given in Table 4, with $P(U_2 = 0 \mid U_1 = 0) = 0.1$, $P(U_2 = 0 \mid U_1 = 1) = 0.2$, $P(S = 0 \mid T = 0, U_1 = 0) = P(S = 0 \mid T = 1, U_1 = 1) = 0.2$, $P(S = 0 \mid T = 1, U_1 = 0) = 0.1$ and $P(S = 0 \mid T = 0, U_1 = 1) = 0.3$.

Table 4. P(Y|U2, S) and DCE{S → (Y > y)|do(S = 1), do(S = 0), U2} in example 3

U2   P(Y = 0|U2, S)   P(Y = 1|U2, S)   P(Y = 2|U2, S)   DCE{S → (Y > y)|U2}
     S = 0   S = 1    S = 0   S = 1    S = 0   S = 1    y = 0   y = 1
0    0.1     0.2      0.3     0.1      0.6     0.7      −0.1    0.1
1    0.2     0.1      0.2     0.2      0.6     0.7      0.1     0.1

Table 5. P(Y|U1, S) and DCE{S → (Y > y)|do(S = 1), do(S = 0), U1} in example 3

U1   P(Y = 0|U1, S)   P(Y = 1|U1, S)   P(Y = 2|U1, S)   DCE{S → (Y > y)|U1}
     S = 0   S = 1    S = 0   S = 1    S = 0   S = 1    y = 0   y = 1
0    0.19    0.11     0.21    0.19     0.6     0.7      0.08    0.1
1    0.18    0.12     0.22    0.18     0.6     0.7      0.06    0.1

From these probabilities, we have $P(Y = y \mid U_1 = u, S = s)$ in Table 5.
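Table 5 follows from Table 4 by averaging over $U_2$ given $U_1$. The sketch below carries out this step, relying on the independences implied by Fig. 2 as described in the text ($U_2$ is independent of S given $U_1$, and Y depends on $U_1$ only through $U_2$ and S). The data layout and function names are hypothetical.

```python
# Key (u2, s): [P(Y=0), P(Y=1), P(Y=2)], taken from Table 4.
P_Y_GIVEN_U2_S = {
    (0, 0): [0.1, 0.3, 0.6],
    (0, 1): [0.2, 0.1, 0.7],
    (1, 0): [0.2, 0.2, 0.6],
    (1, 1): [0.1, 0.2, 0.7],
}
P_U2_0_GIVEN_U1 = {0: 0.1, 1: 0.2}  # P(U2 = 0 | U1 = u1)


def p_y_given_u1_s(u1, s):
    """P(Y = y | U1 = u1, S = s) = sum over u2 of P(u2 | u1) P(y | u2, s)."""
    w0 = P_U2_0_GIVEN_U1[u1]
    return [w0 * P_Y_GIVEN_U2_S[(0, s)][y] + (1.0 - w0) * P_Y_GIVEN_U2_S[(1, s)][y]
            for y in range(3)]


def dce_s_on_y(dist_s1, dist_s0):
    """P(Y > y | S = 1, .) - P(Y > y | S = 0, .) for thresholds y = 0 and y = 1."""
    return [sum(dist_s1[y + 1:]) - sum(dist_s0[y + 1:]) for y in range(2)]


table5 = {(u1, s): p_y_given_u1_s(u1, s) for u1 in (0, 1) for s in (0, 1)}
dce_given_u1 = {u1: dce_s_on_y(table5[(u1, 1)], table5[(u1, 0)]) for u1 in (0, 1)}
dce_given_u2 = {u2: dce_s_on_y(P_Y_GIVEN_U2_S[(u2, 1)], P_Y_GIVEN_U2_S[(u2, 0)])
                for u2 in (0, 1)}
```

The conditional DCEs given $U_1$ are positive at every threshold, whereas those given $U_2$ change sign for $U_2 = 0$, which matches the last two columns of Tables 5 and 4.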
Clearly, the DCE of S on Y conditional on $U_1 = u_1$ has the same sign for all $u_1$ as shown in Table 5, whereas the DCE of S on Y conditional on $U_2 = u_2$ (or equivalently conditional on $U = u$) does not have the same sign for all $u_2$ as shown in Table 4. Nevertheless, according to theorem 5 with $U^* = U_1$, the intermediator S is a DCS for Y. This example illustrates that $U^*$ may be an arbitrary subset of U as long as all backdoor paths from S to Y are blocked.

Since we suppose that the confounder set U is not observed, the conditions in theorem 5 are untestable from data, and their validity must be argued by prior knowledge. Our condition (a) means that the surrogate S is a risk (or preventive) factor for the true end point Y, and our condition (b) means that the treatment T has the same sign of DCE on the surrogate S for all $U^* = u^*$. For example, smoking $T = 1$ increases the amount S of tar deposited in a person’s lung at a distribution level regardless of a person’s background factors (i.e. condition (b) holds), and a larger amount of tar also increases the likelihood of lung cancer regardless of background variables (i.e. condition (a) holds). Thus, under the causal diagram in Fig. 1, the amount S of tar is a DCS for lung cancer Y.

Next, we show that the sufficient condition that is given in theorem 5 is also almost necessary. In the following theorem, we consider the causal diagram in Fig. 1 again, and we consider the probabilities $P(S \le s \mid U^* = u^*, T = t)$ and $P(Y \le y \mid U^* = u^*, S = s)$ as continuous random variables for any given s, $u^*$, t and y, where $U^*$ is a subset of U. We say that the probability $P(Y \le y \mid U^* = u^*, S = s)$ is continuous with respect to the continuous variables in $U^*$ if, as a function of $U^* = u^*$, $P(Y \le y \mid U^* = u^*, S = s)$ is continuous in these variables. This means that the conditional probability does not change dramatically for a small change in any continuous variable in $U^*$.
For example, if age is a continuous variable in $U^*$, then a very small change of age may not lead to a dramatic difference in the risk of cancer. This does not require the probability to be continuous with respect to discrete variables in $U^*$.

Theorem 6. Suppose that both $P(S \le s \mid U^* = u^*, T = t)$ and $P(Y \le y \mid U^* = u^*, S = s)$ are continuous with respect to the continuous variables in $U^*$ where $U^*$ is a subset of U. If S is a DCS for the true end point Y for any distribution on $U^*$, then conditions (a) and (b) in theorem 5 hold with probability 1.

From theorem 6, it can be seen that the sufficient condition for a DCS that is given in theorem 5 is also almost necessary.

5. Discussion

The reliability of surrogate end points is especially important in biomedical studies. Misuse of surrogates can lead to serious consequences, like the drug disaster that was described in Moore (1995). In this paper, we showed that the ACE consistency proposed by Chen et al. (2007) is insufficient to avoid these problems. To avoid surrogate paradoxes completely, we introduced DCS end points, which are more reliable than ACE-based consistent surrogate end points. For a DCS end point S, the traditional methods for data analysis can be used for statistical inference on the DCE of T on S, and then the result can be used to explain the sign of the DCE of T on Y without observing the true end point Y.

We have given some conditions for validating DCS end points. Since we suppose that neither the true end point Y nor the confounder set U is observed, these conditions for DCS end points are not directly testable from data. But they are satisfied under many commonly used models and can serve as a guideline for the evaluation of surrogate end points in practice. From the results in Section 3, the intermediator S is always a DCS if we know or assume that the commonly used generalized linear models, proportional hazard models or some of their extensions hold among the variables.
In particular, under structural equation models, the intermediator S is always a strict DCS end point for the true end point Y. As shown in Section 4, when there are no parametric model assumptions, we must assume that T and S are preventive (or risk) factors to S and Y respectively, conditionally on the unobserved confounder U or a subset $U^*$ of U which blocks all backdoor paths from S to Y.

Now we discuss how to assess the validity of these assumptions. First we need to determine $U^*$. If we have the complete diagram (e.g. the diagram in Fig. 2), then $U^*$ can be a set which blocks all backdoor paths from S to Y; otherwise, by prior knowledge, we may choose a subset $U^*$ of U which contains the parent set $\mathrm{pa}_S$ or $\mathrm{pa}_Y$. Next we need to judge conditions (a) and (b) in theorem 5 conditionally on the selected set $U^*$. Essentially, these conditions require that T and S have monotonic causal effects on S and Y at the distributional level respectively, and they are weaker than the monotonicity assumption at the individual level that was used by Imbens and Angrist (1994) for identifying causal effects. For example, for any subpopulation defined by $U^* = u^*$ (e.g. age or gender), smoking T always increases the probability of having a higher amount of tar S in the lung, and the amount of tar S always increases the probability for lung cancer Y, although smoking may decrease tar or tar may prevent cancer for some particular individuals.

We could also check these conditions by observing more variables. From the causal diagram in Fig. 1, we can see that U is a confounder set between S and Y. If the confounder set U or the parent set $\mathrm{pa}_S$ of S can be determined by prior knowledge and is observed, then we can check the model for S and condition (b) in theorem 5 from data; if Y and $\mathrm{pa}_Y$ can be observed, then we can check the model for Y and condition (a) in theorem 5 from data.
If the models in Section 3 or the conditions in Section 4 hold, then S is a DCS end point. Otherwise, we should evaluate the surrogate end point within each subpopulation stratified by the variables that led to effect reversal. For example, in example 1, it is heart injury that led to effect reversal, and thus we should evaluate the surrogate end point for survival time separately within the two subpopulations stratified by heart injury.

As a referee commented on theorem 5, the subset $U^*$ which blocks all backdoor paths from S to Y in theorem 5 can be replaced by a vector-valued function $U^* = f(U)$ that makes $Y \perp\!\!\!\perp U \mid (S, U^*)$ hold. With such a $U^*$, we have from $T \perp\!\!\!\perp Y \mid (S, U, U^*)$ that $T \perp\!\!\!\perp Y \mid (S, U^*)$, and we can also verify that $P\{Y \le y \mid \mathrm{do}(s), u^*\} = P(Y \le y \mid s, u^*)$. Similarly to the proof of theorem 5, we can show that the conclusion of theorem 5 is also true for such a $U^*$. As an application, we consider the case where Y and S are both binary. In this case, we can take $U^*$ to be $\{P(Y = 1 \mid S = 0, U), P(Y = 1 \mid S = 1, U)\}$, and then we have $Y \perp\!\!\!\perp U \mid (S, U^*)$ because the distribution of Y conditional on $(S, U)$ is fully determined by $U^*$ and S. This provides a useful way of finding a specific $U^*$. If $U^*$ is restricted to a subset of U, then the conditional independence $Y \perp\!\!\!\perp U \mid (S, U^*)$ generally implies $\mathrm{pa}_Y \subseteq U^*$, which in turn implies the supposition that is required in theorem 5 that $U^*$ blocks all backdoor paths from S to Y.

The potential application of our results to situations where there is an additional ‘direct effect’ between T and Y needs to be investigated further in our future work. More generally, if we consider a surrogate as an intermediate variable, then the results in this paper may also be useful for instrumental variable methods, effect modification and validation criteria for biomarkers.
Acknowledgements

We thank the Joint Editor, the Associate Editor and two referees for their valuable comments and suggestions that greatly improved the previous version of this paper, and we greatly appreciate a referee’s contribution to our proofs. This research was supported by the Natural Science Foundation of China (grants 10771007, 10721403 and 10931002) and the Ministry of Education–Microsoft Key Laboratory of Statistics and Information Technology of Peking University.

Appendix A: Proofs of theorems

First we give two lemmas to be used in the proofs of the theorems.

Lemma 2. Suppose that X is stochastically larger than Y (i.e. $P(X > c) \ge P(Y > c)$ for all c). Let f be a non-decreasing function. Then $E\{f(X)\} \ge E\{f(Y)\}$. Further, if X is strictly stochastically larger than Y (i.e. $P(X > c) > P(Y > c)$ for some c) and f is increasing, then $E\{f(X)\} > E\{f(Y)\}$.

Proof. This result can be obtained directly from the equality
$$E(X) = \int_0^\infty P(X > x)\,dx - \int_{-\infty}^0 \{1 - P(X > x)\}\,dx$$
(Chung (1974), page 49) and the right continuity of cumulative distribution functions.

Lemma 3. Suppose that $X_1$ and $X_2$ are from the exponential family with the density function in Section 3.1 with a fixed $\phi$. Let $p = \inf\{x \in \mathbb{R} : p(x; \theta, \phi) > 0\}$ and $q = \sup\{x \in \mathbb{R} : p(x; \theta, \phi) > 0\}$. If $E(X_1) > E(X_2)$, then $X_1$ is stochastically larger than $X_2$, and further we have $P(X_1 > c) > P(X_2 > c)$ for all $c \in (p, q)$.

Proof. Without loss of generality assume that $a(\phi) > 0$. With a fixed $\phi$, the exponential family distributions exhibit a monotone likelihood ratio in $\theta$. From lemma 3.4.2 (ii) of Lehmann and Romano (2005), page 70, for any $\theta' > \theta$, we have $P_{\theta'}(X > x) \ge P_{\theta}(X > x)$ for all x. In addition, from theorem 3.4.1 (ii) of Lehmann and Romano (2005), page 66, we have that, for $c \in (p, q)$, $P_{\theta}(X > c)$ is strictly increasing in $\theta$. Thus, from lemma 2, $E(X)$ is also strictly increasing in $\theta$. If $E(X_1) > E(X_2)$, we obtain $\theta_{X_1} > \theta_{X_2}$, and hence the result follows.

A.1. Proof of lemma 1

Under the causal diagram in Fig.
1, we have $P\{Y > y \mid \mathrm{do}(t), u\} = E\{f(S, u) \mid \mathrm{do}(t), u\}$ where $f(s, u) = P\{Y > y \mid \mathrm{do}(s), u\}$. By the conditions of lemma 1, S is stochastically non-decreasing in t given u, and $f(s, u)$ is a non-decreasing function of s. From lemma 2, T has a non-negative DCE on Y given u.

A.2. Proof of theorem 1

We first prove condition (a) in definition 3. Without loss of generality, assume that $g(\cdot)$ and $h(\cdot)$ are increasing functions. Then a positive DCE of T on S implies that $a_2 > 0$ in model A, and thus by lemma 3 the causal distribution of S given $\mathrm{do}(T = t)$ and $U = u$ is stochastically increasing in t for each u. Similarly, a positive DCE of S on Y implies that $a_1 > 0$ and from lemma 3 $f(s, u) = P\{Y > y \mid \mathrm{do}(s), u\}$ is an increasing function of s, where $p < y < q$ (p and q are as defined in lemma 3). From lemma 2 and $P\{Y > y \mid \mathrm{do}(t), u\} = E\{f(S, u) \mid \mathrm{do}(t), u\}$, we have that $P\{Y > y \mid \mathrm{do}(t), u\}$ is increasing in t for each u. From $P\{Y > y \mid \mathrm{do}(t)\} = E[P\{Y > y \mid \mathrm{do}(t), U\}]$, we obtain that T has a positive DCE on Y. In the same way, we can prove conditions (b) and (c) in definition 3, and thus S is a strict DCS for Y.

A.3. Proof of theorem 2

We prove condition (a) in definition 2. Without loss of generality, assume that $g(\cdot)$ and $h(\cdot)$ are increasing functions. By the conditions of theorem 2 and lemma 3, a non-negative DCE of T on S implies that $a_2(t)\,b_2(u)$ is non-decreasing in t for each u, and thus T has a non-negative conditional DCE on S for each $U = u$. Similarly, a non-negative DCE of S on Y implies that S has a non-negative conditional DCE on Y for each $U = u$. From lemma 1, T has a non-negative conditional DCE on Y for each $U = u$. By $P\{Y > y \mid \mathrm{do}(t)\} = E[P\{Y > y \mid \mathrm{do}(t), U\}]$, we obtain that T has a non-negative DCE on Y. In the same way, we can prove conditions (b) and (c) in definition 2, and thus S is a DCS for Y.

A.4. Proof of theorem 3

Let $G(y) = P(Y > y)$. From the definition of $\lambda(y)$ in Cox (1972),
$$\lambda(y) = \lim_{\Delta y \to 0+} \frac{P(Y \le y + \Delta y \mid Y > y)}{\Delta y} = -\frac{G'(y)}{G(y)},$$
we have $G(y) = \exp\{-\int_0^y \lambda(u)\,du\}$.
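The step from the hazard to the survival function can be spelled out as follows. This is a sketch under additional assumptions that are not stated explicitly in the proof: Y is a non-negative survival time, so that G(0) = 1, and G is differentiable and positive on the range considered. The symbol u inside the integral is only an integration variable, not the confounder.

```latex
% P(Y <= y + \Delta y | Y > y) = {G(y) - G(y + \Delta y)} / G(y),
% so dividing by \Delta y and letting \Delta y -> 0+ gives -G'(y)/G(y).
\begin{align*}
\lambda(y) &= -\frac{G'(y)}{G(y)} = -\frac{d}{dy}\log G(y), \\
\log G(y) - \log G(0) &= -\int_0^y \lambda(u)\,du, \\
G(y) &= \exp\Big\{-\int_0^y \lambda(u)\,du\Big\}
  \qquad \text{since } G(0) = 1 .
\end{align*}
```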
Hence, $G(y \mid s, u) = G_0(y)^{a_1 s + c_1(u)}$ where $G_0(y) = \exp\{-\int_0^y \lambda_0(u)\,du\}$. Thus a positive DCE of S on Y implies that $a_1 < 0$. Similarly, a positive DCE of T on S implies that $a_2 < 0$. The rest of the proof is the same as the last part of the proof for theorem 1.

A.5. Proof of theorem 4

Using the equations $G(y \mid s, u) = G_0(y)^{a_1(s) b_1(u) + c_1(u)}$ and $G^*(s \mid t, u) = G_0^*(s)^{a_2(t) b_2(u) + c_2(u)}$, we can prove the result in the same way as for theorem 2.

A.6. Proof of corollary 1

For model AC and model CA, the strict consistency can be proved in the same way as for theorem 1; for model BD and model DB, the distributional consistency can be proved in the same way as for theorem 2.

A.7. Proof of theorem 5

Since $U^*$ blocks all backdoor paths from S to Y, we have $P\{Y > y \mid \mathrm{do}(s), u^*\} = P(Y > y \mid s, u^*)$ from theorem 3.4.1 of Pearl (2000). Similarly we also have $P\{S > s \mid \mathrm{do}(t), u^*\} = P(S > s \mid t, u^*)$. For the diagram in Fig. 1, all paths from T to Y must pass a parent node of S except for the path $T \to S \to Y$. Since $U^*$ blocks all backdoor paths from S to Y, we know that the paths from T to Y are blocked either by S or by $U^*$. Thus we have the conditional independence $T \perp\!\!\!\perp Y \mid (S, U^*)$, and so we have
$$P\{Y > y \mid \mathrm{do}(t), u^*\} = E\{f(S, u^*) \mid \mathrm{do}(t), u^*\}$$
where
$$f(s, u^*) = P\{Y > y \mid \mathrm{do}(s), u^*\} = P(Y > y \mid s, u^*).$$
Then we can obtain the theorem from lemma 2.

A.8. Proof of theorem 6

Let $\mathrm{DCE}\{T \to (S > s_0) \mid \mathrm{do}(t'), \mathrm{do}(t''), u^*\} = a(u^*)$, $\mathrm{DCE}\{S \to (Y > y_0) \mid \mathrm{do}(s'), \mathrm{do}(s''), u^*\} = b(u^*)$ and $\mathrm{DCE}\{T \to (Y > y_0) \mid \mathrm{do}(t'), \mathrm{do}(t''), u^*\} = c(u^*)$ for some $s_0$, $y_0$ and $s' > s''$. When $U^*$ is a binary variable taking values in $\{0, 1\}$, since
$$\mathrm{DCE}\{T \to S \mid \mathrm{do}(t'), \mathrm{do}(t'')\}\; \mathrm{DCE}\{S \to Y \mid \mathrm{do}(s'), \mathrm{do}(s'')\}\; \mathrm{DCE}\{T \to Y \mid \mathrm{do}(t'), \mathrm{do}(t'')\} \ge 0$$
for any distribution on $U^*$, we have
$$\{p\,a(0) + (1 - p)\,a(1)\}\,\{p\,b(0) + (1 - p)\,b(1)\}\,\{p\,c(0) + (1 - p)\,c(1)\} \ge 0 \quad \text{for all } p \in [0, 1],$$
where $p = P(U^* = 0)$. Take
$$f(t) = \{a(0)\,t + a(1)\}\,\{b(0)\,t + b(1)\}\,\{c(0)\,t + c(1)\},$$
and then $f(t) \ge 0$ for all $t \in [0, \infty)$.
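One way to see why f(t) is non-negative on the whole half-line is sketched below. The substitution t = p/(1 - p) is our own reading of the step and is not stated explicitly in the proof.

```latex
% For p in [0, 1), put t = p / (1 - p); as p ranges over [0, 1),
% t ranges over [0, \infty).
\begin{align*}
p\,a(0) + (1 - p)\,a(1) &= (1 - p)\,\{a(0)\,t + a(1)\}, \\
\{p\,a(0) + (1 - p)\,a(1)\}\,\{p\,b(0) + (1 - p)\,b(1)\}\,\{p\,c(0) + (1 - p)\,c(1)\}
  &= (1 - p)^3 f(t), \\
(1 - p)^3 f(t) \ge 0 \ \text{and}\ (1 - p)^3 > 0
  &\;\Longrightarrow\; f(t) \ge 0 .
\end{align*}
```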
Because S is a DCS for $U^* = 0$, we know that $a(0)\,b(0)\,c(0) \ge 0$; thus the leading coefficient of $f(t)$ is non-negative. Since $a(u^*)$ and $b(u^*)$ are continuous random variables in $[-1, 1]$, the three roots of $f(t)$ are mutually distinct with probability 1. Thus $a(0)\,a(1) \ge 0$ and $b(0)\,b(1) \ge 0$ almost surely, or otherwise $f(t) < 0$ on some interval in $[0, \infty)$.

When $U^*$ is a discrete random variable taking values in $\{u_1, u_2, \ldots, u_n, \ldots\}$, we take two arbitrary elements $u_i$ and $u_j$ from the set. With the arguments above, we obtain $a(u_i)\,a(u_j) \ge 0$ and $b(u_i)\,b(u_j) \ge 0$ with probability 1. Since these pairs of $u_i$ and $u_j$ are enumerable and a countable union of null events is still a null event, we know that $a(u_i)\,a(u_j) \ge 0$ and $b(u_i)\,b(u_j) \ge 0$ for all $u_i$ and $u_j$ with probability 1.

When $U^*$ is a continuous random variable taking values in an interval $I \subseteq \mathbb{R}$, because $I \cap \mathbb{Q}$ is an enumerable set, we have that, with probability 1, $a(u_i)\,a(u_j) \ge 0$ and $b(u_i)\,b(u_j) \ge 0$ for all $u_i$ and $u_j$ in $I \cap \mathbb{Q}$. Since $I \cap \mathbb{Q}$ is dense in $I$, by using continuity of $a(u)$ and $b(u)$ we know that $a(u)\,a(u') \ge 0$ and $b(u)\,b(u') \ge 0$ for all $u$ and $u'$ in $I$ with probability 1.

When $U^*$ is a random vector, similarly by choosing two arbitrary points from an enumerable dense set and then using continuity, we know that $a(u)\,a(u') \ge 0$ and $b(u)\,b(u') \ge 0$ for all $u$ and $u'$ with probability 1.

For fixed $u^*$, $t'$ and $t''$, $\mathrm{DCE}\{T \to (S > s) \mid \mathrm{do}(t'), \mathrm{do}(t''), u^*\}$ has the same sign for all s; otherwise, for the subpopulation of $U^* = u^*$, S will not be a DCS. We have just proved that $a(u^*)$ has the same sign for all $u^*$ almost surely; thus the DCE of T on S conditional on $U^* = u^*$ has the same sign for all $u^*$ with probability 1. Similarly the DCE of S on Y conditional on $U^* = u^*$ has the same sign for all $u^*$ with probability 1.

References

Alonso, A. and Molenberghs, G. (2008) Surrogate end points: hopes and perils. Exprt Rev. Pharmecon. Outcms Res., 8, 255–259.
Baker, S. G.
(2006) Surrogate endpoints: wishful thinking or reality? J. Natn. Cancer Inst., 98, 502–503.
Baker, S. G. and Kramer, B. S. (2003) A perfect correlate does not a surrogate make. BMC Med. Res. Methodol., 3, article 16.
Chen, H., Geng, Z. and Jia, J. (2007) Criteria for surrogate end points. J. R. Statist. Soc. B, 69, 919–932.
Chung, K. L. (1974) A Course in Probability Theory, 2nd edn. New York: Academic Press.
Cox, D. R. (1972) Regression models and life-tables (with discussion). J. R. Statist. Soc. B, 34, 187–220.
Fleming, T. R. and DeMets, D. L. (1996) Surrogate end points in clinical trials: are we being misled? Ann. Intern. Med., 125, 605–613.
Frangakis, C. E. and Rubin, D. B. (2002) Principal stratification in causal inference. Biometrics, 58, 21–29.
Imbens, G. W. and Angrist, J. (1994) Identification and estimation of local average treatment effects. Econometrica, 62, 467–475.
Lauritzen, S. L. (2004) Discussion on causality. Scand. J. Statist., 31, 189–192.
Lehmann, E. L. and Romano, J. P. (2005) Testing Statistical Hypotheses, 3rd edn. New York: Springer.
Manns, B., Owen, W. F., Winkelmayer, W. C., Devereaux, P. J. and Tonelli, M. (2006) Surrogate markers in clinical studies: problems solved or created? Am. J. Kidney Dis., 48, 159–166.
McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models, 2nd edn. London: Chapman and Hall.
Moore, T. (1995) Deadly Medicine: Why Tens of Thousands of Patients Died in America’s Worst Drug Disaster. New York: Simon and Schuster.
Pearl, J. (2000) Causality: Models, Reasoning, and Inference. Cambridge: Cambridge University Press.
Prentice, R. L. (1989) Surrogate endpoints in clinical trials: definition and operational criteria. Statist. Med., 8, 431–440.
Rubin, D. B. (2004) Direct and indirect causal effects via potential outcomes. Scand. J. Statist., 31, 161–170.