Misspecification of Cox regression models with composite endpoints

Researchers routinely adopt composite endpoints in multicenter randomized trials designed to evaluate the effect of experimental interventions in cardiovascular disease, diabetes, and cancer. Despite their widespread use, relatively little attention has been paid to the statistical properties of estimators of treatment effect based on composite endpoints. We consider this here in the context of multivariate models for time to event data in which copula functions link marginal distributions with a proportional hazards structure. We then examine the asymptotic and empirical properties of the estimator of treatment effect arising from a Cox regression model for the time to the first event. We point out that even when the treatment effect is the same for the component events, the limiting value of the estimator based on the composite endpoint is usually inconsistent for this common value. We find that in this context the limiting value is determined by the degree of association between the events, the stochastic ordering of events, and the censoring distribution. Within the framework adopted, marginal methods for the analysis of multivariate failure time data yield consistent estimators of treatment effect and are therefore preferred. We illustrate the methods by application to a recent asthma study. Copyright © 2012 John Wiley & Sons, Ltd.


Introduction
Many diseases put individuals at elevated risk for a multitude of adverse clinical events, and researchers routinely design randomized clinical trials to evaluate the effectiveness of experimental interventions for the prevention of these events. Trials in cardiology, for example, record times of events such as non-fatal myocardial infaction, non-fatal cardiac arrest, and cardiovascular death [1]. In cerebrovascular disease, patients with carotid stenosis can be treated with medical therapy or surgery, and trials evaluating their relative effectiveness may record endpoints such as strokes ipsilateral to the surgical site, contralateral strokes, and death [2]. In oncology, researchers often design trials to study treatment effects on disease progression and death [3], but palliative trials of patients with skeletal metastases may be directed at preventing skeletal complications including vertebral and non-vertebral fractures, bone pain, and the need for surgery to repair bone [4]. In these and many other settings, although interest lies in preventing each of the respective events, it is generally infeasible to conduct studies to answer questions about each component.
When one type of event is of greater clinical importance than others, it can be chosen as the basis of the primary treatment comparison, and effects on other types of events can then be assessed through secondary analyses. When two or more events are of comparable importance, co-primary endpoints can be specified, but tests of hypotheses must typically control the experimental type I error rate through multiple comparison procedures [5][6][7]; these make decision analyses more complex. A seemingly simple alternative strategy is to adopt a so-called composite event [8,9] that is said to have occurred if any one of a set of component events occurs. The time of the composite event is therefore the minimum of the times of all component events.
There are several additional reasons investigators may consider the use of composite endpoints in clinical trials. In studies involving a time-to-event analysis, the use of a composite endpoint will mean that more events will be observed than would be observed for any particular component. If the same clinically important effect is specified for the composite endpoint and one of its components, this increased event rate will translate into greater power for tests of treatment effects; at the design stage this translates to a reduction in the required number of subjects or duration of follow-up [9][10][11]. Composite endpoints are routinely adopted through the introduction of one or more less serious events, however, which presumably warrants revising the clinically important effect of interest. Moreover, we show later that with models featuring a high degree of structure, model assumptions may not even be compatible for the composite endpoint and one of its components.
In time-to-event analyses, interest may lie in the effect of an experimental treatment versus standard care on the risk of a non-fatal event. This is a common framework in trials of patients with advanced diseases where interest lies in improving quality of life through the prevention of complications. In such settings, individuals are at considerable risk of death and a competing risks problem arises. Investigators often deal with this by adopting a composite endpoint based on the time to the minimum of the non-fatal event of interest and death [12,13]. This strategy leads to an 'event-free survival' analysis that is particularly common in cancer where progression-free survival is routinely adopted as a primary endpoint [14]. In palliative trials, however, a treatment may not be expected to have an effect of survival, and if a non-negligible proportion of individuals die before experiencing the clinical event of interest, this analysis can lead to a serious underestimation of the effect of the treatment [10,15].
Recommendations are available in the literature on how to design trials, analyze resultant data, and report findings when composite endpoints are to be used [10][11][12]16]. The main recommendations include that (i) individual components should have similar frequency of occurrence, (ii) the treatment should have a similar effect on all components, (iii) individual components should have similar importance to patients, (iv) data from all components should be collected until the end of trial, and (v) individual components should be analyzed and reported separately as secondary endpoints. The first three recommendations have face validity and seem geared towards helping ensure that conclusions regarding treatment effects on the composite endpoint have some relation to treatment effects on the component endpoints, thus helping in the interpretation of results. The collection of data on the occurrence of the component endpoints until the end of the trial facilitates separate assessment of treatment effects on each of the component endpoints. This means the consistency of findings across components can be empirically assessed.
The aforementioned issues have been actively debated in the medical literature [11,[16][17][18][19], but there has been relatively little formal statistical investigation of these points. In this paper, we discuss statistical considerations related to composite endpoint analyses and use the recommendations to guide the investigation. Because the Cox regression model is routinely adopted for the analysis of composite endpoints in clinical trials [12], we consider it here and point out important issues regarding model specification and interpretation. We formulate multivariate failure time models with proportional hazards for the marginal distributions that may be used to reflect the settings where composite endpoints are most reasonable according to the current guidelines. We study the asymptotic and empirical properties of estimators arising from a composite endpoint analysis. We also explore the utility of marginal methods based on multivariate failure time data [20]. We argue that the belief that composite endpoints provide an overall measure of the effect of treatment is overly simplistic, and a thoughtful interpretation of intervention effects based on composite endpoints alone is difficult. Their use as a primary basis for treatment comparison in clinical trials therefore warrants careful consideration. The remainder of this paper is organized as follows. In Section 2, we construct bivariate failure time distributions for which the marginal distributions have proportional hazards between two treatment groups. We then derive the distribution for the time to the first event and show that it does not typically feature proportional hazards across the two treatment groups. We use large sample theory for misspecified models to derive the limiting value of the log hazard ratio from a naive Cox model, and empirical studies demonstrate finite sample properties which are in close alignment with the theory. An alternative approach to synthesizing data over component events is to conduct a global analysis on the basis of the marginal methods of Wei et al. [20]; we explore this in Section 3. An application to a recently completed asthma management study illustrates the various methods in Section 4, and we make the concluding remarks in Section 5.
where we write Â to make the relation between Â and explicit. For Archimedean copulas, this can be written as Copula functions have received considerable attention in the statistical literature in the past few years because they offer a convenient and attractive way of linking two marginal distributions to create a joint survival function [23]. Suppose T 1 and T 2 are a pair of non-negative random variables with respective survivor functions F 1 .t 1 j´I˛1/ and F 2 .t 2 j´I˛2/ given a covariate´. If we let U 1 D F 1 .T 1 j´I˛1/ and U 2 D F 2 .T 2 j´I˛2/ where˛k indexes the marginal distribution for T k j´, then U k UNIF.0; 1/, k D 1; 2. We can define the bivariate 'survival' distribution function of .U 1 ; U 2 / through a copula as in (1) and obtain a joint survivor function for .T 1 ; T 2 / 0 given Z as where D .˛0; Â/ 0 with˛D .˛0 1 ;˛0 2 / 0 . Because Kendall's is invariant to monotonic increasing or decreasing transformations [21], it can be interpreted as a measure of association of the transformed variables .T 1 ; T 2 / 0 given Z. The use of a copula function to define the joint distribution of .T 1 ; T 2 /j´is particularly appealing because one can specify the marginal distributions to have a proportional hazards form; this is not typically possible for joint distributions induced by random effects or intensity-based analyses.
If a composite endpoint analysis is planned, it would be based on modeling the random variable T D min.T 1 ; T 2 /, which has survival, density, and hazard function conditional on Z, given by (3) f .tj´/ D dF.t j´I /=dt , and .t j´I / D d log F.t jxI /=dt , respectively. Suppose Z is a binary indicator where Z D 1 for individuals in a treatment group and Z D 0 otherwise. A key point is that the hazard ratio .t j´D 1I /= h.t j´D 0I / is not, in general, independent of time. As a result, even if the marginal distributions feature proportional hazards, the model for the composite endpoint will typically not. We study this point further in the next four settings for three different Archimedean copulas and the case of independent components.
Consider the joint distribution of .T 1 ; T 2 /jZ in which the marginal distribution for T k jZ, 1; 2 features proportional hazards; so If the joint survivor function F 12 .t 1 ; t 2 j´I / is determined by the Clayton copula through (2), by (3) the survivor function of the failure time T D min.T 1 ; T 2 / given´is Hence, the hazard ratio for the treatment versus control groups for the composite endpoint is which is not invariant with respect to time in general.
To gain some insight into this function, suppose the marginal distributions are exponential with common baseline hazards of 10 .t / D 20 .t / D D log 10 so that the probability of a type k event occurring before t D 1 is 0.90 for a control subject (i.e., P .T k < 1jZ D 0/ D 0:90). Further suppose that a common hazard ratio of 0.50 holds for the two margins (i.e., exp.ˇ1/ D exp.ˇ2/ D 0:50). This setting is consistent with the recommendations that the component events occur with comparable frequency because P .T 1 < T 2 jZ/ D 0:5, and have comparable treatment effects (ˇ1 Dˇ2). Figure 1 are the same for the two component endpoints, there can be non-negligible variation in the hazard ratio over time, and within this family of models, the nature of this variation depends on the strength of the association between the two failure times.

Composite endpoint analysis based on a Frank copula.
The generator for the Frank copula [25] is H.uI Â/ D log..exp. Â t/ 1/=.exp.Â/ 1//, and the resulting copula function is If we adopt the same marginal distributions as before, the survivor function for the composite endpoint is , the hazard ratio .t j´D 1I /= .t j´D 0I / has a complicated form. Figure 1(b) contains a plot of this hazard ratio over OE0; 1, and as in the case of the Clayton copula, there is considerable variation in this ratio over time.

Composite endpoint analysis based on a Gumbel-Hougaard copula.
The generator for the Gumbel-Hougaard [26] copula is H.uI Â/ D . log t/ Â giving for Â > 1; Kendall's is given by Â D .Â 1/=Â. The corresponding survivor function for the composite endpoint is and ifˇ1 Dˇ2 Dˇ, the hazard is Interestingly, the hazard ratio in this case is exp.ˇ/, which means that the proportional hazards model for the composite endpoint is compatible with a proportional hazards model for the margins. If the hazard ratio is in fact common for the component endpoints, then a consistent estimator will be obtained for this common effect on the basis of a Cox model for the composite endpoint.

Composite endpoint analysis with independent components.
Here, we consider the setting where the component failure times are independent; a special case of Â D 0 for the joint models in Sections 2.1.1-2.1.3. In this case, the hazard ratio for the composite endpoint analysis reduces to With nonhomogeneous hazards, it is apparent that the composite endpoint analysis is only compatible with a proportional hazards assumption if either (A.1)ˇ1 Dˇ2 Dˇor (A.2) 10 .t / D 20 .t /. If 1 Dˇ2 Dˇ, then a consistent estimate of this common effect is obtained in a composite endpoint analysis. Ifˇ1 ¤ˇ2 but the hazard functions are identical, the multiplicative effect is .exp.ˇ1/Cexp.ˇ2//=2. If assumptions A.1 and A.2 do not hold, then the ratio is a complicated time varying function of the baseline hazards and respective treatment effects.

Misspecification of the Cox model with composite endpoints
The previous section demonstrated that the composite endpoint analysis is typically based on a misspecified Cox regression model if the marginal distributions satisfy the proportional hazards assumption. In this section, we investigate the frequency properties of estimators from a composite endpoint analysis when the component endpoints are associated through a copula function.
Let T i D min.T i1 ; T i2 / denote the time of the composite endpoint for individual i in a sample of size m. Let fN i .s/; s < 0g denote the counting process for subject i, which indicates the occurrence of the composite endpoint, so that dN i .s/ D 1 if T i D s and is zero otherwise. Suppose that it is planned to follow all subjects over the interval .0; C but that subjects may be lost to follow-up or withdraw from the study prematurely. Let   [27] to estimate the relative hazard where we assume the hazard function for T i j´i to have the form where 0 .t / is a non-negative baseline hazard function corresponding to the control group and´i is the treatment covariate for individual i, i D 1; : : : ; m. The treatment effect˛can be estimated using the maximum partial likelihood [28] by solving (7) is correctly specified, then (8) has expectation zero and the solution b is consistent for the true value,˛. In the independence case, this true value isˇif the treatment effect is common (i.e., under A.1ˇDˇ1 Dˇ2) or D log.exp.ˇ1/ C exp.ˇ2//=2 if the baseline hazard functions are the same (i.e, under A.2). More generally, however, b is consistent for˛ , the solution to expected score function U .˛/ D E.U.˛// given by where the expectation E is with respect to the true model for˚ N Y .s/; d N N .s/ ; 0 < s; Z « [29][30][31]. By using the true model based on (3) and assuming independent censoring for the withdrawal time W i with survival distribution G.wj´/ D G.w/, these expectations can be obtained as follows: To illustrate the bias resulting from a composite endpoint analysis, consider a randomized clinical trial in which subjects are to be followed over the interval .0; C where C D 1. Let Z D 1 for treated subjects and Z D 0 for control subjects and suppose P .Z D 1/ D 1 P .Z D 0/ D 0:5. We setˇ1 Dˇ2 DˇD log 0:80 to consider the case compatible with the current recommendations on the use of composite endpoints. We set 1 and 2 so that (i) P .T 1 < T 2 jZ D 0/ D p 1 equals a desired probability that the type 1 event occurs before the type 2 event among control subjects and that (ii) P .C < T / D A satisfies the administrative censoring rate for the composite endpoint among all subjects, where A D 0:20. Finally, suppose subjects may withdraw from the study early, and let W have an exponential distribution with rate such that P .C < T / D , where P .C < T / D E Z OEP.W < T < C jZ/ C P .C < T jZ/ and is the overall censoring rate set to D 0:20, 0.40, 0.60, and 0.80. Figure 2 shows the limiting percent relative bias (100.˛ ˇ/=ˇ) of the treatment coefficient from a composite endpoint analysis when the data are generated by a Clayton copula with mild ( D 0:20) and moderate ( D 0:40) association. We plotted this relative bias against P .T 1 < T 2 jZ D 0/ D p 1 , and interestingly, the bias is greatest when p 1 D 0:50 but decreases as this probability approaches zero or one. In either of the extreme cases (p 1 D 0 or p 1 D 1), the composite endpoint coincides with the occurrence of a single endpoint, and a consistent estimate of the common treatment effect is obtained. Note that the bias .˛ ˇ/ is positive, and hence, the limiting value of the treatment effect is more conservative than the true common value for each of the components. This means that the estimated value would, on average, under-represent the magnitude of the treatment effect on either component, a conclusion in line with the findings of [10,15]. Moreover, we note that the common event rate and the common treatment effect are precisely the setting where composite endpoints are recommended for use [10][11][12]16]. The plots also reveal the sensitivity of the limiting value to the degree of random censoring; the higher the censoring rate, the smaller the asymptotic bias. This highlights an important point that the limiting value of an estimator from a misspecified failure time model is highly sensitive to the censoring distribution even under independent censoring. By comparing the left and right panels in Figure 2 is also apparent that the asymptotic bias is dependent on the degree of association between T 1 and T 2 ; the greater the association, the greater the asymptotic bias. This makes sense because when the event times are independent, consistent estimates should be obtained because assumptions A.1 and A.2 of Section 2.1.4 are satisfied.
Although of secondary interest, one can also show that b 0 .t /, 0 < t < C , is consistent for which when P .Z D 1/ D 0:5 and the censoring distribution is the same in the two groups reduces to  (2) to examine the empirical performance of estimators for finite samples. We assume that given Z, T k has an exponential distribution with hazard k exp.ˇkZ/, k D 1; 2, and model the association between T 1 and T 2 through a Clayton copula. We let T D min.T 1 ; T 2 / denote the time of the composite endpoint as before. We suppose interest lies in following subjects over .0; 1. As in the previous section, we determined the parameters 1 and 2 to satisfy the constraints P .T 1 < T 2 jZ D 0/ D p 1 , where p 1 D 0:25, P .C < T / D A , and we set the administrative censoring rate to A D 0:20. We also incorporated random loss to follow-up with an exponential withdrawal time giving a net censoring rate of D 0:20, 0.40, 0.60, and 0.80 subject to the constraint A 6 .
For each parameter configuration, we derived the sample size for the composite endpoint analysis to achieve a prespecified power under the assumption that the Cox model in (7) holds. Therneau and Grambsch (2000) show that the required number of events is D D 4.´1 1 C´1 2 / 2 =.˛ / 2 , where´q is the qth quantile of the standard Normal distribution, 1 is the type I error for a one-sided test, 1 2 is the power, and˛ is the limiting value of treatment effect estimate obtained from (7). We focus on twosided tests at the 5% significance level ( 1 D 0:05) and sample sizes to achieve 80% power ( 2 D 0:20). We calculated the required number of subjects as m D D=P .T < C /. In all simulation studies, we considered both equal treatment effects (ˇ1 Dˇ2 DˇD :223) and unequal treatment effects (ˇ1 D :223 andˇ2 D 0). For each parameter configuration, we generated 2000 replicates. We report the mean of the b estimates, the empirical standard error (ESE), the average model-based standard error (ASE 1 ), and the average robust standard error (ASE 2 ). We also reported the empirical coverage probability (ECP %) of nominal 95% CIs for˛ based on robust standard errors and the empirical coverage probability of these intervals forˇ1 (ECP%). The last column contains the empirical power (EP%) of a Wald test of the null hypothesis of no treatment effect. Table I contains the simulation results with dependent component times given by D 0:40. The results for equal treatment effects are given in the top half of the table that we comment on first. The fourth column contains˛ , the limiting value of the estimator from the misspecified Cox model in (7). The fact that these values are all smaller in absolute value than the true common effects reveals the conservative nature of this limiting value, as already discussed in relation to Figure 2; the dependence of the limiting value on the degree of censoring is also apparent. This limiting value was used to derive the sample size (m) in the third column. The average estimator from the fitted Cox models reported in the fifth column closely approximates the limiting value. There is also close agreement between the empirical, average model-based, and average robust standard errors. The empirical coverage probabilities of the robust 95% CIs are very close to the nominal levels, and the empirical power is in good agreement with the nominal power of 80%. It is worth emphasizing that the empirical coverage probability is computed for the parameter˛ , not the commonˇ; for this latter parameter, the coverage rates are considerably lower.

Composite endpoints with dependent components.
In the bottom half of Table I, we reported the results for the caseˇ1 ¤ˇ2, where˛ is considerably smaller thanˇ1. This smaller limiting value leads to considerably larger sample sizes to achieve A D P .C < T / is the administrative censoring rate, D P .C < T / is the net censoring rate, ESE is the empirical standard error, ASE 1 is the average model-based standard error, ASE 2 is the average robust standard error, ECP % is the empirical coverage probability for˛ of nominal 95% CIs using the robust standard error, ECP% is the empirical coverage probability forˇ1 of nominal 95% CIs using the robust standard error, and EP% is the empirical power of a Wald test of H 0 W˛D 0 based on the robust standard error. the desired power. Again, however, we see close agreement between the average estimate and the limiting value, and very close agreement between the average model-based and average robust standard errors. The empirical coverage probability (for˛ ) is also consistent with the nominal level, as is the empirical power. Table II presents the simulation results with independent components (i.e., D 0). The results in the top half of Table II reveal that the limiting valuę is the same as the common valueˇDˇ1 Dˇ2 as expected because assumption A.1 of Section 2 is satisfied. Again, the average point estimate is in close agreement with this common value, and the three standard errors are in close agreement. When the treatment has an effect on T 1 and not T 2 ,˛ is again considerably smaller thanˇ1. Note, however, even though this is a misspecified model, the limiting value does not depend on the censoring distribution. This much smaller value leads to larger sample size requirements than in the top half of the table. Because the first component T 1 happens less frequently than the second component T 2 (i.e., P .T 1 < T 2 jZ D 0/ D 0:25), the limiting value from the misspecified Cox model is heavily attenuated in this setting. However, neither administrative nor random censoring appears to affect the limiting value of the estimator of treatment effect.

Limiting values for a Wei-Lin-Weissfeld analysis
In this section, we investigate the utility of the marginal approach of Wei et al. [20] for handling multivariate failure time data. This approach is based on formulating ordinary Cox models for each component  A D P .C < T / is the administrative censoring rate, D P .C < T / is the net censoring rate, ESE is the empirical standard error, ASE 1 is the average model-based standard error, ASE 2 is the average robust standard error, ECP % is the empirical coverage probability for˛ of nominal 95% CIs using the robust standard error, ECP% is the empirical coverage probability forˇ1 of nominal 95% CIs using the robust standard error, and EP% is the empirical power of a Wald test of H 0 W˛D 0 based on the robust standard error. event to obtain component-specific estimates of treatment effect, and it is therefore compatible with the way the joint distributions were constructed using copula functions in Section 2. Estimation proceeds under a working independence assumption, as often adopted for analyses based on generalized estimating equations. We obtain a robust estimate of the covariance matrix, and then we obtain a global estimate of treatment effect by taking a weighted average of all component-specific estimates with weights chosen to minimize the variance of the global estimator. A key distinction between the global approach of Wei et al. [20] and the composite endpoint approach is that the former makes use of all observed events whereas the composite endpoint uses only information on the first event.
In the derivations that follow, the composite endpoint is comprised of K components, but we subsequently focus on the case K D 2. We let dN ik .s/ D I.T ik D s/, fN ik .s/; 0 < sg denote the counting process for type k events, and fN i .s/ D .N i1 .s/; N i2 .s/; 0 < sg denote the bivariate counting process for subject i, i D 1; : : : where k0 .t / is the baseline hazard function andˇk is the corresponding treatment effect. The kth component-specific score function forˇk is where Under the copula model of Section 2 with marginal distributions featuring proportional hazards, the solution to the score equation (10), b k , is consistent for the true treatment effectˇk. If we let D .ˇ1; : : : ;ˇK/ 0 and its estimate b D . b 1 ; : : : ; b K / T , Wei et al. [20] show that p m. b ˇ/ converges in distribution to a multivariate normal distribution with a zero-mean vector and variance-covariance matrix †.ˇ/ and provides a consistent sandwich-type estimate for †.ˇ/.
The global estimate of treatment effect proposed by Wei et al. [20] is a linear combination of all component-specific treatment effect estimates b 1 ; : : : ; b K and can be obtained as where the weight c.
where c.ˇ/ D † 1 .ˇ/JOEJ 0 † 1 .ˇ/J 1 . We therefore require the limiting value of the robust variance †.ˇ/ to obtain the limiting value. The detailed derivations are deferred to the Appendix. An alternative asymptotically equivalent approach to estimating the global effect and to deriving the limiting value involves specifying a single Cox regression model and fitting it using all events while 'stratifying' on the event type [32]. Although this has some appeal, we adopt the current framework on the basis of synthesizing estimates from separate Cox regression models because it makes explicit the fact that the global estimate and associated limiting value may be viewed as a weighted average of the component-specific estimates.  Table III reports the results from a global analysis of treatment effect based on the marginal analysis proposed by Wei et al. [20]. In this table, the sample sizes were computed on the basis of the formula for the composite endpoint analysis using the limiting value of the regression coefficient. As one would expect from (10), when the treatment effects are equal, then the marginal analysis yields consistent estimators for this common effect and the mean estimate across all simulated trials is very close to the limiting value. Moreover, the ESE and the average robust standard error were in very close agreement; the average model-based standard error is conservative because it is based on the working independence assumption being correct. The empirical coverage probabilities (based on the robust standard errors) were compatible with the nominal 95% level for Ň whenˇ1 Dˇ2. Whenˇ1 ¤ˇ2, the empirical coverage forˇ1 was zero, a reflection of the difference between Ň andˇ1. Whenˇ2 D 0, the limiting value Ň was quite small, and hence, the sample sizes of the trial were much larger. Whenˇ1 ¤ˇ2, the composite endpoint and global analyses yield estimators that do not coincide witȟ 1 ,ˇ2, or each other. We next compare these limiting values. We consider the case in which two failure times are generated by a Clayton copula with exponential margins and a single treatment covariate modeled through proportional hazards withˇ1 D log.0:80/ andˇ2 D 0. We consider mild and moderate association between the failure times with D 0:20 and D 0:40, respectively. Administrative censoring was set to 40% and additional random censoring from an exponential withdrawal time gave cases with 60% and 80% as well. The limiting values of the composite endpoint and global analyses were plotted against P .T 1 < T 2 jZ D 0/ D p 1 in Figure 3. It is apparent that when p 1 approaches zero, the limiting value for both methods approaches 0. For the composite endpoint, this makes sense because the first event is most likely to be a type 2 event for which there is no treatment benefit. As p 1 approaches 1, the limiting value for the composite endpoint analysis approachesˇ1 for analogous reasons. The limiting value from the global analyses tracks these limiting values quite well, but tend to correspond to larger estimates of treatment effect because the limiting value is larger in absolute value. Thus, even when the two components have equal frequencies and the proportional hazards assumption holds for each component, the global analysis, in the limit, will yield an estimate of treatment effect that is greater than that of the composite endpoint analysis. These relationships hold across both levels of association and over different degrees of censoring. Although we have restricted attention to the Clayton copula in these calculations and empirical studies, this investigation could be repeated under other copula models, and although the limiting values would differ, qualitatively similar findings would be expected.

Application to an asthma management study
We now apply both the composite endpoint analysis and the global approach to an asthma management study [33]. This is a two-phase, multicenter, randomized, parallel group effectiveness trial for comparing two treatment strategies for asthma management over a 2-year period. The control strategy is a 'clinical strategy', in which the treatment was guided on the basis of patient symptoms and spirometry readings. The experimental strategy is a so-called 'sputum strategy' (SS), whereby a cellular analysis of sputum samples was used to guide corticosteroid therapy use to keep eosinophils cell counts less than 2%. In phase I, a total of 107 patients were identified through the minimum treatment to maintain control. The aim of this asthma study was to investigate whether SS is more effective than clinical strategy on reducing the number and severity of exacerbations in phase II.
In our analysis, we focus on two types of exacerbations: mild exacerbations defined as requiring a daily maintenance dose of fluticasone of < 250 g and severe exacerbations defined here as requiring a minimum daily maintenance dose > 250 g. The composite endpoint is defined as the time to the first of the two type of exacerbations. Figure 4 displays the empirical distribution function plots for the two component types of exacerbations and for the composite endpoint. It is apparent that the severe exacerbations occur much more frequently than mild exacerbations and thus represent the majority of the events contributing to the composite endpoint. Table IV presents the results of the proportional hazards regression analysis in which the single binary covariate is the treatment indicator taking the value one for patients in the experimental (SS) group and zero otherwise. From these results, it is clear that the experimental SS strategy leads to a significantly lower hazard of severe exacerbations with a relative risk reduction of 47% (95% CI: 0.02, 0.71;    [20] global analysis yields an estimate that is close to that obtained from the composite endpoint analysis, but it is apparent when examining the effects on the separate components that a global estimate is not an adequate summary of the data. The last column of Table IV gives the p-values for testing the proportional hazards assumption using univariate tests based on Schoenfeld residuals [32]. There is insufficient evidence to reject the null hypothesis of proportional hazards for each component, and the test yields a p-value just shy of statistical significance for the composite endpoint analysis at 0.063. Thus, although we have demonstrated that, in principle, if the proportional hazards assumption holds for the components of a composite endpoint, it generally does not hold for composite endpoint itself, the tests do not suggest problems with model fit for this particular data. Although the association may be well characterized by the Gumbel-Hougaard copula, the power of the tests for departures from proportional hazards may also be inadequate.

Discussion
Composite endpoints are widely adopted in clinical trials, and fitting a Cox proportional hazards model is the standard approach to estimating treatment effects on the basis of such endpoints. We have demonstrated that even when the treatment effects are the same for component endpoints under marginal Cox models, the Cox model for the composite endpoint is typically misspecified because the proportional hazards assumption does not in general hold. The estimator of treatment effect under such a misspecified Cox model for the composite endpoint has a slightly conservative limiting value, meaning that the benefit of treatment was under-estimated in the settings we examined. We found several factors that influence the limiting value including the strength of the association between the individual component events, stochastic ordering of the individual components, and the degree and nature of the censoring process; empirical studies corroborated these findings. Although we have not explored this here, it is clear from Section 2 that the specific copula function would also have an important effect. More generally, variation in the treatment effect across the individual components makes it even more difficult to interpret estimators. Composite endpoints are often thought to offer a measure of the 'overall effect' of a treatment [9]. In fact, the opposite can be true if treatment effects are in opposing directions for different components, and one component tends to occur first. The event tending to occur first will have the greatest influence on the estimator of treatment effect based on the composite endpoint, masking the effect on the events that tend to occur later. In the case where the events occur with equal frequency and the treatment effect is the same for the components, the asymptotic calculations of Section 2 show that the estimate based on the composite endpoint suggests a smaller benefit that holds for the components. Although some might argue that this is therefore a conservative approach, it is essentially incorrect. The global approach of Wei et al. [20] can, however, provide evidence of this adverse effect in the component-wise analysis, and this will lead to an attenuation of the global effect in the weighted analysis.
Another rationale put forward for adopting composite endpoints is to model the event-free survival probability. For example, Sheehe [34] proposed that the event-free survival curve can be computed on the basis of Cox model estimates of hazard ratios from the composite endpoint containing mortality as a component. As we have demonstrated, estimators from Cox regression using composite endpoints can be attenuated by including components for which there is no treatment effect. Using estimates from the composite endpoint analysis (event-free survival) may not provide a valid representation of the effect on the non-fatal event.
Two of the guidelines for the use of composite endpoints include the requirement that individual component events should be of roughly equal frequency, and the treatment effects should be comparable across all components. Our analytical and empirical investigation shows that these may not be sufficient conditions if interest lies in estimating these common effects in the sense that even when these conditions are satisfied, the association between the two events can lead to substantial bias in estimators based on composite endpoints. We support the recommendations that (i) data from all components should be followed until the end of the trial and (ii) individual components should be analyzed and reported separately. This alternative design and analysis facilitates a global approach [20] based on combining estimates from individual components, as well as assessment of whether this is appropriate. In the context of the copula-based joint model, we found that the global approach, in general, outperforms the composite endpoint analysis in terms of the properties of the resulting estimators and power or sample size requirements.
We have formulated a model with proportional hazards for each component event through the use of a copula function to reflect an idealized situation in alignment with the recommendations in the literature. We restricted attention to the situation with two component endpoints, but three or more components are often specified in practice. When multiple components are of interest, copula functions with an 'exchangeable' association structure can be readily adopted; more baseline marginal hazard functions and treatment effects would need to be specified. It is relatively straightforward to extend the derivations and empirical studies reported here for this setting but more challenging to cover a meaningful spectrum of settings, summarize results, and make recommendations.
Alternative frameworks could naturally be adopted for specifying models for correlated failure time data. One might, for example, consider intensity-based models where the risk of one type of event changes with the occurrence of another type of event. This could arise because of a biological mechanism in which the medical risk actually increases or if treating physicians alter the therapy being given. This formulation, although natural for characterizing the response process, is not compatible with proportional hazards for the marginal models. One might also consider frailty models for addressing the association between event times, but again, the marginal models will not have a proportional hazards form.
We have assumed independent censoring in this paper. Another way in which patients may be treated differently following the occurrence of a clinically important event is to be withdrawn from a study. The occurrence of one event may increase the risk an investigator may withdraw the patient from the study and result in response-dependent censoring. If the events are independent conditional on the treatment covariate, this will not pose a problem but otherwise will lead to biased estimates of the baseline hazard functions and treatment effects. Use of inverse probability of censoring weights will help reduce this bias, and this is currently under investigation.
Finally, we have focused on the frequency properties of estimators under a Cox regression models. Cox models are used routinely, but of course, the proportional hazards assumption may not be valid for either the marginal distributions or the composite endpoint. There is increasing interest in use of alternative regression models for the analysis of survival data including accelerated failure time models and additive models. Exploration of the behavior of estimators from such models warrants study.

Appendix: Derivation of the limiting value Ň
Suppose the proportional hazards assumption holds for the marginal distribution of each component event and a copula model is used to characterize the association. Under a working independence assumption [20], the limiting value for b k isˇk. Here, we consider a generic scalar covariate Z i , but note that the functions simplify with Z i a binary treatment indicator because Z r i D Z i in this case. Let  where ƒ k .dt k j´i I / D dƒ k .t k j´i I /=dt k ; F.dt 1 ; dt 2 j´i I / D @ 2 F.t 1 ; t 2 j´i I /=@t 1 @t 2 , F.dt 1 ; t 2 j´i I / D @F.t 1 ; t 2 j´i I /:@t 1 , and F.t 1 ; dt 2 j´i I / D @F.t 1 ; t 2 jZ i I /=@t 2 . More specifically, if the joint survivor function F.t 1 ; t 2 j´i I / is specified by the Clayton copula with margins of two exponential distributions, then hdM j .t j /; dM k .t k /i can be obtained in closed form and E.w ij .ˇj /w ik .ˇk// can be obtained through numerical integration. Thus, we obtain the limiting value of robust variance, then the limiting weights can be calculated using c.ˇ/ D † 1 .ˇ/=J 0 † 1 J and the limiting value Ň using equation (12).