Surrogate endpoints are often used in clinical trials instead of well-established hard endpoints for practical convenience. The meta-analytic approach relies on two measures of surrogacy: one at the individual level and one at the trial level. In the survival data setting, a two-step model based on copulas is commonly used. We present a new approach which employs a bivariate survival model with an individual random effect shared between the two endpoints and correlated treatment-by-trial interactions. We fit this model using auxiliary mixed Poisson models. We study via simulations the operating characteristics of this mixed Poisson approach as compared to the two-step copula approach. We illustrate the application of the methods on two individual patient data meta-analyses in gastric cancer, in the advanced setting (4069 patients from 20 randomized trials) and in the adjuvant setting (3288 patients from 14 randomized trials).

Surrogate endpoints are often used in clinical trials instead of well-established endpoints for practical convenience: they are usually cheaper, faster, easier, or less invasive to measure.1–3 To validate a new criterion as a surrogate for another one, the approach endorsed by the Food and Drug Administration (FDA)4 relies on the analysis of two endpoints from multiple clinical trials. Two criteria must be fulfilled for an endpoint to be a valid surrogate:5,6 a strong association at the individual level between the true and the surrogate endpoints, and a strong association at the trial level between the effects of the treatment on the true and on the surrogate endpoints.

For normally distributed endpoints, Buyse et al.7 proposed to measure individual-level surrogacy in terms of $R^2_{\text{indiv}}$ and trial-level surrogacy in terms of $R^2_{\text{trial}}$. Burzykowski et al.8 adapted this approach to the case in which both the surrogate and true endpoints are failure time variables. Their estimation strategy consists of two steps: in the first step, a copula model is used to measure the individual-level surrogacy in terms of Kendall’s τ9 and to estimate the treatment effects on each endpoint and for each trial; in the second step, $R^2_{\text{trial}}$ is computed from the linear regression of the treatment effects estimated in the first step, accounting for the measurement error resulting from using estimated effects, the true values of which are unknown.10

This approach has been successfully employed in several applications in oncology11–20 and is commonly considered the reference method.4,21 Nevertheless, the two-step estimation procedure makes less efficient use of the information present in the data when estimating the trial-level $R^2$, as compared to the original approach by Buyse et al.7 Thus, research in this field has been very dynamic over the past 15 years.22 For example, Shi et al.23 compared the two-stage approach to conventional $R^2_{\text{trial}}$ measures based on marginal models, Renfro et al.24 proposed to perform the second-stage evaluation within a Bayesian framework, Ghosh et al.25 explored a semicompeting risks strategy within the accelerated failure time model, and Alonso and Molenberghs26 adopted an information theory perspective.

Mixed proportional hazard models, also known as frailty models,27–29 seem the most natural framework for adapting the meta-analytic approach by Buyse et al.7 to the case of failure time endpoints.30,31 Nonetheless, estimation of frailty models with complex structures of random effects can be computationally very intensive. The connection between the proportional hazard model and the Poisson regression model has long been known32,33 and has proven valid and useful also in the case of dependent observations.34–36 The use of such auxiliary Poisson models is particularly convenient in the presence of several and/or nested random effects and has been proposed for the analysis of a single failure time endpoint in an individual patient data meta-analysis.37

In the present paper, we consider a bivariate auxiliary Poisson mixed model to estimate the parameters of the bivariate frailty model with individual random effects and random trial-by-treatment interactions. In the same spirit as the original approach by Buyse et al.,7 individual-level and trial-level surrogacy measures are obtained jointly within the same model, based on well-established methods for generalized linear mixed models. We compare via simulations the Poisson approach to the two-step copula approach with Clayton, Plackett, and Hougaard copulas, with or without measurement-error adjustment in the second-step linear regression of the treatment effects. We illustrate the methods using individual patient data from two meta-analyses of randomized trials: (i) a meta-analysis of 4069 patients from 20 advanced gastric cancer trials;38 and (ii) a meta-analysis of 3288 patients from 14 adjuvant gastric cancer trials.39

The GASTRIC (Global Advanced/Adjuvant Stomach Tumor Research International Collaboration) group studied the role of (adjuvant) chemotherapy in advanced38 and in resectable39 gastric cancers. In both cases, the main endpoint was overall survival (OS), but the analysis also included progression-free survival (PFS) and disease-free survival (DFS), respectively. Figure 1 shows the survival curves for OS and for the candidate surrogates PFS and DFS in the experimental and control arms. Their validity as surrogate endpoints for OS was assessed in both the advanced17 and the adjuvant16 case via the two-stage copula approach. In both cases, the Spearman’s ρ was reported as measure of individual-level surrogacy. As the Plackett copula function was used to model the dependence between the endpoints, we were able to recover numerically the estimate of the Kendall’s τ based on the estimate of ρ (supplementary Figure S1).



Figure 1. Advanced (left) and adjuvant (right) GASTRIC meta-analyses. Overall survival (OS, black) and progression-/disease-free survival (PFS/DFS, red) functions in the chemotherapy (CT, solid line) and control (Ctrl, dashed line) arms.

2.1 Advanced GASTRIC meta-analysis

The advanced GASTRIC meta-analysis38 included 4069 patients from 20 trials. It has been previously reported17 that the rank correlation coefficient between PFS and OS was 0.85 (95% CI: 0.85–0.85) as estimated by a Plackett copula, corresponding to a Kendall’s τ of 0.68 (0.68–0.68). The $R^2$ between the treatment effects on PFS and on OS, adjusted for measurement error, was 0.61 (95% CI: 0.04–1.00), so that the validity of PFS as a surrogate endpoint for OS could not be confirmed for chemotherapy trials of patients with advanced gastric cancer. Supplementary Figure S2 shows the estimated treatment effects on OS versus the estimated treatment effects on PFS.

2.2 Adjuvant GASTRIC meta-analysis

The adjuvant GASTRIC meta-analysis39 included 3288 patients from 14 trials. The rank correlation coefficient between DFS and OS reported by Oba et al.16 was 0.97 (95% CI: 0.97–0.98) as estimated by a Plackett copula, corresponding to τ = 0.88 (0.88–0.89). Because of a high amount of estimation error, the between-trial $R^2$ estimate was highly numerically unstable and, hence, unreliable. The unadjusted $R^2$ was 0.96 (95% CI: 0.93–1.00), so that DFS was deemed valid as a surrogate endpoint for OS in adjuvant trials of gastric cancer patients. However, due to such high association, the estimated $R^2$ was close to the upper limit 1 and the obtained numerical results needed to be interpreted with caution.

Let $T_{ij}$ and $S_{ij}$ be the times to the true and the surrogate endpoints, respectively, for patient $j \in \{1, \dots, n_i\}$ in trial $i \in \{1, \dots, N\}$. Let $Z_{ij}$ be the indicator of the treatment arm to which the $j$-th patient in the $i$-th trial has been randomized. In the following subsections, we briefly recall the two-step copula approach and we present the newly explored Poisson approach.

3.1 Two-step copula approach

Burzykowski et al.8 extended the meta-analytic approach of Buyse et al.7 to the case of failure time endpoints. Their estimation approach consists of two steps, one for the individual level and one for the trial level.

At the first step, the marginal distributions of the two endpoints are modeled using the following bivariate proportional hazard model:

$$
\begin{cases}
h_{Sij}(s; Z_{ij}) = h_{Si}(s)\exp\{\alpha_i Z_{ij}\}\\
h_{Tij}(t; Z_{ij}) = h_{Ti}(t)\exp\{\beta_i Z_{ij}\}
\end{cases}
\tag{1}
$$
with trial-specific baseline hazards $h_{Si}(s)$ and $h_{Ti}(t)$ and treatment effects $\alpha_i$ and $\beta_i$. Individual dependence is accounted for using a copula function $C_\theta(S_{Sij}(s), S_{Tij}(t))$, with $S_{Sij}(s)$ and $S_{Tij}(t)$ the survival functions of $S_{ij}$ and $T_{ij}$. It has been questioned whether it is more appropriate to set the copula on the cumulative distribution functions rather than on the survival functions.40 We applied the copula to the survival functions, as in the original reference works.8,30

The dependence parameter $\theta$ of the copula model can be reparametrized into Kendall’s τ as a measure of individual-level surrogacy. We considered Weibull marginal hazards and three copula functions. The Clayton41 copula is defined as $C_\theta(u,v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$, with $\theta > 0$ and Kendall’s $\tau = \theta/(\theta+2)$. The Plackett42 copula is defined as $C_\theta(u,v) = [Q - R^{1/2}]/[2(\theta-1)]$, with $\theta > 0$, $Q = 1 + (\theta-1)(u+v)$, and $R = Q^2 - 4\theta(\theta-1)uv$; Kendall’s τ is computed numerically for this family as no analytical expression is available. The Hougaard43 copula is $C_\theta(u,v) = \exp\left(-\left[(-\ln u)^{1/\theta} + (-\ln v)^{1/\theta}\right]^{\theta}\right)$, with $\theta \in (0,1)$ and Kendall’s $\tau = 1 - \theta$.
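The three copula functions above can be written down directly. The following is only a minimal sketch for checking the formulas numerically, not the implementation used in surrosurv; the function names are illustrative assumptions, and the Plackett function excludes $\theta = 1$, where the formula is undefined.

```python
import math


def clayton(u, v, theta):
    """Clayton copula C(u, v), theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def plackett(u, v, theta):
    """Plackett copula C(u, v), theta > 0 and theta != 1."""
    q = 1.0 + (theta - 1.0) * (u + v)
    r = q * q - 4.0 * theta * (theta - 1.0) * u * v
    return (q - math.sqrt(r)) / (2.0 * (theta - 1.0))


def hougaard(u, v, theta):
    """Hougaard copula C(u, v); theta = 1 gives independence."""
    s = (-math.log(u)) ** (1.0 / theta) + (-math.log(v)) ** (1.0 / theta)
    return math.exp(-(s ** theta))


def clayton_tau(theta):
    """Kendall's tau for the Clayton copula."""
    return theta / (theta + 2.0)


def clayton_theta(tau):
    """Inverse of clayton_tau: the theta giving a target Kendall's tau."""
    return 2.0 * tau / (1.0 - tau)


def hougaard_tau(theta):
    """Kendall's tau for the Hougaard copula."""
    return 1.0 - theta
```

For instance, under these formulas a Clayton copula with Kendall’s τ of 0.6 corresponds to $\theta = 3$, and a Hougaard copula with the same τ corresponds to $\theta = 0.4$.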

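At the second step of the estimation algorithm, the treatment effect estimates obtained at the first step are assumed to follow the mixed model

$$
\begin{pmatrix} \hat\alpha_i \\ \hat\beta_i \end{pmatrix}
= \begin{pmatrix} \alpha_i \\ \beta_i \end{pmatrix}
+ \begin{pmatrix} \epsilon_{ai} \\ \epsilon_{bi} \end{pmatrix}
\tag{2}
$$
with true treatment effects
$$
\begin{pmatrix} \alpha_i \\ \beta_i \end{pmatrix}
\sim N\left( \begin{pmatrix} \alpha \\ \beta \end{pmatrix},
D = \begin{pmatrix} d_a^2 & d_a d_b \rho_{\text{trial}} \\ d_a d_b \rho_{\text{trial}} & d_b^2 \end{pmatrix} \right)
\tag{3}
$$
and estimation errors
$$
\begin{pmatrix} \epsilon_{ai} \\ \epsilon_{bi} \end{pmatrix}
\sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\Omega_i = \begin{pmatrix} \omega_{ai}^2 & \omega_{ai}\omega_{bi}\rho_{\epsilon i} \\ \omega_{ai}\omega_{bi}\rho_{\epsilon i} & \omega_{bi}^2 \end{pmatrix} \right)
\tag{4}
$$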

The trial-level surrogacy is measured in terms of $R^2_{\text{trial}} = \rho^2_{\text{trial}}$. This is computed in a linear regression of the $\beta_i$’s on the $\alpha_i$’s with measurement-error adjustment, by fixing the $\Omega_i$’s at their estimates from the first step.10,44 In practice, this adjusted (for measurement error) $R^2_{\text{trial}}$ can be difficult to compute. Whenever this errors-in-variables linear regression cannot be reliably fitted, the unadjusted $R^2_{\text{trial}}$ is often obtained instead from a linear regression of the estimated effects $(\hat\alpha_i, \hat\beta_i)'$, which is equivalent to fixing all the elements of $\Omega_i$ to 0. The observations are weighted by the trial size in order to account, indirectly and approximately, for measurement error. The number of events could also be used for weighting, but the trial size is what is used in practice in clinical applications.
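The unadjusted, size-weighted version can be sketched in a few lines. This is an illustration only: the function name and inputs are assumptions, and it relies on the fact that, for a simple weighted least squares regression with intercept, the coefficient of determination equals the squared weighted correlation.

```python
def weighted_r2(alpha_hat, beta_hat, weights):
    """Unadjusted R^2 of the regression of beta_hat on alpha_hat.

    alpha_hat, beta_hat: estimated treatment effects, one pair per trial.
    weights: one non-negative weight per trial (e.g. the trial size).
    """
    w_total = sum(weights)
    mean_a = sum(w * a for w, a in zip(weights, alpha_hat)) / w_total
    mean_b = sum(w * b for w, b in zip(weights, beta_hat)) / w_total
    # Weighted sums of squares and cross-products around the weighted means.
    s_aa = sum(w * (a - mean_a) ** 2 for w, a in zip(weights, alpha_hat))
    s_bb = sum(w * (b - mean_b) ** 2 for w, b in zip(weights, beta_hat))
    s_ab = sum(
        w * (a - mean_a) * (b - mean_b)
        for w, a, b in zip(weights, alpha_hat, beta_hat)
    )
    return s_ab ** 2 / (s_aa * s_bb)
```

Replacing the trial sizes by the numbers of events in `weights` gives the alternative weighting mentioned above. The measurement-error-adjusted estimate requires fitting model (2)–(4) and is not covered by this sketch.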

3.2 Poisson approach

Let us consider the semiparametric Cox model, $h(t) = h_0(t)\exp\{\beta Z\}$. By dividing the time scale into intervals $k = 1, \dots, K$ corresponding to the intervals between the observed event times, the so-called auxiliary Poisson model for the event indicator $\delta^{(k)} \sim \text{Poi}(\mu^{(k)})$ in the $k$-th interval is $\log(\mu^{(k)}) = \mu_0^{(k)} + \beta Z + \log(y^{(k)})$, where $y^{(k)}$ is the time spent at risk during the period $k$.32,37 In such a model, the parameters $\{\mu_0^{(k)}\}_{k=1,\dots,K}$ are the logarithms of the baseline risk rates during each interval $k$, under the assumption that the risk is constant within the interval. In practice, different time intervals can be employed, leading to an approximation of the estimators of the Cox model parameters.
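The data expansion behind the auxiliary Poisson model can be illustrated as follows. This is a minimal sketch with hypothetical function names and user-chosen cut points (not the surrosurv code): each subject contributes one record per interval in which they are at risk, carrying the event indicator $\delta^{(k)}$ and the time at risk $y^{(k)}$, whose logarithm enters the model as an offset.

```python
def split_follow_up(time, event, cuts):
    """Expand one subject's follow-up into per-interval Poisson records.

    cuts: increasing interval start points, beginning at 0.0; the last
    interval is open-ended.
    Returns a list of (k, delta_k, y_k): 1-based interval index,
    event indicator in interval k, and time spent at risk in interval k.
    """
    records = []
    for k, start in enumerate(cuts, start=1):
        if time <= start:
            break  # the subject is no longer at risk
        end = cuts[k] if k < len(cuts) else float("inf")
        at_risk = min(time, end) - start
        delta = int(bool(event) and time <= end)
        records.append((k, delta, at_risk))
    return records


def interval_rates(subjects, cuts):
    """Events divided by time at risk, per interval.

    Without covariates, this is the maximum likelihood estimate of the
    piecewise-constant baseline rate exp(mu_0^(k)).
    """
    events = [0] * len(cuts)
    at_risk = [0.0] * len(cuts)
    for time, event in subjects:
        for k, delta, y in split_follow_up(time, event, cuts):
            events[k - 1] += delta
            at_risk[k - 1] += y
    return [d / y if y > 0 else float("nan") for d, y in zip(events, at_risk)]
```

For example, a subject with an event at time 2.5 and cut points 0, 1, and 2 yields three records, with the event counted only in the third interval, where 0.5 time units were spent at risk.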

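In meta-analyses, each subject $j$ is recruited within a trial $i$. To account for both within-trial and within-subject dependence, the joint model for the two endpoints can be expressed as

$$
\begin{cases}
h_{Sij}(s) = h_{Si}(s)\exp\{u_{ij} + \alpha_i Z_{ij}\}\\
h_{Tij}(t) = h_{Ti}(t)\exp\{u_{ij} + \beta_i Z_{ij}\}
\end{cases}
\tag{5}
$$
where $u_{ij} \sim N(0, \sigma^2_{\text{indiv}})$ are individual random terms. The parameters of this random effects Cox model can be estimated equivalently via an auxiliary bivariate mixed Poisson model
$$
\begin{cases}
\log\left(\mu_{Sij}^{(k)}\right) = \mu_{Si}^{(k)} + u_{ij} + \alpha_i Z_{ij} + \log\left(y_{Sij}^{(k)}\right)\\
\log\left(\mu_{Tij}^{(k)}\right) = \mu_{Ti}^{(k)} + u_{ij} + \beta_i Z_{ij} + \log\left(y_{Tij}^{(k)}\right)
\end{cases}
\tag{6}
$$
with $y_{Sij}^{(k)}$ and $y_{Tij}^{(k)}$ the time spent at risk by subject $j$ in trial $i$ with respect to each endpoint during the period $k$, and where $\{\mu_{Si}^{(k)} = \mu_S^{(k)} + m_{Si}\}_{k=1,\dots,K}$ and $\{\mu_{Ti}^{(k)} = \mu_T^{(k)} + m_{Ti}\}_{k=1,\dots,K}$ are the baseline log-hazard rates.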

3.2.1 Individual-level surrogacy

In the context of Gaussian linear models, Buyse et al.7 assumed correlated but distinct individual error terms $u_{Sij}$ and $u_{Tij}$ to derive the $R^2_{\text{indiv}}$ of $T_{ij} \mid Z_{ij}$ and $S_{ij} \mid Z_{ij}$. In the case of failure time endpoints, such an $R^2_{\text{indiv}}$ would not express the association between $S$ and $T$, but between the random effects which modulate their hazard functions, which is a more indirect way of measuring dependence. For failure time endpoints, the two-step copula approach measures the individual association in terms of Kendall’s τ (see Section 3.1). The use of the same random term $u_{ij}$ in model (5) (and (6)) provides us with a shared frailty, the estimated variance of which ($\sigma^2_{\text{indiv}}$) can be used to compute Kendall’s $\tau = 4\int_0^\infty s\,L(s)\,L^{(2)}(s)\,ds - 1$ between $S$ and $T$, where $L(s)$ and $L^{(2)}(s)$ are the Laplace transform of the frailty distribution and its second derivative.28 Nevertheless, an analytic expression of $L(s)$ is not available for the log-normal frailty distribution, and Hougaard45 suggested approximating it analytically or by simulations. Following Munda46 and Asmussen et al.,47 we used the Laplace approximation.48
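The integral formula for Kendall’s τ can be checked numerically on a frailty distribution whose Laplace transform is known in closed form. The sketch below deliberately swaps in a gamma frailty with mean 1 and variance $\theta$, for which $L(s) = (1+\theta s)^{-1/\theta}$ and $\tau = \theta/(\theta+2)$; it is not the log-normal case treated in the paper, which requires approximating $L(s)$. Function names and the midpoint-rule integration are illustrative assumptions.

```python
def kendall_tau_from_laplace(laplace, laplace_d2, n=20000):
    """Evaluate tau = 4 * int_0^inf s L(s) L''(s) ds - 1 numerically.

    The half-line is mapped to (0, 1) via s = x / (1 - x) and the
    integral is approximated by the midpoint rule with n points.
    """
    total = 0.0
    for i in range(n):
        x = (i + 0.5) / n
        s = x / (1.0 - x)
        jacobian = 1.0 / (1.0 - x) ** 2  # ds/dx
        total += s * laplace(s) * laplace_d2(s) * jacobian
    return 4.0 * total / n - 1.0


def gamma_frailty(theta):
    """Laplace transform and its second derivative for a gamma frailty
    with mean 1 and variance theta (illustration only)."""

    def laplace(s):
        return (1.0 + theta * s) ** (-1.0 / theta)

    def laplace_d2(s):
        return (1.0 + theta) * (1.0 + theta * s) ** (-1.0 / theta - 2.0)

    return laplace, laplace_d2
```

With `theta = 2`, the numerical integral should agree with the closed form $\theta/(\theta+2) = 0.5$ up to integration error, which gives some reassurance that the formula is being applied correctly before moving to a frailty without a closed-form transform.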

3.2.2 Trial-level surrogacy

In model (6), we assumed that the trial-specific treatment effects are bivariate normal random variables, consistently with the two-step copula model (3). Such a choice provides us with the correlation $\rho_{\text{trial}}$ between the treatment effect on the surrogate endpoint $S$ and the effect on the true endpoint $T$, which can be translated straightforwardly into the coefficient of determination $R^2_{\text{trial}} = \rho^2_{\text{trial}}$, also referred to simply as $R^2$ in the rest of the paper.

3.2.3 Reduced Poisson models

In our work, we considered reduced models in order to deal with the problems of computational instability one may encounter in fitting the full model (6). The Poisson model TI incorporates both random trial-treatment interactions $(\alpha_i, \beta_i)'$ and individual random effects $u_{ij}$, but assumes common baselines between trials ($\mu_{Si}^{(k)} = \mu_S^{(k)}$, $\mu_{Ti}^{(k)} = \mu_T^{(k)}$, $\forall i$). The Poisson model TIa accounts for trial-specific baseline risks using shared random effects at the trial level: $\mu_{Si}^{(k)} = \mu_S^{(k)} + m_i$, $\mu_{Ti}^{(k)} = \mu_T^{(k)} + m_i$, $m_i \sim N(0, \sigma_m^2)$.

We compared in simulations the proposed mixed Poisson approach to the two-step copula approach. We considered the Clayton, Plackett, and Hougaard copula families, with and without measurement error adjustment at the second step.

We generated data under the mixed proportional hazard models and the Clayton copula. Full details of these two data generating methods are available in the Supplementary Material.

All the methods for model fitting and data generation are implemented in the R package surrosurv,49 publicly available from the CRAN.

4.1 Simulation design

4.1.1 Simulation parameters

Exponentially distributed times were simulated with baseline rates fixed to $\exp(\mu_S) = \log(2)/4$ and $\exp(\mu_T) = \log(2)/8$ in order to obtain median survival times of four and eight years for $S$ and $T$, respectively. The trial effects $(m_{Si}, m_{Ti})'$ had null means, variances $\sigma_S^2 = \sigma_T^2 = 0.2$, and correlation $\rho_m = 0.5$. The treatment effects $(\alpha_i, \beta_i)'$ had means equal to $\alpha = \beta = \log(0.75)$, variances $d_a^2 = d_b^2 = 0.1$, and correlation $\rho_{\text{trial}}$ depending on the scenario. Administrative censoring was added at 15 years, leading to about 40% of censoring for S and 20% for T on average across scenarios.
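The link between the baseline rate and the median can be made concrete: an exponential time with rate $\lambda$ has median $\log(2)/\lambda$. The sketch below generates exponential times with administrative censoring for a single endpoint. It is a simplified illustration with hypothetical function names that leaves out the individual, trial, and treatment effects of the full data generating process, so its censoring proportions are not expected to match those reported above.

```python
import math
import random


def rate_from_median(median):
    """Exponential rate giving the requested median survival time."""
    return math.log(2.0) / median


def simulate_times(median, n, censor_at, seed=1):
    """Simulate n exponential times with administrative censoring.

    Returns a list of (observed_time, event_indicator) pairs.
    No random effects or treatment effects are included here.
    """
    rng = random.Random(seed)
    rate = rate_from_median(median)
    data = []
    for _ in range(n):
        t = rng.expovariate(rate)
        data.append((min(t, censor_at), int(t <= censor_at)))
    return data
```

In this stripped-down setting, the probability of being censored at 15 years with a median of four years is $2^{-15/4} \approx 0.07$.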

In different scenarios, we varied the number ($N = 10, 20, 40$) and size ($n_i = 500, 250, 125$) of the trials, and the values of $R^2$ (0.4, 0.8) and of Kendall’s τ (0.4, 0.6). In the six main scenarios (Table 1), the trials had fixed size ($n_i = 500$, 250, or 125). In supplementary scenarios 3s and 4s, the same number of patients ($\sum_{i=1}^{N} n_i = 5000$) was randomly assigned to $N = 20$ trials (95% of trials had 91 to 445 patients). These two scenarios served the purpose of testing whether variable trial sizes play any role, as compared to fixed size in scenarios 3 and 4. In a sensitivity analysis, we explored a scenario reflecting the strong association observed in the adjuvant GASTRIC meta-analysis (Section 5.2): $R^2 = 0.9$, $\tau = 0.8$, with $N = 15$ and $n_i = 200$.

Table 1. Simulation scenarios.

Five hundred simulated meta-analyses were considered in each scenario. Simulations were replicated with data generated using both a mixed proportional hazard model and a Clayton copula model.

4.1.2 Evaluation criteria

The main parameter of interest was the trial-level surrogacy measure $R^2$; the secondary one was the individual-level surrogacy measure, Kendall’s τ. For these two parameters, we computed in each scenario (i) the relative bias, with respect to the true parameter value, and (ii) the relative mean squared error, relative to the adjusted Clayton copula, chosen as reference method. We considered that a model converged to a reliable solution if the maximum absolute gradient was $<10^{-2}$ and, for adjusted models, the eigenvalues of the variance–covariance matrix of the trial-treatment interaction were all $>10^{-8}$.
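These criteria translate into short formulas. The following sketch uses assumed function names and takes lists of per-replicate estimates as input; the thresholds are those stated above.

```python
def relative_bias(estimates, truth):
    """(mean estimate - true value) / true value."""
    mean_est = sum(estimates) / len(estimates)
    return (mean_est - truth) / truth


def mse(estimates, truth):
    """Mean squared error of the estimates around the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


def relative_mse(estimates, reference_estimates, truth):
    """MSE of a method divided by the MSE of the reference method."""
    return mse(estimates, truth) / mse(reference_estimates, truth)


def converged(max_abs_gradient, eigenvalues=None, grad_tol=1e-2, eig_tol=1e-8):
    """Convergence rule: small gradient and, when eigenvalues of the
    trial-treatment covariance matrix are supplied (adjusted models),
    all of them above eig_tol."""
    if max_abs_gradient >= grad_tol:
        return False
    return eigenvalues is None or all(e > eig_tol for e in eigenvalues)
```

A relative mean squared error below 1 then indicates that a method is more accurate than the reference method for that parameter and scenario.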

4.2 Results

Table 2 reports the results of simulations for the adjusted copula models and the Poisson models when data were generated using the bivariate mixed proportional hazard model.

Table 2. Results with data simulated with mixed proportional hazard models.

The trial-level $R^2$ (see also Figure 2) was often well estimated by the Clayton and Plackett copula models when the $R^2$ was moderate (0.4) and underestimated by 10%–23% when it was high (0.8). The Hougaard model overestimated moderate $R^2$ by 13%–32%, but had lower bias when $R^2 = 0.8$ (7%–9%, downwards). Of note, the Hougaard copula model is probably the model that differs most from the data generating process, which can explain at least part of this bias. The Poisson TI model was often the model with the smallest bias and MSE when the $R^2$ was high, with bias 8% and MSE as low as 0.46 times that of the Clayton copula (used as reference). Nevertheless, this model showed 11%–28% of bias (upwards) when the $R^2$ was low. The Poisson TIa model provided the least biased estimates in the scenarios with high $R^2$ and high τ, but its results were mixed in other cases. The MSEs of all methods for the estimation of the $R^2$ were similar to each other across different scenarios.



Figure 2. Relative bias (with respect to the true value) of the estimate of $R^2$ with data generated with mixed proportional hazard models. $N$: number of trials. $n_i$: number of patients per trial.

The estimates of Kendall’s τ (Table 2, supplementary Figure S3) showed little or no bias for all methods except the Hougaard copula model, which tended to overestimate τ (+16% to +17%) when it was moderate (0.4) and to underestimate it (−42%) when it was high (0.6). Also, its MSE was extremely high as compared to the Clayton copula model (up to $7\times10^2$-fold higher when $\tau = 0.6$).

Unadjusted copula models (supplementary Table S1) often underestimated the $R^2$ (bias as high as 49%), and in general, their bias was higher than that of their adjusted counterparts. On the other hand, this increased bias came with excellent convergence rates (see supplementary Table S2). Among models with measurement-error adjustment, the Clayton and Plackett copula models showed the highest convergence rates (see also supplementary Figure S4); the Hougaard was the copula model with the worst convergence rates, in particular with several small trials. Within the Poisson family of models, the Poisson TI often had the highest convergence rates, in particular when $\tau = 0.4$.

The results with variable trial sizes (scenarios 3s and 4s, Table S3) were very similar to those in meta-analyses of fixed-size trials with the same total number of trials and patients (scenarios 3 and 4).

In summary, in this first part of simulations, the adjusted Clayton model showed the best results when $R^2 = 0.4$, while the Poisson TI model was the most accurate and precise when $R^2 = 0.8$.

When data were generated from the Clayton copula model (Table 3 and Tables S4 and S5; Figure 3), the true model recovered the $R^2$ with no or small bias in general, showing the worst results with few trials (10), high τ (0.6), and low $R^2$ (0.4). The Clayton model recovered Kendall’s τ without bias in all the scenarios (Figure S5). The Plackett and Hougaard models showed serious convergence issues in this case (supplementary Table S6 and Figure S6). The former model had a convergence rate <10% in all the scenarios, whereas the latter converged to a reliable result in ≥10% of the replicates only when $\tau = 0.6$. Of note, also in this case, the Plackett and Hougaard copulas were misspecified models with respect to the Clayton copula used to generate the data.

Table 3. Results with data simulated with Clayton copula.



Figure 3. Relative bias (with respect to the true value) of the estimate of $R^2$ with data generated with Clayton copulas. The estimated value is not plotted whenever the convergence rate was lower than 10%. $N$: number of trials. $n_i$: number of patients per trial.

Models Poisson TI and TIa, which often converged in at least 80% of the replicates, performed similarly to the Clayton copula, with absolute bias ≤15% and very often ≤5%, notably with higher values of $N$. On the other hand, the Poisson models estimated Kendall’s τ with a (downwards) bias of 10%–21% and a high MSE as compared to the Clayton model (relative mean squared error as high as $3\times10^2$).

In the sensitivity analyses mimicking the association observed in the adjuvant GASTRIC meta-analysis, the results (Table S7) confirmed convergence issues when dependence is high. Such issues can heavily depend on the data generating process. The convergence rates of the adjusted Plackett model were 30% and 0% with data generated by mixed proportional hazard models and Clayton copulas, respectively, while for the Poisson models, they were 0% and 63%–73%. The $R^2$ was estimated accurately in general, except for the Clayton copula when the data were generated by mixed proportional hazard models (18% downwards). The Hougaard copula strongly underestimated Kendall’s τ (−73% and −64%).

In this section, we show the results of the copula and the Poisson models for the advanced and the adjuvant GASTRIC meta-analyses introduced in Section 2. In both cases, we refitted all the models on 250 bootstrapped data sets, in order to compute confidence intervals for $R^2_{\text{trial}}$ and τ.
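Once a model has been refitted on each bootstrapped data set, a confidence interval can be read off the resulting estimates. The text does not state how the intervals were constructed; the sketch below assumes the percentile method, with a hypothetical function name and a simple, slightly conservative choice of order statistics.

```python
import math


def percentile_interval(estimates, level=0.95):
    """Percentile bootstrap interval from a list of bootstrap estimates.

    Assumption: percentile method; the lower bound is rounded down and
    the upper bound rounded up to the nearest order statistic.
    """
    ordered = sorted(estimates)
    n = len(ordered)
    lower_idx = int(math.floor((1.0 - level) / 2.0 * (n - 1)))
    upper_idx = int(math.ceil((1.0 + level) / 2.0 * (n - 1)))
    return ordered[lower_idx], ordered[upper_idx]
```

With 250 bootstrap replicates, this rule returns the 7th smallest and the 7th largest estimates as the bounds of the 95% interval.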

5.1 Advanced GASTRIC meta-analysis

Table 4 shows the estimates of the trial-level $R^2$ and of the individual-level Kendall’s τ obtained with the models described in the paper. The Kendall’s τ for the previously published results was recovered from the published Spearman’s ρ, according to the Plackett copula function employed therein (see Section 2 and Figure S1).

Table 4. Surrogacy results. Advanced GASTRIC meta-analysis.

The estimates of trial-level surrogacy from adjusted copula models (0.41 [95% CI: 0.15–1.00], 0.40 [0.04–1.00], and 0.38 [0.01–1.00] for Clayton, Plackett, and Hougaard, respectively) were lower than those from their unadjusted equivalents (0.45 [0.28–0.74] for all three). The estimate from model Poisson TI was $R^2 = 0.63$ (95% CI: 0.32–1.00). The model Poisson TIa, which had the best convergence metrics of all models, provided the highest estimate, $R^2 = 0.83$ (95% CI: 0.24–1.00).

Individual-level surrogacy was the highest for the Clayton ( τ=0.61 , 95% CI: 0.59–0.62) and the Plackett ( τ=0.62 , 95% CI: 0.60–0.63) copulas. The Hougaard copula provided the lowest estimate (0.32, 95% CI: 0.32–0.33), whereas the Poisson models estimated the τ at 0.51 (95% CI: 0.50–0.52). The confidence intervals of all the models were very narrow.

The convergence metrics for the advanced case are provided in Table S8. In this context, the copula models had a high maximum gradient (1.5), whereas this was much smaller ($5\times10^{-5}$) for the Poisson models. The smallest eigenvalue of the covariance matrix of the random treatment effects was positive for all the adjusted models.

The results above are from auxiliary Poisson models estimated over eight time intervals. Supplementary Figure S7 shows that, even though the estimates of $R^2$ and τ are unstable for a small number of intervals, they are quite robust between 8 and 24 time intervals, whereas models failed to converge when 32 intervals were used.

5.2 Adjuvant GASTRIC meta-analysis

When analyzing the adjuvant GASTRIC meta-analysis data, all the models had serious convergence problems (Table S9), with high maximum gradient (660 for the copula models, $5.1\times10^{-5}$ for the Poisson models) and low minimum eigenvalue of the covariance matrix of the random treatment effects ($3.5\times10^{-11}$). Serious convergence problems were also reported in the original publication by Oba et al.16 As shown in supplementary Figure S8, using a different number of time intervals for the Poisson approach did not improve convergence.

The results obtained, which have to be interpreted with great caution, are shown in Table 5. The $R^2$ values estimated by copula models with adjusted linear regression ranged from 0.94 (0.08–1.00, Hougaard model) to 1.00 (0.69–1.00, Plackett model). With the Poisson models, the $R^2$ was estimated to be 0.54 (0.02–1.00, model TI) and 1.00 (0.08–1.00, model TIa). The estimates of Kendall’s τ ranged from 0.74 (95% CI: 0.73–0.76) to 0.82 (0.81–0.83), except for the Hougaard copula (0.18 [0.17–0.19]).

Table 5. Surrogacy results. Adjuvant GASTRIC meta-analysis.

Of note, the convergence diagnostics (Table S9) call for great care in interpreting the point estimates of all models. Furthermore, due to the width of the confidence intervals, it is difficult to make conclusive comparisons between these estimates.

The evaluation of surrogate endpoints is particularly challenging in the case of failure time endpoints.21,22 Two-step copula models are the state-of-the-art statistical methodology, but their estimation is obtained via a two-step algorithm which is regarded as suboptimal as compared to the surrogacy model available in the case of Gaussian endpoints.7

In the present paper, we proposed an alternative estimation strategy, which is based on a single-step estimation as in the Gaussian case.7 This alternative approach exploits the equivalence of parameter estimates between (random effect) proportional hazard models and auxiliary (mixed) Poisson log-linear models. In addition, this model formulation does not require any parametric assumption for the marginal baseline hazard functions, unlike the two-step copula approach. The auxiliary mixed Poisson model fits naturally into GLMM theory and can accommodate complex structures of random effects. Within the same model, both measures of surrogacy (individual-level Kendall’s τ and trial-level $R^2$) can be estimated. We considered simplified variants of the Poisson model to reduce its computational complexity. The model Poisson TIa, which accounts for both trial- and individual-level surrogacy in addition to adjusting for the trial effect on the baseline hazard, is more general, but in simulations it sometimes showed worse results than model TI. The latter, without trial adjustment on the baseline hazard, proved to be sufficiently complex to account for the main sources of dependence and sufficiently simple to be estimated in most cases.

In this paper, we considered three families of copulas (Clayton, Plackett, and Hougaard), which allowed us to highlight how misspecification of the parametric family can have a huge impact on their convergence rate. We also showed that the adjustment for measurement error at the second step reduces estimation bias in many cases, despite lowering the convergence rates. In our simulations, the Clayton copula family proved to be quite robust to different parameters of the data generating process, with some underestimation of the $R^2$ when the true value was high and the data came from a very different data generating process. Recently, Renfro et al.40 argued that applying the copula function to the cumulative incidence functions instead of the survival functions could be more appropriate. In addition, they suggested that using the two-stage estimation procedure by Shih and Louis50 within the first step of the estimation algorithm of the copula models could reduce estimation bias. Nevertheless, we chose to adopt the methods by Burzykowski et al.8 in their original form, as commonly applied in numerous clinical publications11–20 over the last 15 years, including a surrogate evaluation conducted by the US FDA.4

We explored three main scenarios with various numbers and sizes of trials. For a fixed total number of patients, we observed similar results (for all the methods) in meta-analyses of several small trials and in those of few big trials. The only exception was the Hougaard copula model, which converged more often when the trials were big than when they were numerous.

In our work, we considered a new approach to the evaluation of failure time surrogate endpoints, based on mixed Poisson models. We studied this approach in extensive simulations and compared it head to head to several copula models. We showed that the convergence rate and the estimation results vary according to the misspecification of the model and to the degree of dependence in the data. The Poisson approach generally provided the best results when the correlation was high. Adjustment of the baseline risk on the trial (model TIa) performed worse than the simpler model (TI) without adjustment. The Clayton model showed results among the most robust of all methods, notably when the dependence was moderate, but it was often the closest to the true data generating process. In real applications, we suggest that the model choice should take into account the goodness of fit of the different parametric families, as well as convergence metrics of the models estimated on the data at hand.

Evaluating a failure time surrogate endpoint can be quite challenging in real-life applications. The auxiliary Poisson approach is one additional statistical tool, which broadens the range of available models among which the best fitting one can be chosen.

The authors thank the GASTRIC (Global Advanced/Adjuvant Stomach Tumor Research International Collaboration) Group for permission to use their data. The investigators who contributed to GASTRIC are listed in references [16, 17, 38, and 39]. The GASTRIC Group data are available within the surrosurv package for research purposes, under the conditions that (1) the research be scientifically appropriate, (2) the confidentiality of individual patient data be protected, (3) the results of the analyses be shared with the GASTRIC Group prior to public communication, (4) the source of data be fully acknowledged as above, and (5) resulting data and results be further shared with the research community.

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The present work has been supported by the Institut National du Cancer (INCa), Grant SHS 2014-141, and by the Ligue Nationale Contre le Cancer. The study sponsors had no involvement in either the study design; the collection, analysis, and interpretation of data; the writing of the manuscript; nor in the decision to submit the manuscript for publication.

Supplemental material is available for this article online.

1. Burzykowski, T, Molenberghs, G, Buyse, M. The evaluation of surrogate endpoints, New York, NY: Springer Science & Business Media, 2006.
2. Chakravarty, A, Sridhara, R. Use of progression-free survival as a surrogate marker in oncology trials: some regulatory issues. Stat Methods Med Res 2008; 17: 515–518.
3. Buyse, M, Molenberghs, G, Paoletti, X, et al. Statistical evaluation of surrogate endpoints with examples from cancer clinical trials. Biom J 2016; 58: 104–132.
4. Blumenthal, GM, Karuri, SW, Zhang, H, et al. Overall response rate, progression-free survival, and overall survival with targeted and standard therapies in advanced non-small-cell lung cancer: US food and drug administration trial-level and patient-level analyses. J Clin Oncol 2015; 33: 1008–1014.
5. Daniels, MJ, Hughes, MD. Meta-analysis for the evaluation of potential surrogate markers. Stat Med 1997; 16: 1965–1982.
6. Buyse, M, Molenberghs, G. Criteria for the validation of surrogate endpoints in randomized experiments. Biometrics 1998; 54: 1014–1029.
7. Buyse, M, Molenberghs, G, Burzykowski, T, et al. The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics 2000; 1: 49–67.
8. Burzykowski, T, Molenberghs, G, Buyse, M, et al. Validation of surrogate end points in multiple randomized clinical trials with failure time end points. J Roy Stat Soc Appl Stat 2001; 50: 405–422.
9. Kendall, MG. A new measure of rank correlation. Biometrika 1938; 30: 81–93.
10. van Houwelingen, HC, Arends, L, Stijnen, T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med 2002; 21: 589–624.
11. Sargent, DJ, Wieand, HS, Haller, DG, et al. Disease-free survival versus overall survival as a primary end point for adjuvant colon cancer studies: Individual patient data from 20,898 patients on 18 randomized trials. J Clin Oncol 2005; 23: 8664–8670.
12. Buyse, M, Burzykowski, T, Carroll, K, et al. Progression-free survival is a surrogate for survival in advanced colorectal cancer. J Clin Oncol 2007; 25: 5218–5224.
13. Burzykowski, T, Buyse, M, Piccart-Gebhart, MJ, et al. Evaluation of tumor response, disease control, progression-free survival, and time to progression as potential surrogate end points in metastatic breast cancer. J Clin Oncol 2008; 26: 1987–1992.
14. Buyse, M, Burzykowski, T, Michiels, S, et al. Individual- and trial-level surrogacy in colorectal cancer. Stat Methods Med Res 2008; 17: 467–475.
15. Michiels, S, Le Maître, A, Buyse, M, et al. Surrogate endpoints for overall survival in locally advanced head and neck cancer: meta-analyses of individual patient data. Lancet Oncol 2009; 10: 341–350.
16. Oba, K, Paoletti, X, Alberts, S, et al. Disease-free survival as a surrogate for overall survival in adjuvant trials of gastric cancer: a meta-analysis. J Natl Canc Inst 2013; 105: 1600–1607.
17. Paoletti, X, Oba, K, Bang, YJ, et al. Progression-free survival as a surrogate for overall survival in advanced/recurrent gastric cancer trials: a meta-analysis. J Natl Canc Inst 2013; 105: 1667–1670.
18. Mauguen, A, Pignon, JP, Burdett, S, et al. Surrogate endpoints for overall survival in chemotherapy and radiotherapy trials in operable and locally advanced lung cancer: a re-analysis of meta-analyses of individual patients’ data. Lancet Oncol 2013; 14: 619–626.
19. Michiels, S, Pugliano, L, Marguet, S, et al. Progression-free survival as surrogate end point for overall survival in clinical trials of HER2-targeted agents in HER2-positive metastatic breast cancer. Ann Oncol 2016; 27: 1029–1034.
20. Rotolo, F, Pignon, JP, Bourhis, J, et al. Surrogate endpoints for overall survival in loco-regionally advanced nasopharyngeal carcinoma: an individual patient data meta-analysis. J Natl Canc Inst 2017; 109. DOI:10.1093/jnci/djw239.
21. Green, E, Yothers, G, Sargent, DJ. Surrogate endpoint validation: statistical elegance versus clinical relevance. Stat Methods Med Res 2008; 17: 477–486.
22. Burzykowski, T. Surrogate endpoints: wishful thinking or reality? Stat Methods Med Res 2008; 17: 463–466.
23. Shi, Q, Renfro, LA, Bot, BM, et al. Comparative assessment of trial-level surrogacy measures for candidate time-to-event surrogate endpoints in clinical trials. Comput Stat Data Anal 2011; 55: 2748–2757.
24. Renfro, LA, Shi, Q, Sargent, DJ, et al. Bayesian adjusted R2 for the meta-analytic evaluation of surrogate time-to-event endpoints in clinical trials. Stat Med 2012; 31: 743–761.
25. Ghosh, D, Taylor, JMG, Sargent, DJ. Meta-analysis for surrogacy: accelerated failure time models and semicompeting risks modeling. Biometrics 2012; 68: 226–232.
26. Alonso, A, Molenberghs, G. Evaluating time to cancer recurrence as a surrogate marker for survival from an information theory perspective. Stat Methods Med Res 2008; 17: 497–504.
27. Vaupel, JW, Manton, KG, Stallard, E. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 1979; 16: 439–454.
28. Duchateau, L, Janssen, P. The frailty model, New York, NY: Springer Verlag, 2008.
29. Wienke, A. Frailty models in survival analysis. (Chapman & Hall/CRC biostatistics series), Boca Raton, FL: Taylor and Francis, 2010. DOI:10.1201/9781420073911.
30. Burzykowski, T, Cortiñas Abrahantes, J. Validation in the case of two failure-time endpoints. In: Burzykowski, T, Molenberghs, G, Buyse, M (eds). The evaluation of surrogate endpoints, New York, NY: Springer, 2005, pp. 163–194.
31. Rondeau, V, Pignon, JP, Michiels, S. A joint model for the dependence between clustered times to tumour progression and deaths: a meta-analysis of chemotherapy in head and neck cancer. Stat Methods Med Res 2011; 24: 711–729.
32. Whitehead, J. Fitting Cox’s regression model to survival data using GLIM. J Roy Stat Soc C Appl Stat 1980; 29: 268–275.
33. Laird, N, Olivier, D. Covariance analysis of censored survival data using log-linear analysis techniques. J Am Stat Assoc 1981; 76: 231–240.
34. Ma, R, Krewski, D, Burnett, RT. Random effects Cox models: a Poisson modelling approach. Biometrika 2003; 90: 157–169.
35. Feng, S, Nie, L, Wolfe, RA. Laplace’s approximation for relative risk frailty models. Lifetime Data Anal 2009; 15: 343–356.
36. Hirsch, K, Wienke, A, Kuss, O. Log-normal frailty models fitted as Poisson generalized linear mixed models. Comput Meth Programs Biomed 2016; 137: 167–175.
37. Crowther, MJ, Riley, RD, Staessen, JA, et al. Individual patient data meta-analysis of survival data using Poisson regression models. BMC Med Res Methodol 2012; 12: 34.
38. The GASTRIC group. Role of chemotherapy for advanced/recurrent gastric cancer: an individual-patient-data meta-analysis. Eur J Cancer 2013; 49: 1565–1577.
39. The GASTRIC group. Benefit of adjuvant chemotherapy for resectable gastric cancer: a meta-analysis. J Am Med Assoc 2010; 303: 1729–1737.
40. Renfro, LA, Shang, H, Sargent, DJ. Impact of copula directional specification on multi-trial evaluation of surrogate endpoints. J Biopharm Stat 2015; 25: 857–877.
41. Clayton, D. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 1978; 65: 141–151.
42. Plackett, RL. A class of bivariate distributions. J Am Stat Assoc 1965; 60: 516–522.
43. Hougaard, P. A class of multivariate failure time distributions. Biometrika 1986; 73: 671–678.
44. Gasparrini, A, Armstrong, B, Kenward, M. Multivariate meta-analysis for non-linear and other multi-parameter associations. Stat Med 2012; 31: 3821–3839.
45. Hougaard, P. Analysis of multivariate survival data, Germany: Springer Verlag, 2000.
46. Munda, M. Beyond the shared frailty model. PhD Thesis, Université Catholique de Louvain, Belgium, http://hdl.handle.net/2078.1/150625%0A (2014, accessed 19 June 2017).
47. Asmussen, S, Jensen, JL, Rojas-Nandayapa, L. On the Laplace transform of the lognormal distribution. Meth Comput Appl Probab 2014; 18: 441–458.
48. Goutis, C, Casella, G. Explaining the saddlepoint approximation. Am Stat 1999; 53: 216–224.
49. Rotolo, F. surrosurv: evaluation of failure time surrogate endpoints in individual patient data meta-analyses (R package version 1.1.15), https://cran.r-project.org/package=surrosurv (2017, accessed 19 June 2017).
50. Shih, JH, Louis, TA. Inferences on the association parameter in copula models for bivariate survival data. Biometrics 1995; 51: 1384–1399.
