Abstract
Surrogate endpoints are often used in clinical trials instead of well-established hard endpoints for practical convenience. The meta-analytic approach relies on two measures of surrogacy: one at the individual level and one at the trial level. In the survival data setting, a two-step model based on copulas is commonly used. We present a new approach which employs a bivariate survival model with an individual random effect shared between the two endpoints and correlated treatment-by-trial interactions. We fit this model using auxiliary mixed Poisson models. We study via simulations the operating characteristics of this mixed Poisson approach as compared to the two-step copula approach. We illustrate the application of the methods on two individual patient data meta-analyses in gastric cancer, in the advanced setting (4069 patients from 20 randomized trials) and in the adjuvant setting (3288 patients from 14 randomized trials).
1 Introduction
Surrogate endpoints are often used in clinical trials instead of well-established endpoints for practical convenience: they are usually cheaper, faster, easier, or less invasive to measure.1–3 To validate a new criterion as a surrogate for another one, the approach endorsed by the US Food and Drug Administration (FDA)4 relies on the analysis of both endpoints in multiple clinical trials. Two criteria must be fulfilled for an endpoint to be a valid surrogate:5,6 a strong association at the individual level between the true and the surrogate endpoints, and a strong association at the trial level between the effects of the treatment on the true and on the surrogate endpoints.
For normally distributed endpoints, Buyse et al.7 proposed to measure individual-level surrogacy in terms of $R^2_{\text{indiv}}$ and trial-level surrogacy using $R^2_{\text{trial}}$. Burzykowski et al.8 adapted this approach to the case where both the surrogate and the true endpoints are failure time variables. Their estimation strategy consists of two steps: in the first step, a copula model is used to measure individual-level surrogacy in terms of Kendall’s τ9 and to estimate the treatment effects on each endpoint in each trial; in the second step, $R^2_{\text{trial}}$ is computed from the linear regression of the treatment effects estimated in the first step, accounting for the measurement error that arises from using estimated effects, whose true values are unknown.10
This approach has been successfully employed in several applications in oncology11–20 and is commonly considered the reference method.4,21 Nevertheless, when estimating the trial-level R2, the two-step estimation procedure uses the information in the data less efficiently than the original approach by Buyse et al.7 Accordingly, research in this field has been very active over the last 15 years.22 For example, Shi et al.23 compared the two-stage approach with conventional measures based on marginal models, Renfro et al.24 proposed to perform the second-stage evaluation within a Bayesian framework, Ghosh et al.25 explored a semicompeting risks strategy within the accelerated failure time model, and Alonso and Molenberghs26 adopted an information theory perspective.
Mixed proportional hazard models, also known as frailty models,27–29 seem the most natural framework for adapting the meta-analytic approach by Buyse et al.7 to the case of failure time endpoints.30,31 Nonetheless, estimation of frailty models with complex structures of random effects can be computationally very intensive. The connection between the proportional hazard model and the Poisson regression model has long been known32,33 and has proven valid and useful also in the case of dependent observations.34–36 The use of such auxiliary Poisson models is particularly convenient in the presence of several and/or nested random effects and has been proposed for the analysis of a single failure time endpoint in an individual patient data meta-analysis.37
In the present paper, we consider a bivariate auxiliary Poisson mixed model to estimate the parameters of the bivariate frailty model with individual random effects and random trial-by-treatment interactions. In the spirit of the original approach by Buyse et al.,7 individual-level and trial-level surrogacy measures are obtained jointly within the same model, based on well-established methods for generalized linear mixed models. We compare via simulations the Poisson approach with the two-step copula approach with Clayton, Plackett, and Hougaard copulas, with or without measurement-error adjustment in the second-step linear regression of the treatment effects. We illustrate the methods using individual patient data from two meta-analyses of randomized trials: (i) a meta-analysis of 4069 patients from 20 advanced gastric cancer trials;38 and (ii) a meta-analysis of 3288 patients from 14 adjuvant gastric cancer trials.39
2 Motivating examples
The GASTRIC (Global Advanced/Adjuvant Stomach Tumor Research International Collaboration) group studied the role of (adjuvant) chemotherapy in advanced38 and in resectable39 gastric cancers. In both cases, the main endpoint was overall survival (OS), and the analyses also included progression-free survival (PFS) and disease-free survival (DFS), respectively. Figure 1 shows the survival curves for OS and for the candidate surrogates PFS and DFS in the experimental and control arms. Their validity as surrogate endpoints for OS was assessed in both the advanced17 and the adjuvant16 settings via the two-stage copula approach. In both cases, Spearman’s ρ was reported as the measure of individual-level surrogacy. As the Plackett copula was used to model the dependence between the endpoints, we were able to recover the Kendall’s τ numerically from the estimate of ρ (supplementary Figure S1).
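The numerical conversion from Spearman’s ρ to Kendall’s τ under a Plackett copula can be sketched as follows. This is a minimal Python illustration, not the authors’ code, and the function names are hypothetical. It assumes positive dependence, inverts the closed-form Spearman’s ρ of the Plackett family, ρ = (θ + 1)/(θ − 1) − 2θ log θ/(θ − 1)², with a root finder, and then evaluates τ = 4E[C(U, V)] − 1 by numerical double integration of the copula times its density.

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.optimize import brentq


def plackett_cdf(u, v, theta):
    """Plackett copula C_theta(u, v), theta > 0 and theta != 1."""
    q = 1.0 + (theta - 1.0) * (u + v)
    return (q - np.sqrt(q * q - 4.0 * theta * (theta - 1.0) * u * v)) / (2.0 * (theta - 1.0))


def plackett_density(u, v, theta):
    """Copula density c_theta(u, v) = d^2 C / (du dv)."""
    q = 1.0 + (theta - 1.0) * (u + v)
    num = theta * (1.0 + (theta - 1.0) * (u + v - 2.0 * u * v))
    return num / (q * q - 4.0 * theta * (theta - 1.0) * u * v) ** 1.5


def plackett_spearman(theta):
    """Closed-form Spearman's rho of the Plackett family."""
    return (theta + 1.0) / (theta - 1.0) - 2.0 * theta * np.log(theta) / (theta - 1.0) ** 2


def plackett_kendall(theta):
    """Kendall's tau = 4 * int int C(u, v) c(u, v) du dv - 1, computed numerically."""
    integral, _ = dblquad(
        lambda v, u: plackett_cdf(u, v, theta) * plackett_density(u, v, theta),
        0.0, 1.0, lambda u: 0.0, lambda u: 1.0,
    )
    return 4.0 * integral - 1.0


def kendall_from_spearman(rho):
    """Solve rho(theta) = rho for theta > 1 (positive dependence), then return (theta, tau)."""
    theta = brentq(lambda th: plackett_spearman(th) - rho, 1.0 + 1e-6, 1e6)
    return theta, plackett_kendall(theta)


theta, tau = kendall_from_spearman(0.85)
print(f"theta = {theta:.2f}, Kendall's tau = {tau:.3f}")
```

For the advanced GASTRIC data, Section 2.1 reports ρ = 0.85 and τ = 0.68; the output of this sketch can be compared with that value, but it has not been checked against the computation used in the paper.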
2.1 Advanced GASTRIC meta-analysis
The advanced GASTRIC meta-analysis38 included 4069 patients from 20 trials. It has been previously reported17 that the rank correlation coefficient between PFS and OS was 0.85 (95% CI: 0.85–0.85) as estimated by a Plackett copula, corresponding to a Kendall’s τ of 0.68 (0.68–0.68). The R2 between the treatment effects on PFS and on OS, adjusted for measurement error, was 0.61 (95% CI: 0.04–1.00) so that the validity of PFS as surrogate endpoint for OS could not be confirmed for chemotherapy trials of patients with advanced gastric cancer. Supplementary Figure S2 shows the estimated treatment effects on OS versus the estimated treatment effects on PFS.
2.2 Adjuvant GASTRIC meta-analysis
The adjuvant GASTRIC meta-analysis39 included 3288 patients from 14 trials. The rank correlation coefficient between DFS and OS reported by Oba et al.16 was 0.97 (95% CI: 0.97–0.98) as estimated by a Plackett copula, corresponding to τ = 0.88 (0.88–0.89). Because of large estimation error, the measurement-error-adjusted between-trial R2 estimate was numerically highly unstable and, hence, unreliable. The unadjusted R2 was 0.96 (95% CI: 0.93–1.00), so that DFS was deemed a valid surrogate endpoint for OS in adjuvant trials of gastric cancer. However, because of such a high association, the estimated R2 was close to its upper limit of 1, and the numerical results needed to be interpreted with caution.
3 Methods to evaluate failure time surrogate endpoints
Let Tij and Sij be the times to the true and the surrogate endpoints, respectively, for patient j in trial i. Let Zij be the indicator of the treatment arm to which the j-th patient in the i-th trial has been randomized. In the following subsections, we briefly recall the two-step copula approach and present the new Poisson approach.
3.1 Two-step copula approach
Burzykowski et al.8 extended the meta-analytic approach of Buyse et al.7 to the case of failure time endpoints. Their estimation approach consists of two steps, one for the individual level and one for the trial level.
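In the first step, the marginal distributions of the two endpoints are modeled using the following bivariate proportional hazard model:

$$
\begin{aligned}
P(S_{ij} > s \mid Z_{ij}) &= \exp\Big\{-\int_0^s \lambda_{Si}(x)\exp(\alpha_i Z_{ij})\,dx\Big\},\\
P(T_{ij} > t \mid Z_{ij}) &= \exp\Big\{-\int_0^t \lambda_{Ti}(x)\exp(\beta_i Z_{ij})\,dx\Big\},\\
P(S_{ij} > s,\, T_{ij} > t \mid Z_{ij}) &= C_\theta\{P(S_{ij} > s \mid Z_{ij}),\, P(T_{ij} > t \mid Z_{ij})\},
\end{aligned} \tag{1}
$$

where λSi and λTi are trial-specific baseline hazards, αi and βi are the trial-specific treatment effects on S and T, and Cθ is a copula function with dependence parameter θ.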
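The dependence parameter θ of the copula model can be reparametrized into Kendall’s τ, used as the measure of individual-level surrogacy. We considered Weibull marginal hazards and three copula functions. The Clayton41 copula is defined as $C_\theta(u,v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$, with $\theta > 0$ and Kendall’s $\tau = \theta/(\theta + 2)$. The Plackett42 copula is defined as $C_\theta(u,v) = \{Q - R^{1/2}\}/\{2(\theta - 1)\}$, with $Q = 1 + (\theta - 1)(u + v)$, $R = Q^2 - 4\theta(\theta - 1)uv$, and $\theta > 0$ ($\theta \neq 1$); Kendall’s τ is computed numerically for this family, as no analytical expression is available. The Hougaard43 copula is $C_\theta(u,v) = \exp\{-[(-\log u)^{1/\theta} + (-\log v)^{1/\theta}]^{\theta}\}$, with $0 < \theta \le 1$ and Kendall’s $\tau = 1 - \theta$.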
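To make model (1) concrete, the following Python sketch simulates bivariate failure times whose joint survival function is a Clayton copula applied to two Weibull proportional hazards margins, and checks that the within-arm sample Kendall’s τ is close to θ/(θ + 2). It is an illustration only: the conditional-inversion sampler, the Weibull parameters, and the treatment effects are assumptions chosen for the example, not the data-generating mechanism described in the Supplementary Material.

```python
import numpy as np


def clayton_tau(theta):
    """Kendall's tau of the Clayton copula (theta > 0)."""
    return theta / (theta + 2.0)


def hougaard_tau(theta):
    """Kendall's tau of the Hougaard copula (0 < theta <= 1)."""
    return 1.0 - theta


def sample_clayton(n, theta, rng):
    """Draw (U, V) from a Clayton copula by inverting the conditional distribution of V given U."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v


def weibull_ph_time(surv_prob, scale, shape, lin_pred):
    """Time t solving exp(-scale * t**shape * exp(lin_pred)) = surv_prob."""
    return (-np.log(surv_prob) / (scale * np.exp(lin_pred))) ** (1.0 / shape)


def sample_kendall(x, y):
    """O(n^2) sample Kendall's tau (assumes no ties)."""
    conc = np.sign(x[:, None] - x[None, :]) * np.sign(y[:, None] - y[None, :])
    return conc.sum() / (len(x) * (len(x) - 1))


rng = np.random.default_rng(7)
n, theta = 3000, 2.0                     # theta = 2 corresponds to tau = 0.5
z = rng.integers(0, 2, size=n)           # randomized treatment arm
u, v = sample_clayton(n, theta, rng)     # copula acts on the marginal survival probabilities
s = weibull_ph_time(u, scale=0.10, shape=1.2, lin_pred=-0.3 * z)   # surrogate endpoint
t = weibull_ph_time(v, scale=0.05, shape=1.0, lin_pred=-0.2 * z)   # true endpoint
control = z == 0
print(sample_kendall(s[control], t[control]), clayton_tau(theta))
```

Because both margins are decreasing transformations of the copula arguments, the rank concordance of (S, T) equals that of (U, V), which is why the sample τ within one arm should approach the copula’s τ.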
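In the second step of the estimation algorithm, the treatment effect estimates obtained in the first step are assumed to follow the mixed model

$$
\begin{pmatrix}\hat\alpha_i\\ \hat\beta_i\end{pmatrix} = \begin{pmatrix}\alpha + a_i\\ \beta + b_i\end{pmatrix} + \begin{pmatrix}\varepsilon_{ai}\\ \varepsilon_{bi}\end{pmatrix}, \tag{2}
$$

$$
\begin{pmatrix}a_i\\ b_i\end{pmatrix} \sim N\left\{\begin{pmatrix}0\\ 0\end{pmatrix},\; D = \begin{pmatrix}d_{aa} & d_{ab}\\ d_{ab} & d_{bb}\end{pmatrix}\right\}, \tag{3}
$$

$$
\begin{pmatrix}\varepsilon_{ai}\\ \varepsilon_{bi}\end{pmatrix} \sim N\left\{\begin{pmatrix}0\\ 0\end{pmatrix},\; \Omega_i\right\}. \tag{4}
$$

Trial-level surrogacy is measured by $R^2_{\text{trial}} = d_{ab}^2/(d_{aa} d_{bb})$. This is computed in a linear regression of the βi’s on the αi’s with measurement-error adjustment, by fixing the $\Omega_i$’s at their estimates from the first step.10,44 In practice, this measurement-error-adjusted $R^2_{\text{trial}}$ can be difficult to compute. Whenever this errors-in-variables linear regression cannot be reliably fitted, an unadjusted $R^2_{\text{trial}}$ is often obtained from a linear regression (equivalent to fixing all the elements of $\Omega_i$ to 0) in which the observations are weighted by trial size, so as to account, indirectly and approximately, for measurement error. The number of events could also be used for weighting, but trial size is what is commonly used in clinical applications.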
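The two versions of the trial-level R2 can be sketched in Python as follows. The adjusted version fits the bivariate random-effects model for the estimated treatment effects, (α̂i, β̂i) ~ N(μ, D + Ωi), by maximum likelihood with the within-trial covariance matrices Ωi held fixed, in the spirit of van Houwelingen et al.,10 and returns d_ab²/(d_aa d_bb), where D = [[d_aa, d_ab], [d_ab, d_bb]] is the between-trial covariance of the treatment effects. The unadjusted version is the R2 of a weighted linear regression with trial sizes as weights. This is a minimal sketch under stated assumptions (Cholesky parametrization of D, BFGS optimizer, simulated toy inputs), not the implementation used in surrosurv.

```python
import numpy as np
from scipy.optimize import minimize


def r2_trial_adjusted(alpha_hat, beta_hat, omega):
    """ML fit of (alpha_hat_i, beta_hat_i) ~ N(mu, D + Omega_i) with Omega_i known.

    Returns R2_trial = d_ab^2 / (d_aa * d_bb) computed from the estimated D.
    """
    est = np.column_stack([alpha_hat, beta_hat])          # shape (N, 2)

    def unpack(par):
        # D = L L^T with a lower-triangular L whose diagonal is kept positive
        chol = np.array([[np.exp(par[2]), 0.0], [par[3], np.exp(par[4])]])
        return par[:2], chol @ chol.T

    def neg_loglik(par):
        mu, d = unpack(par)
        v = d[None, :, :] + omega                          # marginal covariance per trial
        r = est - mu
        _, logdet = np.linalg.slogdet(v)
        quad = np.einsum("ni,ni->n", r, np.linalg.solve(v, r[..., None])[..., 0])
        return 0.5 * np.sum(logdet + quad)

    # start from the Cholesky factor of the sample covariance of the estimates
    chol0 = np.linalg.cholesky(np.cov(est.T) + 1e-8 * np.eye(2))
    start = np.array([*est.mean(axis=0), np.log(chol0[0, 0]), chol0[1, 0], np.log(chol0[1, 1])])
    fit = minimize(neg_loglik, start, method="BFGS")
    _, d = unpack(fit.x)
    return d[0, 1] ** 2 / (d[0, 0] * d[1, 1])


def r2_trial_unadjusted(alpha_hat, beta_hat, n):
    """R2 of the weighted least-squares regression of beta_hat on alpha_hat (weights = trial sizes)."""
    w = n / n.sum()
    ma, mb = w @ alpha_hat, w @ beta_hat
    cov = w @ ((alpha_hat - ma) * (beta_hat - mb))
    return cov ** 2 / ((w @ (alpha_hat - ma) ** 2) * (w @ (beta_hat - mb) ** 2))


# Hypothetical meta-analysis: 40 trials, true R2_trial = 0.032^2 / (0.04 * 0.04) = 0.64
rng = np.random.default_rng(11)
n_trials = 40
n = rng.integers(50, 500, size=n_trials).astype(float)
d_true = np.array([[0.04, 0.032], [0.032, 0.04]])
effects = rng.multivariate_normal([-0.2, -0.15], d_true, size=n_trials)
omega = (4.0 / n)[:, None, None] * np.array([[1.0, 0.5], [0.5, 1.0]])  # assumed within-trial covariances
noise = np.array([rng.multivariate_normal([0.0, 0.0], om) for om in omega])
alpha_hat, beta_hat = (effects + noise).T
print(r2_trial_adjusted(alpha_hat, beta_hat, omega), r2_trial_unadjusted(alpha_hat, beta_hat, n))
```

When all Ωi are zero and all trials have the same size, both functions reduce to the squared sample correlation of the estimated effects, which is a convenient sanity check.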
3.2 Poisson approach
Consider the semiparametric Cox model $\lambda_j(t) = \lambda_0(t)\exp(\beta Z_j)$. By dividing the time scale into intervals whose endpoints are the observed event times, the so-called auxiliary Poisson model for the event indicator $\delta_{jk}$ of subject j in the k-th interval is $\delta_{jk} \sim \text{Poisson}(\mu_{jk})$ with $\log \mu_{jk} = \log y_{jk} + \gamma_k + \beta Z_j$, where $y_{jk}$ is the time spent at risk during interval k.32,37 In such a model, the parameters $\gamma_k$ are the logarithms of the baseline hazard rates in each interval k, under the assumption that the hazard is constant within the interval. In practice, fewer and wider time intervals can be employed, which leads to an approximation of the Cox model estimators.
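The equivalence between the Cox model and its auxiliary Poisson model can be checked numerically with the following NumPy-only Python sketch. It simulates a single hypothetical trial with one binary covariate and administrative censoring after the last event, splits the follow-up at every observed event time, and fits both models by Newton–Raphson. With these cut points and no censoring inside an interval, profiling out the interval-specific log-rates reproduces the Cox partial likelihood, so the two estimates of β should agree up to numerical precision; with fewer, wider intervals they would agree only approximately. The simulation values and function names are illustrative assumptions.

```python
import numpy as np


def cox_beta(time, event, z, n_iter=50):
    """Newton-Raphson on the Cox partial likelihood (single covariate, no tied event times)."""
    ev = event == 1
    at_risk = (time[None, :] >= time[ev][:, None]).astype(float)   # one row per event time
    beta = 0.0
    for _ in range(n_iter):
        w = np.exp(beta * z)
        s0, s1, s2 = at_risk @ w, at_risk @ (w * z), at_risk @ (w * z * z)
        score = np.sum(z[ev] - s1 / s0)
        info = np.sum(s2 / s0 - (s1 / s0) ** 2)
        step = score / info
        beta += step
        if abs(step) < 1e-12:
            break
    return beta


def auxiliary_poisson_beta(time, event, z, n_iter=50):
    """Poisson GLM with log(time at risk) offset and one log-rate gamma_k per interval."""
    cuts = np.concatenate([[0.0], np.unique(time[event == 1])])   # split at each event time
    lower, upper = cuts[:-1], cuts[1:]
    k = len(upper)
    # y[j, k]: time subject j spends at risk in (lower_k, upper_k]; d[j, k]: event in that interval
    y = np.clip(np.minimum(time[:, None], upper[None, :]) - lower[None, :], 0.0, None)
    d = ((event[:, None] == 1) & (time[:, None] == upper[None, :])).astype(float)
    par = np.concatenate([np.log(d.sum(axis=0) / y.sum(axis=0)), [0.0]])
    for _ in range(n_iter):
        mu = y * np.exp(par[:k][None, :] + par[k] * z[:, None])
        r = d - mu
        score = np.concatenate([r.sum(axis=0), [np.sum(r * z[:, None])]])
        info = np.zeros((k + 1, k + 1))                  # Fisher information = minus the Hessian
        info[np.arange(k), np.arange(k)] = mu.sum(axis=0)
        info[:k, k] = info[k, :k] = (mu * z[:, None]).sum(axis=0)
        info[k, k] = np.sum(mu * z[:, None] ** 2)
        step = np.linalg.solve(info, score)
        par += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return par[k]


rng = np.random.default_rng(2024)
n = 200
z = rng.integers(0, 2, size=n).astype(float)
t_event = rng.exponential(scale=1.0 / (0.1 * np.exp(-0.5 * z)))
time = np.minimum(t_event, 15.0)                         # administrative censoring at 15
event = (t_event <= 15.0).astype(int)
b_cox, b_pois = cox_beta(time, event, z), auxiliary_poisson_beta(time, event, z)
print(f"Cox: {b_cox:.8f}   auxiliary Poisson: {b_pois:.8f}")
```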
In meta-analyses, each subject j is recruited within a trial i. To account for both within-trial and within-subject dependence, the joint model for the two endpoints can be expressed as
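$$
\begin{cases}
\lambda_{Sij}(t \mid u_{ij}, v_{Si}, a_i) = \lambda_{S0}(t)\exp\{u_{ij} + v_{Si} + (\alpha + a_i) Z_{ij}\}\\
\lambda_{Tij}(t \mid u_{ij}, v_{Ti}, b_i) = \lambda_{T0}(t)\exp\{u_{ij} + v_{Ti} + (\beta + b_i) Z_{ij}\}
\end{cases} \tag{5}
$$

$$
\begin{cases}
\delta_{Sijk} \sim \text{Poisson}(\mu_{Sijk}), \quad \log \mu_{Sijk} = \log y_{Sijk} + \gamma_{Sk} + u_{ij} + v_{Si} + (\alpha + a_i) Z_{ij}\\
\delta_{Tijk} \sim \text{Poisson}(\mu_{Tijk}), \quad \log \mu_{Tijk} = \log y_{Tijk} + \gamma_{Tk} + u_{ij} + v_{Ti} + (\beta + b_i) Z_{ij}\\
u_{ij} \sim N(0, \sigma_u^2), \quad (v_{Si}, v_{Ti})^\top \sim N(0, \Sigma_v), \quad (a_i, b_i)^\top \sim N(0, D)
\end{cases} \tag{6}
$$

where uij is an individual random effect shared by the two endpoints, vSi and vTi are random trial effects on the baseline hazards, and ai and bi are random treatment-by-trial interactions.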
3.2.1 Individual-level surrogacy
In the context of Gaussian linear models, Buyse et al.7 assumed correlated but distinct individual error terms for the two endpoints and derived $R^2_{\text{indiv}}$ as their squared correlation. In the case of failure time endpoints, such an $R^2_{\text{indiv}}$ would not express the association between Sij and Tij, but that between the random effects which modulate their hazard functions, which is a more indirect way of measuring dependence. For failure time endpoints, the two-step copula approach measures the individual association in terms of Kendall’s τ (see Section 3.1). The use of the same random term uij in model (5) (and (6)) provides a shared frailty, whose estimated variance can be used to compute Kendall’s τ between Sij and Tij as $\tau = 4\int_0^\infty s\,\mathcal{L}(s)\,\mathcal{L}''(s)\,ds - 1$, where $\mathcal{L}$ and $\mathcal{L}''$ are the Laplace transform of the frailty distribution and its second derivative.28 Nevertheless, no analytic expression of $\mathcal{L}$ is available for the log-normal frailty distribution, and Hougaard45 suggested approximating it analytically or by simulation. Following Munda46 and Asmussen et al.,47 we used the Laplace approximation.48
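As a numerical cross-check of the Laplace approximation, Kendall’s τ for a log-normal shared frailty can also be obtained by quadrature. Writing the Laplace transform as L(s) = E[exp(−sW)] with W = exp(uij), and exchanging integral and expectation, gives τ = 4E[W₂²/(W₁ + W₂)²] − 1 for two independent copies of W; for a log-normal frailty this reduces to τ = 4E[1/(1 + e^D)²] − 1 with D ~ N(0, 2σ²), where σ² is the variance of uij. The Python sketch below evaluates this one-dimensional expectation by Gauss–Hermite quadrature. It is our illustration under these assumptions, not the procedure used in the paper.

```python
import numpy as np


def kendall_tau_lognormal_frailty(sigma2, n_nodes=80):
    """Kendall's tau for two failure times sharing the frailty exp(u), u ~ N(0, sigma2).

    Uses tau = 4 * E[1 / (1 + exp(D))**2] - 1 with D ~ N(0, 2 * sigma2), which follows from
    tau = 4 * int_0^inf s L(s) L''(s) ds - 1 after exchanging integral and expectation.
    """
    x, w = np.polynomial.hermite.hermgauss(n_nodes)     # nodes/weights for the weight exp(-x**2)
    d = 2.0 * np.sqrt(sigma2) * x                         # sqrt(2 * var(D)) * x with var(D) = 2 * sigma2
    return 4.0 * np.sum(w / (1.0 + np.exp(d)) ** 2) / np.sqrt(np.pi) - 1.0


for s2 in (0.25, 1.0, 4.0):
    print(s2, round(kendall_tau_lognormal_frailty(s2), 4))
```

τ tends to 0 as the frailty variance vanishes and increases toward 1 as it grows, because the two event times are then driven mostly by the shared frailty.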
3.2.2 Trial-level surrogacy
In model (6), we assumed that the trial-specific treatment effects are bivariate normal random variables, consistent with the two-step copula model (3). This choice provides the correlation between the treatment effect on the surrogate endpoint S and the effect on the true endpoint T, which translates straightforwardly into the coefficient of determination $R^2_{\text{trial}}$ (the squared correlation), also referred to simply as R2 in the rest of the paper.
3.2.3 Reduced Poisson models
In our work, we considered reduced models to deal with the computational instability one may encounter when fitting the full model (6). The Poisson model TI incorporates both random trial-treatment interactions and individual random effects uij, but assumes a common baseline hazard for all trials (no trial-level random effects on the baseline hazards). The Poisson model TIa accounts for trial-specific baseline risks through a trial-level random effect shared by the two endpoints.
4 Simulation study
We compared in simulations the proposed mixed Poisson approach to the two-step copula approach. We considered the Clayton, Plackett, and Hougaard copula families, with and without measurement error adjustment at the second step.
We generated data both under mixed proportional hazard models and under the Clayton copula model. Full details of these two data-generating methods are available in the Supplementary Material.
All the methods for model fitting and data generation are implemented in the R package surrosurv,49 publicly available from the CRAN.
4.1 Simulation design
4.1.1 Simulation parameters
Exponentially distributed times were simulated with baseline rates fixed to and in order to obtain median survival times of four and eight years for , respectively. The trial effects had null means, variances , and correlation . The treatment effects had means equal to , variances , and correlation depending on the scenario. Administrative censoring was added at 15 years, leading to about 40% of censoring for S and 20% for T on average across scenarios.
In different scenarios, we varied the number (N = 10, 20, 40) and size (ni) of the trials, and the values of R2 (0.4, 0.8) and of Kendall’s τ (0.4, 0.6). In the six main scenarios (Table 1), the trials had fixed size ( , or 125). In supplementary scenarios 3s and 4s, the same number of patients ( ) was randomly assigned to N = 20 trials (95% of trials had 91 to 445 patients). These two scenarios tested whether variable trial sizes play any role, as compared with the fixed sizes of scenarios 3 and 4. In a sensitivity analysis, we explored a scenario reflecting the strong association observed in the adjuvant GASTRIC meta-analysis (Section 5.2): , with .
Table 1. Simulation scenarios.

Five hundred simulated meta-analyses were considered in each scenario. Simulations were replicated with data generated both from a mixed proportional hazard model and from a Clayton copula model.
4.1.2 Evaluation criteria
The main parameter of interest was the trial-level surrogacy measure R2; the secondary one was the individual-level surrogacy measure, Kendall’s τ. For these two parameters, we computed in each scenario (i) the relative bias, with respect to the true parameter value, and (ii) the relative mean squared error, that is, the mean squared error divided by that of the adjusted Clayton copula model, chosen as the reference method. We considered that a model converged to a reliable solution if the maximum absolute gradient was and, for adjusted models, the eigenvalues of the variance–covariance matrix of the trial-treatment interaction were all .
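The two evaluation criteria and the convergence rule can be written in a few lines of Python. This is a sketch under assumptions: the numerical gradient and eigenvalue thresholds are left as parameters (grad_tol and eig_tol are hypothetical names), and the criteria are assumed to be computed over the replicates that converged.

```python
import numpy as np


def relative_bias(estimates, true_value):
    """(mean estimate - true value) / true value; e.g. -0.10 means a 10% downward bias."""
    return (np.mean(estimates) - true_value) / true_value


def relative_mse(estimates, reference_estimates, true_value):
    """MSE of a method divided by the MSE of the reference method (adjusted Clayton copula)."""
    mse = np.mean((np.asarray(estimates) - true_value) ** 2)
    mse_ref = np.mean((np.asarray(reference_estimates) - true_value) ** 2)
    return mse / mse_ref


def is_reliable(max_abs_gradient, cov_eigenvalues, grad_tol, eig_tol, adjusted=True):
    """Small gradient and, for adjusted models, all eigenvalues of the trial-level covariance above eig_tol."""
    ok = max_abs_gradient < grad_tol
    if adjusted:
        ok = ok and bool(np.min(cov_eigenvalues) > eig_tol)
    return bool(ok)


r2_hat = [0.70, 0.75, 0.80, 0.85]   # toy estimates of a true R2 of 0.8
print(relative_bias(r2_hat, 0.8))   # about -0.031, i.e. roughly 3% downward bias
```

A relative MSE below 1 means the method is more accurate than the reference; for example, the value of 0.46 reported for the Poisson TI model corresponds to an MSE less than half that of the adjusted Clayton copula.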
4.2 Results
Table 2 reports the results of simulations for the adjusted copula models and the Poisson models when data were generated using the bivariate mixed proportional hazard model.
Table 2. Results with data simulated with mixed proportional hazard models.

The trial-level R2 (see also Figure 2) was often well estimated by the Clayton and Plackett copula models when the R2 was moderate (0.4) and underestimated by 10%–23% when it was high (0.8). The Hougaard model overestimated moderate R2 by 13%–32%, but had lower bias when R2 = 0.8 (7%–9%, downwards). Of note, the Hougaard copula model is probably the model most different from the data-generating process, which may explain at least part of this bias. The Poisson TI model often had the smallest bias and MSE when the R2 was high, with an MSE as low as 0.46 times that of the Clayton copula (used as reference). Nevertheless, this model showed an upward bias of 11%–28% when the R2 was low. The Poisson TIa model provided the least biased estimates in the scenarios with high R2 and high τ, but its results were mixed in other cases. The MSEs of all methods for the estimation of the R2 were similar across scenarios.

Figure 2. Relative bias (with respect to the true value) of the estimate of R2 with data generated with mixed proportional hazard models. N: number of trials. ni: number of patients per trial.
The estimates of Kendall’s τ (Table 2, supplementary Figure S3) were unbiased or only slightly biased for all methods except the Hougaard copula model, which tended to overestimate τ (+16% to +17%) when it was moderate (0.4) and to underestimate it (−42%) when it was high (0.6). Its MSE was also extremely high as compared with the Clayton copula model (up to -fold higher when ).
Unadjusted copula models (supplementary Table S1) often underestimated the R2 (bias as high as 49%), and in general their bias was higher than that of their adjusted counterparts. On the other hand, this increased bias came with excellent convergence rates (see supplementary Table S2). Among models with measurement-error adjustment, the Clayton and Plackett copula models showed the highest convergence rates (see also supplementary Figure S4); the Hougaard copula model had the worst convergence rates, in particular with several small trials. Within the Poisson family of models, the Poisson TI model often had the highest convergence rates, in particular, when .
The results with variable trial sizes (scenarios 3s and 4s, Table S3) were very similar to those in meta-analyses of fixed-size trials with the same total number of trials and patients (scenarios 3 and 4).
In summary, in this first part of the simulations, the adjusted Clayton model showed the best results when R2 = 0.4, while the Poisson TI model was the most accurate and precise when R2 = 0.8.
When data were generated from the Clayton copula model (Table 3 and Tables S4 and S5; Figure 3), the true model recovered the R2 with no or small bias in general, showing the worst results with few trials (10), high τ (0.6), and low R2 (0.4). The Clayton model recovered the Kendall’s τ without bias in all the scenarios (Figure S5). The Plackett and Hougaard models showed serious convergence issues in this case (supplementary Table S6 and Figure S6). The former model had convergence rate <10% in all the scenarios, whereas the latter converged to a reliable result in ≥10% of the replicates only when . Of note, also in this case, the Plackett and Hougaard copulas were misspecified models with respect to the Clayton copula used to generate the data.
Table 3. Results with data simulated with Clayton copula.


Figure 3. Relative bias (with respect to the true value) of the estimate of R2 with data generated with Clayton copulas. The estimated value is not plotted whenever the convergence rate was lower than 10%. N: number of trials. ni: number of patients per trial.
Models Poisson TI and TIa, which often converged in at least 80% of the replicates, performed similarly to the Clayton copula, with absolute bias ≤15% and very often ≤5%, notably with higher values of N. On the other hand, the Poisson models estimated the Kendall’s τ with (downwards) bias of 10%–21% and high MSE as compared to the Clayton model (relative mean squared error as high as ).
In the sensitivity analyses mimicking the association observed in the adjuvant GASTRIC meta-analysis, the results (Table S7) confirmed convergence issues when dependence is high. Such issues can heavily depend on the data generating process. The convergence rates of the adjusted Plackett model were 30% and 0% with data generated by mixed proportional hazard models and Clayton copulas, respectively, while for the Poisson models, they were 0% and 63%–73%. The R2 was estimated accurately in general, except for the Clayton copula when the data were generated by mixed proportional hazard models (18% downwards). The Hougaard copula strongly underestimated the Kendall’s τ (−73% and −64%).
5 Data analyses
In this section, we present the results of the copula and the Poisson models for the advanced and the adjuvant GASTRIC meta-analyses introduced in Section 2. In both cases, we refitted all the models on 250 bootstrap data sets to compute confidence intervals for R2.
5.1 Advanced GASTRIC meta-analysis
Table 4 shows the estimates of the trial-level R2 and of the individual-level Kendall’s τ obtained with the models described in the paper. The Kendall’s τ for the previously published results was recovered based on the published Spearman’s ρ, according to the Plackett’s copula function employed therein (see Section 2 and Figure S1).
Table 4. Surrogacy results. Advanced GASTRIC meta-analysis.

The estimates of trial-level surrogacy from adjusted copula models (0.41 [95% CI: 0.15–1.00], 0.40 [0.04–1.00], and 0.38 [0.01–1.00] for Clayton, Plackett, and Hougaard, respectively) were lower than those from their unadjusted equivalents (0.45 [0.28–0.74] for all the three). The estimate from model Poisson TI was (95% CI: 0.32–1.00). The model Poisson TIa, which had the best convergence metrics of all models, provided the highest estimate of R2 = 0.83 (95% CI: 0.24–1.00).
Individual-level surrogacy was the highest for the Clayton ( , 95% CI: 0.59–0.62) and the Plackett ( , 95% CI: 0.60–0.63) copulas. The Hougaard copula provided the lowest estimate (0.32, 95% CI: 0.32–0.33), whereas the Poisson models estimated the τ at 0.51 (95% CI: 0.50–0.52). The confidence intervals of all the models were very narrow.
The convergence metrics for the advanced case are provided in Table S8. In this context, the copula models had high maximum gradient ( ), whereas this was much smaller ( ) for the Poisson models. The smallest eigenvalue of the covariance matrix of the random treatment effects was positive for all the adjusted models.
The results above are from auxiliary Poisson models estimated over eight time intervals. Supplementary Figure S7 shows that, although the estimates of the surrogacy measures are unstable for a small number of intervals, they are quite robust between 8 and 24 time intervals, whereas the models failed to converge when 32 intervals were used.
5.2 Adjuvant GASTRIC meta-analysis
When analyzing the adjuvant GASTRIC meta-analysis data, all the models had serious convergence problems (Table S9), with high maximum gradient ( for the copula models, for the Poisson models) and low minimum eigenvalue of the covariance matrix of the random treatment effects ( ). Serious convergence problems were also reported in the original publication by Oba et al.16 As shown in supplementary Figure S8, changing the number of time intervals in the Poisson approach did not improve convergence.
The results obtained, which have to be interpreted with high caution, are shown in Table 5. The R2s estimated by copula models with adjusted linear regression ranged from 0.94 (0.08–1.00, Hougaard model) to 1.00 (0.69–1.00, Plackett model). With the Poisson models, the R2 was estimated to be 0.54 (0.02–1.00, model TI) and 1.00 (0.08–1.00, model TIa). The estimates of the Kendall’s τ ranged from 0.74 (95% CI: 0.73–0.76) to 0.82 (0.81–0.83), except for the Hougaard copula (0.18 [0.17–0.19]).
Table 5. Surrogacy results. Adjuvant GASTRIC meta-analysis.

Of note, the convergence diagnostics (Table S9) call for great caution in interpreting the point estimates of all models. Furthermore, given the width of the confidence intervals, it is difficult to draw conclusive comparisons between these estimates.
6 Discussion
The evaluation of surrogate endpoints is particularly challenging in the case of failure time endpoints.21,22 Two-step copula models are the state-of-the-art statistical methodology, but they are estimated via a two-step algorithm that is regarded as suboptimal compared with the surrogacy model available for Gaussian endpoints.7
In the present paper, we proposed an alternative estimation strategy based on single-step estimation, as in the Gaussian case.7 This approach exploits the equivalence of parameter estimates between (random effect) proportional hazard models and auxiliary (mixed) Poisson log-linear models. In addition, unlike the two-step copula approach, this model formulation does not require any parametric assumption on the marginal baseline hazard functions. The auxiliary mixed Poisson model fits naturally into the GLMM framework and can accommodate complex structures of random effects. Both measures of surrogacy (individual-level Kendall’s τ and trial-level R2) can be estimated within the same model. We considered simplified variants of the Poisson model to reduce its computational complexity. The Poisson TIa model, which accounts for both trial- and individual-level surrogacy in addition to adjusting for the trial effect on the baseline hazard, is more general, but in simulations it sometimes performed worse than model TI. The latter, without trial adjustment of the baseline hazard, proved sufficiently complex to account for the main sources of dependence and sufficiently simple to be estimated in most cases.
In this paper, we considered three families of copulas (Clayton, Plackett, and Hougaard), which allowed us to highlight how misspecification of the parametric family can strongly affect convergence rates. We also showed that adjustment for measurement error in the second step reduces estimation bias in many cases, albeit at the cost of lower convergence rates. In our simulations, the Clayton copula family proved quite robust to different parameters of the data-generating process, with some underestimation of the R2 when the true value was high and the data came from a very different data-generating process. Recently, Renfro et al.40 argued that applying the copula function to the cumulative incidence functions instead of the survival functions could be more appropriate. In addition, they suggested that using the two-stage estimation procedure by Shih and Louis50 within the first step of the copula estimation algorithm could reduce estimation bias. Nevertheless, we chose to adopt the methods by Burzykowski et al.8 in their original form, as commonly applied in numerous clinical publications11–20 over the last 15 years, including a surrogate evaluation conducted by the US FDA.4
We explored three main scenarios with various numbers and sizes of trials. For a fixed total number of patients, we observed similar results (for all the methods) in meta-analyses of several small trials and in those of few big trials. The only exception was the Hougaard copula model, which converged more often when the trials were big than when they were numerous.
In our work, we considered a new approach to the evaluation of failure time surrogate endpoints, based on mixed Poisson models. We studied this approach in extensive simulations and compared it head to head with several copula models. We showed that convergence rates and estimation results vary with the degree of model misspecification and with the degree of dependence in the data. The Poisson approach generally provided the best results when the correlation was high. Adjusting the baseline risk for the trial (model TIa) performed worse than the simpler model TI without adjustment. The Clayton model gave some of the most robust results of all methods, notably when the dependence was moderate, although it was also often the model closest to the true data-generating process. In real applications, we suggest that model choice take into account the goodness of fit of the different parametric families, as well as the convergence metrics of the models estimated on the data at hand.
Evaluating a failure time surrogate endpoint can be quite challenging in real-life applications. The auxiliary Poisson approach is one additional statistical tool, which broadens the range of available models among which the best fitting one can be chosen.
Acknowledgements
The authors thank the GASTRIC (Global Advanced/Adjuvant Stomach Tumor Research International Collaboration) Group for permission to use their data. The investigators who contributed to GASTRIC are listed in references [16, 17, 38, and 39]. The GASTRIC Group data are available within the surrosurv package for research purposes, under the conditions that (1) the research be scientifically appropriate, (2) the confidentiality of individual patient data be protected, (3) the results of the analyses be shared with the GASTRIC Group prior to public communication, (4) the source of data be fully acknowledged as above, and (5) resulting data and results be further shared with the research community.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The present work has been supported by the Institut National du Cancer (INCa), Grant SHS 2014-141, and by the Ligue Nationale Contre le Cancer. The study sponsors had no involvement in the study design; in the collection, analysis, and interpretation of data; in the writing of the manuscript; or in the decision to submit the manuscript for publication.
References
| 1. | Burzykowski, T, Molenberghs, G, Buyse, M. The evaluation of surrogate endpoints, New York, NY: Springer Science & Business Media, 2006. Google Scholar |
| 2. | Chakravarty, A, Sridhara, R. Use of progression-free survival as a surrogate marker in oncology trials: some regulatory issues. Stat Methods Med Res 2008; 17: 515–518. Google Scholar | SAGE Journals | ISI |
| 3. | Buyse, M, Molenberghs, G, Paoletti, X Statistical evaluation of surrogate endpoints with examples from cancer clinical trials. Biom J 2016; 58: 104–132. Google Scholar | Medline |
| 4. | Blumenthal, GM, Karuri, SW, Zhang, H Overall response rate, progression-free survival, and overall survival with targeted and standard therapies in advanced non-small-cell lung cancer: US food and drug administration trial-level and patient-level analyses. J Clin Oncol 2015; 33: 1008–1014. Google Scholar | Medline |
| 5. | Daniels, MJ, Hughes, MD. Meta-analysis for the evaluation of potential surrogate markers. Stat Med 1997; 16: 1965–1982. Google Scholar | Medline | ISI |
| 6. | Buyse, M, Molenberghs, G. Criteria for the validation of surrogate endpoints in randomized experiments. Biometrics 1998; 54: 1014–1029. Google Scholar | Medline | ISI |
| 7. | Buyse, M, Molenberghs, G, Burzykowski, T The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics 2000; 1: 49–67. Google Scholar | Medline |
| 8. | Burzykowski, T, Molenberghs, G, Buyse, M Validation of surrogate end points in multiple randomized clinical trials with failure time end points. J Roy Stat Soc Appl Stat 2001; 50: 405–422. Google Scholar | ISI |
| 9. | Kendall, MG . A new measure of rank correlation. Biometrika 1938; 30: 81–93. Google Scholar |
| 10. | van Houwelingen, HC, Arends, L, Stijnen, T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med 2002; 21: 589–624. Google Scholar | Medline | ISI |
| 11. | Sargent, DJ, Wieand, HS, Haller, DG, et al. Disease-free survival versus overall survival as a primary end point for adjuvant colon cancer studies: individual patient data from 20,898 patients on 18 randomized trials. J Clin Oncol 2005; 23: 8664–8670. |
| 12. | Buyse, M, Burzykowski, T, Carroll, K, et al. Progression-free survival is a surrogate for survival in advanced colorectal cancer. J Clin Oncol 2007; 25: 5218–5224. |
| 13. | Burzykowski, T, Buyse, M, Piccart-Gebhart, MJ, et al. Evaluation of tumor response, disease control, progression-free survival, and time to progression as potential surrogate end points in metastatic breast cancer. J Clin Oncol 2008; 26: 1987–1992. |
| 14. | Buyse, M, Burzykowski, T, Michiels, S, et al. Individual- and trial-level surrogacy in colorectal cancer. Stat Methods Med Res 2008; 17: 467–475. |
| 15. | Michiels, S, Le Maître, A, Buyse, M, et al. Surrogate endpoints for overall survival in locally advanced head and neck cancer: meta-analyses of individual patient data. Lancet Oncol 2009; 10: 341–350. |
| 16. | Oba, K, Paoletti, X, Alberts, S, et al. Disease-free survival as a surrogate for overall survival in adjuvant trials of gastric cancer: a meta-analysis. J Natl Canc Inst 2013; 105: 1600–1607. |
| 17. | Paoletti, X, Oba, K, Bang, YJ, et al. Progression-free survival as a surrogate for overall survival in advanced/recurrent gastric cancer trials: a meta-analysis. J Natl Canc Inst 2013; 105: 1667–1670. |
| 18. | Mauguen, A, Pignon, JP, Burdett, S, et al. Surrogate endpoints for overall survival in chemotherapy and radiotherapy trials in operable and locally advanced lung cancer: a re-analysis of meta-analyses of individual patients’ data. Lancet Oncol 2013; 14: 619–626. |
| 19. | Michiels, S, Pugliano, L, Marguet, S, et al. Progression-free survival as surrogate end point for overall survival in clinical trials of HER2-targeted agents in HER2-positive metastatic breast cancer. Ann Oncol 2016; 27: 1029–1034. |
| 20. | Rotolo, F, Pignon, JP, Bourhis, J, et al. Surrogate endpoints for overall survival in loco-regionally advanced nasopharyngeal carcinoma: an individual patient data meta-analysis. J Natl Canc Inst 2017; 109. DOI:10.1093/jnci/djw239. |
| 21. | Green, E, Yothers, G, Sargent, DJ. Surrogate endpoint validation: statistical elegance versus clinical relevance. Stat Methods Med Res 2008; 17: 477–486. |
| 22. | Burzykowski, T. Surrogate endpoints: wishful thinking or reality? Stat Methods Med Res 2008; 17: 463–466. |
| 23. | Shi, Q, Renfro, LA, Bot, BM, et al. Comparative assessment of trial-level surrogacy measures for candidate time-to-event surrogate endpoints in clinical trials. Comput Stat Data Anal 2011; 55: 2748–2757. |
| 24. | Renfro, LA, Shi, Q, Sargent, DJ, et al. Bayesian adjusted R² for the meta-analytic evaluation of surrogate time-to-event endpoints in clinical trials. Stat Med 2012; 31: 743–761. |
| 25. | Ghosh, D, Taylor, JMG, Sargent, DJ. Meta-analysis for surrogacy: accelerated failure time models and semicompeting risks modeling. Biometrics 2012; 68: 226–232. |
| 26. | Alonso, A, Molenberghs, G. Evaluating time to cancer recurrence as a surrogate marker for survival from an information theory perspective. Stat Methods Med Res 2008; 17: 497–504. |
| 27. | Vaupel, JW, Manton, KG, Stallard, E. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 1979; 16: 439–454. |
| 28. | Duchateau, L, Janssen, P. The frailty model, New York, NY: Springer Verlag, 2008. |
| 29. | Wienke, A. Frailty models in survival analysis. (Chapman & Hall/CRC biostatistics series), Boca Raton, FL: Taylor and Francis, 2010. DOI:10.1201/9781420073911. |
| 30. | Burzykowski, T, Cortiñas Abrahantes, J. Validation in the case of two failure-time endpoints. In: Burzykowski, T, Molenberghs, G, Buyse, M (eds). The evaluation of surrogate endpoints, New York, NY: Springer, 2005, pp. 163–194. |
| 31. | Rondeau, V, Pignon, JP, Michiels, S. A joint model for the dependence between clustered times to tumour progression and deaths: a meta-analysis of chemotherapy in head and neck cancer. Stat Methods Med Res 2011; 24: 711–729. |
| 32. | Whitehead, J. Fitting Cox’s regression model to survival data using GLIM. J Roy Stat Soc C Appl Stat 1980; 29: 268–275. |
| 33. | Laird, N, Olivier, D. Covariance analysis of censored survival data using log-linear analysis techniques. J Am Stat Assoc 1981; 76: 231–240. |
| 34. | Ma, R, Krewski, D, Burnett, RT. Random effects Cox models: a Poisson modelling approach. Biometrika 2003; 90: 157–169. |
| 35. | Feng, S, Nie, L, Wolfe, RA. Laplace’s approximation for relative risk frailty models. Lifetime Data Anal 2009; 15: 343–356. |
| 36. | Hirsch, K, Wienke, A, Kuss, O. Log-normal frailty models fitted as Poisson generalized linear mixed models. Comput Meth Programs Biomed 2016; 137: 167–175. |
| 37. | Crowther, MJ, Riley, RD, Staessen, JA, et al. Individual patient data meta-analysis of survival data using Poisson regression models. BMC Med Res Methodol 2012; 12: 34. |
| 38. | The GASTRIC group. Role of chemotherapy for advanced/recurrent gastric cancer: an individual-patient-data meta-analysis. Eur J Cancer 2013; 49: 1565–1577. |
| 39. | The GASTRIC group. Benefit of adjuvant chemotherapy for resectable gastric cancer: a meta-analysis. J Am Med Assoc 2010; 303: 1729–1737. |
| 40. | Renfro, LA, Shang, H, Sargent, DJ. Impact of copula directional specification on multi-trial evaluation of surrogate endpoints. J Biopharm Stat 2015; 25: 857–877. |
| 41. | Clayton, D. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 1978; 65: 141–151. |
| 42. | Plackett, RL. A class of bivariate distributions. J Am Stat Assoc 1965; 60: 516–522. |
| 43. | Hougaard, P. A class of multivariate failure time distributions. Biometrika 1986; 73: 671–678. |
| 44. | Gasparrini, A, Armstrong, B, Kenward, M. Multivariate meta-analysis for non-linear and other multi-parameter associations. Stat Med 2012; 31: 3821–3839. |
| 45. | Hougaard, P. Analysis of multivariate survival data, Germany: Springer Verlag, 2000. |
| 46. | Munda, M. Beyond the shared frailty model. PhD Thesis, Université Catholique de Louvain, Belgium, http://hdl.handle.net/2078.1/150625 (2014, accessed 19 June 2017). |
| 47. | Asmussen, S, Jensen, JL, Rojas-Nandayapa, L. On the Laplace transform of the lognormal distribution. Meth Comput Appl Probab 2014; 18: 441–458. |
| 48. | Goutis, C, Casella, G. Explaining the saddlepoint approximation. Am Stat 1999; 53: 216–224. |
| 49. | Rotolo, F. surrosurv: evaluation of failure time surrogate endpoints in individual patient data meta-analyses (R package version 1.1.15), https://cran.r-project.org/package=surrosurv (2017, accessed 19 June 2017). |
| 50. | Shih, JH, Louis, TA. Inferences on the association parameter in copula models for bivariate survival data. Biometrics 1995; 51: 1384–1399. |

