Background
Restricted mean survival time (RMST) or life expectancy (LE) are common survival outcomes evaluated in health technology assessment (HTA). Moreover, estimation of quality-adjusted life-years is often based on calculating the mean survival times and utility values. Interest typically lies in the experience of patients who are recently diagnosed or participating in ongoing clinical trials. Hence, survival data are usually still immature (right censored). Extrapolation is then required to predict survival beyond follow-up to obtain estimates for RMST or LE.
1 In health economic evaluations, survival extrapolation is typically carried out by fitting the observed survival data with parametric models and subsequently predicting survival probabilities based on estimated model parameters.
2–4 However, the parametric distribution may fit the observed data well but generate poor extrapolation because it may not be sufficiently flexible to capture the underlying shape of a hazard function in the long run.
5,6 An unsuitable choice of the model may lead to biased extrapolations, which may eventually result in inaccurate cost-effectiveness analysis results in HTA.
2Previous recommendations from National Institute for Health and Care Excellence (NICE)’s Technical Support Document 14 suggested that 6 standard parametric models (SPMs)—exponential, Weibull, Gompertz, log-logistic, log-normal, and generalized gamma models—should be considered as selecting appropriate models for survival extrapolation.
3 More recent guidance from NICE’s Technical Support Document 21 published in 2021 provided general recommendations on using flexible parametric models incorporating restricted cubic splines.
5 In this study, we refer to flexible parametric models
7 as spline models. Recent studies have compared multiple statistical models, including SPMs, spline models, cure, mixture, and landmark models for survival extrapolation for patients treated by cancer immunotherapies. However, the results were compared only with the observed survival of 3 y
8 and 5 y.
9 Gray et al.
10 applied SPMs and spline models for survival extrapolation and further compared their performance using long follow-up cancer registry survival data. They suggested spline models should be routinely applied for extrapolating cancer survival data. However, they evaluated only models in an all-cause survival framework (ASF), that is, extrapolation of all-cause hazard (mortality) models, for 10-y survival outcomes instead of longer survival outcomes, such as LE.
10A review by Jackson et al.
11 summarized survival extrapolation approaches integrating external data, including survival extrapolation within a relative survival framework (RSF),
5,6 also referred as an excess hazard (mortality) framework.
12 Relative survival,
, is defined as the all-cause survival of the patients,
, divided by the expected survival of a comparable population free from the disease under study,
, written as
where
is time since diagnosis.
13 The hazard analogue of relative survival is excess hazard. The all-cause hazard,
is the sum of the expected hazard,
, and the excess hazard
, shown as
To extrapolate survival within an RSF, one needs to extrapolate the relative survival and the expected survival, that is, decompose the all-cause hazard into the expected hazard and the excess hazard, and extrapolate them separately. Extrapolation of expected survival can be carried out by projecting the expected hazard, in other words, predicting future general population mortality rates (GPMRs).
6 Extrapolation of relative survival can be done by modeling and predicting the excess hazard.
14 Afterward, one may use the interrelationship between the relative survival, all-cause survival, and expected survival to obtain the extrapolated all-cause survival.
For most cancers, the excess hazard, that is, the mortality attributed to cancer, decreases with time and remains low or reaches zero for a longer period.
15 This characteristic potentially favors fitting a parametric model to capture the underlying excess hazard function and extrapolate it. Furthermore, the expected hazard (from GPMRs) may explain a substantial part of the all-cause hazard in the long-run.
16,17 Therefore, extrapolation within an RSF can be reasonable for projecting long-term hazards. Andersson et al.
6 showed that extrapolating lifetime survival more reliably using spline models within an RSF. Their main focus was on estimating loss in LE rather than short-term survival outcomes, for example, 10-year RMST.
6One goal of extrapolation is to apply appropriate parametric models for predicting survival in randomized clinical trials. Clinical trial survival data usually do not have complete follow-up of the study population, leading to unknown long-term survival outcomes. An alternative to assessing the performance of survival extrapolation is to adopt observational data, for example, population-based cancer registers. The patterns of hazard functions across cancer sites and ages may be largely heterogeneous. Therefore, they offer a variety of survival functions for evaluating extrapolation approaches. Furthermore, cancer registers provide longer follow-up of known survival outcomes, which enables us to compare with survival extrapolations.
10 The Swedish Cancer Registry was established in 1958, and its completeness is greater than 96%.
18 All healthcare workers in Sweden are required to register all incident cancer cases with information such as age, sex, date of diagnosis, type of diagnosis, date of emigration, and so on. Accordingly, this study used these data from the real-world cancer registry to evaluate survival extrapolation methods.
This research aims to investigate the performance of survival extrapolations within an ASF and an RSF using SPMs and spline models for 10-y and lifetime survival outcomes using the Swedish Cancer Registry. To allow for a variety of hazard function shapes, we included 5 cancer sites: 2 sex-specific cancer sites (breast and prostate cancer) and other 3 non–sex-specific ones (colon cancer, melanoma, and chronic myeloid leukemia). Furthermore, we limited the follow-up time of the survival data and extrapolated by parametric models in an ASF and an RSF to compare with the observed Kaplan–Meier (K-M) survival estimates.
Methods
Study Population
This study used the Swedish Cancer Registry to identify patients diagnosed with colon cancer (International Classification of Diseases version 7 code 153.x), breast cancer (code 170.x), malignant melanoma (code 190.x), prostate cancer (code 177.x), and chronic myeloid leukemia (CML) (code 205.1) at 18 to 99 y old in Sweden during 1981–1990, with follow-up until December 31, 2020. Information regarding the date of death was obtained from the Cause of Death Register via the linkage with each patient’s Swedish unique registration number. Patients who emigrated were classified as censored on the date of first emigration. For individuals with multiple primary recorded tumors of the same type, we included only the first diagnosis. Patients diagnosed at autopsy were excluded. For breast cancer, males (<1%) were excluded from the analyses.
Survival Models
To align with the study by Gray et al.,
10 we classified the patients into 3 separate age groups (18–59, 60–69, 70–99 y) to produce a total of 15 cancer cohorts. Each cancer cohort was fitted by the SPMs and spline models within an ASF and an RSF.
14,19,20 The SPMs included a total of 6 models: exponential, Weibull, Gompertz, log-logistic, log-normal, and generalized gamma.
21 The spline models were fitted with restricted cubic splines on 3 different scales: log cumulative hazard, log cumulative odds, and normal equivalent deviate (probit) scale with
degrees of freedom (df).
19 For flexible parametric spline models,
df indicates that
internal knots and 2 boundary knots are applied for the restricted cubic spline function used for the baseline hazard function,
19 and the suggested knot locations are based on the centiles of the distribution of uncensored log event times.
22 To maintain the consistency across models, all the spline models were modelled with 5 df, that is, 4 internal knots.
For survival extrapolation within an RSF, the extrapolation of the expected survival is also required. We calculated the expected survival function with the Ederer I estimator,
13 which can be interpreted as the unbiased survival the patients would have had if they had not been diagnosed with the disease. The Ederer I method calculates the expected survival for the patient population from the GPMRs, stratified by age, sex, and calendar year.
23 We retrieved the GPMRs of Sweden until 2020 from the Human Mortality Database.
24 For mortality beyond 2020, we assumed that they are equal to 2020. Detailed step-by-step instructions on survival extrapolation within an RSF can be found in the tutorial by Sweeting et al.
12Evaluating Survival Extrapolation
To compare with the extrapolations, we used the K-M survival estimates of 40-y follow-up data as the observed survival outcomes. We presented the K-M survival curves with 95% confidence intervals (CIs), and the numbers at risk of each cancer cohort at selected time points. To evaluate predicted 10-y survival outcomes (10-y RMST and survival proportions at 10 y), the follow-up time for all surviving patients was right-censored at 2, 3, and 5 y and extrapolated by the parametric models to 10 y. For lifetime (or 40-y) survival outcomes (LE or 40-y RMST and survival proportions at lifetime or 40 y), the follow-up was right-censored at 2, 3, 5, and 10 y and extrapolated to lifetime (or 40 y). The LE was calculated by integrating the area under the survival curve. Namely, to obtain LE, we require the patients to have been followed until the survival has already reached zero. However, the survival proportions of some younger cancer patients were not yet zero by 40 y, so their 40-y RMST was assessed instead of LE.
We presented boxplots summarizing the differences (extrapolated minus observed) across 15 cancer cohorts (5 cancer sites by 3 age groups). These comparisons were made across 9 models, including 6 SPMs and 3 spline models, withinin either an ASF or an RSF. Varying follow-up time points, including 2, 3, 5, or 10 y, were used for extrapolation to 10 y or lifetime (40 y). Box plots show the median, first quartile, third quartile, interquartile range, and outliers for the difference. Furthermore, to evaluate the precision of the extrapolations, we presented the frequency of models in a table that predicted a difference less than or equivalent to 0.1 y of 10-y RMST, 1% survival proportion at 10 y, 1 y from the observed LE or 40-y RMST, and 1% at lifetime or 40 y, across all groups by time used for extrapolation and survival framework, where the cut points were selected subjectively. Larger numbers of frequencies imply better extrapolation performance.
For visual assessment on the extrapolation performance, we also included all the survival and hazard functions in the Supplementary Materials (
Supplementary Appendices C–F). The observed quantities were from the full follow-up of 10 y or lifetime (or 40 y). The observed survival functions were from the K-M survival estimators. The observed all-cause (or excess) hazard functions were fitted by spline models
7 on log cumulative (excess) hazard scales, with 5 df for baseline effects. The observed expected hazard functions were obtained from the observed all-cause hazard functions subtracted by the observed excess hazard functions.
For presenting the observed hazard function, we justify our choice of using spline models over nonparametric kernel-smoothed estimators based on the following reasons: the sensitivity of the kernel-smoothing method to bandwidth and the assumption of constant hazard over the bandwidth.
25 Notably, the hazard function may change rapidly at the boundaries, which makes the kernel smoothing less suitable. This characteristic may introduce bias especially after long follow-up, where only few patients are at risk. In addition, it is more common to show parametric excess hazard functions in practice instead of nonparametric estimates. Therefore, our preference leans toward employing spline models to present smooth functions of the observed all-cause and excess hazards over the full follow-up duration. It is worth noting that the hazard functions primarily serve for visual assessment on various shapes over time at Appendices C to F. However, the main results involve only comparing the extrapolated and the observed K-M survival curves, which provide a more illustrative insight into how well the model fits the survival data.
Statistical Software
All statistical analyses were performed in Stata version 17.0 (Statacorp, College Station, TX, USA). All SPMs and spline models were fitted and postestimated by the Stata commands
merlin20 and
stpm219 separately.
Discussion
Survival extrapolation is regularly practiced for predicting survival outcomes in health economic evaluations. An advantageous model should ideally fit the survival during the observed period well and extrapolate unbiased survival beyond follow-up. As a result, it is important to select models that adequately capture the hazard functions and make reasonable predictions.
5 Flexible parametric spline models were developed to sufficiently capture the hazard functions with potentially complex shapes,
22,26 but they do not guarantee that the extrapolations are always credible. Despite a longer follow-up period, extrapolating survival using either SPMs or spline models within an ASF may still generate inaccurate predictions on survival estimates at lifetime (
Figures 3 and
4). The underlying reason was that it may be difficult to find a suitable parametric distribution within an ASF to describe the hazard function both in the observed and extrapolation period. Among the 15 cancer cohorts, most of the observed all-cause hazard functions were increasing right after time since diagnosis for a short period, decreasing after the spike, and eventually monotonically increasing again. Some exceptions were monotonically increasing functions over time, for example, age groups 70 to 99 y for breast cancer, melanoma, and prostate cancer (
Supplementary Appendices E and F).
For multiple types of cancer, the relative survival function tends to drop substantially after cancer diagnosis and later approach a plateau over time. In other words, the excess hazard function spikes initially and approximates constant or even zero in the long run.
27 This specific characteristic of cancer survival patterns justifies extrapolating spline models within an RSF, because, for spline models, the log cumulative excess hazard beyond the last boundary knot of restricted cubic splines is a linear trend, which makes the excess hazard function in the tail behave like a Weibull distribution.
6In this study, we maintained consistency across all the spline models by selecting 4 internal knots, 5 df, for each model. Previous studies concluded that with a sufficient number of knots used in spline models fitted on log cumulative hazard scale, at least greater than 1 internal knot (2 df), the ability of splines to approximate complex hazard functions does not heavily depend on the correct choice of the number of knots.
28,29 In addition, Andersson et al.
6 investigated how sensitive the extrapolation is to the chosen number of knots, 5 to 7, that is, 3 to 5 internal knots in the spline models. The results suggested that the extrapolations are not very sensitive to the number of knots.
For 10-y survival outcomes, extrapolations within an RSF were not especially more successful than the same parametric model within an ASF because the extrapolated expected hazard may not yet explain a great part of the all-cause hazard. Instead, it is more important to select models that can capture the hazard shape and predict the shape in the near future, for example, spline models.
For lifetime (or 40-y) survival outcomes, survival extrapolations carried out by models within an RSF generally agreed well with the observed data, especially using spline models. The underlying reason is that the expected hazard from the GPMRs may explain a greater part of the long-term all-cause hazard. In addition, the spline models capture and reasonably predict the underlying excess hazard functions. For predicted survival proportions at lifetime, proportions despite longer follow-up, most of the models within an ASF overestimated survival due to the fact that the long-term extrapolated all-cause hazards were underestimated. On the contrary, the models within an RSF leaned toward underestimating survival, since they may overestimate excess hazards.
Strengths
A major strength of this study is comparing the extrapolated survival functions with data from a population-based cancer registry of empirical 40-y follow-up. We extended the study by Gray et al.
10 by investigating extrapolation using both SPMs and spline models within not only an ASF but also an RSF. We evaluated both 10-y survival outcomes and lifetime (or 40-y) outcomes. Gray et al. right-censored the survival data at 20%, 35%, and 50% survival points and extrapolated to 10 y. They found that of the 45 cancer cohorts, the spline models with the lowest Akaike information criterion (AIC) had 23 groups that predicted within 1-mo difference from the observed 10-y RMST, while the SPMs with the lowest AIC had only 9. Instead, our study right-censored the survival data at different time points. Our motivation was that in HTA, clinical trial data are usually followed up until a certain time instead of a certain censoring proportion. For 10-y RMST, within an ASF, we also observed that the spline models generally had a higher frequency of predicting difference within 0.1 y than the SPMs did. We reproduced the spline models within an RSF by Andersson et al.
6 using different cancer cohorts from the same cancer registry. For predicting survival to lifetime, Andersson et al.
6 showed that extrapolating the spline models within an RSF to lifetime using 10 y of follow-up had a difference of −0.23 y for colon cancer aged 60 to 69 y, while using an ASF had a difference of 2.13 y. Our study presented that, for colon cancer aged 60 to 69 y, the spline models within an RSF using 10 y had differences ranging from −0.09 to −0.08 y, and an ASF had differences ranging from 3.37 to 3.74 y (
Supplementary Table B3).
Sweeting et al.
12 investigated the use of excess hazard methods in survival extrapolation. They applied SPMs within an excess hazard framework, that is, the RSF in our study. They used breast cancer survival data with a maximum follow-up of 7.3 y. Our study extended this study by including not only SPMs but also spline models within an all-cause hazard (all-cause survival) and an excess hazard (relative survival) framework. Furthermore, considering the heterogeneity of cancer survival, we investigated the extrapolation performance by using a real-word cancer registry data of 5 cancer types across 3 age groups with a maximum follow-up of 40 y.
Another study by van Oostrum et al.
17 assessed the incorporation GPMRs for extrapolating survival in HTA. They recommended the additive hazards approach, that is, extrapolation within an RSF in our study. In addition, they investigated a variety of other approaches, including converging hazards (also known as imposing statistical cure), external additive hazards, and proportional hazards, which are plausible only under certain scenarios.
17 Our study corroborated this study by including the generalized gamma model and the spline models that incorporate GPMRs into the investigation. We showed that only certain SPMs, such as the Gompertz model, performed similarly well as the spline models when extrapolating within an RSF to lifetime.
Limitations
This study, however, has certain limitations. First, the generalizability of this study to health economic evaluations in randomized clinical trial settings may depend on the hazard shape of the disease of interest. For example, the all-cause hazard function may have multiple turning points. The study populations were selected from various cancer sites from the nationwide Swedish Cancer Registry. However, we argue that a total of 15 cancer cohorts had a wide range of heterogeneity in the underlying hazard functions over time. Thus, our results may provide valuable insights into evaluating survival extrapolations on diseases characterized by other complex hazard shapes. Further evaluation is needed when considering the further advancement of treatments on cancer survival. Moreover, additional analyses are require to generalize the findings to other diseases than cancers. Second, clinical trial data usually have much smaller sample sizes than population-based disease registers do. Consequently, the accuracy of extrapolation may need to be addressed under these data-limited scenarios. However, our results also contained chronic myeloid leukemia, which had similarly lower sample sizes (less than 400 in each age group) than other cancer types. Kearns
30 conducted a simulation study to evaluate the impacts of follow-up time used and sample sizes on extrapolation by different models, including spline models. The author concluded that varying lengths of follow-up had a greater influence on survival extrapolation than varying sample sizes in the defined simulation settings.
30 We suggest that future research should extend this study to investigate the uncertainty on extrapolated survival estimates due to sample size reduction in both contexts of population-based disease registers and clinical trials. Third, the uncertainty in the extrapolated survival estimates can also come from the expected hazards. In this study, we assumed that the expected hazards, obtained from the GPMRs, were based on the whole general population in Sweden and constant for the same age, sex, and calendar year. However, it has been shown that if estimates of expected measures are not based on the entire population, then the uncertainty in GPMRs should be taken into account.
31Other models, such as cure fraction models
32–35 and mixture-cure models,
9,36 were also applied to evaluate survival extrapolation in HTA. This article mainly focuses on exploring fitting survival data with a variety of parametric distributions within an ASF and an RSF. Discussion on cure fractions and their impact on survival prediction is beyond scope. Hwang et al.
37 proposed the rolling extrapolation algorithm to extrapolate the logit transformation of the relative survival ratio. This algorithm also incorporates spline models and the RSF. It has been applied to estimate loss of LE due to cancer
38,39 as well as other diseases.
40,41 Further investigations may compare this approach with other existing survival extrapolation methods, especially within an RSF.
Predicting survival from short follow-up time is a demanding task in essence. To select an appropriate model, one should consider not only the goodness-of-fit of the model during observed time but also the credibility of extrapolations beyond follow-up.
42 Special caution should always be paid as extrapolating survival to lifetime for younger patients or from shorter follow-up time. To enhance the credibility of the extrapolation by the chosen model, researchers should carefully select the model and conduct rigorous sensitivity analysis.