Bivariate copula regression models for semi-competing risks

Time-to-event semi-competing risk endpoints may be correlated when both events occur on the same individual. These events and the association between them may also be influenced by individual characteristics. In this article, we propose copula survival models to estimate hazard ratios of covariates on the non-terminal and terminal events, along with the effects of covariates on the association between the two events. We use the Normal, Clayton, Frank and Gumbel copulas to provide a variety of association structures between the non-terminal and terminal events. We apply the proposed methods to model semi-competing risks of graft failure and death for kidney transplant patients. We find that copula survival models perform better than the Cox proportional hazards model when estimating the non-terminal event hazard ratio of covariates. We also find that the inclusion of covariates in the association parameter of the copula models improves the estimation of the hazard ratios.


Introduction
Often in medical studies, patients who are lost to follow-up, or do not experience the event of interest during the study period, leave a censored observation.However, with semi-competing risk endpoints, the non-terminal event has the possibility of also being censored by the terminal event (Fine et al., 2001).One example of semi-competing risks, which will be analysed later in this paper, is graft failure and death following renal transplant, where the non-terminal event is graft failure and the terminal event is death.
The Cox proportional hazards model is widely used in practice to analyse timeto-event outcomes.The Cox model was originally developed to analyse all-cause mortality (Cox, 1972), of which there is no competing risk.The hazard ratios estimated from the Cox model, assuming an independent censoring mechanism, are potentially biased when analysing a non-terminal event (Berry et al., 2010).Some authors use copula regression models to jointly model non-terminal and terminal events.Peng and Fine (2007), Hsieh et al. (2008) and Chen (2012) propose semiparametric copula models for the regression on the marginal distributions.The analysis on the terminal event can be conducted using common survival methodology.For instance, Peng and Fine (2007) model the terminal event with a proportional hazards model marginally within their copula model.Hsieh et al. (2008) use a copula model, which allows for different dependence structures between covariate groups.The Bayesian Normal induced copula estimation model is developed by Fu et al. (2013).
Copulas are multivariate distribution functions which can model the marginal distributions separately, along with the dependence structure between them.Sorrell et al. (2022) introduced bivariate copula models to estimate the correlation between semi-competing risk endpoints, using the following four copula functions, the Normal, Clayton, Frank and Gumbel copulas.However, the hazard rates of the non-terminal and terminal events, along with the association parameter between them, may be influenced by covariates.Including covariates into the analysis of the correlation between survival endpoints can help understand how the association may be influenced by individual characteristics.Covariates may also be included in the analysis of marginal distributions, which allows to estimate hazard ratios and subsequently the comparison of risks of survival endpoints between groups.
In this paper, we develop copula survival regression models by using conditional copula (Patton, 2006), which allow both the association parameter between the survival endpoints and the hazard rates to depend on multiple binary covariates.This is also an extension from the copula survival model introduced in Sorrell et al. (2022), where no covariates are included.We estimate the hazard ratio using copula survival regression models with binary covariates.We also estimate the effects of these binary covariates on the association between semi-competing risks.By jointly modelling the non-terminal and terminal events and including the correlation between them, we hope to improve the inference about the non-terminal event.
We focus on the inclusion of covariates in the analysis of survival data with a semi-competing risk using conditional copulas.In the semi-competing risk setting, Chen (2012) developed a model allowing the association parameter of the copula function to vary with categorical covariates.A model allowing each covariate group to assume a separate Archimedean copula function is introduced by Hsieh et al. (2008).This allow to have different dependence structures describing the association between the semi-competing risk events for each covariate group.The effect of a discrete covariate on the non-terminal and terminal event and the association parameter between the semi-competing risk events is investigated by Ghosh (2006).Peng and Fine (2007) studied the effect of a covariate on the nonterminal event and the association parameter, by proposing a model with a timedependent copula and apply the methods to survival endpoints following AIDS diagnosis.Similarly, Zhu et al. (2021) propose a copula regression approach to examine covariate effects on the non-terminal and terminal events using a two-stage approach.In stage 1, the covariate effects on the marginal events are investigated and in stage 2, the association parameter of the copula model is estimated.Zhu et al. (2021) prefer the two-stage approach, compared to the one-stage approach by Peng and Fine (2007), as it allows for separate identification of misspecification of the marginal regression models and the copula model.
The rest of the paper is organised as follows.The motivating data for this paper are described in Section 2. We then introduce our proposed Copula regression survival methods to estimate the effect of covariates on the semi-competing risk events and on the association parameter using a variety of conditional copula functions in Section 3. In Section 4, we apply our proposed methods to the motivating data, followed by a simulation study to compare the copula models in Section 5. We conclude with a discussion.

Motivating data
The motivating data are from the United Kingdom Transplant Registry (UKTR), held by the National Health Service (NHS) Blood and Transplant.The outcomes, time to graft failure and time to death since kidney transplantation, are semicompeting risks.Our study population is kidney transplant recipients, who had their single and first transplant between 1995 and 2016 in the UK, with known age, sex and donor type at the time of transplantation.We present a novel application by using our proposed copula survival regression models.
We included 40, 348 single and first kidney transplants between 1995 and 2016 in the UK.For 78.01% of patients the graft failure time is censored, for 80.0% of patients the death time is censored and for 64.3% of patients both events are censored.
Table 1 shows the baseline characteristics of interest.The covariates to be included in the application later in this paper include donor type (deceased or living), recipient's sex (male or female), and recipient's age (> 50 years or ≤ 50 years).Donor type indicates whether a recipient had received a deceased donor or living donor kidney for their transplantation.These covariates were included as they are important factors influencing transplant outcomes according to the literature as follows.As reported in Terasaki et al. (1995) and Port et al. (2004), living kidney donor recipients have improved survival compared to deceased donor recipients.
Moreover, recipient's age has been shown to be a significant factor affecting survival following transplant (Becker et al., 2000;Fabrizii et al., 2004;Keith et al., 2004).In Fabrizii et al. (2004), the 5-year survival was found to be affected by recipient age group, however, these differences were not evident after controlling for confounding.(1995 − 2016).The number of recipients with the particular characteristic is given alongside the percentage in brackets.

Methods
In this section, we describe methods to include covariates in copula functions to examine the effects that they have on the semi-competing risk events.We let the association parameter of the copula function, along with the hazard rates from the marginal survival distributions be conditional on covariates.We use conditional copulas, first introduced by Patton (2006) who extended Sklar's Theorem (Sklar, 1959) to account for the conditioning of the random variables on covariates.
Let T 1 denote the time to the non-terminal event and T 2 denote the time to the terminal event with censoring time, C. The time to the first event is denoted by We estimate the association between the survival probabilities of individuals, using univariate survival functions in the copula function.This changes the interpretation of the strength of association between the endpoints (Renfro et al., 2015), compared to the use of the cumulative distribution function (CDF) in Patton (2006).We represent the joint survival function using the copula function, C θ , with association parameter θ, where the subscript B is used to indicate the bivariate nature of the function.
Then, the copula density function is as follows, The joint PDF can be expressed using the copula density function and the marginal PDFs, We consider four different types of copula, the Normal, Clayton, Frank and Gumbel copulas, for which the definitions are given in Sorrell et al. (2022).These copula functions cover a range of dependence structures and offer different shapes of the joint survival function in equation (1).For each of these copulas, the association parameter θ is univariate.For the marginal distributions for both events, we consider the Exponential, Weibull and Gompertz distributions, which are commonly used parametric survival models.We use them to illustrate the methods and applications, however these can be easily extended to other parametric survival distributions.

Likelihood
Let Θ be a vector of the parameters of the marginal distributions and the association parameter θ that are conditional on covariates.Then, the likelihood function is given as follows, where n is the total number of individuals.
Below, we give the formulae for the likelihood when the four types of copula, the Normal, Clayton, Frank and Gumbel, are used.Table 2 summarises the different link functions for each copula that are used to ensure the association parameters are within the permissible ranges within the respective copula functions.
For the Normal copula, the association parameter θ = ρ, where ρ is the Pearson Table 2: Link functions for the association parameter θ.
correlation coefficient, and the likelihood function of equation ( 4) is given by where Φ is the CDF of the standard normal distribution.
For the Clayton copula, the likelihood function is given by For the Frank copula, the likelihood function is given by Finally, for the Gumbel copula the likelihood function is given by For the Normal, Clayton and Gumbel copulas, the variance of the association parameter θ can be approximated using the Delta method.We describe for the case of one covariate as well as for several covariates in the Supplementary Materials, in Section 1.2 for the Normal, Section 1.3 for the Clayton and Section 1.4 for the Gumbel copula, respectively.

Event times models
We consider the Exponential, Weibull and Gompertz distributions for the marginal distributions of event times.For each model, we specify the link function for marginal parameters.An Exponential model assumes a constant hazard rate, while the Weibull and Gompertz models allow for increasing, constant or decreasing hazard rates.The methods can be easily extended to other parametric survival distributions such as a Log-normal or Log-logistic distributions.

Exponential event times
Assume that the marginal distributions for both events follow the Exponential distribution with hazard rates λ 1 and λ 2 for the non-terminal and terminal events, respectively.Then, the survival functions are given by S T 1 |W (t 1 ) = exp(−λ 1 t 1 ) and S T 2 |W (t 2 ) = exp(−λ 2 t 2 ) for the non-terminal and terminal events, respectively.We incorporate covariates W 1 , . . ., W p into the hazard rates of both events: where a 0 , ..., a p are regression coefficients for the non-terminal event and c 0 , ..., c p are regression coefficients for the terminal event.Therefore, the hazard ratios for a binary covariate W k , where k ∈ {1, . . ., p}, are given by for the non-terminal and terminal events, respectively.

Weibull event times
Assume that the marginal distributions for both events follow the Weibull distributions with the PDFs for j = 1, 2, where α j is the shape parameter and β j is the scale parameter and t j represents the event time for the non-terminal and terminal events, respectively.
The survival function and hazard function are given by the following We consider the case where the shape parameters α 1 and α 2 are constant and the scale parameters β 1 and β 2 depend on the covariates W = (W 1 , . . ., W p ) using the following calibration functions: where a 0 , ..., a p are regression coefficients for the non-terminal event and c 0 , ..., c p are regression coefficients for the terminal event.
Therefore, the hazard ratios for a binary covariate W k , where k ∈ {1, . . ., p}, are given by respectively.These are in the same form as for the Exponential event times described in equations ( 11) and ( 12).

Gompertz event times
Assume that the marginal distributions for both events follow the Gompertz distributions with the PDFs , where γ j is the shape parameter and λ j is the rate parameter and t j represents the event time for the non-terminal and terminal events, respectively.
The survival function and hazard function are given by the following We consider the case where the shape parameters γ 1 and γ 2 are constant and the rate parameters λ 1 and λ 2 depend on the covariates W 1 , . . ., W p using the following calibration functions: where a 0 , ..., a p are regression coefficients for the non-terminal event and c 0 , ..., c p are regression coefficients for the terminal event.
Therefore, the hazard ratios for a binary covariate W k , where k ∈ {1, . . ., p}, are respectively.These are in the same form as for the Exponential and Weibull event times.
For all three marginal models, the variances of the hazard ratios can be approximated by Var(HR T ) = exp(2ĉ k )Var(ĉ k ), ( 14) using the Delta method described in Section 1.1 in the Supplementary Materials.

Estimation
The regression coefficients for the marginal distributions and the association parameters are estimated by maximising the log-likelihood, i.e. the logarithm of (4).This is achieved by using the function optim in package R (R Core Team, 2021) in the practical application presented in this paper.The 95% confidence intervals (CI) are constructed using the Fisher Information matrix.

Application
We apply the methods described in Section 3 to the UKTR data set introduced in Section 2. The data set contains time to graft failures and time to death of kidney transplant recipients.To illustrate the use of the methods, we include the following binary covariates, recipient age group (> 50 years and ≤ 50 years), recipient sex (female and male) and donor type (living and deceased donors), in the analysis of both survival endpoints.We allow both the hazard rates and the association between the survival endpoints to vary with these covariates.We estimate the hazard ratios of the non-terminal and terminal events along with the association parameters for the reference and covariate groups.
We use Exponential, Weibull and Gompertz distributions to model the survival time.We apply four different copula functions to describe the association between the survival endpoints and we select the best fitting models using the Akaike Information Criterion, AIC (Akaike, 1974).The estimated hazard ratios for each covariate and the regression coefficients of the covariates for the association parameter are provided in Tables 3, 4 and 5 for the Exponential, Gompertz and Weibull survival distributions, respectively.The results for the Cox model are presented in Table 3.Moreover, in Tables 3-5 the computational time for each model is reported for a computer with processor 11 th Gen Intel (R) Core(TM) i7-1165G7 CPU @ 2.80GHz.

Graft failure following transplant
Across all considered models, sex is not found to be associated with the risks for graft failure.Compared to a deceased donor transplant, living donor transplant is associated with lower risk for graft failure.Older age (> 50 years) is found to be associated with increased risk across all considered copula models except for two cases: Normal and Gumbel copulas with Weibull survival model, where there is no association found.This is in contrast with the Cox model where the older age group is found to be associated with lower risk of graft failure (HR: 0.911, 95% CI: 0.872 to 0.952, Table 3).This may be in part due to the censoring of graft failure by death in the Cox model, the older individuals who die before experiencing graft failure may be seen as less likely to experience graft failure.
In the fitted Cox model for graft failure, there was evidence for non-proportional hazards for all three covariates.Using Grambsch and Therneau's approach to diagnose non-proportionality, we obtained P-values of 0.091, 0.017 and <0.001 for sex, age and donor type, respectively.

Death following transplant
For all considered models, including the Cox model and all copula models, all three covariates: sex, age and donor type, are found to be associated with death after transplant.In particular, female sex, living donor transplant and younger age (≤ 50 years) are associated with lower risk of death.In contrast, male sex, deceased donor transplant and older age (> 50 years) are associated with higher risk of death.These findings are consistent across all models and the estimated hazard ratios are fairly similar.
In the fitted Cox model for death, there was evidence for non-proportional hazards for donor type (P<0.001),whilst there was insufficient evidence for nonproportional hazards for sex (P=0.936) and age (P=0.882).

Association between graft failure and death
The results from all the copula survival models showed that the association between graft failure and death is stronger for individuals in the older age group compared to the younger age group.
In most of the fitted copula models, we observe no difference between female and male recipients for the association between graft failure and death.The exceptions are the Clayton copula models and Frank copula Gompertz model which show stronger association between these two end points for female recipients.
Similarly, in most of the fitted copula models, we observe no difference between living and deceased donors for the association between graft failure and death.The exceptions are the Clayton copula models and Frank copula Exponential model which show stronger association between these two end points for living donor recipients.

Results for the preferred model
In our real data analyses, we reported the AIC value for each model.However, we did not compare Cox model with the copula models using AIC.This is because the Cox model was fitted to graft failure and death, separately, and the Cox model has a partial likelihood instead of a full likelihood.In contrast, the copula survival model analysed both graft failure and death jointly, and has a full likelihood.Hence, the AIC values for the Cox model where outcomes were analysed separately and the copula-based parametric survival models where outcomes were analysed jointly are not comparable.However, we used the AIC to compare between the copula survival models.For each survival distribution, the Frank copula model is the preferred according to the AIC criterion (Tables 3, 4, 5).Moreover, for each copula model, the Weibull survival distribution provides the lowest AIC.
In the Frank copula Weibull survival model, the association between graft failure and death is affected by age, with individuals in the older age group having a stronger association compared to those in the younger age group.The hazard ratio of graft failure for living donors is 0.560 (95% CI: 0.532 to 0.588), and for death is 0.519 (95% CI: 0.490 to 0.549), indicating the living donor recipient group is at lower risk for both events.Female sex is associated with lower risk for death compared to men, with hazard ratio 0.920 (95% CI: 0.882 to 0.957).The hazard ratio of graft failure for the older age group is 1.202 (95% CI: 1.152 to 1.251) and for death is 3.734 (95% CI: 3.565 to 3.903), indicating the older age group is at higher risk of graft failure and death.
The Frank copula Weibull survival model suggests that in general male patients above 50 years who received a transplant from a deceased donor are at highest overall risk for death following kidney transplant.At the other extreme, younger female patients who received a transplant from a living donor are at the lowest overall risk for death.The hazard functions estimated from the preferred model are presented in Figure 1 for these two subgroups, respectively.For both subgroups, the hazard for graft failure is greatest immediately following the transplant and gradually decreases within 4 years where it is stabilised (Figure 1a).The hazard for death is stable over time since transplant (Figure 1b).For male recipients aged > 50 years with a deceased donor, for the first 2 years following kidney transplant, the hazard for graft failure is greater than for death and after that the order is reversed.

Simulation study
We conduct two simulation studies.In simulation study 1, we aim to assess the performance of the proposed bivariate copula regression models for semi-competing risk data.We simulate data to mimic the real example of renal transplant data.We compare the performance of three models: the Cox proportional hazards model; the bivariate copula regression models with predictors for the hazard rates (Copula model 1); and the bivariate copula regression models with predictors for both hazard rates and the association parameters (Copula model 2).
In simulation study 2, we aim to assess the effect of misspecification of the survival distributions on the estimation of the model parameters.We use the AIC value to select the best fitting model for the simulated data and calculate the percentage that the underlying survival model is correctly selected.

Design
We simulate data with known marginal hazard rates for the non-terminal and terminal events, and the correlation between the two event times.We assess the accuracy of the estimates for the hazard ratios of the non-terminal and terminal events, HR N T and HR T , respectively.The number of replications in both simulation studies was 1000.The simulation algorithm is provided below.
1. We generate the binary covariates, W k , where k = 1, ..., p for p covariates, in proportions that represent the renal transplant data.
2. We generate Pearson's correlation coefficient for the Normal copula, or association parameters for the Clayton, Frank and Gumbel copulas, respectively, as described in Table 2, with chosen b k .
3. Hazard rates for the non-terminal event, λ 1 , and terminal event λ 2 , with chosen a k and c k being generated by using equations ( 9) and ( 10), respectively.
4. We use the conditional distribution method (Nelsen, 2006) to simulate time to the non-terminal and terminal events from a specified copula.
(a) (U, V ) have the joint distribution function of the chosen copula, where U represents the uniformly transformed time to the non-terminal event, and V represents the uniformly transformed time to the terminal event.
(c) Generate V from C(v|u), which is the inverse of the conditional copula distribution function of V given U .
5. We obtain T 1 and T 2 from the respective inverse marginal survival functions, Here , when the survival distribution is Exponential for both events.
6.We simulate censoring time, C, independently from a Uniform distribution.In simulation study 1, the simulated data are analysed using Cox model, and the models with underlying copula function.For example, when analysing the simulated data generated using the Clayton copula, we use the Clayton copula model.Sorrell et al. (2022) conducted an extensive simulation study to evaluate the effect of misspecification of copula functions and we will not duplicate that simulation in this paper.

Choice of parameters
We generate data sets with 3000 individuals with hazard rates and association parameters mimicking the real data analysis from the UKTR data set.The binary covariates are generated from a Bernoulli distribution with parameters mimicking the real data.Specifically, the age group > 50 years with probability 0.40, females with probability 0.38 and living donor recipients with probability 0.30.
The regression coefficients are set to be equal to those in Tables S1 and S2 in the Supplementary Materials for simulation studies 1 and 2, respectively.The censoring time, C, is generated independently from Uniform(0, 25) distribution, representing a maximum follow up time of 25 years.

Performance measures
We maximise the log-likelihood described in Section 3 using the optim function in R to estimate the regression coefficients for the marginal and association parameters.The performance of the maximum likelihood estimates is evaluated using the bias, the mean squared error (MSE) and the coverage probability.

Results of simulation study 1: performance of the proposed bivariate copula regression models
For the non-terminal event, the hazard ratio of age group estimated from the Cox model has the following coverage probabilities, 11.7%, 0.0%, 0.0% and 18.9%, if data were simulated from Normal, Clayton, Frank and Gumbel copulas, respectively (Table 6).These coverage probabilities were far below the nominal level.In copula regression model with covariates included for the hazard rate but not for the association parameter (copula model 1), the equivalent coverage probabilities were 0.0%, 90.3%, 56.7%, 45.5% (Table 6).Using copula model 2, where covariates were included for the hazard rates and the association parameter, the coverage probability is around the nominal level for each copula, 94.6%, 95.1%, 94.5%, 95.2% (Table 6).We observe a reduction in the bias and MSE of the nonterminal hazard ratio for the age group in copula model 2, as compared to the Cox model.This explains the discrepancy in the findings between the Cox model and the copula model in the real data analysis.
For the hazard ratios of sex and donor type for the non-terminal event, both the Cox model and the Copula model 2 result in coverage probabilities close to the nominal level.However, the results from the copula models show slight reductions in bias and MSE.For example, when using the Cox model for data generated from the Normal copula, the bias and MSE of the hazard ratio are 0.240 and 0.062, respectively, for the non-terminal event with covariate age agre.Comparing this to Copula model 2, we find the bias to be −0.003 and the MSE to be 0.007.
The results for the hazard ratios of the terminal event are similar between the Cox model, Copula models 1 and 2 with coverage probability close to 95%.This is expected, because the terminal event can not be censored by the non-terminal event.However, using copula model 1 the coverage probability is 100.0%, for the hazard ratio of age group, where data are generated from the Clayton copula.
When including the covariate age group in the association parameter, in copula model 2, we find the coverage probability increases to 95.7%.

Results of simulation study 2: effects of the misspecification of survival distributions
We evaluate the use of alternative survival distributions and the effects of misspecification of the survival distribution.For the simulation studies investigating misidentification, we simulate data mimicking the real data set using four different copula distributions, with true values given in supplementary material Table S2.We also evaluate the use of AIC to select the survival distributions.We simulate data with the following combinations of survival distributions and copula functions, Exponential, Weibull and Gompertz survival distributions and Normal, Clayton, Gumbel and Frank copulas.
The results of simulation study 2 are given in Table S3 in the Supplementary Material.Here we give a summary of finding from this simulation study.Our simulation study shows that AIC can be used to select the survival distributions.The survival distribution is chosen correctly in almost 100% cases for the Weibull distribution, roughly 91% or above for Gompertz distribution, and around 78% or above for Exponential distribution.Where the underlying Exponential distribution is not correctly identified, about 10% of the times the survival distribution is misspecified as Weibull or Gompertz distribution.However, the estimation is in general robust to the misidentification of the Exponential distributions by Weibull or Gompertz distribution.The summary of performance of the models selected by lowest AIC show that the coverage probability is close to the nominal level.

Discussion
We propose a set of copula regression models to allow both the hazard rates and the association between times to non-terminal and terminal events to vary by covariates.We use conditional copulas (Patton, 2006) with predictors for the hazard rates of the semi-competing risk events and the association parameters.The advantage of the proposed method is the estimation of the hazard ratios by taking into account the correlation between the semi-competing risks and the flexibility of using a variety of copulas to describe different patterns in the relationship between the survival endpoints.
Our work demonstrates the importance of considering the correlation between the semi-competing risks.In the real data analysis, different conclusions were found for the effect of age group, where Cox model showed older age was associated with lower risk of graft failure (HR: 0.911, 95%CI: 0.872 to 0.952), while our proposed copula models showed older age was associated with higher risk of graft failure from all copula models.The estimated hazard ratio from Frank copula Weibull survival model, which has the lowest AIC, is 1.202 with 95%CI: 1.152 to 1.251 (Table 5).
We conduct simulation studies to assess the performance of the copula regression models and to evaluate the use of AIC to choose survival distributions.The estimation of the hazard ratio for the non-terminal event is improved by using the copula model 2, with coverage probability around the nominal level, compared to using the Cox model which has coverage probability 0% in some scenarios (Table 6).These results corroborate the findings from Leffondré et al. (2013).For the non-terminal hazard ratio, we found a reduction in bias and MSE using the copula regression models compared to the Cox model.Our work highlights the importance of acknowledging the semi-competing risk when analysing the effect of a covariate on an endpoint, where the covariate has a strong effect on a competing risk.
We have considered the effect of misspecification of survival distribution.Our simulation studies show that the AIC can be used to select the survival distributions in the majority of cases.In the remaining cases, misspecification is more likely to occur when the underlying distribution is Exponential or Gompertz.However, miss-specifying these two distributions by a Weibull distribution still holds good properties in terms of the parameter estimates.Further research may also investigate other parametric survival distributions or the use of non-parametric methods to model the marginal survival functions.
We have used a full maximum likelihood approach for estimating the model parameters.We acknowledge the two-stage estimation procedures for the association parameter may be more efficient for copula survival models (Shih and Louis, 1995).In the two-stage approach, the association parameter and the parameters in the marginal survival distributions are estimated separately.In stage 1, parameters in the marginal distributions are estimated assuming they are independent.In stage 2, the association parameter is estimated by fixing the two marginal distributions at the estimates from stage 1.This approach ignores the dependency structure in stage 1, however, it offers the advantage of being practically efficient especially when the models become more complex.The application of the two-stage approach in our proposed models and the comparison with the one-stage full maximum likelihood approach is a topic for future research.Another possible extension could be to estimate the marginal survival function using non-parametric methods (Li et al., 2020) or Cox proportional hazards model (Li et al., 2021).It would be also of interest to investigate the use of grid search methods to find starting values for optimising the likelihood function.
In this paper, we have extended the recent work in bivariate semi-competing risk models by including covariates.Another topic of future research could be to extend the model to include frailty terms to account for unmeasured covariates.Since we have bivariate semi-competing risk data, this extension will require to model the frailty terms by using a bivariate distribution to account for the potential correlation between the two frailty terms.
We have used Exponential, Weibull and Gompertz distributions to illustrate the methods and applications of our proposed models.However, this can be readily extended to other parametric survival distributions.We have assessed the performance of our proposed copula survival models using simulation studies and compare between them using AIC.Assessing the goodness-of-fit of the copula survival models may be developed in future research.
As this paper focuses on describing copula regression models with binary covariates, the dichotomous age groups (> 50 years or ≤ 50 years) representing younger and older age were included for illustration purpose.We acknowledge that age should be best analysed with more categories or as a continuous variable.
In our follow-up work, we are developing copula regression models to include more categories for age or include age as a continuous variable.
Further research may investigate the inclusion of continuous and categorical covariates.We have considered a linear function to incorporate covariates into the hazard rates and association parameter, however, alternatives may be considered.Acar et al. (2013) have developed methods to compare potential functions that describe the association parameter of the copula function's relationship with the covariate by using a generalised likelihood ratio test.We have used available case analysis to illustrate the application of our methods.Future research can use multiple imputation to deal with missing data when using our proposed methods.This could be time consuming when the Normal copula model is used.
As can be seen in Tables 3-5, it takes 2-4 hours to optimize the likelihood for Normal copula models.Future research may investigate how to speed up the optimization process for the analysis of a single data set, and the use of parallel computing in R for conducting simulation studies.and association parameters depending on the covariates.Data are generated from the Normal, Clayton, Frank and Gumbel copulas and the true copula distribution is used for estimation.We generate 1000 data sets with 3000 individuals in each.The non-terminal hazard ratio for each covariate is given by HR N T,covariate , and the terminal hazard ratio for each covariate is given by HR T,covariate .MSE refers to the mean squared error and CP refers to the coverage probability, given as a percentage.

Figure 1 :
Figure1: Estimated hazard functions for two subgroups of patients: male recipients above 50 years with deceased donor and female recipients aged below 50 years with living donor, using Frank copula with Weibull survival times.

7.
Set the time to the non-terminal event, X = min(T 1 , T 2 , C), with event indicator d 1 = 1 if T 1 ≤ min(T 2 , C) and d 1 = 0 otherwise.8. Set the time to the terminal event, Y = min(T 2 , C), with event indicator d 2 = 1 if T 2 ≤ C and d 2 = 0 otherwise.

Table 1 :
Baseline characteristics for single and first kidney transplant recipients from the UK Transplant Registry data set in and d 1 = 0 otherwise.The time to the second event is denoted by Y = min(T 2 , C) with event indicator d 2 = 1 if T 2 ≤ C and d 2 = 0 otherwise.We let the marginal hazard rates and the association parameter depend on covariates, W = (W 1 , ..., W p ).The marginal survival functions are given by S T 1 |W (t 1 |w) = P (T 1 > t 1 |w) and S T 2 |W (t 2 |w) = P (T 2 > t 2 |w) for the non-terminal and terminal events, respectively.The marginal probability density functions (PDFs) are denoted by f T 1 |W (t 1 |w) and f T 2 |W (t 2 |w) for the non-terminal and terminal events, respectively.

Table 3 :
Cox model and Exponential survival copula models: Results from analysing the UK Transplant Registry.The binary covariates are defined as follows, sex: female vs male (reference), donor type: living vs deceased (reference), and age group: > 50 years vs ≤ 50 (reference).

Table 4 :
Gompertz survival copula models: Results from analysing the UK Transplant Registry.The binary covariates are defined as follows, sex: female vs male (reference), donor type: living vs deceased (reference), and age group: > 50 years vs ≤ 50 (reference).

Table 5 :
Weibull survival copula models: results from analysing the UK Transplant Registry.The binary covariates are defined as follows, sex: female vs male (reference), donor type: living vs deceased (reference), and age group:> 50 years vs ≤ 50 (reference).