Assessing importance of biomarkers: A Bayesian joint modelling approach of longitudinal and survival data with semi-competing risks

Longitudinal biomarkers such as patient-reported outcomes (PROs) and quality of life (QOL) are routinely collected in cancer clinical trials or other studies. Joint modelling of PRO/QOL and survival data can provide a comparative assessment of patient-reported changes in specific symptoms or global measures that correspond to changes in survival. Motivated by a head and neck cancer clinical trial, we develop a class of trajectory-based models for longitudinal and survival data with disease progression. Specifically, we propose a class of mixed effects regression models for longitudinal measures, a cure rate model for the disease progression time ( T P ) and a Cox proportional hazards model with time-varying covariates for the overall survival time ( T D ) to account for T P and treatment switching. Under the semi-competing risks framework, the disease progression is the non-terminal event, the occurrence of which is subject to a terminal event of death. The properties of the proposed models are examined in detail. Within the Bayesian paradigm, we derive the decompositions of the deviance information criterion (DIC) and the logarithm of the pseudo-marginal likelihood (LPML) to assess the fit of the longitudinal component of the model and the fit of each survival component, separately. We further develop Δ DIC as well as Δ LPML to determine the importance and contribution of the longitudinal data to the model fit of the T P and T D data.


Introduction
A patient-reported outcome (PRO) is any report of the status of a patient's health condition that comes directly from the patient, without interpretation of the patient's response by a clinician or anyone else (U.S. Food and Drug Administration Guidance for Industry, 2009). PROs can help identify symptoms or problems that may be missed during clinical queries, guide policy and health care delivery, and help patients choose between two equally efficacious therapies. Joint modelling of PRO and survival data can provide a comparative assessment of patient-reported changes in specific symptoms or global measures that correspond to changes in survival.
In joint modelling literature, there are usually a longitudinal component, a survival component and a dependence structure connecting those two components. There has been much work on joint modelling of one longitudinal measure and a survival measure, including Schluchter (1992), Hogan and Laird (1997), Law et al. (2002), Brown and Ibrahim (2003), Chen et al. (2004), Ibrahim et al. (2004), Brown et al. (2005), Chi and Ibrahim (2006), Chi and Ibrahim (2007), Ibrahim et al. (2010), Ye et al. (2008), Brown (2009), Rizopoulos (2012), Pawitan and Self (1993), De Gruttola and Tu (1994), Lavalley and Degruttola (1996), De Gruttola and Tu (1994), Xu and Zeger (2001b), Xu and Zeger (2001a), Jacqmin-Gadda et al. (2010) and Proust-Lima et al. (2014). Tudur-Smith et al. (2016) provided a recent review on joint modelling. In the recent development, multiple longitudinal processes and spline-based methods have been used in the joint modelling (Rizopoulos and Ghosh, 2011). Joint modelling has also been applied to more complex data structures such as competing risks data (Williamson et al., 2008;Elashoff et al., 2008;Huang et al., 2011;Li et al., 2012;Proust-Lima et al., 2016) and semi-competing risks data (Li and Su, 2017). In this article, we propose a mixed effects regression model for longitudinal measures, a cure rate model for the disease progression time (T P ) and a Cox proportional hazards model with time-varying covariates for the overall survival time (T D ) to account for T P and treatment switching. Note that Chen et al. (2020) provide a detailed elaboration of treatment switching, and they also develop a method for estimating the intended treatment effect accounting for treatment switching. In addition, the trajectory function from the longitudinal model is substituted into the hazard function of each survival submodel, thereby serving as another time-varying covariate in the survival submodel. Thus, our model is the trajectory model (TM). Although Li and Su (2017) address a very similar context as semi-competing risks within the joint modelling framework, they assume a probit model for the discrete time hazard of dropout and another probit model for the discrete time hazard of death. Li and Su (2017) further assume the shared parameter model (SPM), where the longitudinal model and the survival submodels share common random effects, which then induce correlation between the longitudinal and survival components. Zhang et al. (2017) developed a novel decomposition of DIC and LPML to assess the fit of the longitudinal and survival components of the joint model, separately. Based on this decomposition, they then proposed new Bayesian model assessment criteria, namely DIC and LPML, to determine the importance and contribution of the longitudinal data to the model fit of the survival data. In this article, we extend the decomposition of Zhang et al. (2017) to our joint model with semi-competing risks. We derive the decompositions of DIC and LPML (i.e., DIC = DIC Long + DIC Pg|Long + DIC Surv|Pg,Long and LPML = LPML Long + LPML Pg|Long + LPML Surv|Pg,Long , respectively) to assess the fit of the longitudinal component of the model and the fit of each survival component, separately. Let DIC Surv be the difference of DIC Surv|Pg,Long and DIC of the overall survival model alone without joint modelling. DIC Surv measures the additional information gained by adding disease progression time and longitudinal components into modelling of overall survival data. Similarly, DIC Pg measures the additional information gained by adding longitudinal components into modelling of progression data. Similar ideas apply to LPML Surv and LPML Pg . We also develop a simple but more attractive prior for the covariance matrix based on its Cholesky decomposition borrowing the idea of the Lewandowski, Kurowicka, and Joe (LKJ) prior for the correlation matrix C which has a density function of f (C|τ) ∝ |C| τ−1 (Lewandowski et al., 2009).
The rest of the article is organized as follows. Section 2 gives a detailed description of the motivating longitudinal and survival data from a head and neck clinical cancer trial. In Section 3, we propose a mixed effects regression model for the longitudinal measure, a cure rate model for T P and a time-varying covariates model for T D . The properties of the proposed models are discussed. The likelihood functions under the proposed models are derived in Section 4. In Section 4, we also present the prior and posterior distributions, develop an efficient Markov chain Monte Carlo sampling algorithm for carrying out the posterior computations, and provide the formulation of the decomposition of the LPML and LPML's for model comparison. A comprehensive analysis of the longitudinal and survival data from a head and neck cancer trial is given in Section 5. We conclude the article with a discussion including some extensions in Section 6.

Motivating longitudinal and survival data from a head and neck cancer clinical trial
The motivating study was an open-label, randomized, active-controlled phase III trial comparing the efficacy and safety of afatinib versus methotrexate in patients with recurrent/metastatic squamous carcinoma cancer of head and neck (HNSCC Machiels et al., 2015). Patients were treated on either afatinib or methotrexate, according to randomization, until the occurrence of disease progression or intolerable adverse events. Upon discontinuation of randomized treatment, a patient was free to take other anti-cancer therapies per protocol (except that patients discontinued methotrexate cannot start afatinib). It was reported that more than half of the patients who discontinued study medication started another anti-cancer treatment.
Of the total 483 patients, 332 patients were randomized to the treatment arm (methotrexate) and 161 patients were randomized to the control arm (afatinib). Furthermore, 246 patients switched the treatments, 358 died and 125 were censored. The median follow-up of 6.7 months (interquartile range (IQR): 3.1 -9.0). Figure 1 shows the counts of disease progression, treatment switch and death in each treatment arm. Three patient-reported measures: global health status (GH), head and neck pain scale (HNPA), and head and neck swallowing condition (HNSW) were collected during this phase III trial.   HNSW as shown in Table 1. In this article, we investigate the associations between changes in the longitudinal measures and risks of disease progression (T P ) and death (T D ), in which T D was the key secondary endpoint of the trial. We consider the following baseline covariates: patients' prior treatment with an epidermal growth factor receptor (EGFR)-targeted antibody, cetuximab (PEGFRA: 1 if and 0 if no), age more than 65 years (AGE: 1 if yes and 0 if no), gender (SEX: male/female), prior treatment with plantinum-based chemotherapy and radiotherapy Median survival time in days and its 95% confidence interval are 434 and (386, 470), median switch time, and its 95% confidence interval are 166 and (147, 190), and median progression time and its 95% confidence interval are 84 and (81, 86).

The proposed models
Let Y(t) denote the longitudinal measure observed at time t. Also, let T P denote the time from the study entry until disease progression and let T D denote the time from the study entry until death. In addition, we let x be the p-dimensional vector of baseline covariates including the randomized treatment but not the intercept.

Longitudinal component of the joint model
We first assume a mixed effects regression model for the longitudinal measure Y(t), which is given by where g(t) is a (q + 1)-dimensional vector of functions of t including the intercept, θ * 0 is a (q + 1)-dimensional vector of subject-dependent random effects, θ is the (q + 1)-dimensional vector of the overall effects, γ is a p-dimensional vector of regression coefficients, and (t) is a measurement error. We further assume that θ * 0 = (θ * 00 , . . . , θ * 0q ) and (t) for t ≥ 0 are independent; where is a (q + 1) × (q + 1) positive definite covariance matrix; and for t ≥ 0, where σ 2 > 0 is a variance parameter. We note that in (3.1), if q = 1, g(t) = (1, t) and (θ * 0 + θ) g(t) represents a linear trajectory, and if q = 2, g(t) = (1, t, t 2 ) and (θ * 0 + θ) g(t) leads to a quadratic trajectory. A more general form of g(t) is given by g(t) = (1, b 1 (t), . . . , b q (t)) , where b 1 (t), . . . , b q (t) are a set of base functions. We also note that in (3.1), θ * 0 captures the patient-level dependence of the Y(t) over time t. However, the covariance matrix depends on the data only via random effects θ * 0 , which can lead to slow convergence of Gibbs sampling. To overcome this issue, we consider the reparametrization: θ * 0 = θ * , where is the Cholesky decomposition of with all diagonal elements being positive such that = . Under this reparametrization, we have and a reparametrized version of (3.1) is written as Then for any values of b 1 , . . . , b q+1 and a ij , i = 2, . . . , q + 1, j = 2, . . . , i − 1, is a full-rank matrix. Thus, there are no constraints on b 1 , . . . , b q+1 and a ij , i = 2, . . . , q + 1, j = 2, . . . , i − 1. Consequently, a multivariate normal prior can be specified for these parameters. Therefore, in this sense, our parametrization approach may be more attractive than the one discussed in Lewandowski et al. (2009). In addition, we have g(t) θ * = q+1 j=1 q+1 i=j g i (t)a ij θ * j , where a jj = exp(b j ) for j = 1, 2, . . . , q + 1. We may view θ * j 's as 'known data points' and a ij 's as the 'regression coefficients'. Under some mild conditions of the longitudinal data, the parameters a ij 's are stochastically identifiable.

Survival component of the joint model
Let δ denote the disease progression status of subjects such that δ = 1 if the subject has disease progression and δ = 0 if the subject never has disease progression. We assume a logistic regression model for δ as logit(P(δ = 1|φ, x)) = log where φ = (φ 0 , φ 1 ) is a (p + 1)-dimensional vector of regression coefficients. Following Berkson and Gage (1952) and , we assume a cure rate model for T p with and a Cox-type model when δ = 1 with where λ 10 (t) is a proper baseline hazard function such that ∞ 0 λ 10 (t)dt = ∞, β 1 is a regression coefficient corresponding to the longitudinal trajectory function ( θ * + θ) g(t) and α 1 is a p-dimensional vector of regression coefficients corresponding to the baseline covariates. From (3.7) and (3.8), we see that T P can be written as where ∞ × 0 = 0, T * P is a proper random survival time that follows the distribution specified in (3.8) and 1{A} denotes the indicator function, which takes a value of 1 if A is true and 0 otherwise.
We assume a time-varying covariates Cox model for T D with hazard function defined as where β 2 is a regression coefficient corresponding to the longitudinal trajectory function ( θ * + θ) g(t), α 2 is a p-dimensional vector of regression coefficients and z(t) is the history of time-dependent covariate up to t with η being a vector of the corresponding regression coefficients.

Remark 3.2:
We note that although we use x to denote a vector of baseline covariates, different subsets of baseline covariates may be used in (3.1), (3.6), (3.8) and (3.10), respectively. Also x can be different in the longitudinal and survival models. For the notational simplicity, we use the same x for baseline covariates throughout the article.

Remark 3.3:
Let V T P (t) = 1{T P ≤ t}. The variable V T P (t) essentially tracks the status of disease progression so that V T P (t) = 0 if the patient has not had disease progression at time t and V T P (t) = 1 if the patient has had disease progression before time t. Similarly, we denote V T SW (t) = 1 if the subject has switched treatment at time t, and V T SW (t) = 0 if the subject has switched treatment at time t. Suppose that R denotes the indicator for the randomized treatment such that R = 1 if the subject is on the treatment arm and R = 0 if the subject is on the control arm. In (3.10), we take

Remark 3.4:
From the proposed joint model for the longitudinal data and survival data, we see that θ * captures the dependence between the Y(t) and (T P , T D ), while the joint distribution of T P and T D is specified by the product of the marginal distribution [T P ] and the conditional distribution [T D |T P ]. Since the event 'δ = 0' is not observed, φ is only partially identifiable as shown in . Thus, in our analysis of the head and neck cancer data and the simulation study, we specify a moderate informative prior for φ.

Remark 3.5:
The regression coefficient β 1 in (3.8) (β 2 in (3.10)) captures the association between the longitudinal measure Y(t) and T P (T D ). If β 1 = 0 and β 2 = 0, Y(t) is independent of T P and T D . If β 1 > 0, an increase in the longitudinal measure Y(t) is associated with a shorter disease programme time T P . Similarly, if β 2 > 0, an increase in Y(t) is associated with a worse overall survival mortality. In addition, the longitudinal trajectory function ( θ * + θ) g(t) is directly included in the hazard function for T P in (3.8) and the hazard function for T D in (3.10) as a time-dependent covariate. Therefore, the proposed joint model is the TM.

The likelihood function
Suppose that there are n subjects. For the ith subject, let y i (t) denote the longitudinal measure, which is observed at time t ∈ {a i1 , a i2 , . . . , a im i }, where 0 ≤ a i1 < a i2 < · · · < a im i and m i ≥ 0 . Note that y i (0) corresponds to the baseline value. Also let x i denote a p-dimensional vector of covariates, which may include the randomized treatment. Let t i and ν i denote the overall survival time and the right-censored indicator, respectively, where ν i = 1 if t i is a death time and 0 if t i is right-censored for the ith subject. Also let t Pi be the disease progression time and δ i be the status of disease progression. If disease progression has been observed, then t Pi < t i and δ i = 1. However, if disease progression has not yet been observed at time t i , t Pi and δ i are not observed. Let t SWi denote treatment switch time, and if treatment switch has not been observed yet at Write y i = (y i (a i1 ), . . . , y i (a im i )) and W i = (W i1 , W i2 ), which is an m i × (q + 1 + p) matrix with m i × (q + 1) matrix W i1 = (g(a ij ) , j = 1, . . . , m i ) and m i × p matrix W i2 = (x i , j = 1, . . . , m i ). Then, the complete data likelihood function for the longitudinal data y i is given by Using (3.2), the density of θ * i is given by Based on the definition of z(t), we see that z(t) depends on t Pi and t only through . We also let ζ i denote the group indicator.
Define the survival function for . Under the proposed cure rate and time-varying covariates model, the observed data can be divided into four groups of observations. We derive the likelihood function for each of the four groups as follows. Group 1. Subjects are observed to die at time t i and no disease progressions have been incurred. The survival data consists of The likelihood function is given by Group 2. Subjects are observed to have disease progression at time t Pi and then die at t i . The survival data consists of The likelihood function is given by Group 3. Subjects are observed to have disease progression at time t Pi and then censor at t i . The survival data consists of and ζ i = 3. The likelihood function takes the form Group 4. Subjects are censored at t i and no disease progression occurs before t i . The The likelihood function is written as Remark 3.6: The disease progression is a non-terminal event while the death is a terminal event. The overall survival time T D censors the disease progression time T P , but not vice versa. Thus, the disease progress and the death constitute the semicompeting risks. Under the proposed model, these semi-competing risks are factored into the constructions of the likelihood functions in Groups 1 and 4.

Bayesian model assessment
Based on the joint distributions, define the following conditional distributions:
Define λ * 2 (t|λ 20 , α 2 , x) = λ 20 (t) exp{α 2 x}, and Define model assessment criterion: DIC Surv measures the additional information gained by adding time to disease progression and longitudinal components into modelling of overall survival data. Large values of DIC Surv correspond to more gain in the fit of the overall survival data. We can also fit survival data alone with time-dependent covariates, assuming β 2 = 0.
Similarly, denote where DIC Pg0 is obtained by assuming β 1 = 0 (fitting the time to progression model alone). Define λ * 1 (t|λ 10 , α 1 , x) = λ 10 (t) exp{α 1 x}, P * (T P > t Pi |λ 10 , α 1 , and D Pg,obs = (t Pi , δ i , x i i = 1, . . . , n). DIC Pg measures the additional information gained by longitudinal components into modelling of progression data. Large values of DIC Pg correspond to more gain in the fit of progression data. As pointed out in Zhang et al. (2017), when the penalty for the additional parameters in the overall survival model (3.10) or the time to progression model (3.8) outweighs the gain of the fit in the respective survival component, DIC Surv or DIC Pg can be negative. When DIC Surv ( DIC Pg ) is zero or negative, the longitudinal measure is not associated with the overall survival (the time to disease progression). This phenomenon is also applied to the measures developed based on the CPO statistics below.

Analysis of the head and neck cancer data
We apply our proposed method in Section 3 to analyse the head and neck cancer data introduced in Section 2. Out of the total 483 patients, we exclude two patients from the analysis because of missing BSDTARG0 measures. We are interested in identifying which of the three patient-reported measures: GH, HNPA and HNSW contributes most to the model fit of the survival data. Each of GH, HNPA and HNSW was measured on a continuous scale from 0 to 100. Following Ibrahim et al. (2001), we apply transformation to the original longitudinal measures: √ 100 − GH, √ 100 − HNPA and √ 100 − HNSW, respectively, to achieve a better normal approximation. For each transformed longitudinal measure, we fit the joint model for the longitudinal measure, disease progression time and the overall survival, with a time-dependent covariate accounting for the model fit of the survival data. We include 11 covariates described in Section 2, namely ECOG, AGE, PEGFRA, FEMALE PCRTDC, TMSITED01, METLOC2, METLOC3, METLOC4, ETLOC5, BSDTARG, and treatment indicator R in the time to progression model (3.8) and the overall survival model (3.10). We include five covariates, namely ECOG, AGE, PEGFRA, FEMALE, and PCRTDC, and treatment indicator R in the cure rate model (3.6). In the time to progression model the survival model, and the cure rate model, all of the covariates including those binary endpoints, except for the treatment indicator R, are standardized to have mean 0 and standard division 1. We use a linear trajectory in the longitudinal model. No standardization is done for any of these covariates in the longitudinal model. In the overall survival model, we also include time-dependent indicator for treatment switch V T SW , time-dependent indicator for progression status V T P and an interaction term R * V T P .
We used DIC and LPML under the survival model alone for overall survival and the cure rate model alone for time to progression to determine K j , the number of pieces for the piecewise constant hazard function for j = 1, 2. As reported in the footnotes under Tables S.5 and S.6 in the supplementary materials, the 'best choices' were K 1 = 25 for time to progression and K 2 = 7 for overall survival according to both DIC and LPML. Under these choices of K 1 and K 2 , we fit the models of the survival data jointly with each of GH, HNPA and HNSW longitudinal data and calculated the decompositions of DIC and LPML. The values of DIC, DIC Pg , DIC Surv,1 DIC Surv,2 , LPML, LPML Pg , LPML Surv,1 , and LPML Surv,2 are shown in Figure  2 for GH and reported in Table 2 for all three longitudinal measures.
Larger values of DIC Surv and LPML Surv correspond to more gain in the fit of the overall survival data by adding time to disease progression and different longitudinal components. All values of DIC Surv,1 and LPML Surv,1 in Table 2 were positive, implying that all three longitudinal measures together with the time-dependent covariates helped in the fit of the overall survival data. All values of DIC Surv,2 and LPML Surv,2 were positive, indicating that all three longitudinal measures helped in the fit of the overall survival data in addition to the time-dependent covariates. Comparing the magnitudes of DIC Surv,2 and LPML Surv,2 in Table 2, in terms of gain in the fit of overall survival, GH is more associated with the overall survival time than HNSW and HNPA, while HNSW has a stronger association with the overall survival time than HNPA. Similarly, larger values of DIC Pg and LPML Pg correspond to more gain in the fit of disease progression data by adding different longitudinal components. The results in Table 2 show that GH is helpful in the fit of the time to progression data, while there were weak associations between HNPA or HNSW and the time to progression data. After comparing the values of overall DIC for the models jointly with different longitudinal measures in Table 2, we see that HNPA < GH < HNSW in terms of the corresponding DIC values, while based on DIC Pg|Long and DIC Pg|Long , GH has the smallest values. The reason for this is that there were different numbers of observations (1 589 for GH, 1 587 for HNPA and 1 525 for HNSW) for the three longitudinal measures. Thus, the values of the overall DIC were not comparable. The similar results of LPML and LPML decompositions were observed. Table 3 shows the posterior estimates and 95% HPD intervals of the hazard ratios of variables/parameters for the overall survival model (3.10) and the time to progression model (3.8), and the odds ratios of variables for the cure rate logistic regression model (3.6) and the posterior estimates and the 95% HPD intervals of  Table 3, we also see that the 95% HPD intervals of the hazard ratios corresponding to β 1 and β 2 are (1.053, 1.279) and (1.215, 1.482), respectively, which do not contain 1. Thus, a covariate is significant if its 95% HPD interval of the hazard ratio does not contain 1. These results imply that an increase in the time-dependent longitudinal trajectory function is associated with a shorter disease progression time as well as a worse overall survival mortality, which is clinically appealing since a large value of the longitudinal trajectory function corresponds to a small GH value. Under the proposed joint model, β 1 constitutes relationship between the longitudinal measure and the time to progression, while β 2 induces correlation between the longitudinal measure and the overall survival time. In particular, β 1 = 0 or β 2 = 0 implies that the longitudinal measure does not influence the risk of disease progression or overall survival. From Tables S.3 and S.4, we see that only β 2 was significant for both HNPA with respective 95% HPD intervals (−0.237, −0.087) and (−0.193, −0.095 . We see from these figures that the autocorrelations disappear even after lag 2 for some parameters and after lag 20 for most of the parameters. These trace and autocorrelation plots show good convergence and mixing of these MCMC chains. The HPD intervals were computed via the Monte Carlo method developed by (Chen and Shao, 1999). Computer code was written for the FORTRAN 95 compiler, and we used IMSL subroutines with double precision accuracy. The FORTRAN code is available from the authors upon request. In addition, an R interface of our Fortran code is developed, which can be downloaded from the journal website (http://www.statmod.org/smij/archive.html).

Discussion
In this article, we proposed a class of mixed effects regression models for longitudinal measures, a cure rate model for the disease progression time and a time-varying covariates model for the overall survival. In Section 5, we carried out a Bayesian analysis of the head and neck cancer data using the computational algorithm discussed in Section 4.2. To examine the empirical performance of the posterior estimates under the proposed joint model, we conducted a simulation study. The design and results of the simulation study are presented in Section S.3 of the supplementary materials. As shown in Table S.1, the posterior estimates are close to the true values for almost all of the parameters except for φ, and the CP's are close to 95% for most of the parameters. The simulation results empirically confirm convergence of the proposed Markov chain Monte Carlo sampling algorithm and identifiability of the model parameters except for φ under the proposed joint model. Within the semi-competing risk setting, we developed the decompositions of DIC and LPML to determine which PRO is most associated with the survival outcome. As discussed in Section 4.3, the DIC and DIC decompositions require a numerical approximation of an integral over random effects θ * , while the LPML and LPML decompositions can be computed using the Monte Carlo methods. Thus, the LPML and LPML decompositions are more advantageous when the dimension of random effects becomes large. The proposed joint models and the corresponding decompositions of Bayesian model selection criteria can be extended to discrete longitudinal biomarkers such as Eastern Cooperative Oncology Group (ECOG) performance status and other non-normal longitudinal biomarkers. Other extensions include the joint models of the multiple longitudinal measures and the survival data with semi-competing risks. These extensions are currently under investigation.