A support vector machine-based cure rate model for interval censored data

The mixture cure rate model is the most commonly used cure rate model in the literature. In the context of the mixture cure rate model, the standard approach to modeling the effect of covariates on the cured or uncured probability is to use a logistic function. This readily implies that the boundary classifying the cured and uncured subjects is linear. In this article, we propose a new mixture cure rate model for interval censored data that uses the support vector machine to model the effect of covariates on the uncured or cured probability (i.e., on the incidence part of the model). Our proposed model inherits the features of the support vector machine and provides the flexibility to capture classification boundaries that are non-linear and more complex. The latency part is modeled by a proportional hazards structure with an unspecified baseline hazard function. We develop an estimation procedure based on the expectation maximization algorithm to estimate the cured/uncured probability and the latency model parameters. Our simulation study shows that the proposed model performs better in capturing complex classification boundaries than both logistic regression-based and spline regression-based mixture cure rate models. We also show that our model's ability to capture complex classification boundaries improves the estimation results for the latency part of the model. For illustrative purposes, we present an analysis applying the proposed methodology to NASA's Hypobaric Decompression Sickness Database.


Introduction
Ordinary survival analysis techniques such as the proportional hazards (PH) model, the proportional odds (PO) model and the accelerated failure time (AFT) model are concerned with modeling censored time-to-event data under the assumption that every subject in the study will eventually encounter the primary event of interest (death, relapse, or recurrence of a disease, etc.). However, it is not appropriate to apply these techniques in situations where a portion of the study cohort never experiences the event, e.g., clinical studies with a low fatality rate where death is the event. It can be argued that if these subjects are followed up sufficiently beyond the study period, they may face the event due to some other risk factors. Therefore, these subjects can be considered cured with respect to the event of interest. A survival model that incorporates such cured subjects is called a cure rate model. Remarkable progress in medical sciences also necessitates further exploration of the cure rate model, where estimating the cure fraction precisely can be of great importance (Peng & Yu, 2021).
Introduced by Boag (1949) and extensively studied by Berkson & Gage (1952), the mixture cure rate model is perhaps the most popular cure rate model. If T* denotes the lifetime of a susceptible (not cured) subject, then the actual lifetime T for any subject can be modeled by

T = J T* + (1 − J) ∞, (1)

where J is a cure indicator denoting whether an individual is cured (J = 0) or not (J = 1). Further, writing S_p(t) = P(T > t) and S_u(t) = P(T* > t) for the survival functions corresponding to T and T*, we can express

S_p(t) = π S_u(t) + (1 − π), (2)

where π = P(J = 1). The latency part S_u(t) = S_u(t|x) and the incidence part π = π(z) are generally modeled to incorporate the effects of covariates x = (x_1, ..., x_p)^T and z = (z_1, ..., z_q)^T for integers p and q. Note that x and z may share the same covariates.
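As a quick numerical check of the mixture decomposition described above, a short sketch (with an illustrative unit-exponential susceptible survival function and π = 0.6, neither of which comes from the paper) shows that the population survival S_p(t) plateaus at the cure fraction 1 − π:

```python
import math

def population_survival(t, pi, s_u):
    """Mixture cure survival: S_p(t) = pi * S_u(t) + (1 - pi)."""
    return pi * s_u(t) + (1.0 - pi)

# Illustrative choices (not from the paper): unit-exponential susceptible
# survival S_u(t) = exp(-t) and uncured probability pi = 0.6.
s_u = lambda t: math.exp(-t)

print(population_survival(0.0, 0.6, s_u))             # 1.0 at t = 0
print(round(population_survival(50.0, 0.6, s_u), 6))  # plateaus near 1 - pi = 0.4
```

The non-zero plateau is exactly the behavior the Kaplan-Meier curve exhibits in the data analysis of Section 4.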
The properties of the mixture cure rate model under various assumptions and extensions have been explored in detail by several authors, and modeling the lifetimes of the susceptible individuals has been studied extensively. For example, a fully parametric mixture cure rate model was studied by Farewell (1982, 1986) assuming homogeneous Weibull lifetimes and a logit link for the cure rate. Semiparametric cure models with a PH structure for the latency have been studied extensively by Kuk & Chen (1992), Peng & Dear (2000) and Sy & Taylor (2000), to name a few. Generalizations to semiparametric PO (Gu et al., 2011; Mao & Wang, 2010), AFT (Li & Taylor, 2002; Zhang & Peng, 2007, 2009), transformation class (Lu & Ying, 2004) and additive hazards (Barui & Yi, 2020) structures under the mixture cure rate model have also been investigated with various estimation techniques and model considerations.
On the other hand, the incidence part π(z) is traditionally and extensively modeled by the sigmoid or logistic function π(z) = exp(β^T z*) / {1 + exp(β^T z*)}, where β = (β_0, β_1, ..., β_q)^T and z* = (1, z^T)^T (Farewell, 1982; Kuk & Chen, 1992; Peng & Dear, 2000). As in logistic regression, the logistic model works well when subjects are linearly separable into the cured and susceptible groups with respect to the covariates. However, problems arise when subjects cannot be separated by a linear boundary. Other options for modeling the incidence include the probit link, π(z) = Φ(β^T z*), where Φ is the cumulative distribution function of the standard normal distribution (Peng, 2003; Cai et al., 2012; Tong et al., 2012). However, these link functions do not offer non-linear separability and are not sufficient to capture more complex effects of z on the incidence. Non-parametric strategies, e.g., the generalized Kaplan-Meier estimator at the maximum uncensored failure time (Xu & Peng, 2014) for the incidence part π(z) and the modified Beran-type estimator (López-Cheda et al., 2017) for the latency part of a mixture cure model, have also been considered in the literature. Again, applying these strategies with multiple covariates can be challenging. Therefore, there is a need for classifiers that can model the incidence part more effectively by allowing non-linear separating boundaries between the cured and non-cured subjects.
To this end, the support vector machine (SVM) is a reasonable choice. Introduced by Cortes & Vapnik (1995), the SVM is a machine learning algorithm that finds a hyperplane in a multidimensional feature space maximizing the separating space (margin) between two classes. The main advantage of the SVM is that it can separate data that are not linearly separable by transforming them to a higher dimensional space using the kernel trick. Consequently, this classifier is more robust and flexible than the logit or probit link functions. Recently, Li et al. (2020) studied the effect of the covariates on the incidence π(z) by implementing the SVM. Their mixture model was seen to outperform existing cure rate models, especially in the estimation of the incidence, and performs well for non-linearly separable classes and high dimensional covariates. However, Li et al. (2020) only considered data under a non-informative right censoring mechanism. Motivated by this work, we propose to employ SVM based modeling to study the effects of covariates on the incidence part of the mixture cure rate model for survival data subject to interval censoring.
Unlike right-censored data, interval-censored data occur in studies where subjects are inspected at regular intervals rather than continuously (Treszoks & Pal, 2022). If a subject meets with the event of interest, the exact survival time is not observed; it is only known that the event occurred between two consecutive inspections. Interval-censored data marked by the prospect of cure are often observed in follow-up clinical studies (e.g., cancer biochemical recurrence or AIDS drug resistance) dealing with events having low fatality and patients monitored at regular intervals (Sun, 2007; Lindsey & Ryan, 1998). As with right-censored data, some subjects may never encounter the event of interest and are considered cured. Mixture cure models for interval censored data have been examined using several estimation techniques in both semiparametric and non-parametric set-ups (Kim & Jhun, 2008; Ma, 2009, 2010; Xiang et al., 2011; Aljawadi et al., 2012).
The rest of the article is arranged as follows. In Section 2, we discuss the mixture cure rate model framework for interval-censored data and develop an estimation procedure based on the expectation maximization (EM) algorithm that employs the SVM to model the incidence part.
In Section 3, a detailed simulation study is carried out to demonstrate the performance of our proposed model in terms of flexibility, accuracy and robustness; comparisons of our model with existing logistic regression based mixture cure rate models are also made there. The model performance is further examined and illustrated in Section 4 through an interval censored data set on smoking cessation. Finally, we end with some concluding remarks and possible future research directions in Section 5.

SVM based mixture cure rate model with interval censoring

Censoring scheme and modeling lifetimes
The data we observe under interval censoring are of the form (L_i, R_i, δ_i, x_i, z_i) for i = 1, ..., n, where n denotes the sample size. For the i-th subject, L_i denotes the last inspection time before the event and R_i denotes the first inspection time just after the event; note that L_i < R_i. The censoring indicator is δ_i = I(R_i < ∞), which takes the value 0 if R_i = ∞, meaning that the event is not observed for the subject before the last inspection time, and the value 1 if R_i < ∞, meaning that the event took place but its exact time is only known to belong to the interval [L_i, R_i]. Here, x_i and z_i are the p dimensional and q dimensional covariate vectors affecting the latency and incidence parts of the mixture cure rate model, respectively. To model the effect of covariates on the latency part, we consider a proportional hazards structure for the lifetime distribution of the susceptible (non-cured) subjects. That is, for the susceptible subjects, we model the hazard function by

h(t_i|x_i) = h_0(t_i) exp(x_i^T γ),

where γ = (γ_1, ..., γ_p)^T is the p dimensional regression parameter vector measuring the effects of x and h_0(·) is the unspecified baseline hazard function. To facilitate our discussion, we assume the baseline hazard to be of the form h_0(t) = α t^(α−1), where α > 0.
One is of course free to use other forms for the baseline hazard. Therefore, we have

h(t_i|x_i) = α t_i^(α−1) exp(x_i^T γ). (5)

Note that (5) is the hazard function of a Weibull distribution with shape parameter α and scale parameter {exp(x_i^T γ)}^(−1/α). The Weibull distribution is a popular and flexible choice for modeling lifetimes or failure times in survival analysis: it is closed under the proportional hazards family when the shape parameter remains constant, and it accommodates decreasing (α < 1), constant (α = 1) and increasing (α > 1) failure rates (Farewell, 1982; Tsodikov et al., 2003; Kleinbaum & Klein, 2010). From (2), the resulting survival function and density function of any subject in the study (irrespective of cured status) are respectively given by

S_p(t|x, z) = π(z) S_u(t|x) + 1 − π(z) (6)

and

f_p(t|x, z) = π(z) f_u(t|x), (7)

where S_u(t|x) = exp{−t^α exp(x^T γ)} and f_u(t|x) = −(d/dt) S_u(t|x).
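The Weibull PH structure above admits a closed-form survival function, which makes the shape/scale reparameterization easy to verify numerically. A minimal sketch (all numerical inputs are illustrative, not taken from the paper):

```python
import math

def hazard(t, x, alpha, gamma):
    """Susceptible hazard under the PH structure: h0(t) * exp(x' gamma),
    with Weibull baseline h0(t) = alpha * t**(alpha - 1)."""
    lin = sum(xj * gj for xj, gj in zip(x, gamma))
    return alpha * t ** (alpha - 1) * math.exp(lin)

def surv_u(t, x, alpha, gamma):
    """Susceptible survival: S_u(t|x) = exp(-t**alpha * exp(x' gamma))."""
    lin = sum(xj * gj for xj, gj in zip(x, gamma))
    return math.exp(-(t ** alpha) * math.exp(lin))

# Check the Weibull reparameterization: with scale m = exp(x' gamma)**(-1/alpha),
# S_u(t|x) must equal exp(-(t/m)**alpha).
x, alpha, gamma = [0.5, -1.0], 0.5, [1.0, 0.5]
m = math.exp(sum(a * b for a, b in zip(x, gamma))) ** (-1.0 / alpha)
t = 2.0
assert abs(surv_u(t, x, alpha, gamma) - math.exp(-(t / m) ** alpha)) < 1e-12
```

The same two functions reappear in the E-step and in the simulation design below, so the reparameterization check is worth having.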

Form of the likelihood function
As missing observations are inherent to the problem set-up and model framework, we propose to employ the EM algorithm to estimate the unknown parameters (McLachlan & Krishnan, 2007; Sy & Taylor, 2000; Peng & Dear, 2000; Balakrishnan & Pal, 2016). Implementing the EM algorithm requires the form of the complete data likelihood function. Let us define Δ_0 = {i : δ_i = 0} and Δ_1 = {i : δ_i = 1}. The missing observations in this context arise through the cure indicator J defined in (1). Note that J_i is known to equal 1 for all i ∈ Δ_1. However, for i ∈ Δ_0, J_i can be either 0 or 1, and is thus unknown or missing. Treating these J_i's as the missing data, we define the complete data as (L_i, R_i, δ_i, J_i, x_i, z_i), for i = 1, ..., n, which contain both observed and missing components. Under the interval censoring mechanism, the complete data likelihood function is

L_c = ∏_{i∈Δ_1} π(z_i) {S_u(L_i|x_i) − S_u(R_i|x_i)} × ∏_{i∈Δ_0} {1 − π(z_i)}^(1−J_i) {π(z_i) S_u(L_i|x_i)}^(J_i), (8)

and the corresponding complete data log-likelihood function is

l_c = Σ_{i∈Δ_1} [log π(z_i) + log{S_u(L_i|x_i) − S_u(R_i|x_i)}] + Σ_{i∈Δ_0} [(1 − J_i) log{1 − π(z_i)} + J_i log π(z_i) + J_i log S_u(L_i|x_i)], (9)

where S_u(t|x) is as given in Section 2.1 (Pal & Balakrishnan, 2017a). It can be further noted that l_c = l_c1 + l_c2, where l_c1 is a function that depends on the incidence part only and l_c2 is a function that depends on the latency part only; see Pal (2021).
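To make the decomposition l_c = l_c1 + l_c2 concrete, here is a sketch evaluating the two pieces for interval-censored mixture cure data, assuming the standard complete-data form (event subjects contribute π(z_i){S_u(L_i) − S_u(R_i)}, censored subjects contribute the J_i-weighted cure/survival terms); the numerical inputs are illustrative:

```python
import math

def complete_data_loglik(data, pi, su, weights):
    """Complete-data log-likelihood for interval-censored mixture cure data,
    split as l_c = l_c1 (incidence only) + l_c2 (latency only).
    Each record is (L, R, delta); `weights` are the (imputed or expected)
    cure indicators J_i; pi[i] is pi(z_i) and su[i] gives S_u(t|x_i)."""
    lc1 = lc2 = 0.0
    for (L, R, delta), w, p, s in zip(data, weights, pi, su):
        if delta == 1:                       # event observed in [L, R]: uncured
            lc1 += math.log(p)
            lc2 += math.log(s(L) - s(R))
        else:                                # right censored: J unknown
            lc1 += w * math.log(p) + (1 - w) * math.log(1 - p)
            lc2 += w * math.log(s(L))
    return lc1, lc2

# Toy inputs: unit-exponential susceptible survival, two subjects
# (one event in [1, 2], one censored at 3 with expected J = 0.3).
s = lambda t: math.exp(-t)
lc1, lc2 = complete_data_loglik(
    [(1.0, 2.0, 1), (3.0, math.inf, 0)],
    pi=[0.7, 0.7], su=[s, s], weights=[1.0, 0.3])
```

Only lc1 involves π(z_i) and only lc2 involves the latency parameters, which is what allows the two M-step maximizations in Section 2.4 to be carried out separately.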

Modeling the incidence part with the support vector machine
Let us assume, to assist our development, that the J_i for i ∈ Δ_0 are observed by some mechanism. The support vector machine maximizes the linear or non-linear margin between the two closest points belonging to the opposite classification groups (cured and susceptible). That is, the SVM solves the following (dual) optimization problem for d_i, i = 1, ..., n:

maximize Σ_{i=1}^n d_i − (1/2) Σ_{i=1}^n Σ_{j=1}^n d_i d_j (2J_i − 1)(2J_j − 1) Φ_k(z_i, z_j), subject to 0 ≤ d_i ≤ C and Σ_{i=1}^n d_i (2J_i − 1) = 0, (13)

where C is a parameter that trades off between the margin width and the misclassification proportion; smaller values of C cause the optimizer to look for a larger margin while allowing more misclassification. Φ_k(·, ·) is a symmetric positive semi-definite kernel function, which we take to be the radial basis function (RBF) kernel Φ_k(z_i, z_j) = exp{−||z_i − z_j||²/(2σ²)}. The RBF kernel is a popular choice owing to its robustness, implementing the idea that a linear classifier in a higher dimensional space can act as a non-linear classifier in the original, lower dimensional space. The parameter σ² determines the kernel width. Both hyper-parameters C and σ² are tuned to obtain the highest classification accuracy using cross-validation (Chang & Lin, 2011); a grid search can be used to determine C and σ². Low values of σ² result in overfitting and a jagged separator, while high values of σ² result in more linear, smoother decision boundaries. It is also recommended to standardize the covariate vector z.
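The RBF kernel evaluation itself is straightforward; the sketch below assumes the common parameterization exp{−||z_i − z_j||²/(2σ²)} (conventions for the kernel width vary across software, so this is one choice, not the only one):

```python
import math

def rbf_kernel(z1, z2, sigma2):
    """Radial basis kernel: exp(-||z1 - z2||^2 / (2 * sigma2)).
    One common parameterization; some implementations use a `gamma`
    multiplier instead of a kernel width sigma2."""
    sq = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return math.exp(-sq / (2.0 * sigma2))

# The kernel of a point with itself is always 1; distant points decay to ~0,
# which is what produces the locally adaptive (non-linear) boundary.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], sigma2=1.0))         # 1.0
print(rbf_kernel([0.0, 0.0], [3.0, 4.0], sigma2=1.0) < 1e-5)  # True
```

The decay rate explains the tuning remarks above: a small σ² makes each support vector influence only its immediate neighborhood (jagged boundary), while a large σ² makes all kernel values nearly equal (smoother, more nearly linear boundary).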
The mapping of J_i to 2J_i − 1 converts 0 and 1 to −1 and +1, respectively, which aids the formulation of the optimization problem under the SVM framework. Once the d_i's are obtained, we can derive the decision function

ψ(z) = Σ_{i=1}^n d_i (2J_i − 1) Φ_k(z_i, z) + b, (14)

where the intercept b is obtained from any support vector z_j with d_j > 0. For any new covariate vector z_new, the optimal decision or classification rule is given by the sign of ψ(z_new). As suggested by Li et al. (2020), the sequential minimal optimization (SMO) method, introduced by Platt (1999), can be applied to solve (13). As opposed to solving one large quadratic optimization problem to train an SVM model, SMO solves a series of smallest-possible quadratic subproblems; thus, SMO is a relatively inexpensive algorithm in terms of computing time. Any subject with covariate vector z_new is assigned to the susceptible group if ψ(z_new) > 0 and to the cured group if ψ(z_new) < 0.
In the given context, note that it is not enough to just classify subjects as being cured or susceptible.
It is also of interest to obtain estimates of the uncured probabilities π(z_i), or equivalently the cured probabilities 1 − π(z_i). For this purpose, we use the Platt scaling method to obtain an estimate of π(z_i) from the classification rule ψ(·) (Platt, 1999). The estimate of π(z_i) by the Platt scaling method is given by

π̂(z_i) = 1 / [1 + exp{A ψ(z_i) + B}], (15)

where A and B are obtained by maximizing the following function:

Σ_{i=1}^n [t_i log p_i + (1 − t_i) log(1 − p_i)], with p_i = 1 / [1 + exp{A ψ(z_i) + B}].

Here, t_i = (n^(1) + 1)/(n^(1) + 2) if J_i = 1 and t_i = 1/(n^(0) + 2) if J_i = 0, and n^(1) and n^(0) represent the number of subjects in the susceptible and cured groups, respectively.
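A minimal sketch of Platt scaling: fit A and B by gradient descent on the cross-entropy with Platt's smoothed targets. (Platt's original algorithm uses a Newton-type optimizer; plain gradient descent is used here only to keep the sketch short.)

```python
import math

def platt_scale(decision_vals, labels, iters=2000, lr=0.1):
    """Fit A, B in p(psi) = 1 / (1 + exp(A * psi + B)) by gradient descent
    on the cross-entropy with Platt's smoothed targets."""
    n1 = sum(labels)                 # susceptible (label 1) count
    n0 = len(labels) - n1            # cured (label 0) count
    t_pos = (n1 + 1.0) / (n1 + 2.0)  # smoothed target for label 1
    t_neg = 1.0 / (n0 + 2.0)         # smoothed target for label 0
    targets = [t_pos if y == 1 else t_neg for y in labels]
    A, B = -1.0, 0.0
    for _ in range(iters):
        gA = gB = 0.0
        for psi, t in zip(decision_vals, targets):
            p = 1.0 / (1.0 + math.exp(A * psi + B))
            gA += (t - p) * psi      # dL/dA of the cross-entropy loss
            gB += (t - p)            # dL/dB
        A -= lr * gA / len(labels)
        B -= lr * gB / len(labels)
    return A, B

# Toy, well-separated decision values: large positive psi -> susceptible.
vals = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
labs = [0, 0, 0, 1, 1, 1]
A, B = platt_scale(vals, labs)
prob = lambda psi: 1.0 / (1.0 + math.exp(A * psi + B))
assert prob(2.0) > 0.5 > prob(-2.0)  # probabilities track the decision values
```

The smoothed targets keep the fitted probabilities away from 0 and 1, which matters here because the π̂(z_i) feed directly into the Bernoulli imputation and the E-step weights below.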
We started our discussion of SVM based modeling of the incidence part with the assumption that the J_i's are observed and available for training. In practice, however, the cure status J_i is not known for i ∈ Δ_0. A multiple imputation based approach can be applied here to obtain π(z_i) with imputed values of J_i for i = 1, ..., n. The steps are as follows:
1. For a pre-defined integer N* and n* = 1, 2, ..., N*, set J_i^(n*) = 1 for i ∈ Δ_1 and, for i ∈ Δ_0, generate J_i^(n*) from a Bernoulli distribution with the current estimate of the conditional uncured probability.
2. For each of the N* imputed data sets, train the SVM and apply the Platt scaling method to obtain an estimate π^(n*)(z_i), i = 1, ..., n.
3. Average the N* estimates to obtain π(z_i) = (1/N*) Σ_{n*=1}^{N*} π^(n*)(z_i).

Development of the EM algorithm
The E-step in the EM algorithm involves finding the conditional expectation of the complete data log-likelihood function in (9), given the current estimates (say, at the (r + 1)-th iteration step) and the observed data. This is equivalent to finding the conditional expectation of J_i given the observed data, π(z_i) and (α, γ^T)^T, as

w_i^(r+1) = E[J_i | observed data] = δ_i + (1 − δ_i) π^(r)(z_i) S_u^(r)(L_i|x_i) / {1 − π^(r)(z_i) + π^(r)(z_i) S_u^(r)(L_i|x_i)}, (18)

where the superscript (r) indicates evaluation at the current estimates. Note that (18) implies that w_i^(r+1) = 1 for all i ∈ Δ_1. We obtain the conditional expectation of l_c by simply replacing the J_i's with w_i^(r+1) in (9). We denote this conditional expectation by

Q = Q_c1 + Q_c2, (19)

where

Q_c1 = Σ_{i∈Δ_1} log π(z_i) + Σ_{i∈Δ_0} [w_i^(r+1) log π(z_i) + (1 − w_i^(r+1)) log{1 − π(z_i)}] (20)

is a function that depends on the incidence part only and

Q_c2 = Σ_{i∈Δ_1} log{S_u(L_i|x_i) − S_u(R_i|x_i)} + Σ_{i∈Δ_0} w_i^(r+1) log S_u(L_i|x_i) (21)

is a function that depends on the latency part only. The M-step updates the parameters in Q_c1 and Q_c2. For r = 0, 1, ..., the procedure for the (r + 1)-th iteration step of the EM algorithm is given below.
1. Carry out the multiple imputation technique, as described in Section 2.3, using the current weights w_i^(r+1) as the imputation probabilities, to obtain the updated estimates π^(r+1)(z_i), i = 1, ..., n.
2. Obtain (α^(r+1), γ^(r+1)T) by maximizing the function Q_c2, as defined in (21), with respect to α and γ.
3. Check for convergence as follows: ||θ^(r+1) − θ^(r)||_2 < ε, where θ^(r) = (α^(r), γ^(r)T)^T, ε > 0 is some predetermined and sufficiently small tolerance and ||·||_2 is the L_2-norm. If this criterion is satisfied, stop the algorithm; in that case, π^(r+1)(z_i), for i = 1, ..., n, and (α^(r+1), γ^(r+1)T)^T are the final pointwise estimates. Otherwise, continue to Step 4.

4. Update w_i^(r+2) using (18), where S_u^(r+1)(t) = exp{−(t/m_i^(r+1))^(α^(r+1))} and m_i^(r+1) = {exp(x_i^T γ^(r+1))}^(−1/α^(r+1)).
5. Set r to r + 1 and return to Step 1.
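Putting the E-step together, the conditional expectation of the cure indicator given in (18) is a one-line computation; a sketch, with `su_L` standing for S_u(L_i|x_i) evaluated at the current parameter estimates:

```python
def e_step_weight(delta, pi, su_L):
    """Conditional expectation of the cure indicator J_i given the data.
    An observed event (delta = 1) means the subject is surely uncured
    (w = 1); for a right-censored subject, Bayes' rule gives
    w = pi * S_u(L) / (1 - pi + pi * S_u(L))."""
    if delta == 1:
        return 1.0
    num = pi * su_L
    return num / (1.0 - pi + num)

# A censored subject with a high cure probability (low pi) and low
# residual survival gets a small weight; an event always gets weight 1.
assert e_step_weight(1, 0.3, 0.1) == 1.0
w = e_step_weight(0, 0.5, 0.2)      # 0.1 / (0.5 + 0.1) = 1/6
assert abs(w - 1 / 6) < 1e-12
```

These weights are exactly what Step 1 feeds into the Bernoulli imputation and what Q_c2 in (21) uses as multipliers for the censored subjects.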

Calculating the standard errors
The standard errors are estimated by non-parametric bootstrapping. For b = 1, ..., B, the b-th bootstrapped data set is obtained by resampling with replacement from the original data, with the same sample size as the original data. We then carry out Steps 1-5 of the EM algorithm, as detailed in Section 2.4, to obtain estimates of the model parameters for each bootstrapped data set. This gives B estimates of each model parameter, and for each parameter the standard deviation of these B estimates provides an estimate of its standard error.
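The bootstrap procedure can be sketched generically. In the paper the `estimator` argument would be the full EM fit of Section 2.4; here the sample mean stands in so that the sketch runs quickly:

```python
import random
import statistics

def bootstrap_se(data, estimator, B=200, seed=42):
    """Non-parametric bootstrap standard error: resample the data with
    replacement B times, re-run the estimator on each resample, and take
    the standard deviation of the B estimates."""
    rng = random.Random(seed)
    n = len(data)
    estimates = [estimator([data[rng.randrange(n)] for _ in range(n)])
                 for _ in range(B)]
    return statistics.stdev(estimates)

# Sketch with the sample mean as the estimator; for the cure model, each
# resample would instead be run through the EM algorithm of Section 2.4.
data = [1.2, 0.8, 1.5, 2.1, 0.4, 1.9, 1.1, 0.7, 1.3, 1.6]
se = bootstrap_se(data, lambda d: sum(d) / len(d))
print(round(se, 3))
```

Because each resample repeats the entire EM fit, B trades off the stability of the standard error estimate against total computing time.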

Finding the initial values
To start the EM algorithm, we need initial values of π(z_i), for i = 1, ..., n, along with α and γ. For an initial guess of π(z_i), we first treat the censoring indicator δ_i, i = 1, ..., n, as the cure indicator (i.e., δ_i = 0 implies J_i = 0 and δ_i = 1 implies J_i = 1). We then apply the SVM to obtain the classification rule, as given in (14), and finally apply the Platt scaling method, as given in (15), to obtain π(z_i). To obtain initial guesses of the latency parameters α and γ, we make use of the form of the survival function of the susceptible subjects, i.e., S_u(t_i) = exp{−(t_i/m_i)^α}, where m_i = {exp(x_i^T γ)}^(−1/α). This form implies that log{−log S_u(t_i)} = α log t_i + x_i^T γ, i = 1, ..., n. Hence, we can fit a linear regression model with log{−log S_u(t_i)} as the response to obtain estimates of α and γ, which serve as the initial guesses. For this purpose, S_u(t_i) can be estimated using the non-parametric Kaplan-Meier estimator. Since the data are interval censored, we can take, for example, t_i = (L_i + R_i)/2 when δ_i = 1 and t_i = L_i when δ_i = 0, for i = 1, ..., n. Note that this procedure may result in a negative estimate of α; in that case, we take the initial guess of α to be 0.05 or 0.1.
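The initial-value regression log{−log S_u(t_i)} = α log t_i + x_i^T γ is ordinary least squares without an intercept. A single-covariate sketch using the 2×2 normal equations (the α floor follows the text; the numerical inputs are illustrative and noise-free so that the true values are recovered exactly):

```python
import math

def initial_values(times, xs, su_hats):
    """Least-squares fit of log(-log S_u(t)) = alpha * log(t) + gamma * x
    (single-covariate sketch) via the 2x2 normal equations."""
    ys = [math.log(-math.log(s)) for s in su_hats]
    a = [[0.0, 0.0], [0.0, 0.0]]   # X'X
    b = [0.0, 0.0]                 # X'y
    for t, x, y in zip(times, xs, ys):
        row = (math.log(t), x)
        for j in range(2):
            b[j] += row[j] * y
            for k in range(2):
                a[j][k] += row[j] * row[k]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    alpha = (b[0] * a[1][1] - b[1] * a[0][1]) / det   # Cramer's rule
    gamma = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return max(alpha, 0.05), gamma   # floor alpha at 0.05 as in the text

# Noise-free check: S_u(t) = exp(-t**0.5 * exp(0.8 * x)) should return
# alpha ~ 0.5 and gamma ~ 0.8.
times = [0.5, 1.0, 2.0, 3.0, 4.0]
xs = [0.2, -0.3, 0.5, -0.1, 0.4]
su = [math.exp(-(t ** 0.5) * math.exp(0.8 * x)) for t, x in zip(times, xs)]
alpha, gamma = initial_values(times, xs, su)
```

With Kaplan-Meier estimates of S_u(t_i) in place of the exact values, the recovered (α, γ) are only rough, but that is all the EM algorithm needs to start.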

Simulation study
In this section, we assess the performance of the proposed SVM based EM algorithm in estimating the parameters of the mixture cure rate model for interval censored data. We generate two random values x_1 and x_2 independently from the standard normal distribution and assume x = z with x = (x_1, x_2)^T. We consider two different sample sizes, n = 300 and n = 400, and use three different links (Scenarios 1-3) to generate the uncured probabilities π(z). Scenario 1 is the standard logistic regression model, which yields a linear classification boundary. Scenarios 2 and 3, on the other hand, yield non-linear or more complex classification boundaries, as shown in Figure 1. Figure 2 plots the simulated uncured probabilities against the covariates z_1 and z_2.
We assume the lifetimes of the susceptible subjects follow the proportional hazards structure with hazard function h(t|x) = h_0(t) exp(γ_1 x_1 + γ_2 x_2), where h_0(t) = αt^(α−1). As discussed before, this hazard function implies that the susceptible lifetime follows a Weibull distribution with shape parameter α and scale parameter {exp(γ_1 x_1 + γ_2 x_2)}^(−1/α). We take the true values of (α, γ_1, γ_2) to be (0.5, 1, 0.5). The censoring time is generated from a Uniform(0, 20) distribution. Under these settings, the cure probabilities range from 50% to 65%, and the overall censoring proportions range from 60% to 75%. To generate interval censored lifetime data (L_i, R_i, δ_i), i = 1, 2, ..., n, we carry out the following steps:
Step 1: Generate a Uniform(0, 1) random variable U_i and a censoring time C_i.
Step 2: If U_i ≤ 1 − π(z_i), set L_i = C_i, R_i = ∞ and δ_i = 0.
Step 3: If U_i > 1 − π(z_i), generate T_i from a Weibull distribution with shape parameter α and scale parameter {exp(γ_1 x_1i + γ_2 x_2i)}^(−1/α).
Step 4: Generate L_1i from a Uniform(0.2, 0.7) distribution and L_2i from a Uniform(0, 1) distribution, build a sequence of inspection intervals from them, and set (L_i, R_i) as the interval containing T_i, with δ_i = 1.
All simulations are done using the R statistical software (version 4.0.4) and all results are based on M = 500 Monte Carlo runs. To employ our proposed methodology, we set the number of imputations in the multiple imputation technique to 5, which is in line with Li et al.
(2020); see also Wu & Yin (2013). In Table 1, we report the bias and mean squared error (MSE) of the estimated uncured probability π̂(z) and the estimated susceptible survival probability Ŝ_u = Ŝ_u(·, ·; x). These are computed by averaging over subjects and Monte Carlo runs; for example, the bias of the uncured probability is (1/(Mn)) Σ_k Σ_i {π̂^(k)(z_i) − π^(k)(z_i)} and the MSE is the corresponding average of squared differences, where π^(k)(z_i) and S_u^(k)(L_i, R_i; x_i) denote the true uncured probability and susceptible survival probability, respectively, and π̂^(k)(z_i) and Ŝ_u^(k)(L_i, R_i; x_i) denote the corresponding estimates, for the i-th subject in the k-th Monte Carlo run.
From Table 1, it is clear that the bias and MSE of the estimated uncured probability from the logistic based EM algorithm are smaller than those from the proposed SVM based EM algorithm when logistic regression is the correct model (Scenario 1). However, when the true model for the uncured probability is not logistic (Scenarios 2 and 3), the proposed SVM based EM algorithm produces smaller bias and MSE for the estimated uncured probability. Figure 3 presents the biases of the estimates of the individual uncured probabilities plotted against each covariate.
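The bias and MSE summaries used in Table 1 are simple averages of (estimate − truth) and its square over subjects and Monte Carlo runs; a sketch with illustrative numbers:

```python
def bias_and_mse(true_vals, est_vals):
    """Monte Carlo bias and MSE over a flattened list of (subject, run)
    pairs: bias = mean(est - true), MSE = mean((est - true)**2)."""
    diffs = [e - t for t, e in zip(true_vals, est_vals)]
    n = len(diffs)
    bias = sum(diffs) / n
    mse = sum(d * d for d in diffs) / n
    return bias, mse

# Toy values: three "true" uncured probabilities and their estimates.
b, m = bias_and_mse([0.5, 0.6, 0.7], [0.52, 0.60, 0.71])
assert abs(b - 0.01) < 1e-9          # mean of (0.02, 0.00, 0.01)
```

In the study itself the lists would hold all M × n subject-by-run values, so a small MSE reflects accuracy across the whole covariate space, not just on average.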
For the estimates of the susceptible survival probability, when the logistic regression model (Scenario 1) is the true model for the uncured probability, the logistic based EM algorithm produces smaller biases and MSEs compared to the SVM based EM algorithm. On the other hand, when the true model for the uncured probability is non-logistic (Scenarios 2 and 3), the SVM based EM algorithm results in smaller biases and MSEs when compared to the logistic based EM algorithm. Figure 4 presents the biases of the estimates of the susceptible survival probabilities plotted against each covariate. These findings clearly indicate that the SVM based EM algorithm is able to capture more complex and non-linear classification boundaries, where the standard logistic based EM algorithm produces relatively larger bias and MSE.
In Table 2, we present the estimation results for the latency parameters. In particular, we compare the bias, standard deviation (SD) and MSE of the latency parameter estimates under the proposed SVM based mixture cure rate model and the traditional logistic regression based mixture cure rate model. The bias, SD and MSE corresponding to the logistic regression based EM algorithm are smaller when logistic regression is the true model for the uncured probabilities (Scenario 1). However, when the true model for the uncured probabilities is non-logistic (Scenarios 2 and 3), the SVM based EM algorithm generally results in smaller bias, SD and MSE (although for some parameters the SVM method yields larger biases, SDs and MSEs than the logistic method). With an increase in the sample size, the bias, SD and MSE tend to decrease further, as expected.
Summarizing the findings from Tables 1 and 2, we conclude that the proposed SVM based EM algorithm performs better than the standard logistic regression based EM algorithm, with respect to both the incidence and the latency parts of the mixture cure rate model, when the true classification boundary is non-linear and complex. This clearly demonstrates the ability of the proposed SVM based model to handle complex non-linear classification boundaries.
Although, in practice, the cured status is unobserved for real data, we do know which observations can be considered cured when we simulate data. Using this information on the cured status for the simulated data, we can directly compare the proposed SVM based mixture model with the logistic regression based model in terms of classification accuracy; the results are presented in Table 3 and are based on 500 Monte Carlo runs with n = 400 in each run. It is once again clear that under Scenarios 2 and 3 (i.e., when the classification boundaries are non-linear), the performance (or accuracy) of the SVM based model is better than that of the logistic regression based model; in particular, the performance of the SVM based model is significantly better under Scenario 2. However, under Scenario 1 (i.e., when the classification boundary is linear), the logistic regression based model performs slightly better than the SVM based model.

Comparison with spline-based mixture cure model and using non-parametric baseline survival function
To further demonstrate the merits of our proposed model, we also compare it with the spline regression-based mixture cure model, which can likewise capture complex patterns in the data. In addition, we relax the parametric assumption on the baseline hazard function and estimate the baseline survival function non-parametrically using a Turnbull-type estimator. Considering Scenario 3 and three different sample sizes (n = 300, 600, 900), we present the results in Table 4; the corresponding ROC curves are presented in Figure 6. It is once again clear that our proposed SVM-based model performs better than both the spline-based and logistic regression-based models.

Illustrative example: smoking cessation data analysis
We further demonstrate our proposed methodology using a dataset from a smoking cessation study (Murray et al., 1998; Wiangnak & Pal, 2018). The study contains 223 subjects who had enrolled (Banerjee & Carlin, 2004; Kim & Jhun, 2008). Only those subjects who had tried to quit smoking at least once and who had identifiable Minnesota zip codes during the study period are included in the analysis set. These subjects were all smokers at the time of enrollment and were randomly assigned to two groups: smoking intervention (SI, treatment group) and usual care (UC, control group). The subjects were monitored once every year for 5 consecutive years, and information on whether they had relapsed (1: Yes, 0: No) is present in the data set. A relapse means resumption of smoking, and the event of interest for our illustration is the time to relapse. The exact relapse time is unobserved, since a relapse could have happened at any time between two consecutive annual visits; hence, the study falls under the scope of interval censored data analysis. Information on several additional variables is also available, e.g., gender (GEN, 1: Female, 0: Male), duration of smoking (DUR, time in years elapsed between commencement of smoking and entry into the study) and average number of cigarettes smoked per day (AVGCIG) before the study period. These variables are treated as covariates, since these factors can plausibly influence relapse. Of those who relapsed, most did so in the first year of their smoking cessation trial (see Figure 7). In Figure 8, we present the Kaplan-Meier curve; it clearly levels off at a significant non-zero proportion, indicating a greater likelihood of the presence of a cured fraction in the data. In Table 5, we present a few important descriptive statistics related to the study. In our application, we consider DUR (x_1), AVGCIG (x_2) and GEN (x_3) as the covariates of interest. We fit the proposed SVM based mixture cure rate model and, for comparison, we also fit the logistic regression based mixture cure rate model. First, we draw inference on the incidence part of the model. In Figure 9, for each gender, we plot the estimates of the uncured probabilities against

Conclusion
The support vector machine has received a great amount of interest over the past two decades and has been shown to perform well in a wide array of problems, including face detection, text categorization and pedestrian detection. However, the use of the SVM in the context of cure rate models is new and not well explored. In this manuscript, we have proposed a new cure rate model that uses the SVM to model the incidence part and a proportional hazards structure to model the latency part for survival data subject to interval censoring. The new cure rate model inherits the properties of the SVM and can capture more complex classification boundaries. For estimation, we have proposed an EM algorithm in which sequential minimal optimization together with the Platt scaling method is employed to estimate the uncured probabilities. In this regard, owing to the unavailability of some cured statuses, we make use of a multiple imputation based approach to generate the missing cured statuses. Because of the complexity of the proposed model and estimation method, we approximate the standard errors of the estimated parameters using non-parametric bootstrapping. Through a simulation study, we have shown that when the true classification boundary is non-linear, the proposed SVM based model performs better than the standard logistic regression based model; this holds for both the incidence and latency parts of the model. As future research, it is of great interest to extend the proposed model to accommodate a competing risks scenario (Balakrishnan & Pal, 2015; Davies et al., 2021). It is also of interest to explore other machine learning algorithms (e.g., neural networks or tree-based approaches) to study more complicated cure rate models, such as those that allow for the elimination of risk factors (Pal & Balakrishnan, 2016, 2017a,b, 2018; Majakwara & Pal, 2019) and those that belong to a transformation family of cure models (Wang & Pal, 2022). We are currently looking at some of these problems and hope to report the findings in upcoming manuscripts.

Figure 1: Simulated cured and uncured observations for the three considered scenarios

Figure 3: Bias of the uncured probabilities with respect to each covariate for the three considered scenarios

Figure 4: Bias of the susceptible survival probabilities with respect to each covariate for the three considered scenarios

Figure 6: ROC curves for different mixture cure models (MCM) and sample sizes

Figure 7: Number of relapses between consecutive annual visits from study entry

Figure 8: Kaplan-Meier curve for the smoking cessation data

Table 1: Comparison of bias and MSE of the uncured probability and susceptible survival probability

Table 2: Estimation results corresponding to the latency parameters

Table 4: Comparison of the SVM-based model with spline-based and logistic regression-based models

Table 5: Distribution of proportion of relapse, average duration and average number of cigarettes smoked per year, by gender and treatment group. SI: smoking intervention; UC: usual care; n: sample size; %: percentage of the total; pr: proportion of relapse; CI: confidence interval; Avg Dur: average of DUR; Avg Cig: average of AVGCIG; SD: standard deviation

Table 6: Estimation results corresponding to the latency parameters for the smoking cessation data