Adaptive aggregation for longitudinal quantile regression based on censored history process

Most studies of longitudinal quantile regression rely on a correct model specification. Nevertheless, a single model can hardly perform precisely under all conditions, and assessing which conditions are (approximately) satisfied in order to determine the optimal model is rather difficult. In the case of the mixed effect model, misspecification of the fixed effect part degrades the predicting accuracy of the random effects and affects the efficiency of the cumulative function estimator. On the other hand, limited research has focused on incorporating multiple candidate procedures in longitudinal data analysis, which is an urgent need. This paper proposes an exponential aggregation weighting algorithm for longitudinal quantile regression. Based on a secondary smoothing loss function, we establish oracle inequalities for the aggregated estimator. The proposed method is applied to evaluate the cumulative τth quantile function for the additive mixed effect model with a right-censored history process, and an aggregation-based best linear prediction for the random effects is constructed as well. We show that the asymptotic properties are conveniently derived owing to the smoothing scheme. Simulation studies are carried out to exhibit the validity of the method, and our approach is illustrated on a data set from a multicenter automatic defibrillator implantation trial.


Introduction
Quantile regression (Koenker and Bassett 1 ) is an increasingly prevalent tool. It provides a constructive strategy for examining the effect of covariates on the entire response distribution and offers a more flexible, robust approach to data analysis. In longitudinal studies, the conditional τth quantile regression at a given probability level τ ∈ (0, 1) is assumed as

Q_τ(y(t)|x(t)) = q_τ(x(t)), (1)

where y(t) and x(t) are the response variable and the p × 1 covariate vector at time t, respectively, and q_τ(x(t)) is an unspecified function that absorbs the conditional τth quantile of the model errors given x(t). For a specified q_τ(x(t)), there is a great deal of literature on related statistical inference. Yin and Cai 2 proposed a working independence method for linear quantile regression (LQR) with correlated failure time data. Lu and Fan 3 estimated parameters in weighted quantile regression. Tang et al. 4 proposed a B-spline-based variable selection procedure for varying-coefficient quantile regression, and Wang and Sun 5 extended the approach to the partial linear varying coefficient model with a within-subject correlation structure.
A key issue in longitudinal quantile regression is dealing with censoring. Galvao et al. 6 proposed a three-step estimator for panel quantile regression with fixed individual effects and left censoring. Harding and Lamarche 7 developed a penalized estimation with a parametric or nonparametric mechanism to reduce attrition bias in Big Data, which is applicable to censored data as well. Recently, Liu et al. 8 discussed the estimation of the cumulative τth quantile function (τ-CQF) for a history process with a right-censored time-to-event variable. Nevertheless, most of the existing methodologies are based on a fixed specification of model (1). In fact, many sampling designs require consideration of the correlation between observations that belong to the same unit or cluster. The random effect provides a flexible medium for such complex data analysis. Koenker, 9 Geraci and Bottai, 10 and Kulkarni et al. 11 generalized equation (1) to the longitudinal quantile mixed effect model (LQMM) with a linear fixed effect and random intercepts. More recently, Geraci 12 developed an algorithm for the covariate-affected random effect structure via a Gaussian quadrature technique, and Geraci 13 proposed a spline-based approximation for the nonparametric additive form of the fixed effect pattern.
In practice, identifying an exactly correct model is rather difficult for the data at hand. Traditional model selection is commonly used to search for the most appropriate one. Machado 14 proposed a robust selection criterion for general M-estimators which is applicable to quantile regression, and Koenker 15 suggested the traditional Akaike information criterion. A shortcoming of these methods is that they ignore the uncertainty associated with the choice of the model while reporting precision estimates. One approach to incorporating such uncertainty is model averaging, which combines competing specifications via propensity weights. Since the pioneering contribution of Hjort and Claeskens, 16 the number of works on the topic has proliferated in the past few years. For instance, Hansen 17 proposed the Mallows' criterion for selecting weights of least squares regression models; Lu and Su 18 used a jackknife model averaging strategy to select weights in general quantile regression; and Wang et al. 19 generalized the method to high-dimensional covariates.
Note that the aforementioned approaches are based on weights derived from a prescribed information criterion and emphasize the parametric model assumption. To enable comparison with complex nonparametric forms, Yang 20 proposed an adaptive estimation for least squares regression by model mixing. Recently, Shan and Yang 21 and Gu and Zou 22 applied the method to construct aggregation algorithms for quantile regression and expectile regression, respectively. The resulting estimators are theoretically shown to approach the best candidate. However, little work has considered aggregation for longitudinal quantile regression, especially in view of τ-CQF evaluation. To fill this void, we contribute an adaptive estimation for longitudinal quantile regression. An exponential aggregation weighting (EAW) algorithm is adopted to combine multiple candidate estimating procedures, which efficiently guards against misspecification of the fixed effect q_τ(x(t)) and performs optimally without knowing which of the original procedures works best.
To obtain the asymptotic properties of the aggregated estimator, we establish oracle inequalities under particular risk measures. Since the risk bound under the squared error loss can hardly be derived in quantile regression (Gu and Zou 22 ), we propose a novel smoothing scheme for the quantile check loss function, which supplies strong convexity and L-smoothness. Furthermore, we apply the EAW algorithm to the additive mixed effect model and construct an estimator of the τ-CQF. One can verify that the estimator behaves consistently with that of the best procedure under the secondary smoothing loss. We also develop an EAW-based best linear prediction (EAW-BLP) of the random effects via the proposed weights. A real dataset from the multicenter automatic defibrillator implantation trial (MADIT) is analyzed to illustrate the usability of the proposed methodology.
The rest of the paper is organized as follows. In Section 2.1 we introduce the EAW algorithm for model (1). We propose a secondary smoothing approximation to the check loss function and employ it in the weighting algorithm. We apply the EAW algorithm to the additive mixed effect model and propose an estimator of the τ-CQF based on a right-censored history process in Section 2.2, and the prediction of the random effects is modified with the aggregated weights in Section 2.3. We establish oracle inequalities of the EAW estimator under the original prediction risk and the squared loss in Section 3.1, and the related consistency of the τ-CQF estimator in Section 3.2. Simulation studies are presented in Section 4.1 to examine the validity of our algorithm, and we check the predicting accuracy of the EAW-BLP for random effects in Section 4.2. We apply the method to the MADIT dataset in Section 5. All technical proofs are attached in the Appendix.

Methodology
In this section, we establish several aggregated estimators for longitudinal quantile regression. We first propose the EAW algorithm for model (1), and apply it to evaluate the cumulative τth quantile function (τ-CQF) for the additive mixed effect model with a right-censored history process. In addition, the prediction of the random effects is discussed in the aggregation framework as well.

EAW algorithm
Suppose a sample of n subjects is observed, and for the ith subject, y_i(t_ij) and x_i(t_ij) are collected at time points t_ij (j = 1, … , n_i), where n_i is the respective total number of observations. Assume that the observed times of all subjects are bounded by T. There is a large corpus of strategies for obtaining a consistent estimator from the observation pairs (x_i(t_ij), y_i(t_ij), t_ij) if the structure of equation (1) is known. However, the specification may change in different contexts, and relying on a unique estimator risks a loss of efficiency. Let Δ = {δ_k : k ≥ 1} be a set of candidate procedures, where δ_k denotes the kth procedure for estimating q_τ(x(t)). Further, let q̂_τ^(k)(x(t)) be the related estimator from δ_k. There is no special restriction on δ_k: the candidates can be either parametric or nonparametric. For instance, δ_1 may be a standard parametric linear regression, δ_2 a varying-coefficient regression, δ_3 a nonparametric additive model, etc. On the other hand, the cardinality of Δ can be either finite or countably infinite. The goal in this section is to combine the multiple candidate procedures in Δ to produce an adaptive estimator.
Gu and Zou 22 proposed aggregated regression for the asymmetric squared loss. Note that for quantile regression the loss function is ρ_τ(v) = v(τ − I(v < 0)). 15 One can hardly obtain an inequality similar to (2.4) in Gu and Zou 22 since ρ_τ cannot be justified as strongly convex or L-smooth, whereas these properties would provide a succinct course in the follow-up theoretical study. Although Shan and Yang 21 considered a surrogate of ρ_τ(v), which is nondifferentiable at v = 0, they only contributed the oracle inequality under the surrogate loss. Comparatively, direct smoothing strategies such as piecewise smoothing of ρ_τ(v) 23,24 involve the first-order power (or equivalently, the absolute value) of v, and thus it is hard to guarantee strong convexity, which is essential to attain the risk bound under the squared error loss. To overcome the difficulty, we propose a secondary smoothing approximation of ρ_τ(v),

ρ*_{τ,h}(v) = ρ_{τ,h}(v) + a(h)v², (2)

where a(h) is a function of h on (0, +∞), and ρ_{τ,h}(v) is a smoothing approximation of ρ_τ(v) that is convex and L-smooth on ℝ. In this paper we use the quantile Huber structure of Aravkin et al. 24: ρ_{τ,h}(v) equals τ(v − τh∕2) for v > τh, equals v²∕(2h) for −(1 − τ)h ≤ v ≤ τh, and equals (1 − τ)(−v − (1 − τ)h∕2) for v < −(1 − τ)h. From Lemma 1 in the Appendix, it can be proved that ρ*_{τ,h}(v) is strongly convex, L-smooth, and attains its minimum at the same point as ρ_τ(v).
There are two smoothing coefficients in the secondary smoothing loss function (2). Figure 1 plots the curves of ρ*_{τ,h}(v) for varying pairs of (h, a(h)). One can see that the skewness between the tails of ρ*_{τ,h}(v) and ρ_τ(v) is dominated by a(h), while the quality of the global approximation to the original check loss is governed by h. Therefore, in order to keep ρ*_{τ,h}(v) close to ρ_τ(v) across the board, h and a(h) should be tuned jointly, with a(h) chosen smaller than h. Related discussions of these constraints are presented in the next sections to protect the asymptotic efficiency of the aggregated estimation.
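The secondary smoothing idea can be sketched numerically as follows. This is a minimal illustration assuming the quantile Huber form for ρ_{τ,h} and the quadratic term a(h)v² for the secondary smoothing; the paper's exact constants may differ.

```python
import numpy as np

def check_loss(v, tau):
    # original quantile check loss: rho_tau(v) = v * (tau - I(v < 0))
    v = np.asarray(v, dtype=float)
    return v * (tau - (v < 0))

def quantile_huber(v, tau, h):
    # convex, L-smooth approximation of the check loss (quantile Huber):
    # quadratic on [-(1 - tau) * h, tau * h], linear with the original
    # slopes tau and -(1 - tau) outside that interval
    v = np.asarray(v, dtype=float)
    return np.where(
        v > tau * h, tau * (v - tau * h / 2),
        np.where(v < -(1 - tau) * h,
                 (1 - tau) * (-v - (1 - tau) * h / 2),
                 v ** 2 / (2 * h)))

def secondary_smoothing_loss(v, tau, h, a_h):
    # secondary smoothing: the added a(h) * v^2 term supplies strong
    # convexity while keeping the minimum at v = 0
    return quantile_huber(v, tau, h) + a_h * np.asarray(v, dtype=float) ** 2
```

On a grid, one can verify that the quantile Huber part deviates from the check loss by at most max{τ², (1 − τ)²}h∕2, so small h keeps the approximation tight while a(h) only perturbs the tails.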
In what follows, we propose the EAW algorithm for longitudinal quantile regression. The algorithm is based on multiple splits of the observed dataset, as typically considered in machine learning, and utilizes the secondary smoothing loss during the weighting process.
Step 4. Repeat Steps 1-3 B times; the aggregated estimator q̂_τ(x(t)) of q_τ(x(t)) is then obtained by averaging the weighted combinations over the B splits.

Remark 1. In addition to the smoothing coefficients, there are several tuning parameters in Algorithm 1: λ controls the propensity assigned to the corresponding procedures in the weighting process, and {π_k : k ≥ 1} are regarded as prior weights of the candidates. In longitudinal quantile regression, the behavior is similar to that in Shan and Yang 21 and Gu and Zou 22: λ → 0 averages all of the candidate procedures, while λ → ∞ selects the candidate with the best historic smoothing loss on the test set. Theoretically, a reasonable criterion for choosing λ can be set, and its upper bound will be exhibited in Section 3.1.

Remark 2.
Although the EAW algorithm is computationally intensive when the number of random partitions is large, the multiple splits eliminate the loss of accuracy caused by any single random split of the sample. Indeed, the splits can be carried out in parallel, since the results of each loop are mutually independent. All of the above will be shown in the theoretical proofs and empirical studies. Besides, if the sequential updating mechanism is of less importance in practice, one can use the full data set to implement Step 2 and produce the weights via the split sets.
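The core weighting step described above can be sketched as follows: candidate procedures are scored by their cumulative (smoothed) test-set loss, and exponential weights are formed with tuning parameter λ and priors π_k. Function and variable names are illustrative, not the paper's notation.

```python
import numpy as np

def eaw_weights(test_losses, lam=1.0, priors=None):
    """Exponential aggregation weights for K candidate procedures.

    test_losses: length-K array of cumulative secondary-smoothing losses
    of each candidate on the test split; lam: tuning parameter (lam -> 0
    averages all candidates, lam -> inf selects the single best);
    priors: prior weights pi_k (uniform by default).
    """
    test_losses = np.asarray(test_losses, dtype=float)
    K = len(test_losses)
    priors = np.full(K, 1.0 / K) if priors is None else np.asarray(priors, float)
    # subtract the minimum loss for numerical stability before exponentiating
    logits = -lam * (test_losses - test_losses.min())
    w = priors * np.exp(logits)
    return w / w.sum()
```

For example, with losses (10, 2, 5) the second candidate receives the largest weight, and increasing λ pushes the weight vector toward a one-hot selection of that candidate.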

Estimating τ-CQF with censored history process
In this subsection, we evaluate the τ-CQF for the LQMM with a right-censored history process. For data of the form {(y_i(t_ij), x_i(t_ij), z_i(t_ij)) : t_ij ∈ [0, T], i = 1, … , n, j = 1, … , n_i}, we consider the additive mixed effect model whose τth conditional quantile is

Q_τ(y_i(t_ij)|x_i(t_ij), z_i(t_ij), u_i) = q_{F,τ}(x_i(t_ij)) + z_i(t_ij)^⊤ u_i, (4)

where u_i is the q-dimensional subject-specific zero-mean random effects vector and z_i(t_ij) is an endogenous time-dependent covariate vector associated with u_i. Generally, we assume u_i = (u_{1,i}, … , u_{q,i})^⊤ is independent of the model error term and identically distributed according to a density f_u with zero mean and a τ-dependent q × q covariance matrix Φ_τ. Moreover, denoting z_i = (z_i(t_{i1}), … , z_i(t_{in_i}))^⊤, y_i|x_i, z_i, u_i is supposed to be independently distributed with the joint asymmetric Laplace (AL) distribution, which provides a parametric link between minimizing the sum of check losses and maximizing the AL-likelihood (see Koenker and Machado, 25 Geraci and Bottai 26 ). Note that y_i|x_i, z_i, u_i ∼ AL(μ_i, σ, τ) has unknown parameters σ and μ_i, where the jth component of μ_i is specified in (5). Fitting model (4) is equivalent to estimating the parameters in (5). If q_{F,τ}(x_i(t_ij)) is correctly specified, a number of methodologies have been proposed for related inference, including parametric methods (e.g. Geraci 12 ) and nonparametric methods (e.g. Geraci 13 ). Since the correct specification cannot be determined across different conditions, the EAW algorithm is applicable for constructing the best approximation. Specifically, let Δ = {δ_k : k = 1, … , K} be the set of K candidate procedures; an aggregated estimator q̂_{F,τ}(x(t)) for the fixed effect part in equation (4) is given in (6), where q̂^(k)_{F,τ,b}(x(t)) is the estimator produced by δ_k on the bth random split and Ω̂^b_{k,i} is given in (3). Since the random effect part is specified with a known structure, its estimation is already involved in producing q̂^(k)_{F,τ,b}(x(t)) and it is not added into the weighting process.
For the ith subject, let T_i and C_i represent the terminal time and censoring time, respectively; {(y_i(t_ij), x_i(t_ij), z_i(t_ij)) : t_ij ≥ 0} can hence be taken as observations of the history process that are ceased by the observed event time T*_i = min(T_i, C_i). Let K(t) be the survival function of the actual cut-off time T*_i. Without loss of generality, we assume that the censoring time is independent of the terminal time and the history process for each subject. In the light of Liu et al., 8 the τth quantile state function (τ-QSF) of (4) at time t is defined as H_τ(t) = Q_τ(y(t)|x(t), z(t), u), and the τ-CQF at time s is the accumulation Ψ_τ(s) of H_τ(t) over [0, s]. We combine the inverse probability of censoring weighting (IPCW) method with the EAW estimator (6) to estimate the τ-QSF, which yields the estimator in (7). In practice, however, one can only observe y_i(t) and the covariate x_i(t) at discrete time points t_ij, j = 1, … , n_i; i = 1, … , n, rather than over a continuous interval. Therefore, the linear interpolation in Deng 27 is exploited to compute (7).
Let D_i = {t_ij : j = 0, 1, … , n_i} be the set of actual observed time points for the ith subject, where t_i0 = 0 means that observations of all subjects are gathered at the initial time t = 0. We rearrange the t_ij of all subjects as 0 = t_(0) < ⋯ < t_(N), and let D = {t_(k) : k = 0, … , N} be the set of ordered distinct observed time points. For t_(k) ∈ D_1 ∪ ⋯ ∪ D_n, the fitted τ-QSF is given as Ĥ_τ(t_(k)), and for t ∈ (t_(k−1), t_(k)] within a time quantum it is obtained by linear interpolation. As a consequence, the estimator of the τ-CQF at all observed time points follows, and the estimation of (7) can be expressed through {Ĥ_τ(t_(k)) : k = 1, … , N} as in (8). For computational convenience in evaluating Ĥ_τ(t), we use a nonparametric approximation K̃ for K, and the estimators of H_τ(t) and Ψ_τ(t) are reformed as H̃_τ(t) and Ψ̃_τ(t) by replacing each Ĥ_τ(t_(k)) term accordingly.
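The interpolation-and-accumulation step can be sketched as below. This is a simplified illustration: the fitted QSF values are linearly interpolated between the ordered observed time points and accumulated by the trapezoid rule; the paper's estimator (8) additionally carries the IPCW weights inside Ĥ_τ, which are omitted here.

```python
import numpy as np

def cumulative_qsf(t_grid, H_hat, s):
    """Sketch of the tau-CQF: accumulate the fitted QSF over [0, s],
    with H linearly interpolated between the ordered observed time
    points (names and the fine integration grid are illustrative)."""
    t_grid = np.asarray(t_grid, dtype=float)
    H_hat = np.asarray(H_hat, dtype=float)
    ts = np.linspace(0.0, s, 1001)        # fine integration grid on [0, s]
    Hs = np.interp(ts, t_grid, H_hat)     # piecewise-linear interpolation
    # trapezoid-rule accumulation of the interpolated curve
    return float(np.sum((Hs[1:] + Hs[:-1]) * 0.5 * np.diff(ts)))
```

Because the interpolant is piecewise linear, the trapezoid rule accumulates it exactly up to the grid resolution, so the cumulative estimate is determined by the fitted values at the observed time points alone.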

EAW-based prediction for random effects
Predicting random effects for the LQMM is an ongoing research issue. Geraci and Bottai 26 proposed a best linear prediction (BLP) of u_i in the case of a linear fixed effect. For a given probability level, the predictor depends on the estimation of the parameters in (5) under a correct specification of q_{F,τ}(x_i(t_ij)) and on the construction of u_i, and it minimizes the mean squared error (MSE). In (4) we assume that the random effects have the linear pattern z_i(t_ij)^⊤ u_i; for the ith cluster, the BLP of u_i is given as û^(τ)_i in (9), where Cov_i is regarded as the covariance matrix of the random errors in (4). However, since diverse procedures result in different estimators, the main purpose of this subsection is to find the most effective prediction. A natural thought is to utilize the exponential aggregated weights to build the predictor among the candidate specifications of q_{F,τ}(x(t)).
As mentioned previously, candidate predictors can be generated with the full data set as long as we ignore the sequential updating mechanism. This is suitable since the predictor may lack efficiency if we only use a fraction of the samples to predict all random effects. Let û^(τ),k_i be the BLP under procedure δ_k for the ith cluster, and let Ω̂^b_{k,m} (N_0 + 1 ≤ m ≤ n) be the exponential aggregated weights from Algorithm 1. Averaging these weights over the test subjects and splits, the EAW-based BLP of the random effects can be computed as in (10).
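A minimal sketch of the combination in (10): the candidate BLPs for one cluster are mixed with the averaged aggregation weights. Shapes and names are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def eaw_blp(candidate_blps, avg_weights):
    """EAW-based BLP sketch: combine candidate best linear predictions
    of the random effects u_i using aggregated weights averaged over
    test subjects and splits. candidate_blps has shape (K, q), one row
    per candidate procedure; avg_weights has shape (K,)."""
    candidate_blps = np.asarray(candidate_blps, dtype=float)
    w = np.asarray(avg_weights, dtype=float)
    w = w / w.sum()               # renormalize the averaged weights
    return w @ candidate_blps     # weighted combination, shape (q,)
```

When the weights concentrate on a single procedure, the combined predictor coincides with that procedure's BLP, mirroring the λ → ∞ selection behavior of the weights.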

Asymptotic properties
In this section, we establish the asymptotic properties of the proposed estimators. The adaptivity of the EAW estimator is demonstrated via oracle inequalities showing that its risks are automatically close to those of the best candidate. Using the secondary smoothing loss function (2), we derive a homologous risk bound under quadratic losses. On the other hand, the consistency of the proposed τ-CQF estimator is given as long as the candidate set contains a consistent procedure, which is conveniently verified under the smoothing strategy.

Oracle inequalities
We illustrate the theoretical properties of the EAW estimator of equation (1). The intended adaptivity of the estimator emerges via oracle inequalities, which provide statistical risk bounds under the original check loss and the squared error loss, respectively. For longitudinal data, the risk bounds depend on the formulation of data collection as well. Define the counting process of observation time points for the ith subject as N_i(t) = Σ_{j=1}^{n_i} I(t_ij ≤ t); following Fan and Li, 28 we assume N_i(t) to be a random sample from a certain population on the finite interval. Moreover, let the L_2 norm of a generic function f with respect to the distribution P_X of X be ‖f‖² = ∫ f²(x) dP_X. The following conditions are imposed for the theoretical results.
Condition (A.1) and the upper-boundedness of the sub-exponential norm are fairly common in related work on aggregation. The constraints are mild and are satisfied as long as q_τ(x(t)) and the variance of the model error are bounded almost surely. In addition, the boundedness condition ensures that the integral of ρ_{τ,h} stays close to that of ρ_τ. Based on the above conditions, the following theorem shows that the EAW estimator behaves optimally with respect to the best procedure. Moreover, the risk of the EAW estimator under the squared error loss satisfies the bound in (12), where M_0 = exp{(4TK_τ)^{−1} + (4e)^{−1}}, and H_1, K_τ, c, and c̄ are positive constants presented in the proof.

Remark 3.
Through the proof we have H_1 = Th·max{τ², (1 − τ)²}, so the risk bound under the original check loss is unavoidably dominated by the smoothing parameter, as the aggregated weights are built via the secondary smoothing loss function. On the other hand, the smoothing parameter h is restricted through K_τ so that it cannot be too large. When h is chosen near zero, the EAW estimator performs approximately as it would under the original loss. Comparing with Shan and Yang, 21 the convergence rate of the EAW estimator is fast as long as the smoothing coefficients satisfy the stated constraints. In practice, if one only cares about the prediction risk, the bound under the original check loss suffices; the squared-loss bound, however, provides an advantageous demonstration for the asymptotic properties of the τ-CQF estimator. From a model specification perspective, (12) shows that the EAW estimator enjoys an analogous consistency as long as Δ contains the correct candidate. This requires that c̄∕c is theoretically upper-bounded, which is by no means mutually exclusive with Condition (A.2).

Consistency for τ-CQF estimator
In this subsection, we demonstrate the consistency of the estimators in Section 2.
in the support of Φ_τ.
The aforementioned assumptions are widely considered in Zeng and Cai, 29 Deng, 27 and Liu et al., 8 and are essential for longitudinal time-to-event data with a censoring mechanism. Condition (B.3) is typically recommended to address the misspecification of q_{F,τ}(x(t)), and it also guarantees the convergence of the candidate procedures. Condition (B.6), combined with Condition (A.2), requires that the smoothing coefficients of ρ*_{τ,h}(v) be controlled in an interval near the right side of the origin to guarantee both the approximation of ρ_τ(v) and the consistency of the τ-QSF. Based on the above conditions, the following theorem establishes the consistency of the estimators of the τ-QSF and τ-CQF, showing in particular that the estimator in (7) is consistent for Ψ_τ(s).

Remark 5.
Theorem 2 implies that the consistency of the τ-CQF estimator rests on an effective estimation of the fixed effect, i.e. Δ contains the correct specification of q_{F,τ}(x(t)). If predicting random effects is not of interest, the constraints on model (4) can be relaxed so that it degenerates into a classical case for particular applications. Traditional quantile loss estimation may then be considered as one of the candidate procedures; see Galvao et al. 6 and Harding and Lamarche 7 for corresponding discussions.
It is worth noting that the same conclusion as in Theorem 2 can be derived for Ψ̃_τ(s) and H̃_τ(t), which are more commonly used for handling practical problems.

Simulation
In this section, we report the performance of the EAW estimators proposed in the previous sections via Monte Carlo simulation studies for longitudinal quantile regression. We first design an experiment for equation (1), and then fit the τ-QSF and τ-CQF for the mixed effect model.

Experiment 1.
In this experiment, we consider a general longitudinal quantile model, denoted as model (13). For τ ∈ (0, 1), the cluster sizes n_i in model (13) are sampled from U(10, 30). The random errors ε_i = (ε_i(t_{i1}), … , ε_i(t_{in_i}))^⊤ are generated from a multivariate normal distribution with mean 0_{1×n_i} and an AR(1) variance-covariance matrix with ρ = 0.5, so that the terms have medium correlation. Besides, the covariate x_{1,i}(t) ∼ U(t∕20, 2 + t∕20). According to the above specification, the τth conditional quantile function Q_τ(y(t)|x(t)) follows from model (13). Intuitively, under particular quantile levels, different terms dominate the performance of the true model (for example, when τ = 0.5 the linear component q_{1,τ} dominates, while the varying-coefficient regression approximates the model better as τ nears 0.75, among others). To compare the EAW estimator with each single procedure, we construct three candidate models to fit (13):

1. Linear quantile regression (LQR);
2. Varying-coefficient linear quantile regression (VCLQR);
3. Nonparametric additive quantile regression (NAQR).
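The error structure of this design can be sketched as follows. The assembly of the response from the components of model (13) is omitted, and the observation-time distribution below is an illustrative assumption.

```python
import numpy as np

def ar1_cov(n, rho=0.5, sigma2=1.0):
    # AR(1) variance-covariance matrix: sigma2 * rho^{|j - k|}
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

rng = np.random.default_rng(0)
n_i = int(rng.integers(10, 31))              # cluster size drawn from U(10, 30)
t = np.sort(rng.uniform(0.0, 10.0, n_i))     # observation times (illustrative)
x1 = rng.uniform(t / 20, 2 + t / 20)         # covariate x_{1,i}(t)
eps = rng.multivariate_normal(np.zeros(n_i), ar1_cov(n_i))  # correlated errors
```

With ρ = 0.5 the within-subject correlation decays geometrically with the gap between observation indices, giving the "medium correlation" described above.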
We use standard linear quantile regression to estimate LQR, and B-spline estimation for VCLQR and NAQR. We take the number of subjects as n = 200 with 500 independent replications. In the EAW procedure, all subjects are randomly partitioned into a training set of size N_0 = n∕2 and a test set containing the rest, with π_k = 1∕3 for k = 1, 2, 3 and tuning parameter λ = 1. The algorithm is averaged over B = 50 random splits. Moreover, the smoothing parameters of the secondary smoothing loss function are set to h = 1 and a(h) = 0.01. Denote by Q̂_τ the estimate of the true quantile function Q_τ from a specific procedure. The estimated (original check loss based) prediction risk (PR) and the estimated root MSE (RMSE) are adopted as evaluation criteria. Table 1 reports the performance measures for all candidate procedures and the EAW estimator under different probability levels, with the best candidate in boldface. One can notice that when τ < 0.5 the nonlinear form q_{3,τ} plays a leading role and NAQR fits the model better than LQR and VCLQR. Conversely, when τ is larger than 0.75, VCLQR is more suitable for the specified model. As τ tends to the median, all three specifications yield satisfying PR, but the LQR model has a smaller RMSE than the two nonparametric estimators. All the above results are fundamentally consistent with our expectations. Figure 2 presents the frequency of the values of the aggregated weights among the computations, which indicates that our weights concentrate around the best candidate; the PR and RMSE of the EAW estimator are close to those of the "most appropriate" specification of the true model under different probability levels. This phenomenon confirms the outperformance of our adaptive aggregation method in the longitudinal quantile model. The numbers in brackets represent the standard errors of the corresponding indicators, which show that the EAW estimator keeps a stable deviation whenever one of the candidates leads to an effective result.
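The two evaluation criteria can be sketched as below. This assumes the natural definitions, a check-loss risk of the fitted quantile against the observed responses and a root MSE against the true quantile function (available in simulations); the paper's exact normalizations may differ.

```python
import numpy as np

def pr_rmse(y, q_hat, q_true, tau):
    """Evaluation-criteria sketch: check-loss-based prediction risk (PR)
    of the fitted quantile on the responses, and root MSE (RMSE) of the
    fitted quantile against the known true quantile function."""
    y, q_hat, q_true = (np.asarray(a, dtype=float) for a in (y, q_hat, q_true))
    resid = y - q_hat
    pr = np.mean(resid * (tau - (resid < 0)))       # mean check loss
    rmse = np.sqrt(np.mean((q_hat - q_true) ** 2))  # accuracy vs. truth
    return pr, rmse
```

Reporting both criteria matters here: PR rewards good out-of-sample quantile prediction, while RMSE directly measures recovery of the true quantile function, which is why the two can rank candidates differently near the median.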
Experiment 2. To demonstrate the rationality of the methodology in Section 2.2, we conduct the following simulation.
To allow different observed times in an unbalanced design, we utilize a specified hazard function h(t) to generate the survival time T, as in (14). Besides, the censoring point C is sampled from a uniform distribution U(a_C, b_C). For each subject, the construction of the censored observation process is similar to Deng, 27 where the lifetime random sample {v_i : i = 1, … , n} associated with h(t) is generated by inverse transform sampling, for i = 1, … , n and s = 1, … , ⌈t_i⌉. We specify LQR and NAQR with constant coefficients to fit the fixed effect part of model (15) using the "lqmm" package, and additionally supply a misspecified linear quantile model (MLQR). The EAW algorithm is adopted to combine the candidate models and to estimate the τ-QSF and τ-CQF, respectively. The simulation studies are based on 500 successful completions of the algorithm. Simple computation yields the "true functions" of H_τ(t) and Ψ_τ(t), in which M_f denotes the average over the simulated sample of f(x_{2,i}). We implement the experiment with sample sizes n = 120 and 300, splitting 2∕3 of the subjects for training and the remaining 1∕3 for testing. The smoothing parameters of the secondary smoothing loss function take the same values as in Experiment 1, and the weighting process is carried out with a single partition. We present the estimated PR and RMSE of the fixed effect part in Tables 2 and 3. Moreover, the accuracy of the estimators of H_τ(t) and Ψ_τ(t) is measured by the average MSE (AMSE), computed from the corresponding estimated function in each replication. In most cases the aggregated estimator shows gratifying performance regardless of the choice of τ and n. The PR and RMSE of the fixed effect lead to the same conclusion as in Experiment 1. Note that model (15) approaches a linear structure for τ = 0.5, while the nonparametric specification leads to a more appreciable result otherwise.
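Survival-time generation of this kind can be sketched via inverse transform sampling: if U ~ U(0, 1), then T = Λ⁻¹(−log U) has cumulative hazard Λ. The paper's specific hazard (14) is not reproduced here; a constant hazard is used purely for illustration.

```python
import numpy as np

def sample_survival(inv_cum_hazard, n, rng):
    """Inverse-transform sampling of survival times from a cumulative
    hazard Lambda, supplied through its inverse inv_cum_hazard."""
    u = rng.uniform(size=n)
    return inv_cum_hazard(-np.log(u))

rng = np.random.default_rng(1)
lam0 = 0.2                                    # illustrative constant hazard
T = sample_survival(lambda L: L / lam0, 10_000, rng)
C = rng.uniform(2.0, 8.0, size=10_000)        # censoring C ~ U(a_C, b_C), illustrative bounds
T_star = np.minimum(T, C)                     # observed event time min(T, C)
delta = (T <= C).astype(int)                  # event indicator (1 = death observed)
```

Under the constant hazard the generated times are exponential with mean 1∕lam0, which gives a quick sanity check on the sampler before plugging in a more elaborate hazard.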
Although the number of splits for the aggregation is set to B = 1, the EAW estimator still attains the expected properties and hence is verified to be feasible. For the estimators of the τ-QSF and τ-CQF, the performance of the AMSEs is consistent with that of the PRs and RMSEs under the different specifications: when τ = 0.5, LQR has smaller AMSEs than NAQR, while the latter fits the model better otherwise, and the EAW estimator preserves excellent accuracy in all cases. Figure 3 visualizes these conclusions; the tails of the fitted curves are partly biased because the number of observed subjects decreases over time due to failures and censoring. In particular, for the median, we supply the additional misspecified mixed effect model MLQR for comparison. The results show that our EAW estimator can effectively discount the incorrect model as well.

Performance of EAW-based predictors
To check the efficiency of the predictor (10), we design a series of simulations for the mixed effect model of Experiment 2 but with different distributions of u_i. Case I: (u_{1,i}, u_{2,i})^⊤ follows a bivariate normal distribution with the same covariance as in Experiment 2. Case II: u_{1,i} and u_{2,i} are independently distributed according to symmetric Laplace distributions with scale parameters equal to 1 and 2, respectively.
Taking the normal error as an example, the simulation is carried out for τ = 0.25, 0.5, and 0.9. Other options, such as the candidate procedures and splitting strategies, are the same as in Experiment 2. To reflect the predicting accuracy, Table 4 reports the comparison of MSE between the EAW-based BLP û^(τ) and the separate specifications. It implies that the EAW predictor performs generally well at all probability levels, and its MSE is close to that of the "correct estimating procedure" as n increases. The scatter curves of the MSEs for different cluster sizes are plotted in Figure 4, which shows that the EAW-based BLP is quite efficient where the different specifications differ significantly. It is relevant to point out that when the cluster size is small, large variations of the MSE of û^(τ) may occur, and we recommend increasing the number of random splits for EAW to eliminate the volatility.

MADIT data analysis
To illustrate the practical advantage of the EAW estimation, we apply the methodology to the MADIT data. The dataset has been studied by Deng 27 and Liu et al. 8 Note that most of the existing statistical modeling and estimation is based on a particular specification, without explicating the outperformance of the fitted model or the estimating methodology; we will demonstrate the rationality and the advantage of our method. The MADIT was designed to evaluate the effectiveness of an implantable cardiac defibrillator (ICD) in preventing sudden death in high-risk patients. In the collected database, a total of 181 patients from 36 centers in the United States were recruited and fully sequentially and randomly observed. Of these, 89 patients were assigned to the ICD group and the others to the conventional therapy group, and a large proportion of the observations were censored. Concretely, the data set consists of patient ID code, treatment code (1 for ICD and 0 for conventional), observed survival time in days, death indicator (1 for death and 0 for censored), cost type, and daily cost from the start to the completion of the trial. The following types of cost were present in the experiment: Type 1 for hospitalization and emergency department visits; Type 2 for outpatient tests and procedures; Type 3 for physician/specialist visits; Type 4 for community services; Type 5 for medical supplies; and Type 6 for medications.
It is widely accepted that medical costs are not directly affected by the treatment indicator, which typically enters the survival part to influence a subject's risk rate, so we do not include it in the outcome model. On the other hand, since patients may incur more than one cost type simultaneously, even at the same time point, seven covariates x_{1,i}(t), ..., x_{7,i}(t) are included, where x_{r,i}(t) = 1 if an observation of Type r (r = 1, ..., 6) occurs and x_{r,i}(t) = 0 otherwise, and x_{7,i}(t) = f(t) is imposed to capture the effect of the treatment time. For comparability with Liu et al., 8 the same data preprocessing is adopted: the data set is compressed by summing the daily costs over each 12-day period. We set up both the longitudinal quantile regression (denoted "Liu et al.") and the additive quantile mixed effect model (denoted "AQMM") to fit the MADIT data, where f(t) is specified as either f(t) = t or f(t) = log(t) so that Q^{(1)}_τ(y_i(t)|x_i(t)) represents the conditional quantile in Liu et al. 8 In the second model, u_i = (u_{1,i}, u_{2,i})^⊤ are assumed to be zero-mean random effects with a symmetric positive-definite variance-covariance matrix. We implement the EAW estimator over Q^{(1)}_τ(y_i(t)|x_i(t)) and Q^{(2)}_τ(y_i(t)|x_i(t), u_i) in (16) with 50 independent splits (smoothing parameters h = 1 and a(h) = 0.01). Table 5 lists the estimated 5-year and total-treatment-period cumulative quantile costs under τ = 0.25, 0.5, and 0.75, and exhibits features similar to those in Liu et al. 8 The fitted residuals (marked "Res" in Table 5) indicate that the additive quantile mixed effect model (AQMM) is more suitable, as it accounts for between-subject variability through the random effects, and the change of cumulative costs is relatively flat across quantile levels. Moreover, the EAW estimator based on either Liu et al. 8 or AQMM maintains a favorable result across probability levels, whereas the candidate models perform differently. This is consistent with the conclusions of the simulation studies.
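The exponential aggregation weighting step can be sketched generically as follows. This is a minimal illustration under our own naming (`check_loss`, `exponential_weights`, the `temperature` argument, and the toy data are not from the paper): each candidate model receives a weight proportional to the exponential of minus its accumulated check loss on held-out data, so better-fitting candidates dominate the aggregate.

```python
import numpy as np

def check_loss(u, tau):
    """Standard check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def exponential_weights(residuals, tau, temperature):
    """Generic exponential aggregation weights.

    residuals: (K, n) array of held-out residuals y - q_k(x) for each of
    K candidate quantile models.  Returns a length-K vector on the
    simplex; a lower cumulative check loss yields a larger weight.
    """
    risks = check_loss(residuals, tau).sum(axis=1)   # (K,) cumulative losses
    z = -(risks - risks.min()) / temperature         # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Toy usage: candidate 0 fits the data better, so it dominates the weights.
rng = np.random.default_rng(0)
y = rng.normal(size=200)
resid = np.stack([y - 0.0, y - 2.0])                 # two "candidate models"
w = exponential_weights(resid, tau=0.5, temperature=1.0)
```

In the paper's setting the temperature would play the role of the tuning quantity governing the sharpness of the aggregation, and the residuals would come from the held-out half of each random split.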
To inspect the efficiency of the different models, we randomly select 120 of the 181 subjects to train the specified models and use the remaining 61 subjects for an out-of-sample test. The prediction-test prediction risk (PT-PR) and the prediction-test RMSE (PT-RMSE) of the estimated Q_τ are used to assess the goodness of fit for the models of the MADIT data, where (y_i(t), x_i(t), t) ∈ {Test Dataset}, û_{1,i}, û_{2,i} are the respective BLPs of the random effects for the test data, and t·û_{1,i} + û_{2,i} = 0 for Liu et al. 8 The split procedure is repeated 50 times independently, and the results are presented in Table 6, which shows that the fitting performance of the different models varies across probability levels. For instance, the two AQMMs report smaller PT-PR and PT-RMSE than Liu et al. 8 in most cases, and the horizontal comparison shows that the mixed effect models with different f(t) have their own merits for different values of τ (e.g. the model with f(t) = t fits better as τ moves away from the median), while Liu et al. 8 with f(t) = log(t) is more suitable than that with f(t) = t. These diversities are enlarged by introducing the random effects. The EAW estimator, however, overcomes this uncertainty and outperforms any particular candidate when the fitted models differ substantially, and the corresponding aggregated weights display almost the same tendencies as the EAW results. Consequently, it can be regarded as the "uniformly optimal fitting" among the candidate specifications and quantile levels for the MADIT data.
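The repeated-split evaluation above can be sketched as follows. All names here (`split_evaluate`, `fit`, the toy constant-quantile model) are our own illustration; PT-PR is read as the average held-out check loss and PT-RMSE as the root mean squared held-out residual, which may differ in detail from the paper's exact definitions.

```python
import numpy as np

def split_evaluate(X, y, tau, fit, n_train=120, n_reps=50, seed=0):
    """Repeat a random train/test split and average two test criteria.

    fit(X_tr, y_tr) must return a predictor q(X_te) for the tau-th
    conditional quantile.  Returns (mean PT-PR, mean PT-RMSE).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    pr, rmse = [], []
    for _ in range(n_reps):
        idx = rng.permutation(n)
        tr, te = idx[:n_train], idx[n_train:]       # 120 train / remaining test
        q = fit(X[tr], y[tr])
        u = y[te] - q(X[te])                        # held-out residuals
        pr.append(np.mean(u * (tau - (u < 0))))     # check-loss prediction risk
        rmse.append(np.sqrt(np.mean(u ** 2)))       # root mean squared error
    return float(np.mean(pr)), float(np.mean(rmse))

# Toy usage: a "model" predicting the training tau-quantile, ignoring X.
fit_const = lambda X_tr, y_tr: (
    lambda X_te: np.full(len(X_te), np.quantile(y_tr, 0.5)))
rng = np.random.default_rng(1)
X_demo, y_demo = rng.normal(size=(181, 2)), rng.normal(size=181)
ptpr, ptrmse = split_evaluate(X_demo, y_demo, 0.5, fit_const)
```

With 181 subjects and n_train = 120, each replication leaves 61 subjects for testing, matching the split used for the MADIT data.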

Concluding remarks
This paper investigates adaptive estimation for longitudinal quantile regression via the EAW algorithm. Under time-dependent covariates and a right-censored history process, the EAW estimator and the IPCW method are combined to derive the τ-CQF for the mixed effect model. Based on the secondary smoothing approximation of the check loss function, the oracle inequalities are easily established. We show that the estimator of the τ-CQF is consistent as long as one of the candidates converges to the correct specification. Further, the BLP of the random effects is modified by EAW. The applicability of the method is demonstrated in the MADIT data analysis. It is worth pointing out that the secondary smoothing loss has good theoretical properties: it supplies differentiability at the origin and guarantees convexity and strong smoothness. Moreover, the quadratic smoothing pattern provides strong convexity analogous to that of squared-type losses, which is essential for the risk bound under squared error loss and simplifies the verification of the consistency of the cumulative quantile function estimator. In practice, however, the smoothing parameters should not be set too large, lest they distort the aggregated result; finding a practical recommendation for h and a(h) on a real dataset is not effortless and remains to be explored.
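The paper's secondary smoothing loss is not written out in this section. As a hedged illustration of the kind of quadratic smoothing being described, the sketch below uses the uniform-kernel convolution smoothing of the check loss: it is quadratic on (-h, h), coincides with the check loss outside, is differentiable at the origin, and is locally strongly convex with curvature 1/(2h). The authors' exact loss may differ from this choice.

```python
import numpy as np

def smoothed_check_loss(u, tau, h):
    """Uniform-kernel convolution smoothing of the check loss.

    Quadratic on |u| <= h, equal to rho_tau outside, continuously
    differentiable everywhere (one common quadratic smoothing; the
    paper's exact secondary smoothing loss may differ).
    """
    u = np.asarray(u, dtype=float)
    inner = (tau * (u + h) ** 2 + (1 - tau) * (h - u) ** 2) / (4 * h)
    outer = u * (tau - (u < 0))                    # plain check loss
    return np.where(np.abs(u) <= h, inner, outer)

def smoothed_check_grad(u, tau, h):
    """Derivative: linear on (-h, h), so the loss is smooth at the origin."""
    u = np.asarray(u, dtype=float)
    inner = (tau * (u + h) - (1 - tau) * (h - u)) / (2 * h)
    outer = tau - (u < 0).astype(float)
    return np.where(np.abs(u) <= h, inner, outer)
```

At u = 0 the gradient equals tau - 1/2, which lies inside the check-loss subdifferential [tau - 1, tau], and the constant second derivative 1/(2h) on (-h, h) is the local strong convexity that makes the loss behave like a squared-type loss near the minimizer.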
The idea of aggregation can readily be extended to fixed effect models: once the individual effect is specified, candidates for q_τ(x(t)) are estimated by the corresponding methods (Kato et al. 30 and Koenker, 9 among others) and the EAW estimator is weighted from each q̂^{(k)}_τ(x(t)). For reasons of space we do not discuss this in detail; the extension is also promising under complex censoring mechanisms. Throughout the paper we use a nonparametric estimator for the censoring distribution, whereas the methods in Galvao et al. 6 and Harding and Lamarche 7 both rest on a specific model of the mechanism; one can naturally generalize this to a multiply robust setting in which both the conditional quantile and the response probability may be misspecified. Besides, the aggregated estimator in (4) depends on the linear structure of the random effects, and it remains to be seen how sensitive the performance is to varying degrees of misspecification. These will be valuable topics for our further research.

Appendix: proof of Theorem 1 (fragment). By the definition of the smoothed loss we can obtain that, with probability one, the stated inequality holds, where the integrand at t_{ij} equals L′_{τ,ij}. Taking the expectation (denoted E_i) of both sides with respect to the individual y_i conditional on x_i ∪ {(x_k, y_k)}_{k=1}^{i-1}, we obtain the moment bound for p ≥ 1. Applying Reference 34 with c = (2e)^{-1} and C = 2e^2, the bound holds for |t| ≤ c/‖C_i‖_SEXP, and an argument parallel to Lemma 2 of Gu and Zou 22 extends it to 2|t| ≤ c/‖C_i‖_SEXP. Since the above inequality holds whenever the corresponding norm bound is at most c/(2TK), write M = exp(8C^2·CTK + 2C·TK) and H_1 = max{τ^2, (1 − τ)^2}·hT; then inequality (A.2) can be rewritten with the additional term H_1, provided the quantity c^{-1} exp(2cTC^2){√2 (4TK)^2 M + 16c^2 TC^2 M} is suitably bounded. The resulting bound is available for any bth split in Algorithm 1, and by the convexity of the check loss applied to q̂_{τ,b}(x_i(t_{ij})), inequality (11) in Theorem 1 follows by a simple algebraic rearrangement.
To prove (12), we use the first inequality of Lemma 1. Since the secondary approximated loss function attains the same minimum point as the check loss on ℝ, E[∫ ℓ*′_{τ,h}(y(t) − q_τ(x(t))) dN(t) | x] = 0. Therefore, for k ∈ Δ, the analogous inference leads to