Adaptive group sequential survival comparisons based on log-rank and pointwise test statistics

Whereas the theory of confirmatory adaptive designs is well understood for uncensored data, implementation of adaptive designs in the context of survival trials remains challenging. Commonly used adaptive survival tests are based on the independent increments structure of the log-rank statistic. This implies some relevant limitations: First, essentially only the interim log-rank statistic may be used for design modifications (such as data-dependent sample size recalculation). Second, the treatment arm allocation ratio in these classical methods is assumed to be constant throughout the trial period. Here, we propose an extension of the independent increments approach to adaptive survival tests that addresses some of these limitations. We present a confirmatory adaptive two-sample log-rank test in which rejection regions and sample size recalculation rules may be based simultaneously on the interim log-rank statistic and on point-wise survival rate estimates. In addition, the treatment arm allocation ratio may be adapted after each interim analysis in a data-dependent way. The ability to include point-wise survival rate estimators in the rejection region of a test for comparing survival curves may be attractive, e.g., for seamless phase II/III designs. Data-dependent adaptation of the allocation ratio could be helpful in multi-arm trials in order to successively steer recruitment into the study arms with the greatest chances of success. The methodology is motivated by the LOGGIC Europe Trial from pediatric oncology. Distributional properties are derived using martingale techniques in the large sample limit. Small sample properties are studied by simulation.


Introduction
The log-rank test 1 is presently the gold standard method for analysing differences in survival data in randomised clinical trials. For this reason, adaptive survival tests are commonly based upon the log-rank test statistic and its independent increments structure. 2,3 However, these designs suffer from some limitations we want to address. One limitation is that effectively only the interim log-rank statistic may be used for design modifications (such as data-dependent sample size recalculation). 4 Moreover, the treatment arm allocation ratio in these classical methods is assumed to be constant throughout the whole trial period. However, in the context of seamless phase II/III designs or early phase trials it may be desirable to include point-wise survival rates (e.g. 1-year survival rates) in the decision making, since survival rates at a given time point of interest are regularly chosen as a primary endpoint in such trials. Likewise, data-dependent adaptations of the treatment arm allocation ratio could be helpful in multi-arm trials in order to successively steer recruitment into the study arms with the greatest chances of success. Therefore we propose an extension of the independent increments approach to adaptive survival tests, which can rely on both (i) the point-wise Nelson-Aalen-type survival rate estimator and (ii) the log-rank test statistic. More specifically, our approach extends the commonly used methodology by Wassmer, 3 which neither supports the use of point-wise survival rate estimates nor foresees data-dependent adaptations of the treatment arm allocation ratio. In doing so, our approach avoids the difficulties associated with alternative methods based on the patient-wise separation principle, which have the common disadvantage that the test procedure may either neglect part of the observed survival data or tend to be conservative.
We will show by simulation that our extended methodology maintains the performance of the current standard methodology while offering various new design possibilities.
The methodology presented here is motivated by the LOGGIC Europe trial (Eudra-CT: 2018-000636-10). LOGGIC Europe is a randomized, international multicentre phase III therapy optimization trial for children and adolescents with low-grade glioma. Primary endpoints of the trial are the progression-free survival (PFS) and the disease control rate (DCR). PFS addresses long-term efficacy of treatment and is defined as the time from randomization up to progression of disease or death from any cause, whichever occurs first. DCR addresses short-term efficacy of treatment and is essentially defined as the PFS rate at some early time point.
The paper is organized as follows. We start by fixing notation and stochastic assumptions. Section 'Joint martingale representation of the log-rank statistic and cumulative hazard difference' briefly presents the bivariate representation of the two test statistics and its distributional properties. The design algorithm and corresponding planning methodology are presented in section 'Adaptive log-rank test with simultaneous use of interim log-rank statistic and cumulative hazard rate difference'. In section 'Example: A two-step log-rank test with futility criterion based on short-term survival rate' we present an example use case in order to illustrate the practical implementation of our method. Small sample properties are studied by simulation in section 'Simulation'. We conclude with a discussion of our findings and prospects for future research. Mathematical proofs are deferred to the supplemental material.

Notation and stochastic assumptions
Let $(\Omega, \mathcal{F}, P)$ denote the probability space on which all random variables are defined. Unless otherwise specified, random variables are denoted by capital Latin letters, whereas realizations of random variables are denoted by the corresponding lower-case Latin letters. We set $0/0 := 0$ whenever a formal division of zero by zero occurs in the sequel.
We consider the problem of testing the equality of the survival distributions of two treatments A and B, say, based on accumulating survival data across several stages of a sequential design. After each stage a confirmatory (interim) analysis is performed with the possibility for interim decisions (e.g. binding futility stop or sample size recalculation) based on (i) the observed interim log-rank statistic and (ii) interim estimates of $s_0$-year survival rate differences for some prefixed time point $s_0 > 0$.
In this context we will assume an initial trial design with $l$ stages. The stages will recruit patients successively, i.e. patients from stage $k$ are recruited between calendar times $\sum_{i=1}^{k-1} a_i$ and $\sum_{i=1}^{k} a_i$, where $a_i > 0$ are the recruitment period lengths of the stages. We set $a := \sum_{i=1}^{l} a_i$ as the overall recruitment period length. The final analysis will be performed at calendar time $a + f$. Patients from stage $k$ will therefore have a follow-up period length of at least $f_k = \sum_{i=k+1}^{l} a_i + f$. An example timeline for $l = 2$ stages is given in Figure 1. The planned annual recruitment rate is denoted by $r$.
For this purpose, let $N_{x,k}$ denote the set of patients from treatment group $x = A, B$ who entered the trial at stage $k$ (i.e. between calendar times $\sum_{i=1}^{k-1} a_i$ and $\sum_{i=1}^{k} a_i$), and let $n_{x,k} := \#N_{x,k}$ denote the number of such patients. Let $N_k := \cup_x N_{x,k}$ denote the set of all patients from stage $k$ pooled over both treatment groups, and $N := \cup_{x,k} N_{x,k}$ the overall set of trial patients. Let $n_k := \sum_x n_{x,k}$ and $n := \sum_{x,k} n_{x,k}$. The parameter $n$ will index the arrival process, and asymptotic results will be derived in the limit $n \to \infty$. Accordingly, we assume that group sizes grow uniformly as the total sample size increases, i.e. we assume there exist constants $v_k > 0$ such that $\#N_{A,k}/n_k \to \frac{1}{1+v_k}$ and $\#N_{B,k}/n_k \to \frac{v_k}{1+v_k}$ in probability as $n \to \infty$. Thus the constants $v_k$ are the asymptotic, stage-wise allocation ratios between the treatment groups. We furthermore assume that the stages also grow uniformly as the total sample size increases, i.e. $\#N_k/n = n_k/n \to a_k/a$ in probability as $n \to \infty$. To patient $i$ is associated a random triplet $\{E_i, C_i, T_i\}$: $E_i$ is the entry time into the study, the possibly infinite random variable $C_i$ is the time of censoring after entry, and $T_i$ is the survival time after entry. Our stochastic assumptions are as follows: (1) $T_i$, $C_i$ and $E_i$ are mutually independent for fixed $i$, and (2) data from different patients are independent and identically distributed within treatment groups.
Based on the observed data, we will calculate the number of events $D_{x,k}(s)$ in stage $k$ and treatment group $x = A, B$ up to study time $s \ge 0$, and the number at risk $Y_{x,k}(s)$ by study time $s \ge 0$ in stage $k$ and treatment group $x = A, B$. Finally, let $J_{x,k}(s) := I(Y_{x,k}(s) > 0)$, and let $L_k(s)$ denote the log-rank weight factor. For each $s \ge 0$, let $F_s$ be the $\sigma$-algebra generated by the data observed up to study time $s$ for $i \in N$. We consider $D_{x,k}$, $Y_{x,k}$, $J_{x,k}$, $L_k$ as stochastic processes in study time $s \ge 0$, adapted to the filtration $(F_s)_{s \ge 0}$. The filtration $(F_s)_{s \ge 0}$ comprises the information that is observed in the study. Whenever we want to emphasize the dependence of the above processes on $n \in \mathbb{N}$, we will additionally index them by $n$, e.g. $D^n_{x,k}$ instead of $D_{x,k}$. As usual, we let $\lambda_x(s) := \lim_{\delta \downarrow 0} P(s \le T_i < s + \delta \mid T_i \ge s)/\delta$ denote the hazard of a patient $i$ from treatment group $x = A, B$. We denote by $\Lambda_x(s) := \int_0^s \lambda_x(u)\,du$ and $S_x(s) := \exp(-\Lambda_x(s)) \equiv P(T_i > s)$ the corresponding cumulative hazard and survival functions for treatment group $x = A, B$, respectively.
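The event-count and at-risk processes just introduced are straightforward to compute from observed follow-up data. As a minimal illustration (in Python rather than the R syntax provided with the paper, and with hypothetical data), the following sketch computes, for one stage and treatment group, the number of events up to study time $s$ and the number at risk at $s$:

```python
import numpy as np

def counting_processes(time, event, s):
    """Number of observed events up to study time s (D) and number
    at risk at study time s (Y) for one stage and treatment group.
    time  : observed follow-up time min(T_i, C_i) for each patient
    event : 1 if the observed time is an event, 0 if censored
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    D = int(np.sum((time <= s) & (event == 1)))  # events by study time s
    Y = int(np.sum(time >= s))                   # still under observation at s
    return D, Y

# hypothetical stage-1, arm-A data
t = [0.3, 0.8, 1.2, 2.0, 2.5]
e = [1, 0, 1, 0, 1]
print(counting_processes(t, e, 1.0))  # → (1, 3)
```

Note that at an interim analysis the observed times would additionally be censored administratively at the interim cut-off, as described in the simulation section.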
In this context, we consider testing the two-sided null hypothesis $H_0$ that the survival functions in the two treatment arms coincide within some prefixed interval $[0, s_{\max}]$. We proceed as follows to test $H_0$. Using martingale techniques, we will first derive the joint distribution of (i) the stage-wise log-rank test statistics and (ii) the stage-wise difference in the Nelson-Aalen estimates between the two treatment arms evaluated at some prefixed study time $s_0$. On this basis, we provide a confirmatory adaptive two-sample log-rank test where provision is made for interim decision making and design modifications based on both (i) the interim log-rank statistic and (ii) interim estimates of the cumulative hazard rate differences at time point $s_0$. With a view to practical application, sample size recalculation is one of the most common design modifications. Therefore, sample size recalculation based on conditional power will be elaborated and studied in detail, analytically and by simulation.

Joint martingale representation of the log-rank statistic and cumulative hazard difference
The weighted two-sample log-rank statistic in stage $k$ and the standardized cumulative hazard difference are both $F_s$-adapted processes. It follows from Theorem A2 that, under mild regularity assumptions and the proportional hazards assumption $\lambda_B = \omega \lambda_A$ for some $\omega > 0$, the bivariate normal distributional approximation (8) holds, where $\sigma^2_{LR,k}(s) := \operatorname{plim}_{n_k \to \infty} D_k(s)/n_k \cdot v_k/(1+v_k)^2$ and $\sigma^2_{\Delta,k}(s)$ are deterministic functions (see equations (14) and (13) below). The left hand side of (8) also has approximately independent, bivariate normally distributed increments, as stated in Theorem A2. For given $\omega > 0$ we set the drift parameters accordingly. In practice, the time-dependent correlation parameter on the right hand side of (8) is unknown. However, for a fixed time point $s_0 > 0$ it can be consistently estimated at the time of the interim analysis (see (24)). Under further planning assumptions it is possible to deduce closed formulas for the functions $\sigma_{LR,k}$ and $\sigma_{\Delta,k}$. Assuming (in addition to the above mentioned mild regularity conditions of Theorem A2):
• No loss to follow-up
• Uniform recruitment
the two closed-form expressions (13) and (14) hold (see appendix for proofs).

Adaptive log-rank test with simultaneous use of interim log-rank statistic and cumulative hazard rate difference

The design algorithm
For the sake of notational simplicity we will focus on two-step designs in the sequel (i.e. $l = 2$). The two-step adaptive design will proceed as follows: Assume an initial design with accrual of patients between calendar time 0 and $a$ years, and a final analysis at calendar time $a + f$ (corresponding to a minimum follow-up period of $f$ years). We assume that the value of $f$ is prefixed by clinical considerations. The choice of $a$ will be detailed in section 'Initial sample size calculation' based on power arguments. Patients recruited prior to calendar time $a_1$, with $a > a_1 > 0$, define the set of first stage patients $N_1$, and patients recruited between calendar times $a_1$ and $a := a_1 + a_2$ define the set of second stage patients $N_2$.
The interim analysis will take place at time $a_1 + s_1$ for some $0 < s_1 < a_2$ and will include the patients of stage one with their first $s_1$ years of follow-up. At the interim analysis, the log-rank statistic in stage 1 patients based on information up to study time $s_1$ and the standardized cumulative hazard rate difference at some prefixed (early) study time $0 < s_0 \le s_1$ will be calculated. $B_1$ is an interim estimate of the difference in short-term response; more specifically, $\Delta_1(s_0)$ is an interim estimate of $\log(S_B(s_0)/S_A(s_0))$. The design algorithm is as follows: The design stops at the interim analysis with rejection of $H_0$ if the observed value $z_1$ for $Z^*_1$ exceeds some critical value $u_1$. The design stops for futility if either $z_1$ falls below some futility bound $u_0$ or the observed value $b_1$ for $B_1$ drops below some prefixed boundary $b_0$. Otherwise, if $u_0 \le z_1 < u_1$ and $b_1 > b_0$, the design continues to stage two. At this time, the recruitment period length $a_2$ of stage two can be recalculated in a data-dependent way. The recalculated recruitment period length $a'_2 := a'_2(Z^*_1, B_1)$ of stage two is chosen in dependence on the observed values of $Z^*_1$ and $B_1$, subject to the constraint $s_1 < a'_2 \le a_{\max} - a_1$. Here, $a_{\max} > a_1$ denotes a maximum trial recruitment period length that is fixed in advance in order to avoid an unrealistic or unfeasible trial duration. We set $a' := a_1 + a'_2$ and $f'_1 := a'_2 + f$. The final analysis will take place at calendar time $a' + f$ and will include both the patients of stage one $N_1$ with their full follow-up data of at least $f'_1$ years and the second stage patients $N_2$ with their follow-up time of at least $f$ years. At the time of the final analysis, the increment of the log-rank statistic in stage one patients beyond study time $s_1$ will be calculated, as well as the log-rank statistic of stage two patients. Notice that $Z_{12}$ and $Z_{22}$ are conditionally independent given $Z^*_1$ and $B_1$.
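The interim decision logic above can be summarized in a few lines. The following Python sketch (the function name is ours, not from the paper) encodes the three possible interim outcomes, including the clamping of the recalculated stage-two accrual length to the admissible range $s_1 < a'_2 \le a_{\max} - a_1$:

```python
def interim_decision(z1, b1, u0, u1, b0, a2_cp, a1, s1, a_max):
    """Interim decision rule of the two-step design:
    early rejection, binding futility stop, or continuation with a
    recalculated stage-two accrual length clamped to (s1, a_max - a1]."""
    if z1 >= u1:
        return ("reject", None)          # early rejection of H0
    if z1 < u0 or b1 <= b0:
        return ("futility", None)        # binding futility stop
    a2_new = min(max(a2_cp, s1), a_max - a1)  # clamp recalculated accrual
    return ("continue", a2_new)

# values taken from the worked example below: z1 = 1.34 < u1, B1 = 1.08 > b0
print(interim_decision(1.34, 1.08, u0=float("-inf"), u1=2.18, b0=0.0,
                       a2_cp=3.0, a1=1.7, s1=1.0, a_max=5.0))  # → ('continue', 3.0)
```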
The null hypothesis $H_0$ will be rejected at the final analysis if the second stage test statistic exceeds some critical value $u_2$, where the prefixed weight factors amount to the expected variances of the log-rank statistics under some initial planning alternative $K_1$ (see section 'Calculation of the critical bounds'). Their values are given in (13) and (14). The weights $\eta_{ij}$ have to be fixed in advance and remain unchanged while the trial is ongoing.

The rejection region
The design algorithm described in section 4.1 corresponds to the rejection region of the null hypothesis $H_0$. It is crucial that the design parameters $b_0$, $\eta_{11}$, $\eta_{12}$, $\eta_{22}$ and $0 < s_0 \le s_1 < f$ as well as the critical bounds $u_0$, $u_1$ are prefixed and remain unchanged during the trial. Note that the critical bound $u_2$ will be calculated at the interim analysis according to formula (22), when the correlation $\rho$ of $Z_1$ and $B_1$ can be estimated, to obtain a rejection region which exhausts the full significance level. The calculation of the critical bounds $b_0$, $u_0$, $u_1$, $u_2$ is elaborated next.

Calculation of the critical bounds
The rejection region $R$ defines a level $\alpha$ test of the null hypothesis $H_0$ if the critical bounds $b_0$, $u_0$, $u_1$, $u_2$ are chosen according to the proviso that $P_{H_0}(R) = \alpha$, i.e. such that equation (22) holds. Notice that the critical bounds depend on the nuisance parameter $\rho$, which is unknown during the trial if one does not know the true hazard function $\lambda_A$. However, it may be estimated consistently at the time of the interim analysis via the estimator $\hat\rho$ given in (24). Nevertheless, there are infinitely many parameter combinations of the critical bounds which satisfy equation (22). It is therefore crucial that one parameter constellation $(u_0, b_0, u_1)$ is chosen in advance and remains unchanged during the trial. The critical bound $u_2$ will then be calculated at the interim analysis as the unique solution to (22) with $\hat\rho$ plugged in for $\rho$.

Initial sample size calculation
Initial sample size calculation is performed under the planning alternative hypothesis $K_1$ and under the assumption that no sample size recalculation is performed, i.e. $a'_2 = a_2$. For the initial sample size calculation we need to fix the proportion $\pi := a_1/a \in (0, 1)$ of accrual to stage 1. Note that the weights $\eta_{ij}$ are fixed in advance and must not be changed while the trial is ongoing; in fact, they have to be calculated simultaneously with the sample size. For given weight factors $\eta_{ij}$, the condition to reject the null hypothesis $H_0$ with probability $1 - \beta$ under the planning alternative $K_1$ is $P_{K_1}(R) = 1 - \beta$. Using the distributional approximation (8), this proviso is tantamount to equation (26). Since $B_1$ and $Z^*_2$ are conditionally independent given $Z^*_1$, the right hand side of (26) factorizes accordingly. Using again the distributional approximation (8), we obtain the identities (28) and (29). Using the identities (28) and (29), the formulas (13) and (14) for $\sigma_{LR,k}(s)$ and $\sigma_{\Delta,k}(s)$, and the identity $n_k = r \cdot a_k$ in equation (26), one can solve (26) and (22) numerically to obtain the critical bound $u_2$ and the needed recruitment period lengths $a_1$ and $a_2$. We provide R syntax in the supplemental material to do so. At the interim analysis, $a_2$ and thus $n_2$ may be modified in a data-dependent way to maintain adequate power, as will be detailed in the next section.
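The numeric solving strategy can be illustrated on a simplified single-stage analogue: under exponential survival, uniform recruitment and no loss to follow-up, the required number of events from Schoenfeld's approximation can be matched against the expected number of events at the final analysis by root finding over the accrual length $a$. This is only a Python sketch of the idea under these stated assumptions; the paper's actual calculation solves (22) and (26) jointly for the two-stage design:

```python
from math import exp, log
from scipy.optimize import brentq
from scipy.stats import norm

def schoenfeld_events(omega, v=1.0, alpha=0.025, beta=0.2):
    """Required number of events for a single-stage one-sided log-rank
    test (Schoenfeld approximation), allocation ratio v = n_B/n_A."""
    za, zb = norm.ppf(1 - alpha), norm.ppf(1 - beta)
    return (za + zb) ** 2 * (1 + v) ** 2 / (v * log(omega) ** 2)

def expected_events(a, r, f, lam, omega, v=1.0):
    """Expected events at the final analysis: uniform accrual over [0, a]
    at rate r, exponential survival (hazard lam in arm A, lam*omega in
    arm B), minimum follow-up f, no loss to follow-up."""
    def p_event(rate):
        # follow-up length is Uniform[f, a + f]; average P(T <= follow-up)
        return 1 - (exp(-rate * f) - exp(-rate * (a + f))) / (rate * a)
    n_arm_a = r * a / (1 + v)
    n_arm_b = r * a * v / (1 + v)
    return n_arm_a * p_event(lam) + n_arm_b * p_event(lam * omega)

# planning values resembling the later example: omega = 2/3, lam = 1, r = 75, f = 2
omega, lam, r, f = 2 / 3, 1.0, 75, 2.0
d_req = schoenfeld_events(omega)
a_req = brentq(lambda a: expected_events(a, r, f, lam, omega) - d_req, 0.1, 50)
print(round(d_req), round(a_req, 2))
```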

Data-dependent sample size recalculation at the interim analysis based on conditional power
At the interim analysis, we are free to revise the length $a_2$ of the stage two accrual period in the light of $Z^*_1$ (interim log-rank statistic) and $B_1$ (observed difference in short-term response) without compromising type I error rate control. This is a consequence of the independent increments structure of the bivariate process given by the left hand side of (8). For this purpose, we will first calculate the required length $a^{CP}_2$ of the accrual period to achieve a desired conditional power. To avoid an unrealistically long trial duration, the revised length of the accrual period will finally be chosen according to (30). Recall that $a_1 + s_1$ is the calendar time of the interim analysis and $a_{\max}$ is a prefixed maximum trial recruitment period length. Likewise, we are free to revise the allocation ratio between treatment groups in the light of $Z^*_1$ and $B_1$. Let $v'_2$ denote the revised allocation ratio of stage two patients to treatment group B as referred to treatment group A. Furthermore, we may use an updated recruitment rate $r'$ to adjust for new experience.
To calculate $a^{CP}_2$, we estimate the true hazard ratio $\omega$ via an interim estimator $\hat\omega$. Notice that $LR_1(s_1)$ and $N_1(s_1)$ are observed at the interim analysis. We can also estimate $\sigma_{LR,1}(s_1)$ consistently at the interim analysis through the estimator $\hat\sigma^2_{LR,1}(s_1) := [\hat M^{LR}_1](s_1)$. Sample size recalculation will be performed under the revised planning alternative suggested by the observed interim estimate $\hat\omega$ of the true hazard ratio. The condition to achieve a desired conditional power is given by equation (33), where $\hat\mu_k = -\sqrt{n_k} \cdot \log(\hat\omega)$ is the estimated drift. Plugging in the identity $n_2 = a^{CP}_2 \cdot r'$ and the formulas for $\sigma_{LR,2}(a^{CP}_2 + f)$ and $\sigma_{LR,1}(a_1 + a^{CP}_2 + f)$ given by (14), with updated values $v_2 \to v'_2$ and $r \to r'$, we can solve equation (33) to obtain $a^{CP}_2$. Note that the equation cannot be solved if $\hat\omega \ge 1$; in this case we define $a^{CP}_2 := \infty$. The revised length of accrual $a'_2$ is finally chosen according to (30). We provide R syntax to do so in the supplementary material.
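The structure of the conditional power calculation can be sketched in a simplified weighted-combination setting (this is not the paper's exact statistic (33); the weights and the drift parameterization `theta` are illustrative assumptions):

```python
from math import sqrt
from scipy.stats import norm

def conditional_power(z1, u2, w1, w2, theta, d_info):
    """Conditional power of a two-stage weighted combination test with
    final statistic Z* = (w1*Z1 + w2*Z_inc)/sqrt(w1**2 + w2**2), where
    the stage-two increment satisfies Z_inc ~ N(theta*sqrt(d_info), 1)."""
    u2_star = (u2 * sqrt(w1 ** 2 + w2 ** 2) - w1 * z1) / w2
    return 1 - norm.cdf(u2_star - theta * sqrt(d_info))

def required_stage2_information(z1, u2, w1, w2, theta, cp=0.8):
    """Smallest stage-two information achieving conditional power cp;
    infinite when theta <= 0 (mirroring a_2^CP := inf for omega-hat >= 1)."""
    if theta <= 0:
        return float("inf")
    u2_star = (u2 * sqrt(w1 ** 2 + w2 ** 2) - w1 * z1) / w2
    return ((u2_star + norm.ppf(cp)) / theta) ** 2

# interim values resembling the worked example; theta = 0.3 is an assumed drift
z1, u2, w1, w2 = 1.34, 2.17, sqrt(0.5), sqrt(0.5)
d_info = required_stage2_information(z1, u2, w1, w2, theta=0.3)
print(round(conditional_power(z1, u2, w1, w2, 0.3, d_info), 3))  # → 0.8
```

As in the paper, a negative or zero estimated drift yields an infinite required stage-two information, which is then clamped by the maximum accrual constraint (30).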

Example: A two-step log-rank test with futility criterion based on short-term survival rate
In this section, we illustrate the application of our methodology using the example of a two-step log-rank test with a binding futility criterion based on a short-term survival rate and sample size recalculation based on conditional power. Recall that the underlying null hypothesis is $H_0: S_A(s) = S_B(s)$ for all $0 \le s \le s_{\max}$, for some prefixed $s_{\max} > 0$. The underlying physical unit of $s$ will be "years".
In general, our two-step test of $H_0$ depends on a set of design parameters that have to be fixed in advance: (a) the parameters $b_0$, $u_0$, $u_1$, $u_2$ defining the rejection region acc. to (22); (b) the parameters $s_0$ and $s_1$ steering the amount of follow-up included in interim decision making; (c) the parameters $a_1$, $a_2$, $f$ defining the initially planned lengths of stage one accrual, stage two accrual, and the follow-up period; (d) the parameters $r$, $v_1$, $v_2$ defining the initial accrual rate and the stage-wise treatment arm allocation ratios; and (e) the weights $\eta_{11}$, $\eta_{12}$ and $\eta_{22}$ of the stage-wise log-rank increments acc. to (20) and (14). More specifically, let us assume that we aim for a two-step, Pocock-type log-rank test of $H_0$ with binding stopping for futility if the observed 6-month survival rate in the experimental arm is worse than in the standard arm. This futility condition is realized by choosing $b_0 = 0$, $u_0 = -\infty$, and $s_0 = 0.5$. The Pocock condition means choosing $u_1 = u_2$. 5 Note that infinitely many alternative functional relationships between $u_1$ and $u_2$ could have been chosen. The difference $s_1 - s_0$ is the interval between the time when the short-term endpoint $B_1$ becomes known and the date of the interim analysis. For practical reasons, $s_1 - s_0 \ge 0$ should not be chosen too large. On the other hand, $s_1$ should be sufficiently large such that the interim log-rank statistic $Z_1$ is informative. In our exemplary setting, we consider $s_1 = 1$ a sensible choice. The parameters $f$ and $r$ are determined by the clinical frame conditions. Let us assume a desired follow-up period of $f = 2$ years and an annual overall accrual rate of $r = 75$. Also assume that we aim for equal randomization to both arms (i.e. $v_1 = v_2 = 1$) as well as an interim analysis after half of the planned overall accrual period, i.e. $\pi := a_1/(a_1 + a_2) = 0.5$.
Finally, assume that we set a significance level of 5%, that we aim for a power of 80% if the true hazard ratio $\omega_0$ equals 2/3 (planning alternative hypothesis), and that survival times in the standard therapy arm are, to a good approximation, exponentially distributed with scale parameter $\lambda = 1$.
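For orientation, the Pocock condition $u_1 = u_2$ can be solved numerically in a simplified setting that ignores the futility component $B_1$ and assumes equal stage information: under $H_0$ and independent increments, the pair of interim and final statistics is approximately bivariate normal with correlation $1/\sqrt{2}$, and the common boundary solves $P(Z_1 < u,\ Z^*_2 < u) = 1 - \alpha$. A Python sketch under these assumptions:

```python
from math import sqrt
import numpy as np
from scipy.optimize import brentq
from scipy.stats import multivariate_normal

def pocock_boundary(alpha=0.025):
    """Common two-look boundary u with equal stage information:
    under H0, (Z1, Z*) is bivariate normal with correlation 1/sqrt(2)
    by independent increments; solve P(Z1 < u, Z* < u) = 1 - alpha."""
    mvn = multivariate_normal(mean=[0.0, 0.0],
                              cov=np.array([[1.0, 1 / sqrt(2)],
                                            [1 / sqrt(2), 1.0]]))
    return brentq(lambda u: (1 - mvn.cdf([u, u])) - alpha, 1.5, 3.0)

print(round(pocock_boundary(), 2))  # → 2.18
```

This reproduces the familiar two-look one-sided Pocock value of about 2.18, consistent with the stage-one boundary found in the example below; the paper's full calculation additionally accounts for the futility rule and the correlation with $B_1$.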
With these specifications, the parameters $u_1$ and $a_1$ remain as the only unknowns among the parameters listed under (a)-(d). Whereas the weight $\eta_{11}$ is also fixed by the above specifications, the weights $\eta_{12}$ and $\eta_{22}$ remain functions of $a_1$ acc. to equation (20), since $s_1 = 1$, $a_1 + f_1 = 2 a_1 + f$, and $f_1 = a_1 + f$. We are now in a position to determine the rejection region (see section 'The rejection region') and to perform the initial sample size calculation (see section 'Calculation of the critical bounds'). Using $b_0 = 0$, $u_0 = -\infty$, $u_1 = u_2$ and $\rho$ acc. to (8), the equations (22) and (26) may be solved simultaneously for the two remaining free parameters $u_1$ and $a_1$. Doing so yields a stage-one recruitment period length of $a_1 = 1.7$ years (corresponding to $n_1 = r \cdot a_1 = 125$ patients), together with a stage-one critical boundary $u_1 = 2.18$. On this basis, the weights may be calculated as $\eta_{11} = 0.158$, $\eta_{12} = 0.247$, $\eta_{22} = 0.233$ using (20) and (14). To ensure that the rejection region does not depend on our initial planning assumptions regarding $\rho$, the value of the critical bound $u_2$ will be updated and ultimately fixed at the time of the interim analysis, as described below, when an estimate of $\rho$ becomes available. After $1.7 + 0.5 = 2.2$ years, the short-term statistic $B_1$ can be evaluated. Assume that we observe $B_1 = 1.08 > 0 = b_0$, so that the trial can continue (no stopping for futility). After $1.7 + 1.0 = 2.7$ years, the interim log-rank statistic $Z_{11}$ becomes known and the interim analysis has to be performed. Let us assume that a test statistic of $Z_{11} = 1.34 < 2.18 = u_1$ is observed, as well as an estimated hazard ratio of $\hat\omega = 0.731$. In this case the trial continues to stage two and the sample size can be adapted in the light of this new information.
In a first step, we now estimate the covariance parameter $\rho$ according to (24) in the light of the interim data. Assume that we find an estimated value of $\hat\rho = 0.733$. With this estimate we calculate the final value of the stage-two critical boundary $u_2$ by solving (25) with our estimate plugged in for $\rho$, and all remaining parameters specified as above. Doing so yields the value $u_2 = 2.17$ and ensures that the rejection region does not depend on our initial planning assumptions regarding $\rho$.
Having determined the final rejection region, let us now recalculate the sample size such that a conditional power of $1 - \beta_2 = 0.8$ is achieved, say, under the constraint that the overall accrual period is at least $a_1 + s_1 = 2.7$ years but must not exceed $a_{\max} = 5$ years. Notice that it is in principle possible to adapt the recruitment rate $r$ or the allocation ratio $v_2$ depending on $B_1$ or $Z_{11}$ at the time of the interim analysis. For simplicity, we here assume that neither accrual rate nor allocation ratio shall be adapted, i.e. we choose $r' = r$ and $v'_2 = v_2$. In order to carry out sample size recalculation according to these specifications, we first calculate the required length $a^{CP}_2$ of the second stage accrual period to realize the desired conditional power of $1 - \beta_2 = 0.8$. This can be done by solving equation (33) for the only remaining indeterminate $a^{CP}_2$, which in our case yields $a^{CP}_2 = 3.0$. To implement the constraints on the minimum and maximum length of accrual, the revised length $a'_2$ of the second stage accrual period is finally chosen according to (30). With $a_{\max} = 5$, $a_1 = 1.7$ and $s_1 = 1$, equation (30) yields $a'_2 = a^{CP}_2 = 3.0$, corresponding to $n_2 = 226$ patients in stage two. Finally, $1.7 + 3.0 + 2.0 = 6.7$ years after the start of the trial, the final analysis is due. At this time the test statistics $Z_{12}$ and $Z_{22}$ become known. Assuming that $Z_{12} = 1.67$ and $Z_{22} = 3.14$ are observed, we obtain the final test statistic $Z^*_2$ according to (19), which exceeds $u_2 = 2.17$ and thus concludes a successful trial with rejection of $H_0$ after stage two. We will present an example design for a seamless phase II/III trial in detail in the supplemental material.

Simulation

Design of the main scenario
We consider testing the hypothesis formulated in equation (5), $H_0: S_A(s) = S_B(s)$ for all $0 \le s \le s_{\max}$, using the two-step adaptive design presented in section 4.1.
In the context of the LOGGIC Europe trial, it was of interest to show a positive effect on the short-term PFS rate at an interim analysis in order to obtain preliminary conditional marketing authorisation. Only with this conditional marketing authorisation was it desired to continue recruitment of patients and to additionally test the effect on long-term progression-free survival.
More specifically, we considered a design with a rejection region of the form described in section 'The design algorithm'. Notice that we set the critical boundaries $b_0 = 0$ and $u_0 = -\infty$. We set $s_0 = s_1 = 1.5$, $f = 2$ and $\pi = 0.5$. This corresponds to a two-step log-rank test with a binding futility criterion based on the 18-month response rate.
The following frame conditions were chosen as the main scenario for this simulation study: Patients are allocated equally to both treatment arms (allocation ratio $v_1 = v_2 = 1$). Survival times are Weibull distributed with scale parameter $m = 1/\log(2)$ and shape parameter $k = 1$, which corresponds to a scaled exponential distribution with a median survival of 1 year. To study the performance of our algorithm, we also ran simulations with shape parameters $k = 0.5$ and $k = 2$. Planning was done under the planning alternative $S_B(s) = S_A(s)^{\omega_1}$, where $\omega_1 = 2/3$; we also ran simulations with $\omega_1 = 4/5$. We let the true hazard ratio $\omega$ range between 0.5 and 1 in steps of 1/15. The one-sided type I error rate was set to $\alpha = 0.025$ and the desired power was set to $1 - \beta = 0.8$. We set the conditional power parameter $\beta_2$ as the solution of a calibration equation chosen to stabilize the power of the whole trial despite the adaptation. The recruitment rate was set to $r = 60$. The maximal trial duration $a_{\max}$ was set as $PF = 1.5$ times the duration of a corresponding single-step two-sample log-rank test. 1 In some of our scenarios (Figure 2) we let the parameter $PF$ vary in the interval $[1.25, 1.75]$ as a fine-tuning parameter.
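The survival-time parameterization of the main scenario can be checked directly: with $S(t) = \exp(-(t/m)^k)$, scale $m = 1/\log 2$ and shape $k = 1$ give $S(1) = 0.5$, i.e. a median survival of one year. A quick numpy sketch (with the shape variations of the scenario included; the Monte Carlo medians are only illustrative):

```python
import numpy as np

def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t/scale)**shape); shape k = 1 and scale m = 1/log(2)
    give an exponential distribution with median survival 1 year."""
    return np.exp(-(t / scale) ** shape)

m = 1 / np.log(2)
rng = np.random.default_rng(1)
for k in (0.5, 1.0, 2.0):
    # numpy draws Weibull variates with scale 1; multiply by m to rescale
    draws = m * rng.weibull(k, size=200_000)
    print(k, round(float(np.median(draws)), 2),
          round(float(weibull_survival(1.0, k, m)), 3))
```

Note that the median equals 1 year only for $k = 1$; the shape variations $k = 0.5$ and $k = 2$ change the time scale of the events while keeping the same scale parameter.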
No loss to follow-up was assumed, and the block-randomization and uniform recruitment assumptions required by Theorem A2 were imposed.
For each simulation the required recruitment period lengths of stage one a 1 and stage two a 2 were calculated according to section 'Initial sample size calculation'. In our simulations we additionally distinguished between (i) a Pocock-type design with u 1 = u 2 and (ii) a design without early stopping where u 1 : = ∞. Note that the critical bounds b 0 and u 1 have to be fixed in advance, whereas the value for u 2 is calculated according to equation (22) at the interim analysis, when the estimator ρ for ρ becomes available. Thus the theoretical equality "u 1 = u 2 " in the Pocock setting is effectively only realized approximately.
With the above values for $r$, $a_1$, $a_2$ and $f$, the weights $\eta_{11}$, $\eta_{12}$, $\eta_{22}$ were calculated according to equation (20). Then $r \cdot a_1$ patients were simulated as first stage patients, with preliminary censoring at study time $s_1$, representing the data we are allowed to use at the interim analysis. Based on these simulated data, the interim statistics $Z^*_1$, $B_1$, $N_1$, $\hat\rho$ and $\hat\omega$ were calculated.
The test statistics Z * 1 and B 1 were then compared to the prefixed critical bounds b 0 and u 1 to determine whether early successful stopping or stopping for futility has occurred.
In the case of an ongoing trial, i.e. $B_1 > b_0$ and $Z^*_1 < u_1$, the critical bound $u_2$ is obtained by solving equation (22) with the estimator $\hat\rho$ plugged in. Additionally, the required recruitment period length $a^{CP}_2$ of stage two was calculated such that a conditional power of $1 - \beta_2$ is achieved under the revised planning alternative hypothesis $K'_1: S_B = S_A^{\hat\omega}$ corresponding to the observed hazard ratio $\hat\omega$. The actual recruitment period length $a'_2$ of stage two was then updated as stated in (30), to respect the boundary conditions. We then proceeded (i) to simulate $a'_2 \cdot r$ patients of stage two and (ii) to update the censoring date of stage one patients to calendar time $a' + f$.
Finally the test-statistic Z * 2 was calculated according to (19) and compared to the critical bound u 2 derived at the interim analysis to obtain the final test decision.
The above presented simulation algorithm was run 10,000 times for each scenario.

Results
The simulation results are presented in Table 1. Reassuringly, the designs hold the nominal significance level of 2.5%, even in the small sample size case. Note that with 10,000 simulations per scenario, the accuracy of our type I error rate estimator, expressed as the half-width of a 95% confidence interval, is ±0.31%. Accordingly, in no scenario did the empirical type I error rate exceed the nominal significance level of 2.5% in a statistically noticeable way. The empirical power, however, shows a little more variation. This is due to the fact that the initial sample size calculation does not factor in the randomness introduced by $\hat\omega$, which affects the sample size recalculation based on conditional power. This is a well-known effect of such adaptive designs.
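The quoted Monte Carlo accuracy of ±0.31% follows from the normal approximation to the binomial; a one-line Python check:

```python
from math import sqrt
from scipy.stats import norm

def mc_halfwidth(p, n_sim, level=0.95):
    """Half-width of the normal-approximation confidence interval
    for a proportion p estimated from n_sim simulation runs."""
    z = norm.ppf(0.5 + level / 2)
    return z * sqrt(p * (1 - p) / n_sim)

# type I error rate 2.5% estimated from 10,000 runs, in percent
print(round(100 * mc_halfwidth(0.025, 10_000), 2))  # → 0.31
```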
Main simulation scenario. The strength of adaptive designs is undoubtedly the possibility for correction when the initial planning assumptions turn out to be wrong. When the treatment effect is small, one can stop for futility or increase the sample size to maintain the desired power. Conversely, when the treatment effect is larger than expected, one can decrease the sample size while still maintaining the desired power. We simulated our main scenario ($k = 1$, $\omega_1 = 2/3$, $r = 60$, $u_1 = \infty$) with some variations. We used the parameter $PF \in [1.25, 1.75]$ as a fine-tuning parameter to level out the variation introduced by $\hat\omega$ and to match the target 80% power quite exactly. The choice of this fine-tuning parameter is presented in the table within Figure 2.
We compare our test algorithm with a standard adaptive design based on the methodology by Wassmer. 3 To ensure comparability, we implemented a futility stop when the short-term log-rank test Z_1 based on the first half of the patients shows a negative result. More specifically, in the non-Pocock designs we compared our design to an adaptive design with a rejection region in which Γ_1 is chosen such that P_{H_0}(R_simple,1) = α. In the Pocock scenario we compared our design to a design with a rejection region in which Γ_2 is chosen such that P_{H_0}(R_simple,2) = α. These rejection regions can be used within the methodology of Wassmer and are included in our methodology. We set the required sample size such that the standard design also maintains the desired power of 1 − β = 80% under the planning hypothesis K_1.
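Calibrating a constant so that a rejection region has exact level α under H_0 can be done numerically, exploiting that the stage-wise increments are independent standard normals under H_0 in the large sample limit. The region used below (binding futility at zero plus an equal-weight inverse-normal combination) is a hypothetical stand-in for the R_simple regions above, not their exact definition:

```python
import numpy as np

def calibrate_final_bound(alpha=0.025, n_sim=1_000_000, seed=7):
    """Find c with P_H0(Z1 >= 0 and (Z1 + Z2)/sqrt(2) > c) = alpha by
    Monte Carlo bisection, Z1 and Z2 being independent N(0, 1) under H0."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n_sim)
    z2 = rng.standard_normal(n_sim)
    comb = (z1 + z2) / np.sqrt(2.0)
    lo, hi = 0.0, 4.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if np.mean((z1 >= 0.0) & (comb > mid)) > alpha:
            lo = mid                 # region too large: raise the bound
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because the binding futility bound removes part of the null rejection probability, the calibrated bound comes out slightly below the unadjusted 1.96.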
The operating characteristics of our test algorithm in the main simulation scenario (k = 1, ω_1 = 2/3, u_1 = ∞) are presented in Figure 2, together with some variations of the scenario.
Across all scenario variations, the power and sample-size performance of our test statistic closely matches that of the standard methodology.
In the main scenario, the mean sample-size difference between the standard methodology and our new methodology is at most 0.74%. Under the planning hypothesis, the maximal increase of the mean sample size across all scenario variations was 0.5%, while in some cases the new design reduced the mean sample size by about 1.0%.
This suggests that easily interpretable survival rate differences are an attractive option for interim decision making in survival trials.
By using various Weibull shape parameters, planning hypotheses and design types, we verified that this consistent performance does not depend on our specific scenario assumptions.

Discussion
The confirmatory adaptive two-step log-rank test proposed here extends the one proposed by Wassmer. 3 Whereas the test proposed by Wassmer essentially only allows the use of the interim log-rank statistic for data-dependent design modifications, our approach allows simultaneous use of the interim log-rank statistic and observed differences in cumulative hazard rates at time s_0 for interim decision making, while avoiding the problems arising with methods based on patient-wise separation. [6][7][8] Besides adaptation of the sample size, our approach also allows modification of the allocation ratio between the treatment arms or of the recruitment rate, which has been described by neither Wassmer 3 nor Jenkins. 6 This is of importance when thinking about application of our methodology in a multi-arm, multi-stage setting. Even though the focus of this paper was on a trial design with two treatment arms and two analyses, the generalization to more than two arms and more than two analyses is straightforward using the methodology described by Hommel et al. 9 Our adaptive two-step log-rank test exploits the independent increments structure of the limiting Gaussian process of the joint bivariate process defined by the log-rank statistic and the Nelson-Aalen difference at some time s_0. Therefore, we emphasize that the full use of arbitrary interim data for design modifications is still not admissible here. 4 However, our approach makes provision for the simultaneous use of (i) the interim log-rank statistic and (ii) differences in cumulative hazard rates at an arbitrary time s_0.

Table 1. Empirical type I error rate and power in the simulation scenarios. The empirical type I error rate (TOE) was obtained from simulations with true hazard ratio ω = 1. The empirical power was obtained from simulations in which the true hazard ratio equals the planning hazard ratio (ω = ω_1). For further simulation details see section 6.
The calculation of rejection regions and sample size formulas was based on a distributional approximation of the bivariate test statistic in the large sample limit. Our methodology relies on mild regularity assumptions as well as the proportional hazards assumption. It is well known that the log-rank test is less efficient, and that its distribution depends on the distribution of the censoring times, when the proportional hazards assumption is violated. 10 This is likely to be inherited by our method. The small sample properties were studied by simulation. The validity of the proposed design does not depend on specific model assumptions underlying these simulations, such as exponentially distributed survival times. In view of the flexibility offered by our approach, however, users are advised to assess different choices of design parameters in order to identify the parameter constellations with the best operating characteristics as compared to a standard single-step two-sample log-rank test. For this purpose, we provide an R program in the supplemental material that enables easy assessment of the operating characteristics and thus optimal calibration of the design parameters in a specific trial setting. The same R program underlies our simulations.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

Supplemental material
Supplemental material for this article is available online.

Appendix A
We will now deduce the distributional approximation presented in (8). The proofs presented here are formulated for a single-step design; the extension to a multi-step design is straightforward using the independent increments structure. We therefore drop the stage index k for notational simplicity. It is well known that, for a patient i from treatment group x = A, B, the compensated counting process M_i(s) is an F_s-martingale. 11 In particular, with M_x(s) := Σ_{i∈N_x} M_i(s) and for any F_s-adapted left-continuous process H(s), the stochastic integral ∫ H dM_x is an F_s-martingale with optional and predictable covariation processes. 12 We aim for the joint distribution of the weighted two-sample log-rank statistic, which has an integral representation, and of the difference of the group-wise Nelson-Aalen estimates as F_s-adapted processes, i.e. we aim for the distribution of the bivariate process Ψ(s) := n^{−1/2} (LR(s), Δ(s)), with the bivariate drift process μ(s) and the covariation 〈M_x〉(s). The above equations are easily checked (see Aalen et al., 13 Sec. 2.2.5). On this basis we may deduce the distributional properties of the bivariate process Ψ(s) = M(s) + μ(s) in the large sample limit, as stated in the following theorems. The proofs of Theorems A1 and A2 and of equations (13) and (14) are presented after some additional results that we need.
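For readability, the standard counting-process objects referenced above can be written out explicitly; the following is the usual notation of Aalen et al., 13 not a verbatim restoration of the paper's displays:

```latex
% Compensated counting process for patient i in arm x \in \{A, B\}:
% N_i(s) counts the observed event, Y_i(s) is the at-risk indicator,
% \lambda_x the hazard in arm x.
M_i(s) \;:=\; N_i(s) \;-\; \int_0^s Y_i(u)\,\lambda_x(u)\,\mathrm{d}u ,
\qquad M_x(s) \;:=\; \sum_{i \in N_x} M_i(s).
% For a left-continuous, \mathcal{F}_s-adapted process H, the stochastic
% integral is again an \mathcal{F}_s-martingale, with predictable variation
\Big\langle \int_0^{\cdot} H \,\mathrm{d}M_x \Big\rangle(s)
  \;=\; \int_0^s H(u)^2 \, Y_x(u)\,\lambda_x(u)\,\mathrm{d}u .
% Group-wise Nelson--Aalen estimator entering the difference \Delta(s):
\widehat{\Lambda}_x(s) \;=\; \int_0^s \frac{J_x(u)}{Y_x(u)}\,\mathrm{d}N_x(u),
\qquad J_x(u) := \mathbf{1}\{Y_x(u) > 0\}.
```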
Theorem A1. Fix s_max > 0 and assume that the hazard functions λ_A and λ_B are bounded on the interval [0, s_max] and that P(C_i > s_max) > 0. Assume furthermore that the treatment group allocation is done by block randomisation, i.e. there exists a constant BL ∈ ℕ such that the stated balance condition holds for all n ∈ ℕ. In particular, the following distributional approximation holds. Theorem A2. Assume the conditions of Theorem A1. Then, under the contiguous alternatives Λ_B(s) = ω_n Λ_A(s) with ω_n := e^{−n^{−1/2} γ} for γ ≥ 0, the stated convergence holds for some continuous variance functions σ²_LR(s) and σ²_Δ(s). More explicitly, we have σ_LR(s) = π(s) √v/√(1+v), where π(s) := plim_{n→∞} N(s)/n. In particular, the limit π(s) exists.
Since Ψ(s) = M(s) + μ(s), we conclude that the processes in (38) below have independent increments with the stated distributional large sample approximation. Using the independence of increments, we obtain for 0 < s_0 ≤ s_1 ≤ s_max the distributional approximation stated in (8).
To prove our theorems we need some additional results.
Proposition A3. Let (X_n(s))_{n∈ℕ} be a sequence of stochastic processes, f a Borel measurable function and s_max > 0 satisfying the stated conditions, together with one of the following conditions. Proof. We first show that conditions (1), (2) and (a) imply the preconditions of Helland's proposition. Assume now that conditions (1), (2) and (b) are satisfied. Without loss of generality we can assume that |X_n(s)| is monotone increasing in s; otherwise we pass to |X_n(s_max − s)|. For arbitrary δ > 0 we choose some arbitrary but fixed ϵ > 0 and define k_δ(s) := |{h(s_max) + ϵ} · g(s)|. It holds that P(|X_n(s)| ≤ k_δ(s) for all s ∈ [0, s_max]) = P(|X_n(s)| ≤ |{h(s_max) + ϵ} · g(s)| for all s ∈ [0, s_max]) ≥ P(|X_n(s) · g(s)| ≤ |{h(s_max) + ϵ} · g(s)| for all s ∈ [0, s_max]). Here we used that |X_n(s)| is monotone increasing in s. The convergence holds because of the convergence of |X_n(s)| in probability. The above inequality yields the preconditions of Gill's proposition. □ Lemma A4 (Simple weak law of large numbers). Let (μ_n)_{n∈ℕ} be a sequence in [0, 1] with lim_{n→∞} μ_n = μ ∈ [0, 1] and let (X_i^{(n)})_{i=1,...,n} be a sequence of independent Ber(μ_n) distributed random variables. Then Σ_{i=1}^n X_i^{(n)}/n converges to μ in probability. Proof. We prove the convergence in probability by showing that the convergence holds in L². □ Lemma A5 (Weak law of large numbers). Let (#_n)_{n∈ℕ} be an infinite sequence of ℕ-valued random variables, monotone increasing in n ∈ ℕ, which satisfy the stated conditions. Let furthermore (μ_n)_{n∈ℕ} be a sequence in [0, 1] with lim_{n→∞} μ_n = μ ∈ [0, 1] and (X_i^{(n)})_{i=1,...,n} a sequence of independent Ber(μ_n) distributed random variables. Then the stated convergence holds. Proof. We show the convergence in L². Define Z_n := Σ_{i=1}^{#_n} X_i^{(n)} and S_n := Σ_{i=1}^{⌊cn⌋} X_i^{(n)}. Then the stated decomposition holds: the first summand satisfies the stated inequality, and for the convergence of the second summand we used that the second factor is bounded.
The last summand vanishes in the limit, analogously to the proof of the prior lemma. Therefore the whole sum converges to 0. Multiplying both sides by c concludes the assertion. □ Proof of Theorem A1. We want to show that plim_{n→∞} 〈M〉(s) exists. For this purpose we first take a closer look at the random variables Y_A^n(u) and Y_B^n(u) for some fixed u ∈ ℝ₊. If patient i ∈ ℕ is under treatment A, we use the notation P_A := P (analogously P_B := P) to emphasize the stochastic influence of the treatment. In the first equation we used the independence of T_i and C_i. With the independence of patients, it follows that Y_A^n(u) ∼ Bin(#N_A^n, S_A(u) · P(C_i > u)), and analogously for Y_B^n(u). The convergences follow from block randomisation. For the sake of readability we introduce the notation −A := B and −B := A. The limits in probability of the integrals in the components of 〈M〉(s) can be computed with use of Proposition A3. The integrands in (41) satisfy the stated conditions with g(u) = λ_x(u), c_A = (1 + v) and c_B = (1 + v)/v. Note that the convergence holds because of Slutsky's theorem and equation (40). The integrands in (42) and (43) are treated analogously. Moreover, the jump size of M(s) is of order n^{−1/2} (and thus vanishes in the limit n → ∞), because N_x has jump size 1 and Y_x is of order n. The assertion then follows by a multivariate version of Rebolledo's martingale central limit theorem. 12 □ Proof of Theorem A2. Let y(u) := (1 + v) · y_A(u), with y_A as in (40). Analogously to (39) it follows that Y_B^n(u) ∼ Bin(#N_B^n, S_A^{ω_n}(u) · P(C_i > u)), and we conclude using the weak law of large numbers. We will now calculate the limit π and thus show its existence. Recalling that n^{−1/2} M_A(s) →_D N(0, 1), we have in the limit n → ∞ the stated convergence of the integrals, using (40), Slutsky's theorem and Proposition A3 (a). To prove the assertion regarding μ_k(s), notice that in the limit n → ∞ the expression converges in probability to (1 + v)² · π(s).
The convergence of the integrals again holds using (40), Slutsky's theorem and Proposition A3 (a); in the last equality we made use of (46). Since J_A(u) → 1 and J_B(u) → 1 as n → ∞, we likewise conclude the second assertion, where in the last equality we again made use of (46). □

Proof of equations (13) and (14). Our preliminary work enables a quick calculation of σ_LR(s) and σ_Δ(s). For this purpose we first take a closer look at y_A(u); under the no-loss-to-follow-up and uniform recruitment assumptions, the stated explicit expression holds. Analogously to the proof of Theorem A1, and with use of (45), we obtain the limit of M_Δ^n(s). Using equation (48), the identity λ_A(u) = −S′_A(u)/S_A(u) and the substitution z = S_A(u), we get equation (13). With (47) and (48), we also obtain the limit of M_LR^n(s); using the same identities and substitution as in (49), we get equation (14). □

A seamless phase II/III design

In this section we elaborate the application of our design algorithm in the context of a two-armed randomized seamless phase II/III survival trial. In the phase II part, we assume that the two treatments are compared regarding the short-term endpoint survival rate at time s_0. That is, as the phase II part we consider a local level α test of the confirmatory null hypothesis H_0,1 : S_A(s_0) = S_B(s_0) on the s_0 survival rates using the rejection region R_1, which realizes a single-step test of H_0,1. Only in the case of rejection of H_0,1 (i.e. B_1 > Φ^{−1}(1 − α)) do we continue the trial in order to compare the two treatments also regarding long-term survival. That is, as the phase III part we consider a local level α test of the confirmatory null hypothesis H_0,2 : S_A(s) = S_B(s) for all 0 ≤ s ≤ s_max, for some prefixed s_max > 0, using the rejection region
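The gatekeeping logic of this seamless design reduces to a simple decision rule. In the sketch below, the phase III rejection rule z2 > Φ^{−1}(1 − α) is a hypothetical placeholder, since the exact phase III rejection region is given in the paper's displays; only the phase II rule B_1 > Φ^{−1}(1 − α) is taken from the text:

```python
from statistics import NormalDist

def seamless_decision(b1, z2, alpha=0.025):
    """Gatekeeping of the seamless phase II/III sketch: the long-term
    (phase III) hypothesis H0,2 is tested only if the short-term
    (phase II) test rejects H0,1."""
    u = NormalDist().inv_cdf(1.0 - alpha)   # one-sided level-alpha bound
    if b1 <= u:                             # H0,1 not rejected
        return "stop after phase II"
    return "reject H0,2" if z2 > u else "fail to reject H0,2"
```

Because phase III is entered only after rejecting H0,1, the two local level-α tests form a hierarchical (gatekeeping) procedure.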