Adding new experimental arms to randomised clinical trials: Impact on error rates

Background: Experimental treatments pass through various stages of development. If a treatment passes through early-phase experiments, the investigators may want to assess it in a late-phase randomised controlled trial. An efficient way to do this is adding it as a new research arm to an ongoing trial while the existing research arms continue, a so-called multi-arm platform trial. The familywise type I error rate is often a key quantity of interest in any multi-arm platform trial. We set out to clarify how it should be calculated when new arms are added to a trial some time after it has started.

Methods: We show how the familywise type I error rate, any-pair and all-pairs powers can be calculated when a new arm is added to a platform trial. We extend the Dunnett probability and derive analytical formulae for the correlation between the test statistics of the existing pairwise comparison and that of the newly added arm. We also verify our analytical derivation via simulations.

Results: Our results indicate that the familywise type I error rate depends on the shared control arm information (i.e. individuals in continuous and binary outcomes and primary outcome events in time-to-event outcomes) from the common control arm patients and the allocation ratio. The familywise type I error rate is driven more by the number of pairwise comparisons and the corresponding (pairwise) type I error rates than by the timing of the addition of the new arms. The familywise type I error rate can be estimated using Šidák’s correction if the correlation between the test statistics of pairwise comparisons is less than 0.30.

Conclusions: The findings we present in this article can be used to design trials with pre-planned deferred arms or to add new pairwise comparisons within an ongoing platform trial where control of the pairwise error rate or familywise type I error rate (for a subset of pairwise comparisons) is required.

where $t$ and $t'$ are the information times as defined before ($0 \le t < t' \le 1$) and

$$\eta_{kk'} = \frac{1/\pi_0}{\sqrt{(1/\pi_0 + 1/\pi_k)(1/\pi_0 + 1/\pi_{k'})}}, \qquad (2)$$

where $\pi_0$, $\pi_k$ and $\pi_{k'}$ are the probabilities of assigning subjects to the control arm and to experimental arms $k$ and $k'$. If the allocation ratio to all experimental arms is the same ($A_k = A$), then $\eta_{kk'} = A/(A + 1)$.
As an example, we computed $\pi_0$, $\pi_k$, $\eta$ and $\mathrm{Cov}(Z_k, Z_{k'})$, using eqn. (1) and (2), for the test statistics of all original pairwise comparisons (including interim stages) in the STAMPEDE trial. Since, in the original design of STAMPEDE, $A_k = A = 0.5$ and there were 5 experimental arms, $\pi_0 = 2/7$, $\pi_k = \pi_{k'} = 1/7$ ($k = 1, ..., 5$), and $\eta = 0.33$. In the next section, we derive the analytical formula for $\mathrm{Corr}(Z_k, Z_{k'})$. In the survival scenario, we carried out simulations for the STAMPEDE design at the individual patient data level (results not shown). Our results were identical to those obtained via the trial-level simulation approach developed by Bratton et al. 2.
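The STAMPEDE allocation probabilities and the value of $\eta$ quoted above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function names `allocation_probs` and `eta` are hypothetical, and it assumes that the allocation ratio $A$ is the number of patients allocated to each experimental arm per control patient.

```python
import math
from fractions import Fraction


def allocation_probs(alloc_ratio, n_arms):
    """Return (pi_0, pi_k) when each of n_arms experimental arms receives
    alloc_ratio patients per control patient (assumed meaning of A)."""
    total = 1 + alloc_ratio * n_arms
    return 1 / total, alloc_ratio / total


def eta(pi_0, pi_k, pi_k2):
    """Eqn. (2): (1/pi_0) / sqrt((1/pi_0 + 1/pi_k) * (1/pi_0 + 1/pi_k'))."""
    return (1 / pi_0) / math.sqrt((1 / pi_0 + 1 / pi_k) * (1 / pi_0 + 1 / pi_k2))


# Original STAMPEDE design: A = 0.5 and 5 experimental arms.
A = Fraction(1, 2)
pi_0, pi_k = allocation_probs(A, 5)
eta_value = eta(pi_0, pi_k, pi_k)

# With a common allocation ratio, eqn. (2) reduces to A / (A + 1).
shortcut = A / (A + 1)

print(pi_0, pi_k, round(eta_value, 2), shortcut)
```

Exact fractions are used for the allocation probabilities so that the values 2/7 and 1/7 can be checked without rounding.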

Correlation in survival outcomes
Here, we show how the elements of the correlation matrix $\Sigma$ are estimated when a new arm is added part-way through a trial with survival outcomes. To achieve this, we make use of the asymptotic properties of the log-rank test statistic. Tsiatis 3 showed that, over time, the sequence of log-rank test statistics approximately has an independent and normally distributed increment structure. We define $S_1$ and $S_2$ as the unstandardised log-rank scores at times $t_1$ and $t_2$, where $t_2 > t_1$. Then, approximately,

$$S_1 \sim N(\theta V_1,\, V_1), \qquad S_2 - S_1 \sim N\big(\theta (V_2 - V_1),\, V_2 - V_1\big),$$

where $\theta$ is the log hazard ratio and $V_1$ and $V_2$ are the information for $\theta$ at times $t_1$ and $t_2$.
We can then write down the following Z statistics, $Z_1$ based on the data at time $t_1$ and $Z_2$ based on the data accumulating between $t_1$ and $t_2$:

$$Z_1 = \frac{S_1}{\sqrt{V_1}}, \qquad Z_2 = \frac{S_2 - S_1}{\sqrt{V_2 - V_1}}.$$

Tsiatis 3 showed that the overall Z statistic for all data at time $t_2$, i.e. including those at $t_1$, is

$$Z(t_2) = \sqrt{\frac{d(t_1)}{d(t_2)}}\, Z_1 + \sqrt{\frac{d(t_2) - d(t_1)}{d(t_2)}}\, Z_2, \qquad (3)$$

where $d(t_1)$ and $d(t_2)$ are the total numbers of primary outcome events at times $t_1$ and $t_2$, and $Z_2$ is the corresponding test statistic of the individuals recruited after information time $t_1$.
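The independent-increment structure can be illustrated numerically. The sketch below assumes that the overall statistic is the event-weighted combination $Z(t_2) = \sqrt{d(t_1)/d(t_2)}\,Z_1 + \sqrt{(d(t_2) - d(t_1))/d(t_2)}\,Z_2$ with $Z_1$ and $Z_2$ independent standard normal variables under the null hypothesis. The event counts (100 and 400), the seed and the helper names are arbitrary choices for illustration, not values from any trial.

```python
import math
import random
import statistics


def combine(z1, z2, d1, d2):
    """Overall Z at t2 from the early statistic z1 and the increment z2,
    weighted by the share of events seen by t1 and after t1."""
    return math.sqrt(d1 / d2) * z1 + math.sqrt((d2 - d1) / d2) * z2


def pearson(xs, ys):
    """Sample Pearson correlation, written out for older Python versions."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2024)   # fixed seed so the run is repeatable
d1, d2 = 100, 400           # hypothetical event counts at t1 and t2
n_rep = 20000

z1s = [rng.gauss(0, 1) for _ in range(n_rep)]
z2s = [rng.gauss(0, 1) for _ in range(n_rep)]
z_all = [combine(a, b, d1, d2) for a, b in zip(z1s, z2s)]

corr = pearson(z1s, z_all)              # should be near sqrt(d1 / d2)
var_all = statistics.pvariance(z_all)   # should be near 1
print(corr, math.sqrt(d1 / d2), var_all)
```

Under these assumptions the combined statistic keeps unit variance, and its correlation with the early statistic is governed only by the fraction of events already observed at $t_1$.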
Now, let $T$ be the time of the final analysis, i.e. $t = 1$, and $Z_k(T)$ be the corresponding test statistic for the $k$th pairwise comparison, i.e. the comparison of experimental arm $k$ versus control. Also, let $T'$ be the time and stage of the final analysis for the second comparison to be added later on, and $Z_{(K+1)}(T')$ be the corresponding test statistic at the final analysis for the added $(K + 1)$th experimental arm. According to eqn. (3), the log-rank test statistic of the new comparison, $Z_{(K+1)}(T')$, at the final analysis can be decomposed into two mutually independent parts: 1) the log-rank test statistic of the first part, where the new comparison and the existing family of comparisons overlap, $Z_{1(K+1)}(t_1)$; and 2) the log-rank test statistic of the remainder, where there is no overlap, i.e. $Z_{2(K+1)}(t_2)$.
Given eqn. (3),

$$Z_{(K+1)}(T') = \sqrt{\frac{d_{1(K+1)}}{d_{(K+1)}}}\, Z_{1(K+1)}(t_1) + \sqrt{\frac{d_{2(K+1)}}{d_{(K+1)}}}\, Z_{2(K+1)}(t_2),$$

where $d_{(K+1)}$ is the total number of events in comparison $K + 1$ at $T'$, and $d_{1(K+1)}$ and $d_{2(K+1)}$ are the total numbers of events in Part 1 and Part 2 (excluding those that occurred in Part 1). Under the proportional hazards (PH) assumption,

$$\frac{d_{1(K+1)}}{d_{(K+1)}} \approx \frac{e_{1(K+1)}}{e_{(K+1)}}, \qquad \frac{d_{2(K+1)}}{d_{(K+1)}} \approx \frac{e_{2(K+1)}}{e_{(K+1)}},$$

where $e_{1(K+1)}$, $e_{2(K+1)}$ and $e_{(K+1)}$ $(= e_{1(K+1)} + e_{2(K+1)})$ are the control arm events in the newly added $(K + 1)$th comparison in Part 1, in Part 2, and overall at time $T'$, i.e. the time of the final analysis. Therefore,

$$Z_{(K+1)}(T') \approx \sqrt{\frac{e_{1(K+1)}}{e_{(K+1)}}}\, Z_{1(K+1)}(t_1) + \sqrt{\frac{e_{2(K+1)}}{e_{(K+1)}}}\, Z_{2(K+1)}(t_2).$$

Then, under the design conditions and with an equal allocation ratio among all comparisons, the total and shared control arm events between the new $(K + 1)$th experimental arm and the $k$ arms ($k = 1, 2, ..., K$) that start together at the beginning are the same.

For trials with continuous and binary outcomes, a similar analytical derivation can be obtained, since the corresponding test statistics in these scenarios also have an independent and normally distributed increment structure. However, for these outcome measures the proportion of shared primary outcome events in the common control arm, represented by a ratio in the above equation, is replaced by the proportion of shared observations in the common control arm (see the details below).

Correlation in binary outcomes
For binary outcomes, we show that the Z test statistic of the difference in proportions, i.e. $p_1 - p_0$, has an independent and normally distributed increment structure. This means that the test statistic at information time $t' > t$ can be written as a weighted sum of the test statistic at $t$ and an independent, normally distributed increment. Then a similar analytical derivation to that of time-to-event outcomes can be used to derive the formula for $\rho^*_{12}$ in designs with binary outcomes.
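In trials with binary outcomes, the outcomes of the $n_0$ individuals in the control (C) arm are $X_{10}, X_{20}, ..., X_{n_0 0} \sim \mathrm{Bern}(p_0)$ and those of the $n_1$ individuals in the experimental (E) arm are $X_{11}, X_{21}, ..., X_{n_1 1} \sim \mathrm{Bern}(p_1)$, where $\mathrm{Bern}(p)$ stands for the Bernoulli distribution with parameter $p$. Within our formulation of the null and alternative hypotheses (see the Methods section of the main text), $H_0^1: p_1 \ge p_0$ is tested against the (one-sided) alternative hypothesis $H_1^1: p_1 < p_0$. For simplicity, consider 1:1 randomisation, i.e. $n_0 = n_1 = n$, with $\bar{p} = \frac{1}{2}(p_0 + p_1)$. In this case, the test statistic is given by

$$Z = \frac{\hat{p}_1 - \hat{p}_0}{\sqrt{2\bar{p}(1 - \bar{p})/n}},$$

where $\hat{p}_0$ and $\hat{p}_1$ are the observed proportions in the two arms. If $t$ is the information time when there are $m$ ($m < n$) observations in each group, i.e. $t = m/n$, and $t'$ is the information time at $t' = n/n = 1$, then the Z test statistic can be decomposed as

$$Z(t') = \sqrt{\frac{m}{n}}\, Z(t) + \sqrt{\frac{n - m}{n}}\, Z^*,$$

where $Z^*$ is the test statistic based on the $n - m$ observations per group that accrue after information time $t$.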
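The independent-increment decomposition of the binary-outcome Z statistic can be checked numerically. The sketch below assumes 1:1 randomisation and treats $\bar{p}$ as a fixed design value rather than an estimate, in which case the full-data statistic equals $\sqrt{m/n}$ times the statistic on the first $m$ observations per arm plus $\sqrt{(n - m)/n}$ times the statistic on the remaining $n - m$. The response rates, sample sizes, seed and the helper name `z_stat` are hypothetical choices for illustration.

```python
import math
import random


def z_stat(x_exp, x_ctl, p_bar):
    """Z statistic for the difference in proportions with n observations
    per arm and a fixed (design) pooled proportion p_bar."""
    n = len(x_exp)
    diff = sum(x_exp) / n - sum(x_ctl) / n
    return diff / math.sqrt(2 * p_bar * (1 - p_bar) / n)


rng = random.Random(7)   # fixed seed so the run is repeatable
p0, p1 = 0.4, 0.3        # hypothetical control and experimental rates
p_bar = 0.5 * (p0 + p1)
n, m = 200, 80           # m of the n observations per arm are seen at time t

x_ctl = [1 if rng.random() < p0 else 0 for _ in range(n)]
x_exp = [1 if rng.random() < p1 else 0 for _ in range(n)]

z_full = z_stat(x_exp, x_ctl, p_bar)            # all n observations
z_early = z_stat(x_exp[:m], x_ctl[:m], p_bar)   # first m observations
z_late = z_stat(x_exp[m:], x_ctl[m:], p_bar)    # remaining n - m

t = m / n
z_recombined = math.sqrt(t) * z_early + math.sqrt(1 - t) * z_late
print(z_full, z_recombined)
```

If the pooled proportion were re-estimated from the data at each analysis, the identity would hold only approximately.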

Correlation in continuous outcomes
For continuous outcomes, it has already been shown that the Z test statistic of the difference in means, i.e. $\mu_1 - \mu_0$, has an independent and normally distributed increment structure (see the main text, Methods section, and references 1 3). Then a similar analytical derivation to that of time-to-event outcomes can be used to derive the formula for $\rho^*_{12}$ in designs with continuous outcomes.
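How the share of common control observations drives the correlation can be illustrated with a small trial-level simulation for a continuous outcome. The sketch below is not the authors' simulation code. It assumes a known unit variance, the same control sample size `n0` for both comparisons, a common allocation ratio, and `n_shared` control patients contributing to both comparisons. Under these simplifying assumptions a direct covariance calculation gives $A/(A + 1) \times n_{\text{shared}}/n_0$, which is used here as the reference value. All names and numbers are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation, written out for older Python versions."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_corr(n0, n_shared, alloc_ratio, n_rep, seed):
    """Correlation of two Z statistics that share n_shared control patients.
    Arm totals are drawn directly as normal sums (trial-level simulation)."""
    rng = random.Random(seed)
    n1 = alloc_ratio * n0                  # experimental arm size
    se = math.sqrt(1 / n1 + 1 / n0)        # SE of the difference in means
    z_a, z_b = [], []
    for _ in range(n_rep):
        shared = rng.gauss(0, math.sqrt(n_shared))      # shared control sum
        own_a = rng.gauss(0, math.sqrt(n0 - n_shared))  # controls only in A
        own_b = rng.gauss(0, math.sqrt(n0 - n_shared))  # controls only in B
        exp_a = rng.gauss(0, math.sqrt(n1))
        exp_b = rng.gauss(0, math.sqrt(n1))
        z_a.append((exp_a / n1 - (shared + own_a) / n0) / se)
        z_b.append((exp_b / n1 - (shared + own_b) / n0) / se)
    return pearson(z_a, z_b)


n0, n_shared, A = 200, 120, 1.0
simulated = simulate_corr(n0, n_shared, A, n_rep=20000, seed=11)
expected = A / (A + 1) * n_shared / n0   # reference value under the assumptions
print(simulated, expected)
```

With no shared control patients the two statistics are independent, and with full overlap the reference value reduces to $A/(A + 1)$, the value of $\eta$ for a common allocation ratio.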

Section B: Shared events in STAMPEDE
To calculate the correlation between different test statistics, we needed to estimate (or predict) the shared control arm events of the corresponding pairwise comparisons. For the original family, primary survival results have previously been presented and therefore the observed shared control arm events with future pairwise comparisons (6-7) were used to calculate the correlation; there was no overlap with pairwise comparison 8. For the 6th and 7th pairwise comparisons, we first used the ARTPEP software 4 to predict when the primary efficacy analysis for each of these new experimental arms will take place. The predictions were based on the survival function observed for the control arm of the STAMPEDE trial and the number of control arm events required for that comparison. Then, under similar assumptions, we predicted the number of control arm events likely to be observed at the primary analysis that are shared with other previously added arms. The observed accrual and event rates amongst only those control arm patients shared across the deferred comparisons were used as input to the ARTPEP software, which then outputs the number of control arm events expected over time for those shared patients. Based on the first set of predictions of when each of the deferred arms might report primary results, the ARTPEP output for the shared control arm patients enabled an estimation of the number of shared control arm events at that time.

Section C: RAMPART design
In RAMPART, patients in $E_1$ receive 1500 mg of durvalumab for one year, and patients in $E_2$ receive a combination therapy of durvalumab and tremelimumab. Disease-free survival (DFS) is the primary outcome used throughout the trial at all analyses. For the sample size calculations, the target hazard ratio (HR) for the $E_1$ vs. $C$ comparison was assumed to be 0.75. However, a larger effect size, an HR of 0.70, is targeted for the combination therapy comparison. As a result, the two pairwise comparisons have different follow-up periods. All pairwise comparisons share some of the control arm events, with the two pairwise comparisons that start at the same time sharing the most, and the deferred comparison sharing the least control arm information. The design has formal looks for both lack-of-benefit and efficacy. (Technical detail on the implementation of the efficacy stopping rules in the Royston et al. design can be found in articles by Blenkinsop et al. 5 6.) Table 2 presents the design parameters for each of the pairwise comparisons, and the total number of control arm events required to trigger the final analysis. For full details of the design, please see the trial protocol at https://www.rampart-trial.org/.
Table 1. Correlation structure between the newly added comparisons, as well as with those of the original ones, in the STAMPEDE trial (see Figure 1 in the main text). Key: $e_0$, total control arm primary outcome events required at the final analysis; $k = 1, 2, 4$; $k' = 6, 7, 8$; $n_{0,kk'}$, shared control arm patients; $e_{0,kk'}$, (projected) number of shared primary outcome events in the control arm; $\rho^*$, the estimates of correlation between test statistics of pairwise comparisons.