Re-randomisation trials in multi-episode settings: Estimands and independence estimators

Often patients may require treatment on multiple occasions. The re-randomisation design can be used in such multi-episode settings, as it allows patients to be re-enrolled and re-randomised for each new treatment episode they experience. We propose a set of estimands that can be used in multi-episode settings, focusing on issues unique to multi-episode settings, namely how each episode should be weighted, how the patient's treatment history in previous episodes should be handled, and whether episode-specific effects or average effects across all episodes should be used. We then propose independence estimators for each estimand, and show the manner in which many re-randomisation trials have been analysed in the past (a simple comparison between all intervention episodes vs. all control episodes) corresponds to a per-episode added-benefit estimand, that is, the average effect of the intervention across all episodes, over and above any benefit conferred from the intervention in previous episodes. We show this estimator is generally unbiased, and describe when other estimators will be unbiased. We conclude that (i) consideration of these estimands can help guide the choice of which analysis method is most appropriate; and (ii) the re-randomisation design with an independence estimator can be a useful approach in multi-episode settings.


Introduction
In many clinical settings, patients may require treatment on multiple occasions. For example, patients who experience acute sickle cell pain crises will require treatment for each pain crisis they experience; patients who experience severe asthma exacerbations will require treatment for each new exacerbation; and patients who develop febrile neutropenia as a result of chemotherapy will require treatment for each new round of chemotherapy leading to a neutropenic episode. [1][2][3] In these settings, patients would typically be given the same treatment for each new episode. We refer to these as 'multi-episode' settings.
Re-randomisation trials have been used to evaluate interventions in multi-episode settings. [4][5][6] The re-randomisation design involves re-enrolling and re-randomising patients for each new episode they experience; overviews of this design are available in references. 1,4,5,7 Importantly, the number of times each patient is enrolled in the trial is not specified in advance, but depends on how many treatment episodes they experience during the trial. 5 The re-randomisation design has been applied to trials of acute sickle cell pain crises, 2 severe asthma exacerbations, 3 influenza vaccination, 8 platelet transfusions for thrombocytopenia, 9 complications of cirrhosis, 10 febrile neutropenia, 11 ambient light to perform biophysical profiles, 12 pre-term births, 13 and in-vitro fertilisation. 14 These trials are typically analysed by comparing all episodes allocated to the intervention versus all episodes allocated to the control, for instance using a t-test or linear regression model on the individual episodes 1 (counterintuitively, the approach of ignoring within-patient correlation between episodes can provide valid standard errors under certain assumptions, e.g. see Dunning and Reeves 4 and Kahan et al., 5 although robust standard errors which allow for clustering may be more appropriate in case these assumptions fail 15 ).
With the publication of the ICH-E9 addendum, 16 there is growing recognition of the importance of estimands [17][18][19][20][21][22] (a precise definition of the treatment effect to be estimated), as careful consideration of the estimand can clarify research objectives and ensure the trial design and statistical methods are aligned with those objectives. However, estimands have not yet been defined for the re-randomisation design, or for multi-episode settings more generally. Therefore, it is not immediately clear what estimand the analysis approach discussed above corresponds to, or whether this estimand is in line with trial objectives.
In this article, we (i) propose a set of estimands that can be used for multi-episode settings; (ii) show that the analysis approach used in previous re-randomisation trials corresponds to one of these proposed estimands (the 'per-episode added-benefit' estimand); (iii) propose a set of estimators for the re-randomisation design which correspond to the proposed estimands; (iv) discuss the bias of these estimators; and (v) discuss these estimands in the context of a trial of sickle cell pain crises. We focus on independence estimators (where estimates are obtained using a working independence correlation structure, often in conjunction with robust standard errors). For simplicity, we limit ourselves to the setting where the interventions under study do not affect whether future episodes occur; that is, patients would experience the same number of episodes during the trial period regardless of which treatments they receive. We note that re-randomisation trials could still be used if treatment affects the occurrence of future episodes (e.g. see Nason and Follmann 6 ), but that some of the estimands specified in this article would not be directly applicable. However, we do allow for non-enrolment in this article (where future episodes do occur, but patients are not re-enrolled into the trial).

Notation
Let $i$ index patients, and $j$ index the episode number within the trial. $M_i$ represents the number of episodes for which patient $i$ is enrolled in the trial, $M_T$ the total number of episodes enrolled, and $M_{T(j)}$ the total number of patients for whom $M_i = j$ (i.e. the number of patients enrolled for exactly $j$ episodes). There are $N_T$ total patients enrolled in the trial, and $N_j$ represents the number of patients who are enrolled for at least $j$ episodes. For example, $M_i = 3$ indicates patient $i$ was enrolled for three episodes, and $N_3 = 45$ denotes there were 45 patients enrolled in the trial at episode 3.
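As a concrete illustration of this notation, the following minimal sketch computes each count from a small made-up list of enrolled episodes. The data and variable names (`episodes`, `M_T_exact`) are hypothetical and only serve to show how the quantities relate to one another.

```python
from collections import Counter

# Hypothetical toy trial: one (patient_id, episode_number) pair per enrolled episode.
episodes = [(1, 1), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3), (4, 1)]

# M_i: number of episodes for which patient i is enrolled.
M = Counter(patient for patient, _ in episodes)

M_T = sum(M.values())   # total number of enrolled episodes
N_T = len(M)            # total number of enrolled patients
max_eps = max(M.values())

# N_j: number of patients enrolled for at least j episodes.
N = {j: sum(1 for m in M.values() if m >= j) for j in range(1, max_eps + 1)}

# M_T(j): number of patients enrolled for exactly j episodes.
M_T_exact = {j: sum(1 for m in M.values() if m == j) for j in range(1, max_eps + 1)}

print(dict(M), M_T, N_T, N, M_T_exact)
```

Note that summing $N_j$ over $j$ recovers $M_T$, since each patient contributes one count to $N_j$ for every episode up to $M_i$.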
Let $Y_{ij}$ denote the outcome for patient $i$ during episode $j$, and $Z_{ij}$ denote the treatment allocation (where 0 = control, 1 = intervention). $\tilde{Z}_{ij}$ represents the treatment allocations in the patient's previous episodes (referred to as the 'treatment history'). For example, $\tilde{Z}_{13}$ would be the vector $(Z_{11}, Z_{12})$.
Let $Y_{ij}^{(Z_{ij}=0,\,\tilde{Z}_{ij}=\tilde{z}_{ij})}$ represent patient $i$'s potential outcome at episode $j$ under $Z = 0$ and treatment history $\tilde{Z}_{ij} = \tilde{z}_{ij}$ (and similarly for $Z = 1$). For clarity, we drop subscripts inside the brackets, as these are the same as subscripts on the outside of the brackets; for example, $Y_{ij}^{(Z=1,\,\tilde{Z}=\tilde{z})}$ is the same as $Y_{ij}^{(Z_{ij}=1,\,\tilde{Z}_{ij}=\tilde{z}_{ij})}$.

Estimands
An estimand 'summarises at a population level what the outcomes would be in the same patients under different treatment conditions being compared'. 16 It consists of five components: (1) the treatment regimens; (2) the population; (3) the outcome; (4) a population-level summary denoting how outcomes between treatment arms will be compared; and (5) how intercurrent events which may influence the interpretation of the treatment effect, such as treatment discontinuation, will be handled.
All of these components will need to be defined for re-randomisation trials. For some components, the considerations will generally be the same in a re-randomisation trial as in a parallel group or alternate trial design (e.g. the outcome definition, or how treatment discontinuation is handled), and so we do not discuss those here; instead, we focus on the aspects that are unique to multi-episode settings, and thus to re-randomisation trials.
The three aspects we focus on in this article are (i) whether each episode or each patient is given equal weight in the estimand; (ii) how the patient's 'treatment history' (i.e. the treatments they were allocated to in previous episodes) is handled; and (iii) the set of episodes included in the estimand.

Weighting by episode or by patient: Per-episode versus per-patient estimands
Here we discuss whether to give equal weight to each episode or to each patient in calculating the treatment effect.
Consider a setting where patients with more severe underlying diseases are predisposed to experience a larger number of episodes, but are less likely to respond to intervention than other patients. Imagine the treatment effect is $\beta_1$ for patients who experience one episode, $\beta_2$ for patients who experience two episodes, and that 50% of patients experience one episode and 50% experience two episodes. Then, the average treatment effect if we give equal weight to each patient is $0.5\beta_1 + 0.5\beta_2$, whereas the average treatment effect if we give equal weight to each episode is $0.33\beta_1 + 0.67\beta_2$ (this difference comes from weighting the treatment effects by the percentage of patients vs. by the percentage of episodes that they correspond to).
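The arithmetic behind these two sets of weights can be checked with a short sketch. The function name `effect_weights` and the numerical values chosen for $\beta_1$ and $\beta_2$ are illustrative assumptions, not values from any trial.

```python
from fractions import Fraction as F

def effect_weights(share_of_patients):
    """Given {episodes per patient: share of patients}, return the weight each
    group's treatment effect receives per patient and per episode."""
    per_patient = dict(share_of_patients)
    total_episodes = sum(m * share for m, share in share_of_patients.items())
    per_episode = {m: m * share / total_episodes
                   for m, share in share_of_patients.items()}
    return per_patient, per_episode

# Half of patients have one episode, half have two (as in the example above).
per_patient, per_episode = effect_weights({1: F(1, 2), 2: F(1, 2)})

# Illustrative effects only: beta_1 = 4 and beta_2 = 1.
beta = {1: F(4), 2: F(1)}
avg_per_patient = sum(per_patient[m] * beta[m] for m in beta)
avg_per_episode = sum(per_episode[m] * beta[m] for m in beta)

print(per_episode, avg_per_patient, avg_per_episode)
```

Patients with two episodes account for two-thirds of all episodes, which is why their effect receives weight 0.67 under episode weighting but only 0.5 under patient weighting.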
In this section, we define two different approaches to weighting: the per-episode estimand, and the per-patient estimand. The approach we use to define these estimands is based on a sampling scheme framework, as has been used to define estimands in other settings (notably in the informative cluster size setting). [23][24][25][26][27][28][29]

Per-episode estimands
The per-episode estimand (denoted by $\beta_E$) gives the average effect across all episodes. For instance, in the example listed above, the per-episode estimand would weight the treatment effects $\beta_1$ and $\beta_2$ according to the number of episodes they correspond to.
Let $I$ and $J$ be random variables, where $I = i$ and $J = j$ with a specified probability; $I$ represents a randomly selected patient from the trial, and $J$ represents a randomly selected episode from patient $I = i$. Then, $Y_{IJ}$ represents the outcome for a randomly selected episode from the trial, where each episode has a certain probability of being selected.
Then, let $Y_{(IJ)^E}$ denote the outcome from an episode selected completely at random (i.e. where each episode has an equal probability of being selected). This is accomplished by letting $Y_{(IJ)^E}$ denote $Y_{I^E J^E}$, where $I^E = i$ with probability $M_i / M_T$, and, conditional on $I^E$, $J^E$ is uniformly distributed on $\{1, \ldots, M_{I^E}\}$ (where $M_{I^E}$ represents the total number of episodes, $M_i$, for which patient $I^E$ was enrolled in the trial). Multiplying the two probabilities $M_i / M_T$ and $1 / M_i$ shows that $Y_{(IJ)^E}$ represents the outcome from a randomly selected episode from the trial, where each episode is selected with equal probability $1 / M_T$. Then, the estimand is defined as
$$\beta_E = E\left( Y_{(IJ)^E}^{(Z=1)} - Y_{(IJ)^E}^{(Z=0)} \right).$$
That is, it is the difference in potential outcomes, averaged across all episodes. This estimand can be related to the potential outcomes by the expression
$$\beta_E = \frac{1}{M_T} \sum_{i=1}^{N_T} \sum_{j=1}^{M_i} \beta_{ij},$$
where $\beta_{ij} = Y_{ij}^{(Z=1)} - Y_{ij}^{(Z=0)}$ (i.e. the potential treatment effect for patient $i$ at episode $j$).

Per-patient estimands
The per-patient estimand (denoted by $\beta_P$) gives the average effect across patients. Consider again the example listed earlier; the per-patient estimand would weight the treatment effects $\beta_1$ and $\beta_2$ according to the number of patients they correspond to (rather than according to the number of episodes, as in the per-episode estimand). Let $Y_{(IJ)^P}$ denote $Y_{I^P J^P}$, where $I^P$ has a uniform distribution on $\{1, \ldots, N_T\}$, and, conditional on $I^P$, $J^P$ is uniformly distributed on $\{1, \ldots, M_{I^P}\}$ (where $M_{I^P}$ represents the value of $M_i$ for the randomly selected patient $I^P$). Then $Y_{(IJ)^P}$ denotes the outcome from a randomly selected episode from a randomly selected patient. It is based on a two-stage sampling scheme, where a patient is randomly selected in the first stage (each with equal probability), then an episode from within that patient is randomly selected in the second stage (each with equal probability). Because there are $N_T$ patients in the population of interest, the probability of selection for each patient is $1 / N_T$, and because each patient has $M_i$ episodes, the probability of selection for each episode, given that patient $i$ has been selected, is $1 / M_i$. Therefore, the overall probability of selection for each episode is $\frac{1}{N_T} \cdot \frac{1}{M_i}$. The estimand can be seen as taking an average of the patient-specific treatment effects (where each patient-specific treatment effect is the average treatment effect for that patient over the episodes they experience).
The estimand is defined as
$$\beta_P = E\left( Y_{(IJ)^P}^{(Z=1)} - Y_{(IJ)^P}^{(Z=0)} \right).$$
This estimand can be related to the potential outcomes by the expression
$$\beta_P = \frac{1}{N_T} \sum_{i=1}^{N_T} \frac{1}{M_i} \sum_{j=1}^{M_i} \beta_{ij}.$$
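To make the two sampling schemes concrete, the sketch below assigns each episode its selection probability under per-episode sampling ($\frac{M_i}{M_T} \cdot \frac{1}{M_i}$) and per-patient sampling ($\frac{1}{N_T} \cdot \frac{1}{M_i}$), then averages a set of potential treatment effects under each. The treatment effects and patient labels are hypothetical, chosen so that patients with two episodes respond less.

```python
from fractions import Fraction as F

# Hypothetical potential treatment effects beta_ij, one list per patient.
beta = {"a": [F(4)], "b": [F(4)], "c": [F(1), F(1)], "d": [F(1), F(1)]}

N_T = len(beta)                                       # patients
M_T = sum(len(effects) for effects in beta.values())  # episodes

# Per-episode scheme: patient chosen with probability M_i / M_T, then an
# episode within that patient with probability 1 / M_i.
p_episode = {(i, j): F(len(e), M_T) * F(1, len(e))
             for i, e in beta.items() for j in range(len(e))}

# Per-patient scheme: patient chosen with probability 1 / N_T, then an
# episode within that patient with probability 1 / M_i.
p_patient = {(i, j): F(1, N_T) * F(1, len(e))
             for i, e in beta.items() for j in range(len(e))}

beta_E = sum(p * beta[i][j] for (i, j), p in p_episode.items())
beta_P = sum(p * beta[i][j] for (i, j), p in p_patient.items())

print(beta_E, beta_P)
```

Under the per-episode scheme every episode has probability $1/M_T$, whereas under the per-patient scheme episodes from patients with many episodes are individually less likely to be drawn.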

Treatment history: Added-benefit versus policy-benefit estimands
One key difference in the multi-episode setting compared to many other settings with clustered data is that in the multi-episode setting episodes occur sequentially in time, and outcomes or treatment effects in a patient's current episode may depend on the treatments they received in previous episodes. For example, imagine that patient outcomes follow the model
$$Y_{ij} = \alpha + \beta Z_{ij} + \gamma Z_{i,j-1} + \varepsilon_{ij}, \tag{1}$$
where $Z_{i,j-1}$ represents the treatment in a patient's previous episode (and is defined as 0 for the patient's first episode). Under this model, the intervention effect carries forward into the next episode by the amount $\gamma$. Conditional on the treatment received in episode one, the treatment effect in episode two is $\beta$. However, we may wish to assess the effect of a treatment policy, where patients receive the intervention for all episodes versus the control for all episodes (i.e. at episode two it would compare treatment sequences $Z = (1, 1)$ vs. $Z = (0, 0)$). In this case, the treatment effect at episode two would be $\beta + \gamma$. We can therefore define different estimands based on different ways of incorporating a patient's treatment history.
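The distinction can be illustrated numerically. The sketch below assumes the carry-over model takes the additive form $Y_{ij} = \alpha + \beta Z_{ij} + \gamma Z_{i,j-1}$ (error term omitted), with parameter values picked purely for illustration.

```python
def potential_outcome(z, z_prev, alpha=10.0, beta=2.0, gamma=0.5):
    """Mean potential outcome at episode 2 under an assumed additive
    carry-over model; all parameter values are illustrative."""
    return alpha + beta * z + gamma * z_prev

# Effect of the current allocation, holding treatment history fixed.
effect_given_control_history = potential_outcome(1, 0) - potential_outcome(0, 0)
effect_given_intervention_history = potential_outcome(1, 1) - potential_outcome(0, 1)

# Effect of always treating versus never treating: (1, 1) vs (0, 0).
policy_effect = potential_outcome(1, 1) - potential_outcome(0, 0)

print(effect_given_control_history, effect_given_intervention_history, policy_effect)
```

Holding history fixed returns $\beta$ whichever history is used, while comparing the two full treatment sequences returns $\beta + \gamma$.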
In this section, we define two different approaches for incorporating treatment history into the estimand definition: the policy-benefit estimand, and the added-benefit estimand. For the moment, we do not distinguish between per-episode and per-patient sampling schemes, and instead, use Y IJ as a placeholder for one of these two sampling schemes.

Policy-benefit estimands
The policy-benefit estimand (denoted by $\beta^{PB}$) gives the effect of a treatment policy where patients are allocated to either receive intervention for all episodes, or control for all episodes.
We define the policy-benefit estimand as the expected difference in potential outcomes for a randomly selected episode (based either on a per-episode or per-patient basis), which has been allocated intervention for the current and all previous episodes versus control for the current and all previous episodes. It can be thought of as the average effect of always versus never treating.
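The estimand can be written as
$$\beta^{PB} = E\left( Y_{IJ}^{(Z=1,\,\tilde{Z}=1)} - Y_{IJ}^{(Z=0,\,\tilde{Z}=0)} \right),$$
where $\tilde{Z} = 1$ means the patient was assigned to intervention for all previous episodes in the trial, and $\tilde{Z} = 0$ means they were assigned to control for all previous episodes. This estimand can be related to the potential outcomes by the expression
$$\beta^{PB}_E = \frac{1}{M_T} \sum_{i=1}^{N_T} \sum_{j=1}^{M_i} \beta^{PB}_{ij},$$
where $\beta^{PB}_{ij} = Y_{ij}^{(Z=1,\,\tilde{Z}=1)} - Y_{ij}^{(Z=0,\,\tilde{Z}=0)}$ (i.e. it represents the difference in potential outcomes for patient $i$ at episode $j$ under a policy of all intervention vs. all control). (The above expression relates to a per-episode approach, but could easily be adapted to a per-patient strategy.)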

Added-benefit estimands
The added-benefit estimand (denoted by $\beta^{AB}$) gives the additional effect of being assigned the intervention in the current episode (i.e. 'what is the benefit of the intervention in this episode, over and above the benefit from previous episodes?'). Consider again model (1); the added-benefit effect would omit the term $\gamma$, as this represents carried-over benefit from previous episodes, rather than any new benefit from the intervention in the current episode.
We define the added-benefit estimand as the expected difference in potential outcomes for a randomly selected episode (based either on a per-episode or per-patient basis), based on both the intervention and control potential outcomes sharing the same treatment history.
The estimand can be written as
$$\beta^{AB} = E\left( Y_{IJ}^{(Z=1,\,\tilde{Z}=\tilde{z})} - Y_{IJ}^{(Z=0,\,\tilde{Z}=\tilde{z})} \right).$$
Here, the expectation is over both $IJ$ and the distribution of $\tilde{Z}$, that is, the expectation is taken over the different treatment histories according to their probability of being observed for each patient. The reason for this is that the treatment effect may depend on treatment history; for instance, if the intervention became more or less effective the more often it is used, then its benefit in a particular episode will depend on the number of times a patient has received it previously. This estimand represents a weighted average across all possible treatment histories, with weights based on the probability of treatment history $\tilde{Z} = \tilde{z}$ being observed at episode $j$ for patient $i$. This probability, denoted as $P(\tilde{Z}_{ij} = \tilde{z}_{ij})$, depends on two factors: the allocation probabilities used in the study (e.g. 1:1 allocation ratio vs. 2:1 allocation ratio), and whether the patient would be enrolled in the trial at episode $j$ under treatment history $\tilde{Z}_{ij} = \tilde{z}_{ij}$ (for instance, patients who experience two episodes may decide to enrol in the trial for their second episode under one treatment history, but not under a different history; if they would not enrol for a certain episode, that episode is not included in the estimand). Note that this probability is not known in practice: although we know the allocation ratio, we will not know whether patients would be enrolled in the trial under different treatment histories.
The estimand can be related to the potential outcomes as follows. Let $\beta^{AB(\tilde{Z}=\tilde{z})}_{ij} = Y_{ij}^{(Z=1,\,\tilde{Z}=\tilde{z})} - Y_{ij}^{(Z=0,\,\tilde{Z}=\tilde{z})}$ represent the difference in potential outcomes for patient $i$ at episode $j$ under treatment history $\tilde{Z} = \tilde{z}$; this can be thought of as the potential treatment effect for patient $i$ at episode $j$ under that specific treatment history. Then, $\beta^{AB}_{ij}$ is a weighted average of the $\beta^{AB(\tilde{Z}=\tilde{z})}_{ij}$, with weights equal to the probability of that treatment history being observed:
$$\beta^{AB}_{ij} = \sum_{\tilde{z}_{ij}} P(\tilde{Z}_{ij} = \tilde{z}_{ij}) \, \beta^{AB(\tilde{Z}=\tilde{z})}_{ij},$$
where the summation $\sum_{\tilde{z}_{ij}}(\cdot)$ is taken across all possible values of $\tilde{Z}_{ij}$ at episode $j$. There are several important implications that follow from this definition. The first is that the estimand depends on the distribution of $\tilde{Z}$, and so changes to this distribution may lead to different values of the estimand (e.g. changing the allocation ratio from 1:1 to 2:1 will change the distribution of the treatment history, and hence the value of the estimand). This implication is inherent to any definition of the estimand which averages over different treatment histories. Of note, if the treatment effect is not affected by treatment history, then the true value of the estimand does not depend on the distribution of $\tilde{Z}$. In that case, the manner in which the treatment histories are weighted does not matter, as all definitions will be equivalent.
Another implication of this definition is that the estimand excludes episodes corresponding to treatment histories for which the patient would not be enrolled in the trial. For instance, if a patient would re-enrol in the trial for their second episode if they received intervention in the first episode, but not if they received control, then $P(\tilde{Z}_{i2} = (1)) = 1$ and $P(\tilde{Z}_{i2} = (0)) = 0$, and hence $\beta^{AB(\tilde{Z}=(1))}_{i2}$ gets all the weight, and $\beta^{AB(\tilde{Z}=(0))}_{i2}$ is excluded from the calculation. One benefit of defining the estimand to exclude episodes that would not be enrolled in the trial is that this definition is compatible with the setting where treatment allocation may influence the occurrence of subsequent episodes (for instance, in a trial evaluating an intervention to induce pregnancy in couples with difficulty conceiving), as this definition excludes episodes which would not have occurred under specific treatment histories. Although we do not consider this setting in this paper, we feel it is useful for definitions to apply to more complicated settings when possible.
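The dependence of the added-benefit effect on the distribution of treatment history can be sketched for a single patient at episode 2. The history-specific effects below are hypothetical, and the helper `added_benefit` simply forms the probability-weighted average described above.

```python
from fractions import Fraction as F

def added_benefit(effect_by_history, prob_by_history):
    """Weighted average of history-specific effects for one patient-episode."""
    assert sum(prob_by_history.values()) == 1, "weights must sum to one"
    return sum(prob_by_history[h] * effect_by_history[h] for h in prob_by_history)

# Hypothetical effects at episode 2, keyed by the episode 1 allocation:
# the intervention works better if it was also received in episode 1.
effects = {(0,): F(2), (1,): F(3)}

equal_alloc = added_benefit(effects, {(0,): F(1, 2), (1,): F(1, 2)})
two_to_one = added_benefit(effects, {(0,): F(1, 3), (1,): F(2, 3)})

# Patient who would only re-enrol after receiving intervention in episode 1.
enrols_only_after_intervention = added_benefit(effects, {(0,): F(0), (1,): F(1)})

print(equal_alloc, two_to_one, enrols_only_after_intervention)
```

Moving from 1:1 to 2:1 allocation shifts weight towards the intervention history and so changes the value of the estimand, whereas if both history-specific effects were equal the weights would not matter.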

The set of episodes to include: Average effects across all episodes versus episode-specific estimands
In the previous sections, we defined estimands that included all enrolled episodes. These provide a single overall average treatment effect across all episodes (with the type of average being defined by whether a per-episode or per-patient approach is used). This is in line with common practice in randomised trials, where a single average effect is typically provided for the main estimand, recognising that this average effect will not necessarily correspond to the true effect within each specific subgroup.
In multi-episode settings, it may sometimes be useful to define episode-specific estimands (denoted by $\beta_j$ for episode $j$). These estimands represent the treatment effect at episode $j$ (note that $\beta_j$ only includes patients for whom $M_i \geq j$). For example, $\beta_2$ represents the treatment effect at episode 2, in patients with $M_i \geq 2$. Episode-specific estimands can be defined under either an added-benefit or policy-benefit framework; however, we note the per-episode versus per-patient framework does not apply, as each patient with $M_i \geq j$ contributes only one episode for each $\beta_j$.
Under the added-benefit framework, the episode-specific estimand is
$$\beta^{AB}_j = E\left( Y_{I^{ES} j}^{(Z=1,\,\tilde{Z}=\tilde{z})} - Y_{I^{ES} j}^{(Z=0,\,\tilde{Z}=\tilde{z})} \right),$$
and under the policy-benefit framework, the estimand can be defined as
$$\beta^{PB}_j = E\left( Y_{I^{ES} j}^{(Z=1,\,\tilde{Z}=1)} - Y_{I^{ES} j}^{(Z=0,\,\tilde{Z}=0)} \right),$$
where $I^{ES}$ is a random variable which represents a randomly selected patient (with equal probability) at episode $j$ (from the subset of patients with $M_i \geq j$). We use the ES superscript to denote episode-specific. Of note, if we want to know whether the intervention is more effective the first time it is used versus the second time, a comparison between $\beta_1$ and $\beta_2$ does not tell us this, because $\beta_1$ applies to patients for whom $M_i = 1$, whereas $\beta_2$ does not. If the treatment effect is different in those for whom $M_i = 1$ compared to $M_i > 1$, then $\beta_1$ and $\beta_2$ may differ, even if the effect is the same both the first and second time the intervention is used.
An alternate way to define episode-specific estimands is to restrict the subset of patients for each $\beta_j$ to those where $M_i \geq c$, where $c$ is the number of episodes we are interested in. For example, if we want to know whether the treatment effect is the same the first three times the intervention is used, then we could restrict to the subset of patients for whom $M_i \geq 3$ and compare $\beta_1$, $\beta_2$ and $\beta_3$ within this subset.

Comparison between estimands
A full set of the estimands described is shown in Table 1. We now discuss some of the differences between the per-episode versus per-patient, and the added-benefit versus policy-benefit estimands.

Comparison between per-episode versus per-patient estimands
Whether the per-episode and per-patient estimands coincide will depend on whether there is 'informative cluster size', [23][24][25][26][27][28][29] where the patient acts as the cluster, and the size is determined by $M_i$. There are two types of informative cluster size: the first is when values of $Y_{ij}$ differ across different values of $M_i$ (e.g. patients who experience more episodes have different outcomes than those who experience fewer). The second is when the association between $Z_{ij}$ and $Y_{ij}$ is different across different values of $M_i$, that is, the treatment effect in a given episode depends on the number of episodes for which that patient is enrolled.
For collapsible treatment effect measures (such as a difference in means, risk difference, or risk ratio), the per-episode and per-patient estimands should coincide unless the second type of informative cluster size occurs (where the treatment effect depends on $M_i$). In this case, the value of the two estimands will differ.
For non-collapsible effect measures (such as an odds ratio), the per-episode and per-patient estimands will coincide in the absence of both types of informative cluster size; if either type occurs (either the outcome or treatment effect depends on $M_i$) the value of the two estimands will differ.
A simple way to evaluate whether the cluster size is informative under a particular data generating model (for a difference in means) is to take the mean of the potential outcomes and the mean of the potential treatment effects (either $\beta^{AB}_{ij}$ or $\beta^{PB}_{ij}$) across episodes for each patient. If the mean potential outcome for each patient differs according to $M_i$, then the first type of informative cluster size has occurred; if the mean potential treatment effect for each patient differs according to $M_i$, then the second type has occurred (as this indicates the association between $Z_{ij}$ and $Y_{ij}$ is different across different values of $M_i$).
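The check described above can be sketched as follows. The data are hypothetical, and for brevity the sketch summarises only the control potential outcome (`y0`) alongside the potential treatment effect (`effect`); the helper name `mean_by_cluster_size` is an assumption of this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-episode potential outcomes under control (y0) and
# potential treatment effects (effect) for each patient.
patients = {
    "a": {"y0": [10.0], "effect": [4.0]},
    "b": {"y0": [12.0], "effect": [4.0]},
    "c": {"y0": [10.0, 12.0], "effect": [1.0, 1.0]},
    "d": {"y0": [11.0, 11.0], "effect": [1.0, 1.0]},
}

def mean_by_cluster_size(patients, key):
    """Average the patient-level means of `key` within each value of M_i."""
    groups = defaultdict(list)
    for record in patients.values():
        groups[len(record[key])].append(mean(record[key]))
    return {m: mean(values) for m, values in sorted(groups.items())}

outcome_means = mean_by_cluster_size(patients, "y0")
effect_means = mean_by_cluster_size(patients, "effect")

print(outcome_means)  # same across M_i: no first-type informativeness here
print(effect_means)   # differs across M_i: second type is present
```

In this toy example the mean outcome does not vary with $M_i$ but the mean treatment effect does, so for a difference in means the per-episode and per-patient estimands would differ.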
We provide a number of examples in the supplementary material of scenarios where these estimands either coincide or are different (for a difference in means).
We note that identifying informative cluster size can be more challenging for non-collapsible treatment effect measures, such as the odds ratio (OR). For instance, consider the setting where patients for whom $M_i = 1$ and $M_i = 2$ both have the same OR, but the baseline probability of an event differs between the two sets of patients. Here, the per-episode and per-patient estimands will provide the same conditional OR (conditional on $M_i$), but not the same marginal OR. Similarly, when considering covariate adjustment for baseline characteristics, the per-episode and per-patient conditional ORs will coincide provided the conditional OR does not vary across values of $M_i$. Conversely, the marginal ORs will coincide provided the marginal OR does not vary across values of $M_i$ (i.e. the marginal OR based on all episodes for patients with a given value of $M_i$), and the baseline event rates do not vary by values of $M_i$.

Comparison between added-benefit versus policy-benefit estimands
The added-benefit and policy-benefit estimands will coincide if $\beta^{AB}_{ij} = \beta^{PB}_{ij}$. This will occur when the treatment history $\tilde{Z}_{ij}$ does not affect either potential outcome, $Y_{ij}^{(Z=1)}$ or $Y_{ij}^{(Z=0)}$. When this is not the case, the added-benefit and policy-benefit estimands will usually differ.
We provide a number of examples in the supplementary material of scenarios where these estimands either coincide or are different.

Independence estimators
In this section, we describe a set of independence estimators that can be used in re-randomisation trials for the estimands described in Table 1. Independence estimators use a working independence correlation structure (i.e. they are based on a working assumption that episodes from the same patient are uncorrelated). We focus on a difference in means for a continuous outcome; however, the approaches in this section can be extended to estimate different types of treatment effects for different outcomes. They can be used in conjunction with robust standard errors which allow for clustering. 30 We do not explicitly define estimators for the episode-specific estimands, though these can be easily adapted from the estimators listed.
Table 1. Summary of estimands.

Estimand | Handling of treatment history | Weighting
Per-episode added-benefit | Provides the additional effect of being assigned the intervention in the current episode, over and above the benefit of being assigned the intervention in previous episodes | Provides an average effect across episodes
Per-episode policy-benefit | Provides the effect of a treatment policy where patients are assigned intervention versus control for all episodes | Provides an average effect across episodes
Per-patient added-benefit | Provides the additional effect of being assigned the intervention in the current episode, over and above the benefit of being assigned the intervention in previous episodes | Provides an average effect across patients
Per-patient policy-benefit | Provides the effect of a treatment policy where patients are assigned intervention versus control for all episodes | Provides an average effect across patients
Episode-specific added-benefit | Provides the additional effect of being assigned the intervention in the current episode, over and above the benefit of being assigned the intervention in previous episodes | Provides an average effect at episode $j$, in patients with $M_i \geq j$
Episode-specific policy-benefit | Provides the effect of a treatment policy where patients are assigned intervention versus control for all episodes | Provides an average effect at episode $j$, in patients with $M_i \geq j$

Estimators are summarised in Table 2, along with example Stata code to implement these estimators. We provide some mathematical results evaluating the bias of the per-episode added-benefit and per-patient added-benefit estimators in the Supplemental Material and discuss when the policy-benefit estimators may be biased in the following sections.

Per-episode added-benefit estimator
The independence estimator for the per-episode added-benefit estimand is
$$\hat{\beta}^{AB}_E = \frac{\sum_{i}\sum_{j} Z_{ij} Y_{ij}}{\sum_{i}\sum_{j} Z_{ij}} - \frac{\sum_{i}\sum_{j} (1 - Z_{ij}) Y_{ij}}{\sum_{i}\sum_{j} (1 - Z_{ij})}.$$
That is, the per-episode added-benefit treatment effect is estimated as the difference in means between all intervention episodes versus all control episodes. This matches the analysis strategy that has typically been employed in previous re-randomisation trials, 1 implying that analyses of these trials have targeted a per-episode added-benefit estimand.
In the Appendix, we show that this estimator is unbiased under a wide variety of different data-generating mechanisms.
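A minimal sketch of this estimator, treating every episode as an independent observation, is given below. The outcome and allocation values are made up for illustration, and the function name `diff_in_means` is not taken from the original analysis code.

```python
def diff_in_means(y, z):
    """Difference in mean outcome between intervention (z == 1) and
    control (z == 0) episodes, ignoring which patient each came from."""
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Hypothetical episode-level data (one entry per enrolled episode).
y = [12, 10, 13, 9, 11, 14, 10]
z = [1, 0, 1, 0, 0, 1, 0]

estimate = diff_in_means(y, z)
print(estimate)
```

The point estimate is the same as the coefficient on treatment from a linear regression of outcome on allocation across all episodes; only the standard error depends on whether clustering by patient is allowed for.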

Per-patient added-benefit estimator
The per-patient estimator can be obtained by weighting each patient by the inverse of their number of episodes, that is,
$$\hat{\beta}^{AB}_P = \frac{\sum_{i}\sum_{j} W_i Z_{ij} Y_{ij}}{\sum_{i}\sum_{j} W_i Z_{ij}} - \frac{\sum_{i}\sum_{j} W_i (1 - Z_{ij}) Y_{ij}}{\sum_{i}\sum_{j} W_i (1 - Z_{ij})},$$
where $W_i = 1 / M_i$. We show in the Appendix that the per-patient added-benefit estimator is unbiased under all data generating mechanisms we considered, except when there is differential non-enrolment. Non-enrolment occurs when a patient does not enrol in the trial for a particular episode; for instance, if they enrolled in the trial for the first episode they experience, but did not re-enrol for their second episode. We note that non-enrolment is not a form of dropout; patients only enrol in the trial for a single episode at a time, and there is no expectation that they must re-enrol for all subsequent episodes. Differential non-enrolment occurs when different types of patients from the intervention and control groups re-enrol for their next episode. For example, in a setting where patients enrol for a maximum of two episodes, differential non-enrolment could occur if in the episode 1 intervention group, healthier patients were more likely to re-enrol in the trial for episode 2 than sicker patients, but in the episode 1 control group, sicker patients were more likely to re-enrol.

[Table 2: estimators for each estimand with example Stata code; the table body is not recoverable from the source text.]
Notes to Table 2: Policy-benefit estimators are based on a maximum of two episodes per patient; this is because different causal models would need to be specified for the setting of ≥3 episodes. Added-benefit estimators can be used for any number of maximum episodes. For Stata code (version 15), 'y' denotes patient outcome, 'z' denotes treatment allocation, 'id' is a unique identifier for a patient, 'm_i' denotes the number of episodes for which the patient is enrolled in the trial, 'z_prev' denotes the patient's treatment allocation in their previous episode (and is set to 0 if it is the patient's first episode), 'x_ep' is an indicator for episode 2, 'prop_1st_ep' and 'prop_2nd_ep' represent the proportion of episodes in the trial which are first and second episodes, respectively, and 'prop_has_1ep' and 'prop_has_2ep' denote the proportion of patients enrolled in the trial for one and two episodes, respectively. To run the code in Stata, 'prop_1st_ep', 'prop_2nd_ep', 'prop_has_1ep', and 'prop_has_2ep' must be saved as Stata local macros.
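The inverse-weighted difference in means for the per-patient added-benefit estimand can be sketched as below. This is an illustrative Python version rather than the Stata code referred to in Table 2; the data and the function name `weighted_diff_in_means` are hypothetical.

```python
from collections import Counter
from fractions import Fraction as F

def weighted_diff_in_means(y, z, patient_id):
    """Difference in weighted means between intervention and control
    episodes, with each episode weighted by W_i = 1 / M_i."""
    m = Counter(patient_id)                 # M_i for each patient
    w = [F(1, m[p]) for p in patient_id]    # one weight per episode

    def weighted_mean(arm):
        num = sum(wi * yi for wi, yi, zi in zip(w, y, z) if zi == arm)
        den = sum(wi for wi, zi in zip(w, z) if zi == arm)
        return num / den

    return weighted_mean(1) - weighted_mean(0)

# Hypothetical episode-level data; patient 3 contributes three episodes.
patient_id = [1, 2, 2, 3, 3, 3, 4]
y = [12, 10, 13, 9, 11, 14, 10]
z = [1, 0, 1, 0, 0, 1, 0]

estimate = weighted_diff_in_means(y, z, patient_id)
print(estimate)
```

When every patient contributes a single episode all weights equal one, and the estimator reduces to the unweighted difference in means.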

Per-episode policy-benefit estimator
To estimate the per-episode policy-benefit treatment effect, we must first estimate the β PB ij 's (as defined earlier) and then use these to calculate the overall treatment effect. We can do this by specifying a causal model for the effect of treatment history (Z ij ) on the potential outcomes. For example, in a trial where patients experience a maximum of two episodes, we might assume the following model where Z i,j−1 is the treatment allocation in the previous episode (and is set to 0 for j = 1), and X ep ij is an indicator for episode 2 (i.e. X ep ij = 1 for episode 2, and 0 otherwise). This model allows the effect of the intervention in episode 1 to carry forward into episode 2 (the term γ), and for the intervention to get more (or less) effective the second time, it is used (the term δ). After obtaining estimates for β, γ, and δ, we can use these to estimate the β PB ij 's; for example, following on from the definition of the per-episode policy-benefit estimand and model (2) above, theβ PB ij for all first episodes isβ, and for all second episodes isβ +γ +δ.
We can then use these estimates to obtain an overall estimate of $\beta^{PB}_E$ by averaging the $\hat\beta^{PB}_{ij}$'s across all episodes in the trial. Although the term $\beta_{ep}$ is not directly used to estimate the treatment effect, it is necessary to include $X^{ep}_{ij}$ in model (2), as estimates may be biased otherwise. This is because $X^{ep}_{ij}$ is associated with $Z_{i,j-1}$, and so may act as a confounder if omitted from the model, resulting in biased estimates of $\gamma$. For trials with $j > 2$, separate indicator variables for each episode should be included in the model. This estimator will be unbiased if (a) we correctly specify the causal model in (2); and (b) we are able to obtain unbiased estimates of the causal parameters (e.g. $\beta$, $\gamma$, and $\delta$ in model (2)). In some instances, this estimator will be unbiased even if we misspecify the causal model, for instance by including an additional unneeded parameter (e.g. including $Z_{i,j-1}$ in the model when potential outcomes in the current episode are not affected by treatment history). However, it will generally be biased if we omit a true causal parameter from the model (e.g. if we omit $Z_{i,j-1}$ from the model when potential outcomes are affected by treatment history).
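As a worked sketch of the averaging step for a maximum of two episodes, write $p_1$ and $p_2$ for the proportions of episodes in the trial that are first and second episodes. These symbols are our shorthand for 'prop_1st_ep' and 'prop_2nd_ep' and are not the authors' notation. Taking $\hat\beta^{PB}_{ij} = \hat\beta$ for first episodes and $\hat\beta + \hat\gamma + \hat\delta$ for second episodes, weighting every episode equally would give:

```latex
\begin{align*}
\hat\beta^{PB}_{E}
  &= p_1\,\hat\beta + p_2\,(\hat\beta + \hat\gamma + \hat\delta) \\
  &= \hat\beta + p_2\,(\hat\gamma + \hat\delta),
  \qquad \text{since } p_1 + p_2 = 1 .
\end{align*}
```

The carry-forward and interaction terms therefore only contribute in proportion to how many episodes are second episodes; if $p_2 = 0$ the estimate reduces to $\hat\beta$.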
We note that even if the causal model in (2) is correctly specified, we may still obtain biased estimates of the causal parameters; for example, under differential non-enrolment, the parameter estimate of $\gamma$ will be biased, and hence the overall estimator will also be biased. [15]
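The two steps of the per-episode policy-benefit estimator (fit the causal model, then average over episodes) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes model (2) is a linear model with main effects of 'z', 'x_ep', and 'z_prev' plus a 'z' by 'z_prev' interaction, it uses ordinary least squares solved with a small hand-written routine so that the example is self-contained, and the function names and noise-free toy data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_model2(rows):
    """OLS fit of the assumed model; rows are (y, z, z_prev, x_ep) per episode.

    Returns [alpha, beta, beta_ep, gamma, delta].
    """
    design = [[1.0, z, x_ep, z_prev, z * z_prev] for _, z, z_prev, x_ep in rows]
    ys = [y for y, *_ in rows]
    k = 5
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y

def per_episode_policy_benefit(rows):
    """Step 1: fit the model. Step 2: average the episode-level effects."""
    _, beta, _, gamma, delta = fit_model2(rows)
    prop_2nd_ep = sum(x_ep for *_, x_ep in rows) / len(rows)
    prop_1st_ep = 1.0 - prop_2nd_ep
    return prop_1st_ep * beta + prop_2nd_ep * (beta + gamma + delta)

# Hypothetical noise-free outcomes with beta = 2, gamma = 1, delta = -0.5
def outcome(z, z_prev, x_ep):
    return 1.0 + 2.0 * z + 0.5 * x_ep + 1.0 * z_prev - 0.5 * z * z_prev

# (z, z_prev, x_ep): six first episodes, four second episodes
cells = [(0, 0, 0)] * 3 + [(1, 0, 0)] * 3 + [(0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
toy_rows = [(outcome(z, zp, ep), z, zp, ep) for z, zp, ep in cells]
estimate = per_episode_policy_benefit(toy_rows)
```

Because the toy outcomes are generated without noise from the same model that is fitted, the fit should recover the generating parameters up to floating-point rounding. The sketch says nothing about bias under differential non-enrolment, which concerns how the data arise rather than how the model is fitted.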

Per-patient policy-benefit estimator
For the per-patient policy-benefit estimator, we need to obtain estimates from model (2) using weighted least-squares, where each patient is weighted by the inverse of their number of episodes, $W_i = 1/M_i$. After obtaining estimates and calculating the $\hat\beta^{PB}_{ij}$'s, the overall treatment effect is calculated by averaging the $\hat\beta^{PB}_{ij}$'s within each patient and then averaging these patient-level values across patients. Under analysis model (2), this reduces to a weighted combination of $\hat\beta$, $\hat\gamma$, and $\hat\delta$. This estimator will be biased in the same settings as the per-episode policy-benefit estimator [15] (incorrectly specified causal model, biased parameter estimates due to differential non-enrolment), and in the same settings as the per-patient added-benefit estimator (differential non-enrolment based on the outcome from the previous episode).
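As a worked sketch for a maximum of two episodes, write $q_1$ and $q_2$ for the proportions of patients enrolled for one and two episodes (our shorthand for 'prop_has_1ep' and 'prop_has_2ep'; not the authors' notation), and let $\hat\beta$, $\hat\gamma$, and $\hat\delta$ be the weighted least-squares estimates. A patient with one episode contributes $\hat\beta$, and a patient with two episodes contributes the average of $\hat\beta$ and $\hat\beta + \hat\gamma + \hat\delta$, so weighting every patient equally would give:

```latex
\begin{align*}
\hat\beta^{PB}_{\text{per-patient}}
  &= q_1\,\hat\beta
     + q_2 \cdot \tfrac{1}{2}\bigl[\hat\beta + (\hat\beta + \hat\gamma + \hat\delta)\bigr] \\
  &= \hat\beta + \tfrac{q_2}{2}\,(\hat\gamma + \hat\delta),
  \qquad \text{since } q_1 + q_2 = 1 .
\end{align*}
```

Compared with the per-episode version, the carry-forward and interaction terms are scaled by half the proportion of patients with two episodes rather than by the proportion of episodes that are second episodes.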

Example: Trial in acute sickle cell pain crises
An example of a setting where re-randomisation has been used is trials in acute pain crises in patients with sickle cell disease. [2] Pain crises can be recurring, and so some patients may experience more than one pain crisis in a given period of time. [5] Patients who experience sickle cell pain crises typically require hospitalisation and treatment to manage symptoms. Treatment generally consists of morphine, which can have unwanted side effects. Therefore, alternative treatment options would be useful; for example, one trial assessed the use of high-dose ibuprofen compared to a placebo to reduce the amount of morphine required to manage pain symptoms. [31] We note that the interventions under study in this example (ibuprofen, placebo) will not affect the occurrence of future episodes (e.g. ibuprofen would not prevent future pain crises from occurring).
There are two important considerations when choosing the most appropriate estimand. First, the distribution of pain crises across patients is often skewed, where most patients may experience only one or two episodes during the course of the trial, but a small number of patients will experience a much larger number of episodes. [5] Therefore, differences between the per-episode and per-patient estimands may be large if the treatment effect differs in patients predisposed to experience a large number of pain crises. Both estimands address clinically important questions. The per-episode estimand has the interpretation 'the average effect across each time the intervention is used'. It gives more weight to patients who experience frequent pain crises, and it could be argued that it is more important that the intervention work well in these patients compared to those who experience infrequent crises (as they will undergo the intervention more frequently). However, it may also be useful to know how well the intervention works for the average patient; for instance, if the intervention works well for the 80% of patients in the trial who experience only one or two pain crises, this would still be clinically important to know, even if it worked less well in the other 20% of patients who experienced more frequent pain crises. Therefore, both estimands could be used, one as the primary and the other as a secondary estimand. Given that per-episode effects can be estimated under less restrictive assumptions (i.e. the per-episode estimator does not require the assumption of no differential non-enrolment, unlike the per-patient estimator), we suggest this could be considered for the primary estimand, though this choice largely depends on the overall goals of the trial.
Second, the interventions used to treat symptoms from acute sickle cell pain crises (e.g. ibuprofen [5]) are unlikely to influence either the outcome or treatment effect in subsequent episodes. Therefore, the added-benefit and policy-benefit estimands should be similar. Given the additional complexity of estimating policy-benefit effects, we suggest an added-benefit approach be used here. Coupled with a per-episode approach, this estimand provides an interpretation of 'the additional benefit of the intervention averaged across all pain crises for which it would be used', and can be easily estimated using the difference between all intervention episodes versus all control episodes.
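To illustrate the first consideration, the short Python sketch below shows how far the per-episode and per-patient averages can diverge when episode counts are skewed. All numbers are hypothetical and chosen only for illustration (they are not trial results), the function name is ours, and the sketch assumes each patient's treatment effect is the same in every episode.

```python
def average_effects(groups):
    """groups: list of (share_of_patients, episodes_per_patient, treatment_effect).

    Shares are assumed to sum to 1. Returns (per_episode, per_patient) averages.
    """
    # Per-patient: every patient counts once, regardless of episode count
    per_patient = sum(share * effect for share, _, effect in groups)
    # Per-episode: every episode counts once, so frequent attenders weigh more
    total_episodes = sum(share * m for share, m, _ in groups)
    per_episode = sum(share * m * effect for share, m, effect in groups) / total_episodes
    return per_episode, per_patient

# Hypothetical: 80% of patients have 1 crisis (effect 2.0),
# 20% of patients have 6 crises (effect 0.5)
per_episode, per_patient = average_effects([(0.8, 1, 2.0), (0.2, 6, 0.5)])
```

Here the 20% of patients with frequent crises account for more than half of all episodes, so the per-episode average sits much closer to their smaller effect than the per-patient average does.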

Discussion
We have proposed a set of estimands that can be used in multi-episode settings, as well as a set of independence estimators that can be used in re-randomisation trials. Our main result is to show that the analytical approach most commonly used in re-randomisation trials (comparing all intervention vs. all control episodes directly) corresponds to a per-episode added-benefit estimand. This implies these trials have been estimating the average effect of the intervention across episodes, over and above any benefit conferred from the intervention in previous episodes. We have found the per-episode added-benefit estimator to be generally unbiased, which suggests results from trials using this approach are valid.
However, other estimands we have proposed here may also be useful in future trials. For instance, in trials for which some patients experience a large number of episodes, per-patient estimands may give a better idea of how well the intervention works for the average patient in such settings. Similarly, in trials for which the treatments are expected to influence outcomes or treatment effects in subsequent episodes, policy-benefit estimands may provide a better picture of the overall benefit to adopting such interventions into routine practice.
Our derivations show that while the per-episode added-benefit estimator is generally unbiased in re-randomisation trials, estimators for the other estimands (per-patient added-benefit and policy-benefit estimands) require untestable assumptions for unbiasedness. Sometimes we can determine that the assumptions are plausible based on subject matter knowledge (e.g. the assumptions underpinning the policy-benefit estimator in the sickle cell example given earlier), but in other settings, the required assumptions might be quite strong (e.g. it may be unlikely we are able to determine an appropriate causal model for the policy-benefit estimator under complex carryover mechanisms, or when carryover or treatment effects interact with disease progression). In these settings, it may be desirable to specify policy-benefit or per-patient as secondary estimands, with the per-episode added-benefit used as the primary (as it can be reliably estimated without strong assumptions). Alternatively, if policy-benefit is the main question of interest and it is anticipated that a plausible causal model cannot be identified, then it may be useful to consider alternative trial designs which can more easily estimate this type of estimand, such as a cluster design where participants are assigned to receive the same treatment for each episode; however, only limited research has been undertaken on this type of trial design to date, and further research is warranted before it is used regularly.
One drawback of the added-benefit estimand is that it averages over different (randomised) treatment histories, which will not reflect the treatment histories seen in clinical practice. If the treatment effect is not modified by treatment allocation in previous episodes (as would be expected in the sickle cell example given earlier), then the added-benefit estimand will generalise to clinical practice, as the treatment histories will not affect the value of the estimand. Otherwise, the added-benefit estimand will be less generalisable, though depending on the specific study objectives, it may still be useful (as it is the only way of addressing the question 'how much additional benefit is conferred by the intervention in this episode, over and above any previous episodes' which can be estimated without strong assumptions).
As with any treatment effect which is an average over different patients or data points, the estimands defined here may not be representative of the treatment effect for a particular episode. For instance, the treatment effect may decrease in later episodes compared to earlier ones (e.g. in the case of progressive disease, where patients' health status gradually worsens for each new episode they experience, and there is a treatment-by-disease status interaction, or if the treatment itself loses efficacy the more often it is given). It may be of interest to explore supplementary estimands here, particularly to help identify whether treatment should be given for each new episode, or whether there is a certain point at which it becomes no longer useful. This could be done using episode-specific estimands (i.e. the treatment effect the first time a treatment is given vs. the second time, etc.), though we note this may not be sufficient to answer the question on its own. For instance, consider the example of progressive disease given above; the episode-specific effects will show the treatment becoming less effective in later episodes, though it will not be clear whether this is due to the treatment itself becoming less effective, or because of an interaction with disease progression. Therefore, in this type of setting it may also be worth exploring subgroup analyses (e.g. the episode-specific effect in patients whose disease has progressed vs. those where it has not), which may provide a more complete picture of which patients or episodes should receive treatment.
The re-randomisation design is a new type of trial design, and as such, there has been little methodological research to date. Future extensions to the work in this article would be useful, for instance, to evaluate alternatives to independence estimators (such as mixed-effects models), or alternative designs (such as a cluster design where patients are allocated to receive the same treatment for all episodes). Sample size requirements will likely differ for each of the estimands described (e.g. sample size requirements for the per-patient estimands will likely be based on recruiting a specified number of patients, whereas the per-episode estimands will likely require a specified number of episodes, irrespective of patients [5]), and so further research in this area is warranted. Further work to implement new packages in routine statistical software (such as Stata or R) to automate estimation of the policy-benefit estimands (which can be challenging to implement manually, due to the need to specify a different causal model depending on the number of episodes, and then to average over different parameters in the causal model) would be useful. More generally, further work on the estimation of policy-benefit estimands would be useful, particularly in the setting where some patients experience a large number of episodes, where including an indicator variable for each episode may lead to computational issues. Finally, in this paper, we considered the setting where treatment allocation does not affect the occurrence of subsequent episodes. In some clinical settings, this may not be the case, and so it would be useful to extend the estimands here to the setting where treatment allocation may prevent subsequent episodes from occurring.