Optimal allocation to treatments in a sequential multiple assignment randomized trial

One of the main questions in the design of a trial is how many subjects should be assigned to each treatment condition. Previous research has shown that equal randomization is not necessarily the best choice. We study the optimal allocation for a novel trial design, the sequential multiple assignment randomized trial, where subjects receive a sequence of treatments across various stages. A subject's randomization probabilities to treatments in the next stage depend on whether he or she responded to treatment in the current stage. We consider a prototypical sequential multiple assignment randomized trial design with two stages. Within such a design, many pairwise comparisons of treatment sequences can be made, and a multiple-objective optimal design strategy is proposed to consider all such comparisons simultaneously. The optimal design is sought under either a fixed total sample size or a fixed budget. A Shiny App is made available to find the optimal allocations and to evaluate the efficiency of competing designs. As the optimal design depends on the response rates to first-stage treatments, maximin optimal design methodology is used to find robust optimal designs. The proposed methodology is illustrated using a sequential multiple assignment randomized trial example on weight loss management.


Introduction
In many randomized controlled trials, participants are equally allocated to intervention arms. Such a design is consistent with the view of clinical equipoise that must exist before the start of the trial. 1 However, it may be preferable to allocate more participants to one arm than to another, for instance, when variances and/or costs vary across the treatment arms, [1][2][3][4][5] or when outcomes are categorical rather than quantitative. [6][7][8][9][10] The derivation of the optimal allocation of units to treatment conditions has not only been done for individually randomized trials, but also for more complex trial designs such as cluster-randomized trials, [11][12][13][14][15][16] and trials with partially nested data. [17][18][19] From a statistical point of view, it is more efficient to assign more subjects to the condition with the lowest costs and highest variance. Other, more practical, reasons to use unequal allocation over equal allocation include resource constraints, administrative, political or ethical concerns or when the aim is to gain experience from an intervention and to study its feasibility. 5,20 The focus of these references is on trials where subjects are randomized to either one single treatment or a combination of treatments, but do not change their assigned treatments during the course of the trial. This is a drawback since in real research practice some subjects may benefit more from one treatment and others more from another. Adaptive treatment strategies (ATSs), which are also called dynamic treatment regimens or adaptive interventions, are more flexible in the sense that they allow changing treatments over time. [21][22][23][24] An ATS individualizes treatments to subjects via decision rules that adjust the type, intensity, dosage or delivery of a treatment and specify when, whether and how to proceed at certain critical clinical decisions. 
For instance, subjects for whom the assigned treatment turns out to be beneficial may continue the same treatment, while the others may be assigned to another treatment. The use of sequential treatments is often necessary because of (i) heterogeneous treatment outcomes across subjects, (ii) changes in treatment goals over time, (iii) the need to balance potential risks and benefits or (iv) the need to reduce costs when intensive treatment is not necessary. 25,26 Also, the use of sequential treatments implies that multiple clinical decisions have to be taken throughout the course of the study. These clinical decisions are formalized through ATSs.
Based on the number of treatments and treatment switches, various competing ATSs may be developed and they may be compared to one another in a so-called sequential multiple assignment randomized trial (SMART). 25,27 SMARTs are multi-stage randomized trial designs that are used to inform the development of the multiple ATSs embedded in them. The use of SMART designs allows researchers to evaluate the timing, sequencing and adaptive selection of treatments by using randomization, and to develop the sequence(s) of treatments that lead to the best outcomes in the long term. In SMARTs, participants move through multiple stages, where each stage corresponds to a clinical decision, and subjects may be randomized at each stage. Sequenced randomization ensures that at each decision point the groups of participants assigned to the intervention options are balanced in terms of patient characteristics. This adds flexibility: participants can remain on treatments that are having an effect, while patients on less effective options get the possibility to switch to another treatment. This has made SMART designs appealing in a broad variety of health care, behavioural and psychological settings.
Multiple ATSs are embedded in a SMART and the main question in the design phase of a SMART is how many subjects should be assigned to each ATS, and whether an unequal allocation is better than an equal allocation. Some recent papers studied the relation between sample size and power for SMART designs, 25,28-34 but did not study the optimal allocation of units to treatment sequences and the loss of efficiency of using equal rather than unequal allocation.
The aim of this paper is to derive optimal allocations of units for a prototypical SMART design. This is a two-stage design where all units are randomized to two treatment conditions in the first stage. Those who respond to their assigned treatment are not re-randomized in the second stage, while those who do not respond are re-randomized to two second-stage treatments. This design was considered earlier by NeCamp et al. 32 in the setting of a cluster-randomized trial. In our contribution, we focus on individual randomization. We focus on sample sizes to be used when comparing two ATSs that start with different first-stage treatments. Four such pairwise comparisons can be made in this prototypical SMART design, and one comparison may be of more importance than another. We therefore use multiple-objective optimal design methodology to consider all comparisons simultaneously, while taking into account their relative importance. 35 Multiple-objective optimal designs are useful when the study has multiple and conflicting objectives, such as multiple pairwise comparisons of marginal means of ATSs in a SMART. The methodology combines these objectives in one optimality criterion and seeks a design that is highly efficient for each of the objectives. We provide a Shiny App to calculate the optimal allocation of units and to evaluate the efficiency of the design with equal allocation. We demonstrate our optimal design methodology on the basis of a SMART example that compares two different treatments, nutrition (NUT) and physical activity (PHY), for weight loss management. Our focus is on SMARTs with a quantitative outcome and individual randomization. In other words, we do not focus on cluster-randomized SMARTs or other complex SMART designs with clustered data.
The remainder of our contribution is organized as follows. Section 'Prototypical SMART design' further discusses the prototypical SMART design and its embedded ATSs. Furthermore, this section introduces the example of weight loss management. Section 'Derivation of the optimal design' derives the optimal allocation of units for studies in which either the total sample size or the budget is fixed. In the latter case, we consider the realistic situation where costs may vary across treatment conditions. The optimal allocation turns out to depend on the subjects' probabilities to respond to their first-stage treatment. We therefore also focus on maximin optimal designs that are robust to incorrect prior estimates of these probabilities. Furthermore, Section 'Derivation of the optimal design' introduces the Shiny App that we developed for finding the optimal design. Section 'A SMART example' demonstrates our optimal design methodology on the basis of the weight loss example. It shows how the optimal design is influenced by the costs per treatment, the proportion of responders to first-stage treatments and the relative importance of the four pairwise comparisons. Section 'Discussion' summarizes our findings, discusses limitations of this contribution and gives directions for future research.

Prototypical SMART design
Before we focus on the prototypical SMART, we rehearse some general ingredients of an arbitrary SMART (see for instance Ertefaie et al., 36 but using different notation). The observed covariates and treatment assignment at stage $k$ are denoted $O_k$ and $X_k$, respectively, and $\bar{O}_k$ and $\bar{X}_k$ denote the covariate and treatment histories up to and including stage $k$. Within a SMART multiple ATSs are embedded; these are denoted $d_i$, $i = 1, \ldots, I$. An ATS is basically a treatment trajectory and is denoted by a vector of counterfactual treatment assignments for a given individual $j$. If the SMART has two stages, then $d_i = (X_1, X_2^{R}, X_2^{NR})$, where $X_2^{R}$ is the treatment assignment in the second stage had the subject responded, and $X_2^{NR}$ is the treatment assignment in the second stage had he or she not responded. So, for a subject who responds, $X_2^{NR}$ is not observed, and for a subject who does not respond, $X_2^{R}$ is not observed; hence $d_i$ is called a vector of counterfactual treatments. The observed treatment history only includes the treatments a subject has actually been assigned to: $\bar{X}_2 = (X_1, X_2)$. At the end of each stage $k$, a tailoring variable is measured which determines whether a subject has responded to the treatment in that stage or not. In other words, this variable determines which treatment the subject is assigned to in the subsequent stage. At the end of the study (i.e. at the end of the final stage) the continuous outcome variable $Y_j$ is measured on each subject. These outcomes are then used to compare different ATSs to one another.
The prototypical SMART design is visualized in Figure 1. This design has been used in various research fields; published examples of its use in the treatment and long-term management of many chronic conditions include weight loss, 26,37,38 substance abuse, 39,40 cancer research, 41,42 adolescent depression, 43 adolescent conduct problems, 44 suicide, 45 and attention-deficit/hyperactivity disorder. 46 The prototypical SMART is a two-stage design with two first-stage treatments A and B; the proportions randomized to these treatments are denoted $p_1$ and $1 - p_1$, respectively. After some amount of time it is determined which subjects respond to their first-stage treatment, depending on some criterion such as a sufficient amount of weight loss or smoking cessation. The response rates to first-stage treatments A and B are equal to $\gamma_1$ and $\gamma_2$, respectively. Those subjects who respond to their first-stage treatment are not further randomized, but receive second-stage treatment C or F, depending on their first-stage treatment. This may be the same as the first-stage treatment, but may also be another treatment or discontinuation of treatment with or without further monitoring. Those subjects who do not respond to their first-stage treatment are further randomized.

Figure 1. A scheme of the prototypical sequential multiple assignment randomized trial (SMART) design from NeCamp et al. 32 Circled 'R' denotes randomization at each stage. $p_1$ and $(1 - p_1)$ are, respectively, the proportions of subjects receiving first-stage treatments A and B. $p_2$ and $(1 - p_2)$ are, respectively, the proportions of subjects receiving second-stage treatments D and E for non-responders starting with first-stage treatment A. $p_3$ and $(1 - p_3)$ are, respectively, the proportions of subjects receiving second-stage treatments G and H for non-responders starting with first-stage treatment B. $\gamma_1$ and $\gamma_2$ indicate, respectively, response rates for the first-stage treatments A and B.

Non-responders who received first-stage treatment A are randomized to second-stage treatments D and E, with proportions p 2 and 1 − p 2 , respectively. Such a second-stage treatment may be an intensified version of the first-stage treatment A, treatment A augmented with another treatment (which may be first-stage treatment B), first-stage treatment B, or an entirely different treatment. In the same manner, non-responders who received first-stage treatment B are randomized to two second-stage treatments G and H. This design includes eight different treatment conditions, where some of the second-stage treatments may be the same as the first-stage treatments or a combination of them.
Four ATSs are embedded in the prototypical SMART design, see Table 1. For instance, the first ATS, denoted d 1 , assigns all subjects to first-stage treatment A. Responders receive second-stage treatment C while non-responders receive second-stage treatment D.
The primary analysis goal of a SMART design is usually one of the following: (i) comparing first-stage intervention options; (ii) comparing second-stage intervention options; (iii) comparing two or more embedded ATSs in the study starting with the same first-stage intervention option or (iv) comparing two or more embedded ATSs in the study starting with different first-stage intervention options. 31 In the derivation of our optimal design, we focus on embedded ATSs that start with different first-stage treatments, which is a common primary aim in SMARTs. 32

Example: weight loss management
Bariatric surgery is an effective treatment for obese patients to lose weight. Given its costs, potentially harmful side effects and the risk of death, patients in the Netherlands are only considered eligible if they can demonstrate that they have previously attempted other means to lose weight. Two such means are an increase in physical activity (PHY) and a change in nutrition (NUT). Figure 2 visualises the example SMART design. All patients are first randomized to either PHY or NUT. Then, at the end of the first stage, subjects are categorized as responders or non-responders according to some predefined definition of response, for example, a threshold for weight loss after a given period of time. Non-responders are then re-randomized to second-stage treatments, regardless of their first-stage treatment. They either switch to the other treatment or continue with a combination of both treatments (NUT + PHY) in the second stage. Responders are not re-randomized and continue with their first-stage treatment. Four different ATSs are embedded within this prototypical SMART design: $d_1 = (\mathrm{PHY}, \mathrm{PHY}^{R}, \mathrm{NUT}^{NR})$, $d_2 = (\mathrm{PHY}, \mathrm{PHY}^{R}, (\mathrm{NUT} + \mathrm{PHY})^{NR})$, $d_3 = (\mathrm{NUT}, \mathrm{NUT}^{R}, \mathrm{PHY}^{NR})$ and $d_4 = (\mathrm{NUT}, \mathrm{NUT}^{R}, (\mathrm{NUT} + \mathrm{PHY})^{NR})$. The superscript R refers to the second-stage treatment assigned to responders, while the superscript NR denotes the second-stage treatment assigned to non-responders.
The SMART design of this example is a simplification of the prototypical SMART design in the sense that just two treatments are involved. Responders continue with their first-stage treatment, while non-responders are randomized to the other treatment or a combination of both treatments. This specific SMART design was previously used for, among others, the treatment of anxiety disorder, 25 obsessive-compulsive disorder 47 and chronic pain. 48

Table 1. The four ATSs embedded in the prototypical SMART design.

| ATS label | First-stage treatment | Status at the end of first-stage | Second-stage treatment |
|---|---|---|---|
| $d_1$ | A | Responder | C |
| $d_1$ | A | Non-responder | D |
| $d_2$ | A | Responder | C |
| $d_2$ | A | Non-responder | E |
| $d_3$ | B | Responder | F |
| $d_3$ | B | Non-responder | G |
| $d_4$ | B | Responder | F |
| $d_4$ | B | Non-responder | H |

ATS: adaptive treatment strategy; SMART: sequential multiple assignment randomized trial.

Derivation of the optimal design

Introduction
For a given ATS $d_i$, $i = 1, \ldots, 4$, let $Y_j$, $j = 1, \ldots, N_{d_i}$, be the continuous primary outcome of interest for the $j$th subject as measured at the end of stage 2, with $N_{d_i}$ denoting the number of subjects whose treatment trajectories are consistent with the ATS $d_i$. $Y_j$ is supposed to have $E(Y_j) = \mu_i$ and $Var(Y_j) = \sigma^2$, for all $j = 1, \ldots, N_{d_i}$. We assume a common variance $\sigma^2$ across all four ATSs. The target parameter $\mu_i$, the marginal mean outcome expected under ATS $d_i$, depends on the proportion of responders to first-stage treatment in ATS $d_i$ in the population. It is estimated by a weighted average of the observed outcomes of subjects whose treatment trajectories are consistent with $d_i$. 31 The weights follow from the fact that there is a structural imbalance between responders and non-responders: the non-responders are re-randomized but the responders are not. For instance, for ATS $d_1$, responders have a probability of $p_1$ of receiving the treatment sequence they actually received, and their subject-specific weights are $W_j = 1 / p_1$. For non-responders, this probability is $p_1 p_2$ and hence their weight is $W_j = 1 / (p_1 p_2)$. Here $p_1$ is the randomization probability to treatment A in the first stage and $p_2$ is the randomization probability to treatment D in the second stage. The weights are the inverse of the probabilities, hence the weighting is called inverse probability weighting. By using these weights, the relative contribution of the responders and non-responders in the calculation of the weighted mean outcome in ATS $d_1$ is the same as when this ATS had not been embedded in a SMART. In other words, since the ATS is embedded in a SMART, the non-responders have a higher weight than the responders to account for the fact that some of them are randomized to treatment E, rather than treatment D. This is a generalization of the work of Ghosh et al. 31 in the sense that we allow the proportions $p_1$ and $p_2$ to be unequal to 0.5.
For the other ATSs, subject-specific weights can be obtained in a similar way.
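The weighting for ATS d1 can be illustrated with a small sketch. The function names and the toy outcomes below are hypothetical and only serve the illustration; the weights themselves are the ones described above (1/p1 for responders, 1/(p1 p2) for non-responders).

```python
def ipw_weight(responder: bool, p1: float, p2: float) -> float:
    """Weight of a subject whose trajectory is consistent with ATS d1 = (A, C, D)."""
    # Responders reach their observed sequence with probability p1,
    # non-responders with probability p1 * p2.
    return 1.0 / p1 if responder else 1.0 / (p1 * p2)


def weighted_mean(outcomes, weights):
    """Inverse-probability-weighted mean of the observed outcomes."""
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Hypothetical toy data: two responders and two non-responders consistent with d1.
p1, p2 = 0.5, 0.5
responders = [True, True, False, False]
outcomes = [4.0, 6.0, 1.0, 3.0]

weights = [ipw_weight(r, p1, p2) for r in responders]
mean_d1 = weighted_mean(outcomes, weights)
print(weights, mean_d1)
```

With equal randomization at both stages, each non-responder counts twice as heavily as each responder, which compensates for the half of the non-responders who were sent to treatment E and are therefore not consistent with d1.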
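The weighted mean for the continuous primary outcome of interest for ATS $d_i$ is equal to

$$\bar{Y}_{d_i} = \frac{\sum_{j=1}^{N_{d_i}} W_j Y_j}{\sum_{j=1}^{N_{d_i}} W_j}. \tag{1}$$

The expected value of this weighted mean is given by

$$E(\bar{Y}_{d_i}) = \frac{\sum_{j=1}^{N_{d_i}} W_j E(Y_j)}{\sum_{j=1}^{N_{d_i}} W_j} = \mu_i. \tag{2}$$

Equation (2) shows that the weighted mean is an unbiased estimator of the marginal mean. The variance of the weighted mean is equal to

$$Var(\bar{Y}_{d_i}) = \frac{\sum_{j=1}^{N_{d_i}} W_j^2 \, Var(Y_j)}{\left(\sum_{j=1}^{N_{d_i}} W_j\right)^2} = \sigma^2 \frac{\sum_{j=1}^{N_{d_i}} W_j^2}{\left(\sum_{j=1}^{N_{d_i}} W_j\right)^2}. \tag{3}$$

For each ATS $d_i$, the variance of the weighted mean is computed using the subject-specific weights. First, the expected number of people in the trial whose treatment trajectories are consistent with $d_i$ is computed for each ATS. For $d_1$, this is equal to

$$N_{d_1} = N p_1 \gamma_1 + N p_1 p_2 (1 - \gamma_1), \tag{4}$$

with the first term on the right side representing the expected number of responders and the second being the expected number of non-responders. The proportions $p_1$ and $p_2$ are defined as above, while $N$ is the total sample size of the SMART and $\gamma_1$ is the response rate to first-stage treatment A. Following from (4), we obtain

$$\sum_{j=1}^{N_{d_1}} W_j = N p_1 \gamma_1 \frac{1}{p_1} + N p_1 p_2 (1 - \gamma_1) \frac{1}{p_1 p_2} = N \tag{5}$$

and

$$\sum_{j=1}^{N_{d_1}} W_j^2 = N p_1 \gamma_1 \frac{1}{p_1^2} + N p_1 p_2 (1 - \gamma_1) \frac{1}{p_1^2 p_2^2} = N \left[ \frac{\gamma_1}{p_1} + \frac{1 - \gamma_1}{p_1 p_2} \right]. \tag{6}$$

The variance for the weighted mean $\bar{Y}_{d_1}$, for ATS $d_1$, is obtained by plugging (5) and (6) into (3):

$$Var(\bar{Y}_{d_1}) = \frac{\sigma^2}{N} \left[ \frac{\gamma_1}{p_1} + \frac{1 - \gamma_1}{p_1 p_2} \right]. \tag{7}$$

The right side of (7) consists of two factors. The first is the common variance of a mean, while the second is used to account for the fact that subjects may be re-randomized. This second factor is a function of the response rate $\gamma_1$ to first-stage treatment A.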
Using their respective subject-specific weights, formulae for the variance of the weighted mean Y d i for the other ATSs are obtained in a similar way; these are shown in Table 2.
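As a sketch of how these variances can be evaluated, the function below writes out the four expressions under the assumption that the weights for d2, d3 and d4 follow the same pattern as for d1 (with 1 − p2, and with p3 and γ2 for sequences starting with treatment B). The function name and the numeric inputs are hypothetical.

```python
def ats_variances(p1, p2, p3, g1, g2, n, sigma2=1.0):
    """Variance of the weighted mean for each embedded ATS d1..d4.

    p1, p2, p3: randomization proportions; g1, g2: response rates to A and B;
    n: total sample size; sigma2: common outcome variance.
    """
    base = sigma2 / n
    return {
        "d1": base * (g1 / p1 + (1 - g1) / (p1 * p2)),
        "d2": base * (g1 / p1 + (1 - g1) / (p1 * (1 - p2))),
        "d3": base * (g2 / (1 - p1) + (1 - g2) / ((1 - p1) * p3)),
        "d4": base * (g2 / (1 - p1) + (1 - g2) / ((1 - p1) * (1 - p3))),
    }


# Hypothetical inputs: balanced design, response rates 0.25 and 0.40, N = 200.
variances = ats_variances(0.5, 0.5, 0.5, 0.25, 0.40, n=200)
print(variances)
```

Because non-responders carry the larger weight, a lower response rate inflates the variance: in this example d1 (response rate 0.25) has a larger variance than d3 (response rate 0.40).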
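We consider pairwise comparisons of ATSs that start with different first-stage treatments. The expected difference in weighted means of two such ATSs $d_i$ and $d_{i'}$ (with $i = 1$ or $2$ and $i' = 3$ or $4$) is $\mu_i - \mu_{i'}$, with the corresponding variance

$$Var(\bar{Y}_{d_i} - \bar{Y}_{d_{i'}}) = Var(\bar{Y}_{d_i}) + Var(\bar{Y}_{d_{i'}}). \tag{8}$$

Here we assume that weighted means of ATSs that start with different first-stage treatments are independent. This assumption holds as long as outcomes of subjects from ATSs that start with different first-stage treatments are independent.

Table 2. Variance for the weighted mean $\bar{Y}_{d_i}$ for the four adaptive treatment strategies (ATSs) embedded.

| ATS | Variance of the weighted mean |
|---|---|
| $d_1$ | $\frac{\sigma^2}{N} \left[ \frac{\gamma_1}{p_1} + \frac{1 - \gamma_1}{p_1 p_2} \right]$ |
| $d_2$ | $\frac{\sigma^2}{N} \left[ \frac{\gamma_1}{p_1} + \frac{1 - \gamma_1}{p_1 (1 - p_2)} \right]$ |
| $d_3$ | $\frac{\sigma^2}{N} \left[ \frac{\gamma_2}{1 - p_1} + \frac{1 - \gamma_2}{(1 - p_1) p_3} \right]$ |
| $d_4$ | $\frac{\sigma^2}{N} \left[ \frac{\gamma_2}{1 - p_1} + \frac{1 - \gamma_2}{(1 - p_1)(1 - p_3)} \right]$ |
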
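Considering the ATSs embedded in our example, four possible pairwise comparisons exist, with corresponding variances: $\Phi_{13} = Var(\bar{Y}_{d_1} - \bar{Y}_{d_3})$, $\Phi_{14} = Var(\bar{Y}_{d_1} - \bar{Y}_{d_4})$, $\Phi_{23} = Var(\bar{Y}_{d_2} - \bar{Y}_{d_3})$ and $\Phi_{24} = Var(\bar{Y}_{d_2} - \bar{Y}_{d_4})$, with $\bar{Y}_{d_i}$ being the weighted mean for the continuous primary outcome variable of interest for the ATS $d_i$, $i = 1, \ldots, 4$. Formulae for the variance of these comparisons can be derived by plugging in the variances of the single ATSs as reported in Table 2.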
For a single objective $\Phi_{ii'}$, the optimal design $\xi^* = (p_1^*, p_2^*, p_3^*)$ is the design for which $\Phi_{ii'}$ is minimized. Each objective has its own optimal design. For instance, the optimal design for $\Phi_{13}$ is $\xi^*_{13} = (0.5, 1, 1)$, which implies both first-stage treatments have randomization probability 0.5, all non-responders to first-stage treatment A receive second-stage treatment D, and all non-responders to first-stage treatment B receive second-stage treatment G. The optimal designs for the other objectives are $\xi^*_{14} = (0.5, 1, 0)$, $\xi^*_{23} = (0.5, 0, 1)$ and $\xi^*_{24} = (0.5, 0, 0)$. The optimal design for one objective is not necessarily optimal for the other single objectives; it may even perform poorly for them. 49 For that reason, a multiple-objective optimal design is used, so that all of the four pairwise comparisons are taken into account simultaneously. We do so by using a weighted sum of the four objectives, where the weights are to be chosen by the user. The use of weights allows placing more emphasis on one objective than on another, subject to the researcher's interests and the goals of the study. A constraint is put on the weights such that their sum is equal to 1. The optimal design problem becomes a multiple-objective optimal design problem. The aim is to minimize the optimality criterion

$$\Phi(\xi) = \lambda_{13} \Phi_{13} + \lambda_{14} \Phi_{14} + \lambda_{23} \Phi_{23} + \lambda_{24} \Phi_{24}, \tag{9}$$

with $\lambda_{ii'}$ being the weight assigned to the respective objective $\Phi_{ii'}$. The corresponding optimal design is a so-called compound-optimal design.
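The weighted-sum criterion can be sketched in a few lines. The sketch assumes that the variance of each weighted mean has the inverse-probability-weighted form derived for d1, with analogous expressions for the other ATSs; function names and the numeric inputs are hypothetical. It also shows why a single-objective design can be a poor choice for the other comparisons: a design that assigns nobody to a second-stage treatment leaves the ATSs containing that treatment with an infinite variance.

```python
import math


def _term(num, den):
    """num / den, with an infinite variance when nobody is allocated (den == 0)."""
    if den == 0:
        return math.inf if num > 0 else 0.0
    return num / den


def objectives(design, g1, g2, n=1.0, sigma2=1.0):
    """Variances Phi_13, Phi_14, Phi_23, Phi_24 of the four pairwise comparisons."""
    p1, p2, p3 = design
    s = sigma2 / n
    v = {
        1: s * (_term(g1, p1) + _term(1 - g1, p1 * p2)),
        2: s * (_term(g1, p1) + _term(1 - g1, p1 * (1 - p2))),
        3: s * (_term(g2, 1 - p1) + _term(1 - g2, (1 - p1) * p3)),
        4: s * (_term(g2, 1 - p1) + _term(1 - g2, (1 - p1) * (1 - p3))),
    }
    return {(i, k): v[i] + v[k] for i in (1, 2) for k in (3, 4)}


def compound_criterion(design, g1, g2, lambdas, n=1.0, sigma2=1.0):
    """Weighted sum of the four objectives; lambdas maps (i, k) to a weight."""
    phi = objectives(design, g1, g2, n, sigma2)
    return sum(lambdas[key] * phi[key] for key in phi)


# Hypothetical response rates 0.25 and 0.40, per unit of sigma^2 / N.
single_13 = objectives((0.5, 1.0, 1.0), 0.25, 0.40)   # design optimal for Phi_13
equal_weights = {(1, 3): 0.25, (1, 4): 0.25, (2, 3): 0.25, (2, 4): 0.25}
balanced_value = compound_criterion((0.5, 0.5, 0.5), 0.25, 0.40, equal_weights)
print(single_13, balanced_value)
```

Designs on the boundary of the design space only make sense when the objectives that become infinite carry zero weight; the sketch does not handle that case (zero times infinity), so weights are assumed strictly positive here.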

Optimal design under a fixed total sample size
In this scenario, the optimal design is sought under a fixed total sample size $N$. This is a realistic scenario when studying treatments for a rare disease or condition, but it can also be used when resource constraints allow recruiting a fixed number of subjects. It is assumed that a priori estimates of the response rates $\gamma_1$ and $\gamma_2$ are available. The optimal design minimizes the objective in (9); it is found by setting the gradient of (9) with respect to $p_1$, $p_2$ and $p_3$ equal to zero. The optimal proportions for the second-stage treatments are given by

$$p_2^* = \frac{\sqrt{\lambda_{13} + \lambda_{14}}}{\sqrt{\lambda_{13} + \lambda_{14}} + \sqrt{\lambda_{23} + \lambda_{24}}}$$

and

$$p_3^* = \frac{\sqrt{\lambda_{13} + \lambda_{23}}}{\sqrt{\lambda_{13} + \lambda_{23}} + \sqrt{\lambda_{14} + \lambda_{24}}}.$$

It is worth noting that the optimal second-stage proportions $p_2^*$ and $p_3^*$ do not depend on the response rates $\gamma_1$ and $\gamma_2$, or on the total sample size $N$, but only on the choice of the weights. In particular, $p_2^*$ increases as $\lambda_{13}$ and/or $\lambda_{14}$ increase. This is obvious since objectives $\Phi_{13}$ and $\Phi_{14}$ are comparisons that include treatment D, and more efficient comparisons can be made if more subjects are assigned to this treatment. Similarly, $p_3^*$ increases when $\lambda_{13}$ and/or $\lambda_{23}$ increase. This is also obvious since objectives $\Phi_{13}$ and $\Phi_{23}$ are comparisons that include treatment G, and more efficient comparisons can be made if more subjects are assigned to this treatment.
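The optimal randomization probability for the first-stage treatment A takes on a more complicated form:

$$p_1^* = \frac{\sqrt{\kappa_1}}{\sqrt{\kappa_1} + \sqrt{\kappa_2}}, \quad \kappa_1 = \gamma_1 + (1 - \gamma_1) \left[ \frac{\lambda_{13} + \lambda_{14}}{p_2^*} + \frac{\lambda_{23} + \lambda_{24}}{1 - p_2^*} \right], \quad \kappa_2 = \gamma_2 + (1 - \gamma_2) \left[ \frac{\lambda_{13} + \lambda_{23}}{p_3^*} + \frac{\lambda_{14} + \lambda_{24}}{1 - p_3^*} \right],$$

where $p_1^*$ depends on both $\gamma_1$ and $\gamma_2$, and on the optimal proportions $p_2^*$ and $p_3^*$, while it does not depend on $N$. A detailed derivation of the optimal design is given in the online supplement.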
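A minimal sketch of the fixed-sample-size solution follows. The closed form used here is obtained by minimizing the weighted-sum criterion under the assumption that the variance of each weighted mean has the inverse-probability-weighted form described for d1; it is a re-derivation for illustration, not code taken from the online supplement or the Shiny app, and the function name is hypothetical.

```python
from math import sqrt


def optimal_design_fixed_n(g1, g2, lam13, lam14, lam23, lam24):
    """Locally optimal (p1, p2, p3) for fixed N; the weights must sum to 1."""
    # Total weight of the comparisons that involve treatment D, E, G and H.
    w_d, w_e = lam13 + lam14, lam23 + lam24
    w_g, w_h = lam13 + lam23, lam14 + lam24

    # Minimizing a/p + b/(1 - p) over p gives p = sqrt(a) / (sqrt(a) + sqrt(b)).
    p2 = sqrt(w_d) / (sqrt(w_d) + sqrt(w_e))
    p3 = sqrt(w_g) / (sqrt(w_g) + sqrt(w_h))

    # Per-arm variance factors evaluated at the optimal second-stage proportions.
    k1 = g1 + (1 - g1) * (sqrt(w_d) + sqrt(w_e)) ** 2
    k2 = g2 + (1 - g2) * (sqrt(w_g) + sqrt(w_h)) ** 2
    p1 = sqrt(k1) / (sqrt(k1) + sqrt(k2))
    return p1, p2, p3


# Illustration with response rates (0.25, 0.40) and weights (0.70, 0.10, 0.10, 0.10).
p1_opt, p2_opt, p3_opt = optimal_design_fixed_n(0.25, 0.40, 0.70, 0.10, 0.10, 0.10)
print(round(p1_opt, 3), round(p2_opt, 3), round(p3_opt, 3))
```

With these weights the second-stage proportions come out at two-thirds, and the first-stage proportion stays close to one half, slightly favouring the arm with the lower response rate because that arm has more non-responders to re-randomize.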

Optimal design under a fixed budget
In this scenario, we consider a budgetary constraint: the total costs $C$ for treating subjects should not exceed the budget $B$. The costs are calculated as

$$C = c_A N_A + c_B N_B + c_C N_C + c_D N_D + c_E N_E + c_F N_F + c_G N_G + c_H N_H \le B, \tag{15}$$

where $c_A$ is the cost per subject in treatment A and $N_A$ is the number of subjects who receive treatment A, and similarly for the other treatments B to H. The costs may vary across treatments and are assumed to be known beforehand. The sample sizes are stochastic since they depend on the proportions $p_1$, $p_2$ and $p_3$ and the response rates $\gamma_1$ and $\gamma_2$. In the derivation of the optimal design, we use their expected values. For the first-stage treatments, we have

$$E(N_A) = N p_1, \quad E(N_B) = N (1 - p_1), \tag{16}$$

for the second-stage treatments C, D and E, we have

$$E(N_C) = N p_1 \gamma_1, \quad E(N_D) = N p_1 (1 - \gamma_1) p_2, \quad E(N_E) = N p_1 (1 - \gamma_1)(1 - p_2), \tag{17}$$

and for the second-stage treatments F, G and H, we have

$$E(N_F) = N (1 - p_1) \gamma_2, \quad E(N_G) = N (1 - p_1)(1 - \gamma_2) p_3, \quad E(N_H) = N (1 - p_1)(1 - \gamma_2)(1 - p_3). \tag{18}$$

For a given budget, the total sample size $N$ that can be used decreases when the costs increase. This implies that a design is not only determined by the proportions but also by the total sample size: $\xi = (p_1, p_2, p_3, N)$. The optimal design is found in a numerical manner through a domain search algorithm; see the online supplement for more details.
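The budget-constrained search can be sketched as follows. This is a coarse grid search standing in for the domain search algorithm of the online supplement, whose details are not given here; the function names, the grid resolution and the cost figures are all hypothetical. The sketch uses expected cell sizes to turn a budget into an affordable total sample size and then minimizes the weighted-sum criterion, assuming the inverse-probability-weighted variance form for each ATS.

```python
from itertools import product


def cost_per_subject(design, g1, g2, cost):
    """Expected cost of one randomized subject, summed over both stages."""
    p1, p2, p3 = design
    return (p1 * cost["A"] + (1 - p1) * cost["B"]
            + p1 * g1 * cost["C"]
            + p1 * (1 - g1) * (p2 * cost["D"] + (1 - p2) * cost["E"])
            + (1 - p1) * g2 * cost["F"]
            + (1 - p1) * (1 - g2) * (p3 * cost["G"] + (1 - p3) * cost["H"]))


def unit_criterion(design, g1, g2, lam):
    """Weighted-sum criterion for N = 1 and sigma^2 = 1."""
    p1, p2, p3 = design
    v1 = g1 / p1 + (1 - g1) / (p1 * p2)
    v2 = g1 / p1 + (1 - g1) / (p1 * (1 - p2))
    v3 = g2 / (1 - p1) + (1 - g2) / ((1 - p1) * p3)
    v4 = g2 / (1 - p1) + (1 - g2) / ((1 - p1) * (1 - p3))
    return (lam["13"] * (v1 + v3) + lam["14"] * (v1 + v4)
            + lam["23"] * (v2 + v3) + lam["24"] * (v2 + v4))


def budget_grid_search(g1, g2, lam, cost, budget, steps=20):
    """Return (design, affordable N) minimizing the criterion under the budget."""
    grid = [i / steps for i in range(1, steps)]  # boundaries 0 and 1 excluded
    best = None
    for design in product(grid, repeat=3):
        n = budget / cost_per_subject(design, g1, g2, cost)
        value = unit_criterion(design, g1, g2, lam) / n
        if best is None or value < best[0]:
            best = (value, design, n)
    return best[1], best[2]


lam = {"13": 0.25, "14": 0.25, "23": 0.25, "24": 0.25}
# Hypothetical costs: second-stage treatments E and H are three times as expensive.
costs = {"A": 100, "B": 100, "C": 100, "D": 100, "E": 300, "F": 100, "G": 100, "H": 300}
design, n_total = budget_grid_search(0.25, 0.40, lam, costs, budget=50_000)
print(design, round(n_total, 1))
```

In this hypothetical setting the search shifts non-responders towards the cheaper second-stage treatments D and G, since every subject moved away from E or H frees budget for additional subjects.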

Robust optimal design
The optimal design depends on the response rates $\gamma_1$ and $\gamma_2$, hence the optimal design is locally optimal. These parameters are often unknown in the design stage of a SMART and an educated a priori guess based on expert opinions or findings in the literature should be used. There is, however, no guarantee that such a guess is correct, and robust optimal design methodology may be used to protect against a loss of efficiency due to a misspecification of the response rates. We use maximin optimal design methodology 50 to allow specification of intervals, rather than point estimates, of the two response rates. The maximin optimal design $\xi_{MMD}$ maximizes the minimal relative efficiency (RE) among all designs in the design space $\Omega$. In other words, it selects the best of the worst-case scenarios. The maximin optimal design can be found using the following three steps:
1. Define the parameter space for the response rates and the design space $\Omega$ for the proportions. For instance, the first response rate $\gamma_1$ may be between 0.2 and 0.3 and the second response rate $\gamma_2$ may be between 0.35 and 0.45. The design space is $\Omega = \{(p_1, p_2, p_3): 0 \le p_1 \le 1, 0 \le p_2 \le 1, 0 \le p_3 \le 1\}$.
2. For each possible combination of the two response rates in the parameter space, compute the locally optimal design $\xi_{LOD}$. Then compute the RE of each design $\xi$ in $\Omega$ compared with the locally optimal design: $RE = \Phi(\xi_{LOD}) / \Phi(\xi)$.
3. For each design in $\Omega$, find its smallest RE value within the parameter space. Then, select the design that has the highest minimum RE across all designs in the design space. This is the maximin optimal design $\xi_{MMD}$ and its minimum RE is called the maximin value.
This procedure yields the design which is most robust to a misspecification of the response rates and it can be used when working under a fixed budget or under a fixed total sample size.
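The three steps can be sketched for the fixed-sample-size case. The sketch assumes the inverse-probability-weighted variance form for each ATS and uses the closed-form locally optimal design that follows from it; the grid resolution, the function names and the example intervals are illustrative choices and not the settings of the Shiny app. Weights are assumed strictly positive so that the locally optimal design lies in the interior of the design space.

```python
from itertools import product
from math import sqrt


def unit_criterion(design, g1, g2, lam):
    """Weighted-sum criterion for N = 1, sigma^2 = 1; lam = (l13, l14, l23, l24)."""
    p1, p2, p3 = design
    l13, l14, l23, l24 = lam
    v1 = g1 / p1 + (1 - g1) / (p1 * p2)
    v2 = g1 / p1 + (1 - g1) / (p1 * (1 - p2))
    v3 = g2 / (1 - p1) + (1 - g2) / ((1 - p1) * p3)
    v4 = g2 / (1 - p1) + (1 - g2) / ((1 - p1) * (1 - p3))
    return l13 * (v1 + v3) + l14 * (v1 + v4) + l23 * (v2 + v3) + l24 * (v2 + v4)


def locally_optimal(g1, g2, lam):
    """Closed-form minimizer of unit_criterion for given response rates."""
    l13, l14, l23, l24 = lam
    sa, sb = sqrt(l13 + l14), sqrt(l23 + l24)
    sc, sd = sqrt(l13 + l23), sqrt(l14 + l24)
    k1 = g1 + (1 - g1) * (sa + sb) ** 2
    k2 = g2 + (1 - g2) * (sc + sd) ** 2
    return sqrt(k1) / (sqrt(k1) + sqrt(k2)), sa / (sa + sb), sc / (sc + sd)


def maximin_design(g1_values, g2_values, lam, steps=20):
    # Step 1: discretized parameter space and design space (boundaries excluded).
    params = list(product(g1_values, g2_values))
    grid = [i / steps for i in range(1, steps)]
    # Step 2: criterion value of the locally optimal design per parameter pair.
    phi_lod = {g: unit_criterion(locally_optimal(*g, lam), *g, lam) for g in params}
    # Step 3: keep the design whose smallest RE over the parameter space is largest.
    best_design, best_value = None, -1.0
    for design in product(grid, repeat=3):
        min_re = min(phi_lod[g] / unit_criterion(design, *g, lam) for g in params)
        if min_re > best_value:
            best_design, best_value = design, min_re
    return best_design, best_value


mmd, maximin_value = maximin_design((0.20, 0.25, 0.30), (0.35, 0.40, 0.45),
                                    (0.25, 0.25, 0.25, 0.25))
print(mmd, round(maximin_value, 4))
```

For equal weights and these intervals the balanced design is selected on this grid, with a maximin value very close to 1, so little efficiency is lost by not knowing the response rates exactly.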

Statistical power for the optimal design
Once the optimal allocation to treatments has been derived, it makes sense to determine how much power the study has for each of the four pairwise comparisons of ATSs. 51 The following steps should be taken in such a power analysis:
1. Calculate the variance $Var(\bar{Y}_{d_i})$ for each of the four ATSs in the SMART. For the case of a fixed total sample size this can be done easily by plugging the optimal proportions $p_1^*$, $p_2^*$ and $p_3^*$ and the total sample size $N$ into the equations of Table 2. For the case of a fixed budget, the total sample size $N$ first has to be calculated from the budget, costs and optimal proportions. This can be done on the basis of equations (15) to (18), as is further explained in the online supplement.
2. For each of the four pairwise comparisons of ATSs, calculate $Var(\bar{Y}_{d_i} - \bar{Y}_{d_{i'}}) = Var(\bar{Y}_{d_i}) + Var(\bar{Y}_{d_{i'}})$.
3. For each of the four pairwise comparisons of ATSs, get a prior estimate of the expected difference in marginal means $\mu_i - \mu_{i'}$. A prior estimate may be obtained from the literature or an expert's expectations. As an alternative, one may use the minimal relevant effect size, that is, the smallest effect size that is considered to be relevant.
4. For each of the four pairwise comparisons of ATSs, select the type I error rate $\alpha$ and decide whether a one-sided or two-sided test has to be performed.
5. For each of the four pairwise comparisons of ATSs, calculate the power. For a one-sided alternative use the following equation:

$$\text{power} = z^{-1}\left( \frac{\mu_i - \mu_{i'}}{\sqrt{Var(\bar{Y}_{d_i}) + Var(\bar{Y}_{d_{i'}})}} - z_{1-\alpha} \right),$$

where $Var(\bar{Y}_{d_i})$ and $Var(\bar{Y}_{d_{i'}})$ are the variances of the two ATSs to be compared, $z_{1-\alpha}$ is the $(1 - \alpha)$th quantile of the standard normal distribution and $z^{-1}$ is the inverse of the standard normal quantile function, that is, the standard normal cumulative distribution function. For a two-sided alternative, $\alpha$ has to be replaced by $\alpha / 2$.
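The power calculation in the final step can be sketched with Python's standard library. The function name and the numeric inputs are hypothetical; the effect size is expressed in the same units as the square root of the variances, and the normal approximation is the one described in the steps above.

```python
from math import sqrt
from statistics import NormalDist


def power_pairwise(delta, var_i, var_k, alpha=0.05, two_sided=False):
    """Approximate power for comparing two ATSs with different first-stage treatments.

    delta: expected difference in marginal means; var_i, var_k: variances of the
    two weighted means. For a two-sided test alpha is replaced by alpha / 2.
    """
    std_normal = NormalDist()
    level = alpha / 2 if two_sided else alpha
    z_crit = std_normal.inv_cdf(1 - level)
    # abs() treats delta as an effect in the direction of the alternative.
    return std_normal.cdf(abs(delta) / sqrt(var_i + var_k) - z_crit)


# Hypothetical inputs: difference of 0.5 standard deviations (sigma = 1),
# variances 0.0175 and 0.016 for the two weighted means.
power_one_sided = power_pairwise(0.5, 0.0175, 0.016)
power_two_sided = power_pairwise(0.5, 0.0175, 0.016, two_sided=True)
print(round(power_one_sided, 3), round(power_two_sided, 3))
```

The two-sided version ignores the negligible probability of rejecting in the wrong direction, which is the usual simplification in this kind of approximation.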

Shiny app
We developed a Shiny app 52 to facilitate finding the optimal design; it is available from https://andreamorciano.shinyapps.io/OptimalSMART/. It calculates locally optimal designs for a fixed total sample size as well as a fixed budget. In the first case, the user should specify the total sample size; in the latter case, he or she should specify the costs per treatment along with the budget. Furthermore, an a priori estimate of the two response rates should be specified to find the locally optimal design. The numerical algorithm that finds the optimal design for the budgetary constraint has a precision of 0.00002 for the optimal proportions. The Shiny app can also be used to find the maximin optimal design. In that case intervals [γ 1 − 0.05, γ 1 + 0.05] and [γ 2 − 0.05, γ 2 + 0.05] are considered around the user-specified values γ 1 and γ 2 . These intervals are continuous; in our algorithm, we use a step size of 0.01 to discretize these intervals, while a step size of 0.05 is used for the response rates. In case the reader is interested in using a different step size, he or she can contact the first author.

A SMART example

Introduction
We apply the optimal design methodology to the example of the weight loss management study of Figure 2. Participants are randomized to two first-stage treatments: PHY and NUT. A response is defined as a (absolute or relative) loss in body weight that exceeds a user-selected threshold value. We use three sets of a priori guesses for the two response rates of the two first-stage treatments: (γ 1 , γ 2 ) = (0.15, 0.25), (γ 1 , γ 2 ) = (0.25, 0.40) and (γ 1 , γ 2 ) = (0.40, 0.55). In each case, we choose a larger value for NUT than for PHY, as previous research has demonstrated that PHY produces smaller bodyweight loss than diet (NUT). 53 For the first set of response rates, the definition of a response is most stringent, resulting in the smallest response rates, and for the third it is most lenient, resulting in the highest response rates.
We consider three sets of weights for the multiple-objective optimal design (9). The first considers each comparison to be of equal importance, which implies that equal weights are used: $(\lambda_{13}, \lambda_{14}, \lambda_{23}, \lambda_{24}) = (0.25, 0.25, 0.25, 0.25)$. The second puts more emphasis on those comparisons where second-stage treatments are either PHY or NUT, but not a combination of the two. In this case, researchers are mainly interested in the comparison between $d_1 = (\mathrm{PHY}, \mathrm{PHY}^{R}, \mathrm{NUT}^{NR})$ and $d_3 = (\mathrm{NUT}, \mathrm{NUT}^{R}, \mathrm{PHY}^{NR})$ rather than the other ones. Designs with a single second-stage treatment are less expensive, and they may be easier to implement by the researchers and easier to adhere to by the participants. As an illustration we use $(\lambda_{13}, \lambda_{14}, \lambda_{23}, \lambda_{24}) = (0.70, 0.10, 0.10, 0.10)$. The third set of weights puts more emphasis on those second-stage treatments that are a combination of NUT and PHY, for instance, because there is a belief that combined treatment is more effective. In that case the main focus is on the comparison between $d_2 = (\mathrm{PHY}, \mathrm{PHY}^{R}, (\mathrm{NUT} + \mathrm{PHY})^{NR})$ and $d_4 = (\mathrm{NUT}, \mathrm{NUT}^{R}, (\mathrm{NUT} + \mathrm{PHY})^{NR})$. As an illustration we use $(\lambda_{13}, \lambda_{14}, \lambda_{23}, \lambda_{24}) = (0.10, 0.10, 0.10, 0.70)$.
For this specific example, we developed another version of our Shiny app; this is available at https://andreamorciano.shinyapps.io/OptimalSMART2/.

Locally optimal design under a fixed total sample size
For each combination of $(\gamma_1, \gamma_2)$ and $(\lambda_{13}, \lambda_{14}, \lambda_{23}, \lambda_{24})$, the optimal design is given in Table 3, along with the RE of the balanced design (where $p_1 = p_2 = p_3 = 0.50$) as compared to the optimal design. We observe that the optimal design hardly depends on the response rates, but it does depend on the weights. For each set of weights, the optimal design dictates (about) equal randomization to first-stage treatments. For the first set of weights, the optimal design is (almost) equal to the balanced design and the RE of the balanced design is 1. For the second set of weights more than half (two-thirds) of participants are randomized to single second-stage treatments. This is obvious because the chosen weights put more emphasis on the comparison of single second-stage treatments. For the third set of weights, less than half (one-third) of participants are randomized to single second-stage treatments. This is also obvious because the chosen weights put more emphasis on the comparison of combined second-stage treatments. The optimal proportions $p_2^*$ and $p_3^*$ for the second set of weights are the complement of those for the third set of weights. In all cases, the RE of the balanced design is above 0.9, which implies it performs rather well as compared to the optimal design.
The results do not necessarily apply to other combinations of weights and response rates, so a researcher who is planning a SMART is advised to use our Shiny app to derive the optimal design for the trial at hand, and to do a sensitivity analysis to study how the optimal design is influenced by various realistic combinations of weights and response rates.

Locally optimal design under a fixed budget
To find the optimal design under a budgetary constraint, the costs for both treatments and the budget need to be defined. We assume both stages are of equal length, so the costs do not vary across stages. The costs for combined treatment are the sum of the costs for both single treatments. We consider two sets of costs for NUT (C N ) and PHY (C P ): (C N , C P ) = (300, 50) and (C N , C P ) = (300, 300). Let us assume the costs are expressed in euros and the length of each stage is one month. The costs for NUT are a reasonable amount to buy healthy food for one participant per month in the Netherlands. The costs for PHY in the first set cover a subscription to the local gym for one month; those in the second set also include personal training by a fitness coach. Furthermore, the budget is B = 100,000. For the response rates and the weights, we consider the same sets of values as in Section 'Locally optimal design under a fixed total sample size'.
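The cost structure described above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the cost function used in the paper or the Shiny app: we assume p1 is the probability of starting with PHY, p2 and p3 are the probabilities that non-responders to PHY and NUT, respectively, receive the single (rather than combined) second-stage treatment, gamma1 and gamma2 are the response rates to PHY and NUT, responders continue their first-stage treatment, and there are no overhead costs. All function and parameter names are hypothetical.

```python
def expected_cost_per_subject(p1, p2, p3, gamma1, gamma2, cost_nut, cost_phy):
    """Expected two-stage treatment cost for one subject (assumed cost model)."""
    # Combined treatment costs the sum of both single treatments (as in the text).
    cost_combined = cost_nut + cost_phy

    # Start with PHY: responders continue PHY; non-responders receive
    # NUT with probability p2 or NUT + PHY with probability 1 - p2.
    phy_first = (cost_phy
                 + gamma1 * cost_phy
                 + (1 - gamma1) * (p2 * cost_nut + (1 - p2) * cost_combined))

    # Start with NUT: responders continue NUT; non-responders receive
    # PHY with probability p3 or NUT + PHY with probability 1 - p3.
    nut_first = (cost_nut
                 + gamma2 * cost_nut
                 + (1 - gamma2) * (p3 * cost_phy + (1 - p3) * cost_combined))

    return p1 * phy_first + (1 - p1) * nut_first


def affordable_sample_size(budget, **design):
    """Number of subjects the budget covers, given the expected cost per subject."""
    return budget / expected_cost_per_subject(**design)


if __name__ == "__main__":
    balanced = dict(p1=0.5, p2=0.5, p3=0.5, gamma1=0.5, gamma2=0.5)
    for cost_nut, cost_phy in [(300, 50), (300, 300)]:
        n = affordable_sample_size(100_000, cost_nut=cost_nut,
                                   cost_phy=cost_phy, **balanced)
        print(f"(C_N, C_P) = ({cost_nut}, {cost_phy}): N is about {n:.0f}")
```

Under these assumptions, a balanced design with both response rates equal to 0.5 affords roughly 254 subjects for (C N , C P ) = (300, 50) and roughly 148 for (C N , C P ) = (300, 300). These figures only illustrate how the budget limits the sample size; they are not values from the paper's tables.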
For (C N , C P ) = (300, 50), the optimal proportion p * 1 is somewhat above 0.5, which implies that in the first stage more subjects are randomized to the least expensive treatment PHY than to the more expensive treatment NUT. The optimal proportion p * 1 hardly depends on the chosen weights, but it slightly increases with increasing response rates. Higher response rates imply more subjects receive the same treatment in stage 2 as they did in stage 1. It is therefore advantageous to already randomize more subjects to the least expensive treatment PHY in stage 1, so that more subjects receive this treatment in stage 2 as well. For (C N , C P ) = (300, 300), both first-stage treatments are equally expensive and the optimal proportion p * 1 is (about) 0.5. It hardly depends on the chosen weights and the response rates. The optimal proportions p * 2 and p * 3 hardly depend on the response rates but they do depend on the chosen weights. For the first set of weights, (λ 13 , λ 14 , λ 23 , λ 24 ) = (0.25, 0.25, 0.25, 0.25), somewhat more subjects are randomized to the single second-stage treatments NUT or PHY than to the combined second-stage treatment PHY + NUT. This is obvious since single second-stage treatments are less expensive than combined treatments. For the second set of weights, (λ 13 , λ 14 , λ 23 , λ 24 ) = (0.70, 0.10, 0.10, 0.10), even more subjects are randomized to single second-stage treatments than for the first set of weights. This is also obvious because the second set of weights puts more emphasis on the comparison of those ATSs with single second-stage treatments. For the third set of weights, (λ 13 , λ 14 , λ 23 , λ 24 ) = (0.10, 0.10, 0.10, 0.70), more subjects are randomized to combined second-stage treatments than to single second-stage treatments, which is also obvious because this set of weights puts more emphasis on the comparison of ATSs with combined second-stage treatments.
The optimal total sample size N * depends on the combination of costs (C N , C P ). As is obvious, fewer subjects can be included for C P = 300 than for C P = 50. Furthermore, N * depends on the weights: most subjects can be included for the second set of weights and fewest for the third set of weights. For the second set of weights, more subjects are randomized to the least expensive single second-stage treatments, hence a larger total number of subjects can be included. Finally, more subjects can be included when the response rates increase. Subjects who respond to treatment are not re-randomized, hence they receive a single treatment in the second stage. Single treatments are less expensive than combined treatments, hence more subjects can be included.
Table 3. Locally optimal design: optimal proportions for first-stage (p * 1 ) and second-stage (p * 2 , p * 3 ) treatments for three different sets of weights (λ 13 , λ 23 , λ 14 , λ 24 ) for the multiple-objective optimal design, and for three different sets of response rates (γ 1 , γ 2 ). The relative efficiency (RE) of the balanced design is also provided. The optimal proportions are derived under a fixed total sample size.
The RE of the balanced design slightly depends on the response rates. It is also related to the weights. The RE is highest for the first set of weights, since the optimal proportions are nearest to those of the balanced design. Slightly lower relative efficiencies are found for the third set of weights, but these relative efficiencies are still above 0.9. The lowest relative efficiencies are observed for the second set of weights as the optimal proportions deviate most from those of the balanced design. The lowest RE is 0.85, which implies that the balanced design requires 100% × [(1 / 0.85) − 1] ≈ 18% more subjects than the optimal design.
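The conversion from a relative efficiency to the additional sample size a less efficient design needs can be checked with a few lines of Python. This is a minimal sketch; the function name is ours, and it relies only on the standard interpretation that a design with relative efficiency RE needs 1/RE times as many subjects to be as efficient as the optimal design.

```python
def extra_subjects_percent(relative_efficiency):
    """Percentage of additional subjects needed to match the optimal design."""
    if not 0 < relative_efficiency <= 1:
        raise ValueError("relative efficiency must lie in (0, 1]")
    # A design with efficiency RE needs 1/RE times the optimal sample size.
    return 100.0 * (1.0 / relative_efficiency - 1.0)


if __name__ == "__main__":
    for re_value in (0.85, 0.90, 0.95, 1.00):
        print(f"RE = {re_value:.2f}: "
              f"{extra_subjects_percent(re_value):.1f}% more subjects")
```

For RE = 0.85 this gives about 17.6%, and for RE = 0.90 about 11.1%, which is why relative efficiencies above 0.9 are usually regarded as a modest loss.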

Robust optimal design
The optimal designs that were presented in subsections 'Locally optimal design under a fixed total sample size' and 'Locally optimal design under a fixed budget' are locally optimal since they depend on the response rates γ 1 and γ 2 . Such response rates are often unknown in the design phase of a SMART and an educated a priori guess must be given. There is, however, no guarantee such a guess is correct, and an incorrect guess may result in a suboptimal design. This problem may be overcome by using robust optimal design methodology; here we use the maximin optimal design methodology as described in section 'Robust optimal design'.
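The maximin idea can be illustrated with a generic grid search: for every candidate design, compute its relative efficiency under each plausible scenario and keep the design whose worst-case relative efficiency is largest. The sketch below is not the paper's implementation. The function names are hypothetical, and the criterion used in the example is a deliberately simple stand-in (the variance of a two-group mean difference with unequal variances) rather than the multiple-objective SMART criterion.

```python
def maximin_design(candidates, scenarios, criterion):
    """Return (design, worst-case RE) maximising the minimum relative efficiency.

    criterion(design, scenario) must return a value to be minimised,
    for example a variance; smaller is better.
    """
    # Best achievable criterion value per scenario: the locally optimal
    # design within the candidate grid.
    best = {s: min(criterion(d, s) for d in candidates) for s in scenarios}

    def worst_case_re(design):
        # Relative efficiency = locally optimal value / value of this design.
        return min(best[s] / criterion(design, s) for s in scenarios)

    winner = max(candidates, key=worst_case_re)
    return winner, worst_case_re(winner)


def toy_criterion(p, variance_ratio):
    """Stand-in criterion (NOT the SMART criterion): variance of a difference
    in means when a proportion p is allocated to the first group and the
    first group's variance is `variance_ratio` times that of the second."""
    return variance_ratio / p + 1.0 / (1.0 - p)


if __name__ == "__main__":
    grid = [i / 100 for i in range(1, 100)]   # candidate allocations
    plausible = [0.5, 1.0, 2.0]               # plausible variance ratios
    design, min_re = maximin_design(grid, plausible, toy_criterion)
    print(f"maximin allocation: {design:.2f}, minimal RE: {min_re:.3f}")
```

In a SMART application, the scenarios would be the plausible pairs of response rates (γ 1 , γ 2 ), the candidates would be triples (p 1 , p 2 , p 3 ), and the criterion would be the multiple-objective criterion of the design.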
Tables 5 and 6 in the online supplement show maximin optimal designs using the same sets of weights and combinations of costs as in Tables 3 and 4. A comparison of Table 3 and Table 5 of Supplemental material, and Table 4 and Table 6 of Supplemental material shows the locally optimal designs and maximin optimal designs are (almost) identical for the chosen sets of weights, response rates and costs. As a result, the minimal RE of the balanced design as given in Tables 5 and 6 of Supplemental material is almost equal to the RE of the balanced design in Tables 3 and 4. This result is not surprising since in Sections 'Locally optimal design under a fixed total sample size' and 'Locally optimal design under a fixed budget', it was shown that the optimal design hardly depends on the response rates. Of course, this finding does not necessarily hold for all combinations of response rates, weights and costs. The user is therefore encouraged to apply maximin optimal design methodology in case the response rates are likely to be misspecified.
Table 4. Locally optimal design: optimal proportions for first-stage (p * 1 ) and second-stage (p * 2 , p * 3 ) treatments for three different sets of weights (λ 13 , λ 23 , λ 14 , λ 24 ) for the multiple-objective optimal design and for three different sets of response rates (γ 1 , γ 2 ). The relative efficiency (RE) of the balanced design is also provided. The optimal proportions are derived under a fixed budget with B = 100,000 and for two different sets of costs (C P , C N ).

Discussion
Considering our example of a prototypical SMART design, we derived the optimal design ξ * = (p * 1 , p * 2 , p * 3 ) both under a fixed sample size and under a budget constraint. Under a fixed sample size, we found that the optimal probability in the first stage, p * 1 , is mostly influenced by the weights chosen for the multiple-objective optimal design, while it is only slightly influenced by the response rates. On the other hand, second-stage optimal probabilities are only influenced by the choice of the weights. When considering the second set of weights (λ 13 , λ 14 , λ 23 , λ 24 ) = (0.70, 0.10, 0.10, 0.10) or the third set, (λ 13 , λ 14 , λ 23 , λ 24 ) = (0.10, 0.10, 0.10, 0.70), which, respectively, put more emphasis on the use of single and combined treatments, the optimal design ξ * performs better than the balanced design ξ b = (0.50, 0.50, 0.50), although the latter still achieves an RE above 0.90. When equal weights are used, ξ * and ξ b perform almost identically in terms of RE. Under a fixed budget, the optimal proportions are also influenced by the costs of treatments, in addition to the aforementioned weights and response rates. When the costs of treatments are taken into account, the gain in efficiency of the optimal design ξ * over ξ b increases. The reason might be that unequal allocation of patients to intervention options seems to work better under a fixed budget than under a fixed sample size, as was also previously stated in the literature. 2,3 It is especially advised to use the optimal design rather than the balanced design when the second set of weights, (λ 13 , λ 14 , λ 23 , λ 24 ) = (0.70, 0.10, 0.10, 0.10), is used. For this set, ξ b may have an RE as low as 0.86. When using equal weights for the multiple-objective optimal design, ξ b achieves an RE with respect to ξ * above 0.95. When using the third set of weights, (λ 13 , λ 14 , λ 23 , λ 24 ) = (0.10, 0.10, 0.10, 0.70), ξ b achieves an RE above 0.90.
It should be mentioned that the optimal designs are locally optimal, as they depend on the two unknown response rates γ 1 and γ 2 . One way to address this issue is to use maximin optimal design methodology. In our example, the maximin optimal designs are quite similar to the locally optimal designs. In other words, the locally optimal designs are rather robust with respect to mild misspecification of the response rates. However, this finding does not always hold, and we recommend deriving a maximin optimal design if there is uncertainty about the a priori guesses of the response rates.
We derived our optimal design under the assumption that outcomes of subjects in ATSs that start with different first-stage treatments are independent of each other, resulting in a zero correlation between weighted mean outcomes of ATSs starting with different first-stage treatments. There are situations in which this assumption may be violated. Consider for instance the situation in our weight loss example where just a limited number of personal trainers is available. It may then occur that a personal trainer trains subjects from ATSs starting with different first-stage treatments. In such a case, the outcomes of subjects who have been trained by the same personal trainer become dependent because of the trainer's skills, enthusiasm, experience, etc. The assumption of independence is then violated and hence our optimal design is not applicable. Such a problem can be easily solved by letting each personal trainer only train subjects from ATSs that start with the same first-stage treatment.
One limitation of this study is that it does not take clustered data structures into account, while such data may also occur in SMARTs. 54,55 Clustered data occur, among others, in cluster-randomized trials and multicentre trials. In such studies not only the total number of subjects in each treatment sequence needs to be determined, but also the number of clusters and cluster size. 56 The optimal design will depend on the intraclass correlation coefficient, which measures the degree of dependence of outcomes within the same cluster.
Another limitation of this study is that formulae and methodology only apply to the prototypical SMART designs in Figures 1 and 2. Based on the number of treatments, stages and randomizations, different SMART designs can be developed, of which many examples exist in the literature 57,58 and online. 59 It would be necessary to study optimal designs for such other types of SMART designs.
To our knowledge, this is the first paper that studies optimal allocation to treatments in SMARTs. Our Shiny App allows researchers in the fields of biomedical, health and social sciences to derive the optimal design for their SMART and to calculate the efficiency of a balanced design. We hope that this paper will further contribute to the development and implementation of SMARTs.