Continuous(ly) missing outcome data in network meta-analysis: A one-stage pattern-mixture model approach

Appropriate handling of aggregate missing outcome data is necessary to minimise bias in the conclusions of systematic reviews. The two-stage pattern-mixture model has already been proposed to address aggregate missing continuous outcome data. While this approach is more appropriate than excluding missing continuous outcome data or applying simple imputation methods, it does not offer flexible modelling of missing continuous outcome data to investigate their implications on the conclusions thoroughly. Therefore, we propose a one-stage pattern-mixture model approach under the Bayesian framework to address missing continuous outcome data in a network of interventions and gain knowledge about the missingness process in different trials and interventions. We extend the hierarchical network meta-analysis model for one aggregate continuous outcome to incorporate a missingness parameter that measures the departure from the missing at random assumption. We consider various effect size estimates for continuous data, and two informative missingness parameters, the informative missingness difference of means and the informative missingness ratio of means. We incorporate our prior belief about the missingness parameters while allowing for several possibilities of prior structures to account for the fact that the missingness process may differ in the network. The method is exemplified in two networks from published reviews comprising different amounts of missing continuous outcome data.


Introduction
Binary outcomes have drawn methodologically more attention for being the most prevalent in systematic reviews 1,2 and rather straightforward to handle. 3 Although less widespread in systematic reviews for being often more complex to interpret and more labour-intensive to measure compared to binary outcomes, 1 continuous outcomes play an important role in decision-making and clinical practice. Similar to binary outcomes, continuous outcomes are also prone to missing outcome data (MOD). For instance, in a systematic review on respiratory rehabilitation in chronic obstructive pulmonary disease, Ebrahim et al. 4 observed a MOD rate ranging from 0% to 38% across 31 included trials. In a collection of 190 Cochrane systematic reviews published between 2009 and 2012 in three mental health Cochrane Groups, 27 out of 140 selected meta-analyses considered a continuous

Figure 1. The distribution of missing continuous outcome data (MCOD; expressed as a percentage) across the different health fields in selected network meta-analyses (left plot with split violins) and the pairwise meta-analyses (right plot with split violins) from two surveys on the reporting and handling of aggregate missing outcome data. 5,6 The red violins illustrate the density of differences in the percentage of MCOD between the compared arms across trials, and the grey violins indicate the distribution of the total percentage of MCOD across trials. The red and grey points indicate the median value in each split violin. DANG, Cochrane Depression, Anxiety and Neurosis Group; DPLPG, Cochrane Developmental, Psychosocial and Learning Problems Group; SG, Cochrane Schizophrenia Group.

mechanisms. 11 In their article, the authors study a few modelling options concerning the structure of the IMP, thus potentially overlooking alternatives that would allow the researcher to investigate in more detail the implications of different structures on the conclusions.
Motivated by the limitations of the two-stage approach by Mavridis et al., 9 the present study aims to provide a one-stage pattern-mixture model approach under the Bayesian framework for MCOD to gain knowledge about the missingness mechanisms across different interventions and trials in a network. A one-stage approach synthesises the data in a single step and allows us to learn about the missingness process via the estimation of the missingness parameter, thus offering an immediate advantage over the two-stage pattern-mixture model approach.
The article has the following structure. In Section 2, we introduce two published systematic reviews with network meta-analysis (NMA) with different amounts of MCOD: an NMA of treatment options for Parkinson's disease with a considerable amount of MCOD in many trials (>20% MCOD) and an NMA of physical activities for type-2 diabetes patients with a moderate amount of MCOD. In Section 3, we describe the one-stage pattern-mixture model and the missingness parameters with different structures for their prior distribution. In Section 4, we apply our method to the motivating examples. We conclude with a discussion in Section 5 and brief recommendations in Section 6.

Motivating examples
We consider two motivating examples: (a) the network of Stowe et al. 12 investigating antiparkinsonian drugs by measuring the change from baseline of patient off-time reduction (Table 1) and (b) the network of Schwingshackl et al. 13 assessing the effect of different training modalities on HbA1c for patients with type 2 diabetes (Table 2). In both examples, negative values of the mean difference (MD) and the standardised mean difference (SMD), and positive values of the ratio of means (RoM) in the logarithmic scale, favour the first intervention in the comparison; these are the most commonly used effect sizes in the synthesis of continuous outcomes. We chose these networks from our broader collection of five networks (Figure 2; Table S1) mainly for their amount of missingness, quantified as the percentage of MCOD (%MCOD). The network of Stowe et al. 12 has a considerable %MCOD (>20%) in many trials, whereas the network of Schwingshackl et al. 13 suffers from a moderate amount of MCOD.

Notation
Consider a series of $N$ trials comparing different sets of $T$ interventions for the same patient population and health condition. In the absence of MCOD, we have information on the mean $y_{ik}$ and the variance $v_{ik}$ of a continuous outcome (for instance, change in pain intensity scores) as measured in the $n_{ik}$ randomised participants in arm $k$ ($k = 1, 2, \ldots, a_i$, with $a_i$ being the number of arms in trial $i$) of trial $i$, where $y_{ik}$ is assumed to follow a normal distribution with underlying mean $\theta_{ik}$ and variance $v_{ik}$. By convention, we assume $v_{ik}$ to be known.
In the presence of MCOD, we do not have information on $y_{ik}$ and $v_{ik}$. Instead, we know the mean $y^o_{ik}$ (the superscript $o$ abbreviates observed) and the variance $v^o_{ik}$ of the continuous outcome as measured in the $n^o_{ik}$ participants who completed arm $k$ of trial $i$ (from now on called completers) out of the total randomised ($n_{ik}$), where $y^o_{ik}$ is assumed to follow a normal distribution with underlying mean $\theta^o_{ik}$ and variance $v^o_{ik}$, and the number of missing participants is assumed to follow a binomial distribution, $m_{ik} \sim \mathrm{Bin}(n_{ik}, q_{ik})$, with $m_{ik} = n_{ik} - n^o_{ik}$ the missing participants in arm $k$ of trial $i$, and $q_{ik}$ the probability of being missing. We assign a uniform prior distribution on $q_{ik}$ with support in $[0, 1]$. Exclusion of MCOD corresponds to analysing the $n^o_{ik}$ completers in arm $k$ of trial $i$. By reducing the randomised sample to the completers, exclusion of MCOD results in loss of power and may increase the risk of biased results if the reasons for premature discontinuation are informative. 10,11

a Of the total 33 two-arm trials, we excluded three trials for reporting opposite signs in the mean change from baseline in the compared arms (log RoM cannot be calculated for these trials), and one trial due to inaccuracies in the available information regarding MCOD (Figure 1 in Stowe et al. 12).
b Green indicates a low risk of attrition bias (≤5%), red indicates a substantial risk of attrition bias (>20%), and orange indicates a moderate risk of attrition bias.

Figure 2. Networks on a primary continuous outcome with extractable missing continuous outcome data (MCOD) from five published systematic reviews. The size of the node is proportional to the number of observed treatment comparisons that include that node. The thickness of the edge is proportional to the number of trials that investigated that comparison. A low, moderate and large percentage of total MCOD (%MCOD) is represented in each node and edge with green (%MCOD ≤ 5), orange and red colour (%MCOD > 20), respectively. In each node, the %MCOD is the ratio of the total number of MCOD to the total number randomised across the trials that investigate the intervention. In each edge, the %MCOD is the ratio of the total number of MCOD to the total number randomised across the trials with that comparison. COMTI+LD: catechol-O-methyl transferase inhibitors plus levodopa; DA+LD: dopamine agonist plus levodopa; MAOBI+LD: monoamine oxidase type B inhibitors plus levodopa; PBO+LD: placebo plus levodopa.

a The number of randomised participants was obtained from the corresponding published reports, and then we calculated the missing outcome data in each trial.
b Green implies a low risk of attrition bias (≤5%), red indicates a substantial risk of attrition bias (>20%), and orange implies a moderate risk of attrition bias.
c Calculated as the difference between the arms with the maximum and minimum percentage of missing outcome data.
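The colour coding and the imbalance measure described in the notes above can be sketched in a few lines. This is an illustrative sketch only: the function names and the arm counts are hypothetical, and treating the lower cut-off as inclusive (5% counts as low) is an assumption.

```python
def pct_mcod(missing, randomised):
    """Percentage of missing continuous outcome data (%MCOD)."""
    return 100.0 * missing / randomised


def attrition_category(pct):
    """Classify %MCOD; the cut-offs of 5% and 20% follow the table notes.

    Assumption: the lower cut-off is inclusive (<= 5% is 'low').
    """
    if pct <= 5:
        return "low"          # green
    if pct > 20:
        return "substantial"  # red
    return "moderate"         # orange


# Hypothetical two-arm trial: (missing, randomised) per arm
arms = [(3, 50), (12, 48)]

per_arm = [pct_mcod(m, n) for m, n in arms]
total = pct_mcod(sum(m for m, _ in arms), sum(n for _, n in arms))
# Imbalance: difference between the arms with the largest and smallest %MCOD
imbalance = max(per_arm) - min(per_arm)

print(per_arm, round(total, 1), attrition_category(total), imbalance)
```

In this made-up trial the total %MCOD falls in the moderate band although one arm exceeds 20%, which is why the total and the between-arm difference are reported side by side.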

One-stage pattern-mixture model
In each trial, we are interested in estimating the unknown parameter $\theta_{ik}$ while accounting for the missing participants properly. For that purpose, we consider a pattern-mixture model to study the missing participants and completers jointly:

$$\theta_{ik} = \theta^o_{ik}\,(1 - q_{ik}) + \theta^m_{ik}\, q_{ik} \qquad (1)$$

where $\theta^o_{ik}$ is the underlying mean of the continuous outcome among the completers and $\theta^m_{ik}$ is the missingness parameter that indicates the underlying mean of the continuous outcome among the missing participants in arm $k$ of trial $i$. By using equation (1), the randomised sample is retained in each trial of the network to allow for inferences on the whole target population irrespective of trial completion or premature discontinuation. The available data provide no information on $\theta^m_{ik}$, and therefore, we have to make clinically plausible assumptions.

Informative missingness parameters
In order to quantify the missingness process, appropriate missingness parameters have been proposed. Following Mavridis et al., 9 we consider the following IMPs for MCOD.

Informative missingness difference of means (IMDoM). The IMDoM is defined as the difference between the mean outcome among missing participants and the mean outcome among the completers:

$$\varphi_{ik} = \theta^m_{ik} - \theta^o_{ik}$$

By replacing $\theta^m_{ik}$ in equation (1) and rearranging, we obtain

$$\theta_{ik} = \theta^o_{ik} + \varphi_{ik}\, q_{ik}$$

The $\varphi_{ik}$s are not known and cannot be retrieved from the data, and hence we need to propose plausible values for them. Under the Bayesian framework, this naturally translates to assigning a prior distribution for $\varphi_{ik}$. A natural choice for $\varphi_{ik}$ is a normal prior distribution with mean $\Delta_{ik}$ and variance $\sigma^2_{ik}$ that reflect our prior belief and uncertainty about the missingness process on average, respectively, in arm $k$ of trial $i$. Then, $\Delta_{ik} > 0$ implies that a larger outcome on average is more likely to occur among missing participants rather than completers in arm $k$ of trial $i$, $\Delta_{ik} < 0$ implies the opposite, and $\Delta_{ik} = 0$ indicates the missing at random (MAR) assumption on average. According to Mavridis et al., 9 the precision of the summary effect decreases for $\sigma^2_{ik} \geq 3^2$; afterwards, the between-trial variance becomes zero, and the increase in within-trial standard error further increases the standard error of the summary effect (Figure 5 in Mavridis et al. 9). We consider $\sigma^2_{ik} \geq 3^2$ to be conservative, whereas $\sigma^2_{ik} < 1$ to be liberal regarding our prior belief for the missingness mechanism on average. For model identifiability, we use informative priors on $\varphi_{ik}$ following the prior distributions considered in Mavridis et al. 9 Specifically, we use $\sigma^2_{ik} = 1$ in all models as the primary analysis, and $\sigma^2_{ik} = 3^2$ as the sensitivity analysis. With the sensitivity analysis, we aim to investigate whether and how increasing the prior variance of $\varphi_{ik}$ may impact on the investigated NMA parameters (see Section 3.4).
Informative missingness ratio of means (IMRoM). The IMRoM is defined as the ratio of the mean outcome among the missing participants to the mean outcome among the completers; in the logarithmic scale,

$$\omega_{ik} = \log\left(\theta^m_{ik} / \theta^o_{ik}\right)$$

By replacing $\theta^m_{ik}$ in equation (1) and rearranging, we obtain

$$\theta_{ik} = \theta^o_{ik}\,\left(1 - q_{ik} + q_{ik}\, e^{\omega_{ik}}\right)$$

Then, we can assign a normal prior distribution on $\omega_{ik}$ with mean $\Delta_{ik}$ and variance $\sigma^2_{ik}$. In line with IMDoM, we consider $\sigma^2_{ik} \geq 0.4^2$ to be conservative, whereas $\sigma^2_{ik} < 0.2^2$ to be liberal regarding our prior belief for the missingness mechanism on average (Figure 5 in Mavridis et al. 9). We use $\sigma^2_{ik} = 0.2^2$ in all models as the primary analysis, and $\sigma^2_{ik} = 0.4^2$ as the sensitivity analysis to investigate the impact of increasing the prior variance of $\omega_{ik}$ on the investigated NMA parameters.
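To make the role of the two missingness parameters concrete, the following minimal sketch computes the mean of the randomised sample in one arm as the weighted average of the completers' mean and the missing participants' mean, with the latter obtained either by adding an IMDoM or by multiplying by an IMRoM. The function names and all numerical values are hypothetical and chosen only for illustration; the sketch works with fixed values rather than prior distributions and is not the authors' implementation.

```python
import math


def adjusted_mean_imdom(theta_obs, q, imdom):
    """Randomised-sample mean when the missing participants' mean equals
    the completers' mean plus the IMDoM."""
    theta_mis = theta_obs + imdom
    return theta_obs * (1 - q) + theta_mis * q


def adjusted_mean_imrom(theta_obs, q, log_imrom):
    """Randomised-sample mean when the missing participants' mean equals
    the completers' mean times exp(log IMRoM)."""
    theta_mis = theta_obs * math.exp(log_imrom)
    return theta_obs * (1 - q) + theta_mis * q


# Hypothetical arm: completers' mean of -1.5 and 30% missing participants
theta_obs, q = -1.5, 0.30

mar = adjusted_mean_imdom(theta_obs, q, imdom=0.0)       # MAR on average
shifted = adjusted_mean_imdom(theta_obs, q, imdom=1.0)   # missing do worse by 1 unit
ratio = adjusted_mean_imrom(theta_obs, q, log_imrom=math.log(0.8))

print(round(mar, 3), round(shifted, 3), round(ratio, 3))
```

With a missingness parameter of zero the adjusted mean coincides with the completers' mean, and the size of any departure scales with the proportion missing, which is why arms with little MCOD are barely affected by the choice of prior.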

Structural assumptions for the missingness parameters
There are various options of increasing flexibility with respect to how the IMPs across trials and arms can be structured. 11,14 We focus on the following:
• common-within-network; the IMPs are assumed to be the same in the whole network, and only one parameter is estimated per network;
• trial-specific; the IMPs are different across trials but assumed to be the same in the compared arms, resulting in down-weighting trials with unbalanced MCOD; 10
• intervention-specific; the IMPs are allowed to be different across interventions but shared across trials, thus resulting in down-weighting trials with higher total MCOD. 10
The IMPs can be further assumed to be identical, hierarchical or independent, which corresponds to hypothesising that these parameters are constant, exchangeable or different, respectively. In this work, the latter is assumed to be either uncorrelated or correlated across the arms of every trial, with the correlation defined between the IMPs of arms $j$ and $l$, where $j, l \in \{1, 2, \ldots, a_i\}$ and $j \neq l$, following Mavridis et al. 9 Researchers should consider expert opinion to define the correlation parameter, 9 though it is not an easy task. White et al. 15 offer a framework to elicit the correlation parameter of several arms in a trial. Furthermore, applying both correlated and uncorrelated IMPs can aid in understanding whether results are robust to the joint distribution of IMPs. Table 3 summarises all structural assumptions mentioned above for $\varphi_{ik}$ and $\omega_{ik}$ under the MAR assumption on average in the primary and sensitivity analyses. Note that in the hierarchical structure, we have assigned a uniform prior distribution on the hyper-standard deviation; other proper prior distributions may be considered as well. 16
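One way to see how the structural assumptions differ is to count how many distinct missingness parameters each of them implies. The sketch below maps every arm of every trial to the index of the parameter it shares under a given assumption. The function names, the label "arm-specific" (used here for one parameter per arm, as under the independent structure) and the three-trial network are hypothetical and serve illustration only.

```python
def imp_index(trials, assumption):
    """Map each (trial, arm) to the index of the missingness parameter it uses.

    trials: list of trials, each a list of intervention labels (one per arm).
    """
    if assumption == "common-within-network":
        return [[0 for _ in arms] for arms in trials]
    if assumption == "trial-specific":
        return [[i for _ in arms] for i, arms in enumerate(trials)]
    if assumption == "intervention-specific":
        labels = sorted({t for arms in trials for t in arms})
        position = {t: j for j, t in enumerate(labels)}
        return [[position[t] for t in arms] for arms in trials]
    if assumption == "arm-specific":  # hypothetical label: one parameter per arm
        index, counter = [], 0
        for arms in trials:
            index.append(list(range(counter, counter + len(arms))))
            counter += len(arms)
        return index
    raise ValueError(f"unknown assumption: {assumption}")


def n_parameters(index):
    """Number of distinct missingness parameters implied by an index map."""
    return len({j for row in index for j in row})


# Hypothetical network of three two-arm trials
trials = [["PBO+LD", "DA+LD"], ["PBO+LD", "COMTI+LD"], ["PBO+LD", "COMTI+LD"]]

counts = {
    a: n_parameters(imp_index(trials, a))
    for a in ("common-within-network", "trial-specific",
              "intervention-specific", "arm-specific")
}
print(counts)
```

The count grows from one parameter for the whole network to one per arm, which mirrors the increasing flexibility described above and the decreasing amount of data available per parameter.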

Mean difference
We use an identity function to link $\theta_{ik}$ with $u_i$ and $\delta_{i,k1}$ in trial $i$ as follows:

$$\theta_{ik} = u_i + \delta_{i,k1}$$

where $u_i$ is the underlying mean in the baseline arm and $\delta_{i,k1}$ is the random effect that indicates the MD between arm $k$ and the baseline arm in trial $i$.

Standardised mean difference
We use the following link function:

$$\theta_{ik} = u_i + S_i\, \delta_{i,k1}$$

where $\delta_{i,k1}$ is the SMD between arm $k$ and the baseline arm in trial $i$, and $S_i$ is the pooled standard deviation in trial $i$. However, in the presence of MCOD, we have no information on $v_{ik}$. Under MAR, we may assume that the pooled standard deviation among the missing participants is equal to the pooled standard deviation among the completers. Then, we can use the pooled standard deviation among the completers as the pooled standard deviation for the randomised sample. However, by doing so, we do not acknowledge our uncertainty about the MAR assumption in the estimation of the pooled standard deviation. Instead, we may assign a gamma prior distribution on the pooled variance for the randomised sample, with shape and scale parameters that depend on $\sigma_i$, the pooled standard deviation among the completers in trial $i$. 17

Ratio of (arithmetic) means
For the RoM, the link function is the following:

$$\theta_{ik} = u_i \exp(\delta_{i,k1})$$

where $\delta_{i,k1}$ is the RoM (arm $k$ versus baseline arm) in the logarithmic scale in trial $i$. This effect measure is less prevalent in systematic reviews as compared to MD and SMD; 2 however, we have considered it for completeness.

Bayesian random-effects network meta-analysis model
For all the aforementioned effect measures, we assume $\delta_{i,k1}$ to follow a normal distribution with mean $\mu_{t_{ik},t_{i1}}$ and variance $\tau^2$, common for all observed pairwise comparisons to facilitate estimation of the parameter when there are comparisons with few trials. With $t_{ik}$, we indicate the intervention in arm $k$ of trial $i$. By considering a common $\tau^2$, the correlation between pairs of random effects in multi-arm trials is equal to 0.5. 18 We apply the consistency equation, which is a linear combination of comparisons with the reference intervention of the network (here, intervention A), to obtain the treatment effects for the remaining comparisons: 19

$$\mu_{t_{ik},t_{i1}} = \mu_{t_{ik},A} - \mu_{t_{i1},A}$$

where $t_{ik}, t_{i1} \in \{B, C, \ldots, T\}$. We use the surface under the cumulative ranking curve (SUCRA) to order the interventions from the best to the worst. 20 For the proposed one-stage pattern-mixture models, we use the normal likelihood with known variance for each arm of every trial (example 5 in the Appendix of Dias et al. 21). For the location parameters $u_i$ and $\mu_{t_{ik},t_{i1}}$, we consider a normal prior distribution with mean 0 and variance 10,000. We assign a half-normal prior distribution on $\tau$ with mean 0 and variance 1. 16 Since empirical priors for $\tau^2$ have been proposed only for the SMD, we use this half-normal prior distribution on $\tau$ for all three effect measures. Because of the large number of investigated models and parameters, we take a pragmatic approach to assess the convergence of all model parameters. We use the Gelman-Rubin convergence diagnostic, $\hat{R}$, and we consider parameters with $\hat{R} > 1.1$ to not have achieved convergence; then, the corresponding posterior distributions cannot be trusted. 22 For parameters with a lack of convergence, we planned to look at their trace plots and autocorrelation plots to understand the cause of non-convergence. All models were implemented in JAGS via the R-package R2jags (statistical software R, version 3.6.1). 23,24 The R-package ggplot2 was used to draw the figures in Section 4. 25 The functions related to this manuscript are publicly available at https://github.com/LoukiaSpin/One-stage-PM-NMA-model-Continuous-Outcomes.git.
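The consistency equation and the SUCRA summary described above can be illustrated with a short sketch. It assumes that the effects versus the reference intervention and the rank probabilities are already available as plain numbers (in practice they would come from the posterior samples), and it uses the usual definition of SUCRA as the average of the cumulative rank probabilities over the first T − 1 ranks. The function names and all input values are hypothetical; this is not the authors' JAGS implementation.

```python
def consistency(basic, ref="A"):
    """Derive every pairwise effect from the effects versus the reference.

    basic: dict mapping intervention -> effect versus `ref`.
    Returns a dict mapping (k, b) -> effect of k versus b.
    """
    full = dict(basic)
    full[ref] = 0.0  # the reference versus itself
    return {(k, b): full[k] - full[b] for k in full for b in full if k != b}


def sucra(rank_probs):
    """SUCRA for one intervention.

    rank_probs[r] is the probability of rank r + 1 (rank 1 is the best);
    the probabilities are assumed to sum to one.
    """
    n_ranks = len(rank_probs)
    cumulative, total = 0.0, 0.0
    for p in rank_probs[:-1]:  # cumulative probabilities for ranks 1..T-1
        cumulative += p
        total += cumulative
    return total / (n_ranks - 1)


# Hypothetical effects of B and C versus the reference A
pairwise = consistency({"B": -1.5, "C": -0.8})
print(round(pairwise[("C", "B")], 2))   # C versus B, approximately 0.7

# Hypothetical rank probabilities in a network of three interventions
print(sucra([1.0, 0.0, 0.0]), sucra([0.0, 0.0, 1.0]),
      round(sucra([0.5, 0.3, 0.2]), 2))
```

An intervention that is certainly the best has a SUCRA of 1 and one that is certainly the worst has a SUCRA of 0, so wide credible intervals around SUCRA values signal an uncertain hierarchy.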

Application of the models
We apply the proposed models to the networks of Stowe et al. 12 and Schwingshackl et al. 13 We focus on MD and SMD for being the most prevalent effect measures for synthesising a continuous outcome. We present the results on log RoM for the primary and sensitivity analyses in the Supporting Information (Figures S1-S5; Tables S2 and S4). Figure 3 depicts the posterior mean and 95% credible intervals (CrI) of all models for the comparisons with the reference intervention (i.e. placebo+LD in Stowe et al. 12 and aerobic in Schwingshackl et al., 13 respectively), and the posterior median and 95% CrI of $\tau^2$. In each plot, the vertical dashed lines refer to the point estimate and 95% CrI after exclusion of MCOD, which translates to the available case analysis (ACA). 26 According to the Gelman-Rubin convergence diagnostic, convergence was achieved for all model parameters under all three effect measures and all different structural assumptions for the $\varphi_{ik}$ and $\omega_{ik}$, because $\hat{R} < 1.1$ (range: 1.001-1.040 for $\mu_{t_{ik},A}$, $\tau^2$, and SUCRA values; range: 1.001-1.002 for $\varphi_{ik}$ and $\omega_{ik}$ under all different structural assumptions in primary and sensitivity analyses).

Network with considerable MCOD (antiparkinsonian drugs)
The results obtained for MD and SMD analyses are very similar, following the same pattern, while on a different scale (Figure 3(A)). The posterior means for the identical and hierarchical structures are almost identical (after rounding to the second decimal) for the common-within-network, trial-specific and intervention-specific assumptions. The same conclusion can be drawn for the correlated and uncorrelated assumptions, whose estimates are found to be very similar to each other.
The intervention-specific assumption results in a slightly larger patient off-time reduction for dopamine agonist plus levodopa (DA+LD) versus placebo+LD as compared with the competing assumptions whose point estimates are identical or very similar to ACA (posterior mean of MD: −1.49 vs. −1.44). In this comparison, the results are almost interchangeable across the different assumptions because most of the included trials have low or moderate %MCOD (<10%) that is quite balanced in the compared arms as opposed to the other two comparisons.
The results for COMTI+LD versus placebo+LD are more variable across the assumptions, as expected, because 6 out of 11 trials that were included suffer from considerable %MCOD that is unbalanced in the compared arms. All assumptions consistently lead to lower patient off-time reduction when compared with ACA, similarly in MD and SMD: the posterior mean is 14-30% lower than ACA across the assumptions. Among the different assumptions, the intervention-specific assumption leads to the lowest patient off-time reduction, whereas both assumptions under the independent structure yield the largest patient off-time reduction.
For the comparison of monoamine oxidase type B inhibitors plus levodopa (MAOBI+LD) versus placebo+LD, the competing assumptions yield almost interchangeable posterior means (MD ranging from −0.83 to −0.81, SMD from −0.37 to −0.36) that are close to ACA (MD: −0.83, SMD: −0.36), except for the common-within-network assumption that leads to a lower patient off-time reduction on average. Contrary to the other comparisons, the 95% CrIs are wider since only two trials inform this comparison, one of which has 35% MCOD (Table 1).
All assumptions yield a lower posterior median of $\tau^2$ (MD: 0.03 to 0.06, SMD: 0.004 to 0.006) with narrower 95% CrI compared with ACA (MD: 0.1, SMD: 0.01), which indicates that all assumptions have explained a substantial part of the between-trial variance (MD: 40% to 70%, SMD: 40% to 60% compared to ACA). The intervention-specific assumption provides the lowest posterior median and narrowest 95% CrI of $\tau^2$, followed by the common-within-network assumption. The posterior median of $\tau^2$ is similar in the trial-specific assumption and the independent structure. According to SUCRA values, placebo+LD and DA+LD are consistently the worst and the best interventions, respectively (Figure 4(a)). The hierarchy is uncertain for COMTI+LD and MAOBI+LD due to overlapping 95% CrI across all models, which is attributed to considerable %MCOD in the corresponding trials.
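As a rough check on the percentages quoted above for the between-trial variance, and assuming they denote the relative reduction in its posterior median with respect to ACA (an interpretation on our part, not a formula stated in the text), the reported ranges can be reproduced from the reported medians:

```latex
\begin{align*}
\text{MD:}\quad  & \frac{0.10 - 0.06}{0.10} = 40\%, \qquad \frac{0.10 - 0.03}{0.10} = 70\% \\
\text{SMD:}\quad & \frac{0.010 - 0.006}{0.010} = 40\%, \qquad \frac{0.010 - 0.004}{0.010} = 60\%
\end{align*}
```

The smallest posterior median in each range corresponds to the largest share of between-trial variance explained.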
By increasing the variance of the prior distribution for IMDoM to $3^2$, and therefore our uncertainty about the MAR assumption, the results follow the same pattern as the primary analysis (only results on MD are shown), but the 95% CrIs are now wider overall (Figure S4A in the Supporting Information).

Network with moderate MCOD (different training modalities)
In line with the previous example, MD and SMD provide very similar results in terms of pattern, but on a different scale (Figure 3(B)). The point estimates are almost identical under the hierarchical and identical structures for the common-within-network, trial-specific and intervention-specific assumptions, and similar for the correlated and uncorrelated assumptions under the independent structure. The intervention-specific assumption leads to a slightly larger reduction in HbA1c in favour of resistance, but the evidence is weak (the 95% CrI includes zero). All other assumptions give results similar to each other and to ACA, strongly favouring resistance. Furthermore, under the intervention-specific assumption, the comparison of combined training versus aerobic results in lowering the reduction in HbA1c by approximately 37% compared with the competing assumptions. In contrast, all other assumptions result in findings similar to each other and to ACA but slightly more precise than ACA. Except for two trials with considerable %MCOD, the low to moderate %MCOD across and within the trials may explain the low variability of the results across the assumptions for each comparison with aerobic.
All assumptions lead to a lower posterior median and wider 95% CrI of $\tau^2$ (MD: 0.03-0.05, SMD: 0.05-0.16) as compared to ACA (MD: 0.06, SMD: 0.18). The identical structure yields a larger posterior median and wider 95% CrI of $\tau^2$ for each assumption as compared to the hierarchical structure. In both MD and SMD, the intervention-specific assumption under the hierarchical structure yields the lowest posterior median and most precise 95% CrI of $\tau^2$.
According to SUCRA values, combined training and resistance are consistently the best and the worst interventions across all assumptions, respectively, except for the intervention-specific assumption that leads to overlapping 95% CrIs (Figure 4(b)). The intervention-specific assumption yields wider CrI for comparisons with aerobic, which can explain the wide 95% CrI for SUCRA as well.
In line with the previous network, increasing the variance of the prior distribution of IMDoM to $3^2$ leads to results of the same pattern as the primary analysis (only results on MD are shown), but with less precise estimates overall (Figure S4A in the Supporting Information).

Learning about the missingness mechanism
The posterior distribution of IMDoM under the common-within-network and intervention-specific assumptions is given in Table 4 for both network examples. A posterior mean away from zero is an indication that the MAR assumption may not be plausible; that is, the missingness process may be informative. Similarly, a 95% CrI excluding zero and protruding from the interval of the prior distribution of IMDoM is a strong indication of informative missingness; otherwise, the data provide little information to conclude for or against the MAR assumption. For the remaining structural assumptions of IMDoM, the posterior distribution for the most 'problematic' studies (i.e. studies with a considerable amount of MCOD) is shown in Figure 5 for the network of Stowe et al. 12 The vertical lines in each plot refer to the prior mean (middle grey line) and 95% prior interval (both sides dashed lines) under the MAR assumption on average.
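The reading rule described above can be written down as a small helper that compares a posterior 95% CrI with the 95% interval of a normal prior centred at zero (MAR on average). The three labels, the function names and the example intervals are hypothetical; in particular, the intermediate label for an interval that excludes zero but stays within the prior interval is our paraphrase of how such cases are discussed, not a formal criterion.

```python
from statistics import NormalDist


def prior_interval(mean=0.0, sd=1.0, level=0.95):
    """Central interval of a normal prior with the given mean and sd."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sd, mean + z * sd


def missingness_signal(cri, prior):
    """Classify a posterior credible interval against the prior interval."""
    lower, upper = cri
    prior_lower, prior_upper = prior
    excludes_zero = lower > 0 or upper < 0
    protrudes = lower < prior_lower or upper > prior_upper
    if excludes_zero and protrudes:
        return "strong indication of informative missingness"
    if excludes_zero:
        return "some indication of informative missingness"
    return "little information for or against MAR"


prior = prior_interval(mean=0.0, sd=1.0)  # primary analysis: prior variance of 1

# Hypothetical posterior 95% CrIs of an IMDoM
for cri in [(-0.9, 1.1), (0.2, 1.8), (0.5, 2.6)]:
    print(cri, "->", missingness_signal(cri, prior))
```

Because the prior interval widens with the prior variance, the same posterior interval can protrude under the primary analysis and not under the sensitivity analysis, so the comparison should always use the prior that was actually fitted.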

Network with considerable MCOD (antiparkinsonian drugs)
The results under the common-within-network assumption suggest likely informative missingness as point estimates are positive (hierarchical: 0.99, identical: 1.02) and their corresponding CrIs do not include zero. However, the 95% CrIs do not protrude beyond the upper bound of the prior interval (Table 4). The same conclusions are drawn when we assume a larger prior variance of IMDoM, though the posterior mean is larger (hierarchical: 1.17, identical: 1.20) and the 95% CrIs are wider when compared to the primary analysis, as expected (Table S3). However, the common-within-network assumption does not reveal the source(s) of such informative missingness. On the contrary, the intervention-specific assumption indicates that for all interventions, except COMTI+LD, the data provide little information to conclude in favour of or against the MAR assumption as the 95% CrIs of the corresponding posterior distributions include zero (Table 4). Only the posterior mean and 95% CrI for COMTI+LD are far from zero, suggesting informative missingness. The conclusions do not change when we assume a larger prior variance of IMDoM (Table S3). This is not surprising because COMTI+LD has attracted the highest %MCOD in the network (Table 1). COMTI+LD may also be responsible for pulling the posterior distribution of IMDoM away from zero under the common-within-network assumption. Conclusions are the same in both structures, though the identical structure provides more precise results than the hierarchical structure.
The trial-specific assumption further indicates that in trials with total %MCOD above 20% and severe imbalance in the compared arms (i.e. the trials that compare COMTI+LD with placebo+LD; Table 1), missing participants may have a smaller patient off-time reduction on average than completers, though the indication is weak (Figure 5). For these trials, the assumption of within-trial correlated IMDoMs provides further insights on the missingness process as it reveals that this behaviour is more pronounced in the active arm (i.e. COMTI+LD) than in the placebo arm (Figure 5). However, the assumption of within-trial uncorrelated IMDoMs indicates that only missing participants receiving COMTI+LD in these trials appear to have a smaller patient off-time reduction on average than completers (Figure 5). It is evident that by using different assumptions about the missingness parameter, we gain a different level of knowledge regarding the missingness mechanisms in the network.

Network with moderate MCOD (different training modalities)
The data provide little information to conclude for or against the MAR assumption using the common-within-network assumption given the wide CrIs of both hierarchical and identical structures that include zero (Table 4). This finding is shared in the results under the intervention-specific and trial-specific assumptions where for all treatments and trials, the intervals of the posterior distribution of IMDoM are considerably wide and include zero (last three lines of Table 4; Figure 6). The conclusions are similar for both assumptions under the independent structure (Figure 6). Contrary to the network of antiparkinsonian drugs, the trials of this network suffer mildly from MCOD, which are balanced in the compared arms. Therefore, there is not enough information to conclude for or against the MAR assumption using different assumptions for the missingness parameter.

Discussion
We have proposed a one-stage pattern-mixture model approach under the Bayesian framework that accounts for MCOD from all trials in a single step while allowing the observed data to contribute to the estimation of the missingness parameters to learn about the missingness mechanism. The hierarchical structure of the proposed models facilitates the incorporation of various prior structures and assumptions about the missingness parameter to investigate the implications of MCOD on the conclusions. These features make the proposed model approach particularly attractive to handle aggregate MCOD properly. On the contrary, the two-stage approach does not offer enough flexibility in the analysis of MCOD as it requires strong assumptions about the estimated within-trial variance (considered known) and the missingness parameter (considered independent of the amount of MOD). 9

Due to variation in the amount of MCOD within and across the trials, especially in the network of Stowe et al. 12 (Table 1), we consider the independent structure for the missingness parameters to be the most plausible, and the common-within-network assumption to be the least plausible in our motivating examples. The intervention-specific assumption is particularly relevant if one is interested to learn about the missingness mechanism in each or specific intervention(s). Common-within-network is a strong and perhaps the least realistic assumption. In a typical network with different trial-designs (e.g. placebo-controlled and active-controlled) and different interventions (e.g. placebo, active and old interventions), the amount of and reasons for MOD may vary across all arms and trials, making the common-within-network assumption implausible. The trial-specific assumption is plausible when trials with different characteristics (in terms of design and conduct) are associated with different mechanisms of MOD. [27][28][29] For further discussion on the situations to consider the different prior structures and assumptions for the missingness parameter (Table 3), the interested readers can refer to the literature. 11,14,29

Figure 6. Interval plots of the posterior distribution of IMDoM using MD in the network of Schwingshackl et al. 13 The one-stage pattern-mixture model under the hierarchical and identical structure assuming trial-specific IMDoMs and under the independent structure assuming within-trial correlated and uncorrelated IMDoMs. The vertical lines indicate the prior distribution for IMDoM. CrI: credible interval; IMDoM: informative missingness difference of means.

There is already a simulation study on the comparison of different one-stage pattern-mixture models for aggregate binary MOD. 29 This study considered the identical and hierarchical structures for the common-within-network, intervention-specific and trial-specific assumptions for the missingness parameter. In the present study on continuous outcomes, we observed the same behaviour of the proposed methods as in the simulation study. For instance, the intervention-specific prior structure led to larger posterior standard deviation of the treatment effects (similarly for the hierarchical and identical assumption) as compared to the other structures, especially for a large amount of MCOD in the network (Figure 3). We expect the one-stage pattern-mixture model for continuous MOD to behave similarly to the one-stage pattern-mixture model for binary MOD for the different structural assumptions of the missingness parameter.
Furthermore, modelling MCOD appeared to explain part of the between-trial variance in both networks, particularly under the intervention-specific assumption, which yielded the smallest τ². The benefits of modelling rather than excluding (or imputing) MOD in the estimation of τ² have already been noted. 9,14,30 Modelling MOD offers a trade-off between inflation of the within-trial variance and reduction in τ², which can be considerable when the amount of MOD is substantial. The opposite holds when MOD are excluded or imputed.
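The within-trial side of this trade-off can be illustrated with a short Python sketch for a single arm under the IMDoM. It is a simplified approximation rather than the one-stage model itself: the function name and the example numbers are hypothetical, the proportion of missing participants is treated as fixed, and the IMDoM is assumed to be normally distributed and independent of the observed mean. Under these assumptions, the pattern-mixture mean is the observed mean plus the IMDoM weighted by the proportion missing, and its variance is inflated by the uncertainty about the IMDoM.

```python
def pattern_mixture_mean(y_obs, se_obs, n_missing, n_randomised,
                         imdom_mean=0.0, imdom_sd=1.0):
    """Approximate mean and variance of an arm's outcome across completers and missing.

    y_obs, se_obs: mean and standard error among completers.
    n_missing, n_randomised: number of missing and randomised participants.
    imdom_mean, imdom_sd: assumed normal prior for the IMDoM, that is, the
        difference in mean outcome between missing participants and completers.
    """
    q = n_missing / n_randomised                  # proportion missing, treated as fixed
    mean = y_obs + imdom_mean * q                 # (1 - q) * y_obs + q * (y_obs + IMDoM)
    variance = se_obs ** 2 + (q ** 2) * imdom_sd ** 2
    return mean, variance


# Hypothetical arm: 20 of 100 participants missing, MAR on average (IMDoM mean 0).
mean_mar, var_mar = pattern_mixture_mean(10.0, 1.0, 20, 100, 0.0, 1.0)
# Same arm with no missing participants: no inflation of the variance.
mean_full, var_full = pattern_mixture_mean(10.0, 1.0, 0, 100, 0.0, 1.0)
print(mean_mar, var_mar, mean_full, var_full)
```

With a prior mean of zero, the point estimate is unchanged, but the variance grows with the square of the proportion missing, so arms with more MOD are down-weighted. The corresponding reduction in the between-trial variance arises only in the full hierarchical model and is not reproduced by this sketch.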
We illustrated the proposed one-stage pattern-mixture model approach using three different effect measures for the continuous outcome. In line with Mavridis et al., 9 we applied MD and SMD in conjunction with the IMDoM, and log RoM together with the log IMRoM, because they are intuitively related. To select among these three effect measures for a specific outcome, the researcher should consider the trade-off among ease of interpretation, statistical properties (e.g. low variability of the treatment effect across the trials 31 ) and goodness of fit. According to the posterior mean of the residual deviance, none of the models using the log RoM fit the data adequately for the network of Stowe et al. 12 (Table S5 in the Supporting Information). In the network of Cipriani et al., 32 none of the models fit the data adequately for MD and SMD (Table S5 in the Supporting Information). For a discussion of the statistical properties and performance of these effect measures in the synthesis of trials, readers should refer to the relevant literature. 33–35
The proposed one-stage pattern-mixture model approach can be particularly useful in living systematic reviews, 36 where learning about the missingness process via the estimated missingness parameters can inform the design of a future randomised trial. For instance, supposing that Stowe et al. 12 were a living systematic review, we have learned that trials comparing COMTI+LD with placebo+LD have the most participant losses (Table 1) and that missing participants randomised to COMTI+LD tend to have a smaller reduction in patient off-time than completers in that arm (according to the intervention-specific assumption and the independent structure). Moreover, scrutiny of the COMTI+LD versus placebo+LD trials in the network using the Cochrane Risk of Bias tool will shed further light on why participant losses are almost negligible in some trials (<4%) but considerable in others (<30%).
All this information will allow the trialists of a future COMTI+LD versus placebo+LD trial to plan proactively to increase the retention of participants in the COMTI+LD arm. 37
We were able to extract MOD in each arm of every trial in only five networks (out of the 92 networks with a continuous primary outcome). Therefore, this dataset would not offer sufficient evidence for an empirical comparison of the one-stage with the two-stage pattern-mixture model. It is particularly challenging to collect a sufficient number of networks (or meta-analyses) to conduct an empirical study concerning MOD, as the ability to extract MOD and the quality of the extraction depend strongly on the reporting quality of the systematic reviews under investigation. 38 However, a recent simulation study investigated the performance of modelling the exact distribution (one-stage approach) versus the approximately normal distribution (two-stage approach) of aggregate binary data in the presence of MOD in a triangle of two-arm trials (submitted). Both approaches showed substantial bias overall in the relative treatment effects, especially for comparisons with the non-reference interventions of the network, when the amount of MOD was considerable. Nevertheless, the one-stage approach is both conceptually and statistically more appropriate than the two-stage approach when the approximate normality assumption cannot be defended; for instance, when the outcome is skewed and/or the trials are small. 39 Since the proposed one-stage pattern-mixture model approach is based on the normal distribution (as the exact likelihood), we would expect some bias in the treatment effects when the synthesis dataset is dominated by small trials with a skewed outcome.
In these situations, we recommend that, until a more suitable one-stage model is developed, researchers apply our proposed one-stage pattern-mixture model approach to handle MCOD and fully acknowledge the limitations of this approach when discussing the results.

Conclusions
Similar to binary MOD, a proper analysis of MCOD should already be discussed thoroughly in the protocol phase of the systematic review. The analysis plan should describe the one-stage model with respect to the proper assumptions and prior structure of the missingness parameter. For this purpose, prior knowledge (or expectations) of the missingness process that aligns with the interventions and the design of the trials on the condition of interest is necessary. For instance, inpatients randomised to placebo are more likely than participants randomised to antipsychotics to leave the trial early because they do not experience immediate improvement in their schizophrenia symptoms. 27 In the absence of such knowledge, we recommend that researchers consider the intervention-specific assumption alongside the hierarchical structure under the MAR assumption (with liberal uncertainty about that belief) in the primary analysis and, as a sensitivity analysis, investigate the robustness of the results under the independent structure while increasing the variance of the prior distribution of the missingness parameter. In a network with few comparisons, each informed by a handful of trials, the identical structure may be preferable to the independent structure because it estimates comparatively fewer missingness parameters. Then, the hierarchical structure alongside the intervention-specific assumption may be considered as a sensitivity analysis. The proposed models can also be applied straightforwardly in a pairwise meta-analysis to benefit from the variety of structural assumptions about the missingness parameter.
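As an illustration of how such an analysis plan might be written down at the protocol stage, the following Python sketch lists a primary analysis and a set of sensitivity analyses as plain data. Everything in it is hypothetical: the class and function names, the primary prior variance of 1 and the inflation factors of 2 and 4 are placeholders chosen for the example, not recommended values. A prior mean of zero for the IMDoM (or the log IMRoM) corresponds to MAR on average.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MissingnessScenario:
    label: str
    structure: str                # 'identical', 'hierarchical' or 'independent'
    assumption: Optional[str]     # None when each arm has its own parameter
    prior_mean: float             # 0 corresponds to MAR on average
    prior_variance: float


def build_analysis_plan(primary_variance=1.0, inflation_factors=(2.0, 4.0)):
    """Primary analysis plus sensitivity analyses with inflated prior variance.

    The default values are hypothetical placeholders.
    """
    plan = [MissingnessScenario("primary", "hierarchical",
                                "intervention-specific", 0.0, primary_variance)]
    for factor in inflation_factors:
        plan.append(MissingnessScenario(
            f"sensitivity (prior variance x{factor:g})",
            "independent", None, 0.0, primary_variance * factor))
    return plan


plan = build_analysis_plan()
for scenario in plan:
    print(scenario)
```

Keeping the scenarios in one pre-specified list makes it explicit which analysis is primary and which results are reported only to assess robustness.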