Multilevel and Quasi Monte Carlo Methods for the Calculation of the Expected Value of Partial Perfect Information

The expected value of partial perfect information (EVPPI) provides an upper bound on the value of collecting further evidence on a set of inputs to a cost-effectiveness decision model. Standard Monte Carlo estimation of EVPPI is computationally expensive as it requires nested simulation. Alternatives based on regression approximations to the model have been developed but are not practicable when the number of uncertain parameters of interest is large and when parameter estimates are highly correlated. The error associated with the regression approximation is difficult to determine, while MC allows the bias and precision to be controlled. In this article, we explore the potential of quasi Monte Carlo (QMC) and multilevel Monte Carlo (MLMC) estimation to reduce the computational cost of estimating EVPPI by reducing the variance compared with MC while preserving accuracy. We also develop methods to apply QMC and MLMC to EVPPI, addressing particular challenges that arise where Markov chain Monte Carlo (MCMC) has been used to estimate input parameter distributions. We illustrate the methods using 2 examples: a simplified decision tree model for treatments for depression and a complex Markov model for treatments to prevent stroke in atrial fibrillation, both of which use MCMC inputs. We compare the performance of QMC and MLMC with MC and the approximation techniques of generalized additive model (GAM) regression, Gaussian process (GP) regression, and integrated nested Laplace approximations (INLA-GP). We found QMC and MLMC to offer substantial computational savings when parameter sets are large and correlated and when the EVPPI is large. We also found that GP and INLA-GP were biased in those situations, whereas GAM cannot estimate EVPPI for large parameter sets.

1. Run a small pilot sample, for example of size N_ℓ = 1000 for each level, and estimate the expectation and variance at the first three levels ℓ = 0, 1, 2 (i.e. ê_0, d̂_1, d̂_2). At ℓ = 1, for example, the expectation would be d̂_1(N_1) = (1/N_1) Σ_{n=1}^{N_1} d_1^{(n)}.
2. Use regression on the three points to estimate C_1 (and α if not equal to 1) so that the expected bias reduction at level ℓ is approximately C_1 2^(−αℓ).
3. Use regression on the three points to estimate C_2 (and β if not equal to 1) so that the variance at level ℓ is approximately C_2 2^(−βℓ). The variance of the MLMC estimator is therefore Σ_{ℓ=0}^{L} C_2 2^(−βℓ)/N_ℓ.
4. Choose the maximum level L large enough to control the bias of the final EVPPI estimator, i.e. C_1 2^(−αL) ≤ ε/2.
5. Choose N_ℓ for each level to minimise the total computational cost (Σ_{ℓ=0}^{L} N_ℓ 2^ℓ) while achieving RMSE ε, which requires knowledge of the constants C_2 and β. This minimisation can be done by constrained optimisation or by trial and error.
The details of these calculations are given in [20,21] and illustrated in the supplementary code in section 7.5 of this paper.
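The planning steps above can be sketched as follows. This is a hypothetical Python illustration (the published implementation is in R), assuming α = β = 1 and the per-level cost 2^ℓ; it uses a conservative point fit in place of the regression in steps 2-3 and the usual Lagrange-multiplier solution in step 5.

```python
import math

def mlmc_plan(pilot_means, pilot_vars, eps, alpha=1.0, beta=1.0):
    """Plan an MLMC run from pilot estimates at levels 0, 1, 2.

    pilot_means[l], pilot_vars[l]: pilot mean and variance of the
    level-l difference estimator (level 0 is e_0 itself).
    Returns (L, [N_0, ..., N_L]) targeting RMSE eps.
    """
    # Steps 2-3: fit |E[d_l]| ~ C1 * 2^(-alpha*l) and
    # V[d_l] ~ C2 * 2^(-beta*l) from levels 1 and 2 (conservative max
    # here; the real code uses a log-scale regression).
    C1 = max(abs(pilot_means[l]) / 2 ** (-alpha * l) for l in (1, 2))
    C2 = max(pilot_vars[l] / 2 ** (-beta * l) for l in (1, 2))
    # Step 4: smallest L with bias C1 * 2^(-alpha*L) <= eps / 2.
    L = 2
    while C1 * 2 ** (-alpha * L) > eps / 2:
        L += 1
    # Step 5: minimise total cost sum_l N_l * 2^l subject to the
    # variance constraint sum_l V_l / N_l <= eps^2 / 2.
    V = [pilot_vars[0]] + [C2 * 2 ** (-beta * l) for l in range(1, L + 1)]
    cost = [2 ** l for l in range(L + 1)]
    K = sum(math.sqrt(v * c) for v, c in zip(V, cost))
    N = [max(1, math.ceil(2 / eps ** 2 * math.sqrt(v / c) * K))
         for v, c in zip(V, cost)]
    return L, N
```

The Lagrange-multiplier step sets N_ℓ proportional to √(V_ℓ/c_ℓ), the standard MLMC sample-allocation rule.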
Two additional refinements of the MLMC method are explained in Appendices 7.1.2 and 7.1.3. The first is a method of arranging the random samples of X and Y to ensure a higher correlation between the successive level estimators ê_{ℓ−1}(N_ℓ) and ê_ℓ(N_ℓ), for which it has been shown that β = 1.5 [21,41]. The second is an alternative method of constructing the series for the MLMC estimator, which is preferable when the EVPPI is large compared with the EVPI.

Further details of the antithetic variable approach to the level ℓ estimator
In this appendix we discuss how to give e_{ℓ−1}^{(n)} and e_ℓ^{(n)} a higher correlation by utilising all of the inner samples. For e_ℓ^{(n)} we generate 2^ℓ samples Y^{(n,m)} conditional on X^{(n)}, average the net benefits over all 2^ℓ inner samples in the first term, and use all 2^ℓ samples in the second term. For e_{ℓ−1}^{(n)} we use the same X^{(n)} but separate the 2^ℓ samples Y^{(n,m)} into two sets of 2^{ℓ−1} samples each, construct an estimate from each set and average the two for the first term, while again using all 2^ℓ samples for the second term. The second terms of e_{ℓ−1}^{(n)} and e_ℓ^{(n)} therefore cancel, and the difference estimator becomes

d_ℓ^{(n)} = max_d (1/2^ℓ) Σ_{m=1}^{2^ℓ} NB_d(X^{(n)}, Y^{(n,m)}) − (1/2) [ max_d (1/2^{ℓ−1}) Σ_{m=1}^{2^{ℓ−1}} NB_d(X^{(n)}, Y^{(n,m)}) + max_d (1/2^{ℓ−1}) Σ_{m=2^{ℓ−1}+1}^{2^ℓ} NB_d(X^{(n)}, Y^{(n,m)}) ].

If all three maximisations select the same optimal decision d*, then d_ℓ^{(n)} = 0. Theorem 3 in [23] shows that the variance of this new difference estimator decays at the faster rate β = 1.5.
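A minimal Python sketch of this antithetic construction (the paper's code is in R; `nb` and `sample_y` are hypothetical stand-ins for the net benefit function and the conditional sampler):

```python
import random

def antithetic_level_difference(nb, x, sample_y, level):
    """Antithetic MLMC difference d_l for one outer sample x (level >= 1).

    nb(x, y) -> list of net benefits, one per decision option.
    sample_y(x) -> one inner sample of Y conditional on X = x.
    The fine estimator averages all 2**level inner samples before
    maximising; the coarse estimator averages the two half-sample
    maxima. The cancelling second terms are omitted.
    """
    m = 2 ** level
    nbs = [nb(x, sample_y(x)) for _ in range(m)]
    n_dec = len(nbs[0])

    def max_of_means(rows):
        return max(sum(r[d] for r in rows) / len(rows) for d in range(n_dec))

    fine = max_of_means(nbs)                       # max_d of the full average
    coarse = 0.5 * (max_of_means(nbs[: m // 2]) +  # average of the two
                    max_of_means(nbs[m // 2:]))    # half-sample maxima
    return fine - coarse
```

When every maximisation picks the same decision, the difference is exactly zero, which is what drives the fast variance decay.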

MLMC for DIFF
As illustrated in Figures 4(a) and 4(b), the EVPPI can be large or small compared with the EVPI, respectively. If the EVPI is known, and the EVPPI is known to be close in value to the EVPI, it will require fewer samples to estimate DIFF = EVPI − EVPPI, since its smaller magnitude usually leads to a smaller variance. This appendix describes how to construct an estimator for DIFF.
Writing

DIFF = EVPI − EVPPI = E_X[ E_{Y|X}[max_d NB_d(X, Y)] − max_d E_{Y|X}[NB_d(X, Y)] ],

the standard nested MC estimator averages, over the outer samples X^{(n)}, the difference between the mean of the inner-sample maxima and the maximum of the inner-sample means. For MLMC, we find that DIFF_0 = 0, and the corresponding MLMC estimator of DIFF is Σ_{ℓ=1}^{L} d̂_ℓ(N_ℓ), where d_ℓ^{(n)} is the same as (7.1), since the first terms of DIFF_ℓ^{(n)} and DIFF_{ℓ−1}^{(n)} cancel in a similar way to the previous subsection. Compared with (2.7), we no longer need ê_0(N_0). From a practical point of view, it is always easier to estimate the smaller quantity, since its variance is lower; for large EVPPI, as shown in Figure 4(a), it is therefore preferable to estimate DIFF rather than estimating the EVPPI directly.

CBT and antidepressants are cheaper than no treatment, but antidepressants are the cheapest. CBT and antidepressants confer greater benefits (QALYs), but antidepressants are the most effective. Incremental net benefit is highest with antidepressants at a willingness-to-pay threshold (λ) of £20,000. The CEAC indicates that antidepressants have a high probability (> 75%) of being most cost-effective at all realistic willingness-to-pay thresholds.
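The nested MC estimator of DIFF can be sketched as follows (a hypothetical Python illustration; `nb`, `sample_x`, and `sample_y` stand in for the model's net benefit function and samplers):

```python
import random

def diff_nested_mc(nb, sample_x, sample_y, n_outer, n_inner):
    """Standard nested MC estimate of DIFF = EVPI - EVPPI.

    nb(x, y) -> list of net benefits, one per decision option.
    Each outer sample X contributes
      mean_m max_d NB_d(X, Y_m) - max_d mean_m NB_d(X, Y_m),
    because the max_d E[NB_d] terms of EVPI and EVPPI cancel.
    """
    total = 0.0
    for _ in range(n_outer):
        x = sample_x()
        nbs = [nb(x, sample_y(x)) for _ in range(n_inner)]
        n_dec = len(nbs[0])
        mean_of_max = sum(max(row) for row in nbs) / n_inner
        max_of_mean = max(sum(row[d] for row in nbs) / n_inner
                          for d in range(n_dec))
        total += mean_of_max - max_of_mean
    return total / n_outer
```

Note that each outer contribution is non-negative (the mean of maxima always dominates the maximum of means), consistent with DIFF ≥ 0.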

Details of the 20-treatment-option depression model
In order to explore the complexity that may be encountered in a real cost-effectiveness model, we extended our example from 3 to 20 treatment options. The model structure, outcome costs, and outcome QALYs for each treatment remained the same, but treatment effects and costs were randomly generated. Specifically, log odds of relapse and recovery were sampled from a Normal distribution with mean 0 and standard deviation 1, and drug costs were either £500, £1000, £2000, or £3000, with simple rules ensuring that treatments with high recovery and low relapse were the most expensive, and vice versa (Table 6). The NMA was also extended to 50 trials of up to 5 arms comparing a random selection of the 20 treatments; baseline log odds of response and recovery were sampled from a Normal distribution with mean 0 and standard deviation 0.25, while the numbers recovering or relapsing were sampled from binomial distributions with the appropriate probabilities (Figures 6(a) and 6(b) present network plots for this constructed data).
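The random generation described above can be sketched as follows. This is a hypothetical Python illustration: the paper's "simple rules" linking effectiveness to cost are not fully specified, so the ranking rule below is our own stand-in.

```python
import random

random.seed(1)
n_treat = 20
# Log odds ratios of recovery and relapse for each treatment, N(0, 1).
log_or_recovery = [random.gauss(0, 1) for _ in range(n_treat)]
log_or_relapse = [random.gauss(0, 1) for _ in range(n_treat)]
# Drug costs from four bands; as a simple illustrative rule, rank
# treatments by (recovery - relapse) so that the most effective
# treatments are also the most expensive.
bands = [500, 1000, 2000, 3000]
ranked = sorted(range(n_treat),
                key=lambda t: log_or_recovery[t] - log_or_relapse[t])
cost = [0] * n_treat
for rank, t in enumerate(ranked):
    cost[t] = bands[rank * len(bands) // n_treat]
```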
The cost-effectiveness results are presented in Table 7, with CEACs presented in Figure 7. Treatment 10, which had a high log odds ratio of both recovery (1.85) and relapse (0.81) and a high cost (£2000), had the highest expected incremental net benefit (£14590.49, 95% credible interval (−8706.228, 46266.03)) and the greatest probability (> 60%) of being most cost-effective at all willingness-to-pay thresholds.

Figure 5: Cost-effectiveness acceptability curve. Each line represents the probability that the corresponding treatment has the highest net benefit at each willingness-to-pay threshold.

Figure 6: Network plots for the constructed data. Each node represents a treatment being compared. An edge represents a trial comparing the treatments represented by the two connected nodes. If a path can be found between two nodes, the corresponding treatments can be compared by network meta-analysis.

Figure 7: Cost-effectiveness acceptability curves (for any treatment with a probability of being most cost-effective greater than 5% at some threshold) for the 20-treatment depression model.

EVPPI results of the 20 treatment depression model
For comparison, we set the target MSE to 0.25 (i.e. RMSE ε = 0.5).

Quantile-quantile diagnostic plots for BCEA INLA-GP EVPPI estimation
The qq-plots below were produced using the BCEA function diag.evppi.

In the previous decision tree example, the NB function was simple and quick to evaluate. In this section we present an example where the net benefit function is more complex, so that generating a sample of NBs is more computationally expensive; we might therefore expect standard nested MC methods to require unrealistically large sample sizes for a given accuracy. The MCMC sampler also required more simulations to characterise the posterior distribution, since the statistical model was of higher dimension and the sampled chains were more strongly autocorrelated. This example is a model recently used to compare the cost-effectiveness of DOACs for prevention of stroke in atrial fibrillation, the full details of which have been published [18,17]. In brief, this was a Markov model with 17 health states, including death, a 3-month cycle length, a lifetime horizon, and the UK National Health Service perspective. It compared first-line treatment with coumarin (target international normalised ratio 2-3) against four DOACs: apixaban 5mg twice daily (BD), dabigatran 150mg BD, rivaroxaban 20mg once daily (OD), and edoxaban 60mg OD, in 70-year-old adults. Treatment switching from DOAC to coumarin or no treatment, and from coumarin to no treatment, was modelled. The model included four events with long-term impacts on costs, QALYs, and event risks: ischemic stroke, myocardial infarction (MI), major extracranial bleed, and intracranial haemorrhage (ICH). Health states were defined by which of these events a patient had experienced, giving the model structure illustrated in Figure 15. In addition, transient ischemic attack (TIA) and systemic embolism (SE) were modelled as transient events with no long-term consequences. Relative effects were informed by a systematic literature review of RCTs comparing DOACs and coumarin.
Log hazard ratios relative to coumarin for the seven possible events were estimated using a Bayesian competing risks NMA conducted in OpenBUGS. Log hazard ratios for no treatment compared to coumarin came from an NMA of previously identified RCTs [42]. Log hazards of events on coumarin were estimated using a Bayesian natural history model fit to the coumarin arms of the RCTs [43], thus providing absolute probabilities of events on all possible treatments. MCMC samples from the posterior distributions of these three Bayesian analyses were input directly to the cost-effectiveness model. Costs, utilities, and the impact of event history on current risks were estimated using closed-form distributions fit to data identified in targeted literature reviews. Full details of all parameters and model assumptions are provided in previous publications [18,17]. In addition to the EVPPI for subsets corresponding to categories of the model parameters (e.g. those generated by MCMC, or treatment switching probabilities), we would like to investigate the potential value of conducting a trial comparing apixaban and dabigatran (the two best DOACs) with a coumarin control arm. We consider two such designs. The first is a simple trial that records only the log hazard ratios of apixaban and dabigatran compared to coumarin. The second is a complex trial that records these hazard ratios but also gathers evidence on the absolute event probabilities on coumarin, all event and management costs, all event disutilities, all state utilities, and the treatment switching probabilities.

MLMC and QMC results for real, complex health economic model in atrial fibrillation
In summary, the model has 32 parameters whose uncertainty is represented by normal random variables, 8 by uniform random variables, 15 by beta-distributed random variables, and 42 by 3 sets of MCMC samples (7-dimensional MCMC samples for baseline log hazard ratios, 28-dimensional MCMC samples for log hazard ratios relative to the baseline, and 7-dimensional MCMC samples for log hazard ratios of no treatment). Table 9 shows the EVPPI estimates computed using the standard MC method, and the relative computational cost required by MLMC and QMC to produce estimates with the same degree of accuracy. MLMC is much more efficient than standard MC when the EVPPI is large. In cases where small-sample exploration indicated a small or zero EVPPI, we used only standard MC. QMC also outperforms standard MC for all the EVPPIs.
Comparing MLMC and QMC, we observe that MLMC performs better in the cases where the parameters are correlated (Simple & Complex Trials), while QMC performs relatively better in the other cases.

Comparison with regression approximation methods on real, complex health economic model in atrial fibrillation
As in Section 4.2, we compared our estimates with those provided by GP and INLA-GP, as implemented in SAVI and BCEA, respectively. We conducted comparisons on the large parameter sets where MLMC was necessary. In this case GAM could not be applied, as there are too many parameters to identify a regression model without making excessively strong assumptions (such as additivity and linearity) about the relationship between the input parameters and the conditional expected net benefit. Again, to compare the relative accuracy of the different methods given the same computational cost, we restricted MC, QMC, and MLMC to 50,000 samples and GP and INLA-GP to 7500. Results are presented in Table 10.
This application was more challenging for all methods. INLA-GP could not produce an estimate for the complex trial. The GP results were severely biased upwards compared to MC, probably due to a poor high-dimensional regression approximation. In addition, they lack face validity: the parameters included in the Loghr MCMC case are a subset of those in the All MCMC case, so the Loghr MCMC case should not have a higher EVPPI. The INLA-GP estimates also disagreed with the reference results, with the possible exception of the MCMC set. However, a poor fit of the underlying regression model was only identified by diagnostic plots in the All MCMC case. The MLMC and QMC methods using only 50,000 samples gave estimates close to the MC reference value, and had RMSEs lower than standard MC.
Testing QMC again for estimating the total EVPI, we found the EVPI and its standard deviation were 416.79 and 5.38 using standard MC, and 418.12 and 2.41 using QMC. As in the depression example, this represents reduced bias and variance.

Reference to GitHub repository with R code
The code discussed in Section 7.5.1 is provided in a supplementary zip file and is also available at https://github.com/Bogdasayen/mlmc_evppi_demo. The depression toy model, AF model, and MLMC/QMC algorithm code are provided in the GitHub repository at https://github.com/Bogdasayen/Example_MLMC_and_QMC_for_EVPPI. For these models, we briefly point to the relevant files below but refer to the repository for complete details.

MLMC demo
We provide the source code mlmc code demo 4.rmd, along with mlmc.R and mlmc.test.R, in a supplemental file. These are also available at https://github.com/Bogdasayen/mlmc_evppi_demo. As noted in the acknowledgements, the mlmc.R and mlmc.test.R code was developed by Mike Giles, Louis Aslett, and Tigran Nagapetyan [39]. This code has been used previously for published MLMC applications [40].

MLMC code for depression toy and AF model
The R implementation of the MLMC algorithm is available in the MLMC package. The main task for a user is to construct the EVPPI level estimator (7.1) or the DIFF level estimator (??). The depression toy model described in Section 3.1 is provided in the repository at Bogdasayen/Example_MLMC_and_QMC_for_EVPPI/ToyModel-MLMC. The "ReadMeFirst.pdf" file includes a full description of the code.
In addition, the DOAC MLMC code is provided in the repository at Bogdasayen/Example_MLMC_and_QMC_for_EVPPI/DOAC-MLMC. The model description and a summary of the numerical results are in NOACS_math.pdf. To run the MLMC code, please:
1. Run NOAC.AF.model.main.3.R to load the necessary files and data.
2. Run MLMC_Core.R to do a simple test locally.
3. Run Parellel_Core.R to run the code using parallel computing.

QMC code for depression toy model
To implement the QMC estimate, we start by generating Sobol sequences of uniformly distributed random variables U. Then, for a random variable X with cumulative distribution function (CDF) Φ, we obtain X by X = Φ^(−1)(U). To calculate EVPPIs, we only need the inverse CDFs of the normal distribution (qnorm.R) and the beta distribution, which is implemented in sfunc.R.
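The inversion step can be illustrated as follows (a hypothetical Python sketch, not the paper's R code): the base-2 van der Corput sequence is the one-dimensional Sobol sequence, and the standard library's inverse normal CDF plays the role of qnorm; scipy.stats.beta.ppf would play the same role for the beta-distributed inputs.

```python
from statistics import NormalDist

def van_der_corput(n, base=2):
    """First n points of the base-b van der Corput sequence
    (the one-dimensional Sobol sequence when base = 2)."""
    pts = []
    for i in range(1, n + 1):
        q, denom, x = i, 1.0, 0.0
        while q:
            q, r = divmod(q, base)  # peel off digits of i in the base
            denom *= base
            x += r / denom          # mirror them about the radix point
        pts.append(x)
    return pts

# Map the low-discrepancy uniforms through the inverse normal CDF.
u = van_der_corput(1024)
x = [NormalDist(0.0, 1.0).inv_cdf(p) for p in u]
```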
QMC for the depression toy model is provided in the repository at Bogdasayen/Example_MLMC_and_QMC_for_EVPPI/ToyModel_QMC/. The "ReadMeFirst.pdf" file includes a full description of the code.
In addition, the DOAC QMC code is in the repository at Bogdasayen/Example_MLMC_and_QMC_for_EVPPI/DOAC_QMC/. To run the QMC code, run NOAC_QMC_main.R; this is the only script the user needs.

As discussed in the introduction, cost-effectiveness decision models often rely on Bayesian evidence synthesis to inform parameters, so that the joint distribution for some parameters may be represented by samples from an MCMC simulation. Model inputs based on MCMC samples cause three problems for MLMC and QMC estimation. Firstly, if the parameters estimated by MCMC are highly correlated, then the number of MCMC samples needed for reliable inference may be large, due to inefficiencies in the MCMC sampler. We address this issue for both MLMC and QMC in Section 7.6.1. Secondly, because MCMC only provides samples and not a closed-form representation of the joint distribution, we cannot form the inverse distribution function needed for QMC. We also address this issue in Section 7.6.1. Thirdly, because MCMC does not provide a closed-form posterior distribution, we cannot sample from the conditional distributions needed for EVPPI estimation, whether performed by standard nested MC, MLMC, or QMC. We address this by using multivariate approximations to the posterior distribution, as described in Section 7.6.2.

Random selection of MCMC samples for MLMC and QMC
To reduce the computational expense of generating MCMC samples, we first generate a sufficiently large number N_c of MCMC samples in advance, store them in a data set, and randomly select samples from it. Our motivation is that we can then use uniformly distributed random variables for sampling, rather than needing further expensive MCMC runs. For MLMC, when constructing the level ℓ difference estimator d̂_ℓ(N_ℓ), we generate 2^ℓ uniform random samples U^{(n)} on [0, 1] and choose the corresponding MCMC samples with index ⌊N_c U^{(n)}⌋ (the largest integer smaller than or equal to N_c U^{(n)}). This ensures the same samples are used in the level ℓ and ℓ − 1 estimators.
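A hypothetical Python sketch of this selection step, with `stored` standing in for the pre-generated MCMC sample set:

```python
import math
import random

def coupled_mcmc_draws(stored, level, rng=random):
    """Select 2**level stored MCMC samples via uniforms (level >= 1).

    The same selected samples feed both the fine (level l) and coarse
    (level l-1) estimators, which is what keeps the two coupled.
    """
    n_c = len(stored)
    u = [rng.random() for _ in range(2 ** level)]
    idx = [math.floor(n_c * v) for v in u]   # floor(N_c * U), in 0..N_c-1
    fine = [stored[i] for i in idx]          # all 2**l draws for level l
    coarse_a = fine[: 2 ** (level - 1)]      # first half for one coarse term
    coarse_b = fine[2 ** (level - 1):]       # second half for the other
    return fine, coarse_a, coarse_b
```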
For QMC, the idea is to sort the stored MCMC samples first, and let the QMC points index positions in the sorted sequence. This has the added advantage that it removes the need for an inverse of the posterior distribution function. For high-dimensional problems, we use an efficient approach to sampling uniformly from a large empirical data set. Given a large enough MCMC sample, we select the samples used in the calculation from this MCMC sample pool, applying QMC to the random selection; for details of exactly how to apply QMC to the random selection, see [44]. The algorithm uses samples from the uniform distribution to repeatedly bisect the MCMC sample space until only a single sample remains (a recursive spectral bisection algorithm for graph partitioning [45]). The order of the elements of X on which to sort is determined by the first two dimensions from a principal components analysis (PCA). Using only the first two components was found to be sufficient for the applications in Section 3, and captured the majority of the impact of the variation in our parameters on the net benefit, and thus on the EVPPI. However, if many elements of X had high weights, indicating that the PCA cannot identify two dominant dimensions, this approach would not be suitable.
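As an illustration of the sorting idea (a hypothetical one-dimensional simplification, not the full recursive spectral bisection of [44,45]), the following Python sketch orders an empirical sample along its first principal component and indexes the sorted list with low-discrepancy uniforms:

```python
import numpy as np

def qmc_select_sorted(samples, u):
    """Index an empirical MCMC sample (rows of `samples`) with
    low-discrepancy uniforms u in [0, 1), after sorting the rows
    along their first principal component."""
    z = samples - samples.mean(axis=0)
    # Leading eigenvector of the sample covariance = first PC direction.
    vals, vecs = np.linalg.eigh(np.cov(z, rowvar=False))
    pc1 = vecs[:, -1]                  # eigenvector of the largest eigenvalue
    order = np.argsort(z @ pc1)        # sort rows by their PC1 score
    idx = np.minimum((np.asarray(u) * len(samples)).astype(int),
                     len(samples) - 1)
    return samples[order[idx]]
```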

Multivariate approximation of MCMC to obtain conditional distributions
For EVPPI estimation, we need to sample from the distribution of Y conditional on a specific value X^{(n)}. However, this is challenging when X and Y are correlated and their joint distribution is only known through an MCMC sample. In theory, we could simply fix the value of X^{(n)} and rerun the MCMC simulation to obtain Y^{(n,m)}; however, this is highly inefficient, since a separate burn-in is required for each individual sample of X^{(n)}.
To solve this problem, we compute the mean µ and covariance matrix Σ of the MCMC samples Z, and use the multivariate normal (MVN) distribution N(µ, Σ) to approximate the posterior distribution. The distribution of Y^{(n,m)} conditional on X^{(n)} is then another MVN [46]. The benefit is that the random variables we need to generate are Normally distributed, so we can apply our MLMC and QMC formulations.
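Under the MVN approximation, the conditional distribution has a standard closed form [46]; a hypothetical Python sketch using the usual partitioned-covariance formulas:

```python
import numpy as np

def mvn_conditional(mu, sigma, ix, x_val):
    """Mean and covariance of Y | X = x_val under N(mu, sigma).

    ix indexes the conditioned-on components X; the remaining
    components are Y. mu and sigma would be estimated from the
    stored MCMC samples."""
    ix = np.asarray(ix)
    iy = np.setdiff1d(np.arange(len(mu)), ix)
    s_yx = sigma[np.ix_(iy, ix)]
    s_xx_inv = np.linalg.inv(sigma[np.ix_(ix, ix)])
    # mu_Y + Sigma_YX Sigma_XX^{-1} (x - mu_X)
    mu_cond = mu[iy] + s_yx @ s_xx_inv @ (x_val - mu[ix])
    # Sigma_YY - Sigma_YX Sigma_XX^{-1} Sigma_XY
    sigma_cond = sigma[np.ix_(iy, iy)] - s_yx @ s_xx_inv @ s_yx.T
    return mu_cond, sigma_cond
```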
Note that the MVN approximation is only suitable for parameters whose joint posterior is a symmetric, unbounded distribution. Parameters such as hazard ratios or odds ratios need to be log-transformed to achieve this, as we do in the examples in the following sections. Heavy-tailed distributions may be better represented by another closed-form multivariate distribution, such as the multivariate t-distribution. However, our approach to MCMC inputs will only work if a transformation to some closed-form multivariate distribution (with closed-form conditional distributions) is possible.

MVN vs MVT
The conventional version of the p-dimensional multivariate t (MVT) distribution X ~ t_p(µ, Σ, ν), with mean µ, scale matrix Σ (generally not the covariance of X), and degrees of freedom ν (which determine the thickness of the tails), has probability density function

f(x) = Γ{(ν + p)/2} / [Γ(ν/2)(νπ)^{p/2} |Σ|^{1/2}] · {1 + ν^(−1)(x − µ)^T Σ^(−1)(x − µ)}^(−(ν+p)/2). (7.2)

The conditional distributions of the multivariate t distribution are also multivariate t. Partitioning X = (X_1, X_2) into blocks of dimension p_1 and p_2, with µ = (µ_1, µ_2) and Σ partitioned into blocks Σ_11, Σ_12, Σ_21, Σ_22 accordingly, the conditional distribution of X_2 given X_1 = x_1 is

X_2 | X_1 = x_1 ~ t_{p_2}( µ_2 + Σ_21 Σ_11^(−1)(x_1 − µ_1), ((ν + d_1)/(ν + p_1)) (Σ_22 − Σ_21 Σ_11^(−1) Σ_12), ν + p_1 ),

where d_1 = (x_1 − µ_1)^T Σ_11^(−1)(x_1 − µ_1). Compared with the multivariate normal distribution, the MVT can capture thick tails and may give a better approximation to the covariance structure of the MCMC samples. Comparing the two distributions for the EVPPI of the DOAC simple trial, using standard MC with 1024 outer samples and 128 inner samples, we find that the credible interval under the MVT is consistent with the results obtained using the multivariate normal distribution.
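The conditional parameters above can be computed directly; a hypothetical Python sketch, assuming the standard conditional MVT result:

```python
import numpy as np

def mvt_conditional(mu, sigma, nu, ix, x_val):
    """Parameters of X2 | X1 = x_val when (X1, X2) ~ t_p(mu, sigma, nu).

    The conditional is again multivariate t, with nu + p1 degrees of
    freedom, location mu_{2|1}, and scale ((nu + d1)/(nu + p1)) * Sigma_{22.1}.
    """
    ix = np.asarray(ix)
    iy = np.setdiff1d(np.arange(len(mu)), ix)
    s11_inv = np.linalg.inv(sigma[np.ix_(ix, ix)])
    s21 = sigma[np.ix_(iy, ix)]
    resid = x_val - mu[ix]
    d1 = resid @ s11_inv @ resid       # Mahalanobis distance of x_val
    p1 = len(ix)
    mu_cond = mu[iy] + s21 @ s11_inv @ resid
    scale_cond = ((nu + d1) / (nu + p1)) * (
        sigma[np.ix_(iy, iy)] - s21 @ s11_inv @ s21.T)
    return mu_cond, scale_cond, nu + p1
```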