Methods for Population-Adjusted Indirect Comparisons in Health Technology Appraisal

Standard methods for indirect comparisons and network meta-analysis are based on aggregate data, with the key assumption that there is no difference between the trials in the distribution of effect-modifying variables. Methods which relax this assumption are becoming increasingly common for submissions to reimbursement agencies, such as the National Institute for Health and Care Excellence (NICE). These methods use individual patient data from a subset of trials to form population-adjusted indirect comparisons between treatments, in a specific target population. Recently proposed population adjustment methods include the Matching-Adjusted Indirect Comparison (MAIC) and the Simulated Treatment Comparison (STC). Despite increasing popularity, MAIC and STC remain largely untested. Furthermore, there is a lack of clarity about exactly how and when they should be applied in practice, and even whether the results are relevant to the decision problem. There is therefore a real and present risk that the assumptions being made in one submission to a reimbursement agency are fundamentally different to—or even incompatible with—the assumptions being made in another for the same indication. We describe the assumptions required for population-adjusted indirect comparisons, and demonstrate how these may be used to generate comparisons in any given target population. We distinguish between anchored and unanchored comparisons according to whether a common comparator arm is used or not. Unanchored comparisons make much stronger assumptions, which are widely regarded as infeasible. We provide recommendations on how and when population adjustment methods should be used, and the supporting analyses that are required to provide statistically valid, clinically meaningful, transparent and consistent results for the purposes of health technology appraisal. 
Simulation studies are needed to examine the properties of population adjustment methods and their robustness to breakdown of assumptions.


Drop the yprob column, then tabulate the AB trial to check that our "randomisation" has worked, and examine the generated outcomes.

Drop the yprob column, then tabulate the AC trial to check that our "randomisation" has worked, and examine the generated outcomes.
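The tabulation step can be sketched as follows. This is a self-contained toy example: the data frame and column names (trt, y) are stand-ins, since the simulated trial data are not reproduced in this excerpt.

```r
# Toy stand-in for a simulated trial (assumed column names trt and y)
set.seed(3)
AB.toy <- data.frame(trt = rep(c("A", "B"), each = 250),
                     y   = rbinom(500, 1, 0.35))

# Cross-tabulate arm against outcome to check the "randomisation"
table(AB.toy$trt, AB.toy$y)
```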

MAIC
We are now ready to proceed with our analyses. First, we will estimate the population-adjusted indirect comparison using MAIC. This involves estimating a logistic propensity score model, which includes all effect modifiers but no prognostic variables. This is equivalent to the following model on the log of the individual weights:

$\log(w_{it}) = \alpha_0 + X^{EM}_{it} \alpha_1$.

The weights are estimated using the method of moments to match the effect modifier distributions between the AB and AC trials. This is equivalent to minimising

$\sum_{i,t} \exp(X^{EM}_{it} \alpha_1)$

when $\bar{X}^{EM}_{(AC)} = 0$, that is, when the effect modifiers have been centred at their means in the AC population. In order to do this, we define the objective function to minimise (as above), and the gradient function (its derivative), which will be used by the minimisation algorithm.
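As a concrete illustration, the objective and gradient just described might be coded as follows. This is a minimal sketch: the names objfn and gradfn match those used later in the text, but the exact definitions are assumed, with X holding the centred effect modifier columns.

```r
# Objective: sum of the unnormalised weights, sum_it exp(X_it %*% a1)
objfn <- function(a1, X) {
  sum(exp(X %*% a1))
}

# Gradient: derivative of objfn with respect to a1, i.e. t(X) %*% exp(X %*% a1)
gradfn <- function(a1, X) {
  colSums(X * as.vector(exp(X %*% a1)))
}
```

At the minimum the gradient is zero, which is exactly the method-of-moments balance condition on the centred effect modifiers.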

To satisfy $\bar{X}^{EM}_{(AC)} = 0$, we create centred versions of the effect modifiers by subtracting $\bar{X}^{EM}_{(AC)}$ from $X^{EM}$ in both trials. Here only age is an effect modifier, and we can balance this in both mean and standard deviation as we have observed age.mean and age.sd in the AC trial. We therefore include centred versions of age and age^2 for each individual in the AB trial in the weighting model. Centring the mean is simple, but centring higher moments requires some attention: due to aggregation, we cannot simply centre age^2 from the AB trial with age.sd from the AC trial. We use the variance formula $\mathrm{var}(X) = E(X^2) - E(X)^2$, and centre age^2 in the AB trial with age.mean^2 + age.sd^2 from the AC trial. Here we make use of the sweep function to simultaneously centre the two columns age and age^2 by subtracting age.mean and age.mean^2 + age.sd^2 respectively:

X.EM.0 <- sweep(with(AB.IPD, cbind(age, age^2)), 2,
                with(AC.AgD, c(age.mean, age.mean^2 + age.sd^2)), "-")

To estimate $\alpha_1$, we use the function optim to minimise the function objfn. The method we tell optim to use is BFGS (after Broyden, Fletcher, Goldfarb and Shanno), which makes use of the gradient function gradfn that we specified to aid minimisation. We have to specify an initial value in the par argument (we choose c(0,0) here), and X = X.EM.0 is passed to objfn and gradfn as an additional argument.
print(opt1 <- optim(par = c(0,0), fn = objfn, gr = gradfn, X = X.EM.0, method = "BFGS"))

The output states that convergence has occurred successfully ($convergence = 0). The estimate $\hat{\alpha}_1$ is found in $par. (The other outputs are $value, the value of objfn at the minimum; $counts, the number of evaluations of objfn and gradfn before convergence; and $message, for any additional information from the minimisation algorithm.) The estimated weights for each individual are then found as $\hat{w}_{it} = \exp(X^{EM}_{it} \hat{\alpha}_1)$. We do not need to estimate $\alpha_0$, as this constant cancels out.
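In code, the weight calculation amounts to exponentiating the linear predictor at the estimated coefficients. The following self-contained sketch uses a toy single-column centred covariate matrix standing in for X.EM.0 (the real analysis uses the two centred age columns):

```r
# Toy centred effect-modifier matrix (stand-in for X.EM.0 above)
X.EM.0 <- matrix(c(-3, -1, 0, 2, 4), ncol = 1)

objfn  <- function(a1, X) sum(exp(X %*% a1))
gradfn <- function(a1, X) colSums(X * as.vector(exp(X %*% a1)))

opt1 <- optim(par = 0, fn = objfn, gr = gradfn, X = X.EM.0, method = "BFGS")

# Estimated weights: w_it = exp(X_it^EM %*% alpha1-hat)
wt <- as.vector(exp(X.EM.0 %*% opt1$par))
```

At the optimum the weighted sums of the centred effect modifiers are numerically zero, so the reweighted population matches the AC moments.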
It is easier to examine the distribution of the weights by scaling them, so that the rescaled weights are relative to the original unit weights of each individual; in other words, a rescaled weight > 1 means that an individual carries more weight in the reweighted population than in the AB population, and a rescaled weight < 1 means that an individual carries less weight. The rescaled weight is calculated as

$\tilde{w}_{it} = \hat{w}_{it} \cdot N / \sum_{i,t} \hat{w}_{it}$,

where $N$ is the number of individuals in the AB trial, so that the rescaled weights sum to $N$. The mean of the rescaled weights is therefore not informative, as it is guaranteed to be 1. Here, the rescaled weights range from 0 to 3.44, and the median is heavily skewed towards zero (0.07). A histogram of the weights is also very helpful to present, and clearly shows that a large number of individuals have been given zero (or close to zero) weight. This is not surprising, as the age range in the AB trial (45 to 75) is much wider than that in the AC trial (45 to 55). A large number of individuals from the AB trial have therefore been effectively excluded. More positively, there are no very large weights, as the distribution of effect modifiers in the AC population is entirely contained within that of the AB population (there are no ages in the AC population outside of those observed in the AB population).
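The rescaling and histogram can be sketched as follows; the weight vector here is a toy stand-in for the estimated MAIC weights, not the values from the analysis above.

```r
# Toy estimated weights (stand-in for the MAIC weights wt above)
wt <- c(0.01, 0.05, 0.07, 0.2, 1.5, 3.4)

# Rescale so that the weights sum to N and have mean 1
wt.rs <- wt / sum(wt) * length(wt)

summary(wt.rs)   # the median and range summarise the skew
hist(wt.rs, xlab = "Rescaled weight (multiple of original unit weight)")
```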
The approximate effective sample size is calculated as

$ESS = \left( \sum_{i,t} \hat{w}_{it} \right)^2 / \sum_{i,t} \hat{w}_{it}^2$

with(AB.IPD, sum(wt)^2 / sum(wt^2))

## [1] 185.6451
This is quite a reduction from the original 500, but is still reasonably large. (The actual ESS is likely to be larger than this, as the weights are not fixed and known.) Note that age is balanced (in terms of mean and SD) with the AC population after weighting. The estimated relative effect $\hat{d}_{AB(AC)}$ of B vs. A in the AC population is found by taking weighted means of the outcomes in the AB trial. In practice, however, it is easier to generate these estimates using a simple linear model: this is exactly equivalent to taking the weighted means, but allows us to use the sandwich package to calculate standard errors correctly using a sandwich estimator. (Note that it is possible to generate estimates of absolute outcomes on each treatment using the weighted means or the linear model, but these will be biased unless all prognostic variables and effect modifiers in imbalance between the populations are accounted for. Unbiased prediction of absolute outcomes relies on the much stronger assumption of conditional constancy of absolute effects.)

# Binomial GLM
fit1 <- AB.IPD %>%
  mutate(y0 = 1 - y, wt = wt) %>%
  glm(cbind(y, y0) ~ trt, data = ., family = binomial, weights = wt)

# Sandwich estimator of variance matrix
V.sw <- vcovHC(fit1)
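The claimed equivalence between the weighted GLM and the weighted means can be checked directly. This is a self-contained sketch on simulated toy data (the data frame and its columns are stand-ins, and base R is used in place of the dplyr pipeline):

```r
# Toy stand-in for AB.IPD with estimated weights
set.seed(1)
dat <- data.frame(trt = rep(c("A", "B"), each = 50),
                  y   = rbinom(100, 1, 0.4),
                  wt  = runif(100, 0.1, 2))
dat$y0 <- 1 - dat$y

# Weighted binomial GLM of outcome on treatment
fit1 <- glm(cbind(y, y0) ~ trt, data = dat, family = binomial, weights = wt)

# Weighted mean outcome on each arm, and the implied log OR
p <- with(dat, tapply(wt * y, trt, sum) / tapply(wt, trt, sum))
logOR <- qlogis(p[["B"]]) - qlogis(p[["A"]])
```

The trtB coefficient of fit1 equals logOR (up to convergence tolerance), confirming the equivalence; the sandwich estimator is then needed only to obtain valid standard errors under the weighting.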

STC
Alternatively, we can use STC to make the indirect comparison, which involves creating an outcome regression model. We fit a regression model in the AB trial population, and use this to predict outcomes in the AC trial population. The outcome model need only contain all effect modifiers in imbalance in order to be unbiased, but adding other covariates may increase precision. In this case, we only need to include age in the model, but adding gender may increase precision by accounting for more of the variation.
First, fit the outcome model. If we centre age at the mean value from the AC population, then the interpretation of the trtB coefficient is the average B vs. A effect in the AC population.

AB.IPD$y0 <- 1 - AB.IPD$y  # Add in dummy non-event column
# Fit binomial GLM
STC.GLM <- glm(cbind(y, y0) ~ trt * I(age - AC.AgD$age.mean), data = AB.IPD, family = binomial)
summary(STC.GLM)

The residual deviance suggests that the model fits the data well, as pchisq(406.73, 496) = 0.001. However, even if the model fit were poor, the eventual indirect comparison would still be unbiased, as long as all effect modifiers in imbalance are included in the model so that conditional constancy of relative effects is satisfied. It is most important that effect modifiers are pre-specified prior to analysis (as per the NICE Methods Guide and ISPOR guidance); if this is done properly, and all effect modifiers are identified, then the reliance on traditional "model checking" is greatly reduced. Variable selection techniques should not be used to justify the inclusion or exclusion of effect modifiers in the model.

We may however try to add other prognostic variables to the model, to try and improve model fit and possibly reduce the standard error of the indirect comparison. Adding gender to the model did not alter the residual deviance or the AIC significantly, so we continue without gender in the model.
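Such a check might look like the following sketch. The data are simulated toy stand-ins, and the variable name gender and the use of update/AIC here are assumptions, not the original code:

```r
set.seed(2)
# Toy stand-in for AB.IPD (assumed structure and variable names)
AB <- data.frame(trt    = rep(c("A", "B"), each = 100),
                 age    = runif(200, 45, 75),
                 gender = rbinom(200, 1, 0.5))
AB$y  <- rbinom(200, 1, plogis(-1 + 0.05 * (AB$age - 60) - 0.5 * (AB$trt == "B")))
AB$y0 <- 1 - AB$y

# Outcome model with age as effect modifier, centred at an assumed AC mean of 50
m1 <- glm(cbind(y, y0) ~ trt * I(age - 50), data = AB, family = binomial)

# Add gender as a purely prognostic covariate and compare fit
m2 <- update(m1, . ~ . + gender)
AIC(m1, m2)
```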
(Note that not all of the estimated coefficients match those of the true underlying logistic model which we specified earlier. This is because some of their interpretations are different: the reference age here is the mean age of the AC population, whereas the reference age for the true model was age 40.) The log OR $\hat{d}_{AB(AC)}$ of B vs. A in the AC population is given by the trtB coefficient of this model; the anchored indirect comparison of C vs. B is then formed by subtracting it from the AC trial estimate, $\hat{d}_{BC(AC)} = \hat{d}_{AC} - \hat{d}_{AB(AC)}$, with variance equal to the sum of the two variances. So the STC estimate of the log odds ratio of treatment C vs. B is −0.247, with standard error $\sqrt{0.227} = 0.476$. We examine this result and that from MAIC in more detail below.
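The final combination step follows the standard anchored (Bucher) arithmetic. In this sketch the inputs are illustrative placeholders, not the estimates from the analysis above:

```r
# Placeholder inputs (hypothetical values): B vs A log OR in the AC
# population from the outcome model, and the C vs A log OR from the AC
# trial, each with its variance
d.AB <- -1.00; var.AB <- 0.100
d.AC <- -1.25; var.AC <- 0.130

# Anchored indirect comparison of C vs B in the AC population
d.BC  <- d.AC - d.AB             # log OR of C vs B
se.BC <- sqrt(var.AB + var.AC)   # variances add across independent trials
```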
Whilst in this scenario STC appears to have produced an estimate closer to the true value and with a slightly narrower confidence interval, these results should not be taken in any way to mean that either method is preferred. The properties and performance of MAIC and STC (and other methods) need to be thoroughly investigated through simulation studies, particularly under failure of the requisite assumptions. Then, and only then, can preferences be expressed for any given method.