Health Opportunity Costs: Assessing the Implications of Uncertainty Using Elicitation Methods with Experts

Well-established methods of economic evaluation are used in many countries to inform decisions about the funding of new medical interventions. To guide such decisions, it is important to consider what health gains would be expected from the same level of investment elsewhere in the health care system. Recent research in the United Kingdom has evaluated the evidence available and the methods required to estimate the health effects of changes in health care expenditure within the National Health Service. Because sufficiently broad-ranging data were absent, that work required assumptions to estimate health effects in terms of a broader measure of health that is more relevant for policy (quality-adjusted life-years). These assumptions constitute important sources of uncertainty. This work presents an application of structured elicitation of the judgments of key individuals about these uncertain quantities. This article describes the design and conduct of the exercise, including the quantities elicited, the individual (rather than consensus) approach used, how uncertainty in knowledge was elicited (mode and bounds of an 80% credible interval), and the methods used to generate group estimates. It also reports on a successful application involving 28 clinical experts and 25 individuals with policy responsibilities. Although, as expected, most experts found the questions challenging, they were able to express their beliefs quantitatively. Consistently across the uncertainties elicited, experts’ judgments suggest that the earlier work, by relying on assumptions, is likely to have underestimated the quality-adjusted life-year (QALY) impacts of changes in expenditure and hence overestimated the “central” estimate of health opportunity cost (£12,936 per QALY).

be some consideration of how any health gains offered by the new intervention are to be assessed against any additional costs it imposes on health systems. A key piece of information to guide this assessment is an estimate of the health gains that could have been achieved elsewhere with the same level of investment (the health opportunity costs). That is, it is necessary to consider either the health effects that could be generated if the additional resources required for the new intervention were instead made available for other services and interventions, or the health effects of the activities that would need to be given up if these resources were committed to the new intervention.
A number of studies in different countries have based an assessment of opportunity costs on the empirical relationship between changes in health care expenditure and health outcome. [5][6][7][8] Recent research in the United Kingdom used national data on expenditure and outcomes in different disease areas reported at a local level in the National Health Service (NHS). [9][10][11] By exploiting the variation in expenditure and mortality outcomes, the relationship between changes in expenditure and mortality was estimated (while accounting for endogeneity). By using the effect of expenditure on the mortality and life-year burden of disease as a surrogate for the effects on a more complete measure of burden (one that also includes the quality-of-life burden of disease), a cost per quality-adjusted life-year (QALY) that reflects the likely impact of changes in expenditure on both mortality and morbidity was also reported.
These estimates of the marginal productivity of health care expenditure indicate the health that is expected to be forgone as a consequence of additional costs displacing other health care activities. They reflect what is likely to happen in the health care system, given current levels of information, local decision making, and the influence of other aspects of social value, which are not captured in measures of health such as QALYs. They represent the relevant expected health opportunity costs when the decision context is restricted to approving or rejecting a new intervention. i In this context, they also indicate the maximum that the health care system can afford to pay for the additional benefits offered by a new intervention (e.g., the temporary monopoly price for pharmaceuticals protected by patent) without reducing the total number of QALYs generated.
The assumptions that were required to link the estimated effects of changes in expenditure on the mortality burden of disease to the likely effect on QALYs constitute important sources of uncertainty. To inform these assumptions appropriately, the judgments of key individuals, such as those with substantive clinical or policy expertise, are important. Elicitation methods offer a systematic process for formalizing and quantifying, typically in probabilistic terms, individuals' judgments about uncertain quantities. 12,13 Elicitation is an important activity in many fields, including in support of decision making, where there may be significant uncertainties whose quantification can feed directly into decisions. Furthermore, elicitation is a vital element of a Bayesian approach to statistics, the principles of which are core to decision analysis. Here, the use of prior information to augment existing data has an established theoretical basis, particularly where the empirical evidence is limited. 12 This research presents an application of structured elicitation to inform estimates of expected health opportunity costs in the UK NHS, a key quantity to inform policy decisions. This constitutes a novel and important context for the use of structured elicitation, aiming to reflect the uncertainty in the judgments required for policy appropriately and explicitly. We demonstrate the applicability of the elicitation exercise in practice. Its design draws on wider experience of elicitation in health technology assessment 14 and on literature from other areas of science (for example, refs. 15 and 16).
This article is structured as follows. The next section summarizes earlier work by Claxton et al. 9 to estimate NHS marginal productivity and is the motivation for the current work. The following sections focus on the elicitation exercise, presenting its methods (design, conduct, and analyses) and the results of its application. The article finishes with a discussion including key policy implications.

Summary of the Work by Claxton et al. and Overview of the Key Uncertainties Identified
Claxton et al. 9 evaluated the relationship between expenditure and mortality using a cross-sectional design, seeking to identify differences in mortality across health care commissioning units (at the time of this research, there were 152 primary care trusts) that could be attributed to differences in NHS spend. Empirically, the research first quantified expenditure elasticities, that is, how changes in NHS expenditure in a given year were allocated between Programme Budgeting Categories (PBCs), which reflect broad disease areas characterized by International Classification of Diseases (ICD) codes.

i Decision makers may also compare the proposed investment to other specific disinvestments or alternative investments. However, they still need to consider how these compare with what the health care system would be expected to deliver (i.e., an estimate of marginal productivity is still relevant). If the decision maker had full information about all interventions that are or could be provided for all indications and subgroups of the population and was also tasked with the wholesale redesign of the health care system, well-established mathematical programming solutions would be possible and appropriate. The marginal productivity would be the outcome of this optimization (i.e., the shadow price of the expenditure constraint from solving the dual problem).
Second, the research estimated outcome elasticities, that is, how changes in expenditure by a PBC (in a particular year) altered PBC-specific mortality rates (using national data on mortality reported for ICDs or groups of ICDs, mapped onto PBCs). Analyses adjusted for important covariates (including need) and used instrumental variables to estimate causal effects, overcoming the problem of endogeneity.
Results showed that the mortality effects of changes in spend could be identified for only 11 of the 23 PBCs (such as cancer and gastrointestinal disorders). For the remaining disease areas (such as mental health disorders), health care focuses primarily on improving health-related quality of life (HRQoL). Across the 11 PBCs for which mortality effects were detected, empirically based estimates of how changes in total NHS expenditure affect mortality were generated, returning the following point estimates (using 2008 expenditure and 2008-2010 mortality): £105,872 for the cost per death averted, £23,360 for the cost per life-year, and £28,045 for the cost per life-year where life-years were adjusted for HRQoL.
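Because the three point estimates above are each expressed per unit of the same change in expenditure, their ratios imply two further quantities. The sketch below is simple arithmetic on the reported figures, assuming that they share a common numerator (the expenditure change); the derived ratios are illustrative and are not reported by Claxton et al.

```python
# Point estimates (GBP) across the 11 PBCs with measured mortality effects,
# using 2008 expenditure and 2008-2010 mortality.
COST_PER_DEATH_AVERTED = 105_872
COST_PER_LIFE_YEAR = 23_360
COST_PER_HRQOL_ADJUSTED_LIFE_YEAR = 28_045


def implied_ratios(cost_per_death, cost_per_ly, cost_per_adjusted_ly):
    """Return (life-years gained per death averted, mean HRQoL weight).

    Assumes all three costs divide the same expenditure change, so the
    expenditure cancels out of each ratio.
    """
    life_years_per_death = cost_per_death / cost_per_ly
    mean_hrqol_weight = cost_per_ly / cost_per_adjusted_ly
    return life_years_per_death, mean_hrqol_weight


ly_per_death, hrqol_weight = implied_ratios(
    COST_PER_DEATH_AVERTED, COST_PER_LIFE_YEAR, COST_PER_HRQOL_ADJUSTED_LIFE_YEAR
)
print(f"Implied life-years gained per death averted: {ly_per_death:.2f}")
print(f"Implied mean HRQoL weight of those life-years: {hrqol_weight:.2f}")
```

Under that assumption, each death averted corresponds to roughly 4.5 life-years, and those life-years carry an average quality weight of about 0.83.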
However, an estimate of health opportunity costs relevant for policy also needs to consider the following (Table 1):
A. whether changes in expenditure have effects beyond the year of expenditure (this can be termed duration of effects);
B. how the effects of changes in expenditure on mortality relate to effects on a broader measure of health that incorporates both duration and HRQoL impacts (QALYs; this can be termed surrogacy); and
C. how changes in expenditure affect health in disease areas for which the previous work could not measure a mortality effect (this can be termed extrapolation).
In the original research, 9 very limited data were available with which to assess each of these questions, and hence assumptions were made (listed in Table 1). These were used to obtain a central estimate of health opportunity costs (expressed as a cost per QALY) across all disease areas of £12,936 per QALY. An analysis of the uncertainty arising from the empirical estimates (the expenditure elasticities estimated for each of the 23 PBCs and the outcome elasticities estimated for 11 of these) indicated a probability of 0.89 that the cost per QALY is less than £20,000. 9

Methods
This research aimed to formally elicit the beliefs of key individuals on the 3 judgments outlined above (and in Table 1), which are required for a policy-relevant estimate of health opportunity costs. Another uncertain quantity that was elicited concerned the expected life-years gained from averting a death. This is not required to evaluate health opportunity costs in terms of QALYs (although it is important to distinguish morbidity from mortality impacts on the QALY estimate), and hence, for conciseness, the methods and results of the elicitation for this quantity are not described in this article but are available elsewhere. 16 Uncertainty in knowledge was explicitly elicited throughout. 12,17,18 The design of the exercise sought to minimize the use of cognitive heuristics that may lead to bias. [19][20][21] Two groups of individuals were considered: the first comprised clinical experts, acting as substantive experts in key disease areas, and the second included policy experts, defined as individuals drawn from organizations that develop or implement policy or that have a major interest in policy in this area. These individuals are not expected to have specific substantive expertise in key clinical areas. Policy experts were asked for their judgments on the quantities of interest once they had considered the information that had been elicited from clinical experts. As such, the elicited judgments from policy experts combine their own judgments with the views of the substantive (clinical) experts.
This exercise did not seek to establish consensus, as such methods are known to have a number of limitations (e.g., because aggregation is done implicitly, dominant individuals may unbalance group dynamics, and consensus methods are known to return overly precise judgments). 22 Hence, experts were asked to give their opinions individually (and were discouraged from interacting), and a group estimate was generated analytically (detailed below). 12 All aspects of the exercise (design, conduct, and analyses) were protocolled in advance. 23

What Quantities Were Elicited?
The elicitation questionnaire focused on the effects on population health of changes in NHS expenditure in a particular year (all else unchanged). Experts were prompted to think of changes in expenditure that were significant but still represented a small proportion of NHS expenditure.
The first uncertain quantity concerned the duration of effects. A 2-part question was used (section A, Table 2) that first asked about the duration of mortality effects beyond the first year. Second, it asked about the magnitude of mortality effects in the second, third, and fourth years after the change in expenditure. Participants were asked to express the latter as a proportion of the effect in the first year, because the effect in the first year is an estimable quantity (and was the focus of the empirical work in Claxton et al. 9 ). Using a relative quantity allows conditional independence to be reasonably assumed and avoids the burdensome task of eliciting dependency. Conditional independence was also assumed in the elicitation of the other uncertain quantities, and the accompanying diagram in Table 2 illustrates the conditional relationships specified. Note that the wording intentionally asked for the effects that can be attributed to changes in expenditure in a particular year and hence was able to identify future (lagged) effects caused by that year's change in spend.

Table 2

Section A
Question. For how many more years (beyond the year of increased expenditure) would you expect disease-specific mortality rates to be reduced?
Question. From an increase in expenditure in a particular year, how do reductions in mortality rates in subsequent years compare (in proportionate terms) to the reduction observed in the first year? This was elicited separately for the second, third, and fourth years. Refers to quantities A2 yr, A3 yr, and A4 yr, respectively, in the diagram.

Section B
Question. If expenditure is increased in a particular year, how many times bigger (or smaller) are proportionate reductions in quality-adjusted life-year burden when compared with proportionate reductions in mortality burden? We elicited for the year of increased expenditure (first year) and also for any later effects of expenditure on the second, third, and fourth years subsequent to increased expenditure. Refers to quantities B1 yr, B2 yr, B3 yr, and B4 yr, respectively, in the diagram.

Section C
Question. How much bigger (or smaller) are reductions in health burden (quality-adjusted life-years) when expenditure is increased, for example, in "mental health disorders" instead of disease areas with a measured effect of increased expenditure on mortality (average effect across all disease areas in this group)? This was elicited for the year of expenditure (first year) and also for any later effects of expenditure on subsequent years (second, third, and fourth years). Refers to quantities C1 yr, C2 yr, C3 yr, and C4 yr, respectively, in the diagram.
The second uncertain quantity subject to elicitation related to the surrogacy relationship and aimed to establish the effects of increased expenditure on a year's QALY burden (section B, Table 2). QALY burden was defined as comprising the life-years lost due to premature mortality (due to disease) in the year of interest, adjusted for quality, plus any impacts on the level of HRQoL from disease in individuals alive in that year. This was elicited separately for the year of expenditure (first year) and subsequent years (second, third, and fourth years). To allow for conditional independence, it was formulated relative to effects on mortality burden in the same year.
The third uncertain quantity related to extrapolation (section C, Table 2). Experts were asked about reductions in QALY burden in disease areas that did not have measurable mortality effects (e.g., mental health). They were asked to express these reductions proportionally in relation to the average QALY burden reduction from an increase in NHS expenditure across all disease areas with measurable mortality effects. Again, this was elicited separately for the year of expenditure (first year) and subsequent years (second, third, and fourth years).
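Because each quantity is defined relative to another, the three can be chained together. The sketch below shows one stylized way of doing so, under loudly stated assumptions: it is not the model of Claxton et al., it truncates effects at 4 years, it represents all disease areas with measured mortality effects by a single set of values, and it treats the quantities as independent so that multiplying expected values gives the expected product. The function names and the input values are hypothetical.

```python
from typing import List, Sequence


def measured_qaly_effects(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Hypothetical helper: per-year QALY effect in an area with measured
    mortality effects, in units of the year-1 mortality-burden effect.

    a[t]: mortality effect in year t+1 relative to year 1 (so a[0] == 1).
    b[t]: surrogacy in year t+1 (QALY-burden effect / mortality-burden effect).
    """
    if len(a) != len(b):
        raise ValueError("a and b must cover the same years")
    if a[0] != 1.0:
        raise ValueError("a[0] is the year-1 reference and must equal 1")
    return [a_t * b_t for a_t, b_t in zip(a, b)]


def multiplier_measured(a, b) -> float:
    """Total QALY effect relative to the original assumptions
    (year-1 effect only, surrogacy equal to 1), which give exactly 1."""
    return sum(measured_qaly_effects(a, b))


def multiplier_unmeasured(a, b, c) -> float:
    """Total QALY effect in an area without measured mortality effects,
    relative to the original assumption (extrapolation of 1, year 1 only).

    c[t]: extrapolation in year t+1, relative to the measured-area QALY
    effect in that same year.
    """
    per_year = measured_qaly_effects(a, b)
    if len(c) != len(per_year):
        raise ValueError("c must cover the same years as a and b")
    return sum(c_t * e_t for c_t, e_t in zip(c, per_year))


# Hypothetical inputs for years 1-4 (not elicited results).
a = [1.0, 1.5, 1.2, 0.9]
b = [3.0, 2.0, 1.5, 1.0]
c = [2.0, 1.5, 1.2, 1.0]
print(multiplier_measured(a, b), multiplier_unmeasured(a, b, c))
```

With these made-up inputs the measured-area multiplier works out to 3.0 + 3.0 + 1.8 + 0.9 = 8.7; setting a = [1, 0, 0, 0] and every b and c value to 1 recovers the original assumptions (a multiplier of 1).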
Although elicited judgments are likely to differ between disease areas, it was considered too burdensome for experts to provide judgments for each of the 23 PBCs. Hence, 7 disease areas (circulatory, respiratory, gastrointestinal, neurological, mental health, endocrinology, musculoskeletal) were selected. These were chosen because changes in expenditure and changes in mortality in those areas are the most important drivers of the central estimate of health opportunity cost and are the most sensitive to the surrogacy and extrapolation assumptions. Estimates were elicited from experts separately for each of these 7 main PBCs, together with a single estimate for the remaining PBCs combined. These are heterogeneous and broad disease areas, so in responding to questions, experts were asked to consider the ICDs within each PBC on which an increase in expenditure is most likely to fall.

Which Experts?
We aimed to purposively recruit 20 clinicians (at least 2 from each clinical area ii ) and 20 individuals affiliated with selected policy-relevant organizations. iii, 24 Responses from experts were anonymous, but the organizations to which policy experts belonged were recorded, as were the clinical areas of expertise (of clinical and relevant policy experts), to facilitate analysis of between-expert heterogeneity. 14

How Were the Different Quantities Elicited?
It was important for elicitation to reflect experts' uncertainty, so experts were asked for multiple summaries of each quantity. 12 One was the mode (the value the expert believes to be most likely, their best guess), as it is generally thought that experts can report this more easily than the mean or median. 12,25 The other summaries were the bounds of a credible interval (CrI; the Bayesian counterpart of a confidence interval). iv Evidence shows that, while eliciting CrIs is intuitive, there is a clear tendency for these to be too narrow (a bias called "overconfidence"); that is, people believe their estimates are more accurate than is justified. 26 This limitation is acknowledged, but experts' time constraints were a major consideration. 27 Hence, strategies were adopted to minimize the potential for bias: 80% CrIs were elicited, as these typically show less overconfidence than 95% CrIs, 12 and single-limit estimates were elicited (the lower bound first and then the upper bound separately), as these are also thought to produce wider intervals than asking directly for the range. 28,29 Accordingly, the wording used in this work was as follows:
(Mode): My best guess for the value of this quantity is . . . .
(Lower bound of 80% CrI): I am very certain (90% certain) that the true value for this quantity is higher than . . . .
(Upper bound of 80% CrI): I am very certain (90% certain) that the true value for this quantity is lower than . . . .

Conduct of the Exercise
A paper questionnaire was developed (Supplementary Appendix 1) and extensively piloted. To facilitate appropriate training, the exercise was, where possible, conducted in groups (workshops). A training session for experts was developed that described the objectives of the elicitation exercise; clarified concepts such as those of uncertainty, variability, and heterogeneity; familiarized experts with the quantities the research sought to elicit; described and explained the impact of bias and heuristics; and trained experts on the methods of elicitation used (Supplementary Appendix 2). [30][31][32] This was delivered by 2 of the authors (K.C. and M.O.S.).
Throughout the exercise, individuals were encouraged to revisit and revise their answers to previous questions, 33 but we did not record when this occurred. At the end of each section of the exercise, participants were asked whether they were confident the answers they had given reflected their views and uncertainties. Response options were "yes," "not sure," and "no." Individuals were also provided with opportunities for free-text feedback.
The judgments from clinical experts were elicited prior to those of policy experts. The judgments from clinical experts were summarized (histograms of the modes and upper and lower CrI bounds) and presented to policy experts to help them formulate their judgments using the same elicitation tool (Supplementary Appendix 3).

Analyses and Pooling across Experts
Analyses were conducted in Excel 2010. 34 In describing the elicited beliefs, the first step was to fit a distribution to each quantity elicited from each individual expert. 30,35 The quantities of interest here take values between 0 and +infinity and were fitted with the log-normal distribution, as prespecified. 23 Given that 3 summaries were elicited from each expert, more than 1 two-parameter (log-normal) distribution can reasonably reflect their judgments. It was protocolled 23 that, to reflect this additional uncertainty, 2 alternative (2-parameter) distributions would be fitted: one using the lower bound of the CrI and the mode and another using the upper bound and the mode. v A unique distribution for each quantity elicited by each expert was then derived by linear pooling of the 2 distributions (i.e., pooling means and variances). vi Further details on this stage of analysis are presented in Supplementary Appendix 4.
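The two fits and their pooling can be sketched as follows. This is a minimal illustration, not the authors' Excel implementation: it matches a log-normal exactly to the mode and one 80% CrI bound (the 10th or 90th percentile), then forms an equal-weight mixture of the two fits and summarizes it by its mean and variance, re-expressed as a single moment-matched log-normal. That last step is one reading of "pooling means and variances" and is an assumption. Note also that the lower-bound fit is not always possible: for a log-normal, mode/lower can be at most exp(z^2/4) ≈ 1.51 (with z ≈ 1.2816), and below that limit two solutions exist; the sketch raises an error in the first case and takes the smaller sigma in the second, both of which are assumptions. The function names and the expert's values are hypothetical.

```python
import math
from statistics import NormalDist

# z-value of the 90th percentile; the 80% CrI bounds are the 10th/90th percentiles.
Z90 = NormalDist().inv_cdf(0.90)


def lognormal_from_mode_upper(mode, upper):
    """(mu, sigma) of the log-normal whose mode and 90th percentile match."""
    d = math.log(upper / mode)  # equals Z90*sigma + sigma**2
    if d <= 0:
        raise ValueError("upper bound must exceed the mode")
    sigma = (-Z90 + math.sqrt(Z90**2 + 4 * d)) / 2
    return math.log(mode) + sigma**2, sigma


def lognormal_from_mode_lower(mode, lower):
    """(mu, sigma) of the log-normal whose mode and 10th percentile match.

    ln(mode/lower) = Z90*sigma - sigma**2 has no root when mode/lower exceeds
    exp(Z90**2 / 4) and two roots below that; taking the smaller sigma is an
    assumption of this sketch.
    """
    d = math.log(mode / lower)
    disc = Z90**2 - 4 * d
    if d <= 0 or disc < 0:
        raise ValueError("no log-normal matches this mode and lower bound")
    sigma = (Z90 - math.sqrt(disc)) / 2
    return math.log(mode) + sigma**2, sigma


def lognormal_moments(mu, sigma):
    """Mean and variance of a log-normal distribution."""
    mean = math.exp(mu + sigma**2 / 2)
    var = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
    return mean, var


def pool_two_fits(mode, lower, upper):
    """Equal-weight mixture of the two fits, re-expressed as one log-normal
    with the mixture's mean and variance (moment matching)."""
    fits = [lognormal_from_mode_lower(mode, lower), lognormal_from_mode_upper(mode, upper)]
    moments = [lognormal_moments(mu, sigma) for mu, sigma in fits]
    mean = sum(m for m, _ in moments) / 2
    # Mixture variance: average of (variance + mean^2) minus the pooled mean^2.
    var = sum(v + m**2 for m, v in moments) / 2 - mean**2
    sigma2 = math.log(1 + var / mean**2)
    return math.log(mean) - sigma2 / 2, math.sqrt(sigma2)


# Hypothetical expert: mode 2, 80% CrI from 1.4 to 5.
mu_lo, s_lo = lognormal_from_mode_lower(2.0, 1.4)
mu_hi, s_hi = lognormal_from_mode_upper(2.0, 5.0)
mu, sigma = pool_two_fits(2.0, 1.4, 5.0)
print(f"pooled log-normal: mu={mu:.3f}, sigma={sigma:.3f}")
```

Each fit reproduces its two inputs exactly; the pooled distribution keeps the mixture's mean and variance but, being a single log-normal, no longer reproduces the elicited mode exactly.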
After describing each expert's judgment for each quantity using distributions, these were pooled to derive a single distribution for the group. Linear pooling was used 12 with equal weights across experts 4 to preserve the individual judgments in the collective (pooled) judgment. 14,26 Linear pooling means that, if the experts' distributions for a single quantity are identical, the pooled distribution is also identical to the individuals' distributions. Also, if at least 1 expert supports a particular value of the quantity of interest, the pooled distribution will also show some support for that value. 12,36 The primary analysis reflects the pooled results from clinical experts, and the secondary analysis reflects the pooled results from policy experts.
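An equal-weight linear pool across experts is a mixture: its CDF is the average of the experts' CDFs, so its mean is the average of their means, and its quantiles (for example, the bounds of a pooled 80% CrI) can be found numerically. The sketch below assumes each expert's judgment has already been summarized as a log-normal with log-scale parameters (mu, sigma), as described in the preceding paragraph; the expert parameters, function names, and bisection bracket are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def lognormal_cdf(x, mu, sigma):
    """CDF of a log-normal with log-scale parameters mu and sigma."""
    if x <= 0:
        return 0.0
    return STD_NORMAL.cdf((math.log(x) - mu) / sigma)


def pooled_cdf(x, experts):
    """Equal-weight linear opinion pool: the average of the experts' CDFs."""
    return sum(lognormal_cdf(x, mu, s) for mu, s in experts) / len(experts)


def pooled_mean(experts):
    """The mean of an equal-weight mixture is the average of the component means."""
    return sum(math.exp(mu + s**2 / 2) for mu, s in experts) / len(experts)


def pooled_quantile(p, experts, lo=1e-9, hi=1e6):
    """Invert the (increasing) pooled CDF by bisection within [lo, hi]."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if pooled_cdf(mid, experts) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical (mu, sigma) fits for three experts.
experts = [(0.9, 0.5), (1.2, 0.4), (0.6, 0.8)]
print(f"pooled mean: {pooled_mean(experts):.2f}")
print(f"pooled 80% CrI: {pooled_quantile(0.1, experts):.2f} to {pooled_quantile(0.9, experts):.2f}")
```

Both properties named in the text follow directly from averaging CDFs: identical experts give back the same distribution, and the pooled CDF always lies between the smallest and largest individual CDFs, so any value supported by one expert keeps some support in the pool.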

Sensitivity Analyses
Two sensitivity analyses were protocolled. 23 One explored heterogeneity (i.e., between-expert uncertainty) by 1) considering only responses of clinical experts in the clinical specialty relating to the disease area in the question and 2) grouping policy experts based on the type of organization to which they belonged (see footnote iii). The second protocolled sensitivity analysis disregarded responses for which individuals indicated they were not confident that the response reflected their views and uncertainties. A third and final sensitivity analysis, which was not protocolled, provided a qualitative assessment of the implications of using a Gamma distribution, instead of the log-normal, in the fitting.

Primary Analyses Using Substantive (Clinical) Experts' Responses
Twenty-eight clinical experts participated in 3 (group) workshops and 4 individual interviews. vii A summary of the pooled distributions across all clinical experts is presented in Table 3.
Results of question A1 (duration of effects) indicate that changes in NHS expenditure in a particular year are expected to affect mortality in subsequent years. The mean duration of effects is highest for circulatory and gastrointestinal (approximately 11 additional years) and lowest for neurological disease (approximately 6 additional years). The pooled distribution shows considerable uncertainty, as demonstrated by its wide 80% CrI. As an illustration, the top panel of Figure 1 shows the individual experts' distributions for the duration of effects in circulatory disease (in gray), overlaid with the pooled distribution across all experts (in black). Note that the uncertainty in the pooled distribution reflects not just each individual's uncertainty but also between-expert heterogeneity.
Experts' judgments suggest that, across all disease areas, mortality effects beyond year 1 are expected to be higher than effects in the first year (section B). In circulatory disease, for example, the effect in the second year is expected to be 1.5 times that in the first year. This can be interpreted as reflecting the preventive nature of much of the expenditure in this disease area, in which the health benefits of current expenditure are higher in the future. The magnitude of expected mortality effects decreases over time for all disease areas. For example, in circulatory disease, the relative mortality effect is expected to be 1.2 in the third year and 0.9 in the fourth year. The pooled distributions are wide, and the 80% CrIs include the value of 1.
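To see what these relative effects imply, they can be summed over the years elicited. The arithmetic below uses only the circulatory-disease means quoted in this paragraph; summing means is valid because the expected value of a sum is the sum of the expected values, but the figure ignores any effects beyond year 4 and says nothing about uncertainty. Under the original assumption of effects only in the year of expenditure, the same sum would equal 1.

```python
# Mean mortality effects for circulatory disease, each relative to the
# year-1 effect (year 1 is the reference and equals 1 by definition).
relative_mortality_effect = {1: 1.0, 2: 1.5, 3: 1.2, 4: 0.9}

# Total mortality effect over years 1-4, in multiples of the year-1 effect.
cumulative_effect = sum(relative_mortality_effect.values())
print(f"Years 1-4 combined: {cumulative_effect:.1f} times the year-1 effect")
```

This prints a combined effect of 4.6 times the year-1 effect over the first 4 years alone.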
Experts' judgments indicate that surrogacy relationships are expected to be greater than 1 in the year of expenditure for all disease areas (between 2.9 and 3.7, see Table 3). This implies that changes in spend are expected to reduce QALY burden proportionately more than mortality burden, although this is associated with considerable uncertainty. The individual experts' distributions on the surrogacy relationship in year 1 for circulatory disease have been graphically presented in the bottom panel of Figure 1. Only 5 of the 27 distributions (1 expert did not complete this question) have mean estimates below or equal to 1 (results not presented here). The pooled distribution across the 27 experts shows a mean of 2.9 and an 80% CrI suggesting the true value lies between 0.3 and 6.6 ( Table 3). Over time, expected values for surrogacy do not fall below 1.
Extrapolation relationships follow the same pattern as surrogacy, with expected values consistently above 1 (between 2.6 and 4.7). The 80% CrIs appear to narrow over time.

Secondary Analysis Using Policy Experts' Responses
Twenty-five policy experts participated in 2 workshops (affiliations in endnote viii). Table 4 presents a summary of pooled distributions.
Results were fairly similar to those obtained with the pool of clinical experts, but between-expert variation was lower for this group of experts (exemplified in Figure 2 for duration of effects, top panel, and surrogacy, bottom panel, in circulatory disease). With respect to mortality effects, policy experts generally indicated higher duration (in terms of expected values) than clinical experts and a similar magnitude over time.
In terms of surrogacy, expected values are also comparable with those of clinical experts. Expected values do not fall below 1 (although the CrIs include 1); for example, for respiratory disease, surrogacy had an expected value of 2.9. Expected extrapolation relationships also follow patterns similar to those of clinical experts but decrease slightly faster over time.

Face Validity and Qualitative Feedback
The information provided by individual experts is reproduced in item 4 of the Supplemental Material. Only a very small proportion of clinical experts (1/28 in section A, 3/28 in section B, and 0/24 in section C) indicated that their responses did not reflect their views and uncertainties, with the remainder answering "yes" or "not sure" (respectively, 16 and 11 out of 28 in section A, 7 and 19 out of 28 in section B, and 14 and 10 out of 24 in section C). This was qualitatively similar for policy experts. Qualitative feedback gave insight into the reasons for these responses. Participants, both clinical and policy, consistently mentioned that the heterogeneity across the ICDs that make up the different disease areas made responding to questions particularly challenging. Some clinical experts also found it difficult to answer questions on disease areas unrelated to their specialty. Some policy experts also indicated that they relied heavily on the clinical experts' answers. The qualitative feedback did not suggest that the answers lacked face validity; rather, it helps explain the wide distributions returned by participants.

Sensitivity Analysis
Results of sensitivity analyses are shown in full in Supplementary Appendix 5. Here, we present only a qualitative summary of results.
Results did not change meaningfully when removing individuals who indicated their responses did not reflect their views and uncertainties (item 2.1A in Supplementary Appendix 5). When also removing individuals who responded "not sure" to this question (i.e., considering only those who responded "yes"), differences were again not meaningful, except for surrogacy, for which means were slightly higher across all disease areas (item 2.1B in Supplementary Appendix 5). In terms of heterogeneity in the primary analysis (item 2.2 in Supplementary Appendix 5), the pooled distribution of clinicians in their clinical area of expertise shows some differences in relation to the pooled results across all clinicians (see, for example, the mean duration of mortality effects for circulatory, gastrointestinal, and neurological diseases). The magnitude of such effects over time is (in general) higher for circulatory and neurological diseases. Expected surrogacy relationships are similar for the year of expenditure, except for neurological disease, for which experts indicate surrogacy to be higher. Expected extrapolation relationships are lower for mental health, in the first year and over subsequent years, but higher for the first year in musculoskeletal disease.
In terms of heterogeneity in secondary analyses (item 2.3 in Supplementary Appendix 5), of note is the pooled distribution for group G2 (the largest group, comprising 15 of the total 25 experts, including "governmental bodies," such as the Department of Health and Social Care or Public Health England), which presents generally lower expected values and more precise distributions than the overall group. This implies that the heterogeneity introduced by the remaining groups contributes to a widening of the CrIs.
The post hoc sensitivity analysis evaluating an alternative distribution to represent experts' beliefs (item 2.4 in Supplementary Appendix 5) shows the overall conclusions to be robust but the magnitude of effects to be sensitive to this choice: the log-normal distribution (prespecified in our analysis plan) has a heavier tail than the Gamma distribution (implemented in the sensitivity analysis) and hence generally returns higher expected values when fitted to the same mode and CrI bounds.
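The direction of this sensitivity can be illustrated by fitting both distributions to the same mode and 90th percentile (the upper bound of an 80% CrI) and comparing their means. This is a self-contained sketch using only the Python standard library, not the authors' fitting procedure: the Gamma CDF is computed from the power series of the regularized lower incomplete gamma function, the Gamma shape is found by bisection under the stated monotonicity assumption, and the expert's values (mode 2, upper bound 5) are hypothetical.

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.90)  # 90th percentile of the standard normal


def lognormal_mean_from_mode_upper(mode, upper):
    """Mean of the log-normal matching a mode and a 90th percentile."""
    d = math.log(upper / mode)  # equals Z90*sigma + sigma**2
    sigma = (-Z90 + math.sqrt(Z90**2 + 4 * d)) / 2
    mu = math.log(mode) + sigma**2
    return math.exp(mu + sigma**2 / 2)


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    y = x / scale
    # n = 0 term of the series sum y**n / (shape*(shape+1)*...*(shape+n)).
    term = total = 1.0 / shape
    for n in range(1, 100_000):
        term *= y / (shape + n)
        total += term
        if term <= 1e-16 * total:
            break
    return math.exp(-y + shape * math.log(y) - math.lgamma(shape)) * total


def gamma_mean_from_mode_upper(mode, upper, p=0.90):
    """Mean of the Gamma with mode (k-1)*theta = mode and P(X <= upper) = p.

    Bisects on the shape k in (1, 100], assuming the CDF at `upper` rises with
    k along this family (it tends to 0 as k -> 1 and towards 1 for large k).
    """
    lo, hi = 1.0 + 1e-9, 100.0
    for _ in range(200):
        k = (lo + hi) / 2
        if gamma_cdf(upper, k, mode / (k - 1)) < p:
            lo = k
        else:
            hi = k
    k = (lo + hi) / 2
    return k * mode / (k - 1)


# Hypothetical expert: mode 2 and upper 80% CrI bound 5.
ln_mean = lognormal_mean_from_mode_upper(2.0, 5.0)
gamma_mean = gamma_mean_from_mode_upper(2.0, 5.0)
print(f"log-normal mean: {ln_mean:.2f}, Gamma mean: {gamma_mean:.2f}")
```

With these inputs the log-normal mean (about 2.96) exceeds the Gamma mean, consistent with the heavier right tail of the log-normal pulling its expected value upward.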

Discussion
This research developed an exemplar elicitation exercise aimed at quantitatively gathering the (uncertain) beliefs of individuals on a set of quantities for which there is currently insufficient evidence but that are central to an estimate of health opportunity costs for the UK's NHS. Resourcing decisions in the NHS require consideration of health opportunity costs, and hence this work has direct relevance for current policy in the United Kingdom. Despite being motivated by earlier research, 9 this work will also have longer-term relevance as the judgments elicited can be used to support other empirical studies for the United Kingdom, including those using different econometric methodologies, as these can be expected to suffer from the same evidence gaps.
Elicited judgments should not replace high-quality evidence, and it is paramount that primary evidence is collected on each of the uncertain quantities covered here. Our work, however, was designed in such a way that, as new evidence reports on individual quantities, the judgments elicited on the other quantities can be retained for use in policy. This was achieved by defining quantities as conditionally independent. The work presented here is also important internationally, as it can be adapted for evaluations pertaining to other countries or settings, beyond the UK's NHS.
The group estimates summarize the beliefs of multiple experts on quantities for which there is currently no evidence, and they therefore have important implications for a meaningful estimate of health opportunity costs for use in policy. First, regarding the duration of mortality effects, the original analyses 9 assumed impacts only in the year of expenditure. The results from the current work, however, indicate that mortality effects are also expected to occur in subsequent years, which suggests that the original work underestimated the QALY impacts of changes in expenditure. Second, the original work assumed perfect surrogacy between the effects of changes in expenditure on mortality burden and on total QALY burden. The results from this research indicate, however, that the surrogacy relationship is expected to be greater than 1 (this holds across disease areas for the first, second, and third years); that is, in proportionate terms, the effects of changes in expenditure on total QALY burden are expected to be larger than (rather than equal to) those on mortality burden. Again, this suggests that the original work underestimated the QALY impacts of changes in expenditure. Third, regarding extrapolation, the original work assumed that changes in spend have equal effects on disease areas with and without measured mortality effects. This work indicates that the extrapolation relationship is generally expected to be greater than 1; that is, the health effects in disease areas without measured mortality effects are expected to be larger than assumed in the original work. Consistently across the 3 uncertainties, experts' judgments suggest that the QALY impacts of changes in expenditure are likely to be underestimated when using the assumptions that underpin the “central” estimate of £12,936 per QALY reported in Claxton et al., 9 and hence that this estimate of health opportunity cost is likely to be overestimated.
The exercise was carefully developed to align with the scope of the policy question, was piloted extensively, and was accompanied by an extensive training package to support experts and guide them through the tasks. As a result, it ran successfully: experts were able to express their beliefs quantitatively, and only a few indicated that their answers did not reflect their views (i.e., were not face valid).
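The step from “QALY impacts underestimated” to “cost per QALY overestimated” is simple arithmetic, because health opportunity cost is expressed as a change in spend divided by the resulting change in QALYs. The sketch below is deliberately stylized: it assumes, purely for illustration, that the combined effect of the three elicited quantities can be summarized as a single multiplicative scaling of the QALY effect, which simplifies the underlying model. The scaling values and function name are hypothetical, not results of this exercise.

```python
# Central estimate of health opportunity cost reported by Claxton et al. (GBP per QALY).
CENTRAL_COST_PER_QALY = 12_936


def adjusted_cost_per_qaly(qaly_scaling: float, central: float = CENTRAL_COST_PER_QALY) -> float:
    """Cost per QALY when the same change in spend yields qaly_scaling times the QALYs.

    Stylized, hypothetical assumption: all three elicited adjustments are folded
    into one multiplicative factor on the QALY effect.
    """
    if qaly_scaling <= 0:
        raise ValueError("qaly_scaling must be positive")
    return central / qaly_scaling


# Hypothetical scalings; 1.0 reproduces the assumptions behind the central estimate.
for scaling in (1.0, 1.2, 1.5):
    print(f"QALY effect x{scaling}: £{adjusted_cost_per_qaly(scaling):,.0f} per QALY")
```

Under this simplification, any scaling above 1 lowers the implied cost per QALY below the £12,936 central estimate, whichever of the three uncertainties it comes from.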
In approximately half of the answers, however, individuals indicated that they were unsure whether their answers reflected their views or uncertainties. Open-text feedback did not suggest that these answers lacked face validity; rather, it suggested that the breadth of the questions led to wide distributions. Convening individuals in groups aided the delivery of the standardized training package and maximized expert engagement. However, it also made recruitment difficult: 132 clinical and 84 policy experts were contacted to recruit effective samples of 28 and 25, respectively. Issues with recruitment in elicitation have been recognized elsewhere. 27 As expected, the uncertainty in knowledge expressed by individual experts was large, and group estimates were highly uncertain (as evident from the wide CrIs). In their feedback (Supplementary Appendix 6), experts consistently indicated that heterogeneity within the broad disease areas contributed to the uncertainty expressed in their responses. However, eliciting judgments for “finer” disease definitions, for example, 3-digit ICD codes (of which there are more than 1500), would have been unfeasibly burdensome. Future research could instead provide experts with further information to help them judge which ICD codes matter most within each disease area.
The design of an elicitation exercise requires a number of methodological choices, many of which are specific to the application. This exercise used methods established in the literature, and we have justified the choices made. However, it is important to acknowledge that methods research in this area is limited and that little is known about how different choices affect results. For example, although there is some evidence that consensus methods present a number of challenges inherent to group interaction (see the Methods section), their accuracy relative to individual elicitation is largely unknown.
This article demonstrates that structured elicitation can feasibly be used to explicitly quantify the judgments required to delimit important policy problems, judgments that would otherwise still need to be made implicitly and without the support of relevant experts. In this work, we focused on achieving a relevant estimate of health opportunity costs, a central quantity for policy on health care resource allocation decisions. We have learned that the methods used here (i.e., the elicitation protocol) are applicable in this novel context. For example, the elicitation of the mode and bounds of an 80% CrI was widely understood by the experts, and experts working close to policy valued the summaries of the clinical experts' judgments that were provided to them. We also learned that there are challenges in eliciting policy-relevant but broad-ranging quantities. Such broad-ranging quantities are inherently uncertain, and structured expert elicitation makes this uncertainty explicit.