Are We Improving? Update and Critical Appraisal of the Reporting of Decision Process and Quality Measures in Trials Evaluating Patient Decision Aids

Background In 2014, a systematic review found large gaps in the quality of reporting of measures used in 86 published trials evaluating the effectiveness of patient decision aids (PtDAs). The purpose of this study was to update that review. Methods We examined measures of decision making used in 49 randomized controlled trials included in the 2014 and 2017 Cochrane Collaboration systematic review of PtDAs. Data on development of the measures, reliability, validity, responsiveness, precision, interpretability, feasibility, and acceptability were independently abstracted by 2 paired reviewers. Results Information from 273 measures was abstracted, and 109 of these covered the core domains of decision processes (n = 55) and decision quality including informed choice/knowledge (n = 48) and values-choice concordance (n = 12). Very few studies reported data on the performance and clinical sensibility of measures, with reliability (23%) and validity (6%) being the most common. Studies using new measures were less likely to include information about their psychometric performance compared with previously published measures. Limitations The review was limited to reporting of measures in studies included in the Cochrane review and did not consult prior publications. Conclusion There continues to be very little reported about the development or performance of measures used to evaluate the effectiveness of PtDAs in published trials. Minimum reporting standards have been published, and efforts to require investigators to use them are needed.


Introduction
The International Patient Decision Aid Standards (IPDAS) collaboration recommends that patient decision aids (PtDAs) be evaluated by their impact on 2 core domains: decision process and decision quality. 1 Decision process refers to the extent to which a PtDA helps patients to recognize that a decision needs to be made; feel informed about the options; be clear about what matters most to them in this decision; discuss goals, concerns, and preferences with their health care providers; and be involved in decision making. Decision quality is the extent to which a patient's eventual choice is informed and consistent with his or her values. There are many different measures available for these constructs, with new ones being developed and tailored for specific PtDAs. 2,3 To understand the impact of PtDAs, it is important that trials report on the psychometric properties of the measures used.
Several studies have highlighted issues with reporting of measures for evaluating PtDAs, including variability in definitions, methodology, and validity, and generally poor reporting of psychometrics and development. [4][5][6][7] A 2014 systematic review conducted by several of the authors examined measures used in 86 randomized trials included in the 2011 Cochrane systematic review of PtDAs and found that few provided details on psychometric properties of the individual measures. 8 This work informed the subsequent development and publication of reporting guidelines for evaluations of PtDAs, the SUNDAE checklist. 9 This article updates and extends that previous work by conducting a review of the measures used to evaluate decision making in the new trials added to the 2014 and 2017 Cochrane systematic reviews of PtDAs. 10,11 We focus on the quality of reporting on the development and performance of the outcomes related to decision process and decision quality as recommended by IPDAS.

Methods
This study updates the previous review and follows a similar approach. 8 Pairs of reviewers independently reviewed the full-text manuscripts of the 49 new randomized controlled trials included in the 2014 and 2017 Cochrane systematic reviews of PtDAs, 10,11 determined whether they measured 1 or more of the elements of the ''quality of the decision-making process'' or ''decision quality,'' and abstracted information using standard forms. The reviewers collected information on study context, description of the measure(s) and their administration, the development process (item generation, cognitive testing, pilot studies), psychometric performance (reliability, validity, responsiveness), and clinical sensibility (interpretability, feasibility, and acceptability). Table 1 includes some of the abstracted data fields and provides examples of evidence from our past review. 8 The supplemental file includes details on the studies included in this review and the full data extraction tool.
A measure was considered new if there was no cited prior publication and/or it was not a known, named scale. Articles that cited a reference with respect to any of these properties (e.g., ''The Decisional Conflict Scale has been shown to be valid and reliable'' 16 ) were given credit for reporting those elements. However, we did not consult cited sources to confirm that information or obtain additional unreported information. The abstraction was limited to the details provided within the published trial papers, based on how a reader might evaluate the measures as described by the trial authors. Frequent calls with the entire coding group were held throughout the data abstraction process to ensure consistency. Discrepancies between reviewers were initially discussed by the paired reviewers, and most were resolved after discussion. The lead authors (K.S. and R.T.) adjudicated unresolved disagreements. The data abstracted from the studies are available from the corresponding author by request.

Analysis
We classified the measures and assessed the presence of reporting for key elements of measure development, psychometric performance, and clinical sensibility. We examined reporting for measures of knowledge, values-choice concordance, and decision process. We did not separate out subelements of the decision process (e.g., feel informed), as most measures included multiple elements and did not report separately.

Results
Of the 49 new trials, 44 (90%) measured at least 1 aspect of decision quality or decision process. Most studies measured decision process and/or knowledge, whereas only a minority measured values-choice concordance (24%, 12/49 studies).

Table 1 notes: (b) Includes internal consistency reliability (e.g., Cronbach's alpha, Kuder-Richardson coefficient), test-retest reliability, and interrater reliability (e.g., percentage agreement, kappa coefficient, intraclass correlation coefficient). (c) Includes content validity (e.g., Content Validity Index), criterion-related validity (e.g., correlations to demonstrate concurrent or predictive validity), and construct validity (e.g., factor analysis to demonstrate predicted convergence/divergence of constructs and/or structural invariance of the measure, discriminant analysis, known-groups analysis). (d) Could be inferred from patterns of missing data or low response rates.
We abstracted 273 reported measures related to decision making. Of these, 109 covered 1 or more core constructs of the decision process (n = 55) or decision quality, including knowledge (n = 48) or values-choice concordance (n = 12; Table 2). Of note, 6 measures covered both knowledge and concordance. Other common types of outcomes included actual choice (n = 40), preference or preferred choice (n = 25), satisfaction with decision making or the chosen option (n = 17), depression and/or anxiety (n = 14), adherence (n = 8), and decision regret (n = 7).

Discussion
Decision process and quality measures are critical to evaluating the effectiveness of PtDAs. 1 This brief report updates a previous review. 8 Reporting of the development process for new measures was poor. Generally speaking, previously published measures were more likely to have some reporting of psychometrics than new measures (41% v. 19%); however, this largely reflects strong reporting of the Decisional Conflict Scale (DCS). 17 The DCS was used in more than half of the trials (72/135, 53%), often with detailed descriptions of performance.
Most new trials include decision-making evaluation measures (90%), which is similar to the previous review (88%). 8 Reliability reporting was also similar (23% v. 21%), whereas validity reporting was worse (6% v. 16%) in these new studies. (Data on the development process were missing for n = 3.) A common misperception is that validity and reliability are properties of the survey instrument, when in reality they are properties of the data and their interpretation (which includes understanding the administration, setting, sample, and analysis procedures). 18 This underscores the importance of reporting relevant information on psychometric performance for each study and each use of an instrument or measure. Detailed reporting of psychometric properties is important to allow appropriate interpretation of results, improve our understanding of the impact of PtDAs on decision process and outcomes, and support replication and synthesis of findings. 19 There are many excellent resources that describe how to assess the adequacy of psychometric evidence; we recommend the text by Waltz et al. 20

The SUNDAE checklist was developed in 2018 to support complete and transparent reporting of PtDA evaluation studies, including the psychometric properties of the measures used. 9 Although the checklist could not have affected this update, which included trials published up to 2017, it may improve reporting in the future, particularly if journals adopt it.

Few studies included details on the clinical sensibility of the measures. This information is important to allow appropriate interpretation of the results and to support successful implementation of trial findings into routine clinical practice. Patient-reported measures provide insight into the outcomes and experience of care from the patients' perspective and are valuable for monitoring quality of care and outcomes. [21][22][23][24] However, without information on the acceptability, feasibility, and interpretability of the measures, their implementation into practice may be hindered.
Our study has several limitations. First, we focused on randomized controlled trials included within the Cochrane review, although we would expect these to be the highest-quality evaluations. Second, we did not review the cited sources of previously published measures; hence, our findings reflect only the quality of the reporting of measures, not the quality of the measures themselves. Third, it is possible that developers reported more details about the measures elsewhere, and this would not have been captured in our review.
Several questions remain to be answered. What other measures should be used to evaluate PtDAs, if any (e.g., health outcomes, cost-effectiveness, potential harms), and when should they be measured? What components of PtDAs are core to effectiveness? Are different measures needed for disadvantaged patients (e.g., individuals with low literacy or low incomes)? Increasingly, 1 or more of the options in situations covered by PtDAs involves a large behavior-change component (e.g., surgery versus diet and exercise for obesity/weight management). In what ways, if at all, does this behavior-change component change our strategies for the evaluation of PtDAs (e.g., do we need to assess levels of self-efficacy and motivation in addition to knowledge and concordance)? How do we support decisions in which an option is considered of low value (e.g., prostate-specific antigen screening for certain groups)?
There are also theoretical issues. A growing body of research suggests that defining what a good medical decision is, and how to measure it, is more complicated than is often assumed in theoretical decision-making frameworks. 25 For example, real-life decision making is influenced by interpersonal factors, structural constraints, and affect/emotions. This argues for considering how these factors (and others) contribute to the definition of good medical decision making and for a tailored approach to the measurement of decision quality.
There continues to be very little reported about the development or performance of measures used to evaluate the effectiveness of PtDAs within published trials. Minimum reporting standards (SUNDAE) have been published, and wide use should be promoted to support transparent and accurate reporting and clearer interpretation of the outcomes of PtDA trials.