Taking a Disagreeing Perspective Improves the Accuracy of People’s Quantitative Estimates

Many decisions rest on people’s ability to make estimates of unknown quantities. In these judgments, the aggregate estimate of a crowd of individuals is often more accurate than most individual estimates. Remarkably, similar principles apply when multiple estimates from the same person are aggregated, and a key challenge is to identify strategies that improve the accuracy of people’s aggregate estimates. Here, we present the following strategy: Combine people’s first estimate with their second estimate, made from the perspective of someone they often disagree with. In five preregistered experiments (N = 6,425 adults; N = 53,086 estimates) with populations from the United States and United Kingdom, we found that such a strategy produced accurate estimates (compared with situations in which people made a second guess or when second estimates were made from the perspective of someone they often agree with). These results suggest that disagreement, often highlighted for its negative impact, is a powerful tool in producing accurate judgments.

not feasible for a single person to collect the estimates of multiple individuals. Remarkably, research suggests that the same principles underlying the wisdom of crowds also apply when multiple estimates from the same person are aggregated, a phenomenon known as the "wisdom of the inner crowd" (Herzog & Hertwig, 2009; Van Dolder & Van den Assem, 2018; Vul & Pashler, 2008).
It is not clear from the outset why aggregating multiple estimates from the same person would be beneficial. If a person's first estimate represents their best guess, then any other estimate would simply add noise (Hourihan & Benjamin, 2010; Vul & Pashler, 2008). An alternative account based on probabilistic representations, however, posits that averaging estimates from the same person cancels out the errors that permeate people's judgments. According to this account, people's initial estimates represent samples drawn from an internal distribution of possible estimates, where second estimates are resampled guesses from that same distribution (Vul & Pashler, 2008; Wallsten et al., 1997). When second, resampled estimates are sufficiently diverse, averaging increases accuracy by canceling out errors across estimates (Ariely et al., 2000; Herzog & Hertwig, 2009; Keck & Tang, 2020; Litvinova et al., 2020).
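The role of diversity in this account can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that both estimates share the same bias and the same error standard deviation (`bias`, `sd`) and that their errors correlate at `rho`; none of these names or simplifications come from the studies reported here.

```python
def expected_mse_first(bias: float, sd: float) -> float:
    """Expected squared error of a single estimate: bias^2 + variance."""
    return bias ** 2 + sd ** 2


def expected_mse_average(bias: float, sd: float, rho: float) -> float:
    """Expected squared error of the mean of two estimates.

    Assumes both estimates have the same bias and error SD (an
    illustrative simplification) and that their errors correlate at rho.
    The variance of the mean of two such errors is sd^2 * (1 + rho) / 2.
    """
    return bias ** 2 + sd ** 2 * (1 + rho) / 2


# Perfectly correlated errors: averaging gains nothing.
same = expected_mse_average(bias=0.0, sd=10.0, rho=1.0)
# Uncorrelated errors: the variance part of the error is halved.
halved = expected_mse_average(bias=0.0, sd=10.0, rho=0.0)
```

Under these assumptions, the lower the correlation between the two errors, the larger the expected gain from averaging, whereas shared bias is untouched by averaging.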
With such a powerful tool available to individuals, a key challenge is to identify strategies that can help improve the accuracy of people's aggregate estimates (Herzog & Hertwig, 2014a). Research so far agrees that the inner crowd falters when people anchor too heavily on their first estimate when generating a second guess, thereby reducing diversity and independence (Herzog & Hertwig, 2014a; Vul & Pashler, 2008). At least two methods have been applied to counteract this. The first relies on the passage of time. For example, the benefits of aggregation tend to be higher when a time delay is introduced between the two estimates (Steegen et al., 2014; Van Dolder & Van den Assem, 2018; Vul & Pashler, 2008). In these cases, the passage of time effectively deanchors people from their first estimate (presumably because they forget their initial estimate), thereby improving the diversity and independence of both estimates. A second method to increase diversity and independence is to rely on the mind's ability to construct alternative, opposing realities (Herzog & Hertwig, 2009, 2014a, 2014b). A demonstrated way to do this has been through "dialectical bootstrapping," in which people are prompted to base their second estimate on different assumptions and considerations (Herzog & Hertwig, 2009). These dialectical estimates ideally result in errors with different signs relative to first estimates, and there are different techniques to elicit such dialectical estimates. One technique, based on the "consider-the-opposite" strategy (Lord et al., 1984), instructs people to actively question the accuracy of their first estimate when generating a second guess. This technique has been shown to increase the accuracy of people's aggregate estimate by getting the same person to generate first and second estimates that are more diverse and independent (Herzog & Hertwig, 2009, 2014b).
In the present research, we similarly relied on the mind's ability to construct opposing realities by prompting people to complement their initial estimate with a second estimate made from the perspective of someone they often disagree with. Perspective taking refers to people's ability to consider situations and events from the viewpoint of others (Piaget, 1932/1965). It has been associated with many positive outcomes, such as altruistic behaviors, decreased stereotype expressions, and increased creativity (Batson et al., 1997; Galinsky & Moskowitz, 2000; Hoever et al., 2012). However, according to the principles of within-person aggregation, simply getting people to take the perspective of others would not be enough. What is needed is to add an estimate from the perspective of someone whose views are substantially different, in other words, to create a diverse inner crowd. To do this, we suggest using an oft-encountered component of people's interaction with others: disagreement. More specifically, as a viable method to obtain more diverse estimates, we propose to combine people's initial estimate with their second estimate made from the perspective of someone they often disagree with.

Statement of Relevance
In today's polarized society, disagreement is associated with conflict and division, but are there also benefits to disagreement? By utilizing people's ability to take the perspective of others, we propose that disagreement is a powerful tool for producing accurate estimates. In five experiments, people made estimates of unknown quantities from various perspectives. Following principles of within-person aggregation, we found that aggregating people's first estimate with their second estimate, made from the perspective of someone they often disagree with, produced accurate estimates. In explaining this accuracy, we found that taking a disagreeing perspective prompts people to consider estimates they normally would not consider to be viable options, resulting in first and second estimates that are more diverse and independent (and by extension more accurate when aggregated). Together, these results underscore the importance of perspective taking and disagreement as strategies to improve the accuracy of people's quantitative estimates.

Disagreement is often decried as an undesirable component of people's interactions with others. In today's polarized society, disagreement has been associated with conflict, division, and misinformation (Kennedy & Pronin, 2008; Reeder et al., 2005; Sunstein, 2002). However, although disagreement is generally undesirable, research in group decision-making indicates that it may actually be beneficial when groups address complex problems, such as making estimates of unknown quantities or events (de Oliveira & Nisbett, 2018; Hong & Page, 2004; Mutz, 2006; Page, 2008). These effects occur because disagreeing individuals tend to produce more diverse estimates, and by extension more diverse errors, which cancel out across group members when averaged (Page, 2008). It is precisely this aspect of disagreement that we rely on in our pursuit to foster more diverse estimates from the same individual. More specifically, we surmise that just as disagreement between different individuals is beneficial for the wisdom of crowds, so too, through perspective taking, will it be beneficial for the wisdom of the inner crowd.
To understand the benefits of disagreement, we tested the hypothesis (in Experiment 3) that taking a disagreeing perspective leads to two distinct observations. First, from a disagreeing perspective, people are more likely to consider estimates that are strikingly different from their own guesses, thereby opening the sampling space of possible second estimates. Second, people are more likely to adopt these different estimates as their second estimates when viewing problems from a disagreeing perspective, leading to first and second estimates that are more diverse and independent. These conjectures follow from earlier work on anchoring showing that people typically avoid making second estimates that are strikingly different from prior estimates or anchors (Epley & Gilovich, 2006; Lewis et al., 2019; Tversky & Kahneman, 1992). Making an estimate from a disagreeing perspective is expected to attenuate this tendency, given that disagreeing others (almost by default) consider and adopt estimates entirely different from one's own. However, although taking a disagreeing perspective is generally beneficial, the final experiment (Experiment 4) identified a situation in which taking a disagreeing perspective backfired, undermining the benefit of averaging (i.e., in situations in which second estimates were likely to be made in the wrong direction).

The Present Research
For easier reading, we first present general methodological information that concerns all five experiments. Ethical approval for all experiments was obtained from the ethical review board at Eindhoven University of Technology (Reference No. ERB2020IEIS29). For all experiments, we report how we determined the sample size, all data exclusions (if any), all manipulations, and all measures. The questions used in all experiments can be found in the Supplemental Material available online. Data, code, and materials are publicly available on OSF at https://osf.io/qsxp8/. All experiments' hypotheses, designs, and main analyses were preregistered (see the Open Practices section for links).
Sample-size estimation for all experiments was based on a priori power analyses using G*Power (Version 3.1.9.4; Faul et al., 2007). For Experiments 1a, 1b, 2, and 3, analyses determined that 410 participants per condition would be necessary to detect a Cohen's d effect size of 0.30 with 99% power and that 394 participants per condition would be necessary to detect a Cohen's d effect size of 0.20 with more than 80% power. For Experiment 4, analyses determined that 290 participants per condition would be necessary to detect a Cohen's d effect size of 0.30 with 95% power and that 253 participants per condition would be necessary to detect a Cohen's d effect size of 0.25 with 80% power. Alpha was set at .05. For all experiments, we stopped data collection when we reached the predetermined sample size. Following previous studies on the inner crowd, to verify the accuracy of people's estimates, we relied on the mean square error, obtained by subtracting the true answer from each estimate, squaring the difference, and then averaging.
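As a rough plausibility check on these sample sizes, the following Python sketch uses the textbook normal approximation for a two-sided, independent-samples comparison. This is an assumption for illustration: it is not G*Power's exact noncentral-t computation, and the function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, power: float, alpha: float = 0.05) -> int:
    """Approximate participants per condition for a two-sided,
    two-group comparison, using the normal approximation
    n = 2 * ((z_{1 - alpha/2} + z_{power}) / d)^2.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


# The four (effect size, power) pairs mentioned in the text.
approximations = {
    (0.30, 0.99): n_per_group(0.30, 0.99),
    (0.20, 0.80): n_per_group(0.20, 0.80),
    (0.30, 0.95): n_per_group(0.30, 0.95),
    (0.25, 0.80): n_per_group(0.25, 0.80),
}
```

Because the normal approximation ignores the small-sample correction of the t distribution, it should land slightly below the reported values rather than reproduce them exactly.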
For analysis, we used mixed-effect models, which allowed us to make more generalizable claims across a wide range of participants and questions by employing random intercepts for participants and questions (Judd et al., 2012). We fitted the models using lme4 (Version 1.1-27.1; Bates et al., 2015) and produced p values using the Satterthwaite approximations for degrees of freedom from lmerTest (Version 3.1-3; Kuznetsova et al., 2017). Because there is little agreement on how to calculate effect sizes for mixed models, we report classic Cohen's d or d_z effects calculated from the t values of the fixed-effect results obtained in the models (Cohen, 1988). For comparison of correlation coefficients between experimental conditions, we took a two-step approach. Because participants responded to multiple questions twice, we first calculated the correlation between the errors of first and second estimates (i.e., the true answer subtracted from an estimate) for each participant. We then compared these Pearson's r values across experimental conditions using independent-samples t tests and calculated Cohen's d effect sizes.
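The two-step correlation approach and the conversion of t values into effect sizes can be sketched in Python as follows. The conversion formulas shown (d = t·sqrt(1/n1 + 1/n2) for independent groups and d_z = t/sqrt(n) for paired data) are the standard textbook ones and are assumed here; the text does not spell out which formulas were used, and all function names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def error_correlation(first, second, truth):
    """Step 1: one participant's correlation between the errors of
    first and second estimates (error = estimate - true answer)."""
    e1 = [f - t for f, t in zip(first, truth)]
    e2 = [s - t for s, t in zip(second, truth)]
    return pearson_r(e1, e2)


def cohens_d_from_t(t, n1, n2):
    """Assumed conversion for two independent groups."""
    return t * sqrt(1 / n1 + 1 / n2)


def cohens_dz_from_t(t, n):
    """Assumed conversion for paired (within-participant) data."""
    return t / sqrt(n)
```

In step 2, the per-participant values returned by `error_correlation` would be collected per condition and compared with an independent-samples t test.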

Method
In Experiment 1a, participants made two weight estimates of 10 objects shown in pictures (see Table S1 in the Supplemental Material). In Experiment 1b, we used a different estimation task: Participants made two estimates of six questions on a scale ranging from 0% to 100%. The questions' true answers were obtained from various online sources (e.g., Wikipedia for Experiment 1a and The World Factbook, Central Intelligence Agency, 2020, 2021, for Experiment 1b). For Experiment 1a, we recruited 900 participants using Amazon Mechanical Turk (MTurk). Following the preregistration plans, we excluded participants who failed the instructional check and those who said they looked up the answers, leaving a final sample of 880 U.S. participants (age: Mdn = 36 years, interquartile range [IQR] = 16 years; 51% female). For Experiment 1b, we recruited 1,000 participants using Prolific. Excluding those who failed the instructional check and those who looked up the answers resulted in a final sample of 894 UK participants (age: Mdn = 35 years, IQR = 20 years; 69% female). After making their first estimate for all questions, half of the participants were told to make a second guess, and the other half were instructed to make their second estimate from the perspective of a friend they often disagree with.
Participants were instructed not to look up the true answers during the study. They were randomly presented with the questions in two estimation stages (histograms for the distribution of participants' answers on both estimates for all five experiments can be accessed at https://osf.io/q3tfh/). Participants were not told at the beginning of the experiment that they would be asked to make an additional, second estimate. In the first stage, participants simply provided their own estimates to the questions. The instructions for the second estimation stage were different depending on the condition. For the self-perspective condition, participants were told, "We will now ask you to provide a second guess at the answer to each of the [ten/six] questions you were asked in the first session. These answers should not be the same as your previous answers: these should reflect your 'second guess'."
For the disagreeing-perspective condition, participants were told, "Now picture a friend whose views and opinions are very different from yours. To illustrate, when discussing politics, you often find yourself disagreeing on various issues. How would he or she answer these [ten/six] questions? Please answer these questions again, but now as this friend."
After responding to the questions, participants were asked to provide their age and gender. In addition, they were presented with a manipulation-check question instructing them to choose a particular option in a multiple-choice array and a question asking them whether they looked up any of the answers to the questions.

Results
Correlations. Comparing the two experimental conditions, we found that our instructions led to lower correlations when second estimates were made from a disagreeing perspective (Experiment 1a: mean r_disagreeing = .54 vs. mean r_self = .71; Experiment 1b: mean r_disagreeing = .34 vs. mean r_self = .73). In both experiments, these two correlation coefficients were significantly different (Experiment 1a: d = 0.44, Experiment 1b: d = 0.98; both ps < .001), indicating that participants in the disagreeing-perspective condition produced more diverse estimates and errors compared with participants in the self-perspective condition (see Figs. 1a and 1b; scatterplots for each question separately for all five experiments can be accessed at https://osf.io/q3tfh/).

Inner-crowd effects. For the inner crowd to be more accurate, the aggregate of both estimates should have a lower error than a person's first estimate alone. Taking into account both conditions (overall) and looking at the self- and disagreeing-perspective conditions separately, we found an inner-crowd effect in Experiments 1a and 1b (see Table 1 for summary statistics). The average of both estimates had a lower mean square error than the first and second estimates alone, respectively (for descriptive statistics, see Tables S2a and S2b in the Supplemental Material).
Benefit of averaging. Would participants in the disagreeing-perspective condition benefit more from averaging their estimates than participants in the self-perspective condition? To test this, we calculated the benefit of averaging by subtracting the square error of average estimates from the square error of first estimates (similar procedures have been used before; Herzog & Hertwig, 2009; Steegen et al., 2014; Vul & Pashler, 2008). The higher this number, the larger the benefit of averaging (i.e., the more accurate the inner crowd). Overall, the results indicated that in both Experiments 1a and 1b, participants in the disagreeing-perspective condition indeed benefited more from averaging their estimates than participants in the self-perspective condition (Experiment 1a: d = 0.16, p = .02; Experiment 1b: d = 0.18, p = .01).
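The benefit-of-averaging measure described above, together with the mean square error used throughout, can be written as a short Python sketch. The function names and the example numbers are illustrative assumptions, not values from the experiments.

```python
def squared_error(estimate: float, truth: float) -> float:
    """Squared difference between an estimate and the true answer."""
    return (estimate - truth) ** 2


def mean_squared_error(estimates, truths) -> float:
    """Mean square error across a set of estimates."""
    return sum(squared_error(e, t) for e, t in zip(estimates, truths)) / len(estimates)


def benefit_of_averaging(first: float, second: float, truth: float) -> float:
    """Squared error of the first estimate minus squared error of the
    average of both estimates. Positive values mean averaging helped."""
    average = (first + second) / 2
    return squared_error(first, truth) - squared_error(average, truth)


# Hypothetical object weighing 100 units.
helped = benefit_of_averaging(first=80, second=130, truth=100)  # average = 105
hurt = benefit_of_averaging(first=80, second=60, truth=100)     # average = 70
```

In the first hypothetical case the second estimate falls on the other side of the true answer and averaging reduces the error; in the second it moves further away on the same side and averaging increases it.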
Bracketing. To more concretely test whether people in the disagreeing-perspective condition benefited more from averaging, we looked at bracketing rates across conditions. Bracketing is a key component underpinning the benefit of aggregating multiple estimates (Larrick & Soll, 2006). It refers to the observation that if two estimates are on the opposite sides of the true answer, thus bracketing it (i.e., one overestimating the true answer and the other underestimating it), aggregating them will typically result in a more accurate average estimate (Larrick & Soll, 2006; Soll & Larrick, 2009). Consequently, for each question, we verified whether the question's true answer was bracketed by the two estimates. As expected, the bracketing rate was much higher in the disagreeing-perspective condition at 29% (Experiment 1a) and 38% (Experiment 1b), compared with the self-perspective condition, in which 19% (Experiment 1a) and 20% (Experiment 1b) of people's estimates bracketed the questions' true answers (Experiment 1a: d = 0.56, p < .001; Experiment 1b: d = 0.80, p < .001).
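A bracketing check of this kind can be sketched in Python as follows. The sketch assumes that an estimate exactly equal to the true answer does not count as bracketing; the text does not state how such ties were handled, and the function names are hypothetical.

```python
def brackets(first: float, second: float, truth: float) -> bool:
    """True if the two estimates fall on opposite sides of the true answer.

    The product of the two signed errors is negative only when one
    estimate overestimates and the other underestimates. An estimate
    exactly equal to the truth yields a product of zero and is treated
    as not bracketing (an assumption).
    """
    return (first - truth) * (second - truth) < 0


def bracketing_rate(firsts, seconds, truths) -> float:
    """Proportion of questions whose true answer is bracketed."""
    hits = sum(brackets(f, s, t) for f, s, t in zip(firsts, seconds, truths))
    return hits / len(truths)


# Hypothetical participant answering three questions with true answer 100.
rate = bracketing_rate([80, 80, 120], [130, 90, 95], [100, 100, 100])
```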

Experiment 2
People who made a second estimate from the perspective of a person they often disagree with benefited more from averaging than people who simply made a second guess. Experiment 2 provided an important extension. Specifically, we included a third experimental condition in which participants were instructed to take the perspective of a person they often agree with. We included this condition to underscore the need to take a disagreeing perspective to improve the accuracy of people's inner crowds.

Method
The procedure of this experiment was similar to that of Experiment 1b. However, we added an additional condition (i.e., the agreeing-perspective condition) in which the instructions for the second estimation stage were, "Now picture a friend whose views and opinions are very similar to yours. To illustrate, when discussing politics, you often find yourself agreeing on various issues. How would he or she answer these six questions? Please answer these questions again, but now as this friend."
We recruited 1,425 participants using MTurk. After excluding those who failed the instructional check and those who said they looked up the answers, we obtained a final sample of 1,389 U.S. participants (age: Mdn = 35 years, IQR = 16 years; 44% female).

Results
Correlations. The estimates' errors were highly correlated in the self-perspective and the agreeing-perspective conditions (mean r_self = .73 vs. mean r_agreeing = .74). This correlation was much lower in the disagreeing-perspective condition (mean r_disagreeing = .32; see Fig. 1c). Comparing these correlation coefficients, we found that participants in the disagreeing-perspective condition produced more diverse errors than participants in both the self-perspective condition (d = 0.99) and agreeing-perspective condition (d = 1.00; both ps < .001), whereas there was no difference in the diversity of errors between the self- and agreeing-perspective conditions (d = 0.02, p = .81).
Inner-crowd effects. There was again an inner-crowd effect overall (i.e., across all three conditions) and in the three conditions separately (see Table 2; for descriptive statistics, see Table S3 in the Supplemental Material).
Benefit of averaging. There was no difference between the self- and agreeing-perspective conditions with regard to benefit of averaging (d = 0.03, p = .61). Importantly, participants in the disagreeing-perspective condition again benefited more from averaging both estimates, compared with the self- and agreeing-perspective conditions (d = 0.18, p = .01, and d = 0.21, p = .001, respectively).
Bracketing. With 21% and 20% of people's estimates bracketing the questions' true answers, there was no difference in bracketing rates between the self- and agreeing-perspective conditions (d = 0.08, p = .25). Crucially, however, the bracketing rate was again higher in the disagreeing-perspective condition: 39% of people's estimates bracketed the questions' true answers, compared with both the self-perspective (d = 0.85, p < .001) and agreeing-perspective (d = 0.90, p < .001) conditions.

Experiment 3
In Experiment 3, we tested the proposed mechanism explaining our observation of more diversity and independence when second estimates are made from a disagreeing perspective. Earlier work on inner crowds suggests that people typically anchor too heavily on first estimates when generating a second guess, thereby not producing diverse enough estimates and errors (Herzog & Hertwig, 2009; Vul & Pashler, 2008). Making an estimate from a disagreeing perspective was expected to attenuate this tendency, given that disagreeing others

Method
The design and procedure were similar to those of Experiment 2. However, before making their second estimate, participants in the self-perspective condition were asked, "What is the most extreme estimate (either extremely high or extremely low) that you would consider as second guess to this question?" In the agreeing- and disagreeing-perspective conditions, participants were asked, "What is the most extreme estimate (either extremely high or extremely low) that your friend would consider as answer to this question?" We recruited 1,500 participants using MTurk. After excluding those who failed the instructional check and those who said they looked up the answers, we obtained a final sample of 1,426 U.S. participants (age: Mdn = 36 years, IQR = 17 years; 48% female).

Results
Correlations. The estimates' errors were again highly correlated in the self- and agreeing-perspective conditions (mean r_self = .74 vs. mean r_agreeing = .72). This correlation was much lower in the disagreeing-perspective condition (mean r_disagreeing = .46; see Fig. 1d). Comparing these correlation coefficients, we found that participants in the disagreeing-perspective condition again produced more diverse errors than participants in both the self-perspective condition (d = 0.78) and agreeing-perspective condition (d = 0.71; both ps < .001), whereas there was no difference in error diversity between the self-perspective and agreeing-perspective conditions (d = 0.07, p = .26).
Inner-crowd effects. There was an inner-crowd effect overall (i.e., across all three conditions) and in the three conditions separately (see Table 2; for descriptive statistics, see Table S4 in the Supplemental Material).
Benefit of averaging. Participants in the agreeing-perspective condition benefited slightly more from averaging than did those in the self-perspective condition (d = 0.13, p = .04). Importantly, participants in the disagreeing-perspective condition again benefited more from averaging both estimates, compared with the self-perspective condition (d = 0.15, p = .02). However, there was no difference between the agreeing- and disagreeing-perspective conditions (d = 0.04, p = .55), although the effect was in the expected direction: The benefits of averaging were numerically higher in the disagreeing-perspective condition.
Bracketing. With 20% and 21% of people's estimates bracketing the questions' true answers, there was no difference in bracketing rates between the self- and agreeing-perspective conditions (d = 0.05, p = .42). Crucially, however, the bracketing rate was again higher in the disagreeing-perspective condition: 33% of people's estimates bracketed the questions' true answers, compared with both the self-perspective condition (d = 0.64, p < .001) and the agreeing-perspective condition (d = 0.61, p < .001).
Extreme-estimate analysis. To test the proposition that taking a disagreeing perspective prompts people to consider more extreme estimates as possible answers to a question, we computed the (absolute) difference score between each participant's first estimate and the most extreme estimate that they (or their friend) would consider as an answer. As expected, there was no difference between participants in the agreeing- and self-perspective conditions (d = 0.04, p = .51). Importantly, however, participants in the disagreeing-perspective condition considered far more extreme estimates as possible answers than participants in either the self-perspective condition (d = 0.41, p < .001) or the agreeing-perspective condition (d = 0.46, p < .001). Moreover, to test whether participants in the disagreeing-perspective condition would also be more inclined to adopt these extreme estimates as their second answers, we computed the (absolute) difference score between each participant's second estimate and the most extreme estimate. The lower this number, the closer the second estimate was to the most extreme estimate. As expected, there was no difference between participants in the agreeing- and self-perspective conditions (d = 0.12, p = .06). Importantly, participants in the disagreeing-perspective condition made second estimates much closer to the extreme estimate than either the participants in the self-perspective condition (d = 0.29, p < .001) or the agreeing-perspective condition (d = 0.14, p = .03; for descriptive statistics, see Table S5 in the Supplemental Material). The willingness of participants to adopt these extreme estimates as answers is noteworthy, given people's general propensity to avoid making extreme judgments (Lewis et al., 2019). This aversion seems to dissipate when second estimates are made from the viewpoint of disagreeing others.
Interestingly, even if people made second estimates equally close to their most extreme estimate from a disagreeing perspective, they would still produce more diverse estimates, given that these extreme estimates are generally more extreme. Overall, these results underscore the conjecture that taking a disagreeing perspective prompts people to consider and adopt second estimates that are strikingly different from their initial estimate, rendering a set of estimates that is more diverse and independent.

Experiment 4
The final experiment identified a situation in which taking a disagreeing perspective backfired. Specifically, this was expected in situations where a question's true answer lies close to the lower or upper end of a scale (e.g., if the true answer is 2% or 98% on a scale from 0% to 100%) and when a person's initial estimate is close to this answer. For example, imagine being asked the following question: "What percentage of China's population identifies as Christian?" The true answer to this question is 5.1%, and if you are like most people, your first estimate probably leaned toward the lower end of the scale (say your first estimate was 10%). Given the position of the question's true answer and your first estimate, your second estimate is likely (in general) to move away from the answer toward the opposite side of the scale (Juslin et al., 2000), effectively hurting the accuracy of your average estimate. Importantly, such a movement is expected to be especially harmful when second estimates are made from a disagreeing perspective because, given people's propensity to adopt more extreme estimates from such a perspective (see Experiment 3), these estimates move away from the true answer to a much greater extent (resulting in an average estimate that is far worse than the initial estimate).
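To make this example concrete, the arithmetic can be worked through with the true answer (5.1%) and first estimate (10%) given above. The two second estimates used here, 15% for a simple second guess and 40% for a disagreeing perspective, are hypothetical values chosen only for illustration.

```latex
\begin{align*}
\text{First estimate:}\quad & (10 - 5.1)^2 = 24.01 \\
\text{Second guess of 15:}\quad & \bar{X} = \tfrac{10 + 15}{2} = 12.5,
  \quad (12.5 - 5.1)^2 = 54.76,
  \quad \text{benefit} = 24.01 - 54.76 = -30.75 \\
\text{Disagreeing estimate of 40:}\quad & \bar{X} = \tfrac{10 + 40}{2} = 25,
  \quad (25 - 5.1)^2 = 396.01,
  \quad \text{benefit} = 24.01 - 396.01 = -372.00
\end{align*}
```

In both hypothetical cases averaging hurts, but the more extreme second estimate makes the loss roughly 12 times larger.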

Method
We gathered data in two waves. We preregistered our hypotheses and analysis plan for the second wave. Because the procedures in the two waves were identical, we decided to combine them (analyzing the data separately yielded similar results, which can be accessed at https://osf.io/ewpyq/). The procedure of this experiment was similar to that of Experiment 2 in all but two respects. First, we added an additional six questions to make 12 questions in total. Second, we categorized the questions according to where the true answer fell, that is, whether the true answer was in the middle of the scale or the end of the scale (0%-10% or 90%-100%). Participants thus made two estimates about a set of 12 questions, all of which had a true answer that was in the 0% to 100% range. Crucially, half of the questions' true answers were close to the lower or upper end of the scale, from 0% to 10% and 90% to 100%. For the other half of the questions, true answers were relatively far from the end of the scale (e.g., 58%). Combining the two data-wave collections, we recruited 1,889 participants using MTurk. As in the prior experiments, we excluded those who failed the instructional check and those who said they looked up the answers, leaving a final sample of 1,836 U.S. participants (age: Mdn = 36 years, IQR = 17 years; 51% female).

Results
Correlations. Correlations between the estimates' errors in the self- and agreeing-perspective conditions were again high (mean r_self = .77, mean r_agreeing = .79). This correlation was lower in the disagreeing-perspective condition (mean r_disagreeing = .57; see Fig. 1e). Participants in the disagreeing-perspective condition produced more diverse errors than participants in both the self-perspective (d = 0.83) and agreeing-perspective (d = 0.96) conditions (both ps < .001). The difference between the self- and agreeing-perspective conditions was also significant (d = 0.15, p = .01). Overall, the disagreeing-perspective condition again produced more diverse errors.
Inner-crowd effects. Taking into account both types of questions (mid-scale and end-scale questions) and all three conditions, we did not find an inner-crowd effect (see Table 3). Importantly, and in line with our proposal, results showed that the perspective-taking instructions had a markedly different impact when the mid-scale and end-scale questions were considered separately. For the mid-scale questions, there was an inner-crowd effect similar to those in the previous experiments. However, when looking at the end-scale questions for the disagreeing-perspective condition, we found that the average of both estimates had a much higher error than the first estimate alone (for descriptive statistics, see Table S6 in the Supplemental Material).

Benefit of averaging.
For the mid-scale questions, we found no difference between the self-and agreeingperspective conditions (d = 0.05, p = .39), whereas the benefit of averaging was again higher for participants in the disagreeing-perspective condition compared with the self-perspective (d = 0.28, p < .001) and agreeingperspective (d = 0.24, p < .001) conditions. Thus, for the mid-scale questions, the results echo those obtained in the previous experiments. For the end-scale questions, there was no difference between the self-and agreeingperspective conditions (d = 0.05, p = .39). However, in the disagreeing-perspective condition, averaging was actually much more disadvantageous than in the self-perspective (d = −0.41, p < .001) and agreeing-perspective (d = −0.40, p < .001) conditions.
Bracketing. For the mid-scale questions, with 23% and 21% of the estimates bracketing the questions' true answers, there was slightly more bracketing in the selfperspective than the agreeing-perspective condition (d = 0.13, p = .03). Importantly, as expected, the degree of bracketing was again higher in the disagreeing-perspective condition: 37% of the estimates bracketed the questions' true answers, compared with both the self-perspective (d = 0.65, p < .001) and agreeing-perspective (d = 0.78, p < .001) conditions.
Focusing on the end-scale questions, we generally saw lower rates of bracketing. With 13% and 11% of the estimates bracketing the questions' true answers, there was slightly more bracketing in the self-perspective than the agreeing-perspective condition (d = 0.15, p = .01). The degree of bracketing was again higher in the disagreeing-perspective condition: 19% of estimates bracketed the questions' true answers, compared with both the self-perspective (d = 0.34, p < .001) and agreeing-perspective (d = 0.48, p < .001) conditions.
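To make the two reported quantities concrete, the following minimal Python sketch computes a bracketing rate and a benefit of averaging for a handful of observations. It rests on assumptions that are not stated in this section: two estimates are taken to bracket the true answer when they fall on opposite sides of it, and error is measured as squared error. The function names and the toy data are hypothetical and are not taken from the experiments.

```python
def is_bracket(truth, first, second):
    """True when the two estimates fall on opposite sides of the true answer."""
    return (first - truth) * (second - truth) < 0


def bracketing_rate(observations):
    """Share of (truth, first, second) triples whose estimates bracket the truth."""
    return sum(is_bracket(*obs) for obs in observations) / len(observations)


def benefit_of_averaging(observations):
    """Mean squared error of the first estimates minus that of the averages.

    Positive values mean that averaging helped; negative values mean it hurt.
    Squared error is an assumption of this sketch.
    """
    gains = [(f - t) ** 2 - ((f + s) / 2 - t) ** 2 for t, f, s in observations]
    return sum(gains) / len(gains)


# Hypothetical toy data: (true answer, first estimate, second estimate).
toy = [(30, 20, 50), (30, 20, 25), (30, 20, 80), (30, 40, 45)]

print(bracketing_rate(toy))       # 0.5
print(benefit_of_averaging(toy))  # -59.375
```

In this toy set, half of the observations bracket the true answer, yet the benefit of averaging is negative, because one wide bracket (20 and 80 around 30) costs more than the other observations gain.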
Understanding averaging and bracketing: when is it beneficial? Prior research suggests that bracketing is a key component in understanding why averaging estimates renders an improvement (Larrick & Soll, 2006; Soll & Larrick, 2009). However, as demonstrated by our results on the end-scale questions, this may not always be the case. Specifically, although we observed higher rates of bracketing in the disagreeing-perspective condition for end-scale questions, averaging nonetheless led to a greater overall disadvantage in this condition. To better understand why this occurred, we took a closer look at the underlying components that determine whether averaging first and second estimates is beneficial or not. We formalize each component in Equation 1.
In Equation 1, $n_2$ is the subset of observations in $n$ in which $X_s$ moves away from $X$; $n_3$ is the subset of observations in $n$ in which $X_s$ moves toward $X$ but overshoots it so far that $X_a$ lies farther from $X$ than $X_f$ does; and $n_4$ is the subset of observations in $n$ in which $X_f = X$ while $X_s \neq X_f$. To clarify, the left-hand side of the equation represents the benefit of averaging, and the right-hand side represents the unique components that make up this benefit of averaging. The first component ($n_1$) represents those observations in the total set of observations ($n$) where the error of the average estimate is always lower than the error of the first estimate. These observations bring the benefit of averaging estimates (each observation in this subset yields, by definition, a positive number). Here, the second estimate lies in what has been called the "gain range" (Herzog & Hertwig, 2009, p. 232). The other three components ($n_2$, $n_3$, and $n_4$) are those observations where the error of the average estimate is always higher than the error of the first estimate (see Note 6). These observations bring the disbenefit of averaging (each observation in these subsets yields, by definition, a negative number). Averaging first and second estimates (following Equation 1) results in an overall improvement when the part that brings benefit (i.e., the first component) outweighs the parts that bring disbenefit (i.e., the other three components). Likewise, when the parts that bring disbenefit outweigh the part that brings benefit, averaging estimates becomes unbeneficial overall. When we look at each component for the end-scale questions separately per condition (see Table 4; $n_c$ refers to the total set of observations for a particular condition), it becomes clear that the parts that bring disbenefit outweigh the part that brings benefit for the disagreeing-perspective condition (rendering an overall disadvantage of −132.20 in this instance).
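The decomposition described above can be sketched in compact form. The following is an illustration under stated assumptions, not a reproduction of Equation 1: it assumes squared error as the error measure, and the per-observation gain $g_i$ and the index $i$ are notation introduced here for convenience.

```latex
% Assumed notation: g_i is the gain from averaging for observation i,
% with squared error as the (assumed) error measure.
\[
  X_{a,i} = \frac{X_{f,i} + X_{s,i}}{2},
  \qquad
  g_i = (X_{f,i} - X_i)^2 - (X_{a,i} - X_i)^2
\]
% Benefit of averaging (left) split into its four components (right).
\[
  \frac{1}{n} \sum_{i \in n} g_i
  = \frac{1}{n} \Biggl(
      \underbrace{\sum_{i \in n_1} g_i}_{g_i > 0}
      + \underbrace{\sum_{i \in n_2} g_i
      + \sum_{i \in n_3} g_i
      + \sum_{i \in n_4} g_i}_{g_i < 0}
    \Biggr)
\]
```

Observations with $g_i = 0$ contribute nothing to either side, which is why they can be left out of the four subsets without changing the total.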
What about the observed higher bracketing rate in the disagreeing-perspective condition for end-scale questions? There are two types of brackets (following Equation 1). There are brackets, which we refer to as beneficial brackets, in which the average estimate is by definition more accurate than the first estimate (e.g., $X = 30$, $X_f = 20$, $X_s = 50$, $X_a = 35$). Beneficial brackets are observations that follow from $n_1$. Unbeneficial brackets, on the other hand, are those observations in which the average estimate is by definition less accurate than the first estimate. Unbeneficial brackets are observations that follow from $n_3$. These brackets are unbeneficial because the two estimates overbracket a question's true answer, rendering an average estimate that is worse than the first estimate (e.g., $X = 30$, $X_f = 20$, $X_s = 80$, $X_a = 50$). Although these types of brackets are relatively rare, they occurred more frequently in the disagreeing-perspective condition for the end-scale questions (percentage of observations in a condition: disagreeing perspective = 7%; self-perspective = 4%; agreeing perspective = 3%). In sum, although bracketing is indeed a key component when it comes to averaging estimates, it does not by definition render an improvement. That is, averaging estimates becomes unbeneficial once the part that brings benefit (i.e., $n_1$ observations, including beneficial brackets) is canceled by observations in which the average estimate performs worse than the first estimate (i.e., the $n_2$, $n_3$, and $n_4$ observations).
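The distinction between beneficial and unbeneficial brackets can be checked with a short Python sketch that sorts a single observation into one of the four components. The function name `classify` and the label `"neutral"` are hypothetical, and the rules are inferred from the verbal descriptions above rather than taken from the authors' analysis code. Absolute error is used for the comparison, which gives the same ordering as squared error for a single observation.

```python
def classify(truth, first, second):
    """Assign one observation to n1, n2, n3, n4, or 'neutral'.

    n1: the average is closer to the truth than the first estimate.
    n2: the second estimate moves away from the truth.
    n3: the second estimate moves toward the truth but overshoots so far
        that the average is worse than the first estimate.
    n4: the first estimate is exactly right and the second differs.
    neutral: the average is exactly as accurate as the first estimate.
    """
    average = (first + second) / 2
    error_first = abs(first - truth)
    error_average = abs(average - truth)

    if error_average < error_first:
        return "n1"
    if error_average == error_first:
        return "neutral"
    # From here on, averaging made the estimate worse.
    if first == truth:
        return "n4"
    # The second estimate moves away when it lies beyond the first
    # estimate on the side opposite to the truth.
    moves_away = (second - first) * (first - truth) > 0
    return "n2" if moves_away else "n3"


# The two worked examples from the text (X = 30, X_f = 20).
print(classify(30, 20, 50))  # n1 (beneficial bracket, average = 35)
print(classify(30, 20, 80))  # n3 (unbeneficial bracket, average = 50)

# Further hypothetical cases.
print(classify(30, 20, 10))  # n2
print(classify(30, 30, 40))  # n4
print(classify(30, 20, 60))  # neutral (average = 40, same error as 20)
```

Both worked examples bracket the true answer of 30, but only the first lands in $n_1$; the second overshoots and falls into $n_3$, which is the pattern that occurred more often in the disagreeing-perspective condition for end-scale questions.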

General Discussion
Many decisions depend on people's ability to make accurate estimates of unknown quantities, and a demonstrated way to improve the accuracy of estimates is to aggregate multiple estimates made by the same person. The potential contained in such an intervention is enormous, and a key challenge is to identify strategies that can help improve the accuracy of people's aggregate estimates (Herzog & Hertwig, 2014a). In this article, we introduced the following strategy: Combine people's first estimate with their second estimate made from the perspective of someone they often disagree with. Across five experiments, we found evidence that such a strategy produces accurate estimates. These results underscore the importance of perspective taking and disagreement as strategies to improve the accuracy of people's quantitative estimates. The presented findings indicate the benefits of disagreement, a component of people's social interactions that is usually presented as undesirable (Kennedy & Pronin, 2008; Reeder et al., 2005; Sunstein, 2002). What is particularly interesting is that people obtained more accurate estimates by changing their perspective. It remains to be seen whether taking the perspective of any other people (say, experts in a particular field) would lead to similar benefits. This might be an important future research direction, as our findings demonstrate that taking the perspective of other people (e.g., an agreeing perspective) might not always render an increase in accuracy compared with simply making a second guess.
Although the inner crowd offered a gain in accuracy, we also identified a situation in which it backfired, leading to no improvement or even worse performance. We found this to be the case when a question's answer was close to the scale's end. Importantly, for participants who employed the disagreeing-perspective strategy, the accuracy of their average estimate was much worse than their first estimate for these types of questions. What is particularly interesting is that the propensity of people to move away from the answer when making second estimates is introduced through a feature of the situation rather than some innate bias (Gaertig & Simmons, 2021;Herzog et al., 2019;Müller-Trede, 2011).
Our research also has several limitations. First, the presented evidence is restricted to populations from the United States and United Kingdom, and future work needs to confirm whether these findings hold true in other parts of the world. Second, although combining initial estimates with second estimates made from a disagreeing perspective is beneficial, the presented research remains mute as to whether people would be willing to aggregate both estimates when given the opportunity (Fraundorf & Benjamin, 2014;Herzog & Hertwig, 2014b;Müller-Trede, 2011). Although prior work indicates that people are more likely to combine their estimates when they actively opposed themselves through dialectical bootstrapping (Herzog & Hertwig, 2014b), it remains to be seen whether this holds true when the opposition comes from someone with whom they often disagree. People typically view others holding opposing views and opinions less favorably (Iyengar & Westwood, 2015;Kennedy & Pronin, 2008;Reeder et al., 2005), potentially undermining their willingness to include the viewpoints of disagreeing others into their own judgments. Future research could address this issue in more detail by testing under what conditions people are willing to combine their estimates with the estimates of disagreeing others to obtain more accurate estimates.
On a final note, whereas previous studies often relied on natural processes such as forgetting or the passage of time to improve the accuracy of inner crowds, the present findings report a strategy that is more convenient and time efficient. Similar to other, more active interventions (Herzog & Hertwig, 2009;Litvinova et al., 2020;Winkler & Clemen, 2004), taking a disagreeing perspective can likewise be used as a potent strategy when people cannot benefit from the wisdom of an actual crowd. Overall, combining one's first estimate with a second estimate made from the perspective of disagreeing others proves to be a convenient and effective judgment tool.

Transparency
Action Editor: Marc J. Buehner
Editor: Patricia J. Bauer

Author Contributions
Both authors contributed equally to the work presented in this article, wrote the manuscript, and approved the final manuscript for submission.

Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.

Supplemental Material
Additional supporting information can be found at http://journals.sagepub.com/doi/suppl/10.1177/09567976211061321

Notes
1. Here, we kept the sign of the error because the size as well as the direction of the error are informative.
2. Following prior work on the inner crowd, we initially preregistered an intention to conduct simple t tests rather than mixed-effect models for Experiments 1a, 2, and 4. We made this change in response to a suggestion during the review process.
3. Using the mean absolute error produces the same results qualitatively. Mean-absolute-error results for all experiments can be found at https://osf.io/ewpyq/.
4. Note that in Experiment 1b, we also measured the time participants needed to generate their second estimates. Comparing this time between the self- and disagreeing-perspective conditions showed that there was no difference, d = 0.08, p = .21, Bayes factor favoring the null over the alternative hypothesis ($BF_{01}$) = 8.78.
5. When multiple experiments are conducted, the presence of some nonsignificant findings is to be expected given the nature of hypothesis testing (Lakens & Etz, 2017). To assess the overall evidential value of the prediction that the benefit of averaging is higher when one takes a disagreeing perspective, we aggregated the data of the same six questions from Experiments 2, 3, and 4. Results showed that, overall, participants in the disagreeing-perspective condition benefited more from averaging than participants in the agreeing-perspective condition (d = 0.18, p < .001).
6. Note that there are also observations where the error of the average is identical to the error of the first estimate: that is, observations where $X_s$ moves toward $X$ while $X_a$ lies exactly as far from $X$ as $X_f$ does, and observations where $X_s = X_f = X_a$. These observations are not included in the equation because including them does not render any benefit or disbenefit.