People Endorse Harsher Policies in Principle Than in Practice: Asymmetric Beliefs About Which Errors to Prevent Versus Fix

Countless policies are crafted with the intention of punishing all who do wrong or rewarding only those who do right. However, this requires accommodating certain mistakes: some who do not deserve to be punished might be, and some who deserve to be rewarded might not be. Six preregistered experiments (N = 3,484 U.S. adults) reveal that people are more willing to accept this trade-off in principle, before errors occur, than in practice, after errors occur. The result is an asymmetry such that for punishments, people believe it is more important to prevent false negatives (e.g., criminals escaping justice) than to fix them, and more important to fix false positives (e.g., wrongful convictions) than to prevent them. For rewards, people believe it is more important to prevent false positives (e.g., welfare fraud) than to fix them and more important to fix false negatives (e.g., improperly denied benefits) than to prevent them.


Research Article
Governments, firms, and countless other institutions frequently use various types of punishment and reward policies. However, no policy is perfect. Sometimes, those who deserve to be punished or rewarded are not (false negatives); at other times, those who do not deserve to be punished or rewarded are (false positives). Which errors do people believe are worse, when, and why?
In this research, we develop a generalizable framework describing preferences regarding these errors (see Table 1). Specifically, we find that preferences to address false positives versus false negatives vary along two dimensions: (a) whether errors are considered before or after they occur and (b) whether they pertain to punishments or rewards.
For example, suppose an insurance company decides to increase the premiums charged to unsafe drivers, a punishment. Two types of mistakes are possible: some safe drivers might have their premiums raised (false positives), whereas some unsafe drivers might not (false negatives). We find that people believe it is more important to prevent false negatives than to fix them and more important to fix false positives than to prevent them.
For rewards, the opposite pattern holds. For example, suppose instead that the same insurance company decides to reduce the premiums charged to safe drivers, a reward. Some unsafe drivers might have their premiums reduced (false positives), whereas some safe drivers might not (false negatives). Here, we find that people believe it is more important to prevent false positives than to fix them and more important to fix false negatives than to prevent them.
To help explain these patterns, we first note that for punishments and rewards alike, false positives and false negatives can either harm "good actors" (those who do not deserve to be punished but are, and those who deserve to be rewarded, but are not) or help "bad actors" (those who deserve to be punished but are not, and those who do not deserve to be rewarded, but are). This common denominator matters because we expect people to naturally relate more to good actors harmed than to bad actors helped (Chambers & Davis, 2012; Sedikides et al., 2003). In other words, people can more easily imagine themselves as someone who does not deserve to be punished or deserves to be rewarded (as opposed to someone who deserves to be punished or does not deserve to be rewarded). We therefore propose that the good actors harmed will be relatively more vivid than the bad actors helped.
Second, judgments about errors to prevent versus fix can be conceptualized as judgments about the future versus the past. Critically, past outcomes are more accessible and concrete than future outcomes (Caruso et al., 2008; Kane et al., 2012; Tversky & Kahneman, 1973; Van Boven & Ashworth, 2007). Therefore, we additionally propose that when errors have already occurred (i.e., when considering errors to fix), there will be larger differences in vividness between good actors and bad actors relative to when errors have not yet occurred (i.e., when considering errors to prevent). Even if the latter outcomes are certain, they are less accessible and concrete because they are unrealized (Small & Loewenstein, 2005).
To illustrate, consider again that for punishments (e.g., increasing premiums charged to unsafe drivers), two types of errors are possible: those who deserve to be punished might not be (false negatives) and those who do not deserve to be punished might be (false positives). After these errors occur, people can more easily imagine themselves as a safe driver who had their rates raised by mistake (i.e., as a good actor) than as an unsafe driver who did not (i.e., as a bad actor).

False negatives (rewards): "Those who deserve to be rewarded will not be rewarded" (prevent) < "Those who deserved to be rewarded were not rewarded" (fix). More important to fix improperly denied benefits than to prevent these mistakes from happening in the first place.

Statement of Relevance
Administering punishments and rewards inevitably requires resolving trade-offs between different types of errors, and there is often considerable debate about which are the most problematic. For example, some promote aggressive law-enforcement tactics (e.g., "tough-on-crime" policies) out of concern for false negatives, whereas others seek exoneration of the wrongfully convicted out of concern for false positives (e.g., support for The Innocence Project). This research develops a generalizable framework for understanding these beliefs. Specifically, we find that people are concerned with different errors when evaluating proposed policies (i.e., considering errors to prevent) than when evaluating existing policies (i.e., considering errors to fix), depending on whether such policies pertain to punishments or rewards. Consequently, framing a policy one way or another (e.g., describing affirmative action as rewarding the underrepresented or punishing the overrepresented) can similarly shift preferences. This research accordingly provides a novel theoretical lens for understanding real-world phenomena spanning political, managerial, and marketing contexts.
Similarly, for rewards (e.g., reducing premiums for safe drivers), two types of errors are possible: those who deserve to be rewarded might not be (false negatives) and those who do not deserve to be rewarded might be (false positives). After these errors occur, people can more easily imagine themselves as a safe driver who did not have their rates reduced by mistake (i.e., as a good actor) than as an unsafe driver who did (i.e., as a bad actor). However, in both cases, before these errors occur, these same unrealized outcomes are less vivid. Altogether, the hypothesized differences in vividness led us to predict that the most concerning types of errors will be those that both (a) harm good actors and (b) have already happened. Indeed, we find that people maintain stronger preferences for fixing false-positive punishments than for preventing them and stronger preferences for fixing false-negative rewards than for preventing them. The overall effect is the endorsement of harsher policies in principle (before errors occur, when people focus more on preventing mistakes that help bad actors) than in practice (after errors occur, when people focus relatively more on fixing mistakes that harm good actors).

Open Practices Statement
We report 14 preregistered experiments (six in the main text, eight in the appendix available online; total N = 7,278; see Table 2). All sample sizes were set a priori and were sufficient to detect a small interaction (f² = 0.06) with 80% power. This threshold was set on the basis of a pilot experiment that revealed a small interaction (f² = 0.06, 95% confidence interval, or CI, = [0.02, 0.12]) and a subsequent power analysis that suggested that at least 47 participants per cell would be required to detect it. We therefore conservatively targeted a sample size of at least 50 participants per cell across all experiments.
Finally, all experiments met the ethical requirements and legal guidelines of the University of California, Los Angeles's Institutional Review Board.

Method
Participants. We recruited 300 participants from the behavioral laboratory at the University of California, Los Angeles's Anderson School of Management. We excluded participants who did not complete the survey in its entirety, yielding a final sample of 296 (age: M = 22.60 years, SD = 5.29 years; gender: 73% female, 25% male, 2% nonbinary).
For the punishment policy, participants read one of the following prompts:
1. Punishments-Prevent: "A policy is being designed to punish people. It will result in two mistakes. Only one of these mistakes can be prevented."
2. Punishments-Fix: "A policy was designed to punish people. It resulted in two mistakes. Only one of these mistakes can be fixed."
We then asked: "Which mistake should be [prevented/fixed]?" Participants selected between "10 individuals [will be/were] punished, but they [will not deserve it/did not deserve to be]" (false positives) and "10 individuals [will deserve/deserved] to be punished, but they [will not be/were not]" (false negatives).
For the reward policy, participants read one of the following prompts:
3. Rewards-Prevent: "A policy is being designed to reward people. It will result in two mistakes. Only one of these mistakes can be prevented."
4. Rewards-Fix: "A policy was designed to reward people. It resulted in two mistakes. Only one of these mistakes can be fixed."
We then asked: "Which mistake should be [prevented/fixed]?" Participants selected between "10 individuals [will be/were] rewarded, but they [will not deserve it/did not deserve to be]" (false positives) and "10 individuals [will deserve/deserved] to be rewarded, but they [will not be/were not]" (false negatives).
The prevent and fix frames were thus identical across the punishment and reward policies save for references to "punish" or "punished" and "reward" or "rewarded," respectively. There were no other differences across conditions. We counterbalanced the order of all choices.
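Because the four conditions differ only in tense and in the punish/reward wording, the stimuli can be generated from a single template. The following is a minimal sketch of that idea, not the authors' survey code: the names FRAME_TEXT, POLICY_TEXT, and build_stimulus are hypothetical, and only the quoted wording is taken from the prompts described above.

```python
# Hypothetical template for the Experiment 1 stimuli (2 policies x 2 frames).
# Only the wording is taken from the prompts; all names are illustrative.

# Tense-dependent fragments for each frame.
FRAME_TEXT = {
    "prevent": {
        "design": "is being designed",
        "result": "will result",
        "action": "prevented",
        "be": "will be",
        "not_deserve": "will not deserve it",
        "deserve": "will deserve",
        "not_be": "will not be",
    },
    "fix": {
        "design": "was designed",
        "result": "resulted",
        "action": "fixed",
        "be": "were",
        "not_deserve": "did not deserve to be",
        "deserve": "deserved",
        "not_be": "were not",
    },
}

# Verb forms for each policy.
POLICY_TEXT = {
    "punishment": {"verb": "punish", "participle": "punished"},
    "reward": {"verb": "reward", "participle": "rewarded"},
}


def build_stimulus(policy, frame):
    """Return the prompt, question, and both choice options for one condition."""
    f = FRAME_TEXT[frame]
    p = POLICY_TEXT[policy]
    return {
        "prompt": (
            f"A policy {f['design']} to {p['verb']} people. "
            f"It {f['result']} in two mistakes. "
            f"Only one of these mistakes can be {f['action']}."
        ),
        "question": f"Which mistake should be {f['action']}?",
        "false_positive": (
            f"10 individuals {f['be']} {p['participle']}, "
            f"but they {f['not_deserve']}"
        ),
        "false_negative": (
            f"10 individuals {f['deserve']} to be {p['participle']}, "
            f"but they {f['not_be']}"
        ),
    }


# All four between-subjects conditions, keyed by (policy, frame).
CONDITIONS = {
    (policy, frame): build_stimulus(policy, frame)
    for policy in POLICY_TEXT
    for frame in FRAME_TEXT
}
```

Generating every condition from one template makes it easy to check that the cells differ only in the intended words.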

Results
We coded false-negative choices as 1 and false-positive choices as 0, and we contrast-coded both policy (-1 for punishments; +1 for rewards) and frame (-1 for prevent; +1 for fix).
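The coding scheme can be illustrated with a short sketch. The responses below are made up for illustration and are not data from the experiment, the function names are hypothetical, and the difference-of-differences at the end is only a descriptive summary of a policy × frame interaction, not necessarily the model the authors fit.

```python
from collections import defaultdict

# Contrast codes as described in the text.
POLICY_CODE = {"punishment": -1, "reward": +1}
FRAME_CODE = {"prevent": -1, "fix": +1}


def code_response(policy, frame, choice):
    """Return (choice, policy, frame, interaction) codes for one response.

    choice is "FN" (coded 1) or "FP" (coded 0); the interaction code is the
    product of the two contrast codes.
    """
    y = 1 if choice == "FN" else 0
    p = POLICY_CODE[policy]
    f = FRAME_CODE[frame]
    return y, p, f, p * f


def cell_shares(responses):
    """Proportion of false-negative choices in each policy x frame cell."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [sum of choices, count]
    for policy, frame, choice in responses:
        y, _, _, _ = code_response(policy, frame, choice)
        totals[(policy, frame)][0] += y
        totals[(policy, frame)][1] += 1
    return {cell: s / n for cell, (s, n) in totals.items()}


# Made-up responses (four per cell), for illustration only.
responses = (
    [("punishment", "prevent", "FN")] * 3 + [("punishment", "prevent", "FP")] * 1
    + [("punishment", "fix", "FN")] * 1 + [("punishment", "fix", "FP")] * 3
    + [("reward", "prevent", "FN")] * 1 + [("reward", "prevent", "FP")] * 3
    + [("reward", "fix", "FN")] * 3 + [("reward", "fix", "FP")] * 1
)

shares = cell_shares(responses)

# Descriptive interaction contrast: the fix-minus-prevent difference for
# rewards minus the same difference for punishments.
interaction = (
    (shares[("reward", "fix")] - shares[("reward", "prevent")])
    - (shares[("punishment", "fix")] - shares[("punishment", "prevent")])
)
```

With -1/+1 codes, the product term takes the value +1 in the punishment-prevent and reward-fix cells and -1 in the other two, which is what lets a single coefficient capture the crossover pattern.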

Discussion
Experiment 1 revealed the predicted asymmetry. Additionally, in a supplemental experiment that was otherwise identical to Experiment 1, we replicated this asymmetry when treating frame as a within-subjects factor (i.e., participants indicated which errors to both prevent and fix; see Supplemental Experiment 1 in the appendix available online). Experiment 2 accordingly tests more naturalistic scenarios.
Procedure. Experiment 2 employed a 2 (policy: punishment vs. reward; between-subjects) × 2 (frame: prevent vs. fix; between-subjects) × 3 (scenario: pay vs. insurance vs. taxes; within-subjects) mixed design. We tested multiple scenarios to bolster generalizability but did not predict any systematic differences across them. Each participant responded to three scenarios on three separate pages (see Table 3 and Table 4). For the punishment policy, the first scenario (pay) described docking pay for poor work performance, the second scenario (insurance) described assessing surcharges for unsafe driving, and the third scenario (taxes) described levying fines for using too much water during a drought. For the reward policy, the first scenario (pay) described issuing bonuses for good work performance, the second scenario (insurance) described the provision of discounts for safe driving, and the third scenario (taxes) described the issuance of tax credits for conserving water during a drought. As in Experiment 1, we described false positives and false negatives, and participants indicated which type of error should be prevented or fixed. We counterbalanced the order of all choices.

Results
We coded false-negative choices as 1 and false-positive choices as 0, and we contrast-coded both policy (-1 for punishments; +1 for rewards) and frame (-1 for prevent; +1 for fix). We then took the mean of each participant's choices across the three within-subjects scenarios. Note that this analysis deviated from our preregistration, which called for a mixed model. However, because all simple effects and two-way policy-frame interactions were significant and consistent with our predictions, we elected to present this simpler analysis (i.e., collapsing our results over scenario).
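Collapsing over scenario amounts to replacing each participant's three binary choices with their mean. A minimal sketch, using made-up participant identifiers and choices (nothing here comes from the actual data set):

```python
# Hypothetical per-participant choices (1 = false negative, 0 = false positive)
# for the three within-subjects scenarios. The values are made up.
SCENARIOS = ("pay", "insurance", "taxes")

choices = {
    "p01": {"pay": 1, "insurance": 1, "taxes": 0},
    "p02": {"pay": 0, "insurance": 0, "taxes": 0},
    "p03": {"pay": 1, "insurance": 1, "taxes": 1},
}


def participant_mean(by_scenario):
    """Mean of one participant's coded choices across the three scenarios."""
    return sum(by_scenario[s] for s in SCENARIOS) / len(SCENARIOS)


# One score per participant, ranging from 0 (always chose the false
# positive) to 1 (always chose the false negative).
collapsed = {pid: participant_mean(c) for pid, c in choices.items()}
```

The resulting score takes one of four values (0, 1/3, 2/3, or 1) and can then be analyzed as a single between-subjects outcome.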

Discussion
Experiment 2 conceptually replicated Experiment 1 with richer stimuli, bolstering generalizability. In Experiment 3, we tested whether merely framing a policy as a punishment or reward shifts beliefs similarly.

Experiment 3
We described affirmative action as either rewarding the underrepresented or punishing the overrepresented. Past research has similarly framed these policies as either helping minority or harming majority individuals and groups (Crosby et al., 2003; Lowery et al., 2006; Munguia Gomez & Levine, 2022).
For the punishment policy, participants read, "An organization [is planning to implement/has implemented] an affirmative action policy to punish people from over-represented backgrounds. However, this policy [will result/has resulted] in two types of mistakes." We then asked, "Which mistake should be [prevented/fixed]?" Participants selected between false negatives ("Some people from over-represented backgrounds [will not be/were not] punished (even though they [will deserve/deserved] to be punished)") and false positives ("Some people from under-represented backgrounds [will be/were] punished (even though they [will not/did not] deserve to be punished)").
For the reward policy, participants read: "An organization [is planning to implement/has implemented] an affirmative action policy to reward people from underrepresented backgrounds. However, this policy [will result/has resulted] in two types of mistakes." We then asked: "Which mistake should be [prevented/fixed]?" Participants selected between false negatives ("Some people from under-represented backgrounds [will not be/were not] rewarded (even though they [will deserve/deserved] to be rewarded)") and false positives ("Some people from over-represented backgrounds [will be/were] rewarded (even though they [will not/did not] deserve to be rewarded)"). We counterbalanced the order of all choices.

Results
We coded false-negative choices as 1 and false-positive choices as 0, and we contrast-coded both policy (-1 for punishment; +1 for reward) and frame (-1 for prevent; +1 for fix).

Pay

Punishment, prevent
[FP] Ten of the 100 employees whose pay will be docked will not deserve it (that is, they will not in fact be low performers).
[FN] Ten additional employees will deserve to have their pay docked (that is, they will in fact be low performers), but will not.
Punishment, fix
[FP] Ten of the 100 employees whose pay was docked did not deserve it (that is, they were not in fact low performers).
[FN] Ten additional employees deserved to have their pay docked (that is, they were in fact low performers), but did not.
Reward, prevent
[FP] Ten of the 100 employees who will receive a bonus will not deserve it (that is, they will not in fact be high performers).
[FN] Ten additional employees will deserve a bonus (that is, they will in fact be high performers), but will not receive it.
Reward, fix
[FP] Ten of the 100 employees who received a bonus did not deserve it (that is, they were not in fact high performers).
[FN] Ten additional employees deserved a bonus (that is, they were in fact high performers), but did not receive it.

Insurance

Punishment, prevent
[FP] Ten of the 100 policyholders whose premiums will be increased will not deserve it (that is, they will not in fact be reckless drivers).
[FN] Ten additional policyholders will deserve to have their premiums increased (that is, they will in fact be reckless drivers) but will not.
Punishment, fix
[FP] Ten of the 100 policyholders whose premiums were increased did not deserve it (that is, they were not in fact reckless drivers).
[FN] Ten additional policyholders deserved an increase in their premiums (that is, they were in fact reckless drivers), but did not receive it.
Reward, prevent
[FP] Ten of the 100 policyholders whose premiums will be reduced will not deserve it (that is, they will not in fact be safe drivers).
[FN] Ten additional policyholders will deserve a reduction in their premiums (that is, they will in fact be safe drivers), but will not receive it.
Reward, fix
[FP] Ten of the 100 policyholders whose premiums were reduced did not deserve it (that is, they were not in fact safe drivers).
[FN] Ten additional policyholders deserved a reduction in their premiums (that is, they were in fact safe drivers), but did not receive it.

Taxes

Punishment, prevent
[FP] Ten of the 100 residents who will be fined will not deserve it (that is, they will not have in fact gone over their allotment).
[FN] Ten additional residents will deserve to be fined (that is, they will have in fact gone over their allotment), but will not be.
Punishment, fix
[FP] Ten of the 100 residents who were fined did not deserve it (that is, they did not in fact go over their allotment).
[FN] Ten additional residents deserved to be fined (that is, they did in fact go over their allotment), but were not.
Reward, prevent
[FP] Ten of the 100 residents who will receive a credit will not deserve it (that is, they will not have in fact remained below their allotment).
[FN] Ten additional residents will deserve to receive a credit (that is, they will have in fact remained below their allotment), but will not.
Reward, fix
[FP] Ten of the 100 residents who received a credit did not deserve it (that is, they did not in fact remain below their allotment).
[FN] Ten additional residents deserved to receive a credit (that is, they did in fact remain below their allotment), but did not.

Note: FP = false positives; FN = false negatives. Designations were not shown to participants.

Discussion
Experiments 1 through 3 offer convergent evidence for the basic effect. In a second supplemental experiment, we also elicited preferences via titration (as opposed to forced choice), and the resulting "exchange rates" conceptually replicated the results observed in Experiments 1 through 3 (see Supplemental Experiment 2 in the appendix available online). We therefore designed Experiment 4 to probe one potential mechanism: the relative vividness of good actors to bad actors.
Participants reviewed the pay scenario from Experiment 2. After indicating which type of error to prevent or fix, participants rated, on a separate page, the vividness of those affected by each type of error using measures adapted from Keller and Block (1997). Specifically, we asked, "How vivid are these salespeople?"; "How personal are the stories of these salespeople?"; "How concrete do these salespeople feel?"; "How easy is it to imagine these salespeople?"; "How easy is it to relate to these salespeople?"; and "How easy is it to picture these salespeople?" (each rated on a scale ranging from not at all, 1, to extremely, 7). We counterbalanced the order of all choices.

Results
We coded false-negative choices as 1 and false-positive choices as 0, and we contrast-coded both policy (-1 for punishments; +1 for rewards) and frame (-1 for prevent; +1 for fix).

Fig. 2. Results from Experiment 2. Across the managerial context (pay), the consumer context (insurance), and the policy context (taxes), participants believed that for punishments it was more important to prevent false negatives than to fix them and more important to fix false positives than to prevent them; for rewards, participants believed it was more important to prevent false positives than to fix them and more important to fix false negatives than to prevent them. Error bars represent 95% confidence intervals.
Next, for each participant and for each type of error, we computed absolute vividness scores by averaging all six scale items (α = .92).Then, to construct a relative vividness score for each participant, we subtracted the absolute vividness score for the false positives from the absolute vividness score for the false negatives.
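The scoring steps can be sketched as follows. The ratings are made up for illustration, the function names are hypothetical, and the reliability function uses the standard Cronbach's alpha formula; the text does not state exactly how the reported α was computed (for example, whether ratings were pooled across error types), so that part is an assumption.

```python
from statistics import variance


def absolute_vividness(items):
    """Mean of the six vividness items (each rated 1 to 7) for one error type."""
    return sum(items) / len(items)


def relative_vividness(fn_items, fp_items):
    """False-negative vividness minus false-positive vividness."""
    return absolute_vividness(fn_items) - absolute_vividness(fp_items)


def cronbach_alpha(rows):
    """Standard Cronbach's alpha.

    rows holds one list of item ratings per respondent, all the same length.
    """
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up ratings from one participant in a punishment condition.
fn_ratings = [3, 3, 4, 2, 3, 3]  # those who deserved punishment but escaped it
fp_ratings = [5, 6, 5, 5, 6, 6]  # those punished without deserving it

score = relative_vividness(fn_ratings, fp_ratings)
```

Under this scoring, a negative value means that the people affected by false positives were rated as more vivid than those affected by false negatives, and a positive value means the reverse.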

Discussion
Although there are numerous well-documented limitations associated with the use of statistical mediation to collect psychological process evidence (e.g., Imai et al., 2010; Rohrer et al., 2022), the results of Experiment 4 offer initial, suggestive evidence for the potential role of relative vividness. We comment on additional possible processes in the General Discussion. Our final experiments test a boundary condition.

Experiments 5a and 5b
If the relative vividness of good actors to bad actors helps explain, in part, different preferences regarding which errors to prevent or fix, then describing policies that generate false positives and false negatives without yielding corresponding good actors or bad actors should attenuate the effect.We thus manipulated whether a program was intended to motivate reduced water use (via punishments and rewards) or simply measure it.

Method
Participants. Given that all simple effects of frame (e.g., prevent vs. fix) within the punishment and reward conditions in Experiments 1 through 4 were significant (and in opposite directions), we tested punishments and rewards separately in Experiments 5a and 5b to maximize statistical power. For Experiment 5a, we opened an MTurk HIT for 400 participants. We excluded participants who failed a preregistered attention check, yielding a final sample of 360 (age: M = 40.77 years, SD = 12.06 years; gender: 55% female, 44% male, 1% nonbinary). For Experiment 5b, we opened an MTurk HIT for 400 workers. We excluded participants who failed a preregistered attention check, yielding a final sample of 363 (age: M = 40.69 years, SD = 14.04 years; gender: 50% female, 50% male, 0% nonbinary).
Procedure. Experiments 5a and 5b both employed a 2 (frame: prevent vs. fix) × 2 (goal: motivation vs. no motivation) between-subjects design and adapted the taxes scenario from Experiment 2.
Experiment 5a tested punishments, and Experiment 5b tested rewards. All participants in the motivation condition of each experiment first read, "Smallville [is experiencing/recently experienced] a drought, and as a result the local government [will be piloting/piloted] a program in which they [will test/tested] whether they [can/could] effectively encourage households to reduce their water use by collecting water usage information from 'smart meters.' Smallville's local government [plans to randomly select/randomly selected] 100 households to join the program based upon the last digit of their telephone number." In the motivation condition of Experiment 5a, participants learned that "at the end of [the coming year/the year], the government [will assess/assessed] a fine of $500 upon any household which [increases/increased] their water use by 25% or more." Participants then chose to prevent or fix either false negatives ("10 households [will deserve/deserved] to be fined $500 but [will not be/were not]") or false positives ("10 of the households that [will be/were] fined $500 [will/did] not actually deserve it"). In the motivation condition of Experiment 5b, participants learned that "at the end of [the coming year/the year], the government [will issue/issued] a rebate of $500 to any household which [reduces/reduced] their water use by 25% or more." Participants then chose to prevent or fix either false negatives ("10 households [will deserve/deserved] to be issued a rebate of $500 but [will not be/were not]") or false positives ("10 of the households that [will be/were] issued a $500 rebate [will/did] not actually deserve it").
The no motivation conditions in Experiments 5a and 5b were identical. All participants first read: "Smallville [will be piloting/piloted] a program in which they [will test/tested] whether they [can/could] effectively collect water usage information from 'smart meters.' Smallville's local government [plans to randomly select/randomly selected] 100 households to join the program based upon the last digit of their telephone number." Participants then chose to prevent or fix either false negatives ("10 households should [be/have been] enrolled in the program but [will not be/were not]") or false positives ("10 of the households that [will be/were] enrolled in the program should not [be/have been]"). We counterbalanced the order of all choices in Experiments 5a and 5b.

Results
We coded false-negative choices as 1 and false-positive choices as 0, and we contrast-coded both frame (-1 for prevent; +1 for fix) and goal (-1 for no motivation; +1 for motivation).

Discussion
Experiments 5a and 5b suggest that our account does not extend to any policy that generates false positives and false negatives, revealing an important boundary condition: absent good actors and bad actors who are erroneously harmed and helped, the effect is attenuated.

General Discussion
When administering punishments and rewards, mistakes happen. Do some mistakes seem worse than others? This research finds that the answer depends not only on whether mistakes pertain to punishments or rewards but also on whether they are evaluated before or after they occur.
It thus offers a novel theoretical backdrop for understanding many real-world policy debates. For example, in a series of supplemental studies, participants chose between unspecified numbers of each type of error in the context of 10 real-world punishment and reward policies (see Supplemental Experiments 3 and 4 in the appendix available online). Even in this less controlled setting, we observed patterns directionally consistent with our laboratory experiments (though not all differences were significant; see Figs. 5 and 6). These findings potentially suggest that at least some disagreement about these issues may relate to how (or when) they are evaluated.
Notably, across experiments, the asymmetry seems to have been largely driven by beliefs about which errors to fix; that is, all choice shares for fixing, but not for preventing, differed significantly from 50%. Though we do not make normative claims regarding whether these preferences constitute a mistake, people may indeed have stronger convictions about which errors to fix than to prevent. If so, public support might be higher for policies that hew closer to the preferences observed in the fix frames. Additionally, it is unclear whether people themselves view these inconsistencies as problematic, given that we replicated these patterns when asking participants to indicate which errors to both prevent and fix simultaneously, using a within-subjects design (see Supplemental Experiment 1 in the appendix available online; Nielsen & Rehbeck, 2022).
Separately, false-positive and false-negative errors arise in numerous contexts, raising a natural question about generalizability. For example, the replication crisis fundamentally reflects a tension between the types of errors that the scientific enterprise has chosen to prevent versus fix (Ioannidis, 2005); medical testing requires calibrating tolerance for false-positive and false-negative results; markets that pick winners and losers sometimes err. Although the results of Experiments 5a and 5b suggest that these beliefs may be context dependent, it is still an open question whether they arise in other settings.
Several limitations warrant discussion and suggest other potential mechanisms. For example, inferences about the certainty of outcomes might have differed across our studies. We presented information unambiguously (e.g., "10 individuals will be punished, but they will not deserve it") to cleanly test our predictions, [...] psychological phenomena beyond Western cultures (Henrich et al., 2010; Thalmayer et al., 2021). Future work might therefore explore cross-cultural differences. Moreover, because punishments and rewards can be reframed as losses and gains, respectively, a natural consideration is the potential role of prospect theory and the related concept of reference dependence (Kahneman & Tversky, 1979). For example, participants may have hesitated to claw back false-positive rewards because doing so would have imposed losses on certain individuals. However, if participants were concerned about imposing losses, then in the context of punishments they should not have been less willing to prevent false positives than to fix them. Reference dependence thus seems limited in its ability to parsimoniously explain our results.
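To make the comparison against 50% concrete, the sketch below computes an exact two-sided binomial test of a choice share against chance. This section does not state which test the authors used, so the choice of test is an assumption, and the function name binom_two_sided_p is hypothetical.

```python
from math import comb


def binom_two_sided_p(k, n):
    """Exact two-sided binomial p-value for k of n choices against p = 0.5.

    Because the null distribution is symmetric at p = 0.5, the two-sided
    p-value is twice the smaller tail probability, capped at 1.
    """
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * min(lower, upper))


# Illustrative counts only: an even split versus a unanimous split of ten
# hypothetical choices.
p_even = binom_two_sided_p(5, 10)
p_unanimous = binom_two_sided_p(10, 10)
```

A share near 50% yields a p-value at or near 1, whereas a lopsided share yields a small one.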

Conclusion
Sir William Blackstone famously posited that "it is better that ten guilty persons escape than that one innocent suffer" (Blackstone, 1769). However, the present research suggests that this claim is incomplete. These preferences also depend on whether errors are considered before or after they occur, and whether they pertain to punishments or rewards. Our findings thus provide a framework for understanding seemingly inconsistent perceptions about various types of policies and offer critical insights for policymakers, managers, and marketers (among others).

Fig. 1. Results from Experiment 1. For punishments, participants believed it was more important to prevent false negatives than to fix them, and more important to fix false positives than to prevent them; for rewards, participants believed it was more important to prevent false positives than to fix them, and more important to fix false negatives than to prevent them. Numbers above each bar correspond to the stimuli outlined in the procedure; error bars represent 95% confidence intervals.

Fig. 3. Results from Experiment 3. Framing affirmative action as either "rewarding" the underrepresented or "punishing" the overrepresented yielded asymmetric beliefs about which types of errors should be prevented versus fixed. Error bars represent 95% confidence intervals.

Fig. 4. Results from Experiments 5a and 5b. When a program that was intended to measure water use (as opposed to motivating reduced water use) yielded false positives and false negatives, but no good actors or bad actors, the effect attenuated. Error bars represent 95% confidence intervals.

Table 1. An Asymmetry Between the Types of Errors People Prefer to Prevent Versus Fix

Table 2. Overview of Experiments