Optimising collective accuracy among rational individuals in sequential decision-making with competition

Theoretical results underpinning the wisdom of the crowd, such as the Condorcet Jury Theorem, point to substantial accuracy gains through aggregation of decisions or opinions, but the foundations of this theorem are routinely undermined in circumstances where individuals are able to adapt their own choices after observing what other agents have chosen. In sequential decision-making, rational agents use the choices of others as a source of information about the correct decision, creating powerful correlations between different agents' choices that violate the assumptions of independence on which the Condorcet Jury Theorem depends. In this paper, I show how such correlations emerge when agents are rewarded solely based on their individual accuracy, and the impact of this on collective accuracy. I then demonstrate how a simple competitive reward scheme, where agents' rewards are greater if they correctly choose options that few have already chosen, can induce rational agents to make independent choices, returning the group to optimal levels of collective accuracy. I further show that this reward scheme is robust, offering improvements to collective accuracy across a wide range of competition strengths, suggesting that such schemes could be effectively implemented in real-world contexts to improve collective wisdom.


Introduction
It has long been recognised that aggregating the opinions, estimates or decisions of many individuals can give superior results compared to relying on a single individual alone [1,2]. Sometimes termed The Wisdom of Crowds [3], such aggregation is a simple but potentially powerful example of collective intelligence, and one that acts as both a justification for democratic decision-making institutions [4,5] and a motivation for utilising fora such as social media to harness the potential of global collective knowledge [6].
The Condorcet Jury Theorem (CJT) [1,7] demonstrates that collective accuracy, in the form of a majority vote, can far exceed individual accuracy under an idealised assumption that agents choose independently. While the CJT has motivated many appeals to The Wisdom of Crowds (e.g. [4,8,9]), in reality this independence assumption is routinely violated in collective decision-making scenarios where agents are able to observe each other and use social information to motivate their own choices [3]. The Wisdom of Crowds requires that a group must effectively aggregate the private information held by its members, but information cascades can result from social learning, such that within a group a large proportion of individuals simply follow the decisions made by others, without reference to any private information they may have [10]. Empirical studies have demonstrated how readily humans copy the actions of others [11,12,13], in common with other animals [14], when those actions are readily observable. The tendency of agents to follow the decisions of others can be rational from an individual perspective [10,15,16,17], but such self-reinforcing cascades of social information can cause very large scale errors in collective judgement, as illustrated anecdotally in the historical examples given by Mackay in 'Extraordinary Popular Delusions and the Madness of Crowds' [18]. Scientific study also suggests that, under controlled conditions, allowing individuals to update their own beliefs in the light of observing others tends to reduce the accuracy of collective estimations [19], even as it increases the average accuracy of individual agents [17].
The dangers of relying on social information are highly pertinent since sequential decision-making is common across a wide variety of domains. We often choose what to buy, where to eat, or even how to vote based on the choices or expressed opinions of others before us. Sequential decisions may be present even when a system is designed to elicit individuals' independent decisions. Consider for example the case of formalised peer-review of scientific publications or grant proposals. Here, reviewers apparently provide their reviews independently, but this ignores the effect of author status, which provides a proxy for the decisions of past reviewers of the same author. Likewise, characteristics such as the fame of an individual, or the market share of a product, may serve to indicate a preponderance of past choices made, even if these are not directly observed. Advertising that points to the number of users or consumers of a product is suggestive of the influence such past decisions can have on future purchases.
Given the prevalence of sequential decision-making across many areas in which we may wish to access collective knowledge, how might we overcome its deleterious effects upon collective wisdom? One potential solution is the introduction of competition between agents who make the same choice [20,21], thus penalising agents who follow others. Previous work on sequential decision-making [22,23,16,24] has assumed that rewards are independent of which choices other agents make, with such choices being useful only as a source of information about the rewards available in the environment. In this paper I extend this framework to allow for rewards that depend intrinsically upon the choices made by others, such that an option may become more or less rewarding based on how many other agents have also chosen it. Using this model, I show how social information in the absence of competition can reduce the collective accuracy of a group, and how introducing competition in the form of diminishing rewards for options already chosen by other agents can eliminate correlations between agents' choices, and return the group to an optimal level of collective accuracy.

Model
I consider a binary choice scenario with potential options labelled as A and B. In any given choice, one option is 'correct' and the other is 'incorrect'. This scenario is similar to that in [16], and the model described below largely follows the framework developed in that paper.
Agents sequentially choose either A or B, and are able to observe all choices made before their own, such that these choices constitute common knowledge [25]. The choice made by individual i is labelled C_i, and a sequence of k decisions S is an ordered series C_1, C_2, . . . , C_k. The collective decision is defined as the majority choice when all n agents have decided, and for simplicity I consider only cases where n is odd so there are no tied collective decisions.
Agents choose between the two options based on reward criteria and their own inferences about the probability that each option is correct, so as to maximise their expected reward. The true state of the world is given by a variable x, which takes the value x = 1 if option A is correct, and x = −1 if option B is correct. All agents are assumed to share a common, symmetric and uninformative prior about the value of x:

P(x = 1) = P(x = −1) = 1/2

Agents are informed by two sources of information. The first is a noisy private signal ∆_i received independently by each agent i, with variance ν²:

p(∆_i | x) = (1/ν) φ((∆_i − x)/ν)

where φ(·) is the standard normal probability density function. The second source of information is the social information provided by the sequence of previous decisions S. Agents update their knowledge of x by performing Bayesian inference:

P(x | S, ∆_i) = P(S | x) p(∆_i | x) P(x) / [Σ_{x′ ∈ {−1, 1}} P(S | x′) p(∆_i | x′) P(x′)]

where the equation above makes use of the assumed independence of private signals, and thus the independence of S and ∆_i conditioned on x.
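The inference step above can be sketched numerically. The following is a minimal illustration under the model's assumptions, not code from this paper: the state is encoded as x = ±1, ν denotes the standard deviation of the signal noise, and the function names are my own.

```python
import math

def phi_pdf(z):
    """Standard normal probability density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def signal_likelihood(delta, x, nu):
    """p(Delta_i | x): the private signal is x plus Gaussian noise with s.d. nu."""
    return phi_pdf((delta - x) / nu) / nu

def posterior_x_is_1(delta, nu, p_seq_x1=1.0, p_seq_xm1=1.0):
    """P(x = 1 | S, Delta_i) by Bayes' rule with the symmetric prior
    P(x = 1) = P(x = -1) = 1/2, using the conditional independence of
    S and Delta_i given x. p_seq_x1 and p_seq_xm1 stand in for P(S | x = +1)
    and P(S | x = -1); the defaults correspond to no social information."""
    num = p_seq_x1 * signal_likelihood(delta, 1.0, nu)
    den = num + p_seq_xm1 * signal_likelihood(delta, -1.0, nu)
    return num / den
```

With no social information a zero signal leaves the posterior at 1/2, while social evidence favouring x = −1 pulls the posterior down for any given signal.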

Rewards
Agents are motivated to make accurate choices by a retrospective reward policy that assigns rewards once the true correct choice is known. A simple and intuitive policy is to reward agents if they made the correct choice, thus motivating each individual to be as accurate as possible. This can be labelled as 'binary' rewards, in common with previous models of simultaneous decision-making [20,21], since agents receive a reward of either zero or one (in some standardised reward units) for each choice. This reward policy can be defined mathematically via a reward function r(C_i, S, x) that depends on the choice, C_i, made by individual i, the sequence of past decisions S and the true state of the world x, with binary rewards being defined as:

r_binary(C_i, S, x) = δ_{C_i, x}

where δ_{l,k} is the Kronecker delta function.
A binary reward function attributes rewards based solely on the accuracy of an individual's choice, and is independent of the decisions made by others. More generally we can consider a reward scheme that depends on the past decisions that the agent can observe. A simple way to do this is to modulate the reward with a function that depends on the individual choice and S:

r_general(C_i, S, x) = f(C_i, S) δ_{C_i, x}

This continues to reward (and thus incentivise) accuracy through the δ_{C_i, x} term, but can also directly reward or penalise choosing the same option as others, thus incentivising either conformity or diversity of choices.
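The two reward policies above can be expressed as short functions. This is an illustrative sketch with options encoded as +1 (A) and −1 (B); the `halving` modulation is a hypothetical example of a choice of f, not the scheme derived later in the paper.

```python
def kronecker(a, b):
    """Kronecker delta: 1 if the arguments match, else 0."""
    return 1.0 if a == b else 0.0

def binary_reward(choice, past, x):
    """r_binary(C_i, S, x): one unit iff the choice matches the true state."""
    return kronecker(choice, x)

def general_reward(f, choice, past, x):
    """r_general(C_i, S, x) = f(C_i, S) * delta_{C_i, x}: accuracy is still
    required, but the payout is modulated by the observed past sequence."""
    return f(choice, past) * kronecker(choice, x)

def halving(choice, past):
    """Hypothetical modulation: each previous agent who made the same
    choice halves the reward, penalising conformity."""
    return 0.5 ** past.count(choice)
```

For example, a correct A-choice after two earlier A-choices would pay 0.25 units under the hypothetical halving modulation.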

Rational individual choice
Given a reward function r(C_i, S, x) = f(C_i, S) δ_{C_i, x}, an agent can evaluate the expected reward E(r_A | S, ∆_i) from choosing A, conditioned on the available private and social information:

E(r_A | S, ∆_i) = f(A, S) P(x = 1 | S, ∆_i)

and similarly for choosing B:

E(r_B | S, ∆_i) = f(B, S) P(x = −1 | S, ∆_i)

According to the principle of expected reward maximisation, a rational agent will then select A if and only if E(r_A | S, ∆_i) > E(r_B | S, ∆_i) (a tied expectation has zero probability mass). Using the general reward function above, this condition simplifies to:

P(x = 1 | S, ∆_i) / P(x = −1 | S, ∆_i) > f(B, S) / f(A, S)

That is, for an agent to choose A, its assessment of the odds that A rather than B is correct must outweigh any relative penalty it receives for choosing A over B based on the past decisions.
A feature of the above decision-making procedure is that there exists some critical value of an agent's private information, ∆*_i, which would make the expected reward of choosing A or B equal:

f(A, S) P(x = 1 | S, ∆*_i) = f(B, S) P(x = −1 | S, ∆*_i)

This implies that agent i will choose A if and only if ∆_i > ∆*_i. Substituting the definition of the expected reward and the conditional probability P(x | ∆_i, S), this gives:

f(A, S) P(S | x = 1) p(∆*_i | x = 1) = f(B, S) P(S | x = −1) p(∆*_i | x = −1)

We can recognise that p(∆*_i | x = 1)/p(∆*_i | x = −1) = exp(2∆*_i/ν²), and thus the expression above can be rearranged to give:

∆*_i = (ν²/2) ln [ f(B, S) P(S | x = −1) / ( f(A, S) P(S | x = 1) ) ]

Since subsequent agents are able to observe the value of S that agent i was responding to, they can calculate the corresponding value of ∆*_i. Combined with observing the decision agent i makes, this enables them to infer whether agent i's private information was greater than or less than this threshold value. The probability of a sequence S, conditioned on x, can therefore be evaluated with reference to each of the thresholds calculated for the previous agents:

P(S | x) = ∏_{i : C_i = A} Φ((x − ∆*_i)/ν) × ∏_{i : C_i = B} Φ((∆*_i − x)/ν)

where Φ(·) is the cumulative probability function of the standard normal distribution. Since the thresholds themselves depend on the past sequence of decisions, the probability of a sequence can be evaluated recursively by calculating the threshold for each sub-sequence.
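The recursive evaluation described above can be sketched as follows. This is an illustrative implementation under the paper's model (choices encoded as +1 for A and −1 for B, ν the signal noise s.d.), with function names of my own choosing.

```python
import math

def phi_cdf(z):
    """Standard normal cumulative distribution, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def thresholds_and_sequence_prob(seq, f, nu):
    """Walk a sequence of choices, computing each agent's threshold Delta*_i
    and the sequence probabilities P(S | x = +1) and P(S | x = -1),
    built up recursively prefix by prefix."""
    p_x1 = p_xm1 = 1.0          # probability of the prefix given x = +1 / -1
    thresholds, prefix = [], []
    for c in seq:
        # threshold faced by the agent making this choice, given the prefix
        dstar = (nu ** 2 / 2.0) * math.log(
            (f(-1, prefix) * p_xm1) / (f(+1, prefix) * p_x1))
        thresholds.append(dstar)
        if c == +1:             # chose A: requires Delta_i > Delta*_i
            p_x1 *= phi_cdf((1.0 - dstar) / nu)
            p_xm1 *= phi_cdf((-1.0 - dstar) / nu)
        else:                   # chose B: requires Delta_i < Delta*_i
            p_x1 *= phi_cdf((dstar - 1.0) / nu)
            p_xm1 *= phi_cdf((dstar + 1.0) / nu)
        prefix.append(c)
    return thresholds, p_x1, p_xm1
```

Under binary rewards (f ≡ 1) the first threshold is zero, and each additional A-choice pushes subsequent thresholds further negative, making later agents ever more likely to follow.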

Social response under binary rewards
The influence of social information on decision making can be characterised by observing its effect on both individual decisions and on aggregate outcomes in groups. A simple way to visualise the influence of previous choices on an individual's decision is via the probability that a focal individual will choose the correct option, arbitrarily taken to be option A, conditioned on there having previously been n_A and n_B agents choosing A and B respectively. This probability is shown in Figure 1A, assuming that agents are responding to binary rewards (f(C_i, S) = 1). Because the decision to choose A or B depends in theory on both the full sequence of previous choices and the agent's private information, the probability shown in this figure is a weighted average over all sequences consistent with the specified values of n_A and n_B, and all possible values of the focal agent's private information:

P(C = A | n_A, n_B, x = 1) = [ Σ_{S ∈ S(n_A, n_B)} P(S | x = 1) Φ((1 − ∆*_S)/ν) ] / [ Σ_{S ∈ S(n_A, n_B)} P(S | x = 1) ]

where the summation is over the set S(n_A, n_B) of all sequences with n_A and n_B individuals choosing A and B. This figure shows that agents respond strongly to the decisions made by others, such that the probability to choose A is highly dependent on the values of n_A and n_B. In particular, in most cases where n_B > n_A the focal agent is less likely to choose the correct option than they would be if they chose independently; the red contour line indicates this independent choice probability. This implies that incorrect decisions by agents at the beginning of the sequence can lead to a cascade of later agents also making incorrect choices. This is reflected in the distribution of aggregate outcomes at the group level, characterised by the probability that n_A agents will select option A in total. This is plotted in Figure 1B for both the case of independent decisions (red bars) and for agents using social information with binary rewards (blue bars). This plot shows the dramatic difference in aggregate outcomes that results from social information use. When agents choose independently the aggregate outcomes are clustered in a binomial distribution that peaks at the mean value of n_A = nΦ(1/ν), with a very low probability that fewer than half the agents choose A. Under social information the aggregate outcomes become bimodal, with a large peak at n_A = n and a secondary peak at n_A = 0. The result of this is that the mean number of correct decisions increases (compare the blue and red dashed lines), but there is a much greater probability that a majority of agents will choose the incorrect option (B). As such, each individual is more likely to choose the correct option, but the majority choice of the group is less likely to be correct.
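The aggregate distribution can be reproduced by brute-force enumeration of sequences. The sketch below is my own illustration, feasible only for small n (the paper's Figure 1B uses n = 25, which would need smarter bookkeeping); the β parameterisation, with f(C, S) = β^(−n_C), anticipates the competitive scheme analysed later, and β = 1 recovers plain binary rewards.

```python
import math
from itertools import product

def phi_cdf(z):
    """Standard normal cumulative distribution, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def outcome_distribution(n, nu, beta=1.0):
    """P(n_A = k | x = 1) for k = 0..n, for rational agents rewarded by
    f(C, S) = beta**(-n_C). Enumerates all 2**n sequences."""
    dist = [0.0] * (n + 1)
    for seq in product((+1, -1), repeat=n):
        p_x1 = p_xm1 = 1.0
        n_a = n_b = 0
        for c in seq:
            # threshold for this agent given the prefix so far
            dstar = (nu ** 2 / 2.0) * math.log(
                beta ** (n_a - n_b) * p_xm1 / p_x1)
            if c == +1:
                p_x1 *= phi_cdf((1.0 - dstar) / nu)
                p_xm1 *= phi_cdf((-1.0 - dstar) / nu)
                n_a += 1
            else:
                p_x1 *= phi_cdf((dstar - 1.0) / nu)
                p_xm1 *= phi_cdf((dstar + 1.0) / nu)
                n_b += 1
        dist[n_a] += p_x1
    return dist
```

Even at n = 5 the herding effect is visible: under binary rewards the all-correct outcome carries far more mass than the independent binomial value q^n, while setting β = Q = q/(1 − q) recovers the binomial distribution exactly.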

Condorcet-retrieving reward function
Under binary rewards, agents tend to follow past decisions with increasing strength over the course of a sequence of choices (Figure 1A). Since this breaks the assumption of independence in the CJT, it also reduces the accuracy of collective decisions as defined by the majority choice, as shown in Figure 1B. To improve collective accuracy it is therefore necessary to reduce the correlation between decisions. If agents are to make choices independently, the threshold value ∆*_S must be independent of the value of S. Since agents begin with a symmetric prior P(x = 1) = 1/2, it further implies that this threshold must be zero; i.e. agents will choose A or B based solely on the direction of their private information. One can therefore retrieve independent choices, and thus the accuracy implied by the CJT, by seeking a reward function r_condorcet(C_i, S, x) = f(C_i, S) δ_{C_i, x} such that:

∆*_S = 0 for all sequences S

Expanding the definition for the expected reward, this gives:

f(A, S) P(x = 1 | S, ∆_i = 0) = f(B, S) P(x = −1 | S, ∆_i = 0)

By substituting Bayes rule for the conditional probability of x, and recognising that p(0 | x = 1) = p(0 | x = −1), we get:

f(A, S) P(S | x = 1) = f(B, S) P(S | x = −1)

By construction, under this Condorcet reward scheme, all thresholds for private information are zero. The probability P(S | x) thus simplifies to a product of independent choices:

P(S | x = 1) = Φ(1/ν)^a (1 − Φ(1/ν))^b,    P(S | x = −1) = (1 − Φ(1/ν))^a Φ(1/ν)^b

where a is the number of agents who have previously chosen A and b the number who have chosen B. Substituting this expression, we therefore get:

f(A, S) Φ(1/ν)^a (1 − Φ(1/ν))^b = f(B, S) (1 − Φ(1/ν))^a Φ(1/ν)^b

This expression can be simplified by defining q = Φ(1/ν) as the probability that a single agent will independently choose the correct option. This then reduces to:

f(A, S) / f(B, S) = Q^{b − a}

where Q = q/(1 − q). This expression can be satisfied by a reward scheme:

r_condorcet(C_i, S, x) = Q^{−n_{C_i}} δ_{C_i, x}

where n_{C_i} is the number of previous agents who chose the same option as agent i. This shows that rational agents can be motivated to make independent choices if the rewards for each option are reduced geometrically with the number of agents that have already chosen that option. This is a very convenient reward system for several reasons. First, it is symmetric in the way it treats both options, so neither option needs to be arbitrarily favoured or penalised. Second, the form of the required penalty for each option depends only on the number of agents that have previously chosen it, so these penalties can be implemented locally without reference to the number choosing the other option, or the order in which those choices were made. Third, it resembles a form of competition, with each agent exhausting a fixed proportion of the potential reward remaining for the option it chooses. The geometric reduction in rewards means that for any group size the total rewards available from each option are bounded by:

Σ_{j=0}^{n−1} Q^{−j} < Q/(Q − 1)

Similarly, since only agents choosing the correct option are paid, the expected total reward can be calculated as:

E(R) = [Q/(Q − 1)] (1 − (2(1 − q))^n)

Any system that assigns rewards under this scheme can therefore estimate and bound the total rewards it would potentially need to allocate. It is notable that high values of Q indicate problems that are relatively simple for individual decision makers, and these represent the lowest expectation and bound on total rewards; this naturally allows a reward system to allocate the greatest reward budget to the most difficult problems.
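The key identity behind the Condorcet scheme can be checked directly. In this sketch (function names my own) the zero-threshold condition f(A, S) P(S | x = 1) = f(B, S) P(S | x = −1) is verified exactly, since with f(C, S) = Q^(−n_C) both sides reduce to (1 − q)^(a+b); the geometric payout bound is also illustrated.

```python
def condorcet_f(Q):
    """f(C, S) = Q**(-n_C): the reward available from an option decays
    geometrically with the number of agents that already chose it."""
    def f(choice, past):
        return Q ** (-past.count(choice))
    return f

def check_zero_threshold_condition(q, seq):
    """Return both sides of f(A,S) P(S|x=1) = f(B,S) P(S|x=-1), assuming all
    previous agents chose independently with accuracy q (thresholds zero)."""
    Q = q / (1.0 - q)
    f = condorcet_f(Q)
    a = seq.count(+1)           # previous choices for A
    b = seq.count(-1)           # previous choices for B
    lhs = f(+1, seq) * (q ** a) * ((1.0 - q) ** b)
    rhs = f(-1, seq) * ((1.0 - q) ** a) * (q ** b)
    return lhs, rhs

def max_total_reward(Q, n):
    """Total payout available from one option in a group of n agents;
    the geometric series is bounded by Q / (Q - 1) for any n when Q > 1."""
    return sum(Q ** (-k) for k in range(n))
```

Because both sides of the condition agree for every sequence, an agent facing these rewards gains nothing by conditioning on S, which is precisely what makes its choice independent.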

Robustness of collective accuracy under varying competition
The reward scheme derived above is constructed so as to maximise the accuracy of the majority choice by making individual rational decisions statistically independent, and it accomplishes this through imposing a specific form of competitive penalty. As discussed above, this form of competitive penalty has many agreeable features for implementation in real-world decision problems. However, selecting the precise strength of the competitive penalty requires knowing in advance how difficult the decision problem is, i.e. knowing the value of Q. In general it is unlikely that this would be precisely known in advance, although a system designer may have some intuition about whether a given decision is easy or difficult. As such, it is important to assess how robust such a reward system is to misspecification of the competition strength. To do this, we can evaluate the collective accuracy under a reward scheme with variable competition strength β:

f(C_i, S) = β^{−n_{C_i}}

where we know from the above argument that the optimum value of β should be Q. Under this reward scheme, the relation for critical thresholds derived above can be simplified and evaluated efficiently as:

∆*_S = (ν²/2) [ (n_A − n_B) ln β + ln( P(S | x = −1) / P(S | x = 1) ) ]

where n_A and n_B are the number of decisions for A and B respectively within the sequence S. Recognising the recursive pattern, this further simplifies to an incremental update: writing S′ for the sequence S with its final decision removed, if that final decision was for A then

∆*_S = ∆*_{S′} + (ν²/2) [ ln β + ln( Φ((−1 − ∆*_{S′})/ν) / Φ((1 − ∆*_{S′})/ν) ) ]

with the corresponding expression (the sign of ln β reversed and the Φ terms replaced by those for a B choice) when the final decision was for B. The expected collective accuracy under this reward scheme can be evaluated by directly calculating the expected proportion of accurate majority decisions as a function of the adjustable competition parameter β. This is done by calculating the probability of every possible sequence of decisions in a group of n agents (hence 2^n possible sequences) for x = 1 and summing the probability of the set of sequences where the majority of decisions are for the correct option A:

P(correct majority) = Σ_{S : n_A(S) > n/2} P(S | x = 1)

where P(S | x = 1) is evaluated using the product of thresholds expression given above.
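The robustness calculation can be sketched by the same enumeration over all 2^n sequences; the helper names below are my own, and the computation is only practical for small n.

```python
import math
from itertools import product

def phi_cdf(z):
    """Standard normal cumulative distribution, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def collective_accuracy(n, nu, beta):
    """Probability that the majority of n agents chooses the correct option
    under rewards f(C, S) = beta**(-n_C): enumerate all 2**n sequences and
    sum P(S | x = 1) over those with a correct (A) majority."""
    acc = 0.0
    for seq in product((+1, -1), repeat=n):
        p_x1 = p_xm1 = 1.0
        n_a = n_b = 0
        for c in seq:
            # threshold for this agent given the prefix counts
            dstar = (nu ** 2 / 2.0) * math.log(
                beta ** (n_a - n_b) * p_xm1 / p_x1)
            if c == +1:
                p_x1 *= phi_cdf((1.0 - dstar) / nu)
                p_xm1 *= phi_cdf((-1.0 - dstar) / nu)
                n_a += 1
            else:
                p_x1 *= phi_cdf((dstar - 1.0) / nu)
                p_xm1 *= phi_cdf((dstar + 1.0) / nu)
                n_b += 1
        if n_a > n // 2:
            acc += p_x1
    return acc
```

At β = Q the result coincides with the CJT binomial majority accuracy by construction, and exceeds the accuracy obtained under binary rewards (β = 1).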
Figure 2A shows the collective accuracy as a function of β for group sizes from n = 3 to n = 25 with an environmental noise level of ν = 2.32, implying q = 2/3 and Q = 2 (i.e. individuals will make the correct choice twice as often as the wrong choice when choosing alone). This demonstrates a clear peak in accuracy in each case at the expected value of β = 2, indicated by the dashed red line. At this optimum point collective accuracy matches that expected from the CJT. While a range of values of β > 1 induce greater collective accuracy than under binary rewards (β = 1), values of β < 1, which reward agents for copying past decisions, dramatically reduce collective accuracy. Figure 2B shows the individual accuracy for the same range of group sizes and competition strengths, demonstrating that increases in collective accuracy induced by competition lead to decreases in individual accuracy: collective accuracy is maximised when individual accuracy falls to that expected from a single agent without social information, as this is when agents choose independently. Average individual accuracy is maximised at values of β slightly greater than one, since under binary rewards each agent is motivated to maximise its own accuracy, without regard for the value of the social information it provides to those further along the sequence of decision makers (cf. [26]). Figure 2C shows the relationship between the collective accuracy achieved by the Condorcet reward scheme and that achieved without competition (binary rewards), showing that the effect is stronger in larger groups, which suffer relatively more from information cascades under binary rewards. Although collective accuracy is maximised when competition is optimised to produce independent decision making, there is a range of values of β which induce greater collective accuracy than under binary rewards, as seen in Figure 2A. The size of this range shows how well-tuned competition must be to generate improvements in collective accuracy, and thus is indicative of how plausible effectively implementing such a reward scheme might be in practice. Figure 2D shows the maximum value of β that outperforms binary rewards as a function of q, for group sizes from n = 3 to n = 25 (solid lines), as well as the optimal value of β for comparison (dashed line). Inherently easier decisions permit a greater range of effective competition strengths, and this range increases very rapidly as q approaches one (note the log scale on the y-axis of Figure 2D). Larger groups also permit a wider range of effective competition strengths, even though the optimal competition strength does not depend on n.

Discussion
When agents are rewarded solely for their individual accuracy they tend to follow previous decisions. While this increases the expected proportion of agents that make the correct choice, it reduces the probability that the majority of agents is correct compared to agents who make their decisions independently. Errors in early decisions can make subsequent decision-makers less accurate than they would have been alone. Hence, while on average individually beneficial, social information is deleterious to anyone seeking to use the Wisdom of Crowds by relying on the majority opinion.
Social information may potentially be restricted exogenously, by insisting that individuals make their choices without access to the choices made by others. However, such a scenario requires tight control of the information individuals have access to, and is unlikely to be plausible when making use of collective wisdom in real-world contexts such as online review systems and social media [6].
Here I have demonstrated that, among rational and selfish agents, a simple competitive reward scheme that reduces the rewards available from already-popular choices can, in theory, return a group to the accuracy implied by the Condorcet Jury Theorem. This result depends on the assumption that the environmental information received by the agents is truly independent and is not systematically wrong. The competitive scheme effectively balances the expected gains of following social information by choosing the more popular option, and so prevents the information cascades that limit the collective accuracy of sequential decision-making. Under such a reward scheme, and within the assumptions of the model used here, agents' decisions become independent, and depend only on their private information. This increases the probability that the majority of the group will make the correct choice, albeit at the cost of making each individual somewhat less accurate on average. This paper has derived the optimal form and magnitude of this competition in the context of a model in which an agent observes the full sequence of previous decisions, but because agents' decisions become independent under the optimal competition it would retain the same form if agents instead observed simplified aggregate statistics regarding how many agents had made each choice [27]. As such it is applicable across a wide range of domains where the nature of social information may vary.
Introducing competition that penalises agents for following popular choices is an established mechanism for motivating agents to make decisions that improve collective accuracy by reducing the correlation between different decision-makers [20,21], and is an important feature of markets as a forecasting mechanism, whether explicit prediction markets [28], betting exchanges or financial markets. In this paper I have shown that competition can also fulfil this role in a sequential decision-making context where agents can observe the choices made by all those who decide before them and utilise that information in their own decision-making. While the optimal level of competition is unlikely to be known a priori for any given decision or decision-making system, sensitivity analysis shows that introducing a small degree of competition typically improves upon performance from binary rewards alone; in an adaptive system competitive pressure can thus be gradually raised to determine optimal performance. Except in very difficult decisions (q close to 0.5), competitive rewards are relatively forgiving of miscalibration, providing improvements on binary rewards across a wide range of competition strengths.
The theoretical efficacy of competitive rewards raises the possibility that such incentives could be used to improve collective accuracy across a range of real-world contexts. For example, the collective judgement of the scientific community (as reflected in majority expressed opinion) on issues where there is significant uncertainty could potentially be improved by systematically assigning greater rewards to those later proved correct when fewer others also expressed that opinion; these rewards might be in the form of promotions, research funding or simply scientific reputation. To some degree such competitive rewards already feature in many communities, and many scientists, economists and political pundits have made their reputation by advocating for a minority viewpoint that was later proved correct: a notable example is the case of Barry Marshall and Robin Warren, who won the 2005 Nobel Prize in Physiology or Medicine for their discovery of the link between H. pylori and stomach ulcers [29]. However, other pressures that incentivise social and professional conformity are also common, such as what Irving Janis termed 'Groupthink' [30]: the tendency to excessively value consensus with other group members. Conformity may also be imposed by systemic factors such as needing to convince others that your ideas are plausible before they can be explored [31].
From an external point of view, the results of this study suggest we should assign greater credibility to the collective wisdom of communities where such competitive rewards are the norm, motivating both accuracy and independence. Conversely, the collective wisdom of communities characterised by strong social norms of conformity (effectively negative competition) should be assigned lower credibility. Notably, although competitive, some communities such as political punditry rarely demand or reward specific, falsifiable predictions [32]; for competitive rewards to drive collective accuracy there must be penalties (or lack of reward) for inaccurate predictions, otherwise individuals are simply motivated to identify and state a unique opinion without regard for its accuracy. The optimal reward structure identified here requires that rewards are still contingent on accuracy.
Where the collective accuracy of group decisions (at least as expressed via majority voting) is highly desirable, we should seek to reduce pressures that induce conformity and introduce competitive rewards that motivate more independent judgements. However, as well as potentially causing social friction (if norms of social conformity are violated), this also comes at the cost of a likely reduction in individual accuracy. While the group may be more accurate, more individuals will be wrong. This highlights that systems of collective decision-making that aim for collective accuracy must not only seek to be tolerant of conflicting views, but must also tolerate a greater level of individual decision-making failure.
To be maximally effective such competitive rewards need to be predictable. Agents should be able to either rationally adjust their choices in the light of known reward schemes, or such rewards should be consistent enough to allow adaptation by reinforcement learning, a process which may take time to effect behavioural change [33,34,35]. Rewards should also be well-tuned to the specific context (particularly the level of individual certainty, but also the size of the community). This suggests that competitive reward structures should be made more explicit, and calibrated through systematic trial and error in a particular community. The model presented here is theoretical and assumes that agents are either well informed about potential rewards and respond rationally, or reliably adapt via reinforcement based on experience. Such assumptions, and the efficacy of innovative collective decision-making systems, are ultimately empirical questions that must be tested experimentally.
Finally, this paper has considered problems in which agents seek to ascertain the answer to a question of external empirical fact, such as whether it will rain tomorrow, or which of two teams will win a sporting event. It should be noted that some attempts to leverage the predictive power of social information instead focus on questions where the answer is endogenous to the community from which that social information is drawn. An example is the use of social media to predict which movies will attract large box office returns [36], since presumably the commenters on social media represent a sample of potential movie-goers. In these cases there is likely to be more value in simply measuring the aggregate opinion of individuals, since the expression of interest in a movie is itself a predictor of attendance, regardless of whether that interest is itself socially driven.

Figure 1: Characterising the response to social information in sequential decisions under binary rewards. (A) The probability that an agent will choose option A when that is the correct choice, conditioned on the number of previous decisions for options A and B, averaging over all sequences consistent with those aggregate numbers of decisions. In this example the environmental noise is ν = 2.32, giving an individual choice accuracy of q = 2/3; the red contour line indicates this probability. (B) The probability for n_A agents to select option A when that is the correct choice, averaged over a full sequence of decisions in a group of n = 25 agents. The blue bars indicate the probability when rational agents are subject to binary rewards, red bars indicate the probability if all agents select independently. The dashed lines indicate the mean of each probability distribution. Agents responding rationally to binary rewards have a higher average number of individually-successful decisions, but a lower probability of a correct majority decision.

Figure 2: The effect of competition on collective accuracy. (A) With q = 2/3, across different group sizes (n) collective accuracy increases with increasing competition (β) up to an optimal value given by β = Q (indicated by the dashed red line), where collective accuracy matches that predicted by the Condorcet Jury Theorem. Higher levels of competition reduce collective accuracy, with sufficiently high values of β leading to lower collective accuracy than under binary rewards (β = 1, indicated by the dashed black line). Values of β < 1, indicating rewards for conformity, always lead to lower collective accuracy. (B) Individual accuracy is maximised at values of β close to one, indicating weak positive competition, and increases with group size. At the optimal competition for collective accuracy, individual accuracy is the same for all group sizes as agents choose independently. (C) The collective accuracy under optimal competition (solid line) compared to that achieved under binary rewards (dashed line) as a function of group size. (D) The maximum value of β for which competitive rewards outperform binary rewards, for varying group size and as a function of q (representing the probability for a solo agent to choose correctly). The dashed line shows the optimal value of β = Q for comparison. The range of effective competition values (those that improve on binary rewards) is greater for easier decisions and in larger group sizes. Note the logarithmic scale on the y-axis.