Graduated sanctioning, endogenous institutions and sustainable cooperation in common-pool resources: An experimental test

To encourage long-term cooperation in social dilemmas such as common-pool resources, the importance of sanctioning is often stressed. Elinor Ostrom advocates graduated sanctioning: the severity of a defector's punishment depends on the extent of their history of deviant behaviour. In addition, endogenously chosen sanctioning is argued to induce cooperation due to its higher legitimacy. This study compares the effect of graduated and strict mutual sanctioning on cooperation in common-pool resources at the micro and macro level. In addition, we distinguish whether the type of mutual sanction is exogenously determined or endogenously chosen. A Common-Pool Resource game is used in a laboratory experiment, integrating crucial elements of social structure and rule-making mechanisms within a common. Results support the effectiveness of graduated sanctioning compared to strict sanctioning in the long term, and partially support the effectiveness of endogenously chosen sanctioning mechanisms compared to imposed sanctioning mechanisms.


Introduction
Collective action and cooperation in social dilemmas are often threatened by the free-rider problem: when a number of profit-maximising people are interested in the same outcome (a collective good available to all of them), rational individuals attempt to free-ride at the expense of others' efforts (Coleman, 1987; Kitts, 2006). Elinor Ostrom (1990) and Robert Ellickson (1991) studied the emergence of sustainable cooperation in the context of commons: common-pool resources [CPRs], like oil fields, grasslands or fishing grounds, that are owned by no one in particular and used by many for profit. In these common-pool resources, the resource is finite and the exclusion of users is infeasible (Ostrom, 1990; Ostrom and Ostrom, 1977). This makes the appropriation of these resources vulnerable to the 'tragedy of the commons' as described by Hardin (1968): each member has an incentive to use the limited resource without limit, which leads to its inevitable decay.
Sustainable cooperation in this context refers to long-term, stable cooperation between resource users that enables resource use over long periods of time without resource depletion or destruction. Experimental research has studied sanctioning mechanisms as institutions to establish sustainable long-term cooperation in social dilemmas (Engel and Irlenbusch, 2010; Fehr and Gächter, 2000, 2002; Fischer et al., 2013; Grechenig et al., 2010; Gürerk, 2013; Ostrom et al., 1992; Van Dijk et al., 2015; Van Miltenburg, 2015; Van Miltenburg et al., 2014; Yamagishi, 1986). Many studies addressing the effectiveness of sanctions in establishing cooperation in social dilemmas concluded that sanctions can, and frequently do, encourage cooperation (Balliet et al., 2014; Chaudhuri, 2011; Hopfensitz and Reuben, 2009; Van Miltenburg, 2015). In particular, the importance of sanctioning for increasing the efficient use of common-pool resources has been pointed out (Fehr and Gächter, 2002; Van Soest and Vyrastekova, 2005).
Most experiments use strict punishment: an actor either gets punished or not. In this paper, we define strict punishment as punishment with only one, relatively high, level of severity, independent of the deviant past of the defector. However, punishment can also be conceived as continuous, graduated, or dependent on the level of cooperation of others. These differentiated punishment institutions may have specific advantages (Ostrom, 1990; Couto et al., 2020; Shimao and Nakamaru, 2013; Van Weeren and De Moor, 2014). In formulating eight design principles (principles that are present in many successful, long-living commons), Ostrom (1990) stressed the importance of one such differentiated punishment institution, namely graduated sanctioning: a defector is punished according to the extent of his past deviant acts, starting with a small punishment and escalating the punishment after each subsequent offence. She argues that for some violators a small penalty may be enough to remind them of the importance of cooperation and compliance with the rules, while a large sanction may lead to feelings of unfairness and, consequently, to more rule-breaking behaviour (Ostrom, 1990). Graduated sanctioning is found outside the context of commons as well: most legal systems punish repeat offenders more severely than first offenders for the same act (Emons, 2003). A punishment based on the deviant past of the violator, reflecting the seriousness of the harm done, is suggested to result in a better fit between the sanction and the violator's culpability, greater acceptance of the punishment, a better learning effect for the violator and future compliance with the rules (Mandiberg and Faure, 2009; Ostrom, 1990; Shimao and Nakamaru, 2013).
From a game-theoretic perspective, however, graduated sanctioning does not seem effective: a rational individual would free-ride until the severity of the punishment makes it rational to cooperate instead. In this case, strict sanctioning with one optimised severity, such that it is never rational to defect, would suffice (Polinsky and Shavell, 2000). With respect to mutual sanctioning, as is often found in commons (Ostrom, 1990), rational choice theory even predicts that sanctioning will not happen at all if it is costly for the punisher (Yamagishi, 1986) and if future benefits of punishment do not exist or are too uncertain.
Despite this, Ostrom (1990) finds that graduated sanctioning was a typical feature of successful and robust commons of various types, including meadows, forests and irrigation systems. She states that in many enduring self-governing CPR systems the first sanction after an offence is so low that it barely has an impact on the offender. This initial sanction can be thought of as a reminder to the offender, and to other resource users, that rule-breakers will be caught, but that there is still trust that the offender will cooperate from then on. Ostrom argues that a large fine after a first offence may lead to resentment instead of cooperation. This implies a 'vengeful' type of human, who is not necessarily rational in the game-theoretic sense and for whom a large fine leads to an unwillingness to follow the rules again out of the perceived unfairness of the punishment received (Ostrom, 1990; Polinsky and Shavell, 2000). In accordance with Ostrom's favoured view of graduated sanctioning, Couto et al. (2020) develop a theoretical argument for conditions under which graduated fines and taxes can have a positive effect on average group achievement and average institution prevalence, and thus on overall success, in a collective risk dilemma (a specific form of public goods game).
Many studies point out the presence of graduated sanctioning in long-living institutions (Hayes, 2006; Gibson et al., 2005; Ostrom and Nagendra, 2006; Ghate and Nagendra, 2005; Ostrom, 1990). And while there is evidence that institutions with sanctioning mechanisms outperform those without in terms of cooperation (Fehr and Gächter, 2002; Gürerk et al., 2006; Yamagishi, 1986), the evidence on the efficiency of graduated sanctioning relative to strict forms of sanctioning is inconclusive. For instance, Iwasa and Lee (2013) show in a theoretical model that graduated sanctioning only works well when the probability of erroneous reporting of players' actions is low and when players are heterogeneous in their sensitivity to differences in payoffs. Evidence from case studies is also inconclusive: whereas Ghate and Nagendra (2005) show the positive effect of graduated sanctioning in their study of forest management in India, Cleaver and Franks (2005), in their study of river basin management in Tanzania, mention both positive and negative sides to graduated sanctioning. In addition, there is research suggesting that long-living commons had less need for sanctions in general, and that advanced sanctioning mechanisms such as graduated sanctioning were barely used (De Moor et al., 2021; De Moor and Tukker, 2015).
A design principle that may be related to the effectiveness of sanctioning, perhaps regardless of the actual sanctioning type, is the principle of having those affected by the rules participate in modifying the rules: the so-called collective choice arrangements (Ostrom, 1990). The collective decision on rules provides protection against decisions imposed by a minority of members at the expense of others (Van Miltenburg, 2015; Wilson et al., 2013). There is experimental evidence that punishments and rewards allocated to others via endogenous sanctioning institutions are perceived as fairer and more acceptable, and that, as a consequence, these chosen institutions perform better in increasing adherence to the rules in social dilemmas (Gürerk, 2013; Gürerk et al., 2004; Strimling and Eriksson, 2014; Sutter et al., 2010; Van Miltenburg, 2015). Putterman et al. (2011) show that groups of individuals who are allowed to vote on sanctioning mechanisms in a Public Goods Game quickly learn to vote for efficient mechanisms. Similarly, Dal Bó et al. (2010) show in a series of prisoner's dilemma experiments that endogenously chosen rules have a greater effect on player behaviour than exogenously imposed rules. Moreover, there is evidence that collective decision-making can lead to better decisions than individual decision-making (Wilson et al., 2004).
In conclusion, only limited empirical evidence exists comparing the effectiveness of endogenously chosen or exogenously imposed graduated and strict sanctioning mechanisms. There is a lack of causal studies investigating the effectiveness of sanctioning mechanisms in maintaining long-term cooperation. In particular, no causal studies have been conducted in the context of common-pool resources, which are distinct from, for instance, public good dilemmas due to the exhaustible nature of the resource. Following the design principle by Elinor Ostrom, we study to what extent graduated or strict sanctioning mechanisms are more effective in sustaining long-term cooperation in common-pool resources (RQ1). Furthermore, including a possibly related design principle, collective choice arrangements, we examine whether endogenously chosen sanctioning mechanisms are more effective in sustaining cooperation in common-pool resources (RQ2). By combining these two design principles in our treatments, we control for the possibility that it is not the type of sanctioning but the perceived legitimacy of the sanctioning that matters most for inducing cooperative behaviour. By crossing the two research questions, we can distinguish and compare four sanctioning types: assigned strict, chosen strict, assigned graduated and chosen graduated sanctioning.
Analysing the effectiveness of different kinds of sanctioning mechanisms empirically in real-life settings is difficult, due to the number of confounding factors that influence success and failure. In addition, a randomised controlled trial to test sanctioning types would involve sanctioning the same offence with different punishments within the same CPR context, which may be difficult to set up and enforce. Laboratory experiments allow the effectiveness of regulation mechanisms to be investigated in a controlled environment, enabling a causal test in which participants face decisions representing related real-world settings (Van Miltenburg, 2015; Van Soest and Vyrastekova, 2005). This study aims to test Ostrom's case study findings and design principle formulations by integrating crucial elements of social structure and rule-making mechanisms within a common in such a laboratory experiment. Participants play a CPR game (Janssen et al., 2010) in groups of four, in which there is a common natural resource (a fishing ground) that they can appropriate under different sanctioning mechanisms. Another of Ostrom's design principles incorporated in the game is mutual monitoring (Ostrom, 1990), meaning that a monitoring system is in place in which resource users monitor each other's behaviour during the appropriation of the resource. The players also decide whom to sanction. This way, everybody feels responsible for looking out for defectors, while knowing that their own actions are being watched as well. The importance of mutual monitoring in CPRs is emphasised in multiple empirical works (Christensen et al., 2021; Hayes, 2006; Gibson et al., 2005).

van Klingeren and Buskens
The main purpose of this paper is to test the effectiveness of graduated versus strict sanctioning on sustainable, long-term cooperation in common-pool resources. However, to add complexity and realism to the experiment in its common-pool resource setting, the design principle of collective choice arrangements is also tested. By using an experimental approach to test and compare the effectiveness of sanctioning mechanisms, we provide the causal empirical evidence that the current literature on sanctioning in CPRs lacks. We do not test the effect of mutual monitoring, but mutual monitoring is inherent to the game design as a result of the common fishing ground context. We do not include the remaining five design principles (group boundaries, rules matching needs and conditions, rule-making rights being respected by outside authorities, accessible dispute resolution and governing responsibilities in nested tiers; Ostrom, 1990) because adding variations for them would complicate our experimental design too much and they are outside the scope of our experiment.
The structure of the paper is as follows. First, hypotheses based on the discussed literature are formed. Second, the experiment that lies at the core of this research is explained. Then, descriptive and explanatory analyses are performed and described. Lastly, the results are discussed in light of the hypotheses and existing literature, and recommendations are made for future research.

Predictions
The CPR game we develop resembles, to some extent, repeated public goods games with punishment. As in the classic experiments of Fehr and Gächter (2000), we expect the average level of cooperation over time to be high. This is based on the assumption that the population consists not only of selfish free-riders, but also of a considerable percentage of conditional co-operators who are willing to punish free-riders (Fehr and Gächter, 2000; Fehr and Gintis, 2007; Ones and Putterman, 2007; Ostrom, 2000; Van Miltenburg, 2015). As the effectiveness of sanctioning versus non-sanctioning institutions has already been widely investigated (Fehr and Gächter, 2002; Gürerk et al., 2006; Yamagishi, 1986), we focus in the current paper on the effectiveness of graduated versus strict sanctioning, and of chosen versus assigned sanctions, on cooperation levels in CPRs over time. Based on the case study evidence gathered by Ostrom (1990) and the design principle on graduated sanctioning formulated from it, we expect graduated sanctioning to have a beneficial effect on long-term cooperation in CPR settings. Initially, graduated sanctioning might not prevent non-cooperative behaviour better than strict sanctioning, but because, in line with Ostrom's arguments, punishment is expected to be more effective under graduated sanctioning, we expect more cooperative behaviour to be maintained in the long run. We thus formulate the following hypothesis, which essentially describes an interaction effect between graduated sanctioning and the time the game has evolved:
Hypothesis 1. Graduated sanctioning is more effective than strict sanctioning in sustaining cooperation over time in common-pool resources.
Existing literature suggests that collective decision-making works better than imposed decisions or individual decision-making (Ostrom, 1990; Putterman et al., 2011; Van Miltenburg, 2015; Wilson et al., 2004). In real-life CPR settings, too, actors often employ endogenous institutions, whereby members gather and decide on their installation by majority vote (Ostrom, 1990; Van Miltenburg, 2015; Veszteg and Narhetali, 2010). Collective choice arrangements are described by Ostrom (1990) as another important design principle present in many successful long-living commons. In the choice condition of the current experiment, the appropriators can choose between strict and graduated sanctioning every five periods. If chosen institutions lead to more adherence to the rules, we may expect that it is the presence of a choice, and not the specific sanctioning mechanism, that increases the level of individual cooperation. Again, we expect this effect to manifest itself mainly over time, because only with experience do the actors involved learn what works best for them and come to appreciate that they could choose the sanctioning institutions themselves. The following hypothesis is formulated:
Hypothesis 2. Endogenously chosen sanctioning mechanisms are more effective than imposed sanctioning mechanisms in sustaining cooperation over time in common-pool resources.
On the basis of the literature, we cannot infer whether it is the sanctioning form or the fact that the sanctioning form was democratically chosen that has the larger effect on sustainable cooperation. Therefore, we do not formulate a specific hypothesis on the effect of combining graduated sanctions with endogenous choice of the sanctioning institution.
Our hypotheses are tested at the macro level of group outcomes of cooperation (such as resource size, change in resource size and group appropriation) and at the micro level of cooperation, namely individual appropriation effort. We adopt a macro-micro-macro perspective, which is most famously described by Coleman (1987) but was first applied by McClelland (1961) and has many other early applications and predecessors in sociology (Raub and Voss, 2017). Following this perspective, we argue that a distinction between macro- and micro-level variables is necessary, as we are dealing with a mechanism that first affects individual motives and behaviours and subsequently the aggregate outcomes. To understand the full mechanism, it is thus necessary to investigate the effect of sanctioning types at the individual and group level separately. In addition, non-significant differences at the individual level may still lead to significant changes at the group level, which makes it important to measure both to capture the full picture. In line with our hypotheses, we monitor the levels of cooperation over time, rather than the average levels of cooperation, and test the hypotheses using interactions with time.

The experiment
The general setting
Many studies on peer regulation focus on public goods games to investigate the influence of sanctions. Real-world social dilemmas such as those found in commons are often more complex (Kingsley, 2015). The game used in this paper better captures the underlying mechanism of a renewable resource through the implementation of specific contextualised characteristics of commons: the CPR game (Kingsley, 2015; Van Soest and Vyrastekova, 2005). The basic functions of the CPR game in the current paper are identical to those of the CPR game used in Van Klingeren (2020, 2022).
The common-pool resource game models a fishing ground from which fishermen make a living. There are four fishermen (the appropriators) who use the fishing ground. There are multiple periods, each with two stages: the appropriation stage and the sanctioning stage.
Combining benefits and punishment costs, the per-period utility of an appropriator is the sum of what is attained in the two stages:
$$U_{it} = b_{it} + s_{it},$$
in which $U_{it}$ is the total utility of fisherman $i$ at time point $t$, which consists of the benefits $b_{it}$ at the appropriation stage and the profits or costs $s_{it}$ at the sanctioning stage of the game.

The appropriation stage
In the appropriation stage, the four appropriators choose how much to invest in fishing. They each have an endowment of 50 units to appropriate from a resource, $R$, in each time period $t$. Since fishing is a costly activity (it takes time and requires maintenance of the boat), the appropriation effort $x_{it}$ ($0 \le x_{it} \le 50$) represents the investment of fisherman $i$ at time $t$. The returns from this effort are
$$4 \times \frac{R_{t-1}}{R_0} \times x_{it}.$$
Here, $R_0$ is the maximal size of the resource, for which we take $R_0 = 600$, and $R_{t-1}$ is the size of the resource at time $t-1$. Their profit thus depends on the current size of the resource relative to its original size. The overall benefits are:
$$b_{it} = 50 - x_{it} + 4 \times \frac{R_{t-1}}{R_0} \times x_{it}.$$
This function represents the benefits of the fishing efforts plus the remaining points not invested in fishing. If $R_{t-1} = R_0$, which is the case in the first period of the game and as long as the resource is at its maximal size, the return is $4 - 1 = 3$ units per invested unit. When $R_{t-1} < R_0$, the return is lower.
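The appropriation-stage benefit can be checked numerically. The following Python sketch is illustrative only and is not the z-Tree implementation used in the experiment; the function and constant names are hypothetical. It assumes the benefit function $b_{it} = 50 - x_{it} + 4 (R_{t-1}/R_0)\, x_{it}$ with $R_0 = 600$.

```python
R0 = 600.0        # maximal resource size R_0
ENDOWMENT = 50    # appropriation endowment per fisherman per period


def benefit(x_it: float, r_prev: float, r0: float = R0) -> float:
    """Appropriation-stage benefit b_it: uninvested units plus the return on effort."""
    if not 0 <= x_it <= ENDOWMENT:
        raise ValueError("effort must lie between 0 and the endowment of 50")
    return ENDOWMENT - x_it + 4 * (r_prev / r0) * x_it


print(benefit(30, 600))  # 140.0: sustainable effort at full resource size
print(benefit(50, 600))  # 200.0: full appropriation at full resource size
print(benefit(50, 150))  # 50.0: at R_0 / 4, effort no longer pays off
```

At full resource size, an effort of 30 yields 140 units, the per-period cooperative earning in the first part of the experiment. At $R_{t-1} = R_0/4 = 150$ each unit of effort returns exactly what it costs, so every effort level yields 50.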
The common-pool resource, here the fishing ground, has a certain renewal rate, modelled as:
$$R_t = \min\left(600,\; 1.25 \times \left(R_{t-1} - \frac{R_{t-1}}{R_0}\sum_{i=1}^{4} x_{it}\right)\right),$$
in which 1.25 is the renewal rate of the resource (after every time point the resource grows by 25%), $R_t$ is the resource at time point $t$ and $\sum_{i=1}^{4} x_{it}$ is the sum of the appropriation efforts of all four appropriators. The minimum with 600 is taken to indicate that the resource cannot grow above its initial size of 600, say the maximum capacity of fish in the lake. The actors choose how much they want to invest in appropriation of the resource each period. They do so independently, without knowing what the other fishermen do. Only at the end of the period do they observe what the other fishermen have done.
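The renewal rule can be sketched in the same way. This is a minimal illustration, assuming that the catch of fisherman $i$ equals $(R_{t-1}/R_0)\,x_{it}$ (as stated in the overexploitation analysis) and that the remaining stock grows by 25% up to the cap of 600; `next_resource` is a hypothetical helper name.

```python
R0 = 600.0     # maximal resource size (carrying capacity)
GROWTH = 1.25  # renewal rate per period


def next_resource(r_prev: float, efforts: list[float], r0: float = R0) -> float:
    """Resource size after this period's catch and regrowth, capped at r0."""
    total_catch = (r_prev / r0) * sum(efforts)  # catch shrinks with the stock
    return min(r0, GROWTH * (r_prev - total_catch))


print(next_resource(600, [30, 30, 30, 30]))  # 600.0: total effort 120 is sustainable
print(next_resource(600, [50, 50, 50, 50]))  # 500.0: full defection by all four
```

Starting from a reduced stock, the same rule regrows the resource whenever total effort stays below 120, and the cap prevents it from exceeding 600 even when nobody fishes.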
The endowment of 50, resulting in a maximum aggregate fishing effort of 200, was chosen so that subjects have the opportunity to defect, but full defection does not immediately deplete the entire resource (which would be the case if the maximum aggregate fishing effort were 600). Instead, maximum overexploitation reduces the resource size considerably but does not damage it irreparably, as would be the case with a real-life fishing stock. The chosen parameters for the endowment, resource size and regrowth function ensure that appropriation and overappropriation reflect a natural process of depletion and regrowth of the fishing stock.

Overexploitation
Individuals have collectively overexploited the resource if $R_t < R_{t-1}$, which is the case when $\sum_{i=1}^{4} x_{it} > 120$. This is because $600 / 1.25 = 480$ and $600 - 480 = 120$, where 1.25 is the growth rate per period; thus the resource can only fully restore itself when the total appropriation is 120 or lower. If $\sum_{i=1}^{4} x_{it} = 120$, the appropriators have invested on average 30 in appropriation effort per person. This is the social optimum investment level for the group.
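The threshold of 120 does not depend on the resource being at full size. Assuming the renewal rule $R_t = \min\left(R_0, 1.25\left(R_{t-1} - \frac{R_{t-1}}{R_0}\sum_{i} x_{it}\right)\right)$ and writing $X_t = \sum_{i=1}^{4} x_{it}$ as shorthand, a short derivation gives the condition under which the resource does not shrink at any positive size $R_{t-1} \le R_0$:

```latex
\begin{aligned}
R_t \ge R_{t-1}
  &\iff \tfrac{5}{4}\, R_{t-1}\left(1 - \tfrac{X_t}{R_0}\right) \ge R_{t-1}
  && \text{(the cap } R_0 \ge R_{t-1} \text{ cannot push } R_t \text{ below } R_{t-1})\\
  &\iff 1 - \tfrac{X_t}{R_0} \ge \tfrac{4}{5}
  && (\text{divide by } R_{t-1} > 0)\\
  &\iff X_t \le \tfrac{R_0}{5} = 120\\
  &\iff \tfrac{X_t}{4} \le \tfrac{R_0}{20} = 30 .
\end{aligned}
```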
The maximum appropriation effort fishermen can invest without depleting the resource (further) is always 30, regardless of the resource size, as a result of the decreasing returns on fishing when the resource becomes smaller: with a renewal rate of $1.25 = \frac{5}{4}$, the maximum sustainable catch is $\frac{1}{5}$ of the resource, and thus per person the fishermen may only take $\frac{1/5}{4} = \frac{1}{20}$ of the resource. Since the catch is formulated as $\frac{R_{t-1}}{R_0} \times x_{it}$, the maximum sustainable appropriation effort is $x_{it} = \frac{R_0}{20} = 30$ in every period. Appropriating more than 30 is thus non-sustainable behaviour and will lead to a depletion of the resource. Therefore, it is considered defection. As long as $R_t > \frac{R_0}{4}$, appropriators have a short-term individual incentive to invest their entire endowment, because the returns on the investment are larger than the costs. However, the game is not one-shot but extends over many periods. Collective overappropriation is only more profitable than collective sustainable appropriation for the first couple of periods, even if the overappropriation is modest. For instance, with a group appropriation of 130 (an average individual appropriation of 32.5), the cumulative utility is higher than with a sustainable group appropriation of 120 for only 6 periods. After that, the collective utility is lower than it could have been had every player appropriated sustainably. At the individual level, too, it is not profitable in the long run to overappropriate if others do not underappropriate. If one player invests more than 30 while the other players consistently invest 30 on average, the final cumulative payoff for the defector will be lower than if they had invested 30 in all periods. See online appendix A for four figures illustrating what happens to the resource size and the cumulative utility of a group and individuals under different levels of appropriation. Only if the overappropriation of one or multiple players is consistently offset by underappropriation of other players, keeping the collective appropriation at 120, will overexploitation be more profitable than cooperative appropriation in the long run.
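The 130-versus-120 example can be reproduced with a short simulation. This is a sketch under stated assumptions: the group invests the same total effort in every period, nobody is punished (the sanctioning endowment of 15 is identical in both scenarios and is therefore left out), and the resource follows the renewal rule with catch $(R_{t-1}/R_0)\,x_{it}$. The helper names are hypothetical.

```python
R0 = 600.0     # maximal resource size
GROWTH = 1.25  # renewal rate per period


def group_benefits(group_effort: float, periods: int) -> list[float]:
    """Per-period group benefit when the group invests the same total effort each period."""
    r = R0
    out = []
    for _ in range(periods):
        # four endowments of 50, minus effort, plus 4 * R_{t-1}/R_0 per unit of effort
        out.append(4 * 50 - group_effort + 4 * (r / R0) * group_effort)
        catch = (r / R0) * group_effort
        r = min(R0, GROWTH * (r - catch))
    return out


def periods_ahead(over: float, sustainable: float = 120, horizon: int = 40) -> int:
    """Number of initial periods in which cumulative benefit of `over` beats `sustainable`."""
    cum_over = cum_sust = 0.0
    count = 0
    for b_over, b_sust in zip(group_benefits(over, horizon),
                              group_benefits(sustainable, horizon)):
        cum_over += b_over
        cum_sust += b_sust
        if cum_over <= cum_sust:
            break
        count += 1
    return count


print(periods_ahead(130))  # 6
```

Under these assumptions the cumulative group benefit with a total effort of 130 exceeds that with 120 for the first six periods and falls below it from period 7 onwards, while the sustainable group earns a constant 560 per period.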
Investing stops being profitable if the resource is equal to or smaller than 25% of its original size. When the average investment per person is less than 30, the resource will also grow again as long as it is not at full capacity. The resource would see the largest regrowth (125%) if all appropriators invested 0 in fishing.

The sanctioning stage
In the sanctioning stage, we consider three conditions, which are applied in the experiment.The first condition is 'strict sanctioning': punishments are always equally severe, regardless of the punishment history of the defector.The second condition is 'graduated sanctioning': severity of punishments increases stepwise with the number of times the defector was punished before.The third condition is the 'choice condition': appropriators can choose every five periods between graduated and strict sanctioning, on a majority-vote basis, mimicking endogenous institutions found in many successful commons.
In all conditions, anti-social punishment, that is, punishing appropriators who did not overexploit the resource, is not possible. All appropriators receive a separate endowment of 15 units per period for sanctioning each other. This sanctioning endowment is separate from and independent of the investment endowment of 50. For both sanctioning mechanisms, it remains up to the appropriators themselves whether they want to punish any defector(s) or not.
Costs of punishment consist of a 'punishment effort' part and a 'received punishment' part:
$$s_{it} = 15 - \sum_{j \ne i} P_{ijt} - 3 \sum_{j \ne i} P_{jit}.$$
Here $s_{it}$ denotes the utility gained from the sanctioning stage: 15 is the total sanctioning endowment received, $P_{ijt}$ is the punishment given by actor $i$ to actor $j$ at time point $t$ and 3 is the multiplication factor of the punishment units invested by others ($P_{jit}$) to punish actor $i$.
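The sanctioning-stage payoff can be computed from a table of punishment decisions. In this minimal sketch, punishments are stored in a dictionary keyed by hypothetical (giver, receiver) player indices, with values equal to the cost $P_{ijt}$ the giver spends; the check that the receiver actually overexploited (anti-social punishment is not possible) is assumed to happen elsewhere.

```python
def sanction_payoff(i: int, punishments: dict[tuple[int, int], int]) -> int:
    """s_it = 15 - sum_j P_ijt - 3 * sum_j P_jit for player i in one period.

    punishments[(giver, receiver)] is the cost P that `giver` spends on `receiver`.
    """
    given = sum(p for (giver, receiver), p in punishments.items()
                if giver == i and receiver != i)
    received = sum(p for (giver, receiver), p in punishments.items()
                   if receiver == i and giver != i)
    return 15 - given - 3 * received


# strict sanctioning: players 1, 2 and 3 each spend 5 on defector 0
strict_round = {(1, 0): 5, (2, 0): 5, (3, 0): 5}
print(sanction_payoff(0, strict_round))  # -30: 15 - 3 * 15
print(sanction_payoff(1, strict_round))  # 10: 15 - 5
print(sanction_payoff(0, {}))            # 15: nobody punishes
```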

Strict sanctioning
In strict sanctioning, the cost of punishing another player is 5, leading to a cost of 15 for the punished player. An overexploiting player (i.e. a player who invested an unsustainable amount of more than 30 in fishing this period) can thus be punished with a maximum of 45 points by the three other players, which is considerable given that the maximum extra gain from overexploitation when the resource is at its maximum size equals 20 × 3 = 60 (20 units of effort above 30, each yielding a net return of 3).

Graduated sanctioning
In graduated sanctioning, the cost of punishing another player is 1, 3, 5 or 7, depending on whether it is the first, second, third or more than third time that the defecting player is punished for overexploitation. The cost of punishment increases automatically. For instance, if a defecting player overexploited the resource (i.e. invested an unsustainable amount of more than 30 in fishing) for the first time and was punished by all other players, he would receive 3 × 1 × 3 = 9 punishment units. If he subsequently broke the rules again and was punished again, this time by only two other players, he would receive 2 × 3 × 3 = 18 punishment units, and so on. This means that at the third punishment occasion, graduated sanctioning has the same impact as strict sanctioning. From the fourth punishment occasion on, graduated punishment is more severe than strict punishment.
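The two cost schedules can be compared directly. This sketch assumes, as described above, that the number of earlier punishment occasions determines the graduated cost (1, 3, 5, then 7 for every later occasion) and that each unit spent removes 3 units from the punished player; the function names are hypothetical.

```python
GRADUATED_COSTS = (1, 3, 5, 7)  # cost per punisher at 1st, 2nd, 3rd, 4th+ occasion
STRICT_COST = 5                 # cost per punisher under strict sanctioning
MULTIPLIER = 3                  # each cost unit removes 3 units from the punished player


def cost_per_punisher(institution: str, times_punished_before: int) -> int:
    """Cost one punisher pays, given how often the defector was punished before."""
    if institution == "strict":
        return STRICT_COST
    if institution == "graduated":
        return GRADUATED_COSTS[min(times_punished_before, len(GRADUATED_COSTS) - 1)]
    raise ValueError(f"unknown institution: {institution!r}")


def received_punishment(institution: str, times_punished_before: int, n_punishers: int) -> int:
    """Punishment units received by the defector on this occasion."""
    return n_punishers * cost_per_punisher(institution, times_punished_before) * MULTIPLIER


print(received_punishment("graduated", 0, 3))  # 9: first occasion, three punishers
print(received_punishment("graduated", 1, 2))  # 18: second occasion, two punishers
print(received_punishment("strict", 0, 3))     # 45: strict maximum
```

With three punishers, the graduated sanction equals the strict maximum of 45 at the third occasion and exceeds it (63) from the fourth occasion on.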

Endogenous punishment institution
In the endogenous choice condition, the appropriators can vote every five periods for either strict or graduated sanctioning to be maintained in their game for the subsequent five periods, based on a majority vote.If both institutions have equal votes, it is randomly decided which one is used.When the actors choose graduated sanctioning, all previously punished defections are taken into account for the calculation of the punishment costs.This way, the punishment costs, and thus the severity of the punishment of the defector, do not have to be 'built up' from the bottom every time graduated sanctioning is chosen.
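The vote can be sketched as a simple majority rule with a random tie-break. This is an illustration under assumptions: votes are given as the strings 'strict' and 'graduated', the tie-break uses Python's `random.Random`, and `choose_institution` is a hypothetical name. A player's count of earlier punishment occasions is assumed to be stored outside this function, so that it carries over when graduated sanctioning is chosen.

```python
import random

INSTITUTIONS = ("strict", "graduated")


def choose_institution(votes: list[str], rng: random.Random) -> str:
    """Majority vote for the next five periods; a tie is broken at random."""
    if any(v not in INSTITUTIONS for v in votes):
        raise ValueError("each vote must be 'strict' or 'graduated'")
    n_strict = votes.count("strict")
    n_graduated = votes.count("graduated")
    if n_strict != n_graduated:
        return "strict" if n_strict > n_graduated else "graduated"
    return rng.choice(INSTITUTIONS)


rng = random.Random(2017)
print(choose_institution(["graduated", "graduated", "strict", "graduated"], rng))  # graduated
print(choose_institution(["strict", "graduated", "strict", "graduated"], rng))     # tie: drawn at random
```

Because the punishment history is kept outside the vote, a group that switches back to graduated sanctioning resumes the escalation where it left off rather than restarting at a cost of 1.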

Game dynamic
The game played by the participants is a dynamic game, in which last period's results influence the next period's starting point. Contrary to a series of one-shot decisions, as often seen in public goods games, the previous actions of all players influence the future. Although the resource regrows, the regrowth factor between periods is not large enough to undo overappropriation from one period to the next. This reflects how a real natural resource reacts to overappropriation: cutting down too many trees or taking too many fish out of the lake on one day means fewer trees and fish to take on subsequent days. A consequence of this dynamic nature of the game is that potential 'start-game effects', where players are still learning how to play the game, will influence the rest of the game and each player's eventual payoffs: making a big dent in the resource in the first few periods of overappropriation may be hard to recover from. In addition, as shown in online Appendix A, each time a player overappropriates at any point in the game, this reduces future profit for all players (including the defector), unless other players compensate for the exploitation by appropriating less than 30 so that the resource does not fall below its original size. As such, it would be in players' best interest to take the future into account and to invest sustainably. However, players are uncertain about others' propensity to be forward-looking, which may lead them to overappropriate, especially at the start of the game. The game dynamic forces players to come up with a long-term behavioural strategy, using the sanctioning mechanisms to ensure the survival of the resource and to optimise their payoffs.

Notes on equilibrium analysis of the game
The single-stage game always has the shape of a standard public goods game with punishment as long as the resource is larger than $\frac{R_0}{4}$. Therefore, as long as the resource is large enough, the unique subgame perfect equilibrium of the single-stage game is full appropriation (investing the entire endowment) without punishing others, as punishing is costly. Given that there is no punishment, there are individual incentives to appropriate as much as possible, independent of what the other players do. As soon as the resource is smaller than or equal to $\frac{R_0}{4}$, there are no longer individual incentives in the stage game to appropriate the resource. There is then a Nash equilibrium in the stage game in which no appropriation takes place and, clearly, still no punishment is chosen.
Game-theoretic analyses of the dynamic game are beyond the scope of this paper. Clearly, there will be many cooperative equilibria in an indefinitely repeated game if the continuation probability is large enough. For the finitely repeated game, a strict backward-induction analysis would predict overappropriation in all periods in which this is profitable in the short run, but such an analysis would be of limited use for predicting behaviour, as related experiments on repeated public goods games with punishment show (Fehr and Gächter, 2000). As Fiala and Suetens (2017), for example, show, cooperation is common in such games and depends on what participants observe about the behaviour and earnings of others. Therefore, an evolutionary game-theoretic analysis of the game, analysing for example the dynamics of populations of cooperators, defectors and conditional cooperators (Zhang et al., 2021), could provide additional insights into the dynamics of this game, beyond the more intuitive hypotheses we aim to test in this article.

Data
A computerised laboratory experiment was designed and programmed in z-Tree (Fischbacher, 2007). The experiment was conducted at the Experimental Laboratory for Sociology and Economics [ELSE] at Utrecht University from February to March 2017. Subjects were recruited through the Online Recruitment System for Economic Experiments [ORSEE] (Greiner, 2015). After a pre-test with Master's and PhD students from the Department of Sociology of Utrecht University, the experiment was held in eight sessions of 20, 24 or 28 subjects, giving a total of 188 subjects: 60 in the strict sanctioning condition, 60 in the graduated sanctioning condition and 68 in the choice condition. Of the subjects, 87% were students from a range of disciplines and years of study, 67% were female and 61% were Dutch; the average age was 23. This study received ethical approval (Ref. FETC17-028, Buskens) from the ethical committee of the Faculty of the Social and Behavioural Sciences at Utrecht University. Written consent was obtained from all subjects before the start of each experimental session. The data were anonymised before the analyses.

Experimental sessions
Subjects were first placed randomly in groups of four to play 10 periods of the CPR game without any sanctioning mechanism.Afterwards, they were randomly placed in new groups with a randomly assigned sanctioning condition and the CPR game started anew for another 40 periods.In the sanctioning condition, subjects were assigned to either the strict, graduated or choice condition.The composition of groups remained the same throughout the game.The subjects did not know who the other three players in their group were, nor did they know that there were three different conditions in the second part of the experiment.Completing the experiment took about 80 minutes.
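As a quick consistency check on the design figures, the sketch below derives the number of groups per condition from the subject counts in the Data section. It assumes, as described above, that groups of four are formed within each condition and play 40 periods with sanctioning; the variable names are illustrative only.

```python
# Subject counts per condition, as reported in the Data section.
SUBJECTS = {"strict": 60, "graduated": 60, "choice": 68}
GROUP_SIZE = 4          # groups of four in the sanctioning part
SANCTION_PERIODS = 40   # periods in the second part of the experiment

# Every condition must split exactly into groups of four.
assert all(n % GROUP_SIZE == 0 for n in SUBJECTS.values())

groups = {cond: n // GROUP_SIZE for cond, n in SUBJECTS.items()}
total_subjects = sum(SUBJECTS.values())
total_groups = sum(groups.values())
group_periods = total_groups * SANCTION_PERIODS  # group-period observations

print(groups)                                       # {'strict': 15, 'graduated': 15, 'choice': 17}
print(total_subjects, total_groups, group_periods)  # 188 47 1880
```

The 47 groups and 1880 group-period observations obtained this way match the figures used later in the Assumptions subsection.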
The first 10 periods without sanctioning served two main functions. First, they showed how cooperation does or does not emerge in common-pool resource situations without a sanctioning mechanism. Second, they served as a learning stage, so that all subjects had approximately the same level of knowledge of and experience with the game before starting the second part of the experiment, in which sanctioning mechanisms were imposed. There was thus no need to vary the order of treatments with and without sanctioning. General written instructions in English were handed out to the subjects at the start of the experiment. In the second part of the experiment, subjects received specific instructions corresponding to their condition (see online appendix B for the instructions and an explanation of the stages of the experiment). Subjects played for real money (EUR) at an exchange rate of 500 units = 1 EUR. For comparison, subjects could earn 140 units per period in the first part and 155 units per period in the second part if everyone in their group cooperated. Average earnings were 14.05 EUR (SD = 1.06), with a maximum of 15.5 EUR and a minimum of 11 EUR.
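To put the reported earnings in perspective, the following minimal sketch converts the full-cooperation payoffs into euros. It assumes that every period pays exactly the stated full-cooperation amount and that there are no show-up fees or other payments, which this section does not mention.

```python
EXCHANGE_RATE = 500  # experimental units per EUR

# (number of periods, units per period if everyone in the group cooperates)
PARTS = {"no sanctioning": (10, 140), "sanctioning": (40, 155)}

def full_cooperation_units() -> int:
    """Total units earned over both parts under full cooperation."""
    return sum(periods * per_period for periods, per_period in PARTS.values())

def units_to_eur(units: float) -> float:
    """Convert experimental units to euros."""
    return units / EXCHANGE_RATE

units = full_cooperation_units()
print(units, units_to_eur(units))  # 7600 15.2
```

Under these assumptions, full cooperation throughout the experiment yields 15.2 EUR; the reported maximum of 15.5 EUR lies slightly above this benchmark.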
After completing the main experiment, subjects were asked to fill out a survey containing questions about their characteristics, such as age, gender, nationality, student status, experience with game theory and how many other people in the experimental session they knew by name. In addition, they were asked how they behaved in the experiment and how fair they perceived the experiment to be. Lastly, some questions were asked on risk aversion and altruism. For a full overview of the post-experimental survey questions, see online appendix C. The survey items on participant characteristics were used as control variables in robustness checks of the presented models. An in-depth analysis of the other survey questions is outside the scope of the current paper.

Dependent variables
Resource size.The first dependent variable is one at the macro level, namely resource size per period, which represents group cooperation in the experiment.The closer the resource size is to its original size of 600, the higher the level of cooperation, since defection of players leads to a decrease in the resource size.
Δ Resource size. The second dependent variable is also at the macro level: the change in resource size in period t relative to period t − 1. Positive values indicate growth of the resource, negative values a decrease. While absolute resource size may be heavily affected by behaviour in the first few periods of the game, a change in resource size already shows up after a single period of more cooperative behaviour, and therefore reflects cooperative outcomes at the macro level over time more precisely.
Total appropriation effort. The third dependent variable is total appropriation effort, which represents group-level cooperation in the experiment. A lower collective appropriation effort represents a higher level of cooperation, since higher appropriation levels lead to a decrease in resource size. Note that in the game a collective appropriation investment of up to 120 units does not decrease the resource size in the next period, so appropriation efforts equal to or below 120 can be considered cooperative. However, once the resource has decreased in size, it can only be restored with a collective appropriation effort below 120 units. The variable takes this into account by treating lower appropriation effort as more cooperative.
Individual appropriation effort. This dependent variable is at the micro level: the individual appropriation effort per period, which represents individual cooperation in the experiment. The lower the appropriation effort, the higher the level of cooperation. Analogous to total appropriation effort, an individual appropriation investment of up to 30 units does not decrease the resource size in the next period, so appropriation efforts equal to or below 30 can be considered cooperative. However, a shrunken or near-depleted resource can only regrow if the average appropriation effort is below 30 units. The variable takes this into account by treating lower appropriation effort as more cooperative.
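A minimal sketch of how the Δ resource size variable and the cooperation thresholds described above could be computed from a series of resource sizes and appropriation efforts. The function names and example values are hypothetical, and the sketch does not model how the resource regrows (see online Appendix A for the actual game dynamics).

```python
from typing import List

GROUP_THRESHOLD = 120   # collective effort that leaves the resource size unchanged
PLAYER_THRESHOLD = 30   # per-player share of that threshold (120 / 4)

def resource_change(sizes: List[float]) -> List[float]:
    """Δ resource size R_t - R_{t-1} for every period after the first in `sizes`."""
    return [cur - prev for prev, cur in zip(sizes, sizes[1:])]

def is_cooperative(effort: float, threshold: float) -> bool:
    """Effort at or below the threshold does not shrink the resource."""
    return effort <= threshold

# Hypothetical example values, not experimental data.
sizes = [600, 580, 585]
print(resource_change(sizes))                # [-20, 5]
print(is_cooperative(130, GROUP_THRESHOLD))  # False
print(is_cooperative(30, PLAYER_THRESHOLD))  # True
```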
Group profit. In addition to the objective cooperation levels measured by the previous four dependent variables, group profit shows the gains of the group in terms of utility (profit). The level of profit is not an indicator of cooperation. However, it does give insight into the incentives faced by the actors within the group and the consequences of their cooperative or non-cooperative behaviour. This variable is operationalised as the average profit of a group of four players per period.
Individual profit. As with group profit, the level of individual profit is not an indicator of individual cooperation, but it shows the consequences of behaviour at the individual level. Regardless of the objective cooperation estimates, it is interesting to see which treatment yielded the highest individual payoff. This variable is operationalised as the profit of an individual player per period.
Punishing behaviour. Operationalised as whether a player punishes another player (1) or not (0) in a period in which there was an opportunity to punish, i.e. a period in which one or more other players invest more than 30 units in appropriation of the resource. This variable provides insight into whether specific sanctioning mechanisms induce more (or less) punishing behaviour in situations where someone can be punished.
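The operationalisation above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes that any punishment by the player counts as punishing, regardless of the target, and that player-periods without an opportunity to punish are left out (returned as `None`). All names and values are hypothetical.

```python
from typing import Dict, List, Optional

OVERAPPROPRIATION = 30  # efforts above this make a player punishable

def punishing_outcome(efforts: Dict[str, float],
                      targets: Dict[str, List[str]],
                      player: str) -> Optional[int]:
    """Punishing indicator for `player` in one period.

    Returns None when no other group member invested more than 30 units
    (no opportunity to punish, so the observation is not used), otherwise
    1 if `player` punished at least one other player and 0 if not.
    """
    opportunity = any(effort > OVERAPPROPRIATION
                      for other, effort in efforts.items() if other != player)
    if not opportunity:
        return None
    return 1 if targets.get(player) else 0

# Hypothetical period: B overappropriates and A punishes B.
efforts = {"A": 30, "B": 45, "C": 25, "D": 30}
targets = {"A": ["B"]}
print([punishing_outcome(efforts, targets, p) for p in "ABCD"])  # [1, None, 0, 0]
```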

Independent variables
Experimental conditions. For the multilevel analyses, we construct two dummy variables: graduated sanctioning [Grad] and choice condition [Choice]. Assigned strict sanctioning is the reference category, i.e. the treatment when both dummy variables are zero. The main effect of the choice dummy should be interpreted as the effect of chosen strict sanctioning, and the main effect of the graduated sanctioning dummy as the effect of assigned graduated sanctioning. Lastly, the main effect of graduated sanctioning plus the interaction of the graduated sanctioning and choice dummies gives the difference between chosen graduated and chosen strict sanctioning; adding the main effect of the choice dummy gives the effect of chosen graduated sanctioning relative to assigned strict sanctioning.
As described before, the groups in the experimental sessions are divided into three sanctioning conditions: strict sanctioning, graduated sanctioning, and the choice condition in which players may vote for strict or graduated sanctioning every five periods.Within the choice condition, we need to distinguish whether players played under 'chosen strict' or 'chosen graduated' sanctioning, as it is important for the analyses of player behaviour and outcomes to know which sanctioning type was used.As the unit of analysis in our models is a cooperative outcome in a single period, it is clear for every period whether the outcome belongs to chosen strict or chosen graduated sanctioning.As will be explained further in the analytical strategy, we use multilevel models to account for the interdependence of observations within the same groups and individuals over time.Because the sanctioning type is chosen every fifth period in the choice condition, chosen strict and chosen graduated sanctioning can be clustered within the same groups and players.
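To make the dummy coding concrete, the sketch below computes the fixed-effect shift of each treatment cell relative to assigned strict sanctioning, using the coefficient names of the macro-level model (ψ for Grad, λ for Choice, ω for their interaction, θ and ξ for their interactions with centered period). The coefficient values are hypothetical and chosen only to make the arithmetic visible; control variables and random intercepts are left out.

```python
from typing import Dict

def treatment_shift(coef: Dict[str, float], grad: int, choice: int,
                    period_c: float = 0.0) -> float:
    """Fixed-effect difference of a treatment cell from assigned strict
    sanctioning (grad = 0, choice = 0) at the same centered period."""
    return (coef["psi"] * grad
            + coef["lambda"] * choice
            + coef["omega"] * grad * choice
            + coef["theta"] * grad * period_c
            + coef["xi"] * choice * period_c)

# Hypothetical coefficients, not estimates from the paper.
coef = {"psi": -10.0, "lambda": 2.0, "omega": 5.0, "theta": 0.5, "xi": 0.1}
cells = {
    "assigned strict": (0, 0),
    "assigned graduated": (1, 0),
    "chosen strict": (0, 1),
    "chosen graduated": (1, 1),
}
for name, (g, c) in cells.items():
    print(name, treatment_shift(coef, g, c))
# assigned strict 0.0 / assigned graduated -10.0 / chosen strict 2.0 / chosen graduated -3.0
```

At the centering period, chosen graduated sanctioning differs from assigned strict sanctioning by ψ + λ + ω and from chosen strict sanctioning by ψ + ω.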

Control variables
Group mean punishment size. We add this control variable to separate the treatment effect of graduated sanctioning from sanction severity in the two-level models. As graduated sanctioning becomes more severe over time, the time interactions of graduated sanctioning could be driven merely by sanction severity rather than by higher compliance for other reasons. We construct this variable for each period as the average, across the players in a group, of the punishment each player would receive if punished; in other words, the average punishment size that "hangs over the players' heads".
Individual punishment size. We add this control variable to separate the treatment effect of graduated sanctioning from sanction severity in the three-level models. As graduated sanctioning becomes more severe over time, the time interactions of graduated sanctioning could be driven merely by sanction severity rather than by higher compliance for other reasons. We construct this variable as the punishment the player would receive if they were punished in that period; in other words, the punishment size that "hangs over the player's head".
Resource size in t − 1. Following Van Klingeren (2020), we control for resource size in the previous period in the Δ resource size model and in the macro- and micro-level models on appropriation effort, to single out treatment effects from the effect of previous events on behaviour. The possible range of change in resource size depends on the resource size in the previous period. In addition, appropriation effort can be heavily influenced by the outcomes of the previous period.
Sum of appropriation of others in t − 1. Again following Van Klingeren (2020), we add this control variable to the micro-level model on individual appropriation effort, to control for the influence of other players' behaviour in the previous period.
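A minimal sketch of how the two lagged controls could be derived for one player from a group's history. It assumes that `sizes[t]` is the resource size recorded in period `t` and that the first period has no lagged values (the paper may instead use the starting size of 600 there); the function and data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def lagged_controls(efforts: List[Dict[str, float]], sizes: List[float],
                    player: str, t: int) -> Tuple[Optional[float], Optional[float]]:
    """Resource size in t-1 and the summed appropriation of the other
    group members in t-1, for period index t (0-based).

    The first period has no previous period in these lists, so both
    controls are returned as None there.
    """
    if t == 0:
        return None, None
    prev = efforts[t - 1]
    others = sum(e for p, e in prev.items() if p != player)
    return sizes[t - 1], others

# Hypothetical two-period history for one group.
efforts = [{"A": 30, "B": 40, "C": 30, "D": 30},
           {"A": 30, "B": 30, "C": 30, "D": 30}]
sizes = [590, 585]
print(lagged_controls(efforts, sizes, "B", 1))  # (590, 90)
```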
Period. To control for general linear trends in appropriation efforts and natural resource size throughout the game, the models contain a variable indicating the period. In the models, we use a version of this variable centered at Period = 20 to make interaction effects easier to interpret.
For robustness, we include several other control variables in models that can be found in online appendix D. These control variables include individual characteristics such as age, sex, student status, experience with game theory and the number of acquaintances in the experimental session. For macro-level models, group-level averages of the individual characteristics are included. In addition, the squared term of period is added in these models to allow the interactions between treatments and time to vary throughout the game. Finally, a model without any endogenous control variables, including only the exogenous treatment and period variables, is presented in online appendix E. Although this model is less suited to test the treatment effects on player behaviour, our main interest in this study, it does give a good overview of the significance of the differences between the treatment outcomes.

Analytical strategy
To test our hypotheses we use multilevel models. Multilevel modelling is commonly used for clustered or hierarchical data structures, in which lower-level units belong to a single higher-level unit. The multilevel model accounts for the greater similarity of repeated measurements nested in individuals, and of individuals nested in groups, when estimating parameters (Hox, 1998). Our macro-level models contain time points nested in groups and are fit as two-level models with a random intercept for groups. Our micro-level models contain time points nested in individuals nested in groups, and are fit as three-level models with random intercepts for individuals and groups. The models on resource size, change in resource size, total group appropriation, individual appropriation, individual profit and group profit are fit as linear (normal) multilevel models. The model on punishing behaviour, with the binary dependent variable punishing (1) or not (0), is fit as a binary multilevel logit model.
As stated before, we follow amongst others Coleman (1987) by investigating the macro and micro levels of our dependent variable cooperation separately in order to understand the full mechanism of sanctioning on individual behaviour and its aggregate outcomes, as these effects may be related but are not necessarily the same in terms of effect size and statistical significance.For example, although we have fewer observations at the group level than at the individual level, differences between groups might be more pronounced than differences between individuals because individuals' behaviour relates to the behaviour of their group members.

Hypothesis testing
To test the effects of assigned and chosen graduated and strict sanctioning, all models include the interaction of graduated sanctioning and the choice condition. The main effect of graduated sanctioning should be interpreted as the effect of assigned graduated sanctioning in the middle of the game, and the main effect of the choice condition as the effect of chosen strict sanctioning in the middle of the game, both relative to assigned strict sanctioning. The main effect of graduated sanctioning combined with its interaction with the choice condition gives the difference between chosen graduated and chosen strict sanctioning; adding the main effect of the choice condition as well gives the coefficient for chosen graduated sanctioning relative to assigned strict sanctioning.
As we are investigating long-term cooperation, we are interested in the effects of graduated and strict sanctioning over time and test the hypotheses accordingly, by interacting the main treatment effects with the time period. The main treatment effects indicate the difference in cooperation in the middle of the game, which approximates the average difference over the entire game. The interaction effects indicate how that difference changes per period. Combining the main treatment effects with their time interactions thus provides the treatment effects over time. The time period variable is centered at Period = 20, so assigned strict sanctioning in the middle of the game is the reference category when interpreting the main and interaction effects of the other treatments.
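As an illustration of how a main effect and its time interaction combine, the sketch below evaluates the treatment difference at a few periods, assuming that the periods of the sanctioning part are numbered 1 to 40 so that Period = 20 is the centering point. It uses only the fixed treatment terms (control variables and random intercepts held constant) and plugs in the resource size coefficients for assigned graduated sanctioning reported with Table 4 (main effect −73.604, time interaction 2.070).

```python
CENTER = 20  # period value at which the period variable is centered

def effect_at(main: float, time_interaction: float, period: int) -> float:
    """Treatment difference from the reference at a given period:
    main effect + time interaction * (period - CENTER)."""
    return main + time_interaction * (period - CENTER)

# Resource size, assigned graduated vs assigned strict (Table 4 coefficients).
for period in (1, 20, 40):
    print(period, round(effect_at(-73.604, 2.070, period), 3))
# 1 -112.934 / 20 -73.604 / 40 -32.204
```

At Period = 20 the result is simply the main effect, which is why the main effects are read as effects in the middle of the game.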
After interpretation of the multilevel models, marginal effects tables and plots are shown to interpret the differences found between the sanctioning and choice treatments over time.The marginal effects are calculated so we can compare all players in graduated sanctioning with all players in strict sanctioning, and all players in the choice condition with all players in the assigned condition (regardless of the sanctioning type they chose).This allows us to have a good overview of the performance of the two sanctioning types, and the effect of having a choice, regardless of the chosen sanctioning type.

Macro-level models
Hypotheses 1 and 2 at the group level are tested with a two-level multilevel model with random intercepts for groups. The random intercepts for groups define the nesting structure of observations over time within groups: we account for the similarity of observations within groups over time. We use the group-level variables resource size, change in resource size and total group appropriation as the dependent variables. The model is represented by the following function:

$$Y_{tj} = \beta_0 + \alpha_j + \sum_{k} \gamma_k x_{ktj} + \psi G_j + \lambda C_j + \omega G_j C_j + \phi\, t + \theta\, G_j t + \xi\, C_j t + e_{tj}$$

with $\alpha_j \sim N(0, \sigma^2_\alpha)$ and $e_{tj} \sim N(0, \sigma^2)$. $\beta_0$ indicates the overall intercept and $\alpha_j$ indicates the intercept for groups. There are t periods for j groups and k control variables x with coefficients $\gamma$. $G_j$ represents the graduated sanctioning dummy with coefficient $\psi$, and $C_j$ represents the choice dummy with coefficient $\lambda$. Period t is included as a main effect with coefficient $\phi$ and in interactions with graduated sanctioning and choice with respective coefficients $\theta$ and $\xi$. The interaction of graduated sanctioning and choice has coefficient $\omega$.

Micro-Level models
Hypotheses 1 and 2 at the individual level are tested with a three-level multilevel model with random intercepts for groups and individual players, in which individual appropriation effort per period is the dependent variable. The random intercepts for groups and players define the nesting structure of observations over time within players within groups: we account for the similarity of observations within players within groups over time. The model is represented by the following function:

$$Y_{tij} = \beta_0 + \alpha_j + \Psi_i + \sum_{k} \gamma_k x_{ktij} + \psi G_j + \lambda C_j + \omega G_j C_j + \phi\, t + \theta\, G_j t + \xi\, C_j t + e_{tij}$$

with $\alpha_j \sim N(0, \sigma^2_\alpha)$, $\Psi_i \sim N(0, \sigma^2_\Psi)$ and $e_{tij} \sim N(0, \sigma^2)$, and where $\Psi_i$ indicates the intercept for players.

Exploratory models
In addition to the models on resource size and appropriation, the effect of the sanctioning treatments on group and individual profit and punishing behaviour is investigated.These models do not test hypotheses, but provide the necessary background information on the functioning of each sanctioning mechanism -whereas a sanctioning mechanism may have a certain effect on cooperation, it may have a different effect on profit.In addition, if a certain sanctioning mechanism has influence on the decision to punish or not, that may have influenced the cooperative outcomes per treatment.Hence, these variables are investigated to provide a view on the sanctioning types from various angles.
The three-and two-level models for individual profit and group profit can be described with the same formulas as the micro-and macro-level models respectively.
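To analyse punishing behaviour we use a binary multilevel logit model with random intercepts for groups and players, in which the dependent variable is whether an individual punishes (1) or not (0). The model is represented by the following function:

$$\operatorname{logit} \Pr(Y_{tij} = 1) = \beta_0 + \alpha_j + \Psi_i + \gamma P_{tij} + \psi G_j + \lambda C_j + \omega G_j C_j + \phi\, t + \theta\, G_j t + \xi\, C_j t$$

with $\alpha_j \sim N(0, \sigma^2_\alpha)$ and $\Psi_i \sim N(0, \sigma^2_\Psi)$, and where $P_{tij}$ is the punishment cost of player i in group j at time t with coefficient $\gamma$.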

ICC
For all models, the intraclass correlation [ICC] is shown, indicating the proportion of the remaining variance in the dependent variable at the group and player level after accounting for the fixed effects. Indirectly, it also indicates the need for a multilevel model rather than an ordinary single-level regression. However, even with a low ICC we choose to use multilevel models to treat our hierarchical data appropriately. The binary multilevel logit models have a slightly different but essentially similar specification, which we will not show in detail here.
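For the regular multilevel models, the ICCs are calculated as:

$$\mathrm{ICC}_{groups} = \frac{\sigma^2_\alpha}{\sigma^2_\alpha + \sigma^2_\Psi + \sigma^2}, \qquad \mathrm{ICC}_{players} = \frac{\sigma^2_\alpha + \sigma^2_\Psi}{\sigma^2_\alpha + \sigma^2_\Psi + \sigma^2}$$

where $\sigma^2_\Psi$ drops out for the two-level macro-level models. As players are part of a fixed group throughout the game, we calculate the player-level ICC to take into account the variance due to specific player characteristics plus the variance due to characteristics of the group the player is part of, rather than just the specific player variance.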
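As there is no direct estimate of the first-level residual variance $\sigma^2$ in the binary multilevel logit models, we calculate the ICC for these models as suggested by Goldstein et al. (2002), replacing $\sigma^2$ with the variance of the standard logistic distribution, $\pi^2/3$:

$$\mathrm{ICC}_{groups} = \frac{\sigma^2_\alpha}{\sigma^2_\alpha + \sigma^2_\Psi + \pi^2/3}, \qquad \mathrm{ICC}_{players} = \frac{\sigma^2_\alpha + \sigma^2_\Psi}{\sigma^2_\alpha + \sigma^2_\Psi + \pi^2/3}$$

For all models, we present the ICC of the null model without independent variables as well as the ICC controlling for the fixed effects of the particular model being discussed.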
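The ICC formulas for the regular and the binary logit models can be sketched as follows. This assumes the latent-variable approach for the logit models, in which the level-1 variance is fixed at π²/3; for a two-level model, the player variance is simply set to zero. The variance components in the example are hypothetical, not estimates from the paper.

```python
import math
from typing import Tuple

def icc_linear(var_group: float, var_player: float, var_resid: float) -> Tuple[float, float]:
    """Group- and player-level ICCs; the player-level ICC includes the group
    variance. For a two-level model pass var_player = 0."""
    total = var_group + var_player + var_resid
    return var_group / total, (var_group + var_player) / total

def icc_logit(var_group: float, var_player: float) -> Tuple[float, float]:
    """Latent-variable ICCs for a multilevel logit model: the level-1
    variance is fixed at pi^2 / 3, the variance of the standard logistic."""
    return icc_linear(var_group, var_player, math.pi ** 2 / 3)

# Hypothetical variance components, not estimates from the paper.
print(icc_linear(2.0, 1.0, 7.0))  # (0.2, 0.3)
print(icc_logit(1.0, 0.5))
```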

Assumptions
The assumptions underlying multilevel models are similar to those underlying ordinary multiple regression models: we assume linear relationships, homoskedasticity and normally distributed residuals (Maas and Hox, 2004; Shaw and Flake). Research suggests that moderate violations of these assumptions do not lead to highly inaccurate estimates or standard errors (Maas and Hox, 2004). Although a large sample size is often assumed for multilevel models, Browne and Draper (2000) show that useful variance estimates can be produced with as few as 48 groups. In addition, Maas and Hox (2005) show that fixed regression estimates are generally unbiased even with a small sample size. As our design includes 47 groups and 1880 group-period observations, we believe our sample is large enough to produce useful estimates, although a larger sample might have allowed us to detect smaller effects.
We tested the model assumptions for all shown models.For testing the assumptions of the binary multilevel logit models we used the Diagnostics for HierArchical Regression Model [DHARMa] package in R (Hartig, 2022), which provides a simulation-based approach to create readily interpretable scaled (quantile) residuals for fitted generalized linear mixed models.We find no assumption violations, save for modest violations of normality of the residuals on the second and third-level residuals for the individual appropriation effort model and the second-level residuals for the Δ resource size model.However, as Maas and Hox (2004) indicate, non-normal distributions of residuals at a higher level have little or no effect on the parameter estimates.We thus have confidence that the presented results are not affected by those deviations.

Descriptive results
Before the hypotheses are tested, some descriptive results are examined to gain insight into the relation between the sanctioning mechanisms and some relevant variables. Descriptive statistics on key variables are presented in Tables 1, 2 and 3. Although punishment effort is not used as a variable in the analyses, we provide some descriptive statistics on this variable in Tables 2 and 3 to assess the course of the experiment and the behaviour of the players in the game.
The mean individual appropriation effort is lowest under chosen graduated sanctioning, and highest in the no sanctioning treatment.Although the mean appropriation under chosen graduated sanctioning is lowest, both chosen strict and graduated sanctioning have lower appropriation efforts than the assigned graduated sanctioning, assigned strict sanctioning and no sanctioning situations.This may indicate the positive effect of having a choice for the effectiveness of both sanctioning types, as expected in hypothesis 2. However, the differences in appropriation effort are small.Between all conditions, the average resource size is highest under assigned strict sanctioning and lowest under assigned graduated sanctioning.
The development of the average appropriation effort and average resource size for the different experimental conditions is shown in Figures 1 and 2 respectively. Note that an average resource size cannot be plotted for 'chosen strict sanctioning' or 'chosen graduated sanctioning', as players in the choice condition could choose every 5 periods which sanctioning type they preferred. Hence, a plotted average would potentially depict the resource sizes of a different subset of groups every 5 periods, making it hard to interpret.
As is clearly visible, and as we already know from previous experimental research, the presence of a sanctioning mechanism to guide behaviour is effective in enforcing cooperation: the average individual appropriation effort in the no-sanctioning condition exceeds 30 units in almost every period and the natural resource size drops dramatically from the first period on. Even when disregarding start- and end-game effects, individual appropriation effort is clearly higher when no sanctioning mechanism is in place. So although we cannot test this through a strict experimental comparison, since the periods without sanctioning always preceded those with sanctioning, it is evident that both sanctioning mechanisms help to improve cooperation. However, the main interest of this research lies in the differences in effects between strict and graduated sanctioning.
Figure 1 shows that with sanctioning, the average appropriation effort is constant over time when disregarding start- and end-game effects. The initial average appropriation effort is higher under assigned graduated sanctioning than under assigned strict sanctioning, with the choice condition in between, but after some periods all three conditions look rather similar. This is also reflected in Figure 2, in which it can be seen more clearly that the resource deteriorates more under assigned graduated sanctioning than under assigned strict sanctioning. Again, the choice condition is in the middle. After about 10 periods, the resources settle at clearly different but constant sizes, and these differences are related to the different appropriation efforts at the beginning. For the two conditions with higher initial appropriation efforts, the resource is unable to restore itself to the same level as under assigned strict sanctioning. Although this difference can be regarded as part of a start-game effect, it may also suggest that assigned strict sanctioning has a stronger preventive effect than the other conditions.
The absolute number of votes for strict and graduated sanctioning in the periods where subjects from the choice condition could vote for a sanctioning mechanism are shown in Figure 3.In 45.59% of all choice occasions, strict sanctioning was chosen as the sanctioning mechanism for the subsequent five periods.In 54.41% of all choice occasions, graduated sanctioning was chosen.
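The reported percentages can be checked against the design with a small back-calculation. This sketch assumes that each of the 17 choice groups (68 subjects in groups of four) reached one group decision at each of 8 voting occasions (one every five periods over 40 periods); the resulting counts are inferred, not read from Figure 3.

```python
import math

CHOICE_GROUPS = 68 // 4      # 17 groups in the choice condition
VOTES_PER_GROUP = 40 // 5    # one vote every five periods over 40 periods
occasions = CHOICE_GROUPS * VOTES_PER_GROUP   # 136 group-level choice occasions

strict = round(0.4559 * occasions)   # back-calculated count for strict sanctioning
graduated = occasions - strict

print(occasions, strict, graduated)                       # 136 62 74
print(round(100 * strict / occasions, 2),
      round(100 * graduated / occasions, 2))              # 45.59 54.41

# Smallest number of the 17 groups that exceeds 80% in a single vote.
print(math.floor(0.8 * CHOICE_GROUPS) + 1)                # 14
```

Under these assumptions, the percentages correspond to 62 strict and 74 graduated choices, and "more than 80% of the groups" in the first voting period means at least 14 of the 17 groups.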
In the first four voting periods, graduated sanctioning is chosen more often than strict sanctioning.Especially remarkable is the first voting period, in which more than 80% of the groups chose graduated sanctioning through a majority vote.However, in the second half of the session, with an exception of the sixth voting period, strict sanctioning is chosen more often.This could reflect the subjects' self-regarding preference for low punishment costs and small punishments after they defected a few times in the first half of the game.

Hypotheses testing
The multilevel regression models on natural resource size, change in (Δ) resource size, total group appropriation per period and individual appropriation per period are presented in Table 4. The ICC of the null model for resource size is ICC_groups = 0.651, indicating a large between-group variance relative to the total variance. The resource size model shows a negative main effect of graduated sanctioning (B = −73.604, p < .001), to be interpreted as the effect of assigned graduated sanctioning, indicating lower levels of cooperation under this sanctioning type relative to assigned strict sanctioning.
The interaction of graduated sanctioning with period is positive and significant (B = 2.070, p < .001), indicating that over time the difference between assigned graduated and assigned strict sanctioning becomes smaller. In addition, the interaction between graduated sanctioning and choice is positive and significant (B = 45.491, p = .007). Adding this coefficient to the main effect of graduated sanctioning (−73.604 + 45.491 = −28.113), the effect of chosen graduated sanctioning is still negative compared to strict sanctioning. The main effect of the choice condition, to be interpreted as the effect of chosen strict sanctioning, is not significant, indicating that chosen and assigned strict sanctioning do not significantly differ from each other in terms of resource size. However, the interaction of the choice condition with period is significant and positive (B = 0.957, p < .001), indicating that over time the effect of chosen strict sanctioning becomes 'more positive' and the difference between assigned and chosen strict sanctioning becomes smaller. The ICC of the random intercept for groups is still substantial, indicating a large between-group variance relative to the total variance also after accounting for the fixed effects.
The null model on change in resource size produces ICC_groups = 0.000, indicating no between-group variance relative to the total variance. In the model on change in resource size, assigned graduated sanctioning has a positive significant main effect (B = 3.199, p < .001), indicating that, on average, the resource grows more under assigned graduated sanctioning than under assigned strict sanctioning. In addition, the interaction between graduated sanctioning and period is positive (B = 0.228, p < .001), indicating that this effect increases every period. The interaction of graduated sanctioning and choice is significant and negative (B = −2.880, p = .049). Taken together with the main effects, this indicates that the change in resource size is smaller for chosen graduated sanctioning than for assigned graduated sanctioning. A marginally significant effect of the choice condition, i.e. chosen strict sanctioning (B = 2.084, p = .054), indicates that the average change in resource size is slightly larger for chosen strict than for assigned strict sanctioning. Controlling for the fixed effects, the ICC of the random intercept for groups is still zero, meaning that for this model no variance in change in resource size is directly attributable to group characteristics. This makes sense, as the change in resource size varies a lot per period and does not depend on the change in resource size in other periods.
The null model on total group appropriation produces ICC_groups = 0.000, indicating no between-group variance relative to the total variance. The model on total group appropriation shows a significant negative main effect of assigned graduated sanctioning (B = −2.841, p = .015) and of its interaction with time (B = −0.197, p = .007), indicating that on average the total group appropriation effort is lower under assigned graduated sanctioning than under assigned strict sanctioning, and that this difference increases over time. Controlling for the fixed effects, the ICC of the random intercept for groups remains zero, meaning that for this model there is also no variance attributable to specific groups. As we expected to observe some group-level dynamics over periods, this is surprising.
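Because the main effects and time interactions above have opposite signs for resource size but the same sign structure for Δ resource size and total group appropriation, it helps to compute where each fixed-effect difference changes sign. This sketch uses only the Table 4 main effect and time interaction of assigned graduated versus assigned strict sanctioning, holds all other terms fixed, and assumes the sanctioning periods run from 1 to 40 with the period variable centered at 20.

```python
from typing import Optional

CENTER = 20          # centering value of the period variable
LAST_PERIOD = 40     # assumed last period of the sanctioning part

def sign_change_period(main: float, time_interaction: float) -> Optional[float]:
    """Period at which main + time_interaction * (period - CENTER) equals zero,
    or None if the effect does not change sign within periods 1..LAST_PERIOD."""
    if time_interaction == 0:
        return None
    period = CENTER - main / time_interaction
    return period if 1 <= period <= LAST_PERIOD else None

# Assigned graduated vs assigned strict, Table 4 coefficients.
print(sign_change_period(3.199, 0.228))      # Δ resource size: about 5.97
print(sign_change_period(-2.841, -0.197))    # total group appropriation: about 5.58
print(sign_change_period(-73.604, 2.070))    # resource size: None (would be about 55.6)
```

Under these assumptions, the graduated-versus-strict differences in Δ resource size and total group appropriation reverse sign around period 6, comparable to the marginal effects discussed for Table 5, whereas the resource size gap stays negative throughout the game.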
Lastly, for the individual appropriation effort model, the null model gives ICC_groups = 0.000 and ICC_players = 0.031, indicating no between-group variance and limited between-player variance relative to the total variance. The model on individual appropriation effort shows a significant negative interaction of graduated sanctioning and time (B = −0.030, p = .028), indicating that over time the individual appropriation effort of players in the graduated sanctioning condition becomes lower than in the assigned strict sanctioning condition. Controlling for the fixed effects, the ICC of the random intercept for groups remains zero. The ICC of the random intercept for players is slightly larger but not substantial, indicating that, also after controlling for the fixed effects, only a small proportion of the variance is attributable to specific player characteristics, which is surprising. Compared to the null model, the player-level ICC is higher, indicating that the independent variables explain part of the observation-level variance, so that the player-level variance makes up a larger share of the remaining variance. Apparently, a large part of the variance in behaviour is related to adaptive behaviour over periods rather than driven by the specific cooperative nature of players and groups.
As a lower group appropriation is linked to higher cooperation, these results seem to contradict the resource size results, where graduated sanctioning was associated with lower levels of cooperation. However, these main effects only provide coefficients for the average period in the game. In addition, the interactions with time all point in the same direction: behaviour under graduated sanctioning becomes more cooperative over time. As the low group- and player-level ICCs indicate, most of the variance lies at the observation level, that is, across time periods.
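Why a negative main effect can coexist with higher appropriation early in the game is easiest to see with a two-term simplification. If time is centred on the middle period, the main effect is the treatment difference at that period, and the treatment × time interaction shifts the difference linearly with each period. The sketch below assumes exactly that structure. The coefficients and centre period are hypothetical (not the estimates above), and the real marginal effects in Table 5 also involve the choice terms and controls.

```python
from typing import Optional


def marginal_effect(period: float, b_main: float, b_time: float,
                    center: float) -> float:
    """Treatment-minus-reference difference at a given period.

    Assumes a linear model with time centred at `center`: `b_main` is the
    difference at the centre period, `b_time` the treatment x time slope.
    """
    return b_main + b_time * (period - center)


def crossing_period(b_main: float, b_time: float,
                    center: float) -> Optional[float]:
    """Period at which the difference changes sign (None if the slope is zero)."""
    if b_time == 0:
        return None
    return center - b_main / b_time


# Hypothetical coefficients for an appropriation outcome (NOT this study's
# estimates): lower appropriation at the centre period, negative slope.
B_MAIN, B_TIME, CENTER = -1.0, -0.2, 25

for t in (11, 20, 30, 40):
    print(t, round(marginal_effect(t, B_MAIN, B_TIME, CENTER), 2))
print("sign change at period", crossing_period(B_MAIN, B_TIME, CENTER))
```

Here the treatment starts with higher appropriation (positive difference at period 11), crosses zero at period 20, and ends well below the reference, even though the main effect alone is negative.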
To understand the changing behaviour under graduated sanctioning, we look at the marginal treatment effects in Table 5. Note that the marginal effects are calculated so that we can compare all players in graduated sanctioning with all players in strict sanctioning, and all players in the choice condition with all players in the assigned condition (regardless of which type of sanctioning). This allows us to assess the two sanctioning types separately, as well as the effect of having a choice regardless of the sanctioning type chosen. For resource size, the effect of graduated sanctioning starts and ends negative compared to strict sanctioning, but the difference in resource size becomes much smaller over time. Regarding the change in resource size, the marginal effects show negative coefficients until period 6, indicating a less favourable change in resource size than under strict sanctioning; afterwards the coefficient becomes positive, indicating a more favourable change, towards a growing resource. Total group and individual appropriation show similar developments: they start with positive coefficients, indicating higher levels of appropriation and thus lower levels of cooperation, which change throughout the game into negative coefficients, indicating lower levels of appropriation and thus higher levels of cooperation.

Table 5. Marginal treatment effects for the multilevel models on resource size, change in (Δ) resource size, total group appropriation and individual appropriation (columns: marginal effect at period; resource size; Δ resource size; total group appropriation; individual appropriation; graduated sanctioning [Grad] per outcome).
For the choice treatment we see an increasingly positive coefficient for resource size, and a positive coefficient for change in resource size, indicating higher levels of cooperation and overall a positive trend in resource size over the game.Regarding total group and individual appropriation we see that players in the choice condition appropriated less throughout the game compared to the assigned conditions.
To better understand the differences between the treatments, Figure 4 visualises the marginal effects discussed in Table 5.
Regarding our hypotheses, the results are nuanced: at the beginning of the game, strict sanctioning outperforms graduated sanctioning with respect to resource size; only near the end of the game does graduated sanctioning catch up to the resource size levels of strict sanctioning. Looking at the change in resource size, however, we see that graduated sanctioning already surpasses strict sanctioning in its rate of change around period 10 (the change becomes less negative, indicating a move towards an increasing resource size). The appropriation models show that total group appropriation starts off higher under graduated sanctioning, but quickly falls below the group appropriation under strict sanctioning, indicating higher levels of cooperation for the remainder of the game. Lastly, individual appropriation under graduated sanctioning also falls below that under strict sanctioning after only a few rounds, showing higher levels of cooperation at the micro level.
The initial drop in resource size, caused by higher appropriation efforts, is hard to restore in the game. Taking into account the combination of macro- and micro-level dependent variables, we conclude that, in the long term, graduated sanctioning works as well as, and near the end even better than, strict sanctioning in inducing cooperative behaviour in CPRs. However, in one-shot interactions or interactions with fewer rounds, strict sanctioning is likely to outperform graduated sanctioning. In the long term, we find partial support for hypothesis 1 at the macro level and considerable support at the micro level.
With respect to endogenously chosen versus assigned sanctioning, we find that for graduated sanctioning, resource size in the middle of the game is significantly higher in the choice condition than in the assigned alternative. For strict sanctioning, we find that the choice condition has a significantly 'more positive' change in resource size than assigned strict sanctioning, meaning that the resource is closer to growing. When comparing the marginal effects of chosen versus non-chosen sanctioning mechanisms, we see that the resource size under chosen sanctioning becomes higher than under non-chosen sanctioning throughout the game, and that total group and individual appropriation are smaller in the chosen sanctioning group. These results provide some support for hypothesis 2, which states that endogenously chosen sanctioning mechanisms have a positive effect on cooperation levels.

Profit
After looking at the players' behaviour in the game, we can look at what that behaviour yielded in terms of profit at the macro and micro level. Table 6 shows the results of a three-level and a two-level multilevel regression model on individual profit and group profit per period, respectively. The null model for individual profit provides a group-level ICC of ICC_groups = 0.498 and a player-level ICC of ICC_players = 0.501, indicating substantive between-group and between-player variance relative to the total variance. The null model for group profit provides a group-level ICC of ICC_groups = 0.602, indicating substantive between-group variance relative to the total variance. Model 1 on individual profit shows no significant main effects of graduated sanctioning, the choice condition or the interaction of the two, indicating that neither assigned nor chosen graduated sanctioning, nor chosen strict sanctioning, yields a higher or lower individual profit than assigned strict sanctioning in the middle of the game. However, the interactions of graduated sanctioning and of the choice condition with time are significant. A negative interaction of graduated sanctioning and time (B = −0.065, p = .003) indicates that over time, the individual profit per period under graduated sanctioning becomes lower than the individual profit under assigned strict sanctioning. The negative interaction of the choice condition with time (B = −0.073, p = .001) indicates the same for chosen strict sanctioning relative to assigned strict sanctioning. The group profit model shows the same effects: a significant negative effect for the interaction between graduated sanctioning and time (B = −0.225, p = .032) and a significant negative effect for the interaction between the choice condition and time (B = −0.284, p = .008). However, in both the micro- and macro-level models, the effect sizes, and thus the differences between treatments, are very small.
In both models, the control variable resource size at t − 1 has a significant positive effect, reflecting the built-in game feature that a larger resource yields higher profit. In addition, the control variable sum appropriation of others in t − 1 has a significant positive effect, as a higher appropriation by others probably goes together with a higher own appropriation effort in the next period, which yields a higher profit in that round than lower appropriation would. Note that if every round were a one-shot game, a higher appropriation effort would yield a higher profit in each round regardless of the resource size. In the long term, however, cumulative profit would be higher under sustainable, cooperative appropriation behaviour that leaves the resource closer to its original size.
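The trade-off between per-round and cumulative profit can be illustrated with a toy resource model. The dynamics below are an assumption, not the experiment's actual rules: profit equals the units harvested, a fixed fraction of the stock is taken each period, and the remaining stock regrows logistically towards a capacity. The function name, the 50% and 20% take rates, and the growth rate of 0.5 are all hypothetical.

```python
from typing import List


def simulate_profits(fraction: float, periods: int, stock: float = 100.0,
                     capacity: float = 100.0, growth: float = 0.5) -> List[float]:
    """Per-period profit when a fixed fraction of the stock is taken each period.

    Toy dynamics (an assumption, not the experiment's rules): profit equals
    the units harvested; the remaining stock then regrows logistically
    towards `capacity` at rate `growth`.
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    profits = []
    for _ in range(periods):
        harvest = fraction * stock
        profits.append(harvest)
        stock -= harvest
        stock += growth * stock * (1 - stock / capacity)  # logistic regrowth
    return profits


# Hypothetical strategies: heavy (50%) versus moderate (20%) appropriation.
heavy = simulate_profits(fraction=0.5, periods=30)
moderate = simulate_profits(fraction=0.2, periods=30)

print("profit in period 1:", heavy[0], "vs", moderate[0])
print("cumulative, 2 periods:", sum(heavy[:2]), "vs", sum(moderate[:2]))
print("cumulative, 30 periods:", round(sum(heavy), 1), "vs", round(sum(moderate), 1))
```

In this toy setting heavy appropriation wins in the first period (50 versus 20) and over the first two periods, but loses over 30 periods, because a 50% take rate depletes the stock faster than it can regrow.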
After controlling for the fixed effects, the ICCs for the random intercepts for groups and players are small but not zero in both models, indicating some remaining unexplained variance at these levels. Together with the substantive ICCs of the null models, this underlines the importance of player- and group-level explanations here, and thus of the micro-macro approach of considering both individual- and group-level variables.
Table 7 shows the marginal treatment effects on individual and group profit. For easier interpretation, Figure 5 visualises the differences between treatments in a plot of the marginal treatment effects over time. Again, the marginal effects only differentiate between graduated and strict sanctioning (regardless of whether there was a choice) and between the choice and assigned treatments (regardless of which sanctioning mechanism was chosen). This way, we can assess the treatment differences in a different manner from the multilevel models already shown. Both plots show that the differences between treatments are small, especially near the end of the game under the choice condition, both for chosen strict and chosen graduated sanctioning. In addition, the plot shows that in the assigned graduated sanctioning condition, players are not deterred by the punishing cost of 7: they still punish others even as the cost gets higher.
Table 8 shows a binary multilevel logit model on whether a player punishes another player or not. The null model yields a group-level ICC of ICC_groups = 0.087 and a player-level ICC of ICC_players = 0.408, indicating some between-group variation and a large proportion of between-player variation relative to the total variation.
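For a logit model there is no observation-level residual variance to plug into an ICC directly. One common convention, assumed here because the section does not state which method was used, treats the binary choice as a thresholded latent variable with standard logistic errors, whose variance is π²/3 (about 3.29). The random-intercept variances in the example are hypothetical, not the Table 8 estimates.

```python
import math
from typing import Tuple

# Variance of the standard logistic distribution, used as the level-1
# variance under the latent-variable convention for multilevel logit models.
LOGISTIC_RESIDUAL_VAR = math.pi ** 2 / 3


def logit_icc(group_var: float, player_var: float) -> Tuple[float, float]:
    """Return (ICC_groups, ICC_players) on the latent logit scale."""
    if group_var < 0 or player_var < 0:
        raise ValueError("variance components must be non-negative")
    total = group_var + player_var + LOGISTIC_RESIDUAL_VAR
    return group_var / total, player_var / total


# Hypothetical random-intercept variances (NOT estimates from Table 8).
icc_groups, icc_players = logit_icc(group_var=0.5, player_var=2.5)
print(f"ICC_groups={icc_groups:.3f}, ICC_players={icc_players:.3f}")
```

Under this convention, a player-level ICC around 0.4 corresponds to a player-level variance of roughly two-thirds of π²/3, which is already a sizeable spread in individual punishment tendencies.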
The first model shows that players under graduated sanctioning are more likely to punish than players under strict sanctioning, and that players in the choice condition are more likely to punish than players in the non-choice condition. In the second model, however, the effect of graduated sanctioning is no longer significant, and a negative effect of punishment cost on punishing is visible. The effect of the choice condition on punishing behaviour still stands in the second model. When controlling for punishment cost in the third model, the positive effect of the choice condition remains marginally significant (B = 0.605, p = .058), indicating that players who choose their own sanctioning mechanism are more likely to punish defectors. The likelihood of punishing defectors in the choice condition increases over time (B = 0.019, p = .021).
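Because Table 8 is a logit model, the coefficients are on the log-odds scale. Exponentiating them gives odds ratios, which can be easier to read. For example, exp(0.605) ≈ 1.83, so, holding the other terms fixed, the odds of punishing are roughly 1.8 times higher in the choice condition. Which period this refers to depends on how time is coded in the model, and the section does not state this. The baseline probability of 0.2 below is hypothetical. It is included only to show that an odds ratio is not a probability ratio.

```python
import math


def odds_ratio(coef: float) -> float:
    """Multiplicative change in the odds implied by a logit coefficient."""
    return math.exp(coef)


def shifted_probability(baseline_p: float, coef: float) -> float:
    """Probability after adding `coef` to the log-odds of `baseline_p`."""
    if not 0 < baseline_p < 1:
        raise ValueError("baseline probability must be strictly between 0 and 1")
    odds = baseline_p / (1 - baseline_p) * math.exp(coef)
    return odds / (1 + odds)


# Coefficients reported in the text (third model of Table 8).
CHOICE_COEF = 0.605        # choice condition vs assigned condition
CHOICE_TIME_COEF = 0.019   # reported per-period increase in the choice condition

print(round(odds_ratio(CHOICE_COEF), 2))
print(round(odds_ratio(CHOICE_TIME_COEF * 10), 2))   # compounded over ten periods
# Hypothetical baseline probability of punishing (not from the study).
print(round(shifted_probability(0.2, CHOICE_COEF), 3))
```

With a hypothetical 20% baseline, the same coefficient moves the probability to about 31%, a smaller relative change than the 1.83 odds ratio suggests.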
After controlling for the fixed effects, the ICC for the random intercept for groups remains small but not zero in all models. The ICC for the random intercept for players is substantive in all models, indicating that players vary individually in their punishment tendency and that these variations remain after controlling for the fixed effects.

Discussion
The aim of this article is to study the effectiveness of graduated versus strict, and endogenously chosen versus exogenously imposed, sanctioning on sustainable cooperation in common-pool resources. Using three sanctioning treatments in a CPR game with mutual monitoring, we address two design principles stated by Ostrom (1990): graduated sanctioning and endogenous policy choices. While most studies assume strict sanctioning to enforce rules, Ostrom (1990) advocates graduated sanctioning: a defector is punished according to the extent of their deviant acts in the past. In addition, having a choice between sanctioning mechanisms is expected to increase their effectiveness, since endogenous (sanctioning) institutions are expected to induce higher levels of perceived fairness, regardless of the type of sanctioning (Gürerk, 2013; Gürerk et al., 2004; Van Miltenburg, 2015). As existing research lacks causal tests and direct side-by-side comparisons between different sanctioning institutions, our results contribute to the literature by providing just that, and by presenting results on the performance of different sanctioning mechanisms in long-term interactions in a CPR setting using a dynamic game.
The analyses show that graduated sanctioning initially, that is, at the beginning of the game, performs worse than strict sanctioning in terms of macro- and micro-level outcomes: the resource size is lower, the change in resource size is negative, and group- and individual-level appropriation is higher relative to strict sanctioning. However, this changes later in the game, as appropriation levels fall below those under strict sanctioning, indicating a turn to more cooperative behaviour under graduated sanctioning. Due to the constraints of the game, the gap in resource size is hard to close, but had the game continued, graduated sanctioning would likely have been at least as successful as strict sanctioning in sustaining higher levels of cooperation, if not more so. Our results thus suggest that graduated sanctioning may be more effective than strict sanctioning in keeping cooperation levels high in the long term. This fits the case-study literature, which shows the presence of graduated sanctioning in many long-lived institutions. However, as other research on sanctioning mechanisms also points out, our results do not show that graduated sanctioning beats strict sanctioning in all cases: if interactions among group members are shorter, graduated sanctioning may lose its advantage and strict sanctioning may be more effective.
In terms of profit, players in the graduated sanctioning treatment benefit from their overappropriation in the initial periods of the game, but overall the utility gained under graduated sanctioning does not differ substantially from that under strict sanctioning. Neither sanctioning mechanism induced more or less punishing behaviour than the other, but players were more likely to punish others if they had been able to vote on the sanctioning type.
The results provide some support for the expected positive effect of having a choice on the effectiveness of both sanctioning mechanisms in sustaining cooperation. Collective-choice arrangements as proposed by Ostrom (1990) seem to induce higher levels of long-term cooperation than imposed sanctioning rules. This result fits earlier evidence from the literature showing that chosen institutions outperform imposed institutions by leading to higher adherence to the rules (Dal Bó et al., 2010; Gürerk, 2013; Gürerk et al., 2004; Putterman et al., 2011; Strimling and Eriksson, 2014; Sutter et al., 2010; Van Miltenburg, 2015; Wilson et al., 2004). The resource size under chosen graduated sanctioning was significantly higher than under assigned graduated sanctioning, although the change in resource size under chosen graduated sanctioning, while positive, was smaller than in the assigned condition. In addition, over time the resource size under chosen strict sanctioning grew relative to assigned strict sanctioning. No significant differences were found in the models on group or individual appropriation. When comparing the chosen versus the assigned conditions regardless of sanctioning type, the results showed an increasingly higher resource size, a positive change in resource size and lower appropriation at both the group and the individual level. In addition, players in the choice condition were more likely to punish others. Overall, we find partial evidence for the positive effects of the design principle of endogenously chosen institutions.
Taking into account individual- and group-level cooperation following the micro-macro approach has enabled us to present a more complete picture of the mechanism linking sanctioning and cooperative behaviour. Our investigation of micro-level behaviour showed how sanctioning mechanisms affect individual cooperative and punishment behaviours, which were in turn reflected in group outcomes. Note that this reflection of individual changes at the group level is not obvious beforehand, because it depends on how individual changes are distributed over groups.
This study has some shortcomings, and there is potential for improvement. First, as can be deduced from the choices made in the choice condition, subjects seem to choose strict sanctioning in the second half of the game, and graduated sanctioning only in the first part of the game. This may indicate self-serving behaviour of defectors: after the third punished defection, strict sanctions are lower than graduated sanctions. The choices between the sanctioning mechanisms in the choice condition were therefore likely driven by self-interest, protecting subjects from higher sanctions in the later parts of the game. In addition, the positive effect of graduated sanctioning in the long term could be explained by the fact that after a few punishments, graduated sanctioning becomes more severe than strict sanctioning. However, by controlling for average punishment size in the macro-level models and for individual punishment size in the micro-level models, we were able to see that graduated sanctioning and the choice treatment have positive effects on cooperative behaviour, regardless of the punishment size.
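The incentive described above, where a repeat defector prefers strict over graduated sanctioning late in the game, follows from how the two fine schedules cross. The actual sanction amounts are not given in this section. The sketch below therefore assumes a hypothetical flat strict fine of 30 and a graduated fine that rises by 10 with each punished defection. These numbers are chosen only so that the graduated fine exceeds the strict one from the fourth punished defection onwards, matching the description in the text.

```python
from typing import Optional

# Hypothetical fine levels (the actual amounts are not given in this section).
STRICT_FINE = 30
GRADUATED_STEP = 10


def strict_fine(nth_punishment: int) -> int:
    """Strict sanctioning: the same fine for every punished defection.

    The argument is unused but kept so both schedules share one signature.
    """
    return STRICT_FINE


def graduated_fine(nth_punishment: int) -> int:
    """Graduated sanctioning: the fine grows with the defector's history."""
    return GRADUATED_STEP * nth_punishment


def first_harsher_graduated(max_n: int = 20) -> Optional[int]:
    """First punished defection at which the graduated fine exceeds the strict one."""
    for n in range(1, max_n + 1):
        if graduated_fine(n) > strict_fine(n):
            return n
    return None


for n in range(1, 6):
    print(n, strict_fine(n), graduated_fine(n))
print("graduated becomes harsher at punishment", first_harsher_graduated())
```

A player who has already been punished several times faces a higher marginal fine under this graduated schedule, so voting for strict sanctioning later in the game is consistent with self-interest.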
Second, a known criticism of laboratory experiments concerns their generalisability: although the game was contextualised as a common-pool resource, the decisions made in an experimental setting may not be truly representative of decisions made in real-life settings, in which not just some money but one's livelihood is at stake. However, by adding context to the laboratory experiment, this research provides an intuition of what could be found in real-life settings with respect to the effectiveness of different sanctioning mechanisms. Another potential threat to the generalisability of the results is that most of the subject pool consists of students. However, if the aim of the overarching research is to investigate relationships between behaviour and biological, economic or social variables, experiments are a good tool to use, regardless of the subject pool (Anderies et al., 2011; Falk and Zehnder, 2013; Falk and Heckman, 2009; Levitt and List, 2007; Ostrom, 2006). The context of the experiment ensures that the environment in which the results are generated captures essential characteristics of the real-world version of CPRs, which supports the external validity of the results to a large extent (Fehr et al., 2003). We are convinced that the addition of context to the game enabled us to introduce graduated sanctions to a CPR game in an understandable manner, and consequently to reproduce parts of Ostrom's findings; to our knowledge, more abstract experiments have not produced this result.
Third, our sample size is relatively limited for multilevel modelling, providing only 47 groups at the second level. However, Browne and Draper (2000) show that useful variance estimates can already be produced with as few as 48 groups at the second level, which is close to our number of groups. Whereas a larger sample at the second level would have been ideal, experimental set-ups limit the possible sample size. The CPR game chosen for this paper was designed for four players to capture dynamic interaction, the increase and decrease of the resource with consequences for the players, and opportunities for punishment. The consequence of this design is that four participants are needed per group, so a large number of participants must be recruited to obtain a decent sample size at the group level. Future studies should take this into account and replicate our results to test their robustness.
It would be interesting for future research to investigate whether the findings of this paper generalise to real-life contexts. Although this paper provides an intuition on the effectiveness of sanctioning mechanisms in CPRs, the causal mechanisms of real-life sanctioning mechanisms are not easily observable. In addition, Ostrom (1990) suggests that graduated sanctioning may be especially useful for more 'vengeful' actors in society: actors who will defect if they are punished too heavily after a single defection. Future research could involve player-type mechanisms to test whether different sanctioning mechanisms are more effective against certain player types. Furthermore, Kimbrough and Vostroknutov (2015) show in their dynamic CPR experiment that the regrowth factor is a major determinant of the survival and sustainable use of the resource. Variations on our experiment with different regrowth rates could help us understand whether it is institutions or ecological factors that determine the survival of the resource. Lastly, future research should investigate whether making the game longer leads to more pronounced differences in the outcomes of the various sanctioning types. As we saw in the current experiment, it takes time for players to understand the game and to arrive at a stable behavioural strategy; a longer game may be better able to measure the different outcomes of sanctioning mechanisms in the long term. In addition, players in our game knew that the game would last only 40 periods. If players did not know the length of the game, or if they knew the game would last 100 periods, they might show more cooperative behaviour from the start, or at least the macro- and micro-level repercussions of unsustainable behaviour would become more noticeable.
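A toy model shows how strongly the regrowth rate can constrain sustainable use. Assume, hypothetically and not as the rules of this experiment, that each period a fraction f of the stock x is harvested. The remainder y = (1 − f)x then regrows as y + g·y·(1 − y/K). Solving for the fixed point gives y* = K(1 + g − 1/(1 − f))/g. A positive steady state therefore exists only if f < g/(1 + g), so in this model the regrowth rate g alone sets the ceiling on sustainable extraction.

```python
def max_sustainable_fraction(growth: float) -> float:
    """Largest harvest fraction with a positive steady state in the toy model."""
    if growth <= 0:
        raise ValueError("growth rate must be positive")
    return growth / (1 + growth)


def steady_state_stock(fraction: float, growth: float,
                       capacity: float = 100.0) -> float:
    """Pre-harvest steady-state stock x* in the toy model (0.0 if it collapses).

    With y = (1 - f) * x, the fixed point of x = y + g * y * (1 - y / K)
    is y* = K * (1 + g - 1 / (1 - f)) / g, and x* = y* / (1 - f).
    """
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    if growth <= 0:
        raise ValueError("growth rate must be positive")
    y_star = capacity * (1 + growth - 1 / (1 - fraction)) / growth
    return max(y_star, 0.0) / (1 - fraction)


# Hypothetical regrowth rates, all with the same 20% harvest fraction.
for g in (0.25, 0.5, 1.0):
    print(g, round(max_sustainable_fraction(g), 3),
          round(steady_state_stock(0.2, g), 1))
```

With g = 0.5 a 20% take rate settles at a stock of 62.5, while with g = 0.25 the same take rate sits exactly at the collapse threshold. This illustrates why different regrowth rates could change which institutional arrangements are able to keep the resource alive.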

Conclusion
This study provided partial support for the effectiveness of graduated sanctioning compared to strict sanctioning, and partial support for using endogenously chosen sanctioning mechanisms versus imposed sanctioning mechanisms. It also highlights that results regarding the effectiveness of sanctioning mechanisms are nuanced, and that choosing the 'best' functioning sanctioning rule is not as straightforward as it may seem. Choosing sanctioning mechanisms collectively, however, appears to improve the effectiveness of the sanctioning mechanism and to increase cooperation. More research is needed to investigate which groups may benefit the most from graduated and strict sanctioning. Research on how to induce cooperative behaviour in common-pool resource settings is not only fundamental to the social sciences but also relevant to current concerns about the overuse of common natural and man-made resources. Forests, fishing grounds, pastures and freshwater supplies are all resources at risk of overexploitation. Endogenously chosen institutions could be key to driving down resource use and to increasing cooperation and resource restoration. Further investigation of the effectiveness of sanctioning mechanisms in achieving sustainable cooperation will provide new insights into the use and preservation of these common-pool resources.

Figure 3. Votes in the choice condition per voting opportunity (absolute numbers).

Figure 4. Marginal treatment effects for resource size [RS], change in resource size [ΔRS], total group appropriation [TGA] and individual appropriation [IA].

Table 1. Descriptive statistics of key variables in the 10 periods without sanctioning.

Table 2. Descriptive statistics of key variables in the assigned graduated and strict sanctioning conditions.

Table 3. Descriptive statistics of key variables in the choice condition.

Table 4. Two- and three-level multilevel models on resource size [RS], change in resource size [ΔRS], total group appropriation [TGA] and individual appropriation [IA] with a random intercept for groups (RS, ΔRS, TGA & IA) and players (IA).

Table 6. Three- and two-level multilevel models on individual profit [IP] and group profit [GP] with a random intercept for groups (IP & GP) and players (IP).

Table 8. Binary multilevel logit model on punishing (Y) or not punishing (N) with random intercepts for groups and players.