Does Danger Level Affect Bystander Intervention in Real-Life Conflicts? Evidence From CCTV Footage

In real-life violence, bystanders can take an active role in de-escalating conflict and helping others. Recent meta-analytical evidence from experimental studies suggests that elevated danger levels in conflicts facilitate bystander intervention. However, this finding may lack ecological validity because ethical concerns prohibit exposing participants to potentially harmful situations. Using an ecologically valid method, based on an analysis of 80 interpersonal conflicts unobtrusively recorded by public surveillance cameras, the present study confirms that danger is positively associated with bystander intervention. In the presence of danger, bystanders were 19 times more likely to intervene than in the absence of danger. The study extends this knowledge by showing that incremental changes in the severity level of the danger (low, medium, and high) were not associated with bystander intervention. These findings confirm the importance of further investigating the role of danger for bystander intervention, in larger samples, and involving multiple types of real-life emergencies.

For decades, social scientists have studied the willingness of individuals to intervene in public emergencies (Lofland, 2017; Milgram, 1970). In psychology, this concern has motivated experimental work on the "bystander effect," the finding that individuals are inhibited from taking action in the presence of others (Darley & Latané, 1968). Meta-analytical evidence (Fischer et al., 2011) and observations of real-life violent situations (Levine et al., 2011; Parks et al., 2013; Philpot et al., 2020; Planty, 2002) suggest that the presence of danger raises the probability of bystander intervention, as danger communicates urgency to bystanders about the need for help, and the presence of other bystanders generates a feeling of safety rather than diffusing responsibilities.
Theories of altruistic and prosocial behavior explain the facilitating effect of danger on bystander intervention either as a means of reducing the intervener's own distress (Dovidio et al., 2006) or as a means of reducing the distress of the victim (Batson, 2011). Rational choice theories emphasize self-interest and propose one of two effects. The first is a similar effect, with bystanders being more likely to intervene in high-danger than in low-danger situations because of the perceived high personal costs of nonintervention when someone is exposed to serious threats (Fischer et al., 2006). The second is an opposite effect: Increased danger will reduce the willingness to intervene because bystanders acknowledge that their intervention may harm themselves (Krueger & Massey, 2009; Shotland & Stebbins, 1983).
The empirical evidence about the role of danger in bystander intervention remains inconclusive. First, the meta-analytical evidence has limited external validity for intervention in actually dangerous contexts (Fischer et al., 2011). This is because the majority of the 105 studies included are experimental, and because of ethical and practical restrictions, these studies do not involve serious danger, nor do they expose participants to real risk (Cherry, 1995). Typically, experimental studies operationalize danger as situations that might be threatening to the victim (e.g., simulation of an asthma attack or nervous seizure, a fall from a bookshelf), to the intervening bystander (e.g., indicators of fire, big physical stature of antagonist in video vignette) or as remotely disturbing (e.g., stealing objects; Fischer et al., 2011). Second, studies relying on observations of real-life violent incidents tend to include danger as ever-present and constant within incidents (Levine et al., 2011; Liebst et al., 2019; Philpot et al., 2020), aggregating danger to a situational characteristic rather than treating it as fluctuating over the course of an incident. These studies typically operationalize danger as involving some level of aggression (Parks et al., 2013).
Based on a systematic observational analysis of naturally occurring interpersonal conflicts in public places recorded by public surveillance cameras, the current study is the first to investigate whether danger, measured as the fluctuating level of aggression observed between conflict parties, affects the likelihood of individual bystander intervention. We use a within-person design, which allows us to study sequential relationships between moments of intervention and nonintervention within the behavioral sequence of the same bystander, as correlated with the observed danger level before the moment of intervention and nonintervention. We operationalize danger as the degree of aggression used by conflict parties, with low level including aggressive gesturing; medium level including punches, kicks, and shoves; and high level including weapon use or aggression toward a person on the ground. This within-person design allows us to study the effect of fluctuating danger levels on intervention and nonintervention behavior with a fixed-effects panel design (McNeish & Kelley, 2019) that rules out potential confounding effects of stable personal and situational characteristics, such as gender and preexisting social relationships, which are known to play a facilitating role for intervention behavior (Eck, 1994; Ejbye-Ernst et al., 2020; Hollis-Peel et al., 2011; Liebst et al., 2019; Phillips & Cooney, 2005). While these characteristics, and potentially also unobserved personal characteristics like self-efficacy and intentions (Banyard & Moynihan, 2011; Coker et al., 2011), were found to correlate with bystander intervention, previous studies did not use a within-person design that allowed them to investigate whether personal and situational characteristics determine who engages in lower rather than higher level danger conflicts.

Analytical Strategy
Most studies of bystander behavior in emergencies use experimental designs where participants are confronted with staged emergency situations, either alone or in the presence of other bystanders. The other bystanders are typically experimenter confederates who are instructed not to intervene in the situation (Fischer et al., 2011). According to contemporary ethical guidelines and regulations on human experiments, it is not acceptable to expose study participants to situations that could harm or traumatize them. These situations include experimental settings where participants witness other people suffer injuries or become the victim of serious violence. In compliance with these ethical guidelines, we used an analytical strategy that we believe to be the second-best viable alternative to answer our research question. This strategy combines three key elements: a reliable and ecologically valid data source, a rigorous data selection procedure, and a within-person statistical estimation technique.
The first key element is the data used. In the present study, we coded and analyzed observations of nonstaged, real-life conflicts captured by public surveillance cameras. In comparison to experiments, observational research has the advantage that the observed situations are likely more representative of real-life conflicts than staged experimental situations. In comparison to postconflict retrospective interviews, observational research has the advantage that the accounts are not biased by self-presentational or cognitive limitations and that (interrater) reliability can be assessed (Reiss, 1992). In addition, and in comparison to both experiments and interviews, CCTV footage contains unobtrusive measures because most individuals may become habituated to the presence of cameras.
In observational research, it is not possible for the researcher to control the number of bystanders present or their behavior. When studying factors that affect bystander intervention, the intervention behavior of other bystanders is a potentially confounding endogenous factor that is difficult to account for. The second key element of our analytical approach, therefore, is that we restrict the analysis to the first intervention behavior of the first bystander who intervenes in each conflict. Although this selection procedure does not make the design experimental, and although it ignores interdependencies between bystanders that are potentially interesting in their own right, the result mimics the typical experimental condition in which bystanders are confronted with the presence of other nonintervening "bystanders" (i.e., experimenter confederates).
The third key element of our analytical strategy is that the data were analyzed with a fixed-effects panel design, which uses only within-person variation in the dependent variable and thereby controls for observed and unobserved heterogeneity that remains stable over the course of the footage analyzed. This includes the personal characteristics of the bystanders (e.g., estimated gender characteristics) and other situationally stable factors (e.g., number of individuals involved and present, preexisting social relationships; for applications of this approach in different areas of research, see Milner et al., 2015; White et al., 2013). Fixed effects are less efficient than random effects estimates because they do not use between-person variation, but they require weaker assumptions on the absence of confounding factors (Brüderl & Ludwig, 2015).

Materials
Because video observation and coding are very labor-intensive, we took a stratified random sample of 99 surveillance video clips from a larger set of 219 video clips made available by the police authorities in Amsterdam, the Netherlands; Lancaster, England; and Cape Town, South Africa, that were previously analyzed for another research question (Liebst et al., 2021). The clips were captured by public surveillance cameras and manually recorded by municipal CCTV operators in the three cities. The cameras were located in the nightlife and tourist areas of the cities and involved conflicts during both day- and nighttime. Upon our instructions, the operators extended their ordinary recording procedure to involve not only incidents of criminal behavior but also public arguments, as defined by one or more of the following behavioral cues: agitated talking and gesturing, walking back and forth in apparent frustration, pointing an index finger at the individual, people standing close to each other while gesturing aggressively, and pushing or grabbing each other. We randomly selected 33 clips from each of the three cities.
Among the 99 sampled clips, 13 did not contain any bystander intervention, while in two incidents the first bystander intervention took place immediately (i.e., within 5 s) after the start of the recordings. Because, as explained below, our analysis applies to the unfolding incidents up to and including the first bystander intervention, neither incidents without intervention nor incidents with immediate intervention add variation to the within-person fixed-effects panel design. Therefore, these 15 incidents were excluded from the statistical analysis. Four additional cases were excluded because the video quality was insufficient to allow for individual-level coding (Nassauer & Legewie, 2018).
The data set in the formal analysis therefore consisted of CCTV footage of 80 conflicts in Amsterdam (n = 26), Lancaster (n = 30), and Cape Town (n = 24). This sample included conflicts that involved nondanger behavioral cues of conflict, but also physical conflicts, which at times contained severe violent acts. The minimum number of bystanders was 1, the maximum was 62, and the mean and median were 18 and 15 bystanders, respectively. Of the intervening bystanders, 52 were men and 28 were women, while 22 had a preexisting social relationship with one of the conflict parties and 57 did not (for one, it could not be established).
Six trained research assistants coded the video material using a systematic and interrater reliability tested codebook, with variable definitions consistent with existing bystander research (e.g., Levine et al., 2011; Liebst et al., 2019). For details, see Table S1 in Section A of the Supplementary Online Material. First, three coders recorded the fluctuating aggression levels within each video based on the intensity of the aggression on display. This involved watching the clips and creating time stamps for each act of aggression observed. Level 1 aggression included intrusive, nonviolent aggressive actions (e.g., pointing at a person, light pushes, N = 21). Level 2 aggression included physical aggressive acts (e.g., punches, kicks, shoves, N = 115). Level 3 aggression included weapon use and physical aggressive acts to a person on their knees or lying on the ground (N = 12). In cases where multiple acts occurred simultaneously, the highest level of aggression was recorded. For continuous acts of aggression, the level, the start time, and the end time were coded. For short isolated acts, the level and the start time were coded. These isolated acts were assumed to have a duration of a single second. Time intervals during which no aggression was observed were automatically assigned aggression Level 0.
Next, bystander intervention was assessed. Here, three new coders observed the video clips and identified the conflict parties of each situation, that is, the two individuals who initiated the dispute. Next, coders recorded the presence of intervening bystanders. An intervening bystander was any other individual who made physical attempts to placate the conflict situation with any of the following acts: blocking contact between conflict parties; pulling, holding, or pushing an aggressor away from the conflict; and pacifying gesturing and touches. To allow for the temporal synchronization of bystander intervention and the prerecorded aggression levels, coders noted the time of the first intervention act for each intervening bystander. Coders also recorded the number of bystanders present, bystander gender, and the presence of a preexisting social relationship between bystanders and conflict parties. These characteristics were coded not because they are supposed to affect the probability of intervention (as they are constants in our within-person estimation) but because they might moderate the effects of the aggression level of the conflict on bystander intervention, as we discuss below.

Data Structure and Statistical Analysis
Each video clip was analytically subdivided into a series of consecutive 5-s time intervals. For each 5-s interval, we calculated the highest aggression level observed at any time during the time interval. Thus, this could be during the first second, during the last second, at any time during this 5 s, or for the full 5-s interval. For each bystander, we also coded in which time interval they made their first intervention.
After these transformations, for every conflict situation, the data contain a series of 5-s intervals, and for every interval, we have a variable measuring aggression level (0 = none, 1 = low, 2 = moderate, and 3 = high) and a variable measuring bystander intervention (0 = no, 1 = yes). These data were purposefully right-censored, which means that the interval in which the intervention was made is the last interval included in the analysis.
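The transformation described above can be illustrated with a minimal sketch. The function name, the tuple layout of the coded acts, and the example clip are hypothetical and are not taken from the study's codebook; the sketch only assumes what the text states: 5-s intervals, the highest aggression level per interval, Level 0 when no act is observed, and right-censoring at the interval of the first intervention.

```python
def to_intervals(acts, first_intervention, width=5):
    """Turn timed aggression acts into right-censored interval records.

    acts: list of (start_s, end_s, level) tuples (hypothetical layout);
          isolated acts are given a one-second duration, end_s = start_s + 1.
    first_intervention: time in seconds of the first intervention act.
    Returns one record per interval, up to and including the interval
    that contains the first intervention.
    """
    last = int(first_intervention // width)  # index of the censoring interval
    rows = []
    for k in range(last + 1):
        lo, hi = k * width, (k + 1) * width
        # Highest level among acts overlapping [lo, hi); Level 0 if none.
        level = max((lvl for s, e, lvl in acts if s < hi and e > lo), default=0)
        rows.append({"interval": k,
                     "aggression": level,
                     "intervention": int(k == last)})
    return rows


# Hypothetical clip: a light push at 7 s (Level 1), continuous Level 2
# aggression from 12 s to 18 s, and a first intervention at 16 s.
rows = to_intervals([(7, 8, 1), (12, 18, 2)], first_intervention=16)
for row in rows:
    print(row)
```

Each bystander thus contributes a series of records in which the intervention variable is 0 everywhere except in the final, censoring interval.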
The complete data set analyzed is visualized in Figure 1. The x-axis represents time elapsed in seconds since the start of the conflict, displayed in 5-s intervals. Each horizontal line represents a single conflict situation. The colors on these horizontal lines represent the aggression level during each time interval. Gray indicates the absence of aggression, and yellow, orange, and red indicate low, medium, and high aggression, respectively. Note that neither Level 1 (yellow) nor Level 3 (red) are very common. Low aggression levels occur only in eight incidents and high aggression levels only in seven incidents. Medium aggression levels are seen in 45 incidents. The black dots indicate the times of the first bystander intervention. As explained above, here we analyze conflict situations up to the time of the first intervention. On average, intervention occurred 92 s after the conflict started, but variation is large.
To assess whether bystander intervention behavior varies systematically with the level of aggression displayed by the conflict parties, a logit panel analysis was conducted with fixed effects for the individual bystanders. The resulting fixed-effects estimates are exclusively based on the within-bystander variation between the time intervals in a series. Therefore, they are free from any confounding effects of observed and unobserved time-stable personal characteristics, such as age, gender, strength, intoxication, or in fact, any other factor that remains stable during the conflict (Brüderl & Ludwig, 2015). The 5-s time intervals are thus treated as repeated observations of the same bystander under changing exposure to aggression among the conflict parties.
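As an illustration of why stable characteristics drop out, the following is a sketch of the standard conditional (fixed-effects) logit likelihood for a bystander who intervenes exactly once, in the final interval. The text does not state which estimator was used, so this form is an assumption; the symbols $\mu_i$ (bystander-specific intercept), $x_{it}$ (aggression-level indicators in interval $t$), and $T_i$ (interval of first intervention) are notation introduced here.

```latex
% Assumed notation: y_{it} = 1 if bystander i intervenes in interval t,
% x_{it} = vector of aggression-level indicators, \mu_i = stable
% bystander-specific intercept, T_i = interval of the first intervention.
\[
  \Pr(y_{it}=1 \mid x_{it}, \mu_i)
    = \frac{\exp(\mu_i + x_{it}'\beta)}{1+\exp(\mu_i + x_{it}'\beta)},
\]
\[
  L_i(\beta)
    = \Pr\Bigl(y_{iT_i}=1 \;\Big|\; \sum_{t=1}^{T_i} y_{it}=1\Bigr)
    = \frac{\exp(x_{iT_i}'\beta)}{\sum_{t=1}^{T_i}\exp(x_{it}'\beta)} .
\]
```

Under this form, $\mu_i$ cancels from the numerator and denominator, so any factor that is constant within a bystander's series cannot influence the estimate of $\beta$. It also shows why a series with no intervention, or with an intervention in the very first interval ($T_i = 1$), contributes a constant to the likelihood and therefore no information. The reported odds ratios correspond to $\exp(\beta_k)$ for each aggression level $k$.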
To assess whether aggression levels affect bystander intervention, we estimate three simple models that only vary in terms of whether the effects are assumed to be (1) contemporaneous, (2) temporally lagged, or (3) jointly contemporaneous and temporally lagged (for a discussion of temporally lagged independent covariates in fixed-effects panel models, see Vaisey & Miles, 2017). In the contemporaneous model, it is assumed that the association of aggression level and bystander intervention is limited exclusively to the same 5-s interval. In the temporally lagged model, it is assumed that the aggression level in the previous 5-s interval affects intervention in the current interval. We added this model because under the assumption of the contemporaneous model, we cannot disentangle cases in which within the same time interval the aggression level changes before or after the intervention. In the temporally lagged model, the aggression level necessarily precedes the intervention. In the joint model, the effect is assumed to be both contemporaneous and temporally lagged. We did not include second-order or higher-order temporal lags in addition to the first-order temporal lag. From an empirical perspective, it seems unnecessary to assume longer delays. From a statistical perspective, it seems impossible because the models become under-identified when they contain too many variables (and thus unknown parameters) relative to the sample size of 80 conflicts.
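The difference between the contemporaneous and the temporally lagged predictor can be sketched as follows. The helper name and the example series are hypothetical; how the first interval of each series, which has no predecessor, was handled in the original analysis is not stated in the text, so it is marked as missing here.

```python
def add_lag(levels):
    """Pair each interval's aggression level with the previous interval's level.

    levels: aggression levels (0-3) for consecutive intervals of ONE bystander.
    The first interval has no predecessor, so its lagged value is None
    (an assumption of this sketch, not a documented choice of the study).
    """
    lagged = [None] + levels[:-1]
    return list(zip(levels, lagged))


# Hypothetical series of 5-s intervals for a single bystander.
series = [0, 0, 2, 2, 1]
pairs = add_lag(series)
for current, previous in pairs:
    print(f"contemporaneous={current}, lagged={previous}")
```

The lag must be computed within each bystander's own series, never across series, so that the level from one conflict is not carried into another.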

Inter-Coder Reliability
To assess intercoder reliability of the observed aggression level of the conflict, 17 conflicts were coded by multiple coders (15 were coded by three coders and two were coded by two coders). For each 5-s interval, we treated the aggression measures as an ordinal variable and assessed intercoder reliability by calculating, for each time interval, two widely accepted measures, α (Hayes & Krippendorff, 2007) and AC2 (Gwet, 2008). Both measures can take on values from 0 (complete lack of agreement) to 1 (complete agreement). The mean α value across all time intervals was 0.72 (95% confidence interval [0.68, 0.75]), and the mean AC2 was 0.92 (95% confidence interval [0.86, 0.97]). Both the α and AC2 estimates are within the ranges of what is acceptable (Gwet, 2008; Hayes & Krippendorff, 2007; Landis & Koch, 1977). Given the high level of intercoder reliability, for each video clip coded by multiple coders, we randomly selected for subsequent analysis the codes of one of the coders (i.e., we did not apply majority voting rules for time intervals that the coders had coded differently). Details of the intercoder reliability analysis are provided in Section B and Table S2 of the Supplementary Online Material. Intercoder reliability of the number of bystanders present during the conflict (α = .87), bystander gender (α = .95), and the presence of a preexisting social relation between bystanders and conflict parties (α = .50) was based on observations of 38 incidents that were all coded by three coders.

Results
The contemporaneous model in Table 1 displays fixed-effects logistic regression estimates based on the analysis of 5-s episodes. The odds of bystander intervention are 10, 16, and 39 times larger at contemporaneously low, medium, and high aggression levels than in the absence of aggression, and all three differences are statistically significant. Wald tests of the differences between the estimated effects of low, medium, and high aggression levels indicate, however, that none of the differences among the low, medium, and high aggression levels are statistically significant (low vs. medium: χ² = 0.35, df = 1, p = .56; medium vs. high: χ² = 0.87, df = 1, p = .35; and low vs. high: χ² = 1.35, df = 1, p = .25). Thus, the results support only the contrast between nonaggression and any form of aggression, not an increased likelihood of intervention with increasing aggression levels.
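The p values of the Wald tests above follow directly from the reported test statistics. For a chi-square statistic with one degree of freedom, the upper-tail probability equals erfc(sqrt(χ²/2)), which the short sketch below evaluates with the Python standard library. The function name is introduced here for illustration; the input values are the three contemporaneous-model statistics quoted in the text.

```python
import math


def wald_p_df1(chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Test statistics and p values as reported for the contemporaneous model.
reported = {
    "low vs. medium": (0.35, 0.56),
    "medium vs. high": (0.87, 0.35),
    "low vs. high": (1.35, 0.25),
}

for contrast, (chi2, p_reported) in reported.items():
    p = wald_p_df1(chi2)
    print(f"{contrast}: chi2={chi2:.2f}, p={p:.3f} (reported {p_reported:.2f})")
```

The recomputed values agree with the reported ones up to the rounding of the published statistics; small discrepancies in the second decimal are expected because the inputs are themselves rounded.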
The results of the temporally lagged model in Table 1 demonstrate that under the assumption of unique lagged effects (i.e., no contemporaneous effects), the estimated odds of bystander intervention are approximately 30 and 87 times larger following medium and high aggression levels than after an interval without any aggression. This does not hold following low aggression levels, as the estimated odds ratio (5.91) falls just short of statistical significance. Although the differences between the low, medium, and high aggression levels are larger in the temporally lagged model than in the contemporaneous model, they are not statistically significant (low vs. medium: χ² = 2.85, df = 1, p = .09; medium vs. high: χ² = 0.75, df = 1, p = .39; and low vs. high: ...).
The estimates of the joint contemporaneous and temporally lagged models largely support the conclusions based on the contemporaneous and the temporally lagged models. Four of the six coefficients are significant and suggest that individuals are more likely to intervene at any aggression level than in the absence of aggression. The exception is the estimated lagged effect of low aggression in the prior period on bystander intervention in the present episode, which is not statistically significant. The contemporaneous effect sizes do not differ significantly from each other (low vs. medium: χ² = 0.24, ...).
To quantify the distinction between the presence of any aggression and no aggression, we collapsed the aggression Levels 1-3 into a binary aggression variable and estimated its contemporaneous, lagged, and combined effects on intervention. Complete details of the models are reported in Section C of the Supplementary Online Material. The odds ratios are 15.5 (CI [7.4, 32.4]) in the contemporaneous model and 22.6 (CI [8.7, 58.6]) in the lagged model; in the combined model, they are 13.2 (CI [5.5, 31.7]) for the contemporaneous effect and 10.8 (CI [4.0, 29.4]) for the lagged effect. The estimates are statistically significant in all three models.
Assuming that the true effect of aggression level lies somewhere between the estimated contemporaneous and the lagged effects, this implies that the odds of bystander intervention are approximately 19 times higher when conflict parties display targeted aggression than when they do not.
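The figure of approximately 19 can be reproduced from the two single-effect odds ratios reported above (15.5 contemporaneous, 22.6 lagged). The text does not say how the intermediate value was derived, so the sketch below simply shows that both a plain midpoint and a midpoint on the log-odds scale (the geometric mean) round to 19. The variable names are introduced here for illustration.

```python
import math

# Odds ratios of the binary aggression variable, as reported in the text.
or_contemporaneous = 15.5
or_lagged = 22.6

# Midpoint on the odds-ratio scale.
arithmetic_mid = (or_contemporaneous + or_lagged) / 2

# Midpoint on the log-odds scale, i.e., the geometric mean of the two ratios.
geometric_mid = math.sqrt(or_contemporaneous * or_lagged)

print(f"arithmetic midpoint: {arithmetic_mid:.2f}")
print(f"geometric midpoint:  {geometric_mid:.2f}")
```

Either way of averaging leads to the same rounded value, so the summary figure does not hinge on the choice of scale.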
Motivated by the suggestion that effects of aggression level are greater in the company of other bystanders (Fischer et al., 2011), we estimated a model that includes as predictors combinations of aggression level and the number of bystanders present. The results, which are presented in Section D of the Supplementary Online Material, show that aggression facilitates intervention, but that the effect is not significantly related to the number of bystanders present.
We also tested differences between bystanders who had a preexisting relationship with one of the conflict parties and those who had not. The results in Section E of the Supplementary Online Material show that aggression facilitates intervention for both groups, but that the effects do not differ between both groups.
We also tested differences between male and female bystanders. The results in Section F of the Supplementary Online Material show that aggression facilitates intervention for both males and females, but that the effects do not differ between them.
Finally, we explored whether bystanders in Amsterdam, Cape Town, and Lancaster are equally responsive to the presence of aggression (complete results are included in Section G of the Supplementary Online Material). The results show that the odds ratios are significant in all three cities, with bystanders in Amsterdam (odds ratio = 111.8) being more responsive than those in Lancaster (odds ratio = 23.3) and Cape Town (odds ratio = 6.1). Only the difference between Amsterdam and Cape Town was statistically significant (p < .01). The finding does not imply that bystanders in Amsterdam are more likely to intervene (this cannot be established with a fixed-effects panel analysis) but only that they respond more strongly to variation in aggression level.
All fixed-effects logit models were also estimated for 1-s, 3-s, and 10-s intervals. Results are provided in Section F of the Supplementary Online Material. In support of the findings presented for the 5-s interval in Table 1, they demonstrate that for all four interval sizes (1, 3, 5, and 10 s), all three aggression levels (low, medium, and high) significantly increase the odds of bystander intervention. Also in line with the results in Table 1, the distinctions between the three aggression levels are nonsignificant and additionally vary widely between the interval lengths. This suggests that our findings are robust and do not depend on the length of the time interval that is chosen in the analysis.

Discussion
The results provide support for a discrete effect of danger on bystander intervention. The odds of bystander intervention are 19 times higher when conflict parties display targeted aggression than when they do not. This result is in line with theories of altruism and prosocial behavior that suggest that increases in potential harm motivate bystander interventions. It is also in line with studies suggesting that the urgency of the need for help facilitates intervention. However, we also found that the aggression intensity level neither facilitated nor deterred bystander intervention. Serious forms of aggression were not more likely to provoke bystander intervention than minor forms of aggression. This finding runs counter to the idea that increased danger boosts the motivation to intervene because nonintervention might be experienced as too high a cost in serious danger situations. It also runs counter to the idea that increased danger reduces the bystanders' motivation to intervene for fear of being harmed themselves in the act of intervention. A potential explanation for the absence of an overall aggression intensity effect in our findings is that the two mechanisms that stimulate and deter intervention in response to danger might cancel out. According to this argument, increasing danger raises the benefits of intervention by reducing victim harm, while simultaneously raising the costs of intervention by increasing bystander harm. In nonexperimental conflict situations, danger affects the risks for victims and bystanders alike. Therefore, our observational data cannot disentangle the two mechanisms.
Another possible, though less substantive, explanation is that our observational design lacks the statistical power to identify the effects of low and high aggression levels because these levels occur infrequently in the data (eight incidents with low aggression and seven incidents with high aggression), whereas medium aggression is much more common (45 incidents with medium aggression). A larger sample size would be needed to identify potential effects of low and high aggression levels and to also demonstrate differences in the effect sizes of all three aggression levels.
Our within-person design rules out all stable between-person and between-situation confounders as explanations for these mixed results, but unobserved time-varying factors, such as changes in the communication of urgency, could play a role. We found no moderating effect of the number of bystanders present, preexisting social relationships, or gender on the effect of danger on bystander intervention. In Amsterdam, interveners responded more strongly to variation in aggression level than in Cape Town. Future studies should consider the effect size of danger compared to stable personal and situational characteristics of conflict situations, for example, by evaluating the extent to which bystander intervention is driven by dynamic or stable factors. They should also consider whether danger has a different effect on different kinds of individuals (e.g., Do people with low self-control vs. high self-control respond similarly to danger?) and in different kinds of situations (e.g., Does danger also facilitate bystander intervention in robberies and sexual assaults?).
A limitation of our aggression-level measures is that they are objective physical actions and exclude verbal communications. Obviously, bystanders respond to subjective impressions, and their actions may be driven by expectations based on threats or other verbal expressions that we could not capture and thus could not code as aggression. Future studies might improve on the measurement of aggression by using footage that includes sound, for example, from body-worn cameras, and by considering the communication of urgency other than aggression. Another limitation of our study is that we operationalize bystander intervention as physical actions aimed at stopping the conflict, while bystanders engage in nonphysical intervention behavior too, including more indirect ways of intervening such as phoning the police. Future work could also improve our test of the effect of danger on bystander intervention by including more high-intensity cases (in the current study, we only had seven incidents in which the highest aggression level was observed), for example, those with visible injuries or use of weapons. Future studies should consider the role of danger for various types of bystander intervention behavior.
In summary, we conclude that bystanders are more likely to intervene when danger is present than when danger is absent but that there is not enough evidence to conclude that the intensity of the danger makes their intervention either more or less likely.

Author Contributions
Marie Rosenkrantz Lindegaard developed the study concept, designed the study, collected the data from Cape Town and Amsterdam, developed the coding inventory, contributed to the data analysis, and wrote the first draft of the manuscript. Lasse Suonperä Liebst and Richard Philpot designed the study; developed the coding inventory; supervised the coding process; contributed to the data analysis; and reviewed, edited, and approved the final manuscript before submission. Mark Levine collected data from Lancaster and reviewed, edited, and approved the final manuscript before submission. Wim Bernasco designed the study; performed the data analysis; contributed substantially to the methods, results, and supplementary material sections; and reviewed, edited, and approved the final manuscript before submission.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.