Racial Disparities in Police Use of Deadly Force Against Unarmed Individuals Persist After Appropriately Benchmarking Shooting Data on Violent Crime Rates

Cesario et al. argue that benchmarking the relative counts of killings by police on relative crime rates, rather than relative population sizes, generates a measure of racial disparity in the use of lethal force that is unbiased by differential crime rates. Their publication, however, lacked any formal derivation showing that their benchmarking methodology has the statistical properties required to establish such a claim. We use the causal model of lethal force by police conditional on relative crime rates implicit in their analyses and prove that their benchmarking methodology does not, in general, remove the bias introduced by crime rate differences. Instead, it creates strong statistical biases that mask true racial disparities, especially in the killing of unarmed noncriminals by police. Reanalysis of their data using formally derived criminality-correcting benchmarks shows that there is strong and statistically reliable evidence of anti-Black racial disparities in the killing of unarmed Americans by police in 2015–2016.

Previous studies and data sets have shown that police in the United States kill Black citizens relative to White citizens at higher rates than would be expected under a generative model in which police encounter and kill Black and White citizens in proportion to their relative population sizes (e.g., Gabrielson et al., 2014; The Guardian, 2016; Takagi, 1981). The usefulness of these studies for identifying unjustifiable racial disparities in police behavior, however, has been called into question (Fryer, 2017; Selby et al., 2016; Tregle et al., 2019) because police primarily kill individuals, Black or White, who were armed and engaging in criminal activity at the time of the interaction (Ross, 2015; Selby et al., 2016). Underlying differences in race-specific rates of armed criminal activity, rather than (or in addition to) prejudice and/or unintended stereotype bias (Payne, 2006) by police, have therefore been cited as a possible causal driver of the elevated rates of police shootings of Black Americans. Nevertheless, anti-Black disparities in police use-of-force against unarmed individuals persist at both the nonlethal (Fryer, 2016) and lethal (Ross, 2015) level of force. Conditional on being killed by police, Black compared to White decedents are also less likely to have been armed (Charbonneau et al., 2017), so armed status alone does not appear to fully explain racial disparities in the use of force by police.
It is, however, challenging to disentangle unjustifiable racial disparities in police use-of-force from the disparities that might emerge from justifiable responses by police to differential levels of crime, especially because police use-of-force data are observational and frequently lack impartial contextualizing information. Several approaches have been used to control for this important covariate, including direct study of police-on-police shootings (Charbonneau et al., 2017), multivariate regression analyses (e.g., Hannon, 2020; Ross, 2015; Scott et al., 2017), use of multiple crime rate-"correcting" benchmarks (e.g., Cesario et al., 2019; Miller et al., 2017; Tregle et al., 2019), and the use of encounter-conditional analyses (e.g., Fryer, 2016; Johnson et al., 2019; Worrall et al., 2018). Statistical problems with encounter-conditional approaches have been addressed already with causal models (e.g., Ross et al., 2018); essentially, if a subset of the police population are biased in how frequently they engage in encounters with individuals as a function of race or ethnicity (for example, with a lower threshold of suspicion leading police to encounter Black individuals), then the results of encounter-conditional analyses will be confounded.
Formal theoretical analysis of the benchmarking methodology advanced by Cesario et al. (2019), however, has yet to be done. Cesario et al. argue that "benchmarking" the race-specific counts of killings by police on relative crime counts, rather than relative population sizes, generates a measure of racial disparity in the use of lethal force by police that is not statistically biased by differential crime rates. In their words, "if different groups are more or less likely to occupy those situations in which police might use deadly force, then a more appropriate benchmark as a means of testing for bias in officer decision making is the number of citizens within each race who occupy those situations during which police are likely to use deadly force" (p. 587). In other words, they aim to produce estimates of killing rates by police unique to the interaction of suspect race/ethnicity and criminal status and test for evidence of racial disparity holding constant the relative sizes of the criminal populations. Their publication, however, lacks any formal derivation showing that their benchmarking methodology has statistical properties consistent with their conceptual objectives.
In the following sections, we use the causal model of police use-of-force conditional on criminality implicit in Cesario et al. (2019) and attempt to derive an unbiased measure of police use-of-force using their benchmarking methodology. We first prove that their benchmarking methodology does not remove the bias introduced by crime rate differences but rather creates potentially stronger statistical biases that mask true racial disparities, especially in the killing of unarmed noncriminals by police. We then derive a benchmarking approach that does remove the effect of crime rate differences on estimates of racial disparity in killings by police and use this approach to reevaluate their empirical findings. Using these criminality-correcting benchmarks, we show that there is statistically reliable evidence of anti-Black racial disparities in the killing of unarmed, nonaggressing civilians by police in both 2015 and 2016.

A Causal Model of Police Shootings as a Function of Criminality
Following the implicit generative model in Cesario et al. (2019), we can theoretically investigate the role of differential crime rates on the apparent level of racial disparities in killings by police.
Assume we have a total population of $P_W$ White individuals and $P_B$ Black individuals. Assume further that, with probabilities $a_W$ and $a_B$, individuals in each respective subpopulation acquire weapons and engage in violent criminal behavior. The population will then be composed of a fraction of armed criminals, $C_W$ and $C_B$, and unarmed noncriminals, $N_W = P_W - C_W$ and $N_B = P_B - C_B$.
Over some interval of time, each person in the population has a probability of being encountered and killed by police. That probability varies by both race and criminal status. The parameter $f$ indicates the probability of police killing an armed criminal, and the parameter $y$ indicates the probability of police killing an unarmed noncriminal. Thus, in each race/ethnic group, there will be $A$ shootings of armed criminals and $U$ shootings of unarmed noncriminals.
For any realization from this generative model, anti-Black racial disparity in the overall probability of being killed by police through either causal path is present if the following inequality (Equation 9) is satisfied: $(A_B + U_B)/P_B > (A_W + U_W)/P_W$. We can investigate this expression in more detail by taking the expectations of each side. Because the expected counts in each group are $a f P$ armed criminals killed and $(1 - a) y P$ unarmed noncriminals killed, the comparison of the two sides is summarized by the ratio $\frac{a_B f_B + (1 - a_B)\, y_B}{a_W f_W + (1 - a_W)\, y_W}$ (Equation 10e), which exceeds 1 under anti-Black disparity. This expression reduces to the ratio of the probabilities of being killed by police, for Black individuals relative to White individuals, over both causal paths (i.e., engaging in criminal activity and being killed, or not engaging in criminal activity and being killed); this fact is reflected in the numerator and denominator of Equation 10e being convex combinations of the killing probability parameters for criminals, $f$, and noncriminals, $y$, where the mixing simplex is determined by the crime rate, $a$.
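One way to write this generative model down explicitly is sketched below. The binomial sampling statements are an assumption consistent with the verbal description (independent individuals, each with a fixed probability) rather than a transcription of the original displayed equations. The symbols $a$, $f$, $y$, $P$, $C$, $N$, $A$, and $U$ follow the text, and the index $r \in \{W, B\}$ is shorthand introduced here.

```latex
\begin{align*}
C_r &\sim \mathrm{Binomial}(P_r, a_r), & N_r &= P_r - C_r, \\
A_r &\sim \mathrm{Binomial}(C_r, f_r), & U_r &\sim \mathrm{Binomial}(N_r, y_r), \\
E[A_r] &= a_r f_r P_r, & E[U_r] &= (1 - a_r)\, y_r P_r, \\
E\!\left[\frac{A_r + U_r}{P_r}\right] &= a_r f_r + (1 - a_r)\, y_r, & r &\in \{W, B\}.
\end{align*}
```

Under these assumptions, the expected per capita killing probability in each group is a weighted average of $f_r$ and $y_r$, with weights $a_r$ and $1 - a_r$ that sum to 1.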
For addressing some research questions, this measure can be useful, as it shows the overall level of racial disparity in police use-of-force as it would be experienced by the Black and White communities, including through causal paths like differing levels of poverty and marginalization that might lead to differing levels of criminality. However, for evaluating the behavior of police, this measure is clearly problematic.
Differential crime rates can easily confound the measure given in Equations 10a-10f, severely limiting its usefulness, which is a key observation of Cesario et al. (2019). For example, assume, for illustrative purposes, that $f_W = f_B$ and $y_W = y_B$ and that $f$ (the probability of police killing a criminal) is higher than $y$ (the probability of police killing a noncriminal). In such a case, if $a_B$ were greater than $a_W$, then the measure given in Equations 10a-10f would indicate the existence of anti-Black racial disparity in killings by police even if officers treated Black and White individuals, be they criminals or noncriminals, exactly the same. As such, Equations 10a-10f cannot be used to evaluate the appropriateness of policing behavior.
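The confounding described above can be checked numerically. The sketch below evaluates the expected population-benchmarked ratio under identical police behavior toward both groups; all parameter values are hypothetical and chosen only for illustration, and the function name is ours.

```python
def population_benchmarked_ratio(a_b, a_w, f_b, f_w, y_b, y_w):
    """Expected Black/White ratio of per capita killings over both causal paths.

    a_*: crime rates; f_*: killing probability for armed criminals;
    y_*: killing probability for unarmed noncriminals.
    """
    black = a_b * f_b + (1 - a_b) * y_b
    white = a_w * f_w + (1 - a_w) * y_w
    return black / white


# Hypothetical values: police treat both groups identically (same f, same y),
# f is much larger than y, and only the crime rates differ.
f = 1e-3
y = 1e-6
confounded = population_benchmarked_ratio(a_b=0.02, a_w=0.01,
                                          f_b=f, f_w=f, y_b=y, y_w=y)
equal_rates = population_benchmarked_ratio(a_b=0.01, a_w=0.01,
                                           f_b=f, f_w=f, y_b=y, y_w=y)
print(confounded, equal_rates)
```

With equal crime rates the ratio is 1; with unequal crime rates it moves well above 1 even though $f$ and $y$ do not differ by race.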
A Bias Correction?
Cesario et al. (2019), reacting to studies using measures like that presented in Equation 10a to infer racial disparities, argue that the benchmarking of killing rates by police should be done using the count of criminals, $aP$, not population size, $P$, per se. In other words, they argue that the metric of interest from Equation 9 should be written instead as $(A_B + U_B)/(a_B P_B) > (A_W + U_W)/(a_W P_W)$ (Equation 11) in order to adjust for the effects of crime rate variation on apparent racial disparities in killings by police. However, there are problems with this approach. Specifically, the bias-correcting benchmark used by Cesario et al. (2019) does not actually yield unbiased estimates except in unrealistic edge cases (discussed below). Evaluating the expectations of Equation 11 yields the ratio $\frac{f_B + \frac{1 - a_B}{a_B}\, y_B}{f_W + \frac{1 - a_W}{a_W}\, y_W}$ (Equation 12c). Equations 12a-12d do not, in general, yield an unbiased estimate of either $f_B/f_W$ or $y_B/y_W$ and no longer reduce to the ratio of the probabilities of being killed by police for Black relative to White individuals over both causal paths. The numerator and denominator of Equation 12c are also no longer convex combinations of the killing probability parameters, making interpretation difficult. Equation 12c can yield an unbiased estimate of $f_B/f_W$ only in unrealistic edge cases in which police never kill unarmed individuals of either race/ethnic group (i.e., $y_B, y_W = 0$) and/or when the population is composed purely of criminals (i.e., $a_B, a_W = 1$).
The validity of the Cesario et al. (2019) benchmarking methodology depends on the strong assumption that police never kill innocent, unarmed people of either race/ethnic group. While it is true that deadly force is primarily used against armed criminals who pose a threat to police and innocent bystanders (e.g., Binder & Fridell, 1984; Binder & Scharf, 1980; Nix et al., 2017; Ross, 2015; Selby et al., 2016; White, 2006), it is also the case that unarmed individuals are killed by police at rates that reflect racial disparities. Ross (2015) and Charbonneau et al. (2017), for example, show that conditional on being shot by police, a White suspect is more likely to be armed than is a Black suspect. Even unarmed noncriminals face the risk of being killed by police, and so, the relative population sizes of noncriminals cannot simply be ignored when assessing racial disparities in killings by police.
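The expectation step can be spelled out for a single group. The sketch below treats $a_r$ and $P_r$ as fixed and uses the expected counts $a_r f_r P_r$ (armed criminals killed) and $(1 - a_r)\, y_r P_r$ (unarmed noncriminals killed) implied by the generative model; the index $r \in \{W, B\}$ is shorthand introduced here.

```latex
\begin{align*}
E\!\left[\frac{A_r + U_r}{a_r P_r}\right]
  &= \frac{a_r f_r P_r + (1 - a_r)\, y_r P_r}{a_r P_r}
   = f_r + \frac{1 - a_r}{a_r}\, y_r .
\end{align*}
```

The weights on $f_r$ and $y_r$ are now $1$ and $(1 - a_r)/a_r$, which do not sum to 1, and the second term vanishes only if $y_r = 0$ or $a_r = 1$. For a hypothetical crime rate of $a_r = 0.01$, the weight on $y_r$ is 99, so even a small killing probability for noncriminals can dominate the benchmarked quantity.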

Bias Corrections
As shown above, when analyzing data pooled from armed and unarmed suspects, it is hard to generate an estimate of racial disparity in police use-of-force that is not confounded by relative crime rates. However, if we assume that those individuals who were armed when killed by police come from the criminal subpopulation and those individuals who were unarmed come from the noncriminal subpopulation, then we can derive unbiased measures by using crime rate data as a benchmark. This property is true by assumption in our causal model, but it is unlikely to exactly hold in empirical data. Nevertheless, there is likely to be a strong correlation, whereby a citizen who is armed and killed by police is more likely to be a member of the violent criminal subpopulation than the noncriminal population.
Criminals. If we condition on armed status (and consider only those incidents where armed individuals are killed by police) and then benchmark on race-specific population sizes, we get the measure $(A_B/P_B)/(A_W/P_W)$, which, taking expectations of the numerator and denominator, equals $\frac{a_B f_B}{a_W f_W}$ (Equation 13c). The unbiased relative probability of police killing a violent criminal is given by $f_B/f_W$. So, the above measure is indeed biased exactly by $a_B/a_W$. Thus, applying the relative crime rate benchmark of Cesario et al. (2019), that is, multiplying the left-hand side of Equation 13c by $a_W/a_B$, will yield an unbiased measure of $f_B/f_W$. For this correction to hold, however, we must first condition on the status of suspects as armed, before estimating the parameters of interest.
Noncriminals. On the other hand, if we consider only those incidents where unarmed individuals are killed by police and benchmark on race-specific population sizes, we get the measure $(U_B/P_B)/(U_W/P_W)$, which, taking expectations of the numerator and denominator, equals $\frac{(1 - a_B)\, y_B}{(1 - a_W)\, y_W}$ (Equation 14c). The relative probability of police killing an unarmed noncriminal is given by $y_B/y_W$. So, the above measure is biased by $(1 - a_B)/(1 - a_W)$, and multiplying it by $(1 - a_W)/(1 - a_B)$ yields an unbiased measure of $y_B/y_W$.
Our empirical analysis follows Cesario et al. (2019), but we break down the analysis by the status of the suspect as armed or unarmed and then apply the correct crime rate adjustment benchmark to each subpopulation. In this way, we replicate the premise of Cesario et al. (2019), namely that population-level estimates of the relative risk of being killed by police can be confounded by differential rates of criminality, but correct statistical shortcomings in their methodology.
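The two corrections can be written as small point-estimate functions. This is a minimal sketch under the model's assumption that armed decedents come from the criminal subpopulation and unarmed decedents from the noncriminal subpopulation; the function names and all numeric values are hypothetical, and the counts fed in are expected counts rather than real data.

```python
def armed_relative_risk(armed_b, armed_w, pop_b, pop_w, a_b, a_w):
    """Estimate f_B / f_W: population-benchmarked ratio times a_W / a_B."""
    population_ratio = (armed_b / pop_b) / (armed_w / pop_w)
    return population_ratio * (a_w / a_b)


def unarmed_relative_risk(unarmed_b, unarmed_w, pop_b, pop_w, a_b, a_w):
    """Estimate y_B / y_W: population-benchmarked ratio times (1 - a_W) / (1 - a_B)."""
    population_ratio = (unarmed_b / pop_b) / (unarmed_w / pop_w)
    return population_ratio * ((1 - a_w) / (1 - a_b))


# Hypothetical parameters used to generate expected counts.
pop_b, pop_w = 40e6, 200e6
a_b, a_w = 0.03, 0.01
f_b, f_w = 1e-3, 1e-3      # equal treatment of armed criminals
y_b, y_w = 2e-7, 1e-7      # unarmed Black noncriminals at twice the risk

# Expected counts implied by the generative model.
armed_b, armed_w = a_b * f_b * pop_b, a_w * f_w * pop_w
unarmed_b, unarmed_w = (1 - a_b) * y_b * pop_b, (1 - a_w) * y_w * pop_w

est_f = armed_relative_risk(armed_b, armed_w, pop_b, pop_w, a_b, a_w)
est_y = unarmed_relative_risk(unarmed_b, unarmed_w, pop_b, pop_w, a_b, a_w)
print(est_f, est_y)
```

When fed expected counts, the two estimators return the ratios $f_B/f_W$ and $y_B/y_W$ that were used to generate them, which is the sense in which the corrections remove the crime rate term.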

Data Sources
Data on the killing of civilians by police are taken from "The Counted," an online database managed by The Guardian (2016). As Cesario et al. (2019) state, this database is more complete than official federal databases, as police departments underreport to the federal government (Feldman et al., 2017; Klinger et al., 2016; Nix et al., 2017; White, 2006). We use data directly from The Guardian (2016) in our workflow, along with the "unarmed and nonaggressing" data as presented in Cesario et al. (2019). Crime rate data are drawn from two sources, (1) the NIBRS and (2) the NCVS (Bureau of Justice Statistics, 2016), as these two data sources allow for extrapolation to population-level counts. As described in Cesario et al. (2019), the NIBRS is a federal database of incidents submitted by law enforcement to the FBI (but compliance is variable and may be nonrandom), and the NCVS is a nationally representative self-report survey of criminal victimization.
The NIBRS data used herein were taken directly from Cesario et al. (2019), and the NCVS data used herein were downloaded from their official source and processed according to the methods described in Cesario et al. (2019). We used the original source data in order to take advantage of the sampling frame and weighting variables provided therein.

Statistical Modeling
We use the generative stochastic models described previously as Bayesian statistical models (McElreath, 2018) coded in Stan (Carpenter et al., 2017; Stan Development Team, 2018b) and fit using R (R Core Team, 2018) and rstan (Stan Development Team, 2018a). Our complete workflow will be maintained on GitHub at https://github.com/ctross/disparitiesandbenchmarks.
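To convey the general shape of such an analysis without reproducing the Stan models, the sketch below swaps in a much simpler technique: independent binomial likelihoods with flat Beta(1, 1) priors, sampled directly with the Python standard library. It is not the authors' model. The counts, population sizes, crime rates, and function names are all hypothetical, and the benchmark is treated as a fixed constant rather than an estimated quantity.

```python
import math
import random


def log_relative_risk_draws(k_b, n_b, k_w, n_w, benchmark=1.0,
                            draws=4000, seed=1):
    """Posterior draws of log(benchmark * p_B / p_W) under Beta(1, 1) priors.

    k_*: number killed; n_*: population size; benchmark: fixed correction
    factor such as (1 - a_W) / (1 - a_B) for unarmed decedents.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        p_b = rng.betavariate(1 + k_b, 1 + n_b - k_b)
        p_w = rng.betavariate(1 + k_w, 1 + n_w - k_w)
        out.append(math.log(benchmark * p_b / p_w))
    return sorted(out)


def central_interval(sorted_draws, mass=0.90):
    """Central credible interval read off sorted posterior draws."""
    n = len(sorted_draws)
    lo = sorted_draws[int((1 - mass) / 2 * n)]
    hi = sorted_draws[int((1 + mass) / 2 * n) - 1]
    return lo, hi


# Hypothetical counts of unarmed decedents and hypothetical crime rates.
a_b, a_w = 0.03, 0.01
posterior = log_relative_risk_draws(
    k_b=30, n_b=40_000_000, k_w=50, n_w=200_000_000,
    benchmark=(1 - a_w) / (1 - a_b))
lo, hi = central_interval(posterior)
print(lo, hi)
```

On the log scale, an interval lying entirely above 0 would indicate anti-Black disparity and one lying entirely below 0 would indicate anti-White disparity.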

Analysis of Data From "The Counted"
Population-Level Relative Risk
We first investigate the relative risk of being the victim of a police killing using standard population size benchmarks. Figure 1 plots population-level relative risk of being the armed or unarmed victim of a police killing using data from "The Counted" (The Guardian, 2016); estimates are subdivided by year (2015 and 2016) and by encounter type. Encounter type categories include the following: (1) "All killings," which refers to all deaths by police, whether by shooting, death in custody, or other means; (2) "By gunshot," which refers to deaths caused by police gunfire; (3) "Not aggressing," which refers to deaths caused by police gunfire against unarmed and nonaggressing civilians (nonaggressing coding was done by Cesario et al., 2019); and finally (4) "Holding firearm," which refers to deaths caused by police gunfire against civilians who were themselves armed with a firearm.
Across all years, encounter types, and armed status categories, police are more likely to kill Black citizens than White citizens. However, these estimates will be affected to an unknown degree by relative crime rates (Equation 10e); to remove the effect of differential crime rates, we now apply the corrections derived in Equations 13c and 14c.

Crime-Benchmarked Relative Risk, Armed Suspects
For the case of armed individuals killed by police, we apply the benchmark derived in Equation 13c to the previous estimates of racial disparities in police killings and recover an unbiased estimate of $f_B/f_W$, the relative risk of police engaging in what are normally classified as justifiable killings of armed criminals. Figure 2 plots these estimates. As expected for this subset of police killings, we recover a principal finding of Cesario et al. (2019): Racial disparities in the killing of armed suspects by police are proportional to the relative rates of violent criminality.
Our results highlight disparities between NCVS and NIBRS data sources. NCVS data suggest almost perfect proportionality between relative violent crime rates and relative police killing rates of armed suspects, whereas data from the NIBRS suggest that White suspects are killed by police at greater rates than expected relative to their violent crime rates. This might be a true empirical pattern, or it might reflect racial disparity in the reporting of NIBRS data. Cesario et al. (2019) argued that NIBRS data were unlikely to be affected by such biases and that "NCVS [data] are uncontaminated by police bias, yet . . . yield results consistent with the . . . NIBRS data" (p. 588). We, however, find it important to acknowledge the possibility that NIBRS data are biased by reporting. We contrast the relative crime rates calculated using the NCVS and the NIBRS data sets in Figure 3. Here, we find that across almost all years and crime classifications, NIBRS data show greater racial differences in crime rates than NCVS data. NCVS data are based on a randomized sampling design, making them less likely to be biased by differential policing intensity and reporting. Benchmarking approaches based on NIBRS data may underestimate the extent of anti-Black disparities in police use-of-force.

Crime-Benchmarked Relative Risk, Unarmed Suspects
If we apply the benchmark derived in Equation 14c to the previous estimates of racial disparities in the killing of unarmed individuals by police, we can recover an unbiased estimate of $y_B/y_W$, the relative risk of police killing unarmed individuals. Figure 4 shows that across all crime benchmarks in all years, there is substantial evidence of anti-Black racial disparities in the killing of unarmed noncriminals by police. Here, we fail to recover the principal findings of Cesario et al. (2019). Racial disparities in the killing of unarmed citizens by police do not occur in proportion to the relative rates of noncriminality; unarmed and nonaggressing Black individuals are killed in greater numbers than would be expected given the relative populations of noncriminals.

Discussion
Writing almost a decade ago about policing, Goff and Kahn (2012) lament that it would be "shocking to think that there remained uncertainty about how to tell whether or not racial bias troubled one of our most important institutions" (pp. 177-178). They went on to address both the dearth of nationally representative data on police use-of-force and the lack of methodological paradigms for causal inference about the drivers of racial disparities in extant data. Progress is being made to address the data concerns (Garner et al., 2018; Goff et al., 2016). However, important issues concerning statistical methodology remain largely unaddressed.
We have presented a theoretical model in which we have a direct causal understanding of the generative mechanism of police killings as a function of race and criminality; that is, we explicitly define the probabilities for use-of-force against criminals and noncriminals as a function of race. We have shown that application of the statistical methodology advocated by Cesario et al. (2019) to data generated under this causal model can incorrectly suggest anti-White racial disparities in police use of lethal force, even when there is strong anti-Black bias hard-coded into the model. The statistical bias introduced by their methodology is conceptually, but not empirically, subtle. It can mislead not only about the magnitude but even the direction of effects, as we have shown both algebraically and through empirical analysis of the same data sets as used in their paper.
Although the principal empirical findings of Cesario et al. (2019) concerning racial disparities in the killing of unarmed citizens by police do not hold, we acknowledge that they do hold for the case of armed criminals killed by police. Our methods, however, can cover both cases reliably.
Figure 2. Analysis of data on killings of armed suspects by police from "The Counted," 2015 (blue) and 2016 (orange), using crime rate benchmarking. Note. Density curves show the natural log of the posterior distributions of the relative probability (for Black individuals relative to White individuals) of being killed by police. Values greater than 0 indicate anti-Black racial disparity, and values less than 0 indicate anti-White racial disparity. Central 90% credible intervals are represented by darker vertical bars on the distributions. Panel positions without curves indicate categories not considered in our analysis. Across crime rate benchmarks using National Crime Victimization Survey data, we find that there is no evidence of anti-Black racial disparity in police killings of armed individuals after differential crime rates have been accounted for. National Incident-Based Reporting System (NIBRS) crime rate data actually suggest anti-White racial disparities, but see issues with the NIBRS data in Figure 3.

The Importance of Benchmarks
Empirical findings aside, Cesario et al. (2019) argue convincingly that to understand racial disparities in killings by police, we have to engage in proper benchmarking, such that we compare the relative rates of police use-of-force against criminals, $f_B/f_W$, and noncriminals, $y_B/y_W$, independently. The push to take benchmarking seriously is important for two main reasons: (1) It is important not to confound racial disparities that might arise from justifiable shootings of armed and dangerous criminals by police with racial disparities in the killing of innocent, unarmed civilians; and (2) encounter-conditional approaches that appear to mitigate the need for benchmarks are themselves typically confounded by differential encounter rates and contexts.
With respect to the first issue, researchers interested in the topic of racial disparities in police use-of-force have accounted for status as unarmed (e.g., Ross, 2015) prior to publishing relative risk estimates, meaning that the statistical bias factor that is present if one does not also benchmark on crime rates is only on the order of $(1 - a_B)/(1 - a_W) \approx 1$, but some presentations of data, for example, from The Guardian (2016), have not. The formal derivations provided here validate the benchmarking methods developed in Cesario et al. (2019) when applied to the relative counts of police killings of armed suspects. As in their initial analysis, we find reliable evidence that lethal force by police occurs in direct proportion to race-specific rates of violent crime. However, our formal derivations show that the benchmarking methods developed in Cesario et al. (2019) are misleading and statistically biased when applied to the relative counts of police killings of unarmed suspects. Likewise, their methods will lead to confounding when outcome data are not, or cannot be, classified into criminality categories. Our reanalysis of their data using an appropriate crime rate-correcting benchmark reveals strong and statistically reliable anti-Black racial disparity in police killings of unarmed civilians.
Second, it is important to take benchmarking seriously because, contra Johnson et al. (2019), population considerations cannot be sidestepped when estimating racial disparities in police use-of-force (a concise proof has been given elsewhere). A similar proof is presented in Ross et al. (2018), who use a generative stochastic model to show that the overall racial disparity in police use-of-force can be decomposed into the product of two terms: racial disparity in use-of-force conditional on encounter and racial disparity in the frequency of encounters. So even if encounter-conditional approaches (e.g., Fryer, 2016; Johnson et al., 2019; Worrall et al., 2018) suggest no evidence of racial disparity in the use of lethal or less-than-lethal force by police conditional on encounter, the overall per capita morbidity and mortality from police use-of-force can be higher in the Black population if the Black population is subjected to higher encounter rates with police. Both recent and decade-old data show that Black individuals are more likely to be stopped by police than White individuals (Fryer, 2016; Gelman et al., 2007; Miller et al., 2017; U.S. Department of Justice, 2016), even after a variety of statistical controls have been applied. Moreover, causal inference approaches (e.g., Ross et al., 2018) show that if police are biased in who they encounter, then encounter-conditional approaches can be severely confounded.
For example, consider a thought experiment in which police behavior is heterogeneous, with most officers following standard protocols and a small subset of officers engaging in additional unwarranted use of nonlethal force (like tasers) against Black individuals. Then, analysis of pooled encounter-conditional data would suggest that Black individuals are less likely to be shot rather than tased, as compared with White individuals. In other words, elevated levels of sublethal assault against innocent Black individuals by a subset of police would have the effect of diminishing the apparent severity of anti-Black racial disparities in lethal force conditional on encounter (Ross et al., 2018). Racial disparities in the frequency of taser use are consistent with such an explanation (Fryer, 2016). For this reason, anti-Black encounter bias is a confounding factor in recent encounter-conditional studies finding anti-White racial disparities (e.g., Fryer, 2016; Johnson et al., 2019; Worrall et al., 2018). For encounter-conditional analyses to be convincing, researchers would have to rule out the presence of racial bias in encounter probability.
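The product decomposition mentioned above can be written out as follows. This is a sketch in notation introduced here (it is not the notation of Ross et al., 2018), and it assumes that force can only be used during an encounter.

```latex
\begin{align*}
\Pr(\text{force} \mid r) &= \Pr(\text{force} \mid \text{encounter}, r)\,
                            \Pr(\text{encounter} \mid r), \qquad r \in \{W, B\}, \\
\frac{\Pr(\text{force} \mid B)}{\Pr(\text{force} \mid W)}
  &= \frac{\Pr(\text{force} \mid \text{encounter}, B)}{\Pr(\text{force} \mid \text{encounter}, W)}
     \times
     \frac{\Pr(\text{encounter} \mid B)}{\Pr(\text{encounter} \mid W)} .
\end{align*}
```

If the first factor equals 1 but the second exceeds 1, the overall per capita ratio still exceeds 1, which is why an encounter-conditional null result does not rule out a population-level disparity.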
Successful interventions to mitigate racial disparities in police use-of-force require that we reliably identify the drivers of such disparities. Methods leading to accurate causal inference are essential. An important step in validating our statistical tools involves applying them to simulated data sets, generated under a process that we understand explicitly because we coded it. We should be asking ourselves: "If we knew the generative process of the data perfectly, would our statistical approach allow us to correctly detect real disparities?" In this article, we have used a simple generative model in just this way. We show that if the data generating process were such that unarmed, Black, noncriminals were more than twice as likely to be killed than unarmed, White, noncriminals (and if crime rates in our model were as we find empirically), then the statistical methodology of Cesario et al. (2019) would erroneously suggest anti-White racial disparities in the killing of unarmed noncriminals. This calls into question the validity of such methods.
As we move forward in studies of policing, new forms of data, like evidence from officer-worn cameras (Broussard et al., 2018), are becoming available. Such data raise hopes for accountability and detection of discriminatory violations of individual rights (Scheindlin, 2010) and even study of racial bias in respectfulness of language use and escalation or de-escalation of encounters (Voigt et al., 2017). There is also concern over how such data will be managed and protected from misuse (Ringrose, 2019). These new forms of data may nevertheless be able to help resolve conflicting reports about the existence of racial disparities in police behavior by recording possible disparities in both encounters and use-of-force conditional on encounter at the officer level. While such data can be powerful, they can only be appreciated in light of statistical models that must themselves be validated on inferential targets.
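The validation exercise described above can be imitated with expected values in place of simulated draws. In the sketch below, the crime rates and killing probabilities are hypothetical stand-ins (the empirically estimated crime rates are not reproduced here), and the function name is ours. It asks what ratio results when unarmed killings are benchmarked on criminal counts, $aP$, as the crime rate benchmark would do.

```python
def crime_benchmarked_unarmed_ratio(a_b, a_w, y_b, y_w):
    """Expected Black/White ratio when unarmed killings are divided by a * P.

    Expected unarmed killings per group are (1 - a) * y * P, so dividing by
    a * P leaves (1 - a) * y / a for each group.
    """
    black = (1 - a_b) * y_b / a_b
    white = (1 - a_w) * y_w / a_w
    return black / white


# Hypothetical inputs: Black noncriminals are 2.5 times as likely to be killed,
# and the Black crime rate is three times the White crime rate.
a_b, a_w = 0.03, 0.01
y_b, y_w = 2.5e-7, 1.0e-7

true_ratio = y_b / y_w
apparent_ratio = crime_benchmarked_unarmed_ratio(a_b, a_w, y_b, y_w)
print(true_ratio, apparent_ratio)
```

Under these hypothetical inputs the apparent ratio falls below 1 even though the true ratio is 2.5, so the direction of the disparity is reversed rather than merely attenuated.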