Seeing the forest and the trees: Examining the impact of aggregate measures of recidivism on meta-analytic conclusions of intervention effects

Recidivism is a multidimensional construct that is operationalized in a variety of ways. We explored the impact of using aggregated measures of recidivism (i.e. multiple measures combined) versus disaggregated measures (i.e. defined specifically as parole violation, arrest, conviction, incarceration) in meta-analytic analyses of correctional intervention effectiveness. Using a sample of 20 meta-analyses, we compared within-study findings between aggregated and disaggregated measures. Over half (60%) of the studies differed with respect to the statistical significance of their aggregated versus disaggregated findings, suggesting that aggregated measures of recidivism may give an incomplete picture of treatment effectiveness. Disaggregating measures of recidivism in meta-analysis is recommended for a comprehensive assessment of the impacts of intervention approaches. Policy implications are discussed.


Introduction
Recidivism is one of the most fundamental concepts in criminal justice research and is a widely used performance measure to determine intervention effectiveness (Blumstein and Larson, 1971; King and Elderbroom, 2014). Broadly, recidivism is defined as "reengaging in criminal behavior after receiving a sanction or intervention" (King and Elderbroom, 2014: 2). While on the surface this definition appears simple and pragmatic, the term "recidivism" has been described as "vexingly complicated" (Weisberg, 2014: 799) and a "fruit salad concept" (Beck, 2001: 11) that constitutes conceptually different measures of offending-related outcomes (e.g. revocation, arrest, conviction, incarceration). In practice, application of the term "recidivism" has been referred to as "a conceptual and operational conundrum" (Shoeman, 2010: 80) that suffers from definitional confusion due to inconsistency in how the term is operationalized and used by researchers (Blumstein and Larson, 1971; Rorie et al., 2018).
The definitional ambiguity associated with the term recidivism (or "reoffending") is particularly problematic with respect to quantitative synthesis. Meta-analysis is frequently used to determine the overall effectiveness of correctional interventions at reducing criminal behavior, and it is common practice for authors to synthesize various measures of recidivism in their pooled analyses (see Table 1 in Lipsey, 2019). Crucially, in meta-analysis, the concept of "mixing apples and oranges" refers to the importance of grouping studies and outcomes that are conceptually comparable. As described by Lipsey and Wilson (2001), "because meta-analysis focuses on the aggregation and comparison of the findings of different research studies, it is necessary that those findings be of a sort that can be meaningfully compared" (p. 2). Notably, some research suggests that when pooled recidivism outcomes in meta-analyses are restricted to more narrow definitions (such as rearrest, reconviction, etc.), results point to intervention effectiveness for some outcomes but not others (e.g. Bouchard and Wong, 2018a, 2018b; Kettrey et al., 2019; Wright et al., 2020). In addition, although often used interchangeably in criminal justice literature, the terms recidivism and reoffending are not synonymous in all contexts (e.g. legal), which may create further confusion with respect to operationalization.
In addition to the theoretical importance of conceptual similarity, there are practical implications of conceptual ambiguity with respect to the evaluation of correctional interventions. Specifically, the results of meta-analyses are commonly used by decisionmakers in governmental and nongovernmental organizations to inform evidence-based policy and practice. If stakeholders rely on meta-analyses to infer best practices in offender rehabilitation, it is imperative that meta-analytic findings present comprehensive conclusions and insightful interpretation about intervention impacts. For instance, some literature suggests that assessments of recidivism/reoffending can produce different findings depending on the definition of recidivism, type(s) of crime, severity of crime, organizational factors/practices, follow-up period, and so forth (Maltz, 2001; Ruggaro et al., 2015), all of which can affect the interpretation of intervention effectiveness. If correctional strategies are accepted/rejected based on the interpretation of meta-analytic findings without consideration of how recidivism was operationalized, or without considering the organizational factors/practices that may influence recidivism data, conclusions about "what works" for offender rehabilitation are likely to be imprecise.
In the current study, we explore the impact of aggregate measures of recidivism on meta-analytic conclusions of intervention effects.

Overview of meta-analysis
Meta-analysis is a form of quantitative research synthesis in which systematic methods are used to identify a population of relevant studies, and explicit techniques are used to extract data and statistically pool research findings to produce an average estimated treatment effect.
The advantages of meta-analysis include increased objectivity in the selection of studies through a set of a priori-defined inclusion criteria (Thompson and Belur, 2016), improved precision in estimating the treatment effect through the conversion of study results into a common effect size metric (Wilson, 2001), ability to incorporate the direction and size of each study's treatment impact (Wells, 2009), capacity to examine potential causes for variations in treatment effect magnitude (such as the impact of treatment or measurement; Williams et al., 2017), and overall methodological transparency which limits hidden biases and enables replication (Siddaway et al., 2019).
Criticisms of meta-analysis include the large number of steps in the methodological process that involve subjective decision-making (Ioannidis, 2016), and potential validity concerns with respect to the study selection procedures and data preparation for effect size estimation that vary across meta-analyses (e.g. Lakens et al., 2017). One of the key assumptions for meta-analysis is conceptual similarity of the constructs and relationships being pooled. Yet,
the definition of what study findings are conceptually comparable for purposes of meta-analysis is often fixed only in the eye of the beholder. Findings that appear categorically different to one analyst may seem similar to another. . . . It is essential that the analyst have a definition for the domain of interest and a rationale for the inclusion and exclusion of studies from the meta-analysis. (Lipsey and Wilson, 2001: 3)
It is the "eye of the beholder" factor that leads to meta-analysts making different choices concerning the pooling of recidivism measures across studies.

Operational variation of "recidivism"
Scholastic debate pertaining to ambiguity of the term "recidivism" is not new; many criminologists have lamented the absence of a consistent definition of recidivism (e.g. Maltz, 2001; Shoeman, 2010; Weisberg, 2014). Central to the debate of definitional ambiguity is the multidimensional nature of recidivism. In criminology, the term recidivism generally refers to falling back/relapsing into criminal behavior following sanction/punishment or intervention. More specifically, the National Institute of Justice (2021) defines recidivism as "criminal acts that resulted in rearrest, reconviction or return to prison with or without a new sentence during a three-year period following the prisoner's release." Put simply, according to this definition, if Offender 1 is rearrested for an offense, Offender 2 is reconvicted, and Offender 3 is reincarcerated, they are all considered to have "recidivated," regardless of the type of crime, severity of crime, or length of survival period (Maltz, 2001). Critics argue that with such diversity in the conceptualization of recidivism, it does not seem reasonable to expect a single aggregate measure of criminal behavior to explain the nuance of "recidivism" with precision (Maltz, 2001).
A second concern about the conceptualization of recidivism is the lack of systematic operationalization and measurement of recidivism outcomes (Shoeman, 2010; Weisberg, 2014). Maltz (2001) highlights the consequences of fluctuating operationalization and measurements of "recidivism" by explaining that
One program may use a follow-up time of 1 year, another of 6 months. Follow-up time may be computed starting with release from prison or with release from parole. The recidivating event may be a technical violation of the conditions of parole, or it may be a return to prison. There are so many possible variations in the method of computing recidivism that one doubts if more than a handful of the hundreds of correctional evaluations are truly comparable. (p. 22)
Furthermore, Maltz (2001) argues that as a result of the conceptual and operational ambiguity of the term, stakeholders can choose to define "recidivism" provisionally depending on their particular purpose, context, or research question. We contend that while differential operationalization of recidivism is not in and of itself a cause for concern as long as the definition is clearly stated, it becomes potentially problematic if differential operationalization is entrenched with underlying conceptual variability.
A third problem with "recidivism" as a measure of intervention effectiveness is that all recidivism events are often weighted equivalently; in other words, an offender is determined to have recidivated regardless of the type of crime, level of severity, or time to failure. As such, recidivism is treated as a dichotomous outcome (fail vs did not fail), ignoring the nuanced complexity of the term (Maltz, 2001). Sechrest et al. (1979) argue that "a great deal of information is lost when something as complex as possible criminal activity that may or may not culminate in detection, arrest, and conviction is finally expressed as a simple dichotomy" (p. 71). Acknowledging and understanding the complexity and nuances of recidivism is essential in order to properly evaluate the effectiveness of correctional interventions and draw valid inferences with which to inform policy and practice.

The effect of operational variation in recidivism research
Although conversation surrounding definitional ambiguity of recidivism is not new, study of the impact of definitional ambiguity in criminological research is sparse. Existing research has investigated this issue in more focused types of recidivism such as corporate crime and inmate misconduct. For example, Rorie et al. (2018) examined the use of meta-analysis in corporate crime deterrence research, including the impact of varied definitions of corporate crime on the type of research design employed. The authors demonstrated that the magnitude of deterrence strategy effect varied depending on the study's conceptualization of corporate crime as "narrow" versus "broad" (in brief, defined as a focus on environmental crimes vs a focus on a wider variety of types of crime). Furthermore, results demonstrated that the magnitude of effect was statistically different for financial crimes and environmental crimes, suggesting that the way in which corporate crime is operationalized can impact the conclusions drawn with respect to the effectiveness of deterrence strategies.
Galvin (2020) also explored the consequences of definitional choice and the operationalization of white-collar crime. Comparative analyses demonstrated that regression models produced substantially different conclusions regarding the relationship between white-collar status and incarceration. In particular, while models that used narrow approaches to define white-collar crime ("Patrician" (i.e. financial offenses such as embezzlement) versus "Populist" (i.e. regulatory/public order offenses)) indicated a statistically significant relationship between white-collar status and sentencing, models that used a "Hybrid" approach to operationalize white-collar crime (i.e. prosecution for any offenses requiring power, privilege, and/or specialized access) indicated a non-significant relationship between white-collar status and sentencing. Furthermore, the opposite was true with respect to sentencing length; the hybrid model suggested a statistically significant relationship between white-collar status and length of sentence, while models that used a narrow operationalization of white-collar crime did not find a significant relationship. Galvin (2020) concluded that there exists underlying conceptual variation with respect to different definitions of white-collar crime and cautioned that the use of different definitions in research is not inconsequential.
In another study, Steiner and Wooldredge (2009) used U.S. Census Bureau data on men incarcerated in state facilities to examine whether different measures of inmate misconduct produce substantively different conclusions with respect to inmates' odds of committing rule infractions while in prison. The findings suggest that models predicting rule violations using narrow operationalizations of inmate misconduct (i.e. physical assaults, drug/alcohol offenses, nonviolent offenses) produced statistically significant estimates for some predictors of misconduct, while a similar model using a broad operationalization (i.e. "all inmate misconduct") produced non-significant estimates for the same predictors.

Purpose of the study
As recidivism is a widely used outcome measure in criminal justice research, it is critical that its operationalization is clear. Despite some literature suggesting that aggregate measures of recidivism for certain types of crime (i.e. corporate crime, inmate misconduct) indicate important conceptual variability, to our knowledge, research has yet to examine the impact of definitional ambiguity of "recidivism" on meta-analytic outcomes of correctional interventions. The objective of the current study is to explore whether differences exist in the magnitude and statistical significance of intervention effects when recidivism outcomes (i.e. parole violation, arrest, conviction, incarceration) are aggregated under the operationalized construct of "recidivism" versus when they are disaggregated and analyzed as separate outcomes.
To achieve this objective, we identified 20 meta-analyses of correctional interventions, and for each meta-analysis, we compared the direction, magnitude, and statistical significance of pooled effect sizes between aggregated and disaggregated measures of recidivism. We hypothesized that if there are important conceptual differences across outcome measures, the analyses within each study would produce disparate findings with respect to the overall pooled effects. Importantly, we note that the goal of this study was not to validate any of the coding or analyses of the 20 meta-analyses in the sample, nor was it to locate the set of primary studies included in each and attempt to independently reproduce the pooled analyses. 1 Rather, the goal was to identify a set of relevant meta-analyses that met our inclusion criteria (specified below), disaggregate and/or aggregate the recidivism findings presented in the study, calculate new pooled effect sizes (when necessary), and compare within-study findings between aggregated and disaggregated measures of recidivism to determine whether the meta-analytic analyses differed with respect to the magnitude and statistical significance of pooled intervention effects.

Sample
The sample of meta-analyses was obtained from Criminal Justice Abstracts (EBSCOhost database). To identify relevant studies, we applied the search strategy "meta analysis" AND (recidiv* OR arrest* OR incarcerat* OR convict* OR charge* OR offen*) to the Abstract field, with date limiters January 1, 2000 to August 7, 2021. To be considered for inclusion, meta-analyses were required to evaluate a correctional intervention for which the sample was primarily adult offenders, 2 and to report on at least two crime-related outcomes (e.g. arrest and incarceration, conviction, and "recidivism"). In addition, meta-analyses must have presented aggregated and disaggregated analyses, or reported sufficient data to conduct these analyses. For instance, if subgroup (i.e. disaggregated) analyses were not presented in a meta-analysis, the study must have included information to enable subgroup analysis (i.e. a table that listed outcome measures and associated effect sizes for each primary study, or other data to allow for the calculation of effect sizes (e.g. means, standard deviations, binary (y/n) data, sample sizes for treatment and control groups)). 3
We emphasize that the goal of the search was to identify a fairly small set of meta-analyses of correctional interventions in the field of criminology in order to conduct a preliminary examination of whether substantive differences exist between pooled aggregated recidivism measures versus disaggregated measures. As such, the literature search was not structured as a traditional systematic search applied across a large set of electronic databases and gray literature sources.

Data extraction
Three types of data were extracted from the sample of meta-analyses: (1) operationalization of recidivism outcome measures, (2) pooled effect sizes (i.e. aggregated and disaggregated), and (3) individual-study effect size data. As the types of data presented in each meta-analysis varied, these data were extracted from studies only when necessary/applicable.
Operationalization of recidivism outcome measures. When a meta-analysis provided clearly operationalized definitions of the outcome measures that were used in analyses (i.e. parole violation, arrest, conviction, or incarceration), that information was extracted to an Excel spreadsheet. 4 When it was not possible to identify the type of outcome measure used (i.e. the outcome measure was referred to as "recidivism" or a mixture of outcomes (e.g. "either parole revocation or arrest")), we retrieved the primary studies that were included in the meta-analysis and extracted the operational definition for each outcome measure that contributed an effect size to the pooled meta-analytic set.
Pooled effect sizes. For each meta-analysis, we extracted pooled effects of disaggregated measures of recidivism (i.e. at least one of the following: parole violation, rearrest, reconviction, or reincarceration) and/or pooled effects of aggregated measures of recidivism (i.e. the meta-analysis pooled two or more of the aforementioned measures of recidivism in the same analysis). The magnitude, direction, and statistical significance of the pooled analyses were also extracted (i.e. from the narrative, a table, or forest plot) and inserted into a coding spreadsheet.
Individual-study effect size data. For meta-analyses that did not report disaggregated and/or aggregated results, we extracted the necessary data to calculate pooled analyses. Four key data points were obtained as necessary: effect size, upper and lower 95% confidence intervals, and standard error. For meta-analyses in which the effect size data for each primary study was provided in forest plots and/or tables, data were extracted directly into a coding spreadsheet. For meta-analyses in which this information was not provided, more detailed data extraction was necessary. For example, if the requisite information was not provided in a forest plot, but the meta-analysis presented a table that listed study-level outcome data for an experimental and control group (e.g. sample size and percentage of individuals who recidivated in each group; see MacKenzie (2006)), those data were extracted into a spreadsheet and used to calculate effect sizes.
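As a rough illustration of how study-level data of this kind (group sizes and the percentage of each group that recidivated) can be turned into an effect size, the following Python sketch computes an odds ratio and the standard error of its logarithm from a 2 x 2 table. It is not the procedure used in the study. The function name, the example figures, and the coding direction (an odds ratio above 1 favoring the treatment group) are assumptions made for illustration only.

```python
import math


def odds_ratio_from_rates(n_treat, pct_treat, n_ctrl, pct_ctrl):
    """Return (OR, log OR, SE of log OR) from group sizes and recidivism percentages.

    Assumption for this sketch: the odds ratio is coded so that values
    above 1 mean lower odds of recidivism in the treatment group.
    """
    a = n_treat * pct_treat / 100.0  # treatment participants who recidivated
    b = n_treat - a                  # treatment participants who did not
    c = n_ctrl * pct_ctrl / 100.0    # control participants who recidivated
    d = n_ctrl - c                   # control participants who did not

    # A zero cell makes the odds ratio undefined without a continuity
    # correction; this sketch simply refuses to compute it.
    if min(a, b, c, d) <= 0:
        raise ValueError("zero or negative cell count; odds ratio undefined")

    odds_ratio = (b * c) / (a * d)   # odds of non-recidivism, treatment vs control
    log_or = math.log(odds_ratio)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, log_or, se_log_or


if __name__ == "__main__":
    # Hypothetical study: 100 per group, 20% vs 40% recidivating.
    print(odds_ratio_from_rates(100, 20, 100, 40))
```

Working on the log scale is the usual choice because the sampling distribution of the log odds ratio is closer to normal, which is what the standard error formula above assumes.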

Analytic approach
Calculating individual-study effect sizes. Effect sizes from individual (primary) studies were required to calculate missing aggregated and/or disaggregated pooled analyses. Six meta-analyses did not provide aggregated, disaggregated, or individual effect sizes that were readily suitable for the purposes of this study (see footnotes in Table 1 for more detail). For these meta-analyses, we independently calculated effect sizes for their sets of primary studies; all effect sizes were calculated using the same type of measure as in the original analysis (odds ratios). Effect sizes were calculated by computing the number of treatment group participants who recidivated (e.g. were arrested during the follow-up) compared with the number of control group participants who recidivated. Odds ratios were calculated using David Wilson's online effect size calculator. 5
Notes to Table 1:
a An aggregate outcome was reported in the original study; however, as the sample included too few studies to conduct disaggregated analyses (which was essential for the purpose of our study), individual-level study information that was provided in the meta-analysis was used to calculate new effect sizes and include a more diverse set of studies in the aggregated analysis, which subsequently enabled the calculation of subgroup analyses.
b Tong and Farrington (2006) reported a pooled analysis for this outcome; however, because the analysis included studies that examined the effect of Reasoning and Rehabilitation on juvenile offenders, the effect size did not meet inclusion criteria for analyses in the current study. Information from Tables 1 and 2 was used to calculate effect sizes for the studies that primarily targeted adult offenders.
Calculating missing aggregated and/or disaggregated analyses. Twelve meta-analyses required the calculation of aggregated and/or disaggregated pooled analyses that were not included in the original analysis. Two key pieces of data were required for synthesizing data in a meta-analysis: the effect size and its associated standard error. Standard errors that were not reported in the meta-analyses were hand-calculated. All analyses were conducted using the metan command in Stata 16. Random effects models (DerSimonian and Laird, 1986) were used to synthesize data for all studies. 6
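To make the pooling step concrete, the sketch below shows one way a DerSimonian and Laird random effects estimate can be computed from log odds ratios and their standard errors, together with a helper that recovers a missing standard error from a reported 95% confidence interval. It is a minimal Python illustration under stated assumptions, not the Stata metan code used for the analyses. The function names and the choice to recover standard errors from confidence intervals are assumptions, since the text says only that missing standard errors were hand-calculated.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def se_from_ci(lower, upper):
    """SE of a log odds ratio, recovered from 95% CI bounds on the OR scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def dersimonian_laird(log_ors, ses):
    """Pool log odds ratios with the DerSimonian-Laird random effects estimator."""
    k = len(log_ors)
    w = [1.0 / se ** 2 for se in ses]                 # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    w_re = [1.0 / (se ** 2 + tau2) for se in ses]     # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    z = pooled / se_pooled
    p = math.erfc(abs(z) / math.sqrt(2))              # two-sided normal p value
    return {
        "pooled_or": math.exp(pooled),
        "ci_lower": math.exp(pooled - Z_95 * se_pooled),
        "ci_upper": math.exp(pooled + Z_95 * se_pooled),
        "pooled_log_or": pooled,
        "se": se_pooled,
        "tau2": tau2,
        "p": p,
    }


if __name__ == "__main__":
    # Hypothetical log odds ratios and standard errors for three studies.
    print(dersimonian_laird([0.10, 0.45, 0.30], [0.20, 0.25, 0.15]))
```

Running the same function once on all effect sizes and once on each outcome-specific subset (e.g. arrest only) gives the aggregated and disaggregated pooled estimates that are compared within each meta-analysis.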

Results
As shown in Table 1, the operational definition of recidivism varied substantially across the set of meta-analytic studies, and there is considerable variation in terms of operationalization with respect to the type of analyses (i.e. disaggregated or aggregated) presented in each study. Also demonstrated in Table 1 is the substantial variation with respect to the outcome measures combined for the aggregate "recidivism" measure in analyses. Across all 20 meta-analyses, a total of 15 distinct crime-related measures were used to operationalize recidivism; outcomes ranged from "drug relapse" to "reincarceration," and also included measures that were operationalized as "multiple definitions" and "not indicated" (see Table 1).
As shown in Table 2, after calculating 34 new pooled aggregated and/or disaggregated effect sizes, the final analytic sample included a larger total number of analyses for within-study comparison. Similar to above, Table 2 demonstrates the considerable variation of outcome measures used in the aggregate analyses. As a result of these new analyses, all 20 meta-analyses included an aggregate pooled analysis of "recidivism" as well as a pooled analysis for at least one disaggregated outcome, enabling within-study comparisons for all 20 meta-analyses. The findings from the new pooled analyses (i.e. the 34 newly calculated effect sizes for the purposes of the present study) are shown in Table 2.
Table 3 provides a summary of the results of each meta-analysis with respect to the statistical significance of pooled effect sizes. The table also presents a yes/no summary of whether there were within-study differences between the statistical significance of aggregated and disaggregated meta-analytic outcomes. Notably, the objective of the study was to explore whether differences exist in the magnitude and statistical significance of effect sizes; however, as there is little within-study variation in the magnitude of effect sizes, these findings are not discussed here. As shown in Table 3, in several meta-analyses the statistical significance of treatment effect varied by outcome measure (i.e. parole violation, arrest, conviction, incarceration, and "recidivism"), suggesting that different measures of recidivism contribute unique information about intervention effects. In particular, the statistical significance of intervention effect (i.e. statistically significant (p < .05) or not statistically significant (p > .05)) varies between the aggregated and disaggregated outcome measures in 12 of the 20 studies. The disparate within-study results support the hypothesis that measures of recidivism are conceptually dissimilar, suggesting that aggregate measures of recidivism may provide an incomplete picture of treatment effectiveness.
Table notes:
a One effect size (Waldo, 1988) was excluded due to missing data to calculate 95% confidence intervals. The disaggregated pooled outcome was calculated from the subset of five studies in the aggregated analysis that reported arrest as an outcome measure.
b The disaggregated pooled outcome was calculated from the subset of three studies in the aggregated analysis that reported arrest as an outcome measure.
c The disaggregated pooled outcome was calculated from the subset of eight studies in the aggregated analysis that reported incarceration as an outcome measure.
d Two effect sizes (Siegal, 1997_GED and Siegal, 1997_PALS) were excluded due to missing control group post-test data.
e One effect size (Ross et al., 1988 (imprisonment)) was excluded due to a zero-cell count in the treatment group.
f One effect size (Anderson, 1995) was excluded due to missing the control group post-test sample size.
g Three effect sizes (Uggen, 1997 and Menon et al., 1992 (arrest and incarceration)) were excluded due to missing control group post-test sample size.

a Statistical significance was indicated in the meta-analysis; however, the exact p value was not provided. Refer to the 95% confidence intervals in Table 1.
To explain, see, for example, Tong and Farrington (2006). Based on our revised calculations of pooled effects (i.e. excluding studies that targeted juvenile offenders), the aggregate pooled effect across 22 studies suggests that the Reasoning and Rehabilitation program has a strong, positive, and statistically significant overall impact on recidivism (odds ratio (OR) = 1.389, p < .001). When disaggregated by outcome measure, similar conclusions can be drawn about intervention effect with respect to reconviction (OR = 1.282, p < .01). However, the intervention effects are not statistically significant for parole violation (p = .890), rearrest (p = .107), or reincarceration (p = .195), indicating no difference between the treatment and control groups for these outcomes. In other words, while the aggregate analysis suggests that Reasoning and Rehabilitation is an effective intervention for reducing overall recidivism, the disaggregated analysis suggests that intervention effectiveness can only be affirmed with respect to subsequent convictions. Similarly, in Schmucker and Lösel's (2015) meta-analysis, the aggregate finding across 13 studies suggests that sex offender treatment has a significant effect on reducing recidivism overall (OR = 1.45, p < .01). When outcomes are pooled separately, however, the disaggregated findings suggest that sex offender treatment has a small, positive, and statistically significant effect on reducing convictions (OR = 1.69, 95% confidence interval (CI): 1.12-2.54), but a non-significant effect on reducing arrests (OR = 0.98, 95% CI: 0.46-2.09).
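The within-study comparison can be illustrated with the Schmucker and Lösel (2015) figures reported in the Results. The Python sketch below approximates a two-sided p value from an odds ratio and its 95% confidence interval, and then flags whether any disaggregated outcome differs from the aggregate in statistical significance. The function names and the normal approximation on the log scale are assumptions made for illustration. They are not taken from the original analyses.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def p_from_or_ci(odds_ratio, lower, upper):
    """Approximate two-sided p value from an OR and its 95% CI (log-scale normal)."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))


def significance_differs(aggregate_significant, disaggregated_significant):
    """True if any disaggregated outcome disagrees with the aggregate result."""
    return any(s != aggregate_significant for s in disaggregated_significant)


if __name__ == "__main__":
    alpha = 0.05
    # Disaggregated outcomes as reported in the text (OR, CI lower, CI upper).
    conviction_sig = p_from_or_ci(1.69, 1.12, 2.54) < alpha
    arrest_sig = p_from_or_ci(0.98, 0.46, 2.09) < alpha
    # The aggregate effect was reported as significant (p < .01).
    print(significance_differs(True, [conviction_sig, arrest_sig]))
```

Under these assumptions, the conviction outcome falls below the .05 threshold and the arrest outcome does not, so the meta-analysis is counted as one in which aggregated and disaggregated results differ.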

Discussion
Recidivism is arguably the most fundamental indicator of intervention effectiveness in the field of criminology. Furthermore, despite criticism concerning its conceptual variability, "recidivism" continues to be the most widely used measure for correctional intervention effectiveness in meta-analysis. Although the literature suggests that some aggregate measures of recidivism are not appropriate for meta-analytic pooling (Rorie et al., 2018; Steiner and Wooldredge, 2009), it remains a common practice in meta-analyses of correctional interventions. To our knowledge, research has yet to examine the impact of aggregate measures of recidivism on meta-analytic conclusions of correctional interventions. The current study examined whether the different ways in which recidivism is operationalized in meta-analyses have an impact on the statistical significance of pooled effect sizes. Specifically, we compared disaggregated analyses of recidivism (parole violation, rearrest, reconviction, and reincarceration) with aggregated measures of recidivism (by combining multiple measures in a single pool) to determine whether the resulting pooled treatment effects differed in terms of statistical significance.
With respect to the findings from the current study, the variety of ways in which "recidivism" is operationalized in aggregate analyses is noteworthy. As demonstrated in Tables 1 and 2, an assortment of measures are used in aggregate pooled analyses, with little consistency across the 20 meta-analyses. This finding highlights the pervasiveness of conceptual and definitional ambiguity for recidivism. Also noteworthy is how few meta-analyses specified which recidivism measures were used in their aggregate analyses. Our findings also demonstrate that in more than half of the studies in our set, there is evidence of variation in the statistical significance of effect sizes across aggregated and disaggregated analyses of recidivism outcome measures. We are not suggesting that the common practice of presenting aggregate measures of recidivism should be abolished; in many cases, a correctional intervention may seek to reduce any type of recidivism and either the meta-analyst or policymakers are indifferent to whether variation exists in recidivism type. Rather, we argue that presenting only aggregate measures of "recidivism" in meta-analyses to assess intervention effectiveness is worthy of careful consideration as, without analyzing the impact of disaggregated measures, this nuance of intervention effectiveness is obscured. In other words, as disaggregated measures provide unique information about the effects of correctional interventions, simply "seeing the forest for the trees" can conceal important details that help to understand intervention effectiveness more clearly.
Overall, these findings suggest that if the aggregate measure of "recidivism" is the only outcome reported for intervention effectiveness, the nuance of "recidivism" is overlooked, and, subsequently, meta-analysts risk discarding important findings. Uniform policies that result from the aggregated output are likely to be ineffective, as they do not account for the variability in recidivism outcomes (Mulvey and Schubert, 2012). Put differently, as each outcome provides unique information about intervention effects on different stages of the criminal justice process, recognizing the implications of using only aggregate measures to determine intervention success is important with respect to producing a comprehensive body of knowledge about treatment effectiveness in offender rehabilitation, the process of desisting from crime (Kazemian, 2007), as well as for advising policy and practice for correctional interventions. For example, one goal of a given intervention or prevention approach may be to reduce arrests. An aggregated recidivism measure in a meta-analysis that combines arrest outcomes (for which study-level effects are positive) with incarceration outcomes (for which study-level effects are nil) may dilute the positive treatment effect shown for arrest, rendering the overall effect size non-significant and the intervention deemed noneffective at achieving its goal.
In addition, the effectiveness of correctional interventions is tightly tied to the desistance literature and the development of criminal justice policy. More specifically, the literature on desistance is often criticized as viewing the failure of rehabilitation as an individual-level behavioral problem (Graham and McNeill, 2017; Weaver, 2019); as such, if an intervention is not effective at reducing crime, it is because the individual did not successfully learn how to change their behavior. Furthermore, the desistance literature is also criticized as often being "disconnected from specific analyses of the cultural and structural contexts in which both offending and desistance take place" (Weaver, 2019: 641). In other words, failing to analyze intervention effects and/or failure to desist within the context of the social, organizational, cultural, and/or structural factors that may influence recidivistic behavior in general, and the generation of recidivism data in particular, can impede our ability to fundamentally understand how correctional interventions have an effect on recidivism (Graham and McNeill, 2017; Maltz, 2001). This is important because, just as the operational definition of recidivism is left to the discretion of the researcher, what constitutes a "recidivating event" is largely left to correctional and parole agencies, and can vary across time, correctional personnel, and jurisdictions (Maltz, 2001). For example, parole supervision policies may be strict in one jurisdiction due to a recent surge in crime rates by parolees and may be more lenient in another jurisdiction due to prison overcrowding. As described by Maltz (2001), departmental policy or organizational factors such as "the capacity of the state to incarcerate [or] the number of 'free beds' could have an eventual effect on failure rates, especially those based on return to prison" (p. 53). Alternatively, in the absence of a clear definition of what constitutes an incident that qualifies as a "technical violation," the decision to revoke parole may be arbitrarily left to the subjective discretion of the parole officer, both of which can severely impact over- or under-reporting of criminal activity (Maltz, 2001). As such, before we can truly understand "what works" in correctional interventions, it is imperative that scholars recognize, and reiterate, that it is not only the characteristics of the individual offender that affect rates of recidivism, but also how certain correctional organizations, policies, and practices can affect recidivism data (Maltz, 2001). In sum, a more fundamental understanding of "what works" in correctional intervention necessitates a more critical view of the data upon which these conclusions are made; specifically, how recidivism data are produced, and an understanding that measures of recidivism are (at least somewhat) a product of particular criminal justice goals and interests, not simply individual-level problem behavior.

Recommendations
Altogether, our findings support the contention that recidivism is not a unidimensional measure of treatment effectiveness; it is an amalgamation of several distinct concepts about offending behavior (e.g. Maltz, 2001; Shoeman, 2010; Weisberg, 2014). As demonstrated herein, the pooling of recidivism outcomes in meta-analysis can have serious repercussions for the development of recommendations and policy with respect to effective crime prevention initiatives and practices. Meta-analyses represent the summative state of the literature on a given topic and are often widely read documents that are used for making informed decisions (Lakens et al., 2017; Polanin et al., 2020), including governmental decisions with respect to criminal justice policy. Based on these findings, we recommend against using aggregate measures of recidivism as the sole outcome when meta-analyzing the impact of correctional interventions. Although pooling different types of outcome measures can be useful for increasing the number of effect sizes in a meta-analytic study set when the population of relevant primary studies is small, we contend that the statistical power associated with an increased sample size is unlikely to cancel out the methodological limitations of combining conceptually dissimilar outcomes. Meta-analysts who choose to aggregate different measures of recidivism should engage in sensitivity testing and discuss the issue as a study limitation. Relatedly, meta-analyses should clearly state specific definitions of "eligible" recidivism outcomes in each of their analyses, and, if results from individual primary studies are presented, the specific definitions used in each study. Without transparency with respect to the operationalization of "recidivism" in meta-analyses, challenges will remain in estimating the true impact of correctional treatment. With respect to primary studies, as aggregate measures of recidivism may be insufficient as a sole measure of intervention effect, we recommend the use of a wide range of measures to assess the impact of correctional interventions. Furthermore, in line with crime desistance literature (Kazemian, 2007), it may be worth considering various parameters (e.g. type of crime, frequency, seriousness) when measuring recidivism, and/or integrating other outcome measures such as stable employment, stable housing, increased educational/vocational training, reduced substance use, and improvements in cognitive, affective, or attitudinal measures, as these are strongly correlated with recidivism outcomes (e.g. Duwe, 2018; Jacobs and Gottlieb, 2020). Finally, as descriptions of outcome measures in primary studies were sometimes incomplete, vague, or nonexistent, we underscore the importance of providing clear operational definitions for all variables.

Limitations
There are two primary limitations to the current study. First, the sample of meta-analyses was small (n = 20), and our inclusion criteria limited us to a specific type of meta-analysis in which the data necessary to calculate disaggregated and aggregated pooled effects were present. While the set of 20 studies is undoubtedly not a random sample of meta-analyses that examine the effects of correctional interventions on recidivism, it nonetheless includes a solid representation of studies published in reputable criminology journals. Second, some of the disaggregated analyses were conducted with a small number of studies (e.g. n = 3). As larger samples have more statistical power, pooled analyses conducted with a small number of studies may lack the power to detect effects. It is therefore possible that the low statistical power associated with smaller samples in the disaggregated analyses led to true effects going undetected in those subgroups.
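The power concern for small disaggregated subgroups can be made concrete with a rough sketch. It assumes a fixed-effect model in which every study has the same standard error, so that the pooled standard error shrinks with the square root of the number of studies k. The true effect (LOR = -0.20) and per-study standard error (0.25) are hypothetical values chosen for illustration only.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

def pooled_power(true_lor, study_se, k, z_crit=1.96):
    """Approximate power of a two-sided z-test (alpha = .05) on a fixed-effect
    pooled estimate from k studies that share the same standard error."""
    se_pooled = study_se / math.sqrt(k)   # pooled SE under equal weights
    lam = abs(true_lor) / se_pooled       # expected |z| under the true effect
    return (1.0 - normal_cdf(z_crit - lam)) + normal_cdf(-z_crit - lam)

# Hypothetical inputs; not taken from the source.
TRUE_LOR = -0.20
STUDY_SE = 0.25

power_by_k = {k: pooled_power(TRUE_LOR, STUDY_SE, k) for k in (3, 5, 10, 20)}
for k, power in power_by_k.items():
    print(f"k = {k:2d}  power = {power:.2f}")
```

Under these assumptions, a subgroup of three studies has well under a 50% chance of detecting the effect, whereas a pooled set of 20 studies detects it most of the time. A non-significant disaggregated result based on very few studies is therefore weak evidence that an effect is absent.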

Conclusion
Results from the current study underscore the importance of examining single, commensurate measures when possible. We encourage the field to consider the consequences

aggregated analysis, followed by the second most prevalent outcome measure. For example, of the 23 outcome measures reported in Berghuis (2018), 9 were arrest, 8 were conviction, and 6 were incarceration. When selecting studies for the aggregated analysis, studies reporting an effect size for the outcome "arrest" were favored because "arrest" was the most prevalent across studies.

Table 1. Description of outcomes presented in included meta-analyses (N = 20).
CI: confidence interval; OR: odds ratio; LOR: logged odds ratio; DV: domestic violence; n/a = outcome not included in the study.

Table 2. Summary of newly computed effect sizes.

Table 3. Comparison of disaggregated and aggregated recidivism outcomes with respect to statistical significance of findings.
n/a = outcome not included in the study. Not signif. = p > .05.