What matters for assessing insider witnesses? Results of an experimental vignette study

Assessments of insider or accomplice witnesses are a major challenge in complex criminal cases, such as those of international crimes: war crimes, crimes against humanity, and genocide. While insiders are both important and problematic, little is known about how legal decision-makers determine to what extent such witnesses can be relied upon. This study, the first to experimentally examine practitioner decision-making in this context, presents the findings of an online vignette experiment with former and current international criminal law practitioners (N = 160). Quantitative analyses show that the assessments of the witness and of the information quality are interdependent: where an insider is considered not credible, the information they provide is perceived as less reliable as well, and vice versa. Furthermore, decision-makers tend to accord more weight to the quality of the information than to the quality of the witness, in line with jurisprudence analyses. The consequences for research and practice are discussed.


Introduction
Indeed, at the core of almost every complex criminal case sits an accomplice (or cooperating) witness, a person capable of giving an insider's view of the crimes under investigation… (Cohen, 2002: 817)
In evaluating the testimony of Bečir Begovic, the Trial Chamber has balanced his former position as Chief of the Srebrenica Public Security Station and his consequent informed knowledge of the situation, against the interest he may have to disengage the civilian police and himself from the events that took place at the Srebrenica Police Station… 1
Insider or accomplice witnesses are central to complex criminal cases, as demonstrated by the international experiences of investigating and prosecuting members of the mafia, drug cartels or other organised crime groups (Acconcia et al., 2014; Fyfe and Sheptycki, 2014; Nardini, 2006; Piccolo and Immordino, 2017; Wetmore et al., 2020). The principal argument for relying on insiders is the simple lack of reliable alternatives to gain insight into the internal functioning of criminal organisations and the actions of the accused (Combs, 2018; Koumbarakis, 2014). It is thus unsurprising that insider witnesses have been particularly prominent in the investigations and prosecutions of international crimes (war crimes, crimes against humanity, and genocide): such proceedings commonly involve large-scale criminality and complex organisations, and tend to concentrate on the most senior members or leaders of a (criminal) organisation (Ambos and Stegmiller, 2013; Del Ponte, 2004; Guariglia, 2009).
Though lacking a consistent definition, at International Criminal Courts and Tribunals (ICCTs) 'insiders' commonly share the following attributes: they (i) have worked closely with the accused (Sluiter et al., 2013) and have potentially been involved in the commission of the crimes (Harmon and Gaynor, 2004), and (ii) provide information on the accused's criminal conduct to the court (Stepakoff et al., 2014). Insiders were relied upon extensively in the majority of trials at ICCTs, including the pivotal trials of Charles Taylor, the former president of Liberia, at the Special Court for Sierra Leone (Pamsm-Conteh, 2017); Radovan Karadžić, the former president of Republika Srpska, at the International Criminal Tribunal for the former Yugoslavia (Vukušić, 2022); and all the cases that have been litigated at the International Criminal Court (ICC) to date (Chlevickaitė et al., 2021). This practice did not proceed without trouble, as major international crimes cases have fallen apart due to insider witnesses recanting their testimonies, or due to a lack of trust placed in them by the judges at trial (de Brouwer, 2015; Lawson and Bartels, 2019; Mueller, 2014).
Accurately assessing witness testimony, or deciding who to trust, is a fundamental challenge to justice practitioners (Cooper et al., 2013;Coyle, 2013;Denault and Dunbar, 2019), especially in settings where little or no other evidence is available, or decision-makers are faced with witness-against-witness testimony (Green, 2014;Spellman and Tenney, 2010). Unsurprisingly, this challenge has attracted considerable scholarly attention, spanning the fields of, inter alia, law, criminology, intelligence and psychology (De Vos, 2013;Haslam and Edmunds, 2013;Kelsall, 2009;McDermott, 2017;Stuart, 2008). To date, the overarching conclusion points to our limited ability to determine factually whether someone is telling the truth, and a lack of (expected or implied) superior skills in truthfulness assessments among justice professionals (Bond and DePaulo, 2006;DePaulo and Morris, 2004;Magnussen et al., 2010;Spellman and Tenney, 2010;Vrij et al., 2017). Legal decision-makers were found to be similarly susceptible to reliance on alleged deception cues invalidated by empirical research, and to misunderstand the basic science of human memory and witnessing (Denault and Dunbar, 2019), discrediting the long-standing assumptions that witness assessments are best guided by professionals' common sense (Porter et al., 2010;Vrij et al., 2011). Some studies point to reliance on superficial stereotypical indicators of trustworthiness as the basis of (at least initial) credibility judgments in both everyday and professional (police and courtroom) settings. For instance, higher credibility scores are afforded to witnesses who appear more emotional and more emotionally congruent (Bollingmo et al., 2009); or those who exhibit certain character traits: inter alia, extroversion, positivity, attractiveness or confidence (Brodsky et al., 2010;Nagle et al., 2010;Tenney et al., 2007). 
Credibility assessments were also found to differ depending on the modality of witness evidence presentation (video or transcript) (Lindholm, 2005), and on the witness's age, ethnicity and speech style (Lindholm, 2005; Ruva and Bryant, 2004). Research has further examined the effect of extraneous factors on judicial decisions, exposing the role of individual and cognitive biases (e.g., Danziger et al., 2011; Gill et al., 2018; Rehaag, 2012; Wistrich and Rachlinski, 2017).
Recent scientific efforts at improving this state of affairs have resulted in concrete recommendations for a change in practice: (forensic) psychologists have devised detailed guidelines for considering identification and recognition evidence, traumatised and vulnerable witnesses and eyewitness memory overall (Howe et al., 2018;Loftus, 2010;Nahari and Nisin, 2019;Schacter and Loftus, 2013;Volbert and Steller, 2014;Wells et al., 2020). Extensive guidelines and empirically-founded recommendations for assessing oral testimonies in cross-cultural settings have also been devised for asylum processing, facing similar challenges to accurate credibility and reliability determinations as ICCTs (Granhag et al., 2017;Gyulai et al., 2015;UNHCR, 2019). Some of this research has started to penetrate international courtrooms via internally devised guidelines (Aranburu, 2019;De Smet, 2019), expert witness testimonies (Appazov, 2016;Roberts and Redmayne, 2007;Rothe and Overton, 2010) or judicial deference to national practices and jurisprudence. 2 However, little is known regarding the approach taken by international justice professionals, not constrained to the judges, in assessing (insider) witness evidence. While we are aware of the general criteria comprising witness credibility and reliability at ICCTs (reviewed below), we are yet to examine which factors play a significant role in the assessments, and how decision-makers balance their concerns of witness motivations with the quality of the testimony provided. Hence, this study aims to answer the following research questions: (i) How does the quality of the testimony affect practitioners' assessments of the utility, credibility and reliability of an insider witness statement? (ii) How does the quality of the witness affect practitioners' assessments of the utility, credibility and reliability of an insider witness statement?
The first section of this paper provides a quick overview of witness credibility and reliability concepts and application in international criminal justice contexts, with a particular focus on insider witnesses. The second part presents the findings of an experimental vignette study with 160 international criminal justice practitioners tasked with assessing fictitious insider witness statements (vignettes). The final part discusses these findings in the context of current knowledge of international witness assessments.

Witness evaluation at the ICCTs, and the particular case of insider witnesses
No specific rules or procedures constrain witness assessments at ICCTs, whether performed by the judges or other fact-finders (Boas, 2001;Schmitt, 2021). Instead, the judges, in charge of factual and legal findings, follow the principle of 'free evaluation of evidence' coupled with a requirement for a reasoned opinion, which allows for a certain degree of scrutiny of their decision-making (Sorvatzioti, 2021). Comparatively little is known regarding these same processes among the analysts, investigators or lawyers, all likewise involved in making assessments of witness evidence, as their work practices are largely confidential.
To the extent that witness assessments can be observed externally in the judicial decisions, they focus on two core aspects: the credibility of the witness, containing references to witness objectivity, honesty and competence; and the reliability of the testimony, which commonly refers to the quality of the information provided (Chlevickaitė et al., 2020; Delisle, 1978; Schum and Morris, 2007). The distinction between witness credibility and reliability of the testimony that they provide is enshrined in the Rules of Procedure and Evidence of the International Criminal Court (ICC), governing the proceedings at the ICC (ICC, 2002, Rule 140(2)(b)). This categorisation is also supported by the jurisprudence of other ICCTs, 3 even though it has not always been used consistently to refer to the credibility of the witness and reliability of their evidence (Klamberg, 2013; Sluiter et al., 2013). Importantly, the jurisprudence makes clear that a credible witness might provide unreliable information and vice versa, suggesting a relative independence of the two concepts, which has not been examined empirically in this context. 4
As outlined in the introduction, assessments of insider witnesses hinge on the balance between the concerns regarding the credibility of the witness, and the reliability and/or importance of the information they can or do provide. Regarding reliability, assessments of insider witness testimony tend to focus on linkage evidence and information directly concerning the actions and responsibility of the accused (Anders, 2011; Fry, 2014); after all, that is the purpose of engaging insider witnesses in a case. Like regular witness testimonies, they are evaluated both in terms of internal quality (inter alia, level of detail, clarity, plausibility and consistency) and in relation to other evidence in the case (corroboration, contradiction and consistency with prior statements) (Coyle, 2013; Judicial College, 2018; Ninth Circuit Jury Instructions Committee, 2010).
Especially where other evidence is scarce, internal quality evaluations take centre-stage, and are considered in the context of the witness's profile (accomplice or not) and indicia of credibility. 5
Credibility assessments at the ICCTs aim to uncover potential reasons for the witness to be unwilling or unable to provide an honest or objective account of the events in question. In the context of international crimes, potential reasons are abundant: ethnic, cultural, religious or other ties with one of the groups in conflict, victimisation, whether individual or group-based, trauma and memory concerns, and others (Bassin, 2006; Chlevickaitė et al., 2020). Certain aspects of credibility assessments are typically based on the evidence available to the decision-makers: e.g., documentation on victimisation, links to religious groups and memberships of certain organisations. However, they also commonly refer to more subjective indicators stemming from the impression the witness gave during testimony, such as the witness's performance or behaviour on the stand (Chlevickaitė et al., 2021; Kelsall, 2009).
It is hence clear that insider witnesses present practitioners with some additional challenges to accurate credibility and reliability assessments, starting with questions about their motivation to testify (Nardini, 2006; Schrag, 2004; Whiting, 2009). Importantly, currently functioning ICCTs have limited powers to subpoena witnesses, relying on the cooperation of member states that are at times unfriendly, and hence have significantly less leverage to induce witnesses to testify (Sluiter, 2009). Thus, acquiring the cooperation of insider witnesses and subsequently determining whether such witnesses are driven by the wish to contribute to truth-seeking, or by other, less desirable motives, is a major question for the fact-finders. 6
Overall, due to their profile, insider witnesses appear to be 'expected' to not tell the whole truth, especially where their own involvement is concerned. This is understandable considering the risks of testifying, especially that of self-incrimination (Harmon, 2009; Piccolo and Immordino, 2017; Scharf, 2004). Some unwillingness to be completely honest may also be due to their links with the accused or the organisations involved in the crimes, and be related to reasonable security concerns or intimidation (Sluiter, 2005; Trotter, 2012). On top of that, insider witnesses have also been found to harbour feelings of animosity towards the accused or other (former) members of the organisations, or, on the contrary, were motivated to exculpate others and, consequently, themselves (Roberts, 2012; Vukušić, 2022). In line with this complicated picture, case law analyses have revealed that the judges more often than not dismiss at least parts of insider witness testimonies (Chlevickaitė et al., 2021).
Regarding reliability concerns, insider witnesses might have unique information that is difficult to corroborate or otherwise authenticate, especially where it comes from a complex or secretive organisation (Aranburu, 2009; Nardini, 2006; Piccolo and Immordino, 2017). This might cause particular issues where such information is highly relevant, as is often the case with linkage evidence (Del Ponte, 2006; Pamsm-Conteh, 2017). Moreover, insider witnesses at times turn into 'quasi-expert' witnesses, narrating the events in question and providing a frame of reference for subsequent testimonies (Vukušić, 2022: 198; Whiting, 2009). The centrality of their testimonies further increases the risks of relying on biased, or otherwise not truthful, evidence.
In sum, while the procedural setting of witness assessments at ICCTs is relatively bare, the practice is particularly complex and challenging. Considering the extensive reliance on witnesses, and particularly insider witnesses, in international criminal investigations and prosecutions, and given the recurrent credibility and reliability concerns in relation to their evidence, it is important to understand what the practice of insider witness assessments entails and how the decision-makers balance their concerns of witness credibility with the reliability of the evidence they are provided with. The next section presents the results of an online vignette experiment where these questions are explored in detail.

Research design
An online vignette experiment (factorial survey) was designed to assess the relative effects of factors indicating (i) witness quality (credibility) and (ii) information quality (reliability) on the practitioners' assessments of the credibility, reliability and utility of an insider witness statement. The study employed a 2 (Witness quality: high/low, within subjects) × 2 (Information quality: high/low, within subjects) × 2 (Order of vignettes: A/B, between subjects) mixed factorial design. The conditions were distributed orthogonally and the distribution was fully balanced; no order effects were detected (Auspurg and Jäckle, 2017). Respondents were asked to rate the utility, credibility (4-item scale) and reliability (4-item scale) of each statement on a Likert-type 1-10 scale (1 = not at all useful/credible/etc. to 10 = extremely useful/credible/etc.). Additionally, multiple-item scales were used for witness quality-related (credibility) and information quality-related (reliability) indicators to capture the interpretation of the terms and check whether the interpretation was consistent across the participants (see Table 1 in the Results section). The survey also collected five respondent-level variables: gender, educational background, professional background, years of professional experience and the number of institutions practised at (see Table 4 in the Appendix).
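To make the factorial structure concrete, the following minimal Python sketch enumerates the cells of a 2 × 2 × 2 design. The labels W0/W1 and I0/I1 follow the notation used for low/high witness and information quality in the Results section; the function names are hypothetical, and the sketch does not reproduce how the study paired vignettes or allocated respondents, which the text does not specify.

```python
from itertools import product

# Factor levels; W0/W1 and I0/I1 mirror the low/high labels used in the Results.
WITNESS = ("W0", "W1")        # witness quality: low / high (within subjects)
INFORMATION = ("I0", "I1")    # information quality: low / high (within subjects)
ORDER = ("A", "B")            # order of vignettes (between subjects)


def within_subject_conditions():
    """Return the four witness-by-information conditions as labels."""
    return [w + i for w, i in product(WITNESS, INFORMATION)]


def design_cells():
    """Return every cell of the full 2 x 2 x 2 design as a tuple."""
    return list(product(WITNESS, INFORMATION, ORDER))


if __name__ == "__main__":
    for cell in design_cells():
        print(cell)
```

In a fully crossed design each level of a factor appears in the same number of cells, which is what allows the effect of one factor to be estimated independently of the others.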

Participants
All participants were international criminal justice practitioners, including investigators, analysts, lawyers, judges, judicial officers and others. The determining criteria for being included in the sample were (i) experience at one of the international criminal courts and tribunals and (ii) direct experience in obtaining or assessing witness evidence. The following international and internationalised criminal courts were used to determine the sample: International Criminal Court (ICC), International Criminal Tribunal for the Former Yugoslavia (ICTY), International Criminal Tribunal for Rwanda (ICTR), their Residual Mechanism (RMICT), Special Court for Sierra Leone (SCSL) and Extraordinary Chambers in the Courts of Cambodia (ECCC). Respondents were invited to take part in the study with guarantees of anonymity, via personal and professional contacts, LinkedIn and referrals from other respondents (snowballing). No reward was promised or provided. Out of 213 individuals who had agreed to take part in the study, 160 completed the online survey in time (42.5% female, 52.5% male, 5% no answer/other). Respondents were of 46 nationalities, with a relatively high prevalence of individuals from the USA (12.5%), the UK (9.4%), France (9.4%) and Australia (8.8%), in line with the dominance of Western staff at the only fully functioning international criminal court, the ICC. 7 For an overview of participant characteristics, see Table 4 in the Appendix.

Materials
Vignettes. The text-based vignettes used in the research depicted excerpts of fictitious insider witness statements in a hypothetical situation. Each vignette included a description of the situation (context), basic witness information, an explanation of the witness's involvement in the armed forces, and a potentially criminal incident. Each respondent was exposed to two vignettes; therefore, two comparable witness statements were created: one depicting a military insider witness, the other a rebel group insider witness.
In order to ensure that the vignettes were true-to-life (Hughes and Huby, 2004), they were developed on the basis of authentic witness statements retrieved from the ICTY and the ICC evidence databases (ICTY, n.d.; ICC, n.d.). To further test the internal validity, realism and clarity (Taylor, 2006), the vignettes were piloted twice with four expert practitioners of international criminal law. During each pilot round, draft vignette texts and accompanying questions were provided to the experts, who were asked to evaluate the clarity and realism of the vignettes and whether the experimental conditions were sufficiently concealed. The vignettes were revised based on their feedback and piloted a third time with a group of 10 researchers at the Netherlands Institute for the Study of Crime and Law Enforcement (NSCR), after which final amendments were implemented.
The Witness quality factor manipulates indicia of witness objectivity and honesty. High witness quality conditions contain (i) no indicia of bias: no pre-existing personal relationship with the higher-level perpetrator and a professional motivation to join the group; and (ii) indicia of self-deprecation: acknowledging own involvement and acknowledging the crimes committed. In the low quality condition, bias was introduced by the witness having a personal relationship with the commander and a personal motivation to join the group (e.g., revenge). No self-deprecating details were included.
The Information quality factor focuses on the quantity and verifiability of details, in line with the theory that truth-tellers provide more verifiable detail than liars, and with the findings of ICCT case law analyses (Chlevickaitė et al., 2021; Nahari and Nisin, 2019). High information quality statements contain (i) verifiable details: precise dates, numbers of individuals involved, and names of members of the group; and (ii) information specific to the crime: chain of command and decision-making. In the low information quality condition these details are absent.
Importantly, certain aspects of the statements' quality are maintained at a relatively elevated level throughout all conditions for the assessors not to dismiss it outright. Hence, the following characteristics were kept stable across all conditions: coherence, extent of detail not directly related to the conduct of the superiors, description of contextual events, direct observation and insider status/role in the group.
Procedure. The study was designed with the online survey software LimeSurvey. The participants were provided with a link to the study and a randomly assigned access token which determined to which condition they would be exposed. The respondents were first asked to complete an informed consent form, after which they answered a set of demographic questions. The instructions for analysing the vignettes and the situation context followed. After completing the first steps, respondents were presented with Vignette A, alongside numerical and open questions (on the same page). The questions included explanations of the intended meanings of utility, credibility and reliability, to further enhance the clarity of the study. Respondents could progress to Vignette B only after answering the questions, and they were not allowed to return to Vignette A, to avoid revisions and direct comparisons.
Ethical considerations. This research has been approved by the Ethics Committee of Juridical and Criminological Research at VU Amsterdam on 19 November 2019. All participants were guaranteed anonymity and informed of their right to withdraw.
Data analysis. Statistical analyses were used to answer the research questions. First, a descriptive analysis was conducted for all the variables included in the study. Second, reliability and principal component analyses were conducted for multiple-item scales. Third, multiple linear regression with a correction for correlated observations within clusters (respondents) was conducted to identify the effects of the manipulated factors on the dependent variables. Reliability and principal component analyses were conducted with SPSS version 28.0.1.0. Linear regression analyses were conducted using Stata 17 due to its ability to conduct extensive interaction analyses and to account for correlated standard errors using the vce(cluster clustvar) option. The data, the methods used in the analysis and the code used to conduct the research will be made available to any researcher for the purposes of reproducing the results.
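The idea behind correcting for correlated observations within respondents can be illustrated with a minimal pure-Python sketch of ordinary least squares with a cluster-robust (sandwich) variance estimate. This is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the data in the usage note are invented, and the small-sample adjustment that statistical packages typically apply to clustered standard errors is omitted.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_cluster(X, y, clusters):
    """Return OLS coefficients and cluster-robust standard errors.

    X: rows of predictors (include a leading 1 for the intercept).
    y: outcome values. clusters: one cluster id (e.g. respondent) per row.
    No finite-sample correction is applied (an assumption of this sketch).
    """
    Xt = transpose(X)
    bread = inverse(matmul(Xt, X))                      # (X'X)^-1
    Xty = [sum(x * v for x, v in zip(col, y)) for col in Xt]
    beta = [sum(b * v for b, v in zip(row, Xty)) for row in bread]
    resid = [v - sum(b * x for b, x in zip(beta, row)) for row, v in zip(X, y)]
    k = len(beta)
    scores = {}                                         # per-cluster sums of x * residual
    for row, u, g in zip(X, resid, clusters):
        s = scores.setdefault(g, [0.0] * k)
        for j in range(k):
            s[j] += row[j] * u
    meat = [[sum(s[a] * s[b] for s in scores.values()) for b in range(k)]
            for a in range(k)]
    cov = matmul(matmul(bread, meat), bread)            # sandwich estimator
    se = [max(cov[j][j], 0.0) ** 0.5 for j in range(k)]
    return beta, se
```

A call such as `ols_cluster([[1, 0, 0], [1, 1, 1], [1, 1, 0], [1, 0, 1]], [4.0, 6.5, 5.0, 5.5], [1, 1, 2, 2])` treats each pair of rows as the two ratings given by one respondent. The residuals are summed within each respondent before entering the variance estimate, so that two ratings from the same person are not counted as independent pieces of evidence.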

Credibility and reliability: interpretation and prediction
As explained above, both credibility and reliability concepts appear to be comprised of multiple items, and, considering the inconsistent use of the terms in the past, it was deemed important to check whether the respondents indeed interpret credibility as factors related to the honesty or veracity (quality) of the witness, and reliability as factors related to the quality of the information provided. Hence, witness-related criteria in the survey comprised four items: objectivity, trustworthiness, forthcomingness and credibility, while information-related criteria comprised clarity, detail, coherence and reliability. As Figures 1 and 2 demonstrate, the four items on both scales produced similar scores overall, while bivariate correlation analyses revealed moderate to strong positive correlations between the items. Figures 1 and 2 also show some variation across conditions. The independent Witness Quality factor is represented as W0/W1 (low/high), Information Quality as I0/I1 (low/high).
The means reported in Figures 1 and 2 give an indication of the overall trends in the respondents' evaluations of the vignettes. First, witness-related items tend to be, on average, scored lower than information-related items. Considering that in high Witness Quality conditions the respondents were not provided with any grounds to suspect that the witness had reasons to lie, besides their status as an insider in an armed group, this finding might indicate a negative frame of assessment, where a certain presumption of credibility issues appears to be present. There also seems to be little variation in the assessment of the credibility item across conditions, as compared to the variation in the other three witness-related items. This might indicate that credibility as an overall concept is less well understood, or more difficult to evaluate precisely, than specific witness-related factors. A similar observation applies to information-related factors, where at least some variation is present across coherence (5.36-6.53), detail (3.73-5.04) and clarity (5.28-6.09), but not in reliability (4.63-4.94), which scores very similarly across the four conditions. Here again, evaluation of individual items might be more concrete and thus depend more on the changing conditions, though considering the differences in terms of type and nature of the details provided in the high/low information quality statements, such similarity warrants concern and is further addressed in the Discussion section.
Reliability analyses of witness-related items found a Cronbach's α of .859, with no items suggested for deletion. This indicates, with relative certainty, that the four items represent the concept of 'witness quality' or credibility among international criminal law professionals relatively well, without excluding the possibility that additional items could result in a more fine-grained analysis. For further regression analyses, to avoid multi-collinearity, a principal component analysis (PCA) was conducted. It reduced the four items to a single extracted component (KMO test .785, p < .001, all item communalities above .4), which in the regression analyses below is termed PCA_Witness_quality.
Reliability analysis of information-related items found a Cronbach's α of .740, which could be improved to .826 with the deletion of the reliability item. The PCA confirmed the low communality of reliability with the other three information-related items (communality of .211). Removing this item increased the variance explained by the first component from 53.823% to 67.592%; thus, the reliability item was removed, and the component extracted from the remaining three items, termed PCA_Info_quality, was used in further analyses. This shows that the item 'reliability' might have been interpreted as something other than solely information quality, which had been the intention. Removing this item thus conforms with the aims of this study and allows for more accurate analyses comparing the assessments of witness attributes versus the qualities of the statement.
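For readers unfamiliar with the statistic, Cronbach's α for k items is k/(k − 1) × (1 − sum of item variances / variance of the summed scale), and the 'α if item deleted' diagnostic simply recomputes it with one item left out. The following Python sketch illustrates both; the function names and the toy scores in the usage note are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of columns, one per item, each holding one score per respondent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    totals = [sum(scores) for scores in zip(*items)]   # summed scale per respondent
    item_variance = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def alpha_if_deleted(items):
    """Alpha recomputed with each item removed in turn."""
    return [cronbach_alpha(items[:j] + items[j + 1:]) for j in range(len(items))]
```

With three items that move together and a fourth that does not, for example `[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [4, 1, 3, 2]]`, `alpha_if_deleted` returns a higher value at the position of the fourth item than at the others. This is the same pattern that motivates dropping an item that does not cohere with the rest of a scale.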
Predicting statement assessment outcomes. Multiple linear regression with correction for clustered standard errors, required since each respondent assessed two vignettes, was conducted with PCA_Witness_quality and PCA_Info_quality as outcome variables. The analyses for the two outcomes were modelled separately. Initial analyses included an interaction term between the factors (Witness quality, Information quality); however, it turned out to be non-significant, and, after conducting post-hoc analyses (margins, marginplots, contrasts), interaction terms were dropped from the models.
Model 1 was found to be significant (F(10, 309) = 6.214, p < .001, R² = 12.7%), though the variance explained is relatively low. Respondents rated statements of high witness quality (β = .359, p < .001) and of high information quality (β = .247, p < .001) as more credible. Since the outcome variable is standardised (as a result of the PCA), the unstandardised coefficient reports the predicted change in the dependent variable in standard deviations (SDs). With this in mind, statements from high quality witnesses were, on average, found to be .359 SD more credible, while high information quality statements were found to be .247 SD more credible. Hence, the manipulations of both the quality of the witness and the quality of the information were related to, and had additive effects on, the respondents' perception of witness credibility (Table 2).
Model 2 (F(10, 309) = 5.743, p < .001, R² = 12.4%) found comparable effects. Respondents rated statements of high witness quality (β = .334, p < .001) and of high information quality (β = .437, p < .001) as more reliable. Hence, similar to the assessments of credibility, both the witness-related factors and the information-related factors have a role in the assessment of statement reliability. Importantly, since PCA_Info_quality is extracted from three information-related variables (detail, coherence, clarity), we can see that conceptually unrelated factors of witness quality (bias, self-deprecation) appear to influence the evaluation of the information provided.
Two respondent-level factors were significant for predicting PCA_Witness_quality: years of experience category 16-25 (β = −.664, p = .007), and number of institutions practised at (β = .148, p = .020). All categories of years of experience were significant for predicting PCA_Info_quality as well, ranging from β = −.604 to β = −.715 (at p < .05). Interestingly, increasing number of years of experience (length of experience) had negative effects on the assessments of witness credibility and witness reliability, while the number of institutions practised at (breadth of experience) had a positive effect on assessments of witness credibility.
Predicting the Utility outcome. Equivalent Model 3 was designed with the Utility outcome as well. In the survey, Utility score was collected by asking the respondents: 'Indicate how useful you consider this witness statement to be for further fact-finding in this situation (investigation/trial) on a scale from 1 (not at all useful) to 10 (extremely useful)'. Thus, Utility intended to capture both witness-and information-related assessments into one (Table 3). Model 3 (F(10, 309) = 7.081, p = .000, R 2 = 12.5%) was found to be significant, with a similar percentage of variation explained as Models 1 and 2. Both factors significantly predicted Utility scores at p < .001 significance level. The coefficients are comparatively higher as compared to Models 1 and 2, however, the outcome variable is not standardised with a mean of 6.6, thus higher numerical values were expected. Compared to Model 2 (outcome: PCA_Info_quality), the difference between high/low  Information quality factor had a larger effect within the sample (β = .631) rather than the difference in Witness quality factor (β = .494). Hence, in determining how useful insider witness statements would be for future fact-finding, the respondents appear to consider the factors related to the quality of the information to a larger extent than those related to the witness objectivity or honesty. This finding is consistent with the analyses of judicial insider witness assessments at ICCTs (Chlevickaitėet al., 2021). Three respondent-level variables were also significant. In terms of experience, there was a negative effect of having 16-25 years of practice (β = −1.001, p = .035), and a positive effect of the breadth of experience: number of institutions practised at (β = .235, p = .029), similar to findings in Models 1 and 2. 
Additionally, identifying as a defence lawyer had a rather large negative effect (β = −1.454, p < .001), which might reflect the type of statement presented, as all vignettes included some incriminatory but little exculpatory information, but might also be indicative of a more stringent assessment conducted by defence professionals. This is particularly interesting considering the findings in Models 1 and 2, where the assessment outcomes did not significantly differ across respondents' backgrounds.
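The way the two experimental factors are compared in a model such as Model 3 can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis used in the study: the data are synthetic and noise-free, the intercept and effect sizes are assumed values chosen to mirror the reported pattern, and the variable names (witness_high, info_high, larger_factor) are hypothetical. The respondent-level covariates are omitted for brevity.

```python
import numpy as np

# Synthetic, noise-free illustration only: none of these numbers are study data.
# Hypothetical 2x2 vignette design, each factor dummy-coded (0 = low, 1 = high).
witness_high = np.array([0, 0, 1, 1] * 20)
info_high = np.array([0, 1, 0, 1] * 20)

# Assumed "true" effects: information quality larger than witness quality.
intercept, b_witness, b_info = 5.0, 0.5, 0.6
utility = intercept + b_witness * witness_high + b_info * info_high

# Ordinary least squares on the design matrix [1, witness_high, info_high].
X = np.column_stack([np.ones_like(witness_high), witness_high, info_high]).astype(float)
coef, *_ = np.linalg.lstsq(X, utility, rcond=None)


def larger_factor(coefficients):
    """Return the name of the factor with the larger absolute coefficient."""
    _, w, i = coefficients
    return "information quality" if abs(i) > abs(w) else "witness quality"


print(np.round(coef, 3), larger_factor(coef))
```

With dummy-coded factors on an unstandardised outcome, each coefficient reads as the difference in the mean score between the high and low condition of that factor, holding the other factor constant, which is why the two coefficients can be compared directly.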

Discussion: credibility, reliability and their effects
The practice of fact-finding at international criminal courts and tribunals relies on accurate and effective assessments of insider witness evidence, which involves finding a balance between the relevance of the information that insiders can deliver and their reasons or motivations for doing so. To date, the only information available to the outside world regarding how this task is undertaken has come from the reasoned opinions of the judiciary and some limited reflections of ICCT practitioners (Del Ponte, 2006;McIntyre, 2014;Whiting, 2009). This study, particularly owing to its design and respondent population, gives the first glimpse into the factors affecting practitioners' decision-making in an experimental setting.
While judges commonly proclaim witnesses to be credible and/or reliable, this research shows that the terms are more multi-faceted and more inter-related than they might appear. Analyses of the credibility and reliability terms as compared to specific witness- and information-related factors (honesty, trustworthiness, forthcomingness, clarity, detail, coherence) showed that the broad concepts, as defined in the Rules of Procedure and Evidence (ICC, 2002) and frequently mentioned in the jurisprudence, are insufficiently clear to be of use to individual decision-makers. This is indicated by a relative lack of variation in credibility and reliability scores, as compared to the more distinct assessments of the aforementioned specific factors. While according to jurisprudence reliability comprises information quality factors, principal component analyses found that reliability scores tended not to be in line with the scores assigned to detail, clarity and coherence. Beyond what it suggests about the clarity of the language used, this finding is also in line with decision-making research demonstrating that decisions are more consistent and more accurate if they are broken down into their component parts (Chang et al., 2018;Dunstall and Reeson, 2009;Kahneman et al., 2019).
The findings also support the assertion that assessments of the witness and of their evidence are interdependent, and that the quality of one influences the assessment of the other. Both factors were found to significantly predict respondents' scores of witness quality, information quality and utility of the statement. This means that, for instance, the same statement was perceived as more or less detailed depending on whether a witness had demonstrated self-deprecation or, on the contrary, bias. Likewise, witnesses who had provided more detail in their statements were also considered to be more credible, that is, more objective and honest. To an extent, this is not surprising. Out of the many reasons why a witness might provide less detail (inter alia, forgetting, lack of encoding due to attention lapse), the unwillingness to provide detail is a salient explanation, especially where the witness is an insider. Hence, while detail is formally a factor related to information quality, it casts a shadow over the character of the witness. However, the other side of this state of affairs, namely similar statements being accorded a higher or lower reliability score depending on the perceived objectivity of the witness, might be more problematic. Where the character or perceived character of the witness seeps into the determination of the extent of detail or other qualities of the information, there is a heightened risk of the halo effect (Cook et al., 2003) or confirmation bias (Rassin, 2020): the decision-maker's assessment of a witness as potentially dishonest interferes with an objective assessment of the contents of the witness statement. While this study provides only some indication that this is the case, future studies should aim to dissect the phenomenon further.
Finally, the analyses show the relative effects of the information quality factor as compared to the witness quality factor. For two out of three outcomes (PCA_Info_quality and Utility), the information quality factor had a comparatively larger effect on the assessors' scores. This indicates that decision-makers tend to focus on the information provided first, and the source of the information second. These findings support similar conclusions of judicial decision-making analyses (Chlevickaitė et al., 2021) and information studies considering the relationship between the source of the information and the perception of information quality (Brodsky et al., 2010;Irwin and Mandel, 2019;Smith et al., 2013). In addition, the findings show that practitioners tend to give lower scores to witness credibility overall as compared to reliability of the statement, across experimental conditions. Considering the lack of any biasing information in the high witness quality conditions, except for the witness's status as an insider in an armed group, this negative tendency likely indicates the practitioners' attitudes towards witnesses with an insider profile.
Altogether, the findings of this study show that more specificity regarding the underlying components comprising the assessments might be a major step towards more consistent, reliable decision-making. Clarifying the concepts of credibility and reliability, and the underlying items evaluated to conclude whether a witness is credible, or a piece of information is reliable, is a recommended first step for practice. This shared understanding could improve communication among parties, between parties and the Chamber, and within teams as well, and increase the overall transparency of the currently rather obscure practices. It is also important to further question the underlying processes in legal decision-making, especially where the consequences of such decisions are this serious: understanding how practitioners assign value to particular aspects of witnesses and/or their testimony could inform further development of practice and its alignment with current scientific knowledge on deception detection and witness psychology.
This vignette experiment should be appreciated with several limitations in mind. Evidently, assessing a fictitious statement online differed from the assessments of real witness statements in several ways, even though significant attention to ecological validity was given. First, the statements were much shorter than what could be expected from a real witness statement, giving assessors less information to base their decisions on than what they were used to (this was also a point of feedback received from several respondents). Second, the statement was assessed on its own, with no additional evidence to compare it to, which would be a natural next step for a real-life situation and would help the decision-makers in making a complete assessment. Third, the assessors knew the study was an experiment, thus the stakes were lower and so was, perhaps, the attention and time given to the task (Hainmueller et al., 2015;McInroy and Beer, 2022). However, despite the limited external validity, this research is highly informative as it presented evidence in a real-world format while controlling the factors of interest, and thus provides important initial insight into the world of practitioner assessments of international witnesses.