What do we know about rape myths and juror decision making?

This paper presents overwhelming evidence that prejudicial and false beliefs held by jurors about rape affect their evaluation of the evidence and their decision making in rape cases. The paper draws together for the first time the available evidence from both quantitative and qualitative studies (most of which are not found in law journals, but rather in scientific outlets, most commonly those focusing on experimental psychology). The quantitative research demonstrates that mock jurors’ scores on so-called ‘rape myth scales’ are significant predictors of their judgments about responsibility, blame and (most importantly) verdict. The qualitative research indicates that jurors frequently express problematic views about how ‘real’ rape victims would behave and what ‘real’ rape looks like during mock jury deliberations and that even those who score relatively low on abstract rape myth scales can express prejudicial beliefs when deliberating in a particular case. The studies vary in terms of their realism, but it is important to note that some of the studies reported here were highly realistic trial reconstructions, involving representative samples of jurors drawn from the community, live trial reconstructions, evidence-in-chief and cross-examination, accurate legal directions and deliberation in groups. The review concludes by examining the evidence on whether juror education—whether in the form of judicial directions or expert evidence—might be effective in addressing problematic attitudes.


Introduction
The decision making of juries in rape and other sexual offence cases 1 is an issue that has attracted a great deal of attention. There is a concern, in particular, that prejudicial beliefs and attitudes that jurors take into the deliberation room (sometimes referred to as rape myths) impact on their evaluation of evidence and determination of verdict (Conaghan and Russell, 2014). This concern is sometimes dismissed on the ground that there is a lack of evidence of any problems (Reece, 2013). This paper supplies the missing evidence. It draws together for the first time the findings of the relevant studies (most of which are not reported in law journals, but are found instead in scientific outlets, most commonly those focusing on experimental psychology).
The focus of the paper is not the extent to which the jury eligible population holds prejudicial attitudes towards rape victims (although this might become apparent as a side issue of the discussion). Rather it is to examine the way in which such attitudes might affect juror decision making. Two types of studies are relevant in this respect: quantitative and qualitative. Quantitative studies attempt to correlate participants' scores on a scale designed to measure their attitudes towards rape victims in the abstract (so-called rape myth attitude scales) with a dependent variable in a concrete case, such as verdict choice or witness credibility. Qualitative studies examine the way in which prejudicial attitudes towards rape victims arise in jury deliberations. This paper argues that there is overwhelming evidence that rape myths affect the way in which jurors evaluate evidence in rape cases. The quantitative research demonstrates that jurors' scores on rape myth attitude scales designed to measure prejudicial attitudes towards rape victims are significantly related to judgments in individual cases, both in terms of the degree of blame attributed to a rape victim and, more importantly, views about what the verdict should be. The qualitative research shows that false and prejudicial beliefs about rape victims are commonly expressed during jury deliberations and that even jurors who do not score highly on scales that measure attitudes in the abstract can express highly problematic views when discussing a concrete case.
Before proceeding, it needs to be noted that the focus of this review is limited to rape involving a female complainant 2 /victim and a male defendant 3 /perpetrator. The literature on attitudes towards male rape victims is far less extensive. 4 The research that does exist points in the same direction as the studies involving a female complainant/victim, but there is no doubt that this is an area that would benefit from further research.
In the remainder of the paper, the next section briefly examines the scales that have been designed to measure attitudes towards rape victims. The following section discusses the research methods that have been used in the studies presented here. The next sections present the findings of the quantitative studies and the findings of the qualitative studies. The final section reviews the limited body of research that has examined juror education (whether in the form of judicial direction or expert evidence) as a means of addressing false beliefs.

Attitudes towards rape victims and instruments to measure them
The focus of this paper is on false and prejudicial beliefs about rape and rape victims and how these might impact upon the way in which jurors approach the evidence in rape cases. Such beliefs can broadly be divided into four categories: 5 based on the IRMAS, but which uses simplified language. Typical statements include 'a lot of times, girls who say they were raped often led the guy on and then had regrets' and 'if a girl doesn't physically fight back, you can't really say it was rape'.
All of these scales might be criticised for a lack of subtlety. Many of the statements have too obvious a socially acceptable answer and therefore participants might not give honest responses. This is a point that can be made even of those scales that purport to address this issue, such as the SRMAS. However, this is mitigated to an extent by the use of seven-point scales, rather than binary responses, as scales of this nature are able to capture relatively low levels of support for the beliefs in question. And, as the studies reported here show, the scales do succeed in picking up differences of sufficient magnitude to enable meaningful statistical analysis to be conducted.
Various studies have explored (in the abstract) the prevalence of rape myths among the population, both in the UK and further afield. 7 Studies have consistently found that men are more likely to endorse rape myths than women (Hockett et al., 2016) as are those with lower educational levels (Johnson and Beech, 2017: 28; Suarez and Gadella, 2010: 2019). Studies have also found that there is a significant relationship between scores on rape myth scales and scores on other instruments. Particularly notable here is the relationship between rape myth scales and scores on scales measuring the extent to which people hold what have been termed 'just world beliefs' (Lerner, 1980), as this might explain why some people (women in particular) believe rape myths. Just world beliefs are beliefs that 'the world is a just place where good things happen to good people and bad things happen only to those who deserve them' (Lonsway and Fitzgerald, 1994: 135). It is the latter of these two concepts (measured by a section of the just world belief scale called 'JWB-other') that is especially closely related to holding rape myth beliefs (Hayes et al., 2013; Russell and Hand, 2017). It may be that this is because some rape myths (such as the belief that intoxicated victims are partly to blame if they are raped) perform the function of reassuring people that it is not going to happen to them, as they would not engage in the behaviour that is perceived as risky.

Study research methods
Before examining the relevant studies, it is necessary to say a little about the research methods that they use. In most jurisdictions where juries are involved in criminal cases, there are restrictions on undertaking research that involves asking jurors about their deliberations and verdict choices. So, for example, restrictions under s. 20D of the Juries Act 1974 in England and Wales (and s. 8 of the Contempt of Court Act 1981 in Scotland) specifically preclude asking jurors about 'statements made, opinions expressed, arguments advanced or votes cast by members of a jury in the course of their deliberations'. This would clearly prohibit asking them about their verdict choices, or about attitudes towards the complainant or defendant that they had expressed during deliberations. But even if research with jurors who have sat on real cases was possible, it would not be a very effective way of investigating the extent to which attitudes affect verdict choices. It would be entirely reliant on self-reporting verdict preferences and there would be no opportunity to hold the trial stimulus constant, in order to isolate the effect of pre-existing juror attitudes on verdict choices.
Because of these issues, all of the studies included in this review involved mock jurors. Mock juror studies simulate the experience of sitting on a jury by asking participants to read, listen to, or watch trial materials. The trial materials used are generally fictional and significantly abbreviated in comparison with a real criminal trial. Studies vary greatly in terms of the extent of their realism and this in turn affects generalisability, that is, how far their findings are likely to apply to real juries, deliberating in actual criminal trials. In assessing realism, four issues in particular require consideration.
7. For a review of this literature, see Dinos et al. (2015).

How representative was the sample of mock jurors?
Academic mock jury studies sometimes use a convenience sample of students. This inevitably means that the profile of their 'mock jurors' is different to that of real jurors in terms of characteristics like age and education. Researchers have debated how much this matters in terms of the wider generalisability of the findings (cf. Bornstein et al., 2017: 25; Wiener et al., 2011: 472). In the present context, as scores on scales measuring rape-myth supporting attitudes tend to be significantly lower among those with higher educational levels, the use of a student sample is likely, if anything, to underestimate the extent to which rape myths might affect juror decision making. Age in itself, on the other hand, does not appear to have a significant relationship with attitudes in this context (Suarez and Gadella, 2010: 2019).
How realistic were the trial stimulus materials?
To create as realistic an experience as possible, some mock jury studies show participants an audiovisual enactment of a trial (either a video or a live re-enactment). However, other studies have used less realistic methods. The most common of these are written vignettes (a short summary of events, usually a single paragraph and no more than one page); trial summaries (a longer written stimulus, although still in summary form); or trial transcripts (a written document that sets out the evidence in script form). Even where jurors are shown a video or live re-enactment of a trial, it is important to assess how closely this reflects the reality of a criminal trial (for example, in terms of the accuracy of any legal instructions provided).

Did mock jurors deliberate?
Real juries are required to deliberate as a group before returning a collective verdict. However, most mock jury studies do not include this element, which may be problematic as research has shown that jurors' initial views may shift during deliberation (Sandys and Dillehay, 1995; Ormston et al., 2019). In the present context, the views that an individual juror holds about rape might be affected by what other jurors say, and this is a shift that might happen in either direction. Discussion may well ameliorate problematic attitudes if jurors are challenged by other jurors, or harden them if other jurors share the same views (or persuade jurors who did not initially hold such views to adopt them).
How seriously did mock jurors engage with their 'role'?
Mock jurors are obviously aware that they are role-playing and that, as such, their decisions will not have 'real' consequences. That said, there is compelling evidence that mock jurors engage very seriously with their role (Ellison and Munro, 2010b; Ellsworth, 1989; Finch and Munro, 2008; Ormston et al., 2019). To increase the likelihood of mock jurors taking their task seriously, studies will ideally take as many steps as possible to maximise the solemnity of proceedings, such as using appropriate venues and directing mock jurors about their role in a similar way to real jurors.

Quantitative studies
The link between juror attitudes and judgments about blame in a particular scenario
A substantial body of research has examined whether juror attitudes towards rape and rape victims held in the abstract predict the extent to which a particular victim and/or perpetrator are thought to be 'responsible' or 'at fault' for an incident. 8 These studies present participants with a scenario in which it is stated or made clear that a non-consensual sexual encounter took place and ask them about the extent to which the perpetrator and/or victim were to blame for what happened. This measure is then correlated with participants' scores on one of the scales designed to measure rape myth supporting attitudes (which, for brevity, will subsequently be referred to as RMA scales and/or RMA scores). At the time of writing, 29 studies in peer-reviewed journals were identified (Table 1). All the studies were conducted in the US unless otherwise specified.
8. Studies involving victims under 16 have not been included here but see e.g. Tabak and Klettke (2014).
These studies are near unanimous in finding a significant relationship between scores on RMA scales and judgments about victim/perpetrator blame in a specific scenario. The only study that did not find a significant relationship between these two constructs was that of Weiner et al., where there were only 58 participants. 17 That RMA scores correlate with judgments about blame is perhaps not surprising. Rape myth scales measure attitudes relating to rape in the abstract and the studies in Table 1 demonstrate that these attitudes correlate with attitudes towards rape victims and perpetrators in concrete cases. Lonsway and Fitzgerald (1994: 148) describe this as 'simple common sense, as well as a certain circularity'. It does also have to be noted that the realism of these studies is not generally high. None of them used trial videos or live trial re-enactments and none included an element of group deliberation. That said, the finding that abstract attitudes do translate into differences in views about a particular case is an important one. In other words, two people can be presented with the exact same information and, depending on their score on an abstract rape myth scale, will have different views on the extent to which a victim or perpetrator of rape was to blame for what happened. Of more importance, however, is the manner in which those views might translate into verdict preferences and it is to that the review now turns.
9. Significance here refers to statistical significance: that is, that differences reported in the experiment produced so-called p-values below 0.05, indicating that the probability of such a difference occurring in the experiment when there is no actual difference in reality is less than 5%.
10. A scale developed by the researchers.
11. For stranger and acquaintance rape only.
12. An RMA scale that only covers myths about the causes of rape: Cowan and Quinton (1997).
13. For victim responsibility only.
14. Victim responsibility only in date rape scenario.
15. For acquaintance rape only.
16. Ward's (1988) Attitudes Towards Rape Victims scale.
17. This study is discussed further below as it also did not demonstrate a significant relationship between RMA scores and judgments about guilt.

The link between juror attitudes and judgments about guilt
A second body of research exists that has examined the relationship between RMA scores and decisions about guilt in a specific rape case or scenario. 18 A meta-analysis undertaken in 2015 identified nine such studies, eight of which reported a significant relationship between these two concepts (Dinos et al., 2015). There were, however, a substantial number of relevant studies that were not included in that analysis, either because they were not identified by the researchers or because they have been published subsequently. This analysis identified 28 relevant peer-reviewed studies 19 (Table 2). All but three of the 28 studies found a significant relationship between RMA scores and decisions about guilt. In other words, people presented with exactly the same information were significantly more or less likely to find a defendant guilty of rape depending on their score on an RMA scale.
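To make concrete what a 'significant relationship' between RMA scores and verdict preference means, the following is a minimal sketch of one way such a relationship could be tested (a permutation test on the difference in mean RMA score between those favouring a guilty and a not guilty verdict). It is not the analysis used in any of the studies in Table 2. The scores and verdicts are invented purely for illustration, and the function names are hypothetical.

```python
import random

# Hypothetical data, invented for illustration only (not from any study):
# mean RMA score on a 1-7 scale for 20 mock jurors and their verdict preference.
scores = [2.1, 2.4, 1.8, 3.0, 2.6, 2.2, 3.1, 1.9, 2.8, 2.5,
          3.9, 4.2, 3.6, 4.8, 4.1, 3.3, 4.5, 3.8, 4.0, 3.4]
verdicts = ["guilty"] * 10 + ["not guilty"] * 10


def mean_difference(scores, verdicts):
    """Mean RMA score of 'not guilty' voters minus that of 'guilty' voters."""
    not_guilty = [s for s, v in zip(scores, verdicts) if v == "not guilty"]
    guilty = [s for s, v in zip(scores, verdicts) if v == "guilty"]
    return sum(not_guilty) / len(not_guilty) - sum(guilty) / len(guilty)


def permutation_p_value(scores, verdicts, n_permutations=10_000, seed=0):
    """Estimate how often a difference at least as large as the observed one
    arises when verdict labels are shuffled at random, i.e. when there is no
    real link between RMA score and verdict."""
    rng = random.Random(seed)
    observed = abs(mean_difference(scores, verdicts))
    shuffled = list(verdicts)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(mean_difference(scores, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    print("difference in mean RMA score:", round(mean_difference(scores, verdicts), 2))
    print("permutation p-value:", permutation_p_value(scores, verdicts))
```

A p-value below 0.05 in a test of this kind is what the studies report as a 'significant' relationship. The published studies generally rely on regression or correlation analyses rather than this simplified test.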
There was more variation in the research methods used in the studies in Table 2 than in Table 1. Some were relatively realistic representations of the trial process, but others had methodological issues that limit the reliance that can be placed on their findings. This was true of all three of the studies where no significant relationship was found (Weiner et al., Wenger and Bornstein, and Stichman et al.). Weiner et al.'s study is 30 years old. It involved only 58 participants, the smallest sample of all of the studies identified, and used a relatively outdated RMA scale (the RES). The Wenger and Bornstein study also involved a small sample (152 participants) and it used a three-page written summary of a sexual assault trial and, like Weiner et al., used an outdated RMA scale (Burt's). 24
Stichman et al.'s study was more realistic. It used a shortened version of a trial that was based on a real case, in which criminal justice professionals played the legal roles, with a police officer as the defendant and a rape crisis counsellor as the complainant. The mock trial was performed live to 69 of the 294 participants and was played as a video to the remainder. 25 It involved testimony from the complainant and defendant and written sworn statements from other witnesses. Participants were given written definitions of legal terms, including beyond reasonable doubt, before being asked to indicate whether the defendant was guilty. However, aside from its use of a wholly student sample (criminal justice and sociology students), the main difficulty with the study lies in its use of the Perceived Causes of Rape Scale (PCRS). The PCRS covers five possible causes of rape: victim precipitation (e.g. women who tease men); male dominance (e.g. a need to put women in their place); male sexuality (e.g. men who can't control their sex drives); societal causes (e.g. violence towards women in the movies); and male pathology (e.g. men's feelings of inferiority, inadequacy, and low self-esteem). It is one of the less subtle RMA scales and is relatively narrow in the range of rape myths it includes. Average scores on the scale were low, as was the standard deviation (indicating little variation in participant scores). It is worth noting that the researchers also recorded the students' explanations for their verdicts and these did indicate some belief in rape myths (for example, comments that it was the complainant's fault if she was raped as she willingly let the defendant into her apartment). It is not just the studies that failed to show a significant relationship that have methodological issues. However, not all of the studies suffered from methodological weaknesses.
18. There is some overlap between this and the body of research that has linked RMA scores with responsibility, as some of the studies investigated both issues.
19. All bar one were published in peer-reviewed journals. The exception is experiment 2 in Willmott's thesis (item 27 in the table), which, at the time of writing, had not been reported in a peer-reviewed journal. However, his first experiment, which has been reported in a peer-reviewed journal, uses similar methods and analytical techniques.
20. Study 1 is not relevant for our purposes.
21. But only for the 'rape survivor myths' part of the scale.
22. Relationship only with one part of Burt's scale (the adversarial sex beliefs component).
23. This study is also reported in Willmott et al. (2018).
24. Despite these methodological issues it did still find that RMA score was significantly related to judgments about the complainant's credibility (measured by asking participants to rate the likelihood that the complainant was lying on a 7-point scale from 'very likely' to 'very unlikely').
25. The mode of delivery made no difference to the results (Stichman et al., 2019: 13).
The best of the studies in terms of realism were the two experiments undertaken by Willmott. The first involved showing a 25-minute trial video to 324 students. The trial was based on a real rape case and included pre-trial instructions and post-trial directions taken from the Judicial Studies Board Crown Court Benchbook. 26 It was recorded in a real courtroom, with professional actors playing the complainant and defendant and an experienced barrister as a judge. Advice on realism was provided by an expert panel that included the barrister, a CPS lawyer and three senior detectives from specialist sexual offence units. After watching the video, the mock jurors deliberated in groups of 12 (as would be the case in a real jury in England) for up to 90 minutes, before returning a verdict. Participants completed the AMMSA, one of the most up-to-date RMA scales. One important limitation of experiment 1 is that it did not involve cross-examination; rather, the complainant and defendant gave an unprompted account and the mock jurors were also given a summary of the case for the prosecution and defence. However, Willmott's second experiment rectified this. It used a 3-hour 30-minute live trial re-enactment, which included examination and cross-examination, closing speeches and the same legal directions as before. It also utilised a community sample drawn from the electoral register. The 100 mock jurors deliberated in nine groups of between 10 and 12 for up to two hours. Both of Willmott's studies took substantial steps to maximise realism and both found a significant relationship between individual verdict preferences and scores on the AMMSA pre- and post-deliberation.

Qualitative studies
The studies discussed so far have been exclusively quantitative. However, further insights into the way in which prejudicial attitudes might influence jury decision making can be gained from another body of literature, which has looked at the extent to which, and manner in which, false assumptions about what rape looks like, and what genuine victims would do, arise during mock jury deliberations. 27 All of the studies involved a female complainant; no studies that have looked at mock juror deliberations in a case where the complainant is male were identified.
Batchelder et al. (2004) undertook research with 151 community participants, who read a trial transcript of a rape case (the length of which is not specified, but it did include legal directions) in which a female complainant and male defendant, who were both students, met in a bar before going back to her room, where the incident took place. The mock jurors then deliberated in groups of 12 (bar two smaller juries of eight and seven) and deliberations were recorded and transcribed. Although the study was not aimed at investigating rape myths, 28 the researchers noted that they arose regularly during deliberations, including the view that a woman who has been raped would always show distress after the incident and the view that false allegations of rape are often made by women who regretted having sexual intercourse.
27. See also the focus group study undertaken by Gunby et al. (2012).
28. Its primary focus was to examine whether gender influences verdict choices, and the researchers found that it did.
29. The main purpose of the study was to investigate the impact of the complainant giving evidence in person compared to a live TV link, so the conditions were varied in accordance with this.
Taylor and Joudo (2005) carried out research with 210 jury eligible members of the public. Participants watched an 85-minute live trial re-enactment based on a transcript from a real sexual assault case, using professional actors (although for brevity only the complainant gave evidence). 29 The scenario involved a male and female work colleague who both attended an office party and drank and danced, before going to another room and having sexual intercourse, which the complainant claimed happened without her consent (a claim the defendant disputed). Mock jurors deliberated in groups of between 10 and 12 for up to an hour. The researchers did not record the deliberations. Rather, after the juries had reached their verdicts, they held group discussions with the participants to ask them what they had spoken about. This was far from an ideal research method: participants may not have been able to recall accurately what went on during the deliberations or might have been reluctant to report it. Nonetheless, Taylor and Joudo (2005: 59) found that:
One of the key insights obtained during this study was the high degree to which many jurors believed many of the myths which surround rape in general. Acceptance of these myths mean that jurors have strong expectations about how a 'real' victim would behave before, during and after an alleged sexual assault and these expectations impact on their perceptions of the complainant's credibility.
The researchers pointed to a number of examples which arose regularly and, as they put it, 'worked against the complainant' (Taylor and Joudo, 2005: 59). These included the complainant's admissions that she did not scream or shout for help; that there was no evidence of injury; that she continued to work with the defendant after the incident; that she delayed reporting the incident for two weeks; and that she was not visibly upset when recounting the incident in court. Some of the mock jurors volunteered that they had advanced these arguments as a rationale for a not guilty verdict, although others reported that they disagreed and did not believe they were relevant in reaching a verdict.
The most significant studies of rape myths and jury deliberations, however, are the four studies undertaken by Vanessa Munro, the first with Emily Finch, the second and third with Louise Ellison and the fourth as part of the Scottish Jury Research.
The first study (Finch and Munro, 2006) 30 involved a scripted 75-minute mock rape trial (including legal directions crafted from the Judicial Studies Board Crown Court Benchbook) performed live in front of 168 mock jurors, who were recruited from the general public. The study aimed to examine the way in which an intoxicated rape complainant was viewed, so the scenario involved a complainant who was conscious and able to communicate but had trouble walking, and whose words were slurred. The defendant admitted the complainant was largely unresponsive as he undressed her. The jurors watched the trial reconstruction and then deliberated in 21 groups of eight for up to 90 minutes, without the presence of the researchers. The deliberations were video recorded. Each jury returned a verdict, and jurors also gave their individual views on what the verdict should be, both pre- and post-deliberation.
The second study (Ellison and Munro, 2009a, 2009b, 2009c, 2010a) 31 utilised similar research methods. This time it involved a 75-minute mock rape trial performed live in front of 233 mock jurors, who again were recruited from the general public. The scenario involved two colleagues who attended a work event before the defendant gave the complainant a lift home. The two spent a few hours together drinking a glass of wine and some coffee, before kissing. After that, their accounts diverged, with the complainant reporting that she was raped and the defendant claiming they engaged in consensual intercourse. The roles were played by a mixture of actors and barristers, and experienced legal professionals advised on the realism of the trial script. 32 The mock jurors then deliberated in groups of eight or nine (27 mock juries in total) for up to 90 minutes. Each jury returned a verdict, and jurors gave their individual views on what the verdict should be, both pre- and post-deliberation, but this time the jurors also completed an RMA questionnaire. 33
The third study (Ellison and Munro, 2013, 2015) 34 utilised the same research methods as studies A and B, but this time the 75-minute mock trial involved the complainant and defendant having been in a relationship that had broken down. The alleged rape occurred in the complainant's flat (that they used to share together) when the defendant called round to collect his television. A forensic examiner testified to the complainant having bruises and scratches of a sort consistent with the application of considerable force, but no internal bruising (although he also stated that this is not uncommon in rape cases). The study involved 216 mock jurors recruited from the general public who deliberated in 27 groups of eight for up to 90 minutes.
30. Subsequently 'study A'.
31. Subsequently 'study B'.
32. The legal directions they heard included, in some of the groups, directions designed to counter various rape myths. This aspect of the study is discussed under 'Addressing juror attitudes'.
33. The questionnaire was a tailored one designed specifically for the project, including questions relevant to the particular trial scenario that was utilised.
34. Subsequently 'study C'.
The final study was undertaken as part of the Scottish Jury Research. This study utilised similar research methods to studies B and C, this time with a 75-minute trial video based heavily on study C but adapted to Scottish criminal procedure. It was the largest study of the four, involving 431 mock jurors who deliberated in 32 groups of either 12 or 15 for up to 90 minutes. 35
All four studies found that rape myth supportive attitudes arose frequently during deliberations. Space precludes an extensive discussion, but to give three examples:
Lack of physical resistance. Many jurors expressed the belief that a genuine victim of rape would have fought back to the extent that she would have suffered substantial defensive injuries, including internal trauma. 36 Acquittal verdicts were frequently justified with reference to the absence of more serious or more extensive injuries (Ellison and Munro, 2009b: 207; Finch and Munro, 2006: 314). Often female jurors expressed these views, asserting that if they had been in the complainant's situation, they would have struggled more forcefully (Ellison and Munro, 2013: 314), insisting that their instinctive reaction would be to lash out aggressively and inflict injury on the defendant and expressing a confidence that they would be able to do this even where the assailant was stronger than themselves (Ellison and Munro, 2009b: 206, 2009c: 371). For example, one female juror observed 'I think it's instinct, if you've got a hand free you'd grab for his eyes or his face or anything' and another stated that 'I just can't understand why she wouldn't push him off or do anything, I cannot get my head around that' (Ellison and Munro, 2009c: 371). This is despite the jurors in this study (study B) having been directed that 'it is not a requirement for establishing the offence of rape that any force has been used' and neither is it 'necessary to show evidence of any kind of struggle in order to establish non-consent'.
In study A, some jurors insisted that even a heavily intoxicated complainant would be expected to offer physical resistance. As one juror put it, 'a woman's got to cooperate with a man to be able to do it, to have intercourse, unless he thumps her or what, and he didn't-there was no bruising on her body anywhere. I would say she was probably drunk but at the same time she more or less consented' (Finch and Munro, 2006: 316).
Jurors did sometimes challenge those views by arguing that women facing sexual assault may freeze and be too fearful or shocked to fight back physically. References to freeze reactions were most common in the Scottish Jury Research, most likely echoing the language of a national campaign by Rape Crisis Scotland. However, in none of the studies did these views appear to cause others to revise their opinion (Ellison and Munro, 2015: 218). While jurors seemed prepared to believe a freeze reaction might happen in a 'stranger-rape' context, they seemed less willing to accept it in the context of an acquaintance rape (Ellison and Munro, 2010a: 790). As one juror put it, 'even in a paralysed state, isn't it the body's natural reaction to put up some kind of defence?' (Ellison and Munro, 2009b: 206).
False allegations. Jurors often expressed views about the prevalence of false allegations of rape, stating that they are routinely made (Ellison and Munro, 2010a: 795, 2013: 314). 37 In the Scottish Jury Research, for example, one juror commented that 'there [are] hundreds of cases coming out where women have lied about rape'. Some jurors constructed a narrative whereby the complainant was angry that the defendant did not wish to start (in study B) or resume (in study C and the Scottish Jury Research) a relationship and made a false rape allegation out of a desire for revenge. As one juror put it, 'a woman scorned is so true, no disrespect girls, it is though isn't it?' (Ellison and Munro, 2010a: 797). Whilst there were jurors who questioned how realistic it was that a woman would put herself through the challenges of a criminal investigation and trial merely to 'get one over on someone, or to get back at someone', these comments were often countered by jurors who insisted that 'it does happen', 'love makes people do crazy things' (Ellison and Munro, 2013: 314), 'some women do just use [the criminal courts] as a tool' and 'women can be vindictive'.
35. The study's primary purpose was to investigate the effect of the unique features of Scottish juries (three verdicts, 15 members and decision making by a simple majority). It involved 64 juries in total; the other 32 watched an assault trial. Its findings are reported in Ormston et al. (2019). One jury proceeded with 11 members as a juror became ill during deliberations. 36. In the Scottish Jury Research, statements to this effect were made by jurors in 28 of the 32 juries. 37. In the Scottish Jury Research, statements of this nature were made in 19 of the 32 juries (Chalmers et al., 2019).
Uncontrollable male sexual urges. The male defendant was at times regarded as being at the mercy of his sexual drives, which may have led to him having a genuine (and reasonable) belief in consent (Ellison and Munro, 2009a: 297, 2010a: 793). The belief that the defendant might have been 'so passionate and into it' or 'so transfixed' that he would not be able to 'register what she was actually doing' was regularly expressed by mock jurors (Ellison and Munro, 2010a: 793). One, for example, stated that 'a woman can stop right up to the last second . . . a man cannot, he's just got to keep going, he's like a train, he's just got to keep going' (Ellison and Munro, 2010a: 793).
It is worth noting that, in study B, these attitudes were all more evident in the deliberations (where jurors were discussing the specific mock trial) than in the questionnaires they completed (where they were asked about their attitudes in the abstract) (Ellison and Munro, 2010a: 790-791, 793). This is an important finding. Even jurors who were found to have relatively low levels of 'rape myth acceptance' when they completed the questionnaire sometimes relied on problematic views, grounded in those same stereotypes, in the process of engaging in their deliberations about the trial they had just watched.

Addressing juror attitudes
There is a small body of research that has examined whether prejudicial attitudes can be countered by juror education, either in the form of directions from the trial judge or evidence given by an expert witness. Four studies in peer-reviewed journals have examined this issue. The first of these, although the most recent, is the least realistic in terms of its simulation of the trial process. Klement et al. (2019, study 1) undertook an experiment with 97 US psychology students, in which they read a short, written scenario, followed by either no expert testimony, written expert testimony stating that 50-90% of rape allegations are false, or written expert testimony stating that false allegations are rare, at around 2-10% of all rape allegations. The presence or absence of the testimony had no effect on decisions about guilt. However, the experiment lacked realism in a number of respects, limiting the weight that can be attached to it. 38
The other three studies were more realistic in that they all involved a mock trial and juries who deliberated. Brekke and Borgida (1988) undertook two experiments, both involving US psychology students. In the first, 208 students listened to an audiotaped mock rape trial, based on a real trial (which varied in length from 65 to 102 minutes, depending on the experimental condition). They then gave individual verdicts, before deliberating in groups of between four and six for up to 30 minutes. Conditions were varied so that there was either no expert testimony, an expert who testified in general terms (that few women falsely accuse men of rape, rape is an under-reported crime, a large proportion of rapes involve acquaintances and it can be better for a woman to submit rather than risk additional violence), or an expert who gave similar testimony but related it to the facts of the case and used a hypothetical example.
Jurors exposed to the case-related testimony were significantly more likely to favour a guilty verdict pre- and post-deliberation than either those who heard no expert testimony or those who heard the standard testimony. There was no significant difference in the proportion of jurors who favoured a guilty verdict between the standard testimony and no testimony groups. Their second study involved 144 students and was identical to the first, except that all groups heard some form of expert testimony, either standard expert testimony or case-specific testimony. The case-specific testimony resulted in a significant increase in the number of guilty verdicts pre-deliberation. The relationship post-deliberation was not significant, but participants who had heard the case-related testimony were significantly more likely to find the complainant a credible witness. 39 Brekke and Borgida recorded the deliberations of their mock juries, although the analysis they undertook was quantitative only. They found that there was limited discussion of the expert testimony during the deliberations of those juries who heard it (an average of two minutes discussion of the 30 minutes total deliberation time). However, they also found that in the groups who had not heard the expert testimony, complainant resistance was a dominant theme during more than 15 per cent of the deliberation and discussion tended to be favourable to the defence. The juries who heard the case-specific testimony devoted, on average, less than two per cent of their time to discussing resistance and the discussion was generally favourable towards the complainant.
38. A second experiment in which the same information was provided by the prosecution or defence also found that it had no effect. The researchers also conducted two experiments involving a trial scenario with a male complainant and a female defendant, and here they did find some limited evidence that the testimony (which was adapted for a male rape scenario) had a positive effect.
Spanos et al. (1991) undertook a mock jury experiment with 219 US students, who were randomly assigned to one of 36 juries, ranging in size from four to eight. They listened to an audiotaped mock trial (involving an alleged rape in the complainant's flat after a date), followed by either no expert witness testimony, expert witness testimony or expert witness testimony and cross-examination. The expert witness gave evidence aimed at countering a number of different possible false beliefs (for example that women provoke rape by their appearance and false allegations are common). In the cross-examination, he agreed that there are documented cases of false allegations (as well as making a number of other concessions). 40 At the jury level, juries were significantly more likely to return guilty verdicts when they heard the expert testimony, but only in the condition with no cross-examination. The same effect was found in relation to jurors' individual verdicts (although only post-deliberation). The fact that the directions were ineffective when the expert was cross-examined might lead to the conclusion that in real life, where cross-examination would always form part of an adversarial trial, expert evidence is unlikely to be effective. However, as Ellison and Munro (2009c: 376) (Cutler et al., 1989). However, it does also have to be said that, although both studies did involve an element of deliberation, there are other elements of the research methods, such as their use of audiotaped mock trials, that limit the reliance that can be placed on them.
The final, and most realistic, study was undertaken by Ellison and Munro (2009c). The study involved 216 jurors recruited from the general public who deliberated in 27 groups of eight. The main features of the research methods used have already been outlined, 41 but it is pertinent to add that there were nine experimental conditions. The main substance of the trial remained the same, but (a) the level of the complainant's physical resistance; (b) the delay between the incident and its report to the police by the complainant; and (c) the level of observable distress in the complainant's courtroom demeanour were varied. In addition, in one third of the trials a direction from the judge informed jurors about the feasibility of a complainant freezing during an attack, the frequency with which complainants delay reporting, or the different emotional reactions that victimisation might elicit. In another one third of the trials, the same information was provided by an expert called by the prosecution and cross-examined by the defence. In the remaining trials, no such guidance was provided.
39. The fact that the relationship between the expert testimony and verdict choices was not significant in the second experiment may simply be due to the smaller number of participants (only 144, compared to 208 in their first experiment). 40. For instance, during cross-examination the expert was forced to acknowledge that rape fantasies are not uncommon, some women derive sexual gratification from being tied up, some women develop unrealistic expectations of a relationship following an initial sexual encounter, and some such women may become distraught, angry and vindictive when they are rebuffed. 41. See 'Qualitative studies' (Ellison and Munro's study B).
The researchers used a primarily qualitative research methodology, examining the way in which the content of the deliberations differed between the groups who had received the educational guidance and those who had not, but they supplemented this with analysis of questionnaires completed by individual jurors post-deliberation. They found that the educational guidance on complainant demeanour and delayed reporting had a noticeable effect on the deliberations, but the guidance on lack of resistance did not.
In respect of complainant demeanour, the jurors who had been exposed to the educational guidance were less likely to make reference to the complainant's demeanour when giving evidence and, when the issue was raised, were more likely to offer explanations for what might account for the complainant's lack of emotionality and more inclined to comment that it was 'normal' that a victim of rape could respond in such a calm manner. This was supplemented by the post-deliberation questionnaires, where jurors in the expert testimony and judicial instruction conditions were less likely to say that it would have influenced their decision if the complainant had been more obviously distressed when giving her testimony.
In respect of delayed reporting, the jurors who had been exposed to the educational guidance were more likely to state that they were untroubled by the three-day delay in reporting the alleged rape. Jurors in the no-education condition were more likely to express the view that the complainant's response had undermined her credibility and described the delayed reporting as, variously, 'odd', 'strange' and 'disturbing'. This was also supported by the questionnaire data, where jurors in the non-guidance condition were more likely to agree that it would have made a difference to their deliberations if the complainant had reported the alleged assault to the police sooner.
In relation to lack of resistance, there was no discernible difference between the guidance and no-guidance conditions, in either the deliberations or the questionnaire data, in the way jurors responded to the complainant's claim to have frozen in shock after initially attempting to push the defendant away and telling him to leave her alone.
The researchers offer a number of possible explanations for the different findings in relation to the different types of rape myth. It may simply be that some beliefs, including those about injury and resistance, are so deeply entrenched that attempts to influence them through juror education will have limited effect (Cowan, 2019: 38; Temkin, 2011: 724). It may, however, be the case that the particular directions on lack of resistance utilised in the study were ineffective. In line with judicial guidance, the directions were general in nature: they did not use hypothetical examples and were not linked to the facts of the particular case. This explanation would be consistent with Brekke and Borgida's study, where expert testimony was only effective when it was case-related and used a hypothetical example.
It may also be the case that Brekke and Borgida's case-related testimony was more effective than the abstract testimony because it explained why some rape victims might react in a particular way (for example that freezing is a natural physiological response to danger or that there may be good reasons for not reporting a sexual offence immediately). This would also be consistent with experimental research into the effectiveness of judicial directions more generally, which suggests that jurors are more likely to follow instructions if it is explained why they are being given. 42 It is also important that judicial directions are simple and comprehensible. This is not to suggest that the direction in Ellison and Munro's study was not, but experimental research has shown that juror comprehension of judicial directions is low and that it can be considerably improved by using plain language and other methods of simplification (Chalmers and Leverick, 2018: 23-27). The direction also needs to be recalled by jurors and here written directions can play an important role (Chalmers and Leverick, 2018: 32-26).
Finally, it is possible that timing was an issue and that the introduction of expert testimony or judicial directions on the issue of non-resistance (as well, potentially, as on demeanour and delay) would have had a greater influence if it had preceded the complainant's testimony (Ellison, 2019: 275). There are two possible reasons for this. The first is that there is considerable research evidence that suggests that jurors, rather than passively absorbing all the evidence as it is presented to them, instead settle on a 'story' that makes sense to them relatively early in the proceedings and then attempt to fit the remainder of the evidence into that narrative rather than evaluating it independently (Pennington and Hastie, 1992). Hearing the guidance before the complainant's testimony would mean that it would be salient in jurors' minds before they form their opinions about the complainant's credibility based on their own preconceptions and beliefs. The second is that directions given at the end of the trial might get lost as they will simply be one part of a lengthy summing up and detailed instruction on the relevant law which, in a real trial, will be given after several days of evidence (Temkin, 2011: 731). The point is strengthened by the research evidence on jury directions more generally, which has found that pre-instruction improves juror memory for and comprehension of jury directions (Chalmers and Leverick, 2018: 27-31). This might also point to one advantage of expert testimony over judicial directions given at the end of a trial: expert testimony might be more memorable as the expert will only be testifying about a single issue and jurors will be less likely to switch off and miss important information.
42. See the survey of the relevant evidence in Leverick (2016: 581).

Conclusion
To summarise, there is overwhelming evidence that jurors take into the deliberation room false and prejudicial beliefs about what rape looks like and what genuine rape victims would do and that these beliefs affect attitudes and verdict choices in concrete cases. This evidence is both quantitative and qualitative.
The most important quantitative studies are those that have examined the link between scales designed to measure rape myth supporting attitudes and judgments about guilt. A total of 28 such studies were identified and all but three found a significant relationship between jurors' scores on RMA scales and their judgments about guilt in a specific rape case or scenario. All three of the studies that did not report a statistically significant relationship had methodological weaknesses that limit the reliance that can be placed on them. The 25 studies that did demonstrate a link varied in terms of the realism of their research methods. Many used students as their experimental participants, although that in itself is likely to underestimate the scale of the problem, given that higher scores on RMA scales are correlated with lower educational levels. Only two studies included an element of deliberation in the research design. However, those two studies both demonstrated that the link between juror attitudes and verdict choices persisted even after deliberation, suggesting that the deliberation process is not a 'magic bullet' in terms of curing problematic attitudes. These two studies were also the most realistic experiments and the fact that they both found a significant relationship between RMA scores and verdict choices is notable.
The qualitative studies paint a similar picture. They demonstrate that jurors regularly express in deliberations false beliefs about matters such as an absence of extensive injury or resistance indicating consent and rape allegations often being unfounded and easy to make. These beliefs have been expressed in highly realistic studies that have replicated real rape trials as closely as possible. Importantly, these beliefs were sometimes expressed by those who had not indicated that they held such beliefs when asked about them in the abstract, via a questionnaire.
All of this raises the spectre of whether the jury is an appropriate decision making body in rape and other sexual offence cases. It is argued here that it would be very premature to reach that conclusion. The conscientiousness with which mock jurors approach their deliberations (Ellison and Munro, 2010b; Ellsworth, 1989; Finch and Munro, 2008; Ormston et al., 2019) suggests that juries take their decision making role very seriously. Mock jury studies have also demonstrated that jurors bring with them relevant life experience and understanding that help them assess the plausibility of claims made by complainants and defendants in a way that a professional judge might find more difficult. Then there are, of course, the political and philosophical justifications for juries, which are rooted deeply within the legal culture of the UK jurisdictions, around citizen participation and the need to provide a check on state power (Houlder, 1997; Redmayne, 2006). That said, the volume of evidence presented in this paper linking rape myths and jury decision making should give us at the very least pause for thought.
Before suggesting anything as drastic as removing juries from criminal trials, however, it is worth considering whether the answer might lie in addressing problematic attitudes via juror education, such as trial judge direction or expert evidence. The studies reported here give some limited cause for optimism in this respect, with evidence that juror education can have an impact. It is clearly not as simple, however, as telling jurors that they are wrong and expecting them automatically to change their views. Some views may be more difficult to shift than others and consideration also needs to be given to the timing of any intervention and to its content. But this, it is argued here, is the way forward before more radical measures are considered, alongside well-funded research that is able to rigorously assess the effectiveness of such interventions.