Into the Eyes of the Referee: A Comparison of Elite and Sub-Elite Football Referees’ On-Field Visual Search Behaviour when making Foul Judgements

In foul decision-making by football referees, visual search is important for gathering task-specific information to determine whether a foul has occurred. Yet, little is known about the visual search behaviours underpinning excellent on-field decisions. The aim of this study was to examine the on-field visual search behaviour of elite and sub-elite football referees when calling a foul during a match. In doing so, we also compared the accuracy and gaze behaviour for correct and incorrect calls. Elite and sub-elite referees (elite: N = 5, Mage ± SD = 29.8 ± 4.7yrs, Mexperience ± SD = 14.8 ± 3.7yrs; sub-elite: N = 9, Mage ± SD = 23.1 ± 1.6yrs, Mexperience ± SD = 8.4 ± 1.8yrs) officiated an actual football game while wearing a mobile eye-tracker, with on-field visual search behaviour compared between skill levels when calling a foul (Nelite = 66; Nsub-elite = 92). Results revealed that elite referees relied on a higher search rate (more fixations of shorter duration) compared to sub-elites, but with no differences in where they allocated their gaze, indicating that elites searched faster but did not necessarily direct gaze towards different locations. Correct decisions were associated with higher gaze entropy (i.e. less structure). In relying on more structured gaze patterns when making incorrect decisions, referees may fail to pick up information specific to the foul situation. Referee development programmes might benefit from challenging the speed of information pickup while avoiding pre-determined gaze patterns, to improve the interpretation of fouls and increase the decision-making performance of referees.


Introduction
In 2010, international referee Howard Webb made a critical error when officiating the FIFA World Cup final. In the 27th minute, Dutch midfielder Nigel de Jong kicked his Spanish opponent Xabi Alonso with a karate-like kick directly in the chest. The foul was hard and firm enough to warrant de Jong being sent off, but instead Webb awarded "only" a yellow card. Afterwards Webb admitted he had not seen the actual point of impact and therefore settled on a yellow card. 1 This example illustrates that visual information plays a crucial role in the foul decision-making of a referee. Foul decisions are the most important and frequent decisions made by a football referee, 2 whereby (s)he has to decide whether the action of a player is in violation of the laws of the game. 3 In the assessment of these foul situations, visual information is necessary to correctly assess the situation: if a referee cannot see what is happening, as In officiating, different models 4,5 show the importance of perception for the decision-making performance of a referee. For example, in the decision-making model for referees composed by MacMahon et al., 4 perception is the first step in the process, followed by categorisation and information integration, before finally a decision is made. Similarly, the model of Samuel et al., 5 which was conceptualised specifically for football referees, emphasises that both where to run and where to look are important elements in the execution of decisions by referees. Under that model perception entails multiple senses which can all influence a referee's performance, for example crowd noise. 6 Nonetheless, for a referee visual perception is crucial. Research has shown that sport-specific visual behaviours characterise elite performance, both in referees 7 and in athletes. 8 Enhanced gaze behaviour has been proposed to be a result of optimal information-reduction, with elite performers better able to distinguish relevant from irrelevant sources of information and to accordingly allocate gaze towards the most relevant sources of information. 9 We therefore focus on visual search behaviour in the decision-making process of football referees.
Visual search behaviour has been studied extensively in different sports, 7,8 with the findings generally showing elite performance to be characterised by differences in visual search behaviour that depend on the sport and even the specific scenario within that sport. 8 In interceptive sports, an elite athlete's visual search behaviour is often characterised by fewer fixations of longer duration, 10 whereas in invasive sports it is often more fixations of shorter duration. 11 However, even in invasive sports such as football (soccer), the characteristics of the elite player's gaze behaviour depend on the specific scenario. For instance, Williams and Davids 12 showed that experienced footballers use different visual search behaviours when anticipating one-versus-one (1v1) and three-versus-three (3v3) playing scenarios. In a 1v1 scenario, their gaze is characterised by fewer fixations of longer duration, whereas in the 3v3 scenario they use more fixations of shorter duration. Thus when seeking to understand expert performance within a given task, such as a referee's decision making, the key lesson is that behaviour must be examined when performing that actual task.
A growing body of literature has emerged examining the visual search behaviour of sports officials. MacMahon et al. 4 proposed that sports officials can be classified as sport monitors, reactors, or interceptors. Research has begun to address the behaviour of officials in each of these roles, for instance in gymnastics judges as sport monitors (e.g. Pizzera et al. 13), in cricket umpiring as reactors (e.g. Ramachandran et al. 14), and in football refereeing as interceptors (e.g. Spitz et al. 15). Here we focus on interceptors, and in particular on football referees. A recent literature review by Ziv et al. 7 revealed there is no consistent finding or behaviour that underpins an expert sports official's visual search behaviour. First, contradictory findings are reported between sports. The performance of expert officials in some invasive sports, such as rugby 16 and football, 15 is characterised by differences in gaze behaviour between skill levels, while other invasive sports such as ice hockey 17 show the opposite, whereby officials' expert performance is not characterised by a difference in their gaze behaviour. Second, as with athletes, contrasting findings are often found within sports: for instance, some studies find differences in the gaze behaviour of elite and sub-elite football referees, 15 while other studies do not. 18,19 The differences found between studies further underline the degree to which the specificity of a task can influence the visual search behaviour not only of athletes, but also of sports officials.
In addition to the importance of task specificity, the representativeness of the task is also vital when examining visual behaviour. 20,21 This is best illustrated in a study by Dicks, Button, and Davids, 20 who asked eight experienced football goalkeepers to stop penalties in five conditions: simulated conditions when responding to video footage either (1) verbally or (2) by moving a joystick, and facing a live in-situ kicker by responding (3) verbally, (4) with a simplified movement (e.g. stepping to the side), or (5) with a full-body interceptive response. The results revealed that visual search behaviour differed between the tasks, with goalkeepers fixating on fewer locations in the in-situ situations. Considering that the task of the referee (e.g. judging whether the rules have been applied) differs from that of a football player (e.g. physically interacting within the game), the above study implies that research findings are hard to generalise from players to referees. Whereas Dicks et al. 20 showed the influence of representativeness across different tasks, Mann, Farrow, Shuttleworth and Hopwood 21 demonstrated differences within the same task but with different viewing perspectives. In their study, junior elite footballers watched video-clips of open-play situations filmed from both an aerial and a first-person perspective. Visual search behaviour differed between the two conditions, with players making more fixations of shorter duration and more transitions between the ball and other attacking features when viewing the aerial perspective. The results highlight the surprising way in which gaze behaviour can change according to the representativeness of a task. As a consequence, research findings of officials and referees based on the usage of overview or television video footage 17,18 may not necessarily generalise to on-field conditions, because there the viewing perspective is first person rather than an aerial overview perspective.
Thus although research has shown elite footballers to be characterised by their visual search behaviour, we believe the context and representativeness of this behaviour limits the generalisability of those findings to the behaviour of referees. Additionally, whereas players couple visual information to the execution of a movement, a referee performs an almost purely cognitive task where they must acquire visual information largely for the purposes of decision making without a coupled action. And given, for instance, the dual-pathway theory of vision, 22 which suggests that visual information may be neurologically processed differently depending on whether it is used for action or for perception, there is further reason to believe that gaze behaviour when performing an action might differ from that when no action is required. Therefore, it is challenging to generalise findings regarding the visual search behaviour of football players, who are performing interactive actions, to the behaviour of referees who are more passively observing the play.
In football officiating, research has mainly focussed on the visual search behaviour of assistant referees when making offside decisions. 18,19 Recently however, Spitz et al. 15 examined the search behaviour of football referees when judging potential foul situations shown on video. In their study, elite and sub-elite referees viewed footage of potential-foul situations and judged whether a foul had occurred. Results showed elite referees to be more accurate in their decision-making, and while there was no difference in the search rates of the elite and sub-elite referees, the elite referees did spend more time than the others fixating on the contact-zone of the offender and less time on the non-contact-zone of the defender. This search behaviour may have enabled the elite referees to pick-up the most relevant information necessary for successfully assessing potential-foul situations. The elite referees' visual search behaviour provides a potentially useful insight into why they make better foul decisions, and offers promise as a means of improving the performance of developing referees.
Although Spitz et al.'s 15 study was insightful and well controlled, it remains unclear whether the findings when performing a video-simulated decision-making task would replicate those found on-field. First, in the video-simulation task, referees sat watching a computer screen rather than moving as they would on-field, which could potentially result in different visual search behaviours. 20 Additionally, since all videos were filmed from the fixed perspective of "an additional assistant referee left to the goal post", 15 the viewing perspective differs from the dynamic, on-field perspective of a referee, potentially altering the behaviour as well. 21 A video-based task offers excellent experimental control and convenience, but it may not sample the behaviours that would be expected if tested on-field.
The aim of this study was to examine the on-field visual search behaviour of football referees when calling a foul during a match. In doing so, we sought to compare the gaze behaviour observed for correct and incorrect calls. Elite and sub-elite level referees each adjudicated a football match while wearing mobile eye-tracking glasses. Their visual search behaviour was analysed for foul calls made during their respective matches. Based on the results of Spitz et al., 15 we expected the elite referees to use a similar search rate to the sub-elite referees, but with more time spent viewing the contact zone (CZ) between the two players. Moreover, if better decisions are characterised by unique gaze behaviours then we expected more time spent viewing the contact zones in correct decisions, and more time spent on the non-contact zones for incorrect decisions. We also performed more exploratory analyses of the gaze patterns of the referees. An individual's pattern of gaze behaviour is a complex construct that changes both in time and space, and studies in sport have started to adopt more complex methods for understanding these patterns (e.g. fixation order 23 and gaze entropy 24). Therefore, additional explorative analyses were performed on other gaze parameters (without prior hypothesis) to provide a more thorough understanding of gaze while assessing foul situations.

Participants
Fourteen referees were recruited to participate: five elite (M age ± SD = 29.8 ± 4.7yrs, M experience ± SD = 14.8 ± 3.7yrs) and nine sub-elite referees (M age ± SD = 23.1 ± 1.6yrs, M experience ± SD = 8.4 ± 1.8yrs). We sought to test as many referees as we could within a single season, given that we were testing elite referees who would need to be available in their spare time, and we also had to convene football matches specifically for this study. The elite referees were amongst the top 30 rated referees in the Netherlands and all active at the highest national levels (i.e. the Dutch Premier League (N = 4) or First Division (N = 1)). The sub-elite referees were all part of the national referee talent trajectory and active in the highest regional divisions in the Netherlands. Using the method of Swann et al. 25 to validate the skill level of the participants, the elite referees can be classified as successful elites (score 9.3) and our sub-elites as competitive elites (score 4.7). All participants provided written informed consent prior to participation. The study was approved by the university's ethics committee.

Matches
Each referee officiated a pre-season or mid-season friendly football match convened for the research. The level of each match was chosen to suit the referee's skill level, so that each referee officiated at the level at which they would typically officiate. Given that the games were matched to the skill level of the referees, we checked whether the decision-making demand and pace of the games differed. The average number of fouls per minute did not differ between the matches (elite vs. sub-elite matches, M ± SD = 0.20 ± 0.09 vs. 0.18 ± 0.07 fouls per minute; t(12) = .57, p = .61, r = .16). Similarly, the pace of the game expressed as the number of passes per minute did not differ significantly between the games (M ± SD = 6.9 ± 1.5 vs. 6.1 ± 1.3; t(12) = 0.95, p = .36, r = .26; compared to ∼8.7 passes per minute in European first division matches; Bransen and Van Haaren 26). Either neutral or club-assistant referees acted as assistant referees during the match. Matches were played in accordance with the FIFA Laws of the Game 3 with the exception of there being no restriction on the number of allowable substitutes and no additional time.

Apparatus
During the match, the referees wore mobile eye-tracking glasses (SMI 60 Hz ETG eye-tracking glasses; one sub-elite referee wore 30 Hz SMI ETGs due to unavailability of the 60 Hz glasses). Dark lenses were used to filter the infrared component of sunlight. Footage from the glasses was recorded to a mobile phone (Samsung Galaxy S4 SmartPhone) worn in a hip belt on the referee's waist.

Procedure
The eye-tracking glasses were fitted to the referee and calibrated using the system's 3-point calibration, in which the referee viewed specified locations, typically within the referees' change room, immediately prior to the match. The SMI ETG reports a gaze position accuracy of 0.5° over all distances. 27 To account for any drift or movement in calibration, referees were instructed to intermittently look towards and name a specific object during the match (e.g. a corner flag or goal post), enabling re-calibration during data processing if necessary (M re-calibrations = 0.1 per minute, i.e. 4-5 re-calibrations per 45-min half). The glasses were worn during both halves of the match, but could be removed during the half-time break and re-calibrated immediately before the second half.
After data collection, data were imported, processed, and exported using SMI BeGaze TM 3.6 software. All foul situations were tagged using event-logging software BORIS. 28 Visual search behaviour for the foul situations was manually analysed frame-by-frame using the SMI BeGaze TM 3.6 software.

Dependent variables
The data consisted of first-person perspective video footage with overlaying gaze data (see Figure 1 for an example of the video footage). A total of 915 match minutes out of a possible 1260 min (14 referees × 90 min) were recorded by the mobile eye-tracker: 353 min for the elite referees (M min/match ± SD = 70.6 ± 11.1) and 562 min for the sub-elite referees (M min/match ± SD = 62.4 ± 28.3). Complete recordings were not possible in some instances because of: (i) inclement weather conditions requiring the eye tracker to be removed (n = 3); (ii) disconnection of the cable between the eye tracker and mobile phone; or (iii) the recording device switched off due to the battery overheating (ii and iii together n = 12).

Foul calls
Foul calls were defined as those situations in which the referee blew the whistle to call a foul. Situations (N = 61) were excluded from analysis when the quality of the video or eye-tracking footage was too poor to reliably determine the direction of gaze throughout the foul situation (N elite = 36, of which 20 came from one elite referee who officiated a game in full sunlight, which had a major impact on the recordings; N sub-elite = 25). In total, 159 foul situations were analysed, 66 by the elite and 92 by the sub-elite referees. For each situation, the moment of contact between the foul committer and the fouled player was determined.

Decision-making accuracy
The correct decision for each foul situation was determined by an independent panel of four referees (M age ± SD = 32.6 ± 3.4yrs, M experience ± SD = 17.4 ± 2.0yrs) separate from those who took part in this study. The four were employees of the Royal Dutch Football Association (KNVB) and active referees in the third highest league in the Netherlands. Each independently determined the correct decision for 80 of the 159 foul situations using an online platform (My-TPE; 's Hertogenbosch) so that each video was viewed by at least two referees. For the purposes of assessing decision-making accuracy, the fouls were cut into video fragments of approximately 7 s that excluded the gaze data overlay and wherever possible footage of the decision of the on-field referee. Panel members were allowed to watch the situations multiple times, pause the fragment at any time, and/or change the playback speed. They judged each situation as a foul or no-foul based on Law 12 of the FIFA law book.
Figure 1. The red circle in each of the panels represents (fictitious) overlaying gaze data, i.e. where the referee was looking. Panel A shows a foul situation where the lower part of the body is involved in the infringement. The foul committer (the defender on the ground in this case) is seen making a tackle on the ball and ankle of the foul receiver. Panel B shows a foul situation where the upper part of the body is primarily involved in the infringement. The foul committer (facing the camera) is holding the arm of the foul receiver, preventing him from running towards the ball (not visible). Panels A1 and B1 show the original frame (without the point of gaze), whereas panels A2 and B2 show the same frame with the AOIs coloured: orange = foul committer's contact zone (FC-CZ), yellow = foul committer's non-contact zone (FC-NCZ), blue = foul receiver's contact zone (FR-CZ), green = foul receiver's non-contact zone (FR-NCZ). In panel A2 the (fictitious) gaze is allocated towards the foul committer's contact zone, whereas in panel B2 the (fictitious) gaze is allocated towards the foul receiver's contact zone.
Situations were considered a 'foul' if both referees indicated that the video clip contained a foul, and 'no foul' if both agreed the video clip contained no foul. The disciplinary action (e.g. no card, a yellow card or a red card) was excluded from the decision, considering that foul situations were randomised over games and presented in random order without the presentation of the contextual information which a referee might usually rely on to decide whether to award a card (e.g. a player's previous infringements). The ICC for the first pair of reviewers was .493 and for the second pair of reviewers was .253. The relatively low 29 ICC values reflect the subjective nature of the judgements and may reflect the challenges of using footage from a first person perspective. For situations without agreement (N = 44), a fifth referee (Age = 29yrs; Experience = 15yrs) active in the same league for 7 years watched the clip and independently made his own decision. In those cases, the majority decision was taken as the adjudicated outcome. In three instances, the secondary review was ambiguous (e.g. due to a poor view of the incident) and so those foul situations were excluded from further analysis. To measure decision-making accuracy, the decision of the referee was compared to that of the expert panel, with the decision-making accuracy calculated as the percentage of correct decisions for each referee.
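The adjudication rule described above can be sketched in a few lines. This is a minimal illustration rather than the software used in the study: the function names `adjudicate` and `accuracy_percent`, the boolean coding (True = foul), and the use of `None` for an ambiguous review are all assumptions made for the example.

```python
from typing import Optional, Sequence


def adjudicate(first: bool, second: bool,
               tiebreak: Optional[bool] = None) -> Optional[bool]:
    """Return True (foul), False (no foul) or None (situation excluded).

    `first` and `second` are the two panel judgements; `tiebreak` is the
    fifth referee's judgement, or None if that review was ambiguous.
    """
    if first == second:
        return first
    # The two reviewers disagree, so the fifth referee's vote
    # determines the majority decision (or the exclusion).
    return tiebreak


def accuracy_percent(calls: Sequence[bool],
                     panel: Sequence[Optional[bool]]) -> float:
    """Percentage of on-field calls that match the panel outcome."""
    scored = [(c, p) for c, p in zip(calls, panel) if p is not None]
    correct = sum(1 for c, p in scored if c == p)
    return 100.0 * correct / len(scored)


# Hypothetical data: four whistled fouls for one referee.
panel_outcomes = [
    adjudicate(True, True),          # agreement: foul
    adjudicate(True, False, True),   # disagreement, majority: foul
    adjudicate(False, False),        # agreement: no foul
    adjudicate(True, False, None),   # ambiguous review: excluded
]
on_field_calls = [True, True, True, True]
example_accuracy = accuracy_percent(on_field_calls, panel_outcomes)
```

Because every analysed situation was one in which the referee blew the whistle, the on-field call is coded as a foul throughout; excluded situations do not count towards the denominator.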

Visual search behaviour
Visual search behaviour was assessed by measuring the search rate, fixation locations, and gaze entropy. A fixation was defined as when gaze was maintained on any location (stationary or moving) for a period of 120 ms or longer. Variables were analysed in two time periods: (i) the full clip; and (ii) the moment of contact clip (MoC-clip). The moment of contact clip included the second leading up to and the second after the moment of contact (replicating Spitz et al. 15) to provide a direct comparison with the analyses reported by Spitz et al. 15 Given that foul situations in real matches are nested within a wider match rich in contextual information, we chose to also analyse gaze in the three seconds leading up to the foul to uncover any differences in the referees' ability to pick-up on this contextual information that may be available in the lead-up to a foul situation. Therefore, the full clip included footage 3 s prior to contact until 1 s post-contact.
Search rate. The search rate was defined as the mean number of fixations per second over the full clip. The mean fixation duration was calculated for each full clip, in addition to the fixation duration at the moment of contact to examine any relationship between the length of the fixation at contact and decision-making accuracy. In addition, the number of fixation transitions from one area-of-interest to another was determined per second. For the MoC-clip, the timing of fixations was calculated by separating the clip into 200 ms epochs and counting the number of fixations that commenced in each 200 ms segment.
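The search-rate measures defined above can be sketched as follows. This is an illustrative sketch only: the `Fixation` record, the function names, and the example values are hypothetical, and counting a transition only when consecutive fixations land on different AOIs is an assumption about how "from one area-of-interest to another" was operationalised.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fixation:
    onset_ms: float      # onset relative to the start of the clip
    duration_ms: float
    aoi: str             # label of the area-of-interest fixated


def search_rate(fixations: List[Fixation], clip_ms: float) -> float:
    """Mean number of fixations per second over the clip."""
    return len(fixations) / (clip_ms / 1000.0)


def mean_fixation_duration(fixations: List[Fixation]) -> float:
    """Mean fixation duration in ms."""
    return sum(f.duration_ms for f in fixations) / len(fixations)


def transitions_per_second(fixations: List[Fixation], clip_ms: float) -> float:
    """Transitions between different AOIs per second (assumed definition)."""
    changes = sum(1 for a, b in zip(fixations, fixations[1:]) if a.aoi != b.aoi)
    return changes / (clip_ms / 1000.0)


def onset_counts(fixations: List[Fixation], clip_ms: float,
                 epoch_ms: float = 200.0) -> List[int]:
    """Number of fixations commencing in each epoch of the clip."""
    n_epochs = int(clip_ms // epoch_ms)
    counts = [0] * n_epochs
    for f in fixations:
        idx = int(f.onset_ms // epoch_ms)
        if 0 <= idx < n_epochs:
            counts[idx] += 1
    return counts


# Hypothetical 2-s clip with four fixations.
clip = [
    Fixation(0, 400, "ball"),
    Fixation(450, 300, "FC-CZ"),
    Fixation(800, 600, "FR-CZ"),
    Fixation(1450, 500, "FR-CZ"),
]
```

For this hypothetical clip the functions give 2.0 fixations per second, a mean fixation duration of 450 ms, and 1.0 transition per second.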
Fixation locations. The total time spent viewing each of six areas-of-interest (AOIs) was calculated in ms for each clip to measure the spatial distribution of gaze. We defined the foul committer (FC) as the player who committed the foul, and the foul receiver (FR) as the player on whom the foul was committed (Figure 1). Next, the FC and FR were each separated into two AOIs: the CZ, defined as the half of the body involved in the (possible) infringement; 15 and the non-contact zone (NCZ), defined as the half of the body not involved in the infringement (see also Spitz et al. 15). Subsequently, there were six AOIs: (i) the contact-zone of the foul committer (FC-CZ); (ii) the non-contact zone of the foul committer (FC-NCZ); (iii) the contact-zone of the foul receiver (FR-CZ); (iv) the non-contact zone of the foul receiver (FR-NCZ); (v) the ball; and (vi) other locations (e.g. the assistant referee or open space). Depending on the position of the referee and the distance between the players, AOIs might overlap. In these cases, gaze was attributed to both AOIs (see also Spitz et al. 15) rather than guessing towards which of the two areas the referee was looking. This caused the total viewing time in some clips to be longer than the actual duration of the clip.
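The effect of attributing gaze to both AOIs when they overlap can be illustrated with a short sketch. The data layout (a duration paired with the set of AOIs under the gaze point), the function name `dwell_times`, and the example values are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

AOIS = ("FC-CZ", "FC-NCZ", "FR-CZ", "FR-NCZ", "ball", "other")


def dwell_times(fixations: Iterable[Tuple[float, Set[str]]]) -> Dict[str, float]:
    """Total viewing time (ms) per AOI.

    Each fixation is (duration_ms, set of AOIs it falls on). When AOIs
    overlap, the full duration is credited to every AOI in the set.
    """
    totals = {aoi: 0.0 for aoi in AOIS}
    for duration_ms, aois in fixations:
        for aoi in aois:
            totals[aoi] += duration_ms
    return totals


# Hypothetical clip of 1500 ms in which the two contact zones overlap
# during the second fixation.
example = [
    (400.0, {"ball"}),
    (600.0, {"FC-CZ", "FR-CZ"}),
    (500.0, {"FR-CZ"}),
]
totals = dwell_times(example)
summed_viewing_time = sum(totals.values())
```

Here the summed viewing time is 2100 ms although the fixations span only 1500 ms, which is the double counting described above.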
Gaze entropy. Gaze entropy was calculated to measure the degree of structure in the gaze behaviour, with a higher value reflecting a greater degree of randomness in the allocation of gaze. 24,30 Entropy was calculated in bits following Ellis and Stark 31 :
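As a rough illustration of how an entropy value in bits behaves, and without claiming to reproduce the Ellis and Stark expression, the sketch below assumes plain Shannon entropy, the sum of p × log2(1/p), taken over the observed proportions of AOI-to-AOI transitions. Both the function name and the choice of distribution are assumptions made for the example.

```python
import math
from typing import Sequence


def shannon_entropy_bits(counts: Sequence[int]) -> float:
    """Shannon entropy (bits) of the distribution implied by the counts.

    Assumption: `counts` holds how often each AOI-to-AOI transition
    occurred; zero counts contribute nothing to the sum.
    """
    total = sum(counts)
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy += p * math.log2(1.0 / p)
    return entropy


# Gaze spread evenly over four transitions versus gaze locked to one.
even_spread = shannon_entropy_bits([1, 1, 1, 1])
single_transition = shannon_entropy_bits([5, 0, 0, 0])
# Upper bound under this assumption for 36 equally likely transitions.
upper_bound_36 = shannon_entropy_bits([1] * 36)
```

Under this assumption an even spread over many transitions yields a high value (less structure), whereas gaze confined to a few transitions yields a low value (more structure).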

Statistical analyses
Decision-making accuracy was compared between the elite and sub-elite referees using a one-tailed independent samples t-test to test the hypothesis that elite referees would have higher accuracy than sub-elite referees.
A 2 (accuracy: correct, incorrect) × 2 (skill: elite, sub-elite) ANOVA with repeated measures on the first factor was used to analyse the search rate, the mean fixation duration, and the gaze entropy. An independent samples t-test (two-tailed) was conducted to analyse potential differences in the overall mean fixation duration and the fixation duration at the moment of contact. For the second before and after contact, a 2 (accuracy: correct, incorrect) × 36 (fixation transition: each possible transition) × 2 (skill: elite, sub-elite) repeated measures ANOVA was conducted, with repeated measures on the first two factors. The timing of the fixations was analysed using a 2 (accuracy: correct, incorrect) × 20 (timing fixation onset: blocks of 200 ms) × 2 (skill: elite, sub-elite) repeated measures ANOVA with repeated measures on the first two factors. Fixation location was analysed using a 2 (accuracy: correct, incorrect) × 6 (AOI: FC-CZ, FC-NCZ, FR-CZ, FR-NCZ, ball, other) × 2 (skill: elite, sub-elite) repeated measures ANOVA with the first two factors as repeated measures. For the ANOVAs, post-hoc analyses were conducted by performing follow-up t-tests. The remainder of the analyses were exploratory, without specific hypotheses being made about the likely outcomes.
For all statistics, SPSS (IBM version 26) was used, with the significance level set at p < .05. We did not check for outliers, because we felt it was inappropriate to exclude participants given the sample sizes. The assumptions for parametric testing were met for all variables (e.g. normality and homogeneity of the variances). Effect sizes for the independent t-tests were expressed using Pearson's correlation coefficient r, whereas partial eta squared (ηp²) was used for the repeated measures ANOVAs. For both measures, a value of 0.1 represented a small effect, 0.3 a medium effect, and 0.5 (or higher) a large effect. 32 Post hoc power was calculated in SPSS, using 0.8 as the threshold value for high power.
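As a quick consistency check on effect sizes of this kind, the sketch below converts test statistics into r and partial eta squared. The conversions r = sqrt(t² / (t² + df)) and partial eta squared = (F × df1) / (F × df1 + df2) are the conventional ones and are an assumption here, since the text does not state which formulas were applied. The function names are illustrative.

```python
import math


def r_from_t(t: float, df: int) -> float:
    """Effect size r from a t statistic (conventional conversion)."""
    return math.sqrt(t * t / (t * t + df))


def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic (conventional conversion)."""
    return (f * df_effect) / (f * df_effect + df_error)


# Statistics reported in the text for the match comparisons.
r_fouls = r_from_t(0.57, 12)    # fouls per minute
r_passes = r_from_t(0.95, 12)   # passes per minute

# Statistic reported for the skill effect on fixation duration.
eta_duration = partial_eta_squared(23.36, 1, 10)
```

Rounded to two decimals, these reproduce the reported r = .16 and r = .26 for the match comparisons and the reported value of .70 for the skill effect on fixation duration.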

Search rate
The repeated measures ANOVA for the search rate revealed a significant main effect of skill level (F(1,10) = 34.37, p ≤ .001, ηp² = .78, power = .90; Figure 3A), with elite referees performing significantly more fixations per second (M elite ± SD = 2.2 ± 0.06) than the sub-elite referees (M sub-elite ± SD = 1.7 ± 0.04). The main effect of accuracy was not significant though did fall just short of significance (F(1,10) = 4.832, p = .053, ηp² = .33, power = .86; M correct ± SD = 1.9 ± 0.06 per second, M incorrect ± SD = 2.1 ± 0.06) and there was no interaction between accuracy and skill level (F(1,10) = .41, p = .538, ηp² = .04). For the fixation durations, similar results were found. There was a significant main effect for skill level (F(1,10) = 23.36, p ≤ .001, ηp² = .70, power = .63), with the fixations of elite referees being shorter (M elite ± SD = 400 ± 18 ms) than those of the sub-elites (M sub-elite ± SD = 507 ± 12 ms; Figure 3B). There was again no main effect for accuracy though the p-value fell just short of significance (F(1,10) = 3.88, p = .077, ηp² = .28, power = .76) and no interaction (F(1,10) = .07, p = .805, ηp² = .01). The duration of the fixation at the moment of contact (M ± SD = 767 ± 154 ms) was longer than the duration of the remaining fixations (M ± SD = 510 ± 98 ms; t(13) = 8.46, p ≤ .001, r = .92, power = .32; Figure 3). Again, a significant main effect for skill level shows that elite referees had shorter fixations even for the fixation at the moment of contact (M elite ± SD = 667 ± 42 ms vs. M sub-elite ± SD = 815 ± 30 ms; F(1,10) = 8.22, p = .011, ηp² = .45). There was no main effect of accuracy (F(1,10) = .01, p = .918, ηp² = .00), and no interaction between accuracy and skill level (F(1,10) = .06, p = .806, ηp² = .01). Because of the difference in search rate, we compared the frequency of each of the 36 possible fixation transitions (i.e. from one AOI to another, see the matrix in Figure 4 for all possible fixation transitions) to determine whether the higher search rate of the elite referees could be attributed to a higher frequency of specific fixation transitions, or rather because of a faster rate that generalised across all types of fixation transitions. A repeated measures  Figure 4), from the foul committer's contact zone towards the foul receiver's non-contact zone (FC-C to FR-NC), and from the ball towards both the foul committer's contact zone and foul receiver's non-contact zone (ball to FC-C and ball to FR-NC). Transitions from or to "other" areas were rare and almost always the least likely transitions to occur. A main effect for skill level confirmed the higher search rate of the elite referees (F(1,12) = 15.48, p = .002, ηp² = .56, power = .90), but crucially, the interaction between transition and skill level was not significant (F(35,420) = 1.24, p = .171, ηp² = .09, power = .34). The lack of any interaction suggests that the frequency of the different fixation transitions did not differ across the two skill groups, and indeed visual inspection of the pattern of fixation transitions in Figure 4 supports the idea that the most frequent transitions were comparable across the two skill levels.
To investigate when fixations occurred, a repeated measures ANOVA testing the timing of the fixations revealed a significant main effect of timing (F(19,190) = 6.14, p ≤ .001, ηp² = .38, power = 1.00), with the search rate dropping in the final second prior to the moment of contact and staying low for the remainder of the clip (Figure 5). The main effect of skill level was not significant though fell just short of significance (F(1,10) = 4.41, p = .062, ηp² = .31, power = .38), with the trend reflecting the higher search rate already established for the elite referees. A significant interaction between accuracy and timing was found (F (19,190 Figure 6). The effect appears to be mainly driven by an elevated search rate for the elite referees in the first half of the trial when making incorrect decisions. However, given how rare it was for elite referees to make an incorrect decision (n = 9), there is every chance that this finding is a result of chance rather than reflecting a genuine effect of relevance (see also the higher error around the mean in Figure 5 when elite referees made incorrect decisions).

Fixation locations
The repeated measures ANOVA for the total time spent viewing each of the AOIs revealed a main effect for AOI (F(5,50) = 41.27, p ≤ .001, ηp² = .81, power = 1.00; Figure 7). Referees spent more time viewing the foul receiver's contact zone (M ± SD = 786 ± 421 ms) than all other AOIs (p < .05) except the foul committer's contact zone (p = .076). The differences in viewing time between all AOIs were significant (p < .05), except between the ball and the foul receiver's non-contact zone (p = 1.000), the ball and the foul committer's contact zone (p = .167), and the foul committer's non-contact zone and other areas (p = 1.000). However, the time spent allocating gaze towards the areas of interest did not differ between the skill levels of the referees (F(1,10) = 2.06, p = .182, ηp² = .17) or on account of the accuracy of the decision-making (F(1,10) = .15, p = .905, ηp² = .001). There were no significant interaction effects (p ≥ .136).

Gaze entropy
The ANOVA revealed a main effect for accuracy (F(1,10) = 25.01, p ≤ .001, η² = .71, power = 1.00), with higher entropy for correct decisions (M correct ± SD = 4.2 ± 0.2 bits) when compared to incorrect decisions (M incorrect ± SD = 2.9 ± 0.2 bits). Gaze behaviour was more structured when making incorrect decisions. There was no main effect for skill level (F(1,10) = 0.01, p = .911, η² = .00) and no interaction effect between accuracy and skill level (F(1,10) = 3.76, p = .081, η² = .27, power = .72). To further investigate why gaze was less structured when making correct decisions, we determined the likelihood that each fixation transition would occur (i.e. from one AOI to another) and compared it for correct and incorrect decisions (Figure 8). Note that the likelihood of a particular transition occurring (measured as likelihood out of one) is different to the rate at which each fixation transition occurs (measured in transitions per second, see Figure 4) because the likelihood controls for any overall difference in search rate between conditions or groups. The results revealed a somewhat more even distribution of fixation transitions when making correct decisions, that is, gaze transitions were less structured when making correct decisions, demonstrated by a more even spread of yellow shading across the landscape of possible transitions in Figure 8 (with the exception of transitions to and from 'other' areas, which occurred only rarely).
In support, some transitions rarely occurred when making incorrect decisions, with paired t-tests revealing a significantly lower likelihood of transitions (when compared to the correct decisions) from FR-CZ towards both FR-CZ (t(11) = 4.39, p ≤ .001, r = .80, power = .61) and FC-NCZ (t(11) = 3.23, p = .008, r = .70, power = .46), from FC-CZ towards FC-NCZ (t(11) = 3.51, p = .005, r = .73, power = .66), from ball towards ball (t(11) = 2.63, p = .023, r = .62, power = .21) and from other towards FR-CZ (t(11) = 2.93, p = .014, r = .66, power = .51). Although the results must be interpreted with caution given the unplanned nature of these follow-up comparisons, the findings help to explain the more structured gaze behaviour when making incorrect decisions.
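The distinction drawn above between the likelihood of a transition and the rate of a transition can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names and the two made-up fixation sequences are hypothetical, and only the AOI abbreviations are taken from the text. It assumes that likelihood is the count of a transition divided by the total number of transitions, and that rate is the same count divided by the trial duration in seconds.

```python
from collections import Counter


def transition_counts(fixations):
    """Count transitions between consecutive fixations (AOI -> AOI)."""
    return Counter(zip(fixations, fixations[1:]))


def transition_likelihood(fixations):
    """Share of all transitions taken by each AOI pair (values sum to 1)."""
    counts = transition_counts(fixations)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def transition_rate(fixations, duration_s):
    """Transitions per second for each AOI pair."""
    counts = transition_counts(fixations)
    return {pair: n / duration_s for pair, n in counts.items()}


# Two made-up trials of equal length (3 s) following the same gaze cycle.
# The "fast" trial completes the cycle twice, the "slow" trial once.
slow = ["ball", "FC-CZ", "FR-CZ", "ball"]
fast = ["ball", "FC-CZ", "FR-CZ", "ball", "FC-CZ", "FR-CZ", "ball"]

print(transition_likelihood(slow) == transition_likelihood(fast))
print(transition_rate(slow, 3.0)[("ball", "FC-CZ")])
print(transition_rate(fast, 3.0)[("ball", "FC-CZ")])
```

In this toy case the two trials have identical transition likelihoods even though the faster trial has twice the transition rate, which is why the likelihood measure can be compared across groups that differ in overall search rate.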

Discussion
This study examined the visual search behaviour of elite and sub-elite football referees while making foul calls in actual matches. By using mobile eye-tracking, on-field gaze behaviour was recorded for over 150 called fouls. We hypothesised that elite referees would have better decision-making accuracy and would be characterised by distinct gaze behaviour when compared to sub-elite referees, specifically spending more time viewing the critical contact zone where the potential foul occurred. The results revealed no significant difference in the decision-making accuracy of the elite and sub-elite referees. However, the gaze behaviour of the elite referees was characterised by a higher search rate using more fixations of shorter duration. The higher search rate of the elite referees appeared to simply reflect more rapid search rather than a reliance on different sources of information (i.e. allocating gaze towards distinct AOIs) or on any distinct pattern of fixation transitions when compared to the sub-elite referees. Elite and sub-elite referees were both found to decrease their search rate approximately 1 s in advance of the foul situation, suggesting that referees of both skill levels successfully anticipate the upcoming event and modify their search behaviour. Gaze was also found to differ when making correct as opposed to incorrect decisions, with gaze being more structured (i.e. less random) when making incorrect decisions. There was no difference in where gaze was directed when making correct or incorrect decisions, suggesting that it is the ability to interpret what is seen and/or the ability to pick up relevant contextual information in the lead-up to contact that aids decision-making accuracy rather than the ability to direct gaze towards the correct location at contact.
Consistent with previous studies of elite referees, we hypothesised that the elite referees in our study would make more accurate decisions. 15,17,18 Surprisingly, our results did not replicate those findings. Although the decision-making accuracy of the elite referees tended to be higher than that of the sub-elite referees (87.8 vs. 76.1%), the effect failed to reach significance (p = .074 one-way). It is not uncommon to find similar decision-making accuracy between referees of different skill levels, 16,33 but nonetheless the findings were not expected. In an effort to achieve genuine task representativeness, we chose to match the skill levels of the referees with the players (and indeed it would have been incredibly challenging to convince the elite referees to adjudicate lesser games and vice versa). The elite referees officiated games between elite teams and the sub-elite referees officiated games between sub-elite teams. Although we checked game characteristics (number of fouls and pace) between the skill levels, it might be that the elite referees are better decision makers, but that the foul situations in the elite games are more difficult to judge, for example due to the deceptiveness of the tackling players or even those being tackled. 34,35 It may be that this choice has led us to underestimate the skill advantage in the foul assessment task. Furthermore, we isolated the foul situations and analysed them in random order, improving the generalisability within referees and between games but ignoring the potential influence of contextual information and the sequence of fouls during a game. An isolated situation might not in itself be considered a foul by the referee panel, yet within the wider context of the game it could be considered a foul (e.g. if the same player had previously been warned by the referee). By excluding an evaluation of the disciplinary sanction taken by the referee (e.g. yellow card, red card, or no card) and focussing purely on whether a foul took place or not, we have tried to minimise the impact of these contextual influences on the results of our study. For future studies, however, we recommend exploring the wider contextual information as a potential influence on the accuracy of foul judgements, as well as other game-related factors, such as the time of the match, score and position of the referee.
With respect to gaze behaviour, we hypothesised that we would find differences in the visual search behaviour of elite and sub-elite referees. To assess the visual search behaviour, we first assessed the search rate. The results revealed a faster search rate for elite referees, characterised by more fixations of shorter duration. Although this is in line with some results on football players' visual search rate, 36 it too is in contrast to previous studies, which generally find no difference between referees of different skill levels. 15,17,37 It seems unlikely that the results can be attributed to differences in the pace of the elite and sub-elite games. If anything, there was a tendency for there to be fewer passes per minute in the elite matches (elite vs. sub-elite = 3.6 ± 0.8 vs. 5.8 ± 3.7 passes per minute; p = .22), yet the elite referees still maintained a faster search rate. Instead, the elite referees seem to adapt their on-field gaze behaviour in a way that is not detected in off-field video-based tasks for referees, but has been shown in tasks for players before. The faster search rate of the elite referees could reflect faster pickup of visual information, which if confirmed could lead to the recommendation for referee development programmes to focus on increasing the speed of information pickup, for example by training visual search under increased time constraints. Furthermore, a faster search rate could potentially also lead to faster decisions; although decision-making time was not included in the current study, it could be an aspect of performance to include in future studies.
In addition to the search rate, we analysed the change in the rate of the fixations over time relative to the moment of contact. This analysis revealed a sharp reduction in the search rate in the second leading up to the foul that was maintained in the second directly after. Gaze during that period was likely to be directed towards the zone of contact. This finding mirrors that reported by Land, Mennie and Rusted, 38 who examined anticipatory gaze behaviour when making tea and showed that gaze becomes directed towards task-relevant areas of interest in the time period (0.5-0.7 s) leading up to when they are meaningful for execution. In the case of foul judgements by referees, the drop in search rate prior to the moment of contact likely reflects the ability of referees to anticipate the zone of contact before any contact has occurred.
The third aspect of gaze behaviour, examining where referees looked while calling fouls, revealed two key findings. First, our results replicate and confirm that, not surprisingly, referees direct their gaze towards the contact zones (both of the foul committer and foul receiver) irrespective of their skill level when calling a foul. It thus seems both elite and sub-elite referees direct their gaze toward the most informative areas of interest, in line with the information reduction theory. 9 Although the referees differ in skill level, the contact zone seems to be the most intuitive source of information for the foul decision. Second, in contrast to the findings of Spitz et al., 15 the time spent by the elite referees directing gaze towards the contact zone of the foul receiver was no different to that of the sub-elite referees. Elite referees in the Spitz et al. 15 study spent more time viewing the contact zone of the attacker than the sub-elite referees (1330 vs. 1190 ms). Interestingly, our study found referees spend markedly less time directing their gaze towards the contact zones (elites 743 ± 70 ms and sub-elite 834 ± 224 ms). In the Spitz et al. 15 study, referees watched 20 consecutive video-recorded foul situations, with referees aware a potential foul would occur in each instance. This may have resulted in the participants in their study actively searching for the contact zone at an earlier point in time. In contrast, in the current study the on-field referees were less likely to be expecting a foul situation given that those situations occurred at random moments throughout the game. Although our referees should have been alert for foul situations within the match context, they were not primed to know that a potential foul was about to occur, as they would be in a lab situation where they view a series of potential fouls. This result highlights a potential drawback of video-based tests because they may prime referees to be prepared for the call and provide referees with more (visual) information than they may acquire when making the same call on-field.
Finally, our analysis shows that gaze entropy differed between correct and incorrect decisions. Gaze entropy reflects the degree of structure in the gaze pattern, with high entropy reflecting less structure (i.e. more randomness). Our results showed correct decisions to be characterised by higher entropy, suggesting that gaze was less structured (i.e. more random) when making correct decisions and more structured (i.e. less random) for incorrect decisions. This might indicate that referees adjust their gaze to the unique characteristics of the situation when making correct decisions rather than following a stereotypical repetitive pattern of gaze. In contrast, incorrect decisions might be more likely characterised by gaze following a more ordered and predictable structure where gaze is less likely to adapt to the constraints of the situation. An alternative explanation could be that decisions were more likely to be incorrect when there was high task complexity. Gaze entropy is likely to decrease in highly complex situations; 39 for instance, trained fighter pilots show decreases in both their gaze entropy and their flight performance in more complex flight situations. 40 More difficult foul calls might involve higher task complexity that reduces gaze entropy.
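To make the link between entropy and structure concrete, the following Python sketch computes a Shannon entropy in bits over the distribution of AOI-to-AOI transitions. This is an illustrative assumption rather than the study's own computation: the exact entropy measure is defined in the study's method, which is not reproduced in this section, and the function name and both fixation sequences are hypothetical.

```python
import math
from collections import Counter


def transition_entropy_bits(fixations):
    """Shannon entropy (bits) of the distribution of AOI-to-AOI transitions."""
    counts = Counter(zip(fixations, fixations[1:]))
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Made-up sequences with eight transitions each.
# Stereotyped: gaze alternates between two AOIs only (two transition types).
stereotyped = ["ball", "FC-CZ"] * 4 + ["ball"]
# Varied: every one of the eight transitions is a different AOI pair.
varied = ["ball", "FC-CZ", "FR-CZ", "FC-NCZ", "ball",
          "FR-CZ", "FC-CZ", "FC-NCZ", "FR-CZ"]

print(transition_entropy_bits(stereotyped))  # 1.0
print(transition_entropy_bits(varied))       # 3.0
```

Under this assumed definition, a repetitive pattern spreads its transitions over few AOI pairs and yields low entropy, whereas a pattern that uses many different transitions equally often yields high entropy. With six AOIs there are 36 possible transitions, so the upper bound of this particular measure would be log2(36), roughly 5.17 bits.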
An analysis of the likelihood of each fixation transition revealed two findings. First, visual inspection of the pattern of fixation transitions supports the idea that the most frequent transitions were comparable across the two skill levels. This result suggests that the elite referees used a faster search rate that generalised across all possible fixation transitions, rather than their faster search rate being a result of more frequent transitions between specific areas of interest. Second, the transitions also provided potentially interesting information related to the accuracy of the decision made. For correct decisions, referees were more inclined to make transitions from the contact zone (both of the foul committer and receiver) towards the foul committer's non-contact zone. This behaviour could reflect the referees directing their gaze towards the foul committer's face after viewing contact, for example to evaluate the intention of the foul committer, and/or to pick up extra kinematic information about the whole-body movement to better assess the careless, reckless or excessive character of the foul committer's action. In itself, this would be an interesting finding, as this search pattern could leave referees susceptible to deceptive behaviour. 34,35 Additionally, transitions from the contact zones towards the ball were frequent. This could suggest that the ball provides relevant information. Two main explanations can be given at this time. First, the ball could provide important information to assess whether the non-offending team might benefit from the foul, in other words, whether the team has an advantage. In at least some of those cases the referee might decide to play on to give an advantage to the attacking team rather than stopping the play by calling a foul. Second, in some situations, the location of contact might be insufficient to make an accurate decision, for example when the viewing angle or distance of the referee relative to the foul situation is sub-optimal. In these cases, visual information from the ball could be indicative of whether the foul committer contacted the ball. Although the results must be interpreted with caution given the unplanned nature of these comparisons, they provide testable hypotheses for future studies when measuring gaze behaviour and/or employing interviews to identify referees' explicit search strategies.
The current study was unique in that it analysed the visual search behaviour of elite football referees, on the field, while making decisions in real game situations, but this type of in-situ design is not without limitation. First, it is almost inevitable that some data cannot be included in the final analysis in eye-tracking studies conducted in a natural environment. Indeed that was the case in our study, with about 30% of our possible eye-tracking recording time excluded from analyses due to low-quality footage (we used 915 out of a possible 1260 min of recorded footage). This is an inevitable consequence given that we were recording outdoors in daylight and with referees running whilst wearing technical equipment. Lessons learned from this study, amongst others, are to avoid direct sunlight wherever possible and to plan extra breaks during the testing to check for mechanical faults. Second, despite our best efforts to simulate a representative task by organising friendly football games, it could be argued that the games were still artificially organised for research purposes and therefore incongruent with real competitive matches. Therefore, the next step would be to analyse gaze behaviour in actual competition games, although this raises other challenges, not least the international rules, which state that the referee is not allowed to wear (electronic) equipment other than to communicate with other game officials. Some form of exemption would be required to overcome this potential barrier. Given the challenges in recruiting elite referees and staging games, our sample size (14 participants in total) was smaller than what might have been customary in lab-based studies. Nevertheless, the number of participants is in line with or in excess of many existing and well-known and/or cited in-situ studies of gaze behaviour. 18,41-45 Moreover, the skill level of our elite group was extremely high, with each of the referees amongst the top-30 referees in the country. The large effect sizes we reported, combined with the significant differences and the absence of outliers in the groups, make us confident that our results would generalise to larger samples. Another limitation is that we were only able to confidently analyse the situations where the referee actually called a foul, and not situations where any potential foul was not called by the referee. The main reason is that the video footage was limited to that from the head-mounted camera in the eye-tracker, meaning that there were situations where the referee was not looking at the potential foul situations or that a foul might not have been called because the referee was not in a favourable position to view it. To consider potential fouls, external video recordings of the football game are desirable and recommended in future studies. Finally, although the main focus in this study was on eye movements, head movements are also associated with the acquisition of visual information (see for example Jordet et al. 46). Therefore, future work is recommended to extend the analysis of visual search with head movements as well.
In summary, the current study examined the on-field visual search behaviour of elite and sub-elite football referees, revealing elite referees to rely on a higher visual search rate when adjudicating games matched to their skill level. The gaze of the elite referees was characterised by more fixations of shorter duration, although both groups spent similar amounts of time watching the contact zones of the players involved. Considering the relatively equal capability of the elite and sub-elite referees to recognise and anticipate the contact zones, development programmes might focus not only on where to allocate attention to gain sufficient visual information, but also on the interpretation of this information. Additionally, the correct decisions were characterised by higher entropy, suggesting that correct decisions may be more likely when referees adapt their gaze to the specific situation.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability
The data that support the findings of this study are available from the corresponding author, T. van Biemen, upon reasonable request.

Note
* Referees do indeed decide to blow their whistle when calling a foul, but the act of moving a whistle to the mouth is not coupled to the movements of others.