View preference in urban environments

With people spending up to 90% of their time indoors, windows and the visual connection that they afford to the outside can play an important role in ensuring physical and psychological well-being. This is particularly relevant in urban settings, which form a substantial part of our lives yet remain significantly understudied. What we know from other environments may not translate to cities, and there may be important differences between the expressed preferences of individuals and their behaviour. Therefore, this study sought to define suitable methods and metrics to measure view preference in urban environments. Participants were asked to observe urban views whilst three types of data were collected: subjective preference ratings, eye-tracking measures and verbal reasoning. We found that when views were preferred, the gaze of the observers was more exploratory, with a higher number of fixations and saccades. In addition, participants tended to prefer the presence of people, well-maintained buildings and orderly presented colours. A new link was revealed between the degree of visual exploration and the preference rating of a visual scene. This characteristic pattern of oculomotor behaviour may guide the criteria for framing selected views and accordingly inform window design in buildings.


Introduction
We have become an indoor species; 1 since we spend most of our lives enclosed within buildings, windows often provide our primary means of connection to the outside world. 2,3 This situation is exacerbated by various phenomena that are shifting the use patterns of our built spaces, including demographic trends (e.g. an increasingly ageing population) but also technological and socio-cultural developments. Information technology and the internet of things have enabled new live-work practices, whilst the shortage of floor space in urban centres has encouraged the redefinition of typical office layouts, moving towards open workspaces (which might increase the distance between desks and windows) and hot desking. Since there are many parameters that may influence job productivity, 4 the possibility of choosing one's working location might render the presence of windows increasingly important. The social distancing practices imposed during the pandemic that swept the globe in 2020, under which millions of people were confined to their homes, are another testament to the importance of windows, and the view out of them, within a space.
Several studies have sought to understand the importance of window views. These studies have suggested that visual exposure to certain elements within a view can lead to improvements in human health, 5,6 task performance and working memory, 7,8 attention restoration, 9 visual comfort and glare sensation, [10][11][12][13][14][15] job satisfaction and an enhanced general perception of well-being. 16 Given that people spend up to 90% of their time inside buildings, 1 most often in cities, 17 research on visual preferences within the built environment can help to address several environmental and social challenges that cities present, particularly those relating to occupant health and well-being. 18 As such, whilst the restorative value of natural views has been well-documented, 19,20 there is a substantial need for research exploring visual preference in urban environments. [21][22][23] Research on view preferences has mostly focused on evaluating differences between natural and urban views [20][21][22][23][24] and on perceived restorativeness. 25,26 However, what we know about natural views has only limited applicability to urban environments. This presents a gap with direct impact on the response towards the current challenges of fast-growing urbanisation.
In the design of buildings, several rating tools feature window views amongst the criteria used in their assessment process. 27,28 The European Daylighting Standard EN 17037 recommends that the view be 'aesthetically pleasing', 29 with the aesthetic value depending upon building (e.g. complexity, maintenance, age) and environment-related phenomena (e.g. location, time, weather, nature, people). However, although numerous parameters such as view depth, content and dynamism have been suggested to appraise view evaluation, 30 no consistent methodology currently exists to measure the aesthetic preference of outdoor views, particularly in an urban environment. A holistic method to measure view evaluation would largely benefit the application of design principles, beyond just view content and quality, 10 to more comprehensively define view preference in our living and working settings. 31 Empirical research on subjective preferences presents, nevertheless, significant challenges. 32 This difficulty is rooted in the diversity of subjective preferences that contribute to the aesthetic experience. 33 These conditions often result in researchers having to adopt either a reductionist or a qualitative approach, 34 hence offering only a partial understanding of the complex underlying principles driving view preference.
Different physiological and psychological correlates of environmental preference, including brain activity 35 and internal cognitive processing mechanisms driving attention, 36,37 have been used in environmental perception studies. Research has found activation of specific brain regions in response to urban environments corresponding to effortful attention. 35 Visual attention is considered a precursor to choice in unfamiliar environments and is known to be directed to goal-relevant stimuli within the scene. 36 In this context, eye movements instantaneously reflect cognitive and decision-making processes 38 and could, therefore, represent a robust means to investigate visual preferences in urban environments in addition to more traditional subjective methods (see Section 2.1 for further details on the utility of this approach).
In response to these factors, this study was designed to explore the extent to which subjective preference ratings concur with the selection of, and the reasons for, preferred views, and if such ratings are associated with characteristic patterns of gaze behaviour. More specifically, to identify the key factors that influence view preference in an urban environment, this study aims to respond to the following questions: (1) What features or elements are preferred in urban views? (2) Are there characteristic patterns of gaze behaviour associated with visual preference in urban views?
Three types of data were gathered to gain insight into the mechanisms underlying viewers' preference 39 for urban scenes: subjective preference ratings of views; eye-tracking measures (ETMs) taken during scene viewing; and, qualitative reasoning for preferences. The research materials and methods used in this study are presented in the following section.

Study design
The research design addressed the 'comparability of questions' by simultaneously collecting quantitative and qualitative data on view preference. 40 An objective record of where the viewer is looking in an environment can be obtained from recording eye movements via ETMs. Within natural scene perception, humans selectively seek out information in the visual environment pertinent to perceptual, cognitive or behavioural goals. 41 Like many other species, humans use a saccade and fixate strategy to sample visual scenes. Saccades are ballistic eye movements around the image at a rate of two to three occurrences per second. Between saccades, the position of the eye is comparatively steady, and during these periods, known as fixations, visual information is obtained for processing by the brain. 42 Two of the earliest eye movement studies using pictures of natural scenes identified that fixations are clustered around areas of interest, 43,44 generally called centres of attention, and that eye-gaze patterns change depending on the viewer's task and demands of attention. 43,45 As such, we fixate on what we are giving attention to. 46 A positive correlation between the number of saccades and elicited interest is documented in the literature. 47 Studies focusing on consumer behaviour have also helped to describe the relationship between visual attention and preference choices. A marketing packaging study revealed that the time spent attending visually to a product predicts whether we buy it or not. 48 However, whilst these studies provide an important bridge between preferences and visual attention, they may not consistently apply to real-world urban views, where choices in the visual field might be more complex.
To study the relationship between gaze behaviour and preference whilst sampling urban scenes, eye position can be recorded via ETM techniques, graphically visualising the changes (e.g. heat maps (HMs)) or extracting numerical data for quantitative analysis. 45 HMs, also known as attention maps, allow visualising the general distribution of gaze points across an image, using colours to depict the time spent on different locations. 45 In our study, we used ETM techniques to measure several oculometric response variables: number of fixations; number of saccades; mean fixation duration; mean saccade amplitude; and, mean saccade duration.

Participants
The study received ethics approval and was compliant with the requirements of the General Data Protection Regulation. 49 Thirty-two participants were recruited from university students and staff, using convenience sampling, during April-May 2019. This sample size is consistent with previous studies on human perception of visual comfort. 11,50,51 As suggested in the literature, 52 samples of this size have allowed the detection of effects of large magnitude (e.g. Cohen's d > 0.8), based on established benchmarks. 53 A majority of participants identified themselves as female (65.6%). All participants had normal or corrected-to-normal visual acuity and, when questioned, reported no history of ocular ill health, nor any developmental or pathological visual disorder. Participants were naive to the purpose of the experiment and provided written informed consent for their data to be collected and analysed. Since the literature suggests that each individual has a single dominant eye, 54 and eye movements were to be recorded monocularly, each participant's dominant eye was determined using the 'hole-in-card' test. 55 Twenty-five participants (78.2%) were right-eye dominant.

Visual stimuli
Photo-based materials are widely used as stimuli in landscape preference studies 13,34,56 because they allow for control of confounding variables, hence reducing the effect of contextual factors (e.g. the weather), 57 although some research has suggested that participants can discount these factors when evaluating view quality. 31 As such, 40 images of real-world views depicting urban elements including buildings, streets and facades were used as the visual stimuli in this study. These were acquired from the open access McGill Calibrated Colour Image Database. 58 Given the study's focus within urban environments, images with prominent naturalistic elements, such as trees and water bodies, were deliberately excluded from this experiment. However, the sky and some naturalistic elements (e.g. branches) and urban landscaping features integrated into buildings (e.g. vegetated balconies) were inevitably present in some views. The use of pre-published photographs, selected from a larger data set, minimised researcher bias whilst issues of variability in image quality were removed by using pre-calibrated images. The details of the calibration process have been documented by the authors of the image database. 58

Experimental apparatus and procedure
Each participant was required to take part in an experiment during normal office hours. The procedure, lasting 60 minutes, required participants to provide preference ratings of urban scenes (preference rating task), whilst their eye movements were tracked, and to offer qualitative verbal reasoning for their evaluations (pile sorting task). For the preference rating task, Adobe Photoshop CS6 was used to linearly re-scale the 40 images to 1024 × 768 pixels. To enable qualitative appraisal of the images through pile sorting, the images were printed on matt photographic paper (101.6 × 152.4 mm).

Preference rating task
The visual stimuli were presented using the PsychoPy 2.0 software package 59,60 and displayed on a 20-inch calibrated Cathode Ray Tube (CRT) monitor (Iiyama Vision Master Pro 514; resolution 1024 × 768 pixels, background luminance 45 cd/m²). The luminance response of the monitor was linearised with respect to the digital representation of the image, and 14-bit resolution was obtained with a Bits++ stimulus processor (CRS, Cambridge, UK). Images occupied the full extent of the monitor.
Monocular eye movements were recorded at 500 Hz with an Eyelink 1000 infrared eye tracker (SR Research Ltd, Ontario, Canada). Raw gaze positions were converted to degrees of visual angle using the data from a nine-point spatial calibration procedure carried out at the beginning of each block of evaluations. To reduce participant fatigue and maintain experimental accuracy, the eye tracker was recalibrated at approximately 10 minute intervals. 61 This also allowed the participant to have a break, and therefore to maximise the number of stimulus presentations with no blinks. Each participant completed one practice block followed by four experimental blocks, each featuring 10 images. Discounting the data from the practice block, 40 responses were gathered per participant.
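The conversion from screen coordinates to visual angle can be illustrated with basic trigonometry. The following is a minimal sketch under stated assumptions, not the procedure implemented by the eye-tracker software: the function name `pixels_to_degrees` and the pixel pitch are hypothetical (the physical size of the viewable screen area is not reported), whereas the 0.655 m viewing distance and the 1024 × 768 resolution are taken from the study.

```python
import math

VIEWING_DISTANCE_M = 0.655      # viewing distance reported in the study
SCREEN_CENTRE_PX = (512, 384)   # centre of a 1024 x 768 display
PIXEL_PITCH_M = 0.0004          # ASSUMPTION: 0.4 mm per pixel, illustrative only


def pixels_to_degrees(x_px, y_px, pitch_m=PIXEL_PITCH_M,
                      distance_m=VIEWING_DISTANCE_M,
                      centre_px=SCREEN_CENTRE_PX):
    """Convert a gaze position in pixels to (horizontal, vertical) visual angle.

    Angles are in degrees relative to the screen centre. The sketch assumes
    a flat screen perpendicular to the line of sight, with the eye aligned
    with the screen centre.
    """
    dx_m = (x_px - centre_px[0]) * pitch_m   # horizontal offset in metres
    dy_m = (y_px - centre_px[1]) * pitch_m   # vertical offset in metres
    return (math.degrees(math.atan2(dx_m, distance_m)),
            math.degrees(math.atan2(dy_m, distance_m)))
```

In practice the mapping is derived from the nine-point calibration rather than from nominal screen geometry, so this sketch only conveys the geometric relationship between pixel offsets and visual angle.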
A balanced Latin square design 52 was used to determine the sequence of blocks. Every block always featured the same 10 distinct images, and within each block the order of presentation of the 10 images was randomised for each participant. Each image was only presented once throughout the entire experiment. Participants were asked to observe the scene on the screen as if it were a real view and provide a rating of view preference, this being defined as 'how much you like the scene for whatever reason you may have'. This definition derives from previous studies on environmental preference, 24,62,63 where 'preference' was used in its noun form for expressing how much the participants 'liked' a view.
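The block sequencing described above can be sketched as follows. This is an illustrative construction of a balanced Latin square for an even number of blocks, not the authors' own script: the assignment of participants to rows by index, the grouping of image indices 0 to 39 into consecutive blocks, and the function names are all assumptions.

```python
import random


def balanced_latin_square(n):
    """Return n block sequences forming a balanced Latin square (n even)."""
    if n % 2 != 0:
        raise ValueError("this construction requires an even number of blocks")
    # The first row follows the pattern 0, 1, n-1, 2, n-2, ...
    first, low, high = [0], 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Each further row shifts every entry of the first row by one (modulo n).
    return [[(block + shift) % n for block in first] for shift in range(n)]


def presentation_order(participant_index, images_per_block=10, n_blocks=4, seed=None):
    """Block order taken from the square; image order shuffled within each block."""
    rng = random.Random(seed)
    square = balanced_latin_square(n_blocks)
    order = []
    # ASSUMPTION: participants cycle through the rows of the square in turn.
    for block in square[participant_index % n_blocks]:
        images = list(range(block * images_per_block, (block + 1) * images_per_block))
        rng.shuffle(images)   # randomise the 10 images within the block
        order.append((block, images))
    return order
```

With four blocks, every block appears once in each serial position and each block immediately follows every other block exactly once across the four sequences, which is the property that counterbalances first-order carry-over effects.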
The participant sat in an electrically lit room (2.26 m × 3.96 m) (Figure 1(a)) and the viewing position was secured using a chin and forehead rest at a viewing distance of 0.655 m from the CRT monitor (Figure 1(b)). At the onset of the experiment, each participant was asked to read a set of instructions on the screen, including the above definition of preference, the meaning of rating scales and an explanation of the experimental procedure. During the experiment, the researcher was present in the same room as the participant but remained out of the field of view.
A white central fixation dot on a uniform grey background, which functioned as a fixation trigger, started each block of evaluations. Participants fixated on the dot in the centre of the screen and were instructed to press any key on the keyboard to trigger the stimulus sequence. Each view was individually presented for 15 seconds, 20,25,[64][65][66] after which the scene was replaced automatically by an evaluation screen where participants could give their preference rating. Evaluations were given using a visual analogue scale (VAS), where participants had to indicate with a mouse click the point on the scale that they felt best represented how much they liked the scene. 67 The scale comprised a horizontal line anchored by two descriptors at each end, 'Least Preferred' and 'Most Preferred', each also marked by the values of 0 and 1 to offer a numerical reference, without any gradation marks in between. No slider was present on the VAS to avoid a central anchor bias. Once their rating had been given, participants could press any key on the keyboard to restart the presentation sequence.

Pile sorting task
To address the potential limitations of subjective evaluations, 31,68,69 and to identify characteristic features of views that could mediate individual preferences, a pile-sorting task was used. Participants were presented with photographic cards featuring the 40 views previously seen on the CRT monitor. They were then asked to sort their three most preferred and their three least preferred views from the randomly ordered pile and to discuss these selections, verbally explaining the reasons for their choices.

Data analysis
Subjective data were anonymised using a participant ID. All dependent variables, preference ratings and oculomotor metrics, were associated with an image ID for further analyses using SPSS Statistics version 25.0 (SPSS, Chicago, IL).

Preference rating and ETMs
Data derived from the VAS were tabulated against each photograph, creating a matrix of 32 × 40 preference ratings.
Eye-tracking data files were imported into MATLAB 70 to analyse gaze position. An open-source software for event detection 71 was adopted to derive unbiased ETMs for quantitative analysis. HMs were generated by weighting the cumulative number of fixations with time and superimposing them on the images in MATLAB.
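The idea of weighting fixations by time can be illustrated with a coarse accumulation grid. This is a simplified Python sketch, not the MATLAB routine used in the study: the function name, the 64-pixel cell size and the tuple layout of the fixation records are assumptions, and no smoothing is applied.

```python
def duration_weighted_map(fixations, width=1024, height=768, cell=64):
    """Accumulate fixation durations on a coarse grid.

    fixations: iterable of (x_px, y_px, duration_s) tuples.
    Returns a list of rows; each entry holds the total fixation time
    (in seconds) that fell within that grid cell.
    """
    n_cols = -(-width // cell)    # ceiling division
    n_rows = -(-height // cell)
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for x, y, duration in fixations:
        if 0 <= x < width and 0 <= y < height:   # ignore off-screen fixations
            grid[int(y // cell)][int(x // cell)] += duration
    return grid
```

Mapping the cell totals to a colour scale and overlaying them on the image would give a basic attention map; published heat maps are usually smoothed as well, which this sketch omits.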

Statistical tests
For each ETM, a further 32 × 40 matrix was created, mean averages were calculated, and individual measures were prepared for statistical analysis in SPSS. Each ETM was tested to ascertain whether it met the conditions for parametric analysis (normality, homogeneity of variances, linearity and independence). Natural logarithmic transformations were performed if data were non-normally distributed, and normality was then tested a second time. Since the variable mean fixation duration (raw data and log-transformed) violated the assumptions for parametric analysis, non-parametric Mann-Whitney tests were used to analyse the raw data. All other responses (number of fixations, number of saccades, mean saccade amplitude and mean saccade duration) did not violate the conditions for parametric analysis; therefore, t-tests were performed on these data.
For the evaluations provided by participants, the experimental data were divided into two groups. Since the VAS yielded a preference rating between 0 and 1, the mean value of preference ratings across all participants was calculated, and this threshold value was set as a cut-off point. A continuous dependent variable (preference rating) was, therefore, converted into a binary variable (most preferred/least preferred) in order to measure the relationship between subjective preference and ETMs.
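The dichotomisation step can be sketched in a few lines. This is a minimal illustration, assuming per-image mean ratings as input; the function name `mean_split` and the handling of ratings exactly equal to the cut-off are assumptions, since the study does not state how ties were treated.

```python
from statistics import mean


def mean_split(mean_ratings):
    """Label each image 'most preferred' or 'least preferred'.

    mean_ratings: dict mapping image ID -> mean VAS rating (0 to 1).
    The cut-off is the mean of the per-image mean ratings.
    """
    cutoff = mean(mean_ratings.values())
    # ASSUMPTION: a rating exactly equal to the cut-off counts as 'least preferred'.
    labels = {
        image_id: "most preferred" if rating > cutoff else "least preferred"
        for image_id, rating in mean_ratings.items()
    }
    return cutoff, labels
```

Because every participant rated every image, the mean of the per-image means equals the mean of all individual ratings, so either can serve as the threshold.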
Two indicators were used for the statistical analysis of eye-tracking data: the significance of statistical tests at a level of 0.05 and the effect size. To estimate the practical relevance of the differences detected in each ETM for 'most preferred' and 'least preferred' views for parametric t-tests, Cohen's d values of 0.2, 0.5 and 0.8 were used as benchmarks of, respectively, small, medium and large effect sizes. 53 The Pearson's r coefficient was used to estimate the magnitude of the differences detected in the non-parametric statistical tests, calculated from the standardised z scores. Values of r range between small (0.10 ≤ r < 0.30), medium (0.30 ≤ r < 0.50) and large (r ≥ 0.50) effects. 72,73 For d < 0.2 and r < 0.1, effects were considered of negligible magnitude and, therefore, not practically relevant.
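The effect size benchmarks above can be expressed compactly. The sketch below assumes the pooled-standard-deviation form of Cohen's d and the conversion r = z / √N for non-parametric tests; these are common definitions, but the exact formulas used in the study are not stated, and the function names are illustrative.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def r_from_z(z, n_total):
    """Effect size r from a standardised z score: r = z / sqrt(N)."""
    return z / math.sqrt(n_total)


def label_effect(value, small, medium, large):
    """Classify the magnitude of an effect size against three benchmarks."""
    size = abs(value)
    if size < small:
        return "negligible"
    if size < medium:
        return "small"
    if size < large:
        return "medium"
    return "large"


def label_d(d):
    return label_effect(d, 0.2, 0.5, 0.8)   # benchmarks for Cohen's d


def label_r(r):
    return label_effect(r, 0.1, 0.3, 0.5)   # benchmarks for Pearson's r
```

Under these benchmarks, a value such as d = 0.307 or r = −0.225 is classified as a small effect, whereas d = 0.056 is negligible.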

Qualitative data
The three most preferred and three least preferred images were identified for each participant within the 32 × 40 matrix of preference ratings. This step afforded a direct comparison between quantitative and qualitative data. The verbal reasoning data associated with each image selected in the pile-sorting task were coded as 'least preferred' and 'most preferred' using NVivo-12. 74 Using the 'word frequency count' feature in NVivo-12, a list of words most frequently used to describe the images was populated. This list was further subdivided into: (a) elements in the views (i.e. specific nouns) and (b) adjectives (descriptors). These were finally organised in word frequency tables.
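A word frequency count of this kind can be approximated with the Python standard library. The sketch below is only an analogue of the NVivo-12 feature, not a reproduction of it: the stop-word list and the tokenisation rule are assumptions chosen for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: a small illustrative stop-word list, not the one used in the study.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
              "you", "on", "at", "than", "more", "no", "or", "with", "this"}


def word_frequencies(comments, stop_words=STOP_WORDS):
    """Count lower-cased words across a list of free-text comments."""
    counts = Counter()
    for comment in comments:
        words = re.findall(r"[a-z']+", comment.lower())   # simple tokenisation
        counts.update(word for word in words if word not in stop_words)
    return counts
```

Calling `word_frequencies(comments).most_common(10)` on the comments coded as 'most preferred' and again on those coded as 'least preferred' would yield two ranked lists, which could then be split manually into nouns and adjectives.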

Preference ratings
Results from the Shapiro-Wilk test (p = 0.878) and visual inspection of histograms and Q-Q plots confirmed that the preference rating data were normally distributed. Tables 1a-1d feature the 40 images, together with their ID number, the mean preference rating given by the participants and their standard deviation. Across the entire sample of images, the 40 views had a mean preference rating of 0.45 (SD = 0.115, range 0.182-0.669).
A test for internal consistency of view preference was performed to evaluate the reliability of the preference rating. 75 To do this, all measurements were randomly split into two halves and the mean preference rating for each of the 40 images was calculated based on each half sample. The two sets of 40 mean-per-setting scores were inter-correlated for preference using a Spearman-Brown reliability test. This resulted in a test coefficient of 0.906 76,77 suggesting high reliability.
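The split-half procedure can be sketched as follows. This is a minimal illustration assuming that the random split is made across participants and that the two sets of per-image means are correlated with Pearson's r before applying the Spearman-Brown correction, r_SB = 2r / (1 + r); the function names and the seed are hypothetical.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Step the half-sample correlation up to full-sample reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(ratings, seed=0):
    """ratings: list of per-participant lists, one rating per image."""
    rng = random.Random(seed)
    rows = list(ratings)
    rng.shuffle(rows)                       # ASSUMPTION: split by participant
    half = len(rows) // 2
    first, second = rows[:half], rows[half:]
    means_first = [mean(col) for col in zip(*first)]    # per-image means, half 1
    means_second = [mean(col) for col in zip(*second)]  # per-image means, half 2
    return spearman_brown(pearson_r(means_first, means_second))
```

Because the result depends on the particular random split, repeating the procedure over many seeds and averaging would give a more stable estimate than a single split.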
A comparison was made between the most and least preferred views given in the preference rating task and the participants' pile-sorting selection of most preferred and least preferred views (individual participant's selection of views is listed in Appendix 1). These two tasks produced very consistent results, with images i34, i36 and i28 being the three most frequently preferred views in both the preference rating and the pile-sorting task (Table 2). Similarly, images i1, i10 and i12 resulted, with similar frequencies, as the three least preferred images in both tasks.

Gaze behaviour characteristics
To test whether view preference was associated with a particular pattern of oculomotor response, the lowest and highest rated images (i.e. least preferred and most preferred) were analysed using ETMs to search for significant and practically relevant differences between views.

Heat maps and foci of attention
In order to identify foci of attention in the views, HMs were plotted for each image. The three most frequently selected images, for both the most preferred and the least preferred categories, across the entire data set are presented in Figure 2.
Viewing behaviour was observed to be different between the least and most preferred images. Figure 2(a) shows the HMs superimposed on the least preferred views. The yellow, green and blue colours represent, in descending order of occurrence, the number of gaze points that were directed towards specific parts of each image. These maps are characterised by scattered hotspots. For example, in image i10, attention is drawn towards the various faces of people printed on the posters.
In Figure 2(b), presenting the HMs relative to the most preferred views, the majority of hotspots are formed towards the centre of the image and on textual information. No other centres of attention are evident in these most preferred images.
These differences in gaze behaviour seem to suggest that, in least preferred views, the gaze becomes more focused on specific, and more clearly defined, centres of attention.

Eye-tracking measures
ETMs provided us with the opportunity to determine if there was a characteristic oculomotor signature associated with subjective preference ratings. Figure 3 presents an illustration of the raw ETMs for view preference, comparing least and most preferred views (n = 40), as explained in Section 2.5.2. Levene's test indicated equal variances between data groups (F = 0.812, p = 0.373). The t-tests showed that the number of fixations (Figure 3 [...] inspection of Figure 3(c) suggests that mean fixation duration might be longer in least preferred views than in most preferred views. Statistical analysis of the data via a non-parametric Mann-Whitney test (as data were not normally distributed) revealed no statistically significant differences (U = 126, p = 0.153), although a practically relevant effect size of small magnitude could be detected (r = −0.225). A similar result was obtained when comparing least and most preferred views for mean saccade duration (t(38) = 0.876, p = 0.387, d = 0.307) (Figure 3(d)). Finally, no significant (p = 0.866) or practically relevant (d = 0.056) differences could be detected between least and most preferred views for mean saccade amplitude (t(38) = 1.170) (Figure 3(e)).
Figure 2 Three most frequently selected images in the least (a) and most (b) preferred categories and heat maps of eye-tracking data recorded over 15 seconds
A statistically significant inverse relationship was found between number of fixations and mean fixation duration (r = −0.843, n = 40, p < 0.001), as presented in [...]. The statistically significant differences in number of fixations in most preferred and least preferred views lead us to postulate that most preferred views engender more gaze movements. A Pearson product-moment correlation was run to determine the relationship between number of fixations and preference rating for each view (Figure 5). The correlation between these variables was not statistically significant (r = 0.102, n = 40, p = 0.529). The three most frequently selected most preferred views (i36, i34, i28) scored highly in preference ratings but evoked a lower number of fixations. These three views also produced atypical verbal descriptors from the participants, as discussed in the following section.

Verbalisation of view preference
An analysis of the verbal reasoning data from the pile sorting task revealed that each view was associated with specific words. Although participants were only asked to explain the reasons for their three least and three most preferred views, different words appeared in the word frequency counts to identify the elements in the views (Table 3).
A comparison was drawn between the words used to identify elements in the most and least preferred views. In terms of key recognisable elements, participants identified in both groups the presence of: 'building', 'colours', 'graffiti', 'people' and 'window'. 'Car' and 'traffic' were never mentioned when describing the most preferred views but were frequently cited in the least preferred views (16 and 7 times, respectively). 'People' occurred more often in the most preferred views (18 times vs. 11 times in least preferred) whilst 'graffiti' was mostly associated with least preferred views (19 occurrences vs. 4 for most preferred). The presence of a 'building' in a view was mentioned almost equally (37 and 38 times) in the two groups. The word 'window' occurred both in the most (11 times) and in the least preferred (17 times) views. For example, when observing image i12, participants focused on the window and, in their verbal descriptions, suggested an interest in seeking more information from people's presence or activities inside the spaces, e.g. 'would have liked to see what was inside', 'you can't see any people in them' and 'there should have been curtains or lighting through the windows'. It is worth noting that participants often referred to the physical characteristics of buildings, such as their level of maintenance or lack thereof, which might in turn have affected the overall preference rating of the view. In this respect, when coloured graffiti appeared on a wall without any pattern or order, its presence was associated with a lack of maintenance in least preferred views, e.g. 'colours thrown on the wall, no pattern or regularity' (i26) and 'more of a tag than art work' (i23). Conversely, in most preferred views, an orderly presentation of colour on a wall was positively assessed by participants, who suggested, for example, that 'with the graffiti at this distance you can see the order' (i28).
The adjectives used to describe the least preferred and most preferred views were also collected. Table 4 lists the most frequently used descriptors to characterise the views in both groups. The least preferred views have been depicted with words such as 'unsafe', 'nothing', 'don't' and 'depressing', possibly entailing that the scenes did not afford potential exploration. Conversely, the words more often used to describe the most preferred views included 'clean', 'good' and 'relaxed', possibly implying desirability and interest in a view, or scenes that had more to offer to the participants.
Additionally, we looked at the words more frequently used to describe the three least preferred views (images i1, i10 and i12) and the three most preferred views (images i36, i34 and i28) based on mean preference rating in order to explore the presence of specific points of interest. These are presented in Table 5 and differ from the words listed in Table 3. In the three least preferred views, we found mentions of the 'picture' and the 'posters' (image i10), 11 and 8 times, respectively. Furthermore, the words 'dirty' and 'abandoned' (mentioned 6 and 4 times) were used to describe these scenes. For the most preferred views, the words 'natural', 'plants', 'flowers' and 'organic' were used 8, 6, 3 and 2 times, respectively. Even though the most prominent feature of image i28 is a row of bikes, participants noticed the 'organic' pattern on the wall. Similarly, in images i34 and i36, participants particularly noticed the presence of naturalistic elements in the scene, mentioning the 'plants' and 'flowers' outside the building and on the balcony. Figure 5 shows that three views (i36, i34, i28) had a lower number of fixations despite being highly rated on preference. It should be recalled here that the sample of images used in this study featured urban scenes, yet the verbal descriptors evoked by the participants referred to naturalistic properties, as described in Section 3.3 and illustrated in Table 5.

Quantitative analysis without naturalistic elements
To study whether the presence of naturalistic elements had any impact on our study, these three images (i36, i34, i28) were removed from the sample of 40 views and the statistical analysis of ETMs data for view preference was run again. In fact, among all the 40 views of the data set, i36, i34 and i28 were the only images for which participants explicitly mentioned naturalistic features of views in their verbalisations.
Using the reduced data set of 37 images, a Pearson correlation (two-tailed test) was also run to determine the relationship between the number of fixations and the preference rating (Figure 6). A positive correlation of medium strength was observed between these variables (r = 0.42, n = 37, p = 0.009). This indicates that when the views with naturalistic elements were removed from the sample of urban views, the relationship between oculometric and preference data was stronger.
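Re-running the correlation on a reduced set of images can be sketched as below. This is an illustrative helper rather than the SPSS procedure used in the study: the data layout (image ID mapped to a pair of number of fixations and mean preference rating) and the function names are assumptions. The t statistic follows the standard relation t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def correlation_excluding(data, exclude=()):
    """Correlate fixation counts with ratings after dropping selected images.

    data: dict mapping image ID -> (number_of_fixations, mean_rating).
    Returns (r, n) for the images that remain.
    """
    kept = [pair for image_id, pair in data.items() if image_id not in exclude]
    fixations = [pair[0] for pair in kept]
    ratings = [pair[1] for pair in kept]
    return pearson_r(fixations, ratings), len(kept)
```

For the reported r = 0.42 and n = 37, `t_from_r` gives t ≈ 2.74 on 35 degrees of freedom; the corresponding p value would be read from a t distribution, which this sketch does not compute.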

Discussion
Our initial main finding was related to the content in most preferred and least preferred views. Consistent with the results of previous eye-tracking studies, 45,78 we observed that a participant's overt visual attention was attracted by people's faces and their activities, e.g. when participants looked inside windows. In fact, since people might be powerful distractors, 79 they have often been excluded in sample images for naturalistic landscape preference studies. 57 However, in urban environments, the presence of people is relatively common and unavoidable, and it is usually associated with a positive evaluation of the scene. 80 Secondly, distinct features of the built environment captured attention. We found gaze hotspots on alphanumeric content (images i29, i28, i40), for example, signage and number plates, in line with previous studies where participants spent more time looking at the text than the picture part of advertisements 81 (HMs for all images are included in the Supplementary Material). This might reflect the fact that participants encode much more information (per fixation) from a pictorial than a textual representation and, therefore, need to spend more time reading a text to make sense of it. Buildings were, unsurprisingly, often mentioned in the verbalisation of preference ratings. However, their preference was often associated with their level of perceived care and maintenance. For example, when the presence of graffiti was seen as a result of vandalism, buildings were considered as less cared for. Views containing a variety of information, with colourful patterns and differentiated facades, were preferred more than those with less information due to the degree of complexity offered.
Figure 6 Relationship between number of fixations and preference rating for reduced data set
Conversely, windows not affording a clear vision to the inside -that is, where further information could not be obtained about the environment behind an opening -led to reduced preference. Just observing a window in the wall might not add value to the visual preference, although based on other research, 82 this may still contribute to enhance the complexity of the view.
In order to investigate which ETMs better correspond to visual preference, we recorded a range of oculomotor parameters. We suggest that gaze measures, for example, the number of fixations and number of saccades, capture how gaze changes and can help obtain exploratory information in preferred views. Frequent shifts of gaze, characterised by a high number of saccades to gather visual cues, are usually associated with participants being less likely to skip over information on the preferred element in a view. 83 Our findings indicate that the most preferred urban views might be characterised by frequent exploratory gaze movements that seek to capture more information within a given period of time.
Visual representations (HMs) and oculomotor metrics were employed to interpret gaze behaviour with respect to view preference. Interpreting gaze maps is, however, a challenging task, as confirmed by previous research that reported difficulty in unambiguously interpreting eye-tracking results and offering explanations for gaze behaviour. 69 This is because, in preferred views, participants might not necessarily spend time fixating on one 'preferred' location in a scene for it to generate a gaze cluster. However, we find that by measuring gaze statistics and verbal reasoning of preferences in the same view we could draw plausible conclusions on what participants looked at (overtly attended to) and why. The triangulation of data led us towards a careful interpretation of fixation clusters in order to assess whether all fixations are related to the allocation of attention. There is, however, considerable evidence that there are other mechanisms at work, such as the central gaze bias, attraction towards certain elements in the view (e.g. people and text), and a degree of redundancy. 84 Indeed, our results show that gaze maps considered in isolation are inherently ambiguous tools to identify gaze characteristics in most and least preferred views.
We identified three views (i36, i34, i28) that exhibited divergent gaze behaviour, evoking fewer fixations despite being the most preferred views. Interestingly, participants' verbal reasonings for why they preferred these views were also atypical, in that they explicitly highlighted naturalistic contents. Urban environments that appear more nature-like are often rated more highly on preference. 85,86 Previous eye-tracking studies comparing natural and urban views reported lower numbers of fixations for natural views. 20,87 Although various other urban views in our study were also rated highly on preference, only the three views where participants noted naturalistic contents evoked this gaze behaviour. Thus, even though higher preference for a view is not uniquely driven by the presence of naturalistic elements, presence of this content appears to engender a pattern of oculomotor behaviour characterised by fewer fixations. Further research is needed to address the reliability of this finding, but this result is not inconsistent with previous studies of window views in offices suggesting that the interest in a view is not necessarily divided along the urban/natural split. 10

One final point of discussion relates to the methodology of data collection. Fixation towards the centre of the monitor display enabled participants to derive an impression of the overall scene. In the most preferred views, we found a hotspot of attention located at the image centre, which was not explained by the verbal descriptions of preference.
We could associate this result with a central gaze bias, 45 which can be linked to different phenomena: 88 the centre of the screen may be the optimal location for early information processing; 89 participants started the exploration from the centre of the screen, since this is where they were instructed to fixate at the onset of the experiment; and the eye has a tendency to re-centre in its orbit when the head position is constrained by a chin rest. As such, it was perhaps inevitable to find central hotspots in the images, regardless of preference, under the procedure in which our data were collected.
It is important to mention that there may be other mediating factors driving gaze and preference, which could not be addressed in this study. For example, research has shown that image salience can play a role in view preference 20 and visual comfort. Le et al. 90 showed that unnatural properties of urban scene images (e.g. repetitive patterns) resulted in higher discomfort, as reflected by a relatively large haemodynamic response in the visual cortex. Earlier research 91,92 examining psychological benefits of outdoor views also demonstrated that non-straight surfaces, borders, shades, yellow-green rather than blue-purple contents, and scenes containing both high and low saturated colours, predicted higher preference.
Several models have been developed to explain attentional capture 93 using visual salience. Experimental evaluations of complex scenes showed that salience at fixated locations is significantly higher than at control locations, 94 and that more fixations occur within areas expected by the salience model than would occur by chance. 95 As stated in previous studies, 42,88 however, these correlations alone should not imply a causal link between image features and fixation location. It is also possible that the role of salience in attention is less important than top-down control 36 in a laboratory setting and becomes negligible in real-world environments. 84 Yet, whilst these studies have shown the impact of low-level image features on visual gaze behaviour, our use of a set of pre-calibrated images (from the McGill Calibrated Colour Image Database) precluded this possibility.
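The comparison described in the cited salience literature, salience at fixated locations versus control locations, can be sketched as follows. This is only an illustration under stated assumptions and was not part of the present study: the salience map is assumed to be a 2D grid of values already produced by some salience model, fixations are assumed to be (x, y) pixel coordinates, control locations are drawn uniformly at random, and all function names are hypothetical.

```python
import random


def mean_salience(salience_map, points):
    """Average salience over a list of (x, y) pixel coordinates."""
    return sum(salience_map[y][x] for x, y in points) / len(points)


def fixated_vs_control(salience_map, fixations, n_control=1000, seed=0):
    """Return (mean salience at fixations, mean salience at random controls).

    Assumption: control points are sampled uniformly over the image, which
    ignores the central gaze bias; published studies often use other
    control schemes.
    """
    rng = random.Random(seed)
    height = len(salience_map)
    width = len(salience_map[0])
    controls = [(rng.randrange(width), rng.randrange(height))
                for _ in range(n_control)]
    return (mean_salience(salience_map, fixations),
            mean_salience(salience_map, controls))


# Toy 10 x 10 map with one salient 2 x 2 patch (hypothetical data).
toy_map = [[0.0] * 10 for _ in range(10)]
for yy in (4, 5):
    for xx in (4, 5):
        toy_map[yy][xx] = 1.0

fix_mean, ctrl_mean = fixated_vs_control(toy_map, [(4, 4), (5, 5)])
print(fix_mean, ctrl_mean)
```

A higher mean at fixated than at control locations indicates only an association; as noted in the text, it should not by itself be read as a causal link between image features and fixation location.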

Conclusions
Using a mixed-method approach, view preference ratings, ETMs and verbal reasoning data were collected and analysed to: (1) identify what people prefer in urban views; and (2) investigate whether characteristic patterns of oculomotor response are associated with visual preference in urban environments.
The main conclusions to be drawn from this controlled laboratory study are:
- In urban views, a higher preference may be moderated by the presence of people, colour and differentiated built elements that are well kept and maintained.
- The presence of green and naturalistic elements, however small, in urban views may lead to higher preference ratings and result in gaze behaviours characterised by relatively low numbers of longer fixations.
- Gaze exhibits a characteristic behaviour associated with preference in urban scenes. When views are rated as more preferred, the gaze appears to be more exploratory, with a higher number of fixations and saccades within a fixed time frame. Conversely, the lower the preference for a view, the more gaze dwells on specific hotspots within the scene.
Before these findings can be transferred to other contexts (e.g. building occupants observing real window views), it should be acknowledged that our results are based on a laboratory experiment, where participants viewed photographic scenes for 15 seconds whilst keeping a static head position. In a real building, both the position of the viewer and the content of the window view may change continuously. The dynamic quality of light, the content of the view (including but not limited to the seasonality, time of the day and other personal and contextual factors such as the presence of an attention attractor) or people attending to another visual task may influence people's preference for the view and the time spent attending to it, or evoke different gaze behaviour. The effect of long-term exposure to a view, when participants might become increasingly familiar with the environment, is an additional question raised by this study that requires further research.

Acknowledgements
We acknowledge insightful discussions with Dr. Adrian Marinescu and Dr. Milad Abou Dakka and express our gratitude to the participants for their valuable time. Thanks also to Dr. Frederick Kingdom for sharing a larger data set of urban images.

Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the University of Nottingham Faculty of Engineering Research Excellence PhD Scholarship awarded to the first author, Scholarship Ref Number: 17226. This funding source had no role in the design of this study or during its execution, analyses, interpretation of the data or decision to submit results.

Supplemental material
Supplemental material for this article is available online.

Appendix 1
Selection of most preferred and least preferred views in the preference rating (left) and pile sorting (right) tasks.