Education in the Digital Age: Learning Experience in Virtual and Mixed Realities

In recent years Virtual Reality has been revitalized, having gained and lost popularity between the 1960s and 1990s, and is now widely used for entertainment purposes. However, Virtual Reality, along with Mixed Reality and Augmented Reality, has broader application possibilities, thanks to significant advances in technology and accessibility. In the current study, we examined the effectiveness of these new technologies for use in education. We found that learning in both virtual and mixed environments resulted in similar levels of performance to traditional learning. However, participants reported higher levels of engagement in both Virtual Reality and Mixed Reality conditions compared to the traditional learning condition, and higher levels of positive emotions in the Virtual Reality condition. No simulator sickness was found from using either headset, and both headsets scored similarly for system usability and user acceptance of the technology. Virtual Reality, however, did produce a higher sense of presence than Mixed Reality. Overall, the findings suggest that some benefits can be gained from using Virtual and Mixed Realities for education.

creating a positive academic climate (Seligman et al., 2009). Indeed, Olmos-Raya et al. (2018) found a significant effect of both positive emotion and high immersion on knowledge acquisition (for a discussion of the benefits of immersion for learning see Dede, 2009). Participants also reported that learning in VR was more engaging than the traditional approach in Bennie et al. (2019).
According to constructivist learning theory (e.g. Duffy & Jonassen, 2013; Fosnot, 1996), learning is an active process, whereby learners construct knowledge for themselves (as opposed to passively receiving information). This theory builds on ideas and suggestions from Piaget's theory of cognitive development (e.g. Piaget, 1937, 1950), Dewey's functional psychology (Dewey, 1938) and Vygotsky's social development theory (Vygotsky, 1980), which have been expanded and researched within cognitive psychology.
E-learning often promotes active learning through interactive technology tools, which constructivism would claim as beneficial. Kay (2011) developed an evaluation scale for Web-Based Learning Tools (WBLT) which focuses on three key constructs: learning, design, and engagement. Kay found an improvement in pre- vs. post-test scores on remembering, understanding, application, and analysis when using WBLT compared to standard methods of teaching. These categories were derived from the revised Bloom's Taxonomy (Bloom, 1956; see also Anderson et al., 2001). This is a model used to classify educational learning objectives based on cognitive principles. It suggests that there is not simply one way in which information is processed and learnt, instead proposing a hierarchy of learning. This hierarchy consists of six stages of cognitive processing from simplest to most complex: remember, understand, apply, analyse, evaluate, and create.
Technologies can complement classroom teaching rather than replace it. Technologies like VR are particularly applicable for teaching practical tasks, and virtual laboratories can offer advantages over traditional methods, such as providing greater flexibility for conducting experiments (Valdez et al., 2015; for reviews see Albidewi & Tulb, 2014; Hilgarth, 2010; Welsh et al., 2003). Similarly, Pan et al. (2006) noted that the use of XR could help enhance, motivate and stimulate learners' understanding, as well as improve their overall mood. Simulations can be used to mimic real experiments, which are important in education for many subjects (e.g., Davies, 2008), and understanding has been shown to be equivalent for both physical and virtual experiments (Zacharia & Olympiou, 2011). Experiments and practicals are often important interactive learning tools, and XRs can be beneficial when physical experiments are not practical. Indeed, interactivity and feedback enhance learning by promoting active rather than passive learning. Active learners are more engaged, motivated, and show better learning than passive learners (Benware & Deci, 1984), leading to better student outcomes (Chi & Wylie, 2014; Cui, 2013). Markant and Gureckis (2014) noted that, according to one explanation, active learning improves performance by enhancing cognitive processes related to motivation, attention, and engagement (see also Chi & Wylie, 2014). However, they theorised that the difference between active learning and passive learning comes from a hypothesis-dependent sampling bias, which happens when an individual collects data to test their own hypotheses. This explanation is in line with constructivism, which also focuses on the importance of the interaction between experiences and ideas. Thus VR and MR, which enable active learning, allow one to experience and test such learner-generated hypotheses, which may be more effective compared to traditional passive learning methods (e.g. lectures, textbooks). Freeman et al. (2014) conducted a meta-analysis of 225 studies which compared student performance in undergraduate science, technology, engineering, and mathematics (STEM) courses under traditional lecturing versus active learning conditions. They found that students in classes with traditional lecturing performed worse in the examination and were 1.5 times more likely to fail than students in classes with active learning methods. Similarly, another meta-analysis also found a benefit of using computer simulations for STEM learning (D'Angelo et al., 2014). Technology, including XR, can be a useful tool to enhance learning and engagement in engineering, as discussed by Koretsky and Magana (2019), which is the subject area this study uses.
Active learning via the use of XR provides a variety of benefits. However, the costs and benefits of different XRs (VR vs. MR) might differ, as might user acceptance, engagement and experience. For example, VR presents a closed world, so participants might feel more concealed and private, and more comfortable with the simulated reality, which could, in turn, lead to higher user acceptance. Conversely, participants might feel more vulnerable due to being unaware of their real-life surroundings. Similarly, there are two alternatives for MR; participants might feel more comfortable because they can see their surroundings and what is going on around them, or less comfortable because of their awareness of people who can see them using the technology.
This user experience will be in part linked to the level of immersion and presence generated by the equipment (see Cheng & D'Angelo, 2018). User experience and user comfort are important considerations in this study, as even if a technology has a variety of benefits, if individuals are unwilling to use it due to discomfort, it will not be useful. As such, simulator sickness (i.e., motion sickness caused by simulated environments) is also an important factor that needs to be considered. Antoniou et al. (2017) discuss the importance of applying good design principles to XR use for education, rather than relying on the novelty effect. They recommend using all spatial and overall environmental degrees of freedom in order to maintain not only the immersion and engagement but also the educational impact, and they suggest incorporating game design principles.
Although previous work has shown the benefits of VR-based learning over traditional methods (for a review see Kavanagh et al., 2017), very little previous work has considered the benefits and costs of learning in MR environments. Accordingly, in the present work, we directly compared learning, user experience, engagement and acceptability in a learning context. Overall, based on constructivist principles, we would expect VR and MR to produce better learning outcomes than traditional methods. Furthermore, we would expect VR and MR to result in improved user experience (e.g., in terms of emotion, engagement, and usability) compared with traditional methods (Allcoat & von Mühlenen, 2018). Finally, we would expect that because VR is fully immersive, it would lead to a higher sense of presence, but it might also lead to more simulator sickness, compared to MR.

Participants
Seventy-five participants (34 female, mean age 25 years) were recruited from the University of Warwick participant pool. Of these, 25 were undergraduate students, 12 postgraduate students, 23 PhD students, and 15 staff members. Fourteen participants had a background in Engineering, 18 in Science, 16 in Economics/Business, and 27 in other subject areas. All reported normal or corrected-to-normal vision and received £5 Amazon vouchers for their participation. Each gave informed written consent and the study was approved by the University of Warwick Humanities & Social Sciences Research Ethics Committee. Participants were randomly assigned to one of three conditions (traditional, VR, and MR) and did not differ in terms of self-reported computer skills, gaming experience, or VR/MR headset experience (see Table 1).

Apparatus
The questionnaires and learning material in the traditional condition were presented on a 19" LCD computer screen (1920 x 1080 pixels, 60 Hz) via Microsoft PowerPoint and Qualtrics. Responses were collected through mouse and keyboard. The VR condition used an HTC Vive, which displays a 3D environment via two OLED displays (1080 x 1200 pixels per eye, 90 Hz) with a field of view of 100 x 110 degrees. The MR condition used a Microsoft HoloLens, which projects 3D objects on a pair of translucent screens (1268 x 720 pixels per eye, 60 Hz) with a projection field of 30 x 17 degrees (i.e., the 3D objects are seen overlaying the real world). The headsets were of similar weight (Vive 550 g, HoloLens 557 g) and navigation occurred using the standard handheld controllers. In the XR conditions, participants were tested individually due to the space requirements of the headsets. In the traditional condition, groups of up to eight participants were tested in a computer-equipped teaching lab, providing a traditional learning environment.

Learning Material
The learning material was based on a real classroom example covering solar panels for Engineering students. The VR and MR simulations took the material from an existing course that would usually be taught via PowerPoint slides and developed it to be presented in an immersive 3D environment. This topic focused on students' understanding of how different parameters can influence solar-power panel efficiency, such as light intensity, panel mismatching and tilt angle. The existing method of delivering these lectures is to use slides showing graphs describing the relationship between solar-power efficiency and other system parameters. The Traditional learning condition was implemented in the form of lecture slides adapted from the course, which participants could navigate through, as in a distance learning environment.
The simulations for VR and MR were created and programmed in Unity, with some models made with the Blender software package. Both the VR (Figure 1a) and MR (Figure 1b) conditions allowed participants to interact with the application to experiment with how different characteristics (e.g., type of solar panel, light intensity, shading) impact power output. A video showing the VR and MR environments can be found at https://youtu.be/Jg3gsjVYrKM.
Participants were instructed to interact with the learning environment to find out how solar panels work, so they could answer questions in a subsequent test. They were told verbally how to use the equipment and how to navigate the learning environment. Buttons and sliders manipulated variables, and participants could select information boxes to obtain further details. Both the HTC Vive and Microsoft HoloLens have full head-tracking, so participants were able to look around at the 3D environment/objects at will.
The researchers worked closely with the course lecturer to ensure each condition was equivalent in terms of its content, material, and amount of information presented. Both the VR and MR conditions presented the same models and written information. Audio was omitted in order to reduce confounding variables.

Procedure and Design
There were three phases: pre-test, learning, and post-test (see Table 2). Participants had approximately 10 minutes for phase 1, at the end of which they put the headset on or were given the lecture slides for the traditional condition. They were given 10 minutes with the learning materials (as determined by piloting) before they filled in the questionnaires of the post-test. Some questionnaires were only presented once, whereas others were given before and after learning, leading to a mixed design with the between-subject factor Condition (traditional, VR, and MR) and the within-subject factor Test (pre and post).

Knowledge Test
Participants' knowledge of the learning material was assessed using eight questions constructed by the lecturer, based on those used in a real classroom course (see Online Appendix). Participants completed the test twice, once before and once after the learning phase. The questions were a mix of formats and tested different types of knowledge in accordance with Bloom's Taxonomy (Bloom, 1956): four multiple-choice questions tested 'remembering' aspects, while three short-answer questions and one calculation question focused on the 'understanding' and 'applying' aspects (see also Anderson et al., 2001). All questions were marked as correct or incorrect using a marking scheme provided by the lecturer.

User Experience
The Differential Emotions Scale (DES, Izard et al., 1974) was used to measure participants' emotions before and after engaging with the learning materials. The scale included ten emotion categories (interest, amusement, sadness, anger, fear, anxiety, disgust, contempt, surprise, and elatedness), each represented with three words (e.g. surprised, amazed, astonished). Participants indicated on a five-point scale (from "not at all" to "very strongly") the extent to which these adjectives corresponded to their current emotional state. Participants completed Kennedy et al.'s (1993) Simulator Sickness Questionnaire, which assesses to what extent individuals experience physical discomfort. This questionnaire has participants rate whether any of 16 symptoms (e.g., nausea, headache) are affecting them on a four-point scale (from "none" to "severe").
Student engagement was measured via the WBLT Evaluation Scale (Kay, 2011), developed specifically for evaluating the efficacy of web-based learning tools for education. It consists of 13 questions split into three sections which ask participants to rate on a five-point scale (from "strongly disagree" to "strongly agree") how well they could learn from the learning tools (learning), how well designed the tools were (design), and how engaging they found them (engagement).

Technology Evaluation
The quality of the learning materials was assessed via the Perceived Quality Scale (Pribeanu et al., 2017), specifically developed for the evaluation of AR-based learning applications. It consists of 18 questions which measure participants' perceptions of the quality of the learning materials on a five-point scale (from "strongly disagree" to "strongly agree"). Quality was further split into three different sub-scales: ergonomic quality (perceived learnability and ease-of-use), learning quality (perceived efficiency and usefulness), and hedonic quality (cognitive absorption and perceived enjoyment). Minor changes to the wording of the questions were made to fit the context of the scenario.
The System Usability Scale (Brooke, 1996) consists of ten questions which measure the usability of the learning environment on a five-point scale (from "strongly disagree" to "strongly agree"). Example questions include: "I found the system unnecessarily complex", and "I felt very confident using the system".
The Unified Theory of Acceptance and Usage of Technology Questionnaire (Venkatesh et al., 2003; Akbar, 2013) was used to measure user acceptance and comfort with being in a 3D simulated environment. Of the 31 questions, the 23 relevant to the scenario of the current study were used (e.g. "Using the system will enable me to accomplish tasks more quickly"). Ratings were provided on a seven-point scale (from "fully disagree" to "fully agree").
Finally, sense of presence, an aspect of immersion which can impact learners in 3D virtual worlds (Mount et al., 2009; see also McMahan, 2003), was measured with the Igroup Presence Questionnaire (Schubert et al., 2001). This scale has 14 questions (e.g., "I felt present in the virtual space") rated on a seven-point scale (from "fully disagree" to "fully agree").

Knowledge Test
Questions in the knowledge test were marked as correct or incorrect to give a total score of 0 to 8. Learning was represented by the difference between the knowledge pre-test and post-test scores (see Table 3). The data from five participants who scored very high in the pre-test (2 SD above the mean, i.e., ≥75%) were subsequently removed as outliers from the learning data (Tukey, 1977).
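The scoring and exclusion rule described above can be sketched as follows. This is a minimal illustration rather than the analysis script used in the study: the function names and the example scores are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was applied.

```python
from statistics import mean, stdev


def exclude_high_pretest(pre_scores, n_sd=2.0):
    """Return the indices of participants to keep.

    A participant is kept when their pre-test score is no more than
    n_sd sample standard deviations above the mean pre-test score.
    (Assumption: sample SD; the text does not specify the estimator.)
    """
    cutoff = mean(pre_scores) + n_sd * stdev(pre_scores)
    return [i for i, score in enumerate(pre_scores) if score <= cutoff]


def learning_gains(pre_scores, post_scores, keep):
    """Learning = post-test score minus pre-test score (each out of 8)."""
    return [post_scores[i] - pre_scores[i] for i in keep]


# Hypothetical scores (0 to 8) for ten participants, invented for illustration.
pre = [1, 2, 2, 3, 1, 2, 3, 2, 1, 8]
post = [4, 5, 4, 6, 3, 5, 5, 4, 4, 8]

keep = exclude_high_pretest(pre)          # drops the participant who scored 8
gains = learning_gains(pre, post, keep)   # per-participant learning scores
```

Excluding participants who are already near ceiling on the pre-test matters here because their gain scores cannot reflect learning: a participant starting at 8 out of 8 can only show a gain of zero or less.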
A mixed-design ANOVA with the between-subject factor of Condition (traditional, virtual, mixed) and the within-subject factor of Test (pre, post) revealed a significant main effect of Test, F(1, 67) = 177.72, p = .001, ηp² = .726; participants' knowledge improved on average by 2.5 points from pre- to post-test. There was a trend for more learning in the VR condition and less learning in the MR condition; however, this difference did not reach significance (two-way interaction, F(2, 67) = 2.13, p = .13, ηp² = .060). Further analysis of the knowledge data showed that the amount of learning did not depend on prior computer skills (correlation with learning across conditions: r = -.08, p = .51) or gaming skills (r = .22, p = .07), nor did it depend on the amount of previous headset experience (r = .14, p = .25).
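The partial eta squared values reported alongside each F test can be cross-checked from the F value and its degrees of freedom, using the standard identity: partial eta squared = (F x df_effect) / (F x df_effect + df_error). The sketch below applies this identity to the two tests reported above. The function name is hypothetical, and the identity assumes each effect is tested against its own error term.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of Test: F(1, 67) = 177.72
main_effect_test = partial_eta_squared(177.72, 1, 67)

# Condition x Test interaction: F(2, 67) = 2.13
interaction = partial_eta_squared(2.13, 2, 67)

print(round(main_effect_test, 3), round(interaction, 3))
```

Both recomputed values agree with the reported effect sizes to three decimal places, which makes this a convenient sanity check when reading or transcribing ANOVA results.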

Emotional and Physical Experience
The ten pre- and post-test DES questions (see Table 4) were analysed separately for positive emotions (interest, amusement, surprise, and elatedness) and negative emotions (sadness, anger, fear, anxiety, disgust and contempt). A mixed-design ANOVA for the positive emotions with the between-subject factor Condition (traditional, VR, and MR) and the within-subject factors Test (pre and post) and Emotion (interest, amusement, surprise, and elatedness) revealed a significant three-way interaction, F(6, 216) = 2.42, p = .028, ηp² = .063. This interaction was further explored with four separate ANOVAs, one for each emotion category (p values of these follow-ups were adjusted using the Holm-Bonferroni method). Marginally significant two-way interactions were found for elatedness, F(2, 72) = 4.46, p = .06, ηp² = .110, and for surprise, F(2, 72) = 3.70, p = .09, ηp² = .093. Participants experienced an increase in elatedness in the VR condition, but a decrease in the MR and the traditional conditions (0.36, -0.32, and -0.24, respectively), and they experienced an increase in surprise (from pre- to post-test) in the VR and MR conditions, but not in the traditional condition (0.96, 0.76, and 0.20, respectively). The equivalent ANOVA with negative emotions showed no significant three-way interaction, but there was a significant main effect of Test, F(1, 72) = 11.17, p = .001, ηp² = .134, due to a reduction in negative emotion between the pre- and post-test (1.6 vs. 1.4, respectively). There was also a significant main effect of Condition, F(2, 72) = 3.16, p = .048, ηp² = .081, due to overall less negative emotion in the MR than in the VR and traditional conditions (1.22 vs. 1.62 and 1.66, respectively).
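The Holm-Bonferroni adjustment applied to the four follow-up ANOVAs can be sketched as below. This is a generic implementation of the step-down procedure, not the authors' code. The function name is hypothetical, and the raw p values are invented for illustration, since the text reports only adjusted values.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p values, in the original order.

    The smallest raw p value is multiplied by m, the next by m - 1, and so on.
    Adjusted values are capped at 1 and forced to be non-decreasing when read
    in order of increasing raw p value.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p values for four follow-up tests (not taken from the study).
raw = [0.015, 0.030, 0.200, 0.600]
adj = holm_adjust(raw)
```

With four tests, a raw p value of .015 becomes .06 after adjustment, which shows how an effect that is nominally significant on its own can become only marginally significant once multiple comparisons are taken into account.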
Pre- and post-test scores for simulator sickness were low (overall average: 1.24), and their internal reliability estimates based on Cronbach's α (see Table 2) were acceptable for measures used in the social sciences (Kline, 2013). A 3 x 2 mixed ANOVA with the factors Condition (traditional, VR, and MR) and Test (pre and post) revealed no significant effects. Hence, simulator sickness did not depend on Condition (1.30, 1.20, and 1.21, respectively), and it did not increase from before to after learning (1.23 vs. 1.24, respectively).
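For readers unfamiliar with the internal reliability estimate used throughout the results, the standard formula for Cronbach's alpha is k / (k - 1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below implements that formula. The function name and the ratings are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(responses[0])
    items = list(zip(*responses))                      # one tuple of ratings per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings: four respondents, three items on a four-point scale.
ratings = [
    [1, 2, 1],
    [2, 3, 2],
    [3, 3, 4],
    [4, 4, 4],
]
alpha = cronbach_alpha(ratings)
```

Alpha approaches 1 when items rise and fall together across respondents, which is why high values are read as evidence that a questionnaire's items measure a common construct.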

Learning Experience
The internal reliability estimates for the WBLT Evaluation Scale with the three constructs (learning, design, and engagement) were acceptable (Cronbach's α: 0.87, 0.87 and 0.93, respectively), and similar to the ones reported by Kay (2011). Figure 2 presents the results for the WBLT evaluation scale with the three constructs (learning, design, and engagement), separately for the three learning conditions. Three separate one-way ANOVAs, one for each construct, [...] 3.4, respectively). Similar patterns were found for the design and learning constructs; however, they did not reach significance.

Technology Evaluation
Technology evaluation questionnaires were only given to participants in the two XR conditions. The Cronbach's α for the perceived quality scale and its subscales were acceptable (all > 0.91). Table 5 reports the average scores for the three perceived quality dimensions for the VR and the MR applications. There were no significant differences between the two XR conditions. Both systems were generally rated positively on the system usability scale (3.72 in VR and 3.64 in MR, Cronbach's α = 0.90), and on the six UTAUT sub-scales (see Table 5). In the Igroup Presence questionnaire, VR participants reported a significantly higher sense of presence than MR participants (4.34 vs. 3.78, respectively), t(48) = 2.24, p = .030, d = 0.63. The internal reliability for the UTAUT and the Igroup Presence questionnaire was acceptable (Cronbach's α = 0.90 and 0.83, respectively).
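The reported effect size for presence can be related to the t statistic through the standard conversion for two independent groups, d = t x sqrt(1/n1 + 1/n2). The sketch below applies it under the assumption of 25 participants per XR condition. That group size is inferred from the 48 degrees of freedom and is not stated explicitly in the text, and the function name is hypothetical.

```python
from math import sqrt


def cohens_d_from_t(t_value, n1, n2):
    """Convert an independent-samples t statistic to Cohen's d."""
    return t_value * sqrt(1 / n1 + 1 / n2)


# Presence comparison: t(48) = 2.24, assuming n = 25 in each XR condition.
d = cohens_d_from_t(2.24, 25, 25)
print(round(d, 2))
```

Under this assumption the conversion agrees with the reported d to two decimal places, corresponding to a medium-sized difference in presence between the two headsets by conventional benchmarks.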

Discussion
The main aim of the current study was to determine if VR or MR are suitable alternatives to traditional learning methods and if they have any costs or benefits. Significant learning occurred in all three conditions; however, there was no reliable evidence to suggest that VR and MR provide increased learning over traditional methods. As those in the VR and MR conditions performed as well as those in the traditional condition, this does indicate that VR and MR are viable alternatives to traditional learning. This could be beneficial in distance learning situations where traditional learning is not possible. Alternatively, this equipment could be useful as a supplement to traditional learning, to add an active learning component (see Allcoat & von Mühlenen, 2018).
Although the VR and MR participants did not perform better than those in the traditional condition for learning, there was also no evidence of impairment. Participants had little opportunity to familiarize themselves with the equipment, as they were only given verbal instructions on how to use it. In the same length of time that the traditional group had with standard learning materials, the VR and MR groups were able to learn to use new equipment, acclimatize to a simulated 3D environment, and learn as much from the material. As such, learning with MR and VR could potentially exceed traditional learning once individuals are familiar with the equipment. Alternatively, VR and MR may be well suited as supplementary learning methods, as our results are in line with those from a meta-analysis which found that e-learning-only situations produced an equal amount of learning compared with classroom-only situations, but blended learning (a combination of both) produced better results than classroom-only instruction (Means et al., 2013). Moreover, it has been claimed that during lectures attention tends to wane after approximately 10-15 minutes (Davis, 2009; McKeachie & Svinicki, 2013). In these situations, VR methodologies might be particularly effective in generating increased engagement and improved learning outcomes. Currently, VR technologies are also used to supplement, and not to replace, learning technologies, with short VR lessons (3-7 min) that are integrated into the classic lesson flow, in order to make the subject more visual and comprehensive (e.g., MEL Chemistry, see Fahrenkamp-Uppenbrink, 2015). Our findings also suggest that there can be an emotional benefit of learning in VR, as seen by the DES results with VR scoring higher for surprise and elatedness compared with traditional learning. This supports similar previous findings of improved mood as a result of learning in VR (Allcoat & von Mühlenen, 2018).
As student satisfaction is considered to be an important concept (e.g., Elliott & Shin, 2002), this could be considered a considerable benefit of using this type of equipment. Indeed, past research suggests that student satisfaction should be an important outcome for teaching institutions and educators (e.g., Appleton-Knapp & Krentler, 2006; Thomas & Galambos, 2004), both for the learning benefits and for the idea of students as "consumers".
VR and MR both performed significantly better than traditional learning on the measure of engagement. Student engagement has also been proposed to be an important factor in student outcomes (Pascarella & Terenzini, 2005; Trowler, 2010), enhanced learning effectiveness (Zhang et al., 2005), and student success outcomes, such as academic achievements and student satisfaction (Kuh, 2001, 2005; Organisation for Economic Co-operation and Development, 2010). Therefore, the increased participant engagement observed in the VR and MR conditions is also a benefit of using the equipment in learning contexts.
One potential problem with the use of XR headsets is simulator sickness. Individuals prone to motion sickness or nausea may be wary of using such devices. Our results indicate that neither VR nor MR caused simulator sickness in this context, suggesting that they would be safe to use in classroom and distance learning. This is likely in part due to the application being well designed to avoid motion sickness, but may also be due to the short sessions with the equipment. Hence, VR and MR may be best suited to shorter, supplementary learning sessions as part of a larger presentation, or for specific activities, as well as distance learning, which can be done at the learner's own pace.
The results from the Unified Theory of Acceptance and Usage of Technology Questionnaire, from the System Usability Scale, and from the Perceived Quality Scale showed no difference between VR and MR, suggesting that the two systems are equally suitable for this learning context. However, the participants in the VR environment showed a larger increase in positive emotions, suggesting that participants enjoyed using the VR more than the MR. VR also produced higher reports of presence than MR, suggesting that being in an enclosed virtual environment leads to higher levels of immersion. These results suggest that VR has a few benefits over and above MR for this type of practical learning.
Future research should consider longer durations, as well as longitudinal impacts of the use of this technology, as in this study the headsets were only used for a short period of time. One hypothesis would be that over time the novelty effect of using the equipment would wear off, and the benefits would decrease. On the other hand, the benefits might increase over time as individuals become more familiar with technology and how to use it. As individuals become more proficient with the equipment, they may find the novelty less distracting and be more able to focus on the learning. Therefore, research considering the long-term effects of the technology is an important future focus. VR and MR may produce better retention than traditional learning methods, as it has been found that constructivist learning increases knowledge retention (Narli, 2011), however, this is a question that needs to be considered in future research.
A further suggestion for future research is the potential to focus more on the individual benefits of MR and VR, rather than taking a directly comparative approach as in this study. The design of both the VR and MR learning environments can be further enhanced to reflect the unique properties of these platforms. For example, MR could be made to interact more directly with the surrounding environment or physical components. VR could more accurately simulate a real workplace environment. These factors, along with others such as audio, were not included in the present study in order to reduce confounding variables, but could be interesting topics for future research.

Conclusions
The impact of this paper includes empirical research conducted into the pedagogic benefits of MR, a tool which has not been rigorously studied for educational applications. As this technology is relatively new and not readily available on the market or easily accessible, little empirical research has been conducted with it. Although MR has some similarities with VR, the ways they are used and applied in educational settings are potentially quite diverse due to the differing natures of the technologies. As well as considering MR alongside VR, a broader range of issues for using these technologies in education were measured. As well as the knowledge-learning aspects, user-technology interaction and user experience were examined to consider if the technology is suitable for classroom use.
The overall results do suggest that VR and MR are both suitable and safe technologies for learning, potentially enabling new approaches to teaching. Applications for these technologies can also be adapted to suit pre-existing courses, both classroom-based and distance learning. Distance learning applications can be particularly beneficial in circumstances such as those during the Covid-19 pandemic. Even when considering possible restrictions of the technology, such as how long it can be used for comfortably, the benefits, such as increased engagement and positive emotions, suggest that VR and MR would be good as supplements to traditional learning methods. As the technology improves, these restrictions will likely be reduced, such as decreased headset weight for improved comfort.
XRs have a myriad of possible uses, as many environments and interactions can be accurately simulated within virtual environments, enabling access to learning materials that may not otherwise be available to learners (Bailenson, 2018). For example, dangerous environments or chemicals can be re-created virtually, or expensive equipment that would be too costly for institutions to purchase. In these cases, XRs can provide learning methods not otherwise available, allowing a more hands-on, immersive, and active learning approach to teaching and learning.
Derrick Watson is a professor in the Department of Psychology, University of Warwick. His research interests include the mechanisms of spatial and temporal visual selection and processing, road safety, driver performance and the use of virtual reality and simulation for studying human behaviour and performance.
Adrian von Mühlenen is an associate professor at the Department of Psychology, University of Warwick. He has a general interest in cognitive psychology, including visual perception, attention, learning, memory, and emotion, with a particular interest in how new technologies can be used to advance research in these areas.