More horror due to specific music placement? Effects of film music on psychophysiological responses to a horror film

Previous studies have explored the impact of music on emotional visual material and examined the impact of the placement of music on neutral film scenes, but these two topics have not yet been linked. In our study, a horror film scene with several shock moments, extracted from the Spanish film REC, was presented to participants (N = 39) who were divided into three groups. The scene was underscored with specially composed horror film music that had either congruent shock moments (synchronous condition) or shock moments shifted ahead of the filmic ones (asynchronous condition). For the third group, there was no musical background (control group), but the same sound design and dialogue. As physiological data we used skin conductance and heart rate, but only skin conductance showed significant differences between the groups. The results show that the music had an additional effect even though the visual material already makes a strong emotional statement. Furthermore, our findings suggest that placing music prior to the shock moment can lead to the stress level increasing earlier than in the other conditions.

underscoring signals that an action is completed, an ongoing underscoring signals the continuation of the storyline, a leitmotif announces the oncoming appearance of a well-known character, or musical motives of a threatening character tell the spectator that the protagonist who thinks themself to be safe is already in danger (Bullerjahn, 2018; la Motte-Haber & de Emons, 1980). Thompson et al. (1994) underscored an excerpt of a Hollywood movie with different music and found that the ratings for closure were higher when the excerpt was underscored with closed music. Most film composers would generally agree that a thoughtful placement of music is an important factor in the success of the scoring (e.g., Karlin & Wright, 2004), but very few empirical studies examine that impact.

Horror films and the potential impact of background music in film
Stressful movies, especially horror films, have been produced for a long time. Although there are some studies about behavior and the cognitive or emotional reactions of people while watching horror films, it has rarely been examined why people like scary movies in the first place. Harris et al. (2000) asked 233 men and women how they react and behave while watching scary movies in a dating context. Frequently mentioned responses included "heart beat fast," "felt amused and entertained," "was generally very jumpy," or "tried not to show how scared I was." Women also frequently mentioned "hid eyes/look away" or "yelled and screamed." In real life, most people try to avoid feelings like fear or uncertainty. Although the question of why someone watches a film like this may not be answered conclusively, there is a consensus that there is a positive correlation between suspense and the pleasure of watching a horror film (Martin, 2019). It is also generally agreed that music is an important component of a film's atmosphere (e.g., Bullerjahn & Güldenring, 1994). Therefore, film music is an integral part of most horror films (Hentschel, 2011). The music in those films often follows similar strategies, which makes it possible for the viewer to identify the music as "typical" for horror films. Amongst other musical strategies, composers of horror film scores typically avoid memorable melodies in favor of musical textures, and use atonality as well as destabilization of pitch or harmony with various musical techniques such as glissandi, tremolos, or tone clusters during especially suspenseful scenes (Brownrigg, 2003; Scheurer, 2008). Hentschel (2011) gives a good overview with several examples from well-known films. Trevor et al. (2020) observe a similarity between human screams and the music that is sometimes used in horror films. The increasing use of temp tracks during film production leads to more stereotypical music in the final film. The producers get used to the sound and ask the composers to create something similar (Schneller, 2018). Hence, both horror films and horror film music are familiar topics, but no one has examined the impact of the precise placement of music in stressful movies on physiological stress responses. Some studies focus on single aspects like psychophysiological responses or the impact of placement on remembrance.
Research on psychophysiological responses to music and film has shown that skin conductance is a particularly good instrument for measuring arousal; with it, an additive effect of music on emotional film scenes could be observed (Ellis & Simons, 2005; Thayer & Levenson, 1983). Thayer and Levenson (1983) explored the psychophysiological responses to an industrial safety film (which was considered stressful) with several danger situations. They underscored the film either with "horror" music (increase condition) or "documentary" music (decrease condition). The original version without music was presented to a control group. The authors collected skin conductance and heart rate data to examine the increasing or decreasing effect of music in comparison to the condition without music. Sixty male students participated in the experiment. The obtained data showed an increase in the perception of stress for all three groups. Differences between the groups were significant for skin conductance. In all danger situations, the skin conductance of the decrease condition was significantly lower than in the increase condition. In all danger situations, documentary music led to the lowest skin conductance levels, the control group was in the middle, and the horror film music reached the highest values, but not all of these differences reached significance. Ellis and Simons (2005) classified music and film excerpts according to arousal and valence and then combined them differently. Apart from the psychophysiological data, the participants were asked to fill in a version of Lang's (1980) Self-Assessment-Manikins (SAM), which contains pictograms representing gradations of the dimensions valence, arousal, and dominance. The combinations of films and music with different valence and arousal could also be seen in the results of the SAM.
For example, film scenes combined with music of positive valence were rated more positive than film scenes combined with music of negative valence; high arousal music led to a higher rating for the arousal of the film scenes than low arousal music. In contrast to Thayer and Levenson, heart rate reached a significant level, but only for excerpts with positive valence.
Studies about the impact of music on the remembrance of film scenes led to inconsistent results. Bezdek et al. (2017) did not find any impact of the congruency of music on the remembrance of film scenes. The authors proposed that the suspense of the visual material is decisive for the remembrance. Boltz et al. (1991) showed musical impact depending on the length of the underscoring. The music ended either before or was still running during the finale. It was observed that the expectation which was created by the music had an impact on the remembrance of the scene. When the music ended before the finale, participants better remembered the ending when the music was incongruent. This means that the scene ended differently than the music suggested. When the music was running during the finale, the remembrance was better when the music was congruent to the ending.
The assignment of intended emotions in a neutral film scene is independent of the placement of music. Music of various emotional weight was placed before or after the appearance of the protagonist and was assigned correctly in all conditions (Tan et al., 2007). Aside from music, sound effects also have an influence on felt emotions and immersion while watching a movie. Kock and Louven (2018) combined silent film scenes with sound effects, music, or sound design (music + sound effects). Two pieces of Modest Mussorgsky's "Pictures at an Exhibition" in the original piano version were used as musical accompaniment. The sound effects included, for example, the actors' footsteps, the strangled victim's moaning, or the splintering of glass. All three conditions showed an increase in immersion compared to the silent version. The best results were reached with the sound design and with music alone. Suspense increased only with the sound design and with music alone. For these findings, it has to be stated that silent film scenes have a limited generalizability. Thayer and Levenson (1983) illustrated that music has an impact on watching stressful film scenes. They showed that different types of music can lead to either an increased or a decreased stress level. As there are not many studies which focus on stressful films, but a lot of studies that examine the impact of different types of music, the aim of this study was to examine the placement of underscoring in a stressful film scene while consistently using the same film music tracks newly composed for this study.

The present study
Three hypotheses were derived from the findings of the cited studies.

H1.
Thematically matching music increases the stress in danger situations of a horror film compared to the same film scene without music, regardless of whether the music is placed synchronously.
This hypothesis derives from the additional effect of music found in several studies and the independence from placement Tan et al. (2007) detected.
H2. The placement of music before the shock moment of a film leads to a stronger increase in stress level.
The second hypothesis builds upon the expectations of the audience. Music can indicate the ending of an action (Thompson et al., 1994). Transferred to a horror film, the music suggests a shock moment, which is not immediately fulfilled. This could lead to a false feeling of safety and to a stronger increase in stress level when the shock moment appears a few seconds later.
H3. In a suspenseful visual context, musical background can create additional stress even though the expectation of a shock moment turns out to be wrong.
For the last hypothesis, a feint was placed (a so-called "red herring"; Bullerjahn, 2019, p. 230). The visual material suggests that a shock moment is going to take place, an expectation which is then left unfulfilled.

Ethics statement
The participants gave their verbal informed consent. The study was conducted in full accordance with the Ethical Guidelines of the German Psychological Society (DGPs) and the Association of German Professional Psychologists (BDP) as well as the Ethical Principles of Psychologists and Code of Conduct of the American Psychological Association (APA). This procedure was confirmed by the institute's internal ethics committee. The surveys were conducted on the premises of our institute.

Participants
Thirty-nine participants (18 women, 21 men, mean age = 26.02 years, age range 18-54 years) participated in the experiment and were equally and randomly assigned to three groups (synchronous, asynchronous, control, cf. Table 1). Due to a technical malfunction, skin conductance could only be collected from 29 test persons (12 women, 17 men, mean age = 25.14 years, age range 20-34 years). The participants were mainly university students of different disciplines and were not paid.

Stimulus
Thayer and Levenson (1983) used a black-and-white safety film, which was originally utilized in a stress study in the early 1960s. As that film was outdated, we used a contemporary horror film. The nature of the film scene had to allow for the addition of musical accompaniment, contain several shock moments, and preferably be unfamiliar to participants. The selected scene was extracted from the Spanish horror film REC (released in 2007), directed by Jaume Balagueró, in the German-dubbed version with dialogue and sound effects, but no music. The scene had a total length of 2 min and 20 s and contained three shock moments: two real ones and a faked one. The complete movie is shown from the perspective of the protagonist's handheld camera. As there was no music in the original film scene, Julian Ortlib composed fitting music especially for this experiment. For both music conditions, we used the same music tracks. Matching the visual material, the whole scoring comprises three music tracks that were composed to stimulate three shock moments which are slowly built up. In the synchronous condition, musical and filmic shock moments were perfectly matched. In the asynchronous condition, the whole scoring was shifted forward by 2.5 s, resulting in musical shock moments slightly before the filmic shock moments. In the control condition, the same film scene was presented with the original sound but without additional music.
In the first visual shock moment, a roof hatch unexpectedly falls down (roof hatch); in the second shock moment, a monstrous boy punches the camera (boy); and in the third shock moment, the protagonist pans the camera around in night vision. Here, the shock moment is only implemented in the music (pan; cf. Figure 1).
The scoring was stereotypical for a horror film as described in the literature mentioned in the introduction. In the music tracks for the first two shock moments, strings in all pitches slowly build up and finally end together with a strong beat of synthesizer-percussion. In the synchronous condition, the percussive beat sounds simultaneously with the dropping of the roof hatch and the boy who punches the camera. The last shock moment is underscored with some pulsating synthesizer sounds that end when the protagonist stops panning around and finally fixates on some point in the dark.
The music tracks were subsequently tested for their suitability to horror films. To this end, each track was rated by a different group of people in an online survey. The participants had to rank 12 film genres by how well they thought the music fit each genre. For all three music tracks, horror film was ranked first most often. A chi-square test showed that this distribution is above chance (cf. Table 2).

Measuring instruments
The psychophysiological parameters skin conductance (in microsiemens) and heart rate (in beats per minute) were measured with a Biopac Quad Channel data collection system and recorded with the related software "Biopac Student Lab (BSL) Analyses 4.1."
Note. SAM: self-assessment manikins. We used the SAM in the version by Morris (1995).
In its first part, the questionnaire in the German language contained questions concerning demographic background (age and profession: open questions, gender: male/female/other) and individual preferences regarding 18 film genres and eight music genres (5-point Likert scales: −2 don't like at all to +2 like very much). The second part included Lang's SAM (cf. Figure 2). Each dimension (valence, arousal, dominance) was illustrated with five manikins and a 9-point scale from −4 to +4. A value of −4 indicates positive valence, highest arousal, and lowest dominance. The scale was placed directly under the manikins. The dimensions always appeared in the order valence, arousal, and dominance. By using both psychophysiological parameters and the SAM, it is possible to examine the actual reactions of the participants as well as their subjective evaluations.
Note. [...] (2020), the 12 film genres were melodrama, film musical, adventure movie, fantasy film, gangster movie, horror film, war movie, crime film, music film, western, comedy film, and erotic film. Crime film and war movie were named second and third most frequently in the top three for all music excerpts.

Procedure
All participants were tested in single sessions and seated in the center of a darkened room in front of a projection screen. Audio was played through stereo speakers next to the screen. First, the participants completed the part of the questionnaire regarding demographic background and preferences. Afterwards, the physiological measurement began. The participants were not told what kind of movie they were going to watch. Before the film excerpt started, we performed a 60 s baseline measurement to obtain comparative values. Following the film, the participants filled in the SAM. For that, the questionnaire was given to the participants in paper-and-pencil form, and they ticked the point on the scale under the manikins that best matched their impressions.

Design and analysis
The mixed design with subjective evaluations and physiological measurements produces different types of information. The self-report questionnaire with the SAM can only sensibly be completed after watching the film scene, so no development during the scene can be observed, whereas heart rate and skin conductance are monitored while watching, which yields repeated-measurement data. For heart rate, this study confined itself to a baseline measurement and an average measurement over the film, because in the previous study by Thayer and Levenson (1983) skin conductance proved to be the more promising measurement type, and calculating heart rates for each time of measurement would have meant a lot of additional work. For skin conductance, in addition to the baseline and average measurements, each shock moment was considered on its own and was further divided into three intervals, which help to attribute effects to the music or to the filmic elements (see "Data preparation"). To check for interactions between the conditions and the times of measurement, mixed ANOVAs were performed. All variables were checked for normal distribution. For analyses where normal distribution could not be presumed, non-parametric tests (e.g., the Kruskal-Wallis test) were used. The significance level was set at α = .05. The test variables did not correlate with the age or gender of the participants, so these were excluded as confounding variables. There was a significant but small correlation (r = -.343, p = .032) between the valence ratings of the SAM and the pleasure of watching horror films. As expected, people who like horror films gave more positive ratings than those who dislike horror films. As people who like or dislike horror films were present in all groups, H(2) = 1.330, p = .514, this factor should not have distorted the results.
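To make the non-parametric group comparison concrete, the following is a minimal sketch of the Kruskal-Wallis H statistic with the usual tie correction, using only the Python standard library. It is not the authors' analysis script; the function names and the toy data are illustrative assumptions. For three groups (two degrees of freedom), the chi-square approximation of the p-value reduces to exp(-H/2), which the second helper uses.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of independent samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(c ** 3 - c for c in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_df2(h):
    """Chi-square upper tail for exactly 2 degrees of freedom (three groups)."""
    return math.exp(-h / 2)


# Toy data (hypothetical, not from the study): three fully separated groups.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3), round(p_value_df2(h), 3))
```

Applied to the statistic reported above, `p_value_df2(1.330)` gives approximately .514, in line with the reported p-value.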
As effect sizes are sometimes used with the same term but calculated in different ways, in this study, effect sizes are calculated as proposed by Lakens (2013). The effect sizes reached in the study by Thayer and Levenson (1983), who used a comparably stressful film, were taken as benchmarks. Thayer and Levenson performed t-tests instead of ANOVAs to find the differences between the three groups and did not report any standard deviations, so the t-values were used to calculate the effect sizes and transformed into generalized eta squared to make them comparable to the skin conductance effect sizes of this study. For skin conductance, Thayer and Levenson reached effect sizes of $\eta_G^2$ = .108 to $\eta_G^2$ = .204 at the three accidents between the increase and decrease condition and reached effect sizes of $\eta_G^2$ = .008 to $\eta_G^2$ = .09 for the differences between the increase condition and the control group. These results were used as benchmarks for this study, keeping in mind that the effect sizes between the control group and the increase condition are more suitable, because in this study, there are only increase conditions.
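The conversion from a reported t-value to an eta-squared-type effect size can be sketched as follows. For a comparison of two independent groups, $\eta^2 = t^2 / (t^2 + df)$, and in a one-factor between-subjects design generalized eta squared coincides with this value. The t-value and the degrees of freedom in the example are hypothetical placeholders (df = 38 would correspond to two groups of 20, assuming equal group sizes among the sixty participants); they are not taken from Thayer and Levenson (1983).

```python
def eta_squared_from_t(t, df):
    """Eta squared for a two-group comparison, recovered from t and its df."""
    return t * t / (t * t + df)


# Hypothetical example values, for illustration only.
example = eta_squared_from_t(t=2.0, df=38)
print(round(example, 3))
```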

Data preparation
In order to create a more precise picture of the stress effects, the shock moments were divided into short intervals. Those intervals were created by forming differences between different times of measurement. Skin conductance does not increase immediately, but rather with a short delay. To calculate differences for the comparison of the conditions, we had to define specific times of measurement, with each shock moment consisting of four points of measurement. t1 is the time of measurement when the musical shock moment in the asynchronous condition takes place. In the synchronous condition, the music had already started, but had not come to an end, while in the control condition there is no music. t2 marks the maximum of skin conductance after t1. The time span in which that maximum can possibly lie starts 2 s after t1 (due to the delay of the increase of skin conductance) and has to end 2 s after t3, as the increase which is triggered by t3 then starts. t3 marks the filmic shock moment (which is also the musical shock moment) in the synchronous condition. Equivalent to t2, t4 is the maximum after t3. However, the time span for this maximum is a little longer, as there is no stimulus following it that could also lead to an increase. Figure 3 shows the development of skin conductance of a participant in the asynchronous condition for the first shock moment. The different times of measurement can be seen well. The following three differences were calculated:
- Interval_music (t2-t1): increase due to the underscoring and visual appeal prior to the visual shock moment
- Interval_shock moment (t4-t3): increase after the visual shock moment due to film and synchronous music
- Interval_total (t4-t1): total increase due to music and film
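The interval definitions above can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the authors' processing script: the function names, the sampled toy trace, and in particular the length of the search window for t4 (`t4_span`) are assumptions, since the text only states that this window is "a little longer" than the one for t2.

```python
def value_at(times, values, t):
    """Return the sample whose timestamp is closest to t."""
    i = min(range(len(times)), key=lambda k: abs(times[k] - t))
    return values[i]


def max_in_window(times, values, start, end):
    """Maximum skin conductance among samples with start <= time <= end."""
    window = [v for t, v in zip(times, values) if start <= t <= end]
    if not window:
        raise ValueError("no samples in the requested window")
    return max(window)


def shock_intervals(times, values, t1, t3, latency=2.0, t4_span=3.5):
    """Compute the three skin conductance differences for one shock moment.

    t1: musical shock moment in the asynchronous condition (seconds)
    t3: filmic shock moment (seconds)
    latency: response delay of skin conductance (2 s in the text)
    t4_span: length of the search window for t4 (assumed value)
    """
    sc_t1 = value_at(times, values, t1)
    sc_t3 = value_at(times, values, t3)
    sc_t2 = max_in_window(times, values, t1 + latency, t3 + latency)
    sc_t4 = max_in_window(times, values, t3 + latency, t3 + latency + t4_span)
    return {
        "interval_music": sc_t2 - sc_t1,
        "interval_shock_moment": sc_t4 - sc_t3,
        "interval_total": sc_t4 - sc_t1,
    }


# Toy trace sampled every 0.5 s (hypothetical values in microsiemens).
times = [0.5 * i for i in range(25)]
trace = [1.0] * 25
trace[9], trace[10], trace[15] = 1.2, 1.4, 2.0
print(shock_intervals(times, trace, t1=2.0, t3=4.5))
```

With t3 = t1 + 2.5 s, as in the asynchronous shift described for the stimulus, the search window for t2 spans 2.5 s.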

SAM and heart rate
Even though the synchronous condition reached the highest (resp. lowest) values in all dimensions of the SAM, no significant differences between the groups were found. In contrast to the SAM, heart rate increased the least in the synchronous condition, but there were also no significant differences (cf. Table 3). However, heart rate increased in all groups, which demonstrates the effect of the film scene on stress, t(38) = -3.97, p < .001, Hedges' $g_{av}$ = .627 (cf. Table 4). The dimensions of the SAM correlated neither with heart rate nor with skin conductance.

Skin conductance
Like heart rate, skin conductance showed an increased stress level due to the film scene, Z(29) = -4.595, p < .001, Hedges' $g_{av}$ = 1.176 (cf. Table 5). Over the total time span, no significant differences between the groups were observed, H(2) = 4.005, p = .132, $\eta_G^2$ = .139, but the common language effect size comparing the control group with the asynchronous condition was sizable (CLES = 0.753), which means that there was a 75% chance that a randomly chosen person from the asynchronous condition would have a higher skin conductance increase than a randomly chosen person from the control group (cf. Table 6).
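The interpretation of the common language effect size given above can be illustrated with a direct pair-counting sketch: every person in one group is compared with every person in the other group, and the share of pairs in which the first person has the higher value is reported (ties count half). This non-parametric variant is an assumption chosen for illustration; the value in the text may have been derived from means and standard deviations instead, and the data below are hypothetical.

```python
def common_language_effect_size(group_a, group_b):
    """Probability that a random member of group_a exceeds one of group_b."""
    wins = 0
    ties = 0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1
            elif a == b:
                ties += 1
    return (wins + 0.5 * ties) / (len(group_a) * len(group_b))


# Hypothetical skin conductance increases, for illustration only.
asynchronous = [3, 4, 5]
control = [1, 2, 3]
print(round(common_language_effect_size(asynchronous, control), 3))
```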
As there were three different shock moments, it is useful to examine them with the help of the intervals introduced in "Data preparation." For each interval, a mixed ANOVA was calculated over the three conditions and the shock moments as repeated measurement factors.

Intervals
Interval_music. For interval_music, there was a significant effect for the condition, F(2) = 6.808, p = .004, with a large effect size of $\eta_p^2$ = .344. Post-hoc tests (with Bonferroni correction) show that there was a difference between the control and the synchronous condition, p = .011, 95% CI = [-1.149212, -0.125388], as well as between the control and the asynchronous condition, p = .011, 95% CI = [-1.178935, -0.127057]. In Table 7, it can be seen that skin conductance was higher for the conditions with music. In the first two shock moments, the asynchronous condition reached the highest values, but in the last, it was the synchronous condition. Yet, for the repeated measurement factor, there was no significant result, F(2) = 1.047, p = .358, and there was no interaction between shock moments and group, F(4) = 0.676, p = .48.
Interval_shock moment. For interval_shock moment, there was no significant effect of condition, F(2) = 0.357, p = .703, even though the synchronous condition reached the highest values at all times of measurement (cf. Table 8). There was also no interaction between the shock moments and the condition, F(4) = 0.220, p = .926, but in this case, the time of measurement reached significance, F(2) = 37.899, p < .001, $\eta_p^2$ = .593. This is not very surprising because the first two shock moments end with a surprise moment, whereas it is missing in the last one.
Interval_total. Similar to the second interval, interval_total did not reach significance for condition, F(2) = .949, p = .4, or for the interaction between condition and time of measurement, F(4) = .762, p = .832, even though both conditions with music consistently reached higher values than the control group (cf. Table 9). Again, there was a significant effect of the time of measurement due to the last shock moment that contains a feint, F(2) = 24.788, p < .001, $\eta_p^2$ = .488. As there was no significant interaction between time of measurement and conditions over all intervals, we had a closer look at the individual shock moments.
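As a plausibility check on the effect sizes reported for the mixed ANOVAs, partial eta squared can be recovered from an F statistic via $\eta_p^2 = F \cdot df_1 / (F \cdot df_1 + df_2)$. The error degrees of freedom are not reported in the text; the values used below (26 for the between-subjects factor, 52 for the repeated factor) are assumptions derived from 29 participants, three groups, and three shock moments.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Assumed error dfs: 29 - 3 = 26 (condition), (29 - 3) * (3 - 1) = 52 (time).
checks = {
    "interval_music, condition": partial_eta_squared(6.808, 2, 26),
    "interval_shock moment, time": partial_eta_squared(37.899, 2, 52),
    "interval_total, time": partial_eta_squared(24.788, 2, 52),
}
for label, value in checks.items():
    print(label, round(value, 3))
```

Under these assumptions, the three values round to .344, .593, and .488, matching the effect sizes reported for the intervals.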
Shock moments. Roof hatch. In the first shock moment, the control group reached the lowest values in all intervals. In interval_music, there was a significant difference between the three conditions with H(2) = 7.801, p = .02 and, compared to our benchmarks, a very good effect size of $\eta_G^2$ = .214. Prior to the first shock moment, there was no increase in skin conductance in the control group. The most considerable increase was, as expected, in the asynchronous condition (cf. Figure 4). Even though the increase in interval_shock moment and interval_total was highest in the synchronous condition, there were no significant differences, H(2) = 0.573, p = .751 and H(2) = 0.323, p = .851.
Pan. The last shock moment reached significance for the first two intervals, H(2) = 6.509, p = .039, $\eta_G^2$ = .170 and F(2) = 3.441, p = .047, $\eta_G^2$ = .209, and showed similar tendencies for the last interval, H(2) = 5.499, p = .064, $\eta_G^2$ = .134. Again, the effect sizes reached (medium-)large values which are much higher than the differences between the control group and the increase group of Thayer and Levenson (1983) and approximately equal the effect sizes of the differences between the increase and decrease condition. It stands out that the synchronous condition reached the highest values over all intervals and that there is a huge gap to the increase of the asynchronous condition (cf. Figure 6). Bonferroni-corrected post-hoc tests showed that the control group differed from the synchronous condition (p = .033)

Hypothesis testing
In conclusion, the hypotheses were partly confirmed.
Hypothesis one (H1) was confirmed: The skin conductance of the participants in conditions with music always showed a higher increase than in the control group without music, and reached the significance level in several cases. The effect sizes mainly reach large values as well. The fact that there were not more significant results could be explained by the small number of participants in combination with high standard deviations. It is remarkable that, similar to Thayer and Levenson (1983) and Ellis and Simons (2005), an additive effect of music could be shown, even though there already was a strong emotional weight to the filmic material.
Hypothesis two (H2) was not completely confirmed. The increase of skin conductance was highest in the asynchronous condition over the entire time span, but there was no significant difference from the synchronous condition. During the first two shock moments, there was an increase in fear due to the musical shock moment taking place before the filmic shock moment, but with the filmic shock moment the skin conductance in the synchronous condition caught up. Thus, an effect of expectation, as Boltz et al. (1991) found for the remembrance of film scenes, could not be shown.
The third hypothesis (H3) was confirmed based on the data of the third shock moment, which contained a feint. The music conditions reached the highest skin conductance values over all three intervals, with significant results for the first two. However, the asynchronous condition reached surprisingly low values, for which there is a simple possible explanation: Both visual and acoustic material are of low complexity in that part of the scene. In the music, there is mainly some pulsating sound, while in the film, the protagonist pans the camera around. The participants likely did not have enough indications to compensate for the displacement of the music with new synchronization points (see Lipscomb, 1995).

Limitations and future directions
Similar to Thayer and Levenson (1983) and Ellis and Simons (2005), heart rate did not deliver any results concerning the different conditions. However, heart rate showed that the chosen film scene caused stress in the intended way. More interesting and differentiated results were achieved with skin conductance. Due to the technical defect on the first survey day, there were only results from 29 test persons for skin conductance. This could be a reason for the lack of more significant values. It would be desirable to repeat the study with a larger number of participants in order to create more stable results. Furthermore, the habituation effect is an interesting topic to explore. Most of the cited studies focused on independent scenes or a small number of occurrences and therefore did not consider any kind of habituation. It can be assumed that excitement decreases as soon as the viewers recognize a system behind the placement of the music. For a follow-up study, we suggest either a constant system of music placement or a combination of feints, synchronous, asynchronous, and no music appearing in mixed order. Even though the SAM did not reach the significance level and did not correlate with skin conductance in this study, the synchronous condition reached the highest values, which is in contrast to skin conductance; hence, it could be interesting to compare the physical stress reaction with the subjective feeling of suspense in a larger study.