Cross-Modal Perceptual Organization in Works of Art

This study investigates the existence of cross-modal correspondences between a series of paintings by Kandinsky and a series of selections from Schönberg’s music. The experiment was conducted in two phases. In the first phase, by means of the Osgood semantic differential, the participants evaluated the perceptual characteristics first of visual stimuli (some pictures of Kandinsky’s paintings, with varying perceptual characteristics and contents) and then of auditory stimuli (musical excerpts taken from the repertoire of Schönberg’s piano works) relative to 11 pairs of adjectives tested on a continuous bipolar scale. In the second phase, participants were required to associate pictures and musical excerpts. The results of the semantic differential test show that certain paintings and musical excerpts were evaluated as semantically more similar, while others were evaluated as semantically more different. The results of the direct association between musical excerpts and paintings showed both attractions and repulsions among the stimuli. The overall results provide significant insights into the relationship between concrete and abstract concepts and into the process of perceptual grouping in cross-modal phenomena.

In recent years, perception studies have experienced a growing interest in cross-modal phenomena. The existence of naturally biased associations among shape, color, sound, taste, touch, and olfactory perception has been consistently shown, though mainly for simple stimuli, with different paradigms and by different methodologies (see, e.g., Albertazzi et al., 2012, 2014; Bremner et al., 2012; Gilbert et al., 1996; Hanson-Vaux et al., 2013; Iosifyan et al., 2017; Kemp & Gilbert, 1997; Mankin & Simner, 2017; Marks, 1987a, 1987b; Osterbauer et al., 2005; Sagiv & Ward, 2006; Stein, 2012). The debate on cross-modal phenomena has touched on several points, such as their nature, their explanation, and the proper terminology to define them (Deroy & Spence, 2013a, 2013b; Spence, 2011; Spence & Deroy, 2013). Cross-modal correspondences can be defined as naturally biased associations, or congruency effects, between attributes or dimensions of stimuli in different sensory modalities (Spector & Maurer, 2008; Spence, 2011; for an overview of the contemporary debates regarding color/shape associations, see Dreksler & Spence, 2019). For example, it has been shown that correspondences exist between auditory pitch and the size and the shape of perceived visual objects (for a replication of a series of studies and further developments, see Parise & Spence, 2009). It has also been shown that the perceived correspondences between dimensions of stimuli in different modalities are described by scales of antonyms (Karwoski & Odbert, 1938; Karwoski et al., 1942; Osgood, 1960; Osgood et al., 1957; for a review, see Oyama et al., 1998).
Culture may play a role in the choice of the antonyms: For example, Western music construes pitch according to an up-down spatial relationship (Pratt, 1930; see also Deroy et al., 2018; Lupton, 2018; Salgado-Montejo et al., 2016), while in other languages pitches are construed according to a size scale (small and large, as in Bali and Java), or to an aging scale (young and old, as in the Amazonian basin; Seeger, 1987; see also Albertazzi, Koenderink, et al., 2015). The most interesting novelty in the field of cross-modality, however, has been the shift of attention from the associations verified between simple stimuli of different modalities to processes of perceptual organization (Bayne & Spence, 2014; Sanabria et al., 2005; Spence & Chen, 2012; Spence et al., 2007; Stein & Meredith, 1993). Over the years, a series of stimuli of increasing complexity have been tested, providing evidence of the presence of cross-modal Gestalten organizing multisensory grouping. For example, cross-modal correspondences have been verified between abstract concepts and color mapping (Albertazzi et al., 2013), between conceptual literary meanings and classical music, between contents of contemporary expressionist painting and Spanish flamenco music, and between contents of abstract paintings and tactile perceptions (Albertazzi, Bacci, et al., 2016). The last four studies do not make use of methodologies such as reaction times or implicit association tests. They focus on the phenomenological and subjective perceptual experience in a first-person account, without resorting to sensory-to-sensory or top-down inferential processes. These studies pertain to the Gestalt tradition and explain cross-modal phenomena in terms of grouping (Marks, 1989; Metzger, 1934; Spence, 2015; Watanabe & Shimojo, 2001; regarding the possible involvement of emotion in associations, see Fechner, 1876; Ortlieb, Kügel, & Carbon, 2020; Spence, 2019).
For Gestaltists, concepts are neither sets of attributes defining classes of objects (such as cat, bicycle, cathedral; Kass et al., 2015) nor disjunctive concepts, but perceptual (visual) concepts or perceived global structural configurations (see Arnheim, 1954, Chap. 2, §§ 2-5, or pp. 27-64; see also Pinna & Albertazzi, 2010), and abstraction is seen as the ability to grasp the structural elements of a type of object by its perceived patterns. This explains, for example, the ability to see a "house" in a configuration of toothpicks, a "face" on a wall, or a "roof" in a triangle. Structural patterns are qualitative properties of phenomena. By no means are they metrical cues.
Kandinsky and Schönberg is most evident during the initial period that led them to abstractionism in arts. However, while synesthesia seems to be a distinguishing trait in the painter (Ione & Tyler, 2003, pp. 223-224; van Campen, 2010, p. 56) and probably genuine and inborn if considered in the light of contemporary diagnostic criteria (Just, 2017; for a contrary opinion, see Baron-Cohen & Harrison, 1997, p. 10; Dann, 1998, p. 47; Harrison, 2001, p. 129), it is less traceable in the composer, with a few exceptions as in the case of his Op. 18, Die glückliche Hand (1910-1913). Therefore, the comparison between the two artists primarily concerns the similarity of cross-modal dimensions expressed by their works of art. Kandinsky represented the concrete experience lived by listening to Schönberg's Streichquartett, Op. 10, and the Klavierstücke, Op. 11, in his painting Impression 3 (Concert). It was Kandinsky's opinion that Schönberg's works, in music and painting as well, were dictated solely by the interest in the rules of organization of the inner sound (Kandinsky, 1912, p. 169 ff, 162).
The comparison between the two artists from a cross-modal viewpoint is possible when considering the works of the early Schönberg, which have been traditionally associated with the expressionist movement in music. This association is further supported by his own activity as a painter. In fact, although Schönberg never declared any intention to represent his music by visual art, his painting, which aroused Kandinsky's interest, displays many expressionistic characteristics (Adams, 1995).
Whether the two different paths pursued by the two artists toward their personal form of abstractionism on similar cross-modal dimensions would be ultimately perceivable (i.e., not only understandable in purely intellectual terms) and associable by naïve participants was part of our research question. The occurrence of systematic associations between abstract stimuli presented in both visual and acoustic modalities, as well as their patterns, if confirmed, would shed light on the relation between the different concepts of abstraction used by the two artists; most of all, this result would contribute to the current debate on the presence of cross-modal Gestalten in highly complex stimuli.

The Study
The study tests, in one experiment divided into two phases, the existence of naturally biased associations among mostly naïve participants between a series of Kandinsky's paintings and a series of Schönberg's musical selections. The reason for choosing these stimuli is partially the strong cross-modal experience motivating Kandinsky's painting and its declared relationship with Schönberg's musical innovations. The research focused on the different levels of abstraction and cross-modal associations embedded in the stimuli. The driving question was whether Kandinsky's paintings and Schönberg's musical pieces (those tested in the experiment) shared parallel paths toward abstractionism, as has sometimes been speculated on the basis of the facts explained in the first section: the painter through Impressions to Improvisations to Compositions; the composer through atonality to dodecaphony. The parallel path toward abstractionism in the two artists cannot be mapped on exactly the same time span (as mentioned, Kandinsky's change of perspective is visible from the 1910s, Schönberg's from the 1920s) but on a few common structural patterns expressing the new viewpoint in the two artistic fields. Also, as mentioned, Kandinsky's reported synesthetic experience is only slightly comparable to Schönberg's (1950). Nevertheless, for the reasons explained in the first section, it is reasonable to assume that works from the expressionist periods of both artists would be associated. As to the later periods, it is interesting to test whether the parallel drawn between the abstraction process of the two artists is accessible to participants without a theoretical knowledge of the issue. Furthermore, studying whether or not specific fragments from Schönberg's Op. 25 can be systematically associated with Kandinsky's paintings may shed light on the issue of the accessibility of 12-tone music to listeners as well as on the idea of abstraction applied to Schönberg's method.
As regards Kandinsky's work, we analyzed a series of paintings dating to the years of his first Compositions, and his progressive detachment from the figurative representation of objects characterizing the previous Murnau phase, which resulted in painting the pure relationships among the elements of visual, cross-modal appearances. We first analyzed 25 paintings, dated from 1910 to 1923, from which we chose 15, where the progression toward abstractionism was more visible: Precisely, 14 paintings belonging to the years between 1910 and 1914 (the so-called abstract period) and 1 from 1923 (the so-called Bauhaus period; see later). The choice fell on these pictures for the following reasons. As to the shape configuration, we chose paintings showing the transition from figurality to abstractionism (representations of elementary components of pictorial space such as lines and color spots, as occurs, for example, in Improvisation 19, Improvisation. Gorge, and Lake boat trip). The paintings of this period are characterized by the dominance of part/whole organization according to color spots or to the grouping, orientation, and organization of lines (as in Composition 5, Composition 6, and Improvisation 26); horizontality and verticality (as in Improvisation 19, Improvisation 14, and Black spot); rhythm, such as static or dynamic (as in Improvisation. Gorge, Improvisation 14, and Composition 7); connotative dimensions of color, such as bright/gloomy, fiery/faded, and attractive/repulsive (as in Composition 7, Lake boat trip, and Improvisation 14); and so on. These dimensions were considered in accordance with a series of observations made by Kandinsky himself in his works (Kandinsky, 1912, Chap. V; 1913, p. 3): For instance, the structural presence of contraries (Gegensätze) ruling the configurations in these paintings concerns mass versus line, shape versus color, warm versus cold, dark versus bright, thin versus thick, calm versus agitated, silent versus noisy, and so on.
Kandinsky (1913) observes, for example, that in his Composition 4 (1911) light-sweet-cold is the main opposition of contraries ruling the configuration of the painting (p. 35).
As regards Schönberg's work, a total of 14 musical excerpts were extracted from his Op. 11 (Three Piano Pieces; Schönberg, 1990a) and Op. 25 (Suite for Piano; Schönberg, 1990b). Eight musical excerpts were extracted from Op. 11, and six musical excerpts from Op. 25. Musical excerpts were selected with the purpose of choosing the most representative musical passages from the two works of the composer (partly drawing on the existing literature, see comments on the individual musical excerpts in the Supplemental Material; Supko, 2017; Mayhew, 1962; Haimo, 1990), and in particular where the stylistic differences between the two different periods are most evident. For example, we selected the musical excerpts from Op. 11 trying to choose some of those passages where reminiscences of tonal harmony can be heard, and where the articulation and the agogics (intentional deviations from the basic tempo) can be distinctively attributed to the expressionist period. Similarly, we selected passages from Op. 25 where the dance features of the Baroque Suite are most evident in rhythm and tempo, and the ones that included the most remarkable presentations of the 12-tone material. Importantly, we tried to select musical excerpts so that each would be a complete fragment (i.e., with a perceivable beginning and end) and to diversify the character of the musical excerpts as much as possible. To keep the stimuli as homogeneous as possible, the musical excerpts were extracted from recordings of the same performer (Maurizio Pollini) and were also selected to have similar durations.
Among the ones from Op. 11, three musical excerpts were chosen from the first piece (Mäßig), three from the second piece (Sehr langsam), and two from the last one (Bewegt). Among those from Op. 25, one musical excerpt was chosen from each movement of the Suite, except for the Praeludium (because of the impossibility of isolating a fragment from this movement that would correspond to our requirements).
As we were not testing a one-to-one correspondence between paintings and musical excerpts, it was not necessary to have an identical number of visual (15) and auditory stimuli (14). Among Kandinsky's production, there were many paintings that could fit our selection criteria, and we tried to include as many of them as possible (15) within reasonable task efficiency/sample size limits. Instead, the limited extension of the chosen Schönberg works, combined with our selection criteria, allowed the extraction of no more than 14 stimuli (1 less than the paintings).
The putative characteristics attributed to paintings and musical excerpts were tested in the first phase of the experiment by means of the Osgood (1956) semantic differential. The use of the Osgood semantic differential to test the associations was also partially suggested by Kandinsky's experience of the pervasive presence of contraries in visual appearances. Kandinsky (1913), in fact, speaks of 'fights among tones' [sic], 'contraries of the contradictions (Widersprüche),' such as light/dark and their cross-modal associations (light and sweet; cold and bitter) as founding the color harmony. The specific choice of the pairs of contraries for the Osgood semantic differential, instead, besides Kandinsky's observations, was due to features characterizing both the visual and acoustic stimuli. For example, we assumed that the calm/agitated couple might be present in a few musical excerpts, such as no. 3, no. 5, no. 6, no. 7 (agitated), and the attribute calm in the other musical excerpts taken from his Op. 11, such as musical excerpt no. 1 or no. 4 (see later). Similarly, these characteristics might be perceivable in Kandinsky's paintings Lake boat trip (calm) and Improvisation. Gorge (agitated). The grave/acuto (left in Italian) couple was chosen to associate register and color lightness; the pair hot/cold was chosen for its potential to map color to timbres and harmonies; the pair consonant/dissonant could map subjects' perception of dissonance and roughness (see Tramo et al., 2001) of the harmony and contrasting colors in the paintings; and the pair pleasant/unpleasant was chosen as a general affordance of the concrete experience of the stimuli (and not as an individual preference), as perceivable in Black spot, Improvisation 3 (Concert), Composition 8, and so on.
The prediction was that the participants would make systematic associations between the paintings and the series of musical selections; and that the associations would be due to the presence of similar dimensions in both, as also to be evidenced by the Osgood semantic differential. Needless to say, both the stimuli showed a high level of complexity due to the contents of the paintings and of the musical pieces. We were also aware that, for this reason, the associations could have been conveyed by different components of the artistic works, and that it was very unlikely that we would find a one-to-one correspondence between the individual elements of the pictorial and acoustic compositions. Finally, we did not test individual preferences: Our goal was to highlight, as far as possible, the role played by the (Gestalt) concrete and abstract dimensions in perceiving.

Participants
A total of 32 participants volunteered for the two phases of the experiment: 14 women (mean age: 33.5 years; standard deviation: 16.2 years; median: 23 years) and 18 men (mean age: 40.7 years; standard deviation: 17.4 years; median: 40 years). The choice of sample size is explained in the 'Statistical Methods' section.
Almost all participants were recruited from students in the Departments of Humanities, Psychology and Cognitive Science, Information Engineering and Computer Science, and Mathematics, University of Trento, Italy, although we also invited some colleagues of those departments to participate in the experiment. In so doing, we had participants of different background, age, and expertise. The address list of the students was provided by the student office. We first sent an email asking the students to take part in the experiment, mentioning that we were not looking for persons with professional experience in painting and music, although four participants had received musical training at a conservatory or from private lessons, and two of them were musicians. The questionnaire reported this information. The participants were also asked about possible conscious synesthesia (Albertazzi et al., 2013; Palmer & Schloss, 2010; Palmer et al., 2013), and only one reported being a synesthete. We did not include synesthesia among the exclusion criteria, as the aim was to evaluate the existence of naturally biased associations in the general population. According to Sagiv and Ward (2006), the prevalence of synesthesia could be one in 20. The only exclusion criteria were visual impairment and self-reported acoustic impairment. Before the experiment, the participants carried out the Ishihara test for color. After the experiment, the participants were asked whether they had previously known the paintings and the musical excerpts that they evaluated. For all the participants (with the exception of two), the musical stimuli were totally new. A few of them were generally acquainted with one painting only (Composition 8).
All the participants signed an informed consent form. The two phases of the experiment reported here complied with the ethical guidelines of the University of Trento.

Stimuli and Apparatus
The two phases of the experiment were carried out in the Experimental Phenomenology Laboratory (LabExP) at the Department of Humanities, Trento University. The laboratory had constant and controlled lighting conditions (ca. 10 lx on average in the room, given by a halogen lamp). The colors were produced on a monitor Eizo Color edge mod. CG276 (68 × 27 cm), P7N OFTD1846 75Q, Mfd. 2013.05.24, S/N 23816053 A (resolution 2560 × 1440).
The software used to calibrate the monitor was Eizo Corporation, Color Navigator 6 v.6.4.0.5. The calibration guaranteed a white D65, a 2.2 gamma, 120 cd/m² luminosity, maximum contrast. The monitor was recalibrated at the beginning of each session. The auditory stimuli were administered through Sennheiser RS180 Precision Headphones.
The two phases of the experiment were performed with an interval of about 30 minutes in which the participants left the laboratory for a short rest.

Phase 1 of the Experiment: Evaluations by Osgood Semantic Differential
The first phase aimed to verify whether complex images and musical excerpts with varying perceptual characteristics led to consistent associations relative to a series of adjectives.
Participants were asked to rate 11 pairs of contraries (shown in random order) on a pseudo-continuous bipolar scale (ranging from 0 to 100) when looking at a painting or when listening to a musical excerpt (both presented in random order). The participants were told that they were going to be shown a set of paintings, one at a time, appearing on the left half of the screen, that could be freely zoomed by using a magnifier. The task was to evaluate the overall content (i.e., the meaning) of each painting (or of each musical excerpt) according to the 11 pairs of contraries (see later, Procedure). Pairs were presented on the right half of the screen and randomized with respect to their order of presentation in the couple (e.g., pleasant/unpleasant might be presented as unpleasant/pleasant).
The two contraries were placed at the extremes of a continuous line. Using a mouse, the participant had to place the pointer indicating his or her degree of agreement with the two adjectives with regard to the semantic content of the painting or of the musical excerpt. The software stored the choice with two scores, one for each member of the pair. Participants were allowed to change their choices at any time until they proceeded to the next stimulus. After each answer, confirmed by pressing the button 'Confirm' placed below the screen, a button with the inscription 'Proceed' appeared at the right top of the screen; by pressing it, the answer was registered and a new image or musical excerpt was presented.

General Materials
The stimuli consisted of 15 paintings by Kandinsky and 14 musical excerpts from Maurizio Pollini's performance extracted from Schönberg's work (each excerpt lasted 22 seconds on average).

Procedure
The first phase of the experiment was performed using the semantic differential on a bipolar rating scale of adjectives (Osgood, 1956). The second phase of the experiment evaluated the direct association between visual and auditory stimuli.

Task of Phase 1
Participants read the following instructions on the screen: This experiment consists in the evaluation of visual stimuli (pictures) and auditory stimuli (musical excerpts).
On the left of the screen are the stimuli (see a painting or listen to a musical excerpt), while on the right there is a rating scale between two opposing adjectives. The cursor can be dragged to the right or left between the two adjectives, bringing it closer to one of the two. The task is to evaluate which of the two adjectives is associable with the stimulus presented, and to what extent, by positioning the cursor at the point that you consider most appropriate. You should prefer accuracy to promptness of response.
Once you have made your choice, press the "Confirm" button.
If you decide that neither of the two adjectives is associated with the stimulus, you can leave the cursor at the center and confirm the choice. By pressing the "Continue" button you can move to the evaluation of the stimulus on other pairs of opposing adjectives.
At the end of the session there will be a 30-minute break, after which the second session will start with instructions for the task to be performed.
When you are ready, press the "Start" button.

Phase 2 of the Experiment: Direct Association Between Paintings and Musical Excerpts
The aim of the second phase of the experiment was to verify whether some images of Kandinsky's paintings, with varying perceptual characteristics and contents, led to consistent cross-modal associations with musical excerpts taken from the repertoire of Schönberg's works. Each participant saw a series of images of paintings in thumbnail on the screen (the positions of the paintings were randomized on each presentation). The participant clicked on a specific image, which thus appeared in full screen mode, and likewise with the other images, without any constrained order. The participant viewed the images while simultaneously listening to a musical excerpt (presented in random order). The participant had to choose the image(s) that she or he most naturally associated with that music. She or he could choose up to three paintings associated with each musical excerpt and drag them into three different boxes at the bottom of the screen. The participant could go back to view the images again, and she or he could also listen repeatedly to the musical excerpt by pressing a button. Once the association had been performed, the selected images were moved down, each into one of the three boxes. Once the choice had been confirmed, it could not be changed, and the experiment continued with the presentation of all the images and the new musical excerpt, and so on until the musical excerpts were exhausted.

General Materials
The stimuli consisted of the same 15 paintings by Kandinsky and 14 musical excerpts extracted from Schönberg's work.

Procedure
Participants read the following instructions on the screen: You will see a series of images of paintings in thumbnail on the screen. Click on one of them, which will appear in full screen mode, and then do likewise with the other images. At the same time, you will hear a musical excerpt. Select up to three images you most naturally associate with the music, placing them in three different boxes at the bottom of the screen. You can go back to re-view images already seen, and also to hear the musical excerpt again. Once you have confirmed your choice, it cannot be changed, and the experiment will continue with further musical excerpts until there are none left. You should prefer accuracy to promptness of response.

Statistical Methods
Descriptive statistics were calculated based on the rating values given by the participants. Euclidean distances between paintings and musical excerpts based on mean rating values for each adjective were calculated. The direct association between painting and musical excerpts was evaluated by using the χ² test and the associated standardized residuals. Analyses were performed with R 3.3.1 software (R Core Team, 2016).
While the task of Phase 1 is mainly descriptive, the task of Phase 2 evaluates (using the χ² test of significance) the presence of a direct association between paintings and musical clips. We expected such an association to exist, and the approximate sample size was calculated in the following way. The contingency table that collects the results of Task 2 has 15 rows and 14 columns and therefore 210 cells. Participants could choose up to three paintings for each musical excerpt that they heard. All participants chose three paintings, and therefore there are 42 observations for each participant. The statistic to test the association between the variables 'painting' and 'clip' is asymptotically distributed (under the null hypothesis) as a χ² distribution (with 182 degrees of freedom). The approximation to the true χ² distribution could be considered good when the lowest expected frequency is higher than 5. To obtain the total number of observations, it is necessary to multiply the number of cells (210) by the minimum number of observations in each cell (6), thus obtaining a total of 1,260 observations. As each participant contributes 42 observations, the minimum number of participants is 1,260/42 = 30. To account for possible drop-outs, a total of 32 subjects was considered.

Table 1 reports the mean rating values for each adjective and for each painting given by the 32 participants (the corresponding standard deviations are reported in the Supplemental Table 1). Means range between 13 and 83. Painting no. 5 (Sketch for Composition 2) was considered the brightest of the 15 paintings (and therefore the least gloomy) and very fiery (and therefore not faded). Similarly, painting no. 10 (Landscape with red spots), painting no. 1 (Improvisation 10), and painting no. 15 (Composition 8) were evaluated as very bright. Painting no. 14 (Improvisation. Gorge) was considered the least static (and therefore the most dynamic); painting no. 12 (Picture with a white border), painting no. 13 (Composition 6), and painting no. 6 (Composition 5) were rated similarly. Painting no. 8 (Impression III [Concert]) and painting no. 5 (Sketch for Composition 2) were evaluated as the fieriest. Painting no. 1 (Improvisation 10) and painting no. 11 (Improvisation 26 [Rowing]) were also evaluated as fiery, although less strongly.

Table 2 reports the mean rating values for each adjective and for each musical excerpt given by the same 32 participants (the corresponding standard deviations are reported in the Supplemental Table 2). Means range between 16 and 84. Musical excerpt no. 1 (Mäßig [bars 1-8]) was considered the calmest of the 14 musical excerpts (and therefore the least agitated), while musical excerpt no. 14 (Gigue [bars 1-13]) was considered the least static (and therefore the most dynamic).
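The sample-size reasoning given in the Statistical Methods section can be retraced with a few lines of arithmetic. The following is only a sketch in Python (the analyses in the study were run in R); all numbers come from the text, and the variable names are illustrative.

```python
# Sketch of the sample-size arithmetic; every number is taken from the text.
n_paintings = 15          # rows of the contingency table
n_excerpts = 14           # columns of the contingency table
choices_per_excerpt = 3   # paintings chosen for each musical excerpt
min_per_cell = 6          # smallest count that keeps expected frequencies above 5

n_cells = n_paintings * n_excerpts                          # cells in the table
degrees_of_freedom = (n_paintings - 1) * (n_excerpts - 1)   # df of the test
obs_per_participant = n_excerpts * choices_per_excerpt      # choices per person
required_obs = n_cells * min_per_cell                       # total observations

# Ceiling division, so that a fractional result would be rounded up.
min_participants = -(-required_obs // obs_per_participant)

print(n_cells, degrees_of_freedom, obs_per_participant,
      required_obs, min_participants)
```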
Euclidean distance measures the ordinary distance between two points on a plane; in n dimensions, it evaluates the distance between two "profiles" made up of the ratings attributed by the subjects to the adjectives. However, to calculate the "semantic" Euclidean distance between each painting and each musical excerpt, 11 further values are needed, namely those corresponding to the antonyms, calculated as the complement to 100 (for painting no. 9, these values are 61, 55, 45, 43, 45, 57, 51, 56, 54, 43, 45). These distances are shown in Table 3.
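As a minimal sketch of how such a "semantic" distance could be computed, the Python fragment below builds a 22-value profile (11 mean ratings plus their complements to 100) and takes the Euclidean distance between two profiles. The complements for painting no. 9 are those quoted in the text; the excerpt profile `excerpt_x` and the function names are hypothetical and serve only as illustration.

```python
import math

def semantic_profile(ratings):
    """Append to each rating its complement to 100 (the antonym's score)."""
    return list(ratings) + [100 - r for r in ratings]

def euclidean(a, b):
    """Ordinary Euclidean distance between two equally long profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Painting no. 9: antonym values quoted in the text, and the ratings they imply.
painting_9_complements = [61, 55, 45, 43, 45, 57, 51, 56, 54, 43, 45]
painting_9 = [100 - c for c in painting_9_complements]

# Hypothetical mean ratings for one musical excerpt (NOT data from the study).
excerpt_x = [30, 50, 60, 70, 40, 45, 55, 35, 50, 65, 48]

d11 = euclidean(painting_9, excerpt_x)                      # 11 ratings only
d22 = euclidean(semantic_profile(painting_9),
                semantic_profile(excerpt_x))                # with complements
print(round(d11, 2), round(d22, 2))
```

Because each complement differs between two stimuli by exactly the same amount as the rating it is derived from, the 22-value distance equals the 11-value distance multiplied by √2, so the ordering of the painting/excerpt pairs does not depend on which version is used.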

Second Phase of the Experiment
The χ² test confirmed that the association between the variables 'painting' and 'musical excerpt' could not be considered random but instead systematic (χ² = 325; df = 182; p < .001). Given that in this test the lowest expected frequency was less than 5, a Monte Carlo simulation was performed to better approximate the true sample distribution of the test. It confirmed the significance of the association (p < .001).
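The logic of a χ² test backed by a Monte Carlo simulation can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the R code used in the study: the 3 × 3 table `toy_table` is hypothetical, and the simulation shuffles the column labels of the individual choices, which keeps both margins of the table fixed (the text does not say which simulation scheme was used).

```python
import random

def chi_square(table):
    """Pearson chi-square statistic of a two-way table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def monte_carlo_p(table, n_sim=2000, seed=0):
    """Simulated p-value: share of shuffled tables at least as extreme."""
    rng = random.Random(seed)
    observed_stat = chi_square(table)
    n_rows, n_cols = len(table), len(table[0])
    # One (row, column) label pair per single observation, in row-major order.
    rows = [i for i, row in enumerate(table) for c in row for _ in range(c)]
    cols = [j for row in table for j, c in enumerate(row) for _ in range(c)]
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(cols)                      # break any row/column link
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(rows, cols):
            sim[i][j] += 1
        if chi_square(sim) >= observed_stat - 1e-12:
            hits += 1
    return (hits + 1) / (n_sim + 1)

# Hypothetical counts (NOT data from the study): rows and columns are categories.
toy_table = [[12, 3, 5],
             [4, 11, 5],
             [4, 6, 10]]

chi2_stat = chi_square(toy_table)
df = (len(toy_table) - 1) * (len(toy_table[0]) - 1)
p_sim = monte_carlo_p(toy_table)
print(round(chi2_stat, 2), df, p_sim)
```

The simulated p-value does not rely on the asymptotic χ² approximation, which is why it is a natural fallback when some expected frequencies fall below 5.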
As the test did not indicate which musical excerpt was associated (positively or negatively) with which painting, a residual analysis was performed. A standardized form of the residual (which behaves like a normal deviate) was used to determine whether the residual was large enough to indicate a departure from a random choice. In this case, there was only about a 5% chance that any particular standardized residual would exceed 1.96 in absolute value. When we inspected 210 cells, about 10 residuals (i.e., 5% of 210) could have been so large solely because of random variation. On the other hand, as can be seen in Table 4, there were 26 residuals greater than 1.96 in absolute value.
Overall, there were 16 residuals greater than 1.96 and 10 residuals lower than −1.96. A positive residual means that the selected musical excerpt "attracted" the corresponding painting (i.e., the painting was associated with the musical excerpt more than the average); a negative residual means that the selected musical excerpt "repelled" the corresponding painting (i.e., the painting was associated with the musical excerpt less than the average).
On the other hand, the negative associations were weaker. Musical excerpt no. 14 (Gigue [bars 1-13]) was negatively associated with painting no. 3 (Lake boat trip) and showed the lowest residual (−2.76). Three other pairs showed quite similar residuals (−2.5): Musical
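A residual analysis of this kind can be sketched as below. The fragment assumes the adjusted standardized residual, (O − E) / sqrt(E (1 − row share) (1 − column share)), which behaves approximately like a standard normal deviate; the text does not state the exact formula used, and the table `toy_table` is hypothetical.

```python
import math

def standardized_residuals(table):
    """Adjusted standardized residual for every cell of a two-way table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            variance = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            out_row.append((observed - expected) / math.sqrt(variance))
        result.append(out_row)
    return result

# Hypothetical counts (NOT data from the study).
toy_table = [[12, 3, 5],
             [4, 11, 5],
             [4, 6, 10]]

z = standardized_residuals(toy_table)
cells = [(i, j, v) for i, row in enumerate(z) for j, v in enumerate(row)]
attracted = [(i, j) for i, j, v in cells if v > 1.96]    # chosen more than chance
repelled = [(i, j) for i, j, v in cells if v < -1.96]    # chosen less than chance

# Number of cells expected to pass the threshold by chance in a 15 x 14 table.
expected_by_chance = 0.05 * 15 * 14
print(attracted, repelled, expected_by_chance)
```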

Comparing the Results of the Two Phases
In Phase 1, participants evaluated both paintings and musical excerpts using the Osgood semantic differential, and the Euclidean distances between paintings and musical excerpts were calculated. On the other hand, in Phase 2, participants directly associated three paintings with each musical excerpt and residuals were used to evaluate the association between paintings and musical excerpts. To compare the results of the two phases, the correlation coefficient between distances and residuals was calculated. As expected, distances and residuals were inversely associated (r = −.357; p < .001), that is, on average, when a painting was "near" to a musical excerpt, the corresponding residual was high and vice versa. Therefore, some degree of consistency was found between the two phases of the experiment.
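The comparison between the two phases amounts to pairing each painting/excerpt distance with the residual of the same cell and computing a correlation coefficient over all pairs. A minimal sketch in Python, assuming a Pearson coefficient and using small hypothetical matrices in place of the 15 × 14 tables of the study:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flatten(matrix):
    """Read a matrix row by row so that both tables share one cell order."""
    return [value for row in matrix for value in row]

# Hypothetical values (NOT data from the study): same cell layout in both tables.
toy_distances = [[20.0, 45.0, 60.0],
                 [50.0, 25.0, 55.0]]
toy_residuals = [[2.4, -0.8, -2.1],
                 [-1.0, 2.0, -0.5]]

r = pearson_r(flatten(toy_distances), flatten(toy_residuals))
print(round(r, 3))
```

A negative coefficient, as in this toy example, indicates that small semantic distances tend to go with large positive residuals.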

General Discussion
The research verified the existence of cross-modal associations between a series of Kandinsky's paintings and a series of Schönberg's music selections. Most participants (26 of the 32) had no advanced knowledge of or competence in music or art. The first phase tested both paintings and musical excerpts with the Osgood semantic differential; the second phase tested a direct association between paintings and musical excerpts. The prediction that the participants would make systematic associations between the paintings and the series of musical selections, notwithstanding the diversity and high complexity of the stimuli, was largely validated.
Specifically, as regards Phase 1, with the Osgood semantic differential, some of our assumptions were confirmed: for example, that the adjective 'agitated' might characterize some of Schönberg's musical excerpts, such as no. 6, no. 7, no. 10, no. 13, and no. 14, and that the adjective 'calm' might characterize some of the other musical excerpts taken from Op. 11, such as no. 1 and no. 4. As predicted, the adjective 'agitated' characterized the paintings Improvisation. Gorge and Composition 6. Surprisingly, however, the painting Lake boat trip was not perceived as 'calm' (i.e., it was rated neither 'agitated' nor 'calm'), although it was relatively 'calmer' than Improvisation. Gorge. It is interesting that none of the paintings was rated as 'calm,' reflecting the dynamic character of all the paintings selected in this study.
Landscape with red spots (no. 10) was rated as the calmest among the paintings, although not extremely calm.
Regarding the assumptions that guided the choice of the pairs of contraries, the results showed the following. The warm/cold pair, surprisingly, did not prove to be closely associated with the paintings, probably due to the great variety of colors used by Kandinsky in the selected stimuli. As to the fiery/faded and bright/gloomy pairs, the ratings seem to be more diversified with respect to the paintings, although no painting was rated either faded or gloomy, for similar reasons as described earlier. Also, the consonant/dissonant pair was not polarized with respect to any of the paintings, reflecting perhaps the difficulty of the participants in attributing an intuitive cross-modal meaning to such technical musical terms. The consonant/dissonant pair was more useful to describe the musical excerpts, classified as fiery, hard, and dynamic with an understandable prevalence of the 'dissonant' rating. The pleasant/unpleasant pair was not polarized toward 'unpleasant' with respect to any of the stimuli.
As regards Phase 2, the results of the direct association between musical excerpts and paintings showed the presence of both attractions and repulsions among the stimuli. The negative associations, however, were weaker, reflecting the high suitability of auditory and visual stimuli to be combined to produce different and specific Gestalten among very complex stimuli.
Analysis of the findings revealed that the results obtained with the two methods (the Osgood semantic differential and direct association) do not always overlap, something that we already found in a previous study of ours. Although the results are generally consistent, the evaluation based on the Osgood semantic differential (Phase 1) diverged in specific instances from the evaluation based on direct association (Phase 2).
This partial divergence between the two phases of the experiment may reveal a structural difference between the linguistic and the perceptual mediums adopted in the evaluations, which is important to bear in mind in the debate on synesthesia, cross-modality, and ideasthesia. This result suggests that the semantic differential methods, also used in other related studies (e.g., Rančić & Marković, 2019), might be insufficient to fully account for cross-modal correspondences, in particular for multimodal Gestalt effects.
As to the aim of our study, we may conclude that the cross-modal patterns toward abstractionism, manifested in the works of the two artists, were often detectable by the participants and comparable in the first phases of their development (for Kandinsky, the period starting in the 1910s; for Schönberg, the period corresponding to his development of atonal music). Participants were also able to detect the change in Schönberg's compositional structure leading to dodecaphony, which was, however, less associable with the figural elements still remaining in the second period of Kandinsky's painting toward abstractionism (mainly, Compositions). In fact, Schönberg's dodecaphonic construction is a product of a more explicit mental operation put in place by the composer.
Concerning the debate on the opposite viewpoints in cross-modal studies, the results shed new light on the concept of ideasthesia. Ideasthesia is commonly conceived as the activation of concepts producing phenomenal experience, where concepts are generally understood in Fodorian terms, that is, as semantically driven, top-down cognitive abilities for our comprehension of the environment. This conception, for example, equates associations between color and temperature (subjectively, red is warm, blue is cold) to associations of the type doctor and nurse (Nikolić, 2016). These concepts and associations, however, are categorically very different (and presumably mediated by different neural pathways). As Hering and Katz observed, it is in the (universal) nature of the color red to be warm and of the color blue to be cold (Da Pos & Albertazzi, 2003; Da Pos & Valenti, 2007; Hering, 1920/1964; Katz, 1935). The associations perceived between color and temperature, on which there is a general agreement in the field of arts (Albers, 1975; Kandinsky, 1912; Itten, 1961), are produced by concrete experience directly given in awareness. Color and its experienced temperature are not disjunctive and detachable components of a percept; moreover, they are not a product of top-down external associations between 'abstract' (in the sense of formal or syntactically processed) concepts due to past experience, according to the Humean and probabilistic hypothesis. They are internally related as a unitary qualitative content of awareness. Color and its experienced temperature form a unitary (Gestalt) concrete concept. Doctor and nurse, and similar pairs of concept associations, instead, are only externally, conventionally, and functionally related. Needless to say, the connotative properties of color, among them warmth/coldness, may be weakened by the context inducing phenomena of assimilation (see in Supplemental Material the comment on musical excerpt no. 8).
A point to be considered in the debate on ideasthesia, however, is firstly the profound symbolism with which Kandinsky's work is imbued and its impact on the final configurations of his paintings. Strict supporters of ideasthesia, in fact, may argue that the strong cross-modal (sometimes considered as synesthetic) content of his paintings has been induced by his theosophical or religious ideation (Kandinsky, 1912, Chap. 3), and the same would hold for Schönberg. In this respect, Kandinsky's paintings and Schönberg's music play the role of a case study because they are both culturally imbued with ideas circulating in the avant-garde of the 20th century (Washton Long, 1972, 1975). However, as Kanizsa (1991) clearly stated, in perceptual analyses, heuristically one should distinguish phenomena pertaining to seeing (and other modalities) from phenomena pertaining to thinking. As to Kandinsky's own descriptions of his cross-modal experience analyzed with contemporary diagnostic criteria, they might allow us to hypothesize that the phenomenon is genuinely synesthetic (see e.g., Kandinsky, 1912, p. 179). Factually, the dimensions of seeing and thinking are co-present in ideation but still distinguishable. For example, the figure of the 'rider' or 'horseman,' which frequently appears in Kandinsky's paintings, from a cultural viewpoint represents the knight of the Apocalypse. However, the cultural meaning as such cannot be considered directly responsible in determining or inducing the process of sensible abstraction in Kandinsky's works. In fact, the pattern (a rider, a horse) can appear in totally different cultural environments and contexts (such as in children's drawings, see Arnheim, 1969, Chap. 14). Moreover, most of the participants were totally unaware of Kandinsky's cultural and religious background, and therefore, it could not have influenced their choices.
Second, as regards the tendency to abstractionism prompted by the Jugendstil (Art Nouveau), through ornamental patterns and their intrinsic symbolism, although it influenced Kandinsky's formation during his years of training in Munich (Stelzer, 1964; Weiss, 1975), it is not what grounded his path to abstractionism. As he observed, it was the contents of his subjective experience that drove the process that led to the series of sketches for the creation of the Impressions (see, e.g., Gendarme, Impression 4 [1911]), to the Improvisations (see, e.g., Improvisation 14 [1910]), grounded on the interplay between color, form, brightness, and figural elements (lines, cuspids, heights), and to the Compositions, during the years 1911-1914 (see, e.g., the transition from Landscape with tower [1909] to Church in Murnau [1910]; see Kandinsky, 1912, Chap. 8).
Similarly, as to the possible influence of theosophy on Kandinsky's painting and theory (e.g., on the German concept of "Geist," which is undeniable; Kandinsky, 1912, Chap. 3), it cannot be considered the decisive top-down factor in Kandinsky's production (for a different opinion, see Ringbom, 1966; on the diffuse interest in synesthesia by artists of the same period, see Besant & Leadbeater, 1901). Comparable influences on his theory of color were exerted by the theories of Goethe, Runge, Bezold, Wagner, Debussy, Fiedler, Swedenborg, Th. Lipps, and Worringer (e.g., the idea of a total work of art, the concept of visibility, and of inner necessity, see Kandinsky, 1912, Chap. 6). These theories, however, were diffused, discussed, and influential through the main avant-gardes of the time. The same holds for Schönberg.
Finally, as the results of our study show, the meanings responsible for the associations were neither directly language-driven nor culture-driven, and this is certainly true in the case of the second phase of the experiment, where no language dimensions were involved.
Common patterns were perceived by the participants between Kandinsky's first step toward abstractionism and Schönberg's shift toward atonality. Subsequently, the composer's artistic turn to dodecaphony, mainly driven by an intellectual process (and therefore less influenced by perceptual dimensions during the creation process), showed perceivable gaps between the two artists' works of art. In fact, in the case of Kandinsky, a greater concrete sensibility toward cross-modal dimensions in perceiving can be traced in his whole production. Conversely, in Schönberg, a sensitivity of this type, as mentioned, can be traced only in a defined period.
The fact that association patterns were weaker between Schönberg's Op. 25 and Kandinsky's later works could be ascribed to the (very) different notions of abstractionism that can be applied to the two artists' creative development. In their transition between different periods, while Kandinsky's explicit aim was to abstract the qualitative content of perception, Schönberg aimed at giving a systematic order to qualities that had already been abstracted in his (and others') earlier works. Consider Op. 11, which in turn would contribute to inspiring Kandinsky's detachment from the figurative. If the association between Kandinsky's second period and Schönberg's Op. 25 does not result from a common theoretical view, it can nevertheless result from the structural change in both artists' works, also influenced by the historical context. The detachment of both artists from their earlier technical habits, inducing Kandinsky to develop a more systematic framework to objectify conscious experience, and Schönberg to devise a technique to systematically organize atonality, in fact, results in an increase in the methodological complexity of their works (e.g., in Kandinsky's latest Compositions, such as Composition 8, or the series of Improvisations, such as Improvisation. Gorge), which is incidentally perceivable to the viewer/listener (although not necessarily conceptually understandable) at least when matching the stimuli together.
Further studies might test only expert musicians and painters, to verify whether the same correspondences take place, possibly to a greater extent in the case of expertise, or people from different cultures who are totally unaware of Kandinsky's and Schönberg's works. Another possibility would be to test Kandinsky's paintings of the Murnau period with the work of other composers such as Hartmann or Wagner. Then, assuming the presence of synesthetic traits in Kandinsky's works of art, it would be interesting to verify the comparison between Kandinsky and Scriabin on light and color (Kandinsky, 1912). A specific study might also focus on Schönberg's Op. 18 and some of his own paintings of the same period.
As to the choice of the pairs of contraries for the Osgood semantic differential, a further analysis might be conducted on color/shape associations limited to the aforementioned connotative properties of color specifically experienced by Kandinsky.