A Protracted Sensitive Period Regulates the Development of Cross-Modal Sound–Shape Associations in Humans

Humans preferentially match arbitrary words containing higher- and lower-frequency phonemes to angular and smooth shapes, respectively. Here, we investigated the role of visual experience in the development of audiovisual and audiohaptic sound–shape associations (SSAs) using a unique set of five groups: individuals who had suffered a transient period of congenital blindness through congenital bilateral dense cataracts before undergoing cataract-reversal surgeries (CC group), individuals with a history of developmental cataracts (DC group), individuals with congenital permanent blindness (CB group), individuals with late permanent blindness (LB group), and controls with typical sight (TS group). Whereas the TS and LB groups showed highly robust SSAs, the CB, CC, and DC groups did not—in any of the modality combinations tested. These results provide evidence for a protracted sensitive period during which aberrant vision prevents SSA acquisition. Moreover, the finding of a systematic SSA in the LB group demonstrates that representations acquired during the sensitive period are resilient to loss despite dramatically changed experience.

(SSAs) have even been reported in the Namibian Himba tribe, which does not have an alphabet (Bremner et al., 2013). Combined with the near universality of the bouba-kiki effect, the latter finding argues against the idea that SSAs originate in any special grapheme-shape association.
Since the discovery of SSAs, the question of whether they are innate or learned has attracted considerable speculation. The near universality of the bouba-kiki effect and the presence of a moderate bouba-kiki effect in toddlers and even prelexical infants have served as evidence that SSAs have an innate basis (Maurer, Pathman, & Mondloch, 2006; Ozturk, Krehm, & Vouloumanos, 2013). For example, Pejovic and Molnar (2017) found evidence that audiovisual SSAs are present by 12 months of age. However, the strongest argument against the innateness hypothesis comes from the seminal work of Fryer, Freeman, and Pring (2014), who tested an audiohaptic version of the bouba-kiki effect in which blind and partially sighted individuals matched the pseudowords with haptically perceived shapes. Their findings were contrary to what would be expected if SSAs were innate: Congenitally blind participants in their study did not exhibit any systematic SSAs. A mixed group of late-blind and partially sighted individuals performed above chance, but their SSA was significantly reduced compared with that of sighted controls.
A recent study in a larger sample of early-blind participants (blindness onset < 2 years of age) and late-blind participants (blindness onset ≥ 3 years of age in that sample) corroborates the claim that SSAs depend on visual experience (Hamilton-Fletcher et al., 2018), although other researchers have described special conditions under which SSAs may occur in early-blind individuals (defined as blindness onset < 4 years of age; Bottini, Barilari, & Collignon, 2019). Moreover, cross-modal associations between tactile and auditory motion (e.g., a link between increasing pitch and upward motion) have been observed in sighted individuals but not in early- or late-blind individuals (defined, respectively, as those with blindness onset ≤ 3 years and after the age of 5 years; Deroy, Fasiello, Hayward, & Auvray, 2016).
Thus, existing research suggests a crucial role of visual experience in the emergence of some cross-modal correspondences. However, it is still unknown whether there is a sensitive period during which cross-modal correspondences, such as the SSA, must be acquired or stabilized. Sensitive periods in development are epochs during which experience has an unusually strong impact on brain functions; after the end of the sensitive period, the acquisition of the same representations is impossible or incomplete (Knudsen, 2004). Identifying sensitive periods in typical human functional development requires investigating individuals who suffered a period of blindness from birth but regained vision later. Such individuals allow researchers to determine whether a particular function, such as SSAs, can be acquired when the sensory input that seems crucial for its acquisition (vision, in the case of SSAs; Fryer et al., 2014; Hamilton-Fletcher et al., 2018) becomes available only belatedly. In the present study, we tested 30 participants who were born with total bilateral dense congenital cataracts (CC group) and subsequently underwent cataract-removal surgeries. If SSA acquisition depends on a sensitive period in early ontogeny, we would expect the CC group to show a pattern of results similar to that of an additional group of 15 congenitally permanently blind individuals (CB group); that is, in neither group would we find the systematic association between sounds and haptic shapes that we would expect to find in a group of 70 typically sighted control participants (TS group).
Another crucial aspect of sensitive periods is that representations acquired during such periods are not lost (Knudsen, 1998). Thus, individuals who lose their vision after the sensitive period are expected to show systematic sound-haptic-shape associations, as typically sighted individuals do. This hypothesis was tested in an additional group of 12 late-blind individuals (LB group), that is, people with blindness onset after the age of 12 years, the age by which multisensory development, as assessed in prospective studies, is ending or has ended (Hillock-Dunn & Wallace, 2012; Nardini, Jones, Bedford, & Braddick, 2008; Röder, Pagel, & Heed, 2013).
Finally, investigating CC individuals also allows us to study the recovery of sound-visual-shape correspondences. It could be argued that newly gained sight should allow the acquisition of SSAs (or, more generally, of cross-modal associations) from the statistics of the natural environment, even if sight becomes available only late. Such a finding would clearly argue against an early sensitive period for the acquisition of SSAs. Because CC individuals typically still have visual impairments following surgery and recovery, we tested an additional group of 24 individuals with later-onset (developmental) cataracts after cataract extraction (DC group). DC individuals underwent the same surgical treatment as the other cataract patients and also had some remaining visual impairments. All DC participants tested in the present study had suffered from markedly degraded vision before the age of 12 years. Inclusion of this group allowed us to test whether SSA acquisition is disrupted only by a phase of congenital total loss of pattern vision or whether later phases of severe visual impairment during childhood also interfere with functional SSAs.

Participants
One hundred fifty-four individuals participated in this experiment. Thirty had their vision restored after having total bilateral dense congenital cataracts (CC group; mean age = 18.9 years, range = 6-46 years; 12 female; 28 right-handed; geometric mean visual acuity = 0.207, range = 0.014-0.7, no acuity data for 1 participant; average age at surgery = 58 months, range = 1 month-33 years). Twenty-six had undergone surgery to remove developmental cataracts (DC group). Two participants in this group were excluded because of a history of developmental delays. The 24 remaining DC participants had the following characteristics: mean age = 13.83 years, range = 9-29 years; 13 female; 19 right-handed, 4 with unknown handedness; geometric mean visual acuity = 0.390, range = 0.003-1.00; average age at surgery = 9.30 years, range = 2-17.5 years. All DC individuals had suffered a period of degraded vision before the age of 12 years. To compare the CC and DC groups meaningfully, we transformed the decimal visual acuities to LogMAR values (Holladay, 1997). The DC group had a significantly higher visual acuity than the CC group (one-sided t test: t(38.877) = 2.064, p = .023).
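The LogMAR transformation mentioned above is LogMAR = -log10(decimal acuity), so averaging in LogMAR space corresponds to taking a geometric mean of decimal acuities, which is why geometric means are reported. The Python sketch below illustrates both, plus Welch's t statistic; the function names are hypothetical. The fractional degrees of freedom reported (38.877) suggest a Welch (unequal-variance) test, but that is an inference, not something the text states.

```python
import math
from statistics import mean, variance


def decimal_to_logmar(decimal_acuity: float) -> float:
    """Convert a decimal visual acuity (1.0 = typical acuity) to LogMAR = -log10(decimal)."""
    if decimal_acuity <= 0:
        raise ValueError("decimal acuity must be positive")
    return -math.log10(decimal_acuity)


def geometric_mean(values: list[float]) -> float:
    """exp(mean(log x)); for decimal acuities this equals 10 ** -(mean LogMAR)."""
    return math.exp(mean([math.log(v) for v in values]))


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Because a lower LogMAR value means better acuity, a test of "higher acuity in the DC group" on LogMAR values runs in the direction CC > DC.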
Fifteen participants who lost their vision because of congenital peripheral defects, such as severe forms of Leber's congenital amaurosis, bilateral congenital anophthalmos, or aggressive retinopathy of prematurity (Stage 5), also took part in the experiment (CB group). One CB participant could discern hand movement at 10 cm; the rest had at most light-projection capacity. Their mean age (equal to their mean blindness duration) was 27.87 years (range: 18-55 years); 6 were female, and 12 were right-handed. Additionally, 13 participants with late permanent blindness were tested (LB group); 1 was excluded because of a history of brain tumors and surgery. The remaining 12 had a mean age of 35.33 years (range = 21-61 years) and a mean blindness duration of 9.79 years (range = 6 months-39 years); 5 were female, and 11 were right-handed. Furthermore, 70 TS control participants (mean age = 24.04 years, range = 6-56 years; 52 female; 57 right-handed, handedness unknown for 6) took part in the experiment.
All CC, DC, CB, and LB participants were recruited at the LV Prasad Eye Institute, Hyderabad, India, or from the local community of the city of Hamburg, Germany (see Table 1 for distributions of countries of origin and testing). The control participants had normal or corrected-to-normal vision, typical development of all sensory systems, and no neurological disorders. They were recruited from the local community in either Hyderabad, India, or Hamburg, Germany. All visually impaired individuals who participated in the study (CC, DC, CB, and LB participants) were free of any other sensory-system problems and did not have any neurological disorders.

Ethical approvals
The study was approved in parallel by the institutional ethical review board of LV Prasad Eye Institute, Hyderabad, India, as well as by the local ethical commission of the University of Hamburg Faculty of Psychology and Movement Sciences. The study conformed to the ethical principles of the Declaration of Helsinki (2013).

Consent and compensation
All participants provided written informed consent for the study. Additionally, the blind participants were orally informed about the general details of the study, data-security policies, and their right to terminate the experiment or withdraw consent for the preservation of collected data at any time. For participants who did not understand English or German, we also orally provided the same information in a language they could fully understand (e.g., Telugu, Hindi, Urdu, Bengali, or Tamil). For minors, a legal custodian's written informed consent was also obtained. For taking part in the study, adult participants received a small monetary compensation, and the expenses associated with participation (e.g., travel costs) were reimbursed. Minors received a small present instead of monetary compensation.

Experimental design
Stimuli. We decided to test SSAs rather than other cross-modal correspondences because previous studies (Fryer et al., 2014; Hamilton-Fletcher et al., 2018) have most consistently reported deficits for this type of cross-modal correspondence in congenitally blind or early-blind individuals. The set of stimuli consisted of five object pairs, four of which were haptic pairs and one a visual pair (see Fig. 1). Each pair consisted of one object with a smooth shape or texture and another with a spiky shape or texture. The haptic stimuli of pairs A through C closely resembled the three object pairs used in the study of Fryer et al. (2014). Specifically, pair A objects were 3-D printed in acrylonitrile butadiene styrene polymer (Fab Lab, Fabulous St. Pauli, Hamburg, Germany). The smallest bounding cuboid of each object measured 100 mm × 70 mm × 60 mm. Pair B objects were flat shapes laser-cut from 6-mm plywood sheets, and the smallest bounding rectangle measured 120 mm × 70 mm. Objects in pair C were 3-D-printed disks with a diameter of 40 mm and a thickness of 7 mm. One of these objects had rounded edges, and the other had a checkerboard-like geometric pattern that imparted a rough texture to the surface. Pair D objects were commercially purchased wooden items with a diameter of approximately 70 mm. Each haptic stimulus pair was presented in a black cloth bag measuring about 45 cm × 30 cm. The outlines comprising pair E were printed side by side on white, A5-size paper and were presented visually. The outlines of the objects in pair E were exactly the same as those of the objects in pair B.
Fig. 1. Object pairs A, B, C, and D were haptic forms, whereas object pair E was presented on a white background to participants with visual capabilities. Pair A consisted of 3-D models printed in acrylonitrile butadiene styrene polymer. The dimensions of the smallest bounding cuboid were 100 mm × 70 mm × 60 mm. Pair B consisted of flat shapes obtained by laser-cutting plywood. The shapes were about 6 mm thick, and the smallest bounding rectangle dimension was 120 mm × 70 mm. Pair C consisted of 3-D-printed acrylic disks, 40 mm in diameter and 7 mm thick. Pair D consisted of heart and star shapes made of wood, about 70 mm in diameter. Pair E consisted of visually presented shapes printed on white paper. The outlines of the shapes in pair E were exactly the same as those in pair B. Background colors in the figure are for denoting object classes and were not part of the experiment. Object colors of haptic stimuli were not visible to the participants and hence played no role in the task.
Procedure. Separate questionnaires were used for TS participants, participants with a history of cataracts, and permanently blind participants to collect details pertinent to each group. The visual trial was run only in groups with visual capabilities (CC, DC, and TS). For all groups, a precomputed counterbalancing sheet was used to determine the order of trials, with the constraint that the visual trial in the CC, DC, and TS groups was presented either as the first or the last trial. The experiment was conducted with a script (see Section S1 in the Supplemental Material available online), and instructions were provided in one of the languages the participant was able to understand well (English, German, Telugu, Hindi, Urdu, Bengali, or Tamil). The haptic pairs were handed to each participant one at a time in an opaque black cloth bag closed with a drawstring. Each participant received each haptic object pair exactly once, resulting in four trials involving all four haptic object pairs.
Participants were instructed not to look inside the bag but to actively explore its contents by touch. They were then asked to bring out of the bag the object matching either bouba or kiki. Whether bouba or kiki was to be retrieved alternated from trial to trial, and the sequence was counterbalanced across participants in combination with trial order (e.g., trial sequence: CBDAE; retrieval sequence: kiki-bouba-kiki-bouba-kiki; total cycle length: 4! × 2 × 2 = 96 for participants with visual capabilities, 4! × 2 = 48 for blind participants). At the end of all trials, participants were asked to explain the reasons for their choices. Participants in the CC and DC groups were also asked to partially copy the visually presented shapes of pair E to ensure that they were able to see the outlines in that pair.
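The cycle lengths above follow from 4! orders of the haptic pairs, times 2 positions of the visual trial (first or last, sighted groups only), times 2 starting pseudowords. A minimal Python sketch that enumerates such a sheet is shown below; the function and constant names are hypothetical, and the row format is an assumption, since the actual sheet is not reproduced in the text.

```python
from itertools import permutations, product

HAPTIC_PAIRS = "ABCD"  # haptic object pairs; "E" is the single visual pair


def counterbalancing_sheet(with_visual: bool) -> list[tuple[str, str]]:
    """Enumerate (trial sequence, retrieval sequence) rows for one counterbalancing cycle."""
    rows = []
    visual_positions = ("first", "last") if with_visual else (None,)
    for order in permutations(HAPTIC_PAIRS):
        for visual_pos, first_word in product(visual_positions, ("bouba", "kiki")):
            trials = list(order)
            if visual_pos == "first":
                trials.insert(0, "E")
            elif visual_pos == "last":
                trials.append("E")
            other_word = "kiki" if first_word == "bouba" else "bouba"
            # The requested pseudoword alternates from trial to trial.
            words = [first_word if i % 2 == 0 else other_word for i in range(len(trials))]
            rows.append(("".join(trials), "-".join(words)))
    return rows
```

For example, `counterbalancing_sheet(True)` yields 96 distinct rows, one of which is `("CBDAE", "kiki-bouba-kiki-bouba-kiki")`, and `counterbalancing_sheet(False)` yields 48.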
Response coding. The object brought out or pointed to by the participant was coded on a response sheet. Congruent matches were scored as 1 (i.e., kiki matched with an object shown on an orange background in Fig. 1, and bouba matched with an object shown on a green background); incongruent matches were scored as 0. In the visual modality, there was only a single trial. For the audiohaptic conditions, the average of the four trial scores was computed for each participant for visualization only (see Fig. 2). Thus, in Figure 2, a score of 1 indicates congruent matches in all four trials, and a score of 0 indicates incongruent matches in all four trials. For the statistical analysis, we did not average the binary trials but instead modeled their possibly correlated nature with a random factor coded by participant ID.
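The coding rule can be written as a small function: a trial scores 1 when kiki is matched with the spiky object or bouba with the smooth one, and 0 otherwise. The Python sketch below uses hypothetical names and labels ("spiky", "smooth") for the two members of each pair; the per-participant mean corresponds to the visualized score only, not to the unit of statistical analysis.

```python
def score_trial(word: str, chosen_shape: str) -> int:
    """Return 1 for a congruent match (kiki -> spiky, bouba -> smooth), else 0."""
    congruent_shape = {"kiki": "spiky", "bouba": "smooth"}
    return int(congruent_shape[word] == chosen_shape)


def participant_mean(trials: list[tuple[str, str]]) -> float:
    """Average of the binary trial scores (used for plotting, not for the model)."""
    scores = [score_trial(word, shape) for word, shape in trials]
    return sum(scores) / len(scores)
```

A participant who is congruent on three of four haptic trials, e.g. `participant_mean([("kiki", "spiky"), ("bouba", "smooth"), ("kiki", "smooth"), ("bouba", "smooth")])`, gets a plotted score of 0.75.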

Statistical analysis
For the three groups with visual capabilities who took part in this task (CC, DC, and TS), we analyzed sound-visual-shape-association (SSA-V) trials using logistic regression models fitted by maximum likelihood in the R programming environment (Version 3.3.2; R Core Team, 2016). Group (CC, DC, TS) was defined as a categorical factor. Using two models, we first ascertained whether the probability of congruent SSA-V responses in the CC and DC groups differed significantly from that in the TS group (i.e., the difference from the TS group in log odds). Second, we examined whether a systematic SSA-V response was present in each group, that is, whether the log odds of congruent SSA-V responses differed from chance level (zero log odds, P = .5) in each group (see Section S2 in the Supplemental Material for a detailed description).
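With one binary response per participant and group as the only predictor, a no-intercept (cell-means) logistic regression has a closed-form maximum-likelihood solution: each group's coefficient is the observed log odds, logit(k/n), with Wald standard error sqrt(1/k + 1/(n - k)). The Python sketch below shows a Wald test against zero log odds (P = .5) under that setup; the function name and counts are hypothetical, and the text does not say whether Wald or likelihood-ratio tests were used at this step. The between-group model is the same regression with TS as the reference level, so its coefficients are log-odds differences from TS.

```python
import math


def group_log_odds_test(k: int, n: int) -> tuple[float, float, float, float]:
    """Wald test of H0: log odds = 0 for k congruent out of n single-trial responses.

    Returns (log odds, standard error, z, two-sided p)."""
    if not 0 < k < n:
        # All-congruent or all-incongruent groups give an infinite ML estimate (separation).
        raise ValueError("need 0 < k < n for a finite estimate")
    log_odds = math.log(k / (n - k))
    se = math.sqrt(1 / k + 1 / (n - k))
    z = log_odds / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return log_odds, se, z, p_two_sided
```

For a made-up group with 15 of 20 congruent responses, the estimate is log(3) and the two-sided p falls just above .03; with 10 of 20, z = 0 and p = 1.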
All five groups (CC, DC, CB, LB, and TS) took part in the sound-haptic-shape-association (SSA-H) condition. Because each participant performed four trials, we employed a mixed-effects logistic regression model (Bates, Mächler, Bolker, & Walker, 2015) to test whether any of the visually impaired groups exhibited a statistically significant SSA-H reduction compared with the TS group. In this model, group (CC, DC, CB, LB, TS) was the fixed categorical factor, and participant ID served as the random-intercept factor, taking into account the correlated nature of the data within each participant (see Section S2 in the Supplemental Material for a detailed description). Thereafter, we tested whether each group exhibited an SSA-H response significantly different from chance level by means of a zero-intercept version of the same mixed-effects logistic regression model. A priori sample-size calculations were performed using simulated data employing the (mixed) logistic regression models (see Section S2 in the Supplemental Material).

Results
We tested the development and maintenance of cross-modal SSAs in an audiovisual condition (SSA-V) with participants who had recovered their sight (CC group, n = 30; DC group, n = 24) as well as in a control group of typically sighted participants without any history of visual impairments (TS group, n = 70). In the SSA-V condition, participants saw a stimulus pair and had to indicate which shape matched a pseudoword (either bouba or kiki). In these three groups and in two additional groups of congenitally permanently blind participants (CB group, n = 15) and late permanently blind participants (LB group, n = 12), we ran an SSA-H task using four different haptic-shape pairs. In each of the four trials, participants received a pair of haptic stimuli in an opaque bag and had to indicate which object of the pair matched the pseudoword (bouba or kiki). In the two cataract groups and in the typically sighted group, the audiovisual condition either preceded or followed the audiohaptic conditions in a counterbalanced fashion. In addition, the order of the audiohaptic trials was randomized. The responses were analyzed with generalized linear mixed models (see the Statistical Analysis section, as well as Section S2 in the Supplemental Material).
The audiohaptic responses were analyzed using a mixed-effects logistic regression model with group as the fixed factor. The correlated nature of the four trials per participant was modeled by a random intercept for each participant. Comparing the generalized linear mixed models using both a likelihood-ratio test and a parametric bootstrapping test revealed an overall difference in SSA-Hs between groups, χ²(4) = 25.846, p < .001, parametric bootstrapping: p = .001. The subsequent logistic regression model revealed that the CC, DC, and CB groups, but not the LB group, differed significantly from the TS group. In the audiovisual condition, a similar likelihood-ratio test and parametric bootstrapping test revealed an overall difference between the three groups, χ²(2) = 14.808, p < .001, parametric bootstrapping: p < .001. A logistic regression model revealed that both the CC and the DC groups displayed a significantly reduced SSA-V compared with the TS group.
Fig. 2. Responses are shown separately for individuals with congenital cataracts (CC group), developmental cataracts (DC group), congenital permanent blindness (CB group), late permanent blindness (LB group), and typical sight (TS control group), with kernel densities estimated with Gaussian kernels. The width of each plot indicates the density of the data, the red circles indicate group means, the white circles indicate individual data points (jittered for readability), and the error bars indicate 95% confidence intervals of the group means obtained by smoothed bootstrapping with Gaussian kernels. A value of 1 on the y-axis indicates a congruent SSA-H or SSA-V (kiki was matched with an angular shape and bouba with a round shape). A value of 0 indicates an incongruent SSA-H or SSA-V (kiki was matched with a round shape and bouba with an angular shape). The dotted line indicates chance-level performance. Only the CC, DC, and TS groups participated in the SSA-V condition. Black asterisks indicate significant differences between groups, and red asterisks indicate significant differences between group mean responses and chance (*p < .05, **p < .01, ***p < .001).
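The likelihood-ratio statistics reported above compare a model with the group factor against one without it: LR = 2(log-likelihood of full model - log-likelihood of reduced model), which is asymptotically chi-square distributed with df equal to the number of dropped parameters (5 - 1 = 4 for the SSA-H comparison, 3 - 1 = 2 for SSA-V). For even df, the chi-square upper tail has a closed form, so the p values can be checked without a statistics library. The Python sketch below uses hypothetical function names; the parametric bootstrap test is not reproduced.

```python
import math


def lr_statistic(loglik_reduced: float, loglik_full: float) -> float:
    """Likelihood-ratio statistic 2 * (loglik_full - loglik_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    Closed form: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2) ** j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2) ** j / j! incrementally
        total += term
    return math.exp(-half) * total
```

Applied to the reported statistics, `chi2_sf_even_df(25.846, 4)` and `chi2_sf_even_df(14.808, 2)` both fall below .001, consistent with the reported p values.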

Discussion
In the present study, we investigated the presence of a sensitive period for the development of SSAs. Individuals who regained their sight through vision restoration surgery following a history of a transient congenital or developmental visual impairment due to cataracts were tested in both an audiohaptic (SSA-H) and audiovisual (SSA-V) context, as were sighted control participants. Additionally, congenitally and late permanently blind individuals took part in the SSA-H task.
As predicted by the assumption of a sensitive period in early childhood, we found no evidence for a systematic SSA-H in either the CC or the CB group. The CB group's data replicated previous results in similar groups (Fryer et al., 2014; Hamilton-Fletcher et al., 2018), suggesting an essential role of developmental vision for the emergence of SSAs. Crucially, LB individuals showed a significant SSA-H indistinguishable from that of the TS group. This pattern of results demonstrates two remarkable features of sensitive phases: First, visual input during childhood development is necessary for the acquisition and stabilization of representations; second, once acquired, these representations seem to be invulnerable to even drastic and long-lasting changes, such as late permanent blindness lasting up to 39 years. Furthermore, the CC group did not demonstrate an SSA-V either, suggesting that the belatedly available audiovisual-shape statistics were insufficient for SSA acquisition. The absence of SSAs in both the audiovisual and audiohaptic domains further supports the notion that ontogenetically early visual input (< 12 years) drives SSA acquisition. This conclusion is supported and qualified by the findings in the DC group: Like the CC group, the DC group lacked SSAs in both audiovisual and audiohaptic contexts. Since first indications of the bouba-kiki effect have been demonstrated in children and even in infants (Maurer et al., 2006; Ozturk et al., 2013; Pejovic & Molnar, 2017), an intact SSA-H effect in the LB group but not in the DC group suggests that typical or high visual capability must be present over a protracted developmental phase to elaborate and stabilize cross-modal correspondences such as SSAs; once acquired, these representations seem to be resilient to changing visual environments.
The absence of SSAs in the DC group is remarkable because we have previously demonstrated much higher recovery of extrastriate processing in this group than in CC individuals, with performance partially indistinguishable from that of TS individuals. This observation encompasses face processing (Röder, Ley, Shenoy, Kekunnaya, & Bottari, 2013) and visual global motion processing in the DC group. Because all DC individuals suffered from degraded vision before the age of 12 years, the absence of SSAs in this group provides strong evidence for a protracted sensitive period for SSAs before the age of 12 years. Moreover, since all LB individuals in the present study had typical vision until this age, we can conclude that 12 years of intact vision is sufficient for SSA acquisition. Finally, the results of the CC and DC groups strongly suggest that possible mechanisms of sound-symbolic associations, such as statistical co-occurrence (Sidhu & Pexman, 2018), have sensitive-period constraints; otherwise, both the CC and DC groups would have exhibited SSA-Vs driven by extensive exposure to audiovisual statistical properties after sight restoration. Moreover, a statistical co-occurrence account without such constraints does not explain why the CB group, as well as the CC and DC groups, did not develop normal SSA-Hs.
This pattern of results resembles previous findings in late-blind humans, for example, in the context of spatial reference frames for tactile processing (Collignon, Charbonneau, Lassonde, & Lepore, 2009; Röder, Rösler, & Spence, 2004) and auditory processing (Röder, Kusmierek, Spence, & Schicke, 2007). Late-blind individuals seem to use visual spatial representations despite having, in some cases, longer durations of blindness than CB individuals, who relied in these tasks mostly on body-centered reference frames. Moreover, studies in owls fitted with prisms for a transient phase during the juvenile sensitive period demonstrated that deviant cross-modal spatial associations learned during this period were not lost after prism removal and could be reevoked in adulthood (Knudsen, 1998). Fryer et al. (2014) reported diminished SSA-Hs in a mixed group of late-blind and partially sighted individuals, and Hamilton-Fletcher et al. (2018) found a lower SSA-H in late-blind participants (blindness onset ≥ 3 years) for low-pitched stimuli. On the basis of our results, we hypothesize that these reported SSA-H reductions might reflect averaging artifacts caused by including late-blind individuals with different histories of visual impairment. Reanalyzing the data of Hamilton-Fletcher et al. (2018) provided evidence for this hypothesis: Including only LB individuals with blindness onset after 12 years of age (N = 23), we found a robust SSA-H response that was indistinguishable from the SSA-H of the TS group of the same study (see Section S3 in the Supplemental Material). These findings in the LB group are reminiscent of the stronger systematic sound-meaning associations that researchers have observed for words typically learned earlier in language acquisition, with the strongest association for words acquired before the age of 13 years (Monaghan, Shillcock, Christiansen, & Kirby, 2014).
It could be argued that the LB individuals generally had a shorter blindness duration than CB individuals and that a longer blindness duration might have abolished SSA-H effects in this group. Impressively, however, the LB individual with the longest blindness duration (39 years) showed a fully typical SSA-H in our study, as did the LB participant with the longest blindness duration (40 years) in the Hamilton-Fletcher et al. (2018) study. Additionally, we found no systematic correlations between SSA-H and blindness duration in LB individuals in the present study or in the study of Hamilton-Fletcher et al. (2018; see Section S3 in the Supplemental Material). Thus, it seems highly unlikely that blindness duration could account for the difference in SSA-H between the CB and LB individuals.
Although both vision and touch allow shape perception, visual dominance for shape acquisition could be predicted because of the higher spatial resolution afforded by vision, which in turn could foster SSA development and elaboration. If, however, vision does not provide more precise shape information during early ontogeny, SSAs might not be formed or elaborated for either visual or haptic shapes. In this context, it is remarkable that the emergence of new cross-modal associations has been reported in early-blind individuals (blindness onset < 2 years of age): Unlike sighted participants, early-blind individuals consistently associated higher pitch with smoother or softer textures (Hamilton-Fletcher et al., 2018). Texture can be well perceived by touch, and earlier work has shown that tactile and visual texture information are equally weighted in situations of visual-tactile conflict, unlike in shape conflicts, in which vision dominates (Jones & O'Neil, 1985; Rock & Victor, 1964). Cross-modal correspondences might thus be defined by the dominance pattern of available sensory inputs, which in turn might be defined by the appropriateness of a sensory modality for processing certain object properties (modality appropriateness; Welch & Warren, 1980). Finally, we compared the SSA-V and SSA-H effects of the non-Indian TS subgroup with those of the Indian TS subgroup and found them to be indistinguishable (see Section S4 in the Supplemental Material), corroborating previous findings that SSA effects emerge independently of cultural background (Chen et al., 2016; Köhler, 1929; Oberman & Ramachandran, 2008). In the present context, this result excludes cultural differences as a possible alternative explanation for the absence of SSAs in the CC, DC, and CB individuals assessed in India.
In conclusion, the present results suggest that the development and stabilization of audiovisual as well as audiohaptic SSAs in humans depend on high-level vision over a protracted postnatal developmental period. At the same time, we provided evidence that prolonged blindness with an onset after this sensitive period fails to abolish SSAs, demonstrating the other side of the coin of sensitive periods, that is, the robustness of representations acquired during the sensitive period against loss.

Action Editor
Wendy Berry Mendes served as action editor for this article.

Author Contributions
S. Sourav and B. Röder designed the experiments and analyzed the data. All the authors were involved in the recruitment and classification of participants and in writing and revising the manuscript. All authors approved the final version of the manuscript for submission.