When Does Maluma/Takete Fail? Two Key Failures and a Meta-Analysis Suggest That Phonology and Phonotactics Matter

Eighty-seven years ago, Köhler reported that the majority of students picked the same answer in a quiz: Which novel word form (‘maluma’ or ‘takete’) went best with which abstract line drawing (one curved, one angular). Others have consistently shown the effect in a variety of contexts, with only one reported failure by Rogers and Ross. In the spirit of transparency, we report our own failure in the same journal. In our study, speakers of Syuba, from the Himalaya in Nepal, do not show a preference when matching word forms ‘kiki’ and ‘bubu’ to spiky versus curvy shapes. We conducted a meta-analysis of previous studies to investigate the relationship between pseudoword legality and task effects. Our combined analyses suggest a common source for both of the failures: ‘wordiness’. We believe these tests fail when the test words do not behave according to the sound structure of the target language.


Introduction
Demonstrations of Köhler's maluma/takete effect (1929/1947) have continued over several decades, in a variety of contexts (for recent review, see Lockwood & Dingemanse, 2015), including recent replications in remote groups, such as the Otjiherero-speaking Himba participants from Namibia (Bremner et al., 2013). Many researchers now tacitly agree the effect may have universal sensory underpinnings (Imai & Kita, 2014; Maurer, 1993; Maurer, Pathman, & Mondloch, 2006; Ramachandran & Hubbard, 2001). However, in an often overlooked report, Rogers and Ross (1975) failed to find the effect while conducting fieldwork in Papua New Guinea: Twenty participants identified as Songe (speakers of a subdialect of Hunjara, a language of the Orokaiva group) were split between expected and unexpected responses (9 to 11). Since their report, no other failures have seen the light of day, using either the original maluma/takete word forms or other variants (e.g., bouba/kiki: Ramachandran & Hubbard, 2001). It has therefore remained a mystery when, and under what circumstances, these effects differ between groups. Fortunately, our own recent failure represents a timely opportunity to revisit conditions that generate sound symbolic failures.

Study 1: Behavioural Test

Method
Syuba (also known as Kagate, ISO 639-3 syw) is a Central Bodish language of the Tibeto-Burman family, with a population of around 1,500 speakers (Gawne, 2013). It is one of a number of mutually intelligible Yolmo (ISO 639-3 scp) dialects, which share lexical tone: a high-versus-low tone contrast on the initial syllable (Teo, Gawne, & Baese-Berk, 2015). Yolmo speakers traditionally live in the higher hills and mountains of the Himalaya (see Gawne, 2016 for further details). Syuba speakers are all multilingual in Nepali for trading, with younger speakers undertaking formal education in Nepali and English (Mitchell & Eichentopf, 2013).
The second author recorded a native speaker of Lamjung Yolmo (a mutually intelligible Yolmo variety, selected so his voice would be unfamiliar to participants) in a quiet room in Kathmandu, saying 'kiki' and 'bubu' several times, until a clear version was recorded without noise from local traffic, chickens and so forth. In Yolmo, including Syuba, words starting with /kʰ/ take high tone, and words starting with /b/ take low tone. As /kʰ/ only occurs word-initially, and always co-occurs with high tone, the production of 'kiki' /kʰíkʰí/ resulted in HH, and by analogy the speaker produced 'bubu' /bùbù/ with LL. The use of reduplicated syllables (cf., Ozturk, Krehm, & Vouloumanos, 2013) generated consistent tone within each word form, even though Yolmo tones usually even out after the first syllable (Teo et al., 2015). Since higher pitches ''go with'' more angular shapes, and vice versa (e.g., Walker et al., 2010), we expected the Syuba tones to enhance the spikiness of 'kiki' and the blobbiness of 'bubu.' The recorded tokens are available in the Open Science Framework repository for this article (Gawne & Styles, 2017: https://osf.io/wt95v/).
For visual stimuli, we chose a spiky and a curvy shape, which have previously been shown to elicit matches to 'kiki' and 'bubu' word forms for South East Asian participants (Hung, Styles, & Hsieh, 2017). Since not all of our Syuba speakers would be familiar with paper-based representations, we decided to present the stimuli as physical objects instead. To generate travel-hardy objects, templates were traced out and cut from a sheet of firm but flexible craft plastic. Both objects were covered in a self-adhesive layer of felt, in a neutral tone, to prevent injury from the sharp points of the kiki spikes.
Twenty-four Syuba participants (F = 19, age: 14-55 years, education: None-Masters) sat on the floor in a quiet place in or near their homes in Kathmandu or the villages in Ramechhap and were shown two cut-out shapes (one curvy, one spiky). Participants listened to the test words using headphones and pointed to indicate the best match (see Figure 1). The study was ethically approved by the first author's institution (IRB-2014-08-014). In many cases, an audience of other family members or villagers was present during testing, as is common in field testing scenarios. Participants were given the opportunity to opt out of participating if they were uncomfortable with the audience; however, all participants chose to go ahead. Presentation of the audio stimuli using headphones and counterbalancing of stimulus order were used to reduce potential response bias from onlookers. Instructions were given in Nepali by the second author and a Yolmo-speaking research assistant, and selections were recorded by hand using a paper form.

Results and Discussion
Out of 24 Syuba participants, 11 gave congruent responses and 13 gave incongruent responses in the shape-selection task: a resounding failure. Since this is only the second documented case of healthy, typically developing people failing to show this well-documented effect, we believe that unpacking the source of this failure can provide a more nuanced understanding of the effect itself. In the following section, we present a linguistic analysis of underlying biases in canonical sound-symbolism test items, and how these biases may be able to explain the failure of both Rogers and Ross's (1975) Songe participants, and our Syuba participants.
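As a rough check on how close an 11-to-13 split is to chance, an exact two-sided binomial test against 50% can be computed directly from the counts. The sketch below is illustrative only: the function name is hypothetical, and this is not necessarily the test the authors ran.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely
    than the observed one under the null hypothesis.
    """
    probs = [
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = probs[successes]
    # A small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


# 11 congruent responses out of 24 Syuba participants.
syuba_p = binomial_two_sided_p(11, 24)
print(f"Syuba: p = {syuba_p:.3f}")
```

Under these assumptions the p-value for 11 of 24 is roughly .84, which is consistent with responding at chance.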

Study 2: Linguistic Analyses
Figure 2 presents a visual array of prevalence rates for consonants and vowels across different languages. The figure is an extended International Phonetic Alphabet chart overlaid with colours showing the percentage of languages reported to include each sound. The data are drawn from the PHOIBLE Online database of 2,160 segments from 1,672 documented languages (Moran, McCloy, & Wright, 2014). While by no means an exhaustive listing of all sounds from all languages of the world, this visualization illustrates a general principle of linguistic typology: Some sounds turn up in more languages than others. It should be noted that the distribution of sounds across languages follows something like Zipf's Law (Zipf, 1935), with few sounds occurring in almost all languages, and the majority of sounds occurring in very few languages, as can be seen in the number of uncoloured segments (indicating that they occur in fewer than 5% of languages in the PHOIBLE data set).
The two most commonly tested linguistic pseudowords are maluma/takete (the original test pair, developed by Köhler) and bouba/kiki (developed by Ramachandran and Hubbard). Table 1 shows the segment-by-segment prevalence rates for each of the phonemes in the canonical words, along with the mean prevalence rates for sounds included in each word pair (Moran et al., 2014). Both word pairs are composed exclusively of sounds with high prevalence rates (above 50%).
Why should 'canonical' test words be constructed of such high-prevalence phones to begin with? One likely explanation is that both human language and linguistic sound symbolism benefit from discriminability, or ''perceptual separation'' (cf., Ladefoged, 1993), which can be achieved easily when differences in the vocal tract are used to maximize acoustic contrasts. For example, among consonants, voiced versus voiceless, sonorant versus abrupt sounds made at the front versus the back of the mouth give highly contrastive acoustic profiles (e.g., /m/ vs /k/, respectively). Almost all languages use this contrast in their phonemic inventories (>90%). Given their perceptual discriminability, it is perhaps hardly surprising that these tokens act as exemplary ''end points'' in cross-modal mappings between speech sounds and other graded perceptual stimuli, such as shape (i.e., curvy, spiky). Similar perceptual linkages have been observed in the extensive literature on cross-modal correspondences between graded perceptual spectra including pitch, brightness, height, size and loudness (cf., Spence, 2011), some of which can be observed in infancy (Walker et al., 2010) and in nonhuman primates (Ludwig, Adachi, & Matsuzawa, 2011). However, not all sounds that are highly discriminable exist in all languages, and this mismatch can lead to unintended sources of bias in experimental stimulus sets (a point that has been made elsewhere by Styles, 2014). First, sounds that are within the experimenter's language may be assumed to be more 'universal' than they really are (a mismatch of local-to-global prevalence). For example, English sounds like the fricatives /f/ and /v/ and the approximant /ɹ/ have relatively low global prevalence (49%, 29% and less than 5%, respectively), and their inclusion in cross-linguistic stimuli may compromise or complicate data collection.
Second, for a supposedly 'universal' effect, we know nothing about the sound-symbolic properties of low-prevalence sounds like ejectives /b'/, ingressives /ɓ/ and obstruents at retroflex /ʈ/, /ɖ/ and uvular /q/, /ɢ/ places of articulation, and whether their cross-modal matching would differ between groups of participants who have different experience with these kinds of sounds. Both of these problems fit the model of Western, Educated, Industrialized, Rich, and Democratic (WEIRD) biases (Henrich, Heine, & Norenzayan, 2010), according to which the majority of scientific evidence comes from a small subset of people from predominantly WEIRD nations, and hence our assumptions about what is 'normal' often come from our own experience of the world. A related source of bias is that even when we do try to select sounds for their 'universality,' they may not actually occur in a particular language (a mismatch from global-to-local prevalence). For this reason, a more thorough consideration of stimulus items is needed, both to understand exactly what we have been measuring so far, and to unpack how it relates to the human experience more broadly.
An underlying assumption in linguistic sound symbolism is that humans and other animals share some degree of structural similarity in our multisensory processing systems, through either shared evolutionary heritage (Morton, 1977; Ohala, 1994) or shared sensory experience of the same environment (Maurer, 1993; Maurer et al., 2006; Mondloch & Maurer, 2004; Spence, 2011). If the sensory substrates of cross-modal matches are innately universal, then whether or not a particular sound exists in a particular speaker's language would not be expected to influence sound-symbolic 'matching.' On the other hand, if these matching processes are subject to environmentally driven plasticity, then we would expect to see language-specific differences. Given the well-known phenomenon of phonological 'tuning' processes in infancy, and its outcomes on adult perception (Iverson & Kuhl, 1996; Kuhl, 2004, 2010; Polka & Bohn, 2003; Schwartz, Abry, Boë, Ménard, & Vallée, 2005; Tees, 1984), the presence or absence of a sound in a speaker's language may indeed impact the perception of a cross-modal match. Given these considerations, there is no guarantee of a match between the test word and a target language. To give an example, 'bouba,' 'kiki' and 'takete' would be legitimate word forms in Japanese, but 'maluma' would fail, due to the absence of Japanese [l]. Similarly, 'bouba,' 'kiki' and 'maluma' are legitimate word forms in Tiwi (a language from the Tiwi islands, off the North coast of Australia), but this time, 'takete' would fail, due to the absence of Tiwi [e] (Moran et al., 2014). That is to say, for speakers of a given language, some of the test items are simply more 'wordy' than others. Even though the sounds in these words are high prevalence, more than 30% of the world's languages would be missing one or more of these sounds. As such, the status of a test item as a possible word in the speaker's language (the item's 'wordiness') may influence when maluma/takete effects arise.
With this in mind, we evaluate the status of the test words in two cases of failure: Songe (Hunjara) speakers (Rogers & Ross, 1975) and our own Syuba speakers.
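The Japanese and Tiwi examples above amount to a simple inventory check: a test word fails if any of its segments is absent from the language. The following minimal sketch encodes only the two gaps named in the text (Japanese [l], Tiwi [e]); the segmentations of the test words and all names in the code are assumptions made for illustration, and a real check would use full inventories such as those in PHOIBLE.

```python
# Segments that the text reports as absent from each language.
# Only the gaps named in the text are listed; these are not full inventories.
ABSENT_SEGMENTS = {
    "Japanese": {"l"},
    "Tiwi": {"e"},
}

# Assumed broad segmentations of the canonical test words (illustrative only).
TEST_WORDS = {
    "bouba": ["b", "u", "b", "a"],
    "kiki": ["k", "i", "k", "i"],
    "maluma": ["m", "a", "l", "u", "m", "a"],
    "takete": ["t", "a", "k", "e", "t", "e"],
}


def missing_segments(word: str, language: str) -> list[str]:
    """Return the segments of `word` that the language is reported to lack."""
    absent = ABSENT_SEGMENTS[language]
    return sorted({seg for seg in TEST_WORDS[word] if seg in absent})


def legal_words(language: str) -> list[str]:
    """Return the test words with no missing segments in the language."""
    return [word for word in TEST_WORDS if not missing_segments(word, language)]


for language in ABSENT_SEGMENTS:
    print(language, legal_words(language))
```

A check of this kind only covers segment inventories; it says nothing about where in a word a segment may occur.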

Match Between Failed Test Words and Target Languages
We conducted a check of the 'wordiness' of each stimulus item, against published descriptions of the sound structures of the languages. Table 2 presents a summary of mismatches between the target words and the languages' acoustic structures. According to Gray, Hiley, and Thom's recent documentation (2015), the Hunjara language of Papua New Guinea does not contain the sounds [l] or [tʰ]. This means that despite the familiarity of these sounds to English speakers, the 'maluma' and 'takete' test items used by Rogers and Ross (1975) were not a good match to the structural regularities of the Hunjara language. In the case of our participants, following analysis from Gawne (2013), although none of the test sounds were missing from the Syuba phoneme inventory, the combination of sounds violated the sound patterns of Syuba words: While [kʰ] does occur in Syuba, it never occurs word-medially, and while [u] occurs, it never appears at the end of a two-syllable word. Furthermore, given the unusual sounds in the second syllables of both words, our speaker effectively pronounced the tones in each word as though it was made of two first syllables, with reduplicated tones /HH/ and /LL/, rather than the tone levelling that normally occurs in long Syuba words (i.e., /H-neutral/ and /L-neutral/). Thus, we can conclude that Rogers and Ross used stimuli that violated the phonetic structure of the Hunjara language, and we used stimuli that violated the phonotactic and tonotactic structure of the Syuba language. In both cases, the test items were not 'wordy' in the target languages.
In both the Syuba language reported here and the Hunjara language of Rogers and Ross (1975), the words used in the test did not match the sound structure of the target language. This suggests a possible link between the lack of 'wordiness' in a test stimulus and failure to show the expected maluma/takete effect.
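The Syuba mismatches described above are positional rather than inventory-based, so they can be sketched as rules over syllables and tones. The code below encodes only the three patterns stated in the text (no word-medial [kʰ], no [u] at the end of a two-syllable word, and tone levelling after the first syllable). The data representation, the rule labels and the function name are hypothetical, and the rules are a simplification rather than a full account of Syuba phonotactics.

```python
def syuba_violations(syllables: list[list[str]], tones: list[str]) -> list[str]:
    """List which of the three text-described Syuba patterns a word form breaks."""
    violations = []
    # [kʰ] is described as never occurring word-medially.
    if any("kʰ" in syllable for syllable in syllables[1:]):
        violations.append("medial kʰ")
    # [u] is described as never ending a two-syllable word.
    if len(syllables) == 2 and syllables[-1][-1] == "u":
        violations.append("final u in disyllable")
    # Tone normally levels out after the first syllable (H-neutral, L-neutral).
    if any(tone != "neutral" for tone in tones[1:]):
        violations.append("tone on non-initial syllable")
    return violations


# The two recorded stimuli, as transcribed in the text (HH and LL tones).
KIKI = ([["kʰ", "i"], ["kʰ", "i"]], ["H", "H"])
BUBU = ([["b", "u"], ["b", "u"]], ["L", "L"])

for name, (syllables, tones) in {"kiki": KIKI, "bubu": BUBU}.items():
    print(name, syuba_violations(syllables, tones))
```

Under these simplified rules, each stimulus breaks one segmental pattern and the tonal pattern.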

Study 3: Meta-Analysis of Published Maluma/Takete Effects
To put these two failures in context, we conducted a review of published maluma/takete effects. The purpose of the meta-analysis is to characterize previously published studies according to whether the stimuli should be considered 'legal' pseudowords for speakers of the target language and to use the multitude of published studies to establish the expected strength of maluma/takete effects for 'legal' pseudowords. Within this context, the meta-analysis allows us to investigate whether the data arising from these two 'illegal' pseudoword studies (i.e., nonwords) overlap with the former distribution. That is to say, whether these two small studies should be considered as statistically predictable outliers in a wide general distribution, or whether they represent a separate, statistically discrete, distribution.

Methods
We conducted a semi-systematic review of published articles containing some version of the bouba/kiki or maluma/takete test, implemented as a choice between two shapes differing in roundness/angularity, in response to auditory stimuli similar to the original word pairs. Further details of the method and the articles included in the study can be found in the Supplementary Materials for this article. Figure 3 summarizes data from studies in which participants heard an auditory pseudoword (or a pair of pseudowords) and selected a best match from a pair of pictures exhibiting a curvy/spiky shape difference using a binary match design, where the word forms were designed according to the CVCV(CV) style of maluma/takete and bouba/kiki pseudowords, and with predominantly similar phonological content (i.e., /m, l, b, u, o/ for rounded, and /k, t, i, e/ for angular).
Studies were included if:
• The test presented an auditory pseudoword or a pair of auditory pseudowords.
• The test included pictures exhibiting a curvy/spiky contrast.
• The test was implemented to generate a binary match: EITHER a choice between two pictures (a picture preference test), OR where ratings for pairs of stimuli were translated into a binary measure per individual (e.g., rating for match between 'maluma' and curvy shape was greater than match between 'maluma' and spiky shape).
• The report included the number of participants tested, for use in computing standard error, 95% confidence intervals and weighting for individual studies (Ramachandran & Hubbard's, 2005, ''large classroom'' was conservatively estimated as N = 100).
• Participants in a given group were neurotypical adolescents or adults.
Studies were excluded if:
• The test involved decisions about word 'meaning' rather than shape-matching (i.e., the substantial literature on ideophones expressing adjectival/adverbial properties, or names of nouns).
• The stimuli did not include canonical phonological structure (i.e., /m, b, l, u, o/ for rounded, and /k, t, i, e/ for angular) for the majority of consonants and the majority of vowels in the majority of words. For example, the recent study by Drijvers, Zaadnoordijk, and Dingemanse (2015) was excluded as the 'pointy' words did not include voiceless stop consonants.
• Noncanonical forms made up the majority of stimuli in a pooled data set and item-level statistics were not available in the published document.

Data inclusion/exclusion
• Where data were available for individual trials, each trial using canonical phonological stimuli was included in the meta-analysis (e.g., a 'bouba' trial and a 'kiki' trial in Bremner et al., 2013).
• Where data were available for multiple types of trials, trials were included if they used canonical phonological stimuli and were excluded if they included a mix (e.g., only Condition 1 from Nielsen & Rendall, 2011 was included), or used substantially different phonemes (e.g., only selected trial types from Fort et al., 2015).
• Where some subsets of the data came from clinical groups (Oberman & Ramachandran, 2008; Occelli, Esposito, Venuti, Arduino, & Zampini, 2013), only participants from the neurotypical control groups were included.

Pseudoword 'legality'
• Pseudowords in all nine studies for English-speaking participants were classed as phonetically and phonotactically 'legal' as judged by the authors (both native English-speaking linguists).
• Pseudowords in one study with Italian-speaking participants (Occelli et al., 2013) were judged to be phonetically and phonotactically 'legal' by the first author of that study, who noted a small proportion of the pseudowords ended in consonants, which is a low-frequency word type in Italian, but none were nonwords (personal communication with the first author).
• Pseudowords in studies with French-speaking participants (Fort et al., 2015; Peiffer-Smadja, 2010) were judged as legal by two researchers with expertise in French phonology or phonetics, who noted that some pseudowords represented low-frequency word types, but none were nonwords (personal communication with the second author).
• For one study with Himba participants who speak a dialect of Otjiherero (Bremner et al., 2013), a reference grammar was consulted (Möhlig, Marten, & Kavari, 2002). Vowels in the test words /buba/ and /kiki/ were attested in the vowel inventory. Among consonants, bilabial and velar places of articulation are both attested (/p, k/), but voiced/voiceless contrasts were not attested in the stop consonant inventory. This means the consonants in /buba/ and /kiki/ would exhibit a salient phonological contrast according to place (bilabial/velar), but no contrast due to voicing or aspiration. We have classed these data as phonologically 'suspicious' and moved them from the meta-analysis to the ''comparison set'' for comparison with published 'legal' studies and further discussion.

Data handling
• Data were included if they could be expressed as either the number of participants who gave the expected outcome along with the total number of participants, or participants' mean response over multiple tests, along with the group standard error.
• In articles where means and standard errors were given in figures only, values were measured from the figures.
• Where multiple trials were performed by a single participant within a single study, the results were combined to produce a mean for that experiment.
• Where a single paper reports multiple experiments with different groups of participants, the experiments are treated as separate data sets, as they represent the responses of different people.
Meta-Analysis. The meta-analysis was conducted using the Random Effects Model described by Neyeloff, Fuchs, and Moreira (2012). This procedure allows calculation of values from studies with standard error arising from different computations (i.e., number of occurrences vs. mean of means). Standard error was as reported in the original article or calculated according to Hackshaw (2009). The data tables for the analysis can be found in the Supplementary Materials for this article, as well as in the Open Science Framework repository for this article (https://osf.io/wt95v/).
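To make the pooling step concrete, the sketch below implements a standard DerSimonian-Laird random-effects estimator over per-study proportions and standard errors. It is offered as an assumption about the general shape of the computation and may differ in detail from the procedure of Neyeloff et al. (2012). The study counts in the example are hypothetical and are not the data analysed here, and all function names are illustrative.

```python
from math import sqrt


def proportion_se(successes: int, n: int) -> float:
    """Standard error of a proportion (undefined at 0% or 100%)."""
    p = successes / n
    return sqrt(p * (1 - p) / n)


def random_effects(effects, std_errors, z=1.96):
    """DerSimonian-Laird pooling: returns (pooled, pooled_se, tau2, ci)."""
    weights = [1.0 / se**2 for se in std_errors]
    total_w = sum(weights)
    fixed_mean = sum(w * y for w, y in zip(weights, effects)) / total_w
    # Cochran's Q measures how much the studies disagree.
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    c = total_w - sum(w**2 for w in weights) / total_w
    k = len(effects)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    re_weights = [1.0 / (se**2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    pooled_se = sqrt(1.0 / sum(re_weights))
    return pooled, pooled_se, tau2, (pooled - z * pooled_se, pooled + z * pooled_se)


# Hypothetical (congruent, total) counts, for illustration only.
hypothetical_studies = [(45, 50), (27, 30), (17, 20)]
effects = [s / n for s, n in hypothetical_studies]
errors = [proportion_se(s, n) for s, n in hypothetical_studies]
pooled, pooled_se, tau2, ci = random_effects(effects, errors)
print(f"pooled = {pooled:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}], tau2 = {tau2:.4f}")
```

When the studies agree closely, the between-study variance is truncated to zero and the estimate reduces to the fixed-effect inverse-variance mean.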

Results and Discussion
The results of the meta-analysis are given in Figure 3, where each discrete data set is represented as a single row, with the marker representing the effect size and the horizontal line the 95% confidence interval (CI). Chance (50%) is marked with a dashed line, and the meta-analysis of all studies is represented by a diamond summarizing the data sets from the ''Legal Pseudoword studies.'' According to the results of the meta-analysis, published reports of a bouba/kiki test using predominantly canonical speech sounds with phonologically and phonetically legal stimuli report an average rate of 89% congruent responses in binary preference procedures (95% CI [84, 94]). The observed power of this effect calculated in G*Power (Faul, Erdfelder, Lang, & Buchner, 2007) is effectively 1, meaning that maluma/takete responding at above-chance rates should be highly replicable, even with small samples, so long as the majority of the stimuli consist of the canonical /b, m, l, o, u/-round and /t, k, i, e/-spiky phonemes. Below the meta-analysis are data from studies where phonological 'legality' of the stimuli is less clear, for comparison with the meta-analytic average. In the Himba study with speakers of Otjiherero (Bremner et al., 2013), the data align well with the published pattern of bouba/kiki effects, as reported in the original article. By contrast, in the two studies of interest here, the response rates for non-English-speaking participants are substantially different. Both of these studies included an English language comparison group, where it is clear that the English-speaking controls performed in line with the published pattern, as shown by their degree of overlap with the results of the meta-analysis (although note that there was no sample size included in the report from Rogers & Ross, 1975).
The lack of overlap between the 95% CIs of the two types of experiments shows that our Syuba data and Rogers and Ross's Hunjara data represent a pattern of data discrete from the other data sets, effectively aligning with chance.
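The non-overlap claim can be illustrated from the raw counts reported in the text (11 of 24 for Syuba, 9 of 20 for Hunjara) and the meta-analytic interval of 84% to 94%. The sketch below uses a simple Wald (normal-approximation) interval, which is an assumption and not necessarily how the intervals in Figure 3 were computed; the function names are hypothetical.

```python
from math import sqrt


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval for a proportion."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


META_CI = (0.84, 0.94)  # meta-analytic 95% CI reported in the text
syuba_ci = wald_ci(11, 24)
hunjara_ci = wald_ci(9, 20)

for name, ci in {"Syuba": syuba_ci, "Hunjara": hunjara_ci}.items():
    print(
        f"{name}: [{ci[0]:.2f}, {ci[1]:.2f}]",
        "overlaps meta-analysis:", intervals_overlap(ci, META_CI),
        "| includes chance:", ci[0] <= 0.5 <= ci[1],
    )
```

Under this approximation both intervals include 50% and fall entirely below the lower bound of the meta-analytic interval.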
Following the meta-analysis, we conducted a further exploratory analysis (unplanned prior to data collection). We used the meta-analytic mean of 89% as a test value against which we compared the results of the two studies of interest, to check whether these samples are sufficiently well powered for comparison, using power analysis in G*Power (Faul et al., 2007). For Hunjara (M = 0.45, SD = 0.38), the effect size was large (d = 1.16), and the observed power was high (.9995), suggesting that the original sample size of 20 participants is sufficiently well powered, and that for future replications seeking a minimum power of .95, as few as 10 participants should suffice if the effect size is representative of the general Hunjara-speaking population. For Syuba (M = 0.46, SD = 0.50), the effect size was also large (d = 0.86), and the observed power also high (.993), suggesting that the original sample size of 24 participants is sufficiently well powered, and that future replications with this group could be expected to differ from the previously reported mean with as few as 16 participants at a minimum power of .95, if the effect size here is representative of the general Syuba population. These power analyses suggest that although the sample sizes here are small, they are sufficiently well powered for comparison with the previously published pattern of responses. Notably, these are the two studies in which the pseudowords contained multiple deviations from the phonology or phonotactics of the participants' languages and did not involve participants repeating the word prior to making a shape selection.
So, given that the test words used with Himba participants (Otjiherero) contained sounds not reported in the phonological inventory of their language, why did the Himba study align with the typical pattern but not the other two studies?
One likely reason is perceptual assimilation: According to the Perceptual Assimilation Model of Best and colleagues, nonnative phones articulated at the same place using the same articulatory organs are perceptually assimilated to attested phonological targets (Best, McRoberts, & Goodell, 2001). Hence, even though /b/ does not have phonemic status in Otjiherero, [b] and [p] could be perceived as variants of the single bilabial category /p/. Furthermore, according to the description of the methods (Bremner et al., 2013), the instructions were verbally translated for the participant, and the participant was asked to articulate the test word before making their choice. It is therefore likely that any unfamiliar sounds in the experimenter's utterances were assimilated into Otjiherero-compliant forms, either by the translator or by the participants themselves.
To give an example of this kind of assimilation during production, similar processes can be seen when native speakers of English and Japanese pronounce foreign loan words from each other's languages: In standard Japanese, the combination /tu/ is unattested, so the English loan word 'tuna' becomes /tsuna/ in Japanese. By contrast in English, the combination /ts/ does not occur at the start of words, so the Japanese loan word 'tsunami' becomes /sunami/ in English. These kinds of changes represent assimilation of nonnative sound combinations to native targets. Since the procedure of Bremner et al. (2013) involved Otjiherero-speaking participants articulating the word forms before performing the matching task, it is likely that native target assimilation occurred and may have driven the 'legal' pseudoword performance we observe in the meta-analysis.
By contrast, in our own study and in Rogers and Ross (1975), participants were not instructed to articulate the test words, so nonnative elements of the word forms may have been preserved as ''weird sounding.'' It therefore appears that asking participants to articulate nonnative elements in a bouba/kiki test word may be a way of enhancing the 'wordiness' of stimuli.
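The effect sizes reported in the exploratory analysis above follow from the one-sample form of Cohen's d, in which the distance between the sample mean and the test value (here the meta-analytic mean of 0.89) is divided by the sample standard deviation. The sketch below reproduces only that arithmetic from the means and standard deviations given in the text; the function name is hypothetical, and the power values themselves were obtained in G*Power and are not recomputed here.

```python
def cohens_d_one_sample(sample_mean: float, sample_sd: float, test_value: float) -> float:
    """Absolute standardized distance between a sample mean and a fixed test value."""
    return abs(sample_mean - test_value) / sample_sd


META_MEAN = 0.89  # meta-analytic mean used as the test value

hunjara_d = cohens_d_one_sample(0.45, 0.38, META_MEAN)
syuba_d = cohens_d_one_sample(0.46, 0.50, META_MEAN)

print(f"Hunjara d = {hunjara_d:.2f}")
print(f"Syuba d = {syuba_d:.2f}")
```

Both values round to the figures reported in the text (1.16 and 0.86).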

General Discussion and Conclusions
Despite the purported ubiquity of Köhler's maluma/takete effect in the past 80 years of literature, there are surprisingly few published studies that include full data reports from 'straight' replications of the shape-preference method, using canonical acoustic stimuli. To put this lack of evidence in context, Köhler's original report of the maluma/takete effect (1929/1947) includes only the comment ''most people answer without hesitation,'' and no more detailed statistics became available for these stimuli for 35 years, until Holland and Wertheimer's (1964) rating task included a binary preference measure. Similarly, Ramachandran and Hubbard's (2001) original report of the bouba/kiki effect reported 95% prevalence but omitted sample size and methodological details. Similar prevalence rates were repeated in several subsequent papers by the same authors (again, without methodological or statistical details), until they reported 98% prevalence ''in a large classroom'' (Ramachandran & Hubbard, 2005). The first time they reported complete experimental methods and data was in a separate experiment in 2008 (Oberman & Ramachandran, 2008), by which time others had begun replicating the effect with more complete reporting (Maurer et al., 2006). This meta-analysis brings together results from approximately 558 participants and represents the largest collection of adult data from tests using canonical stimuli. According to this meta-analysis, future studies can expect 84% to 94% of people to give the expected response if they use pseudowords in which the majority of 'curvy' phonemes are /b, m, l, o, u/, and the majority of 'spiky' phonemes are /k, t, i, e/. Clearly, using less-canonical consonants, less-canonical vowels or a mix of congruent and incongruent phones in a single pseudoword should be expected to produce a lower response rate, as demonstrated most effectively in the graded comparisons of Fort et al. (2015) and also consistent with several other studies experimentally manipulating the degree of match (D'Onofrio, 2014; Nielsen & Rendall, 2011, 2013; Ozturk et al., 2013; Peiffer-Smadja, 2010). The findings are also consistent with the lower rates of responses observed in studies that contrast consonants that are all voiced (e.g., /m, l, n/ vs /b, d, g/, but no voiceless /p, t, k/), where congruence rates can be as low as 73% (Drijvers et al., 2015).
One possible reason that the Hunjara-speaking Songe and the Syuba show such a large departure from the published record is that failed maluma/takete experiments remain hidden away in ''file drawers'' of linguistics and psychology departments across the globe. If this is the case, the natural distribution of the maluma/takete effect may well be more spread, and these two data points may simply represent the unlucky few failures to detect a true effect. If so, we encourage researchers to ''bring out their dead'' in the interests of clarifying the scientific record. Until this happens we have to consider the alternative possibility that the data reported in the meta-analysis represent a 'normal' effect, observed in most populations across the globe, including the remote Himba community, and these two 'failed' samples differ for other reasons. We propose that a promising source of this 'difference' could be the phonetic/phonotactic legality of test words in the target language, if the words are presented via audio without any articulation by the participant. We suspect that if participants cannot automatically parse linguistic strings into ordered sequences of phonemes, then well-rehearsed ancillary processes like sensory mapping may break down. This is a testable hypothesis, and future research can explicitly test whether phonetic/phonotactic legality (wordiness) is the source of the difference we observe here.
On the matter of wordiness, where sound symbolism occurs in natural languages, it is most notable in specialized words denoting mimetic onomatopoeia (e.g., moo), synthetic or conventionalized sound symbolism (e.g., knock knock, sniffle) and other kinds of words with iconic functions (Hinton, Nichols, & Ohala, 1994). While English does not have a particularly rich vocabulary for these kinds of words, some languages have a specialized word class known as 'ideophones,' which can be very extensive. Japanese, for example, has something like 4,500 onomatopoeia and ideophones (Masahiro, 2007), in which sounds can be used to denote not only auditory but also visual experiences (kira kira, 'twinkling'), tactile textures (sara sara, 'softly smooth'), bodily sensations (peko peko, 'to be hungry'), manner of motion (fuwa fuwa, 'floatily') and even silence itself (shi n). These kinds of words are often considered 'marked,' in that they stand out from regular words in the language and often exhibit sound combinations or syllable patterns that are uncharacteristic of regular words in the language (Diffloth, 1979; Dingemanse, 2012). As noted by Childs (2014), the expressive nature of ideophones may result in competing demands for these parts of speech to stand out from the rest of the language while at the same time being a recognizable part of it. In his examples from Zulu, ideophones are marked by differential use of pitch (F0), and he suggests that suprasegmental violations (pitch) may be more acceptable than segmental ones (phonology, phonotactics) while still retaining inherent wordiness. The consensus therefore appears to be that while ideophones or expressive parts of speech can under certain conditions deviate from the norms of what makes a word 'wordy,' the ways in which ideophones deviate are also governed by their own set of regularities, or, as Childs terms it, 'constraints on violating constraints.'
Could our reduplicating /ᴸbuᴸbu/ and /ᴴkiᴴki/ stimuli match a similar 'constraint', that is, a pattern for ideophones in Syuba? Syuba has a small inventory of onomatopoeic words (30 out of a lexicon of 3,339 words in one dictionary), all but two of which are explicitly mimetic (SIL, 2016). Although most of these mimetics include some degree of reduplication (e.g., /ᴸtswaᴴtswã/ for the sound of a pig), none describe visual or textural characteristics of objects. Our pseudowords were presented with a pair of physical objects differing only in their edge characteristics, making it semantically unlikely that Syuba speakers would interpret them as novel ideophones, since Syuba does not attest this kind of onomatopoeia. However, as we did not provide a narrative frame for the unusual-sounding words, it remains possible that participants may have interpreted the unfamiliar word forms as ideophone-like (cf. Dingemanse, 2014), as noun-like, or as adjective-like. Regardless of the particular syntactic interpretation, it remains surprising that our participants show a pattern of responding that differs so dramatically from the previously reported effects of this kind.
Both languages that have so far shown this difference from the published norm share the feature that the pseudowords used for testing were not word-like for the participants involved. We suspect this effect may be related to learning processes that occur early in childhood: first, 'tuning' to the acoustic structure of frequently heard sounds in one's native language, a precursor to linguistically refined categorical perception of native language phonemes (Iverson & Kuhl, 1996; Kuhl, 2010; Maye, Werker, & Gerken, 2002; Werker & Tees, 1984); and second, 'tuning' to frequently heard sound combinations, which is useful for identifying word boundaries, as a precursor to word-learning (Gómez, Bootzin, & Nadel, 2006; Saffran, Aslin, & Newport, 1996). Both of these processes are normally thought of as linguistic adaptations for comprehension of ongoing meaningful speech. And yet, here we see that linguistic tuning processes appear to have created sensory effects that cascade beyond listening for language: Linking auditory sounds to visual shapes appears not to occur systematically if the sounds are not well-represented by the (linguistically tuned) sensory processing system. This is the first proposal in decades that attempts to explain why supposedly universal sound symbolism sometimes fails. Furthermore, it is an eminently testable hypothesis, which we hope will generate plenty of more or less successful replications, in more or less WEIRD languages. To facilitate this endeavour, you can download a template to cut out your own shapes, or 3D print them (here: https://osf.io/wt95v/). We look forward to seeing what future failures will tell us about this new theory and how our understanding of connections between visual and auditory perception can be further refined by a better understanding of the boundaries between linguistic and nonlinguistic sensation.
Lauren Gawne's research focuses on the documentation of Tibeto-Burman languages, with specialisation in evidentiality, gesture and critical approaches to language documentation. Lauren is currently a David Myers Research Fellow at La Trobe University, and was an Endangered Languages Documentation Programme Postdoctoral Fellow at SOAS from 2015 to 2017.