Pitch Perception in Tone Language-Speaking Adults With and Without Autism Spectrum Disorders

Enhanced low-level pitch perception has been widely reported in autism spectrum disorders (ASD). This study examined whether tone language speakers with ASD exhibit this advantage. The pitch perception skill of 20 Cantonese-speaking adults with ASD was compared with that of 20 neurotypical individuals. Participants discriminated pairs of real syllable, pseudo-syllable (syllables that do not conform to the phonotactic rules or are accidental gaps), and non-speech (syllables with attenuated high-frequency segmental content) stimuli contrasting in pitch level. The results revealed significantly higher discrimination ability in both groups for the non-speech stimuli than for the pseudo-syllables at a one-semitone difference. No significant group differences were noted. In contrast to previous findings, post hoc analysis revealed enhanced pitch perception in a subgroup of participants with ASD who had no history of delayed speech onset. Tone language experience may have modulated the pitch processing mechanism of the speakers in both the ASD and non-ASD groups.

auditory perception observed in ASD is described by WCC theory, which highlights difficulties at the global level that lead to the failure of individuals with ASD to integrate information from various sources into a coherent whole because of their local bias (Happé & Frith, 2006). Järvinen-Pasley and Heaton (2007) explained the perceptual advantage of their ASD participants with reference to both theories. Those participants focused on pitch discrimination, as instructed, and ignored other aspects of the input, including verbal-semantic information. Thus, their pitch discrimination performance was unaffected by the semantic content, unlike that of the neurotypical (NT) controls.
Scholars have also highlighted the contribution of stimulus complexity during perception in ASD. Stimuli of different degrees of complexity require different levels of neuro-integrative processing and lead to varying performance in auditory processing in ASD (Bonnel et al., 2010; Samson, Mottron, Jemel, Belin, & Ciocca, 2006). Evidence of such differential neuro-integrative processing was initially put forward to describe visual perception performance in ASD (Bertone et al., 2005). In that original study, individuals with ASD demonstrated better performance in identifying the orientation of first-order static visual stimuli than in tasks involving second-order static visual stimuli, as the latter required more extensive neural circuitry. Samson et al. (2006) reanalyzed the behavioral and neurological data in the literature on auditory perception in ASD and concluded that individuals with these disorders exhibit a dissociation in auditory perception between simple and complex stimuli similar to that seen in the visual modality. These researchers contended that the data support the neural complexity hypothesis, which posits that the perception of simple low-level stimuli is enhanced in ASD, whereas the perception of complex stimuli, which requires more neural resources, is impaired.
On the other hand, less robust findings or even reversed patterns of auditory discrimination performance in ASD are also observed in the literature. For example, Heaton, Williams, Cummins, and Happé (2008) pointed out that individuals with ASD showing exceptional pitch discrimination skills are the minority (around 9%). In addition, the superior pitch processing skills of the three individuals with ASD were not related to their intelligence level, with two of them demonstrating normal IQ and one demonstrating significant impairment. Jones et al. (2009) reported a similar finding: enhanced frequency discrimination was exhibited only by a subgroup (20%) of the adolescents with ASD. Unlike Heaton, Williams et al. (2008), Jones et al. found that most of the individuals with superior pitch perception skill belonged to the high IQ group (IQ > 80) and a number of them had a history of delayed speech onset.
In a more recent study, Kargas, López, Reddy, and Morris (2015) explored the presence of enhanced low-level auditory discrimination skills in ASD with a more stringent research design in which the individuals with ASD were matched with non-ASD controls for age, IQ, and absence of formal musical training. Unlike most previous findings, Kargas et al. reported generally diminished, rather than enhanced, performance in frequency discrimination in the ASD group when compared with the controls. However, exceptionally enhanced frequency discrimination skill was still observed in the group, although it was restricted to a small number of individuals with ASD (around 9%). Interestingly, the proportion of individuals in the non-ASD group showing enhanced pitch perception was similar (around 14%).
The stronger design of Kargas et al. (2015) provided a more complete picture regarding the observation of enhanced auditory perception in ASD after taking various potential contributing factors (e.g., functioning level, age of the individuals, and formal musical training) into account. In summary, enhanced pitch perception is observed in ASD but only happens in a subgroup of individuals rather than as a pervasive phenomenon across the population. In addition, the perceptual advantage is generally limited to low-level pitch detection or non-speech stimuli.

Definition of Low-Level Stimuli in Tone Languages
The distinction between low-level and complex auditory stimuli is relatively straightforward in non-tone languages, as pitch carries no linguistic information at the lexical level. However, that distinction is less clear-cut in tone languages, in which pitch variation within a syllable entails a change in lexical meaning. Cantonese is a typical example of a tone language and has six citation tones. Each syllable carries a lexical tone. For example, the Cantonese syllable /ji/ with a high-level (55) tone refers to "clothes," whereas with a mid-level (33) tone it means "idea," and with a mid-low level tone (22) it means "two." In addition to the different tone levels, there are also three contour tones: high-rising (25; /ji25/ means "chair"), low-rising (23; /ji23/ means "ear"), and low-falling (21; /ji21/ means "son"; Bauer & Benedict, 1997; see Figure 1). The tonal features of Cantonese provide unbiased and interesting grounds for exploring whether individuals with ASD who speak the language demonstrate reduced auditory domain-specificity (Järvinen-Pasley & Heaton, 2007) and superior pitch perception at the local level, as predicted by the EPF model.
The study reported herein addressed two research questions. First, how is pitch perception affected by the nature of auditory stimuli (i.e., real syllable, pseudo-syllable, and non-speech tones) in Cantonese-speaking adults with ASD? In other words, do Cantonese-speaking adults with ASD exhibit similar sensitivity to pitch in speech with meaning (real syllable), speech without meaning (pseudo-syllable), and non-speech stimuli? To find the answer, the performance of individuals with ASD was compared with that of a group of matched NT adults. Second, do the individuals with ASD exhibit a superior ability to detect subtle differences between pitch levels compared with their NT counterparts?
If the ASD group exhibited similar performance in response to real syllable, pseudo-syllable, and non-speech stimuli, then that finding would provide support for the reduced domain-specificity noted in Järvinen-Pasley and Heaton (2007). Based on the EPF theory, it was also hypothesized that the ASD group would be more sensitive in detecting differences in the stimuli at a local level. The meaning in speech stimuli was found to have an adverse effect on pitch discrimination in those without ASD (Heaton, Hudry et al., 2008; Järvinen-Pasley & Heaton, 2007). Hence, it was anticipated that the NT group in this study would perform significantly better in processing non-speech and pseudo-syllable stimuli than real syllable stimuli, whereas the ASD group would be unaffected by the nature of the stimuli.

Method
Participants
All participants were native Cantonese speakers who had received at least 9 years of compulsory education at mainstream schools in Hong Kong. Only those with no reported intellectual disabilities and with both normal hearing and normal visual acuity with or without correction were eligible for participation. The hearing ability of all participants was screened with a GSI 18 screening audiometer in a sound-proof booth at The University of Hong Kong. The passing criterion was set at 25 dB HL at frequencies of 1000, 2000, and 4000 Hz in both ears (American Speech-Language-Hearing Association, 1997). Only those who passed the hearing screening were included.
ASD group. Participants in the ASD group were recruited from employment programs designed specifically for young adults who have been diagnosed with Asperger syndrome or high-functioning autism. The employment programs were run by two local nongovernmental organizations in Hong Kong. The ASD diagnosis of the participants was made during their childhood by a clinical psychologist or a pediatrician. At that time in Hong Kong, clinicians mainly based their judgment on the clinical criteria stated in the Diagnostic and Statistical Manual of Mental Disorders, third edition (DSM-III; American Psychiatric Association, 1980) and/or the International Classification of Diseases, 10th revision (ICD-10; World Health Organization, 1992). The current state of ASD was verified by the clinical judgment of the third author, who is a speech-language pathologist with ASD expertise, and by the Autism Diagnostic Observation Schedule (ADOS-2; Lord et al., 2012; Module 4) administered by research-reliable personnel. Twenty participants with ASD were recruited and passed the hearing screening. The mean age of the group was 25 years (SD = 3.22), ranging from 18 to 33 years, and 17 of them were male. Among the 20 participants, 3 scored in the range of "autism spectrum" and 17 scored at or above the threshold for "autism" in Module 4 of the ADOS-2 (Lord et al., 2012). According to parent reports, 3 participants in the ASD group produced their first words around their first birthday, whereas 16 participants were reported to have had a delayed onset of first words, ranging from 30 months to 4 years of age (see Table 1). For one participant, information regarding speech onset could not be obtained from his parents.
NT group. Twenty NT adults who met the inclusion criteria were recruited as the control group.
The participants in the ASD group were matched individually with those in the NT group for sex, chronological age, and education background, as well as for experience of formal musical training, which has been shown to have potential confounding effects on pitch perception ability (Schön, Magne, & Besson, 2004). Participants in the NT group completed the Autism Spectrum Quotient (AQ; Baron-Cohen, Wheelwright, Skinner, Martin, & Clubley, 2001). The mean AQ score of the NT group was 18.25 (SD = 5.48; range = 9 to 29), and none of the participants scored higher than the suggested cutoff of 32 (Baron-Cohen et al., 2001). The mean age of the NT group was 24 years (SD = 3.6), ranging from 17 to 34 years. Participants' nonverbal IQ scores were estimated using the average score on two abbreviated nine-item forms of the Raven's Standard Progressive Matrices developed in Bilker et al. (2012). The two cohorts did not differ in Raven's scores (Mann-Whitney

Stimuli
The pitch discrimination experiment involved three types of auditory stimuli: real syllables, pseudo-syllables, and non-speech tones. Two monosyllabic items were used in each of the real syllable and pseudo-syllable conditions, each comprising a consonant-vowel-(consonant) combination with a fricative or glide. The real syllables were /jsm/ and /s~/. Two pseudo-syllables, /fam/ and /w~/, each of which is either a systematic gap that violates the phonotactic rules of Cantonese or an accidental gap, were used as the pseudo-syllable stimuli. These two types of stimuli were produced by a male native Cantonese speaker with Tone 1, that is, a high-level (55) tone in Cantonese. For each monosyllabic item, four stimuli with different pitches were synthesized from the original stimulus by shifting the pitch contour down by about one semitone in a stepwise manner using the PRAAT program (Boersma & Weenink, 2001). As a result, each monosyllabic item was represented by stimuli at five different pitches, as illustrated in Figure 2.
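The stepwise one-semitone shift described above follows the standard equal-tempered relation, in which lowering a frequency by k semitones multiplies it by 2^(-k/12). The sketch below only illustrates that arithmetic; it is not the PRAAT procedure used in the study, and the base F0 of 140 Hz and the function name `semitone_steps` are hypothetical values chosen for the example.

```python
# Illustrative arithmetic only; the base F0 is an assumed value,
# not one reported in the study.

def semitone_steps(base_f0_hz, n_steps=4):
    """Return the base F0 followed by versions lowered by 1..n_steps semitones."""
    return [base_f0_hz * 2 ** (-k / 12) for k in range(n_steps + 1)]

levels = semitone_steps(140.0)  # hypothetical F0 for a high-level tone
for k, f0 in enumerate(levels):
    print(f"{k} semitone(s) below original: {f0:.1f} Hz")
```

With four downward steps, the original plus its four shifted versions give the five pitch levels per item mentioned in the text.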
Twenty-five pairs of stimuli of different interval sizes (i.e., one semitone, two semitones, three semitones, four semitones, and no difference) were generated for each syllable, with the occurrence of the five tones counterbalanced. Two syllables were used in each speech stimulus condition, yielding 50 trials for each. For the non-speech stimuli, the value of fundamental frequency was extracted from the real syllable stimuli. Then, croaking-like non-speech stimuli were generated using PRAAT. The resulting stimuli were manipulated to produce 25 stimulus pairs, as for the speech stimuli. All pairs were repeated once, and thus the non-speech stimulus condition also had 50 trials.
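One reading consistent with the counts reported above is that the 25 pairs per syllable are all ordered pairings of the five pitch levels, so that each level occurs equally often in first and second position. The sketch below enumerates the pairs under that assumption; the source does not spell out the exact pairing scheme, and the function name `build_pairs` is hypothetical.

```python
from collections import Counter
from itertools import product

def build_pairs(n_levels=5):
    """All ordered (first, second) pairings of pitch levels, with interval size."""
    return [(a, b, abs(a - b)) for a, b in product(range(n_levels), repeat=2)]

pairs = build_pairs()
# Number of pairs at each interval size (0 = identical pair).
by_interval = Counter(interval for _, _, interval in pairs)
# How often each level appears as the first and as the second member.
first_counts = Counter(a for a, _, _ in pairs)
second_counts = Counter(b for _, b, _ in pairs)

print(len(pairs), dict(by_interval))
```

Under this assumed scheme the interval sizes are not equally frequent, which matters when percentages of correct trials are compared across interval sizes.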

Procedures
An auditory discrimination task was employed in which the stimulus pairs were presented by a computer. Stimuli were blocked according to type. Auditory instructions, read aloud by the same speaker who recorded the stimuli, were presented before the task. Within each block, stimuli were presented randomly using E-Prime 2.0 software. Two practice trials (one with an identical pair and one with a one-semitone difference), with feedback informing the participants of the correct answers, were included in each block to give participants a sense of the difference that they were required to note. Within each pair, the interstimulus interval was around 700 ms. When a pair of stimuli was presented, the participants were asked to indicate whether the stimuli were the same or different by pressing the corresponding keys on the keyboard. The number of correct responses was captured and recorded by the E-Prime program. The blocks were presented in an order that was counterbalanced across participants from the same group. Each participant was tested individually, and all experiments took place in a quiet room.

Results
All data from the pitch discrimination task were recorded in the E-Prime program and then imported to the SPSS program for statistical analysis. Descriptive data on the results for the different stimulus types in the pitch discrimination task are displayed in Table 2. It can be seen that, in general, all participants performed better in the non-speech stimulus condition, followed by the pseudo-syllable and then real syllable conditions, except for the interval with no difference. As expected, the larger the pitch interval size, the higher the degree of accuracy in the discrimination task. Both the ASD and NT groups displayed close to perfect performance in discriminating the pairs in which the syllables differed by four semitones.
Data were analyzed using a three-way mixed analysis of variance (ANOVA), with group (ASD vs. NT) as the between-subject factor and stimulus type (real syllables, pseudo-syllables, vs. non-speech sounds) and interval size (one, two, three, vs. four semitones) as the within-subject factors. The percentage of correct trials at each interval size for each stimulus type served as the dependent variable.
There was no significant main effect for stimulus type and no significant main effect for group, and the stimulus type-by-group interaction was also not significant. On the other hand, the main effect of interval size was significant, F(1.46, 55.43) = 60.34, p < .001, partial η² = .641, with participants scoring significantly lower for the pairs with a one-semitone difference than for the others. In addition, the stimulus type-by-interval size interaction was also significant, F(6, 126.57) = 3.24, p = .004, partial η² = .079. Follow-up ANOVAs were conducted separately for each interval size. The follow-up analysis revealed the main effect of stimulus type to be significant only for the pairs with a one-semitone difference, F(1.85, 70.10) = 3.842, p = .029, partial η² = .092, but not for those at other levels. Pairwise comparisons adjusted by the Bonferroni method showed the percentage of correct trials to be significantly higher in the non-speech stimulus condition (M = 70.31, SD = 25.93) than in the pseudo-syllable condition (M = 61.88, SD = 33.50, p = .007), while the differences between the other stimulus condition pairs (real syllable vs. non-speech and real syllable vs. pseudo-syllable) were not significant. Figure 3 summarizes the results.
In summary, the participants in both the ASD and NT groups had significantly more correct trials for the non-speech stimuli than for the pseudo-syllable stimuli when discriminating pairs with a one-semitone difference, but not at the other interval sizes. The performance of the two groups was not significantly different. The ASD group did not exhibit enhanced pitch discrimination ability relative to the NT group regardless of stimulus type.
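The dependent variable described above can be sketched as a simple aggregation of trial-level responses into one percentage per participant, stimulus type, and interval size. This is only an illustration under assumptions: the tuple layout, the function name `percent_correct`, and the example trials are hypothetical and do not reproduce the E-Prime output or the SPSS analysis.

```python
from collections import defaultdict

def percent_correct(trials):
    """Aggregate (participant, stimulus_type, interval, correct) records.

    Returns a dict mapping (participant, stimulus_type, interval) to the
    percentage of correct trials. Identical pairs (interval 0) are skipped
    because the ANOVA factor covers one to four semitones only.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for participant, stim_type, interval, correct in trials:
        if interval == 0:
            continue
        key = (participant, stim_type, interval)
        totals[key] += 1
        hits[key] += int(correct)
    return {key: 100.0 * hits[key] / totals[key] for key in totals}

# Hypothetical trial records for one participant.
example_trials = [
    ("P01", "non-speech", 1, True),
    ("P01", "non-speech", 1, False),
    ("P01", "non-speech", 1, True),
    ("P01", "non-speech", 1, True),
    ("P01", "pseudo-syllable", 1, True),
    ("P01", "pseudo-syllable", 1, False),
    ("P01", "real syllable", 0, True),  # identical pair, excluded
]
scores = percent_correct(example_trials)
print(scores)
```

Each resulting value would fill one cell of the group by stimulus type by interval size design.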

Subgroup Analysis
Given the evidence that pitch perception in ASD varies across subgroups defined by speech onset, a post hoc analysis focusing on the one-semitone interval was conducted. The performance of the three participants with ASD who showed no delay in speech onset (CKL, HKW, and LWH) and that of their matched controls were separated from those of their respective groups. Their performances were compared against the average group performances of ASD-Delay and NT-R, that is, the ASD and NT groups after excluding these participants and their matched controls, respectively (see Figure 4). The three participants who showed no delay in the onset of their first words demonstrated heightened performance in general. The participants CKL and LWH showed perfect performance in all three stimulus types, and HKW scored 0.93, 0.56, and 0.94 standard deviations above the mean of the NT-R group for the real syllable, pseudo-syllable, and non-speech stimuli, respectively. After excluding the six participants, the pattern of effects remained the same: the main effect of group and the group-by-stimulus type interaction were not significant, and only the main effect of stimulus type was significant, F(2, 31) = 5.79, p = .007, partial η² = .252. Pairwise comparisons adjusted by the Bonferroni method showed the percentage of correct trials to be significantly higher in the non-speech (M = 67.10, SD = 25.81) than in the pseudo-syllable stimulus condition (M = 57.54, SD = 31.11).
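Expressing an individual's score in standard deviation units relative to a reference group, as done above for HKW against NT-R, amounts to a z-score. The sketch below shows the computation with made-up accuracy values; the use of the sample standard deviation (n - 1 denominator) is an assumption, since the source does not state which estimator was used, and the function name is hypothetical.

```python
from statistics import mean, stdev

def sd_units_above_mean(score, reference_scores):
    """Distance of `score` from the reference mean, in reference SD units."""
    return (score - mean(reference_scores)) / stdev(reference_scores)

# Hypothetical percent-correct values for a reference group, not study data.
reference = [50.0, 60.0, 70.0, 80.0, 90.0]
z = sd_units_above_mean(85.0, reference)
print(f"{z:.2f} SD above the reference mean")
```

A positive value indicates performance above the reference group mean, and a value below 1 indicates a score within one standard deviation of that mean.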

Discussion
The aim of this study was to examine whether Cantonese-speaking adults with ASD have a similar degree of sensitivity to pitch in speech (real syllables) and non-speech stimuli at the monosyllable level when compared with matched controls. The ASD group demonstrated significantly weaker performance in discriminating the pitch pairs for the pseudo-syllable stimuli than for the non-speech stimuli, and the differences between real syllable and non-speech stimuli and between real syllable and pseudo-syllable stimuli were not significant. More specifically, the effect of stimulus type lay in the one-semitone difference condition. The same pattern of better performance in the non-speech and real syllable conditions than in the pseudo-syllable condition was also observed in the NT group. The performance in the real syllable condition revealed that the pitch processing involved tapped into the phonological representations of the participants in both groups. A difference of two semitones or more between the two tones appeared to be sufficient for the listeners to encode another lexical tone category in the language, as the accuracy was around 90% or above. When the categorical boundaries of lexical tones in the adults were intact, the discrimination of these pairs would not be a challenge. The patterns were not surprising, as participants in the ASD group were able to communicate with verbal language, in which lexical tones are basic units. Discriminating units that contrast in meaning would not be a problem for them. When the difference between the two tones dropped to one semitone, the accuracy in both the ASD and NT groups decreased quite remarkably, suggesting that a one-semitone difference may lie at the phonetic boundary between two lexical tones.
The nonsignificant difference in performance between the real syllable and non-speech stimuli in the ASD group apparently supported the reduced domain specificity of pitch perception in autism described by Järvinen-Pasley and Heaton (2007). Their ASD group exhibited comparable discrimination skills for pitch sequence pairs in both music and speech conditions. The reduced domain specificity in the ASD group was explained by the participants' exceptional ability to direct attention to smaller units of information regardless of context. In the present study, however, the same perceptual pattern was also noted in the NT control group, and the ASD participants as a group did not show enhanced or diminished performance. The presumably distracting linguistic information in the real syllable stimuli did not pose particular difficulty to either group.
Strictly speaking, using the current paradigm, this study did not replicate the phenomenon that individuals with ASD mainly focus on low-level characteristics of sounds. The present results only supported the claim that enhanced perceptual processing in ASD, when compared with non-ASD counterparts, may be mainly limited to low-level simple pure tone processing (e.g., Bonnel et al., 2003; Heaton, 2003; Heaton, Hudry et al., 2008; Lepistö et al., 2008; Mottron et al., 2006; O'Riordan & Passetti, 2006) and may not extend to spectrally complex stimuli such as those used in the present study (Samson et al., 2011). The non-speech stimuli, which were derived from the speech stimuli in the present study, were assumed to be low level in Cantonese because of the removal of the linguistic information. However, acoustically, the stimuli were spectrally complex when compared with pure tones. Therefore, enhanced performance in ASD may not be revealed in this case. In other words, stimulus complexity may still be a potential factor that contributes to the unusual auditory processing in ASD (Samson et al., 2011). This speculation warrants more systematic and direct investigation.
Another possible explanation for the absence of atypical performance in the ASD group might be that enhanced perception in ASD is not determined by sound complexity. Čeponienė et al. (2003) suggested that the main problems of children with high-functioning ASD lie in selective attention to "speech-like" stimuli, rather than at the sensory processing stage. Using evidence from event-related brain potentials, Čeponienė et al. (2003) showed that children with high-functioning ASD had typical brain responses in detecting pitch changes in speech and non-speech sounds but impaired involuntary attentional responses specific to speech sounds. The authors concluded that the perception of the different types of sound in children with autism is intact at the sensory processing stage. However, these children show a deficit in orienting to stimuli that is contingent on the speech likeness of the sound stimuli rather than on acoustic complexity in general. Finally, both groups performed more weakly with the pseudo-syllable stimuli than with the non-speech stimuli. This observation was unexpected. It is possible that the pseudo-syllable stimuli contained more irrelevant information that interfered with individuals' processing. As a result, both groups found it more difficult to discriminate pitch in a context intermediate between speech and non-speech.
The post hoc subgroup analysis, which separated individuals with and without delayed onset of speech in the ASD group, provided some additional information regarding the processing mechanism of the ASD group. The participants with delayed speech onset as a group (n = 16) in the present study showed no significant difference from their matched non-ASD controls in discriminating pitches in real syllables and non-speech stimuli. On the other hand, the three participants without delayed speech onset showed relatively stronger pitch discrimination skills in the three stimulus types when compared with the typical control group, and in particular for the real syllables (see Figure 4).
The diverse performance of the participants with and without delayed speech onset was generally congruent with the heterogeneity of auditory perception in ASD described in previous studies (Bonnel et al., 2010; Heaton, Williams et al., 2008; Jones et al., 2009; Kargas et al., 2015). For example, Jones et al. (2009) found that their adolescents with ASD as a group showed no processing advantage in detecting differences in frequency, and enhanced pitch discrimination was observed in only a small proportion of the participants. Similarly, Bonnel et al. (2010) found that enhanced pitch discrimination for simple tones was observed only in their participants with autism, who exhibited a delayed onset of first words, but not in those with Asperger syndrome, who showed mostly typical speech milestones (DSM-IV, APA, 1994). Heaton, Williams et al. (2008) and Kargas et al. (2015) provided evidence that superior pitch processing ability is restricted to a subgroup of the ASD population rather than being a pervasive phenomenon. In the present study, enhanced pitch discrimination was also observed as a characteristic of a particular subgroup of ASD individuals who speak Cantonese as their first language.
The relatively superior pitch perception ability of three individuals having no speech onset delay in real syllables may be due to the fact that they dedicated more neural resources to the processing of these stimuli. Samson, Zeffiro, Doyon, Benali, and Mottron (2015) found that their participants with ASD with no speech onset delay demonstrated greater task-related neural activity than typical individuals in several brain regions when processing non-social stimuli and frequency modulated sounds. Those highly activated brain regions included the inferior frontal gyrus (IFG), peri-auditory middle, and superior temporal gyri, which are associated with language processing. For those with speech delay, higher activity was also reported but mainly in the perceptual cortex, which is associated with low-level auditory cortical processing. Given that the stimulus materials used possess speech-like acoustic properties, the large engagement of language-related regions during pitch processing in the no-delay group may explain why this group also performed better when processing acoustic components that are relevant to speech recognition than the typical matched controls (Samson et al., 2015).
In addition, the findings might point to a language-specific influence on pitch perception in both typical and clinical populations. Pitch perception in Cantonese speakers carries a heavy functional load, as speakers have to process lexical tones extensively in everyday life. The impact of this demand is reflected in studies that showed a language-specific pattern of tone perception as early as 4 months of age in infants learning a tone language (e.g., Yeung, Chen, & Werker, 2013) and an association between pitch perception ability and children's later language development (Antoniou, To, & Wong, 2015). Ample evidence indicates that linguistic experience can shape the functional circuit involved in pitch processing during speech perception. For example, significant differences in behavioral and neural responses to pitch in tone versus non-tone language speakers in both speech and musical contexts have been reported (e.g., Giuliano, Pfordresher, Stanley, Narayana, & Wicha, 2011; Pfordresher & Brown, 2009; Wong, Parsons, Martinez, & Diehl, 2004; Wong et al., 2012). In particular, cortical processing of lexically relevant pitch contrasts in tone language speakers is language specific (Hsieh, Gandour, Wong, & Hutchins, 2001; Wong et al., 2004). The linguistic functions of pitch may have modulated individuals' perception mechanism in general. Cantonese-speaking individuals with ASD who showed some sensitivity to the language in their early years (i.e., without delayed onset of speech) may therefore possess an advantage in pitch processing after long-term exposure to lexical tones. They therefore exhibited superior pitch perception ability in non-speech as well as speech materials. It may also be possible that the perception of pitch in individuals with ASD speaking tone and non-tone languages involves distinct processing mechanisms and different brain regions or pathways.
Future brain studies could provide neurophysiological evidence to better supplement and account for these predictions.
The present findings provided certain new observations about pitch perception in ASD, but the study has several limitations. First, the present study only examined pitch discrimination. Previous studies suggested that the enhanced perception in ASD may be better realized in identification than in discrimination tasks (Samson et al., 2006), and future studies may explore other testing methods to elicit the underlying skills. Second, the subgroup analysis was not part of an a priori study design. The numbers of participants with and without delayed speech onset were uneven, which did not support powerful comparison with inferential statistics. Third, speakers of only one tone language were examined. Future studies can also involve individuals with other tone language backgrounds (e.g., Mandarin) to examine whether similar phenomena can be observed. Finally, the present study used participants' education level and their performance on the abbreviated form of the Raven's Standard Progressive Matrices Test as the measures for matching participants on intelligence. Future studies could use IQ scores from a Wechsler IQ test or the full version of the Raven's Progressive Matrices.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was supported by a Small Project Fund from The University of Hong Kong (Ref: 201209176020).