An examination of prosody and second language sentence processing through pause insertion

Objectives: Research on second language (L2) sentence comprehension has often examined reliance on semantic and syntactic information but has largely left aside the role of prosodic cues. In the present study, we compare less- and more-proficient L2 learners’ integration of prosody and syntactic structure during auditory L2 sentence comprehension. Design: Two groups of Chinese learners of L2 English (A2 and C1 levels) participated in an auditory comprehension task, which included sentences that had artificial pauses inserted either between or within syntactic phrases. After hearing each sentence, learners were asked to judge a translation as ‘identical’ or ‘not identical’ on the keyboard. Data Analysis: We conducted t-tests and an analysis of variance to examine prosodic effects in the two learner groups. Findings: The results showed that both A2 and C1 learners were sensitive to pauses. However, the direction and magnitude of this sensitivity were significantly different for the two groups. A2 learners were faster to respond to auditory sentences in which a brief pause was placed within syntactic phrases. In contrast, C1 learners responded faster when the brief pause was placed between syntactic phrases. Originality: Unique to the present study is the use of the pause-insertion paradigm to examine the role of prosody in L2 auditory sentence processing. Implications: The results imply that the two groups of learners do not rely on prosodic and syntactic cues in the same manner when processing L2 sentences. We argue that the processing mechanisms involved in L2 sentence comprehension evolve hand-in-hand with L2 proficiency development. We discuss the implications of these findings for future research.


Introduction
In speech production, a number of processes must occur for an utterance to be well-formed and comprehensible to a listener. In addition to constructing a message and then choosing the words that adequately convey that message, these processes must include how words should be merged together in accordance with the syntactic rules of the language, and how those words are grouped together in real time during overt production. The former falls under the domain of syntax, while the latter is part of prosodic phrasing. The role of prosody is an under-investigated issue in second language (L2) sentence comprehension. While there seems to be consensus that there is not a one-to-one relationship between prosody and syntax, there are several factors, such as the effects of L2 proficiency, that require further examination (Nickels & Steinhauer, 2018). To this end, in the present study, we examine L2 sentence processing among beginning and intermediate learners using a pause-insertion paradigm. In the next section, we provide a brief background on prosody and L2 sentence processing. We then present the current study, discuss the results, and offer suggestions for future research.

The role of prosody in L2 sentence processing
The contributions of semantic and syntactic cues to L2 sentence processing are seemingly disparate (see Schafer et al., 2000), making it apparent that these different accounts, experimental materials, and proficiency levels merit further exploration (Nickels & Steinhauer, 2018). Firstly, since previous studies have been limited to phenomena such as the integration of morphosyntax (Hopp, 2015) or ambiguous phrases (Kjelgaard & Speer, 1999), it is unclear whether these findings are domain-specific. Secondly, the sentences used in previous work have typically involved a high demand for cognitive effort (e.g. Nickels & Steinhauer, 2018;Pauker et al., 2011). This can potentially create a source of bias in language processing (Cunnings, 2017). Tasks that are highly cognitively demanding may mitigate L2 learners' use of phrase-based structure strategies. For instance, garden-path sentences are assumed to create more difficulties in processing for L2 learners (Levy, 2008) because the maneuvering of the phonic and lexical cues consumes more cognitive resources than for native speakers (see also Hopp, 2015). This, in turn, may distract their attention from syntactic structure (Meroni & Crain, 2003). Employing experimental materials that are less cognitively demanding is essential to avoid a ceiling effect for less-proficient learners.
Research to date has mostly examined the pattern of reliance on semantic and syntactic information in sentence processing (see Hopp, 2015) but has failed to address prosodic information. Only a few studies have investigated the integration of prosodic and syntactic information during L2 auditory sentence comprehension. For example, Dekydtspotter et al. (2008) examined how syntax, prosody, and context interact when second-semester and fourth-semester English-French learners processed relative clause attachment in complex nominal expressions. The results showed differences in how the two groups of learners processed L2 sentences: less-proficient individuals demonstrated a preference to obey principles of structure, while their more-proficient counterparts displayed a tendency to reflect the prosodic segmentation of the sentences. The authors argue that for more-proficient learners, this development 'suggests a growing ability to revise the initial parse after prosodic feedback within the limits of cognitive resources' (p. 477). In another study, which draws on neurological evidence, Nickels and Steinhauer (2018) explored how first language (L1) background and L2 proficiency influence the processing of prosody-induced garden-path effects. The researchers analyzed the event-related potentials (ERPs) of Chinese and German L2 learners of English and found that L2 proficiency shapes the integration of prosodic and syntactic cues. This finding held regardless of L1 background (Chinese versus German).

Prosody in L2 processing
In spoken language, the integration of prosodic and syntactic structures is implemented naturally and frequently (Schafer et al., 2000). Prosodic and syntactic boundaries often coincide, especially in speech read aloud (Cole, 2015). According to the generalized wrap theory (Truckenbrodt, 1995), a prosodic boundary tends to align with the syntactic juncture (cf. Frazier et al., 2004). In speech comprehension, listeners construct an abstract prosodic representation to maintain spoken sentences in their immediate memory. This provides the initial domains for syntactic structuring and semantic analyses (Schafer, 1997). For instance, when hearing (1):
(1) The bus driver angered the passenger with a mean look.
listeners attach 'with a mean look' to 'the passenger' (low attachment in syntactic structure) if a prosodic boundary between 'angered' and 'the passenger' splits the sentence into two prosodic phrases (see Schafer, 1997, for more about the prosodic visibility hypothesis). Due to the alignment of the prosodic phrases at their right-hand edge with major syntactic boundaries, prosodic boundaries are taken as location markers of syntactic juncture (Cole, 2015). Therefore, when processing demanding utterances such as garden-path sentences, native speakers demonstrate congruency effects, as evidenced by faster response times (RTs) when syntactic boundaries agree with prosodic boundaries and slower RTs when they are incongruent. Kjelgaard and Speer's (1999) work offers a demonstration of this finding. In the study, the researchers manipulated prosodic boundaries (intonational phrase) in sentences to examine the effect of prosody on parsing in sentence comprehension. Examples are in (2) and (3):
(2) When Rodger leaves the house is dark.
(3) When Rodger leaves the house it's dark.
Kjelgaard and Speer's (1999) results from speeded phono-syntactic grammatical judgment, end-of-sentence comprehension, and cross-modal naming tasks showed facilitation when prosodic boundaries coincided with syntactic boundaries. However, there was interference when prosodic boundaries were located at non-syntactic boundaries. Additional support for the claim that prosodic structure is represented in the underlying syntactic structure of spoken language comprehension comes from studies using electroencephalography (EEG) (Bögels et al., 2013; Pauker et al., 2011; Steinhauer et al., 1999).

Present study
Unlike previous research, the current study incorporates a pause-insertion paradigm to examine L2 learners' integration of prosody and syntactic structure in sentence processing. In spoken speech, prosodic boundaries mark phrase structure boundaries. A pause is the most salient boundary marker among the three acoustic cues (i.e. lengthening before the boundary, f0, and pause). An inserted pause is a brief moment of silence placed into a spoken sentence, and its position can be manipulated across experimental conditions. In studies on garden-path sentences, silent pauses are often inserted in the ambiguous parts, either congruent or incongruent with the phrase structure (e.g. Buxó-Lugo & Watson, 2016). In a natural context, a pause contributes to the syntactic parsing of a sentence by segmenting it into grammatical and/or semantic chunks. In other words, a pause plays a direct role in parsing a sentence syntactically into sections (Hua, 1998). Pauses are so important to understanding grammatical structure that listeners habitually come to parse sentences by relying on the pauses in a specific context (Blau, 1990; Jacobs et al., 1988; Quirk et al., 1985; Voss, 1984). As found in the studies by Bögels et al. (2013), Maxfield et al. (2009), and Pauker et al. (2011), pauses facilitate sentence processing when they are consistent with grammatical structure and hinder processing when they are inconsistent. What is not yet known, however, is whether the effects of these pauses on processing vary with L2 proficiency.
In the present study, we use novel experimental stimuli among two groups of learners with significantly different L2 proficiency levels. We conducted two judgment experiments to examine sentence comprehension using basic sentences as experimental materials rather than cognitively demanding sentences, such as garden-path sentences, ambiguous sentences, or sentences with grammatical violations (Witzel et al., 2012). In the experiments, pauses were inserted either at phrase boundaries or non-phrase boundaries. We expect that the experimental manipulation of pause placement will result in the following.
• When pauses are inserted at phrase boundaries (i.e. consistent with the syntactic phrase structure), sentence processing is faster because processing efficiency is improved when the pauses represent the syntactic structure.
• When pauses are inserted at non-phrase boundaries (i.e. inconsistent with the syntactic phrase structure), sentence processing is slower because processing efficiency is lowered by the syntactic violations.
• In line with Dekydtspotter et al. (2008), the less-proficient learners will process L2 sentences faster when pauses obey the principles of structure, whereas the more-proficient learners will process sentences faster when pauses obey prosodic segmentation.

Experiment 1
Participants. Thirty-three Chinese (13 males, 20 females) learners of L2 English were recruited from two first-year high-school classes in the Guangzhou area. All participants were between 16 and 17 years old (Mean = 16.35; SD = 0.48), were learning English in formal classrooms, and had no immersive English learning experience. Based on their performance on the Senior High School Entrance Examination (SHSEE), their English proficiency was estimated to be at the A2 level of the Common European Framework of Reference for Languages (CEFR), with reference to the New English Curriculum issued by the Chinese Ministry of Education (2018). More specifically, their English proficiency was homogeneous, with an average SHSEE score of 104.98 (ranging from 97 to 111; SD = 3.59). They had a vocabulary size of approximately 1500 words and had learned fundamental English sentence structures. All participants received a small gift for taking part in the study. They reported being neurologically and psychiatrically healthy and having basic computer skills.
Materials. We include non-cognitively demanding sentences because, based on recent neurocognitive research (e.g. Ding et al., 2016;Sheng et al., 2019), the linear array of words forming a sentence embodies the syntactic organization upon which they are hierarchically constructed. Ding et al. (2016) found that phrase boundaries produced significantly larger EEG amplitude than inner chunks, indicating that phrase boundaries work as an external representation of syntactic structure (i.e. the structure is syntactically violated when its boundary is broken). These results imply that using non-cognitively demanding sentences to explore syntactic processing mechanisms is beneficial and may help to avoid incomplete results, reduce cognitive load, and lower the sensitivity generated by differences in the L1 and L2. The experimental sentences were taken from New Concept English: Book 2. The selection criteria were such that the sentences were on the same topics as those in the Chinese national textbook but with different contents in order to keep the participants' interest and avoid unfamiliarity. The sentences consisted of only grammatical structures learned by the participants. The words in the sentences were among the 1000 most frequent words in English according to the New Curriculum Standard (English) and the difficulty of the hierarchical structure was the same across all sentences (z = 1.96).
There were 30 sentences identified to be included in the study. A pilot test was conducted on a separate group of high-school students who had SHSEE scores similar to the experimental participants, were the same age, and had similar educational backgrounds. Sentences that elicited an error rate of 25% (n = 3) or more were excluded from the experimental procedures. This left 27 sentences in the experiment: six simple sentences, three compound sentences, six sentences with objective clauses, three sentences with relative clauses, five sentences with attributive clauses, and four emphasis sentences. The variety of syntactic structures was to increase the likelihood that the sentences would have a variety of matching types between prosodic boundaries and syntax structures. Table 1 displays some characteristics of the sentences.
The experimental sentences were then translated into Chinese by one of the researchers and the translations were checked by a second bilingual to ensure accuracy. A word in the middle part of the Chinese translation of an English sentence was replaced by a grammatically identical but semantically different word to create a Chinese sentence that had a different meaning than its English counterpart. These were used as the false translations in the judgment task. For example, in the task, the auditory stimulus 'Jason was able to make use of / the great number of new words / than ever before' was followed by a visual Chinese sentence that had either an identical meaning, such as '约翰逊能够前所未有地使用大量的新词汇,' or a different meaning, as in '约翰逊能够前所未有地使用大量的旧词汇.' In the Chinese sentence with a different meaning, the counterpart of 'new' is not '新' but rather '旧,' which is opposite to '新' in meaning. Forty percent of the Chinese translation sentences were false translations.
All of the sentences were produced by a male native speaker of English (Received Pronunciation) at a normal speech rate (192 words/minute), in accordance with the requirement of the English New Curriculum Standard for Senior One students (Chen Lin, 2005), and recorded in a soundproof chamber using Cool Edit Pro 2.0 software at a sampling rate of 44 kHz. These auditory sentences were then manipulated such that two pause types, pauses between phrases (PBPs) and pauses within phrases (PWPs), were inserted in the sentences. A phrase here refers to a combination of a modifier and a head, like 'new words,' or a formulaic phrase like 'make use of.' For PBPs, 300 ms pauses (Liu, 2007) were inserted at a phrasal syntactic boundary and, for PWPs, 300 ms pauses were inserted at a position that was not a phrasal syntactic boundary. The sentences were played to the participants aloud. An example of the PBP condition can be seen in (4) and the PWP condition in (5). Note that a forward slash ( / ) denotes an inserted pause.
(4) Jason was able to make use of / the great number of new words / than ever before.
(5) Jason was able to make / use of the great number / of new words than ever before.
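The pause manipulation in (4) and (5) can be illustrated with a minimal sketch. It assumes that a recording is available as a plain list of audio samples and that the pause locations are given as times in seconds; the function name `insert_pauses`, its parameters, and the cut times in the usage note are hypothetical and are not taken from the original stimulus preparation, which was carried out in audio-editing software.

```python
def insert_pauses(samples, cut_times_s, pause_ms=300, sample_rate=44000):
    """Return a copy of `samples` with silence inserted at each cut time.

    Assumptions (not from the study): `samples` is a list of mono audio
    samples, silence is represented by zeros, and `cut_times_s` holds the
    pause positions in seconds relative to the original recording.
    """
    pause_len = int(sample_rate * pause_ms / 1000)  # pause length in samples
    silence = [0] * pause_len
    out = []
    prev = 0
    for t in sorted(cut_times_s):
        idx = int(round(t * sample_rate))  # cut position in samples
        if not 0 <= idx <= len(samples):
            raise ValueError(f"cut time {t} s lies outside the recording")
        out.extend(samples[prev:idx])  # speech up to the cut
        out.extend(silence)            # the inserted pause
        prev = idx
    out.extend(samples[prev:])         # remainder of the sentence
    return out
```

Under these assumptions, a PBP and a PWP version of the same sentence would differ only in the cut times passed to the function, for instance `insert_pauses(recording, [1.4, 2.9])` versus `insert_pauses(recording, [1.1, 2.3])`, where the times are purely illustrative.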
The PBP stimuli were randomly split into Part A1 and Part A2 and the PWP stimuli were grouped as Part B1 and Part B2. Parts A1 and B2 were then paired to create Stimuli Set I and Parts A2 and B1 were paired to make Stimuli Set II. The participants were randomly divided into two groups, with half completing the materials of Stimuli Set I and the other half completing Stimuli Set II.
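The counterbalancing described above can be sketched as follows. The sketch assumes that Parts A1 and B1 contain the same sentences (in their PBP and PWP versions, respectively), so that each participant hears every sentence once and in only one pause condition; this reading, the function name `build_stimulus_sets`, and the fixed random seed are assumptions made for illustration.

```python
import random

def build_stimulus_sets(sentence_ids, seed=0):
    """Split sentences into two halves and cross them with pause condition.

    Assumed mapping to the text: A1/A2 are the PBP versions of the two
    halves and B1/B2 the PWP versions of the same halves.
    """
    rng = random.Random(seed)
    ids = list(sentence_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    first_half, second_half = ids[:half], ids[half:]

    part_a1 = [(s, "PBP") for s in first_half]
    part_a2 = [(s, "PBP") for s in second_half]
    part_b1 = [(s, "PWP") for s in first_half]
    part_b2 = [(s, "PWP") for s in second_half]

    set_i = part_a1 + part_b2    # Stimuli Set I:  A1 + B2
    set_ii = part_a2 + part_b1   # Stimuli Set II: A2 + B1
    return set_i, set_ii

# Usage with the 27 experimental sentences (identifiers are placeholders).
set_i, set_ii = build_stimulus_sets(range(1, 28))
```

With this design, every sentence contributes to both pause conditions across participants, while no participant hears the same sentence twice.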
Procedure. Participants were seated approximately 50 cm in front of a 21" computer monitor. The computer was a Lenovo-M800EE that was equipped with LE5 (LingJi-le5) headphones. The participants were instructed to judge, as quickly and accurately as possible, the translations as identical to or different than what they had just heard through the headphones. As illustrated in Figure 1, each trial began with a fixation cross that appeared on a white background for 1000 ms. After the 1000 ms, a visible trumpet symbol appeared on the screen while an English sentence was played through the headphones. Immediately following this, a Chinese sentence (i.e. either an identical or different translation) was visually presented using Song font type size 36 on the screen. Participants were asked to press 'j' (Identical, i.e. true) on the keyboard if the Chinese sentence had the same translation as the English one they heard or press 'f' (Varied, i.e. false) if not. In order to avoid distraction by the reading, the presentation time of the Chinese sentence was limited to 3000 ms. Three sentences not included in the actual experiment were used as practice examples to demonstrate how the experiment worked. The task was conducted using E-prime, which measured the accuracy and RTs from the appearance of the Chinese sentence until the response on the keyboard.
We chose a translation judgment task, in which participants judged whether a corresponding translation was identical or not, for two reasons. The first is that translation is a process that less- and intermediate-proficient learners undertake unconsciously (Thierry & Wu, 2007). Using a Chinese sentence to test their comprehension does not add extra load to the participants, but rather is consistent with their typical behavior. The second reason is that a translation task is most often used to test these learners' understanding of the L2 in classrooms and in examinations, which suggests that a translation task is valid in checking comprehension in the L2. Therefore, using L1 Chinese sentences to test comprehension is more natural than, for example, a grammatical judgment task or a prosodic naturalness judgment task. Besides these two judgment tasks, another frequently used comprehension check is an end-of-listening question. The questions are typically presented in the same language as the stimuli. However, reading the questions in the L2 could cause more variance in the data, since the responses would be sensitive to L2 reading abilities. On the other hand, presenting the questions in the L1 involves L1 reading skills, which are assumed to be more or less the same across all participants.
Below we report on the analyses conducted on RTs and error rates using SPSS software. For the RT data, only correct responses were included in the analyses. For accuracy, both incorrect and correct responses were analyzed. Means of RTs and accuracy for participants that were two SDs above or below their overall means or smaller than 500 ms were excluded from the analyses and treated as outliers. This data cleanup resulted in a normally distributed data set.
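The exclusion rule can be illustrated with a short sketch. It assumes one particular reading of the criteria, namely that values below 500 ms are dropped first and that the two-SD criterion is then applied around the mean of the remaining values; the function name `trim_rts` and the order of the two steps are assumptions, and the original analyses were run in SPSS.

```python
from statistics import mean, stdev

def trim_rts(rts_ms, floor_ms=500, n_sd=2):
    """Drop RTs below `floor_ms`, then RTs beyond `n_sd` SDs from the mean.

    Assumed order of operations: the absolute floor is applied before the
    SD-based criterion, so very fast responses do not distort the mean.
    """
    kept = [rt for rt in rts_ms if rt >= floor_ms]
    if len(kept) < 2:
        return kept  # an SD cannot be computed from fewer than two values
    m, sd = mean(kept), stdev(kept)
    return [rt for rt in kept if abs(rt - m) <= n_sd * sd]
```

Applied to a list in which most values lie near 1000 ms, one value is 300 ms, and one is 5000 ms, the function would remove the two extreme values and leave the rest unchanged.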
Results. Paired t-tests showed that learners were faster when they judged sentences in PWP conditions than in PBP conditions (see Table 2).
For accuracy, the results of the paired t-tests showed that there was no significant difference between the two conditions (see Table 3).
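For readers who wish to see what the paired comparison involves, the following is a minimal sketch of a paired t statistic computed from per-participant condition means. The function name `paired_t` and the example values are hypothetical; the sketch returns only the t value and its degrees of freedom and does not compute a p-value.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(cond_a, cond_b):
    """Return (t, df) for a paired t-test on two equal-length lists.

    Each position is assumed to hold one participant's mean RT (or
    accuracy) in condition A and in condition B.
    """
    if len(cond_a) != len(cond_b):
        raise ValueError("both conditions need one value per participant")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # mean difference / its SE
    return t, n - 1

# Illustrative values only, not data from the study.
t, df = paired_t([1010, 1120, 1230], [1009, 1118, 1227])
```

A positive t under this argument order indicates larger values in the first condition, so passing PBP means first and PWP means second would yield a positive t when responses are slower in the PBP condition.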
Although the results from Experiment 1 shed light on processing among beginning L2 learners, in Experiment 2, we tested another group of L2 learners with a significantly higher proficiency level to see if similar processing patterns would emerge.
Experiment 2
Materials. The materials were the same as in Experiment 1.
Procedure. The procedure was the same as in Experiment 1.
Results. Paired t-tests showed that these participants were faster when they judged sentences in PBP conditions compared to sentences in PWP conditions (see Table 4). For accuracy, the results of the paired t-tests showed that there was no significant difference between the PBP and PWP conditions (see Table 5).
Further analyses. Given the differential results we found between the performance of the A2 and C1 learners, we conducted additional analyses using RT data from Experiments 1 and 2 to further explore the possible effects of proficiency level. We ran a 2 × 2 analysis of variance (ANOVA) on the RTs using proficiency (A2 versus C1) as the between-group factor and pause position (PBP versus PWP) as the within-group factor. The results showed a significant main effect for language proficiency, F(1, 49) = 47.44, p < 0.01. There was also a main effect for pause position, F(1, 49) = 10.60, p < 0.05. Importantly, there was a significant interaction between language proficiency and pause position, F(1, 49) = 172.48, p < 0.05, suggesting that the effect of pause position is modulated by L2 proficiency level.
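The logic of the interaction test can be made concrete with a small sketch. In a 2 × 2 mixed design with one two-level within-group factor, the interaction F is equal to the squared independent-samples t statistic (with pooled variance) computed on each participant's difference score, here PBP minus PWP. The function name `interaction_f` and the example values are hypothetical, and the sketch is not the SPSS procedure used for the reported analysis.

```python
from math import sqrt
from statistics import mean, variance

def interaction_f(group_1, group_2):
    """Return (F, (df1, df2)) for the group x condition interaction.

    Each group is a list of (pbp_rt, pwp_rt) pairs, one per participant.
    The interaction is tested by comparing the groups on the PBP - PWP
    difference score with a pooled-variance t-test; F is t squared.
    """
    d1 = [pbp - pwp for pbp, pwp in group_1]
    d2 = [pbp - pwp for pbp, pwp in group_2]
    n1, n2 = len(d1), len(d2)
    df_error = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(d1) + (n2 - 1) * variance(d2)) / df_error
    t = (mean(d1) - mean(d2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t * t, (1, df_error)

# Illustrative values only: group 1 is slower with PBP, group 2 with PWP.
a2 = [(1001, 1000), (1002, 1000), (1003, 1000)]
c1 = [(999, 1000), (998, 1000), (997, 1000)]
f_value, dfs = interaction_f(a2, c1)
```

The error degrees of freedom in this sketch are the total number of participants minus two, which has the same form as the F(1, 49) reported above.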

Discussion
Our study had two objectives. One was to explore L2 learners' ability to integrate prosodic and syntactic processing during L2 sentence comprehension. The other was to test whether this ability is sensitive to L2 proficiency. Experiment 1 showed that Chinese speakers with an A2 level of L2 English processed L2 sentences significantly faster when pauses were inserted at locations inconsistent with phrase boundaries compared to when pauses were at locations consistent with phrase boundaries. This contradicts previous studies (e.g. Nickels et al., 2013) showing a facilitative effect when the prosodic boundary is consistent with the syntactic boundary but not when the prosodic boundary conflicts with the syntax. For the beginning L2 learners in the current study, the syntactic violation of phrase structure did not interfere with their sentence processing. These results indicate that beginning L2 learners do not primarily depend on syntactic cues during L2 sentence processing. These patterns add to the growing body of literature showing effects for bilingual dominance (Chang et al., 2014), L1-L2 similarity (Flege et al., 2002), and language exposure (Love et al., 2003) (see also Lei, 2013). Speech comprehension, known for its cognitive complexity, involves multiple tasks, such as lexical identification and access, syntactic construction, and semantic organization. This implies maneuvering large quantities of cognitive resources. For these beginning learners, however, less automation of L2 processing means even more consumption of cognitive resources (Hahn & Friederici, 2001; Jiang, 2004, 2007). Due to the limited cognitive resources for processing the L2, it seems as though beginning L2 learners are less sensitive to syntactic cues. Instead, they focus their attention on accessing the meaning of the words. With this lexical-semantic information, they can comprehend the utterance with the help of the L1 grammar.
It is likely that the beginning L2 learners processed sentences based on their L1 (Chinese) grammar by associating the L2 content words briefly with their L1 translation equivalents (see Kroll & Stewart, 1994; Schwieter & Sunderman, 2009). Chinese lacks morphological changes and, often, words do not have the same syntactic category in English (e.g. without changing form, a word can be an adverb, verb, or noun). As such, language comprehension depends more heavily on the processing of lexical meanings than on syntactic structure. It is likely that the A2 learners did not need to parse the sentences using prosodic information. Rather, they translated each English word into its Chinese counterpart immediately after they heard it (see Wu et al., 2013, who argue that translations are automatically primed during target language processing). In other words, when they were constructing sentence structure, they were likely doing so according to Chinese syntax. It is this strategy that ensures the success of sentence comprehension and accounts for their fluency in comprehension without suffering an incongruency effect. This is supported by the observation of no significant differences in their judgment accuracy. The A2 learners in our study were not affected by syntactic irregularities, which accounts for their high efficiency in building sentence representations without being affected by disrupted syntactic processing.
In Experiment 2, we found the opposite result among a different group of L2 learners, this time with C1 proficiency. These individuals were significantly slower at processing L2 sentences when pauses were inserted at locations inconsistent with phrase boundaries. These results align with Nickels et al.'s (2013) findings concerning the interface between prosody and syntax. For these C1 learners, syntactic violations hindered processing, suggesting that syntactic structure plays a critical role in L2 sentence processing at higher proficiency levels. The intermediate learners exhibited syntactic processing as a priority strategy. Perhaps due to their advanced competence in phonological perception, lexical access, and semantic organization, they were able to comprehend L2 sentences using grammatical constructions to organize words into larger semantic units, reducing the influence of the L1. These results are consistent with those reported by Omaki and Schulz (2011), Hopp (2015), and Witzel et al. (2012).
When considering the differential findings in the two experiments, we argue that L2 sentence processing is dynamic, moving from semantic priority during early proficiency to syntactic priority later on. This explanation accounts for some disparities in prior studies given that the sentence-processing mechanisms between L2 learners and native speakers are different only in quantity but not fundamentally (Clahsen & Felser, 2006). As Hahn and Friederici (2001) found in their ERP study, beginning L2 learners (L1 Japanese) failed to show a P600 component, but the advanced L2 learners (L1 Russian) did, in which they found that the processing mechanism from the L2 learners of intermediate or advanced levels to native speakers was different (see also Rah & Adone, 2010). Here it is worthwhile to refer to Hahn's (2001) interpretation of her study, which suggested that L2 learners' processing was developmental in nature, with the competence of the item in question being a modulating variable. In Hahn's (2001) study, the author found that the violation of participle inflection induced a P600 component during sentence processing among German L2 learners, but the violation of plural inflection did not. She attributed the disparity to the complexity of the two syntactic rules (i.e. German participle inflections are easier to learn than plural inflections). Similarly, Angelovska and Hahn (2009) found that highly proficient L2 learners outperformed the native speaker counterparts both in perception of English accent as well as in spontaneous speech, regardless of how typologically distant their L2 was from the L1. As such, the authors argued that there is no profound difference between L1 and L2 processing (see Angelovska, 2012, for a qualitative study in which pause insertion was identified as a foreign accent marker when judging the speech samples from Angelovska & Hahn, 2009).
Our findings may also explain the varied results between studies that use identical stimuli. For example, Witzel et al. (2012) reported different findings than Clahsen and Felser (2006) even when using the same stimuli of ambiguous relative clauses. These opposing results are most likely caused by individual differences among the participants. In the present study, we identified that L2 proficiency is one of these modulating factors.

Conclusion
The present study revealed significant differences in how beginning and intermediate learners process L2 sentences. We argue that the processing differences between the two groups are attributable to their differential L2 proficiency, calling into question the account that L1 speakers differ from L2 learners with respect to sentence processing. It is possible, therefore, that L2 proficiency modulates whether syntax takes priority over semantics during L2 sentence comprehension. The disparities in previous findings have been vast; however, our study suggests that there may not be a fundamental difference in processing mechanisms between L2 learners and native speakers but rather that these mechanisms are modulated by L2 proficiency. Unlike previous studies, we tested both beginning and intermediate L2 learners.
Future research may wish to include a larger spectrum of L2 proficiency in longitudinal and novel designs that may not have originally been used in this research area. This will improve the ecological validity of theories of L2 processing and begin to dispel doubts about the universality of specific stimuli used in previous studies. The inclusion of a control group should also be considered in future work, as this could establish a baseline to which experimental groups could be compared. Finally, researchers should endeavor to include measures of phonological awareness given its importance for the interpretation of meaning. A recent study published after we conducted the current study (Schmidtke & Moro, 2020/Forthcoming) found evidence of a transition from a lexical processing strategy that is heavily reliant on phonological decoding to word-reading behavior that is more actively engaged in higher order cognitive processes, such as meaning integration. Measures of phonological awareness should therefore be included in the materials of future studies, including those that employ pause insertion, as this may help to further elucidate the integration of prosody and syntax during auditory L2 sentence comprehension.