Speaking in a second language but thinking in the first language: Language-specific effects on memory for causation events in English and Spanish

Aims and objectives/purpose/research question: This paper’s objective is to offer new insights into the effects of language on memory for causation events in a second language (L2) context. The research was driven by the question of whether proficient L2 users acquired L2 thinking-for-speaking-and-remembering strategies along with the relevant expressions for different types of causation (intentional versus non-intentional). Design/methodology/approach: The cognitive domain of causation is an ideal platform for this investigation, since the lexicalisation of causation differs clearly in the two languages under consideration, English and Spanish. Spanish speakers always distinguish between intentional and non-intentional events through the use of different constructions. The English pattern of lexicalisation in this domain often leaves intentionality unspecified. Our methodology involves an experimental elicitation of event verbalisations and recall memory responses to video stimuli by English and Spanish monolinguals and bilinguals. Data and analysis: The analysis has shown that the Spanish monolinguals and first language (L1) Spanish/L2 English speakers always distinguished between intentional and non-intentional events, while the English monolinguals and L1 English/L2 Spanish speakers generally used expressions that were underspecified with regard to intentionality. Findings/conclusions: All populations used their habitual language patterns as an aid to memory. Spanish monolinguals had better recall than their English peers. L2 speakers were mainly relying on the L1 in spite of speaking only the L2 during the experiment. Originality: Possible effects of these typological differences between an L1 and an L2 on speaker recall memory have not been investigated before.
Significance/implications: The research presented in this paper informs the theoretical assumptions related to the thinking-for-speaking hypothesis by showing empirically that late bilinguals adhere to their L1 patterns as an aid to memory while speaking in their L2. This novel finding contributes to an improved understanding of language processing and language use among late bilinguals.


Introduction
The goal of this study is to investigate whether a second language (L2) can have consequences for the speaker's memory of events in a cognitive domain that involves typological contrasts between a first language (L1) and an L2. Our focus is on language and cognition on-line, that is, thinking-for-speaking and remembering when language is explicitly used and when access to it is not prevented in any way. A number of recent studies in psycholinguistics have found some language effects on-line in experimental tasks that require active use of language and that do not impede access to it during task performance. For instance, Lucy and Gaskins (2003) showed that the classification of objects in an experiment was driven by the habitual language-engendered preferences of different languages. Levinson (2003) demonstrated that similar language-specific preferences are detected in spatial orientation, where language-specific frames of reference are used to guide navigation. It was Slobin (1996, 1997), however, who originally proposed the thinking-for-speaking hypothesis to account for the variety of situations where language effects on conceptualisation are most likely to occur, and this hypothesis best captures the on-line effects of language on cognition that have been found. Specifically, as Slobin (1996, 1997, 2000, 2003, 2006) explained, when we think for the purpose of speaking, writing, translating and also remembering, we use language to guide us in this process. Consequently, the language we speak affects the thinking that happens while we are engaged in language-driven activities. These effects are then limited to on-line processes and may not be present or relevant off-line, when language is not actually used (or when its use is disabled; see, e.g., Trueswell & Papafragou, 2010).
This hypothesis is different from the classic linguistic relativity hypothesis that was derived from the work of Sapir and Whorf and that was seen as advocating a language-dependent world view that may be held regardless of whether language is explicitly used or not. In this study we are concerned only with occasions when language is being actively used, either overtly or covertly.
A significant number of studies have documented that thinking-for-speaking is indeed language-specific (see, e.g., Slobin, 2006, for an overview), and numerous factors have been put forward in order to specify the exact occasions when its effects are observed, and when they do not appear. We argue here that language effects are more likely to appear when experimental stimuli are complex, because the integration of complex information presented in the stimuli may be more susceptible to language-specific lexicalisation resources that would be activated as an aid to memory (as indicated in previous research; see, e.g., Fausey & Boroditsky, 2011; see also Trueswell & Papafragou, 2010). Filipović (2011) has already provided evidence for language-specific effects on recognition memory for complex motion events in both monolingual and bilingual speakers. Another study (Filipović, 2013a) showed how memory for causation events can be influenced by the language one speaks in monolingual populations.
In the current study we are probing the effects of an L2 on recall memory for causation events. We assume that our bilingual participants (learners of L2 English and L2 Spanish, respectively) will have access to both their L1 and L2, even though they are asked to verbalise only in their L2 throughout the experiment. We accept Grosjean's (2001) assumption that both languages in bilinguals are always active, albeit to a different extent in different circumstances, that is, our participants are always in a bilingual mode. We believe that this is especially true in the case of L2 speakers who are very late bilinguals and who are less likely to be in a fully monolingual mode when speaking an L2. To ensure that both languages are active in our participants, we give the experimental instructions in the L1s while performance throughout the experiment is carried out in the L2s.
It is important to mention that this interdisciplinary research area involving language and thought has hitherto been much more focused on monolingual speakers, although some recent studies have included bilingual speakers of different kinds (e.g. Athanasopoulos et al., 2015; Kousta, Vinson, & Vigliocco, 2008; Lai, Garrido Rodriguez, & Narasimhan, 2014). The field has begun to appreciate the importance of bilingual empirical and experimental data for the purpose of linguistic relativity research and, more generally, for research on the interaction between language and thought and on language processing (e.g. see Filipović, 2011, 2013a; Jarvis & Pavlenko, 2008; Pavlenko, 2014). Nevertheless, most of the previous studies dealing with both monolingual and bilingual populations have dealt with categorisation and similarity judgements, while hardly any dealt with memory. The central contribution of the present study lies in the fact that it provides both bilingual lexicalisation data and an empirical probe for recall memory effects.

General studies of language-specific effects
The question of when there are and when there are not language-specific effects has been approached in many different ways, theoretically, methodologically and empirically. This variability in approach is partly the reason why the answers to the question of whether language influences thought have varied substantially. Another reason why research into language and thought has often produced conflicting results is the fact that language is not a uniform, static phenomenon, but rather a multifaceted, bio-social construct, a system that is complex and adaptive (see Filipović, 2014; Filipović & Hawkins, 2013). Consequently, it is not hard to envisage that, under different circumstances of use, we may elicit different outcomes (see, in particular, Athanasopoulos et al., 2015, on the malleability of language effects). For instance, different experimental set-ups have been employed in the past, some actively and explicitly involving language (e.g. Malt, Sloman, & Gennari, 2003), others only allowing its tacit presence (Filipović, 2010a, 2010b), and then there have been those that blocked access to habitual language use via verbal interference (e.g. Trueswell & Papafragou, 2010) or via parallel task distraction (Filipović & Geva, 2012). The experimental tasks themselves have also been very varied, including similarity judgements using triads (Malt et al., 2003), categorisation (Lucy & Gaskins, 2003) or recognition from memory (Filipović, 2011). It is hardly surprising that there is no overall agreement as to when we can expect effects of language on other cognitive functions (see Bylund & Athanasopoulos, 2014, for a recent detailed overview). What we do know is that language is closely connected with other cognitive functions, such as memory, as well as nonverbal cognitive systems, for example the sensorimotor system (see Pulvermuller, 2005).
By the same token, many recent psycholinguistic studies have converged on the idea that both universal and language-specific factors are involved in the perception and in the linguistic categorisation in the domains of colour (e.g. Regier & Kay, 2009), space (Landau, 2010) and motion (Filipović, 2010b). For instance, Regier and Kay (2009) state that "Whorf was only half right", since they have discovered that language influences colour perception in only half of the visual field. Namely, language affects colour perception primarily in the right visual field and they hypothesise that this is probably due to the activation of language regions of the left hemisphere. Furthermore, colour naming seems to reflect "both universal and local determinants" (Regier & Kay, 2009, p. 7). Landau (2010) argues that languages have a role in enriching our sensorial representations, even though spatial language seems to depend on our universal and pre-linguistic experience. Filipović (2010b) showed that universally perceivable (but not necessarily lexicalised) subcomponents of the manner of motion, such as pace, rhythm or step-size, can guide recognition memory rather than linguistic expressions.
The language-specific lexicalisation patterns that do seem to elicit language-specific effects are found in experiments that involve an increased visual and cognitive processing load, achieved by the introduction of complex tasks (e.g. when more than one manner of motion per stimulus is presented; see further discussion on complex tasks in the Stimuli section). By the same token, Trueswell and Papafragou (2010) show that language can be "optionally recruited for encoding events, especially under conditions of high cognitive load". They compared native speakers of two languages, English and Greek, whose respective languages have different means for lexicalising motion events. In one of the experiments the authors established, using eye movement recording, that when event encoding was made difficult by a concurrent non-linguistic task of tapping, "participants spent extra time studying what their language treats as the details of the event" (Trueswell & Papafragou, 2010, p. 64). Resorting to linguistic encoding for task performance was eliminated when there was no concurrent task, which made the event encoding easier, and also when the concurrent task involved the use of language (counting aloud). They concluded that language effects were "malleable and flexible", but that they "do not appear to shape core biases in event perception and memory".
A number of factors seem to play a role when it comes to whether we may have language-specific effects on cognitive tasks, such as categorisation or memory. It seems that the nature of the task itself (simple versus complex) and involvement of verbalisation (explicit, implicit, blocked) are of key importance in this context. What previous research has achieved is to clarify when we may or may not expect language effects and how to determine the source of the differences between the two outcomes. For instance, we can understand that the reason why some studies did not elicit language effects may not be because there are none to elicit, but rather because language is more actively recruited for tasks if the tasks are more complex. The domain of causation that we have chosen for the current study is an ideal testing ground because causation events are inherently complex; Fausey and Boroditsky (2011, p. 155) explain why this is so: "Observers must integrate information about the basic physics of the event (e.g., whether the person touched the balloon, whether the balloon popped, whether he touched it right before it popped) with more social cues about the individual's state of knowledge and intentions (e.g., whether he meant to touch the balloon, whether he knew the balloon was there, whether he was surprised at the outcome). The need to integrate many different types of information to construe an event may leave some events especially susceptible to linguistic and cultural influences."

Studies on second language acquisition
A number of studies in L2 acquisition and late bilingualism have emphasised the possibility that learning an L2 may affect the on-line verbalisation-driven conceptualisation system, that is, the thinking-for-speaking mechanism as defined and described by Slobin (see, e.g., Cadierno, 2010; Cifuentes Feréz & Rojo, 2015; Ellis & Cadierno, 2009; Filipović, 2011; Hijazo-Gascón, 2015). For instance, Cadierno (2010) found strong L1 lexicalisation effects on L2 expressions in a study that contrasted L1 Russian, German and Spanish learners of L2 Danish. Further, a study by Hasko (2010) on L2 acquisition of Russian by L1 speakers of English showed that L2 on-line performance provides evidence that surface structures "mediate our thinking in a non-trivial way" (Hasko, 2010, p. 57). In the domain of motion events, the differences in the lexicalisation patterns between English and Russian are such that they prevent L2 learners from developing an L2-based thinking-for-speaking pattern, which requires attending to and verbalising different conceptual categories from the ones present in their L1. Hasko concludes that the acquisition of an L2 needs to include not only the internalisation of broad grammatical and lexical items, but also an adaptation to the "new ways of attending to, and think-for-speaking about, conceptual domains that may be encoded differently in their L1 and L2" (Hasko, 2010, p. 57).
Some evidence suggests that thinking-for-speaking is not fixed, firm and static. It can change (see, e.g., Han, 2008; Stam, 2010), but not all aspects of it change in the same way and not for all learners equally; there may be great variability in how this process of change unfolds (see also Han, 2010, on fossilisation of L1 thinking-for-speaking; also Bylund & Jarvis, 2011, on L2 effects on L1 event conceptualisation). The factors involved in the change will be multiple, including the learning environment and frequency of use of the languages involved, as well as some general principles of L2 processing and use (see Filipović & Hawkins, 2013, for a proposed multiple factor model of L2 acquisition). Specifically, in the domain of causation, Hendriks, Hickman, and Demagny (2008) detected an increasing attempt to produce target-like structures that nonetheless remained source-like, regardless of the different proficiency levels tested. The conclusion they draw is that learners mastering an L2 may require some reconceptualisation of spatial information (or in other words, re-thinking for speaking about space). Overall, L2 speakers have been shown to be significantly affected by their L1 language-specific patterns of lexicalisation during task performance (L1 transfer, see Cadierno, 2010), but then again there are also studies that reported an absence of any language effects and only universal constraints on perception (e.g. Coventry, Valdés, & Guijarro-Fuentes, 2010).
Studies in late adult bilingualism have shown that different factors create different effects on task performance, and in different circumstances different factors have been shown to be the strongest, for example, L1 transfer in categorisation (Cadierno, 2010), language of the environment in syntactic attachment preferences (Dussias, 2001), language of operation during task performance (Athanasopoulos et al., 2015) and universal ("atomic") perceptual features in recognition memory (Filipović, 2010b). It would involve a lengthy digression to discuss these and other relevant studies in detail in this context and, in any case, they do not bear directly on our current investigation into L2 effects on memory. Almost all of the previous studies involve research on categorisation or similarity judgements and not on memory (with a few exceptions, such as Filipović, 2011, which is a recognition memory study). Moreover, we cannot fully compare these previous studies with the present one, since they are methodologically very different from ours. We are not aware of any prior bilingual recall memory study in this area, even though there is a substantial body of work in the area of bilingual processing and memory storage (see a recent overview by Filipović, 2014).

On causation in language and memory: The outline of the research domain
Causation is a ubiquitous and existentially primary cognitive domain. Speakers talk about the causes and effects of actions on a daily basis, regardless of the language they speak. Languages vary with respect to how they divide the continuum of possible meanings related to causation. There can also be more than one option for describing a causation event even within a single language. For example, in English one and the same event can be described as Jill broke the vase, The vase was/got broken or The vase broke, depending on how much information we know or want to reveal about the event and how much agency we observed or felt was involved (see Fausey & Boroditsky, 2010, for an insightful study of the effects of these differences on jury judgement).
Overall, however, English and Spanish have different resources when it comes to the expression of events in this domain, as illustrated in Table 1 (adapted from Gibbons, 2003).
Previous studies have found language-specific effects on both recognition memory (Fausey & Boroditsky, 2011) and recall memory (Filipović, 2013a) for causation in monolingual speakers. Fausey and Boroditsky (2011) tested memory for agents in English and Spanish native speakers. English does not regularly make an explicit distinction between intentional and non-intentional events, while Spanish does so by using two distinct constructions. The lexicalisations of non-intentional events in Spanish do not contain information about agents due to the extensive preferred use of impersonal constructions in that language. What Fausey and Boroditsky (2011) have shown is that, as a result of this difference in the grammar and habitual lexicalisation pattern between the two languages, speakers of English may have better memory than Spanish speakers when it comes to non-intentional event participants. This is due to the fact that English can use a transitive construction with a subject and an object regardless of the intentionality of the subject's action on the object (e.g. I broke a glass; Table 1, example (2a) could refer to either an intentional or a non-intentional event). Spanish speakers, by contrast, use transitive constructions for intentional events (Table 1, example (1a)) and impersonal constructions for the non-intentional ones (Table 1, examples (1d) and (1e)). We have to note that English also has a possibility corresponding to the one in Spanish in its affective dative (or dative of interest) construction, which does provide a clear and unambiguous reference to an unintentional event, such as The glass broke on me (comparable to the construction in (1d) in Spanish). This construction does exist in English but its frequency appears to be extremely low, especially in comparison to the frequency of the corresponding construction in Spanish, which is "undeniably part of the genius of Spanish" (Pountain, 2003, p. 116).
Crucially, not a single English native speaker has ever used this construction to verbalise unintentional causation in an experimental setting (Filipović, 2013a). Therefore, we can conclude that the distinction of intentional versus unintentional is not as consistently drawn in English as it is in Spanish, on either the lexical or the constructional level.
Further language effects on memory for causation have been shown in a different study by Filipović (2013a), where monolingual speakers of English and Spanish were asked to verbalise and later recall the nature of the witnessed events themselves (as either accidental or intentional) rather than focusing on event participants, as in Fausey and Boroditsky's (2011) study. In English, the relevant verbs and constructions do not specifically express unambiguous intentionality, as we mentioned, for example, The man dropped the glass (on purpose or not?). The Spanish constructions differentiate between the intentional meaning (El hombre botó el vaso, which means The man threw the glass) and the non-intentional meaning (Se le cayó el vaso al hombre, which means To the man it so happened that the glass fell; see also the examples (1a) and (1d) in Table 1). This persistent linguistic attention to intentionality was shown to be an advantage to Spanish speakers in the recall memory task of Filipović (2013a). In fact, their recall memory with regard to whether an action was intentional or not was significantly better than that of their English peers (Filipović, 2013a).

Participants
The participants for this study were English learners of Spanish (n = 28) and Spanish learners of English (n = 27). We also included a control group of monolingual English and Spanish speakers, all university students (n = 20 for each language). The English monolingual and both bilingual speaker groups were tested in the UK at the University of East Anglia, while the monolingual Spanish speakers were tested in Spain at the University of Zaragoza. Both L2 speaker groups learned their respective L2s at their local universities, and had an equal amount of L2 immersion experience (approximately six months on average). All the L2 learners were tested in their L2. The mean age of the monolingual participants was the same for both populations (21.0 years). The English L2 learners (mean age 20.5) were Spanish Erasmus students approximately halfway through their year abroad at the University of East Anglia at the time of the experiment. The Spanish L2 learners were all English L1 speakers (mean age 21.5) and students of the Spanish language in their final year of study at the University of East Anglia, who had recently spent approximately six months in a Spanish-speaking country during their study abroad period. The proficiency in the L2 for both groups was controlled based on the grades in the respective L2 language subjects in their studies and the information they provided in the questionnaires prior to their participation (comprising details of age, gender, years of learning L2 and average grades for L2 language exams). In terms of the well-known Common European Framework of Reference (CEFR) descriptors for levels of L2 proficiency, the participants were classified into two groups (B1-level, equivalent to the UK 2.2 and low 2.1 class grades, and B2-level, equivalent to the UK high 2.1 or 1st class).

Stimuli
The stimuli comprised video clips filmed with a Sony DCR-HC18E digital video camera and the experiment was run on a portable PC laptop using Microsoft Office PowerPoint. Each target video clip depicted either an intentional or a non-intentional causation event (e.g. a girl popping a balloon on purpose versus a girl playing with a balloon, which popped accidentally and this clearly surprised her). There were 10 target videos and 10 filler videos. The filler videos depicted non-causation events, for example, a man drinking coffee or a woman reading a book. All the target videos were matched for action type (e.g. both intentional and non-intentional breaking events were witnessed by all participants). All the videos were pilot-tested by two native speakers of each language in order to ensure that they were uniformly judged as either intentional or non-intentional. The target videos (intentional and non-intentional actions) are given in Table 2 and the still shots are given in Figure 1.

Procedure
The participants were shown three video clips for habituation, one containing an intentional causation event, one a non-intentional causation event and one a non-causational event (e.g. an agentive motion, such as running). They were told that they would be watching videos lasting between 6 and 9 seconds, depicting various actions, and that their task was to verbalise what had happened after each video clip (verbalisation stage). After watching the videos (which were randomised across the participants in order to avoid recency effects) and verbalising what was seen, a distractor task of 120 seconds in the form of a 10 × 10 grid of randomised letters was shown on the screen, and the participants were asked to count how many letters M, N and Z they could see. After the distractor task, the participants were asked to recall the events depicted in the videos by answering two questions about the witnessed events. The questions were unbiased with respect to intentionality (e.g. Did you see a girl with a blue balloon? Was what happened in that video accidental or on purpose?). They were asked to mark their answers on an answer sheet by circling YES or NO depending on whether, to the best of their recollection, the event in the stimuli contained an intentional or a non-intentional action, respectively. They were also told that they should not guess and that they should leave a question unanswered if they were not able to recall the relevant information.
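The distractor display described above can be sketched as follows. This is only an illustration under stated assumptions: the original experiment was run in PowerPoint, and the function names, the uniform sampling of letters and the seed are all hypothetical choices made here.

```python
import random
import string

def make_letter_grid(rows=10, cols=10, seed=None):
    """Return a rows x cols grid of uniformly random uppercase letters.

    Uniform sampling is an assumption; the source only says the
    letters were randomised.
    """
    rng = random.Random(seed)
    return [[rng.choice(string.ascii_uppercase) for _ in range(cols)]
            for _ in range(rows)]

def count_targets(grid, targets=("M", "N", "Z")):
    """Count the cells in the grid that hold one of the target letters."""
    return sum(1 for row in grid for letter in row if letter in targets)

grid = make_letter_grid(seed=42)  # hypothetical seed, for reproducibility only
for row in grid:
    print(" ".join(row))
print("Target letters (M, N, Z):", count_targets(grid))
```

The count returned by `count_targets` is the answer a participant would be trying to reach during the 120-second interval.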
Responses were classified as incorrect if the participants circled the wrong answer (i.e. intentional instead of non-intentional) or if they failed to give any response (i.e. they left both options unmarked because they could not recall the crucial piece of information). A follow-up check was carried out whereby each participant saw just the target events again after they had completed the experiment, and all the participants were asked to state for each of the target clips whether the causation event depicted was clearly intentional or non-intentional. The agreement regarding the adequacy of event classification in the stimuli was high for both participant groups (98% for English L1 and 99% for Spanish L1 speakers).
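The scoring rule just described (a wrong answer and a missing answer both count as incorrect) can be written as a minimal sketch. The labels, function names and example data below are hypothetical and are not taken from the study's materials.

```python
def is_incorrect(response, actual):
    """Apply the scoring rule: wrong or missing answers are incorrect.

    response: "intentional", "non-intentional", or None if left unanswered.
    actual:   the intentionality actually depicted in the video.
    """
    return response is None or response != actual

def count_incorrect(responses, answer_key):
    """Number of incorrect recalls for one participant."""
    return sum(is_incorrect(r, a) for r, a in zip(responses, answer_key))

# Hypothetical answer key and responses for three target clips.
answer_key = ["intentional", "non-intentional", "non-intentional"]
responses = ["intentional", None, "intentional"]
print(count_incorrect(responses, answer_key))
```

Treating unanswered items as incorrect follows the classification stated above; a different analysis could keep the two error types apart.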

Hypotheses
In line with the thinking-for-speaking hypothesis (Slobin, 1996, 1997, 2003, 2006), our central assumption is that the speakers' memory for causation events will be influenced by the language-specific patterns of the language they use to verbalise the stimuli (the L1 for monolinguals and the L2 for bilingual participants).
We hypothesise that the Spanish monolingual speakers will perform better on the recall of non-intentional events in particular, since this is where Spanish affords subtler distinctions than English. This assumption is motivated by the following reasons. Intentional events can be described using the same means in both languages (simple transitive constructions) and it has been shown before that the difference in both verbalisation and memory between English and Spanish participants pertains to the non-intentional event stimuli in particular (Fausey & Boroditsky, 2011; Filipović, 2013a). A further reason could be that even though verbs in English are unspecified with regard to intentionality (e.g. Bill pushed George; Joe dropped the bag; etc.), the default meaning is generally the agentive intentional one and non-intentionality must generally be explicitly indicated outside of the verb, for example, by using an adverb (Bill pushed George by accident; Joe dropped the bag accidentally; etc.). Hopper and Thompson (1980) made the relevant point that prototypical transitivity does indeed involve intentionality. A transitive agent is normally an intentional instigator or a causer of the action (e.g. I threw the ball). In a less prototypical transitive sentence in English, the subject can be a causer of the event without being the intentional or instigating agent (e.g. as in I dropped the glass). This has been shown to be the case in an experimental setting whereby intentional events were verbalised by English speakers without using additional adverbials (e.g. the adverbial expression on purpose could have been used in addition to the expression X pushed Y in cases where X pushed Y with intention and yet it never actually got used). Non-intentionality, on the other hand, when the English speakers chose to indicate it explicitly, was generally signalled by using additional adverbials, such as accidentally or inadvertently (Filipović, 2013a).
Thus, we can conclude that the transitive SVO construction in both English and Spanish is a typical lexicalisation pattern for intentional events. Previous research (Filipović, 2013a) has also shown that most of the time there was no explicit specification of intentionality in English and that the expressions tended to be left ambiguous (e.g. The woman pushed the bottle off the table). We assume that adverbial specifications will not be given frequently, since if an element is not obligatory (e.g. an adverb) it is more likely to be left out. We have also noticed in our previous research (Filipović, 2013a) that both constructions He broke the glass and The glass broke (Table 1, examples (2a) and (2c)) are used in English to refer to either intentional or non-intentional events in the experimental stimuli. By contrast, the clearly distinct constructions in Spanish (Table 1, examples (1a) and (1d)) are habitually used to signal clearly when an event is intentional and when it is not. Finally, we may have to allow for the possibility that the mere nature of the stimuli is positively skewed towards the salience of intentionality. What we mean by this is that intentional actions in general may be easier to remember regardless of language, since the depiction of intention is made obvious (e.g. in our stimuli, a girl approaches the bed on which a Barbie doll is positioned and pushes it off the bed with a clear intention). On the other hand, non-intentional events have an element of surprise and unexpectedness, which may be harder to record and recall exactly, especially if the focus of attention on this specific event feature is not aided by language.
For our L2 learner groups, we could expect their recall to be influenced by the language they use to verbalise the stimuli, that is, their respective L2s, whereby L2 Spanish speakers should demonstrate better recall with regard to the intentionality distinctions than the L2 English speakers if they are thinking-for-speaking in their respective L2s. However, that may be too simplistic to hypothesise for a number of reasons. Our L2 learners will stay in the bilingual mode throughout the experiment since, as pointed out on numerous occasions (Grosjean, 2001; Grosjean & Soares, 1986; Soares & Grosjean, 1984), it would be impossible to completely switch off the L1, especially in late L2 learners. We have to assume that the L1s of our learner groups, in which their proficiency is higher, will also stay active. Therefore, we cannot exclude the possibility that our participants' performance will be affected by factors other than just their L2 pattern; access to the L1 and some universal perceptually salient event features that are not dependent on language may also play a role. If the participants behave in their L2 as in their respective L1 monolingual control groups, then we may conclude that their memory is driven by their L1 regardless of the verbalisation in their L2. If this is the case, we would need to conclude that thinking in the L1 persists even when speaking in the L2. If neither L1 nor L2 effects are detected, then perhaps some universal salience of events themselves may be a stronger factor in recall, on this occasion, than language.
To summarise, if the language of operation were the strongest factor, we would expect the L2 learners of English and Spanish to behave like the monolingual speakers of their respective L2s. On the other hand, if the L2 were not the strongest mechanism of operation for lexicalisation and memory, then we would be able to detect which other factor (e.g. L1 transfer or universal perceptual salience; see Filipović, 2010a, 2010b) may underlie the participants' recall after verbalisation in an L2.

Statistical comparisons
We performed a statistical analysis using a two-way analysis of variance (ANOVA), whereby the two independent variables were mother tongue (English versus Spanish) and language used for verbalisation (L1 versus L2), and the dependent variable was recall memory for intentionality. The test revealed a significant effect of both independent variables (p < .05). We subsequently performed a post-hoc test in order to tease out the exact points at which these effects were located. The post-hoc Bonferroni procedure was used to perform pairwise comparisons between the means for incorrect recalls for all four groups (two monolingual and two learner populations). Significant differences were found both between the two monolingual groups (mean difference 0.018; SE = 0.4) and between the two learner populations (L1 English/L2 Spanish and L1 Spanish/L2 English; mean difference 0.039; SE = 0.37). There was no significant difference in performance when the participants were grouped based on their L1. Similarly, the difference in the class of grades within the learner populations (UK 1st/high 2.1 versus UK low 2.1/2.2) was also not significant (p > .05).
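To make the logic of the Bonferroni procedure concrete, the following is a minimal sketch, not the analysis script used in the study. It assumes that all six unordered pairs of the four groups were compared and that the family-wise significance level was .05; the function name `bonferroni_adjust` is hypothetical and introduced only for illustration.

```python
from itertools import combinations

# The four populations described in the text.
groups = [
    "English monolingual",
    "Spanish monolingual",
    "L1 English/L2 Spanish",
    "L1 Spanish/L2 English",
]

alpha = 0.05  # family-wise significance level (assumed from the reported p < .05)

# Assumption: every unordered pair of groups counts as one comparison.
pairs = list(combinations(groups, 2))
n_comparisons = len(pairs)

# Bonferroni: each individual test is judged against alpha / m.
per_test_alpha = alpha / n_comparisons


def bonferroni_adjust(p_value, m):
    """Return the Bonferroni-adjusted p value (hypothetical helper), capped at 1."""
    return min(1.0, p_value * m)


for first, second in pairs:
    print(f"{first} vs {second}")
print(f"{n_comparisons} comparisons, per-test alpha = {per_test_alpha:.4f}")
```

Under these assumptions there are six comparisons, so an individual contrast would need p below roughly .0083 to count as significant at the family-wise .05 level.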
We also carried out an item analysis using a one-way ANOVA in order to detect whether any specific item in the stimuli created specific verbalisation or recall problems for our participants. The English and Spanish monolingual speakers did not differ in the number of correct answers for intentional events (F(1, 5) = 0.74; p > .05), but for the non-intentional events, the Spanish monolinguals made significantly more correct recalls than their English counterparts (F(1, 17) = 10.05; p < .05). Similarly, there was no difference between the two L2 learner groups in their memory for intentional events (p = .69, SD = 0.27). For the non-intentional events, however, there was a significant difference between these two populations. English learners of Spanish had incorrect memory recalls in 39% of the cases, while the Spanish learners of English had only 14% incorrect recalls (see Figure 2). This difference was statistically significant (p = .017, SD = 0.47). Out of the five non-intentional target events depicted in the stimuli (see Table 2), the mean value for incorrect recall per participant was 1.98 for L1 English/L2 Spanish speakers and 0.69 for their L1 Spanish/L2 English counterparts.
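The relation between the per-participant means and the percentages reported above can be checked with simple arithmetic. The sketch below is an illustrative consistency check only; it assumes that the percentages are the mean number of incorrect recalls divided by the five non-intentional items, and the helper `error_rate` is a hypothetical name.

```python
N_NONINTENTIONAL_ITEMS = 5  # five non-intentional target events (Table 2)

# Mean number of incorrect recalls per participant, as reported in the text.
mean_incorrect = {
    "L1 English/L2 Spanish": 1.98,
    "L1 Spanish/L2 English": 0.69,
}


def error_rate(mean_errors, n_items):
    """Proportion of items recalled incorrectly (hypothetical helper)."""
    return mean_errors / n_items


rates = {
    group: error_rate(mean, N_NONINTENTIONAL_ITEMS)
    for group, mean in mean_incorrect.items()
}

for group, rate in rates.items():
    print(f"{group}: {rate:.1%} incorrect")
```

On this assumption the two groups come out at about 39.6% and 13.8%, which is close to the reported 39% and 14%; the small gap for the first group presumably reflects rounding in the reported figures.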

Verbalisation responses
Monolingual speakers of both languages used similar agentive constructions to describe intentional (voluntary) actions (e.g. The girl pushed the doll off the bed), but they differed in the descriptions of the non-intentional (accidental) actions, as evident in the examples below 2 :
INTENTIONAL ACTIONS (no difference in lexicalisation between English and Spanish)
(3a) The woman dropped the magazine on the floor.
(3b) La mujer botó la revista al suelo.
"The woman threw the magazine onto the floor"
Examples (5b) and (6b) reveal that the Spanish monolingual speakers tend to provide explicit information about the action being non-intentional, while the English descriptions do not specify this piece of information.
With regard to the L2 learners, we noticed that both groups used grammatically acceptable L2 expressions, but the Spanish learners of English followed the L1 thinking-for-speaking, always drawing explicit distinctions between the intentional and non-intentional meanings even though their L2 English does not formally require it. The two L2 groups did not show coding differences in the case of intentional actions, but the non-intentional event lexicalisations were more explicit, detailed and specific when Spanish was the L1 and English the L2, while the English L1 learners of L2 Spanish used constructions indeterminate with respect to intentionality, very much in line with their L1 pattern.
English learners of L2 Spanish tended to use acceptable grammatical structures (88% of all descriptions), but they used them indiscriminately with respect to the kind of causation involved. For instance, they used the impersonal constructions se cayó, se rompió for both intentional and non-intentional events. This shows that they had acquired the correct forms but were not accurate when it came to how and when these constructions were to be used in Spanish, namely for non-intentional actions only. This tendency is in line with their recall memory performance, which shows that the L2 thinking-for-speaking patterns were not adopted by the L1 English/L2 Spanish group (see the Discussion section). This group also showed avoidance of the relevant impersonal constructions and preferred the active SV sentences instead (such as La botella cae, meaning The bottle falls), very much in line with the pattern of their L1 (see example (2c) in Table 1). On the other hand, we see that the coding of L1 Spanish learners of L2 English reflects their heightened and explicit attention to intentionality even in an L2 that does not formally require it. They coded the stimuli 97% correctly with regard to the grammatical pattern of the English language (occasionally making errors due to L1 transfer, e.g. see example (7b) in Table 3: the verb exploder (to explode) is transitive in Spanish but not in English).

Discussion
Our results show that recall memory can be affected in a language-specific way. Monolingual speaker groups confirmed earlier results reported in the context of memory for causation events (Fausey & Boroditsky, 2011; Filipović, 2011). Our control group of native monolingual speakers of English used structures ambiguous with regard to intentionality and without any adverbial specification in 83% of cases, indiscriminately referring to both intentional and non-intentional acts with the same expressions (i.e. structures such as The girl popped the balloon or The balloon popped). By contrast, the Spanish monolingual speakers used SVO constructions such as La muchacha rompe el globo (The girl breaks the balloon) only when the action was clearly intentional. The non-intentional actions were consistently lexicalised with se + affective dative constructions in Spanish (as in example (6b) in the Statistical comparisons section). We can infer, based on these verbalisation data, that the two native speaker populations show a clear distinction with respect to the two respective lexicalisation tendencies: the English pattern is ambiguous with respect to intentionality, whereas the Spanish pattern uses clearly unambiguous structures for intentional versus non-intentional meanings. English speakers do have a potential option that can be used for the purpose of drawing distinctions between intentional versus non-intentional events (as in He broke the vase versus The vase broke), but this potentially contrastive option is not exercised in the same consistent manner in which the Spanish speakers use the distinguishing options available to them. Expression of agents as subjects regardless of intentionality in an SVO structure is a strong feature of English, as Fausey and Boroditsky (2011) showed, just as the use of the affective dative construction with se in Spanish for non-intentional event descriptions is adhered to with remarkable consistency.
With regard to the L2 learners, we can say that their memory recall is mostly informed by their L1-entrenched preferences even when they speak in an L2. Our Spanish learners of English performed significantly better than the English learners of Spanish on the recall memory task (see Figure 2). This indicates that the habitual L1 linguistic focus on intentionality persisted in the L1 Spanish learners of English even when they were asked to verbalise only in the L2 English, where the relevant intentionality distinctions are not habitually lexicalised. We are also able to conclude that even though L1 English learners of Spanish have acquired enough relevant grammatical knowledge in the L2 Spanish, they did not acquire the relevant awareness of the event feature of intentionality that native speakers of Spanish habitually attend to in language (and, consequently, in memory). Furthermore, L2 Spanish speakers, unlike the Spanish native speakers, used the relevant se-constructions indiscriminately for both intentional and non-intentional actions, while their meaning in Spanish is clearly non-intentional. They also showed a tendency to avoid the relevant se-constructions and used the regular SV or SVO structures when the se-construction would have been the most appropriate, such as La botella cae (The bottle falls) or La muchacha rompe el globo (The girl breaks the balloon); see also the Appendix for more details.
A detailed analysis of the verbalisation patterns used also reveals that learners resort to using constructions that are acceptable in both the L1 and L2 even though this use does not quite mimic the native speaker use (see the Statistical comparisons section for the verbalised responses analysis and also the Appendix). This tendency has already been detected in bilinguals (see Filipović, 2011; Lai et al., 2013; Nicol, Teller, & Greth, 2001). In other words, L2 learners tend to lexicalise the target events in ways that are acceptable in both their languages (L1 and L2), but their lexicalisation preferences in the L2 are different from the respective monolingual lexicalisation patterns observed (see also Cunningham, Vaid, & Chen, 2011, for a related observation). Namely, L1 Spanish/L2 English speakers use grammatical structures in their L2 but they tend to add information about intentionality in adverbial expressions, unlike the native monolingual speaker populations. This is because their L1 requires this kind of semantic specificity and they provide it in the L2 English even though this strategy is not characteristic of the English lexicalisation pattern. L1 English/L2 Spanish speakers also acquired the relevant grammatical structures but not their appropriate contexts of use. We may therefore conclude that full form-to-meaning mapping and understanding of patterns of use for the learners of Spanish as an L2 has not yet occurred and that raising explicit awareness about these and other relevant typological differences in lexicalisation patterns, as well as their potential consequences, should be one of the fundamental goals of L2 instruction. 3

Conclusion
In this study, our main goal was to check whether proficient L2 speakers of English and Spanish, respectively, have also learned to re-think for speaking through the acquisition of the relevant L2 constructions and lexicalisation preferences. We did not include experimental conditions without verbalisation, or with actively blocked verbalisation, since our main aim was not to explore what happens when language is not accessible. We were interested precisely in what happens under normal, everyday circumstances when language is indeed participating in our cognitive activities, such as perception, categorisation and memory. We probed for language-specific effects on recall memory and compared the results of both monolingual and bilingual L2 learner populations. The monolinguals' performance was in line with the respective lexicalisation patterns for causation in English and Spanish. The Spanish monolinguals gave more informative descriptions for non-intentional events and were able to recall them better than the English speakers. The L2 speakers seem to have acquired the respective relevant L2 structures overall, but some differences in expression compared to the monolingual populations were also detected. Crucially, the recall memory performance of the L2 groups seems to follow the L1, not the L2 patterns. Speaking in the L2 did not negatively impact the recall memory for non-intentional actions in L1 Spanish/L2 English speakers and, conversely, L2 Spanish did not make a positive impact on the recall of the L1 English/L2 Spanish group. The linguistic focus on intentionality seems to be deeply engrained in L1 Spanish speakers, while learners of Spanish do not benefit from this focus entrenched in the Spanish language. 
This means that the learners of Spanish may not have been made fully and explicitly aware of the relevant distinctions that matter to Spanish speakers and to which they have been sensitised through habitual and consistent expression of the key causation and intentionality distinctions. In other words, an adequate form-to-meaning mapping and usage-driven full acquisition has not been achieved among these learners. By contrast, Spanish learners of English transferred the focus on intentionality positively from their L1 into their L2 for the purpose of both verbalisation and memory, but there was a trade-off: the narrative style in L2 English was different from that of the respective monolingual English population because the learners used more adverbial phrases in order to convey the relevant intentionality details as favoured by their L1 Spanish.
It seems that the L1 and not the language of operation (L2) was the strongest factor aiding recall on this occasion. We must, therefore, ask what this finding means for our theoretical assumptions with regard to thinking-for-speaking. The conclusion we would argue for is that thinking for speaking and remembering in our late bilingual participants is mainly influenced by the L1, because these speakers are still thinking in their L1 when using the L2. Speakers may not resort to their L1 in some other less complex tasks that they can perform solely by using their L2 (e.g. categorisation or similarity judgements for simple events), but for more complex and demanding tasks such as lexicalisation of causation and its recall, L2 learners still rely heavily on their L1. It may also be the case that, as one of the reviewers pointed out, more proficient L2 learners with extensive immersion experiences could demonstrate more of an L2 effect in this and similar tasks. The current study has only included learners of relatively limited experience and immersion and this is indeed one of the study's limitations, in addition to the relatively small item list and participant pool. However, this study did include participants with different levels of competence and achievement in terms of their L2 language assessment results (i.e. higher versus lower class grades), but this difference did not have any effect on performance in the task. Future research is needed in order to establish whether immersion and substantially more advanced L2 experience would indeed lead to thinking in the L2 rather than the L1 while doing complex tasks.
Our findings appear to be in line with the idea by Schmidt (1990, 1993) that consciousness at the level of noticing (or, in our terms, raising awareness) is a necessary condition for language learning (but see also Fukuta, 2016, for an overview of different views on the subject). By the same token, noticing and explicitly focusing on form-meaning mappings in instruction can be beneficial to learners, as proposed by Cadierno (2008) and Ellis (2008). Furthermore, the results from this study strongly indicate that certain typological differences may have important effects beyond the realm of merely detecting language contrasts. They can impact communication and translation of information, which may impact our understanding of events, our judgement and the outcomes of legal cases (see Filipović, 2007, 2013b, for further exemplification and analysis). For all these reasons, it is our hope that this study will inspire similar endeavours in the field of bilingualism research and within the field of psycholinguistics more generally that will bring us closer to a full understanding of how language and other cognitive mechanisms interact and affect one another.
Finally, it may not always be beneficial to fully re-think for speaking in L2s because we may lose certain advantages deriving from the L1 that help us focus on certain aspects of an event. On the other hand, re-thinking for speaking may have advantages for the production of a more native-like narrative style. This may be the necessary trade-off in multilingual communication and this is hardly a matter of constant conscious and deliberate choice. Nevertheless, it may be generally beneficial to harness the potential of multiple ways of looking at the same situations, whereby some languages will have advantages in some domains of experience (e.g. the focus on the manner of motion in English), while other languages will do so in certain other domains (e.g. the focus on intentionality in Spanish). The explicit teaching of typological differences between languages and their consequences along these lines will benefit L2 learners on numerous levels, not least by bringing them closer to the patterns of use that are in the spirit of their L2s. Furthermore, this may also enable language learners to acquire L2s better and to draw on some new categories for thinking-for-speaking that can serve as an aid to memory and as a problem-solving tool in other cognitive tasks.

Author biography
Luna Filipović is Professor of Language and Cognition in the School of Politics, Philosophy, Language and Communication Studies, University of East Anglia. She earlier held an ESRC Postdoctoral Research Fellowship in psycholinguistics at University College London and a Leverhulme Trust and Newton Trust Early Career Research Fellowship at the University of Cambridge. Her research interests are in psycholinguistics and forensic linguistics, with a focus on bilingualism, language typology and language effects on memory. She has written and edited several books and published articles in many journals and edited volumes.