Avoiding the cum hoc ergo propter hoc fallacy: Comments and questions regarding Full Transfer Potential

In this commentary on Westergaard (2021), we focus on two main questions. The first, and most important, is what type of L3 data may be construed as supporting evidence (as opposed to a merely compatible outcome) for the Linguistic Proximity Model. In this regard, we highlight a number of areas in which it remains difficult to derive testable predictions from the model that go beyond compatibility with multiple outcomes that should, in principle, be mutually exclusive. The second part of this commentary deals with Westergaard's (2021) a priori questioning of wholesale transfer as a tenable hypothesis on the basis of it creating a context for massive unlearning, both in L2 and L3 acquisition, when humans seem to display conservative learning traits from L1 acquisition already. We argue here that decades of accumulated empirical data in L2 and L3 studies have shown enough evidence of L1 transfer and restructuring to render this argument a non sequitur. In connection to this, we discuss some of the issues related to adaptive accounts of linguistic transfer across instances of language acquisition.


I Introduction
The scientific method is a process of conjecture (hypothesis generation), prediction and subsequent testing in an effort to gain greater understanding. By its very nature, then, it is a process where, much more often than not, theoretical contributions are destined to be wrong. Somewhat ironically, though, being wrong is essentially the goal. By initially assuming that our theory (or the one(s) we are testing) is wrong and providing clear predictions for falsification, we engage in the healthy process of elimination of otherwise reasonable conjectures. Failure to prove a theory wrong iteratively and with reliable replication, then, increases the odds that the original conjecture is less wrong than its competitors. Even in such cases, empiricism over time is likely to reveal imprecision in the original conjecture in absolute terms. The cyclical nature of the scientific method reveals the clandestine benefits of being wrong: each time a reasonable conjecture can be safely discarded we get closer and closer to ultimate understanding. In this sense, scientific inquiry and method parallel the structure and layer-by-layer peeling of an onion. Each underlying layer remains invisible to the naked eye before the peeling of the previous one. The emergence of (competing) new theories is, thus, stepwise in nature. The new departs from its predecessors having borne witness to, and thus benefitted from, the proverbial layers previously peeled. Progress too is stepwise, as all reasonable conjectures that were properly vetted empirically leave an indelible mark even as they are discarded.
The article 'Microvariation in multilingual situations: The importance of property-by-property acquisition', by Marit Westergaard (2021), embodies a potential case in point: Should Full Transfer (FT) in the sense of Schwartz and Sprouse (1996) come to a natural end? There is no doubt that a property-by-property account of transfer is a reasonable conjecture. In fact, it has all the makings of an impactful hypothesis: it is elegantly simple, intuitive and compatible with some empirical observations. We commend Westergaard for writing this article and challenging everyone to think. As usual, her writing is clear, and her positions are well grounded and contextualized. This has enabled us to derive greater understanding about her convictions and points of departure. There are several important macro-questions (those related to implications for the field and the lay of the theoretical landscape) and micro-questions (those pertaining to particular statements, queries or claims made in the article) that we could address herein. Space limitations being what they are, we restrict ourselves to one of each. At the macro-level, we will ask if Westergaard's (2021) proposal meets the criteria needed for testing to determine its full potential, if not serve as a replacement of predecessors. That is, does it avoid the cum hoc ergo propter hoc logical fallacy by projecting beyond potential compatibility with data outcomes, offering testable predictions for its own falsification? At the micro-level, we will address Westergaard's (2021: 388) questioning of why 'the brain would create a context for massive unlearning in L2 [second language] acquisition'.
Not least because we will challenge our esteemed colleague on the above non-trivial issues, we wish to also highlight first some important points where we, unsurprisingly, agree. We agree that the underlying properties and mechanisms of all instances of language acquisition are essentially the same. Observable differences, then, reflect surface manifestations of how the same mechanisms interact with previous linguistic and extralinguistic factors distinguishing sets of learners. We agree that there are distinct grammars for different languages in the mind, irrespective of the age at which they are acquired. We agree that all languages remain active in the mind of all types of bi-/multilinguals. All of this agreement, however, is compatible with classic FT accounts. Westergaard (2021: 388) asks, 'if our genetic endowment for language learning makes us able to make fine distinctions in L1 [first language] acquisition, why should we not be able to do so also in L2 acquisition?' We could not agree more: L2/Ln learners process and render fine-grained distinctions, but this is visible in both transfer effects themselves and subsequent restructuring. Context and history are always important. FT is often, certainly by Schwartz and Sprouse (1996) and all applications thereof, aligned with Full Access to UG, as the full name of their proposal suggests: Full Transfer/Full Access (FTFA). While this means that the starting point of acquisition is distinct based on experience a second language learner (L2er) has had that a first language learner (L1er) lacks (claimed full specification at the initial state in L2 as opposed to underspecification in L1), the process is not envisaged to be fundamentally different, which classically refers to accessibility to underlying mechanisms (see Bley-Vroman, 2009). As White (2008) puts it, 'Different? Yes. Fundamentally? No.'
In fact, much of the research contributing the most convincing data showing, to our mind, the fundamental similarity in the relevant sense between native and non-native acquisition has been that arguing for FTFA.

II Addressing the macro-question
The most important requirement for a theory of transfer in L2/LnA (or, more generally, for those parts of a general theory of L2/LnA that provide an account of transfer phenomena) is for it to be able to model transfer's role in the developmental process. This means understanding, minimally, (1) when transfer happens, and (2) how it happens. Both of these aspects are present in Westergaard's (2021) proposal to a greater or lesser degree of concretion. However, it is precisely the level of granularity in the articulation of a model's mechanisms that ultimately determines its falsifiability. In this sense, a crucial point concerns the nature of the evidence required to evaluate the claims of a property-by-property account. Westergaard (2021: 397) argues that the ideal testing ground for her theory is 'a comparison between L3 [third language] learners and two groups of L2 learners (with L1s that are the same as the previously acquired languages of the L3 group),' as '[t]his methodology makes it possible to identify the exact contribution of the additional language involved in L3 acquisition.' While we agree that these comparisons are informative (and have used them ourselves; see Rothman and Cabrelli Amaro, 2010), there is a problematic vagueness in the criterion established by Westergaard (2021) for what should constitute evidence for her model in these studies. She argues that the L3 group is expected to perform better than the corresponding L2 group on properties that are similar between the L3 and the language the L2 group is missing, but worse on those properties shared by the L3 and the L1 of both groups, when these differ in the other language, as a result of non-facilitative transfer.
There are, in our view, two assumptions behind this reasoning that contradict in whole or in part the claims being made elsewhere in the article. The first is that, for a theory that explicitly assumes that 'anything may transfer' as opposed to 'everything does transfer' (p. 389), expecting that the influence of the L3 group's L2 will always be detectable in a given linguistic property sampled pseudo-randomly from the grammar amounts to conceding that transfer is so pervasive that we might as well assume it affects every single property. Note that this is not alleviated by a view of the grammar in terms of micro-cues (whereby no two languages could ever have identical rules) nor by assuming that 'using a structure from a previously acquired language to parse L3 input will result in a temporary, unstable L3 representation that is influenced by that language' (p. 396). If non-facilitative influence is indeed unstable and temporary, it should be (much) harder to detect in 'snapshot' studies, where learners are tested only once. In sum, expecting worse performance in the L3 group in Westergaard's (2021) example necessarily entails hoping to capture the confluence of two (very) rare events from the model's perspective: that the L2 has exerted some influence on the L3 representation for that particular property (when it might well not have done so), and that it has exerted this influence sufficiently close in time (and developmental sequence) to the point of testing, at least in a representative number of learners, for it to yield detectable effects in the experiment. Of course, the model also allows for other outcomes: If the L3 learners do not show worse performance than their L2 peers for that property, then one can assume that their L2 never exerted non-facilitative influence on the L3 grammar (because the model does not necessarily expect this) or, alternatively, that this influence has been 'washed out' (p. 396) by conflicting L3 input and simultaneous reinforcement from the (more) target-like L1 representation.
The question at hand regards how FTP renders predictions a priori, as required by the scientific method for theory testing. Incorrect as competitor models might be, such as FT/FA, their predictions are clear and thus amenable to falsification. What precisely, then, should researchers use to render the predictions of FTP prior to data collection, as opposed to using its open-endedness to offer claims of compatibility no matter how the data turn out? In an appropriately designed methodology, how can we reduce FTP's predictions, especially as regards non-facilitative transfer effects, to a single committed outcome? Westergaard (2021) brings up the notion of misanalysis several times, but it is not clear how this is operationalized in practical terms. When (under what conditions) is the parser predicted to misanalyse the L2/L3 input as processable by an L1/L2 micro-cue that is ultimately off-target? Under what specific conditions is the parser predicted to avoid this interference? How does structural proximity relate to the factors involved in misanalysis?
Provided the right context, it is not entirely unusual (although never desirable) for a model to be compatible with different, even opposite, outcomes for a given combination of variables. However, this always speaks to the granularity of the theory's predictions, and is thus a wake-up call regarding deficits in its falsifiability. Only if one or more of the terms of the equation are not fully specified does a model yield predictions that are broad enough to accommodate vastly diverging outcomes. However tempting, we must not mistake the model for the real thing. The variables involved in the complex dynamics of language acquisition, processing and use do take on very precise values, and different combinations of those values determine distinct (and crucially predictable) measurable outcomes. It is our responsibility as scientists to commit to the development of models that capture as much of that variation as possible, and that do so through prediction, not only compatibility, in compliance with the scientific method and its dependence on empirical falsification.

Westergaard (2021: 388) asks 'why the brain would create a context for massive unlearning in L2 acquisition, when this is avoided in L1 acquisition.' There are, we believe, at least three problems with this questioning. The first relates to the existing evidence base in the L2 literature. The other two relate to the very framing of the question, and concern undue assumptions underlying its formulation.

III Addressing the micro-question
Asking this question in the first place would make the most sense in light of a dearth of evidence, attested in the second language acquisition (L2A) literature, for a significant amount of unlearning that could reasonably be linked back to L1 effects. However, there is no shortage of studies across a wide variety of L1-L2 pairings showing exactly this. At early stages of L2 acquisition, where initial state models are best tested, there is an overwhelming amount of evidence to suggest L1 transfer that will need to be 'undone', and often is. The consistency of transfer effects early on, their lingering nature throughout development, and our ability to make predictions in terms of timing in L2 development for which properties might show greater or lesser resilience in restructuring (e.g. L1-superset-to-L2-subset vs. L1-subset-to-L2-superset) all couple together, in our view, to suggest that L1 transfer is representational. If so, then restructuring must take place. If restructuring must take place, then there simply is 'a context for massive unlearning in L2 acquisition, when this is avoided in L1 acquisition.' All cognitive-based (i.e. not only generative) approaches to L2A acknowledge that a significant amount of restructuring/unlearning is implied. And so, if this is the reality of the evidence base we have at our disposal, then all theories must explain why this happens. The burden of explanation of all these data rests equally on theories advocating selectivity or conservatism in transfer itself. If transfer is property-by-property, why would there ever be a need for restructuring? We might expect, alternatively, L2A to reflect facilitation from the L1 and learning to fill in the gaps. Westergaard (2021) makes reference to misanalysis, a point we have already discussed above, as the basis for non-facilitative effects that she acknowledges obtain and require restructuring, but why is misanalysis so prevalent?
The first issue related to the framing of the question itself is the implicit underestimation of the role of experience in shaping cognitive processing. A parallel to human vision might help illustrate this point. Video game players have been consistently shown to outperform non-players in visual tasks of different kinds (e.g. Castel et al., 2005; Green and Bavelier, 2007). While a causal link has successfully been established between experience and outcomes (by means of replicable interventions of visual training through video game play; see Green and Bavelier, 2007), all accounts of this difference focus on changes in visual processing, that is, in subsequent higher-order processing by vision centers of the brain (primary visual cortex, dorsal and ventral streams) of the low-level information acquired by the system's sensors (the retina, primarily). The claim is not that the physiology of the visual system is affected by experience (mechanisms remain constant), but rather that the mind/brain adapts as a consequence of particular experience in video game players to process the same low-level information in different ways.
Similarly, the logic behind models of L2/LnA assuming full transfer is not that subsequent instances differ from L1 acquisition in any fundamental aspect (for most researchers, ourselves included, the mechanisms of learning by parsing (failure) are indeed assumed to be the same), but rather that experience shapes the learning task in non-trivial ways. In this context, asking why the mind would not avoid a certain reflex in L2 acquisition that was never available in L1 acquisition resembles a historian's fallacy. There is no way for the mind, at the onset of L2 acquisition, to estimate (and thus model its behavior around) the cost/benefit of linguistic transfer, because it has no experience with it. This point is easily missed if one accepts a hidden premise in Westergaard's (2021) formulation of the original question (and the second of the framing problems we referred to above): that transfer and overgeneralization can be notionally equated simply because non-target-like linguistic representations fall within the range of their respective outcomes. Westergaard's (2021) overall claim that transfer is not a priori wholesale is actually compatible, in our view, with the general notion of both an adaptive approach to cognitive processing in general (in the present case, seen through the lens of language learning) and our assertion that transfer turns out to be wholesale in L2A and L3A, a point we alluded to in Rothman et al. (2019). As we have discussed therein, although Westergaard (2021: 393) questions this, we take full transfer to be an overt reflex of adaptive cognitive economies inherent to subsequent (language) learning. While the experience of having to 'unlearn' at the cost of transfer in L2A could itself be a motivator for a more adaptively conservative property-by-property transfer approach in L3A, the contexts of L2A and L3A embody distinct sets of relevant circumstances.
The parsing failure experiences brought about by transfer in L2A might not yet suffice to favor property-by-property transfer, because L3A is the first time multiple sources for transfer are available. Thus, L2A and L3A are not comparable for the relevant experiential factors. L3 is, in these terms, much more comparable to L4/Ln. An adaptive approach might then expect the restructuring experience from transfer in L3A to be sufficient to promote a more conservative property-by-property approach in L4/LnA, in line with the general view of FTP. If found, this would lend support to Westergaard's (2021) general position that transfer is not wholesale by default (yet can be by circumstance, compatible with what we have argued for L2 and L3 in our work). At the same time, it would underscore our general point about the interface of adaptive cognitive processing and the same language acquisition mechanisms, which ultimately give rise to different learning trajectories when operating over contexts that delimit their interactions in distinct ways.