Toward a Science of Effective Cognitive Training

A long-standing question in the behavioral sciences is whether cognitive functions can be improved through dedicated training. It is uncontested that training programs can lead to near transfer, meaning increased performance on untrained tasks involving similar cognitive functions. However, whether training also leads to far transfer, meaning increased performance on loosely related untrained tasks or even activities of daily living, is still hotly debated. Here, we review the extant literature and, in particular, the most recent meta-analytic evidence and argue that the ongoing crisis in the field of cognitive-training research may benefit from taking a more mechanistic approach to studying the effectiveness of training. We propose that (a) adopting a more rigorous theoretical framework that builds on a process-based account of training and transfer, (b) considering the role of individual differences in the responsiveness to training, and (c) drawing on Bayesian models of development may help to solve controversial issues in the field and lead the way to designing and implementing more effective training protocols.

Practice supposedly makes perfect. But does it also lead to tangible improvements in skills not directly trained? Addressing how experiences generalize beyond the context in which they take place can answer fundamental questions of cognitive architecture and learning. Since the early 2000s, there has been a particular interest in whether executive functions (EFs) can be improved (Smithers et al., 2018), in particular, working memory (WM), inhibitory control (IC), and cognitive flexibility (CF; for a review, see Strobach & Karbach, 2021). This interest was nurtured by findings that EFs in childhood are linked to academic achievement, mental health, social functioning, and well-being both during childhood and especially later in life (Moffitt et al., 2011). As a result, attempts to improve these critical life skills have surged, but findings remain equivocal (Diamond & Ling, 2020; Redick, 2019; Titz & Karbach, 2014).
The "brain-training industry" has capitalized on people's tireless endeavor to self-improve, as indicated by a forecasted net worth of more than $8 billion by 2021 (Ahuja, 2019). This mandates a critical examination of the quality of existing evidence for the benefits of cognitive training of EFs against stringent criteria. Despite several recent best-practices recommendations (Green et al., 2019; Simons et al., 2016) for evaluating the effectiveness of cognitive training, a comprehensive understanding of how, for whom, and why certain training can be effective is still missing. Thus, we need a mechanistic framework for cognitive training that integrates methodology and theory to drive the field forward.
We briefly review theoretical positions and empirical evidence in favor of and against the effectiveness of EF training. What emerges is a striking discord in the field, with strong claims and supporting evidence on both sides, giving rise to the questions of how and whether these discrepancies can be reconciled. We propose three key paradigm shifts to facilitate a rapprochement and suggest novel and necessary ways to assess whether and how cognitive training can be effective: (a) establishing a mechanistic link between a training mechanism and a transfer domain, (b) considering the importance of individual differences in the effectiveness of training, and (c) offering a theoretical perspective on when and how particular training interventions might be most effective. We hope that this will facilitate incorporating changes into both training design and analysis and clarify how training might impact cognitive functions.
What Is the Consensus on Cognitive Training?
Although it is uncontested that training can impact closely related domains (near transfer), it is still intensely debated whether it leads to improvements in loosely related domains (far transfer; Diamond & Ling, 2020). Theories on the possibility of far transfer also range in their optimism. In their common-elements theory, Thorndike and Woodworth (1901) argued that transfer happens within one domain via knowledge that shares common elements but that far transfer is rare. Later, Anderson (1982) assumed that production rules coordinate exchange between specialized cognitive systems but are often specific to a particular task. In contrast, the primitive-information-processing-elements theory (Taatgen, 2013) claims that training on particular tasks evolves a set of operators toward that task, which should be useful in new contexts and lead to transfer. The cognitive-routine framework (Gathercole, Dunning, Holmes, & Norris, 2019) posits that participants develop new cognitive routines during training. These routines are automated cognitive procedures that can be applied to novel tasks sharing the same requirements. Transfer to other tasks will occur if there are common task features (e.g., transfer of WM training to IC after training on complex but not simple span tasks). One drawback is that available models of transfer make only very general and limited predictions about the conditions under which far transfer should occur.
Recent meta-analyses on the effectiveness of training reflect these diverse theories. Sala and Gobet (2017, 2019) provided a critical assessment of transfer effects after WM training. They concluded that cognitive training does not enhance general cognition because effect sizes for far transfer are low, and effect size is inversely related to the quality of study design. In contrast, a meta-analysis including numerous cognitive-training interventions (EF training, classroom-based and game-based activities) demonstrated far transfer across a wide range of domains (literacy, numeracy, language skills, IQ, and psychosocial outcomes; Smithers et al., 2018). These findings echo previous observations that highly contextualized training is most likely to yield transfer effects (Diamond & Lee, 2011). Further, Smithers et al. (2018) found that better quality studies yielded larger effect sizes, again in contradiction to Sala and Gobet's conclusions. Other studies imply that nonspecific training interventions seem to generate more generalized outcomes (Heckman, Pinto, & Savelyev, 2013; Lillard & Else-Quest, 2006), suggesting that more holistic programs, including multidimensional content, better support overall child development and yield broad-based benefits. Below, we offer a theoretical perspective for why this could be the case.

Mechanisms: Establishing a Framework
A core assumption of training studies is that training mechanisms are fundamentally related to outcome measures of interest (for a review, see Noack, Lovden, Schmiedek, & Lindenberger, 2009). For instance, WM capacity (i.e., the maximal amount of information that can be stored and manipulated in WM) correlates highly with general intelligence (Jaeggi, Buschkuehl, Jonides, & Perrig, 2008). The logical and empirical consequence has been to target WM capacity to increase intelligence. However, as has been argued elsewhere, two correlated variables, such as WM span and fluid reasoning, do not necessarily covary when one is being artificially inflated through training, because training can tap unshared variance between the two constructs (Moreau & Conway, 2014). Moreover, although relationships between WM and intelligence might exist at a latent factor level, this is not necessarily the case at the level of single tasks that are typically used in training studies. Also, EFs are higher-order constructs including different processes. For instance, WM consists of storage, rehearsal, and matching as well as manipulation of information and processing skill. Correlating two tasks does not offer sufficient granularity or direction to identify the true underlying process-based nature of the relationship. Finally, considering task manipulations (i.e., increasing WM span) as tantamount to training for outcome variables is a nontrivial endeavor. For example, it has been shown that it is not WM span per se that is related to intelligence (Unsworth & Engle, 2005) but rather a shared executive attention-control mechanism required for the active maintenance of information in the face of concurrent processing and interference. Increasing WM span may therefore not do much to improve intelligence (Sala & Gobet, 2017).
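The point about unshared variance can be illustrated with a toy simulation. The sketch below is not taken from any of the cited studies. It assumes, purely for illustration, that WM span and fluid reasoning each equal a shared component plus a task-specific component, and that training adds a fixed gain to the WM-specific component only. The function names, the additive model, and all parameter values are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_training(n=5000, gain=1.0, seed=0):
    """Toy model (an assumption, not an empirical claim):
    score = shared component + task-specific component."""
    rng = random.Random(seed)
    shared = [rng.gauss(0, 1) for _ in range(n)]       # e.g., attention control
    wm_specific = [rng.gauss(0, 1) for _ in range(n)]  # variance unique to WM span
    gf_specific = [rng.gauss(0, 1) for _ in range(n)]  # variance unique to reasoning

    wm_pre = [s + u for s, u in zip(shared, wm_specific)]
    gf_pre = [s + u for s, u in zip(shared, gf_specific)]

    # Training is assumed to act only on the WM-specific component.
    wm_post = [s + u + gain for s, u in zip(shared, wm_specific)]
    gf_post = list(gf_pre)  # the shared component is untouched

    return {
        "baseline_r": pearson(wm_pre, gf_pre),
        "wm_gain": sum(wm_post) / n - sum(wm_pre) / n,
        "gf_gain": sum(gf_post) / n - sum(gf_pre) / n,
    }


if __name__ == "__main__":
    print(simulate_training())
```

Under these assumptions, the two measures correlate substantially at baseline, yet the trained gain in WM span produces no change in reasoning, because the training never touched what the two measures share.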
To remedy these shortcomings, we propose the following: First, we need to understand the true relationship between training mechanisms and outcome variables. This is a challenging endeavor for many reasons, among them the task-impurity problem in the measurement of EFs (Kane & Engle, 2003; Miyake & Friedman, 2012). Although much progress is being made using latent-variable approaches (Könen & Auerswald, 2021), additional approaches may contribute to our understanding of the mechanisms underlying training and transfer effects. Generative computational models allow the parsing of task performance into multiple distinct processes that necessitate different computations as well as into directionality between processes (Sutton & Barto, 2018). Recently, computational models have elucidated processes underlying WM performance (time-based resource-sharing models; Oberauer & Lewandowsky, 2011) and IC (Bayesian ex-Gaussian estimation of reaction times; Matzke, Dolan, Logan, Brown, & Wagenmakers, 2013). For instance, canonical analyses of standard inhibition tasks such as the stop-signal response time task provide a single measure of mean performance. Recent computational frameworks using Bayesian ex-Gaussian estimation of reaction times instead decompose the reaction time distribution into three parameters: µ and σ, which are the mean and standard deviation of the Gaussian component, and τ, which reflects the tail of the distribution (Matzke et al., 2013). It has been argued that whereas mean performance indicates inhibitory capacity, the tail indicates lapses of attention (Schel, Thompson, & Steinbeis, 2020). This can be used to inform the design of cognitive training targeting constituent processes of core EFs. Computational modeling thus offers promise in identifying which training mechanisms need to be targeted to affect specific outcome variables.
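To make the ex-Gaussian decomposition concrete, the following minimal sketch simulates reaction times as the sum of a Gaussian and an exponential component and then recovers µ, σ, and τ. Note that it uses a simple method-of-moments estimator, not the Bayesian hierarchical estimation of Matzke et al. (2013). The function names and the parameter values (in milliseconds) are illustrative assumptions.

```python
import random


def simulate_rts(n, mu, sigma, tau, seed=0):
    """Draw n ex-Gaussian reaction times: Gaussian(mu, sigma) + Exponential(mean tau)."""
    rng = random.Random(seed)
    # random.expovariate takes the rate (1 / mean) as its argument.
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau) for _ in range(n)]


def exgauss_moments(rts):
    """Method-of-moments estimates of (mu, sigma, tau).

    Uses the ex-Gaussian identities:
      mean = mu + tau
      variance = sigma^2 + tau^2
      third central moment = 2 * tau^3
    """
    n = len(rts)
    mean = sum(rts) / n
    var = sum((x - mean) ** 2 for x in rts) / n
    m3 = sum((x - mean) ** 3 for x in rts) / n
    tau = (max(m3, 0.0) / 2.0) ** (1.0 / 3.0)  # clamp: sample skew can be negative
    sigma = max(var - tau ** 2, 0.0) ** 0.5    # clamp: guard against sampling noise
    mu = mean - tau
    return mu, sigma, tau


if __name__ == "__main__":
    # Two hypothetical participants with the same Gaussian component
    # but different tails (tau).
    for tau_true in (50.0, 150.0):
        rts = simulate_rts(200_000, mu=400.0, sigma=40.0, tau=tau_true, seed=1)
        mean_rt = sum(rts) / len(rts)
        print(tau_true, round(mean_rt, 1), [round(p, 1) for p in exgauss_moments(rts)])
```

In this sketch, two simulated participants who differ only in τ would also differ in mean reaction time, which is why a single mean cannot separate the Gaussian component from the tail. Moment-based estimates are noisy in small samples, so real analyses would typically rely on maximum-likelihood or Bayesian fitting instead.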
Second, to ascertain that appropriate training mechanisms are identified and targeted, we propose to draw on experimental manipulations and not correlations. Experimental manipulations such as dual-task paradigms or priming studies offer a means to establish mechanistic relationships between variables without the cost of full-fledged training interventions. One way to manipulate EFs is to apply dual-task or serial-task paradigms. For example, it has been argued that IC is required to overcome the temptation to keep resources for oneself and instead share with anonymous others (Steinbeis, 2018a). After showing that manipulating IC impacts prosocial behavior (not through training but through two lower-cost serial-task paradigms; Figs. 1a and 1b, respectively; Steinbeis, 2018b; Steinbeis & Over, 2017), we could proceed to train IC and test whether this leads to direct changes in prosocial behavior (Steinbeis, 2020; see Fig. 1c). Third, training needs to be delivered across a range of tasks and not just single manifestations in order to show change on the (latent) ability level (Noack, Lovden, & Schmiedek, 2014). Fourth, an appropriate operationalization of the training mechanisms in question is required. For instance, IC training often simply reduces the response time window (Enge et al., 2014; Thorell, Lindqvist, Bergman Nutley, Bohlin, & Klingberg, 2009), which might train the speed at which a capacity is deployed but not capacity itself. Much more careful thought needs to be given to how capacities, rather than just task performance, can be improved. We therefore propose, where possible, to (a) employ computational models allowing constructs to be understood in terms of constituent processes, (b) systematically manipulate these processes in dual-task frameworks to assess whether they have an impact on relevant outcome measures, and (c) carefully consider, once a training mechanism has been identified, how it can be manipulated to train the desired outcome variable.

Individual Differences: Personalizing Training
State-of-the-art studies consistently show that individuals respond differently to the same training intervention. These interindividual differences in training gains often show distinct patterns after different types of training, with compensation effects (larger gains in low performers) particularly emerging after process-based training targeting one or more basic cognitive resources (e.g., EFs) and magnification effects (larger gains in high performers) typically appearing after strategy-based training, such as mnemonics (Karbach & Verhaeghen, 2014). Training-induced gains also vary as a function of individual differences in such factors as age, baseline ability, motivation, personality, and genetic predisposition (Strobach & Karbach, 2021), indicating that especially low-performing and at-risk individuals can benefit massively from EF training (Karbach, Könen, & Spengler, 2017). And yet these individual differences are often overlooked, and current studies largely rely on univariate statistical approaches, which are unable to identify individual cognitive profiles of performance on the basis of rich multivariate data. Contemporary multivariate analysis methods offer a radical rethink of training and associated transfer (Astle, Bathelt, The CALM Team, & Holmes, 2019; on behalf of the CALM Team, 2018) by focusing on training-related changes in task relationships.
Moreover, we need to consider intraindividual dynamics in training-related performance changes. Intraindividual performance trajectories across training reveal which participants show training effects and when they reach their individual maximum. The fluctuations in these trajectories can be indicative of adaptive processes (e.g., varying strategies) or maladaptive processes (e.g., vulnerability to disturbing influences). Intraindividual couplings of performance fluctuations with other variables (e.g., motivation, affect) can tell us which internal and external factors contribute to individual performances and to what extent participants differ in the strength of these relations (Könen & Karbach, 2015). This seems particularly relevant for studies with heterogeneous samples because variation in intraindividual effects across training may eventually result in interindividual differences in training outcomes. Considering both inter- and intraindividual differences and dynamics is likely to contribute massively to our understanding of training outcomes and can help generate theories regarding the underlying mechanisms.
Finally, these considerations may also help to explain the heterogeneous findings that extend to the level of meta-analyses: Looking at mean group changes in primary studies and averaging across their effect sizes in meta-analyses does not do justice to interindividual and intraindividual differences. We therefore propose to investigate who benefits the most in order to design tailored training studies targeting EFs and the numerous outcomes building on them. As in fields of medicine, which have embraced the necessity of personalizing treatment, researchers in the field of cognitive training need to consider differences in variables such as baseline ability, motivation and affect, genetic predisposition, environmental experience, and lifestyle as well as developmental stage and individually and developmentally relevant goals.

Theory: Bayesian Account of Development
Current theoretical accounts of far transfer lack predictions about what a training intervention must entail to be effective. The interactive-specialization hypothesis on brain development states that cortical circuits specialize over development (Johnson, 2001, 2011) and that training should be particularly effective during childhood (Wass, Scerif, & Johnson, 2012). We argue that a more fine-grained definition of how the training input is perceived is critical for understanding whether transfer occurs. Bayesian learning accounts have recently been used to study developmental plasticity (Fawcett & Frankenhuis, 2015; Stamps & Frankenhuis, 2016). Bayes's theorem provides the most logically consistent way to model an individual's current assessment of conditions in the external environment (the state of the world) using a probability distribution. Such models assume that individuals have naive priors, which are updated as individuals are exposed to a series of potentially informative cues over the course of their lives, yielding a series of posterior distributions. Development unfolds as a function of children's assessment of the state of their world, as reflected by their posterior distributions.
Essential to a Bayesian learning account is the processing of cues, which refers to experiences that are potentially informative about environmental conditions. Cues are primarily assessed in terms of their reliability and informativeness (however, note that cues can be uninformative, unreliable, and misleading). Cue reliability is determined by the likelihood that a specific cue will occur given each possible state of the world. Cue informativeness refers to the extent to which cues are informative by reducing uncertainty about the world (Fawcett & Frankenhuis, 2015). We suggest that cognitive training can be seen as a set of cues, which are assessed in terms of these criteria and thus how representative of ecologically meaningful events and predictive of current or future contexts they are. In sum, in the context of cognitive training, a cue is a stimulus that is informative about the world outside the context of the training. Such information could be engendered by stimuli of specific significance (e.g., highly caloric food for dieters) or by relevant contexts (e.g., learning inhibition in the context of a social interaction).
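A minimal sketch can show how cue reliability and informativeness enter a Bayesian update. It assumes a world with two discrete states, a prior over those states, and a cue whose probability of occurring differs (or does not differ) between states. Informativeness is operationalized here as the reduction in Shannon entropy from prior to posterior. That is one possible formalization chosen for illustration, and the function names and numbers are hypothetical.

```python
import math


def update(prior, likelihood):
    """Bayes's theorem over discrete states of the world.

    prior[i]      : P(state i) before the cue
    likelihood[i] : P(cue | state i), i.e., how reliably the cue signals each state
    Returns the posterior P(state i | cue).
    """
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("cue has zero probability under the prior")
    return [u / total for u in unnormalized]


def entropy(dist):
    """Shannon entropy in bits (uncertainty about the state of the world)."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def uncertainty_reduction(prior, likelihood):
    """Illustrative measure of informativeness: entropy drop after observing the cue."""
    return entropy(prior) - entropy(update(prior, likelihood))


if __name__ == "__main__":
    naive_prior = [0.5, 0.5]
    reliable_cue = [0.9, 0.1]     # cue is far more likely in state 0 than in state 1
    unreliable_cue = [0.5, 0.5]   # cue is equally likely in both states
    for cue in (reliable_cue, unreliable_cue):
        print(update(naive_prior, cue), uncertainty_reduction(naive_prior, cue))
```

In this toy setting, a cue that is equally likely under every state leaves the prior unchanged, which corresponds to the case of training input that carries little information about the world outside the training context.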
A prediction of this framework is that training with poor reliability and informativeness in terms of an individual's actual current or future experience of the world is likely to have limited to no impact. Context is therefore of particular relevance. Evidence in support of this comes precisely from the meta-analytic studies presented above, whereby isolated training focusing only on specific aspects of cognition without any embedding led to limited transfer, whereas training that was contextualized in terms of how and where it was delivered was shown to be more effective (Diamond & Lee, 2011). Interestingly, this framework can also account for effective transfer to real-life drinking and eating behavior following from highly specific training cues (i.e., inhibition of response tendencies to alcoholic drinks to reduce alcohol consumption; Jones et al., 2016). Bayesian accounts of learning offer both test bed and guidance for the design of effective training studies.

Conclusion
In this article, we have argued that the ongoing crisis in the field of cognitive-training research may benefit from taking a more fine-grained approach to training studies (Fig. 2). We argue that (a) adopting a more rigorous theoretical framework that builds on a process-based account of the underlying mechanisms of training and transfer, (b) considering the role of individual differences in the responsiveness to training, and (c) drawing on Bayesian models of development may help solve controversial issues in the field and lead the way to