Differential Switch Costs in Typically Achieving Children and Children With Mathematical Difficulties

Children with mathematical difficulties need to spend more time than typically achieving children on solving even simple equations. Since these tasks already require a larger share of their cognitive resources, additional demands imposed by the need to switch between tasks may lead to a greater decline of performance in children with mathematical difficulties. We explored differential task switch costs with respect to switching between addition versus subtraction with a tablet-based arithmetic verification task and additional standardized tests in German elementary school children in Grades 1 to 4. Two independent studies were conducted. In Study 1, we assessed the validity of a newly constructed tablet-based arithmetic verification task in a controlled classroom setting (n = 165). Then, effects of switching between different types of arithmetic operations on accuracy and response latency were analyzed through generalized linear mixed models in an online-based testing (Study 2; n = 3,409). Children with mathematical difficulties needed more time and worked less accurately overall. They also exhibited a stronger performance decline when working in a task-switching condition, when working on subtraction (vs. addition) items, and when working on two-digit (vs. one-digit) operations. These results underline the value of process data in the context of assessing mathematical difficulties.

Foundational arithmetic competencies refer to basic mathematical skills and knowledge required for success in higher-level mathematics (e.g., understanding numbers and mathematical concepts). These competencies are strongly associated with arithmetic fluency tasks, such as solving as many arithmetic tasks as possible in a limited time (J. I. D. Campbell & Tarling, 1996; Dewi et al., 2021). In our studies, we investigated differential switch costs in typically achieving children and children with mathematical difficulties with a newly constructed computerized arithmetic verification task. Although prevalence rates for dyscalculia, a persistent difficulty in learning arithmetic, vary between 2% and 7% (Devine et al., 2013; Rapin, 2016), the number of children who do not learn basic competencies during primary education is significantly higher at 15% to 20% in North America and Europe (UNESCO Institute for Statistics, 2017). The majority of mathematical difficulties can be attributed to various aspects, for instance environmental factors such as the extent of education or early learning environment. In our studies, we examined children with mathematical difficulties, irrespective of the cause of difficulties, and defined children who achieved below-average scores in mathematical tests compared with the reference group (≤16th percentile) as children with mathematical difficulties.

Development of Foundational Arithmetic Fact Knowledge and Arithmetic Fact Fluency
Arithmetic concepts and, consequently, addition and subtraction skills develop throughout elementary school, largely following the curriculum; for instance, addition precedes subtraction (Rubinstein et al., 2001). Addition and subtraction both require understanding of additive composition and part-whole relations (Butterworth, 2005; Nunes et al., 2016). Any subtraction problem can be transformed into an addition problem (e.g., 2 + 4 = 6, 6 − 4 = 2, 6 − 2 = 4). Thus, these operations are complementary both procedurally and conceptually (Robinson, 2017; Robinson & Dubé, 2009, 2012). Similar to the dual-route model of reading (Coltheart, 1978), two possible ways of calculating have been proposed for arithmetic (Amalric & Dehaene, 2019; Dehaene & Cohen, 1997). In the first, semantically mediated route, calculation strategies are necessary to solve the task. Having solved the same calculation task often enough by strategies such as counting, children store numbers and operators (e.g., "+" or "−") together with the result as foundational arithmetic fact knowledge in long-term memory (Dehaene & Cohen, 1995). Retrieving foundational arithmetic fact knowledge directly from long-term memory represents the second route, which is far more efficient than the first route.
We examined this efficiency in the present study by means of fluency tasks. Fluency is often used to designate the smooth and effortless production of speech and pronunciation (Chambers, 1997), but the term is used in the field of arithmetic as well in the sense of processing fluency (Vanbinst et al., 2015). Arithmetic fact fluency refers to the automatic retrieval of simple single-digit facts from long-term memory (Zaunmüller et al., 2009). It is strongly associated with overall mathematical achievements, especially during elementary school (Nunes et al., 2012). Jordan et al. (2003) showed that children with difficulties in arithmetic fact fluency in Grade 2 had a higher risk for lower mathematical performance in later school years.
In this study, we focused on the development across elementary school. Therefore, we should note that arithmetic fact fluency develops in several stages. Young children usually concentrate on lower-level strategies, such as counting. At this developmental stage, children often use specific external representations such as fingers or objects to manipulate quantities and perform simple arithmetic tasks (Crollen & Noël, 2015; Geary et al., 1991). With increasing conceptual and procedural knowledge about numbers, higher-order strategies evolve. At this point, children can decompose a presented arithmetic problem into easier and more familiar tasks (e.g., "8 + 5" is decomposed to "(5 + 5 = 10) + 3 = 13"; Laski et al., 2013). This counting and decomposition strategy reduces cognitive load by segmenting the problem into easier substeps, solvable through retrieval from long-term memory. As a result of an arithmetic intervention, children with mathematical difficulties increase their use of decomposition as their preferred strategy and decrease their use of counting-based strategies, which are their most common strategies before the intervention (Koponen et al., 2018). Finally, the retrieval of arithmetic fact knowledge becomes more efficient, reflected by higher accuracy rates and faster processing speed (Mabbott & Bisanz, 2003).
When children begin to engage with numbers and arithmetical problems, for example, addition and subtraction tasks, they use counting procedures as their dominant strategy (Bagnoud et al., 2021; Baroody et al., 2006). However, it is not completely clear how the further development of arithmetic skills proceeds. Two theoretical views on children's progress toward proficient processing of arithmetic problems can be contrasted. Retrieval models suggest that counting procedures are more and more replaced by memory retrieval (e.g., Chen & Campbell, 2018; Siegler, 1996). In contrast, simple arithmetic problems can also be solved by using rules (e.g., N + 0 = N rule) and heuristics (children may reason out the products of near-ties by recalling the product of the more easily recalled tie, e.g., 7 × 7 is 49, 8 × 7 is one more seven, so its product is 49 + 7, or 56; Baroody, 1983, 1984, 1994). According to the automated counting procedure theory, the development of strategy consists of an acceleration of counting procedures until automatization (e.g., Barrouillet & Thevenot, 2013; Fayol & Thevenot, 2012; Mathieu et al., 2016; Thevenot et al., 2016; Uittenhove et al., 2016). Logan (1988) described the shift from counting to retrieval in terms of his instance theory of automatization. According to this theory, each time children work on an arithmetic task, a single memory trace is created containing the task and the result. These traces are finally stored in long-term memory. When practice continues, more and more memory traces are created using algorithm-based procedures, resulting in a higher probability of using memory retrieval. In the end, children shift completely from counting to retrieval.

The Role of Working Memory and Development of Calculation Strategies
Deficits in working memory are strongly associated with mathematical difficulties (Friso-van den Bos et al., 2013; Raghubar et al., 2010; Schuchardt & Mähler, 2010). All main components of working memory as distinguished in Baddeley's (1986) model of working memory are relevant for completing arithmetic tasks: The central executive guides attention and ensures updating of information (Andersson & Lyxell, 2007; van der Sluis et al., 2004), the phonological loop serves as a storage system for input, intermediate results, and the solution (de Weerdt et al., 2013), and the visuospatial sketchpad serves for visualization and magnitude estimation (D'Amico & Guarnera, 2005; Wilson & Swanson, 2001).
When a child is faced with a mathematical problem, the central executive controls calculation strategies such as retrieval of solutions or counting strategies. Retrieval of arithmetic fact knowledge demands fewer resources than using counting strategies (Kaye, 1986). Given that the capacity of working memory is limited, an overload of working memory may result when task difficulty increases (e.g., solving another task simultaneously), resulting in higher error rates or increased processing time (Busch et al., 2013).
One way to reduce the load on working memory is to use more efficient strategies for solving arithmetic problems. Over the course of elementary school, calculation strategies develop continuously (Siegler, 1991; Widaman et al., 1992). From counting with fingers or objects to verbally counting numbers and, finally, automatic representation and manipulation of magnitudes as well as retrieval of fact knowledge and mathematical derivation, calculation strategies become more and more efficient, although children might use different strategies at the same stage of development (and not always the most efficient one; Carpenter & Moser, 1984). Some arithmetic facts seem to be added faster to the fact knowledge, such as tasks with identical addends or tasks resulting in 10 (Gaidoschik, 2010). Generally, large interindividual differences exist in the development of calculation strategies: Whereas some children already use fact retrieval in Grade 2 (Widaman et al., 1992) or even as early as Grade 1 (Carpenter & Moser, 1984), some adults still use counting strategies at least in some cases (J. I. D. Campbell & Fugelsang, 2001).
For number sets above 10 and for subtraction tasks, other calculation strategies must be considered or adapted (Geary et al., 1993; Siegler, 1991), even if counting is still possible. These additional strategies address especially the 10 transition, for example, the isolated handling of single-digit and multi-digit numbers. Children with mathematical difficulties face specific problems using arithmetic strategies and shifting between arithmetic strategies (Rourke, 1993; van der Sluis et al., 2004). These difficulties stem from conceptual problems in understanding underlying magnitude representations of numbers and the storage and retrieval of factual knowledge from long-term memory. For example, children with mathematical difficulties often fail to represent numbers by the underlying quantity and consequently face problems with part-whole relations (Krajewski & Schneider, 2009). These concepts, which are fundamental for overcoming counting strategies, seem to be poorly represented and poorly interconnected in long-term memory. Consequently, associations between arithmetic tasks and their solutions cannot be established sufficiently (Siegler & Shipley, 1995). Moreover, a hypersensitivity to similarities has been described as well, meaning that children with mathematical difficulties mix up different tasks with identical operators more frequently (De Visscher et al., 2015; De Visscher & Noël, 2014). Given these deficits, children with mathematical difficulties find it harder to overcome counting strategies and thus display more error-prone processing and have a higher cognitive load in working memory resources (e.g., counting strategy with or without fingers). They use these strategies longer than typically achieving children (Geary, 2011).

Arithmetic Production and Verification Tasks
To measure mathematical skills in terms of arithmetic fact fluency, there are basically two types of possible tasks: (a) production tasks (e.g., presenting a mathematical problem with limited time for solving the task) and (b) arithmetic verification tasks (e.g., "2 + 2 = 4: correct?"; Dewi et al., 2021). In contrast to paper and pencil tests, computer-based measurement can provide exact response latencies on the item level. Practical aspects can limit the validity, though. The computer-based measurement of processing time can have shortcomings if children have to search for the correct digits on the keyboard or screen for typing the solution. The abilities needed in this process, such as familiarity with the keyboard or motor abilities, differ between children in elementary school, which means that processes irrelevant for arithmetic competencies play a role, too, and affect the time and effort needed to carry out the task (Horkay et al., 2006). In contrast, in arithmetic verification tasks, true or false equations are presented that just have to be classified as "true" or "false" by simply tapping one of two buttons on the screen. As a result, we assume that the most valid and robust way to assess arithmetic fact fluency is by means of computer-based arithmetic verification tasks.
Arithmetic production and verification tasks differ slightly in their underlying cognitive mechanisms. For arithmetic production tasks, three processing stages may be assumed: (a) encoding of the problem, (b) searching for the answer in long-term memory or solving the task, and (c) providing the answer (Ashcraft & Battaglia, 1978). In verification tasks, one additional stage is needed, namely, the evaluation of the presented solution (Ashcraft, 1982; Ashcraft et al., 1984).
Several conditions are known to affect the time needed to solve verification tasks. First, solution times are shorter for true compared with false equations (e.g., Ashcraft & Fierman, 1982; J. I. D. Campbell, 1987). Moreover, subtraction tasks are more difficult to solve than addition tasks, leading to longer solution times (J. I. D. Campbell, 2008; Schneider & Anderson, 2010). Besides that, counting or retrieval processes can be bypassed by using plausibility judgments, which leads to shorter solution times (Reder, 1982). In plausibility judgments, the equation is processed as a whole without performing exact calculations (Zbrodoff & Logan, 1990). Plausibility judgments are more likely (a) if the presented answer differs extremely from the correct answer (Ashcraft & Battaglia, 1978; De Rammelaere et al., 1999; Zbrodoff & Logan, 1990) or (b) if different parities are observed between given and expected answer, such as in 2 + 4 = 7 (Krueger, 1986; Krueger & Hallford, 1984; Lemaire & Fayol, 1995; Lemaire & Reder, 1999; Masse & Lemaire, 2001). Finally, the presented solution in true equations sometimes facilitates the retrieval of the answer (R. N. Campbell, 1978).
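The parity cue described under (b) can be illustrated with a short sketch. The function name `violates_parity` is hypothetical and not part of the task software described in this article. The sketch assumes integer operands and uses the fact that the sum and the difference of two integers always have the same parity, so one check covers addition and subtraction equations alike.

```python
def violates_parity(a: int, b: int, presented: int) -> bool:
    """Return True if the presented answer can be rejected on parity alone.

    For integers, a + b and a - b share the parity of (a + b) % 2,
    so the same check applies to addition and subtraction equations.
    """
    return (a + b) % 2 != presented % 2


# "2 + 4 = 7": even + even must be even, so 7 can be rejected without calculating.
print(violates_parity(2, 4, 7))   # True
# "2 + 4 = 8" is also false, but it passes the parity check and requires calculation.
print(violates_parity(2, 4, 8))   # False
```

A false equation that passes this check cannot be rejected by parity alone, which is one reason why the choice of false answers may influence solution times.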
In sum, arithmetic verification tasks must be carefully designed to allow meaningful insights into cognitive processes, especially when processing times are used as indicators of arithmetic fluency. However, if successful, arithmetic verification tasks provide an informative and economical possibility for assessing arithmetic fact knowledge in elementary school. Ashcraft et al. (1984) showed that arithmetic production and verification tasks yield converging assessments from Grade 2 on and measure the same skills and constructs.

Research Rationale and Hypotheses
Our aim in this study was to explore the potential of arithmetic verification tasks in elementary school, with a focus on analyzing differences in the processing of arithmetic facts between children with mathematical difficulties and typically performing children. Arithmetic verification tasks have mainly been investigated by using addition and multiplication tasks (e.g., Busch et al., 2013, 2018; Lemaire & Reder, 1999; Widaman et al., 1992; Zbrodoff & Logan, 1990). Schneider and Anderson (2010) used subtraction verification tasks; however, they only investigated adults and only two-digit numbers. In general, less research has been conducted on processing differences between children with mathematical difficulties and typically performing children in elementary school, compared with research addressing children with reading or writing difficulties, and the available research is often limited to single grades (e.g., Busch et al., 2013). Addressing this research gap seems especially important as mathematical difficulties manifest themselves in elementary school. Valid and economic diagnostic tools are needed for identifying children with mathematical difficulties early and providing them with the necessary support (Chodura et al., 2015).
First, to establish construct validity of arithmetic verification tasks, we examined to what extent performance in our newly constructed arithmetic verification task (Richter et al., 2018) corresponded to performance in a standardized arithmetic production test (Study 1).

Hypothesis 1 (H1):
We expected a strong linear relationship between performance in arithmetic production, based on standardized tests, and performance in the arithmetic verification task in elementary school children.
Moreover, we examined the accuracy of the classification of children with mathematical difficulties (T score ≤ 40 in standardized tests) by their performance in the arithmetic verification task. To this end, we explored predictive values by receiver operating characteristic (ROC) analyses.
Based on the distinction of strategy usage and fact retrieval, we also focused on determinants of accuracy and processing speed in the arithmetic verification task (Study 2).

Hypothesis 2 (H2):
Children with mathematical difficulties should display lower accuracy and should need more time to solve arithmetic tasks.

Hypothesis 3 (H3):
The higher cognitive load induced by task-switching should further impede processing, leading to a decrease of accuracy and an increase in processing time.

Hypothesis 4 (H4):
Since subtraction is acquired later from a developmental perspective, subtraction (vs. addition) items should be more difficult throughout Grades 1 to 4, leading to a decrease of accuracy and an increase in processing time.

Hypothesis 5 (H5):
Moreover, we assumed that children with mathematical difficulties should display a stronger decline of performance (lower accuracy and more time needed to complete the tasks) for two-digit (vs. one-digit) operations than typically achieving children at the end of elementary school.

Hypothesis 6 (H6):
Finally, costs in the task-switching condition should be higher for children with mathematical difficulties (= lower accuracy and longer response latencies for subtraction vs. addition items).

Study 1
The main objective of Study 1 was to explore the validity of the newly constructed arithmetic verification task. In a controlled setting in elementary schools, we examined (a) to what extent performance in the arithmetic verification task corresponded to performance in a standardized arithmetic production test and (b) how reliably children with mathematical difficulties could be identified by the arithmetic verification task.

Method
Participants. The sample consisted of 165 students recruited from three elementary schools (Grades 2-4) in Bavaria, Germany. Gender was balanced in Grade 3 (n = 50; 50.0% female); female participants slightly outweighed male participants in Grade 2 (n = 61; 57.4% female), and male participants slightly outweighed female participants in Grade 4 (n = 54; 44.4% female). Due to a fully anonymized data collection, no additional socio-economic data can be reported. In the participating schools, the age range of children varied between 7 and 8 years in Grade 2, 8 and 9 years in Grade 3, and 9 and 10 years in Grade 4. Data collection took place in a period of 2 weeks at the end of the school year in July 2019.
Procedure, Design, and Instruments. Children were tested together in classrooms of the participating schools. The first measure of arithmetic skills was a newly constructed computerized arithmetic verification task. It was presented on a computer tablet (10.1 inch). First, two instruction items were presented visually and with corresponding audio reading the problem aloud to students. Then, a total of 180 arithmetic items were visually presented in nine units in ascending difficulty. Units 1 to 3 included tasks within number set 1 to 10 (e.g., "3 + 5 = 8": correct?). Units 4 to 6 included tasks within number set 1 to 20 (e.g., "12 − 5 = 7": correct?). The final units, Units 7 to 9, included tasks within number set 1 to 100 (e.g., "36 − 7 = 29": correct?). For the three units in each number set, the first unit only contained addition tasks, the second unit only subtraction tasks, and the third unit both addition and subtraction tasks. The students' task was to decide for every single item whether the presented equation was true or false by tapping on a specific area on the tablet (e.g., "2 + 4 = 6": TRUE; "2 + 4 = 9": FALSE). Half of the 20 items in each unit were correct. Between every unit there was a short break. The entire administration of the test was limited to 11 min. Because of this time limit, only 22.2% of the children in Grade 4 completed the whole task (12.0% in Grade 3; 3.3% in Grade 2), but all children in all grades completed Units 1 to 3 (number set 1-10) and a majority of 88.9% in Grade 4 completed Unit 6 (number set 1-20). Response accuracy and response latencies from presentation onset to tapping one of the response buttons were recorded. The sum of the correctly solved items within 11 min served as raw score. Internal consistency (Cronbach's α) for the whole arithmetic verification task was α = .98 (Grade 2: α = .97, Grade 3: α = .96, Grade 4: α = .97).
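The unit structure and the raw score described above can be summarized in a brief sketch. All identifiers (`build_layout`, `raw_score`, the dictionary keys) are hypothetical and chosen for illustration only; the label of the third number set reflects our reading of the task description, and the sketch does not reproduce the actual items.

```python
from itertools import product

NUMBER_SETS = ["1-10", "1-20", "1-100"]            # ascending difficulty (third label assumed)
UNIT_TYPES = ["addition", "subtraction", "mixed"]  # order of units within each number set
ITEMS_PER_UNIT = 20


def build_layout():
    """Return one dict per unit, numbered 1 to 9 in presentation order."""
    layout = []
    units = product(NUMBER_SETS, UNIT_TYPES)
    for unit_no, (number_set, unit_type) in enumerate(units, start=1):
        layout.append({
            "unit": unit_no,
            "number_set": number_set,
            "type": unit_type,
            "n_items": ITEMS_PER_UNIT,
            "n_true": ITEMS_PER_UNIT // 2,  # half of the equations are correct
        })
    return layout


def raw_score(responses, answer_key):
    """Count correct responses; items not reached within the time limit are absent."""
    return sum(1 for item, given in responses.items() if answer_key[item] == given)


layout = build_layout()
print(len(layout), sum(u["n_items"] for u in layout))  # 9 180
```

In this layout, the mixed units (3, 6, and 9) are the ones that require switching between addition and subtraction.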
To assess arithmetic skills with a standardized arithmetic test, we administered a commonly used mathematics test for elementary school (Heidelberger Rechentest; HRT 1-4; Haffner et al., 2005). Three subtests were carried out: addition (e.g., "5 + 3 = _"), subtraction (e.g., "5 − 3 = _"), and fill-in-the-blank (e.g., "6 + _ = 7"). Each subtest consists of a set of 40 computation tasks in increasing difficulty within a time limit of 2 min. Children were instructed to work on the tasks in the given order and to solve as many tasks as possible within a given time limit. For the sample reported in the manual of the HRT 1-4, test-retest reliability (r ≥ .87) and the criterion validity were good (r = .72 between HRT 1-4 and Deutscher Mathematiktest für vierte Klassen; DEMAT 4; Gölitz et al., 2006). The test score of the HRT 1-4 was the sum of correct answers.
The convergent validity between HRT 1-4 and the screening procedure LONDI amounts to r = .772 in Grade 2, r = .805 in Grade 3, and r = .708 in Grade 4. Sensitivity and specificity are high (SN = 85.7%; SP = 93.8%), pointing to a high validity of the screening instrument in diagnosing arithmetic disorders in elementary school.
Statistical Analysis. Ordinary least squares (one-level) models were estimated to predict the HRT raw score with the arithmetic verification task raw score. To examine how accurately children with mathematical difficulties could be predicted from their performance in the arithmetic verification task, we defined children with mathematical difficulties as having a score on the standardized test that was one standard deviation below the mean (HRT 1-4 T score ≤ 40; below the 16th percentile, respectively). Thus, 21 children in our sample were defined as children with mathematical difficulties. Likewise, children scoring below the 16th percentile in the arithmetic verification task score at each grade level were defined as children at risk. Given that norm-referenced scores were not yet available, we used the present sample as reference group. Based on these cut-off values, children were divided into four groups (Table 2): (a) children with below average performance in the predictor and criterion variable (true positive; 18 children), (b) children with at least average performance in the predictor and criterion variable (true negative; 135 children), (c) children with below average performance in the predictor variable and at least average performance in the criterion variable (false positive; 9 children), and (d) children with at least average performance in the predictor variable and below average performance in the criterion variable (false negative; 3 children). Next, we calculated the sensitivity (percentage of actual positives correctly identified as such), specificity (percentage of actual negatives correctly identified as such), and relative improvement over chance (RIOC) index (see Loeber & Dishion, 1983) for Grades 2, 3, and 4 separately.
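The three indices can be recomputed from the four cell counts given above. The following sketch uses the pooled counts (18, 9, 3, 135) and assumes the common formulation RIOC = (observed correct − chance correct) / (maximum correct − chance correct), where the maximum takes into account that the number of predicted positives differs from the number of actual positives. The function name is hypothetical, and we cannot confirm that the original analysis scripts implement RIOC in exactly this way.

```python
def classification_indices(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, and RIOC from a 2 x 2 classification table."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # share of actual positives identified
    specificity = tn / (tn + fp)   # share of actual negatives identified

    observed_correct = tp + tn
    # Correct decisions expected by chance, given the marginal totals.
    chance_correct = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n
    # Best achievable number of correct decisions with these marginals.
    max_correct = n - abs((tp + fp) - (tp + fn))
    rioc = (observed_correct - chance_correct) / (max_correct - chance_correct)
    return {"sensitivity": sensitivity, "specificity": specificity, "rioc": rioc}


# Pooled cell counts as listed in the text (true/false positives and negatives).
indices = classification_indices(tp=18, fp=9, fn=3, tn=135)
for name, value in indices.items():
    print(f"{name}: {100 * value:.1f}%")
```

Under these assumptions, the pooled counts yield a sensitivity of about 85.7%, a specificity of about 93.8%, and a RIOC of about 82.9%.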
Moreover, ROC analyses were performed in order to estimate how accurately children with mathematical difficulties could be predicted from their performance in the arithmetic verification task. The area under the ROC curve ranges from 0 to 1: A value of 0.5 indicates chance-level classification, and values near 1 indicate near-perfect prediction.
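The area under the ROC curve can also be read as the probability that a randomly chosen child with mathematical difficulties obtains a lower verification score than a randomly chosen child without difficulties, with ties counted as one half. The sketch below computes this quantity directly from all pairs; the function name and the scores are hypothetical and serve only to illustrate the interpretation.

```python
def auc_lower_scores_positive(positive_scores, negative_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    'Positive' cases are expected to have LOWER scores; ties count as 0.5.
    """
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p < q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical raw scores, not taken from the study.
with_difficulties = [31, 40, 44, 52]
without_difficulties = [48, 60, 66, 71, 85]
print(auc_lower_scores_positive(with_difficulties, without_difficulties))  # 0.95
```

In this toy example, 19 of the 20 pairs are ordered as expected, which gives an area of 0.95.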
Availability of Data and Materials. All data and analysis scripts are available at the repository of the Open Science Framework (https://osf.io/f8pzk/?view_only=cd81fff16b9348f5995980d13cc753a6). Materials are available from the authors upon request.

Results
One-way between-subjects ANOVAs were conducted to examine the effect of grade level on performance in HRT 1-4 and the arithmetic verification task. There was a significant effect of grade level on both the HRT 1-4 raw score, F(2, 162) = 71.45, p < .001, η² = .47, and the arithmetic verification task raw score, F(2, 162) = 37.19, p < .001, η² = .32. Table 1 provides descriptive statistics by grade level. Differences in mathematical performance were greater between Grade 2 and 3 (production task: d = 1.35; verification task: d = 1.02) than between Grade 3 and 4 (production task: d = 0.80; verification task: d = 0.48).

Prediction of the Standardized HRT Test Score by Arithmetic Verification Task Raw Score. Linear regression models were estimated to predict the HRT raw score with the arithmetic verification task raw score as predictor. In line with H1, the model explained a significant and considerable proportion of variance in the HRT raw scores in Grade 2, F(1, 59) = 86.80, p < .001, R² = .595, in Grade 3, F(1, 48) = 88.14, p < .001, R² = .647, and in Grade 4, F(1, 52) = 52.16, p < .001, R² = .501. These findings substantiate the construct validity of the arithmetic verification task across elementary school.
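The reported F statistics and R² values can be cross-checked against each other. For the overall F test of a regression model, F = (R² / df1) / ((1 − R²) / df2), which rearranges to R² = F · df1 / (F · df1 + df2). The short sketch below applies this identity to the values reported above; the function name is hypothetical.

```python
def r_squared_from_f(f_value: float, df_model: int, df_error: int) -> float:
    """Recover R^2 from the overall F test of a regression model."""
    return (f_value * df_model) / (f_value * df_model + df_error)


# (grade, F, df_model, df_error) as reported in the text.
reported = [(2, 86.80, 1, 59), (3, 88.14, 1, 48), (4, 52.16, 1, 52)]
for grade, f_value, df_model, df_error in reported:
    r2 = r_squared_from_f(f_value, df_model, df_error)
    print(f"Grade {grade}: R^2 = {r2:.3f}")
```

The recovered values (.595, .647, and .501) agree with the reported proportions of explained variance.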

Study 2
Having established the validity of the newly constructed arithmetic verification task in Study 1, we focused on determinants of accuracy and processing speed in the arithmetic verification task in a larger sample. Given that children can ideally work on tablet-based tasks without supervision, we examined the performance in the arithmetic verification [...] We decided to use this cut-off percentile because the 16th percentile refers to a performance 1 SD below the mean, which is commonly used as the cut-off percentile representing a performance below average. Consequently, 15.6% of the children were defined as having mathematical difficulties (15.4% in Grade 1, 15.9% in Grades 2 and 3, and 14.5% in Grade 4). Girls were descriptively more often affected than boys in Grade 2 (18.1% vs. 13.6%), in Grade 3 (19.4% vs. 12.3%), and in Grade 4 (16.4% vs. 12.8%), whereas boys were more often affected in Grade 1 (16.9% vs. 13.9%).
Since an unexpected gender distribution (more males than females evidenced math difficulties) was observed in the examined sample of Grade 1, we cannot exclude the possibility of selection effects in this sample. Therefore, the findings for Grade 1 should be interpreted with caution.
Design and Instruments. Data collection took place at the end of the school year (July 2020). The tablet-based arithmetic verification task was the same as in Study 1, this time integrated into a screening app for children with learning difficulties (Endlich et al., 2022). The Hessian Ministry of Education and the Arts informed elementary schools in that Federal State of Germany about the possibility to use the screening app and associated trainings for promoting mathematical abilities for free. Thus, the screening app can be downloaded, installed, and freely used at any time in the app store. This offer was made as part of compensatory measures to counteract learning backlogs due to COVID-19 lockdowns in schools during the pandemic. The screening app was intended to represent a low-threshold, voluntary service for all schools. Teachers from elementary schools could encourage their students to download the app and to complete the arithmetic verification task at home or in school on a mobile device. Sample characteristics are provided in Table 3. Given the unproctored nature of the assessment, we are unable to report details regarding the home situation of the children. Additionally, it should be noted that there may have been instances where the task was potentially undertaken by someone other than the intended participant, or where the child may have received external assistance in completing it. Nonetheless, there is reason to believe that such instances were exceptions to the rule and likely occurred infrequently, since the task was presented to the elementary school children by their teacher.
This assumption is supported by our data, which show that the average scores achieved in the samples of Study 1 and Study 2 did not differ substantially from each other (small effect sizes in favor of Sample 2; d = 0.41 in Grade 2; d = 0.22 in Grade 3; d = 0.16 in Grade 4).

Statistical Analysis and Missing Data.
Responses that were unusually slow or fast (3 SD or more below the item-specific mean and 2 SD or more below or above the person-specific mean after standardizing each item by its item-specific mean) were excluded from the analyses because these responses were likely to be anomalous (comparable to other reaction time studies such as Schindler et al., 2018). Table 4 shows response latencies before and after data exclusion. Given that only very few responses had to be excluded (1,480 or 0.7% of 204,540 data points), we decided to run the models with data from all participating children, excluding only these unusually slow or fast responses. These exclusions in general did not pose a problem for the analysis, since generalized linear mixed models (GLMMs) are robust against missing data. Log-transformed response latencies were analyzed using linear mixed-effects models (LMM; Baayen et al., 2008) with crossed random effects for items nested within participants and participants nested within items, as a considerable amount of variance in the data could be attributed to differences between items and participants (see intraclass coefficients in Tables 5 and 6; Baayen et al., 2008). For accuracy data, GLMMs with a logit link function were estimated, which are the method of choice for nested data structures with binary outcomes (Dixon, 2008). All models were estimated with the software package lme4 (Bates et al., 2021; Version 1.1-27) for R (Version 4.1.1). For hypothesis tests, we used the software package lmerTest (Kuznetsova et al., 2020; Version 3.1-3). All significance tests were based on a Type I error probability of .05. At the beginning of elementary school, children have not yet received instruction to cross the 10 barrier and consequently, for the lower grades, a reduced version that included only tasks within the number set 1 to 10 was applied. Complete data were available only for number set 1 to 10 for Grades 1 to 4.
Therefore, two separate models were estimated: one for number set 1 to 10 (Model 1; Grades 1-4) and one for number set 1 to 20 (Model 2; only Grade 4). Intercepts for persons and items were allowed to vary randomly. The following main effects (fixed effects) were included as dummy-coded predictor variables: foundational arithmetic operations (addition = 0, subtraction = 1), switching (standard condition = 0, switch condition = 1), and mathematical difficulties (control group = 0, mathematical difficulties = 1). For Model 1, grade level was centered around 2.5, the mean class level, to model linear developmental trends from Grades 1 to 4. Moreover, interaction effects were estimated for foundational arithmetic operations (addition vs. subtraction) and mathematical difficulties and for switching and mathematical difficulties. The parameter estimates for the fixed and random effects are provided in Table 5 for Model 1 and in Table 6 for Model 2.
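To make the coding scheme concrete, the following sketch computes the fixed-effects part of the linear predictor of Model 1 for an average person and item (random intercepts set to zero) and converts it to a predicted accuracy through the inverse logit. All coefficient values are hypothetical placeholders and are not the estimates reported in Table 5; the function and key names are likewise illustrative.

```python
import math

# Hypothetical fixed effects on the logit scale (NOT the estimates from Table 5).
COEF = {
    "intercept": 2.0,
    "operation": -0.4,              # subtraction (1) vs. addition (0)
    "switching": -0.2,              # switch (1) vs. standard condition (0)
    "difficulties": -0.8,           # mathematical difficulties (1) vs. control (0)
    "grade": 0.5,                   # grade level centered at 2.5
    "operation_x_difficulties": -0.2,
    "switching_x_difficulties": -0.3,
}


def linear_predictor(operation, switching, difficulties, grade, coef=COEF):
    """Fixed-effects linear predictor with dummy coding and centered grade."""
    grade_c = grade - 2.5
    return (coef["intercept"]
            + coef["operation"] * operation
            + coef["switching"] * switching
            + coef["difficulties"] * difficulties
            + coef["grade"] * grade_c
            + coef["operation_x_difficulties"] * operation * difficulties
            + coef["switching_x_difficulties"] * switching * difficulties)


def inverse_logit(eta):
    """Map a value on the logit scale to a probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-eta))


for group in (0, 1):
    standard = linear_predictor(0, 0, group, grade=3)
    switch = linear_predictor(0, 1, group, grade=3)
    print(group, round(inverse_logit(standard), 3), round(inverse_logit(switch), 3))
```

With this coding, the switch cost of the control group on the logit scale is the switching coefficient alone, whereas the switch cost of children with mathematical difficulties is the sum of the switching coefficient and the switching-by-difficulties interaction. For the latency model, the same predictor structure applies on the log scale of the response latencies.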
Model 2: Number Set 1 to 20
The GLMM for response accuracy and the LMM for response latency (Model 2) included crossing the 10 barrier as an additional predictor. As described above, Model 2 was based only on the data of Grade 4, representing the end of elementary school. In addition to the predictor single versus multiple digits, we analyzed the effects of mathematical operation, switching, and mathematical difficulties, both for accuracy and response latency as outcome variables. Once again, the interaction between ability level and the other factors was of particular interest, as it represents the surplus in cognitive load in children with mathematical difficulties.

Discussion
The present studies pursued two main goals. First, we examined to what extent performance in our newly constructed arithmetic verification task corresponded to performance in a standardized arithmetic production test (Study 1). Second, we focused on differences between children with mathematical difficulties and typically achieving children with regard to the impact of item-specific characteristics on performance, namely task switching (switching between arithmetic operations vs. consistent operations), type of arithmetic operation, and number set. We were particularly interested in the interaction between difficulty-generating factors, such as multi-digit operations, and the aptitude of the children.

Validity of the Newly Constructed Arithmetic Verification Task (Study 1)
We observed large differences in mathematical achievement between different grade levels, in production tasks as well as in verification tasks. Considering the cross-sectional study design, our data cannot provide information about individual developmental trajectories. Nevertheless, given that the development of performance was descriptively comparable for the arithmetic production and the arithmetic verification task (e.g., greater improvement between Grades 2 and 3 than between Grades 3 and 4 in the present study), our results may be regarded as the first evidence for the validity of the arithmetic verification task as a developmentally sensitive measure of mathematical skills. In line with H1 and in accordance with results obtained by Ashcraft et al. (1984), performance in the arithmetic verification task was closely related to performance in an established arithmetic production task and explained between 50% and 65% of the variance in this task within Grades 2 to 4. Moreover, children with mathematical difficulties, identified by below-average scores in arithmetic production tasks (HRT T score ≤ 40), could be reliably identified by their performance in the arithmetic verification task (areas under the ROC curve > .90; RIOC = 82.9%). In sum, the arithmetic verification task raw score can be assumed to be an appropriate estimator for arithmetic performance in elementary school children.
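To make the RIOC (relative improvement over chance) index more concrete, the following Python sketch computes it from the four cells of a 2 × 2 classification table. It assumes the commonly used definition, in which the observed number of correct classifications is compared with the number expected by chance and with the maximum attainable given the marginal totals. Whether the authors computed the index in exactly this way is not stated in this section, and the counts in the example are invented for illustration, not taken from Table 2.

```python
def rioc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Relative improvement over chance for a 2x2 screening table.

    Assumed definition: (observed - chance) / (maximum - chance), where all
    three quantities are counts of correctly classified children.
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("The classification table is empty.")
    predicted_pos, actual_pos = tp + fp, tp + fn
    predicted_neg, actual_neg = fn + tn, fp + tn

    observed = tp + tn
    # Correct classifications expected if prediction and outcome were independent.
    chance = (predicted_pos * actual_pos + predicted_neg * actual_neg) / n
    # Best achievable count when the marginal totals are held fixed.
    maximum = n - abs(predicted_pos - actual_pos)
    if maximum == chance:
        raise ValueError("RIOC is undefined when maximum equals chance.")
    return (observed - chance) / (maximum - chance)


# Hypothetical counts (NOT the study data): 20 children with difficulties,
# 18 of them flagged by the screening, plus 12 false alarms among 180 others.
example = rioc(tp=18, fp=12, fn=2, tn=168)
print(f"RIOC = {example:.1%}")  # prints "RIOC = 88.2%"
```

Because the maximum is corrected for unequal marginal totals, RIOC can reach 100% even when the screening flags more children than the criterion test does, which is one reason it is sometimes preferred over raw hit rates for screening instruments.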
Although arithmetic production and verification tasks differ slightly in their underlying cognitive mechanisms (verification tasks involve an additional stage, the evaluation of the presented solution; Ashcraft, 1982; Ashcraft et al., 1984), the basic stages of processing are very similar: (a) encoding of the problem, (b) searching for the answer in long-term memory or solving the task, and (c) providing the answer (Ashcraft & Battaglia, 1978). The findings of Study 1 concerning the validity of the newly constructed arithmetic verification task support the idea that arithmetic production and verification tasks measure the same skills and constructs. The results of an additionally conducted confirmatory factor analysis also point to this one-dimensionality (see the Online Supplement at https://osf.io/f8pzk/?view_only=cd81fff16b9348f5995980d13cc753a6).
These findings are of practical importance for the assessment of mathematical skills, as arithmetic verification tasks are far easier to implement in a computer-based fashion than arithmetic production tasks and provide many advantages. Among other things, abilities unrelated to mathematical ability, such as typing skills, play only a minor role in verification tasks. Such tasks can also be scored automatically and economically, and they provide not only accuracy data but also precise estimates of response latencies, which may be used as an indicator of processing load.

Performance and Processing Differences Between Children With Mathematical Difficulties and Typically Achieving Children (Study 2)
The LMM (for response latencies) and GLMM (for accuracy data) in Study 2 targeted the impact of complexity-generating factors on accuracy and time on task, and particularly their interaction with the aptitude of children. The underlying assumptions imply an excess in workload for children with mathematical difficulties, indicated by an interaction of item factors with person ability. Again, the analysis revealed the expected substantial increase in children's response accuracy from Grades 1 to 4 and a decrease in the time needed to solve the tasks with increasing grade. Older children may not only use counting strategies more accurately and faster (Widaman et al., 1992), but also accumulate more and more arithmetic fact knowledge over the course of elementary school. Older children can rely on fact retrieval more often and more reliably (Carpenter & Moser, 1984), which at the same time relieves working memory and frees capacities for more complex computations.
Overall, children with mathematical difficulties worked less accurately and spent more time on the tasks (H2). As predicted, task switching interfered more with accuracy and speed in children with problems in mathematics than in typically developing children. The task-switching condition introduced cognitive demands that required individuals to switch between different types of arithmetic operations. This placed additional demands on working memory and cognitive flexibility (Busch et al., 2013). Children with mathematical difficulties may have experienced difficulties in effectively managing these cognitive processes, leading to decreased accuracy and increased processing time in the task-switching condition. The children also spent disproportionately more time on subtraction (vs. addition) tasks (H4). In light of findings on working memory deficits in children with mathematical difficulties (e.g., Schuchardt & Mähler, 2010), this fact suggests a cognitive overload that limits performance on mathematical tasks, even if they are rather simple calculations. Even the load imposed by simply switching between operations hinders mathematical processes in these children. Scenarios that require children with mathematical difficulties to flexibly switch back and forth, build situational models, and combine facts from different sources, such as in word problems, may therefore present an insurmountable challenge.
At the end of elementary school, children with mathematical difficulties catch up with typically achieving children in accuracy but not in time on task. Thus, they still needed more time for solving the tasks. These results show quite clearly that children with mathematical difficulties suffer from retrieval deficits that require them to invest working memory resources in problems that can be solved easily by children with a normal level of mathematical skills. For mathematics instruction in the final year of elementary school and even more so in secondary school (which starts in Grade 5 in Germany), this fact poses a challenge. If children with mathematical difficulties need to spend working memory resources on foundational arithmetic operations such as subtraction or on simply adapting to a new task, they lack these resources for the more complex mathematical problem-solving activities that become increasingly important in secondary school. One possible solution would be to make use of focused and comprehensive interventions that are targeted at intense practice of basic skills in children with mathematical difficulties. These interventions should be applied before children move on to the cognitively more demanding secondary school curriculum.

Conclusions for Assessing Mathematical Difficulties
For assessing mathematical skills, the results imply that reaction times are particularly valuable for identifying children with mathematical difficulties, especially in Grade 4. Neglecting this information can lead to overlooking affected children. The interactions in particular indicate that a highly valuable source of information remains unused if time on task is not recorded. Since children with mathematical difficulties had to spend excessive time on subtraction (vs. addition) tasks (see J. I. D. Campbell, 2008; Schneider & Anderson, 2010), on tasks with two-digit (vs. one-digit) operations, and in the task-switching (vs. standard) condition (H6), these results are well suited for use in diagnostics. Tests for diagnosing mathematical difficulties should in particular include measures of response latencies elicited by subtraction tasks, two-digit operations, and task switching to systematically improve the diagnosis of mathematical difficulties. At the same time, these effects must be accounted for when tests of mathematical skills involve reaction times, to avoid biases through context effects or differential item functioning.
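As an illustration of how such latency information could be turned into a per-child indicator, the sketch below computes a switch cost as the difference in mean log-transformed latency between the switch and the standard condition. This is a descriptive simplification, not the mixed-model analysis reported in Study 2. The function name `log_switch_cost`, the trial format, and the example latencies are hypothetical and introduced here only for illustration.

```python
from math import exp, log
from statistics import mean


def log_switch_cost(trials):
    """Mean log latency in the switch condition minus the standard condition.

    `trials` is an iterable of (condition, latency_ms) pairs, where condition
    is "standard" or "switch". Latencies are log-transformed, mirroring the
    transformation used for the latency models. statistics.mean raises an
    error if a condition has no trials.
    """
    logs = {"standard": [], "switch": []}
    for condition, latency_ms in trials:
        logs[condition].append(log(latency_ms))
    return mean(logs["switch"]) - mean(logs["standard"])


# Hypothetical latencies in ms for one child (NOT study data).
trials = [
    ("standard", 2000),
    ("standard", 2500),
    ("switch", 3000),
    ("switch", 3750),
]
cost = log_switch_cost(trials)
ratio = exp(cost)  # back-transformed: ratio of geometric mean latencies
print(f"log switch cost = {cost:.3f}, latency ratio = {ratio:.2f}")
```

Working on the log scale means the cost is interpreted as a proportional slowdown rather than an absolute difference in milliseconds, so children who are slow overall are not automatically assigned a larger switch cost.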

Limitations and Directions for Future Research
One obvious limitation of this study is that the validity of the newly constructed arithmetic verification task was only shown for Grades 2 to 4 (Study 1). Thus, the results in Study 2 should be interpreted cautiously, especially for Grade 1. The reported similar development of mathematical achievement in production tasks as well as in our verification task between Grades 2 and 4 suggests that the verification task is also valid in Grade 1. Nevertheless, this conjecture needs to be supported in future studies.
As data were assessed online in Study 2, we obviously cannot report details about the situation at children's homes. For example, we cannot exclude the possibility that some children may have had more support than other children or even that another person (e.g., an older sibling) worked on the tasks. The participation rate and the number of students per class mirror the character of that study as an open field study and a free service to the schools. As performance was comparable between the assessments at home (Study 2) and in controlled settings in school (Study 1), it seems plausible to assume that, overall, children worked reliably in the less controlled setting at home. Another limitation of our study is the lack of information about the assignment of participating children to classes or schools. Due to data protection regulations, we were not able to collect further information. As a result, clustering on these levels could not be included in the models.
Regarding the unexpected gender distribution in Grade 1 (more boys than girls showed mathematical difficulties), we cannot exclude the possibility of selection effects in this sample. However, given the uniformity of results across grades, it seems unlikely that gender affected the central results of this study.
Unfortunately, we did not have the opportunity to measure working memory in the present studies. Future research should measure this important construct as a means to back up the interpretation of the present results in terms of cognitive resources. To address the research question whether children with mathematical difficulties require more time for subtraction tasks because they rely on finger counting or other inefficient strategies, future studies could involve interview or observation studies. Moreover, future studies could consider additional information from the school or the teacher regarding whether or not students evidenced mathematical difficulties.

Implications for Practice
The results indicate that children with mathematical difficulties might benefit from fostering their basic arithmetic skills. Improving these skills could reduce the cognitive load and release cognitive resources for demands such as task switching or more complex arithmetic tasks (Zhu & Zhao, 2023).
In terms of practical application, the screening procedure LONDI presented in the article enables an economical testing of an entire school class regarding potential mathematical difficulties. The tablet-based testing can be conducted within half an hour, with immediate and automated evaluation. The screening is already available (Endlich et al., 2022) and is currently undergoing further development.

Conclusion
Our results are relevant both from the perspective of basic research on mathematical difficulties and from that of applied psychometrics in the assessment of mathematical skills. With a diagnostic approach that is technically very simple, we were able to gather valuable information on the cognitive processes involved in solving arithmetic tasks. We showed differential effects of task switching, arithmetic operations, and multi-digit calculations in children with mathematical difficulties compared with normally developing children. We assume that these effects are directly linked to underlying deficits in the routinization of arithmetic procedures and the less efficient access to numerical long-term factual knowledge. In that sense, they mirror one of the core problems in children with mathematical difficulties. Thus, process data have a high diagnostic benefit and can further increase the informative value in the construction of test procedures. To our knowledge, these differential effects are not yet widely used to identify children in need of compensatory measures. At the same time, these process data represent a great potential to advance diagnostics, which usually rely only on sum scores of correctly solved items. Future diagnostic procedures could tap into the wealth of process data available in computer-based testing, especially reaction times and reaction time differences between conditions, to further increase the quality of the assessment beyond gross measures of performance.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figure 1. Interactions Between (A) Switching Conditions and MD and (B) Arithmetic Operations and MD for Response Latencies.

Table 1. Descriptive Statistics and Correlation Coefficients for Production (HRT 1-4) and Arithmetic Verification Tasks at Grade 2, 3, and 4 (Study 1).
Note. The reported data represent the raw data of both tests. Correlations calculated between raw scores of HRT 1-4 and the AVT. HRT = Heidelberger Rechentest; AVT = arithmetic verification task. ***p < .001.
task in an ecologically valid setting: In Study 2, children worked individually at home and the tests were administered online.
children represented the control group. We used the arithmetic verification task raw score (sum of the correctly solved items within 11 min) as an adequate measure for estimating arithmetic competencies, as indicated by our findings in Study 1. Nevertheless, it is worth noting that the sample size in Study 1 was relatively small (n = 165), which should be considered when interpreting the results.

Table 2. Arithmetic Verification Task as Predictor for Children With Mathematical Difficulties Assessed With HRT 1-4 (Study 1).
a True positive. b False positive. c False negative. d True negative.

Table 3. Sample Characteristics for Study 2.

Table 4. Descriptive Statistics for Response Accuracy and Response Latency (Raw Score in ms and Log-Transformed) as Dependent Variables in the Arithmetic Verification Task.
a Proportions.

Table 5. Fixed Effects and Variance Components in the Generalized Linear Mixed Model for Response Accuracy and in the Linear Mixed-Effects Model for Response Latency for Grades 1 to 4 (Model 1).

Table 6. Fixed Effects and Variance Components in the Linear Mixed-Effects Model for Response Latency and in the Generalized Linear Mixed Model for Response Accuracy for Number Set 1-20 at the End of Elementary School (Model 2).