Structure and Implementation of Novel Task Rules: A Cross-Sectional Developmental Study

Rule-based performance improves remarkably throughout childhood. The present study examined how children and adolescents structured tasks and implemented rules when novel task instructions were presented in a child-friendly version of a novel instruction-learning paradigm. Each miniblock started with the presentation of new stimulus-response mappings for a go task. Before this mapping could be implemented, subjects had to make responses in order to advance through screens during a preparatory (“next”) phase. Children (4–11 years old) and late adolescents (17–19 years old) responded more slowly during the next phase when the next response was incompatible with the instructed stimulus-response mapping. This instruction-based interference effect was more pronounced in young children than in older children. We argue that these findings are most consistent with age-related differences in rule structuring. We discuss the implications of our findings for theories of rule-based performance, instruction-based learning, and development.

These instructions described the stimulus-response mapping for the go phase of the block (e.g., "© = left, £ = right"). Before subjects could apply these instructions, they had to advance through a next phase (the go and next phases were indicated by the stimulus color). In this phase, stimuli were presented, but their identity could be ignored, and subjects simply had to press the same next key on each trial (which was either the left or right key). Even though the stimulus-response rules had never been applied before, subjects were slower to respond to next stimuli when the next response and the go response were incompatible ("£" requiring a left response in the next phase but a right response in the go phase) compared with when they were compatible ("©" requiring a left response in both phases). This instruction-based interference effect shows that instructions enable "automatic" task performance.
Several lines of research suggest that interference during the task-implementation or execution phases can be reduced by creating hierarchical task structures (Cole, Braver, & Meiran, 2017). In a hierarchical task structure, a task cue (such as stimulus color) or context determines the relevant response rules. Such hierarchical information can shield ongoing tasks (e.g., traveling to the airport) from pending instructions (e.g., walking in London), thereby reducing instruction- or rule-based interference.

The Development of Structuring and Implementing Rules
Rule-based behavior improves remarkably from infancy through childhood and adolescence (Bunge & Crone, 2009; Diamond, 2013). Such developmental improvements might be due to the ability to create and use hierarchical task structures (Bunge & Zelazo, 2006). For example, Amso, Haas, McShane, and Badre (2014) manipulated hierarchical structure (number of subtasks or branches) and number of competing alternatives within a branch independently. Age-related performance differences were primarily influenced by task structure rather than competition between choice alternatives (see also Unger, Ackerman, Chatham, Amso, & Badre, 2016). In other words, the ability to structure rules improved throughout childhood.
Other studies also found age-related differences in the implementation phase. For example, Zelazo, Frye, and Rapus (1996) observed a dissociation between knowing and doing in 3-year-olds. In a simple rule-switching paradigm, 3-year-olds kept doing the task they started with, even when instructed to perform the other task instead. Importantly, when the children were asked what the task rules were, they could accurately recall them, suggesting they experienced difficulties with implementing (but not remembering) the appropriate rules. The proactive-control literature also suggests that young children are less likely to implement or maintain rules than older children (Munakata, Snyder, & Chatham, 2012). This could be due to increased costs associated with advanced rule implementation. For example, Blackwell and Munakata (2014) showed that adding a secondary task to a card-sorting task particularly impaired performance of young children who tried to maintain task-related information over time (compared with children who did not maintain the rules). Thus, for young children, implementing rules in advance comes with challenges and can produce behavioral costs.

The Present Study
To date, most developmental studies have focused on rule-based performance in situations in which children alternated between well-practiced tasks. This research largely ignores the early stages in which the novel instructions are presented and implemented for the first time (i.e., the first trials or blocks are usually practice and not further analyzed). However, task structures created in the beginning of the experiment determine future task performance (Bhandari & Duncan, 2014). In other words, these early phases are crucial.
The present study examined age-related differences in the task-formation and early implementation stages when novel task instructions were presented. We developed a child-friendly version of the next paradigm (Meiran et al., 2015a). This task combines two elements that are usually studied separately, namely the ability to follow or implement instructions and the use of hierarchical structures to shield pending instructions. At the beginning of each miniblock, we showed the children two cartoon images of their "friends" (task-instruction phase; Fig. 1). New images were used for each miniblock. Some of their friends lived on the left side of the street, and some of them lived on the right side. In the evening (go phase), they had to bring their friends home by pressing the appropriate left or right key (task-implementation phase). However, in the morning, before they could go home, all friends had to go to school first (next phase), which was located on the left side of the screen for half of the subjects and the right side for the other half. The next and go phases were indicated by morning and evening screen backgrounds, respectively. Children (4-11 years old) and late adolescents (17-19 years old) performed this task.
Hierarchical control is needed in this task, since the next and go phases create two different contexts. As discussed above, the ability to contextualize behavior and structure tasks develops in young childhood. This ability would reduce interference from one context (go) to the other (next). Therefore, the hierarchical-structure account predicts that instruction-based interference effects (i.e., slower responding when the next response is incompatible with the instructed go response) should be more pronounced for younger children than for older children.
Task rules have to be implemented or maintained in a highly accessible state for an instruction-based interference effect to be observed. Theoretical analyses link automatic effects of instructions to proactive control (Cole et al., 2017), but as noted above, young children are less likely to implement rules in advance. Therefore, the advance-implementation account predicts impaired performance in the go phase, but less pronounced interference effects in the next phase for younger children than for older children (contrasting with practice-based interference effects that are typically larger for younger children; e.g., Huizinga, Dolan, & van der Molen, 2006).

Subjects
One hundred seventy-eight children (4-11 years old) from two local schools in Devon (United Kingdom) and 30 late adolescents (17-19 years old) from two local colleges (also in Devon) participated in this experiment (Table 1). We excluded 5 children because they did not complete the experiment and 7 children because accuracy in the go phase was below 60%. In the Supplemental Material available online, we show that excluding these subjects or certain trial types (see below) did not alter the main findings. We aimed to recruit as many children and adolescents as possible. Therefore, we contacted two local primary schools, and all children for whom we obtained parental consent were invited to participate. Because we did not know in advance how many parental consent forms we would obtain, we could not determine the exact target sample prior to the experiment. The decision to stop testing was not influenced by the analyses of the data.
Fig. 1. Experimental design. The top two rows show the four phases of each miniblock (a) and the trial course for next trials (b). The trial course for go trials was very similar to the course for next trials, except that the stimulus disappeared as soon as a response key (correct or incorrect) was pressed. Red, green, and blue (RGB) values are given in (a) for the two background colors. The size of the screen and the stimuli are shown in (c).
The children received a small prize (a sticker of a cartoon character of their choice and a certificate). The adolescents received monetary compensation (£2.50). The experiment was approved by the local research ethics committee. For the children and underage adolescents, parental informed consent and the subjects' assent were obtained. We obtained written informed consent from the other adolescents.

Procedure
The experiment took place in a quiet room at school (the children) or college (the adolescents) and was run on a 13-in. MacBook Pro using the Psychophysics Toolbox (Brainard, 1997). We tested one subject at a time. Stimuli consisted of cartoon images of various animals, imaginary creatures, and people. We used different stimuli in each miniblock, and they were easily distinguishable from each other. The "a" and "l" keys of the keyboard were the response keys, and we put arrow stickers on them as a reminder. Both keys were used in the go phase. For half of the subjects, the "a" (left) key was the next response; for the others, the "l" (right) key was the next response.
Each miniblock consisted of four phases: an instruction phase, a next phase, a go phase, and a feedback phase (Fig. 1a). In the instruction phase, we presented the novel stimulus-response mappings for the go phase, and a response reminder for the next phase (i.e., a school building on the left or right of the screen, depending on the counterbalancing of the next response). The go information appeared on the top of the screen against a dark-blue background ("evening"); the next reminder appeared on the bottom against a light-blue background ("morning"). The instructions remained on the screen until subjects had pressed a key and at least 3 s had elapsed.
The trial course of the next phase, indicated by a light-blue background, is depicted in Figure 1b. After an intertrial and fixation interval, a stimulus appeared and remained on screen until the correct next key was pressed. Thus, if subjects pressed the incorrect key first (e.g., "l" when the next response was "a"), the stimulus would remain on the screen; it would only disappear once they had pressed the next key. The number of next trials differed between blocks (see below). The go phase, indicated by a dark-blue rectangle, always consisted of two trials. The trial course was the same as in the next phase, except that the stimulus disappeared as soon as a response key (correct or incorrect) was pressed.
In the feedback phase, we presented a "clock" (Fig. 1a). A dark-gray area on the clock face depicted the total response latency for the two go trials. For each incorrect go response, we added a time penalty (indicated by a red area on the clock face). We also played a sound during the feedback phase: If subjects did not make go errors, we presented the sound "yihaa" (if they had responded faster than in the preceding miniblock) or "ok" (if they had responded slower); we presented the sound "oops" if they had made a go error. The feedback remained on the screen for 1.5 s, after which the following miniblock started.
The experiment consisted of a practice phase and an experimental phase. The practice phase consisted of two parts. First, we explained the main task (see Fig. S1 in the Supplemental Material for the main instructions), and subjects could practice the next and go responses. Then we presented three miniblocks that consisted of the instruction, next, go, and feedback phases. The practice miniblocks consisted of zero, one, or two next trials (each number of next trials occurred once, and the order was randomized).
The experimental phase consisted of 48 miniblocks. Twenty-four miniblocks consisted of one next trial, 16 consisted of two next trials, and 4 consisted of three next trials; in 4 miniblocks, the go phase started immediately (so there were no next trials). We used this trial distribution to make the start of the go phase unpredictable and to encourage preparation. The order of the miniblocks was further pseudorandomized: Two of the first 10 miniblocks were zero-next blocks. Again, this was done to encourage preparation. Subjects received a break after every 12 miniblocks; they could determine the duration of the break themselves. The whole experiment lasted 10 to 15 min (although the youngest children sometimes took a little longer).
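The miniblock distribution described above can be sketched in code. The following Python sketch is illustrative only and is not the authors' Psychophysics Toolbox implementation: the function name make_schedule, the use of rejection sampling, and the reading of the constraint as exactly two zero-next miniblocks among the first 10 are all assumptions.

```python
import random

# Number of next trials -> number of miniblocks, as reported in the text.
NEXT_TRIAL_COUNTS = {0: 4, 1: 24, 2: 16, 3: 4}


def make_schedule(seed=None):
    """Return a list of 48 integers, one per miniblock, giving the number
    of next trials that precede the go phase in that miniblock.

    Hypothetical helper: the schedule is reshuffled until two of the
    first 10 miniblocks are zero-next blocks (assumed to mean exactly two).
    """
    rng = random.Random(seed)
    blocks = [n for n, k in NEXT_TRIAL_COUNTS.items() for _ in range(k)]
    while True:
        rng.shuffle(blocks)
        if blocks[:10].count(0) == 2:
            return list(blocks)


schedule = make_schedule(seed=1)
```

Rejection sampling keeps the sketch short; any procedure that respects the same counts and the constraint on the first 10 miniblocks would serve equally well.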

Dependent variables and analyses
All data processing and analyses were completed using R software (R Core Team, 2016). Anonymized data files, R scripts, and experiment documentation are available on the Open Science Framework (https://osf.io/am4yk/). For the next analyses, we focused on the first next (Next 1) trial because the instruction-based interference effect is largest on the first trial (Meiran et al., 2015a), and performance on later next trials could already be modulated by stimulus-specific practice effects. We decided on this before data collection had started. We excluded miniblocks in which subjects made go errors, as these could indicate that subjects did not process the instructions (resulting in a data loss of 17%). We focused on three dependent variables. First, we analyzed the probability of a correct Next 1 trial. Second, we analyzed the latency of the next response with all (correct and incorrect) Next 1 trials included. This response time (RT) analysis was included in order to make the results comparable with those of Meiran et al. (2015a), who did not examine next errors. Furthermore, this measure might be most sensitive, as it combines all trials in which traces of inappropriate motor activity (Everaert, Theeuwes, Liefooghe, & De Houwer, 2014; Meiran, Pereg, Kessler, Cole, & Braver, 2015b) cause interference or, in case the activity is high enough, an incorrect response. Third, we recalculated RTs after exclusion of incorrect Next 1 trials. For both RT analyses, we used a trimming procedure: We excluded trials on which RT was less than 100 ms or greater than 10 s; then we calculated the mean and standard deviation, and we excluded RTs that were more than 2.5 standard deviations above the mean. This trimming was done for each subject and condition separately. This resulted in an additional data loss of 3%. Table 2 shows the average number of trials for each condition and age group.
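The two-step trimming procedure can be made concrete with a short sketch. The Python code below is an illustration under stated assumptions and not the authors' R script (which is available at the OSF link above): the function names trim_rts and trim_by_cell are hypothetical, RTs are assumed to be in milliseconds, the boundary values 100 ms and 10 s are treated as retained, and the sample standard deviation (n - 1 denominator, as in R's sd) is assumed.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_rts(rts_ms, lower=100.0, upper=10_000.0, sd_cutoff=2.5):
    """Trim the RTs of one subject-by-condition cell.

    Step 1: drop RTs below `lower` or above `upper` (absolute cutoffs).
    Step 2: compute mean and SD of the remaining RTs and drop those more
            than `sd_cutoff` SDs above the mean (upper tail only).
    """
    kept = [rt for rt in rts_ms if lower <= rt <= upper]
    if len(kept) < 2:
        # An SD cannot be computed from fewer than two values.
        return kept
    threshold = mean(kept) + sd_cutoff * stdev(kept)
    return [rt for rt in kept if rt <= threshold]


def trim_by_cell(trials):
    """Apply trim_rts separately to each (subject, condition) cell.

    `trials` is an iterable of (subject, condition, rt_ms) tuples.
    """
    cells = defaultdict(list)
    for subject, condition, rt in trials:
        cells[(subject, condition)].append(rt)
    return {cell: trim_rts(rts) for cell, rts in cells.items()}
```

Note that the standard-deviation criterion is one-sided: only slow responses are removed in the second step, because implausibly fast responses are already handled by the 100-ms cutoff.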
For the go analyses, we focused on two dependent variables: accuracy and RT. For the RTs, we excluded incorrect go trials and used the same trimming procedure as the one used for the next analyses (combined, this resulted in a data loss of 15%). For all variables, we analyzed performance using the ezANOVA function (Lawrence, 2016) in R with age (in years) as a continuous between-subjects variable and compatibility (the next analyses) or trial number (first or second trial in the go analyses) as categorical within-subjects variables.
This analysis is very similar to a multiple regression with an interaction term or a standard analysis of covariance (ANCOVA; except that the continuous variable is typically considered a nuisance variable in an ANCOVA, whereas the continuous variable was the main interest in the present study; for a similar approach, see Verbruggen & McLaren, 2017). We performed two sets of analyses. First, we performed the analyses with all subjects included. We grouped all adolescents together and used the same age value for all of them (i.e., 18). Table 3 provides an overview of these analyses. Second, we repeated the analyses without the adolescents in case this "extreme" group had an undue influence on inferential statistics. Table 4 provides an overview of these analyses. Note that the main outcomes of the two sets of analyses were similar.
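To illustrate what an Age × Compatibility interaction captures when the within-subjects variable has only two levels, one can regress each subject's interference score (incompatible minus compatible) on age: a negative slope means that the interference effect shrinks with age. The Python sketch below shows this idea only. It is not the ezANOVA analysis reported here, the function name fit_line is hypothetical, and the ages and interference scores are invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept). Raises ValueError if the inputs differ in
    length or if x has no variance.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example: one interference score (ms) per subject.
ages = [4, 6, 8, 10]
interference_ms = [400, 300, 200, 100]
slope, intercept = fit_line(ages, interference_ms)
# A negative slope indicates a smaller interference effect at older ages.
```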
In a pilot study with adults (N = 29; see the Supplemental Material), we found medium to large instruction-based interference effects (Cohen's dz values = 0.65-1.00). Therefore, we also examined the main effect of compatibility for the different age groups. To increase power and reduce the number of significance tests, we combined the data of the 4- and 5-, 6- and 7-, 8- and 9-, 10- and 11-, and 17- to 19-year-olds, resulting in five groups. Table 5 provides an overview of these analyses.
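For readers unfamiliar with the effect-size measure, Cohen's dz for a within-subjects contrast is conventionally the mean of the paired difference scores divided by their standard deviation. The Python sketch below shows that computation. The function name cohens_dz is hypothetical, the sample standard deviation is assumed, and the RT values are invented and are not data from the pilot study.

```python
from statistics import mean, stdev


def cohens_dz(incompatible, compatible):
    """Cohen's dz for paired data: mean(diff) / SD(diff).

    Both arguments hold one value per subject, in the same subject order.
    """
    if len(incompatible) != len(compatible):
        raise ValueError("both conditions need one value per subject")
    diffs = [i - c for i, c in zip(incompatible, compatible)]
    return mean(diffs) / stdev(diffs)


# Invented mean RTs (ms) for three hypothetical subjects.
dz = cohens_dz([700, 650, 800], [600, 600, 650])
# Differences are 100, 50, and 150 ms: mean 100, SD 50, so dz = 2.0.
```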
In the main analysis, we focused on the raw RT data. In the Supplemental Material, we report an analysis of proportional instruction-based interference scores. The main numerical trends were similar to those in the analysis reported below.

Next phase
We found large interference effects in all analyses: Subjects made more errors and responded more slowly on incompatible trials than on compatible trials. This conclusion is supported by the inferential statistics (Tables 3 and 4). Furthermore, the RT analyses revealed general age-related differences. Most importantly, the RT analyses, which included next responses that came after erroneously pressing the wrong key, also revealed significant interactions between age and compatibility: The instruction-based interference effect decreased over age, which is consistent with the hierarchical-structure account but inconsistent with the advance-implementation account. This decrease can also be seen in Figure 2d, which shows how the instruction-based interference effect is influenced by age and overall response speed. For the RT analysis that included only correct next responses, the interaction was not significant (p = .051) when adolescents were included, but it was significant (p = .002) without them (i.e., when the "extreme" group was excluded; see above). The interaction was not significant in either accuracy analysis (ps > .14). Table 5 shows that the instruction-based interference effect was significant for all measures and age groups.

Go phase
The go analyses revealed that error rate and RT decreased over age and that performance was generally worse on the first go trial than on the second go trial. The latter presumably reflects a task-switch cost (for reviews, see Kiesel et al., 2010; Vandierendonck, Liefooghe, & Verbruggen, 2010). The RT cost was larger for the younger children than for the older children and late adolescents, which is consistent with findings reported in the previous literature (Chevalier & Blaye, 2009; Huizinga et al., 2006).

Exploratory analyses
We also ran an unplanned analysis to explore how the next effect evolved throughout the experiment. The stimulus-response mappings changed in every miniblock, so subjects could not practice the mappings. However, they could learn and practice the application of the overall task structure throughout the experiment. Both "fast" and "slow" learning mechanisms could produce such task- or structure-learning effects (Verbruggen, McLaren, & Chambers, 2014). Therefore, we repeated all next analyses with experiment half (first 24 miniblocks vs. last 24 miniblocks) as an additional within-subjects variable. Because the number of trials was halved, we had to exclude some extra subjects from the RT analyses because of missing cells after data trimming (1 subject excluded in the next all-RT analysis; 6 subjects excluded in the next correct-RT analysis).
The main RT analysis with all next responses included (Fig. 3b) revealed that the instruction-based interference effect decreased substantially throughout the experiment (first half: next effect = 267 ms; second half: next effect = 141 ms; p = .007, Table 6). A decrease was observed for all age groups, and the three-way interaction was nonsignificant, p = .292. The correct-RT analyses did not reveal any significant interactions between the interference effect and experiment half.
The accuracy analyses also showed that the interference effect decreased during the experiment (Fig. 3a; p < .001). Interestingly, significant three-way interactions were observed in the analyses with and without adolescents (Tables 6 and 7). Figure 3 shows that the interference effect decreased more for younger children than for older children. This is consistent with the idea that young children have difficulties with the use of a hierarchical structure but that this improves with some practice. However, it also shows that in the second part of the experiment, the effect was numerically largest for the late adolescents. It seems unlikely that this was due to a floor effect or a speed/accuracy trade-off (e.g., error rates were lower for the 11-year-olds than for the late adolescents, yet their next RTs were comparable). Instead, this finding could reflect the costs of increased proactive control for the late adolescents. Indeed, go performance was numerically better for the late adolescents. Thus, a possible explanation for these age-related differences is that late adolescents biased the go task to a larger extent than the older children, leading to better go performance but larger costs in the next phase. Throughout the experiment, we used the feedback screens to encourage fast and correct go performance, without mentioning next performance. This could have induced a go bias and, therefore, higher error rates in the next phase. This highlights that proactive control or rule implementation can come with certain costs, even in adolescence.

General Discussion
We examined structuring and implementing novel task instructions in children and late adolescents. We found that subjects' ability to prepare novel tasks improved with age, as seen in go performance. However, this did not result in an age-related increase in instruction-based interference effects: We found interference effects on Next 1 trials for all age groups, but these tended to be largest for the youngest children (4- to 5-year-olds).
These results are consistent with the hierarchical-structure account. Situations in which multiple rules can be relevant (in our case, the next and go rules) require a hierarchical structure to determine the correct response and to reduce interference between competing task elements. Young children face difficulties with creating or using such structures (Amso et al., 2014; Unger et al., 2016). This could explain the larger instruction-based interference effects for the youngest children. The hierarchical-structure account also receives support from another recent next study (Meiran, Pereg, Givon, Danieli, & Shahar, 2016), which demonstrated that adults who were less successful in the go phase, had poorer fluid intelligence, or were generally slower also had a larger next effect (i.e., adults with poorer working memory might also experience more problems with hierarchical or complex task sets, somewhat similar to children, than adults with better working memory). Meiran et al.'s (2016) findings are also consistent with research on goal neglect, which suggests associations between fluid intelligence and the ability to chunk task knowledge (Bhandari & Duncan, 2014).
Our results did not provide much support for the advance-implementation account as described in the introduction. Previous developmental work suggests that young children are less likely to implement task rules in advance than older children, adolescents, and young adults. Therefore, the advance-implementation account predicted that go performance would be impaired but the instruction-based interference effect in the next phase should be absent (or at least be smaller) for the younger children. Instead, we observed the largest interference effects for the youngest children. The presence of the interference effects and decent go performance indicate that even the youngest children in our sample could implement novel stimulus-response rules in advance.
This conclusion is consistent with a study showing that young children engaged in proactive control (i.e., they prepared rules in advance) when the task was more difficult (Chevalier, Martis, Curran, & Munakata, 2015). Here, we used novel stimulus-response mappings in each miniblock. This prevented stimulus-specific practice and the consequent formation of long-term memory traces, which could have encouraged the implementation of the rules during the instruction phase. However, consistent with the results of Blackwell and Munakata (2014), our findings showed that implementing these rules came with a substantial cost in young children (i.e., large interference effects during the next phase).
The exploratory analyses revealed that the instruction-based interference effects (in the accuracy and main RT analyses) decreased throughout the experiment. In the accuracy analyses, this effect was most pronounced for the youngest children. The decrease is consistent with findings in adults (Meiran et al., 2015a). In next experiments, subjects cannot learn specific stimulus-response associations. However, they may gradually get better at "separating" the go phase (indicated by the dark-blue background) from the next phase (indicated by the light-blue background). In other words, we speculate that hierarchical structures (with the context cue modulating the choice options) and their usage further evolved throughout practice, reducing interference between the go and next components of the task. This idea is consistent with other findings in the task-learning literature (Bhandari et al., 2017).
By contrasting the hierarchical-structure and advance-implementation accounts, readers may get the incorrect impression that the task-formation and task-implementation phases are independent. But when people create an inefficient nonhierarchical structure or when they have difficulties managing the contingencies within the structure, more competition between the various choice options occurs (producing larger instruction-based interference effects). Thus, task structure will have knock-on effects on the implementation stage. Interestingly, goal neglect (i.e., the dissociation between knowing and doing) has also been associated with the formation of inefficient task structures (Bhandari & Duncan, 2014). This raises the intriguing possibility that failing to implement or execute a task (i.e., goal neglect: a negative "symptom") and applying the rules when not required (i.e., instruction-based interference: a positive symptom) both arise from a failure to create an efficient task structure. Future research is needed to test how these phenomena are related.
To conclude, we observed instruction-based interference effects in all groups, indicating that even the younger children in our sample implemented novel rules at the beginning of each miniblock. We attribute the numerically larger RT costs in the youngest children to age-related differences in the creation of hierarchical task structures. Furthermore, we propose that the next paradigm might be a useful tool to study structuring and implementation of instructions in different age groups and, more generally, the powerful effects that instructions and intentions can have on behavior.