Searching Through Alternating Sequences: Working Memory and Inhibitory Tagging Mechanisms Revealed Using the MILO Task

We used the Multi-Item Localisation (MILO) task to examine search through two sequences. In Sequential blocks of trials, six letters and six digits were touched in order. In Mixed blocks, participants alternated between letters and digits. These conditions mimic the A and B variants of the Trail Making Test (TMT). In both block types, targets either vanished or remained visible after being touched. There were two key findings. First, in Mixed blocks, reaction times exhibited a saw-tooth pattern, suggesting search for successive pairs of targets. Second, reaction time patterns for vanish and remain conditions were identical in Sequential blocks—indicating that participants could ignore past targets—but diverged in Mixed blocks. This suggests a breakdown of inhibitory tagging. These findings may help explain the elevated completion times observed in TMT-B, relative to TMT-A.


Introduction
Recently, we introduced a mobile app version of the Multi-Item Localisation (MILO) task. The MILO task probes the temporal constraints that influence target selection during search through multi-item sequences. Previously, we have used the MILO task to show that when locating a given item in a sequence, both retrospective (i.e., where you have been) and prospective (i.e., where you need to go next) context within a trial affect search performance (Horowitz & Thornton, 2008; Thornton & Horowitz, 2004).
The goal of this study was to examine what happens to these context effects when a trial contains two interleaved sequences. Such an increase in task demands is an important component of the widely used Trail Making Test (TMT), where interleaving sequences is known to systematically increase overall completion time (Bowie & Harvey, 2006; Lange et al., 2005; Rabin et al., 2007; Reitan, 1958; Salthouse, 2011). Here, we use a new MILO task variant that borrows directly from the TMT, allowing us to modulate inherent task demands while mapping out the patterns of responses within interleaved trials. Our primary focus is on further understanding the nature of inhibitory tagging mechanisms thought to operate during MILO and related search tasks (e.g., Klein & MacInnes, 1999; Wang & Klein, 2010). However, we also hope to shed new light on exactly why performance deteriorates when sequences are interleaved, an issue that may be of clinical relevance when interpreting TMT costs. We begin by briefly introducing the TMT and the MILO task before presenting our new experimental findings.

The TMT
The TMT is frequently administered as part of standard neuropsychological assessment and is also a very common research tool (Bowie & Harvey, 2006; Lange et al., 2005; Rabin et al., 2007; Reitan, 1958; Salthouse, 2011). Usually taken as a pen-and-paper test (although see e.g., Fellows et al., 2017; Salthouse & Fristoe, 1995; Woods et al., 2015), it comes in two variants. In TMT-A, participants are asked to quickly and accurately draw lines between numbered circles on a page without lifting their pen. Each circle contains a number between 1 and 25, and the instruction is to start at the number 1 and proceed in order until reaching the number 25. In TMT-B, the page contains both numbers and letters, and participants are instructed to alternate in order between them, starting at the number 1, followed by the letter A, then the number 2, the letter B, and so on, until reaching the number 13 (see Bowie & Harvey, 2006 for protocol details). The main dependent measure is total time to complete the test, measured with a stopwatch, although error information can also be recorded (e.g., Klusman et al., 1989; Kopp et al., 2015).
Much of the clinical and research interest in this task centres on the fact that TMT-B is considerably more demanding than TMT-A, giving rise to consistently longer completion times. While both variants place demands on visual search, psychomotor skill, and processing speed (see Sánchez-Cubillo et al., 2009 for review), TMT-B is thought to place additional demands on working memory, set-switching, and inhibitory control (Arbuthnott & Frank, 2000; Kortte et al., 2002; Salthouse, 2011; Sánchez-Cubillo et al., 2009). The involvement of these cognitive components has been established by observation of clinical subpopulations (for review, see Lange et al., 2005) and in numerous correlation/regression studies pairing TMT measures with other well-known tasks (e.g., Arbuthnott & Frank, 2000; Kortte et al., 2002; Salthouse, 2011).
Although there continues to be debate about precisely which cognitive components underlie TMT-B costs, there appears to be general agreement that it targets the fluid (Salthouse, 2011) or flexible (Kortte et al., 2002) cognitive abilities associated with executive function (Fellows et al., 2017; Sánchez-Cubillo et al., 2009). In this study, our question was whether the need to engage additional cognitive components with interleaved sequences would also influence MILO performance. If so, we hoped that the within-trial resolution and temporal context manipulations available in the MILO task would shed additional light on the nature of the costs involved.

The MILO Task
We developed the MILO task as a computer-based research tool for exploring the temporal context of visual search (Horowitz & Thornton, 2008; Thornton & Horowitz, 2004). In addition to the iPad app version of the task used here, there is also a cross-platform online version that can be previewed at https://maltacogsci.org/MILO/DEMO/. Both versions of the task, along with the source code, may be freely obtained by contacting the authors.
As in the TMT, MILO participants are required to search through a specific sequence of targets, such as the letters A through H, or the numbers 1 through 8 in order. Rather than connecting the elements on paper, MILO responses involve clicking directly on targets with a mouse or touching them on a touchscreen. Importantly, in addition to measuring overall completion time, the MILO task also provides a profile of reaction time patterns across all items in a sequence. This is achieved by having participants complete multiple (e.g., 20), short (e.g., 8-item) trials using either randomly generated novel display layouts or fixed patterns, depending on the research question. The use of multiple trials makes it possible to establish within-participant estimates of the time taken to locate each subsequent item in a sequence, a measure we call serial reaction time (SRT; Horowitz & Thornton, 2008; Thornton & Horowitz, 2004). A number of simple manipulations allow exploration of both retrospective (i.e., the influence of previous actions on localisation of the current target) and prospective (i.e., the influence of future plans on the current target) aspects of search behaviour with the MILO task. For example, in our previous work, we were able to show that participants had almost perfect memory for the locations they had already visited during a trial. We did this by introducing a manipulation in which targets either vanished or remained visible once selected. The SRT patterns for these two types of trial were essentially identical (Thornton & Horowitz, 2004), indicating very effective inhibitory tagging (Klein, 1988). This tagging process is location-based rather than object-based, as the Vanish and Remain SRT functions separate as soon as either local or global motion is added to the displays (Horowitz & Thornton, 2008).
We have also used the MILO task to demonstrate that participants consistently plan ahead when engaged in sequential search. Such planning is most obvious at the start of a sequence, reflected in highly elevated first response times (Basoudan et al., 2019). However, using a shuffle manipulation, in which the identities of items ahead of the current target swapped positions, we were able to show planning effects influencing SRT patterns up to four items ahead (Thornton & Horowitz, 2004; see Kosovicheva et al., 2020 for related findings).

Current Study
In this study, we modified the basic MILO task by including two sequences on each trial. These sequences mimic the intrinsic load manipulation of the TMT. Our motivation for studying interleaved sequences was to gain further understanding of the nature of inhibitory tagging during MILO Remain trials (Horowitz & Thornton, 2008; Thornton & Horowitz, 2004). In previous studies from our group, having participants perform a secondary task while completing MILO trials, such as listening for comprehension (Luffingham, 2013) or retaining a spatial layout in memory (Zammit, 2017), did not lead to any changes in Remain SRTs relative to Vanish trials. The lack of interference suggested that the inhibitory mechanism might be automatic and encapsulated, so that it does not require high-level cognitive resources.
Here, we borrowed directly from the TMT and produced a MILO variant that could be performed with either low or high intrinsic load. Figure 1 shows an example display. There were always 12 items: the letters A-F and the numbers 1-6. During Sequential blocks of trials (low load; corresponds to TMT-A), participants were instructed to touch the letters in order, followed by the numbers, or vice versa (counterbalanced). During Mixed blocks of trials (high load; corresponds to TMT-B), the instruction was to alternate between letter and number targets.
As with the TMT, we expected overall trial completion time to be longer in Mixed blocks than in Sequential blocks. Note that with MILO, display characteristics are identical in the two block types, with the same items appearing on every trial, albeit in random positions, so any difference in timing can only reflect changes in task demands. In Sequential blocks, we expected the SRT patterns to be very similar to those observed in our previous studies, the only unknown being the cost of switching sequences after the sixth response. During Mixed blocks, our question was whether the additional cognitive resources needed to interleave two target types within a trial would interact with the need to inhibit past locations on Remain trials.
Figure 1. Example screenshot from the MILO task with two sequences. In Sequential blocks of trials, participants touched all of the letters in order before the digits, or vice versa (counterbalanced). In Mixed blocks, targets from the two sequences were interleaved, so the correct order would be A-1, B-2, and so on, or 1-A, 2-B, and so on, again counterbalanced across participants.

Participants
Twelve participants (10 females; mean age = 24.6 years, standard deviation = 2.5; 2 left-handed) from the University of Malta community took part in this study in return for a payment of €10. Sample size was determined prior to data collection. An analysis of 12 previous data sets showed an average observed effect size (partial eta squared) of 0.72 (standard deviation = 0.2), which indicated that a minimum sample size of 9 participants would be sufficient to detect relevant changes in the pattern of SRTs. See  for further details of this power analysis. Participants were randomly assigned to a group that started each trial with either a letter or a number, with six participants per group.
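Power-analysis software commonly expresses effect size as Cohen's f rather than as partial eta squared. As a worked illustration of the standard conversion only (it does not reconstruct the procedure that yielded the minimum of nine participants), the mean value of 0.72 corresponds to:

```latex
f = \sqrt{\frac{\eta_p^2}{1 - \eta_p^2}} = \sqrt{\frac{0.72}{0.28}} \approx 1.60
```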
All participants reported normal or corrected-to-normal vision. Prior to taking part, participants received written information about the study and signed consent forms. All methods and procedures conformed to the Ethics and Data Protection Guidelines of the University of Malta.

Equipment
The stimuli were displayed on an iPad Air (Model A1474) with screen dimensions of 20 × 15 cm (24.6 cm diagonal) and an effective resolution of 1,024 × 768 pixels at 132 ppi. The iPad was placed in landscape mode on a table in front of the participant. As viewing distance could only be estimated approximately, at 50 cm, we report stimulus measures in both pixels and degrees of visual angle. The MILO Switch app was custom written in Objective-C using Xcode and the Cocos2d libraries. Source code is available on the Open Science Framework (OSF) page associated with this study at https://osf.io/ugw9n/.

Stimuli
The stimuli are shown in Figure 1. Characters were drawn in black on red-and-white (numbers) and blue-and-white (letters) pool balls, which were shaded to give a slight three-dimensional effect. Each ball had a diameter of 85 pixels, subtending approximately 1.8° of visual angle. The 12 targets were positioned randomly on each trial within an invisible 4 × 4 grid centred on the screen. Individual targets were randomly jittered by up to 80 pixels horizontally and 30 pixels vertically within the grid to reduce the apparent regularity of the display.
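The stimulus geometry can be sketched in a few lines. This is an illustrative reconstruction, not the MILO Switch source code (which is on the OSF page). Two assumptions are labelled as hypothetical: that the 4 × 4 grid spans the full 1,024 × 768 screen, and that jitter is uniform within ± the stated pixel limits. Under the stated 132 ppi and the estimated 50 cm viewing distance, the visual-angle formula gives about 1.87°, slightly above the approximate 1.8° reported. The value is sensitive to the uncertain viewing distance: at about 52 cm the same ball subtends 1.8°.

```python
import math
import random

PPI = 132            # iPad Air pixel density (from the Equipment section)
VIEW_DIST_CM = 50.0  # approximate viewing distance (from the Equipment section)


def visual_angle_deg(size_px, ppi=PPI, distance_cm=VIEW_DIST_CM):
    """Full visual angle (degrees) subtended by an object size_px pixels wide."""
    size_cm = size_px / ppi * 2.54
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def random_layout(n_items=12, screen=(1024, 768), grid=4, jitter=(80, 30), rng=None):
    """Place n_items in randomly chosen cells of a grid x grid array, then jitter.

    Hypothetical assumptions: the grid spans the whole screen (so it is
    centred), and jitter is uniform within +/- jitter[0] (x) and jitter[1] (y).
    """
    rng = rng or random.Random()
    cell_w, cell_h = screen[0] / grid, screen[1] / grid
    all_cells = [(col, row) for row in range(grid) for col in range(grid)]
    positions = []
    for col, row in rng.sample(all_cells, n_items):  # 12 of the 16 cells, no repeats
        x = (col + 0.5) * cell_w + rng.uniform(-jitter[0], jitter[0])
        y = (row + 0.5) * cell_h + rng.uniform(-jitter[1], jitter[1])
        positions.append((x, y))
    return positions
```

With these assumed cell sizes (256 × 192 pixels), the closest two balls can ever be is 96 pixels centre to centre, so the 85-pixel balls never overlap and always stay on screen.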

Procedure
The experiment was run in a sound-attenuated booth under low lighting, with no overhead lights, to minimise screen glare. Following typical TMT procedure, all participants completed the less demanding Sequential block before the Mixed block of trials. Because Vanish and Remain trials were interleaved within each block, this variation was explicitly demonstrated to participants. The experimental session lasted approximately 30 minutes, with participants completing two blocks of 30 correct trials (each block containing 15 Vanish and 15 Remain trials). An error immediately terminated a trial, which was then automatically replaced with a new random version of the same condition. Based on our previous studies, we expected error rates to be extremely low. We include the average number of error trials per block in the data figures below, and the raw data are available in the Supplementary Material. However, error rates were not included in our analysis.
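The block structure just described (30 correct trials, 15 per condition, errors replaced by a fresh trial of the same condition) can be sketched as a simple queue. This is a hypothetical illustration rather than the app's code: `run_trial` is an assumed callback that presents one newly randomised trial and reports whether it was completed without error, and the position at which a replacement trial is re-queued is also an assumption, since the text does not specify it.

```python
import random


def run_block(run_trial, n_per_condition=15, seed=None):
    """Collect n_per_condition correct Vanish and Remain trials in random order.

    run_trial(condition) is a hypothetical callback that presents one freshly
    randomised trial and returns True if it was completed without error.
    Returns the list of completed conditions and the number of error trials.
    """
    rng = random.Random(seed)
    queue = ["Vanish"] * n_per_condition + ["Remain"] * n_per_condition
    rng.shuffle(queue)  # Vanish and Remain trials interleaved within the block
    completed, errors = [], 0
    while queue:
        condition = queue.pop(0)
        if run_trial(condition):
            completed.append(condition)
        else:
            # Assumption: the replacement trial of the same condition is
            # inserted at a random position among the trials still to come.
            errors += 1
            queue.insert(rng.randint(0, len(queue)), condition)
    return completed, errors
```

Whatever the error pattern, the block ends only once 15 correct trials of each condition have been collected, which is why error counts are reported separately from completion times.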

Data Analysis
To provide consistency with TMT studies, we begin by reporting overall median completion times. These were analysed using a 2 (Block Type: Sequential/Mixed) × 2 (Condition: Vanish/Remain) repeated measures analysis of variance (ANOVA).
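Because both factors in the completion-time analysis have only two levels, each effect has one numerator degree of freedom, and its repeated measures F equals the squared one-sample t statistic on the corresponding within-participant contrast. The sketch below uses that standard identity. It assumes, as a labelled simplification, that each participant's median completion time per cell has already been computed. Sphericity is not an issue for two-level factors, so no correction is applied.

```python
import numpy as np


def rm_anova_2x2(y: np.ndarray) -> dict:
    """2 x 2 repeated measures ANOVA via single-df contrasts.

    y has shape (n_participants, 2, 2): axis 1 = Block Type (0 = Sequential,
    1 = Mixed), axis 2 = Condition (0 = Vanish, 1 = Remain); each cell holds
    one participant's median completion time.
    """
    n = y.shape[0]
    contrasts = {
        "Block Type": y[:, 1, :].mean(axis=1) - y[:, 0, :].mean(axis=1),
        "Condition": y[:, :, 1].mean(axis=1) - y[:, :, 0].mean(axis=1),
        "Block Type x Condition": (y[:, 1, 1] - y[:, 1, 0]) - (y[:, 0, 1] - y[:, 0, 0]),
    }
    results = {}
    for effect, d in contrasts.items():
        # With one numerator df, F(1, n - 1) is the squared one-sample t on the contrast scores.
        t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
        f = float(t ** 2)
        results[effect] = {"F": f, "df": (1, n - 1), "eta_p2": f / (f + n - 1)}
    return results
```

The partial eta squared for a one-df effect follows directly from F as F / (F + error df), which is how the ηp² values accompanying such tests can be checked.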
To capture within-trial patterns of performance more fully, we computed, for each participant and condition, the median SRT at each target position across all completed trials. We first present the data for each block type separately, using a 2 (Condition: Vanish/Remain) × 12 (Target Item) repeated measures ANOVA, and then, for completeness, we compare across block types using the full 2 (Block Type: Sequential/Mixed) × 2 (Condition: Vanish/Remain) × 12 (Target Item) repeated measures ANOVA.
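A minimal sketch of how a per-participant SRT profile can be derived from raw touch times follows. It assumes, hypothetically, that each trial is logged as the times of the 12 correct touches measured from display onset, so the first SRT is the interval from onset to the first touch and each later SRT is the interval since the previous touch.

```python
import numpy as np


def serial_rts(touch_times_ms):
    """SRTs for one trial: the interval preceding each correct touch.

    Assumes touch times are measured from display onset, so the first SRT
    is the time from onset to the first touch.
    """
    t = np.asarray(touch_times_ms, dtype=float)
    return np.diff(t, prepend=0.0)


def median_srt_profile(trials):
    """Median SRT at each target position across one participant's correct trials."""
    srts = np.stack([serial_rts(t) for t in trials])  # shape: (n_trials, n_items)
    return np.median(srts, axis=0)
```

Stacking these per-participant profiles gives the participants × target-position matrix that enters the repeated measures ANOVA; averaging it over participants gives the group curves plotted in the figures.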
Violations of sphericity involving the Target Item factor were corrected by applying the Greenhouse-Geisser adjustment to the appropriate degrees of freedom. Full ANOVA results are provided in the Supplementary Material; the text focuses on the findings of interest.
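The Greenhouse-Geisser correction multiplies both degrees of freedom by an estimate ε of departure from sphericity, where ε = (tr Σ*)² / ((k − 1) tr(Σ*²)) and Σ* is the covariance matrix of k − 1 orthonormal contrasts of the k repeated measures. The sketch below implements this for a one-way design (e.g., Target Item alone). It is a simplified illustration, not the software used for the reported analyses, and it omits the Condition factor of the full design.

```python
import numpy as np


def helmert_contrasts(k: int) -> np.ndarray:
    """Return a k x (k-1) matrix of orthonormal contrasts (each column sums to zero)."""
    c = np.zeros((k, k - 1))
    for j in range(1, k):
        c[:j, j - 1] = 1.0
        c[j, j - 1] = -float(j)
        c[:, j - 1] /= np.linalg.norm(c[:, j - 1])
    return c


def gg_epsilon_from_cov(cov: np.ndarray) -> float:
    """Greenhouse-Geisser epsilon from the k x k covariance of the repeated measures."""
    k = cov.shape[0]
    c = helmert_contrasts(k)
    s = c.T @ cov @ c  # covariance of the orthonormal contrast scores
    return float(np.trace(s) ** 2 / ((k - 1) * np.trace(s @ s)))


def rm_anova_oneway(y: np.ndarray) -> dict:
    """One-way repeated measures ANOVA with Greenhouse-Geisser corrected df.

    y: array of shape (n_participants, k_levels), e.g. median SRT per Target Item.
    """
    n, k = y.shape
    grand = y.mean()
    ss_cond = n * np.sum((y.mean(axis=0) - grand) ** 2)
    ss_subj = k * np.sum((y.mean(axis=1) - grand) ** 2)
    ss_err = np.sum((y - grand) ** 2) - ss_cond - ss_subj
    df1, df2 = k - 1, (k - 1) * (n - 1)
    eps = gg_epsilon_from_cov(np.cov(y, rowvar=False))
    return {
        "F": (ss_cond / df1) / (ss_err / df2),
        "df": (df1, df2),
        "epsilon": eps,
        "df_gg": (eps * df1, eps * df2),
        "eta_p2": ss_cond / (ss_cond + ss_err),
    }
```

With 12 target items and 12 participants, the uncorrected degrees of freedom are (11, 121), so a corrected F(2.5, 27.7) corresponds to ε of roughly 0.23.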

Data Availability Statement
The raw data and full summary statistics are available on the OSF page associated with this study at https://osf.io/ugw9n/.

Results
Figure 2 shows overall completion times and error rates as a function of Block Type and Condition. Consistent with TMT studies, the completion time data show that having to interleave targets from both sequences is more demanding, giving rise to overall slower responses in Mixed blocks (M = 10.3 seconds, standard error [SE] = 0.5) than in Sequential blocks (M = 7.2 seconds, SE = 0.34). Consistent with previous MILO studies, the …
Figure 3 shows SRTs as a function of Condition and Target Item for the Sequential block of trials. This pattern very closely resembles those we have observed in previous MILO studies. Specifically, there is the expected elevation of the initial response, followed by a linearly decreasing phase for subsequent items in the sequence. This phase is interrupted by a slower response when the target type switches, after which the linear trend returns. Aside from this category switch effect, the most compelling finding from these data is the replication of the complete overlap between Vanish and Remain trials. The only significant effect was the main effect of Target Item, F(2.5, 27.7) = 90.9, MSE = 0.13, p < .001, ηp² = 0.89. See Supplementary Materials for more details.
The orange/lighter lines in Figure 3 show SRTs as a function of Condition and Target Item for the Mixed block of trials. It is immediately obvious that this pattern is very different from that of the Sequential block. Specifically, search now appears to proceed in pairs of slow-then-fast responses, giving rise to a distinctive saw-tooth pattern. The effect is amplified for the first response but is clearly visible at all other stages except the last two items. From the perspective of MILO, the other notable finding is that SRT patterns for Vanish and Remain no longer overlap, suggesting that the additional cognitive load associated with repeated switching interferes with the ability to ignore past locations. There were main effects of both Condition, F(1, 11) = 33.0, MSE = 0.03, p < .001, ηp² = 0.75, and Target Item, F(2.7, 29.8) = 40.1, MSE = 0.29, p < .001, ηp² = 0.79, qualified by a significant Condition × Target Item interaction, F(5.3, 58.2) = 4.4, MSE = 0.04, p < .01, ηp² = 0.29. In the analysis directly comparing SRT patterns in the two block types, all main effects and interactions were significant (see Supplementary Materials). Of particular note was the significant three-way Block Type × Condition × Target Item interaction, F(4.7, 51.8) = 2.7, MSE = 0.03, p < .05, ηp² = 0.20, reflecting the very different patterns visible in Figure 3.
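One simple way to put a number on the saw-tooth shape is sketched below. This is a hypothetical index for illustration, not an analysis reported in this study. It averages the drop from the first to the second item within each pair and the rise from the second item of one pair to the first item of the next. A purely linear speed-up, like the Sequential-block trend, scores zero, while slow-then-fast alternation scores above zero. By default, the first pair is skipped because the first response is disproportionately elevated.

```python
import numpy as np


def sawtooth_index(profile, skip_first_pair=True):
    """Hypothetical saw-tooth index for an SRT profile over successive target pairs.

    Returns the mean of (a) within-pair drops and (b) between-pair rises.
    A linear trend contributes +s and -s to these and so scores 0.
    """
    p = np.asarray(profile, dtype=float)
    if skip_first_pair:
        p = p[2:]
    if p.size < 4 or p.size % 2:
        raise ValueError("need an even number (>= 4) of target positions")
    within_drop = p[0::2] - p[1::2]     # first minus second item of each pair
    between_rise = p[2::2] - p[1:-1:2]  # next pair's first minus this pair's second
    return float((within_drop.mean() + between_rise.mean()) / 2)
```

For example, an alternating profile such as [3, 1, 2, 1, 2, 1] gives 1.0, whereas a steadily decreasing profile such as [6, 5, 4, 3, 2, 1] gives 0.0.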

Discussion
This study used a variant of the MILO task to examine patterns of search through trials containing two sequences. We replicated the standard TMT finding that Mixed blocks (TMT-B) took consistently longer to complete, as well as the main findings of previous MILO studies: elevated first responses and overlapping Vanish and Remain curves during Sequential (TMT-A) blocks. There were also two novel findings that may help explain how increased cognitive load affects search behaviour when sequences are interleaved. We discuss these MILO findings next, before considering their implications for other tasks, such as the TMT.

Novel MILO Findings
The first novel finding is the distinctive saw-tooth pattern of within-trial SRTs during Mixed blocks (Figure 3). This pattern suggests that rather than fully alternating between the two sequences with each response (i.e., activating the full letter sequence, then the full digit sequence), participants search for successive pairs of targets (i.e., A-1, B-2, etc.), progressing in chunks of two items through both sequences in parallel. Having to repeatedly update the current search template(s) clearly has implications for working memory (WM) load, implications that we return to shortly. Our suggestion is that this saw-tooth function, with slow responses followed by fast responses, provides further evidence that participants plan ahead during multiple-item search (Horowitz & Thornton, 2008; Kosovicheva et al., 2020; Thornton & Horowitz, 2004). While searching for the first member of the pair, the location of the second item is either explicitly or implicitly coded, leading to a more rapid second response. This phenomenon may also be related to parallel programming of action sequences, which is known to occur for both reaching movements (Adam et al., 2000; Lavrysen et al., 2002; Vindras & Viviani, 2005) and saccades (McPeek et al., 2000; McSorley et al., 2019; Walker & McSorley, 2006).
We should note that in a previous study, we found slightly slower responses to letter sequences than to digit sequences. This raises the possibility that, despite counterbalancing, the saw-tooth pattern is driven by category effects. However, this does not appear to be the case: both in the current data set and in two subsequent independent samples, we have found clear evidence of saw-tooth responding regardless of category order.
The second novel finding is that Vanish and Remain SRT patterns diverge during the more demanding Mixed blocks. Our standard finding, replicated in the Sequential blocks, is that the two SRT functions closely overlap, having an identical, accelerating profile (Horowitz & Thornton, 2008; Thornton & Horowitz, 2004). Indeed, we have argued that performance in the Remain trials provides a very compelling demonstration of the important role that inhibitory tagging of past locations plays in everyday search and foraging behaviour (e.g., Klein, 1988; Klein & MacInnes, 1999; Wang & Klein, 2010).
The current Mixed block findings clearly indicate that increasing inherent task demands has consequences for retrospective aspects of search. The slowing of Remain responses relative to Vanish responses during Mixed blocks suggests that participants are no longer able to effectively ignore past locations and are thus not discounting those locations when searching for future targets.
Previous studies of inhibition of return have implicated a role for WM in maintaining the tagging of past locations, although such effects appear to be highly sensitive to the nature and timing of the memory tasks involved (Castel et al., 2003; Vivas et al., 2010; Zhang & Zhang, 2011). Such sensitivity may explain why previous attempts to use dual-task methodology to disrupt MILO tagging were unsuccessful (Luffingham, 2013; Zammit, 2017). Here, we speculate that some aspect of the need to maintain and dynamically update two WM search templates while moving through the interleaved sequences draws on the same resources needed for inhibitory tagging. Future MILO studies should help to further elucidate the nature of these shared resources.

Implications for TMT and Beyond
Although our primary goal in this study was neither to directly compare MILO and TMT performance nor to champion MILO as a replacement clinical tool, the current findings clearly have implications for tasks such as the TMT that involve searching through multiple targets (see also Cain et al., 2012; Hills et al., 2013; Kristjánsson et al., 2014; Pellicano et al., 2011; Wolfe et al., 2019). At the most general level, we hope we have demonstrated that examining within-trial patterns of reaction time can provide useful insights into performance, over and above examining overall completion time. While such a level of analysis is not possible with pen-and-paper tasks, computer or tablet versions of tests are likely to become more common (Fellows et al., 2017; Salthouse & Fristoe, 1995; Woods et al., 2015).
In hindsight, the idea suggested by the MILO saw-tooth patterns, that participants approach interleaved trials by chunking the sequence into matched pairs, appears obvious. However, we are not aware that this idea has been discussed in the TMT literature. If such a strategy is also used during the TMT-B, then we could attribute part of the TMT B-A difference to the demands of repeatedly and dynamically updating the current search template(s). Impairments in TMT-B performance might thus result from a compromised ability to produce consecutive chunks from the two sequences.
Also from the TMT perspective, we suggest that the Vanish/Remain manipulation could provide a simple way to factor out participant deficits that may be specifically associated with inhibitory control. In all current versions of the TMT, whether pen-and-paper or computer-based, old targets remain visible for the duration of the test. The need to inhibit is thus confounded with other task demands. However, Figure 2 shows that Mixed block performance is significantly worse than Sequential block performance even without the need to use inhibition (compare the two Vanish bars across block type).
When inhibition is also required, during Remain trials, an additional cost is incurred, but only during Mixed blocks. The difference between Vanish and Remain completion times during Mixed blocks thus provides a very clear measure of the cost of having to inhibit. Importantly, such a cost can be measured directly within the task itself, without having to rely on correlational designs involving additional paradigms. Here, with normal young adults, this cost appears to be around 1.4 seconds. Introducing this simple Vanish/Remain manipulation to the TMT could provide additional diagnostic power, making it possible to identify individuals who have specific deficits with inhibitory control beyond those to be expected in matched controls. This can be done without the need to compare the full SRT functions in detail, as the difference between overall completion times in Vanish versus Remain trials would suffice.
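The proposed within-task measure is simple enough to sketch. The example below computes one person's inhibition cost as the Remain minus Vanish median completion time in the Mixed block, and then expresses it relative to a control sample. The control comparison is a hypothetical illustration of the screening idea, not a validated procedure. A plain z-score is the simplest choice; with small control samples, single-case methods that take the control sample size into account would be more appropriate.

```python
import statistics


def inhibition_cost(vanish_times_s, remain_times_s):
    """Remain minus Vanish median completion time (s) for one person's Mixed block."""
    return statistics.median(remain_times_s) - statistics.median(vanish_times_s)


def cost_z_score(patient_cost, control_costs):
    """Standardise an individual's cost against a (hypothetical) matched control sample."""
    mu = statistics.mean(control_costs)
    sd = statistics.stdev(control_costs)
    return (patient_cost - mu) / sd
```

For example, a person whose median Remain and Vanish completion times are 11.5 and 10.0 seconds has a cost of 1.5 seconds, which can then be compared against the distribution of costs in matched controls.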

Conclusions
How do the inherent task demands of interleaving sequences interact with retrospective and prospective context in visual search? Using the MILO task, we found that interleaved sequences result in a characteristic saw-tooth pattern of SRTs, consistent with parallel planning for pairs of upcoming targets. Furthermore, retrospective inhibitory tagging appears to be disrupted. These findings may be relevant for research using the TMT, which inspired this experiment. Patients who experience greater difficulty in completing the TMT-B (relative to TMT-A) may have deficits in chunking future targets, inhibitory tagging, or both. Adding MILO-type manipulations may help differentiate these possibilities.