Digital tools for direct assessment of autism risk during early childhood: A systematic review

Current challenges in early identification of autism spectrum disorder lead to significant delays in starting interventions, thereby compromising outcomes. Digital tools can potentially address these barriers: they are accessible, can measure autism-relevant phenotypes and can be administered in children's natural environments by non-specialists. The purpose of this systematic review is to identify and characterise potentially scalable digital tools for direct assessment of autism spectrum disorder risk in early childhood. In total, 51,953 titles, 6884 abstracts and 567 full-text articles from four databases were screened using predefined criteria. Of these, 38 met inclusion criteria. Tasks are presented on both portable and non-portable technologies, typically by researchers in laboratory or clinic settings. Gamified tasks, virtual-reality platforms and automated analysis of video or audio recordings of children's behaviours and speech are used to assess autism spectrum disorder risk. Tasks tapping social communication/interaction and motor domains most reliably discriminate between autism spectrum disorder and typically developing groups. Digital tools employing objective data collection and analysis methods hold immense potential for early identification of autism spectrum disorder risk. Next steps should be to further validate these tools, evaluate their generalisability outside laboratory or clinic settings, and standardise derived measures across tasks. Furthermore, stakeholders from underserved communities should be involved in the research and development process.

Lay abstract

The challenge of finding autistic children, and finding them early enough to make a difference for them and their families, becomes all the greater in parts of the world where human and material resources are in short supply. Poverty of resources delays interventions, translating into a poverty of outcomes.
Digital tools carry potential to lessen this delay because they can be administered by non-specialists in children’s homes, schools or other everyday environments, they can measure a wide range of autistic behaviours objectively and they can automate analysis without requiring an expert in computers or statistics. This literature review aimed to identify and describe digital tools for screening children who may be at risk for autism. These tools are predominantly at the ‘proof-of-concept’ stage. Both portable (laptops, mobile phones, smart toys) and fixed (desktop computers, virtual-reality platforms) technologies are used to present computerised games, or to record children’s behaviours or speech. Computerised analysis of children’s interactions with these technologies differentiates children with and without autism, with promising results. Tasks assessing social responses and hand and body movements are the most reliable in distinguishing autistic from typically developing children. Such digital tools hold immense potential for early identification of autism spectrum disorder risk at a large scale. Next steps should be to further validate these tools and to evaluate their applicability in a variety of settings. Crucially, stakeholders from underserved communities globally must be involved in this research, lest it fail to capture the issues that these stakeholders are facing.

be less accurate than the TD group in false-belief understanding. Additionally, only 75% of the ASD participants completed the game in the first study (Carlsson et al., 2018), compared to 100% in the TD group.
Distrust and deceit: One study (Lu et al., 2019) presented a gamified task on a laptop to assess the ability of ASD and TD groups to distrust (avoid misleading cues) and deceive (provide misleading cues) a computer opponent to gain rewards. The primary metrics were accuracy (proportion of trials in which the child successfully deceived or distrusted the opponent) and the number of trials required to learn the correct response in the game. The ASD group was less accurate in deceiving and distrusting the opponent, especially when they falsely perceived the opponent to be a real person, and took significantly more trials to learn the correct responses required to win the game.
Executive functioning: Two studies used tablet-based gamified tasks to assess executive functioning (EF), specifically matching shapes, categorization, visual search and response inhibition (Chen et al., 2019; Jones et al., 2018). Primary metrics were accuracy, reaction time (latency to first response) or efficiency (ratio of average score to average completion time). One study discriminated groups based on reaction time, the ASD group being significantly slower compared to the TD group (Jones et al., 2018). Conflicting results were reported for the accuracy metric, with one study showing no group differences (Jones et al., 2018), the other showing reduced accuracy in the ASD group (Chen et al., 2019). The mean age of children in the second study (Chen et al., 2019) was slightly lower (55 months) than in the former study (60-64.6 months).
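The three EF metrics above can be made concrete with a minimal sketch. The trial data and field names (`correct`, `rt`, `score`, `duration`) are invented for illustration and are not taken from either study:

```python
# Illustrative computation of the three executive-functioning metrics:
# accuracy, mean reaction time, and efficiency (average score / average time).

def ef_metrics(trials):
    """trials: list of dicts with 'correct' (bool), 'rt', 'score', 'duration'."""
    n = len(trials)
    accuracy = sum(t["correct"] for t in trials) / n
    mean_rt = sum(t["rt"] for t in trials) / n  # latency to first response
    # efficiency: ratio of average score to average completion time
    efficiency = (sum(t["score"] for t in trials) / n) / (
        sum(t["duration"] for t in trials) / n
    )
    return accuracy, mean_rt, efficiency

# hypothetical trials from one child
trials = [
    {"correct": True, "rt": 0.8, "score": 10, "duration": 4.0},
    {"correct": False, "rt": 1.2, "score": 4, "duration": 5.0},
    {"correct": True, "rt": 1.0, "score": 10, "duration": 3.0},
]
acc, rt, eff = ef_metrics(trials)
```

Note that efficiency rewards both high scores and fast completion, which is why it can differ from accuracy alone.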
Fine motor: Five studies used tablet- and smartphone-based tasks to assess two kinds of motor abilities: motor planning and control (Anzulewicz et al., 2016; Fleury et al., 2013; Mahmoudi-Nejad et al., 2017; Rafique et al., 2019) and motor imitation (Chetcuti et al., 2019). The ASD group was found to be compromised in both.
For example, pause times in a discontinuous circle drawing task were significantly more variable across trials in the ASD group compared to the TD group (Fleury et al., 2013). Two studies (Anzulewicz et al., 2016; Rafique et al., 2019) using a trace and colour task on different device types (tablet vs smartphone) found greater mean impact force and gesture pressure in the ASD group, as well as greater use of distal parts of the screen and shorter dragging durations. Accuracy in a motor following task (Mahmoudi-Nejad et al., 2017) and a task requiring motor imitation of complex gestures (Chetcuti et al., 2019) was lower in the ASD group.
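The across-trial variability finding can be illustrated with the coefficient of variation, one common variability statistic; the exact statistic used by Fleury et al. is not specified here, and the pause times below are invented:

```python
from statistics import mean, stdev

def pause_variability(pause_times):
    """Coefficient of variation: SD of pause times relative to their mean."""
    return stdev(pause_times) / mean(pause_times)

# hypothetical pause times (seconds) across trials of the circle-drawing task
asd_pauses = [0.4, 1.1, 0.2, 0.9, 1.5]   # erratic pausing
td_pauses = [0.6, 0.7, 0.5, 0.6, 0.65]   # consistent pausing
```

Dividing by the mean makes the measure comparable across children whose overall pause durations differ.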

Video recording of child behaviour
Four studies from one group (Bovery et al., 2018; Campbell et al., 2018; Carpenter et al., 2021; Dawson et al., 2018) used the front camera on the tablet computer to record videos of children's behaviours while they watched age-appropriate videos containing social and non-social stimuli. Machine learning (ML) algorithms were used to automatically detect head position using coordinates of several facial landmarks, benchmarked to the distance from the screen. These head position metrics were subsequently used to estimate a variety of metrics related to the social and motor domains.

Social preference and orienting to name: ML algorithms were used to estimate the time children viewed social vs non-social stimuli presented on the left- and right-hand sides of the screen (Bovery et al., 2018), and the consistency and latency of head turns towards an assessor calling the child's name from behind (Campbell et al., 2018). No overall group differences were observed in looking time to social vs non-social stimuli (Bovery et al., 2018), in contrast to results reported earlier (Gale et al., 2019; Ruta et al., 2017). Compared to the TD group, the ASD group was found to be less consistent and took longer to orient towards the person calling their name (Campbell et al., 2018). Both these studies assessed differences in overt task engagement as a discriminating metric, defined as the number of frames in which the eyes or faces of children seated in front of a screen could be tracked by an automated algorithm. The ASD group was significantly less overtly engaged in both tasks compared to the TD group.
Gross motor: ML algorithms were also used to estimate the rate of head movements in children while they watched an age-appropriate video, as a measure of postural head control (Dawson et al., 2018). The ASD group was found to have higher rates of head movement, indicating lower levels of postural control of the head (Dawson et al., 2018).
Facial expressions: Machine learning methods using features from facial landmarks were also used to estimate the type of facial expression made (positive, neutral, other) in response to animated videos presented on the tablet's screen (Carpenter et al., 2021). While watching videos meant to elicit emotions, a higher frequency of neutral expressions was reported in the ASD group. Another study used machine learning to predict the accuracy of imitating facial expressions presented on tablet or smartphone screens (Zhao & Lu, 2020). The ASD group was less accurate in imitating facial expressions, especially those of disgust, surprise, fear, and neutral expressions.

1.2) Toys and digital audio recorders
Intelligent toy car: One study used a toy car fitted with an accelerometer to record its motion in three dimensions while the child played with it (Moradi et al., 2017). Data, which comprised accelerations along with their timestamps for the duration of play, could be transferred to a computer or an Android device using Bluetooth or Wi-Fi. The primary metric was the accuracy of an ML algorithm in predicting children's diagnostic classification based on the recorded (acceleration in three dimensions with timestamps) and derived (for example, duration of play and correlations between accelerations in two dimensions) data. These data were expected to capture repetitive and/or stereotypical movements often observed in children with autism (American Psychiatric Association, 2013). The algorithm discriminated between the ASD and TD children with moderate accuracy (62%), sensitivity (65%) and specificity (61%). The task took 5 minutes to complete and was administered in a quiet room in the presence of a research staff member who gave minimal instructions.
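The reported accuracy, sensitivity and specificity all follow from a classifier's confusion matrix. The sketch below shows the standard definitions; the labels and predictions are invented for illustration, not Moradi et al.'s data:

```python
# Confusion-matrix metrics for a binary ASD-vs-TD classifier.

def classification_metrics(y_true, y_pred, positive="ASD"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)  # proportion of ASD children correctly flagged
    specificity = tn / (tn + fp)  # proportion of TD children correctly passed
    return accuracy, sensitivity, specificity

# hypothetical labels: 5 ASD children, 5 TD children
y_true = ["ASD"] * 5 + ["TD"] * 5
y_pred = ["ASD", "ASD", "ASD", "TD", "TD", "TD", "TD", "TD", "TD", "ASD"]
acc, sens, spec = classification_metrics(y_true, y_pred)
```

For a screening tool, sensitivity (not missing ASD children) and specificity (not over-referring TD children) matter separately, which is why all three figures are reported.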

Digital audio recorder:
One study used a portable digital audio recorder that was placed either in a pocket in the child's clothing or within a metre of the child to record conversations between the index child and other family members (Wijesinghe et al., 2019). The recorder was left with the family for varying durations of 2-10 hours. Data comprised the child's utterances segmented out from the entire conversation, which were subsequently used as features in an ML algorithm along with derived variables (for example, total duration and number of segments containing meaningful and meaningless words) to classify children into their diagnostic groups. The algorithm was not effective in discriminating between groups.

Executive functioning (EF):
Three of the nine studies assessed EF, one using a battery of established tasks (Gardiner et al., 2017), one using a novel set of tasks (Aresti-Bartolome et al., 2015), and one using a commercially available game (Veenstra et al., 2012). Primary metrics for the established EF tasks and the commercial game were accuracy (correct trials divided by the total number of trials), omission errors (no response when a response was required), and commission errors (response provided when no response was required). The commercial game also assessed reaction time, repeated number of clicks on the same object, and variability in responses across trials. EF was also assessed using a multi-step planning game, an adaptation of the Tower of Hanoi, as part of the suite of established EF tasks.
The primary metric was the number of moves in each correct trial (Gardiner et al., 2017). As seen in executive tasks presented on mobile devices, results related to accuracy were variable, with one study showing no group differences (Gardiner et al., 2017), the other showing reduced accuracy in the ASD group (Veenstra et al., 2012). The study assessing reaction time found the ASD group slower (Veenstra et al., 2012).
Metrics for the novel EF game were task completion (proportion of participants completing the game) and the number of pre-specified items identified per trial (Aresti-Bartolome et al., 2015). Consistent with observations of reduced task completion described above (Carlsson et al., 2018), the ASD group completed fewer trials (Aresti-Bartolome et al., 2015). They were also more prone to errors, although statistical significance was not determined (Aresti-Bartolome et al., 2015).
Cognitive: One study used a unique gamified task to assess abstract or relational modes of thinking (accuracy in correctly identifying the relationship between two objects as against the perceived form of the objects themselves) (Hetzroni et al., 2019). In this task, the correct response corresponded to the option in which a different set of images is presented in the same spatial orientation as in the target image. In comparison to the TD group, the ASD group was compromised in identifying relationships between objects: they were more likely to select the option that contained components of the target image, with little attention to their spatial organization. It remains unclear whether this performance difference resulted from an impairment in relational thinking, a narrow and localized field of attention, or a differing interpretation of verbal instructions.

Social:
The novel EF task (Aresti-Bartolome et al., 2015) also included a component wherein the game stopped randomly in the middle of a trial and the participant was required to interact with the test administrator to resume the game. The primary metrics were the latency to initiate an interaction, and whether eye contact was made during the interaction. The ASD group took significantly longer to initiate the interaction, and was less likely to make eye contact with the test administrator (Aresti-Bartolome et al., 2015). Other studies assessed anthropomorphic bias (proportion of taps on videos with human characters exhibiting biological motion) (Chaminade et al., 2015) and prosocial behaviour (proportion of responses to a distressed avatar) (Deschamps et al., 2014). Unlike the TD comparison group, the ASD group showed no preference for biological motion in human characters (Chaminade et al., 2015). No group differences were reported for prosocial behaviour (Deschamps et al., 2014).
Motor: Two studies used desktop computers to assess motor skills. One measured motor planning and control using a point-to-point movement task in which the child was required to draw a line on the vertical plane using a stylus, from a start position at the bottom of the screen to a target position at the top. Some trials included distractors near the target endpoint. A range of kinematic variables were estimated, including the variability in movement preparation time across trials and the change in response metrics in the presence of a distractor (Dowd et al., 2012). The ASD group showed higher variability in latency (defined as movement preparation time in the study) compared to the TD group and did not adapt their movements in the presence of a distractor (Dowd et al., 2012). The second study assessed eye-hand coordination. The primary metric was Pearson's correlation between eye fixation latency on a target stimulus and the reaction time of the hand response indicating the left-right position of the stimulus on the screen, either using a button box, pressing pre-specified keys on the keypad, or touching the stimuli on the screen using a stick (Crippa et al., 2013). The ASD group demonstrated lower visuomotor coordination (Crippa et al., 2013).
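The eye-hand coordination metric is simply the correlation between two per-trial latency series. A minimal sketch, with invented trial values (the data below are not Crippa et al.'s), shows the computation:

```python
from math import sqrt

def pearson(x, y):
    """Pearson's r between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# hypothetical per-trial latencies (seconds): well-coordinated responses,
# where slower eye fixations go with slower hand responses
eye_latency = [0.20, 0.25, 0.30, 0.35, 0.40]
hand_rt = [0.55, 0.60, 0.70, 0.72, 0.85]
r = pearson(eye_latency, hand_rt)
```

A child whose hand responses closely track eye fixations yields r near 1; lower visuomotor coordination shows up as a weaker correlation.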

Video recording of child behaviour
Facial expressions: Two studies from the same group (Borsos & Gyori, 2017; Gyori et al., 2018) analyzed facial expressions elicited by a deception and sabotage game to discriminate between groups. A webcam captured videos of the child, which were then analyzed by the Noldus FaceReader. The first study, exploring differences in the intensity of various emotions averaged over different time intervals, found no group differences (Gyori et al., 2018). However, a more granular analysis in the second study, exploring the mean and variance of emotion intensities frame-by-frame, found both the mean and variance of the 'scared' and 'surprised' expressions to be significantly higher in the ASD group compared to the TD group, as was their speed of change to a different expression (Borsos & Gyori, 2017). This result during active gameplay was in contrast to an earlier study using passive viewing of animated videos (Carpenter et al., 2021), which reported more neutral expressions in the ASD group. The second study (Borsos & Gyori, 2017) also assessed the ratio of valid to invalid frames, where invalid frames were defined as those in which the Noldus FaceReader was unable to identify the face or to assign an emotion to the frame; no significant group differences were found.
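Why a frame-by-frame analysis can separate groups that interval averages cannot is easy to see with two hypothetical intensity streams that share the same mean but differ in frame-to-frame variance:

```python
from statistics import mean, pvariance

# hypothetical 'scared' intensity per video frame for two children
stream_flat = [0.5] * 8          # steady expression
stream_burst = [0.1, 0.9] * 4    # rapidly changing expression

# averaged over the whole interval, the two streams look identical...
assert abs(mean(stream_flat) - mean(stream_burst)) < 1e-12

# ...but the frame-by-frame variance separates them
flat_var, burst_var = pvariance(stream_flat), pvariance(stream_burst)
```

Interval averaging discards exactly the rapid expression changes that the second study found to be discriminative.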
Social: Two studies used computer vision analysis of head (Martin et al., 2018) and eye movements (Li et al., 2020) to discriminate between the TD and ASD groups.
Videos were captured using webcams mounted on the monitor. In the first study (Martin et al., 2018), children were shown social and non-social videos on the desktop screen while the webcam captured their behaviours. Primary metrics were automated assessments of head movements (degrees of pitch, yaw and roll). The ASD group made greater lateral head movements, looking away from social videos. In the second study (Li et al., 2020), children viewed a picture of their mother on the screen. ML methods were used to compute the trajectories of their eye movements as captured by the webcam. The primary metric was the accuracy of a second ML algorithm in classifying children into their diagnostic groups using features extracted from the length and angle information of children's eye movement trajectories.
A classification accuracy of 92.6% was achieved, although it is not clear from the report whether this high accuracy was a result of over-fitting to a training set that had also been used for testing. Consistent with other studies (Bovery et al., 2018; Campbell et al., 2018), task engagement (proportion of looking time at the screen) was found to be significantly lower in the ASD group in this study (Li et al., 2020), though no differences were reported in the former study (Martin et al., 2018).

Speech and Language
Two studies used picture stimuli presented on a desktop computer to assess speech characteristics (pitch) (Nakai et al., 2014) or acquired vocabulary and comprehension (Lin et al., 2013). In the former study, a microphone attached to the child's clothing was used to record speech, from which pitch characteristics were extracted using an ML algorithm. In the second study, correct or incorrect responses were recorded by key presses on the keyboard; the primary metrics were accuracy in naming and describing objects presented on the screen in either visual or audio format. In the first study, significant group differences were found in the variability of pitch metrics in older (7-9 years) but not in younger (4-6 years) children (Nakai et al., 2014). The second study found better language proficiency (vocabulary, comprehension, homographs and decoding) in the ASD group at younger ages (4-5 years) in most tasks, but the advantage decreased by the time children turned 6 years (Lin et al., 2013). The ASD group was also found to be more receptive to visual stimuli, being more accurate in articulating the names and descriptions of stimuli presented visually than of stimuli presented in audio format (Lin et al., 2013). This visual bias was evident in that the auditory sentence comprehension task was the only one in which the TD group outperformed the ASD group.

2.2) Virtual reality (VR) platforms
Four studies (10.5%) used non-portable technology in the form of virtual reality platforms to assess joint attention (Jyoti & Lahiri, 2020; Shahab et al., 2018), motor imitation (Alcañiz Raya et al., 2020; Shahab et al., 2018) and visuomotor coordination (Jung et al., 2006). These studies used different VR platforms of varying levels of sophistication. The oldest (Jung et al., 2006) used a simple set of devices including a personal computer, projector, screen, infrared reflectors and a digital camera.
On the other hand, one of the more recent studies (Alcañiz Raya et al., 2020) used the highly sophisticated Cave Automatic Virtual Environment (CAVE™), which includes a semi-immersive room with rear-projected surfaces. In this environment, the participant was not only able to see and hear an avatar, but also to smell the food the avatar ate (Alcañiz Raya et al., 2020). The digital cameras used to record child responses included depth information.

Joint attention (JA):
Two of the four studies assessed JA (Jyoti & Lahiri, 2020; Shahab et al., 2018) using a paradigm wherein an avatar directed their eye gaze towards virtual objects, and the child was expected to follow the gaze and provide a response, either by naming the object (Shahab et al., 2018) or by touching the target object on a touch-sensitive monitor (Jyoti & Lahiri, 2020). In the latter study, the avatar provided increasing numbers of cues towards the target object: first gaze alone, followed by gaze and head-turn, then gaze, head-turn and finger-pointing, and finally sparkling of the target in addition to all of the above cues (Jyoti & Lahiri, 2020). The primary metric was the number of times the target object was identified (Jyoti & Lahiri, 2020; Shahab et al., 2018). In both cases, the ASD group scored lower than the TD group, especially when the cues were limited to gaze and head-turn alone, with performance improving as the number of cues increased. One of the studies also recorded reaction time (latency between cue and target identification) and found the ASD group significantly slower than the TD group (Jyoti & Lahiri, 2020).
Motor imitation: Two studies assessed motor imitation (Alcañiz Raya et al., 2020; Shahab et al., 2018) using a VR set-up, one in which the child imitated virtual robots playing the drum and the xylophone (Shahab et al., 2018), and the other in which they imitated various actions of an avatar appearing on the screen (Alcañiz Raya et al., 2020). Children were videotaped to record their responses. The primary metrics, respectively, were performance scores (correct imitations of robot or avatar actions) (Shahab et al., 2018), and the accuracy of an ML algorithm in classifying children into their diagnostic groups using metrics calculated from the movements of joints (head, limbs and trunk) across different types of actions (Alcañiz Raya et al., 2020). Both studies found the ASD group to be compromised in motor imitation.
In the second study, the prediction accuracies of the ML methods were highest (89.36% with leave-one-out cross-validation) when using features from head movements alone, as compared to using all available features. One of the studies assessed task engagement (defined as the duration for which the child played the game) (Shahab et al., 2018), and found the ASD group to be less engaged, as also observed in several studies described above (Bovery et al., 2018; Campbell et al., 2018; Li et al., 2020).
Visuomotor coordination: One study used a VR platform to assess visuomotor coordination (Jung et al., 2006). The task involved popping virtual balloons with a real stick. The primary metrics were accuracy, reaction time and the total distance the stick was moved. While no group differences were observed in accuracy, reaction times in the ASD group were slower, as in several studies described above (Jones et al., 2018; Jyoti & Lahiri, 2020; Veenstra et al., 2012). A composite principal-components measure based on the three primary metrics showed that the ASD group was less efficient in this task: they popped fewer balloons, took more time to pop each balloon, and moved the tangible stick more. Tasks took 14.6 min to complete on average (range = 10-20 min).
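The composite principal-components measure collapses three correlated metrics into one efficiency score. A simpler z-score composite, assumed here purely for illustration (the per-child values below are invented, and this is not Jung et al.'s exact method), conveys the same idea of combining standardized metrics with signs chosen so that higher means more efficient:

```python
from statistics import mean, stdev

def zscores(values):
    """Standardize a list of values to zero mean and unit SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# hypothetical per-child metrics: balloons popped, time per balloon (s),
# total stick distance moved (m); first three children TD-like, last three ASD-like
popped = [18, 20, 15, 9, 8, 11]
time_pb = [1.1, 1.0, 1.3, 2.0, 2.2, 1.8]
distance = [3.0, 2.8, 3.4, 5.1, 5.5, 4.9]

# higher composite = more efficient: more balloons, less time, less movement
composite = [p - t - d for p, t, d in
             zip(zscores(popped), zscores(time_pb), zscores(distance))]
```

A principal-components composite additionally weights each metric by how much variance it explains, rather than weighting all three equally as here.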

[Table excerpt (participant characteristics): one study excluded children with vision or hearing deficits, children who did not hear English at home, and families whose parents/guardians could not speak and read English sufficiently to provide informed consent; diagnostic criteria were expert clinical judgment and the ADOS Toddler Module, with the M-CHAT-R/F used to screen for ASD during recruitment, administered by a licensed clinical psychologist with expertise in ASD. Another study excluded learning disabilities, neurological disorders and psychiatric conditions, required IQ > 70 in both groups, and applied DSM-IV-TR, ADI-R and ADOS criteria, with the ASRS as a measure of symptom severity.]

[Figure note: keywords not included in the Phase 2 search are highlighted in red in the Phase 1 list.]