Interpersonal and Human-Automation Trust in an Underwater Mine Detection Task

In target detection tasks, false alarms (i.e., indicating a target is present when it is absent) decrease trust more than misses. Furthermore, human advisors providing advice alongside automation may impact how users trust, and subsequently rely on, automated aids. This study aimed to understand whether the false alarm rate (FAR) of an automated target recognition aid impacts trust in the automated aid, trust in a human teammate, or operator self-confidence in a dual-advisor target detection task. Participants completed a mine detection task while receiving advice from a human and an automated advisor. The FAR of the automation was manipulated between groups, and trust in each type of advisor was measured. Automation FAR did not influence trust in the automation. Low FAR automation was associated with higher trust in a human teammate and increasing self-confidence over the course of the experiment.


Background
Automated target recognition systems (ATRs) are used to locate items of interest within a noisy environment. ATRs perform well in defined tasks, but reliability can decline in unanticipated situations, so they are often paired with a human operator. In these human-automation teams, even imperfect ATRs can improve performance (Reiner et al., 2017).
One application of ATRs is the detection of underwater mines, which are a prominent threat to maritime operations (Ho et al., 2011). A leading mine countermeasure is the evaluation of SONAR images of the sea floor (Figure 1) by operators to determine whether a mine is present in areas of interest (Hammond et al., 2021). Given the poor quality of SONAR images, the relatively rare occurrence of mines, and the high density of clutter on the seabed, this is a challenging task. To assist operators in the detection and classification of mine-like objects, ATRs are employed to cue operators to the locations of mine-like objects for further inspection. However, industry has reported concerns about widespread disuse of these systems.

Trust and Self-Confidence
While disuse is concerning, complete dependence on ATRs would defeat the purpose of human-automation teams; the greatest performance is expected when dependence is calibrated to the capabilities of a system (Lee & See, 2004). The decision to depend on an agent is believed to stem largely from trust (Hoff & Bashir, 2015), the attitude that an agent will help achieve an individual's goals in a situation characterized by uncertainty and vulnerability (Lee & See, 2004, p. 51). Dependence decisions are believed to arise from a weighing of trust in each involved agent (Merritt et al., 2015). While working with ATRs, the operator is expected to compare their trust in the system with their self-confidence in performing the task, and to align their decision with the more trusted agent (cf. Williams et al., 2023). Appropriate dependence is then achieved by calibrating trust to each agent's true capabilities (Lee & See, 2004).
The perfect automation schema (PAS) describes a tendency for operators to believe that automated systems perform perfectly, and a subsequent distrust in the system once errors are observed (Dzindolet et al., 2002). Catching blatant mistakes undermines trust in the system and bolsters the operator's self-confidence (Madhavan et al., 2006). Even when errors are not blatant, operators tend to inaccurately attribute poor task outcomes to characteristics of automated systems rather than their own capabilities (Douer & Meyer, 2022). Since even highly reliable systems remain imperfect in most applied settings, disuse of automated systems often stems from poorly calibrated trust that falls below a system's true capabilities.

False Alarm Rate
In target detection (i.e., signal detection), the environment is classified into two possible states: target present or target absent. The combination of an operator or system response and the true state of the world creates four outcome categories: hit, miss, correct rejection, and false alarm (FA; Wickens et al., 2021). While the overall error rate of ATRs is typically limited by technological capabilities, the type of errors made can be adjusted by changing the system's threshold for recognizing an object as a target, expressed as the response bias (see Signal Detection Theory; Wickens et al., 2021). While a liberal (low) response bias is expected to produce the highest number of hits (i.e., mines detected, a desired result), the low threshold also results in a greater number of FAs, which may be problematic. Frequent FAs may incite a cry-wolf effect (Breznitz, 1983), where a belief of low alarm reliability diminishes alarm responses (Bliss et al., 1995). Excessive FAs appear to reduce trust, leading to disuse and neglect of alarms for hits (Culley & Madhavan, 2013; Manzey et al., 2014).
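To make the mapping between outcomes and response bias concrete, the following is a minimal MATLAB sketch (using hypothetical counts; norminv requires the Statistics and Machine Learning Toolbox) of deriving sensitivity (d') and response bias (c) from the four outcome categories:

    % Minimal sketch with hypothetical counts: deriving sensitivity (d')
    % and response bias (c) from the four signal detection outcomes.
    nTarget     = 100;  % trials in which a mine is present
    nNoTarget   = 240;  % trials in which no mine is present
    hits        = 80;   % "mine" responses on target-present trials
    falseAlarms = 60;   % "mine" responses on target-absent trials

    hitRate = hits / nTarget;
    faRate  = falseAlarms / nNoTarget;

    dPrime = norminv(hitRate) - norminv(faRate);           % sensitivity
    c      = -0.5 * (norminv(hitRate) + norminv(faRate));  % response bias
    % c < 0: liberal bias (more hits, but more false alarms);
    % c > 0: conservative bias (fewer false alarms, but more misses).
    fprintf('d'' = %.2f, c = %.2f\n', dPrime, c);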

Dual Advisor Scenarios
While there is a significant body of literature on the factors that influence dependence when a single operator interacts with an automated system, there is limited research considering more complex multi-human teams. Merritt et al. examined trust and dependence in a single-advisor (2011) and a dual-advisor (2015) target detection task. While trust was associated with dependence on the ATR throughout the single-advisor task, in the dual-advisor setting this was only true for the first half of the study, suggesting trust may evolve differently in dual-advisor settings. These findings highlight the need for further research into the role of automated systems in multi-human environments.
Biases to distrust automated systems may be exacerbated by additional teammates. Since humans are expected to commit errors, the literature suggests that no perfect human schema exists (Merritt et al., 2014). When advice is offered from automated and human sources, trust in the automated source may be disproportionately impacted (Madhavan & Wiegmann, 2007). Since there are only two possible recommendations in a binary outcome task (i.e., target present or absent), disagreements between dual advisors always place the operator's own impression in a 2-1 majority, because that impression necessarily matches one of the two recommendations. If dependence decisions are determined by weighing trust in agents, a guaranteed 2-1 majority promotes self-reliance and disuse of all advice (Merritt et al., 2015). Disuse of aids in a dual-advisor context, as well as a disproportionate effect of FAs on trust in ATRs, may explain the disuse of ATRs observed in industry. Few studies have examined the influence of FAs on trust and dependence behaviors in complex environments, and no studies to date have examined the effect of FAs in a dual-advisor task, nor have they measured trust in each agent and operator self-confidence simultaneously.

Purpose and Hypotheses
This study aimed to understand whether the false alarm rate of an ATR impacts trust in the ATR, trust in a human teammate, or self-confidence while completing a dual-advisor target detection task. Participants completed a mine detection task with advice from an imperfect ATR and a human teammate. Critically, the reliability level (i.e., the total number of errors) was constant between the human teammate and the ATR; however, the response bias of the ATR was manipulated between groups (High FAR group: the system made more FA than miss errors; Low FAR group: the system made fewer FA than miss errors). We measured trust in the ATR and the human teammate, as well as self-confidence, at baseline and after each of three experimental blocks.
Stemming from Merritt et al.'s (2015) finding that in a binary outcome task (i.e., target present or not) the presence of dual advisors leads to advice disuse in favor of self-reliance and decreased trust in both a human and an automated advisor, we hypothesized that as participants became more familiar with the task, their self-confidence would increase across the blocks and trust in their human teammate would decrease. As participants completed trials and observed errors made by the ATR, we predicted that incongruities between a PAS and reality would result in decreased trust in the ATR. Finally, since FAs were expected to be more salient than misses and to result in more evident violations of a PAS, we hypothesized that trust in the ATR would decline more rapidly in the high FAR group than in the low FAR group.

Methods
Participants
Fifty people (37 female, 12 male, 1 not disclosed) with an average age of 19.98 years (SD = 1.87) were recruited from a psychology student pool. Participants self-screened for normal or corrected-to-normal vision. To encourage engagement in the task, participants were told they would receive a 10 CAD honorarium if they were a top performer. However, all participants received a 10 CAD honorarium and credit towards a psychology course regardless of their performance. This study was approved by the Dalhousie University Research Ethics Board [REB 2019-4935] and informed consent was obtained from all participants.

Design
Participants completed an underwater mine detection task assisted by an ATR and a human teammate (hereafter, the co-commander). The simulated task was presented via a custom MATLAB (R2017b) script. Participants were seated in front of a monitor (resolution 2048 x 1152) and used a computer mouse to interact with the simulation.
In each trial, a SONAR image of the seabed was displayed on the monitor. Participants were asked to indicate an initial impression of whether a mine was present by clicking either over the area they believed to contain a mine or a "no mine present" button (Figure 1). Once their initial impression was entered, advice from the ATR and the co-commander was displayed at the bottom of the screen. If either agent advised that a mine was present, a blue (ATR) or red (co-commander) box appeared around the suspected mine location. Participants then indicated their final assessment by clicking on the mine location or the "no mine" button. This technique of two responses (initial impression and final assessment) on each trial has been used previously (Merritt et al., 2015) and aims to separate dependence on aids from personal judgements and aid disuse. The study consisted of 340 trials spread over three blocks (114, 113, and 113 trials, respectively). Each trial consisted of one image. Images were presented in a randomized order for each participant.
Participants were assisted by an ATR whose response bias differed between groups. The ATR with a high FAR made FA errors on 25% of the trials and miss errors on 5.6%, with the error percentages reversed for the low FAR ATR. Importantly, both ATRs were approximately 70% reliable (errors on 30.6% of trials) and differed only in the type of error that was made. Group assignment was randomized, with participants blind to their group as well as to the FAR and reliability of the ATR.
The human teammate was framed as a "co-commander" who could make recommendations, while the participant was the "lead commander" who made the final assessment regarding the presence of a mine. The co-commander's advice maintained a reliability similar to the ATRs but had a neutral response bias: FA and miss errors were each made on 13.5% of trials (73% reliable). The FARs of the two ATRs were thus high and low relative to each other and to the co-commander. The reliability of both agents reflects the 70% minimum reliability at which automated aids are expected to improve performance relative to manual task completion (Wickens & Dixon, 2007), while maximizing the trials in which errors could be observed to differentiate the FAR groups.
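As an illustration of this manipulation, the following is a minimal MATLAB sketch (not the study's actual script; the ground-truth base rate and variable names are illustrative) of generating an advisor's advice schedule so that overall reliability is fixed while the FA/miss split varies:

    % Minimal sketch (illustrative, not the study's actual script) of an
    % advice schedule with a fixed error rate but a manipulated FA/miss split.
    nTrials = 340;
    isMine  = rand(nTrials, 1) < 0.5;   % hypothetical ground truth per trial

    faRate   = 0.250;                   % High FAR group: FAs on 25% of trials
    missRate = 0.056;                   % and misses on 5.6% (swap for Low FAR)

    advice    = isMine;                 % start from perfect advice
    noMineIdx = find(~isMine);
    mineIdx   = find(isMine);

    nFA   = round(faRate * nTrials);    % error counts taken over all trials
    nMiss = round(missRate * nTrials);
    advice(noMineIdx(randperm(numel(noMineIdx), nFA))) = true;   % inject FAs
    advice(mineIdx(randperm(numel(mineIdx), nMiss)))   = false;  % inject misses

    reliability = mean(advice == isMine);   % ~0.69 regardless of the split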

Measures
Questionnaires were used to evaluate trust in each agent. Trust in the ATR was evaluated with Jian et al.'s (2000) Checklist for Trust between People and Automation. Trust in the co-commander and self-confidence were assessed with modified versions of this questionnaire (e.g., "I am suspicious of the system's intent, action, or outputs" was changed to "I am suspicious of my co-commander's ability to identify mines"). Similar questionnaire modifications were previously used by Knocton et al. (2021). Dependence and performance data, as well as signal detection measures of response bias and sensitivity, were gathered but, given space constraints, will be reported in a separate publication.

Procedure
Participants attended a single 1.5-2 hour session. They were told they would be working with another participant; however, the other person was a member of the research team. It was explained that they would be randomly assigned to one of two roles: one teammate would be responsible for determining whether a mine was present (lead commander) and the other would offer advice (co-commander). To present the illusion of randomized roles, participants drew their role from a hat that always indicated they would be the lead commander. The co-commander (confederate) was moved to an adjacent room for the remainder of the study under the premise that another (fictitious) researcher was set up there so the co-commander could enter their recommendations ahead of the lead commander. Participants were told the recommendations would be presented on screen to prevent communication between the teammates and to avoid the pair distracting each other. In actuality, the co-commander's recommendations were programmed into the simulation to control their reliability and FAR.
Once separated, the participant read a script that described the mine detection task. They then completed a 50-trial training block with feedback after each trial but without advice from the ATR or the co-commander. A second, eight-trial training block demonstrated how the advice would be displayed and the process of indicating an initial impression and final assessment. Participants were advised that they could use the advice as much or as little as they would like to achieve the greatest performance.
Upon completing the training blocks, participants completed the three trust questionnaires to gather baseline measures of their trust in the ATR, trust in their co-commander, and confidence in their own ability to complete the mine detection task. Participants then completed three experimental blocks with advice from the ATR and the co-commander. After each block, they completed the three trust questionnaires (ATR, co-commander, and self-confidence). The task was self-paced; however, participants were advised that each block should take no more than 15 minutes.

Data Analysis
Data were formatted in MATLAB and analyzed with IBM SPSS (v26). Items on the trust and self-confidence questionnaires were reverse-scored where necessary so that increasing values were associated with greater trust or confidence. Scores were averaged by participant for each questionnaire, and a 2 (Group: High FAR, Low FAR) by 4 (Block: baseline, B1, B2, B3) mixed ANOVA was completed for each, with Group as the between-subjects factor.
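The reported analysis was run in SPSS; for illustration, a comparable scoring and mixed ANOVA pipeline in MATLAB (using simulated placeholder data; item counts and reverse-keyed indices are illustrative, and fitrm/ranova require the Statistics and Machine Learning Toolbox) might look as follows:

    % Minimal sketch with simulated placeholder data; the reported analysis
    % was run in SPSS v26. Item counts and reverse-keyed items are illustrative.
    rng(1);
    n = 50;
    items = randi(7, n, 12);                         % hypothetical 7-point responses
    reverseIdx = [2 5 7];                            % illustrative reverse-keyed items
    items(:, reverseIdx) = 8 - items(:, reverseIdx); % reverse-score on a 1-7 scale
    score = mean(items, 2);                          % mean score per participant

    % 2 (Group) x 4 (Block) mixed ANOVA on placeholder block scores
    blockScores = 1 + 6 * rand(n, 4);                % stand-in for baseline, B1-B3
    t = array2table(blockScores, 'VariableNames', {'B0','B1','B2','B3'});
    t.Group = categorical(repmat({'HighFAR'; 'LowFAR'}, n/2, 1));
    within  = table(categorical(1:4)', 'VariableNames', {'Block'});
    rm = fitrm(t, 'B0-B3 ~ Group', 'WithinDesign', within);
    ranovatbl = ranova(rm, 'WithinModel', 'Block');  % the pValueGG column applies
                                                     % the Greenhouse-Geisser correction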

Results
All repeated measures results were corrected for departures from sphericity using Greenhouse-Geisser estimates.

Trust in Co-Commander
There was a significant main effect of FAR Group, F(1, 48) = 4.063, p = .049, ηp² = .08, where the Low FAR Group had greater trust in their co-commander than the High FAR Group (Figure 3). There was no main effect of Block on participants' trust in their co-commander, F(2.61, 125.20) = 1.086, p = .352, ηp² = .02, and no interaction between Group and Block, F(2.61, 125.20) = 1.845, p = .142, ηp² = .04. Although the interaction was not significant, an independent samples t-test was conducted to confirm there was no difference in trust in the co-commander at baseline between the Low and High FAR Groups, t(48) = .636, p = .528, d = .18.

Discussion
The present study examined the effects of a low and a high FAR of an ATR in a dual-advisor mine detection task. While self-confidence was expected to increase across the blocks, it increased only in the low FAR group. Trust in the co-commander was expected to decrease across the blocks; instead, it was higher in the low FAR group and remained steady across the blocks for both groups. As anticipated, trust in the ATR decreased across the blocks for both groups; in line with the PAS, both misses and FAs eroded trust. While we expected that FAs would be salient and erode trust more rapidly in the high FAR group than in the low FAR group, there was no effect of FAR group. Madhavan et al. (2006) reported a similar null effect of FAR on trust in an ATR in a single-advisor setting. However, it was surprising that the FAR of the ATR influenced trust in the co-commander and self-confidence, but not trust in the ATR itself.
Trust in the ATR may have remained the same between FAR groups due to the description of the system: participants were told that the ATR compared objects in the images against a database of known mine characteristics and cued operators to objects with these mine-like characteristics. Although not intended to do so, this information could have acted as justification for why the ATR might err while still performing as intended. Providing participants with a rationale for errors has been noted to maintain trust and dependence following errors (Dzindolet et al., 2002), which might explain why salient FAs did not disproportionately impact trust in the ATR here.
While FAs did not decrease trust in the system, participants in the low FAR (high miss rate) group may have located mines that were not detected by the ATR and attributed this to their own abilities, bolstering self-confidence. Similarly, when the co-commander cued participants to areas not cued by the ATR, it may have appeared that the co-commander performed better than the ATR, increasing trust in the co-commander without disproportionately influencing trust in the ATR. In this circumstance, objects that were perceived as missed by the ATR but detected by the co-commander may have resulted in salient ATR misses in the low FAR group. Conversely, for the high FAR group, if FAs were assumed to be true hits, the appearance that the expert system was detecting mines may simply have been interpreted as the co-commander and the participants themselves performing as expected, resulting in steady trust and self-confidence throughout.
Immediate feedback may have made the errors, and the manipulation, more obvious to participants; however, if immediate confirmation were plausible in this task, there would be little use for the operator or the ATR. Further work will examine dependence (i.e., whether the participant switched their initial impression to agree with one of the advisors), performance, and signal detection measures. To determine whether these findings are unique to the FAR of an ATR, future studies should manipulate the FAR of the co-commander.
As suggested by Merritt et al. (2015), concepts established in single-operator settings do not appear to extend to complex multi-human-automation settings. Additional research is required to understand how automated systems can be integrated into human teams.

Takeaways
• The false alarm rate of an automated aid may impact the relative trust of other agents involved (i.e., a human teammate or self-confidence) rather than trust in the system directly.
• Concepts derived from human-automation research may not apply to multi-human-automation teams.

References
Bliss, J., Dunn, M., & Fuller, B. S. (1995). Reversal of the cry-wolf effect: An investigation of two methods to increase alarm

Figure 1. Example of an experimental trial. Participants were initially presented with the image without advice and asked to give their initial impression. Advice was then presented as depicted (a square around a potential mine location if the agent indicated a mine was present, and text below the SONAR image), and participants entered their final assessment.

Error bars in Figures 2-4 show the SEM.

Figure 2. Reported trust in the automated target recognition system (ATR). Blocks one to three were assessed following each experimental block.

Figure 3. Reported trust in the co-commander. Blocks one to three were assessed following each experimental block.

Figure 4. Reported self-confidence. Blocks one to three were assessed following each experimental block.