Health chatbots acceptability moderated by perceived stigma and severity: A cross-sectional survey

Background: Chatbots and virtual voice assistants are increasingly common in primary care without sufficient evidence for their feasibility and effectiveness. We aimed to assess how perceived stigma and severity of various health issues are associated with the acceptability of three sources of health information and consultation: an automated chatbot, a General Practitioner (GP), or a combination of both.
Methods: Between May and June 2019, we conducted an online study of UK citizens, advertised via Facebook. It was a factorial simulation experiment with three within-subject factors (perceived health issue stigma, severity, and consultation source) and six between-subject covariates. The acceptability rating for each consultation source was the dependent variable. A single mixed-model ANOVA was performed.
Results: Amongst 237 participants (65% aged over 45 years old, 73% women), GP consultations were seen as most acceptable, followed by a combined GP-chatbot service. Chatbots were seen as the least acceptable consultation source for severe health issues, while their acceptability was significantly higher for stigmatised health issues. No associations between participants' characteristics and acceptability were found.
Conclusions: Although healthcare professionals are perceived as the most desired sources of health information, chatbots may be useful for sensitive health issues in which disclosure of personal information is challenging. However, chatbots are less acceptable for health issues of higher severity and should not be recommended for use within that context. Policymakers and digital service designers need to recognise the limitations of health chatbots. Future research should establish a set of health topics most suitable for chatbot-led interventions and primary healthcare services.


Introduction
Primary healthcare is adopting alternative modes of delivery to address the growing demand for high-quality services, transitioning towards telemedicine and behaviour modification. [1] This shift has been accelerated by the COVID-19 pandemic and the subsequent social distancing measures, which prompted digital innovation to shield vulnerable patients from exposure to infection. [2][3] Conversational agents, also called chatbots or virtual assistants, are a type of digital technology underpinned by artificial intelligence (AI) or machine learning. They are designed to simulate human-to-human conversation via text or speech, and can understand health queries and respond in a conversational manner. [4][5] One example is the Babylon "GP at Hand" service, which incorporates chatbot functionality to triage patients' health issues and direct them to relevant modes of care. [6] A comparison of diagnoses made by doctors and by the "GP at Hand" service showed that the chatbot diagnosed conditions with similar accuracy to human doctors. [7] Thus, chatbots can communicate health information to patients, aiding the less complex tasks within primary care.
Research into the acceptability of chatbots as a consultation source is limited, with a distinct lack of experimental studies. Similar technologies, such as symptom checkers, have been found to be acceptable to patients and staff in the primary healthcare context. [8] Service users are willing to accept chatbots for disease diagnosis, provided that they do not replace existing healthcare services. [9][10] Chatbot acceptability is influenced by a person's effort expectancy of using a chatbot, facilitating conditions, social influences, price value, habit, compatibility and perceived access to the healthcare system. [10] The quality of the chatbot content, the perceived accuracy of health information, and the sources underpinning the chatbot are also associated with acceptability. Acceptability is lowered by concerns about chatbot responsibility and liability [11-13] and about chatbot competence. [5,10,11] Patient safety has also been identified as an important factor associated with acceptability. [11] Low acceptability of chatbots, despite proven effectiveness, would lead to suboptimal uptake of this intervention. However, research indicates that people are more willing to disclose sensitive health information to chatbots than to health professionals. [14-15] Chatbot acceptability might therefore be higher for stigmatised health issues, as chatbots offer greater anonymity than a face-to-face GP consultation. [11] There is a need to establish how this novel technology can best be utilised within healthcare settings. Fadhil (2018) suggested that a GP-chatbot combination may increase chatbot acceptability and facilitate better patient-doctor communication; however, this mode of healthcare delivery has not previously been examined. [16] We therefore aimed to assess how perceived stigma and severity are associated with acceptability by comparing chatbot-led, GP-led and combined GP-chatbot consultation sources. The findings would inform the development of chatbot-guided services in primary healthcare.

Design
This study used an online factorial experimental design with three within-subject factors (the stigma of the health issue, the severity of the health issue, and the consultation source) and six between-subject covariates (prior chatbot knowledge, confidence in chatbot knowledge, average internet usage, age, gender and education level). The outcome variable was the participant-rated acceptability of each consultation source. The study protocol and analysis plan were pre-registered (https://osf.io/szgma/). Approval was granted by the ethics committee of University College London (ref: 14917/001).

Participants and recruitment
The population of interest was any adult who might at some point have access to a chatbot for a health-related consultation. For cost and practical reasons, the study was conducted online using paid Facebook advertising. All individuals aged 18 years and above with access to the internet and living in the UK were invited to participate. These inclusion criteria were reflected in the Facebook advertisement strategy, which cost £500 in total. Facebook users who met these criteria were shown the study advert in their feed and could decide whether to take part. It was not specified how the Facebook algorithm selected the users to whom the advert was displayed, and the researchers had little control over the sampling method. The target sample size was 250 participants, which was judged appropriate for sufficient statistical power. No formal power calculation was made, as the effect size was unknown.
Between May and June 2019, Facebook users were shown a digital advert inviting them to complete an online survey by clicking on a URL, with an incentive of a 10 pence donation to a charity upon study completion. The advert asked whether people could spare 10 minutes to fill out an online survey about attitudes towards the use of AI in healthcare. Once participants had accessed the online survey, they were presented with the information page and asked to consent.

Measurements
In total, the survey consisted of seven items. First, participants were asked about their knowledge of chatbots. They were then presented with a video and text transcript describing chatbot technology and its use in healthcare services, and asked to rate their confidence in understanding the concept of a health chatbot. Next, they were presented with 12 health issues, such as 'you have been feeling severely depressed, and having suicidal thoughts', 'you have what you think are headlice' and 'you have been coughing up blood' (see supplementary file A), and asked to indicate the most preferred and acceptable consultation source. Finally, participants were asked demographic questions covering age, gender, educational attainment and internet usage. [17][18] The outcome measure, acceptability of each consultation source, was based on the Acceptability Framework [19] and was operationalised through the proxy measure 'willingness'. Participants were presented with the statement 'I would be willing to use this option to find out what is wrong and recommend treatment' and were asked to rate their willingness to use each of the three consultation sources on a 5-point Likert scale (ranging from 1 = 'not very willing at all' to 5 = 'very willing') for each presented health issue. Scores under three were interpreted as a less acceptable rating, while scores over three were interpreted as a more acceptable rating.
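The interpretation rule for the willingness scale can be written down as a small sketch. The function name and the label for a score of exactly three are assumptions made here for illustration; the text only defines scores under and over three.

```python
def interpret_rating(score: float) -> str:
    """Map a willingness score on the 1-5 scale to the interpretation used in the text."""
    if not 1 <= score <= 5:
        raise ValueError("score must lie on the 1-5 Likert scale")
    if score < 3:
        return "less acceptable"
    if score > 3:
        return "more acceptable"
    # Assumption: the text does not say how a score of exactly 3 is read.
    return "neutral"


print(interpret_rating(2.68))
print(interpret_rating(3.96))
```

The same rule applies to mean scores, which is how the group-level ratings are read.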
The three experimental factors were: i) health issue stigma (more/less), ii) health issue severity (high/low) and iii) consultation source (chatbot, GP, GP-chatbot combination). This resulted in 12 conditions (see supplementary file B).
The participants were randomly presented with three health issues for each combination of stigma (more/less) and severity (high/low) to balance participant fatigue and practice effects. Each participant was asked to rate the acceptability of each of the three consultation sources for each health issue. Participants were therefore asked to complete 36 acceptability ratings and were blinded to the predicted stigma or severity of the health issue. This was to ensure that the participants relied on their own interpretation of the stigma and severity of the health issue.
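The counts in this design can be checked with a short sketch. The variable names are illustrative, and the figure of three health issues per stigma-by-severity cell is taken from the description of the 12 health issues and 36 ratings.

```python
from itertools import product

STIGMA = ("more", "less")
SEVERITY = ("high", "low")
SOURCE = ("chatbot", "GP", "GP-chatbot combination")
ISSUES_PER_CELL = 3  # three health issues per stigma-by-severity combination

# Full crossing of the three within-subject factors.
conditions = list(product(STIGMA, SEVERITY, SOURCE))

# Health issues vary only by stigma and severity; each issue is rated once per source.
issue_cells = list(product(STIGMA, SEVERITY))
n_issues = len(issue_cells) * ISSUES_PER_CELL
n_ratings = n_issues * len(SOURCE)

print(len(conditions), n_issues, n_ratings)
```

This gives 12 conditions, 12 health issues and 36 ratings per participant, matching the figures reported in the text.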

Data analyses
The means (M) and standard errors (SE) of acceptability ratings for chatbot, GP and GP-chatbot combination were calculated for all categories of participants where comparisons were being made. Both research questions were addressed together in a single mixed-model analysis of variance. The model included all main effects and all 2-way interactions involving consultation source. It also included the 3-way interaction between apparent stigma, severity and consultation source.
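As a minimal sketch of the descriptive step only, the mean and standard error per consultation source can be computed as below. The ratings are made-up illustrative values, not study data, and the SE here is the simple sample standard deviation divided by the square root of n, which may differ from model-based standard errors produced by a mixed-model ANOVA.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_se(ratings):
    """Return the mean and the standard error of the mean for a list of ratings."""
    m = mean(ratings)
    se = stdev(ratings) / sqrt(len(ratings))  # sample SD divided by sqrt(n)
    return m, se


# Hypothetical ratings on the 1-5 scale, for illustration only.
ratings_by_source = {
    "chatbot": [2, 3, 3, 4],
    "GP": [4, 5, 4, 3],
    "GP-chatbot combination": [3, 4, 3, 4],
}

summary = {source: mean_and_se(r) for source, r in ratings_by_source.items()}
for source, (m, se) in summary.items():
    print(f"{source}: M = {m:.2f}, SE = {se:.2f}")
```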

Results
A total of 237 participants completed the study (Table 1). The majority were female (73.4%), aged over 45 years old (65.0%), and educated to degree level or higher (54.9%). Most had no prior knowledge of chatbots (59.5%) but were confident in their understanding of the technology once the concept was explained (70.9%).
A significant main effect for consultation source (Table 2) was found, F(2, 372) = 33.85, p < .001, partial η² = 0.15. GPs were reported as the most acceptable consultation source (M = 3.96, SE = 0.08), followed by a GP-chatbot combination (M = 3.43, SE = 0.11) and chatbots alone (M = 3.08, SE = 0.11). There was a significant interaction between acceptability and severity, F(2, 372) = 118.14, p < .001, partial η² = 0.38, with GPs (M = 4.42, SE = 0.06) and a GP-chatbot combination (M = 3.44, SE = 0.12) being most acceptable for high severity health issues, while chatbots were least acceptable (M = 2.68, SE = 0.12). For low severity health issues, GPs were most acceptable (M = 3.51, SE = 0.11), followed by chatbots
There was a significant interaction between stigma, severity of the health issue and the acceptability of the consultation sources (Table 2).
With the between-subject factors, there was insufficient evidence to conclude that any of the factors influenced participants' acceptability ratings (Table 1). There were non-significant main effects for: age F(1, 186)
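For readers who want to relate the reported effect size to the test statistic, partial η² can be recovered from an F value and its degrees of freedom with the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the main effect of consultation source; the function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of consultation source: F(2, 372) = 33.85
eta_source = partial_eta_squared(33.85, 2, 372)
print(round(eta_source, 2))  # 0.15
```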

Discussion
To our knowledge, this is the first study to assess the associations between health chatbot acceptability and the perceived stigma and severity of various health conditions. It demonstrated that, in comparison to chatbots, participants perceived health professionals as the most suitable source of health information. Chatbots were perceived as an unacceptable intervention for discussing health conditions that were perceived as severe. Therefore, this technology as a stand-alone intervention may not be utilised by healthcare customers and should not be used as a substitute for a credible source of health information from a health professional. However, chatbots could be considered as an aid for doctor-patient communication for conditions with lower perceived stigma and severity. Future research needs to define the range of these conditions and identify how chatbots may facilitate greater disclosure of sensitive information to health professionals, who may then be better equipped to recommend the most relevant healthcare services.

Strengths and limitations
The study had several limitations. As it was advertised online via popular social media, the participants were more likely to be digitally literate and to have access to technologies that would enable them to use chatbots. Thus, the views explored in the study may not reflect those of individuals with limited access to technology. The sample was highly skewed towards middle-aged female participants, which may reflect the majority of Facebook users in the UK at that time. Although Facebook advertising is considered a cost-effective and swift method of study recruitment, the participants are self-referred and may already hold strong views on the researched subject, which compromises the representativeness and applicability of the findings. There was also a significant percentage of missing data, which affects the validity of the study. It is not clear why some questions were left unanswered, but this could be due to participants having a poor understanding of chatbots and being unwilling to respond to highly hypothetical questions. Furthermore, 'willingness to use a chatbot' may not be the most precise measure of acceptability, as it may poorly predict motivation towards this technology. Other measures, such as intentions or perceived likelihood to use chatbots, could produce different results. To increase the representativeness and validity of the findings, future studies should consider offering the survey alongside a health chatbot used in primary care, so that users can interact with the technology and provide more meaningful responses about their potential uptake and engagement. As this was a simulation study, the conditions presented to participants were hypothetical, and the willingness to use chatbots may be different if participants actually experienced these conditions. Perceptions of severity and stigma may vary, hence chatbot acceptability could depend on health beliefs. Patient groups for specific health conditions may be more or less reluctant to use chatbots. For example, this technology could be appealing to young adults at risk of sexually transmitted infections, where disclosure of intimate information is challenging. However, chatbots may not be suitable for those suffering from psychotic episodes, who require support and human care. Although none of the demographic characteristics was associated with chatbot acceptability in our study, future research should explore whether there are specific patient groups receptive to receiving health education through this technology.
These results build on existing evidence on chatbot acceptability in healthcare and triangulate the findings of the existing qualitative research into healthcare chatbot acceptability. [9][10][20] The lower chatbot acceptability could be due to a lack of familiarity with the technology and hesitancy towards AI arising from a poor understanding of automated healthcare services. Only 70% of participants were confident in their understanding of chatbots after being given informational material explaining the functions and uses of health chatbots, suggesting that the concept of a computer service mimicking human responses may be difficult to comprehend. Previous studies also identified moderate acceptability of health chatbots, indicating 'AI hesitancy' in the healthcare context. [10] There is a common misunderstanding that AI drives the function of all chatbots. Chatbots can be complex and driven by AI through natural language processing. However, most chatbots are rule-based, i.e. if question A appears, response B should follow, and are thus more successful when the query is straightforward. Users may overestimate the capabilities of chatbots and be hesitant to use them due to the perception that chatbots could in the future replace healthcare professionals.
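To make the "if question A appears, response B should follow" idea concrete, a rule-based chatbot can be sketched as a simple keyword lookup. The rules, replies and function name below are entirely hypothetical and are not taken from any real service; the sketch only illustrates why such systems cope best with straightforward queries.

```python
# Hypothetical keyword -> response rules, for illustration only.
RULES = [
    ("book appointment", "You can book an appointment through the practice reception."),
    ("repeat prescription", "Repeat prescriptions can be requested from your practice."),
]
FALLBACK = "Sorry, I did not understand that. Please contact a health professional."


def reply(query: str) -> str:
    """Return the response of the first rule whose keyword appears in the query."""
    text = query.lower()
    for keyword, response in RULES:
        if keyword in text:
            return response
    # No rule matched: a rule-based system has nothing else to fall back on.
    return FALLBACK


print(reply("How do I get a repeat prescription?"))
print(reply("I have been feeling strange lately"))
```

The second query matches no rule and receives the fallback, which reflects the limitation described above for queries that are not straightforward.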
The lack of empathy from chatbots was seen as a major barrier to acceptability, explaining the preference for a human role in this type of health consultation. This was also reflected in the higher acceptability of the combined GP-chatbot consultation compared with the chatbot alone, demonstrating a potential use for this technology. It is also possible that lower chatbot acceptability, particularly in the UK, is associated with a lack of perceived need for such innovation and a perception that chatbots could influence access to health professionals. [12] There is a general reluctance towards new technologies in healthcare, and future developments of chatbots and voice assistants may influence their acceptability and engagement.
Future theoretical development needs to consider how specific health issues influence the acceptability of chatbots for disease diagnosis. For example, the decision to accept a chatbot for diagnosis may be influenced by factors such as the urgency for diagnosis, the person's wellbeing and how the individual feels about their symptoms. Future research should investigate whether user perceptions of usability differ according to countries and regions. Furthermore, it is vital to assess how various beliefs about particular diseases affect acceptability and engagement with medical chatbots. More comprehensive models of chatbot acceptability and engagement in healthcare can facilitate the understanding of their usefulness and applicability.
In conclusion, at present chatbots are not perceived as a desirable health intervention, and more research is needed to identify the level of interaction that is most acceptable to patients. Primary care services could consider chatbots as a signposting tool that aids health professionals or improves doctor-patient communication for low severity conditions. Future studies need to explore whether chatbots could facilitate disclosure of sensitive information within the healthcare context; however, the perceived confidentiality, privacy and security of the technology need to be ensured. Developers of chatbot-led interventions and services need to be aware of the moderate acceptability of this technology, taking into account users' digital literacy and access.
Funding: The author(s) received no financial support for the research, authorship and/or publication of this article.
Informed Consent: Not applicable, because this article does not contain any studies with human or animal subjects.
Supplemental material: Supplemental material for this article is available online.