An Integrative Review of Measurement Instruments Used to Assess the Stigma That Affects People Who Use Drugs

This article reviews how existing instruments that measure stigma affecting people who use drugs have been developed, which domains of stigma they measure, and which metrics were used to validate them. Using Whittemore and Knafl’s process for conducting an integrative review, six studies published between January 2002 and April 2019 were systematically analyzed. Overall, all the included studies had good methodological quality. The results showed that the instruments measured one or more domains of stigma. However, most of these studies used previously validated instruments designed to measure stigma in mental health and adapted them to fit the context of people who use drugs. Based on these findings, we recommend that more studies exploring the experience of stigma among people who use drugs, and the perceptions of service providers rendering care to them, be undertaken to develop relevant and context-specific stigma instruments.


Introduction and Background
According to the Joint United Nations Programme on HIV/AIDS (UNAIDS, 2016), an estimated 246 million people globally use drugs. Of that number, 12 million people inject drugs, and 1 in 10 is living with HIV. In the United States, 22.5 million people (9.4%) use drugs, particularly heroin, whose use has increased among men and women in most age groups and across all income levels (UNAIDS, 2016). In sub-Saharan Africa, heroin use and injecting drug use are increasing (UNAIDS, 2016). According to the South African Community Epidemiology Network on Drug Use (SACENDU, 2016), people who use drugs (PWUDs) are found in all major cities in South Africa (SA). This is observed across all races and age groups, as well as across social and economic groups. Cannabis is the most common substance abused by patients in the treatment centers in Gauteng (77%) and . In contrast, in the Western Cape, 32% of patients were admitted for methamphetamine (Tik) use, whereas cannabis use was observed in 28% (SACENDU, 2016). Patients admitted for other drugs such as heroin and alcohol are reported, but to a lesser extent: for instance, 18% for alcohol and 12% for heroin in Gauteng, 10% for alcohol and 15% for heroin in KwaZulu-Natal, and 11% for both in the Western Cape (SACENDU, 2016). These figures represent the proportion of drug users who are admitted to treatment centers. However, many people in these populations do not present themselves at treatment centers because of the stigma associated with drug use (Kulesza et al., 2013).
Recent stigma studies focus more on groups deemed to have a high propensity for contracting or spreading infections, such as men who have sex with men (MSM) and female sex workers (FSW) (Baral et al., 2014; Fitzgerald-Husek et al., 2017). However, stigma affecting PWUDs is still under-researched, especially in sub-Saharan Africa and SA. As with HIV, tuberculosis (TB), or mental disorders, stigma toward PWUDs can negatively affect their health and their uptake of health services (Chidrawi et al., 2016; Jain et al., 2013; Kane et al., 2019; Nyblade et al., 2013). This has resulted in research interest in instruments to measure this stigma. These include the instruments of Brown (2011); Palamar et al. (2011) (the exposure to drug users index, the stigma of drug users scale, and the drug use stigmatization scale) in New York; and Ha et al. (2012) (Chinese Courtesy Stigma Scale [CCSS]) in rural China.
However, there seem to be few measurement instruments developed to measure stigma toward PWUDs, especially in Africa and SA. The lack of measurement instruments designed to fit the African context has a negative impact on planning and developing appropriate interventions to reduce stigma for PWUDs. Therefore, this article presents an integrative review of stigma measurement instruments. The review focuses on the main domains of stigma that are measured. The purpose is to review and summarize how existing instruments were developed. In doing so, the review highlights the different domains of stigma being measured as well as how the instruments were validated. The review also highlights the strengths and limitations of the current state of research related to the development of instruments to measure stigma among PWUDs. Finally, recommendations are made for future research on instruments to measure PWUDs' stigma for evidence-based intervention.

Theoretical Framework
In his work on stigma, Goffman (1963) identified three main types of stigma, namely stigma based on physical traits such as a disability; stigma related to character traits such as dishonesty and mental disorder; and stigma related to group traits such as race or religion. Stigma toward PWUDs belongs to the second type, stigma related to the character of PWUDs, as is the case for other key populations. Therefore, this review of stigma measurement instruments is guided by the HIV stigma framework developed by Stangl et al. (2012). This framework highlights key domains for program implementation and measurement. These include the drivers of stigma, the stigma marking (which in this study is PWUD stigma), the stigma manifestations, the stigma outcomes, and the stigma impacts. According to these authors, all these domains are related. However, the intervention or measurement domains include the drivers of stigma, stigma marking, and manifestations. The authors further pointed out that, among the drivers of stigma, the domains for measurement include the fear of contact with PWUDs, social judgment, and societal policies (Stangl et al., 2012; Figure 1). In addition, drivers of stigma such as family members, friends, or the policies in place continue to demonize PWUDs and reinforce their belief that they are what people say they are. As a result, PWUDs tend to self-isolate. Therefore, acting on the drivers, the stigmatized, and the manifestations is important in stigma measurement studies (Stangl et al., 2012).

Drivers of Stigma Toward PWUDs
Stigma can be described as a social process that can be manifested by exclusion, rejection, blame, or devaluation as a result of experience or reasonable anticipation of an adverse social judgment (Hargreaves et al., 2016; Stahlman et al., 2017). The growing key populations, which include men who have sex with men, transgender people, sex workers, and PWUDs, are constantly stigmatized all over the world because of their chosen lifestyle, which some view as "abnormal" according to the "acceptable" social constructs (WHO, 2016). Stigma toward these key populations is fueled by the perception that their practices expose them to a high risk of being infected with HIV or hepatitis (Fitzgerald-Husek et al., 2017; University of California, 2015; WHO, 2016).
Stigma toward PWUDs in particular is driven by many factors, such as stereotyping from the general population, family, and peers; lack of support; and social and structural norms and values that may act as facilitators that perpetuate stigma (Hargreaves et al., 2016). For instance, policies that criminalize drug users may fuel stigma, while those that protect the rights of these individuals may reduce stigma (Stangl et al., 2012). Similarly, the conceptual model applied by the World Psychiatric Association highlights that once a negative characteristic is applied to a person, negative discrimination follows, resulting in more disadvantages that on their own contribute to lower self-esteem and resistance, which in turn increase vulnerability, creating a vicious circle (Sartorius, 2006). According to the International Network of People who Use Drugs (INPUD, 2014b), unknown facts and the criminalization of drug use fuel stigmatizing behavior toward PWUDs. The network indicated that inaccurate understandings of drugs have fed into how people who use drugs are seen. The widely held, generalizing, and unscientific position that illicit drugs are "bad" informs the understanding that PWUDs are bad too. In many communities, drug use is viewed as unacceptable and criminal, and therefore PWUDs are by default stigmatized as deviant criminals (INPUD, 2014b). These conceptions, which PWUDs nurture, result in different manifestations of stigma.

PWUDs' Manifestation of Stigma
PWUDs, just like any other individuals with stigmatized attributes, experience three different manifestations of stigma, which can be considered separate but have correlated constructs. These are internalized, perceived, and experienced stigma. Internalized stigma can be thoughts and behavior resulting from individuals' negative perception of themselves (Birtel et al., 2017; Hargreaves et al., 2016). Stigma can also be perceived; that is, PWUDs believe or expect individuals or society to have a negative attitude toward them (Stahlman et al., 2017). PWUDs can also experience overt or covert discriminating behavior toward them, and this is termed experienced stigma (Birtel et al., 2017). These different manifestations of stigma are observed not only from the public toward PWUDs, but within the PWUD community as well. This is supported by INPUD (2014a), which emphasizes that PWUDs can distance themselves from and stigmatize one another. This stigma among themselves is marked when they do not use the same drugs or use drugs with different regularity. Consequently, stigma may have a negative impact on the stigmatized individuals as a whole and on the society in which they live.

PWUDs' Stigma Outcome
Stigma adversely impacts individual health outcomes as well as educational opportunities, employment, housing, and social relationships (Kane et al., 2019). PWUDs face a double challenge in society: they have to manage the primary symptoms of their condition and face the severe stigma attached to it. Regardless of the way PWUDs experience stigma, it affects them negatively. In addition, because PWUDs do not receive any form of sympathy from the general public, they live in fear of being stigmatized (anticipated stigma). This in turn increases their isolation and alienation from broader society, negatively affecting their physical and mental health and general well-being (INPUD, 2014a).

Aim
The purpose of this integrative review is to describe the stigma domains and attributes of existing instruments that measure stigma toward PWUDs. The review question is formulated based on the PICO criteria: the review population (P) is PWUDs; the index test or phenomenon of interest (I) is the instrument to measure stigma; there is no comparator (C); and the outcome (O) is the different domains of stigma and the characteristics described in the existing instruments. Hence, two review questions are formulated: (a) What are the stigma domains described in the existing instruments to measure stigma among PWUDs? (b) What are the psychometric properties of these instruments?

Design
The researchers used Whittemore and Knafl's (2005) updated methodology for integrative reviews to guide the review process. This framework is suitable for summarizing past empirical or theoretical literature to provide a comprehensive understanding of the different instruments that measure stigma affecting PWUDs. The integrative review, like any other review, began with the identification of the problem and its related concepts, which enabled data extraction from the primary empirical or theoretical sources. This was followed by the literature search strategy, which included the inclusion and exclusion criteria for the relevance of primary sources and the search terms; the data were then evaluated against standard criteria. Once this was done, the selected primary sources were organized into groups and subgroups to prepare for data extraction and reduction. The data were then arranged in a format that would enable the visualization of patterns, relationships, and variation among the groups, as in the iterative method of qualitative research (Madhani et al., 2014; Whittemore & Knafl, 2005).

Literature Search Strategy
The researchers conducted an initial limited search of MEDLINE and CINAHL, followed by an analysis of the text words contained in the titles and abstracts and of the index terms used to describe the articles. A second search using keywords and synonyms was undertaken across all included databases, following each database's search criteria (for instance, MeSH in PubMed, descriptors in PsycARTICLES). The Boolean operators "AND" and "OR" were used to combine all concepts. The search terms for both levels were: People who use drugs AND stigma AND tools; Drugs users AND stigma AND tools. Using these terms combined, the following databases were searched: COCHRANE, PSYCINFO, PUBMED, EMBASE, Science Direct, SCOPUS, SocINDEX, Academic Search Complete, ERIC, SABINET, Health resources, and the World Health Organization (WHO) Global Health Library Regional Indexes. The results were imported into EndNote for further processing. Finally, the reference lists of the key articles identified were hand-searched to identify further relevant articles (Madhani et al., 2014).

Inclusion Criteria
The studies included in this review were primary research studies published in English. All identified studies that directly developed an instrument to measure stigma affecting PWUDs were assessed for relevance based on the title and the abstract. These studies were published between January 1, 2002, and April 29, 2019. This period was chosen because much of the conceptualization of stigma and the development of stigma instruments happened in the 2000s (Holzemer et al., 2007; Link & Phelan, 2001; Parker & Aggleton, 2003). We anticipated that this period would provide relevant articles and recent evidence related to the topic, if they existed. Studies that did not directly develop instruments to measure stigma affecting PWUDs were excluded (for instance, studies that measured an event such as delay in HIV testing among PWUDs and attributed its occurrence to stigma without measuring the stigma itself). There was no restriction regarding the setting or the country where the studies were conducted.

Articles Selection
Overall, 562 articles were found across all databases, and these were imported into EndNote version 7.1 reference manager software. The first step in EndNote was to remove all duplicates. Then all (503) irrelevant articles and non-articles were removed. The remaining 59 articles were exported into rich text format for abstract screening. The abstracts of these 59 articles were assessed based on the PICO criteria, and a further 53 articles were removed. The remaining six articles were exported into an Excel spreadsheet for methodological quality assessment using the Joanna Briggs Institute (JBI) critical appraisal tool.
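The screening counts reported above can be checked with simple arithmetic. The short Python sketch below is illustrative only: the variable names are hypothetical, and because the text does not give a separate count of duplicates, the 503 removed records are taken as reported.

```python
# Counts taken from the article selection described in the text.
records_identified = 562        # articles found across all databases
removed_before_screening = 503  # irrelevant articles and non-articles, as reported
removed_at_abstract = 53        # removed after abstract screening against PICO

# Records left for abstract screening, then for quality appraisal.
abstracts_screened = records_identified - removed_before_screening
studies_appraised = abstracts_screened - removed_at_abstract

print(abstracts_screened)  # 59
print(studies_appraised)   # 6
```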

Data Evaluation
The selected studies were mostly cross-sectional; therefore, the JBI (2017) critical appraisal checklist for cross-sectional studies was used. It is a list of eight questions with four possible answers (Yes, No, Unclear, and Not applicable). Although the tool does not have a scoring system, we decided to give a score of 1 to every question answered Yes, 0 to every question answered No, and 0.5 to every question answered Unclear. Not applicable answers were not given any score. The authors decided that studies with a score between 0 and 2 would be considered poor quality, those with a score between 3 and 4 fair quality, and those with a score between 5 and 6 good quality. In the end, all six articles assessed were of good quality and were included in the review, as indicated in Table 1.
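The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual procedure: the function names and the example answers are hypothetical, and the handling of half-point totals such as 2.5 or 4.5, which fall between the stated bands, is an assumption.

```python
# Point values as described in the text; "Not applicable" items are skipped.
POINTS = {"yes": 1.0, "no": 0.0, "unclear": 0.5}

def appraisal_score(answers):
    """Sum the points for one study, ignoring 'not applicable' answers."""
    return sum(POINTS[a.lower()] for a in answers if a.lower() != "not applicable")

def quality_band(score):
    """Map a score to a quality band (0-2 poor, 3-4 fair, 5-6 good).

    Assumption: half-point totals between bands (2.5, 4.5) fall into the
    lower band; the review does not say how such totals were treated.
    """
    if score >= 5:
        return "good"
    if score >= 3:
        return "fair"
    return "poor"

# Hypothetical answers to the eight checklist questions for one study.
example = ["Yes", "Yes", "Yes", "Unclear", "Yes", "Yes",
           "Not applicable", "Not applicable"]
score = appraisal_score(example)
print(score, quality_band(score))  # 5.5 good
```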

Data Extraction
All six studies with good methodological quality were retained, and data extraction was conducted using the JBI extraction tool. All the studies applied a quantitative approach; therefore, the tool used to extract the data was the JBI-MASTARI. The data extracted included details about the study method, the population, the setting, and the outcomes of significance to the review questions and objectives, that is, the domains of stigma described and the psychometric properties of the developed instruments. The limitations and recommendations were also extracted.

Data Reduction, Representation, and Comparison
At this stage, data were extracted and represented numerically and textually to facilitate systematic comparison. These data were reduced according to the methodology, the outcomes, the limitations, and the recommendations.

Data Reduction According to the Methodology
The methodological data extracted included the participants, the setting, the recruitment plan, sampling, the data collection tool, tool validity, and the data analysis and interpretation (Table 2).

Data Reduction According to the Findings or Outcomes of the Selected Studies
These data were extracted according to the studies' outcomes, then according to the psychometric properties of the final instruments. These are illustrated in Table 3.

Results
The synthesis of the integrative review consists of the overall quality of the studies selected for the review and a discussion of the answers to the review questions based on the analysis.

Analysis of the Review Questions
The researchers in this review sought to answer two questions: (a) What are the stigma domains described in the existing instruments that measure stigma among PWUDs? and (b) What are the psychometric properties of these instruments?

What Are the Stigma Domains Described in the Existing Instruments That Measure Stigma Among PWUDs?
With regard to the first review question, all the selected studies developed their instruments to measure one or more stigma domains. Of the six studies included, two (Luoma et al., 2013; Smith et al., 2016) measured self-stigma in substance users. Four studies (Brown, 2011; Ha et al., 2012; Luoma et al., 2010; Palamar et al., 2011) measured perceived public stigma toward substance users. Smith et al. (2016) measured enacted and anticipated stigma as well as internalized stigma (self-stigma). Brown (2011) also measured social distance, exposure to drug users, and negative thoughts toward substance users. The author grouped social distance and negative thoughts as perceived public stigma. Palamar et al. (2011) also reported on stigmatization and exposure to drug users.

What Are the Psychometric Properties Used to Ensure the Validity of These Instruments?
As far as the second review question is concerned, all articles reviewed presented the statistical tests used to analyze the reliability and validity of the developed instruments as well as the methodology followed.

Measures to ensure reliability and validity of the developed instruments
Statistical tests conducted. Each selected study reported more than one statistical measure used to establish correlations between variables and to validate the developed instrument. For instance, all studies included in this review reported that the instruments they developed were tested for internal consistency. Five studies conducted factor analysis, including exploratory and confirmatory factor analysis (Ha et al., 2012; Luoma et al., 2010, 2013; Palamar et al., 2011; Smith et al., 2016). Four studies tested their instruments for convergent and discriminant validity (Luoma et al., 2010, 2013; Palamar et al., 2011; Smith et al., 2016). Two studies tested for structural validity (Luoma et al., 2013; Smith et al., 2016). Two studies tested construct validity (Brown, 2011; Smith et al., 2016). Another study tested for incremental validity and reported the comparative and incremental fit indices (Palamar et al., 2011). Two studies calculated the root mean square error of approximation (Ha et al., 2012; Palamar et al., 2011). Another study calculated the factor loadings of items (Ha et al., 2012), as indicated in Table 4.
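Internal consistency is commonly summarized with Cronbach's alpha. The passage above does not name the statistic each study used, so the following Python sketch is an assumption offered for illustration only; the function name and the response data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(responses[0])
    items = list(zip(*responses))                      # one tuple of scores per item
    sum_item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

# Made-up responses from five people to a four-item scale (1-5 Likert).
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(responses), 3))
```

When every item ranks respondents identically, as in `[[1, 1, 1], [2, 2, 2], [3, 3, 3]]`, the formula yields an alpha of 1; weaker agreement between items lowers the value.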
Review of the scales used in the studies. Of the six studies selected, five reported that the items included in their scale were reviewed by experts, and one did not state this. For instance, in the study by Luoma et al. (2010), the expert reviewers were people who had previously published an article on substance use in a peer-reviewed journal. Luoma et al. (2013) reported that their items were reviewed by three judges. Smith et al. (2016) reported that their scale was informed by the literature and by ongoing discussion between researchers and providers who serve substance users. Palamar et al. (2011) reported that their item pool was reviewed by two experts in the psychology of risk behavior. Ha et al. (2012) reported that they asked respondents to comment on or provide suggestions regarding the understanding and wording of each item in their scale.

Data Collection Through Qualitative Interviews
Of the six studies selected, Ha et al. (2012) conducted individual interviews with parents of students. Luoma et al. (2013) reported that they conducted two focus group discussions, one with patients in addiction treatment centers and one with health professionals. However, none of the authors reported the analysis or results of those interviews.

The Methodology Presented
The quality of the selected studies was good, as the methodological processes of the selected designs were followed and limitations and recommendations were outlined. This quality was evaluated at the data evaluation stage using the JBI critical appraisal tool. Two criteria in the tool were not applicable to the studies. Therefore, the final score of the tool was out of 6. Hence, the studies by Brown (2011) (Table 5).

The Setting
Overall, most of the selected studies-Brown (2011)

Participants, Sampling, and Sample Size
There were variations in the selection method and the population included in the selected studies. The population in the selected studies included people who use drugs and non-drug users. More specifically, non-drug users were university students (Brown, 2011); an adult internet sample (Palamar et al., 2011); and middle school children and their parents (Ha et al., 2012). Drug users were included in the selected studies regardless of the route of administration, including those in treatment centers. One study (Smith et al., 2016) reported that its population came from an existing parent study that evaluated the efficacy of a group-based HIV prevention intervention for patients enrolled in methadone maintenance therapy (MMT). Therefore, the sample for this particular study consisted of HIV-negative individuals diagnosed as opioid dependent and enrolled in MMT. The same study reported a second sample of participants, also selected from an existing parent study, this one on retention in HIV care. This second group of participants therefore consisted of HIV-positive patients accessing HIV clinical care and/or buprenorphine for opiate replacement therapy (ORT) (Smith et al., 2016). One study reported that its sample consisted of men and women who were receiving residential or outpatient substance abuse treatment (Luoma et al., 2013). Another study reported that participants were males and females in treatment for substance use problems at an outpatient or inpatient addiction treatment program (Luoma et al., 2010). In the selected articles, both probability and non-probability sampling techniques were used to recruit participants. One article reported that the schools included in the qualitative and cross-sectional study were randomly selected. Two studies reported that their participants were recruited from an existing parent study (Smith et al., 2016).
One article clearly reported that convenience sampling was used in the first part of the study, followed by targeted purposive sampling (Ha et al., 2012). One article reported that participants were alerted to the study by staff who were not affiliated with the treatment center, and those who wanted to participate left the group session to complete the questionnaire (Luoma et al., 2010). Another study reported that, to recruit participants, staff arrived at the treatment group and asked for volunteers to take part (Luoma et al., 2013). The overall sample size ranged from 178 to 1,048 across the selected studies.
The selected studies used different designs to develop their instruments. Five used a cross-sectional design (Ha et al., 2012; Luoma et al., 2010, 2013; Palamar et al., 2011; Smith et al., 2016). One used a survey (internet and paper survey). Luoma et al. (2013) used two focus group discussions, one with users and one with those who provide services to them, whereas Ha et al. (2012) used qualitative individual interviews as a data collection method.

Data Collection Tools
The common pattern across the selected studies was the use of different previously validated tools to collect the data in the form of self-report or self-administered questionnaires (Table 2). Of the six studies selected for this review, four reported that they used previously validated instruments for mental health and adapted them for substance use (Brown, 2011; Luoma et al., 2010, 2013; Palamar et al., 2011). Ha et al. (2012) also used previously validated instruments but did not report which scale was used. Smith et al. (2016) developed their scale in parallel with the HIV stigma mechanism scale and the stigma framework.

Ethics, Limitations, and Recommendations
Four studies included in this review stated that the protocol used was approved by their respective review boards (Brown, 2011; Ha et al., 2012; Luoma et al., 2013; Palamar et al., 2011). Two studies did not state whether they obtained approval from their review board prior to commencing their studies (Luoma et al., 2010; Smith et al., 2016). All included studies reported the limitations that could affect the study and some recommendations for improvement.

The Domains of Stigma
The results of this integrative review suggest important points to consider when developing an instrument to measure stigma among PWUDs, namely the domains of stigma to be measured and how to ensure reliability and validity.
Overall, the studies reviewed measured five domains of stigma, and each study measured one or more of them. These domains are self-stigma, perceived stigma, enacted stigma, anticipated stigma, and stigmatization. Another study measured the exposure to drug users index. It is noteworthy that, of all the studies reviewed, only one collectively captured enacted, anticipated, and internalized stigma in its measurement (Smith et al., 2016). One study measured perceived public stigma, stigmatization, and the exposure to drug users index (Palamar et al., 2011). According to Smith et al. (2016), stigma affecting PWUDs happens at different levels that can be classified as individual, social, and structural. The authors stress that the structural and social stigma experienced by PWUDs reflects the idea that society does not give any value to PWUDs and therefore develops laws and policies to penalize them. Such laws and policies that criminalize PWUDs increase the harm that this population may experience rather than reducing it. These actions toward PWUDs lead them to believe that they are not worthy of any consideration as human beings. Thus, PWUDs are perceived as having a "spoiled identity" (Goffman, 1963). They are faced with labeling, stereotyping, loss of status, separation, and discrimination within society (Link & Phelan, 2001; Phelan et al., 2014). This is where these individuals begin to feel and believe that they are not worthy, and they also experience behaviors from others that reinforce the negative perception they have of themselves. Therefore, as a result of social and structural actions, PWUDs begin to experience, anticipate, and internalize different types of stigma. A study measuring stigma among PWUDs should thus not focus on a single domain of stigma, but target as many domains as possible, namely self-stigma, perceived stigma, enacted stigma, and anticipated stigma.
This is in line with Phelan et al. (2014), who stated that because stigma is a macro-level process, its impact on health should be studied intensively and established, covering all domains, as further explained below.

Self-Stigma
Self-stigma or internalized stigma was measured by three studies. This domain is important to measure because stigma is reported to be felt mostly by the affected individuals. It can consist of negative thoughts and behaviors (internalized stigma) resulting from an individual's negative perception of himself or herself (Hargreaves et al., 2016). Internalized or self-stigma among PWUDs means that PWUDs begin to believe the broader views, misconceptions, and generalizations that are made about them. Therefore, they sometimes view themselves as less worthy, which negatively affects their confidence and self-esteem and leads them not to seek health and social services (Stahlman et al., 2017; Stangl et al., 2019). This in turn increases their isolation and alienation from broader society, which they perceive to have a negative attitude toward them. As a result, their physical and mental health and general well-being are negatively affected (INPUD, 2014a). According to Fuster-Ruizdeapodaca et al. (2014), internalized stigma leads to feelings of blame, self-contempt, hopelessness, low self-esteem, and low social support. Therefore, it can be said that self-stigma arises from actual or anticipated public attitudes toward the stigmatized (Hing & Russell, 2017). Kane et al. (2019) highlighted in their review that, in the case of HIV, both internalized and experienced HIV-related stigma have been associated with increased prevalence of HIV, poor health-seeking behavior, and severe depression.

Enacted Stigma
This domain of stigma was measured in one study in combination with anticipated and internalized stigma (Smith et al., 2016). Stigma can also come from discriminating behavior toward the person being stigmatized (enacted or experienced stigma) (Birtel et al., 2017; Stangl et al., 2019). Enacted stigma is individual PWUDs' personal experience of prejudice or discrimination from peers, families, the community, and so on. They may be held responsible for their condition by health service providers or labeled a thief every time something goes missing in the family or the neighborhood (Smith et al., 2016). This stigma experienced by PWUDs can affect their present as well as their future, depending on their ability to cope. They can, for instance, develop or anticipate certain behaviors as a result of their past and present experiences, which has a negative effect on their life satisfaction as well as their mental well-being (Hing & Russell, 2017).

Anticipated Stigma
Stigma can also be anticipated; that is, it can take the form of PWUDs' perception of their devalued status or their expectation of discrimination based on their status (Hargreaves et al., 2016; Smith et al., 2016; Stangl et al., 2019). For instance, they might anticipate that, because of their drug use, they will not be taken seriously if they seek medical attention for a specific health condition. As a result, they end up staying at home with their condition, which in some instances can only get worse (Smith et al., 2016). According to Kane et al. (2019), anticipated, experienced, and internalized stigma have been repeatedly associated with decreased voluntary HIV testing and disclosure of infection.

Perceived Stigma
Perceived stigma is considered to be one of the most important factors that have a negative influence on PWUDs. All types of perceived stigma exert further stress and restrict normal participation in society. This domain of stigma was measured in four studies (Brown, 2011; Ha et al., 2012; Luoma et al., 2010; Palamar et al., 2011). Stahlman et al. (2017) reported that perceived stigma occurs when PWUDs believe or expect individuals or society to have a negative attitude toward them. These individuals or society are part of the general population and form part of the public. Therefore, perceived public stigma is PWUDs' perception of the extent to which the general public may hold negative attitudes, beliefs, and behaviors toward them (Ha et al., 2012; Palamar et al., 2011; Stangl et al., 2019). Consequently, as pointed out above, this stigma results in poor physical and mental well-being. Perceived stigma can also lead to a point where PWUDs actually experience an overt mark of rudeness or discrimination. Kane et al. (2019) reported that perceived stigma can lead to poor adherence to medication, which negatively affects health outcomes. For instance, public stigma and self-stigma negatively affect an individual's predisposition to seek help (Fuster-Ruizdeapodaca et al., 2014). Several aspects of public stigma can be expected to contribute to self-stigma. One aspect is the public characterization of the stigmatized condition (Hing & Russell, 2017).

Stigmatization
This domain of stigma was measured in one study in conjunction with perceived public stigma. According to Palamar et al. (2011), stigmatization comprises all the negative behaviors that society directs toward PWUDs, such as labeling, unfair treatment, criminalization, blame, shame, rejection, and exclusion. It is important to measure this domain because it explains society's responses toward PWUDs, mostly because using drugs for non-medical purposes is considered deviant with regard to certain societal values and beliefs. Palamar et al. (2011) also considered the level of exposure to substance users in their study. They argued that when people are in contact with a stigmatized group such as PWUDs, misunderstandings about the group are clarified and a sense of acceptance is created, which lowers the level of stigma. Brown (2011) also measured the level of exposure to substance users in his study by adapting the familiarity questionnaire originally developed for mental health.

Psychometric Properties
One of the major strengths of a scale development study is the reporting of the different psychometric properties of the instrument. In the studies reviewed, the authors explained the steps they undertook to develop their instruments and reported the statistical tests they conducted to establish the reliability and validity of those instruments.

Reliability
The reliability of a research instrument is the extent to which the instrument yields the same result on repeated measures. That means it can be used by several different researchers under stable conditions, and the result will not change (Heale & Twycross, 2015). Most of the articles reviewed designed their instruments for PWUDs from instruments already validated for measuring stigma in mental health. This is an advantage, as it increases the reliability and validity of the study. However, using such instruments to measure stigma in PWUDs implies that their emic perspective is not taken into consideration, even though the resulting instruments are intended for them. For example, Scott and Wahl (2011) emphasized that substance disorders should be viewed as a mental problem, but in most instances the general public tends to regard them with less sympathy than other mental conditions. The authors further pointed out that this may explain the greater societal acceptance of stigmatizing attitudes and behaviors toward those who use drugs. Therefore, instruments to measure stigma in PWUDs should consider their perceptions, which will make the designed instrument a truer reflection of their experience and increase its reliability.
Validity and reliability increase transparency and decrease opportunities to insert researcher bias. Moreover, when a researcher does not assess the reliability and validity of the research, it becomes hard to describe the impact of instrument error on the variables to be measured or to decide whether to implement the study findings in practice (Heale & Twycross, 2015; Mohajan, 2017). Heale and Twycross (2015) describe three main attributes of reliability: internal consistency, stability, and equivalence. All the articles included in this review established the internal consistency of their instruments.

Internal Consistency
This is the degree to which all aspects of the instrument measure one construct. Internal consistency is assessed using tests such as split-half reliability, the Kuder-Richardson coefficient (KR-20), and Cronbach's alpha. Cronbach's alpha appears to be the test most frequently used in research. It is expressed in the form of a correlation coefficient, which expresses the relationship between the error variance, the true variance, and the observed score. It varies from 0 to 1. The closer the coefficient is to 1, the more reliable the instrument is. A score of zero (0) indicates that there is no relationship between variables (Heale & Twycross, 2015; Mohajan, 2017). Alpha values above 0.7 are often considered acceptable, values above 0.8 are good, and values above 0.9 reflect exceptional internal consistency. In the social sciences, the acceptable range of alpha values is 0.7 to 0.8 (Mohajan, 2017).
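To make the coefficient concrete, the following minimal Python sketch computes Cronbach's alpha from the standard formula α = (k / (k − 1)) × (1 − Σ item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the response data are hypothetical and serve illustration only; they are not taken from any of the reviewed instruments.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents, 4 items scored on a 5-point scale.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```

With these illustrative data the coefficient is high because the four items rise and fall together across respondents, which is what internal consistency is meant to capture.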
All studies reported an acceptable Cronbach's alpha for reliability. These include an overall α = .70 for the Standardized Measure for Substance Use Stigma (Brown, 2011); an overall full-scale α = .86 for the substance abuse self-stigma scale (Luoma et al., 2010); and α = .73 for the Perceived Stigma of Addiction Scale by Luoma et al. (2010). High internal consistency was reported for the substance use mechanism scale, α = .90-.93 (Smith et al., 2016). Palamar et al. (2011) established reliability for each type of drug. For the stigmatization scale, α = .88 for Marijuana; α = .34 for Cocaine; α = .84 for Ecstasy; α = .83 for Opioids; and α = .84 for Amphetamine. For the perceived public stigma scale, the authors reported α = .82 for Marijuana; α = .77 for Cocaine; α = .78 for Ecstasy; α = .81 for Opioids; and α = .79 for Amphetamine. The exposure to drug users index also has acceptable reliability for each of these drugs: α = .79 for Marijuana; α = .79 for Cocaine; α = .77 for Ecstasy; α = .82 for Opioids; and α = .82 for Amphetamine. Ha et al. (2012) reported an internal consistency of .78 for drug use public stigma in their instrument. Moreover, to further establish the reliability of their instruments, some studies also conducted the following tests: split-half (Ha et al., 2012), Kuder-Richardson (Palamar et al., 2011), and the mean inter-item correlation (Luoma et al., 2013; Smith et al., 2016).
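The split-half test mentioned above can be sketched in the same way. The example below is an illustration under stated assumptions, not the procedure of any reviewed study: it splits the items into odd- and even-numbered halves (one common choice among several), correlates the two half-scores, and applies the Spearman-Brown correction 2r / (1 + r) to estimate the reliability of the full-length scale. The helper names and the data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def split_half_reliability(responses):
    """Odd/even split-half reliability with the Spearman-Brown correction."""
    # Half-scores: items 1, 3, 5, ... versus items 2, 4, 6, ...
    odd_half = [sum(row[0::2]) for row in responses]
    even_half = [sum(row[1::2]) for row in responses]
    r = pearson_r(odd_half, even_half)
    # Step the half-test correlation up to the full test length.
    return 2 * r / (1 + r)


# Hypothetical responses: 5 respondents, 4 items scored on a 5-point scale.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
reliability = split_half_reliability(scores)
print(f"Split-half reliability = {reliability:.2f}")
```

Because the result depends on how the items are divided, a different split of the same data can give a somewhat different estimate, which is one reason Cronbach's alpha is often reported alongside it.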

Validity
Validity refers to the degree of appropriateness of the conclusion derived from empirical evidence. Validity applies to a specific purpose or use; an instrument is therefore not valid for all purposes. The validity of research lies in part in the data collection tool and in the steps used to collect the data, analyze them, and report the findings; in brief, in the overall research process (Heale & Twycross, 2015). Many authors describe three main types of validity that are important to ensure the accuracy of a research study: content validity, criterion validity, and construct validity (Brink et al., 2012; Heale & Twycross, 2015; Mohajan, 2017).
Content validity is the level of accuracy of an instrument regarding all the constructs of the instrument (Heale & Twycross, 2015). It ensures that the questionnaire includes an adequate set of items that tap the concept. The more the scale items represent the domain of the concept being measured, the greater the content validity (Mohajan, 2017). According to Mohajan (2017), there is no statistical test to determine whether a measure adequately covers a content area. Content validity usually depends on the judgment of experts in the field. The author further stressed that content validity can be grouped into face validity and logical validity. Among the studies selected for this review, Ha et al. (2012); Luoma et al. (2010, 2013); Palamar et al. (2011); and Smith et al. (2016) reported having their instruments reviewed by experts.
Construct validity is the accuracy of an instrument regarding a specific construct. This means that it is used to make sure that the instrument measures exactly the construct it is intended to measure (Heale & Twycross, 2015). It also includes testing a scale against hypotheses derived from theory about the nature of the construct (Mohajan, 2017). According to Mohajan (2017), construct validity can be tested, on the one hand, by experts who are knowledgeable about the construct, since they are able to decide what an item is intended to measure after careful examination of that item. On the other hand, correlation analysis, factor analysis, and the multi-trait, multi-method matrix of correlations can also be used to test for construct validity, convergent validity, and discriminant validity. The studies reviewed evaluated the construct validity of their scales as follows: convergent validity and discriminant validity (Luoma et al., 2010, 2013; Palamar et al., 2011; Smith et al., 2016), exploratory factor analysis (Ha et al., 2012; Luoma et al., 2010, 2013; Palamar et al., 2011), and confirmatory factor analysis (Ha et al., 2012; Palamar et al., 2011; Smith et al., 2016). Some studies calculated the factor loadings of the items. A factor loading indicates how strongly an individual item is associated with an extracted factor and helps decide which items to eliminate from the instrument (De Vellis, 2003). Other studies reported the incremental fit index (Palamar et al., 2011), the root mean square error of approximation (Ha et al., 2012), and hypothesis testing (Smith et al., 2016).
Criterion validity is the extent to which an instrument relates to other instruments that measure the same variable (Heale & Twycross, 2015). It is used to predict future or current performance. It correlates test results with another criterion of interest (Burns et al., 2017). Criterion validity can be checked in a research study through concurrent validity or predictive validity. Criterion validity was assessed in Luoma et al. (2013).
One of the studies reviewed (Smith et al., 2016) clearly stated the measurement format used, which was a 5-point Likert-type scale with higher scores indicating greater substance use stigma. The measurement format is important in scale development as it quantifies the variables to be measured, as pointed out by De Vellis (2003).

Setting
All articles reviewed mentioned the setting where the research was conducted. However, there was no description of these settings. Five studies were conducted in the United States and one in China. The lack of geographic diversity in studies of the PWUD population could be explained by the fact that, because of stigma, this key population is neglected and underserved, as pointed out by the WHO (2016). Also, the stigma affecting PWUDs is still under-researched, as they are often held responsible for the choices they made. Consequently, programs serving the PWUD population are often small-scale, and coverage of interventions and services for these communities remains low (WHO, 2016). Parker and Aggleton (2003) emphasized the role of social context in the construction of stigma by arguing that stigma operates at the intersection of culture, power, and difference. This is further supported by Holzemer et al. (2007), who stated that the stigma process occurs within three contextual factors: (1) the environment, which includes the cultural, economic, political, legal, and policy environment; (2) the health care system; and (3) the agent, which includes the person, family, workplace, and community. Therefore, context-specific stigma studies that consider PWUDs, their environment, and those who provide services to them are important for planning sound stigma reduction interventions.

Recommendations
Overall, all the studies included had good methodological quality. The steps used to design their instruments were highlighted. The authors reported the various psychometric tests conducted to ensure the reliability and validity of the designed instruments. It is important to note that these existing instruments to measure stigma in PWUDs are key to planning stigma interventions in this population. The domains of stigma measured in these articles were perceived stigma, enacted stigma, anticipated stigma, self-stigma, and stigmatization, as well as exposure to drug users. However, in most of the studies these domains were measured as a single concept. Moreover, the instruments may not be applicable to settings other than the context in which they were developed. For instance, most of the instruments reviewed were developed in the United States, where the economy is strong, compared with Africa, where countries are affected by high levels of poverty. Therefore, there is a need to develop context-specific instruments that are in line with the context and culture of a specific region to address drug use stigma in that region. More specifically, a recommendation is made to develop a South African-specific instrument, as the phenomenon of stigma and drug use may have context-specific differences when culture is considered.
In addition, the results of this integrative review show that most of the articles selected used prevalidated stigma instruments from the field of mental health and adapted them to drug use stigma. This implies that PWUDs' experience of stigma was not sought prior to designing the instruments used. Therefore, it is recommended that a research study be conducted to explore the experience of PWUDs regarding the stigma that affects them; this will provide the information needed to generate the items to be included in the instrument to be developed. This will be the starting point, as the items included in the instrument will be derived from their personal feelings and experiences and will thus give a clear picture of the stigma to be measured in the population. In doing so, PWUDs will be involved from the beginning in the development of the instrument intended to measure their stigma, allowing the inclusion of aspects relevant to their context. Moreover, as the included studies measured only some of the domains of stigma, a recommendation is made to develop a tool measuring all domains of stigma. Finally, since the studies used already existing instruments, a context-specific tool developed from data from both service providers and PWUDs would be more comprehensive.

Limitation
This review only retrieved studies that developed instruments to measure drug use stigma, and specifically studies published in English. This may have limited the number of records retrieved.