Violence risk assessment in psychiatric patients in China: A systematic review

Objectives: The aim of this study was to undertake a systematic review of violence risk assessment instruments used for psychiatric patients in China. Methods: A systematic search was conducted from 1980 until 2014 to identify studies that used psychometric tools or structured instruments to assess aggression and violence risk. Information from primary studies was extracted, including demographic characteristics of the samples used, study design characteristics, and reliability and validity estimates. Results: A total of 30 primary studies were identified that investigated aggression or violence; 6 reported on tools assessing aggression while an additional 24 studies reported on structured instruments designed to predict violence. Although measures of reliability were typically good, estimates of predictive validity were mostly in the range of poor to moderate, with only 1 study finding good validity. These estimates were typically lower than those found in previous work for Western samples. Conclusion: There is currently little evidence to support the use of current violence risk assessment instruments in psychiatric patients in China. Developing more accurate and scalable approaches is a research priority.


Introduction
Treatment practice guidelines in many Western countries recommend the assessment of violence risk in individuals with serious mental illness, particularly schizophrenia (American Psychiatric Association, 2004; McGorry et al., 2005; National Institute for Health and Clinical Excellence, 2009). Until late 2012, however, there were no national mental health laws in China and no legislation to mandate the assessment of violence risk in those with a serious mental illness. Article 30 of the new National Mental Health Law now provides for the involuntary commitment of mentally disordered persons provided that two conditions are met: (1) the individual is diagnosed with a serious mental illness and (2) the individual poses a risk to either self or others (Shao and Xie, 2013). Both these criteria must be satisfied through a diagnostic and risk assessment (Zhao and Dawson, 2014). Survey data suggest that China has an estimated 173 million psychiatric patients (Phillips et al., 2009), and 728 hospitals as of 2012 (Chinese Health Statistics Yearbook, 2013). The introduction of this new law will therefore have widespread implications.
Traditionally, mental health professionals in China have tended to rely on unstructured clinical judgment when assessing violence risk in psychiatric patients (Ho et al., 2013). In many Western countries, however, structured assessment instruments are commonly used in both forensic and general psychiatric units for violence risk assessment (Archer et al., 2006; Higgins et al., 2005; Khiroya et al., 2009).
Although these tools are rarely used as the sole basis for clinical decision-making owing to their low positive predictive values (PPVs) (Ryan et al., 2010), the way in which the dangerousness criterion is to be operationalized under China's new mental health law is, at present, unclear (Shao and Xie, 2013), leaving the decision as to how to satisfy this requirement open to the discretion of those undertaking the assessment (Ding, 2014). Determining violence risk from structured clinical judgment (SCJ) tools may represent one approach that mental health professionals in China may adopt to satisfy this criterion. More likely, though, these tools are being introduced as part of a range of measures to improve patient care, and identifying high-risk groups could enable targeted interventions to be introduced and resources to be directed toward those at highest risk of adverse outcomes.
These instruments, however, have mostly been developed and validated in Western samples. Given that China's culture, legislation and psychiatry services are different, it has been argued that these violence risk assessment instruments may be associated with lower predictive validity when used in Chinese psychiatric populations (Yao et al., 2014b). A recent review concluded that some SCJ tools provide high levels of reliability and validity in Chinese samples, particularly the Chinese version of the Historical, Clinical, Risk Management-20 (HCR-20) and the Violence Risk Screening-10 (V-RISK-10) (Gu et al., 2014). However, this review was limited in four ways: (1) it focused on mentally disordered offenders rather than general psychiatric patients and offender populations, (2) it did not consider three popular tools currently used to assess violence risk in China (i.e. the Violence Risk Scale-Chinese version [VRS-C], the Psychopathy Checklist-Revised [PCL-R] and the Brøset Violence Checklist [BVC]), (3) it did not compare the predictive validity of Chinese-developed instruments to Western-developed ones and (4) the review lacked clear inclusion and exclusion criteria.
We have therefore conducted a systematic review of the use of risk assessment instruments for the prediction of violence to synthesize the evidence base for the reliability and validity of such tools in Chinese samples. Our aim was to examine three main areas: (1) the current state of risk assessment research in China, (2) the instruments most frequently used to assess aggression and violence risk in China and (3) whether these instruments are associated with a similar degree of predictive validity to that found in Western samples.

Search strategy
Eight computerized databases were searched for studies published between 1 January 1980 and 3 June 2014: Medline, EMBASE, PsycINFO, the Chinese Journal Fulltext Database (CJFD), the Chinese Biomedical Literature Database (CBM), National Science and Technology Library (NST), WANFANG data and the Database Research Center of the Chongqing Branch of the Institute of Scientific & Technical Information of China (CB-ISTIC). Combinations of the following keywords were used to identify relevant studies: aggression OR violence OR psychopathy AND risk assessment OR prediction. Reference lists were also hand-searched to identify additional studies.

Inclusion and exclusion criteria
Studies were eligible for inclusion if they were conducted in mainland China and examined the reliability and/or validity of a psychometric tool or risk assessment instrument designed to assess or predict the likelihood of either aggression or violence. Although previous work suggests that the inclusion of studies based on the original calibration sample will lead to effect size inflation (Blair et al., 2008), we nevertheless included such studies as we wished to provide an overview of all instruments, including locally developed instruments, currently used in psychiatric practice in China.
Studies that used violence risk assessment instruments to estimate the prevalence of violence, but did not report data on the reliability or predictive validity of these instruments were excluded (Chen and Zhou, 2012). Where multiple publications used overlapping samples, we included only the study with the largest sample size to avoid double-counting.

Data extraction
Data were extracted by two researchers working independently (J.Z. and X.Z.) using a standardized form, which included information on demographic and descriptive features of the sample, and reliability and validity statistics from each study. Measures of reliability included Cronbach's alpha, the intraclass correlation coefficient (ICC), test-retest reliability, split-half reliability and the inter-rater consistency coefficient. Measures of validity included the area under the receiver operating characteristic curve (AUC), sensitivity, specificity and positive and negative predictive values (PPVs and NPVs). No one measure of reliability or validity was preferred; rather, a combination of statistics should be examined as part of any judgment about the performance of any tool. Additionally, for locally developed tools, information on item content was also extracted. If there were any uncertainties, these were clarified in consultation with one of the co-authors (K.W.).
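To make the extracted validity statistics concrete, the following minimal sketch shows how sensitivity, specificity, PPV and NPV follow from a 2 x 2 table of predicted risk against observed violence. The class name, function name and counts are hypothetical and chosen for illustration only; they are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cross-tabulation of instrument prediction against observed outcome."""
    tp: int  # judged high risk, later violent
    fp: int  # judged high risk, not violent
    fn: int  # judged low risk, later violent
    tn: int  # judged low risk, not violent


def validity_estimates(c: ConfusionCounts) -> dict:
    """Return the four validity statistics as proportions between 0 and 1.

    Raises ZeroDivisionError if any margin of the table is empty.
    """
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # violent patients correctly flagged
        "specificity": c.tn / (c.tn + c.fp),  # non-violent patients correctly cleared
        "ppv": c.tp / (c.tp + c.fp),          # flagged patients who were violent
        "npv": c.tn / (c.tn + c.fn),          # cleared patients who were not violent
    }


# Hypothetical sample of 300 patients, 40 of whom were later violent.
example = ConfusionCounts(tp=30, fp=70, fn=10, tn=190)
estimates = validity_estimates(example)
for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
```

In this invented sample the instrument detects three quarters of violent patients, yet fewer than one in three patients flagged as high risk goes on to be violent, which illustrates why a combination of statistics is more informative than any single measure.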

Characteristics of the included studies
The initial search identified a total of 528 records including 481 in Chinese and 47 in English. Another 8 records (6 in Chinese and 2 in English) were identified after searching reference lists of other reviews. Following application of the inclusion criteria, the number of potentially eligible records was reduced to 89 (64 in Chinese and 25 in English). When exclusion criteria were applied, the final number of records included in this review was reduced to 30 (22 in Chinese and 8 in English) (Figure 1). Studies were most commonly excluded because they were not concerned with the assessment of violence risk.

Tools for aggression assessment
Six of the 30 primary studies assessed the reliability and validity of tools measuring aggression (Table 1). The instruments used for the assessment of aggression were the Modified Overt Aggression Scale (MOAS; k = 5; 83%) and a locally developed instrument (k = 1; 17%). Half of these studies were conducted in mixed adult forensic and general psychiatric samples (k = 3).

Reliability and validity of tools for the assessment of aggression
None of the six included studies of aggression tools reported information on reliability or validity. Rather, they all investigated risk factors associated with aggression. Substance abuse was most commonly identified as a significant risk factor for aggression in these studies (k = 3; 50%), followed by a previous history of aggression and/or violence (k = 3; 50%), positive symptomatology (k = 2; 33%) and impulsiveness (k = 2; 33%). Demographic factors, such as young age, unemployment and early adverse experiences, were also described as risk factors in three studies.

Reliability and validity of tools for the assessment of violence risk
Of the 24 included studies, 15 (63%) reported information on reliability, which was assessed using the following statistics: Cronbach's alpha, the ICC, test-retest reliability, split-half reliability and the inter-rater consistency coefficient. Most of the locally developed instruments did not report reliability and validity statistics. A summary of these statistics is provided in Table 3.
Using Cronbach's alpha, there was evidence of good reliability for five instruments: the BVC, PCL-R, HCR-20, V-RISK-10 and the LSI-R, and excellent reliability for two instruments: the VRS and HCR-20. According to the ICC, there was evidence of good reliability for the VRS and HCR-20, and excellent reliability for the V-RISK-10, the PCL-R, the VRS and the BVC. Only one study using the HCR-20 reported the test-retest reliability.
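For readers less familiar with Cronbach's alpha, the sketch below computes it from item-level scores using the standard formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the ratings are hypothetical; the data are invented for illustration and do not come from any included study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one per patient, one column per item)."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two patients")
    # Sample variance of each item across patients.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of each patient's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: five patients scored 0-2 on four items.
ratings = [
    [2, 1, 2, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 2],
    [2, 2, 2, 2],
    [0, 1, 0, 0],
]
alpha = cronbach_alpha(ratings)
print(f"Cronbach's alpha = {alpha:.3f}")
```

The resulting value would then be compared against whichever interpretive cut-points a review adopts.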
Information on validity was reported in 12 studies (50%) using the following statistics: AUC, sensitivity and specificity and positive and negative predictive values. Validity statistics are also summarized in Table 3. Using the AUC, there was evidence of poor validity for the V-RISK-10, the VRS and the HCR-20 over a 12-month follow-up period. There was evidence of moderate validity for the BVC, V-RISK-10, the HCR-20 over a 6-month follow-up period and the CRAT-P.
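The AUC can be read as the probability that a randomly selected violent patient received a higher instrument score than a randomly selected non-violent patient, with ties counted as one half. The following minimal sketch computes it by direct pairwise comparison; the function name and scores are hypothetical and serve only to illustrate the statistic.

```python
def pairwise_auc(violent_scores, nonviolent_scores):
    """AUC as the share of violent/non-violent pairs ranked correctly.

    Ties contribute 0.5, so an uninformative instrument yields 0.5.
    """
    if not violent_scores or not nonviolent_scores:
        raise ValueError("both groups must contain at least one score")
    concordant = 0.0
    for v in violent_scores:
        for n in nonviolent_scores:
            if v > n:
                concordant += 1.0
            elif v == n:
                concordant += 0.5
    return concordant / (len(violent_scores) * len(nonviolent_scores))


# Hypothetical total scores on a structured instrument.
violent = [24, 18, 30, 15]
nonviolent = [10, 18, 12, 20, 8]
auc_value = pairwise_auc(violent, nonviolent)
print(f"AUC = {auc_value:.3f}")
```

Because only the ranking of scores matters, the AUC is unaffected by the base rate of violence in the sample, which is one reason it is usually reported alongside predictive values.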

Discussion
As China invests more into mental health care, increasing attention will be paid to reducing adverse outcomes in patient groups. One approach taken in many countries is to introduce the routine use of violence risk assessment instruments to assist in identifying high-risk groups and managing violence risk more actively. In addition, the new 2012 National Mental Health Law may also increase the use of such instruments as an aid to clinical decision-making regarding involuntary treatment in hospital. This systematic review therefore investigated the reliability and validity of structured violence risk assessment instruments in China. A total of 15 risk assessment tools were identified, 7 involving instruments originally calibrated and validated in Western samples and 8 developed in Chinese populations. Data on both reliability and validity of these instruments were extracted from 24 studies involving 15,681 participants.
Results of this review have three main implications for research into the assessment of violence risk in China and for clinical practice. First, although Western-developed instruments, such as the HCR-20, demonstrated good reliability in this review, predictive validity estimates were often noticeably lower than those found in Western samples (Singh et al., 2011), suggesting there is little evidence to support the use of current instruments for the prediction of future violence risk in China at present. The lower predictive validity of these instruments observed in this review is particularly important as it suggests that these instruments should not be used as sole determinants of eligibility for involuntary detention under Article 30 of China's new Mental Health Law or for other medico-legal decisions in patients.
The lower predictive validity of existing instruments may stem from the inclusion of items within these violence risk assessment schemes that have little salience for the prediction of risk in Chinese samples. Previous work, for example, suggests that Asian Americans score significantly lower on a number of the historical items on the HCR-20 as compared to Caucasian patients. Instead, violence in Asian American psychiatric patients was more strongly associated with scores on the clinical subscale of the HCR-20 (Fujii et al., 2005). Further work suggests that established violence risk assessment instruments cannot distinguish between violent and nonviolent offenders of Middle Eastern descent at greater than chance levels, as measured by the AUC (Långström, 2004). The improvement of violence risk assessment in China may therefore benefit from the development of evidence-based instruments based on local research. Furthermore, the sheer scale of psychiatric patient numbers in China suggests that scalable instruments need to be developed, rather than those that require external training, take considerable time to implement and require money to use.
A number of investigations included in the review assessed validity using correlation coefficients against tools that assess aggression or psychopathy. These are of limited interest as the violence risk assessment tools considered in this review are intended to be used to predict more serious outcomes. Most included studies investigated predictive validity using the AUC. Predictive validity, however, can be broken down into two components, discrimination and calibration, and the AUC captures only discrimination. Given that a goal of violence risk assessment is to correctly stratify individuals into risk categories, the calibration ability of a risk assessment instrument is arguably of greater concern (Cook, 2007). As there are presently no guidelines as to how to combine aspects of discrimination and calibration (Witt et al., 2015), the assessment of predictive validity should employ statistics that adequately capture both discrimination and calibration (Singh, 2013). Recent work, for example, suggests that, at the very least, information on a combination of predictive validity estimates, including PPVs and NPVs, sensitivity and specificity, numbers needed to detain (NNDs) and numbers safely released (NSRs), should be reported (Fazel et al., 2012). PPVs represent the proportion of patients predicted by an instrument to be at risk of violence who ultimately do commit a violent act while NPVs indicate the proportion judged at low risk of violence who do not commit a violent act (Singh, 2013). Greater adherence to existing guidelines for the reporting of clinical risk prediction research may also help to improve the reliability and applicability of work in this area (Bouwmeester et al., 2012).
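The additional statistics mentioned above can be sketched as follows. The definitions used here are assumptions based on the commonly cited formulations, namely NND = 1/PPV and NSR = 1/(1 - NPV) - 1, and should be checked against the reporting guideline being followed. The calibration check, which compares the rate of violence expected for each risk category with the rate observed, is likewise only one simple way of examining calibration. All function names, category labels and figures are hypothetical.

```python
def number_needed_to_detain(ppv: float) -> float:
    """Patients judged high risk who would be detained per violent act prevented.

    Assumed definition: the reciprocal of the PPV.
    """
    if not 0 < ppv <= 1:
        raise ValueError("ppv must be in (0, 1]")
    return 1 / ppv


def number_safely_released(npv: float) -> float:
    """Patients judged low risk released before one violent act would occur.

    Assumed definition: 1 / (1 - NPV) - 1.
    """
    if not 0 <= npv < 1:
        raise ValueError("npv must be in [0, 1)")
    return 1 / (1 - npv) - 1


def calibration_gaps(categories: dict) -> dict:
    """Observed minus expected violence rate for each risk category.

    categories maps a label to (expected_rate, n_patients, n_violent).
    """
    return {
        label: n_violent / n_patients - expected_rate
        for label, (expected_rate, n_patients, n_violent) in categories.items()
    }


# Hypothetical predictive values and risk categories.
nnd = number_needed_to_detain(0.30)
nsr = number_safely_released(0.95)
gaps = calibration_gaps({
    "low": (0.05, 200, 10),
    "moderate": (0.20, 100, 30),
    "high": (0.50, 50, 15),
})
print(f"NND = {nnd:.1f}, NSR = {nsr:.1f}")
for label, gap in gaps.items():
    print(f"{label}: observed minus expected = {gap:+.2f}")
```

In this invented example, observed violence rises from the low to the moderate category, so the instrument retains some discriminative ability, yet the moderate and high categories are observed to have the same rate of violence and the high category is over-predicted by 20 percentage points. This is the kind of miscalibration that the AUC alone would not reveal.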
Finally, we were unable to undertake a meta-analytic summary of the predictive validity of these instruments as the information required to calculate pooled AUCs was not routinely reported in the studies included in this review. Although such a summary would allow for comparison with the performance of these tools in Western samples, our focus in this paper was to evaluate the extent to which these tools could be used as a basis to justify involuntary treatment under China's new Mental Health Law and for clinical decision-making in Chinese settings. A comparison of the predictive performance of these instruments between countries is beyond the scope of this paper.

Conclusion
Although a large number of violence risk assessment instruments are currently available to assist in the prediction of violence risk, these have almost entirely been developed and validated in Western samples. Presently, there is little evidence to support the use of these Western-developed violence risk assessment instruments in China. The assessment of violence risk in this population should be sensitive to a range of factors, including ease of use, cost and possibly risk factors unique to Chinese populations. The development of more accurate and scalable approaches should therefore improve the assessment of violence risk in psychiatric patients in China, and is urgently required.

Declaration of interest
The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.

Table note: Interpretive bands for Cronbach's alpha (a): 0.5 ≤ α < 0.6; 0.6 ≤ α < 0.7; 0.7 ≤ α < 0.9; α ≥ 0.9. References for the interpretive cut-points for the reliability and validity statistics used in this table: a Kline (2000). b Landis and Koch (1977). c Swets (1988).