Adequacy of the WHOQoL-BREF to Assess the Quality of Life of Victims of Armed Conflicts

There is a deterioration in the quality of life (QoL) of survivor victims of warlike conflicts. Because there is a need to guarantee the effectiveness of assessment tools for these populations, we studied the adequacy of the World Health Organization Quality of Life Questionnaire (WHOQoL-BREF) to assess the QoL of 1,136 surviving victims of the armed conflict in Colombia. Although this questionnaire has yielded promising results, questions remain about its psychometric suitability for specific populations. We used model modification at the item level, comparisons of models with different factor structures, and dimensionality analysis to address the psychometric problems encountered. Dimensionality analysis using a bifactor model suggests that WHOQoL-BREF total scores might be a more appropriate way of reporting results when model fit adequacy is not reached. Conclusions are offered on the psychometric properties of the WHOQoL-BREF, the evaluation of special populations, and possible strategies to address future questionnaire modifications.


Introduction
Quality of life (QoL) is the ''individuals' perceptions of their position in life in the context of the culture and value systems in which they live and concerning their goals, expectations, standards and concerns'' (World Health Organization [WHO], 1996, p. 5).The study of QoL has gained more relevance in recent years due to a broadening in focus, involving the measurement of health beyond traditional health indicators such as mortality and morbidity (WHO, 1996).However, QoL could have different meanings according to the area of application and this is reflected in the evaluation instruments.Some QoL measurements focus on a single concept, for example, emotional functioning, while others assess individual concepts, such as QoL dimensions (Fayers & Machin, 2001).The latter understand QoL as a multidimensional construct.
One of the most widespread questionnaires on QoL is the WHOQoL-BREF (WHO, 1996).It is the abbreviated 26-item version of the WHOQoL-100 (The WHOQoL Group, 1998;WHO, 1996).Although it was initially designed and used for healthy populations (Hargreaves et al., 2021;Masaeli & Billieux, 2022), it has also been used to evaluate special (Gagliardi et al., 2021;Singh et al., 2022) and clinical populations (Guaracha-Basa´n˜ez et al., 2022;Martini et al., 2022).By special population, we refer to populations not classified as clinical but who have experienced significant contextual situations that drastically affect their QoL, for example, long-term caregivers, refugees, war survivors, and more.
Contradictory results have been reported on whether WHOQoL-BREF is a psychometrically suitable measure for all populations.Lucas-Carrasco (2012) reported successful use of the WHOQoL-BREF to assess psychiatrically and physically affected populations as well as healthy populations.On the contrary, other authors (Benitez-Borrego et al., 2014;Lo´pez Huerta et al., 2017;Ohaeri et al., 2007;Oliveira et al., 2016) found psychometric issues with the WHOQoL-BREF when evaluating similar populations.
Given the current world scenario where multiple armed conflicts affect people's lives, there is a need to ensure tool effectiveness when assessing these populations.More effective studies on the psychological impact of war will support the development of better strategies to improve the victims' QoL.This is particularly relevant in victims of the armed conflict in Colombia (Hewitt-Ramı´rez et al., 2020) where low-to-medium QoL values have been reported (Simancas-Ferna´ndez et al., 2021).
To our knowledge, the WHOQoL-BREF has not been fully assessed in terms of psychometrics for the specific situation of victims of armed conflicts.Thus, this study's aim was to evaluate the use of the WHOQoL-BREF to report QoL in the victims of armed conflict in Colombia.

Psychometric Issues of the WHOQoL-BREF
QoL, evaluated by the WHOQoL-BREF's manual, comprises four domains: physical, psychological, social relations, and environment.The extended 100-item version of the questionnaire (WHOQoL-100) includes two more domains: Level of independence and Spirituality.The factorial structure of the WHOQoL-BREF has been explored because the results persistently suggest a poor model fit.For example, Yao et al. (2002) and Vinaccia et al.(2006) found factorial discrepancies with the four-factor structure when studying Taiwanese and Colombian samples, respectively.Oliveira et al. (2016) found a five-factor model more suitable for a psychiatric sample as the original model raised concerns at the factor and item levels.In addition, these authors excluded Items 12-14 and 24-26 from the WHOQoL-BREF due to low loadings.Ohaeri et al. (2007) found evidence of a five-factor structure, but in this case, without excluding any items.However, after model comparison, the authors concluded that the original four-factor was a better fit.Both Oliveira et al. (2016) and Ohaeri et al. (2007) found clues of a fifth factor analogous to the Level of Independence factor in the WHOQoL-100.Benitez-Borrego et al. (2014) explored the structure of the WHOQoL-BREF in a sample of Spanish speakers, including analyses of cross-country effects.They found the four-factor structure to be the most appropriate, although the composition, that is, the items within each factor, differed from the original configuration.
QoL is usually described as a hierarchical multidimensional construct composed of four dimensions and a general factor (Lee et al., 2005; or six dimensions, if the questionnaire extended version is used), but fit problems of the WHOQoL-BREF could be due to the assumption of dimensionality.It has been described that a bifactorial structure might be suitable in response to WHOQoL-BREF fit issues.Perera et al. (2018) found that a bifactor structure of QoL provided an excellent fit to the data when compared to a correlated four-factor model and a high-order model.
One of the key uses of bifactor models is to determine the adequacy of a total score and what can be gained by scoring the domains; for instance, whether the items are measuring the specific dimensions they are supposed to measure or whether they are just remeasuring the construct (Rodriguez et al., 2016).If adjustment problems persist, the use of total scores could be an alternative to continue using the WHOQoL-BREF even when the factor distribution does not meet psychometric standards.
All this together suggests that a deeper analysis of the properties of the model could benefit the future use of WHOQoL-BREF with special populations.

Model Modifications
Poor model fit and low explanatory power may indicate that the instrument, in this case, the WHOQoL-BREF, is not effectively capturing the goal construct (Weston & Gore, 2006).Model modifications are an alternative that allows researchers to use instruments despite model fit problems.However, model modifications should be made with caution and only when the modifications are theoretically and practically plausible (MacCallum, 1995).We must consider that the WHOQoL-BREF is a well-known and widely used instrument.Post hoc modifications often result in the estimation of data-based models that affect replicability (Weston & Gore, 2006).To avoid this, modifications should go from the least intrusive modification at the item level, to the factorial level, and finally to the dimensional level.
The Modification Index (MI) or Lagrange multiplier is often used for model modification at the item level.MI provides an estimated value at which the model's chi-square test statistic (x 2 ) decreases if a fixed parameter is added to the model and it is freely estimated (Whittaker, 2012).The process is repeated until the addition of any fixed parameter does not significantly reduce the model's x 2 or until none of the statistically significant potential respecifications are theoretically plausible to include in the model (Bollen, 1989).
To address model modifications at the factorial level, multiple models from the literature and/or exploratory analyzes can be set up to compete considering their ancillary fit indices (Jackson et al., 2009).For example, Benitez-Borrego et al. (2014), Ohaeri et al. (2007) and Oliveira et al. (2016) used a combination of exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) to explore new factorial structures of the WHOQoL-BREF and decide whether a different model could be more suitable.The x 2 assesses the fit between the hypothesized model and data from a set of measurement items (the observed variables; Alavi et al., 2020).The Tucker-Lewis Index (TLI) is the proportion of total information explained by a model (Schermelleh-Engel et al., 2003).The Comparative fit index (CFI) corrects a previous basic-flow index (Bentler, 1992).The Standardized Root Mean Square Residual (SRMR) is the square root of the difference between the residuals of the sample's covariance matrix and the hypothesized model (Hu & Bentler, 1999).Finally, the root mean square error of approximation (RMSEA) is the amount of variance not explained by the model-given degrees of freedom (Steiger, 2007).
At the dimensionality level, bifactor models provide an alternative to hierarchical QoL models (Chen et al., 2006), and they are increasingly being used in the psychological field because they provide researchers with a functional and structural representation, and are a valuable psychometric tool (Hammer et al., 2018;Rodriguez et al., 2016).Bifactor models can be used to determine the relevance of including dimension scores in the analysis or just the total score.
The feasibility of using only a total score or including dimension scores when interpreting the measure is assessed using auxiliary indices.Explained common variance (ECV) is the proportion of all common variance explained by the general factor (QoL in this study), while for specific factors it is ECV_S and computes the strength of the factor relative to all explained variance only of the items loading in that specific factor (Stucky & Edelen, 2014).Item explained common variance (IECV) asses unidimensionality at the item level, as it measures the extent to which an item's response accounts for the variation on the latent general dimension alone (Stucky et al., 2013).Overall reliability is informed by omega (v) and for internal reliability omega hierarchical (OmegaH), and relative omega.OmegaH reflects the percentage of systematic variance in raw scores that can be attributed to individual differences in the general factor.Relative omega is OmegaH divided by Omega.Relative omega for the general factor is the percentage of reliable variance in the model due to the general factor.For specific factors, it represents the proportion of reliable variance in the factor that is independent of the general factor (Dueber, 2017).Replicability (H) is a measure that represents the correlation between a factor and an optimally weighted item composite.The percent of uncontaminated correlations (PUC) is the percentage of the covariance.Together with ECV, PUC influences the parameter bias of the unidimensional solution.

The Present Study
We assessed whether the WHOQoL-BREF is psychometrically suitable to evaluate the QoL in victims of armed conflict in Colombia.We planned the modifications following a ''bottom-up'' analysis, aiming to preserve the original model configuration as much as possible.
We started modifications at the item level, up to the factorial structure, and ended in the dimensionality analysis.To avoid data-driven results, we restrained ourselves from omitting items; and only contemplated the alternative model if it showed meaningful improvement when compared with the original one.
We hypothesize that we will find the same WHOQoL-BREF psychometric problems previously reported in the literature.In addition, alternative models will improve the issues at the item and factor level.Finally, we prevent the dimensionality analysis will lean toward the total score of the WHOQoL-BREFF as a more appropriate way of reporting the QoL of the population studied.

Participants
The WHOQoL-BREF was administered to a sample of 1,139 survivor victims of the Colombian armed conflict.The database was handled using listwise deletion, so we only performed the analysis on participants with complete observations.The total of valid cases was 1,136; 805 (70.86%) females and 331 males.At the time of the study, participants' age and education level were as follows: M age = 42.51,SD = 15.39,range = 16-85; 362 (31.90%) had nonformal education, 378 (33.30%) had primary education, 302 (26.60%) had secondary education, and 94 (8.30%) had higher or technological education.

Ethics
This work is part of a research project on psychological well-being, QoL, and social support in the victims of the armed conflict in Colombia.This project was developed by the Neurocognition Research Group (ref. COL00 52341), in the framework of the project financed by CONADI-Universidad Cooperativa de Colombia (Cod.1800).It was also part of a research project on the assessment of memories and psychological disorders associated to trauma in refugees and victims of war, developed by the UCM Research Group on Eyewitness Testimony (ref. 971672), in the framework of the project financed by Santander-Universidad Complutense de Madrid (PR87/ 19-22576).The current project was approved by these institutions' ethics boards.Participants were asked for informed consent, and they were notified that the questionnaire was anonymous and only aggregated data would be reported in publications.

Materials
Participants were assessed with the abbreviated Spanish version of the WHOQoL developed by WHO.This Likert-type cross-cultural questionnaire asks for satisfaction on 26 items.The first two items (Q1 and Q2) refer to general health and general QoL and they should be analyzed independently according to the WHOQoL-BREF's manual.Participants should answer the other 24 items (from Q3 to Q26) concerning the last 2 weeks of their life.QoL scores are distributed among four domains for interpretation purposes: physical health (pain and discomfort, dependence on medication or treatment, energy and fatigue, daily activity, mobility, sleep and rest, and ability to work); psychological health (positive feelings, spirituality, religion and personal beliefs, thinking, learning, memory and concentration, personal image and appearance, self-esteem and negative feelings); social relationships (personal relationships, sexual activity, and social support); and environment (physical security and protection, physical environment, economic resources, opportunities for information/skills acquisition, home, health and social care, and transportation).The results are reported as total scores and scores by domains.

Procedure
We ran a CFA following the structure suggested by the WHOQoL-BREF's instructions to evaluate the adjustment of the original model structure to the current population.Ancillary fit indices indicated whether we had to proceed with model modifications.
We proceed to cross-load items based on MI score values.Modifications were done sequentially and incrementally starting with the highest MI; we kept it if the model fit improved and checked for new modifications.
We compared the modified four-factor WHO structure with the five-factor structure proposed by Oliveira et al. (2016) and the five-factor structure proposed by Ohaeri et al. (2007).The 6-factor structure proposed by Benitez-Borrego et al. (2014) was originally included but resulted in an unidentified model.
Finally, we compared a hierarchical model of QoL with alternative unidimensional and bifactor models.These analyses will determine whether a total score of the WHOQoL-BREF is more suitable than using the factors as subscales.

Data Analysis
We performed the analyses using RStudio (version 4.0.2.) and we use Dueber's calculator tool (Dueber, 2017) to assess the ancillary indices of the bifactor model.
For cross-loading the items, we sought fixed parameters associated with a large MI value (i.e., larger than a x 2 critical value of 3.84, which corresponds with 1 degree of freedom at an a level of .05) to decide whether they would be theoretically plausible to include in the model and be freely estimated (Whittaker, 2012).We ended up with a modified four-factor WHO's structure.
The decision on what model should be conserved was taken according to fit indices and other statistics, conservation of the latent variables, and the avoidance of dataspecific results during model comparisons.We followed the recommendation of using statistical indices that reflect different properties of the structures as a threshold for model fitting (Kline, 2005).The optimal values of model ancillary indices were considered as comparative parameters to determine which of the competing models is the best option (Bentler, 1990;Bollen & Bauldry, 2011;Hu & Bentler, 1998, 1999;Widaman & Thompson, 2003;Yuan, 2005).Indices cutouts indicate a good fit if the TLI value is 0.97 or superior (Schermelleh-Engel et al., 2003); the CFI value is greater than 0.90 (Bentler, 1992); the SRMR is smaller or close to 0.08 (Hu & Bentler, 1999); and the RMSEA value is less than 0.07 (Steiger, 2007) or 0.06 (Hu & Bentler, 1999).
We assessed dimensionality through ancillary indices as suggested by Dueber (2017).Many IECVs with values .0.80 or 0.85 typically indicate unidimensionality (Stucky & Edelen, 2014).If OmegaH is more than 0.80, total scores can be considered unidimensional (Reise et al., 2013).Replicability values more than 0.80 suggest a well-defined latent variable (Hancock & Mueller, 2001).When ECV and PUC are more than 0.70, the relative bias will be slight and the common variance can be regarded as essentially unidimensional (Rodriguez et al., 2016).If PUC values are \0.80,general ECV values are .0.60, and OmegaH of the general factor is above 0.70, the presence of multidimensionality is not severe enough to disqualify the interpretation of the instrument as unidimensional (Reise et al., 2013).

WHOQoL-BREF Adjustment
Descriptive information of the total sample appears in Table 1.Distribution analyses indicated data close to normal (skewness [20.23 to 0.65], kurtosis [21.00 to 0.63]) for the overall sample.Correlations among items were below 0.51, so they did not raise concerns about overlapping items.

Modification Indices Adjustments
We cross-loaded the item Q8 How safe do you feel in your daily life?(MI = 168.09)from the environment domain to the psychological domain; and Q9 How The correlation between psychological and social relations (r = .79)and psychological and physical (r = .86)reduced slightly.The correlation between physical and social relations dimensions remained as it was (r = .72).
Factor physical reliability remained as it was in the original model (a = .672,v = .659).There were small reliability improvements in the Psychological (a = .746,v = .744),and social relations (a = .630,v = .635)factors.Reliability of the environment factor slightly decreased (a = .709,v = .678).
MI values also suggested a relevant error covariance between Q3 and Q4 (MI = 91.05),between Q24 and Q25 (MI = 44.43),between Q23 and Q24 (MI = 63.85),and between Q26 and Q12 (19.08).No modifications are derived from these, but the error covariances will be addressed later in the discussion.

Model Comparisons
Table 3 summarizes the values of the fit indices of the three models to be compared: A five-factor model (5F; Ohaeri et al., 2007), a reduced five-factor model (5F R ; Oliveira et al., 2016), and the four-factor model (4F) derived from the previous section's analysis.
As observed in Table 3, 5F R showed some improvement in all the indices compared to 5F and 4F.However, we must consider that this model has fewer items, and despite its indices and degrees of freedom improved due to fewer items, chi-square was statistically significant.In addition, we did not find strong enough evidence to exclude Q24, Q25, Q12, Q13, Q14, and Q26 from the model.We decided to dismiss the 5F R model as we aimed to preserve the original model as much as possible and the improvement did not justify leaving that many items out of the model.The 5F showed no improvement over the 4F.

Dimensionality Analysis
QoL is described as a hierarchical multidimensional construct composed of four dimensions and a general factor (Lee et al., 2005).The hierarchical, unidimensional and     Note.WHOQoL-BREF = World Health Organization Quality of Life Brief Version.
bifactor models fit were compared.Table 4 and Figures 1, 2, 3 show a summary of the hierarchical, unidimensional and bifactor models.
As shown in Table 4, the bifactor model showed a better fit than the hierarchical and the unidimensional models; suggesting that the QoL model could be improved if a different dimensionality structure is assumed.
To support the use of the unidimensional model in the evaluation of QoL in the present sample, we consulted bifactor measures pertaining to model-based reliability of the total score and the subscale scores.At the item level, 76.92% of the items showed itemexplained common variance (IECV) above 0.50, and 50% of the items showed IECV values above 0.80 suggesting that items were largely accounted for by variation on the general dimension alone (Stucky et al., 2013).The explained of the factor common variance for specific factors (ECV_S) was for the physical ECV_S = 0.12, for the psychological ECV_S = 0.07, for social relations ECV_S = 0.05, and for the environment factor ECV_S = 0.16.Finally, the general factor QoL explained common variance (ECV) was ECV = 0.61.
Reliability analysis showed that 79.30% of the variance of the total scores could be attributed to the individual differences on the general factor (OmegaH = .793).All the reliable variance of the total scores could be attributed to the general factor of QoL (relative omega = .895).Only around 10% (v = .886 2 OmegaH = .793) of the reliable variance in the total scores could be attributed to multidimensionality caused by the factors.
The construct replicability measure showed that the general factor is a well-defined latent variable, H = 0.88.The factors' construct replicability showed less latent variable definition: physical, H = 0.87, psychological, H = 0.45, social relations, H = 0.34, and environment, H = 0.68.
The PUC was above 0.70 (PUC = 0.75), and ECV was close to 0.70 (ECV = 0.61).We can consider the relative bias as slight and the common variance can be considered as essentially unidimensional (Rodriguez et al., 2016).In addition, since the PUC value was lower than 0.80, the general ECV was greater than 0.60 and the OmegaH of the general factor was higher than 0.70; it is suggested that the presence of some multidimensionality is not severe enough to disqualify the interpretation of the instrument as unidimensional (Reise et al., 2013).

Discussion
Our results suggest that the study of QoL in special populations can benefit from considering this construct as unidimensional.Although the literature refers to QoL as a multidimensional construct composed of four latent variables (Lee et al., 2005); we found evidence of the unidimensionality of QoL in the present sample.We also confirmed some psychometric problems with the WHOQoL-BREF previously reported in the literature.By no means do our results oppose the factorial structure reported in the WHOQoL-BREF manual.We suggest that when the factorial structure does not meet the expected fit for the sample, researchers can still benefit from using total scores.Even when the items are not successfully capturing their specific factors information, or factors are not successfully related to the general construct, the items are still effectively evaluating QoL in the form of total scores.
One thing to consider is whether special conditions affect the configuration of the construct.In other words, whether self-perceived QoL might differ depending on, for example, whether the person experiences prolonged trauma, compared with the experience of the general population.We believe this difference could be reflected in dimensionality variations of the latent variables of the construct.
This study was carried out with a population that, we suspect, might have an atypical perception of QoL.This could be why the QoL model found in the general population does not fit our data.The unidimensional behavior of QoL in this study raises questions about the effectiveness of the use of instruments not initially designed for special populations, particularly considering that the factorial structure is part of the interpretation of the results in the form of subdomains.In general, we recommend considering whether subdomain analyses are applicable when using the WHOQoL-BREF in cases where QoL perception alterations are suspected.
Our results may also be worthy of consideration in future revisions of the WHOQoL-BREF.As we confirmed, the reported problems persist at the item and factor level.Some of these are discussed below.
A concern in this study was obtaining sample-specific results that would affect future replicability in similar samples.Although population-tailored models are valuable for understanding variations within populations, replicability and validity remain critical.In our study, we leaned toward preserving the items and the factorial structure, mainly because our modifications did not justify major modifications of a well-established model.
However, we must highlight that the MI values consistently suggested a relevant error covariance between Q3 and Q4, indicating that there is a systematic variance shared by these items that is not explained by any factor.This matches the results found by Lo´pez Huerta et al. (2017) and Theuns et al. (2010) who suggested that further analysis should consider the contribution of Q3 To what extent do you feel that physical pain prevents you from doing what you need to do? and Q4 How much do you need any medical treatment to function in your daily life? to the QoL model.From all the items originally within the physical (Q10, Q15, Q16, Q17, and Q18) domain, these two are the only ones that explicitly ask about medical treatment and pain.The rest refer to energy and capabilities that may have different origins, not only medical, and may be more associated with QoL in populations without specific health needs.The inclusion of item Q26 How often do you have negative feelings such as blue mood, despair, anxiety, or depression? also appears to be debatable because it showed low loadings in all models.Although psychological disorders are relevant for QoL perception, we must consider that there are many potential QoL dimensions and it is impractical to try to assess all these concepts simultaneously in one instrument (Fayers & Machin, 2001).
Based on modification indices, we chose to load Q8 and Q9 into the psychological and environmental domains.As for how this change would affect the theoretical interpretation of the domains, Q8 asks How safe do you feel in your daily life?And it could also be interpreted as psychological as the ''feeling of safety.''Q9 asks How healthy is your physical environment?The item explicitly asks about the physical environment.However, a ''healthy environment'' could also be interpreted as psychological as in a ''working environment.''We cannot leave aside that some properties of the items may change due to translation, evaluation conditions, or cultural differences (Theuns et al., 2010).
Future studies may approach these item-level issues by testing models that leave them out or modify them.We decided to preserve all the items because validation was not the scope of our study and even items with low loadings might be capturing information about the construct.

Conclusion
The use of instruments validated in healthy populations to evaluate special populations is a common practice in Psychology.Although instruments can benefit from the diversity resulting from this practice, special populations may be misrepresented by those measures.QoL responses refer to subjective perceptions that may be affected by life events, such as prolonged stress and trauma.In conclusion, we appeal to the careful use of measures not designed to assess significantly affected populations.This could be of particular interest to researchers who aim to study and help lessen the psychological impact of war.
An alternative to being able to use validated instruments with model fit problems is to consider whether researchers can benefit from using total scores instead of dimensions that may not reflect the behavior of the construct in special populations.We thoroughly considered models that assumed different dimensionalities and given the persistence of item and factor level issues, we explored the viability of using total scores.As we found, there is evidence that researchers can benefit from using total scores.

Table 1 .
Descriptive Information and Items of the WHOQoL-BREF, N = 1,136.
Note.WHOQoL-BREF = World Health Organization Quality of Life Brief Version.a Skewness.

Table 2 .
Modifications to the Original 4-Factor Model Based on Modification Indices With DWLS as Estimation Procedure.The Goodness of Fit Indices Reported Are x 2 (Chi-Squared), TLI (Tucker-Lewis Index), RMSEA (Root Mean Square Error of Approximation), CFI (Comparative Fit Index), SRMR (Standardized Root Mean Square Residual).

Table 3 .
Summary of the Comparison Between Different Factorial Structures With Diagonally Weighted Least Squares (DWLS) as Estimation Procedure.The Goodness of Fit Indices Reported are x 2 (Chi-Squared), TLI (Tucker-Lewis Index), RMSEA (Root Mean Square Error of Approximation), CFI (Comparative Fit Index), SRMR (Standardized Root Mean Square Residual).

Table 5 .
Standardized Loadings of the Unidimensional Model and the Bifactor Solutions With Diagonally Weighted Least Squares (DWLS) as estimation Procedure.