A validation study of the Norwegian version of the Health Literacy Questionnaire: A robust nine-dimension factor model

Objective: This study aimed to undertake a rigorous psychometric evaluation of the nine-scale Norwegian version of the Health Literacy Questionnaire (HLQ), based on data from a sample of people with psoriasis. Methods: Cross-sectional survey data were collected from 825 adults with psoriasis who had previously participated in the Norwegian Climate Heliotherapy programme. To investigate the factorial validity of the Norwegian HLQ, confirmatory factor analyses were carried out using Stata. Results: A highly restricted model with no cross-loadings or correlated residuals showed acceptable fit for three of the nine scales (‘Feeling understood and supported by health-care providers’, ‘Appraisal of health information’ and ‘Ability to find good health information’). After minor model adjustments to the other six scales, the one-factor models were acceptable. All scales showed acceptable internal consistency, with Cronbach’s alpha ranging from 0.71 to 0.87. Except for three items, all items had high to acceptable factor loadings. Conclusions: This study of the Norwegian HLQ replicates the original factor structure of the Australian HLQ, indicating that the questionnaire has cogent and independent scales with good reliability. Researchers, programme implementers and policymakers could use the Norwegian version of the HLQ with confidence to generate reliable information on health literacy for different purposes.


Introduction
Each year, 15 million deaths attributed to non-communicable diseases (NCDs) occur among people aged 30 to 69 years, and people from all age groups are vulnerable to the risk factors that contribute to NCDs [1]. UN Sustainable Development Goal target 3.4 states that by 2030, premature mortality from NCDs should be reduced by one third [2][3][4]. Appropriate action to reduce premature mortality related to NCDs is therefore urgent. The World Health Organization (WHO) points to health literacy (HL) as an essential factor in the prevention and control of NCDs throughout all stages of life [1], and defines HL as 'The personal characteristics and social resources needed for individuals and communities to access, understand, appraise and use information and services to make decisions about health. Health literacy includes the capacity to communicate, assert and enact these decisions' [1,5].
Low HL has been associated with increased mortality, hospitalisation, lower use of preventive health-care services, poor adherence to prescribed medication, difficulty communicating with health professionals and poorer knowledge about disease processes and self-management skills among people with NCDs. Research also suggests that strengths and weaknesses in HL abilities may explain observed health inequalities among people of different races and with different educational attainment [6]. These findings, however, rest on different HL measures that vary in content and psychometric robustness across populations and cultures. To describe possible HL challenges accurately within and across groups, cultures and populations, and to develop and evaluate appropriate interventions, it is crucial to have high-quality HL instruments that have undergone modern and robust psychometric testing [7].
Many HL measures exist, and a key distinction among them is whether they are objective or subjective in scope. In objective measurement, people are challenged by standardised test stimuli to measure an underlying trait. In subjective measurement, people self-report their experiences and skills in response to questions. HL measures can also be characterised as uni- or multidimensional. Despite promising recent work developing tools in tandem with definitions, the disconnect between definitions and what the tools actually measure has been a persistent conceptual stumbling block, leading to several conundrums that must be solved for the field to progress coherently [7].
Robust HL instruments are needed for health promotion, disease prevention and disease management in both research and practice. The Health Literacy Questionnaire (HLQ) is a generic, subjective, multidimensional instrument (nine domains) measuring HL, developed in Australia by Osborne et al. [6]. The questionnaire was developed using a 'validity-driven' instrument development approach and is based on the WHO definition of HL noted above. The tool seeks to detect a wide range of HL needs of people in communities and to serve a variety of purposes, from describing the HL needs of the population in health surveys to measuring outcomes of public health and clinical interventions designed to improve HL. The HLQ has been translated and adapted for use in many countries and underpins the work of the WHO National Health Literacy Demonstration Projects (NHLDPs) across Europe and in other regions [8].
The HLQ showed psychometric robustness in the original (English-language) development process, in which nine conceptually distinct areas of HL were defined to assess the needs and challenges of a wide range of people and organisations. Confirmatory factor analysis (CFA) of the European versions of the HLQ (France, Denmark, Germany and Slovakia) supports the original nine-factor model [9][10][11][12][13]. These diverse studies, with disparate patient populations and health-care systems, provide strong evidence that the HLQ captures multiple dimensions of HL. The HLQ was translated and culturally adapted into Norwegian following a standardised protocol [14]. Data from application of the HLQ in Norway are now available for validity testing, and the present paper describes the psychometric properties of the Norwegian HLQ using robust CFA procedures in a large population of patients with psoriasis.

Design and sample
Data from a cross-sectional study of 825 adults with psoriasis who had previously participated in the Norwegian Climate Heliotherapy (CHT) programme were used to investigate the psychometric properties of the HLQ. CHT is one of the therapeutic options available to Norwegian patients with moderate to severe psoriasis. The three-week multidisciplinary programme is provided in the Canary Islands (located in the Atlantic Ocean at 28°N, 16°W) and includes tailored exposure to sunlight ultraviolet B (UVB) radiation, physical exercise, group discussions and comprehensive education.
A total of 1275 individuals (all patients who participated in the CHT programme from 2011 to 2017) were invited to participate in the study. Data were collected from March to August 2017. Invitations were sent by post, together with a consent form and the survey, and a reminder letter was sent six weeks after the initial mailing. Of the 1275 individuals, 825 (65%) returned a completed survey.
In addition to HLQ data, the present paper draws on self-reported sociodemographic information (age, sex, marital status and education) and self-reported health status, co-morbidities, number of CHT treatments and disease duration. Associations between these variables and HL, based on the same sample, are reported elsewhere [15].
The Regional Committee for Medical Research Ethics for Southern Norway (ID 2016/1745) approved the study. In addition, the administrative leaders of the Section for Climate Treatment and the Centre for Privacy and Information Security at Oslo University Hospital approved the study. The study was conducted in accordance with the Declaration of Helsinki.

The HLQ
The HLQ comprises 44 items representing nine independent HL domains. For further information on its development, see Osborne et al. [6]. The domains and their content descriptors are summarised in Table I. Each domain comprises four to six items. The first five domains use response options indicating the level of agreement, ranging from 1=strongly disagree to 4=strongly agree. The remaining four domains ask about capabilities, with responses ranging from 1=cannot do or always difficult to 5=always easy. Domain scores are calculated as the average of the item scores, with higher scores indicating better HL.
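The scoring rule described above (a domain score is the mean of its item scores) can be sketched as follows. This is an illustrative sketch only: the item identifiers, the two-domain mapping and the respondent's answers are hypothetical placeholders, not the HLQ's real item codes. Returning `None` for a domain with any missing item is an assumption made for the example, not the official HLQ missing-data rule.

```python
from statistics import mean

# Hypothetical domain-to-item mapping (illustrative only; the real HLQ
# has nine domains with four to six items each).
DOMAIN_ITEMS = {
    "domain_1": ["q1", "q2", "q3", "q4"],        # agreement scale, 1-4
    "domain_6": ["q5", "q6", "q7", "q8", "q9"],  # difficulty scale, 1-5
}

# Valid response range per domain (agreement part vs. capability part).
DOMAIN_RANGE = {"domain_1": (1, 4), "domain_6": (1, 5)}


def domain_scores(responses):
    """Return the mean item score per domain; higher means better HL.

    Assumption: a domain is scored None if any of its items is missing.
    """
    scores = {}
    for domain, items in DOMAIN_ITEMS.items():
        low, high = DOMAIN_RANGE[domain]
        values = [responses.get(item) for item in items]
        if any(v is None for v in values):
            scores[domain] = None
            continue
        for v in values:
            if not low <= v <= high:
                raise ValueError(f"{domain}: response {v} outside {low}-{high}")
        scores[domain] = mean(values)
    return scores


# One hypothetical respondent.
respondent = {"q1": 3, "q2": 4, "q3": 3, "q4": 2,
              "q5": 4, "q6": 5, "q7": 4, "q8": 3, "q9": 4}
print(domain_scores(respondent))
```

Because the two parts of the questionnaire use different response ranges (1 to 4 vs. 1 to 5), the range check is kept per domain. Scores from the two parts should not be compared on the same metric.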

Data analysis
Analyses were carried out using Stata v16 (StataCorp, College Station, TX). Descriptive statistics were used to describe the sample, and Cronbach's alpha was calculated to assess the internal consistency of each scale.
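For readers unfamiliar with the coefficient, Cronbach's alpha for a scale of k items is k/(k−1) × (1 − sum of item variances / variance of the summed score). The sketch below computes it from scratch. It is not the Stata routine used in the study, and it assumes complete responses, so missing-data handling is left out.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one scale.

    rows: one list of item scores per respondent. This sketch assumes
    complete cases; handling of missing responses is not included.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must answer all k items")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: three items answered identically by four respondents,
# i.e. perfectly consistent items, which gives an alpha of 1.
print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]))
```

When the items are uncorrelated, the variance of the total equals the sum of the item variances and alpha falls to 0. Values of 0.71 to 0.87, as reported for the HLQ scales, lie between these extremes.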
To investigate the factorial validity of the Norwegian version of the HLQ, CFAs were performed. Because the structure of the original HLQ has been described previously [6], the factor structure was specified a priori, and only confirmatory analyses were carried out. Overall, the amount of missing data per HLQ item was low (1.23-3.13%), with an average of 3.33% of respondents having missing values on one of the nine HLQ scales. CFA models were fitted in Stata v16 using maximum likelihood estimation, an estimator that is appropriate for use with ordinal data such as HLQ responses. Model evaluation was based on the chi-square test of model fit and on further fit indices: the root mean square error of approximation (RMSEA), the comparative fit index (CFI), the Tucker-Lewis index (TLI) and the standardised root mean square residual (SRMR). An RMSEA <0.05 was considered a close fit, while RMSEA and SRMR values of up to 0.08 were considered acceptable. The CFI compares the fit of the target model with that of an independence (null) model; the cut-off for good fit was CFI … Furthermore, when fitting the nine one-factor models, correlated residuals were considered as a means of improving model fit, and correlated residuals <0.2 were considered acceptable [16,17]. Fit statistics were obtained with the 'estat gof, stats(all)' command in Stata, and potential model adjustments were based on modification indices.
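As an illustration of how the fit indices above relate to the model chi-square, the following sketch computes RMSEA, CFI and TLI from the chi-square and degrees of freedom of the target and baseline (independence) models. It then applies the RMSEA and SRMR cut-offs stated in the text. It uses common textbook formulas with an (n − 1) denominator for RMSEA, and software packages, Stata included, may differ in such details. The input values in the usage line are hypothetical, not results from this study. No CFI or TLI cut-off is hard-coded.

```python
from math import sqrt


def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """RMSEA, CFI and TLI from target model (m) and baseline model (b).

    Textbook formulas; the (n - 1) denominator in RMSEA is one common
    convention and may differ from a given program's output.
    """
    rmsea = sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))
    # CFI: reduction in misfit of the target relative to the baseline model.
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    cfi = 1.0 - d_m / d_b if d_b > 0 else 1.0
    # TLI (non-normed fit index) compares chi-square/df ratios.
    ratio_b = chi2_b / df_b
    tli = (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)
    return {"rmsea": rmsea, "cfi": cfi, "tli": tli}


def rate_rmsea_srmr(rmsea, srmr):
    """Label fit using the RMSEA/SRMR cut-offs described in the text."""
    if rmsea < 0.05:
        rmsea_label = "close"
    elif rmsea <= 0.08:
        rmsea_label = "acceptable"
    else:
        rmsea_label = "poor"
    srmr_label = "acceptable" if srmr <= 0.08 else "poor"
    return rmsea_label, srmr_label


# Hypothetical values for illustration only.
indices = fit_indices(chi2_m=100, df_m=50, chi2_b=1000, df_b=60, n=501)
print(indices, rate_rmsea_srmr(indices["rmsea"], srmr=0.06))
```

CFI is truncated to the range 0 to 1, whereas TLI is not and can exceed 1 when the model chi-square is smaller than its degrees of freedom.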
To obtain a clearer picture of the data and identify potentially problematic items, both a full nine-factor model and nine one-factor models were fitted to the data. To test whether modifications in the form of correlated within-factor residuals led to significant model improvement, modification indices were obtained using the 'estat mindices' command in Stata.
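A modification index estimates how much the model chi-square would drop if one fixed parameter, such as a within-factor residual covariance, were freed. Freeing one parameter costs one degree of freedom, so whether the added correlated residual significantly improves fit can be checked with a chi-square difference test on 1 df. The sketch below is a minimal version of that test. It assumes ordinary maximum likelihood chi-squares (no scaled-difference correction), and the chi-square values in the usage line are hypothetical.

```python
from math import erfc, sqrt


def chi2_sf_1df(x):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    # For 1 df, P(X > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2)).
    return erfc(sqrt(x / 2.0))


def residual_improves_fit(chi2_before, chi2_after, alpha=0.05):
    """Chi-square difference test for freeing one residual covariance.

    Assumes unscaled ML chi-squares from two nested models that differ
    by exactly one freed parameter (df difference = 1).
    """
    delta = chi2_before - chi2_after
    p = chi2_sf_1df(delta)
    return delta, p, p < alpha


# Hypothetical chi-square values before and after adding one correlated residual.
print(residual_improves_fit(chi2_before=85.4, chi2_after=62.1))
```

Adding residual correlations one at a time, as described for the one-factor models, keeps each comparison at 1 df. The closed-form p-value above then applies directly.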

Results
Sample characteristics are shown in Table II. Women made up 47.3% of the total sample. Age ranged from 21 to 83 years, with a mean age of 53.3 years (SD=12.4 years). About 61% of respondents had up to 13 years of formal education, which is the universal benchmark in Norway.
The median disease duration was 27 years (range 0-77 years). Mean self-assessed health was 3.33 on a scale from 1 to 5, with higher numbers indicating better health. The mean number of co-morbidities was 4.4 (SD=2.5).
Except for three items, all factor loadings were high to acceptable (i.e. >0.5; see column 'Standardised factor loading' in Table III). To understand these results further, the authors (A.K.W., Å.H. and M.H.L.) compared the original English wording of these items with the Norwegian wording and re-examined meaning equivalence at the item and phrase level, with reference to the original English item intent. We could not detect any misunderstandings in the Norwegian wording compared with the English content. One explanation could be that in Norway it is somewhat uncommon to ask anyone other than one's GP for advice and guidance about personal health matters, which may have resulted in somewhat lower observed factor loadings.
When fitting the one-factor models, correlated residuals were added sequentially to the respective models, and each addition significantly improved model fit. Table IV shows the results of the CFA separately for the nine individual HLQ scales.
Model fit without modifications was acceptable for 'Feeling understood and supported by health-care providers', 'Appraisal of health information' and 'Ability to find good health information', whereas one or more correlated residuals were observed in the remaining six scales. Correlated residuals were <0.2 for the majority of the model adjustments. The exceptions were one adjustment (-0.228) in domain 4 ('Social support for health'), three adjustments (0.316, -0.273 and -0.205) in domain 7 ('Navigating the health-care system') and one adjustment (0.280) in domain 9 ('Understanding health information well enough to know what to do'). After the respective model adjustments, the one-factor models were acceptable. All nine HLQ scales showed satisfactory to good internal consistency, with Cronbach's alpha ranging from 0.71 to 0.87 (see Table IV).

Discussion
This study of the Norwegian version of the HLQ supports the original Australian nine-dimension HLQ using robust CFA procedures (both a full nine-factor model and nine one-factor models). The model fit (RMSEA and SRMR) for the full nine-factor model is acceptable, although the CFI and TLI are just below 0.85, somewhat lower than for the original Australian version [6]. However, after model adjustments for some of the scales, the one-factor models were acceptable. Factor loadings were high to acceptable for almost all items, and Cronbach's alpha showed acceptable internal consistency across all scales. Although some of the correlated residuals added in model adjustments were just above 0.20, they are small and are unlikely to indicate a problem with item-factor loadings. The proportion of item non-response was small (≤3.13%), suggesting that the items were well understood and had acceptable content. Before our study, the Norwegian version was translated using rigorous linguistic and cultural adaptation methods in order to produce a high-quality Norwegian version [14,18]. Consequently, the Norwegian version of the HLQ appears to be robust and may serve as a good foundation for measuring HL in Norwegian populations.
Our results also accord with similar studies from other countries investigating the HLQ structure. Despite differences in samples and statistical programmes and procedures, the Danish, German, Slovakian and French versions of the HLQ all demonstrated comparable results [9][10][11][12][13] with regard to the nine-factor model. Although there may be some overlap between scale 7 ('Navigating the health-care system') and scale 8 ('Ability to find good health information') in part 2 of the questionnaire, the HLQ appeared relatively robust. The good fit found across studies is likely a result of the way the HLQ was developed, translated and adapted [6]. The thorough validity-driven development approach included concept mapping workshops and interviews to identify conceptually distinct domains. Questionnaire items were developed directly from consultation data following a strict process that aimed to capture the full range of experiences, from people currently engaged in health care to people in the general population. The psychometric analysis included CFA and item response theory. Cognitive interviews were used to ensure that questions were understood as intended. Items were tested in a calibration sample from community health, home care and hospital settings and then in a replication sample of recent emergency department attendees [6].
When the questionnaire is adapted to a language-specific context, such as Norwegian, the translation and adaptation procedure is highly standardised across countries, including transparent translation steps, the use of item intents and cognitive interviews [6,14]. The development of the HLQ, reported in 2013, gave rise to global interest. In a follow-up study of the psychometric properties of the HLQ that included respondents from a diverse range of community-based organisations, the instrument was found to be highly reliable. All HLQ scales were found to be homogeneous, and the factor structure of the HLQ was replicated. With a small number of exceptions in which factor loadings were not invariant, strict measurement invariance was established across the participating organisations and across respondents' sex, language background, age and educational level [19]. A study evaluating the measurement properties of the HLQ using Rasch analysis among older adults presenting to an emergency department after a fall showed that all nine HLQ scales were unidimensional, with good internal consistency [20]. Given this validity-driven approach, the HLQ is likely to be useful in surveys, intervention evaluations and studies of the needs and capabilities of individuals across countries, ages, sexes, education levels and cultures. Today, the HLQ is used in many studies across countries and health contexts, and it gives valuable information on HL from a broader and more in-depth perspective than traditional and functional measures of HL [15,21-30]. Globally, directed by the WHO, and in many countries, including Norway, HL has become a key to addressing health challenges in communities and countries. However, to act on HL, policymakers and clinicians need meaningful and valid tools that generate data to identify HL needs, challenges and strengths across settings.
Based on the extensive, rigorous and ongoing validity testing behind the HLQ, there is mounting evidence that, in Norway, the HLQ will provide policymakers, clinicians and researchers with good information for public health and policy decisions. As noted in the introduction, the instrument is useful in both health promotion and disease management settings, because HLQ responses can be interpreted salutogenically (higher HL is a resource to be strengthened and maintained for health) and pathogenically (lower HL is associated with disease risk and calls for action). Health-care professionals and health workers may use the instrument to explore what HL support a wide range of target groups might need.
Validity testing of instruments such as the HLQ is comprehensive and ongoing work that uses qualitative and quantitative techniques to build evidence. Strengths of the present study of the psychometric properties of the Norwegian HLQ include the large number of respondents (N=825), the use of robust CFA procedures and a well-developed original questionnaire. A weakness may be the homogeneous sample, which included only individuals with psoriasis. While this patient group may not represent the broad Norwegian population, participants had an average of more than four other co-morbidities and may well be similar to common groups with chronic conditions. The response rate was reasonably high (65%), with good variation in sociodemographic characteristics, and participants lived in different parts of Norway. Although the present study supports the use of the HLQ in Norway, studies in other NCD groups and in the general population are warranted.

Practice implications
There is mounting evidence supporting the interpretation and use of the HLQ in Western cultures, including Norwegian and other European settings. Our study adds to the growing evidence on the measurement properties of the HLQ and indicates that it is a valuable tool for researchers, programme implementers and policymakers to explore HL in diverse populations.