Measuring psychosocial factors in health surveys using fewer items

The present study investigated the possibility of reducing the length of psychosocial scales, while maintaining validity, using easily manageable techniques. Data were collected in 2003–2004 in a Swedish general population; n = 1007, ages 45–69, 50% women. Eight psychosocial scales were reduced from 6–20 to 3–7 items while maintaining Cronbach’s alpha >0.7 and correlation coefficients between full and reduced scales >0.85. Relationships to biomarkers of inflammation, self-rated health and 8-year incidence of coronary heart disease showed no difference between full and reduced scales. It was possible, using these easily manageable methods, to reduce scale length without threatening validity for use in population surveys.


Background
There is a global trend of increased usage of health questionnaires (scales), both in population health surveys and in health care systems, as patient-reported measures. Multipurpose surveys that need several scales on different issues, such as psychosocial factors and health status, tend to be extensive. Long questionnaires with high respondent and administrative burden have a greater risk of measurement error due to mistakes or 'satisficing', that is, respondents finding shortcuts for completing the questionnaire, for example by choosing the same response option for all items, or choosing 'don't know' if such an option is available (Krosnick, 1991; Peytchev and Peytcheva, 2017).
In surveys where a combination of many scales is required to cover all relevant topics, these different scales and all other measures will compete for space in the survey. In such situations, the choice may be between using a short form of a scale or omitting the scale completely. This has led to a demand for shorter scales. For some psychosocial and health status scales there are no official short forms, while for others the existing short forms may not be short enough. Sometimes, researchers have instead produced customised short forms to be used in specific contexts (Lundberg and Nyström Peck, 1995; Schumann et al., 2003) or simply employed single-item screening (Reme et al., 2014).
There is often a reluctance to carry out official updates of established scales. There can be many reasons for this; for example, scales may be seen as a gold standard, or be used for direct comparisons with previous studies that used the same scales. Researchers may also refrain from creating official short forms or revising a scale because a scale needs to be validated in every new form and for every new context it is used in, and this validation procedure may be considered time-consuming (De Vet et al., 2011). Established scales are therefore perhaps not challenged as often as would be beneficial. For example, a scale used in a study conducted 20–30 years after the scale was developed may need to be modernised or revised to fit the language, people and society of today. Though modern, more advanced methods of scale construction may theoretically be the best choice, they require resources not generally available. Hence, less resource-intensive methods and easily manageable techniques also need to be available (Kleka and Soroko, 2018).
The aim of the present study was to evaluate the possibility of reducing the length of some legacy psychosocial scales while maintaining validity, using an easily manageable and available technique.

Study population
The present study is based on the study population from the longitudinal study 'Life conditions, Stress and Health' (LSH), which investigates to what extent psychosocial factors can explain socioeconomic differences in the risk of coronary heart disease (CHD), and whether psychobiological pathways can mediate these associations. Therefore, a large range of psychosocial instruments was included in the LSH study. The participants, aged 45–69 years, were randomly selected from the general population and invited consecutively to reach a study population size of n = 1000, evenly distributed by age and sex. All citizens in the given age range and living in any of the catchment areas of ten primary health care centres in the County of Östergötland in south-east Sweden at the time of enrolment were eligible for invitation. The participation rate was 62.5%, yielding 1007 participants (502 men and 505 women). Informed written consent was obtained from all individual participants included in the study.
The participants visited their primary health care centre in late 2003 and early 2004 for a brief health examination and for the collection of morning blood samples in a fasting state. In addition, they completed several health questionnaires and psychosocial scales. The sample was representative of the population in terms of educational attainment, employment rates, and immigrant status. In 2012 an 8-year registry follow-up study of determinants for incident CHD was performed (Lundgren et al., 2015).

Psychosocial scales
Several different legacy psychosocial scales were included in the present study, covering both intrinsic psychological resources and psychological risk factors (known to be associated with risk of somatic disease, for example CHD), and extrinsic social factors (social support), which together covered a broad spectrum of psychosocial domains. Hence, when referring to the scales as a group in the present study, the term psychosocial is used.
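The scales selected for the present study were: (1) Psychological resources: Mastery, Self-esteem and Sense of Coherence (SOC); (2) Psychological risk factors: CES-D, Vital exhaustion and Cynicism; and (3) Social support: Availability of Social Integration (AVSI) and Availability of Attachment (AVAT). A brief description of the scales follows below.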
Mastery. Mastery aims to capture the ability to cope with 'social experiences that adversely penetrate people's emotional life', and includes items concerning the extent to which one regards one's life chances as being under one's control in contrast to being fatalistically governed (Pearlin and Schooler, 1978).
Self-esteem. Self-esteem is based on Pearlin's adaptation of Rosenberg's Self-Esteem Scale and aims to capture the positiveness of one's attitudes towards oneself (Pearlin and Schooler, 1978).
Sense of coherence. SOC describes a perception that life is seen as meaningful, comprehensible and manageable, which has been shown to contribute to good health despite harsh conditions sometimes present in daily life (Antonovsky, 1987). SOC is available in several versions; in the present study, the 13-item version was used.

CES-D. The CES-D scale was designed to measure depressiveness in a normal population, capturing the major components of depressive symptomatology with an emphasis on the affective component, and is not to be used as a diagnostic tool for clinical depression (Radloff, 1977).
Vital exhaustion. Vital exhaustion measures a person's perception of both mental and physical fatigue. A slight modification of the Maastricht Vital Exhaustion Questionnaire (Appels et al., 1987), based on 19 items instead of the original 21, was used in the present study.
Cynicism. Cynicism is one of the subsets derived from the Cook-Medley Hostility Scale (Barefoot et al., 1989). The items are focused on perception of others' behaviour in everyday life, and aim to capture a negative view of humankind, viewing people in general as unworthy, deceitful and selfish.
AVSI and AVAT. These two scales have their origin in the comprehensive Interview Schedule for Social Interaction (ISSI). AVSI concerns the number of people that the respondent interacts with in daily life during a week and their social network, that is, addresses the quantity of social ties. AVAT (also called emotional attachment or emotional support) concerns what a person's social network provides in terms of emotional support, that is, addresses the quality of social ties (Undén and Orth-Gomér, 1982).

Item reduction procedure
To shorten the psychosocial scales, a reduction procedure was performed based on internal consistency analyses, removing one item at a time. After re-coding of reversed items, Cronbach's alpha coefficients were calculated for each full scale, and thereafter for all reduced versions of each scale after the removal of one item (e.g. a full scale with 10 items yielded 10 different nine-item scales). For each scale, the reduced version that received the highest Cronbach's alpha went on to the next step, where the procedure was repeated (i.e. the nine-item scale from the example above with the highest alpha value was used to yield nine different eight-item scales). The procedure was repeated as long as the reduced scales retained Pearson intercorrelation coefficients ⩾0.85 with the full scales and Cronbach's alphas ⩾0.70, and the absolute difference in Cronbach's alpha compared with the full scale was <0.1. The minimum number of items left in a reduced scale was set to three.
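The stepwise procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the study (the analyses were run in Stata). It assumes complete data with reversed items already re-coded, and it assumes that the procedure stops as soon as the best candidate at a given step breaks one of the three criteria. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a 2-D array (rows = respondents, columns = items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - sum_item_var / total_var)


def reduce_scale(items, min_items=3, min_alpha=0.70, min_r=0.85, max_alpha_diff=0.1):
    """Greedy item reduction; returns the column indices of the retained items.

    Assumption: the loop stops when the best candidate no longer meets
    all criteria, and the last acceptable item set is returned.
    """
    items = np.asarray(items, dtype=float)
    full_alpha = cronbach_alpha(items)
    full_mean = items.mean(axis=1)
    kept = list(range(items.shape[1]))

    while len(kept) > min_items:
        # All versions of the current scale with exactly one item removed.
        candidates = [[c for c in kept if c != drop] for drop in kept]
        # The version with the highest alpha goes on to the next step.
        best = max(candidates, key=lambda cols: cronbach_alpha(items[:, cols]))
        alpha = cronbach_alpha(items[:, best])
        r = np.corrcoef(full_mean, items[:, best].mean(axis=1))[0, 1]
        if alpha < min_alpha or r < min_r or abs(full_alpha - alpha) >= max_alpha_diff:
            break
        kept = best
    return kept


# Synthetic example: 10 items driven by one latent factor (illustrative only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
demo_items = latent + 0.8 * rng.normal(size=(500, 10))
demo_kept = reduce_scale(demo_items)
print("retained items:", demo_kept)
print("alpha, reduced scale:", round(cronbach_alpha(demo_items[:, demo_kept]), 2))
```

Because every step evaluates only the one-item-removed versions of the current scale, a scale with k items needs at most k + (k - 1) + ... alpha calculations, which is what keeps the technique manageable without specialised software.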

Validation measures
Biomarkers. Two biomarkers of inflammation, Interleukin-6 (IL-6) and Matrix metalloproteinase-9 (MMP-9), with known relationships to psychosocial factors (Marteinsdottir et al., 2017), were used in this study. Blood samples were collected from the participants during their visit to their primary health care centre and stored in a biobank. Biochemical analyses were performed as described in Marteinsdottir et al. (2017). In short, IL-6 and MMP-9 were measured in EDTA plasma using the Luminex® 100™ System and the Biotrak ELISA system, respectively. IL-6 and MMP-9 values that were three standard deviations higher than the mean (mean calculated after initial exclusion of any extreme outliers) were excluded prior to analyses.
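The outlier rule for the biomarkers can be illustrated with a short Python sketch. The text does not state how 'extreme outliers' were identified in the initial step, nor whether the standard deviation was also computed after that step, so both are assumptions here: `extreme_cutoff` is a hypothetical, user-supplied threshold, and the mean and standard deviation are both computed on the values that remain after the initial exclusion.

```python
import numpy as np


def exclude_high_outliers(values, extreme_cutoff):
    """Drop values more than 3 SD above the mean.

    Assumption: mean and SD are computed after first setting aside
    values above `extreme_cutoff` (a hypothetical threshold).
    """
    values = np.asarray(values, dtype=float)
    core = values[values <= extreme_cutoff]
    limit = core.mean() + 3 * core.std(ddof=1)
    return values[values <= limit], limit


# Illustrative values only, not study data.
demo_values = list(range(1, 21)) + [200, 10000]
demo_clean, demo_limit = exclude_high_outliers(demo_values, extreme_cutoff=1000)
print(len(demo_clean), "values kept")
```

Computing the mean after the initial exclusion prevents a single extreme value from inflating the limit so much that it masks other high values.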
Self-rated health. Self-rated health (SRH) was measured using the first item from the well-known generic health-related quality of life scale, SF-36 (Sullivan et al., 1995; Ware and Sherbourne, 1992): 'In general, would you say your health is ... excellent/very good/good/fair/poor'. In the following analyses, the item responses were dichotomised into high (excellent/very good/good) and low (fair/poor).
Eight-year incident CHD. Incident CHD was defined as fatal or nonfatal myocardial infarction and/or an event of symptomatic angina pectoris requiring invasive coronary revascularisation (percutaneous cardiac intervention or coronary artery bypass graft surgery). CHD data for the LSH study population during the first 8 years from baseline were obtained from the National Causes of Death Register and the National Patient Register, both from the Swedish National Board of Health and Welfare.

Validation procedure
Construct validity (De Vet et al., 2011) was tested through two hypotheses relating to the reduced scales: (1) the reduced scales would show the same multiple linear regression coefficients regarding biomarkers as the full scales and (2) the reduced scales would yield similar odds ratios to the full scales in logistic regression analyses with SRH (low/high; baseline data) or incident CHD (yes/no; 8-year follow-up data) as the outcome.
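As an illustration of hypothesis (1), the sketch below fits the same linear model twice, once with the full-scale score and once with the reduced-scale score, and prints the two coefficients side by side. It is a hedged example on synthetic data: the covariates used in the study and the test used to compare coefficients are not stated in this section, so the single covariate (age), the standardisation of scores and the helper name `ols_coef` are assumptions made for the example.

```python
import numpy as np


def ols_coef(y, score, covariates):
    """Return (coefficient, standard error) for `score` in y ~ 1 + score + covariates."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), score, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])


# Synthetic data: two noisy measurements of the same latent construct.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
full_score = latent + rng.normal(scale=0.3, size=n)     # stands in for a full scale
reduced_score = latent + rng.normal(scale=0.5, size=n)  # stands in for a reduced scale
age = rng.uniform(45, 69, size=n)
biomarker = 0.4 * latent + 0.01 * age + rng.normal(size=n)

for name, s in [("full", full_score), ("reduced", reduced_score)]:
    z = (s - s.mean()) / s.std(ddof=1)  # standardise so coefficients are comparable
    b, se = ols_coef(biomarker, z, age)
    print(f"{name:8s} b = {b:.3f}  (SE {se:.3f})")
```

The same side-by-side layout applies to hypothesis (2), with logistic regression and odds ratios in place of linear regression coefficients.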
All statistical analyses were performed in Stata version 12 (StataCorp, College Station, TX).
The study design was approved by the regional Ethical Review Board, Linköping, Sweden (02-0324). All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Results
Study population characteristics can be seen in Table 1. Six people declined having their data retrieved from the Swedish National Health Data Registries (in the present study this pertained to data on incident CHD), and were excluded from analyses in the present study. The mean age of the study population was 57 years. The full psychosocial scales comprised between six and 20 items, and the reduced scales between three and seven items (Table 2). The longer the full scale, the larger the number of items removed. Cronbach's alpha values for the full scales varied between 0.75 and 0.93, and for the reduced scales between 0.72 and 0.87. Pearson intercorrelation coefficients between mean values of full and reduced scales varied between 0.86 and 0.98 (Table 2).
Construct validity was tested through two hypotheses and, as hypothesised, no significant differences could be found between full and reduced scales either in (1) their relationships to biomarkers of inflammation (Table 3) or in (2) their relationships to SRH and incident CHD (Table 4). Regression coefficients (Table 3) were in the expected direction (negative for psychological resources/social support and positive for risk factors), given that a higher biomarker level is non-beneficial for health. However, for the biomarker IL-6, regression coefficients were all quite low. For AVAT, the MMP-9 coefficient of -0.46 for the full scale is seemingly quite different from +0.07 for the reduced scale, but the difference was not statistically significant. Odds ratios (Table 4) were consistently >1 for psychological resources/social support versus higher SRH, as well as for risk factors versus incident CHD; and <1 for risk factors versus higher SRH, as well as for resources/social support versus incident CHD. Cynicism had odds ratios of about 1 versus incident CHD, but this was similar for the full and reduced scales.

Discussion
The present study of a middle-aged general population found that it was possible to reduce the number of items in the eight tested psychosocial scales in the given context, using an easily manageable technique, while maintaining the validity of the scales. All relationships found in the construct validity tests were in the directions hypothesised.
The rationale behind the present study was the need, in comprehensive surveys, to be able to investigate several different psychosocial domains even with limited survey space. Previously published results from the LSH study have shown that the psychosocial scales are only weakly to moderately intercorrelated, with domains representing distinctive and complementary constructs. This is also demonstrated by the different relationships found to physical, social and mental dimensions of the SF-36, and to health behaviours (Festin et al., 2017; Nilsson and Kristenson, 2010; Thomas et al., 2020). However, it would be reasonable to assume that the scales might sometimes be used interchangeably, depending on the scope of the research. Even so, being able to include several specific psychosocial domains is very important. Choosing only one or two scales could lead to a loss of vital information on other specific psychosocial domains, and using scales as proxy measures for other psychosocial domains could, in the worst case, lead to wrong conclusions.
Several authors have discussed that a large number of items on a scale will not necessarily ensure better psychometric properties, except for high Cronbach's alpha values, and furthermore that a (very) high Cronbach's alpha may in fact indicate a high level of item redundancy (Boyle, 1991; Streiner et al., 2015). This may be especially apparent in older scales that were developed when questionnaire length and survey fatigue were not such crucial issues as they are today.
Cronbach's alphas for all the full scales were above, or even well above, the recommended level of 0.70 for use at group level (De Vet et al., 2011); hence item reduction was an option for all scales. Since the mathematical structure of Cronbach's alpha leads to longer scales receiving higher alpha values (De Vet et al., 2011; Sijtsma, 2009), it was expected that alpha values would drop the most in scales with many items removed. This was also the case except for CES-D, where the alpha value even increased slightly. This could be due to the fact that the discarded items included all of the positively worded items, which we had found in an earlier study to have low correlations with the total CES-D score, with the negatively worded items, and with health outcomes (Lundgren et al., 2015). Hence, the removal of these items probably led to greater homogeneity amongst the remaining negatively worded items. This assumption is supported by the correlation between full and reduced scales after deleting these four items from the full CES-D scale, which was r = 0.90 instead of 0.86. The item reduction in the present study was performed on a mathematical basis. If the purpose of the present study had been to create official short forms, respondents' views on what is important for them would have informed the choice of items, giving highest priority to items that respondents find relevant and easy to understand and respond to (Nilsson et al., 2007). However, the purpose of the present study was to use a method that is easily available to most researchers.
A limitation of the present study is that the respondents only answered full scales. There is a risk that respondents will interpret and answer individual items in a scale differently in the presence or absence of other items. The study design did not give the respondents the chance to answer any reduced scales per se; that is, the present study does not allow for evaluations of possible context effects. Nor does it allow for a test-retest (reliability test) or a test of longitudinal validity (responsiveness, ability to capture all relevant changes). As the context of the present study is psychosocial determinants for CHD by means of a multipurpose survey and use of data at group level, the validity of using reduced scales in other contexts has not been tested, especially not for use of data at an individual level. While the present study group, drawn from the general population, is a relatively healthy population (as general populations often are), the distribution of scale scores in the study population still covers the possible scale score distributions fairly well (Table 2). One final limitation might be that the generally low regression coefficients for both full and reduced scales regarding IL-6 make it difficult to draw firm conclusions.
A strength of the present study is the range and large number of characteristics and variables in the LSH cohort, which makes it well suited to validation studies; in particular, it offers the possibility of using not only other self-report data (SRH), but also biomedical data that are cross-sectional (IL-6, MMP-9) as well as prospective (CHD outcome 8 years later).
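The dependence of alpha on scale length noted above can be made explicit with the standardised form of Cronbach's alpha. The symbols are introduced here for illustration and do not appear in the original text: k is the number of items and \bar{r} is the mean inter-item correlation.

```latex
\alpha_{\text{std}} = \frac{k\,\bar{r}}{1 + (k - 1)\,\bar{r}}
```

With an illustrative mean inter-item correlation of \bar{r} = 0.3 (not a value from the present study), a 20-item scale gives 6/6.7 ≈ 0.90, whereas a 5-item scale gives 1.5/2.2 ≈ 0.68. Alpha therefore falls when items are removed unless the remaining items are more strongly intercorrelated, which is consistent with the slight increase observed for CES-D once the weakly correlated, positively worded items were dropped.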

Conclusion
In conclusion, we found that it was possible to reduce the number of items in psychosocial scales without threatening the validity for use at group level in population surveys, using an easily manageable technique.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

Supplemental material
Supplemental material for this article is available online.