Sense of Purpose in Life and Subsequent Physical, Behavioral, and Psychosocial Health: An Outcome-Wide Approach

Purpose: Growing evidence indicates that a higher sense of purpose in life (purpose) is associated with reduced risk of chronic diseases and mortality. However, epidemiological studies have not evaluated whether change in purpose is associated with subsequent health and well-being outcomes. Design: We evaluated whether positive change in purpose (between t0; 2006/2008 and t1; 2010/2012) was associated with better outcomes on 35 indicators of physical health, health behaviors, and psychosocial well-being (at t2; 2014/2016). Sample: We used data from 12,998 participants in the Health and Retirement Study, a prospective and nationally representative cohort of U.S. adults aged >50. Analysis: We conducted multiple linear, logistic, and generalized linear regressions. Results: Over the 4-year follow-up period, people with the highest (versus lowest) purpose had better subsequent physical health outcomes (e.g., 46% reduced risk of mortality (95% CI [0.44, 0.66])), health behaviors (e.g., 13% reduced risk of sleep problems (95% CI [0.77, 0.99])), and psychosocial outcomes (e.g., higher optimism (β = 0.41, 95% CI [0.35, 0.47]), 43% reduced risk of depression (95% CI [0.46, 0.69]), lower loneliness (β = −0.35, 95% CI [−0.41, −0.29])). Importantly, however, purpose was not associated with other physical health outcomes, health behaviors, and social factors. Conclusion: With further research, these results suggest that sense of purpose might be a valuable target for innovative policy and intervention work aimed at improving health and well-being.

Cognitive function problem. The HRS cognitive function assessment 4,5 was adapted from the modified Telephone Interview for Cognitive Status (TICS-M). The assessment is a 27-point scale that includes an immediate and delayed 10-noun free recall test, a serial 7 subtraction test, and a backward counting (from 20) test. This assessment tool has been shown to have high sensitivity and specificity for cognitive impairment in older adults; cutpoints were derived from previous research conducted on cognitive impairment in HRS. 6,7 Respondents scoring 0-11 on the 27-point scale were classified as having "cognitive impairment," while those scoring ≥12 were classified as "normal" (the reference group). More detailed information about the cognitive assessments can be found in HRS reports. 4,5
Physical functioning limitations. Physical functioning limitations were assessed using items adapted from scales developed by Rosow and Breslau (1966), Nagi (1976), Katz, Ford, Moskowitz, Jackson, and Jaffe (1963), and Lawton and Brody (1969). [8][9][10][11] Participants were defined as having physical function limitations if they reported >4 limitations with physical functioning (i.e., walking several blocks, climbing one flight of stairs, pushing or pulling large objects, lifting or carrying 10 pounds, getting up from a chair, reaching or extending arms up, stooping, kneeling, or crouching, sitting for 2 hours) or activities of daily living (i.e., walking across a room, dressing, eating, bathing, getting in/out of bed, using the toilet, picking up a dime).
Those reporting <4 limitations were considered "normal" in the physical function domain and served as the reference group. This criterion was determined by identifying the physical function score where 75% of participants could be considered as having healthy physical function at baseline.
running, swimming, aerobics), moderate (e.g., gardening, dancing, walking at a moderate pace), and light (e.g., vacuuming, laundry) activities over the past 12 months. Response categories included daily, >1x/week, 1x/week, 1-3x/month, hardly ever or never.
Sleep problems. Participants completed the 4-item Jenkins Sleep Questionnaire, a validated and widely used screening instrument for sleep complaints, querying insomnia symptoms. 14 Response categories included "most of the time," "sometimes," and "rarely or never." Having sleep problems was defined as reporting: "most of the time" for any of the three negatively worded items (e.g., "How often do you have trouble falling asleep?") and "rarely or never" to the one positively worded item (i.e., "feel really rested when you wake up in the morning").
Participants were considered unhealthy (i.e., having sleep problems) if they reported one or more sleep problems. The sleep questionnaire was administered only every other wave; thus, sleep data were imputed for half of the sample. Imputed and complete-case analyses showed similar estimates.
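The classification rule above can be illustrated with a short sketch. This is not the authors' code: the function name, argument layout, and response strings are hypothetical, and the sketch assumes that a problem response on any single item is sufficient (the "one or more" rule).

```python
NEGATIVE_PROBLEM = "most of the time"  # problem response on negatively worded items
POSITIVE_PROBLEM = "rarely or never"   # problem response on the positively worded item


def has_sleep_problems(negative_items, rested_item):
    """Return True if one or more of the four items indicates a sleep problem.

    negative_items: responses to the three negatively worded items
    rested_item: response to the "feel really rested" item
    (hypothetical argument layout, assumed for illustration)
    """
    n_problems = sum(resp == NEGATIVE_PROBLEM for resp in negative_items)
    n_problems += rested_item == POSITIVE_PROBLEM
    return n_problems >= 1
```

For example, a participant who answers "sometimes" to all three negatively worded items but "rarely or never" to the rested item would be classified as having sleep problems under this reading.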

Psychological well-being
Positive affect. Positive affect was measured (in 2006 only) with a 6-item scale originally developed for use in the Midlife in the United States Study. [15][16][17] The scale assessed how often the participant felt "cheerful," "in good spirits," "extremely happy," "calm and peaceful," "satisfied," and "full of life" over the past 30 days. Response categories ranged from 1 (all of the time) to 5 (none of the time). Responses were reverse scored, so that a higher score indicated higher positive affect. An overall score was derived by averaging responses across all 6 items (α=0.91 in 2006, range=1 to 5). After the 2006 wave, the HRS switched to a more expansive measure of positive affect based on the Positive and Negative Affect Schedule (PANAS-X). 18 It included the following 13 items: determined, enthusiastic, active, proud, interested, happy, attentive, content, inspired, hopeful, alert, calm, excited. An overall score was derived by averaging responses across all 13 items (α=0.92, range=1 to 5). A limitation of this study is that affect was measured in a different way during only the first wave of the study. However, scores were standardized and both the prior and current measures of affect operate very similarly (e.g., similar correlations with other variables, similar distributions, etc.).
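The scoring steps described above (reverse scoring, averaging, and standardizing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the use of the population standard deviation for standardization is an assumption, since the text does not say which form was used.

```python
from statistics import mean, pstdev


def reverse_score(response, low=1, high=5):
    """Flip a Likert response so that a higher value means more of the construct."""
    return low + high - response


def scale_score(responses, low=1, high=5):
    """Average the reverse-scored items for one participant."""
    return mean(reverse_score(r, low, high) for r in responses)


def standardize(scores):
    """Convert raw scores to z-scores (mean 0, standard deviation 1).

    Assumption: population standard deviation; the source does not specify.
    """
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]
```

The same reversal applies to the 13-item measure. Standardizing places scores from the 6-item and 13-item versions on a common z-score scale, which is presumably why standardization is mentioned alongside the change in measure.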

Life satisfaction.
Life satisfaction was assessed with the 5-item Satisfaction with Life Scale (e.g., "In most ways my life is close to ideal"). 19 The scale has shown excellent psychometric properties in prior work. Response categories ranged from 1 (strongly disagree) to 7 (strongly agree). An overall score was derived by averaging responses across all 5 items, with a higher score indicating higher life satisfaction (α=0.88, range=1 to 7).
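The α values reported for each scale are presumably Cronbach's alpha, the usual internal-consistency coefficient (the text does not name it). A minimal sketch of that coefficient, with a hypothetical function name and input layout, is:

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of participants, each a list of k item responses."""
    k = len(item_scores[0])
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the participants' total scores.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

Alpha approaches 1 when items rise and fall together across participants, and it is lower when items vary independently of one another.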

Optimism.
Optimism was assessed using the Life Orientation Test-Revised (LOT-R). The measure has good discriminant and convergent validity, and good reliability. 20 Using a 6-point Likert scale (from 1 (strongly disagree) to 6 (strongly agree)), participants were asked the degree to which they agreed with six statements such as, "In uncertain times, I usually expect the best." After reverse coding negatively worded items, all items were averaged together to create a composite score, with higher scores indicating higher optimism (α=0.75, range=1 to 6).

Mastery.
Mastery was assessed with 5 items derived from Lachman and Weaver (1998), and rated on a scale from 1 (strongly disagree) to 6 (strongly agree). The measure has good discriminant and convergent validity, as well as good reliability. 21 Participants were asked the degree to which they agreed with five statements such as, "I can do just about anything I really set my mind to." All items were averaged together to create a composite score, with higher scores indicating higher mastery (α=0.90, range=1 to 6).

Health mastery.
On a 0 to 10 scale where 0 means "no control at all" and 10 means "very much control," participants were asked, "How would you rate the amount of control you have over your health these days?"

Financial mastery.
On a 0 to 10 scale where 0 means "no control at all" and 10 means "very much control," participants were asked, "How would you rate the amount of control you have over your financial situation these days?"

Psychological distress
Depressive symptoms and depression. Depressive symptoms over the past week were measured using the 8-item Center for Epidemiologic Studies Depression Scale (CESD) 22 (e.g., "Much of the time during the past week, I felt depressed"), and response options included "yes" or "no" for each item. Following HRS protocol, an overall score was derived ranging from 0 to 8, with a higher score indicating higher depressive symptoms. The scale has been previously validated in the Health and Retirement Study 23 and showed high reliability in this sample (α=0.80).
Following prior work, 23 participants with a score of ≥4 were considered to have significant depressive symptoms, or depression. Prior work suggested that a cutoff of 4 produces results comparable to those of the cutoff of 16 on the full 20-item CESD scale. 23 No depression was the reference group.
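The scoring and cutoff described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the sketch assumes each of the eight items has already been coded so that 1 means the symptom is present (with any positively worded items reversed beforehand).

```python
CESD_CUTOFF = 4  # a score of 4 or more is treated as depression in the text


def cesd_score(symptom_flags):
    """Sum eight 0/1 indicators (1 = symptom present) into a 0 to 8 score."""
    if len(symptom_flags) != 8:
        raise ValueError("expected responses to 8 CESD items")
    return sum(symptom_flags)


def has_depression(symptom_flags):
    """Apply the >=4 cutoff; False corresponds to the reference group."""
    return cesd_score(symptom_flags) >= CESD_CUTOFF
```

A participant with exactly four symptoms present falls in the depression group, and one with three falls in the reference group.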
Hopelessness. Hopelessness was measured with 4 questionnaire items from two previously validated scales 24,25 (e.g., "I feel it is impossible for me to reach the goals that I would like to strive for", "The future seems hopeless to me and I can't believe that things are changing for the better"). Response categories ranged from 1 (strongly disagree) to 6 (strongly agree). An overall score was created by averaging the responses across all items (α=0.87, range=1 to 6).

Negative affect.
Negative affect was measured (in 2006 only) with a 6-item scale originally developed for use in the Midlife in the United States Study. [15][16][17] The scale assessed how often the participant felt "so depressed that nothing could cheer you up," "hopeless," "restless or fidgety," "that everything was an effort," "worthless," and "nervous" over the past 30 days. Response categories ranged from 1 (all of the time) to 5 (none of the time). Responses were reverse scored, so that a higher score indicated higher negative affect. An overall score was derived by averaging responses across all 6 items (α=0.86 in 2006, range=1 to 5). After the 2006 wave, the HRS switched to a more expansive measure of negative affect based on the Positive and Negative Affect Schedule (PANAS-X). 18 It included the following 12 items: afraid, upset, guilty, scared, frustrated, bored, hostile, jittery, ashamed, nervous, sad, distressed. An overall score was derived by averaging responses across all 12 items (α=0.89, range=1 to 5). A limitation of this study is that affect was measured in a different way during only the first wave of the study. However, scores were standardized and both the prior and current measures of affect operate very similarly (e.g., similar correlations with other variables, similar distributions, etc.).

Perceived constraints.
Perceived constraints were assessed with 5 items derived from Lachman and Weaver (1998). This measure has good discriminant and convergent validity, as well as good reliability. 21 Using a 6-point Likert scale (from 1 (strongly disagree) to 6 (strongly agree)), participants were asked the degree to which they agreed with statements such as, "What happens in my life is often beyond my control." All items were averaged to create an overall score, with higher scores indicating a higher sense of constraints on personal control (α=0.86, range=1 to 6).

Social factors
Loneliness. Loneliness was measured with three items from the previously validated UCLA Loneliness Scale (i.e., How much of the time do you feel: 1) you lack companionship, 2) left out, and 3) isolated from others). 26 Response categories ranged from 1 (often) to 3 (hardly ever or never). Responses were reverse scored, so that a higher score indicated higher loneliness. An overall score was derived by averaging the responses across the three items (α=0.80, range=1 to 3).
Living with partner/spouse. Participants were asked (yes/no), "Do you have a husband, wife, or partner with whom you live?"
Frequency of contact with children, other family, and friends. Frequency of contact with children, other family, and friends was queried separately for each group, but in the same way. For example, participants were asked: "On average, how often do you do each of the following?" 1) "Meet up (include both arranged and chance meetings)," 2) "Speak on the phone," 3) "Write or email." For each of these 3 questions, HRS respondents chose 1 of the following 6 responses: 1) ≥3x/week, 2) 1x-2x/week, 3) 1-2x/month, 4) every few months, 5) 1-2x/year, 6) <1x/year or never. 27 Because contact of any kind (regardless of medium) was the main point of interest, the highest frequency reported across the three modes of contact (i.e., meet up, phone, write/email) was taken. In other words, if the respondent did not often meet the other person in person but spoke on the phone very often with that person, contact was operationalized as fairly common, given the frequent phone contact. A binary frequency of contact variable was created in which <1x/week of contact was considered infrequent contact and ≥1x/week of contact was considered frequent contact (with the latter serving as the reference group).
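The contact-frequency coding described above can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes the response codes run from 1 (≥3x/week) to 6 (<1x/year or never), as listed in the text, so that a lower code means more frequent contact.

```python
# Response codes as listed in the text: 1 = >=3x/week ... 6 = <1x/year or never.
WEEKLY_OR_MORE = {1, 2}  # codes that correspond to contact at least 1x/week


def most_frequent_contact(meet_up, phone, write_email):
    """Return the code of the most frequent mode (lower code = more frequent)."""
    return min(meet_up, phone, write_email)


def infrequent_contact(meet_up, phone, write_email):
    """True if even the most frequent mode of contact is <1x/week."""
    return most_frequent_contact(meet_up, phone, write_email) not in WEEKLY_OR_MORE
```

For instance, a respondent who meets a friend only once or twice a year (code 5) but phones at least three times a week (code 1) is coded as having frequent contact, the reference group.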

Other factors
Personality. The "Big-5" personality traits (openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism) 28 were measured using 26 items derived from the Midlife Development Inventory Personality scales (MIDI) and International Personality Item Pool (IPIP). Using existing trait inventories, the goal of MIDI was to create the shortest possible collection of items that measured the Big-Five personality traits with high validity and reliability.
In a pilot study conducted among a probability sample of 1,000 adults aged 30-70, items with the highest item-to-total correlations and factor loadings were selected for the MIDI. Forward regressions were then computed to determine the smallest number of items needed to account for more than 90 percent of the total scale variance. As an illustrative example, items on the conscientiousness scale included "organized," "responsible," "hardworking," and "careless." Response categories ranged from 1 (a lot) to 4 (not at all). Responses were reverse scored, so that a higher score indicated a higher level of a given personality trait. An overall score for each personality trait was derived by averaging responses across all items for that trait.
where the first equality follows by the no-confounding assumption, the second by consistency, and the third by the statistical model.

Supplementary Table 1. Change in Sense of Purpose in Life from the Pre-Baseline Wave (t0) to the Baseline Wave (t1) a,b
Baseline Wave (t1) Quartile
*p<0.05 before Bonferroni correction; **p<0.01 before Bonferroni correction; ***p<0.05 after Bonferroni correction (the p-value cutoff for Bonferroni correction is p=0.05/35 outcomes=p<0.001).
Abbreviations: CI, confidence interval; OR, odds ratio; RR, risk ratio.
a If the reference value is "1," the effect estimate is OR or RR; if the reference value is "0," the effect estimate is β.
b An outcome-wide analytic approach was used, and a separate model for each outcome was run. A different type of model was run depending on the nature of the outcome: 1) for each binary outcome with a prevalence of ≥10%, a generalized linear model (with a log link and Poisson distribution) was used to estimate a RR; 2) for each binary outcome with a prevalence of <10%, a logistic regression model was used to estimate an OR; and 3) for each continuous outcome, a linear regression model was used to estimate a β.
c All continuous outcomes were standardized (mean=0; standard deviation=1), and β was the standardized effect size.
d The analytic sample was restricted to those who had participated in the baseline wave (t1; 2010 or 2012). Multiple imputation was performed to impute missing data on the exposure, covariates, and outcomes. All models controlled for sociodemographic characteristics (age, sex, race/ethnicity, marital status, annual household income, total wealth, level of education). These variables were controlled for in the pre-baseline wave (t0; 2006 or 2008).
e The analytic sample was restricted to those who had participated in the baseline wave (t1; 2010 or 2012). Multiple imputation was performed to impute missing data on the exposure, covariates, and outcomes. All models controlled for sociodemographic characteristics (age, sex, race/ethnicity, marital status, annual household income, total wealth, level of education, employment status, health insurance, geographic region), pre-baseline childhood abuse, pre-baseline religious service attendance, pre-baseline values of the outcome variables (diabetes, hypertension, stroke, cancer, heart disease, lung disease, arthritis, overweight/obesity, physical functioning limitations, cognitive impairment, chronic pain, self-rated health, heavy drinking, current smoking status, physical activity, sleep problems, positive affect, life satisfaction, optimism, purpose in life, mastery, health mastery, financial mastery, depressive symptoms, hopelessness, negative affect, perceived constraints, loneliness, living with spouse/partner, contact children <1x/week, contact other family <1x/week, contact friends <1x/week), personality factors (openness, conscientiousness, extraversion, agreeableness, neuroticism) and the pre-baseline value of the exposure. These variables were controlled for in the pre-baseline wave (t0; 2006 or 2008).
f Includes only study participants with no history of diabetes (n=10,032).
g Includes only study participants with no history of hypertension (n=5,145). For this analysis, we did not control for hypertension in wave 1 because the cell size was too small and the analysis did not converge.
h Includes only study participants with no history of stroke (n=11,916).
i Includes only study participants with no history of cancer (n=10,849).
j Includes only study participants with no history of heart disease (n=9,714).
k Includes only study participants with no history of lung disease (n=11,676).
l Includes only study participants with no history of arthritis (n=5,024). For this analysis, we did not control for arthritis in wave 1 because the cell size was too small and the analysis did not converge.
m Includes only study participants who were not overweight/obese (n=3,752).
n Includes only study participants who did not have physical limitations (n=9,797).
o Includes only study participants who did not have cognitive impairment (n=10,413).
p Includes only study participants who did not have chronic pain (n=8,286).
531 to 8,353) a,b,c,d
*p<0.05 before Bonferroni correction; **p<0.01 before Bonferroni correction; ***p<0.05 after Bonferroni correction (the p-value cutoff for Bonferroni correction is p=0.05/35 outcomes=p<0.001).
Abbreviations: CI, confidence interval; OR, odds ratio; RR, risk ratio.
a If the reference value is "1," the effect estimate is OR or RR; if the reference value is "0," the effect estimate is β.
b The analytic sample was restricted to those who had participated in the baseline wave (t1; 2010 or 2012). All models controlled for pre-baseline sociodemographic characteristics (age, sex, race/ethnicity, marital status, annual household income, total wealth, level of education, employment status, health insurance, geographic region), pre-baseline childhood abuse, pre-baseline religious service attendance, pre-baseline values of the outcome variables (diabetes, hypertension, stroke, cancer, heart disease, lung disease, arthritis, overweight/obesity, physical functioning limitations, cognitive impairment, chronic pain, self-rated health, heavy drinking, current smoking status, physical activity, sleep problems, positive affect, life satisfaction, optimism, purpose in life, mastery, health mastery, financial mastery, depressive symptoms, hopelessness, negative affect, perceived constraints, loneliness, living with spouse/partner, contact children <1x/week, contact other family <1x/week, contact friends <1x/week), personality factors (openness, conscientiousness, extraversion, agreeableness, neuroticism) and the pre-baseline value of the exposure. These variables were controlled for in the pre-baseline wave (t0; 2006 or 2008).
c We used an outcome-wide analytic approach, and ran a separate model for each outcome. We also ran a different type of model depending on the nature of the outcome: 1) for each binary outcome with a prevalence of ≥10%, we ran a generalized linear model with a log link and Poisson distribution to estimate a RR; 2) for each binary outcome with a prevalence of <10%, we ran a logistic regression model to estimate an OR; and 3) for each continuous outcome, we ran a linear regression model to estimate a β.
d All continuous outcomes were standardized (mean = 0; standard deviation = 1), and β was the standardized effect size.
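The model-selection rule and the Bonferroni cutoff stated in these footnotes can be sketched as follows. This is an illustration only, not the authors' analysis code: the function names and the returned labels are hypothetical, and no models are actually fitted.

```python
N_OUTCOMES = 35  # number of outcomes named in the footnote
ALPHA = 0.05


def bonferroni_threshold(alpha=ALPHA, n_tests=N_OUTCOMES):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def choose_model(is_binary, prevalence=None):
    """Map an outcome to the model family and effect measure named in the footnote."""
    if not is_binary:
        return ("linear regression", "beta")
    if prevalence is None:
        raise ValueError("prevalence is required for binary outcomes")
    if prevalence >= 0.10:
        # Common binary outcome: log-link Poisson model, reported as a risk ratio.
        return ("Poisson GLM with log link", "RR")
    # Rare binary outcome: logistic model, reported as an odds ratio.
    return ("logistic regression", "OR")
```

Note that 0.05/35 is approximately 0.0014, which the footnote reports as p<0.001. The split at 10% prevalence reflects the fact that an odds ratio approximates a risk ratio only when the outcome is rare.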