Gender Roles and Employment Pathways of Older Women and Men in England

In the context of population aging, the U.K. government is encouraging people to work longer and delay retirement, and it is claimed that many people now make “gradual” transitions from full-time to part-time work to retirement. Part-time employment in older age may, however, be largely due to women working part-time before older age, as per a U.K. “modified male breadwinner” model. This article therefore separately examines the extent to which men and women make transitions into part-time work in older age, and whether such transitions are influenced by marital status. Following older men and women over a 10-year period using the English Longitudinal Study of Ageing, this article presents sequence, cluster, and multinomial logistic regression analyses. Little evidence is found for people moving into part-time work in older age. Typically, women did not work at all or they worked part-time (with some remaining in part-time work and some retiring/exiting from this activity). Consistent with a “modified male breadwinner” logic, marriage was positively related to the likelihood of women belonging to typically “female employment pathway clusters,” which mostly consist of part-time work or not being employed. Men were mostly working full-time regardless of marital status. Attempts to extend working lives among older women are therefore likely to be complicated by the influence of traditional gender roles on employment.


Web appendix A: Missing data
Item missing data are quite low for the variables used in this study. For the variables used in the sequences, 162 out of 14,742 observations (person × wave) are missing (just over 1 per cent of the observations). However, when we restrict the sample to respondents with complete data over all six waves (in order to have complete sequences), we retain 14,016 observations (just over 95 per cent of the observations) nested in 2,336 respondents (121 respondents are dropped for having at least one missing value). For the predictors of the clusters, there are 50 remaining respondents with missing data on at least one variable (out of 2,336 cases, 2.1 per cent). A bigger problem is the wave-level missing data. If we look at individuals who participated in the first wave (and thus could potentially have participated in all six waves), and keep the other selections we made, such as excluding proxy interviews and individuals who were not core members, 28.5 per cent of the observations (or 45.6 per cent of the respondents) were excluded from the analyses.
It is not yet clear how missing data should be handled in sequence analysis, and different solutions have been proposed. The simplest option is of course to look only at complete cases, as we have done in our article and as previous research has done as well (e.g., Fasang). As there is no consensus on the best way to deal with missing data, we investigate how the individuals who were excluded differ from the individuals who were kept in the analyses with respect to some background variables. Halpin (2012) further suggests that individuals who change states often are especially likely to have missing data, making this a potentially important source of bias.
To examine predictors of dropping out, individuals must not have missing data on the variables predicting the missingness. We use variables from wave 1 to predict missingness, as all respondents participated in this wave (see above). We excluded individuals with item-missing data on the first-wave predictors, which leads to the exclusion of 168 individuals, or 3.7 per cent of the cases. Of these 168 individuals, 92 did not participate in all waves (55 per cent) and 76 did (45 per cent).
When looking at the predictors of not participating in at least one wave of the study, we see that individuals with a higher educational level are less likely to miss a wave, whereas individuals who were married in the first wave were more likely to miss a wave. Pertaining to income, individuals in the lowest income quartile were the most likely to miss a wave. Finally, individuals who were employed in a short part-time job, or who were self-employed in a long part-time job, were somewhat less likely to miss a wave than individuals who were not employed in the first wave. To conclude, there appears to be some selection bias in terms of which individuals miss a wave of the study.

Web appendix B: More information about variables
This appendix gives some more information about the variables used in this study. The dataset contains a derived variable dividing individuals into being either 'employed' or 'self-employed'. Subsequent questions depended on this distinction. The documentation of wave 1 describes the derived variable that makes this distinction as follows. If individuals were asked whether they were employed or self-employed, the answer to this question was used. If the respondent was paid a salary or wage by an agency, the person was also considered employed. There were also instances in which an individual was considered self-employed, namely if this individual was (1) a sole director of a limited company, (2) running a business or professional practice, (3) a partner in a business or professional practice or working for oneself, or (4) a subcontractor or doing freelance work (Questionnaire wave 1, p. 59).
To divide jobs on the basis of working time, the exact question for individuals who were in employment was: "How many hours a week do you usually work in this job, excluding meal breaks but including any paid overtime?" The exact question for individuals who were self-employed was: "How many hours a week do you usually work, including time spent doing the books, VAT and so on?" After this, we looked at reasons for currently not working. If individuals were currently not working, they received the following question: "Can I just check, at any point during the last month were you … READ OUT … 1 temporarily away from paid work, 2 looking for paid work, 3 or, waiting to take up paid work already accepted? 96 None of these". All of these respondents were considered not working. Some stated that they were temporarily away from work, but still answered the question about how many hours they usually worked. These answers were considered legitimate and kept in the analyses, as these respondents were not truly non-employed, merely temporarily away from work. However, if they were temporarily away from work and did not answer the question about how many hours they worked (5 observations), they were not considered employed.
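For illustration, the coding rules described above can be sketched as a small helper function. This is a hypothetical reconstruction, not the actual code used to derive the ELSA variables: the function name and arguments are invented, and the 16-hour boundary between short and long part-time work is an assumption for the sketch (only the 35-hour full-time cut-off is the one used in the article).

```python
def code_state(self_employed, usual_hours):
    """Map reported usual weekly hours to one of the sequence states.

    `usual_hours is None` covers respondents who were not working as
    well as those temporarily away from work who did not report their
    usual hours: both are coded as not employed, per the rules above.
    """
    FULL_TIME = 35  # the article's full-time cut-off (hours/week)
    LONG_PT = 16    # assumed short/long part-time boundary (hypothetical)
    if usual_hours is None:
        return "not employed"
    prefix = "self-employed" if self_employed else "employed"
    if usual_hours >= FULL_TIME:
        return prefix + " full-time"
    if usual_hours >= LONG_PT:
        return prefix + " long part-time"
    return prefix + " short part-time"
```

For example, a respondent temporarily away from work who reports 40 usual hours is coded as employed full-time, while one who reports no hours is coded as not employed.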
The final variable included in the sequence was whether someone died during the observation period. If someone died during the observation period, we know their status and hence treat the sequence as complete. Disregarding individuals who died during the observation period would be expected to bias the sample towards more active and healthy respondents. The information about death was taken from the index file of ELSA and came from a variety of sources: (1) interviewers collected it during fieldwork, (2) it could come from communication with relatives or others between waves of fieldwork, and (3) if permission was given to link the data to the National Health Service Central Register (NHSCR), it could also come from this register.
The variable "Mortwave" indicates when someone died relative to the waves of ELSA. For more information, please see Phelps and Wood (2013).
Web appendix C: Multinomial logistic regression tables
On the next few pages, we show the full multinomial logistic regression tables underlying the results presented in the paper. In the paper we only show the adjusted (predicted) probabilities and discrete differences, as we believe these can be interpreted more effectively. Nevertheless, we present the tables on the following pages to provide further support for our interpretation of the results.
[Regression tables not reproduced here; predictors include age, educational level, and health.]
Web appendix D1: Different types of sequence analyses
In this web appendix, we investigate two alternative types of sequence analysis to see how sensitive the results are to our choice of the Longest Common Subsequence (LCS) distance. Specifically, we look at the Hamming Distance (HAM) and the Dynamic Hamming Distance (DHD).
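To make the two measures concrete, the sketch below computes the LCS distance and the Hamming distance between two toy state sequences. This is a Python illustration of the definitions, not the R code used for the analyses; the state labels are invented.

```python
def lcs_length(a, b):
    # Dynamic programming for the length of the longest common subsequence.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def lcs_distance(a, b):
    # LCS distance: combined length minus twice the shared subsequence.
    return len(a) + len(b) - 2 * lcs_length(a, b)

def hamming_distance(a, b):
    # HAM: number of positions at which two equal-length sequences differ.
    # DHD generalises this by replacing the constant cost of 1 with a
    # wave-specific substitution cost derived from observed transition rates.
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))
```

The key contrast is that HAM (and DHD) compare sequences position by position and so are sensitive to the exact timing of states, whereas the LCS distance rewards common subsequences regardless of when they occur.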

Web appendix D1.1: Hamming Distance
Figures D1 and D2 of this appendix show the results for the Hamming distance. In general, the same clusters are detected. The only substantial difference is that the phased-retirement cluster "Employed full-time → employed long part-time" is no longer found; instead, a cluster appears in which respondents are employed in the first wave and then mostly not employed in subsequent waves. The phased-retirement pathways seem to be absorbed into the "Employed full-time for less than four waves → not working" cluster.

Web appendix D1.2: Dynamic Hamming Distance
Results are in general quite similar to those for the Hamming distance. For the cluster "Employed full-time for about four waves → not working", employment no longer has to be followed by not working, but may also be followed by part-time work.
To conclude, the position-sensitive Hamming Distance and Dynamic Hamming Distance do not lead to radically different cluster solutions from the Longest Common Subsequence, and their solutions are not characterized by more transitions. We believe that the differences are in general small and not an improvement over the Longest Common Subsequence solution.
Note: Clusters that differ in substantive meaning are shaded.

Web appendix D2: Weighting
As a first robustness check, we weighted the data. Preferably, we would have used the weight "w6lwgt" provided by ELSA. This is a longitudinal weight that exists for individuals who participated in all six waves. However, we kept individuals who died during the observation period in the analysis, and these individuals obviously do not have a longitudinal weight. As all our respondents participated in the first wave, we therefore used the wave 1 cross-sectional weight provided by ELSA ("W1wgt"). This weight accounts for non-response in this wave. Although it is not the ideal weight, it should give us an indication of how sensitive our analyses are to corrections for non-random non-response. The sample does not need to be corrected for selection probabilities due to the design (see 'English Longitudinal Study of Ageing (ELSA) User Guide for the Wave 1 Core Dataset Version 3'). We check the impact of weighting at two places in our analyses: the clustering and the multinomial regression.

Web appendix D2.1: Weighting and clustering
For information on how to include weights in PAM cluster analysis with sequence analysis, please see Studer (2013). Overall, the weighted results are similar to the unweighted analysis in the article. If we specify 11 clusters, the summary statistics of the cluster solution are similar to those of the version presented in the article: there, the within variance was 0.16, the between variance 0.74, the within/between variance ratio 0.22, and the average silhouette width 0.48. In the weighted cluster solution, the within variance is 0.17, the between variance 0.75, the within/between variance ratio 0.23, and the average silhouette width 0.48. Looking more substantively at the sequences that make up the clusters, we see that similar clusters emerge but that the n per cluster differs for several clusters. Some clusters appear to be more stable than others, but the clusters are made up of the same characteristics.
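The weighted average silhouette width reported above can be computed from a precomputed distance matrix, cluster labels, and case weights. The sketch below is a minimal Python illustration of the statistic, not the R implementation (following Studer, 2013) used for the analyses; singleton clusters are given a silhouette of zero by convention.

```python
import numpy as np

def weighted_silhouette(D, labels, w):
    """Weighted average silhouette width for a precomputed distance
    matrix D, cluster labels, and case weights w (an illustrative
    sketch of the statistic, not the code used for the analyses)."""
    labels = np.asarray(labels)
    w = np.asarray(w, dtype=float)
    sil = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        w_own = w.copy()
        w_own[i] = 0.0                      # exclude the case itself
        denom = w_own[own].sum()
        if denom == 0:                      # singleton cluster: s(i) = 0
            continue
        # a: weighted mean distance to the case's own cluster.
        a = (D[i, own] * w_own[own]).sum() / denom
        # b: smallest weighted mean distance to any other cluster.
        b = min((D[i, labels == c] * w[labels == c]).sum() / w[labels == c].sum()
                for c in np.unique(labels) if c != labels[i])
        sil[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return float((sil * w).sum() / w.sum())
```

With equal weights this reduces to the ordinary average silhouette width; values near 1 (as in the well-separated toy example below) indicate a clear cluster structure, while the 0.48 reported above indicates a reasonable but not sharp structure.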

Web appendix D2.2: Weighting and multinomial logistic regression
There are two ways to approach this. We could run a weighted version of the multinomial logistic regression (1) on the newly weighted clusters or (2) on the old, unweighted clusters. To see whether weighting the analysis makes a difference for the predictors, using the old clusters makes the most sense; otherwise, we would be comparing two different things. To see whether the difference in the number of respondents per cluster matters, it also makes sense to perform the regression on the new clusters. Hence, we do both.
We constructed Table 4 again, but now with weighted multinomial regressions on the old and the new clusters. Because the LR test is unlikely to be valid when weights are included, we used the χ²-test to assess the significance of the variables.
Let us start with the gender difference in the likelihood of belonging to each cluster. As the table shows, the three versions of the multinomial regression model show similar results. The next few pages show the full AME multinomial logistic regression tables.
Differences in the statistical significance of results (using a p = .050 threshold), compared with the results in the paper, are underlined, and the shading is vertically striped.
The main conclusion has to be that weighting does not lead to drastically different results. In most cases where the conclusion about a predictor changes, that is, the conclusion about whether it is significant (using a p = .050 threshold), the predictor was already only marginally significant or became marginally significant. The biggest difference arises when we weight both the clustering and the multinomial logistic regression for women for the cluster 'employed long part-time → not employed'. The results reported in the article mostly remain. However, the variable 'married', which was marginally significantly related to the clusters for men, became just non-significant at the standard threshold of p = .050. The results for men for the variable 'being married' in relation to the cluster solution should therefore be interpreted with more caution. It should be noted, however, that our conclusion in the paper was already that marriage seemed more important for women than for men.
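A weighted multinomial logistic regression of this kind can be sketched in a few lines by maximising the weighted log-likelihood with plain gradient ascent. This is a minimal illustration of the estimator under synthetic data, not the software or variables we used; the weights `w` stand in for the wave 1 cross-sectional weight.

```python
import numpy as np

def weighted_mlogit(X, y, w, n_iter=3000, lr=0.5):
    """Multinomial logit fit by weighted maximum likelihood via plain
    gradient ascent (an illustrative sketch, not the actual estimator).
    Returns the coefficient matrix and the fitted class probabilities."""
    n, p = X.shape
    K = int(y.max()) + 1
    Xd = np.hstack([np.ones((n, 1)), X])     # add an intercept column
    Y = np.eye(K)[y]                         # one-hot outcome matrix
    B = np.zeros((p + 1, K))
    for _ in range(n_iter):
        Z = Xd @ B
        P = np.exp(Z - Z.max(axis=1, keepdims=True))   # stable softmax
        P /= P.sum(axis=1, keepdims=True)
        # Gradient of the weighted log-likelihood, normalised by total weight.
        B += lr * Xd.T @ (w[:, None] * (Y - P)) / w.sum()
    return B, P
```

The fitted probabilities `P` play the role of the adjusted (predicted) probabilities reported in the paper: averaging them over respondents, with covariates fixed at chosen values, yields the predicted probability of belonging to each cluster.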

Web appendix D3: Ward clustering instead of PAM clustering
In the analysis presented in the article, we used PAM clustering. We chose this type of clustering because it does not assume a hierarchical structure and is considered a more robust version of k-means clustering. However, Ward clustering is probably the most frequently used type of clustering. As a robustness check, we examine whether we would have ended up with qualitatively different clusters than with the method used in the article. If clusters are rather stable and clear, one would expect the same clusters to emerge.
The summary statistics look rather similar, regardless of the clustering method. In the model presented in the main article, the within variance was 0.16, the between variance 0.74, the within/between variance ratio 0.22, and the average silhouette width was 0.48. When we use Ward-clustering instead of PAM-clustering, the within variance is 0.15, the between variance 0.74, the within/between variance ratio 0.21, and the average silhouette width 0.48.
Again, when we look more substantively at the sequences that make up the clusters, we see that similar clusters are constructed but that the n per cluster differs. In some cases this is a small difference, in other cases a bigger one. Regardless, the same characteristics make up the clusters.
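Comparing the two cluster solutions amounts to cross-tabulating the PAM labels against the Ward labels: matched clusters put most of their mass in a single cell of their row, while the row and column totals show how the n per cluster shifts between methods. A minimal sketch (the labels below are toy data, not our clusters):

```python
import numpy as np

def crosstab(labels_a, labels_b):
    """Contingency table of two clusterings of the same cases: entry
    (i, j) counts cases placed in cluster i of the first solution and
    cluster j of the second. Similar solutions concentrate each row's
    mass in one column."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    tab = np.zeros((a.max() + 1, b.max() + 1), dtype=int)
    np.add.at(tab, (a, b), 1)               # count each (i, j) pair
    return tab
```

In the toy example below, every PAM cluster maps cleanly onto one Ward cluster (each row has a single non-zero cell), which is the pattern we also observe in our comparison, up to differences in cluster sizes.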

Web appendix D4: Different seed
The PAM algorithm starts by looking for an initial set of medoids, which has a random component (see the explanation in 'package 'cluster'' (2015) and 'Clustering using the pam algorithm in R' (2011)). Consequently, results may differ when the PAM algorithm is run several times; this is considered a sign of an unstable cluster solution ('Clustering using the pam algorithm in R', 2011). To check that this is not the case here, we re-ran the analyses five times, each time with a different seed, and verified that we ended up with the same clusters. Specifically, we looked at the 11-cluster solution and checked whether we obtained the same medoids and the same n per cluster. This was indeed the case.
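The stability check can be illustrated with a minimal PAM implementation: draw random initial medoids, greedily swap a medoid for a non-medoid whenever that lowers the total distance of every case to its nearest medoid, and compare the resulting medoids across seeds. This is a sketch of the algorithm, not the R 'cluster' implementation we used.

```python
import numpy as np

def pam(D, k, seed):
    """Minimal PAM on a precomputed distance matrix D: random initial
    medoids, then first-improvement swaps until no swap lowers the
    total distance to the nearest medoid (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(len(D), size=k, replace=False))
    cost = lambda meds: D[:, meds].min(axis=1).sum()
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in range(len(D)):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = h
                if cost(trial) < cost(medoids):
                    medoids, improved = trial, True
    return sorted(medoids)
```

Running `{tuple(pam(D, k, s)) for s in range(5)}` and finding a single solution in the set mirrors the check described above: identical medoids (and hence identical cluster n) across seeds indicate a stable solution.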

Web appendix D5: Different working hours cut-offs
Thus far, we have treated individuals who worked 35 hours or more per week as full-time workers (regardless of whether they were in employment or self-employed). Some tax benefits, however, can be obtained when working at least 30 hours a week ('Working Tax Credit' 2015). As a robustness check, we therefore examine whether the results would differ if we considered working 30 or more hours per week, instead of 35 or more, as full-time employment. Again, we investigated whether we ended up with similar clusters when specifying 11 clusters.
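The recoding behind this check is a one-line change to where the full-time boundary falls; everything downstream (distances, clustering, regression) is then re-run. The function and state names below are illustrative, not our actual code:

```python
def hours_state(usual_hours, cutoff=35):
    """Classify usual weekly hours as full- or part-time at a given
    cut-off; the robustness check re-runs the pipeline with cutoff=30."""
    return "full-time" if usual_hours >= cutoff else "part-time"
```

For example, a respondent working 32 hours a week is part-time at the 35-hour cut-off but full-time at the 30-hour cut-off; such cases move out of the 'employed long part-time' state, which helps explain why the phased-retirement cluster disappears under the 30-hour coding.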
The first thing to notice is that this cluster solution seems to fit the data less well than the cluster solution based on the 35 hours/week cut-off point. In the model presented in the main article, the within variance was 0.16, the between variance 0.74, the within/between variance ratio 0.22, and the average silhouette width 0.48. With 30 hours/week as the cut-off point, the within variance was 0.16, the between variance 0.71, the within/between variance ratio 0.22, and the average silhouette width 0.41.
This time, we also see clear differences between the clusters. Although some clusters remain fairly stable, there are also some differences. The cluster of mostly part-time self-employment is now more cluttered with other types of self-employment, leading to a new cluster with at least some self-employment. Moreover, the cluster "employed full-time → employed long part-time" no longer emerges as a separate cluster. This makes sense, as individuals were already unlikely to be in this cluster and we reduced the number of people in the state 'employed long part-time' in favour of 'employed full-time'. Instead, the cluster of mostly not employed is now split into a cluster in which all individuals are not employed in any of the six waves and a cluster in which individuals were not employed in four or five waves. Despite these differences, it is important to note that most clusters are still the same and are still characterised by no transitions or one transition.