Methods of analysis for survival outcomes with time-updated mediators, with application to longitudinal disease registry data

Mediation analysis is a useful tool to illuminate the mechanisms through which an exposure affects an outcome but statistical challenges exist with time-to-event outcomes and longitudinal observational data. Natural direct and indirect effects cannot be identified when there are exposure-induced confounders of the mediator-outcome relationship. Previous measurements of a repeatedly-measured mediator may themselves confound the relationship between the mediator and the outcome. To overcome these obstacles, two recent methods have been proposed, one based on path-specific effects and one based on an additive hazards model and the concept of exposure splitting. We investigate these techniques, focusing on their application to observational datasets. We apply both methods to an analysis of the UK Cystic Fibrosis Registry dataset to identify how much of the relationship between onset of cystic fibrosis-related diabetes and subsequent survival acts through pulmonary function. Statistical properties of the methods are investigated using simulation. Both methods produce unbiased estimates of indirect and direct effects in scenarios consistent with their stated assumptions but, if the data are measured infrequently, estimates may be biased. Findings are used to highlight considerations in the interpretation of the observational data analysis.

Simulation study: parameter values used

Table 1: Parameter values used to generate simulated data for the reference scenario. To scale the hazards so that the majority of survival times were < 4, the generated hazard was multiplied by a scaling factor, ξ.

Simulation study: generation of truth
True values of the estimands were generated using a large simulated dataset as described in the main text. Figure 1 shows plots of the true values for the three survival curves for the reference scenario. In the NoDE sub-scenario (upper-left), the total effect equals the indirect effect and, therefore, $S_{A(1),M(0)}(t) = S_{A(0),M(0)}(t)$. Similarly, in the NoIE sub-scenario (upper-right), because none of the total effect goes through the mediator, the total effect equals the direct effect and $S_{A(1),M(1)}(t) = S_{A(1),M(0)}(t)$. In the DE+IE sub-scenario (bottom-left), the dotted black line representing the survival curve $S_{A(1),M(0)}(t)$ is distinct from the other two curves, indicating that both a direct and an indirect effect exist.

Simulation study: results from infrequent mediator measurement scenarios
In scenario F1, where A positively affected M, both methods produced biased estimates of both the direct and indirect effects in the DE+IE and NoDE sub-scenarios (Table 5). This led to a corresponding under-estimation of the DE. A similar pattern was seen in the NoDE sub-scenario. When the direction of the effect of A on M was reversed (F2), the bias in the estimates of DE and IE was even larger for both methods (Table 6). No bias was seen in the estimation of the total effect for either method in either scenario. Because the frequency of the mediator measurements is inconsequential when there is no indirect effect, results from the NoIE sub-scenarios are not shown.

Simulation study: impact of conditioning on survival to first mediator measurement
Because our main analysis was restricted to individuals who had survived to the first mediator measurement, we also assessed the impact of not conditioning on survival to that measurement. Using the method of Vansteelandt, we analysed the reference scenario data with and without individuals who had events prior to the first mediator measurement, comparing estimates from an analysis conditional on survival to t = 1 with those from an unconditional analysis. Figure 3 shows the effect estimates for the conditional analysis (orange) and the unconditional analysis (green). Because there were no events prior to t = 1 in the conditional analysis, the TE, DE and IE were all equal to 1.0 at t = 1. In contrast, in the unconditional analysis, some events occurred prior to the first visit time and, because the exposure negatively affects survival in our simulation, the TE and DE estimates were less than one at t = 1. Over time, the estimated DE and TE from the unconditional analysis continue to be lower than the corresponding estimates from the conditional analysis.
However, as shown in Figure 3 (top left), the indirect effect estimates coincide between the two analyses. Although proportion mediated was not studied in detail here, it is a commonly reported measure, and Figure 3 (bottom right) shows that the estimated proportion mediated differs substantially between the conditional and unconditional analyses. At t = 1.5, for example, the conditional analysis suggests that approximately 60% of the total effect is mediated, whereas the unconditional analysis suggests that less than 40% is mediated.
We found that the estimates of TE and DE will differ between an analysis conditional on survival to the first mediator measurement and an unconditional analysis but the estimates of IE will be the same. As long as there is a non-zero DE and TE and there are events prior to t =1, their estimates will necessarily be different between the conditional and unconditional analyses. Because the IE cannot be estimated prior to measuring the mediator, these estimates coincide. Therefore, if the goal is to quantify the IE only, performing an analysis conditional on survival to the first mediator measurement should not change the results. If the goal is to estimate the proportion mediated, these two analyses will yield different results because the denominator of the proportion mediated is the total effect.
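The reasoning above can be illustrated with a small numerical sketch. It is a toy example and not the simulation used in this study: the constant hazards, the definition of TE, DE and IE as ratios of survival probabilities, and the proportion mediated computed as log(IE)/log(TE) are all assumptions made for illustration, as are the function names.

```python
import math

# Hypothetical constant hazards (illustrative values only, not from the study).
H_UNEXPOSED = 0.2  # hazard under A(0), M(0)
H_DIRECT = 0.2     # extra hazard from the direct path of the exposure
H_INDIRECT = 0.1   # extra hazard via the mediator, acting only after T_FIRST
T_FIRST = 1.0      # time of the first mediator measurement


def surv(t, a, m):
    """Toy survival probability S_{A(a),M(m)}(t) under the hazards above."""
    cum_hazard = (
        H_UNEXPOSED * t
        + a * H_DIRECT * t
        + m * H_INDIRECT * max(t - T_FIRST, 0.0)
    )
    return math.exp(-cum_hazard)


def cond_surv(t, a, m):
    """Survival to t conditional on survival to the first mediator measurement."""
    return surv(t, a, m) / surv(T_FIRST, a, m)


def effects(s, t):
    """Ratio-scale effects (assumed definitions) from a survival function s."""
    te = s(t, 1, 1) / s(t, 0, 0)
    de = s(t, 1, 0) / s(t, 0, 0)
    ie = s(t, 1, 1) / s(t, 1, 0)
    pm = math.log(ie) / math.log(te)  # assumed log-scale proportion mediated
    return {"TE": te, "DE": de, "IE": ie, "PM": pm}


uncond = effects(surv, 1.5)
cond = effects(cond_surv, 1.5)
for name in ("TE", "DE", "IE", "PM"):
    print(f"{name}: unconditional={uncond[name]:.3f}, conditional={cond[name]:.3f}")
```

In this toy setting the IE is the same in both analyses, because the two curves it compares are identical up to the first mediator measurement, whereas the TE and DE move closer to one after conditioning, so the proportion mediated is larger in the conditional analysis.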

Application to CFRD: dataset creation
To construct the mediation analysis dataset, we vertically stacked age-specific datasets for ages 18-50 years. We first assumed that all measurements were taken at integer-valued ages, a, using data from the annual review that most closely preceded each individual's a-th birthday. The age-specific dataset for age a comprises individuals who are at risk at age a and have either been diagnosed with CFRD within the past year (the exposed) or have not been diagnosed with CFRD (the unexposed). Individuals may contribute to multiple age-specific datasets while unexposed but contribute only once as an exposed person.

Figure 4 illustrates the creation of age-specific sets of data for two hypothetical individuals, A and B, from ages 23 to 26, where FEV1 is the repeatedly measured mediator. In this example, the earliest data available for A and B are at age 23, so we use these data as baseline measurements in the age a = 24 dataset to ensure proper causal ordering, Z_0 → A → M_1. For example, the age a = 24 dataset for person A is created by setting the start time to 0 at age 24 and then adding start and stop times to indicate the range of times (where age 24 equals time 0) over which each mediator measurement is valid. The age a is added to each row of the dataset for adjustment. An event indicator is set to 1 to indicate that an event occurred at the associated stop time, and the CFRD indicator is 1 in the age-specific dataset in which the individual was diagnosed. Individual A contributes data to the age-specific datasets for a = 24, 25 and 26. Individual B was diagnosed with CFRD at age 25; B therefore does not contribute to the a = 26 dataset, because we only use data from the first age at which a diagnosis of CFRD occurs. The age-specific data for all individuals are then vertically stacked to form one analysis dataset.

Figure 4: Construction of the mediation analysis dataset. On the left, a table representing the raw data, formatted with one row per person per integer age. On the right, a table showing the analysis dataset with age-specific data for each individual. These data are formatted with start and stop times indicating the valid time for the mediator measurement. Individual A contributes 3 age-specific sets of data (ages = 24, 25, 26) and individual B contributes 2 age-specific sets of data (ages = 24, 25). Only people newly diagnosed with CFRD and people without CFRD contribute data.
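The stacking procedure described above could be sketched as follows. This is a minimal illustration and not the code used for the analysis: the input layout, the field names (`fev1`, `cfrd_age`, `exit_age`, `died`), the FEV1 values and the function `build_stacked_dataset` are hypothetical. The sketch assumes that a mediator value is recorded at every integer age up to exit and that the value recorded at age a + k is valid over the interval from k to k + 1 on the age-specific time scale. Any censoring of unexposed individuals at a later CFRD diagnosis is not handled here.

```python
def build_stacked_dataset(raw, min_age=18, max_age=50):
    """Stack age-specific datasets in start/stop format (illustrative sketch)."""
    rows = []
    for pid, rec in raw.items():
        fev1 = rec["fev1"]          # {integer age: FEV1 value}
        cfrd_age = rec["cfrd_age"]  # first integer age with CFRD recorded, or None
        exit_age = rec["exit_age"]  # age at death or end of follow-up
        for a in range(min_age, max_age + 1):
            # Only the first age at which CFRD is recorded is used as exposed.
            if cfrd_age is not None and a > cfrd_age:
                break
            # Require a baseline value at a - 1, a mediator value at a,
            # and the individual to be at risk at age a.
            if (a - 1) not in fev1 or a not in fev1 or exit_age <= a:
                continue
            k = 0
            while (a + k) in fev1 and a + k < exit_age:
                is_last = a + k + 1 >= exit_age
                rows.append({
                    "id": pid,
                    "age": a,                        # added for adjustment
                    "cfrd": int(cfrd_age == a),      # newly diagnosed at age a
                    "baseline_fev1": fev1[a - 1],    # Z_0, measured before exposure
                    "start": float(k),               # time 0 corresponds to age a
                    "stop": min(float(k + 1), exit_age - a),
                    "fev1": fev1[a + k],             # mediator valid on this interval
                    "event": int(rec["died"] and is_last),
                })
                k += 1
    return rows


# Two hypothetical individuals loosely mirroring A and B (made-up values).
raw = {
    "A": {"fev1": {23: 80, 24: 78, 25: 75, 26: 74},
          "cfrd_age": None, "exit_age": 27.0, "died": False},
    "B": {"fev1": {23: 70, 24: 69, 25: 64, 26: 60},
          "cfrd_age": 25, "exit_age": 26.5, "died": True},
}
rows = build_stacked_dataset(raw)
for row in rows:
    print(row)
```

With these inputs, individual A contributes to the age 24, 25 and 26 datasets and individual B to the age 24 and 25 datasets only, with B counted as exposed in the age 25 dataset.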