Abstract
We consider estimation of the Cox regression model when some covariates are subject to missingness and additional information (including the observed event time, the censoring indicator and fully observed covariates) is available that may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the probabilities of missingness. For each observation with a missing covariate, these two working models are used to define a nearest-neighbor imputing set, which is then used to nonparametrically impute covariate values for that observation. Once imputation is complete, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of the Cox regression, and with the predictive mean matching (PMM) imputation method. We show that all approaches can reduce bias due to a non-ignorable missingness mechanism. The proposed nonparametric imputation method is robust to misspecification of either one of the two working models and to misspecification of their link function. In contrast, the PMM method is sensitive to misspecification of the covariates included in the imputation, and the AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from the Surveillance, Epidemiology and End Results (SEER) Program.
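The imputation step described in the abstract can be illustrated with a minimal sketch. It assumes that the two working models have already been fitted and that each subject carries two predictive scores: `score_cov` (the predicted covariate value) and `score_miss` (the predicted missingness probability). The function names, the standardization of the scores, the weighted Euclidean distance, the weights `w_cov` and `w_miss`, the neighborhood size `k` and the uniform random draw from the imputing set are all assumptions made for illustration; the abstract does not specify them. The final Cox regression on each imputed dataset is not shown.

```python
import math
import random
from statistics import mean, pstdev


def standardize(values):
    """Center and scale a list of scores (a constant list maps to zeros)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def nn_impute(x, score_cov, score_miss, k=3, w_cov=0.5, w_miss=0.5, seed=0):
    """Return one imputed copy of x, where None marks a missing value.

    score_cov and score_miss are hypothetical predictive scores taken
    from the two working models (assumed to be fitted elsewhere).
    At least one value of x must be observed.
    """
    rng = random.Random(seed)
    s1 = standardize(score_cov)
    s2 = standardize(score_miss)
    observed = [j for j, v in enumerate(x) if v is not None]
    imputed = list(x)

    for i, v in enumerate(x):
        if v is not None:
            continue

        def distance(j, i=i):
            # Weighted Euclidean distance in the two-score space (assumed form).
            return math.sqrt(
                w_cov * (s1[i] - s1[j]) ** 2 + w_miss * (s2[i] - s2[j]) ** 2
            )

        # Imputing set: the k observed subjects closest to subject i.
        imputing_set = sorted(observed, key=distance)[:k]
        # Nonparametric draw: copy the value of a randomly chosen donor.
        imputed[i] = x[rng.choice(imputing_set)]
    return imputed


def multiply_impute(x, score_cov, score_miss, m=5, **kwargs):
    """Create m imputed copies of x, using a different seed for each."""
    return [nn_impute(x, score_cov, score_miss, seed=r, **kwargs) for r in range(m)]


# Toy data: the third subject is missing its covariate value.
x = [1.0, 2.0, None, 10.0, 11.0]
score_cov = [0.1, 0.2, 0.15, 0.9, 1.0]
score_miss = [0.5, 0.5, 0.5, 0.5, 0.5]
datasets = multiply_impute(x, score_cov, score_miss, m=5, k=2)
```

In this toy example the two nearest observed neighbors of the third subject are the first two subjects, so every imputed value is drawn from their observed values. Setting `w_cov` or `w_miss` to zero would make the imputing set depend on only one working model, which is one way to explore how much each model contributes.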

