Abstract
Multiple imputation (MI) is one of the principled methods for dealing with missing data. In addition, multilevel models have become a standard tool for analyzing the nested data structures that result when lower level units (e.g., employees) are nested within higher level collectives (e.g., work groups). When applying MI to multilevel data, it is important that the imputation model takes the multilevel structure into account. In the present paper, based on theoretical arguments and computer simulations, we provide guidance on using MI in the context of several classes of multilevel models, including models with random intercepts, random slopes, and cross-level interactions (CLIs), as well as models with missing data in categorical and group-level variables. Our findings suggest that, in many cases, several approaches to MI provide an effective treatment of missing data in multilevel research. Yet we also note that current implementations of MI still have room for improvement when handling missing data in explanatory variables in models with random slopes and CLIs. We identify areas for future research and provide recommendations for research practice, along with a number of step-by-step examples for the statistical software R.
