Abstract
Obtaining predictions from regression models fit to multiply imputed data can be challenging, both because treatments of multiple imputation seldom give clear guidance on how predictions can be calculated and because available software often lacks built-in routines for performing the necessary calculations. This research note reviews how predictions can be obtained using Rubin's rules, that is, by estimating them separately in each imputed data set and then combining the results. It then demonstrates that predictions can also be calculated directly from the final analysis model. The two approaches yield identical results when predictions are linear transformations of the coefficients and standard errors are calculated using the delta method, and they diverge only slightly when predictions involve nonlinear transformations. However, calculating predictions from the final model is faster, easier to implement, and generates predictions with a clearer relationship to the model coefficients. These principles are illustrated with data from the General Social Survey and with a simulation.
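The contrast between the two approaches can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the note's code: the data are a simulated bivariate linear regression (not the General Social Survey), the missing covariate is filled in by a simplified stochastic regression imputation, the helpers `ols` and `rubin` and the covariate profile `x0` are hypothetical, and `exp(x0'b)` stands in as an example nonlinear transformation. Approach 1 predicts within each imputed data set and pools with Rubin's rules (total variance T = W + (1 + 1/M) B); approach 2 pools the coefficients first and predicts from the final model, using the delta method for standard errors.

```python
import numpy as np


def ols(X, y):
    """Fit OLS; return the coefficient vector and its estimated covariance matrix."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    return beta, (resid @ resid / (n - k)) * XtX_inv


def rubin(estimates, variances):
    """Rubin's rules: pooled estimate and total variance W + (1 + 1/M) B.

    Accepts scalars per imputation (shapes (M,), (M,)) or vectors with
    covariance matrices (shapes (M, k), (M, k, k)).
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = estimates.shape[0]
    q_bar = estimates.mean(axis=0)
    w = variances.mean(axis=0)                 # within-imputation variance
    dev = estimates - q_bar
    # Between-imputation variance (scalar) or covariance matrix (vector case).
    b = dev @ dev / (m - 1) if dev.ndim == 1 else dev.T @ dev / (m - 1)
    return q_bar, w + (1 + 1 / m) * b


# Simulated data (assumption): y = 1 + 2x + e, with 30% of x missing at random.
rng = np.random.default_rng(2014)
n, M = 500, 20
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
miss = rng.random(n) < 0.3

# Simplified imputation model: regress x on y among complete cases.
Z_obs = np.column_stack([np.ones((~miss).sum()), y[~miss]])
g, Vg = ols(Z_obs, x[~miss])
s = np.std(x[~miss] - Z_obs @ g, ddof=2)

betas, covs = [], []
for _ in range(M):
    g_m = rng.multivariate_normal(g, Vg)       # draw imputation-model coefficients
    x_imp = x.copy()
    x_imp[miss] = g_m[0] + g_m[1] * y[miss] + rng.normal(scale=s, size=miss.sum())
    b_m, v_m = ols(np.column_stack([np.ones(n), x_imp]), y)
    betas.append(b_m)
    covs.append(v_m)
betas, covs = np.array(betas), np.array(covs)

x0 = np.array([1.0, 0.5])                      # hypothetical profile: intercept, x = 0.5

# Approach 1: predict in each imputed data set, then pool the predictions.
pred_each = betas @ x0
var_each = np.einsum("i,mij,j->m", x0, covs, x0)   # x0' V_m x0 for each m
p1, v1 = rubin(pred_each, var_each)

# Approach 2: pool the coefficients, then predict from the final model.
beta_bar, T = rubin(betas, covs)
p2, v2 = x0 @ beta_bar, x0 @ T @ x0

# Nonlinear transformation exp(x0'b); delta-method variance exp(eta)^2 * x0' V x0.
e1, ev1 = rubin(np.exp(pred_each), np.exp(pred_each) ** 2 * var_each)
e2, ev2 = np.exp(p2), np.exp(p2) ** 2 * v2

print(f"linear:    {p1:.6f} vs {p2:.6f}; SE {np.sqrt(v1):.6f} vs {np.sqrt(v2):.6f}")
print(f"nonlinear: {e1:.6f} vs {e2:.6f}; SE {np.sqrt(ev1):.6f} vs {np.sqrt(ev2):.6f}")
```

For the linear prediction the two approaches agree up to floating-point error, because averaging and the quadratic form x0' V x0 commute with a fixed linear combination of the coefficients. For the nonlinear transformation the pooled point estimates differ slightly (the average of exp(eta_m) is at least exp of the average eta_m), mirroring the small divergence described above.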
